Copying/Searching unicode chars in a PDF

stakon
Posted: 08 Jan 10 at 11:49AM
Good day and happy new year!

Today I came across a problem I had never noticed before:

I select some Unicode (Greek) text in a PDF document, copy it and paste it into a Word or .txt document, and what I get is unreadable characters...

I guess this is also the reason why a search within the PDF for Greek characters returns no results.

As a note, I used embedded fonts when creating my documents:
cour_gr = pdf_dll->AddTrueTypeFont("Courier New {1253}", 1); // Add a Greek Courier font and embed it
pdf_dll->SelectFont(cour_gr);

Any ideas on the matter would be great!

Thanks in advance,
stakon

Ingo
Posted: 08 Jan 10 at 4:15PM
Hi Stakon!

The PDF uses Unicode to display these characters...
Where do you paste the copied Unicode characters?
Does that field or text component support Unicode?

If you select a Unicode filename (one that shows Japanese characters in Explorer) with an old Delphi component, you only get a placeholder (perhaps a box or a ?) in place of each Japanese character.

Another test: copy some Unicode text from an Arabic PDF document into Notepad and save it as Unicode. Open it again... the characters are still the same.
Now save this Notepad document as ANSI. Open it again... the characters aren't the same any more, only ugly placeholders.

So if you want to check or copy the content of a Unicode PDF, you need components that can handle Unicode content. Have a look at WideString ;-)

Cheers, Ingo
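
Editor's note: the Notepad experiment above can be sketched in a few lines of C++. This is only an illustration, not QuickPDF code. The helper name to_latin1_lossy is hypothetical, and Latin-1 stands in for whatever single-byte "ANSI" code page the system actually uses. It shows why characters outside the target code page come back as placeholders after a round trip.

```cpp
#include <string>

// Hypothetical helper: encode UTF-32 code points as single-byte Latin-1.
// Latin-1 is used here as a stand-in for an "ANSI" code page (assumption).
// Any code point above U+00FF has no Latin-1 byte and is replaced by '?',
// which is the kind of placeholder described in the post above.
std::string to_latin1_lossy(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back('?');  // information is lost at this point
        }
    }
    return out;
}
```

Once a character has been replaced this way, reopening the file cannot bring it back, which matches the "save as ANSI" result described above.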

stakon
Posted: 11 Jan 10 at 9:45AM
Hello Ingo,

Thanks for the info.

Unfortunately, nothing I tried works (saving the .txt file in different formats, etc.).
As for your question, I am pasting the PDF text into .txt files and Word files. Even if I paste it here in this reply text box, the same weird text is displayed.

The text in the PDF appears like this: "ΔΙΑΣΤΑΣΙΟΛΟΓΗΣΗ ΔΟΚΩΝ ΣΤΑΘΜΗΣ"
When I copy this from the PDF and paste it anywhere, I get: "ÄÉÁÓÔÁÓÉÏËÏÃÇÓÇ ÄÏÊÙÍ ÓÔÁÈÌÇÓ"

PS. I am using the DLL version of QuickPDF and developing in Visual C++.
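
Editor's note: the pasted sample is consistent with Windows-1253 (Greek) bytes being displayed as Windows-1252 (Western) characters. That is an inference from the two strings above and from the "{1253}" in the font name, not something the thread confirms. Under that assumption, a minimal C++ sketch can map the garbled text back to Greek. The function name repair_greek_mojibake is hypothetical and the code does not touch the PDF or QuickPDF. It only repairs text that has already been copied out.

```cpp
#include <string>

// Hypothetical helper: undo "Windows-1253 bytes shown as Windows-1252".
// Assumption: the garbled text arrived this way (inferred from the sample).
std::u32string repair_greek_mojibake(const std::u32string& garbled) {
    std::u32string fixed;
    fixed.reserve(garbled.size());
    for (char32_t cp : garbled) {
        // In Windows-1252, bytes 0xC1..0xFE display as U+00C1..U+00FE.
        // In Windows-1253, the same bytes (except the unassigned 0xD2)
        // are the Greek letters laid out consecutively from U+0391.
        const bool greek_byte = cp >= 0xC1 && cp <= 0xFE && cp != 0xD2;
        if (greek_byte) {
            fixed.push_back(static_cast<char32_t>(0x0391 + (cp - 0xC1)));
        } else {
            fixed.push_back(cp);  // spaces, ASCII, and anything else pass through
        }
    }
    return fixed;
}
```

Applied to the code points of "ÄÏÊÙÍ", this mapping yields "ΔΟΚΩΝ". A text that legitimately contains accented Western letters would be damaged by it, so it is only suitable when the whole string is known to be Greek.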

manuel76413
Posted: 11 Jan 10 at 9:59AM
Unicode characters are very difficult to handle when using the QuickPDF library.
 
I have the same problem.

Ingo
Posted: 11 Jan 10 at 10:16AM
Hi!

Put the resulting values from QuickPDF into WideString fields and it will work.
What version of VC++ are you using?
If you select a file whose name contains Cyrillic or Asian characters in your VC++ app and show this filename in an edit field (in your app), what do you see? If you don't see the correct filename, then the problem is your IDE and not QuickPDF.
I'm working with Delphi 2007 (no Unicode support) and Free Pascal/Lazarus (with Unicode support). Calling the QuickPDF routines from Free Pascal with WideStrings works fine for me.

Cheers, Ingo
 
stakon
Posted: 11 Jan 10 at 12:56PM
Hi again!

I am using Visual C++ 2005 + SP

What exactly do you mean by WideString fields? If I simply paste text from my PDF into any textbox, edit box, etc., it isn't displayed correctly.

Selecting files with Cyrillic, Greek, etc. names and displaying them in edit fields works fine.

Ingo
Posted: 11 Jan 10 at 3:03PM
Hi!

Do you extract the text or do you copy and paste from a PDF reader?
By WideString I mean WideString and not String, because the String type can't hold Unicode content. If the edit fields of your app can show (for example) Cyrillic characters, you should use fields of the same type to receive the result of the text extraction from QuickPDF... and it'll work. If not, please ask on the official support page (general section... first steps...).
BTW: You should use the latest QuickPDF version...

Cheers, Ingo

Wheeley
Posted: 12 Jan 10 at 1:04AM
The DLL edition does NOT have wide strings, so your solution will not work, Ingo. It does have UTF-8 ANSI strings. So hypothetically, if you convert the UTF-8 string to a wide string, you should see your correct text. So maybe you need to paste your text into an editor to convert it to Unicode.

Wheeley
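
Editor's note: the conversion suggested above can be sketched in portable C++. This assumes, as the post says, that the DLL hands back UTF-8 bytes, which the thread does not verify. The helper name utf8_to_utf32 is hypothetical. On Windows the usual route to a wide string would be MultiByteToWideChar with CP_UTF8. The hand-written decoder below is used only so the example runs anywhere.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-32 code points.
// Throws std::invalid_argument on truncated input or bad byte patterns.
// Sketch only: it does not reject overlong forms or surrogate values.
std::u32string utf8_to_utf32(const std::string& bytes) {
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size()) {
        const unsigned char lead = static_cast<unsigned char>(bytes[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");

        if (i + extra >= bytes.size()) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char c = static_cast<unsigned char>(bytes[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);  // append six payload bits
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

If the extracted bytes decode cleanly to Greek code points, the DLL output is UTF-8 as described. If decoding throws or yields Western accented letters, the bytes are more likely in a single-byte code page.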

Ingo
Posted: 12 Jan 10 at 6:15AM
Hi Wheeley!

I know that QuickPDF works internally with AnsiString and PAnsiString.
If you make an external call to a QuickPDF function and (for example) a filename is needed, then you should have this filename (if it contains Asian or other characters) in a WideString field.
I've tested it long enough. I'm out of the office now. I'll post some code later...

Cheers, Ingo
