Copying/Searching unicode chars in a PDF


Good day and happy new year!

Today I came across a problem I had never noticed before:

I select some Unicode (Greek) text from a PDF document, copy it, and paste it into a Word or .txt document, and what I get is unreadable characters...

I guess this is also the reason why a search within the PDF for Greek characters returns no results.

As a note, I used embedded fonts when creating my documents:
cour_gr = pdf_dll->AddTrueTypeFont("Courier New {1253}", 1); // Add a Greek Courier font and embed it
pdf_dll->SelectFont(cour_gr);

Any ideas on the matter would be great!

Thanks in advance,
stakon

Original post by stakon (Team Player, joined 09 Oct 09), posted 08 Jan 10 at 11:49AM

Ingo (Moderator Group, joined 29 Oct 05), posted 08 Jan 10 at 4:15PM

Hi Stakon!

The PDF uses Unicode to display these characters... Where do you paste the copied Unicode characters? Is that field or text component Unicode-capable? If you select a Unicode filename (one that shows Japanese characters in Explorer) with an old Delphi component, you get only a placeholder (perhaps a box or a ?) in place of each Japanese character.

Another test: copy some Unicode text from an Arabic PDF document into Notepad and save it as Unicode. Open it again... the characters are still the same. Now save this Notepad document as ANSI. Open it again... the characters aren't the same, only ugly placeholders.

So if you want to check or copy the content of a Unicode PDF, you need components that can handle Unicode content. Have a look at WideString ;-)

Cheers, Ingo
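The Notepad test described above can be imitated in a few lines of C++. This is only a minimal sketch, assuming the "ANSI" save behaves like a narrowing to a single-byte Latin-1 code page (the real code page depends on the system locale); `toLatin1Lossy` is a hypothetical helper and not part of QuickPDF or Notepad.

```cpp
#include <cassert>  // only needed for the assert checks in the usage note
#include <string>

// Narrow UTF-32 text to a single-byte Latin-1 string, roughly what a lossy
// "save as ANSI" step does: any code point above U+00FF has no slot in the
// target code page and is replaced by the placeholder '?'.
std::string toLatin1Lossy(const std::u32string& text) {
    std::string out;
    out.reserve(text.size());
    for (char32_t cp : text) {
        if (cp <= 0xFF) {
            out.push_back(static_cast<char>(static_cast<unsigned char>(cp)));
        } else {
            out.push_back('?');  // character lost for good
        }
    }
    return out;
}
```

For example, `assert(toLatin1Lossy(U"A\u0394B") == "A?B");` holds: the Greek capital delta cannot be stored and is lost, which is why reopening the ANSI file shows placeholders.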

stakon (Team Player, joined 09 Oct 09), posted 11 Jan 10 at 9:45AM

Hello Ingo, thanks for the info.

Unfortunately nothing I tried works (saving as .txt in different formats, etc.). As for your question, I am pasting the PDF text into .txt files and Word files. Even if I paste it here in this reply text box, the same weird text is displayed.

The text in the PDF appears like this: "ΔΙΑΣΤΑΣΙΟΛΟΓΗΣΗ ΔΟΚΩΝ ΣΤΑΘΜΗΣ"

When I copy this from the PDF and paste it anywhere, I get: "ÄÉÁÓÔÁÓÉÏËÏÃÇÓÇ ÄÏÊÙÍ ÓÔÁÈÌÇÓ"

PS. I am using the DLL version of QuickPDF and developing in Visual C++.
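The garbled paste follows a recognisable pattern: it is what the Greek text looks like when each letter's single-byte Windows-1253 code is read as a Latin-1 character (Δ is byte 0xC4 in Windows-1253, and 0xC4 is Ä in Latin-1). The sketch below reverses that mapping for the Greek letter range only. It assumes the text is already known to be garbled in exactly this way; `repairGreekMojibake` is a hypothetical helper, not a QuickPDF function, and it would also rewrite genuine accented Latin letters if applied to ordinary Western text.

```cpp
#include <cassert>  // only needed for the assert checks in the usage note
#include <string>

// Undo "Greek read as Latin-1" mojibake on text held as UTF-32.
// Assumption: each garbled character is really one Windows-1253 byte whose
// value equals the Latin-1 code point that was pasted.
// In Windows-1253, bytes 0xC1..0xD1 and 0xD3..0xFE hold U+0391..U+03A1 and
// U+03A3..U+03CE, i.e. a constant offset of 0x2D0 (byte 0xD2 is unassigned).
std::u32string repairGreekMojibake(const std::u32string& garbled) {
    constexpr char32_t kOffset = 0x2D0;
    std::u32string fixed;
    fixed.reserve(garbled.size());
    for (char32_t cp : garbled) {
        const bool greekSlot = cp >= 0xC1 && cp <= 0xFE && cp != 0xD2;
        fixed.push_back(greekSlot ? static_cast<char32_t>(cp + kOffset) : cp);
    }
    return fixed;
}
```

Applied to the second word of the example, `assert(repairGreekMojibake(U"\u00C4\u00CF\u00CA\u00D9\u00CD") == U"\u0394\u039F\u039A\u03A9\u039D");` holds, turning "ÄÏÊÙÍ" back into "ΔΟΚΩΝ". This only repairs text after the fact; it does not make the PDF itself searchable.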

manuel76413 (Beginner, joined 31 Dec 09), posted 11 Jan 10 at 9:59AM

Unicode characters are very difficult to handle when using the QuickPDF library. I have the same problem.

Ingo (Moderator Group, joined 29 Oct 05), posted 11 Jan 10 at 10:16AM

Hi! Put the values returned by QuickPDF into WideString fields and it will work. Which version of VC++ are you using?

If you select a file whose name contains Cyrillic or Asian characters in your VC++ app and show that filename in an edit field (in your app), what do you see? If you don't see the correct filename, then the problem is your IDE and not QuickPDF.

I'm working with Delphi 2007 (no Unicode support) and Free Pascal/Lazarus (with Unicode support). Calling the QuickPDF routines from Free Pascal with WideStrings works fine for me.

Cheers, Ingo

stakon (Team Player, joined 09 Oct 09), posted 11 Jan 10 at 12:56PM

Hi again! I am using Visual C++ 2005 + SP. What exactly do you mean by WideString fields? If I simply paste text from my PDF into any text box, edit box, etc., it isn't displayed correctly. Selecting files with Cyrillic, Greek, etc. names and displaying them in edit fields works fine.

Ingo (Moderator Group, joined 29 Oct 05), posted 11 Jan 10 at 3:03PM

Hi! Do you extract the text, or do you copy and paste from a PDF reader?

By WideString I mean WideString and not String, because the String type can't hold Unicode content. If the edit fields of your app can show (for example) Cyrillic characters, you should use fields of the same type to receive the result of the text extraction from QuickPDF... and it'll work. If not, please ask on the official support page (general section... first steps...).

BTW: You should use the latest QuickPDF version...

Cheers, Ingo

Wheeley (Senior Member, joined 30 Oct 05, Location: United States), posted 12 Jan 10 at 1:04AM

The DLL edition does NOT have wide strings, so your solution will not work, Ingo. It does have UTF-8 ANSI strings. So hypothetically, if you convert the UTF-8 string to a wide string, you should see your correct text. So maybe you need to paste your text into an editor to convert it to Unicode.

Wheeley
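The UTF-8 to wide string conversion suggested above could look roughly like the following. It is a portable sketch that assumes the DLL really hands back UTF-8 bytes, as Wheeley describes; `utf8ToUtf16` is a hypothetical helper, not a QuickPDF function. In a Visual C++ project the usual route would instead be the Win32 call `MultiByteToWideChar` with `CP_UTF8`.

```cpp
#include <cassert>  // only needed for the assert checks in the usage note
#include <cstddef>
#include <cstdint>
#include <string>

// Decode UTF-8 bytes into UTF-16 code units. Malformed or truncated
// sequences become U+FFFD. Compact sketch: it does not reject overlong
// encodings or UTF-8-encoded surrogates.
std::u16string utf8ToUtf16(const std::string& utf8) {
    std::u16string out;
    std::size_t i = 0;
    while (i < utf8.size()) {
        const unsigned char lead = static_cast<unsigned char>(utf8[i++]);
        std::uint32_t cp = 0;
        int extra = 0;  // number of continuation bytes expected
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else { out.push_back(u'\uFFFD'); continue; }  // invalid lead byte

        bool ok = true;
        for (int k = 0; k < extra; ++k) {
            if (i >= utf8.size() ||
                (static_cast<unsigned char>(utf8[i]) & 0xC0) != 0x80) {
                ok = false;  // truncated or broken sequence
                break;
            }
            cp = (cp << 6) | (static_cast<unsigned char>(utf8[i++]) & 0x3F);
        }
        if (!ok || cp > 0x10FFFF) { out.push_back(u'\uFFFD'); continue; }

        if (cp < 0x10000) {
            out.push_back(static_cast<char16_t>(cp));
        } else {
            cp -= 0x10000;  // split into a surrogate pair
            out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
    return out;
}
```

For instance, `assert(utf8ToUtf16("\xCE\x94\xCE\x9F\xCE\x9A") == u"\u0394\u039F\u039A");` holds: the six UTF-8 bytes decode to the three letters "ΔΟΚ". On Windows, `wchar_t` is 16 bits wide, so the same code units can be copied into a `std::wstring` for display in a Unicode edit field.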

Ingo (Moderator Group, joined 29 Oct 05), posted 12 Jan 10 at 6:15AM

Hi Wheeley! I know that QuickPDF works internally with AnsiString and PAnsiString. If you make an external call to a QuickPDF function and (for example) a filename is needed, then you should hold this filename (if it contains Asian or other characters) in a WideString field. I've tested it long enough. I'm out of the office now. I'll post a code sample later...

Cheers, Ingo