
Extracted Special Characters Are Missing

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1642
Printed Date: 29 Apr 24 at 9:15AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Extracted Special Characters Are Missing
Posted By: Thedino
Subject: Extracted Special Characters Are Missing
Date Posted: 13 Nov 10 at 11:03AM
Hi,
When I extract words from a PDF file, the special Turkish characters are missing.
I don't have the source code, but the extraction functions are clearly skipping them.
 
Is there any solution for that?
 
Thanks



Replies:
Posted By: Ingo
Date Posted: 13 Nov 10 at 6:40PM
Hi!

Please use the advanced search function (above) to search for "Chinese":
http://www.quickpdf.org/forum/search_results_posts.asp?SearchID=20101113183854&KW=Chinese

Cheers and welcome here,
Ingo



Posted By: Thedino
Date Posted: 13 Nov 10 at 10:14PM
Hi Ingo,
I searched but couldn't find the solution. My code is below. Could you please fix it?
Thanks
 
    PDFLibrary := TQuickPDF0721.Create;
    err := PDFLibrary.UnlockKey('my key');
    err := PDFLibrary.AddSubsettedFont('Times New Roman', 13, 'ĞŞİğış');
    str := PDFLibrary.ExtractFilePageText('C:\OCRTest\TXT\gk2.pdf', '', 1, 3);
    memo1.Lines.Add(Utf8ToAnsi(str));



Posted By: Wheeley
Date Posted: 14 Nov 10 at 12:09AM
Well, what exactly is missing? Are the characters replaced by something else, or do they just not show up? Based on your code, you are converting a UTF-8 encoded string to ANSI. Depending on how that function works, your Turkish characters are probably being stripped because they are not part of your system's ANSI code page. So debug your code and check whether str contains the correct characters before you call Utf8ToAnsi, which is the most likely culprit.

Wheeley
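The check Wheeley suggests, inspecting str before it ever reaches Utf8ToAnsi, can be done with a small console helper. The sketch below is a hypothetical illustration, not part of Quick PDF Library: DescribeCodeUnits and TurkishUtf8 are made-up names, and a fixed byte array stands in for the text that ExtractFilePageText would return. It decodes the UTF-8 string with UTF8Decode (available in the System unit of both Delphi and Free Pascal) and prints every character code, so a missing letter such as Ğ (U+011E) or ı (U+0131) shows up as a gap instead of being masked by a lossy ANSI conversion.

```pascal
program CheckExtractedText;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // 'ĞŞİğış' spelled out as raw UTF-8 bytes, so the demo does not depend on
  // the encoding the source file is saved in (hypothetical sample data).
  TurkishUtf8: array[0..11] of Byte =
    ($C4, $9E, $C5, $9E, $C4, $B0, $C4, $9F, $C4, $B1, $C5, $9F);

// Hypothetical helper (not a Quick PDF Library function): decodes a UTF-8
// string and lists each UTF-16 code unit as U+XXXX, so you can see whether
// the Turkish letters are present before any ANSI conversion happens.
function DescribeCodeUnits(const Utf8Text: UTF8String): string;
var
  W: WideString;
  I: Integer;
begin
  W := UTF8Decode(Utf8Text);
  Result := '';
  for I := 1 to Length(W) do
  begin
    if Result <> '' then
      Result := Result + ' ';
    Result := Result + 'U+' + IntToHex(Integer(Ord(W[I])), 4);
  end;
end;

var
  Sample: UTF8String;
begin
  // Stand-in for the string returned by ExtractFilePageText: copy the raw
  // bytes in without any code page conversion.
  SetLength(Sample, Length(TurkishUtf8));
  Move(TurkishUtf8[0], Sample[1], Length(TurkishUtf8));
  WriteLn(DescribeCodeUnits(Sample));
end.
```

For the sample bytes this prints U+011E U+015E U+0130 U+011F U+0131 U+015F. If those code points are already absent when the helper is run on the real extracted string, the characters are lost before Utf8ToAnsi is ever called.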


Posted By: Thedino
Date Posted: 14 Nov 10 at 8:48AM

Wheeley,

They are missing. I have debugged it countless times, and they are simply not there. If I got some other character instead, such as a box, I could find a way to convert it. But they are missing entirely.
 
 


Posted By: Thedino
Date Posted: 17 Nov 10 at 7:13AM
A week ago I sent a message about this problem, with samples, through the official support link, but I have had no answer. I only received an automatic "your message received" email. That's it.
 
Very interesting!
 


Posted By: Wheeley
Date Posted: 18 Nov 10 at 7:31AM
Be patient. They will get to you. Right now they are trying to wrap up a new release and need to fix all the stuff they know is broken.

Wheeley



Posted By: Thedino
Date Posted: 09 Jan 11 at 9:23PM
Two months have passed, and still no answer.


Posted By: Rowan
Date Posted: 10 Jan 11 at 1:07PM
What is your case number? Do you mean no answer at all or just no bug fix provided?


Posted By: Dimitry
Date Posted: 11 Jan 11 at 12:28PM
There is a special function called GetSubsetString().
This function remaps Unicode characters that were added to the font subset to the corresponding character codes assigned to the glyphs in the subsetted font.
Please try the code snippet below. I hope 'output.pdf' contains what you need.
You may also visit http://help.quickpdflibrary.com/search?q=GetSubsetString - Quick PDF Library Knowledge Base
 
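var
  QPL: TQuickPDF;
  t, s: WideString; // WideString keeps the Turkish letters intact in pre-2009 (non-Unicode) Delphi
begin
  QPL := TQuickPDF.Create;
  try
    with QPL do
    begin
      // Assumes the source file is saved as UTF-8, so the literal holds UTF-8 bytes
      t := UTF8Decode('ĞŞİğış');
      // Embed only the glyphs needed for these characters
      AddSubsettedFont('Times New Roman', 13, t);
      // Remap the Unicode text to the character codes used by the subsetted font
      s := GetSubsetString(t);
      SetOrigin(1);
      DrawText(100, 100, s);
      SaveToFile('output.pdf');
    end;
  finally
    QPL.Free;
  end;
end;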
 


-------------
Regards,
Dmitry


Posted By: Thedino
Date Posted: 17 Feb 11 at 7:44PM
I just received an answer today: the problem has been located and is fixed in the next release, 7.24!
I knew it was a bug!
 


