
Cannot extract text

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=226
Printed Date: 15 May 24 at 3:09AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Cannot extract text
Posted By: Alexey
Subject: Cannot extract text
Date Posted: 26 Dec 05 at 7:21AM

Using Quick PDF, I am not able to extract the Russian text from PDF pages. All text in the PDF uses embedded Type1 fonts.

Thanks




Replies:
Posted By: Ingo
Date Posted: 26 Dec 05 at 7:34AM
Hi Alexey,
Sorry... perhaps my questions will help...
Perhaps it has to do with the code page?
Is it that no Russian text can be extracted from any PDF file...?


-------------
Cheers,
Ingo



Posted By: Alexey
Date Posted: 26 Dec 05 at 12:33PM
What about the code page? My Windows uses Win-1251 as the default code page. Russian text can be extracted from some PDF files where TrueType fonts are used, but not from PDF files with only Type1 fonts.
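
To illustrate the code page question, here is a minimal Python sketch of what a single-byte code page mismatch looks like. It is only an assumption that this is what happens with these files: a Type1 font with a custom encoding may give an extractor no reliable mapping at all, in which case re-decoding will not help. The function name `redecode` and the sample string are hypothetical and have nothing to do with the Quick PDF API.

```python
# Bytes that an 8-bit Cyrillic encoding (Windows-1251) would use for "Привет".
raw = "Привет".encode("cp1251")

# If a tool interprets those bytes as Latin-1, the result is mojibake.
wrong = raw.decode("latin-1")

# Interpreting the same bytes as Windows-1251 recovers the Russian text.
right = raw.decode("cp1251")


def redecode(garbled: str, assumed: str = "latin-1", actual: str = "cp1251") -> str:
    """Undo a wrong single-byte decoding (hypothetical helper).

    Turns the garbled string back into its original bytes using the
    code page that was wrongly assumed, then decodes those bytes with
    the code page that was actually meant.
    """
    return garbled.encode(assumed).decode(actual)


print(wrong)            # garbled Latin-1 characters
print(redecode(wrong))  # the original Russian text
```

If the extracted text looks like accented Latin letters instead of Cyrillic, a round trip like this is worth a try. If it comes back as empty strings or unrelated symbols, the problem is more likely the font's encoding than the Windows code page.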


Posted By: Ingo
Date Posted: 26 Dec 05 at 4:55PM
Hi!
I think that's it...
TrueType is better supported by Quick PDF than other fonts...


-------------
Cheers,
Ingo



Posted By: Alexey
Date Posted: 26 Dec 05 at 11:59PM

Hi!

OK. Do you know of any other tools or libraries that can extract non-Latin characters correctly? Or maybe some method to replace the fonts in a PDF?

Thanks for the help



Posted By: Ingo
Date Posted: 27 Dec 05 at 1:28AM
Sorry... perhaps you can try iText (the open source library for PDF files, available in .NET and Java versions).


-------------
Cheers,
Ingo



