Get & Remove Text Directives

Application A generates PDF files and can easily place special text tokens on each page. I am free to dictate that these tokens start with a certain keyword, use a certain font, a certain color, etc. Let's assume the text will always look like: #PDF_Directive:<action_request>#

Application B, using QuickPDF, needs to grab the text directive from each page (each page can have at most one such directive), use it, and ensure that the directive doesn't show up in printouts (either by removing the text or by making it invisible in print).

I'm looking for advice on the best way to implement this. Specifically:

* What would make the search for these text tokens easy: just the keyword? A special font size? Location on the page? What function(s) should I use for the search? Are there any problems I should be aware of?

* How do I remove or hide the text tokens after I find them in the PDF?

Thanks,

- Ido

Posted by ixm7 (Senior Member), 13 Jan 06 at 11:07AM.
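With the token format proposed above, the directive can be pulled out of a page's extracted text with a single regular expression. Below is a minimal Python sketch, assuming the page text is already available as a string (getting it out of the PDF is not shown) and that the action request never contains a `#`. `find_directive` is a hypothetical helper, not part of QuickPDF. It also enforces the at-most-one-directive-per-page rule from the post.

```python
import re
from typing import Optional

# Matches the token format from the post: #PDF_Directive:<action_request>#
# Assumption: the action request itself never contains a '#' character.
DIRECTIVE_RE = re.compile(r"#PDF_Directive:([^#]*)#")


def find_directive(page_text: str) -> Optional[str]:
    """Return the action request found on a page, or None if there is none.

    Raises ValueError if the page holds more than one directive, since
    each page is supposed to carry at most one.
    """
    matches = DIRECTIVE_RE.findall(page_text)
    if len(matches) > 1:
        raise ValueError(f"expected at most one directive, found {len(matches)}")
    return matches[0] if matches else None


if __name__ == "__main__":
    page = "Invoice 1042\n#PDF_Directive:print_tray_2#\nTotal: 10.00"
    print(find_directive(page))  # prints: print_tray_2
```

A fixed prefix plus a closing `#` keeps the match unambiguous even when the token sits in the middle of other page text.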

Ingo (Moderator), posted 13 Jan 06 at 2:50PM:
Hi Ido! I would use one of the text-extraction functions with option 3 (the CSV output). Once I have the actual text content, I search it for the token string (in Delphi, with Pos). Because the CSV output includes coordinates, finding the string also tells me where it sits on the page. With this data I can use the DrawBox function to paint over that text area. Good luck and happy clicking, Ingo
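Ingo's approach (search the CSV output for the token, then use its coordinates to cover it) can be sketched in Python. This is not QuickPDF code. It only parses CSV text already returned by the extraction call and computes a rectangle to cover. The column layout is an assumption: each row is taken to end with the eight corner coordinates (x1, y1, ..., x4, y4) followed by the text, and any font, color, or size columns before them are ignored. Check this against the library's documentation for GetPageText. `locate_directive` and `Hit` are hypothetical names.

```python
import csv
import io
from typing import NamedTuple, Optional

MARKER = "#PDF_Directive:"


class Hit(NamedTuple):
    action: str    # the <action_request> part of the token
    left: float    # padded bounding box of the whole text object
    bottom: float
    width: float
    height: float


def locate_directive(csv_text: str, pad: float = 1.0) -> Optional[Hit]:
    """Find the directive in CSV text-extraction output and return its box.

    Assumed layout: each row ends with x1, y1, ..., x4, y4 and then the
    text; leading columns are ignored. Also assumes the whole token sits
    in a single row (one text object).
    """
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 9:
            continue  # too short to be a text row in the assumed layout
        text = row[-1]
        start = text.find(MARKER)  # plays the role of Delphi's Pos
        if start < 0:
            continue
        end = text.find("#", start + len(MARKER))
        if end < 0:
            continue  # token is incomplete in this row
        coords = [float(v) for v in row[-9:-1]]
        xs, ys = coords[0::2], coords[1::2]
        left, bottom = min(xs) - pad, min(ys) - pad
        return Hit(
            action=text[start + len(MARKER):end],
            left=left,
            bottom=bottom,
            width=max(xs) + pad - left,
            height=max(ys) + pad - bottom,
        )
    return None
```

With an invented sample row, `locate_directive('"Helvetica",0,0,0,8,100,50,100,58,220,58,220,50,"#PDF_Directive:archive#"')` returns a `Hit` with action `archive`, left 99, bottom 49, width 122, and height 10 (using the default 1-unit padding). The rectangle could then be passed to a filled drawing call such as the DrawBox function Ingo mentions, with a white fill, after converting to whatever coordinate origin the drawing call uses. Two caveats: the box covers the entire text object, so this probably works best if Application A draws the token as its own text object. Also, overprinting only hides the token visually. The text stays in the page content, so later text extraction will still find it, even though it won't show up in printouts.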

ixm7 (Senior Member), posted 13 Jan 06 at 3:13PM:
Hi Ingo, Thanks! - Ido

ixm7 (Senior Member), posted 16 Jan 06 at 11:06AM:
GetPageText(3) works perfectly with PDF 1.2 (Acrobat 3.x) documents. However, with PDF 1.3 (Acrobat 4.x) documents the CSV output does not contain the full text content: it seems to return only the first one or two characters of each text object. Is this a known issue? Is there a solution? Many thanks! - Ido
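The failure described here (each text object coming back as only its first one or two characters) can at least be detected. That way, Application B can flag a page instead of silently missing its directive. Below is a rough heuristic in Python, assuming each object's text is the last CSV column. `looks_truncated` is a hypothetical helper. Its thresholds are arbitrary starting points and have not been tested against these files.

```python
import csv
import io


def looks_truncated(csv_text: str, max_len: int = 2, ratio: float = 0.8) -> bool:
    """Heuristic: True if most extracted text objects are suspiciously short.

    Assumes the text of each object is the last CSV column. The thresholds
    (max_len, ratio) are arbitrary starting points, not tested values.
    """
    texts = [row[-1] for row in csv.reader(io.StringIO(csv_text)) if row]
    texts = [t for t in texts if t.strip()]  # ignore empty or blank objects
    if not texts:
        return False  # nothing extracted at all is a different problem
    short = sum(1 for t in texts if len(t.strip()) <= max_len)
    return short / len(texts) >= ratio
```

A page for which this returns True could be logged or routed for manual handling rather than treated as having no directive.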

swb1 (Debenu Quick PDF Library Expert), posted 16 Jan 06 at 4:51PM:
I am using GetPageText(3) on PDF 1.3 documents with no issues at all. In fact, GetPageText(3) was the single most compelling reason for my purchase of QuickPDF. How are your PDFs created? Can you post an example? sb

ixm7 (Senior Member), posted 16 Jan 06 at 5:01PM:
They are created with Crystal Reports. The 1.2 (Acrobat 3) version comes from Crystal 9. Here's a sample that works perfectly with GetPageText(3): http://www.milletsoftware.com/Download/visual_cut_9.pdf The 1.3 (Acrobat 4) version comes from Crystal XI. Here's a sample for which GetPageText(3) fails to return the full text: http://www.milletsoftware.com/Download/visual_cut_11.pdf As a workaround, I tried LoadFromFile(originalfile), SaveToFile(workfile), LoadFromFile(workfile), but it makes no difference: GetPageText() still returns only the first 1-2 characters of each text object. - Ido

Ingo (Moderator), posted 17 Jan 06 at 1:37AM:
Hi Ido! I've tested it, too. Text extraction from ...cut_9.pdf is okay, but text extraction from ...cut_11.pdf looks very funny :-( In ...cut_11.pdf, it seems to me that only the outlines of the text are displayed? It makes no difference whether I resave the file or change it from PDF 1.3 back to 1.2... Best regards, Ingo

ixm7 (Senior Member), posted 17 Jan 06 at 10:46AM:
From the other thread, we now know that playing with layers doesn't fix the issue. It's strange: the text is clearly there in the PDF, and the function finds it, but then fails to extract all of it. Many thanks for the detective work... - Ido