
How to remove only text from a page

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3825
Printed Date: 20 Apr 24 at 3:18PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: How to remove only text from a page
Posted By: aderg
Subject: How to remove only text from a page
Date Posted: 15 Jul 20 at 7:41PM
How can I remove only the text from a page so that the graphics remain?
If I use DeleteContentStream, I get a blank page.



Replies:
Posted By: Ingo
Date Posted: 15 Jul 20 at 7:52PM
Hi aderg,

There is no real solution for this.
With the extraction functionality you can try to determine the areas where the text blocks are.
With the drawing functionality you can overwrite these areas with white colour, for example.

Perhaps it's better to do it the other way round, extracting the things you want to have onto new blank pages.



-------------
Cheers,
Ingo



Posted By: aderg
Date Posted: 15 Jul 20 at 8:10PM
"you can overwrite these areas with white colour" - that is no good if the text was drawn over a picture.
"extracting the things you want to have onto new blank pages" - this only works with bitmaps. How would I extract vector graphics?


Posted By: Ingo
Date Posted: 16 Jul 20 at 3:22PM
Those are easy steps you can take for similar results.
Not a good solution, that's true ;-)
There isn't any built-in function for your needs.
You have to read the online reference and the developer guide and develop your own solution with a little help from QuickPDF ;-)



-------------
Cheers,
Ingo



Posted By: aderg
Date Posted: 16 Jul 20 at 3:29PM
Thanks, I read the developer guide :)
But I did not find a way to delete text, nor a way to copy all content without the text.


Posted By: Ingo
Date Posted: 16 Jul 20 at 10:59PM
Now you know the reason - there isn't any special direct functionality ;-)



-------------
Cheers,
Ingo



Posted By: dicmmooee
Date Posted: 02 Sep 20 at 1:43AM
Dear sir,
Could you please give a sample of how to use the draw functionality to overwrite certain text with white colour?
I highly appreciate your support!


Posted By: tfrost
Date Posted: 02 Sep 20 at 2:07PM
There is a sample that uses GetPageText to get the position of each text item here:
http://quickpdf.org/forum/extracting-text-by-csv-coordinates_topic3805_post15214.html

Then, for each text item, you can draw a filled box over it with SetFillColor and DrawBox.

Check the Reference Guide for the options for these functions.
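
To illustrate the approach described above, here is a minimal Python sketch that has not been tested against the library itself. It assumes that the position-aware CSV output of GetPageText has one row per text item, laid out as font, colour, size, then four corner points x1,y1 to x4,y4, then the text; please verify that layout in the Reference Guide. The `lib` object is a hypothetical wrapper exposing SetFillColor and DrawBox. The DrawBox argument order (left, top, width, height, draw option) and the value 1 for "fill" are also assumptions to check.

```python
import csv
import io


def boxes_from_csv(csv_text):
    """Turn position-aware text rows into (left, top, width, height) boxes.

    Assumed row layout (verify against the Reference Guide):
    font, colour, size, x1, y1, x2, y2, x3, y3, x4, y4, text
    """
    boxes = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 12:
            continue  # skip blank or malformed rows
        try:
            coords = [float(value) for value in row[3:11]]
        except ValueError:
            continue  # skip rows whose coordinates are not numeric
        xs, ys = coords[0::2], coords[1::2]
        # With a bottom-left origin, the top edge is the largest y value.
        left, top = min(xs), max(ys)
        boxes.append((left, top, max(xs) - left, top - min(ys)))
    return boxes


def cover_text(lib, csv_text):
    """Draw a filled white box over each text item and return the box count."""
    boxes = boxes_from_csv(csv_text)
    lib.SetFillColor(1, 1, 1)  # white, assuming colour components from 0 to 1
    for left, top, width, height in boxes:
        lib.DrawBox(left, top, width, height, 1)  # 1 = fill only (assumed)
    return len(boxes)
```

Keep in mind the objection raised earlier in this thread: a box drawn this way also hides any graphics underneath the text, and the text itself stays in the file.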




Posted By: Sopracenery
Date Posted: 02 Sep 20 at 7:46PM
Hi,

If there is no built-in solution and no better suggestion for this, and you REALLY want to get the job done, you can try to eliminate each character in the content stream.
Usually text looks like this:
(HELLO WORLD) or like this:
(H) 2.342 (E) 2.122 (L) 2.214 (L) 2.434 (O) 9.342 (W) 2.12 (O) 2.21 (R) 2.34 (L) 2.12 (D) 
Now you can try to replace every (H), (E) and so on with a single space in brackets, "( )", and to replace each full word found by text extraction with the same number of spaces, without brackets:

Select a ContentStream and call 
DebenuPDFLibrary1811.ReplaceTag("HELLO", "     ")
DebenuPDFLibrary1811.ReplaceTag("WORLD", "     ")
DebenuPDFLibrary1811.ReplaceTag("(H)", "( )")
DebenuPDFLibrary1811.ReplaceTag("(E)", "( )")
and repeat this for the complete alphabet.
Please check whether your graphics are affected. No warranty!

Feedback appreciated

Martin
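
The replacement idea above can be tried out on a plain-text copy of a content stream before touching a real document. The Python sketch below is only an illustration: it uses a regular expression instead of ReplaceTag, and the name `blank_literal_strings` is made up for this example. It assumes the stream is uncompressed and that text is written as literal strings in round brackets.

```python
import re

# One literal string operand: "(", then escaped characters or any characters
# that are not a backslash or a bracket, then ")". Nested unescaped brackets
# and hex strings such as <48454C4C4F> are not handled.
LITERAL = re.compile(r"\((?:\\.|[^\\()])*\)", re.DOTALL)


def blank_literal_strings(stream_text):
    """Replace the contents of every (...) operand with spaces of equal length."""
    def blank(match):
        inner_length = len(match.group(0)) - 2  # exclude the two brackets
        return "(" + " " * inner_length + ")"
    return LITERAL.sub(blank, stream_text)


if __name__ == "__main__":
    print(blank_literal_strings("BT /F1 12 Tf [(H) 2.342 (E) 2.122 (L)] TJ ET"))
```

Path operators such as `10 20 m 30 40 l S` contain no bracketed strings and pass through unchanged. Any bracketed string in the stream is blanked, whether or not it is shown as text, so check the result carefully.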


Posted By: dicmmooee2
Date Posted: 09 Sep 20 at 4:38PM
Thanks for your useful reply! Good luck!


Posted By: mandriospo
Date Posted: 05 Apr 23 at 11:54AM
Originally posted by Sopracenery:

...you can try to eliminate each character in the content stream. 
Usually text looks like this:
(HELLO WORLD) or like this:
(H) 2.342 (E) 2.122 (L) 2.214 (L) 2.434 (O) 9.342 (W) 2.12 (O) 2.21 (R) 2.34 (L) 2.12 (D) ...
How can I check whether a stream looks this way?


Posted By: Ingo
Date Posted: 05 Apr 23 at 7:51PM
Originally posted by mandriospo:

Originally posted by Sopracenery:

...you can try to eliminate each character in the content stream.
Usually text looks like this:
(HELLO WORLD) or like this:
(H) 2.342 (E) 2.122 (L) 2.214 (L) 2.434 (O) 9.342 (W) 2.12 (O) 2.21 (R) 2.34 (L) 2.12 (D) ...
How can I check whether a stream looks this way?

Mostly this won't work.
Try opening a PDF with Notepad and browsing through the content. Most of it you will not be able to read, but some of it you will, and this will give you an idea of what is possible and what is not.



-------------
Cheers,
Ingo
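
One rough way to do the same check without Notepad is to scan the raw file bytes for streams that contain text operators in plain sight. The Python sketch below is a heuristic only, and the function name is made up for this example. It does not decompress anything, so compressed streams simply count as unreadable, as do streams that write their text as hex strings.

```python
import re

# Raw data between the "stream" and "endstream" keywords.
STREAM = re.compile(rb"stream\r?\n(.*?)endstream", re.DOTALL)

# A literal string followed by Tj, e.g. "(HELLO WORLD) Tj", or an array
# containing a literal string followed by TJ, e.g. "[(H) 2.342 (E)] TJ".
LITERAL_TEXT = re.compile(
    rb"\((?:\\.|[^\\()])*\)\s*Tj|\[[^\]]*\([^\]]*\]\s*TJ", re.DOTALL
)


def count_readable_text_streams(pdf_bytes):
    """Return (streams found, streams that show literal text in plain sight)."""
    streams = STREAM.findall(pdf_bytes)
    readable = sum(1 for data in streams if LITERAL_TEXT.search(data))
    return len(streams), readable


if __name__ == "__main__":
    import sys

    with open(sys.argv[1], "rb") as handle:
        total, readable = count_readable_text_streams(handle.read())
    print(f"{readable} of {total} streams show literal text operators")
```

If the second number is zero, simple string replacement in the content stream has little chance of working on that file.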



