
How to get x,y lines coordinates around a text

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3841
Printed Date: 28 Mar 24 at 7:21PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: How to get x,y lines coordinates around a text
Posted By: eddypi
Subject: How to get x,y lines coordinates around a text
Date Posted: 15 Sep 20 at 5:10AM

I am looking for code to get the (x,y) coordinates of the line(s) that surround a piece of text.

E.g. in the picture below, if we know the coordinates of the text “WEST LAKES LIBRARY”, which function from the Quick PDF library can provide the (x,y) coordinates of the Top, Bottom, Left & Right lines?


                         Top Line
               ___________________________
              |                           |
              |    WEST LAKES LIBRARY     |  <-- Right Line
Left Line --> |                           |
              |___________________________|
                        Bottom Line




Replies:
Posted By: Ingo
Date Posted: 15 Sep 20 at 7:29AM
Good Morning Eddy,

the extraction functions should meet your needs:
https://www.debenu.com/docs/pdf_library_reference/Extraction.php
ExtractOptions 2, 3, 4 and 5 return a CSV string that includes the relevant rectangle coordinates and font data.

Cheers and welcome here,
Ingo



-------------
Cheers,
Ingo



Posted By: eddypi
Date Posted: 15 Sep 20 at 10:05AM
Hi Ingo,

Thank you for the prompt reply.
I have no problem finding text box coordinates with GetTextBlockBound (https://www.debenu.com/docs/pdf_library_reference/DAGetTextBlockBound.php) or a similar function from the Debenu library.
The problem is how, knowing the text box coordinates, to find the x,y coordinates of the line(s) on each side of the text.
E.g., referring to the original picture and using
QP.SetOrigin (7) 'Top left of media box
QP.SetMeasurementUnits (1) ' Set the measurement unit in millimetres.

We found from the GetTextBlockBound function (https://www.debenu.com/docs/pdf_library_reference/DAGetTextBlockBound.php) that the corner of "WEST LAKES LIBRARY" is at x1=500 & y1=600.
Now we want to find the coordinates of the line drawn above this text (marked "Top Line"). Visually we know that "Top Line" will be at x1=500-x & y1=600-y, but I am looking for a function which will tell us that "Top Line" starts at x1=450 & y1=650 and ends at x2=550 & y2=650.


Posted By: tfrost
Date Posted: 15 Sep 20 at 5:29PM
I assume that you are working with a PDF that you did not draw yourself, otherwise you would know where the "lines" are.  And it follows that they might be drawn in dozens of different ways, such as on an image, together as a 'box', as four individual lines, or somewhat as you have done, with underlines and vertical bar single characters.

The simplest way of cutting through all this is to render the page to a bitmap, start at the known position of your text, and walk outwards through the scan lines in four directions until you find a change of pixel colour.  Yes, it is tedious to do this, but it is simple and reliable, unless your text overlays an image, of course.
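
The walk described above could be sketched roughly as follows. This is only an illustration in Python, not Quick PDF code: it assumes the page has already been rendered and loaded into a plain list of rows of greyscale values, that the background is a single known colour, and that the text box is already in pixel coordinates. The function name find_surrounding_lines and its parameters are made up for this example.

```python
def find_surrounding_lines(pixels, box, background=255):
    """Walk outwards from a text box until a non-background pixel is hit.

    pixels     -- list of rows, each row a list of greyscale ints (assumed layout)
    box        -- (left, top, right, bottom) of the text in pixel coordinates,
                  inclusive, with the origin at the top-left of the bitmap
    background -- pixel value treated as "empty paper" (assumed to be uniform)

    Returns a dict mapping 'top', 'bottom', 'left', 'right' to the (x, y)
    pixel where the first colour change was found, or None if the walk
    reached the edge of the bitmap without finding one.
    """
    left, top, right, bottom = box
    height = len(pixels)
    width = len(pixels[0]) if height else 0
    # Walk from the middle of each side of the text box.
    cx = (left + right) // 2
    cy = (top + bottom) // 2

    def walk(x, y, dx, dy):
        # Step one pixel at a time in direction (dx, dy).
        while 0 <= x < width and 0 <= y < height:
            if pixels[y][x] != background:
                return (x, y)
            x += dx
            y += dy
        return None

    return {
        "top": walk(cx, top - 1, 0, -1),
        "bottom": walk(cx, bottom + 1, 0, 1),
        "left": walk(left - 1, cy, -1, 0),
        "right": walk(right + 1, cy, 1, 0),
    }
```

A single ray per side is the simplest possible version. On real drawings it would probably need several parallel rays per side and a tolerance for anti-aliased pixels, so that a stray mark is not mistaken for a border line.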


Posted By: Ingo
Date Posted: 15 Sep 20 at 5:31PM
There isn't a solution inside QuickPDF for this.
You have to create your own algorithm for this ... perhaps with the help of QuickPDF ;-)
Please keep in mind this is a user forum here (from user ... to user).
No official Debenu/Foxit support here...
For technical questions like this you should use the official contact page.



-------------
Cheers,
Ingo



Posted By: eddypi
Date Posted: 16 Sep 20 at 9:14AM
tfrost is right: I am working with PDFs created by various draftsmen in AutoCAD. I checked with FoxitPhantom, and the PDF has each line as a separate object with x, y, width and height parameters.
Here are sample files: https://www.dropbox.com/sh/7ywhw6fysswbaa7/AADQLIRWMeUYRCRhByQgMGtza?dl=0

tfrost, are you able to point me to a QuickPDF function or sample code that I should look at to implement the “…bitmap walk through …. Pixel color” approach?

Also, I do realise that it is not a straightforward problem, but I trust that, if solved, it can be of great benefit to the wider QuickPDF community.
Let me explain the full problem in detail, based on the sample PDF files at the above link.
a) 1003.pdf and 1004.pdf are original files provided by a draftsman (normally, a full set of PDF files may have 20-100 files similar to 1003 & 1004).
b) All files have the same title block layout, which in this example is located at the bottom right corner of each page (other draftsmen may use a different layout and position for the title block).

Task: Read all supplied PDFs and extract e.g. “DRAWING TITLE”, “DRAWING NUMBER”, REVISION, etc.
Note: in our files the “DRAWING TITLE” header is not even provided by the draftsman, but it can be seen under the text “CHARLES STREET”.

Proposed solution:
1) Use QuickPDF to number all text positions inside one of the PDF files (i.e. the “1004.pdf text blocks.pdf” file), then open it in Adobe Reader for the user to review.
2) The user enters 224, 227 & 228, which correspond to “DRAWING TITLE”, “DRAWING NUMBER” & REVISION, back into the software.
Note: 224, 227 & 228 also represent the (x1,y1) coordinates of the bottom left corner of the text. E.g. 224=(788,572)
3) Based on that info, the software runs through the rest of the files and extracts the values for “DRAWING TITLE”, “DRAWING NUMBER”, REVISION, etc.

The code for these 3 steps works if all text is left-aligned, but if it is centred then the software does not pick up the text at the expected position, because the text moves left and right depending on the number of characters. Implementing something like a search between y1-5mm and y1+5mm would not work when the text is too long and close to the text on its left: y1-5mm may then overlap with the text to the left of 224 and give an incorrect result.
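
To illustrate why the surrounding lines would help with centred text: once the four lines around a title block cell are known, text could be matched by whether its centre falls inside the cell rather than by its bottom left corner. The Python sketch below is only an illustration under that assumption. The function texts_in_cell, the tuple layouts and the sample numbers are hypothetical and are not part of Quick PDF.

```python
def texts_in_cell(blocks, cell):
    """Return the text of every block whose centre lies inside a cell.

    blocks -- list of (text, x1, y1, x2, y2) tuples, where (x1, y1) and
              (x2, y2) are opposite corners of the text bounding box
    cell   -- (left, top, right, bottom) of the cell formed by the four
              surrounding lines, in the same units and origin as the blocks
    """
    left, top, right, bottom = cell
    # Sort each pair so the check works whichever way the y axis points.
    lo_x, hi_x = sorted((left, right))
    lo_y, hi_y = sorted((top, bottom))

    found = []
    for text, x1, y1, x2, y2 in blocks:
        centre_x = (x1 + x2) / 2
        centre_y = (y1 + y2) / 2
        if lo_x <= centre_x <= hi_x and lo_y <= centre_y <= hi_y:
            found.append(text)
    return found


# Hypothetical example: a short and a long centred title both land in the cell,
# while text in the neighbouring cell on the right is ignored.
cell = (780, 560, 840, 580)
blocks = [
    ("CHARLES STREET", 795, 565, 825, 572),
    ("CHARLES STREET UPGRADE", 785, 565, 835, 572),
    ("A", 845, 565, 850, 572),
]
print(texts_in_cell(blocks, cell))
```

Because the test uses the centre of the text, it does not matter how far the left edge shifts as the number of characters changes, as long as the text stays within its cell.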



Posted By: tfrost
Date Posted: 16 Sep 20 at 12:02PM
The QPDF functions are DARenderPageToFile or RenderPageToFile.  Or another function in the Rendering section of the reference guide - for example I use Delphi so I would choose RenderPageToStream and open the bitmap from the stream, to avoid using a file.

Once you have the bitmap you are operating outside QPDF and you need whatever your language provides to work with bitmaps.  Remember that the rendering scale and origin you use in QPDF will require conversion between the PDF co-ordinates of your found text and the BMP co-ordinates.
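
As a rough illustration of that conversion, here is a small Python sketch. It assumes the text coordinates are in millimetres measured from the top left of the page, and that the bitmap covers the whole page at a known DPI with its origin also at the top left. The function names are made up for the example; with a different origin or a cropped render, an offset or a flip of the y axis would be needed as well.

```python
MM_PER_INCH = 25.4


def mm_to_pixel(x_mm, y_mm, dpi):
    """Convert page coordinates in millimetres to bitmap pixel coordinates.

    Assumes both coordinate systems start at the top-left corner of the
    page and that the bitmap was rendered at `dpi` dots per inch.
    """
    scale = dpi / MM_PER_INCH  # pixels per millimetre
    return (round(x_mm * scale), round(y_mm * scale))


def pixel_to_mm(px, py, dpi):
    """Convert bitmap pixel coordinates back to millimetres on the page."""
    scale = MM_PER_INCH / dpi  # millimetres per pixel
    return (px * scale, py * scale)
```

The result of a pixel walk would go back through pixel_to_mm before being compared with text coordinates. The precision is limited to one pixel, i.e. 25.4/dpi millimetres.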

I agree it is not at all straightforward, but not that it is of much general interest.  If it was my problem, I think I would tell the originator either to encode the title, number and revision in the filename, or hide them in a fixed position in the page margin, if necessary in "white ink"!




