
Debenu Quick PDF Library - PDF SDK Community Forum

GetPageText(3/4) returns invalid rectangles

DELBEKE
Debenu Quick PDF Library Expert

Joined: 31 Oct 05
Location: France
    Posted: 23 Nov 07 at 4:31AM

GetPageText(3) and GetPageText(4) return one line per text item, giving its font and the four corner points of its bounding rectangle.

From time to time, however, the four points returned do not define a rectangle.
 
The following VB6 code was written to demonstrate this:

Private Sub Command1_Click()
  Dim Doc As iSED.QuickPDF
  Dim lPnt As Long
  Dim lRet As Long
  Dim iPosit As Integer
  Dim hFich As Integer
  Dim Text As String
  Dim InputFileName As String
  Dim OutputFileName As String
  Dim DocId As Long
  Dim Tbl1 As Variant
  Dim Tbl2 As Variant
  Dim sTemp As String
  Dim X1 As Double
  Dim Y1 As Double
  Dim X2 As Double
  Dim Y2 As Double
  Dim X3 As Double
  Dim Y3 As Double
  Dim X4 As Double
  Dim Y4 As Double
 
  Dim Ok As Boolean
 
  'get input and output filenames (Text1 is a TextBox on the form)
  InputFileName = LCase(Text1)
  OutputFileName = Replace(InputFileName, ".pdf", ".txt")
  'get a free file handle (hFich and OutputFileName are not used in this test)
  hFich = FreeFile
  'create a new instance of iSED
  Set Doc = New iSED.QuickPDF
  'unlock iSED
  lRet = Doc.UnlockKey("XXXXXXXXXXXXXXXXXXXXXX")
  'load the sample file; LoadFromFile returns 0 if the file cannot be loaded
  lRet = Doc.LoadFromFile(InputFileName)
  If lRet = 0 Then
    MsgBox "Cannot load " & InputFileName
    Exit Sub
  End If
  'LoadFromFile returns a success flag, not a document ID,
  'so ask for the ID of the document that has just been loaded
  DocId = Doc.SelectedDocument
  'select first page
  lRet = Doc.SelectPage(1)
  'combine layers to get the whole text
  lRet = Doc.CombineLayers
  'set the measurement units to millimetres
  Doc.SetMeasurementUnits 1
  'set the origin to the top left
  Doc.SetOrigin 1
  'get the text
  Text = Doc.GetPageText(3)

  'free memory
  Doc.RemoveDocument DocId
 
  'split the text into lines, using CR+LF as the separator
  Tbl1 = Split(Text, vbCrLf)
  'analyse each line, skipping empty ones (e.g. the one after a trailing CR+LF)
  For lPnt = 0 To UBound(Tbl1)
    sTemp = Tbl1(lPnt)
    If Len(sTemp) > 0 Then
      'remove the font name
      iPosit = InStr(sTemp, Chr(34) & ",")
      sTemp = Mid(sTemp, iPosit + 1)
      'split the line into parts using comma as separator
      Tbl2 = Split(sTemp, ",")
      'Val always reads "." as the decimal point whatever the regional settings,
      'whereas CDbl follows them (a French locale expects ",")
      X1 = Val(Tbl2(3))
      Y1 = Val(Tbl2(4))
      X2 = Val(Tbl2(5))
      Y2 = Val(Tbl2(6))
      X3 = Val(Tbl2(7))
      Y3 = Val(Tbl2(8))
      X4 = Val(Tbl2(9))
      Y4 = Val(Tbl2(10))
      'to be a rectangle (assuming the points are defined clockwise):
      'x1 = x4, y1 = y2, x2 = x3 and y3 = y4
      Ok = (X1 = X4) And (X2 = X3) And (Y1 = Y2) And (Y3 = Y4)
      If Not Ok Then
        MsgBox "This line does not define a rectangle:" & vbCrLf & _
               Tbl1(lPnt)
      End If
    End If
  Next
  MsgBox "Test finished"
End Sub
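The same check, sketched in Python and slightly extended: besides the axis-aligned test performed above, it separates a rotated rectangle, a skewed box (a parallelogram) and the collapsed case in which all four y values are equal. The function name `classify_quad` and the tolerance `eps` are hypothetical choices, not part of the library; the geometry is standard (a quadrilateral whose diagonals share a midpoint is a parallelogram, and a parallelogram with equal diagonals is a rectangle).

```python
import math


def classify_quad(corners, eps=1e-6):
    """Classify the four corner points returned for one text item.

    corners: [(x1, y1), (x2, y2), (x3, y3), (x4, y4)] in the order returned.
    eps: hypothetical tolerance for comparing coordinates.
    """
    (x1, y1), (x2, y2), (x3, y3), (x4, y4) = corners

    def close(a, b):
        return abs(a - b) <= eps

    # Shoelace formula: zero area means the points collapse onto a line
    # (for example when y1 = y2 = y3 = y4).
    area = 0.5 * abs((x1 * y2 - x2 * y1) + (x2 * y3 - x3 * y2)
                     + (x3 * y4 - x4 * y3) + (x4 * y1 - x1 * y4))
    if close(area, 0.0):
        return "degenerate"

    # The test made by the VB6 code: x1 = x4, x2 = x3, y1 = y2, y3 = y4.
    if close(x1, x4) and close(x2, x3) and close(y1, y2) and close(y3, y4):
        return "axis-aligned rectangle"

    # Diagonals P1P3 and P2P4 sharing a midpoint -> parallelogram;
    # equal diagonal lengths as well -> rectangle that is rotated.
    if close(x1 + x3, x2 + x4) and close(y1 + y3, y2 + y4):
        d13 = math.hypot(x3 - x1, y3 - y1)
        d24 = math.hypot(x4 - x2, y4 - y2)
        return "rotated rectangle" if close(d13, d24) else "parallelogram"
    return "irregular quadrilateral"


print(classify_quad([(10, 20), (50, 20), (50, 24.5), (10, 24.5)]))
print(classify_quad([(0, 0), (10, 0), (12, 5), (2, 5)]))
```

A line that fails the VB6 test but comes back as "rotated rectangle" or "parallelogram" is still a consistent box, just not an axis-aligned one; only "irregular quadrilateral" would point to genuinely inconsistent data.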

 
 
swb1
Debenu Quick PDF Library Expert

Joined: 05 Dec 05
Location: United States
Posted: 23 Nov 07 at 9:06AM

I have had this problem as well. It seems to depend on what created the document. There are a number of different ways of expressing the location of text on the page, and some of them seem to hide the text from QuickPDF. I don't have an answer, and because I use QuickPDF principally to extract text, I am hopeful that there is a solution.
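One possible reason for the odd corner points, offered as an assumption rather than something established in this thread: PDF positions text through transformation matrices (the text matrix combined with the current transformation matrix), so a producer that rotates or skews its text maps the glyph box to a rotated rectangle or a parallelogram. The Python sketch below applies a matrix written as PDF's six numbers [a b c d e f], where x' = a*x + c*y + e and y' = b*x + d*y + f, to a box of 40 by 10 units; the box size and both matrices are made up for illustration.

```python
import math


def transform(matrix, point):
    """Apply a PDF-style matrix [a, b, c, d, e, f] to the point (x, y)."""
    a, b, c, d, e, f = matrix
    x, y = point
    return (a * x + c * y + e, b * x + d * y + f)


# Hypothetical unrotated text box, corners listed going round the box.
width, height = 40.0, 10.0
box = [(0.0, 0.0), (width, 0.0), (width, height), (0.0, height)]

# Skew (a synthetic italic): x is shifted by 0.25 times y, then translated.
skew = [1.0, 0.0, 0.25, 1.0, 100.0, 700.0]
# Rotation by 30 degrees, followed by the same translation.
t = math.radians(30)
rotate = [math.cos(t), math.sin(t), -math.sin(t), math.cos(t), 100.0, 700.0]

skewed = [transform(skew, p) for p in box]
rotated = [transform(rotate, p) for p in box]

print(skewed)  # x1 <> x4: fails the VB6 rectangle test
print([(round(x, 2), round(y, 2)) for x, y in rotated])  # y1 <> y2 as well
```

In both cases the four points still describe the same glyph box, but neither passes a test that expects x1 = x4 and y1 = y2.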

 

Steve
DELBEKE
Debenu Quick PDF Library Expert

Joined: 31 Oct 05
Location: France
Posted: 23 Nov 07 at 10:12AM
The problem is getting the real Y position. The bottom line can be found by adding the height of the current font.
The Y position may be the bottom or the top line of the text (but it should always be the same one).
As far as I can tell, one of y1/y2/y3/y4 is the right one, but I cannot work out which.
 
PS: I've found some documents where y1 = y2 = y3 = y4.
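One way round the question of which y value to trust, sketched in Python under stated assumptions: take the minimum and maximum over all four corners instead of picking one, and only when the y values collapse to a single value (the y1 = y2 = y3 = y4 case) fall back on the font height, adding it to that y value as suggested above. The sketch assumes the millimetre units and top-left origin set in the VB6 code, takes the font size in points (1 pt = 25.4/72 mm), and treats the font size as the text height, which is only an approximation. `text_bbox` is a hypothetical helper.

```python
PT_TO_MM = 25.4 / 72.0  # 1 point = 1/72 inch = 25.4/72 mm


def text_bbox(corners, font_size_pt, eps=1e-6):
    """Return (left, top, right, bottom) in mm, with the origin at top left.

    corners: the four (x, y) points returned for one text item, in mm.
    font_size_pt: font size in points; used only when the corners have
    no height (all four y values equal).
    """
    xs = [x for x, _ in corners]
    ys = [y for _, y in corners]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)  # y grows downwards with a top-left origin
    if bottom - top <= eps:
        # Assumption, following the post above: the single y value is the
        # top line, and the bottom line is one font height further down.
        bottom = top + font_size_pt * PT_TO_MM
    return left, top, right, bottom


print(text_bbox([(10, 20), (50, 20), (50, 24.5), (10, 24.5)], 12))
print(text_bbox([(10, 20), (50, 20), (50, 20), (10, 20)], 12))
```

Using the extremes over all four points also gives a sensible box for rotated or skewed text, at the cost of enclosing some empty space around it.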
swb1
Debenu Quick PDF Library Expert

Joined: 05 Dec 05
Location: United States
Posted: 23 Nov 07 at 10:59AM

I guess my problem is not the same after all. My issue is not with the bounding rectangle: GetPageText(3) returns the correct rectangle boundaries, but not the correct text. In most cases the text is empty.
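When the problem is empty text rather than wrong coordinates, it can help to measure how often it happens. A minimal Python sketch, assuming the field layout the VB6 code above relies on: a quoted font name, two further fields, the eight corner coordinates, and then the text. That the text is the last field and is quoted is itself an assumption. The function names and the sample output are hypothetical.

```python
import csv


def parse_text_line(line):
    """Split one line of GetPageText(3)/(4) output into its fields.

    Assumed layout (taken from the VB6 example above, not from the library
    documentation): field 0 = quoted font name, fields 1-2 = two further
    values, fields 3-10 = x1, y1, x2, y2, x3, y3, x4, y4, field 11 = text.
    """
    # csv honours the double quotes, so a comma inside the text does not
    # shift the columns the way a plain split on "," would.
    fields = next(csv.reader([line]))
    coords = [float(v) for v in fields[3:11]]
    corners = list(zip(coords[0::2], coords[1::2]))
    text = fields[11] if len(fields) > 11 else ""
    return fields[0], corners, text


def empty_text_items(page_text):
    """Return the lines whose coordinates parse but whose text is empty."""
    empty = []
    for line in page_text.splitlines():
        if not line.strip():
            continue  # skip blank lines, e.g. after a trailing CR+LF
        _font, _corners, text = parse_text_line(line)
        if text == "":
            empty.append(line)
    return empty


# Hypothetical sample output: two items, the second with empty text.
sample = ('"Arial",0,12,10,20,50,20,50,24.5,10,24.5,"Hello, world"\r\n'
          '"Arial",0,12,10,30,50,30,50,34.5,10,34.5,""\r\n')
print(empty_text_items(sample))
```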
