
Text Extraction

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=2104
Printed Date: 11 Jun 26 at 9:06PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Text Extraction
Posted By: rnw
Subject: Text Extraction
Date Posted: 19 Jan 12 at 4:52PM
I am using the following code to extract text from a PDF file in Visual Basic, and it works great. But I cannot figure out how to restrict the text extraction area to only a portion of the PDF, say the first 3 inches of the page.
 
nPage = 1
strInputFilePath="c:\rnw.pdf"
tt = QP.LoadFromFile(strInputFilePath, "")
strtext = QP.ExtractFilePageText(strInputFilePath, "", nPage, 8)
 
Any help on where to find this would be great.
 
Roger
 

 



Replies:
Posted By: AndrewC
Date Posted: 19 Jan 12 at 9:42PM
If you are already loading the document with LoadFromFile, it is better to call GetPageText(8) on the loaded document than to use ExtractFilePageText.

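  nPage = 1
  strInputFilePath = "c:\rnw.pdf"
  tt = QP.LoadFromFile(strInputFilePath, "")   ' load the document into memory

  QP.SelectPage(nPage)

  QP.SetOrigin(1)   ' measure coordinates from the top-left corner of the page
  Call QP.SetTextExtractionArea(1, 1, QP.PageWidth(), 3 * 72)   ' left, top, width, height; 72 pts = 1 inch
  strText = QP.GetPageText(8)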

Note: If you use the ExtractFilePageText call, it gets a little more complicated. You need to use the QP.DASetTextExtractionArea function with the native PDF coordinate system, because SetOrigin doesn't work with the DA functions, and you don't have access to the page height and width until you open the file. It is easier to work with the standard functions and GetPageText().
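
For anyone who does go the DA route, here is a rough sketch of the arithmetic involved. It assumes the native PDF coordinate system has its origin at the bottom-left corner of the page with y increasing upward, and that the page is unrotated with its lower-left corner at (0, 0). The helper names InchesToPoints and TopStripBottomY are made up for this illustration and are not part of the library. How QP.DASetTextExtractionArea interprets its parameters should be checked against the library reference before passing these values in.

```vb
' Hypothetical helpers (not library functions): plain arithmetic only.

' Convert inches to PDF points (72 points = 1 inch).
Function InchesToPoints(ByVal inches As Double) As Double
    InchesToPoints = inches * 72
End Function

' For a strip measured down from the top edge of the page, return the
' y coordinate of the strip's lower edge in bottom-left-origin coordinates.
' Assumes the page's lower-left corner is at (0, 0) and there is no rotation.
Function TopStripBottomY(ByVal pageHeightPts As Double, ByVal stripInches As Double) As Double
    TopStripBottomY = pageHeightPts - InchesToPoints(stripInches)
End Function
```

For example, on a US Letter page (792 points high), the top 3 inches run from y = 792 at the top edge down to y = TopStripBottomY(792, 3) = 576.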


