Find text by font size?

Skylla, Beginner. Joined: 21 May 13. Points: 4
Posted: 21 May 13 at 7:57PM
I am trying to find out how to extract text with a specific font size from a PDF file in C#. For example: search a PDF file, find the 22pt text, and extract it. Is there a way to accomplish this with Quick PDF? Any ideas or sample code? Need help from gurus! Thank you!

Ingo, Moderator Group. Joined: 29 Oct 05. Points: 3529
Hi Skylla!
That's easy stuff, so you should succeed on your own ;-)
Take the starter code for beginners: http://www.quickpdflibrary.com/help/getting-started-activex.php
Insert a LoadFromFile call, then a PageCount call. Then create a loop over PageCount, and inside it call ExtractFilePageText with option 3.
You can read all about option 3 here, and then you will know how to do it: http://www.quickpdflibrary.com/help/quickpdf/ExtractFilePageText.php
Cheers and welcome here,
Ingo

Skylla
Thank you for your good starting points!

AndrewC, Moderator Group. Joined: 08 Dec 10. Location: Geelong, Aust. Points: 841
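Skylla,

    QP.LoadFromFile("99pages.pdf", "");
    for (int i = 1; i <= QP.PageCount(); i++)
    {
        QP.SelectPage(i);
        // Extract the text blocks of the selected page (option 3); id is the handle to the list
        int id = QP.ExtractPageTextBlocks(3);
        for (int w = 1; w <= QP.GetTextBlockCount(id); w++)
        {
            double size = QP.GetTextBlockFontSize(id, w);
            // Report only the blocks whose font size rounds to 22pt
            if (Math.Round(size) == 22)
                MessageBox.Show("Page :" + i.ToString() + " Word:" + w.ToString() + "'" + QP.GetTextBlockText(id, w) + "'");
        }
        // Release the text block list before moving on to the next page
        QP.ReleaseTextBlocks(id);
    }

Andrew.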
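
A side note on the size test in the loop above: `Math.Round(size) == 22` accepts anything that rounds to 22, so sizes such as 21.6 and 22.4 both match. If a tighter or looser match is wanted, the tolerance can be made explicit. A minimal sketch, assuming a hypothetical helper `FontSizeMatch.IsSize` (it is not part of Quick PDF Library):

```csharp
using System;

// Hypothetical helper, not part of Quick PDF Library.
public static class FontSizeMatch
{
    // True when size is within tolerance points of target.
    // The default of 0.5 is an illustrative choice, not a library setting.
    public static bool IsSize(double size, double target, double tolerance = 0.5)
    {
        return Math.Abs(size - target) <= tolerance;
    }
}
```

In the loop, the test would then read `if (FontSizeMatch.IsSize(size, 22))`, or `FontSizeMatch.IsSize(size, 22, 0.05)` for a near-exact match.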

Skylla
Hi Andrew.
Thank you for your sample. I tried that code, and at runtime I got the following results: id = 1476395009, and qp.GetTextBlockCount(id) returned 0, so the loop for (int w = 1; w <= qp.GetTextBlockCount(id); w++) never executes. Do you have any idea what is happening?

    var qp = new PDFLibrary("C:\\DebenuPDFLibraryDLL0914.dll");
    const string licenseKey = "licencekey";
    var result = qp.UnlockKey(licenseKey);
    if (qp.LibraryLoaded())
    {
        if (result == 1)
        {
            qp.LoadFromFile("aaa.pdf", "");
            for (int i = 1; i <= qp.PageCount(); i++)
            {
                qp.SelectPage(i);
                int id = qp.ExtractPageTextBlocks(3);
                for (int w = 1; w <= qp.GetTextBlockCount(id); w++)
                {
                    double size = qp.GetTextBlockFontSize(id, w);
                    if (Math.Round(size) == 22)
                        Response.Write("Page :" + i.ToString(CultureInfo.InvariantCulture) + " Word:" + w.ToString(CultureInfo.InvariantCulture) + "'" + qp.GetTextBlockText(id, w) + "'" + "<br>");
                }
                qp.ReleaseTextBlocks(id);
            }
        }
    }

AndrewC
If it returned 0 then it is not finding any text on the page. I would need to see the PDF file before I could make any further comments.
Andrew.

Skylla
It's actually a basic PDF that I created for testing.
It contains just 24, 23, 22, and 20 pt text. Created with Word, saved as PDF.

AndrewC
My code is working correctly with your PDF and is returning the 22pt font from both pages.
Is LoadFromFile returning 1 in your case? Does QP.PageCount return 1 or 2? It should be 2 for your PDF.
It could be a permissions problem. I suspect LoadFromFile is failing. By default QPL always has a single blank page allocated in memory, which could be why nothing is being extracted.
You could then try

    string s = QP.GetPageText(7);
    MessageBox.Show(s);

to make sure the text is actually being extracted.
Andrew.
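
The checks listed in this post can be run in sequence so that the first failing step is reported. A minimal sketch, assuming a hypothetical `IPdfDoc` adapter interface (not part of Quick PDF Library) that forwards to the `LoadFromFile`, `PageCount`, `SelectPage`, and `GetPageText` calls already used in this thread; the convention that a successful load returns 1 is taken from the post above:

```csharp
using System;

// Hypothetical adapter: exposes only the calls already used in this thread,
// so the checks can also be exercised without the real library.
public interface IPdfDoc
{
    int LoadFromFile(string fileName, string password);
    int PageCount();
    void SelectPage(int page);
    string GetPageText(int options);
}

public static class ExtractionCheck
{
    // Returns "ok" when the file loads, the page count is as expected,
    // and GetPageText(7) returns some text for every page.
    // Otherwise returns a message naming the first check that failed.
    public static string Diagnose(IPdfDoc qp, string path, int expectedPages)
    {
        if (qp.LoadFromFile(path, "") != 1)
            return "LoadFromFile failed (check the path and file permissions)";

        int pages = qp.PageCount();
        if (pages != expectedPages)
            return "unexpected page count: " + pages;

        for (int i = 1; i <= pages; i++)
        {
            qp.SelectPage(i);
            if (string.IsNullOrEmpty(qp.GetPageText(7)))
                return "no text extracted on page " + i;
        }
        return "ok";
    }
}
```

For the test file in this thread the call would be `ExtractionCheck.Diagnose(adapter, "aaa.pdf", 2)`, where `adapter` is a small class that implements `IPdfDoc` by calling the real `qp` object. A page count of 1 together with no text would be consistent with the blank default page described above.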

Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.