ExtractFilePageText extracts question marks!
rezabb
Beginner Joined: 09 Sep 19 Status: Offline Points: 4
Posted: 09 Sep 19 at 3:05AM
Hi,
I have a problem extracting words from my PDF file. The PDF content is in Persian (Farsi). After calling the ExtractFilePageText function in VB6, I receive question marks (??? ??) instead of the actual text. How can I get the real Persian text instead of a series of question marks?

Thanks,
Reza

P.S. This is my VB6 code:

Dim ClassName As String
Dim LicenseKey As String
Dim DPL As Object
Dim Result As Long
Dim iNumPages As Long
Dim nPage As Long
Dim strText As String

ClassName = "DebenuPDFLibraryAX1613.PDFLibrary"
LicenseKey = "***"

Set DPL = CreateObject(ClassName)
Result = DPL.UnlockKey(LicenseKey)
DPL.LoadFromFile strInputFilePath, ""
iNumPages = DPL.PageCount() ' Calculate the number of pages

strText = ""
For nPage = 1 To iNumPages
    ' Append each page's text (the original loop overwrote it on every page)
    strText = strText & DPL.ExtractFilePageText(strInputFilePath, "", nPage, 0) & vbCrLf
Next nPage

' Show the text once, after all pages have been read
Clipboard.Clear
Clipboard.SetText strText
Text1.Text = strText
Ingo
Moderator Group Joined: 29 Oct 05 Status: Offline Points: 3524
Hi Reza, if you look at the online reference, you'll see that all string content is handled as Unicode (wchar). VB6 doesn't support Unicode in normal VB6 code, so you have to convert the content first: from Unicode to a VB6 string or the other way round, depending on whether you want to get or put text. Welcome here!
Cheers,
Ingo
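To illustrate why question marks appear at all: when Unicode text is pushed through a conversion to a non-Unicode (ANSI) code page, every character that code page cannot represent is replaced with "?". The following Python sketch is only an analogy for what happens on the VB6 side, not Quick PDF Library code; cp1252 (Western European) is an assumed example code page, and `to_ansi` / `from_wchar_buffer` are hypothetical helper names. Decoding the wide-character (UTF-16LE) data directly keeps the Persian letters intact.

```python
def to_ansi(text: str, codepage: str = "cp1252") -> bytes:
    """Lossy Unicode-to-ANSI conversion: unrepresentable characters become b'?'."""
    return text.encode(codepage, errors="replace")


def from_wchar_buffer(buf: bytes) -> str:
    """Decode a Windows wide-character (UTF-16LE) buffer up to the first NUL."""
    text = buf.decode("utf-16-le")
    nul = text.find("\x00")
    return text if nul < 0 else text[:nul]


persian = "سلام"  # "hello" in Persian
print(to_ansi(persian))  # b'????': the Persian letters are lost
print(from_wchar_buffer(persian.encode("utf-16-le") + b"\x00\x00") == persian)  # True
```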
rezabb
Dear Ingo,
Many thanks for your answer and for your warm welcome to the forum. Using the exact keyword and the exact conversion you mentioned, I found the proper VB code, and now I can extract text in Persian. It worked like a charm.

Another question: Persian text is written from right to left, but Quick PDF seems to extract text (characters) starting from the top-left side of the page. Is there an option to reverse this, for example to start from the top-right side of the page? My words are extracted in reverse character order, which makes them unreadable (for example, the word "google" would be extracted as "elgoog"). When I copy text from the Persian PDF and paste it into MS Word, the text is correct. I want my extracted text to come out the way MS Word shows it.

Thanks again,
Reza
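The reversal described here is what you get when right-to-left text is read out in visual order, i.e. left to right as the glyphs sit on the page. A minimal Python sketch of one way to post-process such a line, assuming the whole line was extracted in visual order: reverse it, then put runs of Latin letters and European digits back into their original order, because those are still written left to right inside a Persian line (so a genuine Latin word such as "google" is left untouched; in the post it only stands in for Persian letters). This is a simplified heuristic, not the full Unicode Bidirectional Algorithm, and `visual_to_logical_rtl` / `visual_to_logical_text` are hypothetical helper names, not Quick PDF Library functions.

```python
import unicodedata

# Bidi classes kept in left-to-right order: strong Latin letters (L) and
# European digits (EN). Simplified heuristic, not the full Unicode
# Bidirectional Algorithm (UAX #9).
LTR_CLASSES = {"L", "EN"}


def visual_to_logical_rtl(line: str) -> str:
    """Turn one line of right-to-left text, extracted in visual
    (left-to-right) order, into logical reading order."""
    out = []
    run = []  # pending LTR characters, currently in reversed order
    for ch in reversed(line):
        if unicodedata.bidirectional(ch) in LTR_CLASSES:
            run.append(ch)
            continue
        if run:
            out.append("".join(reversed(run)))  # restore the run's original order
            run = []
        out.append(ch)
    if run:
        out.append("".join(reversed(run)))
    return "".join(out)


def visual_to_logical_text(text: str) -> str:
    """Apply the conversion line by line so the line order is preserved."""
    return "\n".join(visual_to_logical_rtl(line) for line in text.split("\n"))
```

Converting line by line keeps the lines in their original top-to-bottom order; reversing the whole page text in one go would also reverse the order of the lines.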
Ingo
Try extract option 0 or 7 to get normal, readable content, or option 4 to get the text word by word.
This should make it easier for you. https://www.debenu.com/docs/pdf_library_reference/ExtractFilePageText.php
rezabb
Great. Option 4 is the better solution: I can handle the text word by word and post-process the extracted words later in VB code.
My PDF contains a few half-space characters. A half-space acts like a space, but the two words it connects are treated as one word rather than two separate words (just like the hyphen in English: "non-destructive" counts as one word, not two). I see that Quick PDF treats words with a half-space in the middle as two separate words and extracts them separately. Is there any way to extract them as one word, perhaps by defining the separator character? The space and the half-space have different character codes.
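If the page text is extracted as a whole (option 0 or 7) rather than word by word, the word splitting can be done in your own code, where you decide which characters count as separators. A minimal Python sketch, assuming the half-space is the zero-width non-joiner (ZWNJ, U+200C) commonly used in Persian typesetting; the separator set and the name `split_words` are hypothetical choices for illustration, not a Quick PDF Library option.

```python
import re
from typing import List

# Assumption: the Persian half-space is the zero-width non-joiner (ZWNJ, U+200C).
HALF_SPACE = "\u200c"

# Characters that end a word: space, tab, CR, LF and no-break space.
# The half-space is deliberately NOT in this set, so it stays inside the word.
_SEPARATORS = re.compile(r"[ \t\r\n\u00a0]+")


def split_words(text: str) -> List[str]:
    """Split text into words on the separator set above, keeping half-spaces."""
    return [word for word in _SEPARATORS.split(text) if word]


words = split_words("می" + HALF_SPACE + "خواهم بروم")  # "I want to go"
print(len(words))  # 2: the half-space did not split the first word
```

The same idea carries over to VB6: split only on the separator characters you list, and leave the half-space out of that list.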
|
|
Ingo
Hi again,
I have a few code snippets for you showing how to deal with Unicode and integer values in VB6:

module1.bas
-----------
Attribute VB_Name = "Module1"
Public Declare Function functionname1 Lib "function.dll" (ByVal parameter As String) As Integer
Public Declare Function functionname2 Lib "function.dll" (ByVal parameter1 As String, ByVal parameter2 As Integer) As Long

' The returned string content
Public Declare Function apiLStrCopyW Lib "kernel32.dll" Alias "lstrcpyW" (ByVal lpString1 As Long, ByVal lpString2 As Long) As Long
Public Declare Function apiLStrLenW Lib "kernel32.dll" Alias "lstrlenW" (ByVal lpString As Long) As Long

Public Function GetStringFromPtrW(ByVal ptr As Long) As String
    ' Create a buffer of matching length
    GetStringFromPtrW = String$(apiLStrLenW(ptr), 0)
    ' Copy the wide string into the buffer
    apiLStrCopyW StrPtr(GetStringFromPtrW), ptr
End Function

form1.frm
---------
VERSION 5.00
Begin VB.Form Form1
   Caption         =   "vb6-sample - ..."
   ClientHeight    =   5475
   ClientLeft      =   45
   ClientTop       =   435
   ClientWidth     =   7365
   LinkTopic       =   "Form1"
   ScaleHeight     =   5475
   ScaleWidth      =   7365
   StartUpPosition =   3  'Windows Default
   Begin VB.CheckBox Check7
' . . .

Public r As String

Private Sub Command1_Click()
    ' A String (not a Byte array) is required by the "ByVal ... As String" declaration
    Dim sPfad As String
    ' Pre-widen the text; VB6 converts Declare'd String arguments to ANSI,
    ' so the DLL ends up receiving the text's UTF-16 bytes
    sPfad = StrConv(Text1.Text, vbUnicode)
    Text7.Text = Str(functionname1(sPfad))
End Sub

Private Sub option1_Click()
    Dim sPfad() As Byte
    Dim tPfad() As Byte
    Dim title() As Byte
    Dim sp As Integer
    ' . . .
    If Check1.Value = 1 Then
        sp = 1
    Else
        sp = 0
    End If
    ' . . .
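GetStringFromPtrW in the snippet above measures a NUL-terminated wide string with lstrlenW and copies it into a VB6 string with lstrcpyW. For comparison, a Python sketch of the same idea uses `ctypes.wstring_at`, which reads wide characters from an address up to the terminating NUL. The buffer here only simulates a pointer handed back by a DLL, and `get_string_from_ptr_w` is a hypothetical name, not part of Quick PDF Library.

```python
import ctypes


def get_string_from_ptr_w(ptr: int) -> str:
    """Copy the NUL-terminated wide string at address `ptr` into a Python str."""
    if not ptr:  # treat a null pointer as an empty string
        return ""
    return ctypes.wstring_at(ptr)  # reads wchar_t values up to the NUL terminator


# Simulate a DLL that hands back a pointer to a wide-character string.
buf = ctypes.create_unicode_buffer("سلام")
print(get_string_from_ptr_w(ctypes.addressof(buf)) == "سلام")  # True
```

As in the VB6 version, the memory behind the pointer must stay valid while it is being copied; here `buf` keeps it alive.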
Copyright © 2017 Debenu. Debenu Quick PDF Library is a PDF SDK. All rights reserved.