Debenu Quick PDF Library - PDF SDK Community Forum Homepage
ExtractFilePageText extracts question mark!

rezabb (Beginner)
Posted: 09 Sep 19 at 3:05AM
Hi,

I have a problem extracting words from my PDF file. The PDF content is in Persian (Farsi).

After calling the ExtractFilePageText function in VB6, I receive question marks (??? ??) instead of the actual text. How can I get the real Persian text and not a series of question marks?

Thanks,
Reza

P.S. This is my code in VB6:

Dim ClassName
Dim LicenseKey

ClassName = "DebenuPDFLibraryAX1613.PDFLibrary"
LicenseKey = "***"

Dim DPL
Dim Result

Set DPL = CreateObject(ClassName)
Result = DPL.UnlockKey(LicenseKey)

DPL.LoadFromFile strInputFilePath, ""

iNumPages = DPL.PageCount() ' Number of pages in the loaded document

strText = ""

For nPage = 1 To iNumPages
    ' Append each page's text instead of overwriting the previous page
    strText = strText & DPL.ExtractFilePageText(strInputFilePath, "", nPage, 0) & vbCrLf
Next

' Copy and display the result once, after all pages have been read
Clipboard.Clear
Clipboard.SetText strText
text1.text = strText

Ingo (Moderator Group)
Posted: 09 Sep 19 at 10:46AM
Hi Reza,

If you look at the online reference, you'll see that all string content is handled as Unicode (wchar). VB6 doesn't support Unicode in ordinary VB6 code, so you have to convert the content first, from Unicode to a VB6 string or the other way round (depending on whether you want to get or put text).
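
One way to tell whether the question marks are already in the extracted string or only appear when it is displayed is to look at the character codes. The helper below is a minimal sketch and not part of Quick PDF Library; the name DumpCodeUnits is hypothetical. It relies on VB6 strings holding UTF-16 internally, while the intrinsic TextBox and the Clipboard object work with the ANSI code page.

```vb
' Hypothetical helper: returns the UTF-16 code units of s as
' space-separated 4-digit hex values.
Public Function DumpCodeUnits(ByVal s As String) As String
    Dim i As Long
    Dim code As Long
    Dim result As String

    For i = 1 To Len(s)
        ' AscW returns a signed Integer; mask it into the range 0..65535
        code = AscW(Mid$(s, i, 1)) And &HFFFF&
        ' Left-pad the hex value to four digits
        result = result & Right$("0000" & Hex$(code), 4) & " "
    Next i

    DumpCodeUnits = RTrim$(result)
End Function
```

For example, Debug.Print DumpCodeUnits(Left$(strText, 20)) prints only ASCII hex digits, so it is safe to read in the Immediate window. Values between 0600 and 06FF (the Arabic block, which Persian uses) suggest the text arrived intact and only the display is ANSI; 003F is a literal question mark.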

Cheers and welcome here,
Ingo


rezabb (Beginner)
Posted: 09 Sep 19 at 1:39PM
Dear Ingo,
Many thanks for your answer and for the warm welcome to the forum.
Using the keyword and the conversion you mentioned, I found the proper VB code, and I can now extract text in Persian. It worked like a charm.

Another question: Persian text is written from right to left. Quick PDF seems to extract text (characters) starting from the top-left side of the page. Is there an option to reverse this, say to start from the top-right side of the page? My words are extracted in reversed character order, which makes them unreadable (for example, the word "google" would be extracted as "elgoog").

When I copy text from the Persian PDF and paste it into MS Word, the text is correct. I want my extracted text to match what MS Word produces.
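
If no extraction option changes the order, one possible workaround is to reverse each line after extraction. This is only a sketch under two assumptions that the thread does not confirm: lines in the extracted text are separated by vbCrLf, and each line is purely right-to-left. Embedded numbers or Latin words would come out reversed, so real documents may need a proper bidirectional algorithm instead. The name ReverseEachLine is hypothetical.

```vb
' Hypothetical helper: reverses the character order of every line in src.
' Assumes lines are separated by vbCrLf and contain only right-to-left text.
Public Function ReverseEachLine(ByVal src As String) As String
    Dim parts() As String
    Dim i As Long

    parts = Split(src, vbCrLf)
    For i = LBound(parts) To UBound(parts)
        ' StrReverse works on the Unicode characters of the VB6 string
        parts(i) = StrReverse(parts(i))
    Next i

    ReverseEachLine = Join(parts, vbCrLf)
End Function
```

Reversing line by line rather than the whole string keeps the lines in their original top-to-bottom order.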

Thanks again,
Reza

Ingo (Moderator Group)
Posted: 09 Sep 19 at 4:56PM
Try extract option 0 or 7 to get normal readable content, or option 4 to get the text word by word.
This should make it easier for you.
https://www.debenu.com/docs/pdf_library_reference/ExtractFilePageText.php
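
A quick way to compare these options on one page could look like the sketch below. It uses only the ExtractFilePageText call from the first post; the Sub name CompareExtractOptions is hypothetical, and DPL is assumed to be the already unlocked library object. It prints only lengths, because the Immediate window cannot show Persian characters.

```vb
' Hypothetical helper: runs the same page through options 0, 7 and 4
' and reports how much text each one returns.
Public Sub CompareExtractOptions(ByVal DPL As Object, _
                                 ByVal strInputFilePath As String, _
                                 ByVal nPage As Long)
    Dim opts As Variant
    Dim i As Long
    Dim strText As String

    opts = Array(0, 7, 4)
    For i = LBound(opts) To UBound(opts)
        strText = DPL.ExtractFilePageText(strInputFilePath, "", nPage, opts(i))
        Debug.Print "Option " & opts(i) & ": " & Len(strText) & " characters"
    Next i
End Sub
```

The exact output format of each option is described in the reference page linked above; inspect the returned strings in a Unicode-aware viewer before deciding which one to post-process.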

Cheers,
Ingo

rezabb (Beginner)
Posted: 09 Sep 19 at 5:45PM
Great. Option 4 is the better solution: I can handle the text word by word and post-process the extracted words later in VB code.
My PDF contains a few half-space characters. A half-space acts like a space, but the two words it connects count as one word rather than two separate words (just like the character "-" in English; for example, "non-destructive" is considered 1 word, not 2).

I see that Quick PDF treats a word with a half-space character in the middle as two separate words and extracts them separately. Is there any way to extract them as one word, maybe by defining the separating character? The character codes for space and half-space are not the same.
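
One possible workaround is to extract the page as plain text and do the word splitting in VB6, breaking only on real whitespace. The sketch below assumes that the half-space is the zero-width non-joiner (U+200C) and that it survives in the string returned by option 0 or 7; neither is confirmed in this thread. Both function names are hypothetical.

```vb
' Hypothetical helper: splits src into words on ordinary whitespace only,
' so a half-space inside a word never breaks it.
Public Function SplitOnSpacesOnly(ByVal src As String) As Collection
    Dim parts() As String
    Dim i As Long
    Dim words As New Collection

    ' Normalise line breaks and tabs to plain spaces first
    src = Replace(src, vbCrLf, " ")
    src = Replace(src, vbCr, " ")
    src = Replace(src, vbLf, " ")
    src = Replace(src, vbTab, " ")

    parts = Split(src, " ")
    For i = LBound(parts) To UBound(parts)
        ' Skip the empty entries produced by consecutive spaces
        If Len(parts(i)) > 0 Then words.Add parts(i)
    Next i

    Set SplitOnSpacesOnly = words
End Function

' Hypothetical helper: True if word contains U+200C (assumed half-space)
Public Function HasHalfSpace(ByVal word As String) As Boolean
    HasHalfSpace = (InStr(1, word, ChrW$(&H200C), vbBinaryCompare) > 0)
End Function
```

Calling HasHalfSpace on a few extracted words is a cheap way to check whether the character is present in the output at all before relying on this approach.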

Ingo (Moderator Group)
Posted: 09 Sep 19 at 7:10PM
Hi again,


I have a few code snippets for you showing how to deal with Unicode and integer values in VB6:

module1.bas
-----------

Attribute VB_Name = "Module1"

Public Declare Function functionname1 Lib "function.dll" (ByVal parameter As String) As Integer 
Public Declare Function functionname2 Lib "function.dll" (ByVal parameter1 As String, ByVal parameter2 As Integer) As Long ' Returns a pointer to the string content

Public Declare Function apiLStrCopyW Lib "kernel32.dll" Alias "lstrcpyW" (ByVal lpString1 As Long, ByVal lpString2 As Long) As Long
Public Declare Function apiLStrLenW Lib "kernel32.dll" Alias "lstrlenW" (ByVal lpString As Long) As Long

Public Function GetStringFromPtrW(ByVal ptr As Long) As String
  'create a matching buffer
  GetStringFromPtrW = String$(apiLStrLenW(ptr), 0)
  'copying the string into the buffer
  apiLStrCopyW StrPtr(GetStringFromPtrW), ptr
End Function
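
To try GetStringFromPtrW without any external DLL, the pointer can come from an ordinary VB6 string via StrPtr. The following is a small round-trip sketch that assumes Module1 above is in the project; the Sub name TestGetStringFromPtrW is hypothetical. The sample word is built with ChrW$ because the VB6 editor cannot hold Persian literals.

```vb
' Hypothetical test: copies a Unicode string through its pointer
' and checks that nothing was lost on the way.
Public Sub TestGetStringFromPtrW()
    Dim original As String
    Dim roundTrip As String

    ' Four Arabic-script letters, built from their code points
    original = ChrW$(&H633) & ChrW$(&H644) & ChrW$(&H627) & ChrW$(&H645)

    ' StrPtr returns the address of the string's UTF-16 character data
    roundTrip = GetStringFromPtrW(StrPtr(original))

    ' Compare lengths and contents without printing the characters
    Debug.Print Len(roundTrip), (StrComp(roundTrip, original, vbBinaryCompare) = 0)
End Sub
```

The same call pattern would apply to a pointer returned by a DLL function such as the functionname2 placeholder above.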

form1.frm
---------

VERSION 5.00
Begin VB.Form Form1
   Caption         =   "vb6-sample - ..."
   ClientHeight    =   5475
   ClientLeft      =   45
   ClientTop       =   435
   ClientWidth     =   7365
   LinkTopic       =   "Form1"
   ScaleHeight     =   5475
   ScaleWidth      =   7365
   StartUpPosition =   3  'Windows-Standard
   Begin VB.CheckBox Check7

"  . . .

Public r As String

Private Sub Command1_Click()
  Dim sPfad() As Byte

    sPfad = StrConv(Text1.Text, vbUnicode)
   
    Text7.Text = Str(functionname1(sPfad))

End Sub

Private Sub option1_Click()
  Dim sPfad() As Byte
  Dim tPfad() As Byte
  Dim title() As Byte
  Dim sp As Integer
" . . .
    
    If Check1.Value = 1 Then
       sp = 1
      Else
       sp = 0
    End If
"   . . .

Cheers,
Ingo
