TPDFTrueTypeParser.LoadFromString

Ingo (Moderator Group), posted 06 Sep 06 at 1:42AM
Hello!

While extracting text from one particular PDF file I get a range error. The German error text is "Fehler bei Bereichsprüfung"; in English that is "Range check error". So far it happens with only this one PDF file. When I reproduce the error with my app running from Delphi, the program stops in the file uPDFTrueTypeParser.pas, in the procedure TPDFTrueTypeParser.LoadFromString, at the statement "LoadFromStream(SS)".
It seems to me that "SS" contains characters that are outside the defined range. Should I check the content first, or check whether "SS" was created properly? And if so, how?
The PDF is at http://www.is-soft.de/RPT060077YU.pdf. Perhaps one of you can help.

Thanks a lot in advance!

Best regards,
Ingo
Ingo (Moderator Group), posted 06 Sep 06 at 4:34PM
Hi!

It's a bit better now. I've made a few changes to the procedure TPDFTrueTypeParser.LoadFromString in the file uPDFTrueTypeParser.pas. The text extraction now runs through, but a few lines of text (which look a bit different from the others) are missing. Perhaps somebody still has an idea? Here are my changes in uPDFTrueTypeParser.pas:

. . .
function IsString(const str: string): boolean;
var
  len: integer;
  p: PChar;
begin
  len := length(str);
  // NOTE: p is never assigned before it is used here, so the comparison
  // runs against an indeterminate address and the result is undefined.
  result := CompareMem(p, pchar(str), len);
  inc(p, len);
end;
. . .
procedure TPDFTrueTypeParser.LoadFromString(const Source: String);
var
  SS: TStringStream;
begin
  // Begin - Stringtest - Ingo - 2006/09/06
  if not IsString(Source) then
    Exit;
  // End - Stringtest - Ingo - 2006/09/06

  SS := TStringStream.Create(Source);
  try
    LoadFromStream(SS);
  finally
    SS.Free;
  end;
end;
hbarclay (Team Player), posted 06 Sep 06 at 5:39PM
Ingo,

I'm afraid I have more questions than answers, but here is what I found.

The exception is actually happening in the hmtxLoad procedure, which has no error trapping. I added a try..except around the offending line and just exited when an error occurred; everything seemed to work normally after that (code below).

Obviously not a good solution, but maybe it helps to find where the problem really is.

Is this a one-time thing, or does this error happen in multiple documents? I noticed that one of the fonts in the document is listed as an "Embedded Subset". Is that intentional, or does that indicate corruption of what is supposed to be an embedded font?

Sorry I can't help more, but I have a lot to learn about the pdf format still.

Harry



procedure TPDFTrueTypeParser.hmtxLoad;
var
  TI: Integer;
  CP: Integer;
  X: Integer;
begin
  TI := FindTable('hmtx');
  if TI >= 0 then
  begin
    CP := FTableDirectory[TI].offset;
    FStream.Seek(CP, soFromBeginning);
    // Workaround: this is the line that raises the exception;
    // swallow the error and skip the table.
    try
      SetLength(FhmtxTable.Widths, FhheaTable.numberOfHMetrics);
    except
      exit;
    end;
    for X := 0 to Length(FhmtxTable.Widths) - 1 do
    begin
      FhmtxTable.Widths[X] := ReadUShort * 1000 div FheadTable.unitsPerEm;
      ReadShort;
    end;
  end;
end;
Ingo (Moderator Group), posted 07 Sep 06 at 1:31AM
Hello Harry!

Thanks for your help!
So far it's only this single PDF.
It was an email text with sent and reply parts, so it contains a few different fonts, too.

I think the problem is one of the fonts used, because QuickPDF can't recognize the relevant characters, returns an empty string, and crashes.

I think there should be constant settings somewhere that control how characters are recognized. I'll keep searching ;-)

Best regards,
Ingo
hbarclay (Team Player), posted 07 Sep 06 at 10:27AM
Ingo,

There are two possibilities, and I'm not sure I know enough about pdf format to determine which it is. This is either a valid pdf file that QuickPdf doesn't know how to handle, or it is a corrupted file that QuickPdf should handle more gracefully.

Unless you consistently run into this issue I think I would treat it as the second option, a corrupted file that needs to be handled more gracefully.

I'm not sure your approach of trying to recognize characters is going to be successful. If I'm not mistaken, the string you are testing actually contains binary data from the pdf file. Do the pdf specs say this information should be character based? The function that is crashing is doing so when it is trying to extract the font header information from that string. Either that font information is incorrect, or QuickPdf just doesn't know what to do with it.

I think the best solution would be to trap the error, report it to the user as an invalid file and expect the data to be suspect.
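
A minimal sketch of that idea, assuming the failure comes from a bad count reaching an allocation such as the SetLength call in hmtxLoad. The exception class EInvalidFontData and the helper AllocateWidths are hypothetical names for illustration and are not part of the library:

```pascal
program GuardDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

type
  // Hypothetical exception type, not part of the library.
  EInvalidFontData = class(Exception);
  TWidthArray = array of Integer;

// Hypothetical helper: validates a metric count before allocating,
// so a bad font table is reported instead of raising a range error.
procedure AllocateWidths(var Widths: TWidthArray; Count: Integer);
begin
  if Count < 0 then
    raise EInvalidFontData.CreateFmt('Invalid metric count: %d', [Count]);
  SetLength(Widths, Count);
end;

var
  W: TWidthArray;
begin
  try
    AllocateWidths(W, -32767); // simulate a count read from a suspect font
  except
    on E: EInvalidFontData do
      WriteLn('Font data rejected: ', E.Message);
  end;
end.
```

The caller can then decide whether to skip the font or flag the whole document as suspect, rather than exiting silently.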

Harry


hbarclay (Team Player), posted 07 Sep 06 at 12:29PM
Ingo,

I'm not sure if I understood the purpose of your IsString function, but if it was to test for non-ascii characters in a string, try the function below.

The strings used in the LoadFromString procedure do contain binary data, so searching for non-ascii characters there is of no use. The problem is that QuickPdf is not recognizing the strange font. That does seem to be a problem since it does display properly in Acrobat. I'll keep trying, but it may be a few days before I have the time to make any progress.

Harry


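function IsAsciiString(const str: string): boolean;
var
  i: integer;
begin
  result := true;
  for i := 1 to length(str) do
  begin
    // As posted: the forum software stripped the [i] index after str
    // (see Harry's correction later in this thread).
    if (ord(str) < 32) and (not (ord(str) in [9, 10, 13])) then
    begin
      result := false;
      break;
    end;
  end;
end;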
Ingo (Moderator Group), posted 07 Sep 06 at 4:59PM
Harry.
Thanks again for your help!!!
hbarclay (Team Player), posted 07 Sep 06 at 6:47PM
Ingo,

More information.

The pdf was created by an activePDF toolkit, here are some interesting links in their knowledge base.

http://www.activepdf.com/support/knowledgebase/viewKb.cfm?fs=1&ID=10318

http://www.activepdf.com/support/knowledgebase/viewKb.cfm?fs=1&ID=10051

I'm assuming that QuickPdf just doesn't know how to handle the subset of an embedded font. I'll see if I can figure that out, but it may be more than I am prepared to handle.

Harry
Ingo (Moderator Group), posted 08 Sep 06 at 1:19AM
Harry,

I think there's no need to dig deeper ;-)
The knowledge base links seem to explain the cause.
The only remaining question is why Foxit or Acrobat can copy the text content completely ;-)
Perhaps they don't rely on the embedded font to extract the selected characters?

Again - Thanks a lot.

Best regards,
Ingo


hbarclay (Team Player), posted 08 Sep 06 at 10:05AM
Ingo,

Unfortunately, I do have a need to dig deeper. Our application deals with two different types of PDF documents. The most important is a set of financial documents provided by a commercial document vendor. So far we have had no problems with those documents. We use QuickPdf to manipulate and print them.

The other type of document is whatever the banks want to create locally to have printed along with the documents provided by the commercial vendor. With these locally generated documents you just never know what you will run into. Our most common issue is with fonts that don't reproduce the same way they do in Adobe. That's not a huge problem because we can ask the bank to change fonts, or get the document and do it ourselves. While it's a problem we can overcome, it would be better if we did not have to deal with that.

My biggest fear is that one day the commercial vendor will release a new set of documents that QuickPdf can't handle properly. With these documents, fonts that do not reproduce exactly the way they are supposed to are a show stopper, and obviously there is going to be no help from iSed when that happens. We have a "plan B" for that situation, but it is not nearly as good a solution as QuickPdf. I would feel much better about our situation if we could get these irritating font issues resolved.

Thanks
Harry
Ingo (Moderator Group), posted 10 Sep 06 at 2:18PM
I've found out that the PDF files with problems are always files without embedded fonts. So I think the problem is interpreting the character data without font data.
Should it be possible to insert a standard font, or to replace an exotic font with a standard font?

Best regards,
Ingo
hbarclay (Team Player), posted 10 Sep 06 at 4:12PM
Ingo,

I just noticed that some very important characters are missing from the IsAsciiString function I posted. At first I thought I had messed up posting the code, but I copied and pasted from a working Delphi app, so that was very unlikely. When I tried to post a correction, I noticed in the "Preview Post" function that the forum software is apparently dropping some very important characters.

This line.

if (ord(str) < 32) and (not (ord(str) in [9, 10, 13])) then

should have been

if (ord(str[i]) < 32) and (not (ord(str[i]) in [9, 10, 13])) then

I had Forum Codes enabled and it was stripping out the [i] for me.
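
For reference, here is the whole function with the index restored, wrapped in a small console program so it can be compiled on its own. This is a sketch: the program name and the two test strings are made up for illustration. Note that the check rejects control characters below 32 (other than tab, line feed and carriage return); it does not reject characters above 127.

```pascal
program AsciiCheckDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

// Returns False if the string contains a control character (code below 32)
// other than tab (9), line feed (10) or carriage return (13).
function IsAsciiString(const Str: string): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(Str) do
    if (Ord(Str[I]) < 32) and not (Ord(Str[I]) in [9, 10, 13]) then
    begin
      Result := False;
      Break;
    end;
end;

begin
  WriteLn(IsAsciiString('plain text'#13#10)); // prints TRUE
  WriteLn(IsAsciiString('binary'#0'data'));   // prints FALSE
end.
```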

Sorry for any confusion.

Harry
swb1 (Debenu Quick PDF Library Expert), posted 16 Oct 06 at 10:52AM

I really don't know the full ramifications of this change, but I do know what is causing the Range Check Error. The hheaLoad procedure gets numberOfHMetrics using the ReadShort function. ReadShort converts the 16-bit Word value read from the stream into a signed 16-bit value. When the Word in the stream is larger than 32767, ReadShort "wraps" it and returns a negative number. That negative count is what causes the Range Check Error.

The small code change below seems to work; however, as I stated at the outset, I don't know the full ramifications of this change.

I believe it is safe, because for any value that would not have caused a range error (0 to 32767), ReadUShort returns the same value as ReadShort.


procedure TPDFTrueTypeParser.hheaLoad;
var
  TI: Integer;
  CP: Integer;
begin
  TI := FindTable('hhea');
  if TI >= 0 then
  begin
    CP := FTableDirectory[TI].offset;
    FStream.Seek(CP, soFromBeginning);
    if ReadFixed = 1 then
    begin
      FhheaTable.Ascender := ReadShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.Descender := ReadShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.LineGap := ReadShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.advanceWidthMax := ReadUShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.minLeftSideBearing := ReadShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.minRightSideBearing := ReadShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.xMaxExtent := ReadShort * 1000 div FheadTable.unitsPerEm;
      FhheaTable.caretSlopeRise := ReadShort;
      FhheaTable.caretSlopeRun := ReadShort;
      FhheaTable.caretOffset := ReadShort;
      ReadShort;
      ReadShort;
      ReadShort;
      ReadShort;
      ReadShort;
      //FhheaTable.numberOfHMetrics := ReadShort; //swb
      FhheaTable.numberOfHMetrics := ReadUShort; //swb
    end;
  end;
end;
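
The wrap-around described above can be reproduced in isolation. The following is a minimal sketch, not library code; the value $8001 is an arbitrary example of a count above 32767, and the variable names are made up:

```pascal
program WrapDemo;
{$APPTYPE CONSOLE}

var
  Raw: Word;
  AsSigned: SmallInt;
  AsUnsigned: Integer;
begin
  Raw := $8001;               // 32769, a legal unsigned 16-bit count
  AsSigned := SmallInt(Raw);  // same 16 bits reinterpreted as signed
  AsUnsigned := Raw;          // same 16 bits kept as unsigned
  WriteLn('signed:   ', AsSigned);   // prints signed:   -32767
  WriteLn('unsigned: ', AsUnsigned); // prints unsigned: 32769
end.
```

For any value from 0 to 32767 both reads give the same result, which is why switching only the numberOfHMetrics read to the unsigned variant should leave well-behaved fonts unaffected.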



Ingo (Moderator Group), posted 16 Oct 06 at 3:51PM
Thanks a lot Steve!

I'll try it.

Best regards,
Ingo
marian_pascalau (Debenu Quick PDF Library Expert), posted 17 Oct 06 at 6:40AM

Steve,

I have checked your suggestion. Based on the TrueType description I found on the internet, you are 100% right.

See: http://developer.apple.com/textfonts/TTRefMan/RM06/Chap6hhea.html

Table 23: 'hhea' table

Type    Name                  Description
Fixed   version               0x00010000 (1.0)
FWord   ascent                Distance from baseline of highest ascender
FWord   descent               Distance from baseline of lowest descender
FWord   lineGap               typographic line gap
uFWord  advanceWidthMax       must be consistent with horizontal metrics
FWord   minLeftSideBearing    must be consistent with horizontal metrics
FWord   minRightSideBearing   must be consistent with horizontal metrics
FWord   xMaxExtent            max(lsb + (xMax-xMin))
int16   caretSlopeRise        used to calculate the slope of the caret (rise/run); set to 1 for vertical caret
int16   caretSlopeRun         0 for vertical
FWord   caretOffset           set value to 0 for non-slanted fonts
int16   reserved              set value to 0
int16   reserved              set value to 0
int16   reserved              set value to 0
int16   reserved              set value to 0
int16   metricDataFormat      0 for current format
uint16  numOfLongHorMetrics   number of advance widths in metrics table

This is self-explanatory: numOfLongHorMetrics is an unsigned 16-bit value.
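
As a rough illustration of the layout in the table: the fields before numOfLongHorMetrics add up to 4 bytes (Fixed) plus 15 two-byte fields, so the count sits at byte offset 34 of the 36-byte table, stored big-endian like all TrueType data. The sketch below uses a hand-built buffer, and the names THheaBytes and ReadUInt16BE are hypothetical, not library identifiers:

```pascal
program HheaOffsetDemo;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  // 4 bytes (Fixed version) + 15 * 2 bytes (16-bit fields) = 34
  NumMetricsOffset = 34;

type
  THheaBytes = array[0..35] of Byte;

// Reads a big-endian unsigned 16-bit value at the given offset.
function ReadUInt16BE(const Data: THheaBytes; Offset: Integer): Word;
begin
  Result := (Word(Data[Offset]) shl 8) or Data[Offset + 1];
end;

var
  Hhea: THheaBytes;
  I: Integer;
begin
  // Build an otherwise empty table with a count above 32767.
  for I := Low(Hhea) to High(Hhea) do
    Hhea[I] := 0;
  Hhea[NumMetricsOffset] := $80;
  Hhea[NumMetricsOffset + 1] := $01;
  WriteLn(ReadUInt16BE(Hhea, NumMetricsOffset)); // prints 32769
end.
```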

Thanks Steve
