TPDFTrueTypeParser.LoadFromString
Ingo
Moderator Group Joined: 29 Oct 05
Posted: 06 Sep 06 at 1:42AM
Hello!
While extracting text from one particular PDF file I got a range error. The German error text is "Fehler bei Bereichsprüfung"; in English this is "Range check error". So far it happens with only this one PDF file. When I reproduce the error by running my app from Delphi, the program stops in the file uPDFTrueTypeParser.pas, in the procedure TPDFTrueTypeParser.LoadFromString, at the statement "LoadFromStream(SS)". It seems to me that "SS" contains characters that are outside the defined range. Should I check the content beforehand, or check whether "SS" was created properly? And if so, how?
The PDF is at http://www.is-soft.de/RPT060077YU.pdf. Perhaps one of you can help.
Thanks a lot in advance!
Best regards,
Ingo

Ingo
Hi!
It's a bit better now. I've made a few changes in the file uPDFTrueTypeParser.pas, in the procedure TPDFTrueTypeParser.LoadFromString. Text extraction now runs through, but a few lines of text (which look a bit different from the others) are missing. Does anybody here still have an idea?
Here are my changes in uPDFTrueTypeParser.pas:

    . . .
    function IsString(const str: string): boolean;
    var
      len: integer;
      p : PChar;
    begin
      len := length(str);
      // Note: p is never assigned before this call, so the comparison
      // reads from an undefined address and the result is not reliable.
      result := CompareMem(p, pchar(str), len);
      inc(p, len);
    end;
    . . .
    procedure TPDFTrueTypeParser.LoadFromString(const Source: String);
    var
      SS: TStringStream;
    begin
      // Begin - Stringtest - Ingo - 2006/09/06
      If not IsString(Source) Then Exit;
      // End - Stringtest - Ingo - 2006/09/06
      SS := TStringStream.Create(Source);
      try
        LoadFromStream(SS);
      finally
        SS.Free;
      end;
    end;

hbarclay
Team Player Joined: 29 Oct 05 Location: United States
Ingo,
I'm afraid I have more questions than answers, but here is what I found. The exception is actually happening in the hmtxLoad procedure, which has no error trapping. I added a try/except around the offending line and simply exited when an error occurred; everything seemed to work normally after that (code below). Obviously not a good solution, but maybe it helps to find where the problem really is.
Is this a one-time thing, or does this error happen in multiple documents? I noticed that one of the fonts in the document is listed as an "Embedded Subset". Is that intentional, or does that indicate corruption of what is supposed to be an embedded font?
Sorry I can't help more, but I still have a lot to learn about the PDF format.
Harry

    procedure TPDFTrueTypeParser.hmtxLoad;
    var
      TI: Integer;
      CP: Integer;
      X: Integer;
    begin
      TI := FindTable('hmtx');
      if TI >= 0 then
      begin
        CP := FTableDirectory[TI].offset;
        FStream.Seek(CP, soFromBeginning);
        // Workaround: leave quietly if the width array cannot be allocated.
        try
          SetLength(FhmtxTable.Widths, FhheaTable.numberOfHMetrics);
        except
          exit;
        end;
        for X := 0 to Length(FhmtxTable.Widths) - 1 do
        begin
          FhmtxTable.Widths[X] := ReadUShort * 1000 div FheadTable.unitsPerEm;
          ReadShort;
        end;
      end;
    end;

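An alternative to swallowing the exception is to check the count before calling SetLength. The following is only a minimal sketch under stated assumptions: `IsPlausibleMetricCount` is a hypothetical helper that is not part of the library, and it assumes the byte length of the 'hmtx' table is known (for example from the table directory entry; the posted code only shows the `offset` field, so the availability of a length is an assumption). It relies on the fact that each full horizontal metric record in 'hmtx' occupies 4 bytes.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper, not part of the library.
// Returns True when a metric count read from 'hhea' can be used to size
// the width array for an 'hmtx' table of HmtxLength bytes.
function IsPlausibleMetricCount(Count, HmtxLength: Integer): Boolean;
const
  // Each record is advanceWidth (unsigned 16 bit) + leftSideBearing (signed 16 bit).
  BytesPerLongMetric = 4;
begin
  // A negative count (the wrapped value) or a count larger than the table
  // can hold is rejected before any memory is allocated.
  Result := (Count >= 0) and (Count <= HmtxLength div BytesPerLongMetric);
end;
```

With a check like this, hmtxLoad could skip the table and report the font as unreadable instead of raising a range error; whether that is acceptable depends on how the caller treats fonts without width data.
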
Ingo
Hello Harry!
Thanks for your help! So far it's only this single PDF. It was an email text with sent and reply parts, so it contains a few different fonts, too. I think the problem is one of the fonts used, because quickpdf can't recognize the relevant characters, returns an empty string, and crashes. I think there should be constant settings somewhere that define how characters are recognized. I'll keep searching ;-)
Best regards,
Ingo

hbarclay
Ingo,
There are two possibilities, and I'm not sure I know enough about the PDF format to determine which it is. This is either a valid PDF file that QuickPdf doesn't know how to handle, or it is a corrupted file that QuickPdf should handle more gracefully. Unless you consistently run into this issue, I think I would treat it as the second option: a corrupted file that needs to be handled more gracefully.
I'm not sure your approach of trying to recognize characters is going to be successful. If I'm not mistaken, the string you are testing actually contains binary data from the PDF file. Do the PDF specs say this information should be character based? The function that is crashing is doing so when it is trying to extract the font header information from that string. Either that font information is incorrect, or QuickPdf just doesn't know what to do with it.
I think the best solution would be to trap the error, report it to the user as an invalid file, and expect the data to be suspect.
Harry

hbarclay
Ingo,
I'm not sure if I understood the purpose of your IsString function, but if it was to test for non-ASCII characters in a string, try the function below. The strings used in the LoadFromString procedure do contain binary data, so searching for non-ASCII characters there is of no use.
The problem is that QuickPdf is not recognizing the strange font. That does seem to be a problem, since it displays properly in Acrobat. I'll keep trying, but it may be a few days before I have the time to make any progress.
Harry

    function IsAsciiString(const str: string): boolean;
    var
      i : integer;
    begin
      result := true;
      for i := 1 to length(str) do
      begin
        // As posted: the forum software stripped the [i] index after str
        // on the next line (see Harry's correction later in this thread).
        if (ord(str) < 32) and (not (ord(str) in [9, 10, 13])) then
        begin
          result := false;
          break;
        end;
      end;
    end;

Ingo
Harry.
Thanks again for your help!!!

hbarclay
Ingo,
More information. The PDF was created by an activePDF toolkit; here are some interesting links in their knowledge base:
http://www.activepdf.com/support/knowledgebase/viewKb.cfm?fs=1&ID=10318
http://www.activepdf.com/support/knowledgebase/viewKb.cfm?fs=1&ID=10051
I'm assuming that QuickPdf just doesn't know how to handle the subset of an embedded font. I'll see if I can figure that out, but it may be more than I am prepared to handle.
Harry

Ingo
Harry,
I think there's no need to dig deeper ;-) The knowledge base links seem to point to the cause. The only remaining question is why Foxit or Acrobat can copy the text content completely ;-) Do they not extract the selected characters with the help of an embedded font?
Again, thanks a lot.
Best regards,
Ingo

hbarclay
Ingo,
Unfortunately I do have a need to dig deeper. Our application deals with two different types of PDF documents. The most important is a set of financial documents provided by a commercial document vendor. So far we have had no problems with those documents. We use QuickPdf to manipulate and print the documents.
The other type of document is whatever the banks want to create locally to have printed along with the documents provided by the commercial vendor. With these locally generated documents you just never know what you will run into. Our most common issue is with fonts that don't reproduce the same way they do in Adobe. That's not a huge problem, because we can ask the bank to change fonts, or get the document and do it ourselves. While it's a problem we can overcome, it would be better if we did not have to deal with it.
My biggest fear is that one day the commercial vendor will release a new set of documents that QuickPdf can't handle properly. With these documents, fonts not reproducing exactly the way they are supposed to is a show stopper, and obviously there is going to be no help from iSed when that happens. We have a "plan B" for that situation, but it is not nearly as good a solution as QuickPdf is. I would feel much better about our situation if we could get these irritating font issues resolved.
Thanks,
Harry

Ingo
I've found out that the PDF files with problems are always files without embedded fonts. So I think the problem is interpreting the character data without font data...
Shouldn't it be possible to insert a standard font, or to replace an exotic font with a standard font?
Best regards,
Ingo

hbarclay
Ingo,
I just noticed that in the IsAsciiString function I posted there are some very important characters missing. At first I thought I had messed up posting the code, but I copied and pasted from a working Delphi app, so that was very unlikely. When I tried to post a correction, I noticed in the "Preview Post" function that the forum software is apparently dropping some very important characters.
This line:

    if (ord(str) < 32) and (not (ord(str) in [9, 10, 13])) then

should have been:

    if (ord(str[i]) < 32) and (not (ord(str[i]) in [9, 10, 13])) then

I had Forum Codes enabled and it was stripping out the [i] for me. Sorry for any confusion.
Harry

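Putting Harry's correction back into the function gives the version below. This is a reconstruction from the two posts, not a copy of his working application. Note that, despite the name, the test only rejects control characters below 32 other than tab, line feed, and carriage return; characters with codes of 128 and above still pass.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Returns False if Str contains a control character (code below 32)
// other than tab (9), line feed (10), or carriage return (13).
function IsAsciiString(const Str: string): Boolean;
var
  I: Integer;
begin
  Result := True;
  for I := 1 to Length(Str) do
    if (Ord(Str[I]) < 32) and not (Ord(Str[I]) in [9, 10, 13]) then
    begin
      Result := False;
      Break;
    end;
end;
```

As Harry points out earlier in the thread, the font data passed to LoadFromString is binary, so a check like this is not a useful guard at that point; it is only suitable for strings that are expected to be plain text.
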
swb1
Debenu Quick PDF Library Expert Joined: 05 Dec 05 Location: United States
I really don't know the full ramifications of this change, but I do know what is causing the Range Check Error. The hheaLoad procedure gets numberOfHMetrics using the ReadShort function. ReadShort converts a Word value to a signed 16-bit value (SmallInt in Delphi). When ReadUShort gets a Word value from the stream that is larger than 32767, ReadShort "wraps" the value and returns a negative number. That will cause the Range Check Error.
The small code change below seems to work; however, as I stated at the outset, I don't know the full ramifications of this change. I believe it is safe, given that any call to ReadUShort that would not have caused a Range Error would still return the same value as ReadShort.

    procedure TPDFTrueTypeParser.hheaLoad;
    var
      TI: Integer;
      CP: Integer;
    begin
      TI := FindTable('hhea');
      if TI >= 0 then
      begin
        CP := FTableDirectory[TI].offset;
        FStream.Seek(CP, soFromBeginning);
        if ReadFixed = 1 then
        begin
          FhheaTable.Ascender := ReadShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.Descender := ReadShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.LineGap := ReadShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.advanceWidthMax := ReadUShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.minLeftSideBearing := ReadShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.minRightSideBearing := ReadShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.xMaxExtent := ReadShort * 1000 div FheadTable.unitsPerEm;
          FhheaTable.caretSlopeRise := ReadShort;
          FhheaTable.caretSlopeRun := ReadShort;
          FhheaTable.caretOffset := ReadShort;
          // Skip the four reserved fields and metricDataFormat.
          ReadShort;
          ReadShort;
          ReadShort;
          ReadShort;
          ReadShort;
          //FhheaTable.numberOfHMetrics := ReadShort; //swb
          FhheaTable.numberOfHMetrics := ReadUShort; //swb
        end;
      end;
    end;

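The wrap-around described in swb1's post can be reproduced in isolation. The sketch below is illustrative only: `ToUInt16` and `ToInt16` are hypothetical stand-ins and are not the library's ReadUShort and ReadShort, whose source is not shown in this thread. They decode the same two big-endian bytes once as an unsigned and once as a signed 16-bit value.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical stand-in for an unsigned read:
// combines two big-endian bytes into a value in 0..65535.
function ToUInt16(Hi, Lo: Byte): Word;
begin
  Result := (Word(Hi) shl 8) or Lo;
end;

// Hypothetical stand-in for a signed read:
// reinterprets the same 16 bits as a value in -32768..32767.
function ToInt16(Value: Word): SmallInt;
begin
  if Value > 32767 then
    Result := SmallInt(Integer(Value) - 65536)  // high bit set: negative
  else
    Result := SmallInt(Value);
end;
```

For the bytes $9C $40, `ToUInt16` yields 40000, while `ToInt16` of that value yields -25536. For any value up to 32767 the two agree, which matches the observation that switching numberOfHMetrics to the unsigned read does not change results for fonts that already loaded correctly.
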
Ingo
Thanks a lot, Steve!
I'll try it.
Best regards,
Ingo

marian_pascalau
Debenu Quick PDF Library Expert Joined: 28 Mar 06 Location: Germany
Steve, I have checked your suggestion. Based on the TrueType description I found on the internet, you are 100% right. See Table 23 at: http://developer.apple.com/textfonts/TTRefMan/RM06/Chap6hhea.html
This is self-explanatory.
Thanks, Steve

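For reference, the TrueType 'hhea' table is 36 bytes long and stores the metric count (numOfLongHorMetrics in the specification, numberOfHMetrics in the library code) as an unsigned 16-bit big-endian field at byte offset 34. A minimal sketch of reading it from a raw buffer follows; `ReadNumOfLongHorMetrics` is a hypothetical helper that works on a byte array rather than on the library's stream.

```pascal
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  // Byte offset of numOfLongHorMetrics within the 36-byte 'hhea' table.
  NumOfLongHorMetricsOffset = 34;

// Hypothetical helper, not part of the library.
// Reads the unsigned 16-bit big-endian count from a raw 'hhea' table.
// Returns -1 if the buffer is too short to contain the field.
function ReadNumOfLongHorMetrics(const Table: array of Byte): Integer;
begin
  if Length(Table) < NumOfLongHorMetricsOffset + 2 then
    Result := -1
  else
    Result := (Integer(Table[NumOfLongHorMetricsOffset]) shl 8)
      or Table[NumOfLongHorMetricsOffset + 1];
end;
```

Returning an Integer keeps the full 0..65535 range available without any sign wrap and leaves -1 free as an error marker.
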