ExtractFilePageText - what is the color in CMYK

edvoigt (Senior Member; joined 26 Mar 11; Berlin, Germany)
Posted: 28 Sep 11 at 9:54AM
I would like to use ExtractFilePageText on an existing PDF to get all the text on a page that is written in one color, which the user of my program can choose. For example, I want to filter the CSV so that only the yellow text remains.

In the CSV string I find color codes like #231F20 for a pure black or #EC008C for a pure magenta. I know which colors were used (I wrote the PDF myself with QuickPDF using SetTextColorCMYK, so I'm sure about this).

What I am missing is a way to get from the color codes QuickPDF puts in the CSV back to the real definition of the color in CMYK or RGB, depending on the operator used in the PDF.

edvoigt
Posted: 28 Sep 11 at 3:44PM
The color code looks like RGB in hex notation, as in HTML or CSS.

And it is an RGB value.

But how do I identify this #231F20 as the CMYK value (0 0 0 1)? It has to go wrong, at the latest when there are several different grays.

AndrewC (Moderator Group; joined 08 Dec 10; Geelong, Aust)
Posted: 28 Sep 11 at 4:04PM
The CMYK values are converted to RGB during text extraction and are currently not stored anywhere as a CMYK value.

By the sounds of things, you either need an RGB to CMYK conversion routine or, if you know all the colours that you have used, you could make your own RGB -> CMYK conversion table.
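
The two options above could look roughly like the following Python sketch. The table entries are the two pairs mentioned earlier in this thread; the function names and the fallback formula are assumptions for illustration only. The formula is the common textbook one and is not the conversion the library uses (it would not turn #231F20 back into pure black).

```python
# Colour pairs stated earlier in this thread; extend with your own known colours.
KNOWN_COLOURS = {
    "#231F20": (0.0, 0.0, 0.0, 1.0),  # pure black
    "#EC008C": (0.0, 1.0, 0.0, 0.0),  # pure magenta
}


def naive_rgb_to_cmyk(hex_code):
    """Textbook RGB -> CMYK formula (an assumption, not the library's conversion)."""
    digits = hex_code.lstrip("#")
    r, g, b = (int(digits[i:i + 2], 16) / 255 for i in (0, 2, 4))
    k = 1 - max(r, g, b)
    if k == 1:
        # Pure black: avoid dividing by zero below.
        return (0.0, 0.0, 0.0, 1.0)
    return tuple((1 - v - k) / (1 - k) for v in (r, g, b)) + (k,)


def lookup_cmyk(hex_code):
    """Prefer the hand-made table; fall back to the naive formula otherwise."""
    key = hex_code.upper()
    if key in KNOWN_COLOURS:
        return KNOWN_COLOURS[key]
    return naive_rgb_to_cmyk(key)


print(lookup_cmyk("#231F20"))        # taken from the table
print(naive_rgb_to_cmyk("#231F20"))  # formula result, which is not pure black
```

The table only helps when every colour in the document is known in advance; the formula always returns something, but it cannot recover which CMYK values were originally written.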

edvoigt
Posted: 28 Sep 11 at 4:24PM
Thanks,

but that is not possible: the PDFs normally come from customers. Only in the test case do I know which colors were used.

The best solution would be a separate function that does no color conversion, because the conversion is not 1:1: more than one CMYK quadruple gives the same RGB value. For example, you can compose a gray using only K or using a mixture of CMY. You get two different colors that look the same to the human eye and in RGB, but are not the same. So there is no way to be sure.


But thanks for your idea.
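
The many-to-one point made in this post can be illustrated with a small Python sketch. It assumes the simple textbook CMYK -> RGB formula, which is not necessarily what the library or a PDF viewer applies, and the function name is made up for this example. Under that assumption, a gray made from K only and a gray made from equal parts of C, M and Y end up with the same hex code.

```python
def naive_cmyk_to_hex(c, m, y, k):
    """Textbook CMYK -> RGB formula; real (e.g. profile-based) conversions differ."""
    channels = (round(255 * (1 - v) * (1 - k)) for v in (c, m, y))
    return "#{:02X}{:02X}{:02X}".format(*channels)


k_only_grey = (0.0, 0.0, 0.0, 0.5)  # gray built from black ink only
cmy_grey = (0.5, 0.5, 0.5, 0.0)     # gray built from equal C, M and Y

# Two different CMYK quadruples, one RGB code: the mapping cannot be inverted.
print(naive_cmyk_to_hex(*k_only_grey))
print(naive_cmyk_to_hex(*cmy_grey))
```

Once only the RGB code is stored, nothing in it says which of the two quadruples was written to the PDF.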

edvoigt
Posted: 13 Oct 11 at 2:52PM
The problem is solved.

Version 8.12 beta 2 opens a new way to do this. Using

    QP.LoadFromFile(filename, '');
    QP.SetTextExtractionOptions(4, 1);
    CSV := QP.GetPageText(3);


where CSV is a string variable, we get lines like this:

"MicrosoftSansSerif",FF000000,17.01,85.0394,47....4054,"CYAN"
"MicrosoftSansSerif",00FF0000,17.01,198.4252,4....4054,"MAGENTA"
"MicrosoftSansSerif",0000FF00,17.01,311.811,47....4054,"YELLOW"
"MicrosoftSansSerif",000000FF,17.01,425.1969,4....4054,"KEY"


That is, each CMYK component is stored in one byte, where $FF means 1.0.
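
For anyone post-processing this output outside Delphi, here is a minimal Python sketch of the decoding and of the filtering asked for at the start of the thread. It assumes, based only on the sample lines above, that the color code is the second field and the text is the last field; the function names are made up for this example, and the sample rows are copied from above with their shortened coordinates left as posted.

```python
import csv


def decode_cmyk(code):
    """Split an 8-digit hex code into (c, m, y, k), scaling 0xFF to 1.0."""
    if len(code) != 8:
        raise ValueError("expected 8 hex digits, got %r" % code)
    return tuple(byte / 255 for byte in bytes.fromhex(code))


def text_with_colour(csv_text, wanted):
    """Return the text of every row whose color decodes to `wanted`.

    Assumes the color is the second field and the text is the last one,
    as in the sample rows above.
    """
    rows = csv.reader(csv_text.splitlines())
    return [row[-1] for row in rows
            if len(row) > 2 and decode_cmyk(row[1]) == wanted]


sample = '''"MicrosoftSansSerif",FF000000,17.01,85.0394,47....4054,"CYAN"
"MicrosoftSansSerif",0000FF00,17.01,311.811,47....4054,"YELLOW"'''

print(decode_cmyk("0000FF00"))
print(text_with_colour(sample, (0.0, 0.0, 1.0, 0.0)))
```

Exact comparison is fine for components that are 0 or 1.0; for in-between values it is probably safer to compare the raw hex code or to allow a small tolerance.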

Andrew, thanks for this solution.

Werner