
ExtractFilePageText - what is the color in CMYK

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1975
Printed Date: 16 Jul 25 at 10:09PM


Topic: ExtractFilePageText - what is the color in CMYK
Posted By: edvoigt
Subject: ExtractFilePageText - what is the color in CMYK
Date Posted: 28 Sep 11 at 9:54AM
I would like to use ExtractFilePageText on an existing PDF to get all the text on a page that is written in one particular color, which the user of my program chooses. For example, I want to filter the CSV so that I get only the yellow text.

In the CSV string I find color codes like #231F20 for pure black or #EC008C for pure magenta. I know which colors were used (I wrote the PDF earlier with QuickPDF using SetTextColorCMYK, so I'm sure about this).

What I am missing is a way to get from the color codes QuickPDF puts in the CSV back to the actual definition of the color in CMYK or RGB, depending on which operator was used in the PDF.



Replies:
Posted By: edvoigt
Date Posted: 28 Sep 11 at 3:44PM
The color code looks like RGB in hex notation, as in HTML or CSS.

And it is in fact an RGB value.

But how can I identify this #231F20 as the CMYK value (0 0 0 1)? It is bound to go wrong, at the latest when there are several different gray colors.



Posted By: AndrewC
Date Posted: 28 Sep 11 at 4:04PM
The CMYK values are converted to RGB during text extraction and are currently not stored anywhere as a CMYK value.

By the sounds of things, you either need an RGB to CMYK conversion routine or, if you know all the colours that you have used, you could make your own RGB -> CMYK conversion table.
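A minimal Python sketch of the two options mentioned above, for illustration only. The function names and the `KNOWN_COLOURS` table are hypothetical and not part of the Quick PDF Library; the formula is the common textbook approximation, which is an assumption and not necessarily the inverse of whatever conversion the library applies during extraction.

```python
def hex_to_rgb(code):
    """Parse a '#RRGGBB' colour code into three integers in 0..255."""
    code = code.lstrip("#")
    return tuple(int(code[i:i + 2], 16) for i in (0, 2, 4))


def naive_rgb_to_cmyk(r, g, b):
    """Textbook RGB -> CMYK approximation (no colour management)."""
    r, g, b = r / 255, g / 255, b / 255
    k = 1 - max(r, g, b)
    if k == 1:
        # Pure RGB black: avoid dividing by zero below.
        return (0.0, 0.0, 0.0, 1.0)
    c = (1 - r - k) / (1 - k)
    m = (1 - g - k) / (1 - k)
    y = (1 - b - k) / (1 - k)
    return (c, m, y, k)


# Option 2: a hand-made table for colours you know were used.
# The two entries are the pairs reported in this thread.
KNOWN_COLOURS = {
    "#231F20": (0.0, 0.0, 0.0, 1.0),  # pure black (K only)
    "#EC008C": (0.0, 1.0, 0.0, 0.0),  # pure magenta
}


def lookup_cmyk(code):
    """Return the known CMYK quadruple for a colour code, or None."""
    return KNOWN_COLOURS.get(code.upper())


naive = naive_rgb_to_cmyk(*hex_to_rgb("#231F20"))
table = lookup_cmyk("#231F20")
```

For #231F20 the textbook formula returns a K of roughly 0.86 plus small M and Y components, not (0 0 0 1), so only the table recovers the original value, and the table only works for colours known in advance.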



Posted By: edvoigt
Date Posted: 28 Sep 11 at 4:24PM
Thanks,

but that is not possible: the PDFs normally come from customers. Only in the test case do I know the colors used.

The best solution would be a separate function that does no color conversion, because the conversion is not 1:1. That is, more than one CMYK quadruple gives the same RGB value. For example, you can compose a gray using only K or using a mixture of CMY. You get two different colors that look the same to the human eye and in RGB, but are not the same. So there is no way to be sure.


But thanks for your idea.
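The many-to-one problem described above can be shown with a short Python sketch. The function `naive_cmyk_to_hex` is hypothetical and uses a simple textbook formula as an assumption. The library evidently uses some other conversion, since this formula would map pure K to #000000 and not to the #231F20 seen in the CSV, but the ambiguity exists with any conversion that squeezes four components into three.

```python
def naive_cmyk_to_hex(c, m, y, k):
    """Textbook CMYK -> RGB approximation, returned as '#RRGGBB'."""
    r = round(255 * (1 - c) * (1 - k))
    g = round(255 * (1 - m) * (1 - k))
    b = round(255 * (1 - y) * (1 - k))
    return "#{:02X}{:02X}{:02X}".format(r, g, b)


# A light gray printed with black ink only ...
gray_from_k = naive_cmyk_to_hex(0.0, 0.0, 0.0, 0.2)
# ... and a gray mixed from equal parts of C, M and Y.
gray_from_cmy = naive_cmyk_to_hex(0.2, 0.2, 0.2, 0.0)

# Both quadruples collapse to the same RGB code, so the original
# CMYK definition cannot be recovered from the RGB value alone.
same_rgb = gray_from_k == gray_from_cmy
```

Once both grays have become the same RGB code, no conversion routine can tell which inks were actually specified.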



Posted By: edvoigt
Date Posted: 13 Oct 11 at 2:52PM
The problem is solved.

Version 8.12 beta 2 opens up a new way to do this. Using

    QP.LoadFromFile(filename, '');
    QP.SetTextExtractionOptions(4, 1);
    CSV := QP.GetPageText(3);


where CSV is a string variable, we get lines like this:

"MicrosoftSansSerif",FF000000,17.01,85.0394,47....4054,"CYAN"
"MicrosoftSansSerif",00FF0000,17.01,198.4252,4....4054,"MAGENTA"
"MicrosoftSansSerif",0000FF00,17.01,311.811,47....4054,"YELLOW"
"MicrosoftSansSerif",000000FF,17.01,425.1969,4....4054,"KEY"


That is, each CMYK component is stored in one byte, in the order C, M, Y, K, where $FF means 1.0.

Andrew, thanks for this solution.

Werner
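As a rough illustration of the original goal, here is a Python sketch that decodes the 8-digit colour field and keeps only the text of one colour. It assumes, based only on the sample lines above, that the colour is the second CSV field and the text is the last one. The helper names are hypothetical, and the sample rows are shortened stand-ins with the coordinate columns left out, not real library output.

```python
import csv
import io


def decode_cmyk(field):
    """Split an 8-hex-digit CCMMYYKK field into four floats in 0.0..1.0."""
    if len(field) != 8:
        raise ValueError("expected 8 hex digits, got %r" % field)
    return tuple(int(field[i:i + 2], 16) / 255 for i in range(0, 8, 2))


def text_with_colour(csv_text, wanted, tolerance=1 / 255):
    """Return the text field of every row whose colour matches `wanted`."""
    hits = []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 3:
            continue  # not a text row in the assumed layout
        try:
            cmyk = decode_cmyk(row[1])
        except ValueError:
            continue  # colour field is not in CCMMYYKK form
        if all(abs(a - b) <= tolerance for a, b in zip(cmyk, wanted)):
            hits.append(row[-1])
    return hits


# Shortened, illustrative rows: font, colour, size, text.
SAMPLE = '''"MicrosoftSansSerif",FF000000,17.01,"CYAN"
"MicrosoftSansSerif",00FF0000,17.01,"MAGENTA"
"MicrosoftSansSerif",0000FF00,17.01,"YELLOW"
"MicrosoftSansSerif",000000FF,17.01,"KEY"
'''

yellow_text = text_with_colour(SAMPLE, (0.0, 0.0, 1.0, 0.0))
```

The tolerance of one byte step allows for the 8-bit quantisation of each component; an exact comparison on the hex string would work just as well when the wanted colour is known as bytes.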


