
Help with detecting visible OCGs and removing OCGs

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=3153
Printed Date: 27 Jan 26 at 10:18PM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Help with detecting visible OCGs and removing OCGs
Posted By: smleino
Subject: Help with detecting visible OCGs and removing OCGs
Date Posted: 03 Aug 15 at 6:41PM
I am working with QuickPDF Library 11.15 using the ActiveX C# interface on a PDF that has both visible and non-visible Optional Content Groups (OCGs). My goal is to:
1. Determine the OCGs present and which ones are visible and non-visible
2. Remove the non-visible OCGs and their content from the document
3. Save the PDF which now has just the visible OCGs remaining

Based on the Library documentation, this seemed simple enough, but I am running into various problems. The source PDF I am testing with has one page and two OCGs: a visible OCG containing English text and a non-visible OCG containing French text. Based on the code snippet below, I expect that QuickPDF will tell me that the first OCG is visible and the second one non-visible:

                string sourceDoc = @"..\..\Test Files\samplepdfwithlayers.pdf";
                qp.LoadFromFile(sourceDoc, "");

                // Count OCGs
                int OCGCount = qp.OptionalContentGroupCount();
                // Loop through each OCG and check whether it is visible
                for (int i = 1; i <= OCGCount; i++)
                {
                    int OCGID = qp.GetOptionalContentGroupID(i);
                    int visible = qp.GetOptionalContentGroupVisible(OCGID);
                }

Okay, so far, so good: QuickPDF does indeed indicate one visible and one non-visible OCG.

Next, I want to add a call to remove the non-visible OCG and then save what should now be a PDF with just the single remaining visible OCG containing the English text:


                string sourceDoc = @"..\..\Test Files\samplepdfwithlayers.pdf";
                string destfileName = @"..\..\Test Files\convertedPDFA.pdf";
    
                qp.LoadFromFile(sourceDoc, "");

                // Count OCGs
                int OCGCount = qp.OptionalContentGroupCount(); //should be 2 initially
                // Loop through the OCGs in reverse and delete any that are non-visible.
                // Counting down keeps the remaining indexes valid if a deletion renumbers them.
                for (int i = OCGCount; i >= 1; i--)
                {
                    int OCGID = qp.GetOptionalContentGroupID(i);
                    int visible = qp.GetOptionalContentGroupVisible(OCGID);
                    if (visible == 0) //if invisible, delete the OCG
                    {
                        qp.DeleteOptionalContentGroup(OCGID);
                    }
                }

                // Count OCGs again
                OCGCount = qp.OptionalContentGroupCount(); //should be 1 now

                qp.SaveToFile(destfileName);

So, here is the problem - the new PDF does have just one remaining OCG but instead of only showing the English text from the original visible OCG, it shows both the English AND French text on top of each other!  

Can someone tell me why the French text is even still in the PDF and why it has been put into the remaining OCG?  How do I make sure that content from a removed OCG is removed from the PDF?

Thanks!



Replies:
Posted By: smleino
Date Posted: 04 Aug 15 at 2:25PM
Does anyone know if this is working as designed or if this is a bug?


Posted By: Rowan
Date Posted: 05 Aug 15 at 9:14PM
Optional Content Groups don't contain any content themselves; rather, they are a way of grouping content in the document. So removing one OCG just means that the remaining OCG has to contain all of the content streams. Or to put it another way, the text that was in the OCG which was deleted is shown on the page because it no longer belongs to an OCG telling it to be invisible.

What you want to do is delete the OCG and its associated content streams. However, this can be fraught with danger, as not all content streams are safe to delete (i.e. deleting them might mess up your document in unexpected ways), so it requires testing with your documents.

I will put together some sample code for you tomorrow. The function you use to delete content streams is DeleteContentStream but the trick is determining which content stream the OCG is assigned to (if you already know this then it's obviously easier and you can probably work it out yourself).


Posted By: smleino
Date Posted: 06 Aug 15 at 2:20PM
I have tried to see whether the content from the layers is in separate content streams using the following code:
 
                int xPageCount = qp.PageCount();
                // Go through each page and read every content stream as text
                for (int i = 1; i <= xPageCount; i++)
                {
                    result = qp.SelectPage(i);
                    int xContentStreamCount = qp.ContentStreamCount();
                    for (int x = 1; x <= xContentStreamCount; x++)
                    {
                        result = qp.SelectContentStream(x);
                        byte[] streamBytes = (byte[]) qp.GetContentStreamToVariant();
                        string contentString = Encoding.UTF8.GetString(streamBytes, 0, streamBytes.Length);
                    }
                }
 
The source doc has a single page with two OCGs:  the first, visible one contains English text; the second, non-visible one contains French text.
Unfortunately, even though the code above does find two content streams, it seems that the first stream has all the text, both English and French, while the second stream appears to be empty.
 
Thoughts?


Posted By: Rowan
Date Posted: 07 Sep 15 at 6:53AM
There is just the one content stream and it has this form:

/OC /MC0 BDC [French content] EMC /OC /MC1 BDC [English content] EMC

So it should be possible to split the page content just by deleting from the /OC to the EMC tag.
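
For context, here is a rough sketch of how names such as /MC0 are normally tied to an OCG in the PDF structure. The object numbers and group names are made up for illustration and were not read from the test file; the point is only that the OCG objects hold a name and a visibility setting, while the text itself stays in the page content stream.

```text
% Page resources: map the marked-content property names to OCG objects
<< /Properties << /MC0 10 0 R /MC1 11 0 R >> >>

% OCG dictionaries: a type and a name, no page content
10 0 obj << /Type /OCG /Name (French) >> endobj
11 0 obj << /Type /OCG /Name (English) >> endobj

% Document catalog (fragment): the default configuration lists the groups that start hidden
<< /Type /Catalog
   /OCProperties << /OCGs [10 0 R 11 0 R]
                    /D << /OFF [10 0 R] >> >> >>
```

Once the OCG entry is gone, a viewer has nothing that tells it to hide the section tagged with that name, which would fit with the French text showing up again.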

We've tried doing that and the first part works okay. But when we try the same thing with the second part it results in an invalid page content stream and Acrobat gives an error when rendering the page.

It looks like when Acrobat hides marked content it still processes all of the commands but just doesn't make any output on the page.

So to split the content we would need to:

1. Identify all the OCG parts of the content stream, looking for BDC and EMC tags.
2. Process all of the page commands between BDC and EMC
3. Delete or otherwise disable any command that causes output

As you can see, it should be possible to accomplish what you are trying to do but due to the fact that each OCG doesn't have its own content stream it becomes complicated.
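
Step 1 above can be sketched in plain C# without any library calls. This is a minimal illustration, not Quick PDF functionality: the class and method names (OcSpanSketch, Tokenize, FindOcSpans) are hypothetical, the sample stream is made up, and the tokenizer assumes an already decoded stream with no hex strings, comments or inline images. It only locates the "/OC /Name BDC ... EMC" sections; steps 2 and 3 would still need an operator-aware pass over each section.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of Quick PDF Library.
public static class OcSpanSketch
{
    // Split a decoded content stream into tokens with their start offsets.
    // Literal strings "(...)" stay whole so that "BDC" or "EMC" inside text
    // is not mistaken for an operator.
    static List<(string Text, int Start)> Tokenize(string s)
    {
        var tokens = new List<(string Text, int Start)>();
        int i = 0;
        while (i < s.Length)
        {
            if (char.IsWhiteSpace(s[i])) { i++; continue; }
            int start = i;
            if (s[i] == '(')
            {
                int depth = 0;
                do
                {
                    if (s[i] == '\\') i++;          // skip the escaped character
                    else if (s[i] == '(') depth++;
                    else if (s[i] == ')') depth--;
                    i++;
                } while (i < s.Length && depth > 0);
            }
            else if (s[i] == '[' || s[i] == ']')
            {
                i++;                                // array brackets are single tokens
            }
            else
            {
                do { i++; }
                while (i < s.Length && !char.IsWhiteSpace(s[i]) && "()[]/".IndexOf(s[i]) < 0);
            }
            i = Math.Min(i, s.Length);
            tokens.Add((s.Substring(start, i - start), start));
        }
        return tokens;
    }

    // Return (property name, start offset, end offset) for every section of the
    // form "/OC /Name BDC ... EMC". The end offset is exclusive.
    public static List<(string Property, int Start, int End)> FindOcSpans(string stream)
    {
        var spans = new List<(string Property, int Start, int End)>();
        var tokens = Tokenize(stream);
        // One entry per open BMC/BDC section; Property is null unless tagged /OC.
        var open = new Stack<(string Property, int Start)>();
        for (int t = 0; t < tokens.Count; t++)
        {
            string op = tokens[t].Text;
            if (op == "BMC" || op == "BDC")
            {
                bool isOc = op == "BDC" && t >= 2
                    && tokens[t - 2].Text == "/OC"
                    && tokens[t - 1].Text.StartsWith("/", StringComparison.Ordinal);
                string property = isOc ? tokens[t - 1].Text : null;
                open.Push((property, isOc ? tokens[t - 2].Start : tokens[t].Start));
            }
            else if (op == "EMC" && open.Count > 0)
            {
                var section = open.Pop();
                if (section.Property != null)
                    spans.Add((section.Property, section.Start, tokens[t].Start + op.Length));
            }
        }
        return spans;
    }

    public static void Main()
    {
        // Made-up stream in the shape described above.
        string stream = "/OC /MC0 BDC BT (Bonjour) Tj ET EMC /OC /MC1 BDC BT (Hello) Tj ET EMC";
        foreach (var span in FindOcSpans(stream))
            Console.WriteLine(span.Property + ": " + stream.Substring(span.Start, span.End - span.Start));
    }
}
```

The offsets refer to positions in the decoded string, so they only line up with byte positions if the stream was decoded one byte per character. Nested marked-content sections are tracked with a stack, so an inner EMC does not close the outer /OC section.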


Posted By: smleino
Date Posted: 08 Sep 15 at 12:55PM
Thanks for the information - I had noticed some of what you reported but did not have enough PDF internals knowledge to fully explain and diagnose the problem.

Will Quick PDF be able to handle this now or will it require changes to the library?  Or do I need to look at handling this some other way?

Thanks!


