Debenu Quick PDF Library - PDF SDK Community Forum

Help with detecting visible OCGs and removing OCGs

smleino (Beginner, joined 03 Aug 15)
Posted: 03 Aug 15 at 6:41PM
I am working with Quick PDF Library 11.15 through the ActiveX interface in C#, on a PDF that has both visible and non-visible Optional Content Groups (OCGs). My goal is to:
1. Determine which OCGs are present, and which of them are visible and which are non-visible
2. Remove the non-visible OCGs and their content from the document
3. Save the PDF, which should then contain only the visible OCGs

Based on the Library documentation this seemed simple enough, but I am running into various problems. The source PDF I am testing with has one page and two OCGs: a visible OCG containing English text and a non-visible OCG containing French text. Given the code snippet below, I expect Quick PDF to tell me that the first OCG is visible and the second one is non-visible:

                string sourceDoc = @"..\..\Test Files\samplepdfwithlayers.pdf";
                qp.LoadFromFile(sourceDoc, "");

                // Count OCGs
                int OCGCount = qp.OptionalContentGroupCount();
                // Loop through each OCG and check whether it is visible
                for (int i = 1; i <= OCGCount; i++)
                {
                    int OCGID = qp.GetOptionalContentGroupID(i);
                    int visible = qp.GetOptionalContentGroupVisible(OCGID);
                }

Okay, so far, so good: Quick PDF does indeed report one visible and one non-visible OCG.
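The detection step can be separated from any later modification by first collecting the IDs of the non-visible OCGs. The following is a minimal sketch, not the library's own API: `IOcgReader`, `OcgInventory` and `FakeOcgReader` are hypothetical names introduced here, and only the three method names come from the snippet above. It assumes, as in that snippet, that `GetOptionalContentGroupVisible` returns 0 for a non-visible OCG; the fake's return value of 1 for a visible one is also an assumption.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical wrapper interface. The three method names are the Quick PDF
// calls used in the snippet above; the interface itself is an assumption.
public interface IOcgReader
{
    int OptionalContentGroupCount();
    int GetOptionalContentGroupID(int index);        // index is 1-based
    int GetOptionalContentGroupVisible(int ocgId);   // 0 = not visible
}

public static class OcgInventory
{
    // Returns the IDs of all OCGs reported as non-visible, without changing the document.
    public static List<int> HiddenIds(IOcgReader qp)
    {
        var hidden = new List<int>();
        int count = qp.OptionalContentGroupCount();
        for (int i = 1; i <= count; i++)
        {
            int id = qp.GetOptionalContentGroupID(i);
            if (qp.GetOptionalContentGroupVisible(id) == 0)
            {
                hidden.Add(id);
            }
        }
        return hidden;
    }
}

// Stand-in used only to exercise the sketch without the library.
public sealed class FakeOcgReader : IOcgReader
{
    private readonly int[] ids;
    private readonly bool[] visible;

    public FakeOcgReader(int[] ids, bool[] visible)
    {
        this.ids = ids;
        this.visible = visible;
    }

    public int OptionalContentGroupCount() => ids.Length;
    public int GetOptionalContentGroupID(int index) => ids[index - 1];
    public int GetOptionalContentGroupVisible(int ocgId) =>
        visible[Array.IndexOf(ids, ocgId)] ? 1 : 0;
}
```

With the real ActiveX object, a thin adapter class that forwards these three calls to `qp` would be needed. Collecting the IDs up front means the list is complete before anything is deleted.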

Next, I want to add a call that removes the non-visible OCG, and then save what should now be a PDF containing only the remaining visible OCG with the English text:


                string sourceDoc = @"..\..\Test Files\samplepdfwithlayers.pdf";
                string destfileName = @"..\..\Test Files\convertedPDFA.pdf";
    
                qp.LoadFromFile(sourceDoc, "");

                // Count OCGs
                int OCGCount = qp.OptionalContentGroupCount(); //should be 2 initially
                // Loop through the OCGs in reverse and delete any that are non-visible;
                // going backwards means a deletion cannot shift the index of an OCG not yet visited
                for (int i = OCGCount; i >= 1; i--)
                {
                    int OCGID = qp.GetOptionalContentGroupID(i);
                    int visible = qp.GetOptionalContentGroupVisible(OCGID);
                    if (visible == 0) //if invisible, delete the OCG
                    {
                        qp.DeleteOptionalContentGroup(OCGID);
                    }
                }

                // Count OCGs again
                OCGCount = qp.OptionalContentGroupCount(); //should be 1 now

                qp.SaveToFile(destfileName);

So, here is the problem: the new PDF does have just one remaining OCG, but instead of showing only the English text from the original visible OCG, it shows both the English AND French text on top of each other!

Can someone tell me why the French text is still in the PDF at all, and why it has been put into the remaining OCG? How do I make sure that the content of a removed OCG is also removed from the PDF?

Thanks!

smleino (Beginner)
Posted: 04 Aug 15 at 2:25PM
Does anyone know if this is working as designed or if this is a bug?

Rowan (Moderator Group, joined 10 Jan 09)
Posted: 05 Aug 15 at 9:14PM
Optional Content Groups don't contain any content themselves; rather, they are a way of grouping content streams in the document. So removing one OCG just means that the remaining OCG has to contain all of the content streams. Or to put it another way, the text that was in the deleted OCG is shown on the page because it no longer belongs to an OCG telling it to be invisible.

What you want to do is delete the OCG and its associated content streams. However, this can be risky, because not all content streams are safe to delete (deleting one might mess up your document in unexpected ways), so it requires testing with your documents.

I will put together some sample code for you tomorrow. The function you use to delete content streams is DeleteContentStream, but the trick is determining which content stream the OCG is assigned to (if you already know this then it's obviously easier and you can probably work it out yourself).

smleino (Beginner)
Posted: 06 Aug 15 at 2:20PM
I have tried to see whether the content from the layers is in separate content streams using the following code:
 
                int xPageCount = qp.PageCount();
                // Go through each page and read each of its content streams
                for (int i = 1; i <= xPageCount; i++)
                {
                    result = qp.SelectPage(i);
                    int xContentStreamCount = qp.ContentStreamCount();
                    for (int x = 1; x <= xContentStreamCount; x++)
                    {
                        result = qp.SelectContentStream(x);
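                        // Decode the raw stream bytes so the page operators can be inspected as text
                        byte[] streamBytes = (byte[]) qp.GetContentStreamToVariant();
                        string contentString = Encoding.UTF8.GetString(streamBytes, 0, streamBytes.Length);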
                    }
                }
 
The source doc has a single page with two OCGs:  the first, visible one contains English text; the second, non-visible one contains French text.
Unfortunately, even though the code above does find two content streams, it seems that the first stream has all the text, both English and French, while the second stream appears to be empty.
 
Thoughts?

Rowan (Moderator Group)
Posted: 07 Sep 15 at 6:53AM
There is just the one content stream and it has this form:

/OC /MC0 BDC [French content] EMC /OC /MC1 BDC [English content] EMC

So it should be possible to split the page content just by deleting from the /OC to the EMC tag.
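As a rough illustration of "deleting from the /OC to the EMC tag", the sketch below locates each `/OC /Name BDC ... EMC` span in a content stream held as a string and cuts one out. It is plain string processing and uses no Quick PDF calls; `OcgSpans`, `Tokenize`, `FindOcSpans` and `RemoveSpan` are hypothetical names, and the tokenizer is deliberately simplified (it does not handle inline images, for example). It only shows the mechanical step and does not guarantee that the remaining stream is valid.

```csharp
using System;
using System.Collections.Generic;

public static class OcgSpans
{
    // A token of the content stream with its character offsets (End is exclusive).
    public sealed class Token { public string Text; public int Start; public int End; }

    // An /OC marked-content span, from the "/OC" operand through the matching EMC.
    public sealed class Span { public string Name; public int Start; public int End; }

    // Simplified tokenizer: whitespace, comments, literal and hex strings, names,
    // and the [ ] << >> delimiters. Inline images (BI ... ID ... EI) are not handled.
    public static List<Token> Tokenize(string s)
    {
        var tokens = new List<Token>();
        int i = 0;
        while (i < s.Length)
        {
            char c = s[i];
            if (char.IsWhiteSpace(c)) { i++; continue; }
            if (c == '%')   // comment: skip to end of line
            {
                while (i < s.Length && s[i] != '\n' && s[i] != '\r') i++;
                continue;
            }
            int start = i;
            if (c == '(')   // literal string: may nest and contain escapes
            {
                int depth = 0;
                do
                {
                    if (s[i] == '\\') i++;             // skip the escaped character
                    else if (s[i] == '(') depth++;
                    else if (s[i] == ')') depth--;
                    i++;
                } while (i < s.Length && depth > 0);
            }
            else if (c == '[' || c == ']') i++;
            else if (c == '<' || c == '>')
            {
                if (i + 1 < s.Length && s[i + 1] == c) i += 2;                        // << or >>
                else if (c == '<') { while (i < s.Length && s[i] != '>') i++; i++; }  // hex string
                else i++;
            }
            else   // name, number or operator: runs to whitespace or a delimiter
            {
                i++;
                while (i < s.Length && !char.IsWhiteSpace(s[i]) && "()<>[]/%".IndexOf(s[i]) < 0) i++;
            }
            i = Math.Min(i, s.Length);   // guard against unterminated strings
            tokens.Add(new Token { Text = s.Substring(start, i - start), Start = start, End = i });
        }
        return tokens;
    }

    // Finds every "/OC /Name BDC ... EMC" span, tracking nesting of marked content.
    public static List<Span> FindOcSpans(string s)
    {
        List<Token> tokens = Tokenize(s);
        var spans = new List<Span>();
        var open = new Stack<Span>();   // null entries stand for non-OC marked content
        for (int t = 0; t < tokens.Count; t++)
        {
            string op = tokens[t].Text;
            if (op == "BMC") open.Push(null);
            else if (op == "BDC")
            {
                bool isOc = t >= 2 && tokens[t - 2].Text == "/OC" && tokens[t - 1].Text[0] == '/';
                open.Push(isOc ? new Span { Name = tokens[t - 1].Text, Start = tokens[t - 2].Start } : null);
            }
            else if (op == "EMC" && open.Count > 0)
            {
                Span span = open.Pop();
                if (span != null) { span.End = tokens[t].End; spans.Add(span); }
            }
        }
        return spans;
    }

    // Returns the stream text with the given span cut out.
    public static string RemoveSpan(string s, Span span) => s.Remove(span.Start, span.End - span.Start);
}
```

For the stream shown above, `FindOcSpans` would report two spans named `/MC0` and `/MC1`, and removing the first leaves only the `/MC1` part. Nested spans are reported inner first, because a span is recorded when its EMC is reached.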

We've tried doing that and the first part works okay. But when we try the same thing with the second part it results in an invalid page content stream and Acrobat gives an error when rendering the page.

It looks like when Acrobat hides marked content it still processes all of the commands but just doesn't make any output on the page.

So to split the content we would need to:

1. Identify all the OCG parts of the content stream, looking for BDC and EMC tags.
2. Process all of the page commands between BDC and EMC
3. Delete or otherwise disable any command that causes output

As you can see, it should be possible to accomplish what you are trying to do, but because each OCG doesn't have its own content stream, it becomes complicated.
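Step 3 above could look roughly like the following sketch, which takes the tokens found between a BDC and its EMC and disables the operators that produce output while keeping the state-changing ones. This is an assumption about one way to do it, not something Quick PDF provides: `HiddenContent` and `Neutralize` are hypothetical names, the input is assumed to be already split into tokens, and inline images (BI ... ID ... EI) are not handled. The operator lists follow the PDF content stream operators: text showing (`Tj`, `TJ`, `'`, `"`), path painting (`S`, `s`, `f`, `F`, `f*`, `B`, `B*`, `b`, `b*`), XObjects (`Do`) and shadings (`sh`).

```csharp
using System;
using System.Collections.Generic;

public static class HiddenContent
{
    // Operators that are dropped together with their operands.
    static readonly HashSet<string> DropWithOperands =
        new HashSet<string> { "Tj", "TJ", "Do", "sh" };

    // Path-painting operators; each is replaced by "n" (end the path without painting).
    static readonly HashSet<string> PathPaint =
        new HashSet<string> { "S", "s", "f", "F", "f*", "B", "B*", "b", "b*" };

    // Operators start with a letter (or are ' and "); everything else is an operand.
    static bool IsOperator(string tok)
    {
        if (tok == "true" || tok == "false" || tok == "null") return false;
        char c = tok[0];
        return char.IsLetter(c) || c == '\'' || c == '"';
    }

    public static List<string> Neutralize(IEnumerable<string> tokens)
    {
        var output = new List<string>();
        var operands = new List<string>();
        foreach (string tok in tokens)
        {
            if (!IsOperator(tok)) { operands.Add(tok); continue; }

            if (DropWithOperands.Contains(tok))
            {
                // Drop the operator and its operands entirely.
            }
            else if (PathPaint.Contains(tok))
            {
                output.Add("n");
            }
            else if (tok == "'")
            {
                output.Add("T*");   // keep the line move, drop the text
            }
            else if (tok == "\"")
            {
                // "aw ac string" sets word and character spacing, moves to the
                // next line, then shows text; keep everything except the text.
                if (operands.Count >= 2)
                {
                    output.Add(operands[0]); output.Add("Tw");
                    output.Add(operands[1]); output.Add("Tc");
                }
                output.Add("T*");
            }
            else
            {
                // Any other operator is kept unchanged with its operands.
                output.AddRange(operands);
                output.Add(tok);
            }
            operands.Clear();
        }
        output.AddRange(operands);   // trailing operands with no operator
        return output;
    }
}
```

Keeping the state-changing operators is meant to reflect the observation above that Acrobat still processes the commands of hidden content. Whether the result renders correctly would still need testing against real documents.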

smleino (Beginner)
Posted: 08 Sep 15 at 12:55PM
Thanks for the information - I had noticed some of what you reported but did not have enough PDF internals knowledge to fully explain and diagnose the problem.

Will Quick PDF be able to handle this now or will it require changes to the library?  Or do I need to look at handling this some other way?

Thanks!