Help with detecting visible OCGs and removing OCGs


I am working with QuickPDF Library 11.15, using the ActiveX C# interface, on a PDF that has both visible and non-visible Optional Content Groups (OCGs). My goal is to:
1. Determine which OCGs are present and which of them are visible and which are non-visible
2. Remove the non-visible OCGs and their content from the document
3. Save the PDF, which should now contain only the visible OCGs

Based on the library documentation this seemed simple enough, but I am running into various problems. The source PDF I am testing with has one page and two OCGs: a visible OCG containing English text and a non-visible OCG containing French text. With the code snippet below, I expect QuickPDF to tell me that the first OCG is visible and the second one is non-visible:

                string sourceDoc = @"..\..\Test Files\samplepdfwithlayers.pdf";
                qp.LoadFromFile(sourceDoc, "");

                // Count OCGs
                int OCGCount = qp.OptionalContentGroupCount();
                // Loop through each OCG and read its visibility (0 = not visible)
                for (int i = 1; i <= OCGCount; i++)
                {
                    int OCGID = qp.GetOptionalContentGroupID(i);
                    int visible = qp.GetOptionalContentGroupVisible(OCGID);
                }

Okay, so far, so good: QuickPDF does indeed report one visible and one non-visible OCG.

Next, I add a call to remove the non-visible OCG and then save what should now be a PDF with only the single remaining visible OCG (the English text):

                string sourceDoc = @"..\..\Test Files\samplepdfwithlayers.pdf";
                string destfileName = @"..\..\Test Files\convertedPDFA.pdf";

                qp.LoadFromFile(sourceDoc, "");

                // Count OCGs
                int OCGCount = qp.OptionalContentGroupCount(); //should be 2 initially
                // Loop through each OCG and delete any that are non-visible.
                // Iterate backwards so that, if deleting an OCG renumbers the
                // remaining ones, no OCG is skipped.
                for (int i = OCGCount; i >= 1; i--)
                {
                    int OCGID = qp.GetOptionalContentGroupID(i);
                    int visible = qp.GetOptionalContentGroupVisible(OCGID);
                    if (visible == 0) // if not visible, delete the OCG
                    {
                        qp.DeleteOptionalContentGroup(OCGID);
                    }
                }

                // Count OCGs again
                OCGCount = qp.OptionalContentGroupCount(); //should be 1 now

                qp.SaveToFile(destfileName);

So here is the problem: the new PDF does have only one remaining OCG, but instead of showing just the English text from the original visible OCG, it shows both the English AND the French text on top of each other!

Can someone tell me why the French text is even still in the PDF and why it has been put into the remaining OCG?  How do I make sure that content from a removed OCG is removed from the PDF?

Thanks!

Original post above by smleino (Beginner, joined 03 Aug 15), posted 03 Aug 15 at 6:41PM

smleino (Beginner, joined 03 Aug 15), posted 04 Aug 15 at 2:25PM
Does anyone know whether this is working as designed or whether it is a bug?

Rowan (Moderator Group, joined 10 Jan 09), posted 05 Aug 15 at 9:14PM
Optional Content Groups don't contain any content themselves; rather, they are a way of grouping content streams in the document. So removing one OCG just means that the remaining OCG has to contain all of the content streams. Or, to put it another way, the text that was in the deleted OCG is now shown on the page because it no longer belongs to an OCG telling it to be invisible.

What you want to do is delete the OCG and its associated content streams. However, this can be fraught with danger, as not all content streams are safe to delete (deleting one might mess up your document in unexpected ways), so it requires testing with your documents. I will put together some sample code for you tomorrow. The function you use to delete content streams is DeleteContentStream, but the trick is determining which content stream the OCG is assigned to (if you already know this, then it's obviously easier and you can probably work it out yourself).

smleino (Beginner, joined 03 Aug 15), posted 06 Aug 15 at 2:20PM
I have tried to see whether the content from the layers is in separate content streams, using the following code:

                int result;
                int xPageCount = qp.PageCount();
                // Go through each page and inspect its content streams
                for (int i = 1; i <= xPageCount; i++)
                {
                    result = qp.SelectPage(i);
                    int xContentStreamCount = qp.ContentStreamCount();
                    for (int x = 1; x <= xContentStreamCount; x++)
                    {
                        result = qp.SelectContentStream(x);
                        byte[] bytes = (byte[])qp.GetContentStreamToVariant();
                        // Latin-1 maps each byte to one character, so nothing is lost
                        // (content streams are not necessarily valid UTF-8)
                        string contentString = Encoding.GetEncoding("ISO-8859-1").GetString(bytes, 0, bytes.Length);
                    }
                }

The source document has a single page with two OCGs: the first, visible one contains English text; the second, non-visible one contains French text. Unfortunately, even though the code above does find two content streams, it seems that the first stream has all of the text, both English and French, while the second stream appears to be empty. Thoughts?
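Once a stream has been decoded to a string as in the code above, a quick way to see which optional content it references is to search it for "/OC /<name> BDC" sequences. The names (such as MC0) are keys in the page's /Properties resource dictionary that point at the OCG objects; mapping them back to the OCG IDs that Quick PDF reports is not shown here. This is a rough sketch only: OcTagScanner and ListOcTags are hypothetical names, and a regular expression would also match that text if it happened to appear inside a string operand.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of Quick PDF Library.
public static class OcTagScanner
{
    // Matches "/OC /<name> BDC" (whitespace optional) and captures <name>.
    // Rough check only: it does not know about string operands or comments.
    static readonly Regex OcBdc = new Regex(@"/OC\s*/([^\s/\[\]<>(){}%]+)\s*BDC");

    // Returns each distinct property name used to mark optional content,
    // in order of first appearance.
    public static List<string> ListOcTags(string content)
    {
        var tags = new List<string>();
        foreach (Match m in OcBdc.Matches(content))
        {
            string name = m.Groups[1].Value;
            if (!tags.Contains(name)) tags.Add(name);
        }
        return tags;
    }
}
```

If a single stream yields more than one name, the OCGs share that stream, so deleting a whole content stream would remove content from both groups.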

Rowan (Moderator Group, joined 10 Jan 09), posted 07 Sep 15 at 6:53AM
There is just the one content stream, and it has this form:

                /OC /MC0 BDC [French content] EMC
                /OC /MC1 BDC [English content] EMC

So it should be possible to split the page content just by deleting everything from the /OC to the EMC tag. We've tried doing that, and the first part works okay. But when we try the same thing with the second part, it results in an invalid page content stream and Acrobat gives an error when rendering the page. It looks like when Acrobat hides marked content it still processes all of the commands and just doesn't produce any output on the page, so commands after a hidden section may depend on state (such as the current font or text position) that was set inside it. So to split the content we would need to:
1. Identify all the OCG parts of the content stream, looking for BDC and EMC tags.
2. Process all of the page commands between BDC and EMC.
3. Delete or otherwise disable any command that causes output.

As you can see, it should be possible to accomplish what you are trying to do, but because each OCG doesn't have its own content stream, it becomes complicated.
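The three steps above can be sketched as a plain string transformation on the decoded content stream. This is a minimal sketch under several assumptions, not a Quick PDF Library feature: the stream has already been decoded to a string (for example with Latin-1, so each byte maps to one character); the group to hide is referenced as "/OC /<name> BDC" as in the stream above, and you have already matched that name (such as MC0) to the non-visible OCG; inline images (BI/ID/EI) are not supported; and OptionalContentFilter and HideTag are hypothetical names. Rather than deleting the whole BDC ... EMC section, it keeps every operator that changes state (fonts, positioning, colors, q/Q, clipping) and disables only output: Tj, TJ, sh and Do are dropped with their operands, ' and " are reduced to their line-advance and spacing effects, and path-painting operators become n (end the path without painting).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of Quick PDF Library.
public static class OptionalContentFilter
{
    // Path-painting operators: replaced by "n" so that a clip set earlier
    // with W or W* still takes effect.
    static readonly HashSet<string> PathPainting =
        new HashSet<string> { "S", "s", "f", "F", "f*", "B", "B*", "b", "b*" };

    // Operators that only produce output (text, shadings, XObjects).
    static readonly HashSet<string> OutputOnly =
        new HashSet<string> { "Tj", "TJ", "sh", "Do" };

    // Suppresses output inside every "/OC /<tag> BDC ... EMC" sequence while
    // keeping all state-changing operators. Output is one operator per line.
    public static string HideTag(string content, string tag)
    {
        var sb = new StringBuilder();
        var operands = new List<string>();
        int depth = 0;       // current marked-content nesting depth
        int hiddenDepth = 0; // depth of the sequence being hidden, 0 = none

        foreach (string tok in Tokenize(content))
        {
            if (!IsOperator(tok)) { operands.Add(tok); continue; }
            if (tok == "BI")
                throw new NotSupportedException("Inline images (BI/ID/EI) are not handled.");
            bool hiding = hiddenDepth > 0;

            if (tok == "BMC" || tok == "BDC")
            {
                depth++;
                if (!hiding && tok == "BDC" && operands.Count == 2 &&
                    operands[0] == "/OC" && operands[1] == "/" + tag)
                    hiddenDepth = depth;
                Emit(sb, operands, tok);
            }
            else if (tok == "EMC")
            {
                if (hiding && hiddenDepth == depth) hiddenDepth = 0;
                depth--;
                Emit(sb, operands, tok);
            }
            else if (!hiding) Emit(sb, operands, tok);
            else if (PathPainting.Contains(tok)) sb.Append("n\n");
            else if (tok == "'") sb.Append("T*\n");   // (str) ' == T* (str) Tj
            else if (tok == "\"")                      // aw ac (str) " == aw Tw ac Tc (str) '
                sb.Append(operands[0]).Append(" Tw ").Append(operands[1]).Append(" Tc T*\n");
            else if (!OutputOnly.Contains(tok)) Emit(sb, operands, tok);
            // otherwise: an output-only operator, dropped together with its operands
            operands.Clear();
        }
        return sb.ToString();
    }

    static void Emit(StringBuilder sb, List<string> operands, string op)
    {
        foreach (string o in operands) sb.Append(o).Append(' ');
        sb.Append(op).Append('\n');
    }

    static bool IsWhite(char c) =>
        c == ' ' || c == '\t' || c == '\n' || c == '\r' || c == '\f' || c == '\0';

    static bool IsDelim(char c) => "()<>[]{}/%".IndexOf(c) >= 0;

    // Anything that is not a name, number, string, bracket or true/false/null.
    static bool IsOperator(string tok)
    {
        char c = tok[0];
        if ("/(<>[]{}".IndexOf(c) >= 0) return false;
        if (char.IsDigit(c) || c == '+' || c == '-' || c == '.') return false;
        return tok != "true" && tok != "false" && tok != "null";
    }

    // Minimal tokenizer: comments are skipped, strings are kept verbatim.
    static IEnumerable<string> Tokenize(string s)
    {
        int i = 0;
        while (i < s.Length)
        {
            char c = s[i];
            if (IsWhite(c)) { i++; continue; }
            if (c == '%') { while (i < s.Length && s[i] != '\n' && s[i] != '\r') i++; continue; }
            int start = i;
            if (c == '(')
            {
                int nest = 0; // literal strings may contain balanced parentheses
                do
                {
                    if (s[i] == '\\') i++;   // skip the escaped character
                    else if (s[i] == '(') nest++;
                    else if (s[i] == ')') nest--;
                    i++;
                } while (i < s.Length && nest > 0);
                i = Math.Min(i, s.Length);
            }
            else if ((c == '<' || c == '>') && i + 1 < s.Length && s[i + 1] == c) i += 2; // << or >>
            else if (c == '<') { int close = s.IndexOf('>', i); i = close < 0 ? s.Length : close + 1; } // hex string
            else if ("[]{}>".IndexOf(c) >= 0) i++;
            else { i++; while (i < s.Length && !IsWhite(s[i]) && !IsDelim(s[i])) i++; } // name, number, operator
            yield return s.Substring(start, i - start);
        }
    }
}
```

For a stream of the form shown above, HideTag(content, "MC0") would leave the English section untouched and remove the French glyphs while keeping the font and position commands that later commands might rely on. Two caveats: dropping Tj/TJ also drops the advance they apply to the text position, so text later in the same BT ... ET block that relies on that advance could shift; and the original whitespace and comments are not preserved. Writing the filtered string back to the page is not shown, since it depends on which Quick PDF functions you use for that.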

smleino (Beginner, joined 03 Aug 15), posted 08 Sep 15 at 12:55PM
Thanks for the information. I had noticed some of what you reported but did not have enough knowledge of PDF internals to fully explain and diagnose the problem. Will Quick PDF be able to handle this now, or will it require changes to the library? Or do I need to look at handling this some other way? Thanks!