Extract web / email links

Printed From: Debenu Quick PDF Library - PDF SDK Community Forum
Category: For Users of the Library
Forum Name: I need help - I can help
Forum Description: Problems and solutions while programming with the Debenu Quick PDF Library and Debenu PDF Viewer SDK
URL: http://www.quickpdf.org/forum/forum_posts.asp?TID=1451
Printed Date: 20 May 24 at 7:30AM
Software Version: Web Wiz Forums 11.01 - http://www.webwizforums.com


Topic: Extract web / email links
Posted By: ZarkoGajic
Subject: Extract web / email links
Date Posted: 18 May 10 at 2:01PM
Hi,

What would be the easiest way to extract web links like "http://" or "www.site.com" and email links like "mailto:mail@domain.com" or "mail@domain.com" from an existing PDF document?

GetAnnotStrProperty(111) would retrieve the link value of an annotation.

I am looking for a way to extract those "web-like" links that a PDF reader would represent as web links and ask to open a web browser or start the default email client.

-zarko


-------------
-zarko gajic



Replies:
Posted By: Ingo
Date Posted: 18 May 10 at 3:31PM
Hi Zarko!

You need this page:
http://www.quickpdflibrary.com/help/quickpdf/AnnotationsAndHotspotLinks.php
Additionally, there was an older newsletter from Rowan or Karl explaining how to separate these links.
I think you should check the official support pages. You'll find these things there.

Cheers, Ingo



Posted By: ZarkoGajic
Date Posted: 18 May 10 at 3:35PM
Ingo,

Thanks. I'm aware of the Annotations related function.

I'm looking for the best way to extract the text and look for "web-like" patterns :)




-------------
-zarko gajic


Posted By: dsola
Date Posted: 27 May 10 at 3:57PM
Hi,

Brute force?
Use GetPageText, or an equivalent direct-access method, and then search the extracted text.
If all links share the same font or colour, the search would be simple.
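
A minimal sketch of the search step, in Python for illustration. It assumes page_text already holds whatever your text-extraction call (for example GetPageText) returned. The names find_links, URL_RE and EMAIL_RE are made up for this example, and the patterns are deliberately loose guesses at what a PDF reader would treat as a link, not an exact match of any reader's rules.

```python
import re

# Hypothetical, intentionally loose patterns; tune them for your documents.
URL_RE = re.compile(r"""(?:https?://|www\.)[^\s<>"']+""", re.IGNORECASE)
EMAIL_RE = re.compile(
    r"(?:mailto:)?[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+",
    re.IGNORECASE,
)

def find_links(page_text):
    """Return (urls, emails) found in plain text, de-duplicated, in order."""
    # Trailing sentence punctuation is usually not part of the link.
    urls = [m.group(0).rstrip(".,;:!?)") for m in URL_RE.finditer(page_text)]
    emails = [m.group(0) for m in EMAIL_RE.finditer(page_text)]
    # dict.fromkeys keeps the first occurrence and preserves order.
    return list(dict.fromkeys(urls)), list(dict.fromkeys(emails))

if __name__ == "__main__":
    sample = ("Visit http://example.com/page, or www.site.com. "
              "Write to mailto:mail@domain.com or info@domain.org.")
    urls, emails = find_links(sample)
    print(urls)
    print(emails)
```

Extracted text can split a long link across lines, so a link that wraps in the PDF may come back truncated with this approach.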

Greetings from Nova

Davor


-------------
registered QuickPDF user


Posted By: ZarkoGajic
Date Posted: 27 May 10 at 4:09PM
Davor, thanks ... that's how it was done :)

-------------
-zarko gajic


Posted By: Ingo
Date Posted: 27 May 10 at 6:10PM
Hi!

I don't think that GetPageText will work in every case.
With GetPageText you'll get things like "please click on this link to enter the website"
but you won't get the real link behind it.

You can do this yourself, too.
I decrypted the file and searched the real file content for things like "http" and so on.
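
A rough sketch of that kind of raw scan, in Python for illustration. It assumes the file is already decrypted and that the link annotations sit in uncompressed objects. Links stored inside compressed object streams, or written as hex strings, will not be found this way. The names find_uri_actions and URI_RE are made up for this example.

```python
import re

# Matches "/URI (literal string)" as written in a PDF URI action dictionary.
# Handles backslash-escaped characters inside the string, but not nested
# unescaped parentheses or hex strings (an accepted limit of this sketch).
URI_RE = re.compile(rb"/URI\s*\(((?:\\.|[^\\()])*)\)", re.DOTALL)

def find_uri_actions(pdf_bytes):
    """Return the targets of /URI entries found in raw, unencrypted PDF bytes."""
    results = []
    for match in URI_RE.finditer(pdf_bytes):
        raw = match.group(1)
        # Undo the simple escapes \( \) and \\ used in PDF literal strings.
        unescaped = re.sub(rb"\\([()\\])", rb"\1", raw)
        results.append(unescaped.decode("latin-1"))
    return results

if __name__ == "__main__":
    sample = (b"<< /Type /Annot /Subtype /Link "
              b"/A << /S /URI /URI (http://www.site.com) >> >>")
    print(find_uri_actions(sample))
```

To run it on a real file, read the bytes with open(path, "rb").read() and pass them in. The mailto: targets show up the same way, since they are stored as URI actions too.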

Cheers, Ingo
 


