Brandon Garnett
 asked on

Removing hyperlinks from endnotes in a PDF-to-HTML conversion

We are using a program called JPDF to HTML from IDR Solutions to convert over 500 PDFs into HTML. The endnotes in the PDFs contain URLs. Some of those URLs have been converted into hyperlinks, but only the first line of each URL is linked, not the whole URL. Does anyone know the easiest way to strip the hyperlinks from the URLs?

Thanks!
Topics: HTML, Web Applications, Web Languages and Standards


8/22/2022 - Mon
Julian Hansen

When you say strip, do you mean the conversion produces this:
<a href="http://www.somedomain.com/folder/truncated_">http://www.somedomain.com/folder/truncated_</a> path/somefile.html


and you want to end up with this:
http://www.somedomain.com/folder/truncated_path/somefile.html



Some examples of what you are referring to would help me understand what you are asking.
Brandon Garnett

ASKER
Yes, exactly.
Julian Hansen

Do you have a sample of a converted document (or part of one with a broken hyperlink)?

Specifically, I need to see how the lines are broken.

One solution is to use a regular expression:

/<a[^>]*>(.*?)<\/a>/g, \1
The exact expression and how you apply it will vary with the tool you use, but basically the expression matches every <a> element and replaces everything from the opening tag to the closing tag with the text between them.
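A minimal sketch of that replacement in Python, using the standard `re` module (Python is only an assumption here; any language with regular expressions works the same way). It assumes the converter writes ordinary `<a ...>...</a>` tags with no nested anchors, and `strip_links` is a hypothetical helper name. Because the attributes are matched with `[^>]*` instead of being captured, the link text is group 1.

```python
import re

# Match an opening <a ...> tag, capture the link text lazily, then the closing </a>.
# \b stops the pattern from matching tags such as <abbr>.
# IGNORECASE also catches <A HREF=...>; DOTALL lets the link text span line breaks.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def strip_links(html: str) -> str:
    """Replace every <a ...>text</a> with just its text (hypothetical helper)."""
    return ANCHOR_RE.sub(r"\1", html)


sample = ('<a href="http://www.somedomain.com/folder/truncated_">'
          'http://www.somedomain.com/folder/truncated_</a> path/somefile.html')
print(strip_links(sample))
# prints: http://www.somedomain.com/folder/truncated_ path/somefile.html
```

This removes the link but keeps whatever whitespace or line break the converter left between the two halves of the URL, so rejoining them still depends on how the lines are broken in the converted files.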

What would you use to process the HTML files and do the replacement? Java?
Brandon Garnett

ASKER
Our current solution is to open the converted web pages in Dreamweaver and use Find and Replace to go through the hyperlinks quickly and remove the </a> tags.
Julian Hansen

Understood, but I could recommend a solution in PHP only to find you don't use PHP. Hence my question: for a scripted solution, which server-side scripting environment would you prefer?
Brandon Garnett

ASKER
We can use whatever language you like. What do you recommend?
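As an illustration only, the same idea can be applied to a whole folder of converted pages at once instead of one page at a time in Dreamweaver. This Python sketch assumes the files are UTF-8 `.html` files that can be overwritten in place (keep a backup first), and that only links whose visible text starts with `http://` or `https://` should be removed, so other links such as in-page navigation survive. The function and script names are hypothetical. If the converter wraps the link text in another tag (for example a `<span>`), this filter will not match and the pattern would need to be relaxed to accept any link text.

```python
import re
import sys
from pathlib import Path

# Only strip anchors whose visible text begins with http:// or https://
# (optionally after whitespace). Other links in the page are left alone.
URL_ANCHOR_RE = re.compile(r"<a\b[^>]*>(\s*https?://.*?)</a>",
                           re.IGNORECASE | re.DOTALL)


def strip_url_links_in_folder(folder: Path, encoding: str = "utf-8") -> int:
    """Remove URL hyperlinks from every .html file under `folder`, in place.

    Returns the number of files that were modified.
    """
    changed = 0
    for path in sorted(folder.rglob("*.html")):
        # Work on bytes so the file's original line endings are preserved.
        original = path.read_bytes().decode(encoding)
        cleaned = URL_ANCHOR_RE.sub(r"\1", original)
        if cleaned != original:
            path.write_bytes(cleaned.encode(encoding))
            changed += 1
    return changed


if __name__ == "__main__":
    if len(sys.argv) == 2:
        count = strip_url_links_in_folder(Path(sys.argv[1]))
        print(f"Modified {count} file(s)")
    else:
        print("usage: python strip_url_links.py <folder-with-html-files>")
```

Run it as `python strip_url_links.py converted_html`, where both names are placeholders. Checking a few output files by hand is still worthwhile, since the pattern only sees the HTML text and knows nothing about the original PDF layout.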
ASKER CERTIFIED SOLUTION
Julian Hansen
Brandon Garnett

ASKER
Thanks for the help