Solved

removing hyperlinks from end-notes in a PDF to HTML conversion

Posted on 2016-09-22
Last Modified: 2016-09-26
We are using a program called JPDF to HTML from IDR Solutions to convert over 500 PDFs into HTML. The endnotes in the PDFs contain URLs, and some of those URLs have been converted into hyperlinks. The converter does not link the whole URL, only the first line of it. Does anyone know the easiest way to strip the hyperlinks from the URLs?

Thanks!
Question by:rcimasi
8 Comments
 
LVL 59

Expert Comment

by:Julian Hansen
ID: 41812243
When you say strip, do you mean the conversion produces this:
<a href="http://www.somedomain.com/folder/truncated_">http://www.somedomain.com/folder/truncated_</a> path/somefile.html

And you want to end up with
http://www.somedomain.com/folder/truncated_path/somefile.html

Some examples of what you are referring to would help me understand what you are asking.
 

Author Comment

by:rcimasi
ID: 41812691
Yes exactly
 
LVL 59

Expert Comment

by:Julian Hansen
ID: 41812943
Do you have a sample of a converted document (or part of one with a broken hyperlink)?

Specifically need to see how lines are broken.

One solution is to use a regular expression:

/<a(.*?)>(.*?)<\/a>/g, \2
The exact expression and implementation will vary depending on the tool you use, but the idea is the same: the expression matches every <a> element, and the text between the opening and closing tags (the second capture group) replaces everything from the opening tag to the closing tag.
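
As a rough illustration of the expression described above, here is a minimal Python sketch. The names ANCHOR_RE and strip_links are made up for this example, and the pattern is an assumption about what the converted markup looks like (plain <a ...>text</a> pairs with no nested tags that matter). It keeps the visible link text and drops the tags.

```python
import re

# Matches an <a ...>...</a> pair; group 1 captures the visible link text.
# DOTALL lets the link text span line breaks, IGNORECASE covers <A HREF=...>.
ANCHOR_RE = re.compile(r"<a\b[^>]*>(.*?)</a>", re.IGNORECASE | re.DOTALL)


def strip_links(html: str) -> str:
    """Replace every anchor element with the text it wraps (hypothetical helper)."""
    return ANCHOR_RE.sub(r"\1", html)


# Sample taken from the question earlier in this thread.
sample = (
    '<a href="http://www.somedomain.com/folder/truncated_">'
    "http://www.somedomain.com/folder/truncated_</a> path/somefile.html"
)
print(strip_links(sample))
```

Note that this only removes the tags. Any space or line break the converter left between the linked part and the rest of the URL stays in the text, so rejoining the two halves would need a separate step once the real line breaks are known.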

Which tools would you use to process the HTML files and do the replacement? Java?

 

Author Comment

by:rcimasi
ID: 41816304
The solution we are using right now is to open the converted web pages in Dreamweaver, use the find and replace function to go through the hyperlinks quickly, and then remove the <a> tags.
 
LVL 59

Expert Comment

by:Julian Hansen
ID: 41816493
Understood, but I could recommend a solution in PHP only to find that you don't use PHP. Hence my question: for a scripted solution, which server-side scripting environment would you prefer to use?
 

Author Comment

by:rcimasi
ID: 41816556
We can use whatever language is needed. What do you recommend?
 
LVL 59

Accepted Solution

by:
Julian Hansen earned 2000 total points
ID: 41816869
It does not really make a difference. Anything that supports regular expressions will do.
Here is a PHP solution.
The script finds every .html file in the current folder, replaces each hyperlink with the URL from its href attribute, and writes the result to a folder named output.
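<?php
// Strip <a> tags from every .html file in the current folder, keeping the href URL.
$outDir = 'output';
if (!is_dir($outDir)) {
  mkdir($outDir, 0777, true); // file_put_contents() does not create folders
}
foreach (glob('*.html') as $file) {
  $content = file_get_contents($file);
  // [^>]* keeps each match inside a single tag; the s flag lets link text span lines.
  $fixed = preg_replace('/<a\b[^>]*?href="([^"]*)"[^>]*>.*?<\/a>/is', '$1', $content);
  file_put_contents("{$outDir}/{$file}", $fixed);
}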
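
Since any language with regular expressions would do, the same batch job could be sketched in Python roughly as follows. This is an illustration under assumptions, not a tested replacement for the PHP script: the names LINK_RE and convert_folder are invented here, the files are assumed to be UTF-8, and the pattern assumes double-quoted href attributes as in the test file.

```python
import re
from pathlib import Path

# Group 1 captures the href value; the whole <a ...>...</a> element is replaced by it.
LINK_RE = re.compile(
    r'<a\b[^>]*?href="([^"]*)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)


def convert_folder(src: Path, dst: Path) -> int:
    """Copy each .html file from src to dst with hyperlinks replaced by their URL.

    Returns the number of files written. Hypothetical helper for illustration.
    """
    dst.mkdir(parents=True, exist_ok=True)  # create the output folder if missing
    count = 0
    for path in sorted(src.glob("*.html")):  # non-recursive, like glob('*.html')
        text = path.read_text(encoding="utf-8")  # assumption: files are UTF-8
        (dst / path.name).write_text(LINK_RE.sub(r"\1", text), encoding="utf-8")
        count += 1
    return count
```

Calling convert_folder(Path("."), Path("output")) would mirror the PHP script: the originals are left untouched and the cleaned copies land in the output folder.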


To be able to determine if the script is correct I would need to see a sample of a file.
Here is the test file I used
<!doctype html>
<html>
<body>
	This is a test to see if ths <a href="http://www.somedomain.com/folder/truncated_">http://www.somedomain.com/folder/truncated_</a>path/somefile.html and some more
	text over here <a href="http://www.somedomain.com/folder/someotherfile.html">http://www.somedomain.com/folder/someotherfile.html</a> would go over here.
</body>
</html>
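
To check a pattern against this test file before running it over 500 documents, a quick Python sketch like the one below can help. The pattern is the same href-keeping idea as the PHP script, written here as an assumption about the converter's output; TEST_BODY is just the two body lines of the test file pasted in, and LINK_RE is a made-up name.

```python
import re

# Replace each <a ... href="URL" ...>text</a> with URL (group 1).
LINK_RE = re.compile(
    r'<a\b[^>]*?href="([^"]*)"[^>]*>.*?</a>',
    re.IGNORECASE | re.DOTALL,
)

# The two body lines of the test file above.
TEST_BODY = (
    '\tThis is a test to see if ths <a href="http://www.somedomain.com/folder/truncated_">'
    "http://www.somedomain.com/folder/truncated_</a>path/somefile.html and some more\n"
    '\ttext over here <a href="http://www.somedomain.com/folder/someotherfile.html">'
    "http://www.somedomain.com/folder/someotherfile.html</a> would go over here."
)

result = LINK_RE.sub(r"\1", TEST_BODY)
print(result)
```

In this test file the truncated link is followed directly by the rest of the path, so removing the tags rejoins the URL. Real converted pages may break the line differently, which is why a sample of an actual file matters.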

 

Author Closing Comment

by:rcimasi
ID: 41816978
Thanks for the help