Bobby

replace a string with a string from a different file

I have a txt file containing many tags like [urlid=12345] (the 12345 can be any sequence of digits, and not only 5 digits; it could be 3, 6, etc.). I need to replace all of those with the actual URLs they reference, which I have in an Excel file with two columns: urlid and url. The urlid column matches the number in [urlid=12345] in the txt file, and the url column contains the URL that should replace every [urlid=12345] in the txt file.
David Favor

Use curl to resolve the links to URLs.

A simple Perl or Bash + sed script can then replace the URLs.

Likely any of your tech staff can do this in a short amount of time.

ASKER

We don't know any of that, at least those of us on staff now and not on vacation. I was hoping for something more like this: via regex, we find all the instances of [urlid=12345] (or whatever the number sequence is) in the txt file and copy them, and then I can dump them into the Excel file. Even then, though, it's not one-to-one: there are far more records (urlids) in the Excel file than there are in the txt file.
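The extraction step described above could look roughly like this in PHP. This is only a sketch: the sample text is made up for illustration, and in practice `$text` would be read from the real txt file with file_get_contents().

```php
<?php
// Hypothetical sample text; replace with file_get_contents() on the real txt file.
$text = "See [urlid=3150]Occupational Therapy Month[/urlid] and "
      . "[urlid=81328]stress[/urlid], plus [urlid=3150]again[/urlid].";

// \d+ matches any run of digits, so ids of 3, 5, 6 or more digits are all caught.
preg_match_all('/\[urlid=(\d+)\]/', $text, $matches);

// $matches[1] holds only the captured digits; drop duplicates and reindex.
$ids = array_values(array_unique($matches[1]));

// One id per line, ready to paste into a spreadsheet column.
echo implode("\n", $ids), "\n";
```

Because duplicates are dropped, the list can be compared against the urlid column even though the Excel file has many more rows than the txt file has tags.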
This is a very simple task, though it will likely take some time to code.

If you have no one on staff to do this, just hire someone.

You're also making this way too hard. If you have a link, like https://foo.com/12345, which redirects to another URL, just use curl to resolve the redirect. This way you also know for sure that you have the correct redirect.

You can also go the Excel data route. If you do, just load all your data into hashes in a simple script (Perl, for example), then map each [urlid=12345] to whatever the real URL is in your Excel data.
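As a rough illustration of the hash approach, here is a sketch in PHP rather than Perl. The table rows, the example.com URLs, and the sample text are all assumptions made up for the example. It assumes the Excel columns have been pasted into tab-separated text and that the tags still have the [urlid=N]...[/urlid] form.

```php
<?php
// Hypothetical tab-separated rows, as produced by copying two Excel columns into a text file.
$table = "3150\thttp://example.com/ot-month\n"
       . "81328\thttp://example.com/stress\n"
       . "99999\thttp://example.com/unused\n";
$text  = "It's [urlid=3150]Occupational Therapy Month[/urlid]; "
       . "[urlid=81328]stress[/urlid]; [urlid=42]unknown[/urlid].";

// Build the id => url lookup table (a "hash" in Perl terms).
$urls = [];
foreach (preg_split('/\R/', $table, -1, PREG_SPLIT_NO_EMPTY) as $row) {
    $parts = explode("\t", trim($row), 2);
    if (count($parts) === 2) {
        $urls[$parts[0]] = $parts[1];
    }
}

// Walk the text once; each tag is looked up by its id.
$result = preg_replace_callback(
    '#\[urlid=(\d+)\](.*?)\[/urlid\]#s',
    function (array $m) use ($urls) {
        if (!isset($urls[$m[1]])) {
            return $m[0]; // No matching row: leave the tag untouched.
        }
        return '<a href="' . htmlspecialchars($urls[$m[1]]) . '">' . $m[2] . '</a>';
    },
    $text
);

echo $result, "\n";
```

Extra rows in the table (like 99999 here) are simply never looked up, and tags with no matching row stay in the text so they can be found and fixed later.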

As I said, it's simple to code but will require a bit of time.

Something this simple, just hire someone off Fiverr.
Can you share the files? This is a pretty quick task I think

ASKER

We don't have a link like https://whatever.com/12345 in the txt file; we only have [urlid=12345]. There are no redirects. The txt file is referring to a URL id in a table (an Excel file copy in this case). In that table, there is the URL ID and the corresponding URL. I'm trying to find the actual URL by tying together the two data sources.

ASKER

Yes, I'll share them in 2 minutes...

ASKER

Sample of the txt file...

It's <a href="http://www.stresscure.com/hrn/april.html">National Stress Awareness Month</a>.Yes, it's also [urlid=3150]Occupational Therapy Month[/urlid] too, but every month is about 5 different National Months, so bear with us on this. While we won't do a weekly feature on stress, because there's already enough stress in the world, we did want to share with you some helpful reminders about stress and relieving it. </p>\
<br>\
<p>\
Stress is both a biological and psychological term. It's been a popular topic of discussion in healthcare since the 1930's, but the term is thrown around in conversation without much real understanding. It has become a topic of concern for most American and European societies, and yet was scarcely talked about less than 100 years ago. Some recent researchers have called into question the very existence of the popular notion of stress, claiming it is too wide a term for a variety of distinct problems. But for those who experience stress, it's a very real force.\
</p> \
<p>\
In the 1970's a popular idea among scientists dealt with eustress and distress. Eustress, it was theorized was the positive stress that comes from a demanding physical or mental activity; distress was theorized as the negative kind that comes from a similar activity, but proves damaging to the body. In the early 21st century, research showed that any stress response in the human body creates hormones like adrenaline which damage the body's tissues, slightly in small amounts and in large doses can cause serious long term damage.\
</p>\
<p>\
What we call [urlid=81328]stress[/urlid], whether from our jobs, family, friends or communities, ultimately is an inescapable part of life. A utopia free from human worry has not yet been created, but when it is created, I hope I get the email. In the meantime, we are forced to cope. My great-great-great uncle, once removed, Sigmund Freud had some wild ideas about all this. He called it The Pleasure Principle, and even he wasn't quite sure what it was all about, only that humans have a tendency to try to find ways to get happiness in life, even when happiness is nowhere to be found.  \
</p>\



ASKER

sample of the Excel file...
sample.xlsx
If you can't provide the real data, I'd need to provide you a PHP script. Can you run PHP?

Also, you would need to copy and paste the data from Excel into a txt file so that it's easier to work with in PHP (otherwise there's substantial work involved for pulling data out of the xlsx file).

ASKER

Yes to both questions. Thank you.
Did you want the output data in html format with <a href...   ...</a> ?

ASKER

Yes, please.
Here's the PHP code for you. It should be reasonably self-explanatory, so you can configure the file names of your input data as needed.

Note that each time you run it, the output file will be overwritten.

<?php
// Input file of URLs, copied and pasted from 2 columns of data in Excel.
// The copy and paste process automatically adds a tab character between the id and the URL.
$data_url = file_get_contents("data_url.txt");
$data_in = file_get_contents("data_in.txt"); // Input data

$data_output_filename = "data_out.txt";
$data_out = $data_in;

$url_lines = explode("\n", $data_url);

foreach ($url_lines as $line) {
    $line = trim($line); // Drop a stray "\r" left by Windows line endings
    if (strpos($line, "\t") === false) {
        continue; // Skip blank or malformed lines
    }
    list($id, $url) = explode("\t", $line, 2);
    // Replace [urlid=ID]text[/urlid] with a link to the matching URL
    $data_out = preg_replace("#\[urlid=" . preg_quote($id, "#") . "\](.*?)\[/urlid\]#", "<a href='$url'>$1</a>", $data_out);
}

file_put_contents($data_output_filename, $data_out);



ASKER

I named all the files as you have them there, created data_out.txt in the same directory, and gave all 3 text files and the PHP file 777 perms. Running it just puts the contents of data_in.txt into data_out.txt; it doesn't remove or alter the [urlid=12345] tags. I also added a ?> to the end of your script, with no difference.
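One way to check whether the pattern is matching anything at all is to ask preg_replace() how many replacements it made. The sketch below is illustrative only: the sample string and the example.com URL are made up, and the closing tag in the sample is deliberately `</a>` instead of `[/urlid]`, which is one kind of mismatch that would leave the text unchanged.

```php
<?php
// Hypothetical inputs: the closing tag here is "</a>", so the [/urlid] pattern cannot match.
$data_out = "also [urlid=3150]Occupational Therapy Month</a> too";
$id  = "3150";
$url = "http://example.com/ot-month";

// The fifth argument of preg_replace() receives the number of replacements made.
$data_out = preg_replace(
    "#\[urlid=" . preg_quote($id, "#") . "\](.*?)\[/urlid\]#",
    "<a href='$url'>$1</a>",
    $data_out,
    -1,
    $count
);

// Any tags still present after the run point to a pattern or data mismatch.
$remaining = preg_match_all('/\[urlid=\d+\]/', $data_out);

echo "replacements: $count, tags left: $remaining\n";
```

A count of zero together with leftover tags suggests the input no longer has the shape the pattern expects, rather than a file permission problem.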

ASKER

Oh crap, maybe I see what it is... you have [/urlid\] in there, but I've already replaced all those with </a>. I will alter and try again.

ASKER

Ugh... I'll have to alter your PHP because it's too late to undo what I did. Do I do this?...

$data_out = preg_replace("#\[urlid=$id\](.*?)\#", "<a href='$url'>$1", $data_out);

ASKER

and is this supposed to say data_url.txt at the end?

$url_lines = explode("\n", $data_url);
I tested it and it was working ok with the given sample data. If you update your sample, I can alter the code to match.
Your input data should be in data_in.txt

The file data_out.txt is the result file that gets created by the script

ASKER

It's <a href="http://www.stresscure.com/hrn/april.html">National Stress Awareness Month</a>.Yes, it's also [urlid=3150]Occupational Therapy Month</a>too, but every month is about 5 different National Months, so bear with us on this. While we won't do a weekly feature on stress, because there's already enough stress in the world, we did want to share with you some helpful reminders about stress and relieving it. </p>\
<br>\
<p>\
Stress is both a biological and psychological term. It's been a popular topic of discussion in healthcare since the 1930's, but the term is thrown around in conversation without much real understanding. It has become a topic of concern for most American and European societies, and yet was scarcely talked about less than 100 years ago. Some recent researchers have called into question the very existence of the popular notion of stress, claiming it is too wide a term for a variety of distinct problems. But for those who experience stress, it's a very real force.\
</p> \
<p>\
In the 1970's a popular idea among scientists dealt with eustress and distress. Eustress, it was theorized was the positive stress that comes from a demanding physical or mental activity; distress was theorized as the negative kind that comes from a similar activity, but proves damaging to the body. In the early 21st century, research showed that any stress response in the human body creates hormones like adrenaline which damage the body's tissues, slightly in small amounts and in large doses can cause serious long term damage.\
</p>\
<p>\
What we call [urlid=81328]stress</a>, whether from our jobs, family, friends or communities, ultimately is an inescapable part of life. A utopia free from human worry has not yet been created, but when it is created, I hope I get the email. In the meantime, we are forced to cope. My great-great-great uncle, once removed, Sigmund Freud had some wild ideas about all this. He called it The Pleasure Principle, and even he wasn't quite sure what it was all about, only that humans have a tendency to try to find ways to get happiness in life, even when happiness is nowhere to be found.  \
</p>\


All the code I provided was working, so would only require changing if you had data different to your sample.
Ok, I'm working on a different copy of the code now... will be done in about 2 mins
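For the updated sample, where the closing tags are already `</a>`, one possible adaptation is to rewrite only the opening [urlid=N] tag. This is a sketch under assumptions, not the accepted solution from this thread: the lookup table and example.com URLs are hypothetical, and in practice the table would be built from data_url.txt.

```php
<?php
// Hypothetical lookup table; in practice it would be built from data_url.txt.
$urls = [
    "3150"  => "http://example.com/ot-month",
    "81328" => "http://example.com/stress",
];
$data_in = "it's also [urlid=3150]Occupational Therapy Month</a>too. "
         . "What we call [urlid=81328]stress</a>, whether";

// The closing tags are already </a>, so only the opening [urlid=N] tag needs rewriting.
$data_out = preg_replace_callback(
    '/\[urlid=(\d+)\]/',
    function (array $m) use ($urls) {
        // Leave the tag alone when the id has no row in the table.
        return isset($urls[$m[1]]) ? "<a href='" . $urls[$m[1]] . "'>" : $m[0];
    },
    $data_in
);

echo $data_out, "\n";
```

Note that any id missing from the table keeps its [urlid=N] tag while its `</a>` stays behind, so leftover tags are worth searching for after the run.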
ASKER CERTIFIED SOLUTION
Terry Woods

ASKER

gotta go for tonight but will check first thing tomorrow. Thanks.

ASKER

Thanks very much.