Fernanditos
asked on
Modifying multiple lines in text file using PHP
Hi,
I have a .txt file listing thousands of domain names, in the same format as the attached lines.
I need to be able to pick out only the .net and .com domains, or, put the other way round, delete every line that does not contain ".com" or ".net".
So I need a way to get rid of the non-.com and non-.net domains, and ALSO to remove every character after the first ","... The final list should look like this (one domain name per line):
apartemen.com
luvme.com
sandwichbar.net
studio13.com
sunfar.net
Can an expert here please help me find a solution? I think PHP can do this.
Thank you.
apartemen.com,10/18/2010 12:00:00 AM,AUC
beehiveseller.asia,10/18/2010 12:00:00 AM,AUC
berlin.asia,10/18/2010 12:00:00 AM,AUC
besplatno.org,10/18/2010 12:00:00 AM,AUC
dekio.asia,10/18/2010 12:00:00 AM,AUC
edoctor.asia,10/18/2010 12:00:00 AM,AUC
enterbada.asia,10/18/2010 12:00:00 AM,AUC
global-gong.asia,10/18/2010 12:00:00 AM,AUC
globalgong.asia,10/18/2010 12:00:00 AM,AUC
gratuit.asia,10/18/2010 12:00:00 AM,AUC
karafarini.asia,10/18/2010 12:00:00 AM,AUC
lists.asia,10/18/2010 12:00:00 AM,AUC
luvme.com,10/18/2010 12:00:00 AM,AUC
numama.asia,10/18/2010 12:00:00 AM,AUC
sandwichbar.net,10/18/2010 12:00:00 AM,AUC
sandwichbars.asia,10/18/2010 12:00:00 AM,AUC
studio13.com,10/18/2010 12:00:00 AM,AUC
sunfar.net,10/18/2010 12:00:00 AM,AUC
Suggest you get the free program Notepad++ and do this in the text editor.
ASKER
Thank you jmatix, very interesting. I do not actually have Perl on my server, but I will keep it in case I do not find a solution with PHP. Thank you!
ASKER
Ray, I have EmEditor, which is great, but I still do not really know how to do this with the editor. Do you know how?
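For the editor route, the same job can be expressed as two regular-expression replacements: first delete every line whose first field (the text before the first comma) does not end in .com or .net, then delete everything from the first comma to the end of each remaining line. The sketch below runs both patterns through PHP's `preg_replace` so they can be checked; `editorStyleFilter` is a hypothetical name. EmEditor and Notepad++ both offer regex search and replace, but their regex flavors are not necessarily identical to PHP's PCRE, so the lookbehind and the line-ending handling may need adjusting before the patterns work in either editor.

```php
<?php
// Hypothetical helper that applies the two "editor style" replacements.
function editorStyleFilter(string $text): string
{
    // Step 1: delete whole lines (including the line break) whose first field
    // does NOT end in .com or .net. The negative lookbehind is tested at the
    // first comma, because [^,\r\n]* cannot run past a comma.
    $text = preg_replace('/^[^,\r\n]*(?<!\.com|\.net),.*(?:\r?\n|$)/m', '', $text);

    // Step 2: on the lines that remain, drop everything from the first comma
    // to the end of the line, leaving just the domain name.
    return preg_replace('/,.*$/m', '', $text);
}

$sample = "apartemen.com,10/18/2010 12:00:00 AM,AUC\n"
        . "berlin.asia,10/18/2010 12:00:00 AM,AUC\n"
        . "sandwichbar.net,10/18/2010 12:00:00 AM,AUC\n";

echo editorStyleFilter($sample);
```

In an editor the order matters in the same way: run the line-deleting replacement first, while the comma is still there to anchor the lookbehind.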
<?php // RAY_temp_fernanditos.php
error_reporting(E_ALL);
echo "<pre>";
// TEST DATA FROM THE POST AT EE
$str = <<<EOSTR
apartemen.com,10/18/2010 12:00:00 AM,AUC
beehiveseller.asia,10/18/2010 12:00:00 AM,AUC
berlin.asia,10/18/2010 12:00:00 AM,AUC
besplatno.org,10/18/2010 12:00:00 AM,AUC
dekio.asia,10/18/2010 12:00:00 AM,AUC
edoctor.asia,10/18/2010 12:00:00 AM,AUC
enterbada.asia,10/18/2010 12:00:00 AM,AUC
global-gong.asia,10/18/2010 12:00:00 AM,AUC
globalgong.asia,10/18/2010 12:00:00 AM,AUC
gratuit.asia,10/18/2010 12:00:00 AM,AUC
karafarini.asia,10/18/2010 12:00:00 AM,AUC
lists.asia,10/18/2010 12:00:00 AM,AUC
luvme.com,10/18/2010 12:00:00 AM,AUC
numama.asia,10/18/2010 12:00:00 AM,AUC
sandwichbar.net,10/18/2010 12:00:00 AM,AUC
sandwichbars.asia,10/18/2010 12:00:00 AM,AUC
studio13.com,10/18/2010 12:00:00 AM,AUC
sunfar.net,10/18/2010 12:00:00 AM,AUC
EOSTR;
// THE NEEDLES TO SEARCH FOR
$needles = array
( '.com,'
, '.net,'
)
;
// MAKE AN ARRAY FROM THE TEST DATA STRING
// SPLIT ON ANY LINE ENDING: PHP_EOL IS "\r\n" ON WINDOWS AND MAY NOT MATCH THE DATA
$arr = preg_split('/\R/', $str);
// ITERATE OVER EACH LINE
foreach ($arr as $key => $val)
{
// MAN PAGE http://us.php.net/manual/en/function.strpos.php
if ( (strpos($val, $needles[0]) === FALSE) && (strpos($val, $needles[1]) === FALSE) ) unset($arr[$key]);
}
$new = implode(PHP_EOL, $arr);
echo $new;
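A minimal variation on the same loop that also trims each kept line down to the domain name, assuming the domain is always the first comma-separated field. Instead of searching the whole line for `.com,` or `.net,`, it checks only the extension of that first field; `comOrNetDomain` is a hypothetical helper name, and unlike the `strpos` test it also accepts upper-case `.COM` or `.NET`.

```php
<?php
// Hypothetical helper: return the domain (text before the first comma)
// if its last extension is .com or .net, otherwise null.
function comOrNetDomain(string $line): ?string
{
    $domain = trim(explode(',', $line, 2)[0]);
    // strrchr() returns the part from the last "." onward, or false if none
    $tld = strtolower((string) strrchr($domain, '.'));
    return ($tld === '.com' || $tld === '.net') ? $domain : null;
}

$str = "apartemen.com,10/18/2010 12:00:00 AM,AUC\n"
     . "berlin.asia,10/18/2010 12:00:00 AM,AUC\n"
     . "luvme.com,10/18/2010 12:00:00 AM,AUC";

$domains = [];
// preg_split on \R copes with both "\n" and "\r\n" line endings
foreach (preg_split('/\R/', $str) as $val) {
    $domain = comOrNetDomain($val);
    if ($domain !== null) {
        $domains[] = $domain;
    }
}
echo implode(PHP_EOL, $domains);
```

Checking the first field rather than the whole line means a stray ".com," later in a line cannot cause a false match.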
ASKER
Thank you Ray!
The script output:
apartemen.com,10/18/2010 12:00:00 AM,AUC
luvme.com,10/18/2010 12:00:00 AM,AUC
sandwichbar.net,10/18/2010 12:00:00 AM,AUC
studio13.com,10/18/2010 12:00:00 AM,AUC
sunfar.net,10/18/2010 12:00:00 AM,AUC
Is there any way to get only the domain names, like this?
apartemen.com
luvme.com
sandwichbar.net
studio13.com
sunfar.net
Is there also a way to read the data from an external .txt file?
Thank you!
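One way to read from an external file, assuming it has the same comma-separated layout as the sample: open it with `fopen` and process it one line at a time with `fgets`, so even a large list is never loaded into memory all at once. `filterDomainFile` is a hypothetical name, and the output file is a separate file chosen by the caller.

```php
<?php
// Hypothetical helper: stream $inPath line by line and write every .com/.net
// domain (text before the first comma) to $outPath, one per line.
// Returns the number of domains written.
function filterDomainFile(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    $out = fopen($outPath, 'w');
    if ($in === false || $out === false) {
        throw new RuntimeException("Cannot open $inPath or $outPath");
    }

    $count = 0;
    while (($line = fgets($in)) !== false) {
        // Keep only the first field; trim() also removes "\r\n" or "\n"
        $domain = trim(explode(',', $line, 2)[0]);
        if (preg_match('/\.(?:com|net)$/i', $domain)) {
            fwrite($out, $domain . PHP_EOL);
            $count++;
        }
    }

    fclose($in);
    fclose($out);
    return $count;
}
```

For example, `filterDomainFile('domains.txt', 'domains_com_net.txt');` would write the filtered list to the second (hypothetical) file name and return how many domains were kept.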
Sure! Please post a link to the external text file.
ASKER CERTIFIED SOLUTION
ASKER
This is a 15 MB text file of domains. I am not sure why you need the real text file, but here it is, compressed: http://musichat.net/domains.rar
Thank you for your great help.
ASKER
You can test with this smaller one: http://musichat.net/domains.txt
ASKER
Sorry, I did not see your last post. I tried your solution and it works like a charm! Thank you!
perl -i.bak -ne 's/,.+//;print if /\.(net|com)$/' domains.txt
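How the Perl one-liner works: `-n` wraps the code in a loop over every input line without printing automatically, `-e` supplies the code, and `-i.bak` edits domains.txt in place while keeping the original as domains.txt.bak. `s/,.+//` deletes everything from the first comma onward, and `print if /\.(net|com)$/` writes out only the lines that now end in .net or .com. For a server without Perl, a rough PHP counterpart might look like the sketch below; `filterInPlace` is a hypothetical name, and unlike the one-liner it reads the whole file into memory and always writes "\n" line endings.

```php
<?php
// Rough, hypothetical PHP counterpart of:
//   perl -i.bak -ne 's/,.+//;print if /\.(net|com)$/' domains.txt
function filterInPlace(string $path): void
{
    // Like -i.bak: keep the untouched original next to the file
    if (!copy($path, $path . '.bak')) {
        throw new RuntimeException("Could not create backup of $path");
    }
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Could not read $path");
    }

    $kept = [];
    foreach ($lines as $line) {
        $line = preg_replace('/,.+/', '', $line);   // same as s/,.+//
        if (preg_match('/\.(net|com)$/', $line)) {  // same as print if /\.(net|com)$/
            $kept[] = $line . "\n";
        }
    }
    file_put_contents($path, implode('', $kept));
}
```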