Fernanditos
asked on
Removing lines containing numbers
Hi
I have the attached code, which removes every line that does not contain a .com or .net domain name. It also removes everything from the first "," onward on each remaining line of the domains.txt file:
Example of domains.txt content:
amaze.com,10/20/2010 12:00:00 AM,AUC
ample.asia,10/20/2010 12:00:00 AM,AUC
am12ements.net,10/20/2010 12:00:00 AM,AUC
ant-arctic.com,10/20/2010 12:00:00 AM,AUC
antibiotic.net,10/20/2010 12:00:00 AM,AUC
antitrust.com,10/20/2010 12:00:00 AM,AUC
anyone.de,10/20/2010 12:00:00 AM,AUC
anyoneanyoneanyoneanyone.com,10/20/2010 12:00:00 AM,AUC
The attached code returns a cleaned list: (only .com and .net)
amaze.com
am12ements.net
ant-arctic.com
antibiotic.net
antitrust.com
anyoneanyoneanyoneanyone.com
I need to modify the code so that it also removes domains meeting any of these 3 criteria:
containing numbers
containing "-" character
domain name longer than 10 characters.
How can I add this to my existing code?
Thank you!
<?php // RAY_temp_fernanditos.php
error_reporting(E_ALL);
echo "<pre>";

// TEST DATA FROM THE POST AT EE
$str = file_get_contents('domains.txt');

// THE NEEDLES TO SEARCH FOR
$needles = array
( '.com,'
, '.net,'
)
;

// MAKE AN ARRAY FROM THE TEST DATA STRING
$arr = explode(PHP_EOL, $str);

// ITERATE OVER EACH LINE
foreach ($arr as $key => $val)
{
    // MAN PAGE http://us.php.net/manual/en/function.strpos.php
    if ( (strpos($val, $needles[0]) === FALSE) && (strpos($val, $needles[1]) === FALSE) )
    {
        unset($arr[$key]);
    }
    else
    {
        // FIND THE COMMA AT THE END OF THE TLD
        $poz = strpos($val, ',');
        $arr[$key] = substr($val, 0, $poz);
    }
}
$new = implode(PHP_EOL, $arr);
echo $new;
ASKER
Thank you! I am still getting domains with numbers:
4bidpay.com
470algerdr.com
4770794.com
435436.com
466466.com
Please check.
ASKER
Oh, I fixed it by adding a "+": preg_match("/[0-9]+/",...
It works like a charm.
Can you please tell me how to also remove lines NOT CONTAINING "blog"?
Thank you.
Oh, good catch. I see I fumble-fingered the double ']]' in my code :-)
To remove lines that do not contain "blog" just add:
if (! preg_match("/blog/", $tmpval)) {unset($arr[$key]); continue; }
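For a plain word like "blog", a literal substring test does the same job without a regular expression. The following is only a sketch of that idea, not the code from the thread: the helper name keep_containing() is made up for illustration, and it uses stripos(), so unlike the preg_match("/blog/", ...) line above it also matches "Blog" or "BLOG".

```php
<?php
// Hypothetical helper (not from the thread): keep only the entries
// that contain $word somewhere in the string.
function keep_containing(array $domains, string $word): array
{
    // stripos() is a case-insensitive literal substring search.
    // It returns FALSE when nothing is found, so compare with !== false
    // (a match at position 0 must still count as a match).
    $kept = array_filter($domains, function ($domain) use ($word) {
        return stripos($domain, $word) !== false;
    });

    // array_filter() preserves the original keys; reindex from 0.
    return array_values($kept);
}

// Example with made-up domain names
print_r(keep_containing(array('myblog.com', 'amaze.com', 'BlogSpot.net'), 'blog'));
```

If the match has to be case-sensitive, as in the preg_match() version, strpos() can be swapped in for stripos().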
SOLUTION
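Change the line:
$arr[$key] = substr($val, 0, $poz);
To:
// KEEP ONLY THE DOMAIN (EVERYTHING BEFORE THE FIRST COMMA)
$tmpval = substr($val, 0, $poz);
// REMOVE DOMAINS CONTAINING DIGITS (ORIGINALLY POSTED AS "/[0-9]]/"; THE EXTRA "]" WAS A TYPO)
if (preg_match("/[0-9]/", $tmpval)) { unset($arr[$key]); continue; }
// REMOVE DOMAINS CONTAINING "-" (COMPARE WITH !== FALSE SO A MATCH AT POSITION 0 COUNTS)
if (strpos($tmpval, "-") !== FALSE) { unset($arr[$key]); continue; }
// REMOVE DOMAINS WHOSE NAME (THE PART BEFORE THE FIRST ".") IS LONGER THAN 10 CHARACTERS
$dompieces = explode(".", $tmpval);
if (strlen($dompieces[0]) > 10) { unset($arr[$key]); continue; }
$arr[$key] = $tmpval;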
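Putting the pieces of this thread together, the whole filter can be sketched as one function. This is an illustration under assumptions rather than the accepted solution: the name filter_domains() and the $maxLen parameter are made up, and the .com/.net test is done with a regular expression on the end of the domain instead of the strpos() needles in the original script.

```php
<?php
// Hypothetical helper (not from the thread): return only the .com/.net
// domains that contain no digits, no "-", and whose name part (the text
// before the first ".") is at most $maxLen characters long.
function filter_domains(array $lines, int $maxLen = 10): array
{
    $kept = array();
    foreach ($lines as $line) {
        // Keep only the text before the first comma
        $domain = trim(explode(',', $line, 2)[0]);

        // Only .com and .net domains are wanted
        if (!preg_match('/\.(com|net)$/', $domain)) { continue; }

        // Skip domains containing any digit
        if (preg_match('/[0-9]/', $domain)) { continue; }

        // Skip domains containing a hyphen
        if (strpos($domain, '-') !== false) { continue; }

        // Skip domains whose name part is longer than $maxLen characters
        $name = explode('.', $domain, 2)[0];
        if (strlen($name) > $maxLen) { continue; }

        $kept[] = $domain;
    }
    return $kept;
}

// Sample lines taken from the question
$lines = array(
    'amaze.com,10/20/2010 12:00:00 AM,AUC',
    'ample.asia,10/20/2010 12:00:00 AM,AUC',
    'am12ements.net,10/20/2010 12:00:00 AM,AUC',
    'ant-arctic.com,10/20/2010 12:00:00 AM,AUC',
    'antibiotic.net,10/20/2010 12:00:00 AM,AUC',
    'antitrust.com,10/20/2010 12:00:00 AM,AUC',
    'anyone.de,10/20/2010 12:00:00 AM,AUC',
);
echo implode(PHP_EOL, filter_domains($lines)), PHP_EOL;
```

With the sample lines from the question, this keeps amaze.com, antibiotic.net and antitrust.com. Note that "antibiotic" is exactly 10 characters, so it passes the "longer than 10" test.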