phpdotnet
asked on
A proxy regex?
https://www.experts-exchange.com/questions/22050007/stipping-out-proxy-list-from-web-using-php.html
I have read this topic about how to use a regex to strip proxies out of a page. Here is the final code by TeRReF:
<?php
$s = '80.249.72.180:80 elite proxy Algeria (Algiers)
80.249.76.82:80 anonymous Algeria
190.49.168.251:6588 elite proxy Argentina (Buenos Aires)
83.160.170.10 8080 transparent Netherlands 2006-10-25 Whois
203.106.52.102 3128 transparent Malaysia 2006-10-25 Whois
213.52.140.53 80 anonymous Great Britain (UK) 2006-10-25 Whois
128.134.137.23:8080 3 Kb/s
138.89.253.5:33322 26 Kb/s
192.38.109.143:3128 23 Kb/s
165.228.131.10 3128 transparent Australia 2006-10-25 Whois
212.174.34.186 8080 anonymous Turkey 2006-10-25 Whois
1 202.147.181.2 8080 transparent Pakistan 2006-11-05 WHOIS
2 62.101.80.187 8080 high anonymity Italy 2006-11-05 WHOIS
3 84.20.143.8 8080 transparent Finland 2006-11-05 WHOIS
4 203.115.1.135 80 transparent Sri Lanka 2006-11-05 WHOIS
yoho.uwaterloo.ca:8000 transparent Pakistan 2006-11-05 WHOIS
kleinbonum.ethz.ch:8000 elite proxy Algeria (Algiers)';
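// Capture a host (IP address or hostname), then a port separated by ":" or whitespace.
preg_match_all('/([\w\d]+\.[\w\d\.]+)[:\s]+(\d{1,5})/i', $s, $matches);
$lines = array();
$count = count($matches[1]);
for ($i = 0; $i < $count; $i++) {
    // Normalise every hit to host:port.
    $lines[] = $matches[1][$i] . ':' . $matches[2][$i];
}
print_r($lines);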
?>
It works well with the strings inside $s, but if there is something like abcdef255.255.255.255:80, the code goes wrong: the leading abcdef ends up in the captured host. Can anyone provide an update for this, please? All help will be appreciated. Thanks.
ASKER
Thank you very much for your help. That's all good.
As the example abcdef255.255.255.255:80 is ambiguous (the regex cannot tell whether abcdef belongs to a hostname or is junk in front of an IP address), it will be hard to find a regex which also covers your other examples.
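One way around the ambiguity is to pick a rule up front. The sketch below is not the accepted solution from this thread, just an illustration of one possible rule: a dotted quad directly in front of the port always wins, and a hostname only counts if its last label is purely alphabetic (like .ca or .ch). The function name extractProxies is hypothetical, and the pattern does not check that each octet is in the 0-255 range.

```php
<?php
// Hypothetical helper (not from the thread): returns "host:port" strings.
// Assumed rule: prefer an IPv4 address; accept a hostname only when its
// last label is alphabetic, so "abcdef255.255.255.255" is not a hostname.
function extractProxies($text)
{
    $pattern = '/('
        . '(?:\d{1,3}\.){3}\d{1,3}'        // dotted quad, octet range not validated
        . '|'
        . '\b(?:[a-z0-9-]+\.)+[a-z]{2,}'   // hostname ending in an alphabetic label
        . ')[:\s]+(\d{1,5})\b/i';          // ":" or whitespace, then a 1-5 digit port

    preg_match_all($pattern, $text, $matches, PREG_SET_ORDER);

    $lines = array();
    foreach ($matches as $m) {
        // $m[1] is the host, $m[2] is the port.
        $lines[] = $m[1] . ':' . $m[2];
    }
    return $lines;
}

$sample = "abcdef255.255.255.255:80\n"
        . "yoho.uwaterloo.ca:8000 transparent Pakistan\n"
        . "1 202.147.181.2 8080 transparent";
print_r(extractProxies($sample));
```

Under that rule the sample yields 255.255.255.255:80, yoho.uwaterloo.ca:8000 and 202.147.181.2:8080. If abcdef255.255.255.255 should instead be treated as a hostname, the rule (and the second alternative in the pattern) would have to change, which is exactly the ambiguity mentioned above.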