wsyy
asked on
How to use regex to get things out of hostname
Hi,
I would like to use ONE regular expression to get abc.com or abc.com.cn out of all the following host names:
1) www.abc.com
2) www.abc.com.cn
3) www.xyz.abc.com
4) www.xyz.abc.com.cn
5) xyz.abc.com
6) xyz.abc.com.cn
Thanks!
Something like this works:
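import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] t = {"www.abc.com",
        "www.abc.com.cn",
        "www.xyz.abc.com",
        "www.xyz.abc.com.cn",
        "xyz.abc.com",
        "xyz.abc.com.cn",
};
// First part: one host-name label that neither starts nor ends with a hyphen.
// Second part: ".com.cn" or ".com", anchored at the end of the string.
Pattern p = Pattern.compile("((?:[a-z0-9][-a-z0-9]*[a-z0-9]|[a-z0-9])"
        + "(?:\\.com\\.cn|\\.com)$)");
for (String s : t) {
    Matcher m = p.matcher(s);
    if (m.find()) {
        System.out.println("Found " + m.group(1) + " in " + s);
    } else {
        System.out.println("Not found in " + s);
    }
}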
It looks a bit ugly because I couldn't find a more elegant way to enforce the "host name can't start or end with a hyphen" rule. If you need to match more domains, not just ".com.cn" and ".com", then the second part should contain more alternatives, but the idea stays the same.
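To illustrate the "more alternatives" idea, here is a minimal sketch that builds the second part of the pattern from a list of suffixes. The class name, the `extract` helper, and the suffix list are my own assumptions for illustration, not something from the question.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class SuffixDomainMatcher {
    // Hypothetical suffix list for illustration; extend it as needed.
    static final List<String> SUFFIXES = Arrays.asList(".com.cn", ".co.uk", ".com", ".net");

    // One host-name label: starts and ends with a letter or digit.
    static final String LABEL = "(?:[a-z0-9][-a-z0-9]*[a-z0-9]|[a-z0-9])";

    // Label followed by any one of the quoted suffixes, anchored at the end.
    static final Pattern PATTERN = Pattern.compile(
            "(" + LABEL + "(?:"
            + SUFFIXES.stream().map(Pattern::quote).collect(Collectors.joining("|"))
            + ")$)");

    // Returns the matched domain, or null when no listed suffix applies.
    static String extract(String host) {
        Matcher m = PATTERN.matcher(host);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] hosts = {"www.abc.com", "www.xyz.abc.com.cn", "shop.abc.co.uk"};
        for (String host : hosts) {
            System.out.println(host + " -> " + extract(host));
        }
    }
}
```

Like the original pattern, this only accepts lower-case host names, so you may want to lower-case the input first.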
Try this code
public class TestSubstring {
    public static void main(String[] args) {
        String[] string = {"www.abc.com", "www.abc.com.cn", "www.xyz.abc.com", "www.xyz.abc.com.cn", "xyz.abc.com", "xyz.abc.com.cn"};
        for (String stg : string) {
            // Print everything from "abc.com" onward, i.e. abc.com or abc.com.cn.
            System.out.println(stg.substring(stg.indexOf("abc.com")));
        }
    }
}
String[] hosts = {
    "www.abc.com",
    "www.abc.com.cn",
    "www.xyz.abc.com",
    "www.xyz.abc.com.cn",
    "xyz.abc.com",
    "xyz.abc.com.cn"
};
for (String sh : hosts) {
    sh = sh.replaceAll(".*\\.(.+?\\.com)", "$1");
    System.out.println("result: " + sh);
}
Output:
result: abc.com
result: abc.com.cn
result: abc.com
result: abc.com.cn
result: abc.com
result: abc.com.cn
@for_yan, this doesn't work for something like "com.cn.com.cn".
Why? It returns:
result: cn.com.cn
And "com.cn.com"
returns
cn.com
That is what is expected, as I understand it.
And certainly, for any regex, you can invent
some string that will break it.
Sorry, I was wrong, it actually works.
No problem.
Nothing is ideal, though; I'm sure there is some string that will break it.
Still, it helps in the great majority of cases.
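For what it's worth, here is a small sketch of two inputs where the replaceAll approach from this thread does not reduce the host name to a bare domain. Both host names are hypothetical examples of mine, not taken from the question, and the `strip` helper is just a wrapper for illustration.

```java
class ReplaceAllEdgeCases {
    // The replaceAll call from the answer above, wrapped in a method.
    static String strip(String host) {
        return host.replaceAll(".*\\.(.+?\\.com)", "$1");
    }

    public static void main(String[] args) {
        // Hypothetical inputs chosen to probe the pattern's limits.
        String[] hosts = {"www.abc.net", "www.abc.com.evil.org"};
        for (String h : hosts) {
            System.out.println(h + " -> " + strip(h));
        }
    }
}
```

The first host comes back unchanged because the pattern only knows about ".com", and the second keeps its trailing ".evil.org" because the pattern is not anchored at the end of the string. Whether that matters depends on where the host names come from.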
Personally, I would use URL.getHost.
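A rough sketch of that idea, assuming the input is a full URL rather than a bare host name. I use java.net.URI.getHost here instead of URL.getHost (it avoids the checked exception; URL.getHost is used in a similar way), and I reuse the regex posted earlier in this thread. The class and method names are made up for the example.

```java
import java.net.URI;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HostFromUrl {
    // Pattern from the earlier answer: one label plus ".com.cn" or ".com" at the end.
    static final Pattern DOMAIN = Pattern.compile(
            "((?:[a-z0-9][-a-z0-9]*[a-z0-9]|[a-z0-9])(?:\\.com\\.cn|\\.com)$)");

    // Returns the domain part of the URL's host, or null if there is no
    // host or the host does not end in one of the known suffixes.
    static String domainOf(String url) {
        String host = URI.create(url).getHost();
        if (host == null) {
            return null; // e.g. a string without a scheme has no host component
        }
        Matcher m = DOMAIN.matcher(host);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(domainOf("http://www.xyz.abc.com.cn:8080/index.html?x=1"));
        System.out.println(domainOf("https://www.abc.com/"));
    }
}
```

This way the port, path, and query string never reach the regex, so the pattern only has to deal with the host name itself.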
Just check:
if (string.indexOf("abc.com") != -1)
{
    // string contains the required substring
}