Solved

How to use regex to get things out of hostname

Posted on 2011-09-20
Last Modified: 2012-05-12
Hi,

I would like to use ONE regular expression to get abc.com or abc.com.cn out of all the following host names:

1) www.abc.com
2) www.abc.com.cn
3) www.xyz.abc.com
4) www.xyz.abc.com.cn
5) xyz.abc.com
6) xyz.abc.com.cn

Thanks!
Question by:wsyy
 
LVL 40

Expert Comment

by:Gurvinder Pal Singh
ID: 36567535
http://www.exampledepot.com/egs/java.lang/HasSubstr.html

Just check:

if (string.indexOf("abc.com") != -1)
{
   // string contains the required substring ("abc.com" also covers "abc.com.cn")
}
 
LVL 4

Expert Comment

by:stachenov
ID: 36567722
Something like this works:
 
import java.util.regex.Matcher;
import java.util.regex.Pattern;

String[] t = {
    "www.abc.com",
    "www.abc.com.cn",
    "www.xyz.abc.com",
    "www.xyz.abc.com.cn",
    "xyz.abc.com",
    "xyz.abc.com.cn",
};
// One host-name label (no leading or trailing hyphen),
// followed by ".com.cn" or ".com" at the end of the string.
Pattern p = Pattern.compile("((?:[a-z0-9][-a-z0-9]*[a-z0-9]|[a-z0-9])"
        + "(?:\\.com\\.cn|\\.com)$)");
for (String s : t) {
    Matcher m = p.matcher(s);
    if (m.find()) {
        System.out.println("Found " + m.group(1) + " in " + s);
    } else {
        System.out.println("Not found in " + s);
    }
}


Looks a bit ugly because I couldn't find a more elegant way to enforce the "host name can't end or start with a hyphen" rule.

If you need to match more domains, not just ".com.cn" and ".com", then the second part should contain more complicated alternatives, but the idea stays the same.
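
As a rough sketch of that idea, the alternation can be generated from a list of suffixes instead of being typed out by hand. The suffix list, the class name `DomainExtractor` and the method `extract` below are made up for illustration and are not part of the code above; a real application would need a much longer list.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

final class DomainExtractor {
    // Hypothetical suffix list, chosen only for illustration.
    private static final List<String> SUFFIXES =
            Arrays.asList("com.cn", "edu.au", "com", "net", "org", "edu");

    // One host-name label that neither starts nor ends with a hyphen.
    private static final String LABEL = "(?:[a-z0-9][-a-z0-9]*[a-z0-9]|[a-z0-9])";

    // label + "." + one of the suffixes, anchored at the end of the host name.
    private static final Pattern DOMAIN = Pattern.compile(
            "(" + LABEL + "\\.(?:"
            + SUFFIXES.stream().map(Pattern::quote).collect(Collectors.joining("|"))
            + "))$");

    /** Returns the label plus suffix, or null if no listed suffix matches. */
    static String extract(String host) {
        Matcher m = DOMAIN.matcher(host.toLowerCase(Locale.ROOT));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] hosts = {"www.abc.com", "www.xyz.abc.com.cn", "ttty.mki.edu.au", "tyr.mnm.org"};
        for (String h : hosts) {
            System.out.println(h + " -> " + extract(h));
        }
    }
}
```

Because the pattern is anchored with `$`, the order of the suffixes does not change the result: the engine backtracks until the whole tail of the host name is consumed.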
 
LVL 1

Expert Comment

by:stephano12
ID: 36567729
Try this code:
public class TestSubstring {
    public static void main(String[] args) {
        String[] string = {"www.abc.com", "www.abc.com.cn", "www.xyz.abc.com", "www.xyz.abc.com.cn", "xyz.abc.com", "xyz.abc.com.cn"};
        for (String stg : string) {
            // Keep everything from "abc.com" onward, which yields abc.com or abc.com.cn.
            System.out.println(stg.substring(stg.indexOf("abc.com")));
        }
    }
}

 
LVL 47

Expert Comment

by:for_yan
ID: 36568193
String[] hosts = {
    "www.abc.com",
    "www.abc.com.cn",
    "www.xyz.abc.com",
    "www.xyz.abc.com.cn",
    "xyz.abc.com",
    "xyz.abc.com.cn"
};

for (String sh : hosts) {
    sh = sh.replaceAll(".*\\.(.+?\\.com)", "$1");
    System.out.println("result: " + sh);
}



Output:

result: abc.com
result: abc.com.cn
result: abc.com
result: abc.com.cn
result: abc.com
result: abc.com.cn

 
LVL 47

Accepted Solution

by:
for_yan earned 500 total points
ID: 36568397
Maybe you want something more general, like this:

String[] hosts = {
    "tyr.mnm.org",
    "www.abc.com",
    "www.abc.com.cn",
    "www.xyz.abc.com",
    "www.xyz.abc.com.cn",
    "xyz.abc.com",
    "xyz.abc.com.cn",
    "xryz.ttt.net",
    "www.dfg.com",
    "tt.mju.edu",
    "ttty.mki.edu.au"
};

for (String sh : hosts) {
    // One alternative per top-level domain; only one of the four groups matches, the others are empty.
    sh = sh.replaceAll("(?:.*\\.(.+?\\.com))|(?:.*\\.(.+?\\.net))|(?:.*\\.(.+?\\.org))|(?:.*\\.(.+?\\.edu))", "$1$2$3$4");

    System.out.println("result: " + sh);
}


result: mnm.org
result: abc.com
result: abc.com.cn
result: abc.com
result: abc.com.cn
result: abc.com
result: abc.com.cn
result: ttt.net
result: dfg.com
result: mju.edu
result: mki.edu.au

 
LVL 4

Expert Comment

by:stachenov
ID: 36568577
@for_yan, this doesn't work for something like "com.cn.com.cn".
 
LVL 47

Expert Comment

by:for_yan
ID: 36568690

Why? It returns:

result: cn.com.cn

And com.cn.com returns:

cn.com

That is what is expected, as I understand it.

And certainly, for any regex you can invent some string that will break it.

 
LVL 4

Expert Comment

by:stachenov
ID: 36568697
Sorry, I was wrong, it actually works.
 
LVL 47

Expert Comment

by:for_yan
ID: 36568707
No problem.
Nothing is ideal, though; I'm sure there is some string that will break it.
Still, it helps in the great majority of cases.
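
For what it's worth, one input that does seem to trip the accepted pattern is a host with a label that merely starts with "com", such as the made-up `shop.my.company.org`: the unanchored `\.com` matches inside `.company`. The sketch below compares the accepted pattern with an anchored variant. The variant, the class name `AnchoredDomain` and the optional two-letter country suffix are assumptions added for illustration, not something proposed in this thread.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class AnchoredDomain {
    // The pattern from the accepted solution, unchanged.
    static final String ACCEPTED =
            "(?:.*\\.(.+?\\.com))|(?:.*\\.(.+?\\.net))|(?:.*\\.(.+?\\.org))|(?:.*\\.(.+?\\.edu))";

    // Assumed variant: one label, a listed TLD, then an optional two-letter
    // country code, with the whole host name anchored by ^ and $.
    static final Pattern ANCHORED = Pattern.compile(
            "^(?:.*\\.)?([^.]+\\.(?:com|net|org|edu)(?:\\.[a-z]{2})?)$");

    static String accepted(String host) {
        return host.replaceAll(ACCEPTED, "$1$2$3$4");
    }

    static String anchored(String host) {
        Matcher m = ANCHORED.matcher(host);
        return m.matches() ? m.group(1) : host; // leave unmatched hosts untouched
    }

    public static void main(String[] args) {
        String host = "shop.my.company.org"; // hypothetical host name
        System.out.println("accepted: " + accepted(host)); // prints "accepted: my.company.org"
        System.out.println("anchored: " + anchored(host)); // prints "anchored: company.org"
    }
}
```

The anchored variant has blind spots of its own (it knows only four top-level domains and treats any two-letter tail as a country code), so it is a narrower failure mode rather than a complete fix.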
 
LVL 86

Expert Comment

by:CEHJ
ID: 36568801
Personally, I would use URL.getHost.
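
A minimal sketch of how that could fit in, assuming the input is a full URL rather than a bare host name. It uses `java.net.URI.getHost()` instead of `URL.getHost()`; that is a swap made here, and both should return the same host for URLs like these. `getHost` only yields the full host name, so the regex from the earlier comment is still applied afterwards. The class name `HostFromUrl` and the sample URLs are hypothetical.

```java
import java.net.URI;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class HostFromUrl {
    // Pattern taken from stachenov's comment: label + ".com.cn" or ".com" at the end.
    private static final Pattern DOMAIN = Pattern.compile(
            "((?:[a-z0-9][-a-z0-9]*[a-z0-9]|[a-z0-9])(?:\\.com\\.cn|\\.com)$)");

    /** Returns e.g. "abc.com.cn" for a URL, or null if there is no host or no match. */
    static String domainOf(String url) {
        String host = URI.create(url).getHost(); // null when the string has no "scheme://" part
        if (host == null) {
            return null;
        }
        Matcher m = DOMAIN.matcher(host.toLowerCase(Locale.ROOT));
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Hypothetical URLs for illustration.
        System.out.println(domainOf("http://www.xyz.abc.com.cn/index.html?q=1"));
        System.out.println(domainOf("https://xyz.abc.com:8080/"));
    }
}
```

Note that a bare host name such as `www.abc.com` has no scheme, so `getHost()` returns null for it; in that case the regex can be applied to the string directly.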