Solved

Java regex needed

Posted on 2011-09-21
Last Modified: 2012-05-12
Hi,

I need a regex to match the following situations. Thanks!

A. In (1)-(3), the last segment is made up only of digits:
1) http://www.binggo.com/shipin/20207011122957
2) http://www.binggo.com/shipin/202070111
3) http://www.binggo.com/shipin/202070

B. In (4)-(5), the last segment begins with a\d+.html:
4) http://www.binggo.com/shipin/a0207.html
5) http://www.binggo.com/shipin/a0207b111c2957.html

C. In (6), the last segment is \d+.html:
6) http://www.binggo.com/shipin/20601081184657.html

D. (7)-(10) are much more difficult, and I can't figure out a regex for them:
7) http://www.binggo.com/shipin/a20209b2020904h2616/315432.html
8) http://www.binggo.com/shipin/a20209b2020904h2616/
9) http://www.binggo.com/shipin/a20209b2020904/
10) http://www.binggo.com/shipin/a20209/
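For comparison, here is a rough sketch of one pattern per group, written directly from the A-D descriptions above. It assumes the host and the "shipin/" prefix never change and that segments contain only lowercase letters and digits; the class and method names (UrlGroups, classify) are hypothetical. For B it reads "begins with a\d+" as "a\d+ followed by any letters or digits, then .html", so that (5) is included.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

public class UrlGroups {
    // Assumption: host and "shipin/" path are always literally the same
    private static final String PREFIX = "http://www\\.binggo\\.com/shipin/";
    private static final Map<String, Pattern> GROUPS = new LinkedHashMap<>();

    static {
        GROUPS.put("A", Pattern.compile(PREFIX + "\\d+"));                        // digits only
        GROUPS.put("B", Pattern.compile(PREFIX + "a\\d+[a-z0-9]*\\.html"));       // a + digits + more, then .html
        GROUPS.put("C", Pattern.compile(PREFIX + "\\d+\\.html"));                 // digits, then .html
        GROUPS.put("D", Pattern.compile(PREFIX + "a[a-z0-9]+/(?:\\d+\\.html)?")); // a.../ plus optional digits.html
    }

    // Returns the first group whose pattern matches the WHOLE url, or "none"
    public static String classify(String url) {
        for (Map.Entry<String, Pattern> e : GROUPS.entrySet()) {
            if (e.getValue().matcher(url).matches()) {
                return e.getKey();
            }
        }
        return "none";
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.binggo.com/shipin/20207011122957",
            "http://www.binggo.com/shipin/a0207b111c2957.html",
            "http://www.binggo.com/shipin/20601081184657.html",
            "http://www.binggo.com/shipin/a20209b2020904h2616/315432.html",
            "http://www.binggo.com/shipin/a20209/"
        };
        for (String url : urls) {
            System.out.println(classify(url) + "  " + url);
        }
    }
}
```

Because A requires the string to end in digits and C requires ".html" after them, and B/D start with "a" but only D contains a slash, the four patterns do not overlap on these examples.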
Question by:wsyy
 
LVL 47

Expert Comment

by:for_yan
ID: 36578294

So, do you want to return everything after "shipin/" in all cases?

I think your 5) will not match a\d+.html, either.
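A quick way to check this: with Matcher.matches(), a\d+\.html has to cover the whole segment, so (5) fails because "b111c2957" sits between the leading digits and ".html". The LOOSE variant below is only one possible fix, not something the asker specified; the class name is made up for the example.

```java
import java.util.regex.Pattern;

public class SegmentCheck {
    // B's pattern taken literally, with the dot escaped so it means "."
    static final Pattern STRICT = Pattern.compile("a\\d+\\.html");
    // Hypothetical looser variant: a + digits, then any letters/digits, then .html
    static final Pattern LOOSE = Pattern.compile("a\\d+[a-z0-9]*\\.html");

    public static void main(String[] args) {
        String[] segments = {"a0207.html", "a0207b111c2957.html"};
        for (String s : segments) {
            System.out.println(s + " strict=" + STRICT.matcher(s).matches()
                    + " loose=" + LOOSE.matcher(s).matches());
        }
        // prints:
        // a0207.html strict=true loose=true
        // a0207b111c2957.html strict=false loose=true
    }
}
```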
 
LVL 47

Expert Comment

by:for_yan
ID: 36578385
public class ShipinUrls {
    public static void main(String[] args) {
        String[] urlShips = {
            "http://www.binggo.com/shipin/20207011122957",
            "http://www.binggo.com/shipin/202070111",
            "http://www.binggo.com/shipin/202070",

            "http://www.binggo.com/shipin/a0207.html",
            "http://www.binggo.com/shipin/a0207b111c2957.html",

            "http://www.binggo.com/shipin/20601081184657.html",

            "http://www.binggo.com/shipin/a20209b2020904h2616/315432.html",
            "http://www.binggo.com/shipin/a20209b2020904h2616/",
            "http://www.binggo.com/shipin/a20209b2020904/",
            "http://www.binggo.com/shipin/a20209/"
        };

        for (String url : urlShips) {
            // Drop everything up to and including "shipin/", keep the rest
            url = url.replaceAll(".+shipin/(.+)", "$1");
            System.out.println(url);
        }
    }
}

      



Output:
20207011122957
202070111
202070
a0207.html
a0207b111c2957.html
20601081184657.html
a20209b2020904h2616/315432.html
a20209b2020904h2616/
a20209b2020904/
a20209/
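One caveat with replaceAll here: if a URL does not contain "shipin/" followed by something, the string comes back unchanged instead of signalling a miss. A minimal sketch of the same extraction with a precompiled Pattern and Matcher.matches(), returning an Optional instead (ShipinTail and tail are names made up for this example):

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ShipinTail {
    // Same regex as the replaceAll call above, compiled once
    private static final Pattern TAIL = Pattern.compile(".+shipin/(.+)");

    // Returns the part after "shipin/", or empty if the URL does not match at all
    public static Optional<String> tail(String url) {
        Matcher m = TAIL.matcher(url);
        return m.matches() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(tail("http://www.binggo.com/shipin/a20209/")); // Optional[a20209/]
        System.out.println(tail("http://www.binggo.com/about"));          // Optional.empty
    }
}
```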


 
LVL 86

Assisted Solution

by:CEHJ
CEHJ earned 62 total points
ID: 36578800
>>I need a regex to match the following situations.

1-3 are not regex territory
7-10 likewise

The rest are suitable for regex treatment:
final String PATTERN = ".*?\\d+\\.html";
boolean valid = urlString.matches(PATTERN);
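To see which of the ten URLs this accepts, one can run it over the list from the question. Note that (7) also ends in digits followed by ".html", so it passes as well. A minimal harness (the class name is hypothetical; the pattern is copied unchanged):

```java
public class CehjPatternCheck {
    static final String PATTERN = ".*?\\d+\\.html";

    public static void main(String[] args) {
        String[] urls = {
            "http://www.binggo.com/shipin/20207011122957",
            "http://www.binggo.com/shipin/202070111",
            "http://www.binggo.com/shipin/202070",
            "http://www.binggo.com/shipin/a0207.html",
            "http://www.binggo.com/shipin/a0207b111c2957.html",
            "http://www.binggo.com/shipin/20601081184657.html",
            "http://www.binggo.com/shipin/a20209b2020904h2616/315432.html",
            "http://www.binggo.com/shipin/a20209b2020904h2616/",
            "http://www.binggo.com/shipin/a20209b2020904/",
            "http://www.binggo.com/shipin/a20209/"
        };
        // expected: true for (4)-(7), false for the rest
        for (String urlString : urls) {
            boolean valid = urlString.matches(PATTERN);
            System.out.println(valid + "  " + urlString);
        }
    }
}
```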


 
LVL 63

Expert Comment

by:Zvonko
ID: 36578924
Your \d is not really a digit but hex: [\da-f]+
Do not forget the iGnoreCase modifier.
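A small sketch of that suggestion, using Pattern.CASE_INSENSITIVE as the ignore-case flag. It also shows the problem raised in the follow-up comment: the "h" in the (7)-(9) segments is not a hex digit. HexSegment and isHex are illustrative names only.

```java
import java.util.regex.Pattern;

public class HexSegment {
    // Hex digits only, matched case-insensitively
    static final Pattern HEX = Pattern.compile("[\\da-f]+", Pattern.CASE_INSENSITIVE);

    static boolean isHex(String segment) {
        return HEX.matcher(segment).matches();
    }

    public static void main(String[] args) {
        System.out.println(isHex("a0207b111c2957"));      // true
        System.out.println(isHex("A0207B111C2957"));      // true (ignore-case)
        System.out.println(isHex("a20209b2020904h2616")); // false: 'h' is not a hex digit
    }
}
```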


 
LVL 63

Expert Comment

by:Zvonko
ID: 36578930
Sorry, not hex. It has an "h", so could it be base64?
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36593968
What are you trying to match/extract? The whole URL, or just parts of each?
 

Author Comment

by:wsyy
ID: 36593991
I am trying to match the whole url.
 
LVL 75

Expert Comment

by:käµfm³d 👽
ID: 36594001
Please forgive my ignorance, but are you looking for one regex pattern to match all, or one per group (i.e. A, B, C, D)?
 
LVL 75

Accepted Solution

by:käµfm³d 👽
käµfm³d 👽 earned 63 total points
ID: 36594009
While I await your response, I'll hazard a guess to say you want all to be matched. Please try the following:

String pattern = "http://www\\.binggo\\.com/shipin/(?:[a-z0-9]+\\.html|[a-z0-9]+(?:/(?:[a-z0-9]+\\.html)?)?)?";
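A quick harness to check the accepted pattern against all ten URLs from the question (the class and method names are hypothetical). It also shows a side effect of the outer trailing "?": the bare ".../shipin/" prefix is accepted too, which may or may not be wanted.

```java
import java.util.regex.Pattern;

public class AcceptedPatternCheck {
    static final Pattern ALL = Pattern.compile(
        "http://www\\.binggo\\.com/shipin/(?:[a-z0-9]+\\.html|[a-z0-9]+(?:/(?:[a-z0-9]+\\.html)?)?)?");

    // Whole-string match, same as String.matches would do
    static boolean accepts(String url) {
        return ALL.matcher(url).matches();
    }

    public static void main(String[] args) {
        String[] urls = {
            "http://www.binggo.com/shipin/20207011122957",
            "http://www.binggo.com/shipin/202070111",
            "http://www.binggo.com/shipin/202070",
            "http://www.binggo.com/shipin/a0207.html",
            "http://www.binggo.com/shipin/a0207b111c2957.html",
            "http://www.binggo.com/shipin/20601081184657.html",
            "http://www.binggo.com/shipin/a20209b2020904h2616/315432.html",
            "http://www.binggo.com/shipin/a20209b2020904h2616/",
            "http://www.binggo.com/shipin/a20209b2020904/",
            "http://www.binggo.com/shipin/a20209/"
        };
        for (String url : urls) {
            System.out.println(accepts(url) + "  " + url); // all ten print true
        }
        // Everything after "shipin/" is optional, so the bare prefix passes too
        System.out.println(accepts("http://www.binggo.com/shipin/"));          // true
        System.out.println(accepts("http://www.binggo.com/shipin/a0207.htm")); // false
    }
}
```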



Question has a verified solution.
