Solved

Regular expression?

Posted on 2003-12-01
Last Modified: 2010-03-31
I'm having difficulty using a regular expression to validate or invalidate a repeating pattern like [*.*.*], where each * is a wildcard. For example, the string java.sun.com should validate, while the string java.sun..com should fail because of the double "..".
My thought was to use a RE like:
(.*\.[a-zA-Z0-9])
but repeating the expression doesn't work.  
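A quick way to see why repeating that group cannot work: the "." in ".*" matches any character, dots included, so ".*" can swallow "java.sun." by itself and the double dot never gets rejected. The Java sketch below assumes that "repeating the expression" means something like (.*\.[a-zA-Z0-9]+)+; the class name is hypothetical.

```java
import java.util.regex.Pattern;

// Sketch: a repeated group built on ".*" cannot reject "..".
// The repeated form is an assumed reading of "repeating the expression".
public class RepeatedWildcardDemo {
    // ".*" matches any character, including '.', so one iteration can absorb "java.sun."
    static final Pattern REPEATED = Pattern.compile("(.*\\.[a-zA-Z0-9]+)+");

    static boolean accepts(String input) {
        // matches() requires the whole input to fit the pattern
        return REPEATED.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("java.sun.com"));   // true
        System.out.println(accepts("java.sun..com"));  // also true: ".*" took "java.sun."
    }
}
```

The fix is to stop using a wildcard that can match the separator, which is what the "\w" based patterns later in the thread do.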
Question by:Taurus
17 Comments
 
LVL 92

Expert Comment

by:objects
ID: 9855490
Maybe something like:

\w+(\.+\w+)+
 
LVL 92

Expert Comment

by:objects
ID: 9855546
Whoops, I only want one dot :), so that should be:

\w+(\.\w+)+
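The difference between the two suggestions is the "+" after "\.": "\.+" accepts one or more dots as a separator, so the double dot slips through, while "\." accepts exactly one. A minimal Java comparison (class name hypothetical), using matches() so the whole string must fit:

```java
import java.util.regex.Pattern;

public class DotPatternCompare {
    // First suggestion: "\.+" lets a separator be one OR MORE dots.
    static final Pattern ONE_OR_MORE_DOTS = Pattern.compile("\\w+(\\.+\\w+)+");
    // Corrected suggestion: exactly one dot between runs of word characters.
    static final Pattern SINGLE_DOT = Pattern.compile("\\w+(\\.\\w+)+");

    public static void main(String[] args) {
        String bad = "java.sun..com";
        System.out.println(ONE_OR_MORE_DOTS.matcher(bad).matches()); // true
        System.out.println(SINGLE_DOT.matcher(bad).matches());       // false
    }
}
```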
 

Author Comment

by:Taurus
ID: 9856701
Does not seem to do it.  I tested "java.sun..com" with this RE on http://www.regexlib.com/RETester.aspx and http://home.tiscali.be/stevevh/tools/testRE.html.
 
LVL 92

Expert Comment

by:objects
ID: 9856717
I just tested it with Java's regex and it failed to match. It might be a case of a different regex implementation.
 
LVL 92

Expert Comment

by:objects
ID: 9856740
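import java.util.regex.*;

public class a
{
      public static void main(String[] args)
      {
            // One or more word characters, then one or more ".word" groups
            Pattern p = Pattern.compile("\\w+(\\.\\w+)+");
            Matcher m = p.matcher(args[0]);
            // matches() requires the WHOLE input to fit the pattern
            if (m.matches()) System.out.println("match");
            else System.out.println("no match");
      }
}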
 
LVL 92

Expert Comment

by:objects
ID: 9856745
Perhaps whatever you're testing with is not matching against the whole input string, and is reporting a match on "java.sun".
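In Java terms, this is the difference between Matcher.matches() (the whole input must fit) and Matcher.find() (any substring may fit). A tester that searches for a match anywhere, as JavaScript's RegExp.test() does for an unanchored pattern, behaves like find(). A small sketch with hypothetical method names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PartialMatchDemo {
    static final Pattern P = Pattern.compile("\\w+(\\.\\w+)+");

    // matches(): the whole input must fit the pattern.
    static boolean wholeMatch(String s) {
        return P.matcher(s).matches();
    }

    // find(): succeeds if ANY substring fits; returns that substring, or null if none.
    static String firstPartialMatch(String s) {
        Matcher m = P.matcher(s);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "java.sun..com";
        System.out.println(wholeMatch(input));        // false
        System.out.println(firstPartialMatch(input)); // java.sun
    }
}
```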
 
LVL 86

Expert Comment

by:CEHJ
ID: 9858529
This is a possible workaround:

    String re = "[a-zA-Z_\\.]+";
    String input = "java..sun.com";
    boolean valid = input.matches(re) && input.indexOf("..") < 0;
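Wrapped as a method, the workaround can be checked against a few inputs. Note two consequences of the character class as written: digits are not allowed, so "java2.sun.com" is rejected, and nothing stops a leading or trailing dot. The class and method names are hypothetical.

```java
public class DoubleDotWorkaround {
    // The workaround above: allowed characters only, then reject any "..".
    static boolean isValid(String input) {
        String re = "[a-zA-Z_\\.]+";
        return input.matches(re) && input.indexOf("..") < 0;
    }

    public static void main(String[] args) {
        System.out.println(isValid("java.sun.com"));  // true
        System.out.println(isValid("java..sun.com")); // false: contains ".."
        System.out.println(isValid("java2.sun.com")); // false: digits are not in the class
        System.out.println(isValid(".java.sun."));    // true: leading/trailing dots slip through
    }
}
```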
 
LVL 3

Expert Comment

by:savalou
ID: 9858786
What you are trying to do is understandable, but I don't think schema validation is the right way to go. The W3C document on data types (http://www.w3.org/TR/xmlschema-2/#anyURI) more or less says that schema pattern validation of URIs is of little use because URIs can be so many things. Even in your circumstances, where you want a URL (more or less), so many patterns are valid that the "obvious" inconsistencies are not so obvious. In most cases you would have to take the URI and do something with it in code anyway, right? Like fetch the file or whatever.

Anyway, the following pattern

<xsd:pattern value="((\w+://\w+(\.\w+)+(:\d+)?)?(/\w?)*)|([a-zA-Z]:)?(\\[a-zA-Z0-9\-_ ]*)*"/>

accepts the following:
<url>http://java.sun.com</url>
<url>http://java.sun.com:80</url>
<url>http://java.sun.com:80/</url>
<url>http://java.sun.com:80/a/b/c</url>
<url>d:\dir \dir_</url>
<url>\dir \dir_</url>

but rejects:

<url>http:/java.sun.com</url>
<url>http://java.sun..com</url>
<url>d:\dir@\dir</url>
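To try the same pattern outside a schema parser, it can be translated to a Java string. XML Schema patterns are implicitly anchored to the whole value, so the sketch uses matches() rather than find(). One caveat: Java's "\w" is ASCII-only ([a-zA-Z_0-9]) by default, whereas XML Schema defines "\w" over Unicode character categories, so results may differ for non-ASCII input. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class XsdUrlPatternCheck {
    // Java translation of the xsd:pattern value above; "\\\\" is one literal backslash.
    static final Pattern URL_OR_PATH = Pattern.compile(
        "((\\w+://\\w+(\\.\\w+)+(:\\d+)?)?(/\\w?)*)|([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-_ ]*)*");

    // matches() emulates XML Schema's implicit whole-string anchoring.
    static boolean accepts(String s) {
        return URL_OR_PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://java.sun.com", "http://java.sun.com:80", "http://java.sun.com:80/",
            "http://java.sun.com:80/a/b/c", "d:\\dir \\dir_", "\\dir \\dir_",
            "http:/java.sun.com", "http://java.sun..com", "d:\\dir@\\dir"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```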
 

Author Comment

by:Taurus
ID: 9860216
I tested the pattern you gave on http://www.regexlib.com/RETester.aspx and http://home.tiscali.be/stevevh/tools/testRE.html. It does not work as suggested on these. I tested it in XMLSPY and it works for a couple of the cases I tried, but it incorrectly rejects things like http://www.msn.com/mydir.txt and, I think, basically any relative path part (though I haven't tested lots of cases). I guess I will not rely on the online testers, as they seem unreliable (which makes experimenting with not-so-simple REs impossible for the inexperienced).

As for your comment about schema validation not being the right way to go: yes, URIs can be many things, I suppose. My URIs, however, will be fairly constrained; what I want to check for are simple typos, and the URIs follow two basic forms. Ideally, per my post in the XML topic area, I'd like to be able to define an enumeration of patterns similar to the ones shown on http://www.cafesoft.com/support/tips/permission-resource-pattern-matching.html.

But for starters, match the following patterns (written in informal wildcard notation, not RE syntax) to verify the form and catch basic typos:
*://*:*/*  //URI
*:\*\*  //PC path

examples to further clarify:
http:///msn.com  //invalid because of the "///"
http://msn..com //invalid because of the ".."
http://www.msn.com//mytext.txt //invalid because of the second "//" should be a single "/".
c:\myrelativepath\\mytext.txt //invalid because the "\\mytext.txt" should be "\mytext.txt".

Why not do some basic validation with the schema since it will likely eliminate 50% of user input errors?  Why have the anyURI field at all if no validation is carried out?  

I am confounded that I cannot find a set of well-tested, well-specified, robust patterns for URI and path validation anywhere. All I've been able to find so far are sites like regexlib.com that only offer RE patterns written by whoever, without any formal specification or test harness.
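A stricter variant aimed at the four invalid examples listed above can be sketched in Java. This is an assumption, not a pattern from the thread: URL path segments must contain at least one character from [\w.-] (so "//" is rejected, with an optional trailing "/"), and Windows path segments must be non-empty (so "\\" is rejected). It has only been checked against the samples in main(), and like the original it still accepts the empty string. The class name is hypothetical.

```java
import java.util.regex.Pattern;

public class StricterUriPathCheck {
    // Hypothetical stricter variant (not from the thread):
    //  - URL path segments need at least one of [\w.-]; an optional trailing "/" is allowed;
    //  - Windows path segments need at least one character; an optional trailing "\" is allowed.
    static final Pattern STRICT = Pattern.compile(
        "((\\w+://\\w+(\\.\\w+)+(:\\d+)?)?(/[\\w.-]+)*/?)"
        + "|([a-zA-Z]:)?(\\\\[a-zA-Z0-9._ -]+)*\\\\?");

    // Whole-string match, as an XML Schema pattern would be applied.
    static boolean accepts(String s) {
        return STRICT.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http:///msn.com", "http://msn..com",
            "http://www.msn.com//mytext.txt", "c:\\myrelativepath\\\\mytext.txt",
            "http://www.msn.com/mydir.txt", "c:\\myrelativepath\\mytext.txt"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + accepts(s));
        }
    }
}
```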


 

Author Comment

by:Taurus
ID: 9860221
Above comment was for Savalou.
 
LVL 92

Accepted Solution

by:
objects earned 100 total points
ID: 9861519
Looking at it again, that JavaScript tester is only doing partial matching.

This should do the trick for you:

^\w+(\.\w+)+$
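With "^" and "$" in place, even a search-anywhere call such as Java's find() has to cover the whole input, which is why the anchored form behaves the same on partial-matching testers. Two caveats worth knowing: "\w" excludes the hyphen, so a real hostname like "my-host.example.com" is rejected, and Java's "$" (without MULTILINE) also matches just before a final line terminator, so "\z" is the stricter end anchor. A sketch with a hypothetical method name:

```java
import java.util.regex.Pattern;

public class AnchoredHostnameCheck {
    // Anchored pattern: find() must now match from start to end of the input.
    static final Pattern DOTTED = Pattern.compile("^\\w+(\\.\\w+)+$");

    static boolean isDottedName(String s) {
        return DOTTED.matcher(s).find();
    }

    public static void main(String[] args) {
        String[] samples = {"java.sun.com", "java.sun..com", "java", ".java.sun", "java.sun."};
        for (String s : samples) {
            System.out.println(s + " -> " + isDottedName(s));
        }
    }
}
```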
 
LVL 3

Expert Comment

by:savalou
ID: 9861665
Yes, I know it doesn't work on http://home.tiscali.be/stevevh/tools/testRE.html. Not all RE matching algorithms are created equal (though perhaps they should be).

Anyway, if you want it to work on the tiscali one, you need ^ and $:
^((\w+://\w+(\.\w+)+(:\d+)?)?(/\w?)*)$|^([a-zA-Z]:)?(\\[a-zA-Z0-9\-_ ]*)*$

To validate filenames with extensions, you need to add a "\.". I thought you'd figure that out yourself. The same goes for any other symbols that can be part of filenames on your system:
<xsd:pattern value="((\w+://\w+(\.\w+)+(:\d+)?)?(/\w?)*)|([a-zA-Z]:)?(\\[a-zA-Z0-9\-\._ ]*)*"/>
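The added "\." only affects the Windows path alternative. On the URL side, the path part is still "(/\w?)*", which allows at most one word character per segment, so a URL such as http://www.msn.com/mydir.txt is still rejected. A Java check of the updated pattern (class name hypothetical), using matches() to emulate XML Schema's whole-string anchoring:

```java
import java.util.regex.Pattern;

public class UpdatedPatternCheck {
    // Java translation of the updated xsd:pattern ("." added to the Windows path class).
    static final Pattern UPDATED = Pattern.compile(
        "((\\w+://\\w+(\\.\\w+)+(:\\d+)?)?(/\\w?)*)|([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-\\._ ]*)*");

    static boolean accepts(String s) {
        return UPDATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(accepts("c:\\dir\\file.txt"));            // true: "." now allowed in path
        System.out.println(accepts("http://www.msn.com/m"));         // true: one-character segment
        System.out.println(accepts("http://www.msn.com/mydir.txt")); // false: (/\w?)* allows one char per segment
    }
}
```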


I don't know what parser you are using (XML Spy?), but the regex I posted generates complaints about each of your four examples when I use the Xerces SAX parser.  

I'm afraid I can't do much more for you.
 
LVL 92

Expert Comment

by:objects
ID: 9861783
I thought we were matching hostnames; when did the requirements change?
 

Author Comment

by:Taurus
ID: 9861981
>when did the requirements change?
Well, in this particular post I started with a very simple example. I was just experimenting after my other (more encompassing) post in the XML topic area didn't get me very far (http://www.experts-exchange.com/Web/Web_Languages/XML/Q_20811314.html). I haven't had much opportunity to work with REs (I did a tiny bit several years ago as part of some Java scripting work). Coming back to them in the context of writing REs for a schema validator is frustrating, especially when I can't find resources that are complete or robust (as is the case with the online validators). Or, as Savalou said, not all matching algorithms are created equal.

Savalou,

As for not figuring out your pattern and adding a "\.": as Objects said, this post was originally just intended to help me figure out how to validate a simple repeating pattern. I've now seen about two dozen different, lengthy URI patterns, and I haven't the time to go through and understand each one until I find one that seems to work as advertised. Hence I didn't study your pattern; I just tested it. So might I ask, is this your pattern or does it originate from another source? How well has it been tested? Not to sound lazy, but pattern matching and parsing of URIs and paths feels so much like reinventing the wheel (which I try to avoid).
 
LVL 92

Expert Comment

by:objects
ID: 9862018
Well it makes it a little hard to give you exactly what you need if you don't tell us the full story :)

Anyway the RE I posted above should meet the requirements in this question.
 

Author Comment

by:Taurus
ID: 9862424
Objects, yes, what you gave seems to work with the ^ and $. I expanded it to ^(\w+:\/\/)\w+(\.\w+)+(\/\w+).?\w+$ . I haven't tested it much, but I'm getting the idea.
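One thing to watch in that expansion: ".?" is an unescaped dot, so it matches any single character, not just a literal "." before an extension. The sketch below compares the expression as posted with a version using "\.?"; the escaped version is an assumption about the intent, and the class name is hypothetical.

```java
import java.util.regex.Pattern;

public class ExpandedRegexCheck {
    // The expanded expression as posted: ".?" is an unescaped dot (any character).
    static final Pattern AS_POSTED =
        Pattern.compile("^(\\w+:\\/\\/)\\w+(\\.\\w+)+(\\/\\w+).?\\w+$");
    // Assumed intent: an optional LITERAL dot before the extension.
    static final Pattern ESCAPED =
        Pattern.compile("^(\\w+://)\\w+(\\.\\w+)+(/\\w+)\\.?\\w+$");

    public static void main(String[] args) {
        String odd = "http://www.msn.com/mydir$txt";
        System.out.println(AS_POSTED.matcher(odd).matches()); // true: "." matched "$"
        System.out.println(ESCAPED.matcher(odd).matches());   // false
        System.out.println(ESCAPED.matcher("http://www.msn.com/mydir.txt").matches()); // true
    }
}
```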
 
LVL 92

Expert Comment

by:objects
ID: 9885090