Regular expression?

I'm having difficulty using a regular expression to validate/invalidate a repeating pattern like [*.*.*], where each * is a wildcard. For example, the string java.sun.com should validate, while java.sun..com should not, because of the double "..".
My thought was to use an RE like:
(.*\.[a-zA-Z0-9])
but repeating the expression doesn't work.  
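Running the first attempt through a whole-string matcher shows why repeating it doesn't help. This is a minimal sketch using Java's java.util.regex (my choice of engine; the class and method names are hypothetical). `.*` also matches dots, and every repetition must end with a dot followed by exactly one character, so the valid name is rejected along with the invalid one, while a string with ".." can still pass.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class FirstAttempt {
    // The original expression, repeated one or more times
    static final Pattern REPEATED = Pattern.compile("(.*\\.[a-zA-Z0-9])+");

    static boolean fullMatch(String s) {
        // matches() succeeds only if the whole string is consumed
        return REPEATED.matcher(s).matches();
    }

    public static void main(String[] args) {
        // Both false: neither string ends with "." plus a single character
        System.out.println(fullMatch("java.sun.com"));
        System.out.println(fullMatch("java.sun..com"));
        // True: ".*" swallows the "..", and the string ends in ".m"
        System.out.println(fullMatch("java.sun..co.m"));
    }
}
```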
Taurus asked:
objects commented:
Maybe something like:

\w+(\.+\w+)+
objects commented:
Whoops, I only want one dot :), so that should be:

\w+(\.\w+)+
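A small comparison of the two versions, using Java's whole-string matches() (class and method names are hypothetical). `\.+` accepts a run of dots, so the first version lets "java.sun..com" through; the corrected version requires exactly one dot between word runs. In Java, `\w` is `[a-zA-Z_0-9]`, so digits and underscores also count as label characters.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class DotCount {
    static final Pattern ANY_DOTS = Pattern.compile("\\w+(\\.+\\w+)+"); // first version
    static final Pattern ONE_DOT  = Pattern.compile("\\w+(\\.\\w+)+");  // corrected version

    static boolean accepts(Pattern p, String s) {
        return p.matcher(s).matches(); // whole-string match
    }

    public static void main(String[] args) {
        for (String s : new String[] {"java.sun.com", "java.sun..com"}) {
            System.out.println(s + ": any dots=" + accepts(ANY_DOTS, s)
                    + ", one dot=" + accepts(ONE_DOT, s));
        }
    }
}
```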
Taurus (author) commented:
Does not seem to do it.  I tested "java.sun..com" with this RE on http://www.regexlib.com/RETester.aspx and http://home.tiscali.be/stevevh/tools/testRE.html.
objects commented:
I just tested it with Java's regexp and it failed to match (so it rejected the string, as intended). Might be a case of a different regexp implementation.
objects commented:
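import java.util.regex.*;

public class RegexTest
{
      public static void main(String[] args)
      {
            // Word characters, then one or more groups of exactly one dot plus word characters
            Pattern p = Pattern.compile("\\w+(\\.\\w+)+");
            Matcher m = p.matcher(args[0]);
            // matches() succeeds only if the whole input fits the pattern
            if (m.matches()) System.out.println("match");
            else System.out.println("no match");
      }
}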
objects commented:
Perhaps whatever you're testing with is not processing the whole input string, and is reporting a match on "java.sun".
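The whole-string versus substring distinction can be shown with java.util.regex: matches() has to consume the entire input, while find() reports the first substring that fits. Whether a particular online tester behaves like find() is an assumption here, not something checked; the class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class WholeVsPartial {
    static final Pattern HOST = Pattern.compile("\\w+(\\.\\w+)+");

    // Returns the first substring that matches, or null if there is none
    static String firstPartialMatch(String s) {
        Matcher m = HOST.matcher(s);
        return m.find() ? m.group() : null;
    }

    static boolean wholeMatch(String s) {
        return HOST.matcher(s).matches();
    }

    public static void main(String[] args) {
        String input = "java.sun..com";
        System.out.println("matches(): " + wholeMatch(input));        // false
        System.out.println("find():    " + firstPartialMatch(input)); // java.sun
    }
}
```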
CEHJ commented:
This is a possible workaround:

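    // Only letters, underscores and dots are allowed...
    String re = "[a-zA-Z_\\.]+";
    String input = "java..sun.com";
    // ...and the input must not contain a double dot
    boolean valid = input.matches(re) && input.indexOf("..") < 0;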
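Before relying on the workaround it is worth checking a few edge cases. It only looks at the allowed characters and at "..", so a leading or trailing dot, or a name with no dot at all, still passes, and digits are rejected because they are not in the character class. A quick comparison in Java against the single-dot pattern suggested earlier in the thread; the inputs and names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class WorkaroundEdgeCases {
    // CEHJ's workaround: allowed characters plus a "no double dot" check
    static boolean workaround(String input) {
        return input.matches("[a-zA-Z_\\.]+") && input.indexOf("..") < 0;
    }

    // Whole-string match: word runs separated by exactly one dot
    static final Pattern ONE_DOT = Pattern.compile("\\w+(\\.\\w+)+");

    static boolean oneDot(String input) {
        return ONE_DOT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {"java.sun.com", "java..sun.com", ".java.sun.com",
                           "java.sun.com.", "java", "web2.sun.com"};
        for (String s : inputs) {
            System.out.printf("%-15s workaround=%-5b oneDot=%b%n", s, workaround(s), oneDot(s));
        }
    }
}
```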
savalou commented:
What you are trying to do is understandable, but I don't think schema validation is the right way to go. The W3C document on data types (http://www.w3.org/TR/xmlschema-2/#anyURI) more or less says that schema pattern validation of URIs is useless, because URIs can be so many things. Even in your circumstances, where you want a URL (more or less), so many patterns are valid that the "obvious" inconsistencies are not so obvious. In most cases you would have to take the URI and do something with it in code anyway, right, like get the file or whatever.
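If the value is going to be used from Java anyway, java.net.URI can take over part of the check. As far as I know, the URI constructor does not reject an authority it cannot parse as host[:port] (it falls back to a registry-based authority and getHost() returns null), but parseServerAuthority() throws URISyntaxException in that case. A minimal sketch under that assumption; the helper name is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical class and helper names, for illustration only
public class UriHostCheck {
    // Returns the host if the string parses with a server-based authority, else null
    static String hostOrNull(String s) {
        try {
            // parseServerAuthority() throws if the authority is not host[:port]
            URI uri = new URI(s).parseServerAuthority();
            return uri.getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(hostOrNull("http://java.sun.com:80/a/b/c"));
        System.out.println(hostOrNull("http://java.sun..com"));
    }
}
```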

Anyway, the following pattern

<xsd:pattern value="((\w+://\w+(\.\w+)+(:\d+)?)?(/\w?)*)|([a-zA-Z]:)?(\\[a-zA-Z0-9\-_ ]*)*"/>

accepts the following:
<url>http://java.sun.com</url>
<url>http://java.sun.com:80</url>
<url>http://java.sun.com:80/</url>
<url>http://java.sun.com:80/a/b/c</url>
<url>d:\dir \dir_</url>
<url>\dir \dir_</url>

but not:

<url>http:/java.sun.com</url>
<url>http://java.sun..com</url>
<url>d:\dir@\dir</url>
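One way to try the pattern outside a schema validator is java.util.regex. An XSD pattern facet is implicitly anchored to the whole value, which matches() reproduces. The two regex dialects differ in details (for example, exactly what `\w` covers), so this is a rough cross-check that should agree for plain ASCII inputs like these. The class name is hypothetical, and the last input is an extra case of mine showing that `(/\w?)*` allows at most one word character per path segment.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class SchemaPatternCheck {
    // savalou's xsd:pattern, with backslashes doubled for a Java string literal
    static final Pattern URL_OR_PATH = Pattern.compile(
            "((\\w+://\\w+(\\.\\w+)+(:\\d+)?)?(/\\w?)*)|([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-_ ]*)*");

    // matches() anchors the whole value, as an XSD pattern facet does
    static boolean accepted(String s) {
        return URL_OR_PATH.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://java.sun.com", "http://java.sun.com:80", "http://java.sun.com:80/",
            "http://java.sun.com:80/a/b/c", "d:\\dir \\dir_", "\\dir \\dir_",
            "http:/java.sun.com", "http://java.sun..com", "d:\\dir@\\dir",
            "http://java.sun.com/ab" // two characters in one path segment
        };
        for (String s : inputs) {
            System.out.println(accepted(s) + "  " + s);
        }
    }
}
```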
Taurus (author) commented:
I tested the pattern you gave on http://www.regexlib.com/RETester.aspx and http://home.tiscali.be/stevevh/tools/testRE.html. It does not work as suggested on these. I tested it in XMLSPY and it works for a couple of the cases I tried, but it incorrectly invalidates things like http://www.msn.com/mydir.txt and, I think, basically any relative path part (without having tested lots of cases). I guess I will not rely on the online testers, as they seem unreliable (which makes experimenting with not-so-simple REs impossible for the inexperienced).

Regarding your comment about schema validation not being the right way to go: yes, I suppose URIs can be many things. My URIs, however, will be fairly constrained; what I want to check for are simple typos, and they follow two basic forms. Ideally, per my post in the XML topic area, I'd like to be able to define an enumeration of patterns similar to the ones shown on http://www.cafesoft.com/support/tips/permission-resource-pattern-matching.html.

But for starters, match on the following patterns (not written in any RE language) to verify the form and check for basic typos:
*://*:*/*  //URI
*:\*\*  //PC path

Examples to further clarify:
http:///msn.com  //invalid because of the "///"
http://msn..com //invalid because of the ".."
http://www.msn.com//mytext.txt //invalid because of the second "//" should be a single "/".
c:\myrelativepath\\mytext.txt //invalid because the "\\mytext.txt" should be "\mytext.txt".
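As a starting point for the two forms above, here is a sketch in Java. The patterns are my own, not taken from any specification, and have only been tried against the examples listed here. The idea is the same as the hostname fix: every repeated piece starts with exactly one separator followed by at least one allowed character, so "..", "///", "//" and "\\" have nothing to match. Which characters are allowed in path segments (`[\w.-]` and `[\w. -]` below) is an assumption to adjust, and the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class UriFormCheck {
    // Hypothetical pattern for "*://*:*/*": scheme://host[.label...][:port][/segment...][/]
    static final Pattern URL_FORM = Pattern.compile(
            "\\w+://\\w+(\\.\\w+)*(:\\d+)?(/[\\w.-]+)*/?");

    // Hypothetical pattern for "*:\*\*": drive letter, then non-empty "\segment" parts
    static final Pattern PC_PATH_FORM = Pattern.compile(
            "[a-zA-Z]:(\\\\[\\w. -]+)*\\\\?");

    static boolean looksValid(String s) {
        // matches() anchors each pattern to the whole input, like ^...$
        return URL_FORM.matcher(s).matches() || PC_PATH_FORM.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] inputs = {
            "http://www.msn.com/mydir.txt",
            "http://java.sun.com:80/a/b/c",
            "c:\\myrelativepath\\mytext.txt",
            "http:///msn.com",
            "http://msn..com",
            "http://www.msn.com//mytext.txt",
            "c:\\myrelativepath\\\\mytext.txt"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + looksValid(s));
        }
    }
}
```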

Why not do some basic validation with the schema since it will likely eliminate 50% of user input errors?  Why have the anyURI field at all if no validation is carried out?  

I am confounded that I cannot find a set of well-tested, well-specified, robust patterns for URI and path validation anywhere. All I've been able to find so far are sites like regexlib.com that only offer RE patterns written by whoever, with no formal specification and/or test harness behind them.

Taurus (author) commented:
Above comment was for Savalou.
objects commented:
Looking at it again, that JavaScript regex tester is only doing partial matching.

This should do the trick for you:

^\w+(\.\w+)+$
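In Java, matches() already requires the whole string, so `^` and `$` only matter for calls that search for a partial match. find() is such a search, and it shows the effect of the anchors. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class AnchorsDemo {
    static final Pattern UNANCHORED = Pattern.compile("\\w+(\\.\\w+)+");
    static final Pattern ANCHORED   = Pattern.compile("^\\w+(\\.\\w+)+$");

    // find() looks for a match anywhere in the input, like a partial-match tester
    static boolean found(Pattern p, String s) {
        return p.matcher(s).find();
    }

    public static void main(String[] args) {
        String bad = "java.sun..com";
        System.out.println(found(UNANCHORED, bad));          // true (finds "java.sun")
        System.out.println(found(ANCHORED, bad));            // false
        System.out.println(found(ANCHORED, "java.sun.com")); // true
    }
}
```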
savalou commented:
Yes, I know it doesn't work on http://home.tiscali.be/stevevh/tools/testRE.html. Not all RE matching algorithms are created equal (though arguably they should be).

Anyway, if you want it to work on the tiscali one, you need ^ and $:
^((\w+://\w+(\.\w+)+(:\d+)?)?(/\w?)*)$|^([a-zA-Z]:)?(\\[a-zA-Z0-9\-_ ]*)*$
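Anchoring each branch separately matters because `|` has the lowest precedence. Simply writing "^" + pattern + "$" would parse as (^A)|(B$): only the start of the first branch and the end of the second would be anchored, and since everything in the first branch is optional, a partial-matching search would accept any input. A Java check using find(); the naive variant and all names are mine, for comparison only.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class BranchAnchors {
    static final String BODY_URL  = "((\\w+://\\w+(\\.\\w+)+(:\\d+)?)?(/\\w?)*)";
    static final String BODY_PATH = "([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-_ ]*)*";

    // Naive: "^" + A|B + "$" parses as (^A)|(B$)
    static final Pattern NAIVE = Pattern.compile("^" + BODY_URL + "|" + BODY_PATH + "$");
    // savalou's version: each branch anchored on both sides
    static final Pattern PER_BRANCH = Pattern.compile("^" + BODY_URL + "$|^" + BODY_PATH + "$");

    static boolean found(Pattern p, String s) {
        return p.matcher(s).find(); // partial search, roughly what a partial-matching tester does
    }

    public static void main(String[] args) {
        String bad = "http://java.sun..com";
        System.out.println(found(NAIVE, bad));      // true: the first branch matches a prefix
        System.out.println(found(PER_BRANCH, bad)); // false
    }
}
```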

To validate filenames with extensions, you need to add a "\.". I thought you'd figure that out yourself. The same goes for any other symbols that can be part of filenames on your system:
<xsd:pattern value="((\w+://\w+(\.\w+)+(:\d+)?)?(/\w?)*)|([a-zA-Z]:)?(\\[a-zA-Z0-9\-\._ ]*)*"/>


I don't know what parser you are using (XML Spy?), but the regex I posted generates complaints about each of your four examples when I use the Xerces SAX parser.  

I'm afraid I can't do much more for you.
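A quick cross-check of the four typo examples against both posted patterns, using Java's whole-string matches() as a stand-in for the schema validator (class and constant names are hypothetical). Both reject the three URL examples, but once "\." is in the path class, "c:\myrelativepath\\mytext.txt" is accepted: `[...]*` can match an empty segment, so the "\\" passes as an empty directory name. The third pattern is my own hypothetical tweak, not from the thread: it uses `+` so that a segment cannot be empty, at the cost of rejecting a trailing backslash.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class FourExamples {
    static final String URL_PART = "((\\w+://\\w+(\\.\\w+)+(:\\d+)?)?(/\\w?)*)";

    // First posted pattern, and the version with "\." added to the path class
    static final Pattern ORIGINAL = Pattern.compile(URL_PART + "|([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-_ ]*)*");
    static final Pattern WITH_DOT = Pattern.compile(URL_PART + "|([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-\\._ ]*)*");
    // Hypothetical tweak (not from the thread): "+" so a path segment cannot be empty
    static final Pattern NON_EMPTY = Pattern.compile(URL_PART + "|([a-zA-Z]:)?(\\\\[a-zA-Z0-9\\-\\._ ]+)*");

    static boolean accepted(Pattern p, String s) {
        return p.matcher(s).matches(); // whole-value match, as in a schema
    }

    public static void main(String[] args) {
        String[] examples = {
            "http:///msn.com",
            "http://msn..com",
            "http://www.msn.com//mytext.txt",
            "c:\\myrelativepath\\\\mytext.txt"
        };
        for (String s : examples) {
            System.out.println(s + "  original=" + accepted(ORIGINAL, s)
                    + " withDot=" + accepted(WITH_DOT, s)
                    + " nonEmpty=" + accepted(NON_EMPTY, s));
        }
    }
}
```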
objects commented:
I thought we were matching hostnames. When did the requirements change?
Taurus (author) commented:
>when did the requirements change?
Well, in this particular post I started with a very simple example. I was just experimenting after my other (more encompassing) post in the XML topic area didn't get me very far (http://www.experts-exchange.com/Web/Web_Languages/XML/Q_20811314.html). I haven't had much opportunity to work with REs (I did a tiny bit several years ago as part of some Java scripting work). Coming back to them in the context of writing REs for a schema validator is frustrating, especially when I can't find resources that are complete or robust (as is the case with the online validators). Or, as Savalou said, not all matching algorithms are created equal.

Savalou,

About not figuring out your pattern and adding a "\.": well, as objects said, this post was originally just intended to help me figure out how to validate a simple repeating pattern. I've now seen about two dozen different, lengthy URI patterns, and I haven't the time to go through and understand each one until I find one that works as advertised. Hence I didn't spend time studying your pattern; I just tested it. So might I ask, is this your pattern or does it originate from another source? How well has it been tested? Not to sound lazy, but pattern matching/parsing of URIs and paths feels so much like reinventing the wheel (which I try to avoid).
objects commented:
Well, it makes it a little hard to give you exactly what you need if you don't tell us the full story :)

Anyway the RE I posted above should meet the requirements in this question.
Taurus (author) commented:
Objects, yes, what you gave seems to work with the ^ and $. I expanded it to:
^(\w+:\/\/)\w+(\.\w+)+(\/\w+).?\w+$
I haven't tested it much, but I'm getting the idea.
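One thing to watch in the expanded pattern: the unescaped `.?` matches any single character, not only a dot, so "mydir!txt" passes as well. Assuming a literal, optional dot was intended, `\.?` is the escaped form. Note also that the `(\/\w+)` group is required, so a bare "http://www.msn.com" is rejected by both versions. A small Java comparison; the class and constant names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical class name, for illustration only
public class ExpandedPattern {
    // The expanded pattern as posted, apart from Java string escaping
    static final Pattern AS_POSTED = Pattern.compile("^(\\w+:\\/\\/)\\w+(\\.\\w+)+(\\/\\w+).?\\w+$");
    // Same pattern with the dot escaped (assumes a literal "." was intended)
    static final Pattern ESCAPED = Pattern.compile("^(\\w+:\\/\\/)\\w+(\\.\\w+)+(\\/\\w+)\\.?\\w+$");

    static boolean accepted(Pattern p, String s) {
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"http://www.msn.com/mydir.txt", "http://www.msn.com/mydir!txt"}) {
            System.out.println(s + "  asPosted=" + accepted(AS_POSTED, s)
                    + " escaped=" + accepted(ESCAPED, s));
        }
    }
}
```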