dmontgom
asked on
A regular expression for finding a relative path
Hi.
I need a regular expression for finding relative paths, e.g. ../img/xxx.gif or ./img/xxx.gif or img/xxx.gif, or in general ../path/to/file/xxx.gif
I am using Python 2.5.
Please provide a clear solution...I am not an expert in regex...
Thanks
better...
>>> import re
>>> def isRelative( path ) :
...     # True if the path contains a ./ or ../ component (either slash style)
...     RE = re.compile( r'(^|[\\/])\.\.?[\\/]' )
...     return RE.search( path ) is not None
...
>>> isRelative( 'a./b' )
False
>>> isRelative( './b' )
True
>>> isRelative( 'a../b' )
False
>>> isRelative( 'img/xxx.gif' )
False
>>> isRelative( '../path/to/file/xxx.gif' )
True
>>>
ah. I was thinking of an object method.
Nevermind. Go with the 2nd... (i.e., better)
ASKER
Hi,
My bad...I was not clear....
I have an HTML page and there are relative URLs in the JavaScript. I need to search the string and find patterns that look like a relative URL so I can do a find and replace.... I will be looking for files that have extensions, e.g. css, js, jpg, etc....
So...if given this string "adadfadf adafd ../test/test.gif adfadfaf" how can I do this for any arbitrary pattern?
Ah, so you want to locate the relative address substring(s) within a string?
Something like this?
>>> import re
>>> def Relative( str ) :
...     RE = re.compile( r'(^|[\\/])\.\.?[\\/]' )
...     result = []
...     # check each whitespace-separated token on its own
...     for data in str.split() :
...         if RE.search( data ) is not None :
...             result.append( data )
...     return result
...
>>> Relative( "adadfadf adafd ../test/test.gif adfadfaf" )
['../test/test.gif']
>>>
The answer HonorGod gave is quite good. It doesn't match the img/xxx.gif case you gave, but that's easily fixed.
The problem, however, isn't particularly well defined. For example, is "foo.gif" a relative path? Strictly speaking it is, but that may not be what you have in mind.
Are you looking for all paths that have at least one / (or \) and don't start at the root (don't begin with a /)? What's your platform (Windows, Linux, ...)?
Again, using HG's script above, where it splits into strings first, you can use this:
r'(^[^\\/].*[\\/].*)'
which basically says: 1) the string must not start with a / or \, and 2) there must be at least one / or \ somewhere after the first character.
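To make that concrete, here is a rough sketch of the pattern applied to whitespace-separated tokens. The names NOT_ROOTED and relative_tokens are made up for illustration, and the sample string is invented:

```python
import re

# Pattern from the post above: the first character is not a slash,
# and at least one / or \ appears somewhere later in the token.
NOT_ROOTED = re.compile(r'(^[^\\/].*[\\/].*)')

def relative_tokens(text):
    """Return the whitespace-separated tokens that look like non-rooted paths."""
    return [tok for tok in text.split() if NOT_ROOTED.search(tok)]

found = relative_tokens("adadfadf img/xxx.gif /abs/x.gif ../test/test.gif foo.gif")
print(found)
```

Note that a bare file name such as foo.gif is skipped, because the pattern requires at least one slash.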
ASKER
cool...almost there...here's a real example...
because the paths are between the ' ' it did not find them.....
tt = """ <body onload="MM_preloadImages('../images_home/home1.gif ','../images_home/started1.gif','../images_home/pricing1.gif','../images_home/success1.gif','../images_home/how1.gif','../images_home/about1.gif','../images_home/faqs1.gif','../images_home/home_step01.gif','../images_home/home_step02.gif','../images_home/home_step03.gif')">
"""
change the split above to:
for data in str.split("'") :
that's double quote, single quote, double quote... You don't have to escape the ' quote inside a double-quoted string... but if you do, it's
"\'"
basically, instead of splitting on whitespace, we're splitting on the single quotes
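Put together, the change might look something like the sketch below. The function name relative_quoted and the shortened sample string are only for illustration; the regex is the same one used earlier in the thread:

```python
import re

RE = re.compile(r'(^|[\\/])\.\.?[\\/]')

def relative_quoted(text):
    """Split on single quotes and keep the pieces that contain ./ or ../"""
    result = []
    for data in text.split("'"):
        if RE.search(data) is not None:
            result.append(data)
    return result

sample = "MM_preloadImages('../images_home/home1.gif','../images_home/started1.gif')"
paths = relative_quoted(sample)
print(paths)
```

This only helps for paths wrapped in single quotes; paths inside double quotes would need a second split or a different approach.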
no need for regular expression
tt = """ <body onload="MM_preloadImages('../images_home/home1.gif ','../images_home/started1.gif','../images_home/pricing1.gif','../images_home/success1.gif','../images_home/how1.gif','../images_home/about1.gif','../images_home/faqs1.gif','../images_home/home_step01.gif','../images_home/home_step02.gif','../images_home/home_step03.gif')">
"""
for item in tt.split(","):
    # cut each piece down to the text between "../" and the closing quote
    item = item[ item.index("../"):]
    print(item[: item.index("'")])
ASKER
What I need is a regex...there is a larger scope here...
I use Beautiful Soup to parse HTML tags....
I need a regex to handle arbitrary complexity....
ASKER
PS just imagine a big HTML file.....is there a regex way to find all relative links? e.g. / ./ ../ ../../ etc....
HonorGod's code worked great for all cases...just not if there is a quote, e.g. ' ' or " ". Also it won't work if the rel path is e.g. images/test.gif.
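For the "big HTML file" case, one possible approach is a single pattern run over the whole text with findall, so no splitting on whitespace or quotes is needed. This is only a sketch under some assumptions: the extension list, the names REL_URL and find_relative_urls, and the sample markup are all invented for illustration, and the pattern has not been tried against real-world pages.

```python
import re

# Assumed extension list; adjust to the file types you care about.
REL_URL = re.compile(r'''
    (?<![\w./:\\-])             # not in the middle of a longer path or URL
    (?:\.\.?/)*                 # any number of leading ./ or ../ parts
    (?:[\w-]+/)*                # optional directory segments
    [\w-]+                      # file name
    \.(?:gif|jpe?g|png|css|js)  # one of the wanted extensions
    (?!\w)                      # extension must end here (skips .json, .jsp)
''', re.VERBOSE)

def find_relative_urls(html):
    """Return every substring that looks like a relative path to a known file type."""
    return REL_URL.findall(html)

html = ('<body onload="MM_preloadImages(\'../images_home/home1.gif\','
        '\'../images_home/started1.gif\')">'
        '<img src="images/test.gif">'
        '<a href="http://example.com/pics/logo.gif">'
        '<link href="/static/site.css">'
        '<script src="./js/app.js"></script>')

links = find_relative_urls(html)
print(links)
```

The lookbehind is what rejects rooted paths (/static/site.css) and full URLs (http://...), since every candidate start inside them is preceded by a slash, dot, colon, or word character. Bare file names such as foo.gif also match, which may or may not be wanted. For the find-and-replace step, REL_URL.sub accepts either a replacement string or a function that receives each match.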
Thanks for the grade & points.
Good luck and have a great day.