enigma1234567890

Regex to pull web links in Java

Does anyone know the regex pattern to go through a string, pull all web links from <a href attributes,
and display them in the form www.google.com?
a_b


ASKER

I want to do it via regex. I'm using
Pattern.compile("href=\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE)
This returns text like href="http://google.com". How do I drop the href part?

Thanks
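
For anyone reading along, here is a minimal sketch of one way to drop the href part, assuming the same pattern as in the post above. The parentheses in the pattern form capture group 1, so Matcher.group(1) returns only the text between the quotes, while group() returns the whole match including href=. The class name HrefExtractor and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HrefExtractor {
    // Same pattern as in the post; (.*?) is capture group 1.
    private static final Pattern HREF =
            Pattern.compile("href=\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE);

    // Returns only the quoted values, without the surrounding href="...".
    public static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            // m.group() would be the full match, e.g. href="http://google.com"
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample input
        String html = "<a href=\"http://google.com\">Google</a> "
                + "<A HREF=\"http://example.org/x\">X</A>";
        System.out.println(extract(html));
        // prints [http://google.com, http://example.org/x]
    }
}
```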
ASKER CERTIFIED SOLUTION
a_b

This solution is only available to members.
Finally, is there a way to not report blank href="" values?
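
The accepted answers are not visible here, so the following is only a sketch of one possible approach, not necessarily what was suggested. It assumes that replacing (.*?) with ([^\"]+) is acceptable: the + requires at least one character, and [^\"] cannot run past the closing quote, so href="" never matches. Simply switching to (.+?) would not be safe, because on href="" class="x" it could match across the empty value into the next attribute. The class name NonBlankHrefs is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class NonBlankHrefs {
    // [^\"]+ means one or more characters that are not a double quote,
    // so an empty value like href="" does not match at all.
    private static final Pattern HREF =
            Pattern.compile("href=\"([^\"]+)\"", Pattern.CASE_INSENSITIVE);

    public static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample input with one blank and one real link
        String html = "<a href=\"\">empty</a> <a href=\"http://google.com\">Google</a>";
        System.out.println(extract(html));
        // prints [http://google.com]
    }
}
```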
SOLUTION
Terry Woods

This solution is only available to members.
Could you explain what is taking place? For example, what do the following do:

?
\\\
^\\\


thanks
(?:blah) is a non-capturing sub pattern. It means match blah but don't capture it in the result.

[^abc] means match any one character that's not a or b or c

\\\" is used to match a double quote character. I'm probably not the best person to explain how escaping backslash characters works in Java, but the pattern is escaped twice: once for the Java string literal and once for the regex engine. People (including me) commonly get confused about how many backslashes to use.
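
To make the three pieces above concrete, here is a small sketch. The pattern (?:href|src)=\"([^\"]*)\" and the class name RegexPieces are invented for illustration and are not taken from the hidden solution. It shows that the Java literal "\\\"" is only two characters (a backslash and a quote), that the regex engine treats \" the same as a plain ", and that a (?:...) group does not count as a capture group.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexPieces {
    // (?:href|src) groups the alternatives without capturing them,
    // so the quoted value is still capture group 1.
    // [^\"]* matches any run of characters that are not a double quote.
    static final Pattern ATTR = Pattern.compile("(?:href|src)=\"([^\"]*)\"");

    public static String attributeValue(String tag) {
        Matcher m = ATTR.matcher(tag);
        return m.find() ? m.group(1) : null;
    }

    public static int captureGroups() {
        return ATTR.matcher("").groupCount();
    }

    public static void main(String[] args) {
        // The Java compiler turns \\ into one backslash and \" into one quote.
        System.out.println("\\\"".length());                 // prints 2
        // Both forms match a literal double quote in the input.
        System.out.println("a\"b".matches("a\\\"b"));        // prints true
        System.out.println("a\"b".matches("a\"b"));          // prints true
        System.out.println(attributeValue("<img src=\"a.png\">")); // prints a.png
        System.out.println(captureGroups());                  // prints 1
    }
}
```

Since a double quote has no special meaning in a regex, the shorter "\"" form should work just as well as "\\\"" in the patterns in this thread.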
You might find this cheat-sheet useful:
http://www.phpguru.org/article/pcre-cheat-sheet
Thanks all, will close the call.