enigma1234567890
regex to pull web links in java
Does anyone know a regex pattern to go through a string, pull all web links from <a href attributes,
and display them in the form www.google.com?
http://www.java2s.com/Code/Java/Network-Protocol/Getallhyperlinksfromawebpage.htm
Actually try this - http://www.java2s.com/Code/Java/Network-Protocol/ExtractlinksfromanHTMLpage.htm
ASKER
I want to do it via regex.
Use Pattern.compile("href=\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE)
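For anyone wanting to see that pattern in a loop, here is a minimal sketch. The class and method names (LinkExtractor, extractLinks, hostOf) are made up for illustration, and using java.net.URI to reduce a link to the www.google.com form is my assumption about what the asker wants, not something from the thread.

```java
import java.net.URI;
import java.net.URISyntaxException;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class LinkExtractor {
    // The pattern suggested above: href="..." with the value captured lazily in group 1.
    private static final Pattern HREF =
            Pattern.compile("href=\\\"(.*?)\\\"", Pattern.CASE_INSENSITIVE);

    // Collects the raw value of every href="..." found in the input.
    static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    // Reduces a link to its host; returns null for relative or malformed links.
    static String hostOf(String link) {
        try {
            return new URI(link).getHost();
        } catch (URISyntaxException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String html = "<a href=\"http://www.google.com/search?q=java\">Google</a> "
                    + "<A HREF=\"http://www.java2s.com/\">java2s</A>";
        for (String link : extractLinks(html)) {
            System.out.println(hostOf(link)); // www.google.com, then www.java2s.com
        }
    }
}
```

Note that this only handles double-quoted href values. Single-quoted or unquoted attributes would need a different pattern or a real HTML parser.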
ASKER
Finally, is there a way to not report blank href=""?
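One possible way to skip blank values, as a sketch only: require at least one non-quote character inside the quotes by using a negated character class with +. The class name NonBlankLinks and the extra trim check for whitespace-only values are my own assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
class NonBlankLinks {
    // [^\"]+ needs one or more characters that are not a double quote,
    // so href="" can never match.
    private static final Pattern HREF =
            Pattern.compile("href=\\\"([^\\\"]+)\\\"", Pattern.CASE_INSENSITIVE);

    static List<String> extract(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String value = m.group(1).trim();
            // Assumption: whitespace-only values count as blank too.
            if (!value.isEmpty()) {
                links.add(value);
            }
        }
        return links;
    }

    public static void main(String[] args) {
        String html = "<a href=\"\">empty</a> <a href=\"http://www.google.com\">Google</a>";
        System.out.println(extract(html)); // [http://www.google.com]
    }
}
```

Just swapping (.*?) for (.+?) would not be enough: on input like href="" class="x" the lazy group can run past the empty value and capture `" class=` instead.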
ASKER
Could you explain what is taking place? For example, what do the following do?
?
\\\
^\\\
Thanks
(?:blah) is a non-capturing sub pattern. It means match blah but don't capture it in the result.
[^abc] means match any one character that's not a or b or c
\\\" is used to match a double quote character. Inside a Java string literal, \\ produces a single backslash and \" produces a double quote, so the regex engine actually receives \" , which matches a literal double quote. People (including me) commonly get confused about how many backslashes to use.
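To make the three points above concrete, here is a small sketch that prints what the regex engine actually receives and checks each construct. The class name EscapingDemo, the firstGroup helper and the (?:href|src) pattern are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class, not part of any library.
class EscapingDemo {
    // Java literal with three backslashes: \\ becomes \ and \" becomes "
    static final String THREE_BACKSLASHES = "href=\\\"(.*?)\\\"";
    // Java literal with one backslash: \" becomes " (no backslash reaches the regex)
    static final String ONE_BACKSLASH = "href=\"(.*?)\"";

    // Returns group 1 of the first match, or null if nothing matches.
    static String firstGroup(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(THREE_BACKSLASHES); // href=\"(.*?)\"
        System.out.println(ONE_BACKSLASH);     // href="(.*?)"

        // Both forms match the same text, because \" in a regex is just a quote.
        String input = "<a href=\"http://www.google.com\">";
        System.out.println(firstGroup(THREE_BACKSLASHES, input)); // http://www.google.com
        System.out.println(firstGroup(ONE_BACKSLASH, input));     // http://www.google.com

        // (?:...) groups without capturing, so only the quoted value is group 1.
        Pattern p = Pattern.compile("(?:href|src)=\"([^\"]*)\"");
        System.out.println(p.matcher("").groupCount()); // 1

        // [^abc] matches any single character except a, b or c.
        System.out.println("d".matches("[^abc]")); // true
        System.out.println("a".matches("[^abc]")); // false
    }
}
```

So the extra backslashes in \\\" are harmless but not required for a double quote. They are required for regex escapes such as \\d or \\s, where the backslash itself has to reach the regex engine.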
You might find this cheat-sheet useful:
http://www.phpguru.org/article/pcre-cheat-sheet
ASKER
Thanks all, I will close the call.