How do I extract text from Google Scholar's HTML tags using regular expressions?
Posted on 2010-01-10
I need to extract the authors' names from the HTML of a Google Scholar results page, using Java.
For example, I need to extract "R Bañuelos, RG Smits" from the following HTML, using regular expressions.
Could someone advise me on how to do this?
<br><span class=gs_a>R Bañuelos, RG Smits - Probability Theory and Related Fields, 1997 - Springer</span><br>Summary. We study the asymptotic behavior of Brownian motion and its conditioned process <br>
in cones using an infinite series representation of its transition density. A concise probabilistic <br>
interpretation of this series in terms of the skew product decomposition of Brownian <b> ...</b> <br><span class=gs_fl><a href="/scholar?cites=726791209970358048&hl=en&as_sdt=2000">Cited by 52</a> - <a href="/scholar?q=related:INcoOMEUFgoJ:scholar.google.com/&hl=en&as_sdt=2000">Related articles</a>
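
One possible approach, as a rough sketch rather than a definitive solution: match the contents of the <span class=gs_a> element with java.util.regex, strip any nested tags, and keep the text before the first " - " separator. The class name ScholarAuthorExtractor and the method extractAuthors are made up for illustration. The sketch assumes the byline always has the form "authors - venue, year - publisher", as in the sample above; Google Scholar's markup may differ or change, and an HTML parser is generally more robust than regular expressions for anything beyond a simple case like this.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, named here only for illustration.
class ScholarAuthorExtractor {

    // Captures everything inside <span class=gs_a>...</span>.
    // The class value may be quoted or unquoted; (.*?) is reluctant so
    // each match stops at the first closing </span>.
    private static final Pattern GS_A = Pattern.compile(
            "<span\\s+class=\"?gs_a\\b\"?[^>]*>(.*?)</span>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns one author string per gs_a span found in the HTML.
    static List<String> extractAuthors(String html) {
        List<String> results = new ArrayList<>();
        Matcher m = GS_A.matcher(html);
        while (m.find()) {
            // Drop any tags nested inside the byline.
            String byline = m.group(1).replaceAll("<[^>]+>", "");
            // Assumption: authors come before the first " - " separator.
            int sep = byline.indexOf(" - ");
            String authors = sep >= 0 ? byline.substring(0, sep) : byline;
            results.add(authors.trim());
        }
        return results;
    }

    public static void main(String[] args) {
        // Shortened copy of the sample; \u00f1 is the character ñ.
        String html = "<br><span class=gs_a>R Ba\u00f1uelos, RG Smits - "
                + "Probability Theory and Related Fields, 1997 - Springer</span><br>Summary.";
        System.out.println(extractAuthors(html));
    }
}
```

To get the individual names, the returned string can then be split on ", ".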