stuayre


extract text from html file

Hi,

I need a function to extract some data from inside an HTML string.

The information repeats itself on the page. Here's an example, in a <tr>:

<tr><td align=left class="stdtext">AA-T128</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T128">Coriander Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£2.67</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T128" value="0"></td><td align=right class="stdtext"></td></tr>

I need to extract the product code (AA-T128) and the stock (2).

Any ideas?

cheers

Stu

Geert G

Did you delete parts of the text?
Some parts are invalid:
<a href="">Coriander Oil 10ml</td>

should be
<a href="">Coriander Oil 10ml</a>

and the first </a> is missing its opening <a> tag too ...

Or isn't this the problem?
stuayre

ASKER

Hi,

You're right, I think that's a mistake on the website's part. Here's a bigger chunk of code.

It's not the problem though :)

Stu
<tr><td align=left class="stdtext">AA-T118</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T118">Mandarin Oil 10ml</td><td align=right class="stdtext">1</td><td align=right class="stdtext">£2.67</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T118" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code48" value="AA-T119"><tr><td align=left class="stdtext">AA-T119</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T119">Myrrh Oil 5ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£3.37</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T119" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code49" value="AA-T1191"><tr><td align=left class="stdtext">AA-T1191</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T1191">Myrrh Oil 10ml</td><td align=right class="stdtext">1</td><td align=right class="stdtext">£5.39</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T1191" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code50" value="AA-T120"><tr><td align=left class="stdtext">AA-T120</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T120">Patchouli Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£2.29</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T120" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code51" value="AA-T121"><tr><td align=left class="stdtext">AA-T121</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T121">Peppermint English Oil 10ml</td><td align=right class="stdtext">6</td><td align=right class="stdtext">£2.42</td><td align=right 
class="stdtext"><input type=text size=4 name="qty_AA-T121" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code52" value="AA-T122"><tr><td align=left class="stdtext">AA-T122</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T122">Pine Scotch Oil 10ml</td><td align=right class="stdtext">3</td><td align=right class="stdtext">£2.42</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T122" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code53" value="AA-T123"><tr><td align=left class="stdtext">AA-T123</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T123">Rosemary Oil 10ml</td><td align=right class="stdtext">3</td><td align=right class="stdtext">£2.29</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T123" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code54" value="AA-T124"><tr><td align=left class="stdtext">AA-T124</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T124">Sandalwood Oil 5ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£5.12</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T124" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code55" value="AA-T1241"><tr><td align=left class="stdtext">AA-T1241</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T1241">Sandalwood Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£9.02</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T1241" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code56" value="AA-T125"><tr><td align=left class="stdtext">AA-T125</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T125">Tea Tree Oil 10ml</td><td align=right class="stdtext">8</td><td align=right class="stdtext">£2.21</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T125" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code57" value="AA-T1251"><tr><td align=left class="stdtext">AA-T1251</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T1251">Tea Tree Oil 30ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£4.50</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T1251" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code58" value="AA-T126"><tr><td align=left class="stdtext">AA-T126</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T126">Ylang Ylang I Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£3.10</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T126" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code59" value="AA-T127"><tr><td align=left class="stdtext">AA-T127</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T127">Cedarwood Atlas Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£2.13</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T127" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code60" value="AA-T128"><tr><td align=left class="stdtext">AA-T128</a></td><td align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T128">Coriander Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£2.67</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T128" value="0"></td><td 
align=right class="stdtext"></td></tr>


Actually, it will be a problem.
You would normally use a parser to extract the value from a structure.

The structure is invalid, so a parser will fail to extract the value.
It's full of mistakes; there's not much you can do with that.
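
Whether a parser fails on markup like this depends on how strict it is. As a rough sketch only (Python is used here purely for illustration; the class name `RowCollector` and function name `extract_with_parser` are made up for this example), a lenient event-based parser such as Python's standard `html.parser` reports stray or unclosed tags as ordinary events, so the cell text can still be collected row by row:

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collects the text of every <td> cell, grouped by <tr> row.

    Hypothetical helper for illustration; it ignores any tag other
    than <tr> and <td>, so the stray </a> and the unclosed <a> in the
    sample markup are simply skipped.
    """

    def __init__(self):
        super().__init__()
        self.rows = []         # finished rows, each a list of cell strings
        self._cells = None     # cells of the row being read, None outside <tr>
        self._in_cell = False  # True while inside a <td>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._cells = []
        elif tag == "td" and self._cells is not None:
            self._cells.append("")
            self._in_cell = True

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._cells is not None:
            self.rows.append(self._cells)
            self._cells = None

    def handle_data(self, data):
        if self._in_cell and self._cells:
            self._cells[-1] += data


def extract_with_parser(html):
    """Return (product_code, stock) pairs, assuming the code is the first
    cell and the stock is the third cell of each row, as in the sample."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    return [
        (cells[0].strip(), int(cells[2]))
        for cells in parser.rows
        if len(cells) >= 3 and cells[2].strip().isdigit()
    ]


SAMPLE = (
    '<tr><td align=left class="stdtext">AA-T128</a></td>'
    '<td align=left class="stdtext">'
    '<a href="http://www.domain.com/shops/directlink.asp?name=AA-T128">'
    'Coriander Oil 10ml</td>'
    '<td align=right class="stdtext">2</td>'
    '<td align=right class="stdtext">£2.67</td>'
    '<td align=right class="stdtext">'
    '<input type=text size=4 name="qty_AA-T128" value="0"></td>'
    '<td align=right class="stdtext"></td></tr>'
)

print(extract_with_parser(SAMPLE))
```

This relies on the column order staying fixed (code first, stock third); a strict XML parser would likely reject the same input.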

stuayre

ASKER

Is it possible to write a regular expression to get everything between

<tr><td align=left class="stdtext">     and    </a></td><td align=left class="stdtext"><a href

to get the product code

and then another one to get everything between..

</td><td align=right class="stdtext">    and   </td><td align=right class="stdtext">£

to get the qty ?

I found this question, if it helps:

https://www.experts-exchange.com/questions/22104497/Extract-data-from-HTML-Tables-form-post.html

cheers

Stu
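
The two delimiter pairs described above can be folded into a single regular expression. The following is a minimal sketch under stated assumptions, not the accepted solution from this thread: it is written in Python for illustration, the names `ROW_PATTERN` and `extract_with_regex` are hypothetical, and it assumes every row follows exactly the layout shown in the sample (code cell ending in the stray `</a>`, stock cell directly before the `£` price cell).

```python
import re

# Cell openers as they appear in the sample; \s+ tolerates line breaks
# between attributes.
LEFT_TD = r'<td\s+align=left\s+class="stdtext">'
RIGHT_TD = r'<td\s+align=right\s+class="stdtext">'

# code  = text between the first left-aligned cell and the stray </a>
# stock = digits in the right-aligned cell just before the price cell
ROW_PATTERN = re.compile(
    r"<tr>" + LEFT_TD + r"(?P<code>[^<]+)</a></td>"
    r".*?"  # lazily skip the product-name cell
    + RIGHT_TD + r"(?P<stock>\d+)</td>"
    + RIGHT_TD + "£",
    re.DOTALL,
)


def extract_with_regex(html):
    """Return a list of (product_code, stock) tuples found in html."""
    return [
        (m.group("code"), int(m.group("stock")))
        for m in ROW_PATTERN.finditer(html)
    ]


sample = (
    '<tr><td align=left class="stdtext">AA-T118</a></td>'
    '<td align=left class="stdtext">'
    '<a href="http://www.domain.com/shops/directlink.asp?name=AA-T118">'
    'Mandarin Oil 10ml</td>'
    '<td align=right class="stdtext">1</td>'
    '<td align=right class="stdtext">£2.67</td>'
    '<td align=right class="stdtext">'
    '<input type=text size=4 name="qty_AA-T118" value="0"></td>'
    '<td align=right class="stdtext"></td></tr>'
    '<input type=hidden name="code48" value="AA-T119">'
    '<tr><td align=left class="stdtext">AA-T119</a></td>'
    '<td align=left class="stdtext">'
    '<a href="http://www.domain.com/shops/directlink.asp?name=AA-T119">'
    'Myrrh Oil 5ml</td>'
    '<td align=right class="stdtext">2</td>'
    '<td align=right class="stdtext">£3.37</td>'
    '<td align=right class="stdtext">'
    '<input type=text size=4 name="qty_AA-T119" value="0"></td>'
    '<td align=right class="stdtext"></td></tr>'
)

for code, stock in extract_with_regex(sample):
    print(code, stock)
```

The pattern is tied to the exact attribute order and quoting of this page, so any change to the site's markup would need a matching change here. A row whose stock cell is empty or non-numeric would not match cleanly and could cause the lazy `.*?` to run into the following row.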
This small class is rather helpful for this kind of tag parsing:
http://fit.c2.com/Release/Source/fit/Parse.java
ASKER CERTIFIED SOLUTION
Geert G
stuayre

ASKER

Hi Geert_Gruwez, it almost works.

It came back with this in memo2:


AA-T118
AA-T1191</a></td><td 
align=left class="stdtext"><a href="http://www.domain.com/shops/directlink.asp?name=AA-T1191">Myrrh Oil 10ml</td><td align=right class="stdtext">1</td><td align=right class="stdtext">£5.39</td><td align=right class="stdtext"><input type=text 
size=4 name="qty_AA-T1191" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code50" value="AA-T120"><tr><td align=left class="stdtext">AA-T120</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T120">Patchouli Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£2.29</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-
T120" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code51" value="AA-T121"><tr><td align=left class="stdtext">AA-T121</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T121">Peppermint English Oil 10ml</td><td align=right class="stdtext">6</td><td align=right class="stdtext">£2.42</td><td align=right class="stdtext"><input type=text size=4 
name="qty_AA-T121" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code52" value="AA-T122"><tr><td align=left class="stdtext">AA-T122</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T122">Pine Scotch Oil 10ml</td><td align=right class="stdtext">3</td><td align=right class="stdtext">£2.42</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-
T122" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code53" value="AA-T123"><tr><td align=left class="stdtext">AA-T123</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T123">Rosemary Oil 10ml</td><td align=right class="stdtext">3</td><td align=right class="stdtext">£2.29</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-
T123" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code54" value="AA-T124"><tr><td align=left class="stdtext">AA-T124</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T124">Sandalwood Oil 5ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£5.12</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-
T124" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code55" value="AA-T1241"><tr><td align=left class="stdtext">AA-T1241</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T1241">Sandalwood Oil 10ml</td><td align=right class="stdtext">2</td><td align=right class="stdtext">£9.02</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA
-T1241" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code56" value="AA-T125"><tr><td align=left class="stdtext">AA-T125</a></td><td align=left class="stdtext"><a 
href="http://www.domain.com/shops/directlink.asp?name=AA-T125">Tea Tree Oil 10ml</td><td align=right class="stdtex
</td><td align=right class="stdtext">£3.10</td><td align=right class="stdtext"><input type=text size=4 name="qty_AA-T126" value="0"></td><td align=right class="stdtext"></td></tr><input type=hidden name="code59" value="AA-T127"><tr><td 
align=left class="stdtext">AA-T127


stuayre

ASKER

Oh sorry, I had word wrap on... doh!
stuayre

ASKER

Thanks :)