Solved

Regex fine tuning again.

Posted on 2010-11-29
Last Modified: 2012-05-10




I'm building a small application to gather selling prices on eBay. I divided the script into 5 separate preg_match_all sections, one for each of the 5 pieces of data I want to pull:

1) title
2) item number
3) bids
4) price
5) date

With the help of other Experts here I've been able to filter only the sold items. Everything works pretty well, but there are a few glitches and my script needs some fine tuning. Here is an example:

    $match_count1 = preg_match_all('#class\s*=\s*"vip">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip"))*<span\s+class\s*=\s*"sold">)#is', $source, $title_arr);

    // print_r($title_arr);
    // print "<td><table border=1>";

    if ($match_count1 > 0) {
        // $title_arr[1] holds the text captured by the first (and only) group
        foreach ($title_arr[1] as $title) {
            echo "<tr><td><input type=\"text\" name=\"title[]\" size=\"75\" value=\"" . $title . "\"></td>";
        }
    } else {
        print "No match found.";
    }

 
This script lets me get the title of an item. The line to scrape looks like this:

<a href=\"http://cgi.ebay.com/1969-O-PEE-CHEE-HOCKEY-146-NORM-FERGUSON-PSA-9-MINT-/150523384223?pt=US_Hockey_Trading_Cards&hash=item230be4919f\" class=\"vip\">1969 O-PEE-CHEE HOCKEY #146 NORM FERGUSON PSA 9 MINT</a>

But if the seller chose the "bold" option, then the line looks like this:

<a href=\"http://cgi.ebay.com/1979-Topps-18-Wayne-Gretzky-Rookie-HOF-Oilers-PSA-7-/270666196846?pt=US_Hockey_Trading_Cards&hash=item3f04f6676e\" class=\"vip g-b\">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>

Not a huge difference here: we're going from class=\"vip\" to class=\"vip g-b\".

So the question is: how can I modify the preg_match_all regex to pick up both instances?

Thanks
Question by:gamebits
3 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 34235785
Try this:
$match_count1 = preg_match_all('#class\s*=\s*"vip( g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is',$source,$title_arr);

 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 34235792
I believe changing it from

    vip

to

    vip[^"]*

in the pattern will do it. Be sure to escape that quote if you need to.
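
A minimal sketch of that change, assuming the markup looks like the two sample lines in the question. The `$source` string here is a made-up stand-in for the scraped page (the `href="#"` values and the `<span class="sold">` fragments are assumptions inferred from the original pattern), not real eBay output.

```php
<?php
// Hypothetical stand-in for the scraped page: one plain listing and one
// "bold" listing, each followed by the sold marker the pattern looks for.
$source = '<a href="#" class="vip">1969 O-PEE-CHEE HOCKEY #146 NORM FERGUSON PSA 9 MINT</a>'
        . '<span class="sold">Sold</span>'
        . '<a href="#" class="vip g-b">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>'
        . '<span class="sold">Sold</span>';

// Same pattern as in the question, with vip changed to vip[^"]* in both places,
// so any extra class names after "vip" are accepted up to the closing quote.
$pattern = '#class\s*=\s*"vip[^"]*">([^<]*)</a>'
         . '(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is';

$match_count1 = preg_match_all($pattern, $source, $title_arr);

// The title is still the first capture group, so the existing loop is unchanged.
foreach ($title_arr[1] as $title) {
    echo $title, "\n";
}
```

Because `[^"]*` adds no parentheses, the title stays in `$title_arr[1]` and the rest of the script does not need to change.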
 
LVL 28

Author Comment

by:gamebits
ID: 34235860
@TerryAtOpus I'm losing everything else, and instead of the title I should be getting, all I have is g-b

@kaufmed Yep, that did it, awesome.
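
For anyone wondering where the "g-b" came from: capture groups are numbered by their opening parenthesis, so adding `( g-b)?` ahead of the title makes it group 1 and pushes the title to group 2, while the loop still reads `$title_arr[1]`. A rough sketch of the difference, using a simplified pattern without the lookahead and a hypothetical one-line `$source` (both are assumptions for illustration):

```php
<?php
// Hypothetical sample based on the "bold" listing in the question
$source = '<a href="#" class="vip g-b">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>';

// Capturing group: ( g-b)? becomes group 1, pushing the title to group 2
preg_match('#class\s*=\s*"vip( g-b)?">([^<]*)</a>#is', $source, $capturing);

// Non-capturing group: (?: g-b)? is not numbered, so the title stays in group 1
preg_match('#class\s*=\s*"vip(?: g-b)?">([^<]*)</a>#is', $source, $non_capturing);

echo $capturing[1], "\n";      // prints " g-b"
echo $capturing[2], "\n";      // prints the title
echo $non_capturing[1], "\n";  // prints the title
```

So that pattern could also be made to work by switching to `(?: g-b)?` or by reading group 2, though it would only cover the g-b class and not any other extra class name.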
