Solved

Regex fine tuning again.

Posted on 2010-11-29
Last Modified: 2012-05-10




I'm building a small application to gather sale prices on eBay. I divided the script into 5 different preg_match_all sections, one for each of the 5 pieces of data I want to pull:

1) title
2) item number
3) bids
4) price
5) date

With the help of other Experts here I've been able to filter only the sold items. Everything works pretty well, but there are a few glitches and my script needs some fine tuning. Here is an example:

    $match_count1 = preg_match_all('#class\s*=\s*"vip">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip"))*<span\s+class\s*=\s*"sold">)#is', $source, $title_arr);

    // print_r($title_arr);

    // print "<td><table border=1>";

    if ($match_count1 > 0) {
        // $title_arr[1] holds the text captured by the first group: the item titles
        foreach ($title_arr[1] as $title) {
            echo "<tr><td><input type=\"text\" name=\"title[]\" size=\"75\" value=\"" . $title . "\"></td>";
        }
    } else {
        print "No match found.";
    }
 
This script allows me to get the title of an item. The line to scrape looks like this:

<a href=\"http://cgi.ebay.com/1969-O-PEE-CHEE-HOCKEY-146-NORM-FERGUSON-PSA-9-MINT-/150523384223?pt=US_Hockey_Trading_Cards&amp;hash=item230be4919f\" class=\"vip\">1969 O-PEE-CHEE HOCKEY #146 NORM FERGUSON PSA 9 MINT</a>

But if the seller took the "bold" option, then the line looks like this:

<a href=\"http://cgi.ebay.com/1979-Topps-18-Wayne-Gretzky-Rookie-HOF-Oilers-PSA-7-/270666196846?pt=US_Hockey_Trading_Cards&amp;hash=item3f04f6676e\" class=\"vip g-b\">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>

Not a huge difference here: we're going from class=\"vip\" to class=\"vip g-b\".

So the question is: how can I modify the preg_match_all regex to pick up both instances?

Thanks
Question by:gamebits
3 Comments
 

Expert Comment

by:Terry Woods
Try this:
$match_count1 = preg_match_all('#class\s*=\s*"vip( g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is',$source,$title_arr);

 

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
I believe changing it from

    vip

to

    vip[^"]*

in the pattern will do it. Be sure to escape that quote if you need to.
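As a minimal sketch of that change, the snippet below assumes vip[^"]* is substituted in both places where the original pattern tests the class attribute (the opening match and the negative lookahead). The $source string is a made-up sample, not real eBay markup, and the variable names are only illustrative.

```php
<?php
// Hypothetical sample markup (an assumption for illustration, not real eBay output).
$source = '<a href="#1" class="vip">Plain sold item</a> <span class="sold">Sold</span>'
        . '<a href="#2" class="vip g-b">Bold sold item</a> <span class="sold">Sold</span>'
        . '<a href="#3" class="vip">Unsold item</a> <span class="bids">0 bids</span>';

// "vip" replaced by vip[^"]* so any extra class names before the closing quote are accepted.
// The pattern sits in a single-quoted PHP string, so the double quotes need no escaping.
$pattern = '#class\s*=\s*"vip[^"]*">([^<]*)</a>'
         . '(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is';

$match_count = preg_match_all($pattern, $source, $title_arr);

// Group 1 is still the title, so the rest of the script can stay unchanged.
$titles = $title_arr[1];
print_r($titles);
```

With this sample, $titles should hold the two sold titles (the plain one and the bold one) and skip the unsold item.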
 

Author Comment

by:gamebits
@TerryAtOpus I'm losing everything else, and instead of the title I should be getting, all I have is g-b.

@kaufmed Yep, that did it, awesome.
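
A likely reason the first suggestion returned g-b: ( g-b)? is a capturing group, so it becomes group 1 and the title moves to group 2, while the loop still reads $title_arr[1]. The sketch below shows the same idea with a non-capturing group, (?: g-b)?, which should keep the title in group 1. The $source sample is invented for illustration.

```php
<?php
// Hypothetical sample markup (an assumption for illustration, not real eBay output).
$source = '<a href="#1" class="vip">Plain sold item</a> <span class="sold">Sold</span>'
        . '<a href="#2" class="vip g-b">Bold sold item</a> <span class="sold">Sold</span>';

// Capturing version: ( g-b)? takes group 1, so the title ends up in group 2.
$capturing = '#class\s*=\s*"vip( g-b)?">([^<]*)</a>'
           . '(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is';
preg_match_all($capturing, $source, $shifted);

// Non-capturing version: (?: g-b)? groups without capturing, so the title stays in group 1.
$non_capturing = '#class\s*=\s*"vip(?: g-b)?">([^<]*)</a>'
               . '(?=(?:.(?!class\s*=\s*"vip(?: g-b)?"))*<span\s+class\s*=\s*"sold">)#is';
preg_match_all($non_capturing, $source, $title_arr);

print_r($shifted[2]);    // titles, now under index 2
print_r($title_arr[1]);  // titles, back under index 1
```

This variant only covers the exact " g-b" suffix, whereas vip[^"]* accepts any additional class names.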
