Solved

Regex fine tuning again.

Posted on 2010-11-29
Last Modified: 2012-05-10




I'm building a small application to gather selling prices on eBay. I divided the script into 5 different preg_match_all sections, one for each of the 5 pieces of data I want to pull:

1) title
2) item number
3) bids
4) price
5) date

With the help of other Experts here I've been able to filter only the sold items. Everything works pretty well, but there are a few glitches and my script needs some fine tuning. Here is an example:

$match_count1 = preg_match_all('#class\s*=\s*"vip">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip"))*<span\s+class\s*=\s*"sold">)#is', $source, $title_arr);

// print_r($title_arr);
// print "<td><table border=1>";

// $title_arr[1] holds the captured titles of the sold items
if ($match_count1 > 0) {
    foreach ($title_arr[1] as $title) {
        echo "<tr><td><input type=\"text\" name=\"title[]\" size=\"75\" value=\"" . $title . "\"></td>";
    }
} else {
    print "No match found.";
}
 
This script allows me to get the title of an item. The line to scrape looks like this:

<a href=\"http://cgi.ebay.com/1969-O-PEE-CHEE-HOCKEY-146-NORM-FERGUSON-PSA-9-MINT-/150523384223?pt=US_Hockey_Trading_Cards&hash=item230be4919f\" class=\"vip\">1969 O-PEE-CHEE HOCKEY #146 NORM FERGUSON PSA 9 MINT</a>

But if the seller took the "bold" option, then the line looks like this:

<a href=\"http://cgi.ebay.com/1979-Topps-18-Wayne-Gretzky-Rookie-HOF-Oilers-PSA-7-/270666196846?pt=US_Hockey_Trading_Cards&hash=item3f04f6676e\" class=\"vip g-b\">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>

Not a huge difference here: we're going from class=\"vip\" to class=\"vip g-b\".

So the question is: how can I modify the preg_match_all regex to pick up both instances?

Thanks
Question by:gamebits
3 Comments
 
LVL 35

Expert Comment

by:Terry Woods
ID: 34235785
Try this:
$match_count1 = preg_match_all('#class\s*=\s*"vip( g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is',$source,$title_arr);

 
LVL 75

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 34235792
I believe changing it from

    vip

to

    vip[^"]*

in the pattern will do it. Be sure to escape that quote if you need to.
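
A minimal sketch of that change, for anyone who wants to try it in isolation. The $source markup here is made up for illustration (it is not real eBay output), and applying vip[^"]* in both places, the main match and the negative lookahead, is an assumption about what the suggestion intends.

```php
<?php
// Hypothetical sample markup: a plain sold listing, a bold sold listing,
// and an unsold listing. Real eBay pages will differ.
$source = '<a href="http://example.com/1" class="vip">Plain title</a> <span class="sold">Sold</span>' . "\n"
        . '<a href="http://example.com/2" class="vip g-b">Bold title</a> <span class="sold">Sold</span>' . "\n"
        . '<a href="http://example.com/3" class="vip">Unsold title</a> <span class="bids">0 bids</span>';

// Pattern from the question: only matches class="vip" exactly.
$original = '#class\s*=\s*"vip">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip"))*<span\s+class\s*=\s*"sold">)#is';

// Same pattern with vip replaced by vip[^"]* so that any extra class names
// after "vip" (such as "g-b") are accepted, in the match and in the lookahead.
$adjusted = '#class\s*=\s*"vip[^"]*">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is';

preg_match_all($original, $source, $before);
preg_match_all($adjusted, $source, $after);

print_r($before[1]); // titles found by the original pattern
print_r($after[1]);  // titles found by the adjusted pattern
```

With this sample, the original pattern picks up only the plain sold listing, while the adjusted one picks up both sold listings and still skips the unsold one. Because the number of capturing groups is unchanged, the title stays in index 1 of the result array.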
 
LVL 28

Author Comment

by:gamebits
ID: 34235860
@TerryAtOpus I'm losing everything else, and instead of the title I should be getting, all I have is g-b.

@kaufmed Yep, that did it, awesome.
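
A likely explanation for the "g-b" result, sketched under assumptions: in the first suggested pattern, ( g-b)? is a capturing group, so it becomes group 1 and the title moves to group 2, while the loop in the question still reads $title_arr[1]. Making that group non-capturing with (?: g-b)? should keep the title in group 1. The $source markup below is invented for illustration.

```php
<?php
// Hypothetical sample markup with one plain and one bold sold listing.
$source = '<a href="http://example.com/1" class="vip">Plain title</a> <span class="sold">Sold</span>' . "\n"
        . '<a href="http://example.com/2" class="vip g-b">Bold title</a> <span class="sold">Sold</span>';

// First suggestion from the thread: ( g-b)? is a capturing group,
// so the title is captured as group 2 instead of group 1.
$capturing = '#class\s*=\s*"vip( g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is';

// Same idea with non-capturing groups (?: ... ): the title stays in group 1.
$nonCapturing = '#class\s*=\s*"vip(?: g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip(?: g-b)?"))*<span\s+class\s*=\s*"sold">)#is';

preg_match_all($capturing, $source, $shifted);
preg_match_all($nonCapturing, $source, $fixed);

print_r($shifted[1]); // holds the optional " g-b" fragment, not the titles
print_r($shifted[2]); // the titles ended up here
print_r($fixed[1]);   // titles are back in group 1
```

Unlike vip[^"]*, this variant only accepts the one extra class name "g-b", so it would need updating if other class names appear.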