Solved

Regex fine tuning again.

Posted on 2010-11-29
Last Modified: 2012-05-10




I'm building a small application to gather selling prices on eBay. I divided the script into five preg_match_all sections, one for each of the five pieces of data I want to pull:

1) title
2) item number
3) bids
4) price
5) date

With the help of other Experts here I've been able to filter only the sold items. Everything works pretty well, but there are a few glitches and my script needs some fine tuning. Here is an example:

$match_count1 = preg_match_all('#class\s*=\s*"vip">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip"))*<span\s+class\s*=\s*"sold">)#is', $source, $title_arr);

// print_r($title_arr);
// print "<td><table border=1>";

// Group 1 of each match holds the item title.
if ($match_count1 > 0) {
    foreach ($title_arr[1] as $title) {
        echo "<tr><td><input type=\"text\" name=\"title[]\" size=\"75\" value=\"" . $title . "\"></td>";
    }
} else {
    print "No match found.";
}
 
This script lets me get the title of an item. The line to scrape looks like this:

<a href=\"http://cgi.ebay.com/1969-O-PEE-CHEE-HOCKEY-146-NORM-FERGUSON-PSA-9-MINT-/150523384223?pt=US_Hockey_Trading_Cards&hash=item230be4919f\" class=\"vip\">1969 O-PEE-CHEE HOCKEY #146 NORM FERGUSON PSA 9 MINT</a>

But if the seller chose the "bold" option, then the line looks like this:

<a href=\"http://cgi.ebay.com/1979-Topps-18-Wayne-Gretzky-Rookie-HOF-Oilers-PSA-7-/270666196846?pt=US_Hockey_Trading_Cards&hash=item3f04f6676e\" class=\"vip g-b\">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>

Not a huge difference here; we're going from class=\"vip\" to class=\"vip g-b\".

So the question is: how can I modify the preg_match_all regex to pick up both instances?

Thanks
Question by:gamebits
3 Comments
 

Expert Comment

by:Terry Woods
ID: 34235785
Try this:
$match_count1 = preg_match_all('#class\s*=\s*"vip( g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is',$source,$title_arr);

 

Accepted Solution

by:
käµfm³d   👽 earned 500 total points
ID: 34235792
I believe changing it from

    vip

to

    vip[^"]*

in the pattern will do it. Be sure to escape that quote if you need to.
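
As a rough check, here is a minimal sketch of that change applied to the pattern from the question. The sample input is made up for illustration: the example.com URLs and the "Sold" span text are assumptions, and the real eBay markup will have more between the link and the sold marker.

```php
<?php
// Hypothetical sample input: two listing links modeled on the question,
// each followed by an assumed <span class="sold"> marker.
$source = '<a href="http://example.com/1" class="vip">1969 O-PEE-CHEE HOCKEY #146 NORM FERGUSON PSA 9 MINT</a>'
        . '<span class="sold">Sold</span>'
        . '<a href="http://example.com/2" class="vip g-b">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>'
        . '<span class="sold">Sold</span>';

// Same pattern as in the question, with vip replaced by vip[^"]* in both places.
// [^"]* accepts any extra class names up to the closing quote and adds no new group,
// so the title stays in group 1.
$pattern = '#class\s*=\s*"vip[^"]*">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip[^"]*"))*<span\s+class\s*=\s*"sold">)#is';

$match_count1 = preg_match_all($pattern, $source, $title_arr);

// Both the plain and the bold listing titles should be listed here.
print_r($title_arr[1]);
```

Because the pattern string is single-quoted in PHP, the double quote inside [^"] needs no escaping; it would need a backslash only if the pattern were moved into a double-quoted string.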
 

Author Comment

by:gamebits
ID: 34235860
@TerryAtOpus I'm losing everything else, and in place of the title I should be getting, all I have is g-b.

@kaufmed Yep, that did it, awesome.
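
A likely explanation for the g-b result, sketched under assumptions: in the first suggestion, ( g-b)? is a capturing group, so it becomes group 1 and the title moves to group 2, while the loop in the question still reads $title_arr[1]. The snippet below uses a made-up one-line input (the example.com URL and "Sold" text are hypothetical) and compares that pattern with a non-capturing (?: g-b)? variant, which is one possible alternative and not something proposed in the thread.

```php
<?php
// Hypothetical one-line input modeled on the bold listing in the question.
$source = '<a href="http://example.com/2" class="vip g-b">1979 Topps #18 Wayne Gretzky Rookie HOF Oilers PSA 7</a>'
        . '<span class="sold">Sold</span>';

// Capturing ( g-b)? becomes group 1, so the title moves to group 2.
$capturing = '#class\s*=\s*"vip( g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip( g-b)?"))*<span\s+class\s*=\s*"sold">)#is';
preg_match_all($capturing, $source, $with_capture);

// Non-capturing (?: g-b)? leaves the group numbering unchanged: title stays in group 1.
$non_capturing = '#class\s*=\s*"vip(?: g-b)?">([^<]*)</a>(?=(?:.(?!class\s*=\s*"vip(?: g-b)?"))*<span\s+class\s*=\s*"sold">)#is';
preg_match_all($non_capturing, $source, $without_capture);

var_dump($with_capture[1][0]);    // the optional class suffix, not the title
var_dump($with_capture[2][0]);    // the title, now one group later
var_dump($without_capture[1][0]); // the title, back in group 1
```

The vip[^"]* change avoids the issue in a different way: it adds no group at all, and it also tolerates class suffixes other than g-b.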