Solved

php regular expression question?

Posted on 2011-09-12
Last Modified: 2012-05-12
I have the following 3 lines of text and need a single regular expression to capture only the data. The data can contain any characters. The word test in the first line can be any of (test1|test2|test3|test4).

1. test1 data- data  
1. data- data
data- data

Here's what I tried, but it's not working:

$remove_test_name = "(?:test1|test2|test3|test4)";
preg_match("/(?:\d+\.)? $remove_test_name (.*)\-(.*)/",$string,$matches);
echo $matches[1] . "-" . $matches[2];

The output I'm getting for line 1 (mentioned above) is:

:test1 data

For line 2:

:data

For line 3:

:

What I would like is this:

data: data
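
A diagnostic sketch, added for illustration (it is not code from the thread): running the attempted pattern against the three sample lines shows that only the first line can match. The test-name group is mandatory and the pattern needs a literal space on each side of it, so for lines 2 and 3 `preg_match` returns 0 and `$matches` is left empty. The trailing spaces on the sample lines are dropped here.

```php
<?php
// The pattern from the question, built exactly as posted.
$remove_test_name = "(?:test1|test2|test3|test4)";
$pattern = "/(?:\d+\.)? $remove_test_name (.*)\-(.*)/";

$lines = ['1. test1 data- data', '1. data- data', 'data- data'];

$results = [];
foreach ($lines as $line) {
    // preg_match returns 1 on a match and 0 when the pattern does not match.
    $found = preg_match($pattern, $line, $matches);
    $results[$line] = $found;
    echo $line, ' => ', $found === 1 ? "[{$matches[1]}] [{$matches[2]}]" : 'no match', PHP_EOL;
}
```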
Question by:areyouready344

Expert Comment

by:Ray Paseur
ID: 36525682
I do not understand the example.  Can you please give us some real-world input and show us what you want to get out from it?  Thanks. ~Ray

Expert Comment

by:käµfm³d 👽
ID: 36525716
How about this?
<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d*\.?\s*(?:test\d+)?\s*#m';

	$result = preg_replace($pattern, "", $source);

	echo $result;
?>

Author Comment

by:areyouready344
ID: 36525746
For example, I have a file that has the following lines:

note:
          test1 - could also be test2 or test3 or test4
          data - the data could be any characters except test1 or test2 or test3 or test4

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

Expert Comment

by:käµfm³d 👽
ID: 36525791
Perhaps I should change my pattern a tad then:

$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

Expert Comment

by:Ray Paseur
ID: 36525829
Do you have spaces in the data-data fields?  I am asking because this looks like a made-up generalization and for better or worse regular expressions tend to be easier to write correctly when you have a few accurate examples of the inputs and the corresponding accurate examples of the desired outputs.

Example:
"1. data-data" is expected to yield "data- data", but do you really want to insert a blank after the hyphen?  It's these kinds of seemingly unimportant things that can cause a lot of debugging time.

Author Comment

by:areyouready344
ID: 36525994
This is the closest answer I can get:

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);

The only problem with this answer is that test1, test2, test3, or test4 still shows up as part of the capture ($matches[1]).

Expert Comment

by:Terry Woods
ID: 36526067
Building on kaufmed's code, this works for me:

Output is (source and result shown):
1. test1 data-data
1. data-data
data-data
------
data-data
data-data
data-data
<?php

  $source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

  $remove_test_name = "test[1-4]";
  $pattern = "#^(\d+\.?)?\s*(?:$remove_test_name)?\s*#m";

  $result = preg_replace($pattern, "", $source);

  echo $source."\n------\n";
  echo $result."\n";
?>

Author Comment

by:areyouready344
ID: 36526090
Can you provide an answer based on my question? I almost have it working:

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);

The only problem with this answer is that test1, test2, test3, or test4 still shows up as part of the capture ($matches[1]).

Expert Comment

by:Terry Woods
ID: 36526124
Like this?
<?php

$string = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

$remove_test_name = "test[1-4]";
preg_match_all("#^(?:\d+\.?)?\s*(?:$remove_test_name)?\s*(.*)$#m",$string,$matches);
unset($matches[0]);
print_r($matches);

Author Comment

by:areyouready344
ID: 36526125
For example, I have the following line of text:

1. test1 dkdkd- dkdkd

This code gives me the following output:

code
------
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);

Output
---------
test1 dkdkd- dkdkd

Question: why is test1 showing up?

Expert Comment

by:Terry Woods
ID: 36526135
What are you outputting? (E.g., you shouldn't output $matches[0].)
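
A small sketch of what the array entries hold, using the pattern and sample line from this exchange: index 0 is the text matched by the whole pattern (so it still contains "1. test1"), while indexes 1 and 2 are the two `(.*)` groups. The `(?:...)` groups are non-capturing and get no index of their own.

```php
<?php
$string = "1. test1 dkdkd- dkdkd";
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/", $string, $matches);

// Index 0 is the entire match, so it still includes "1. test1".
echo $matches[0], PHP_EOL;
// Indexes 1 and 2 are the two capture groups (.*) and (.*).
echo $matches[1], PHP_EOL;
echo $matches[2], PHP_EOL;
// One full match plus two capture groups.
echo count($matches), PHP_EOL;
```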

Author Comment

by:areyouready344
ID: 36526152
I'm outputting it like this:

echo $matches[1] . " - " . $matches[2];

Expert Comment

by:Terry Woods
ID: 36526182
Pretty much exactly that code works for me:

$string = "1. test1 dkdkd- dkdkd";
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);
echo $matches[1] . " - " . $matches[2];

Output:
dkdkd -  dkdkd

Expert Comment

by:Terry Woods
ID: 36526190
If you have more than one space, you need a * instead of a ? after each \s:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);
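
A side-by-side sketch of that point, using a two-space version of the sample line (the var_dump posted later shows two spaces after "1."): with `\s?` only one space is consumed before the optional test1 group, so that group faces the second space, matches nothing, and test1 is swept into the first capture.

```php
<?php
// Two spaces after "1.".
$string = "1.  test1 dkdkd- dkdkd";

// \s? consumes at most one space, so (?:test1)? sees the second space,
// matches nothing, and "test1" ends up inside the first (.*) group.
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/", $string, $one);

// \s* consumes all the spaces, so (?:test1)? can match "test1".
preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/", $string, $many);

echo $one[1], PHP_EOL;  // test1 dkdkd
echo $many[1], PHP_EOL; // dkdkd
```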

Expert Comment

by:Ray Paseur
ID: 36526199
Let's go back to the problem definition.  Quote:

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

Unquote.

This would seem to say you want to throw away the first two lines.  Can you please give us one or two real-world examples of the input strings with the corresponding output strings?

Author Comment

by:areyouready344
ID: 36526200
Here is the output of the var_dump:


[0]=>
  string(177) "1.  test1 dkdkd: dkdkd"
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"

Expert Comment

by:käµfm³d 👽
ID: 36526205
You are only going to receive good results here if you clearly describe the input data, the expected output, and any errors you may be encountering. I wrote my pattern based on the requirement you previously described, yet I still see subsequent posts from you with patterns that are completely different than what I posted. If my (or others') pattern does not work for you, then provide sample input, resulting output, and any errors you may be receiving.

Assisted Solution

by:Terry Woods
Terry Woods earned 167 total points
ID: 36526216
In:

  string(177) "1.  test1 dkdkd: dkdkd"

There are 2 spaces after the "1.", so my latest pattern should address that.

Author Comment

by:areyouready344
ID: 36526238
Thanks Terry for understanding this problem. I was hoping your last solution (\s*) would work.

Assisted Solution

by:Ray Paseur
Ray Paseur earned 166 total points
ID: 36526281
Maybe it would be easier to get this right if you eat the elephant in bites instead of trying to write a single complicated regular expression.
http://www.laprbass.com/RAY_temp_notready.php

But that said, there is no substitute for test-driven programming.  And for that you need to write your test cases first.  Practice has shown this to be the fastest way to write dependable code.  You can write code without creating test data - heck, any idiot can learn to write bad code without testing.  But the pros would probably want unit tests for something like this little algorithm.
<?php // RAY_temp_notready.php
error_reporting(E_ALL);
echo "<pre>";

$strs = array
( '1. test1 dkdkd- dkdkd'
, '1. data-data'
, 'data- data'
)
;

$rgx1
= '#'              // REGEX DELIMITER
. '^\d\.'          // STARTS WITH A DIGIT AND A DOT
. '#'              // REGEX DELIMITER
;

$rgx2
= '#'              // REGEX DELIMITER
. '^test\d?'       // STARTS WITH 'test' AND MAYBE A DIGIT
. '#'              // REGEX DELIMITER
;

foreach ($strs as $str)
{
    $new = $str;
    // Use '' rather than NULL as the replacement: passing null here is deprecated as of PHP 8.1
    $new = trim(preg_replace($rgx1, '', $new));
    $new = trim(preg_replace($rgx2, '', $new));
    echo PHP_EOL . "$str TRANSFORMED INTO $new";
}
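
In the spirit of writing the test cases first, a minimal sketch of a table-driven check. The expected outputs are an assumption based on the author's stated goal (strip the line number and an optional test1 to test4 label, keep the data unchanged), and `strip_prefix()` is a hypothetical helper that uses a single anchored `preg_replace` similar to the patterns kaufmed and Terry posted.

```php
<?php
// Hypothetical helper: remove an optional "N." line number and an optional
// test1..test4 label from the start of a line, leaving the data untouched.
function strip_prefix(string $line): string
{
    return preg_replace('/^(?:\d+\.)?\s*(?:test[1-4])?\s*/', '', $line);
}

// Test cases written first: input => expected output.
$cases = [
    '1. test1 dkdkd- dkdkd'  => 'dkdkd- dkdkd',
    '1.  test1 dkdkd- dkdkd' => 'dkdkd- dkdkd', // two spaces after the number
    '1. data-data'           => 'data-data',
    'data- data'             => 'data- data',
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = strip_prefix($input);
    $ok = ($actual === $expected);
    $failures += $ok ? 0 : 1;
    echo ($ok ? 'PASS' : 'FAIL'), ": '$input' => '$actual'", PHP_EOL;
}
echo "$failures failure(s)", PHP_EOL;
```

Adding a new real-world line to `$cases` before touching the pattern is the cheap way to catch regressions like the two-space problem in this thread.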

Assisted Solution

by:käµfm³d 👽
käµfm³d   👽 earned 167 total points
ID: 36526286
"Thanks Terry for understanding this problem"
Was that a jab? Please allow me to counter with a 1-2...

<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

	$result = preg_replace($pattern, "", $source);

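	// Split on any line break (\R) so this works whether the heredoc lines end
	// in \n or \r\n, independent of the platform's PHP_EOL.
	$result = preg_split('/\R/', $result);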

	print_r($result);
?>

Pity it won't be used though. Can't expect everyone to appreciate one's logic, I guess  : \

Expert Comment

by:käµfm³d 👽
ID: 36526294
Good luck Terry and Ray...  I'm hanging up the gloves  = )

Expert Comment

by:Ray Paseur
ID: 36526323
@kaufmed: Spot on.

@areyouready344: Computer programming is an activity that requires clarity of thought and precision in execution.  You have to get the ideas into data and code in a way that will behave predictably.  In this matter, PHP is not your friend at all because it is highly permissive of sloppy programming and it hides important things from the programmer, such as accidental reliance on undefined variables.  If PHP is your only programming language you might want to consider studying something a little more structured.  And don't be impatient with yourself as you learn.  Rome was not built in a day.  This article explains what you are up against.
http://norvig.com/21-days.html

Anyway, you've gotten some working answers and hopefully some good ideas about how this kind of thing is usually done when time is money and accuracy matters.  Best of luck with your project, ~Ray
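
On the point about PHP hiding mistakes: when `preg_match` finds nothing, `$matches` is an empty array, and echoing `$matches[1]` only raises a notice or warning (depending on the PHP version) that is easy to miss when error reporting is turned down. A defensive sketch that checks the return value before touching `$matches`; `capture_data()` is a hypothetical helper name, and the pattern is one of the `\s*` variants from this thread with `^`/`$` anchors added.

```php
<?php
error_reporting(E_ALL);

// Hypothetical helper: returns [left, right] split at the last hyphen,
// or null when the line does not match at all.
function capture_data(string $line): ?array
{
    $pattern = '/^(?:\d+\.)?\s*(?:test[1-4])?\s*(.*)-(.*)$/';
    if (preg_match($pattern, $line, $matches) !== 1) {
        return null; // no match: $matches is empty, so never index into it
    }
    return [$matches[1], $matches[2]];
}

foreach (['1. test1 dkdkd- dkdkd', 'no hyphen here'] as $line) {
    $parts = capture_data($line);
    echo $parts === null ? "no match: $line" : implode('-', $parts), PHP_EOL;
}
```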

Expert Comment

by:Ray Paseur
ID: 36526325
Me, too.  Over and out, ~Ray

Expert Comment

by:Terry Woods
ID: 36526430
kaufmed, I agree that a replace is probably a more elegant solution, and avoids mucking around with an array as a result. The author didn't seem comfortable to change much from his original code though - either will work in the long run.

I often feel it's overkill with 2-3 experienced experts working on the same problem, but thanks to my timezone I'm not awake to answer most of the EE questions, so I have to be pretty competitive to pick up some points (even if it feels like I'm butting in on another expert's progress at times!)... on the positive side though, the competition sharpens both my technical skills and my ability to interpret and explain. So thanks, to you and Ray!
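
To make the replace-versus-match comparison concrete, a small sketch that runs both approaches posted above on the same sample text. It assumes the lines are separated by "\n", and the patterns are adapted from the earlier comments (both groups made non-capturing).

```php
<?php
$source = "1. test1 data-data\n1. data-data\ndata-data";
$remove_test_name = "test[1-4]";

// Approach 1: delete the unwanted prefix on every line, then split into lines.
$viaReplace = explode("\n", preg_replace("#^(?:\d+\.?)?\s*(?:$remove_test_name)?\s*#m", "", $source));

// Approach 2: capture what remains on each line; group 1 holds one entry per line.
preg_match_all("#^(?:\d+\.?)?\s*(?:$remove_test_name)?\s*(.*)$#m", $source, $matches);
$viaMatchAll = $matches[1];

var_dump($viaReplace === $viaMatchAll);
print_r($viaReplace);
```

Both produce the same three strings here; the replace version simply skips the bookkeeping of picking group 1 out of the `$matches` array.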

Author Comment

by:areyouready344
ID: 36526490
The name of this website says Expert, so I was hoping an expert can look at the problem from my point of view.

When I use the following code below it works:


preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/",$string,$matches);
echo $matches[1] . " " . $matches[2];

against this line:

1. test1 dkdkd- dkdkd

output:

dkdkd- dkdkd

Why does the code below not work? The only difference is that I don't use the () around test1 in the example above, but I do use it in the example below.

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

output is:

test1 dkdkd- dkdkd

Why is test1 being displayed in the second example?


Expert Comment

by:Terry Woods
ID: 36526518
In your first piece of code, with the ? after test1, the ? makes the 1 optional so that test or test1 would be matched.
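
A quick sketch of that difference, using the sample lines from this thread: with `test1?` the letters "test" are still required, so a line without a test label does not match at all, while `(?:test1)?` makes the whole word optional. On a labelled line both forms give the same capture.

```php
<?php
$noLabel  = "1. data-data";
$labelled = "1. test1 dkdkd- dkdkd";

// test1? : "test" is required, only the "1" is optional.
$a = preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/", $noLabel, $m1);

// (?:test1)? : the whole word "test1" is optional.
$b = preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/", $noLabel, $m2);

echo $a, ' ', $b, PHP_EOL; // 0 1
echo $m2[1], PHP_EOL;      // data

// On a labelled line, the test1? form works too.
preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/", $labelled, $m3);
echo $m3[1], PHP_EOL;      // dkdkd
```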

The second piece of code makes the entire test1 optional, but doesn't capture it. I just retested that second piece of code with your given example, and it works fine for me - test1 is not displayed. Can you please retest? If it still fails, then is it possible there is a special (invisible) character included in the string?
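
The invisible-character theory fits the var_dump posted earlier, which reports string(37) for the 11 visible characters "test1 dkdkd". A hedged sketch: the input below is hypothetical and assumes the text contained an HTML `&nbsp;` entity, which a browser renders like a space but `\s` does not match. Printing the raw bytes exposes such characters, and decoding them first lets the `\s*` pattern behave as expected.

```php
<?php
// Hypothetical input: looks like "1.  test1 dkdkd- dkdkd" in a browser,
// but one of the "spaces" is the HTML entity &nbsp;.
$string = "1.&nbsp; test1 dkdkd- dkdkd";
$pattern = "/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/";

// Show what is really in the string instead of what the browser renders.
var_dump($string);
echo bin2hex($string), PHP_EOL;

// \s never matches the six characters "&nbsp;", so "test1" leaks into group 1.
preg_match($pattern, $string, $raw);
echo $raw[1], PHP_EOL; // &nbsp; test1 dkdkd

// Decode entities, turn U+00A0 (no-break space) into a plain space, then match.
$clean = html_entity_decode($string, ENT_QUOTES | ENT_HTML5, 'UTF-8');
$clean = preg_replace('/\x{00A0}/u', ' ', $clean);
preg_match($pattern, $clean, $fixed);
echo $fixed[1], PHP_EOL; // dkdkd
```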

Expert Comment

by:käµfm³d 👽
ID: 36526617
@TerryAtOpus
"The author didn't seem comfortable to change much from his original code though - either will work in the long run."
You and I both know this site is as much about deciphering what problems are as it is posing solutions to said problems. The only things that frustrate me here are when people don't clearly express themselves and when someone ignores a potential solution without so much as questioning the logic or why it may be better or worse than one's own approach--or even saying, "hey, I need to do it this way because [fill-in-the-blank]." I have zero problem with explaining any of my posts; I often neglect an explanation because I find often times people just want a "get-er-done" approach rather than a "teach a man to fish" approach. I know Ray's seen this; I'd have trouble believing you haven't seen it. If someone doesn't understand my approach, all I request is they ask me to explain. Since all of us here are volunteers (i.e. we don't get paid), I think it's a small price to pay to say, "Hey, I didn't quite understand why you went that way. Would you mind clarifying for me?" Also, I LOVE being called out when my logic is incorrect. It gives me a chance to learn from my mistakes...  and I'm wrong quite often. Hell, you've corrected me on a number of occasions, and I love you that much for it (totally platonic, I assure you). I'm here to learn just as much as I am to teach.

Terry, I'm not harping on you... I'm just using you as my soapbox for the moment. Hope ya don't mind  ; )


"The name of this website says Expert, so I was hoping an expert can look at the problem from my point of view."
Read my profile. I don't claim to be an expert. I only hope that others who have worked with me can say, "yeah, he knows his stuff." If they are kind enough to bestow the moniker of "expert" upon me, then I thank them for it. Those who tout their own expertise have a need for self-satisfaction. I am completely satisfied with who I am.

Expert Comment

by:käµfm³d 👽
ID: 36526620
That last bit wasn't for you Terry. Well, none of it was really.

Accepted Solution

by:
areyouready344 earned 0 total points
ID: 36526890
I resolved the problem. There were two spaces between the line number and test1, and I used the following code to resolve the issue:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

The problem was with this part of the code; it made the match non-greedy, so it stopped at the first whitespace:

preg_match("/(?:\d+\.)?\s*

I changed the code above to the code below and it now works:

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);

Expert Comment

by:Terry Woods
ID: 36526902
That doesn't make sense. I have never seen a regex engine work as you describe so I suspect you may have made a mistake in your logic somewhere, or maybe you have a version of php that has a bug?

This:
preg_match("/(?:\d+\.)?\s*
is not non-greedy, with respect to matching spaces.

This is non-greedy (by adding a ? after the \s*):
preg_match("/(?:\d+\.)?\s*?

This:
preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);
requires 2 space characters. Does it match your other cases? i.e.:
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"
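
That question can be checked directly. A sketch comparing the `\s\s` pattern with the `\s*` version on a two-space line and a one-space line (both sample strings are from this thread): `\s\s` needs two whitespace characters in a row, so the one-space line no longer matches at all.

```php
<?php
$twoSpaces = "1.  test1 dkdkd- dkdkd";
$oneSpace  = "1. test1 dkdkd- dkdkd";

$exactlyTwo = "/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/"; // needs two whitespace characters in a row
$anyNumber  = "/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/";  // accepts zero or more

$results = [];
foreach ([$twoSpaces, $oneSpace] as $line) {
    foreach ([$exactlyTwo, $anyNumber] as $pattern) {
        $results[] = preg_match($pattern, $line, $m) === 1 ? $m[1] : 'no match';
    }
}
print_r($results);
```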

Author Comment

by:areyouready344
ID: 36526916
All I can tell you is that everything works after changing \s*? to \s\s. I tested with Perl and PHP; neither was working before, and both work now. Thanks Terry for pointing out that a character might be missing somewhere.

Expert Comment

by:Terry Woods
ID: 36526926
Did you have the non-greedy version:
\s*?
because that would explain why it didn't work. It should have been:
\s*

Anyway, glad you got it working!
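
Terry's explanation is easy to confirm. A sketch with the one-space sample line: the lazy `\s*?` lets test1 into the capture even with a single space, because the overall match already succeeds when `\s*?` matches nothing.

```php
<?php
$string = "1. test1 dkdkd- dkdkd";

// Lazy: \s*? first tries to match nothing. (?:test1)? then faces a space and
// matches nothing, the following \s* takes the space, and the overall match
// already succeeds, so the engine never goes back to let \s*? consume more.
preg_match("/(?:\d+\.)?\s*?(?:test1)?\s*(.*)-(.*)/", $string, $lazy);

// Greedy: \s* takes the space right away, so (?:test1)? can match "test1".
preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/", $string, $greedy);

echo $lazy[1], PHP_EOL;   // test1 dkdkd
echo $greedy[1], PHP_EOL; // dkdkd
```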

Author Comment

by:areyouready344
ID: 36526945
Still had a problem with this:

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);

But changed it to this and everything is working:

preg_match("/(?:\d+\.)?(?:\s*)?(?:test1)?(?:\s*)?(.*)-(.*)/",$string,$matches);

Thanks for all your help Terry....
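
For what it's worth, wrapping `\s*` in an optional non-capturing group does not change what it matches: `\s*` can already match zero characters, so `(?:\s*)?` and `\s*` accept exactly the same text. A quick sketch comparing the final pattern above with Terry's `\s*` pattern on the sample lines from the thread. If one worked where the other did not, the difference most likely lies elsewhere (for example in the input string), though that is an inference, not something the thread confirms.

```php
<?php
$final  = "/(?:\d+\.)?(?:\s*)?(?:test1)?(?:\s*)?(.*)-(.*)/"; // final pattern above
$simple = "/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/";           // Terry's \s* pattern

$lines = ["1. test1 dkdkd- dkdkd", "1.  test1 dkdkd- dkdkd", "1. data-data", "data- data"];

$same = true;
foreach ($lines as $line) {
    preg_match($final, $line, $a);
    preg_match($simple, $line, $b);
    $same = $same && ($a === $b); // compare the full match and both groups
    echo $line, ' => [', $b[1], '] [', $b[2], ']', PHP_EOL;
}
var_dump($same);
```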

Author Comment

by:areyouready344
ID: 36976722
I've requested that this question be closed as follows:

Accepted answer: 0 points for areyouready344's comment http:/Q_27304152.html#36526890

for the following reason:

best solution

Expert Comment

by:Terry Woods
ID: 36976723
It's been a few weeks since I looked at this, and trying to go back over the trail of logic in this question makes my head spin. However, it's pretty clear to me that the author's final solution was based on code I provided, and my code was based on kaufmed's code. Even if the author's comment is accepted as the solution, both kaufmed and I deserve some points for helping the author along the way, thus I object to the closing of the question this way (to the author: did you know you can accept multiple comments when closing the question?). Thanks...

Expert Comment

by:Terry Woods
ID: 37020839

Expert Comment

by:South Mod
ID: 37074926
All,
 
Following an 'Objection' by TerryAtOpus (at http://www.experts-exchange.com/Q_27399445.html) to the intended closure of this question, it has been reviewed by at least one Moderator and is being closed as recommended by the Expert.
 
At this point I am going to re-start the auto-close procedure.
 
Thank you,
 
SouthMod
Community Support Moderator