• Status: Solved

PHP regular expression question

I have the following 3 lines of text and need a single regular expression that captures only the data. The data can contain any characters. The word test on the first line can be any of (test1|test2|test3|test4).

1. test1 data- data  
1. data- data
data- data

Here's what I tried, but it is not working.

$remove_test_name = "(?:test1|test2|test3|test4)";
preg_match("/(?:\d+\.)? $remove_test_name (.*)\-(.*)/",$string,$matches);
echo $matches[1] . "-" . $matches[2];

The output I'm getting for line 1 (mentioned above) is:

:test1 data

for line 2

:data

for line 3

:

What I would like is this:

data: data
Asked by: areyouready344

Ray Paseur commented:
I do not understand the example.  Can you please give us some real-world input and show us what you want to get out from it?  Thanks. ~Ray

käµfm³d 👽 commented:
How about this?
<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d*\.?\s*(?:test\d+)?\s*#m';

	$result = preg_replace($pattern, "", $source);

	echo $result;
?>


areyouready344 (Author) commented:
For example, I have a file that has the following lines...

note:
          test1 - could also be test2 or test3 or test4
          data - the data could be any characters except test1 or test2 or test3 or test4

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

käµfm³d 👽 commented:
Perhaps I should change my pattern a tad then:

$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';


Ray Paseur commented:
Do you have spaces in the data-data fields?  I am asking because this looks like a made-up generalization and for better or worse regular expressions tend to be easier to write correctly when you have a few accurate examples of the inputs and the corresponding accurate examples of the desired outputs.

Example:
1. data-data is expected to yield data- data but do you really want to insert a blank after the hyphen?  It's these kinds of seemingly unimportant things that can cause a lot of debugging time.

areyouready344 (Author) commented:
This is the closest answer I can get

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);


The only problem with this answer is that test1, test2, test3 or test4 still shows up as part of the capture ($matches[1]).

Terry Woods (IT Guru) commented:
Building on kaufmed's code, this works for me:

Output is (source and result shown):
1. test1 data-data
1. data-data
data-data
------
data-data
data-data
data-data
<?php

  $source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

  $remove_test_name = "test[1-4]";
  $pattern = "#^(\d+\.?)?\s*(?:$remove_test_name)?\s*#m";

  $result = preg_replace($pattern, "", $source);

  echo $source."\n------\n";
  echo $result."\n";
?>


areyouready344 (Author) commented:
Can you provide an answer based on my question? I almost got it working.

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);

The only problem with this answer is that test1, test2, test3 or test4 still shows up as part of the capture ($matches[1]).

Terry Woods (IT Guru) commented:
Like this?
<?php

$string = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

$remove_test_name = "test[1-4]";
preg_match_all("#^(?:\d+\.?)?\s*(?:$remove_test_name)?\s*(.*)$#m",$string,$matches);
unset($matches[0]);
print_r($matches);


areyouready344 (Author) commented:
For example, I have the following line of text

1. test1 dkdkd- dkdkd

This code gives me the following output:

code
------
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);

Output
---------
test1 dkdkd- dkdkd

Question - why is test1 showing up?

Terry Woods (IT Guru) commented:
What are you outputting? (eg you shouldn't output $matches[0])

areyouready344 (Author) commented:
I'm outputting like this:

echo $matches[1] . " - " . $matches[2];

Terry Woods (IT Guru) commented:
Pretty much exactly that code works for me:

$string = "1. test1 dkdkd- dkdkd";
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);
echo $matches[1] . " - " . $matches[2];

Output:
dkdkd -  dkdkd

Terry Woods (IT Guru) commented:
If you have more than one space, you need a * instead of a ? after each \s:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);
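
A small sketch of the difference, assuming the input really does have two spaces after the line number (the sample string is illustrative, and the variable names $single and $multi are made up for this example):

```php
<?php
// Illustrative input: two spaces after "1.".
$string = "1.  test1 dkdkd- dkdkd";

// \s? consumes at most one whitespace character, so the optional (?:test1)
// group is tried at the second space, matches nothing, and "test1" ends up
// inside capture group 1.
preg_match('/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/', $string, $single);

// \s* consumes both spaces, so (?:test1) lines up with the word and eats it.
preg_match('/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/', $string, $multi);

echo $single[1], "\n"; // test1 dkdkd
echo $multi[1], "\n";  // dkdkd
```

Both patterns match the line; only the contents of group 1 differ, which is why the first version looks like it "almost works".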

Ray Paseur commented:
Let's go back to the problem definition.  Quote:

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

Unquote.

This would seem to say you want to throw away the first two lines.  Can you please give us one or two real-world examples of the input strings with the corresponding output strings?

areyouready344 (Author) commented:
Here is the output of the var_dump:


[0]=>
  string(177) "1.  test1 dkdkd: dkdkd"
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"

käµfm³d 👽 commented:
You are only going to receive good results here if you clearly describe the input data, the expected output, and any errors you may be encountering. I wrote my pattern based on the requirement you previously described, yet I still see subsequent posts from you with patterns that are completely different than what I posted. If my (or others') pattern does not work for you, then provide sample input, resulting output, and any errors you may be receiving.

Terry Woods (IT Guru) commented:
In:

  string(177) "1.  test1 dkdkd: dkdkd"

There are 2 spaces after the 1. so my latest pattern should address that.

areyouready344 (Author) commented:
Thanks, Terry, for understanding this problem. I was hoping your last resolution (\s*) would work.

Ray Paseur commented:
Maybe it would be easier to get this right if you eat the elephant in bites instead of trying to write a single complicated regular expression.
http://www.laprbass.com/RAY_temp_notready.php

But that said, there is no substitute for test-driven programming.  And for that you need to write your test cases first.  Practice has shown this to be the fastest way to write dependable code.  You can write code without creating test data - heck, any idiot can learn to write bad code without testing.  But the pros would probably want unit tests for something like this little algorithm.
<?php // RAY_temp_notready.php
error_reporting(E_ALL);
echo "<pre>";

$strs = array
( '1. test1 dkdkd- dkdkd'
, '1. data-data'
, 'data- data'
)
;

$rgx1
= '#'              // REGEX DELIMITER
. '^\d\.'          // STARTS WITH A DIGIT AND A DOT
. '#'              // REGEX DELIMITER
;

$rgx2
= '#'              // REGEX DELIMITER
. '^test\d?'       // STARTS WITH 'test' AND MAYBE A DIGIT
. '#'              // REGEX DELIMITER
;

foreach ($strs as $str)
{
    $new = $str;
    $new = trim(preg_replace($rgx1, '', $new)); // empty string, not NULL: a NULL replacement is deprecated in PHP 8.1+
    $new = trim(preg_replace($rgx2, '', $new));
    echo PHP_EOL . "$str TRANSFORMED INTO $new";
}
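
One way the "test cases first" idea might look, as a rough sketch rather than anything posted in this thread: strip_prefix() is a hypothetical helper name, its pattern is borrowed from the patterns discussed here, and the expected outputs are assumptions based on the question.

```php
<?php
// Hypothetical helper (not from the thread): strips an optional "N." prefix
// and an optional test1..test4 label from the start of a line.
function strip_prefix(string $line): string
{
    return preg_replace('/^(?:\d+\.)?\s*(?:test[1-4])?\s*/', '', $line);
}

// Test cases written first: input => expected output (assumed from the question).
$cases = [
    '1. test1 data- data'    => 'data- data',
    '1. data-data'           => 'data-data',
    'data- data'             => 'data- data',
    '1.  test3 dkdkd- dkdkd' => 'dkdkd- dkdkd',
];

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = strip_prefix($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: '$input' gave '$actual', expected '$expected'\n";
    }
}
echo $failures === 0 ? "all cases pass\n" : "$failures case(s) failed\n";
```

Adding a new real-world line to $cases is then the quickest way to find out whether the pattern still holds.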


käµfm³d 👽 commented:
"Thanks Terry for understanding this problem"
Was that a jab? Please allow me to counter with a 1-2...

<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

	$result = preg_replace($pattern, "", $source);

	$result = explode(PHP_EOL, $result);

	print_r($result);
?>


Pity it won't be used though. Can't expect everyone to appreciate one's logic, I guess  : \

käµfm³d 👽 commented:
Good luck Terry and Ray...  I'm hanging up the gloves  = )

Ray Paseur commented:
@kaufmed: Spot on.

@areyouready344: Computer programming is an activity that requires clarity of thought and precision in execution.  You have to get the ideas into data and code in a way that will behave predictably.  In this matter, PHP is not your friend at all because it is highly permissive of sloppy programming and it hides important things from the programmer, such as accidental reliance on undefined variables.  If PHP is your only programming language you might want to consider studying something a little more structured.  And don't be impatient with yourself as you learn.  Rome was not built in a day.  This article explains what you are up against.
http://norvig.com/21-days.html

Anyway, you've gotten some working answers and hopefully some good ideas about how this kind of thing is usually done when time is money and accuracy matters.  Best of luck with your project, ~Ray

Ray Paseur commented:
Me, too.  Over and out, ~Ray

Terry Woods (IT Guru) commented:
kaufmed, I agree that a replace is probably a more elegant solution, and avoids mucking around with an array as a result. The author didn't seem comfortable to change much from his original code though - either will work in the long run.

I often feel it's overkill with 2-3 experienced experts working on the same problem, but thanks to my timezone I'm not awake to answer most of the EE questions, so I have to be pretty competitive to pick up some points (even if it feels like I'm butting in on another expert's progress at times!)... on the positive side though, the competition sharpens both my technical skills and my ability to interpret and explain. So thanks, to you and Ray!

areyouready344 (Author) commented:
The name of this website says Expert, so I was hoping an expert could look at the problem from my point of view.

When I use the following code below it works:


preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/",$string,$matches);
echo $matches[1] . " " . $matches[2];

against this line:

1. test1 dkdkd- dkdkd

output:

dkdkd- dkdkd

Why does the code below not work? The only difference is that I don't use the () around test1 in the example above and do use it in the example below.

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

output is:

test1 dkdkd- dkdkd

Why is test1 being displayed in the second example?

Terry Woods (IT Guru) commented:
In your first piece of code, with the ? after test1, the ? makes the 1 optional so that test or test1 would be matched.

The second piece of code makes the entire test1 optional, but doesn't capture it. I just retested that second piece of code with your given example, and it works fine for me - test1 is not displayed. Can you please retest? If it still fails, then is it possible there is a special (invisible) character included in the string?
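
The distinction can be checked in isolation with a few anchored patterns. This is only an illustrative sketch; the inputs are made up for the purpose:

```php
<?php
// "test1?" makes only the trailing "1" optional: "test" is still required.
var_dump(preg_match('/^test1?$/', 'test'));   // int(1)
var_dump(preg_match('/^test1?$/', 'test1'));  // int(1)
var_dump(preg_match('/^test1?$/', ''));       // int(0)

// "(?:test1)?" makes the whole word optional without capturing it.
var_dump(preg_match('/^(?:test1)?$/', ''));      // int(1)
var_dump(preg_match('/^(?:test1)?$/', 'test1')); // int(1)
var_dump(preg_match('/^(?:test1)?$/', 'test'));  // int(0)
```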

käµfm³d 👽 commented:
@TerryAtOpus
"The author didn't seem comfortable to change much from his original code though - either will work in the long run."
You and I both know this site is as much about deciphering what problems are as it is posing solutions to said problems. The only things that frustrate me here are when people don't clearly express themselves and when someone ignores a potential solution without so much as questioning the logic or why it may be better or worse than one's own approach--or even saying, "hey, I need to do it this way because [fill-in-the-blank]." I have zero problem with explaining any of my posts; I often neglect an explanation because I find often times people just want a "get-er-done" approach rather than a "teach a man to fish" approach. I know Ray's seen this; I'd have trouble believing you haven't seen it. If someone doesn't understand my approach, all I request is they ask me to explain. Since all of us here are volunteers (i.e. we don't get paid), I think it's a small price to pay to say, "Hey, I didn't quite understand why you went that way. Would you mind clarifying for me?" Also, I LOVE being called out when my logic is incorrect. It gives me a chance to learn from my mistakes...  and I'm wrong quite often. Hell, you've corrected me on a number of occasions, and I love you that much for it (totally platonic, I assure you). I'm here to learn just as much as I am to teach.

Terry, I'm not harping on you... I'm just using you as my soapbox for the moment. Hope ya don't mind  ; )


"The name of this website says Expert so I was hoping an expert can look at the problem from my point of view."
Read my profile. I don't claim to be an expert. I only hope that others who have worked with me can say, "yeah, he knows his stuff." If they are kind enough to bestow the moniker of "expert" upon me, then I thank them for it. Those who tout their own expertise have a need for self-satisfaction. I am completely satisfied with who I am.

käµfm³d 👽 commented:
That last bit wasn't for you Terry. Well, none of it was really.

areyouready344 (Author) commented:
I resolved the problem. The problem was that I had two spaces between the line number and test1. I was using the following code:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

The problem was with this part of the code: it was not greedy, so it stopped at the first whitespace.

preg_match("/(?:\d+\.)?\s*

I changed the code above to the code below and it now works...

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);


Terry Woods (IT Guru) commented:
That doesn't make sense. I have never seen a regex engine work as you describe, so I suspect you may have made a mistake in your logic somewhere, or maybe you have a version of PHP that has a bug?

This:
preg_match("/(?:\d+\.)?\s*
is not non-greedy, with respect to matching spaces.

This is non-greedy (by adding a ? after the \s*):
preg_match("/(?:\d+\.)?\s*?

This:
preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);
requires 2 space characters. Does it match your other cases? ie:
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"
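
To make the greedy versus non-greedy point concrete, here is a rough sketch. It assumes the two-space input from the earlier var_dump, and the variable names $lazy and $greedy are invented for the example:

```php
<?php
$string = "1.  test1 dkdkd- dkdkd"; // two spaces after "1."

// Lazy \s*? starts by matching zero spaces. The optional (?:test1) then
// matches nothing, the later greedy parts absorb the rest, and the overall
// match succeeds, so the engine never goes back to remove "test1".
preg_match('/(?:\d+\.)?\s*?(?:test1)?\s*(.*)-(.*)/', $string, $lazy);

// Greedy \s* takes both spaces up front, so (?:test1) consumes the word.
preg_match('/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/', $string, $greedy);

echo $lazy[1], "\n";   // test1 dkdkd
echo $greedy[1], "\n"; // dkdkd
```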


areyouready344 (Author) commented:
All I can tell you is that everything is working after changing \s*? to \s\s. I tested with Perl and PHP; both were failing and both are working now. Thanks, Terry, for mentioning that an extra character is missing somewhere.

Terry Woods (IT Guru) commented:
Did you have the non-greedy version:
\s*?
because that would explain why it didn't work. It should have been:
\s*

Anyway, glad you got it working!

areyouready344 (Author) commented:
I still had a problem with this:

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);

But changed it to this and everything is working:

preg_match("/(?:\d+\.)?(?:\s*)?(?:test1)?(?:\s*)?(.*)-(.*)/",$string,$matches);

Thanks for all your help Terry....
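
For anyone comparing the two patterns above, a hedged sketch of why the first one can fail: \s\s requires exactly two whitespace characters, while (?:\s*)? behaves like a plain \s*. The one-space and two-space sample lines and the variable names are assumptions made for illustration.

```php
<?php
// The first two patterns are quoted from the thread; $simple is the same
// idea written without the redundant groups.
$fixed   = '/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/';
$wrapped = '/(?:\d+\.)?(?:\s*)?(?:test1)?(?:\s*)?(.*)-(.*)/';
$simple  = '/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/';

// Illustrative inputs: one space and two spaces after the line number.
$oneSpace  = "1. test1 dkdkd- dkdkd";
$twoSpaces = "1.  test1 dkdkd- dkdkd";

// \s\s needs two consecutive whitespace characters, so the one-space line
// does not match at all.
$fixedOnOne = preg_match($fixed, $oneSpace);
$fixedOnTwo = preg_match($fixed, $twoSpaces, $f);
var_dump($fixedOnOne, $fixedOnTwo);

// (?:\s*)? and \s* both accept zero or more whitespace characters, so the
// two patterns return identical captures for both inputs.
foreach ([$oneSpace, $twoSpaces] as $line) {
    preg_match($wrapped, $line, $w);
    preg_match($simple, $line, $s);
    var_dump($w === $s, $w[1]);
}
```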

areyouready344 (Author) commented:
I've requested that this question be closed as follows:

Accepted answer: 0 points for areyouready344's comment http:/Q_27304152.html#36526890

for the following reason:

best solution

Terry Woods (IT Guru) commented:
It's been a few weeks since I looked at this, and trying to go back over the trail of logic in this question makes my head spin. However, it's pretty clear to me that the author's final solution was based on code I provided, and my code was based on kaufmed's code. Even if the author's comment is accepted as the solution, both kaufmed and I deserve some points for helping the author along the way, thus I object to the closing of the question this way (to the author: did you know you can accept multiple comments when closing the question?). Thanks...

South Mod (Moderator) commented:
All,
 
Following an 'Objection' by TerryAtOpus (at http://www.experts-exchange.com/Q_27399445.html) to the intended closure of this question, it has been reviewed by at least one Moderator and is being closed as recommended by the Expert.
 
At this point I am going to re-start the auto-close procedure.
 
Thank you,
 
SouthMod
Community Support Moderator