Solved

php regular expression question?

Posted on 2011-09-12
Last Modified: 2012-05-12
I have the following 3 lines of text and need a single regular expression that captures only the data. The data can contain any characters. The word test1 on the first line could instead be test2, test3, or test4 (test1|test2|test3|test4).

1. test1 data- data  
1. data- data
data- data

Here's what I tried, but it's not working.

$remove_test_name = "(?:test1|test2|test3|test4)";    
preg_match("/(?:\d+\.)? $remove_test_name (.*)\-(.*)/",$string,$matches);    
echo $matches[1] . "-" . $matches[2];

The output I'm getting for line 1 (mentioned above) is:

:test1 data

for line 2

:data

for line 3

:

What I would like is this:

data: data
Question by:areyouready344
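
For reference, here is a quick check of the pattern exactly as posted in the question, run against the three sample lines. It is only a sketch: the $lines and $results arrays are hypothetical names added for illustration. Because the test-name group and the literal spaces around it are mandatory, only the first line can match at all.

```php
<?php
// The pattern from the question, unchanged.
$remove_test_name = "(?:test1|test2|test3|test4)";
$pattern = "/(?:\d+\.)? $remove_test_name (.*)\-(.*)/";

// Hypothetical container for the three sample lines from the question.
$lines = array(
    '1. test1 data- data',
    '1. data- data',
    'data- data',
);

$results = array();
foreach ($lines as $line) {
    // preg_match() returns 1 on a match and 0 when nothing matches.
    $found = preg_match($pattern, $line, $matches);
    $results[] = $found ? $matches[1] . '-' . $matches[2] : null;
    echo $line, ' => ', $found ? $matches[1] . '-' . $matches[2] : '(no match)', PHP_EOL;
}
```

Making the test-name group optional and replacing the literal spaces with optional whitespace is the direction the answers in this thread take.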

Expert Comment

by:Ray Paseur
I do not understand the example.  Can you please give us some real-world input and show us what you want to get out from it?  Thanks. ~Ray

Expert Comment

by:käµfm³d 👽
How about this?
<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d*\.?\s*(?:test\d+)?\s*#m';

	$result = preg_replace($pattern, "", $source);

	echo $result;
?>

Author Comment

by:areyouready344
For example, I have this file that has the following lines...

note:
          test1 - could also be test2 or test3 or test4
          data - the data could be any characters except test1 or test2 or test3 or test4

1. test1 data- data
1. data-data
data- data

Using a PHP regular expression to filter the lines above, I want the output to be

data- data

Expert Comment

by:käµfm³d 👽
Perhaps I should change my pattern a tad then:

$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

Expert Comment

by:Ray Paseur
Do you have spaces in the data-data fields?  I am asking because this looks like a made-up generalization and for better or worse regular expressions tend to be easier to write correctly when you have a few accurate examples of the inputs and the corresponding accurate examples of the desired outputs.

Example:
1. data-data is expected to yield data- data but do you really want to insert a blank after the hyphen?  It's these kinds of seemingly unimportant things that can cause a lot of debugging time.

Author Comment

by:areyouready344
This is the closest answer I can get

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);


The only problem with this answer is that test1, test2, test3, or test4 is still displayed as part of the capture ($matches[1]).

Expert Comment

by:Terry Woods
Building on kaufmed's code, this works for me:

Output is (source and result shown):
1. test1 data-data
1. data-data
data-data
------
data-data
data-data
data-data
<?php

  $source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

  $remove_test_name = "test[1-4]";
  $pattern = "#^(\d+\.?)?\s*(?:$remove_test_name)?\s*#m";

  $result = preg_replace($pattern, "", $source);

  echo $source."\n------\n";
  echo $result."\n";
?>

Author Comment

by:areyouready344
Can you provide an answer based on my question? I almost got it working

preg_match("/(?:\d+\.)?\s?$remove_test_name?\s?(.*)-(.*)/",$string,$matches);

The only problem with this answer is that test1, test2, test3, or test4 is still displayed as part of the capture ($matches[1]).

Expert Comment

by:Terry Woods
Like this?
<?php

$string = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

$remove_test_name = "test[1-4]";
preg_match_all("#^(?:\d+\.?)?\s*(?:$remove_test_name)?\s*(.*)$#m",$string,$matches);
unset($matches[0]);
print_r($matches);

Author Comment

by:areyouready344
For example, I have the following line of text

1. test1 dkdkd- dkdkd

This code gives me the following output:

code
------
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);

Output
---------
test1 dkdkd- dkdkd

Question - why is test1 showing up?

Expert Comment

by:Terry Woods
What are you outputting? (eg you shouldn't output $matches[0])

Author Comment

by:areyouready344
I'm outputting like this:

echo $matches[1] . " - " . $matches[2];

Expert Comment

by:Terry Woods
Pretty much exactly that code works for me:

$string = "1. test1 dkdkd- dkdkd";
preg_match("/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/",$string,$matches);
echo $matches[1] . " - " . $matches[2];

Output:
dkdkd -  dkdkd

Expert Comment

by:Terry Woods
If you have more than one space, you need a * instead of a ? after each \s:

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);
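
The difference between \s? and \s* can be seen with a small comparison. This is a sketch under the assumption that the input has two spaces after "1."; the variable names are made up for illustration. With \s?, each whitespace slot consumes one space, the optional (?:test1) group then sits in front of a space and matches nothing, and test1 ends up inside the first capture.

```php
<?php
// Hypothetical input with TWO spaces after "1."
$string = "1.  test1 dkdkd- dkdkd";

$with_optional_space = "/(?:\d+\.)?\s?(?:test1)?\s?(.*)-(.*)/"; // at most one space per slot
$with_any_spaces     = "/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/"; // any number of spaces per slot

preg_match($with_optional_space, $string, $a);
preg_match($with_any_spaces, $string, $b);

echo $a[1], PHP_EOL; // test1 dkdkd
echo $b[1], PHP_EOL; // dkdkd
```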

Expert Comment

by:Ray Paseur
Let's go back to the problem definition.  Quote:

1. test1 data- data
1. data-data
data- data

Using php regular expression to filter the lines above, I want the output to be

data- data

Unquote.

This would seem to say you want to throw away the first two lines.  Can you please give us one or two real-world examples of the input strings with the corresponding output strings?

Author Comment

by:areyouready344
Here is the output of the var_dump:


[0]=>
  string(177) "1.  test1 dkdkd: dkdkd"
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"
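
A note on the lengths in this dump: string(177) for text that shows about two dozen characters suggests, though it does not prove, that the strings contain bytes that are not visible on screen, such as markup or control characters. One way to check is to look at the raw bytes. The sketch below uses a made-up sample (a tab and a UTF-8 non-breaking space hidden after "1."); it is not the author's actual data.

```php
<?php
// Hypothetical sample: a tab and a UTF-8 non-breaking space hide between "1." and "test1".
$string = "1.\t\xC2\xA0test1 dkdkd- dkdkd";

$byte_length = strlen($string);      // the N that var_dump() reports in string(N)
$hex         = bin2hex($string);     // two hex digits per byte, so nothing stays invisible
$escaped     = json_encode($string); // control and non-ASCII characters show up as escapes

echo $byte_length, PHP_EOL;
echo $hex, PHP_EOL;
echo $escaped, PHP_EOL;
```

If the byte length is larger than the number of characters you can see, the hex or escaped form shows what the regular expression is really being asked to match.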

Expert Comment

by:käµfm³d 👽
You are only going to receive good results here if you clearly describe the input data, the expected output, and any errors you may be encountering. I wrote my pattern based on the requirement you previously described, yet I still see subsequent posts from you with patterns that are completely different than what I posted. If my (or others') pattern does not work for you, then provide sample input, resulting output, and any errors you may be receiving.

Assisted Solution

by:Terry Woods
Terry Woods earned 167 total points
In:

  string(177) "1.  test1 dkdkd: dkdkd"

There are 2 spaces after the 1. so my latest pattern should address that.

Author Comment

by:areyouready344
Thanks Terry for understanding this problem. I was hoping your last resolution (\s*) would work.

Assisted Solution

by:Ray Paseur
Ray Paseur earned 166 total points
Maybe it would be easier to get this right if you eat the elephant in bites instead of trying to write a single complicated regular expression.
http://www.laprbass.com/RAY_temp_notready.php

But that said, there is no substitute for test-driven programming.  And for that you need to write your test cases first.  Practice has shown this to be the fastest way to write dependable code.  You can write code without creating test data - heck, any idiot can learn to write bad code without testing.  But the pros would probably want unit tests for something like this little algorithm.
<?php // RAY_temp_notready.php
error_reporting(E_ALL);
echo "<pre>";

$strs = array
( '1. test1 dkdkd- dkdkd'
, '1. data-data'
, 'data- data'
)
;

$rgx1
= '#'              // REGEX DELIMITER
. '^\d\.'          // STARTS WITH A DIGIT AND A DOT
. '#'              // REGEX DELIMITER
;

$rgx2
= '#'              // REGEX DELIMITER
. '^test\d?'       // STARTS WITH 'test' AND MAYBE A DIGIT
. '#'              // REGEX DELIMITER
;

foreach ($strs as $str)
{
    $new = $str;
    $new = trim(preg_replace($rgx1, '', $new)); // '' rather than NULL: a null replacement is deprecated as of PHP 8.1
    $new = trim(preg_replace($rgx2, '', $new));
    echo PHP_EOL . "$str TRANSFORMED INTO $new";
}
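
The test-first idea can be sketched with plain comparisons before reaching for a full unit-test framework. Everything named here is an assumption for illustration: strip_prefixes() is a hypothetical wrapper around the two-step approach above (with \d+ and test[1-4] swapped in for the narrower patterns), and the expected values reflect one reading of the requirement, namely that the data is kept exactly as written.

```php
<?php
// Hypothetical helper wrapping the two-step approach.
function strip_prefixes($str)
{
    $str = trim(preg_replace('#^\d+\.#', '', $str));     // drop a leading "N."
    $str = trim(preg_replace('#^test[1-4]#', '', $str)); // drop a leading test1..test4
    return $str;
}

// Test cases written first: input => expected output (assumed expectations).
$cases = array(
    '1. test1 dkdkd- dkdkd'  => 'dkdkd- dkdkd',
    '1.  test1 dkdkd- dkdkd' => 'dkdkd- dkdkd', // two spaces after "1."
    '1. data-data'           => 'data-data',
    'data- data'             => 'data- data',
);

$failures = 0;
foreach ($cases as $input => $expected) {
    $actual = strip_prefixes($input);
    if ($actual !== $expected) {
        $failures++;
        echo "FAIL: '$input' gave '$actual', expected '$expected'", PHP_EOL;
    }
}
echo $failures === 0 ? 'All cases pass' : "$failures case(s) failed", PHP_EOL;
```

Adding a new real-world line to $cases is then the first step whenever the pattern misbehaves.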

Assisted Solution

by:käµfm³d 👽
käµfm³d 👽 earned 167 total points
"Thanks Terry for understanding this problem"
Was that a jab? Please allow me to counter with a 1-2...

<?php

	$source = <<<SOURCE
1. test1 data-data
1. data-data
data-data
SOURCE;

	$pattern = '#^\d+\.\s*(?:test[1-4])?\s*#m';

	$result = preg_replace($pattern, "", $source);

	$result = explode(PHP_EOL, $result);

	print_r($result);
?>

Pity it won't be used though. Can't expect everyone to appreciate one's logic, I guess  : \

Expert Comment

by:käµfm³d 👽
Good luck Terry and Ray...  I'm hanging up the gloves  = )

Expert Comment

by:Ray Paseur
@kaufmed: Spot on.

@areyouready344: Computer programming is an activity that requires clarity of thought and precision in execution.  You have to get the ideas into data and code in a way that will behave predictably.  In this matter, PHP is not your friend at all because it is highly permissive of sloppy programming and it hides important things from the programmer, such as accidental reliance on undefined variables.  If PHP is your only programming language you might want to consider studying something a little more structured.  And don't be impatient with yourself as you learn.  Rome was not built in a day.  This article explains what you are up against.
http://norvig.com/21-days.html

Anyway, you've gotten some working answers and hopefully some good ideas about how this kind of thing is usually done when time is money and accuracy matters.  Best of luck with your project, ~Ray

Expert Comment

by:Ray Paseur
Me, too.  Over and out, ~Ray

Expert Comment

by:Terry Woods
kaufmed, I agree that a replace is probably a more elegant solution, and avoids mucking around with an array as a result. The author didn't seem comfortable to change much from his original code though - either will work in the long run.

I often feel it's overkill with 2-3 experienced experts working on the same problem, but thanks to my timezone I'm not awake to answer most of the EE questions, so I have to be pretty competitive to pick up some points (even if it feels like I'm butting in on another expert's progress at times!)... on the positive side though, the competition sharpens both my technical skills and my ability to interpret and explain. So thanks, to you and Ray!

Author Comment

by:areyouready344
The name of this website says Expert, so I was hoping an expert could look at the problem from my point of view.

When I use the following code below it works:


preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/",$string,$matches);
echo $matches[1] . " " . $matches[2];

against this line:

1. test1 dkdkd- dkdkd

output:

dkdkd- dkdkd

Why does the code below not work? The only difference is that I don't use the () around test1 in the example above and I do use it in the example below.

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

output is:

test1 dkdkd- dkdkd

Why is test1 being displayed in the second example?


Expert Comment

by:Terry Woods
In your first piece of code, with the ? after test1, the ? makes the 1 optional so that test or test1 would be matched.

The second piece of code makes the entire test1 optional, but doesn't capture it. I just retested that second piece of code with your given example, and it works fine for me - test1 is not displayed. Can you please retest? If it still fails, then is it possible there is a special (invisible) character included in the string?
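
The scope of the ? can be checked in isolation. The sketch below is illustrative only; the variable names are invented. It shows that test1? means "test, then an optional 1", while (?:test1)? means "test1 or nothing", and that the first form therefore cannot match a line that has no test word at all.

```php
<?php
// ? applies to the single preceding token unless a group widens its scope.
$only_digit_optional = '/^test1?$/';     // "test" followed by an optional "1"
$whole_word_optional = '/^(?:test1)?$/'; // either "test1" or nothing at all

$checks = array(
    'digit: test'  => preg_match($only_digit_optional, 'test'),  // 1
    'digit: test1' => preg_match($only_digit_optional, 'test1'), // 1
    'digit: empty' => preg_match($only_digit_optional, ''),      // 0
    'group: test'  => preg_match($whole_word_optional, 'test'),  // 0
    'group: test1' => preg_match($whole_word_optional, 'test1'), // 1
    'group: empty' => preg_match($whole_word_optional, ''),      // 1
);
print_r($checks);

// With test1? the literal "test" is mandatory, so a line without it cannot match.
$no_test = preg_match("/(?:\d+\.)?\s*test1?\s*(.*)-(.*)/", '1. data-data'); // 0
```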

Expert Comment

by:käµfm³d 👽
@TerryAtOpus
"The author didn't seem comfortable to change much from his original code though - either will work in the long run."
You and I both know this site is as much about deciphering what problems are as it is posing solutions to said problems. The only things that frustrate me here are when people don't clearly express themselves and when someone ignores a potential solution without so much as questioning the logic or why it may be better or worse than one's own approach--or even saying, "hey, I need to do it this way because [fill-in-the-blank]." I have zero problem with explaining any of my posts; I often neglect an explanation because I find often times people just want a "get-er-done" approach rather than a "teach a man to fish" approach. I know Ray's seen this; I'd have trouble believing you haven't seen it. If someone doesn't understand my approach, all I request is they ask me to explain. Since all of us here are volunteers (i.e. we don't get paid), I think it's a small price to pay to say, "Hey, I didn't quite understand why you went that way. Would you mind clarifying for me?" Also, I LOVE being called out when my logic is incorrect. It gives me a chance to learn from my mistakes...  and I'm wrong quite often. Hell, you've corrected me on a number of occasions, and I love you that much for it (totally platonic, I assure you). I'm here to learn just as much as I am to teach.

Terry, I'm not harping on you... I'm just using you as my soapbox for the moment. Hope ya don't mind  ; )


"The name of this website says Expert, so I was hoping an expert could look at the problem from my point of view."
Read my profile. I don't claim to be an expert. I only hope that others who have worked with me can say, "yeah, he knows his stuff." If they are kind enough to bestow the moniker of "expert" upon me, then I thank them for it. Those who tout their own expertise have a need for self-satisfaction. I am completely satisfied with who I am.

Expert Comment

by:käµfm³d 👽
That last bit wasn't for you Terry. Well, none of it was really.

Accepted Solution

by:areyouready344
areyouready344 earned 0 total points
I resolved the problem. The problem was that I had two spaces between the line number and test1, and I used the following code to resolve the issue.

preg_match("/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/",$string,$matches);

The problem was with this part of the code: it made the match non-greedy, so it stopped at the first whitespace.

preg_match("/(?:\d+\.)?\s*

I changed the code above to the code below and it now works...

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);

Expert Comment

by:Terry Woods
That doesn't make sense. I have never seen a regex engine work as you describe so I suspect you may have made a mistake in your logic somewhere, or maybe you have a version of php that has a bug?

This:
preg_match("/(?:\d+\.)?\s*
is not non-greedy, with respect to matching spaces.

This is non-greedy (by adding a ? after the \s*):
preg_match("/(?:\d+\.)?\s*?

This:
preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);
requires 2 space characters. Does it match your other cases? ie:
[1]=>
  string(37) "test1 dkdkd"
[2]=>
  string(135) "dkdkd"

Author Comment

by:areyouready344
All I can tell you is that everything is working after changing \s*? to \s\s. I tested with Perl and PHP; both were not working and both are working now. Thanks, Terry, for mentioning that there might be an extra character somewhere.

Expert Comment

by:Terry Woods
Did you have the non-greedy version:
\s*?
because that would explain why it didn't work. It should have been:
\s*

Anyway, glad you got it working!
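
The greedy and non-greedy forms can be compared directly. This is a sketch assuming an input with two spaces after "1."; the variable names are invented for illustration. The lazy \s*? starts by matching nothing and only grows if the rest of the pattern fails. Here the rest never fails, because the optional test1 group can match nothing and the following \s* absorbs the spaces, so test1 is left for the first capture.

```php
<?php
$string = "1.  test1 dkdkd- dkdkd"; // two spaces after "1."

$greedy = "/(?:\d+\.)?\s*(?:test1)?\s*(.*)-(.*)/";  // \s* takes all the spaces up front
$lazy   = "/(?:\d+\.)?\s*?(?:test1)?\s*(.*)-(.*)/"; // \s*? takes as few as it can get away with

preg_match($greedy, $string, $g);
preg_match($lazy, $string, $l);

echo $g[1], PHP_EOL; // dkdkd
echo $l[1], PHP_EOL; // test1 dkdkd
```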

Author Comment

by:areyouready344
I still had a problem with this:

preg_match("/(?:\d+\.)?\s\s(?:test1)?\s*(.*)-(.*)/",$string,$matches);

But changed it to this and everything is working:

preg_match("/(?:\d+\.)?(?:\s*)?(?:test1)?(?:\s*)?(.*)-(.*)/",$string,$matches);

Thanks for all your help Terry....
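
For completeness, the final pattern can be generalised to the three sample lines from the question. This is a sketch, not the author's code: capture_data() is a hypothetical helper, (?:\s*)? is written as the equivalent \s*, a ^ anchor is added, and test[1-4] stands in for test1 on the assumption that only those four names occur.

```php
<?php
// Hypothetical helper built on the final pattern from this thread.
function capture_data($line)
{
    $pattern = '/^(?:\d+\.)?\s*(?:test[1-4])?\s*(.*)-(.*)/';
    if (preg_match($pattern, $line, $matches) !== 1) {
        return null; // no hyphen-separated data on this line
    }
    return $matches[1] . '-' . $matches[2];
}

$lines = array(
    '1.  test1 data- data', // two spaces after "1."
    '1. data- data',
    'data- data',
    '1. test4 data-data',
);
foreach ($lines as $line) {
    echo capture_data($line), PHP_EOL;
}
```

Note that the greedy (.*) splits on the last hyphen in the line; joining the two captures with "-" restores the original data either way.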

Author Comment

by:areyouready344
I've requested that this question be closed as follows:

Accepted answer: 0 points for areyouready344's comment http:/Q_27304152.html#36526890

for the following reason:

best solution

Expert Comment

by:Terry Woods
It's been a few weeks since I looked at this, and trying to go back over the trail of logic in this question makes my head spin. However, it's pretty clear to me that the author's final solution was based on code I provided, and my code was based on kaufmed's code. Even if the author's comment is accepted as the solution, both kaufmed and I deserve some points for helping the author along the way, thus I object to the closing of the question this way (to the author: did you know you can accept multiple comments when closing the question?). Thanks...

Expert Comment

by:South Mod
All,
 
Following an 'Objection' by TerryAtOpus (at http://www.experts-exchange.com/Q_27399445.html) to the intended closure of this question, it has been reviewed by at least one Moderator and is being closed as recommended by the Expert.
 
At this point I am going to re-start the auto-close procedure.
 
Thank you,
 
SouthMod
Community Support Moderator