  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 279

Regular expression

I need a regular expression that extracts a date and a title from an HTML string. For instance:
<p class="myclass">

14-04-03 : this is the title



</p>

I'd like to get:

day=14
month=04
year=2003
title=this is the title
Asked by: us111
 
HackneyCab commented:
What you need is preg_match or preg_match_all (use preg_match_all if the desired pattern may occur more than once in the sample string).

For instance:

$count = preg_match_all('#([0-9]{2})-([0-9]{2})-([0-9]{2}) : (.+)#', $string, $match_array);

Then $count will hold the number of matches (zero if no matches were found, false on error), and $match_array will contain the captured results.
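A minimal sketch of what $count and $match_array look like, assuming the default PREG_PATTERN_ORDER flag: $match_array[0] lists the full matches, and $match_array[1] to $match_array[4] list the day, month, year and title of every match. The two-line input string is made up for illustration.

```php
<?php
// Hypothetical input containing two dated lines.
$string = "14-04-03 : this is the title\n19-11-02 : another title";

// Same pattern as above: day, month, two-digit year, then the rest of the line.
$count = preg_match_all('#([0-9]{2})-([0-9]{2})-([0-9]{2}) : (.+)#', $string, $match_array);

// With PREG_PATTERN_ORDER (the default), $match_array[N] lists group N for every match.
for ($i = 0; $i < $count; $i++) {
    printf(
        "day=%s month=%s year=%s title=%s\n",
        $match_array[1][$i],
        $match_array[2][$i],
        $match_array[3][$i],
        $match_array[4][$i]
    );
}
```

Because . does not match a newline here, each title stops at the end of its own line.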
 
us111 (Author) commented:
OK, but what about the HTML code around it: <p class.........> ... </p>?
 
ahoffmann commented:
$count=preg_match_all('#<p\s+class="myclass"[^>]*>\s*([0-9]{2})-([0-9]{2})-([0-9]{2}) \s*:\s*([^\r\n]+)[^<]*</p[ >]#', $string, $match);

# IIRC PHP's preg_match* is too stupid for case-insensitive matches; you have to use character classes for that ...
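The question asks for year=2003, while this pattern only captures the two digits 03. A minimal sketch that runs the pattern once with preg_match() on the question's snippet and builds the requested day/month/year/title values. Converting the year with DateTime::createFromFormat() and the 'y' format is an assumption not taken from the thread; PHP reads a two-digit 'y' year as falling in 1970-2069.

```php
<?php
// The HTML snippet from the question.
$string = "<p class=\"myclass\">\n\n14-04-03 : this is the title\n\n\n\n</p>\n";

$pattern = '#<p\s+class="myclass"[^>]*>\s*([0-9]{2})-([0-9]{2})-([0-9]{2}) \s*:\s*([^\r\n]+)[^<]*</p[ >]#';

$result = null;
if (preg_match($pattern, $string, $m)) {
    // 'd-m-y' parses a two-digit year, so "03" becomes 2003.
    $date = DateTime::createFromFormat('d-m-y', "{$m[1]}-{$m[2]}-{$m[3]}");
    $result = [
        'day'   => $m[1],
        'month' => $m[2],
        'year'  => $date->format('Y'),
        'title' => $m[4],
    ];
}
print_r($result);
```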
 
us111 (Author) commented:
ahoffmann: that works.

<p class="myclass">

14-04-03 : this is the title

</p>
HTML code here part 1
<br><A name="documents"></a>
HTML code here part 2
<!-- END-->

More difficult :)

Now I also want to get HTML code part 1 and part 2.

I will increase the points.


 
ahoffmann commented:
$count=preg_match_all('#</p[^>]*>([^<]*)<br\s*><[aA][^>]*></[aA][^>]*>([^<]*)<!--#', $string, $match);
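A sketch running this pattern on the string from the previous comment. It writes the anchor tag as <[aA]...> so that the uppercase <A name="documents"> in the sample matches without the i modifier. Because ([^<]*) stops at the first <, each part can only be plain text, not HTML.

```php
<?php
$string = "<p class=\"myclass\">\n\n14-04-03 : this is the title\n\n</p>\n"
    . "HTML code here part 1\n<br><A name=\"documents\"></a>\nHTML code here part 2\n<!-- END-->";

// [aA] matches the tag name in either case without the i modifier.
$count = preg_match_all('#</p[^>]*>([^<]*)<br\s*><[aA][^>]*></[aA][^>]*>([^<]*)<!--#', $string, $match);

if ($count) {
    $part1 = trim($match[1][0]); // text between </p> and <br>
    $part2 = trim($match[2][0]); // text between </a> and <!--
    echo "part1=$part1\npart2=$part2\n";
}
```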
 
ZhaawZ (Software Developer) commented:
ahoffmann, what do you mean by "php's preg_match* is too stupid for case insensitive matches"?
 
ahoffmann commented:
IIRC it has no "i" modifier.
 
ZhaawZ (Software Developer) commented:
<?php
  echo (int)preg_match('/^test$/i', 'TeSt');
?>

Works fine for me.
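For comparison, a small sketch showing that the i modifier and the character-class workaround both match the uppercase <A name=...> tag, while the plain lowercase pattern does not. The sample tag is taken from the thread; the patterns are illustrative.

```php
<?php
$html = '<br><A name="documents"></a>';

// Case-insensitive via the i modifier.
$withModifier = preg_match('#<a name="(\w+)"></a>#i', $html, $m1);

// Same match without the modifier, spelling out each case variant by hand.
$withClass = preg_match('#<[aA] [nN][aA][mM][eE]="(\w+)"></[aA]>#', $html, $m2);

// With neither, the lowercase pattern does not match the uppercase tag.
$plain = preg_match('#<a name="(\w+)"></a>#', $html);

var_dump($withModifier, $withClass, $plain);
```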
 
us111 (Author) commented:
OK, but how can I combine it with the previous one?

My string is:
<p class="myclass">

14-04-03 : this is the title

</p>
HTML code here part 1
<br><A name="documents"></a>
HTML code here part 2
<!-- END-->
 
ZhaawZ (Software Developer) commented:
>> Ok but how can I combine it with the previous
us111, ahoffmann already showed you how to get the date/title and HTML parts 1 and 2 (in 2 different patterns, though) ;)

Here's how I would do it (i.e., get the date, title, HTML part 1 and HTML part 2 with a single pattern)...

<?php
  $str = '
    <p class="myclass">

    14-04-03 : this is the title

    </p>
    HTML code here part 1
    <br><A name="documents"></a>
    HTML code here part 2
    <!-- END-->
  ';
  $count = preg_match_all('/<p class="myclass">\s*(\d\d)-(\d\d)-(\d\d) : (.*)\s*<\/p>\s*(.*)\s*<br><a name="\w*"><\/a>\s*(.*)\s*<!-- END-->/i', $str, $matches);
  if ( $count ) {
    echo '<pre>', print_r($matches, 1), '</pre>';
  } else {
    echo 'nothing was found';
  }
?>
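The same single-pattern approach can also be written with named groups, (?<name>...), so the results come back as $m['day'], $m['title'] and so on instead of numeric indexes. A sketch using preg_match() (a single match) on the same sample string; the group names are illustrative, not part of the original answer.

```php
<?php
$str = '
    <p class="myclass">

    14-04-03 : this is the title

    </p>
    HTML code here part 1
    <br><A name="documents"></a>
    HTML code here part 2
    <!-- END-->
  ';

// Same pattern as above, with named groups so results can be read by name.
$pattern = '/<p class="myclass">\s*(?<day>\d\d)-(?<month>\d\d)-(?<year>\d\d) : (?<title>.*)\s*<\/p>'
         . '\s*(?<part1>.*)\s*<br><a name="\w*"><\/a>\s*(?<part2>.*)\s*<!-- END-->/i';

if (preg_match($pattern, $str, $m)) {
    foreach (['day', 'month', 'year', 'title', 'part1', 'part2'] as $key) {
        echo $key, '=', trim($m[$key]), "\n";
    }
} else {
    echo 'nothing was found';
}
```

The numeric indexes are still present in $m alongside the named keys.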
 
us111 (Author) commented:
ZhaawZ: it does not work for me. Part 1 and part 2 could be anything.
 
ZhaawZ (Software Developer) commented:
Could you show an exact example that does not work (both the input data and the source code, i.e., the string that has to be examined and the PHP script)?
 
us111 (Author) commented:
The goal is to get only the content of old HTML pages.

Example 1:
<p class="subtitel">    14-04-03 : this is the title
  </p>
   Part 1
    <A name="documents"></a>
    Part 2
                <!-- END-->

Example 2:

<p class="subtitel">    14-04-03 : this is the title



  </p>
   Part 1

    <A name="documents"></a>
    Part 2
 <!-- END-->

Fixed parts:
- <A name="documents"></a>
- <!-- END-->
- <p class="subtitel"> content here </p>

Part 1 and Part 2 are arbitrary HTML that depends on the page; each could be 20 lines of HTML code.

The code:
$count=preg_match_all('#<p\s+class="myclass"[^>]*>\s*([0-9]{2})-([0-9]{2})-([0-9]{2}) \s*:\s*([^\r\n]+)[^<]*</p[ >]#', $string, $match);

works for: <p class="subtitel">    14-04-03 : this is the title  </p>

Maybe 2 different regular expressions could be used.
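A sketch of the two-step idea: use a regex only for the small subtitle paragraph, then cut part 1 and part 2 out with plain string functions, since the markers around them are fixed strings. The helper name extractParts() is hypothetical, and it assumes the class is "subtitel" and that the markers are exactly the fixed parts listed above, found case-insensitively with stripos().

```php
<?php
/**
 * Hypothetical helper: splits a page into date/title, part 1 and part 2.
 * Assumes the fixed markers appear once, in this order; returns null otherwise.
 */
function extractParts(string $html): ?array
{
    // Step 1: a regex only for the short, predictable subtitle paragraph.
    $pattern = '#<p\s+class="subtitel"[^>]*>\s*(\d\d)-(\d\d)-(\d\d)\s*:\s*(.*?)\s*</p>#is';
    if (!preg_match($pattern, $html, $m, PREG_OFFSET_CAPTURE)) {
        return null;
    }
    // Position right after the closing </p> of the subtitle.
    $afterTitle = $m[0][1] + strlen($m[0][0]);

    // Step 2: plain string search for the literal markers; the HTML between them is kept as-is.
    $anchor = '<a name="documents"></a>';
    $end    = '<!-- END-->';
    $anchorPos = stripos($html, $anchor, $afterTitle);
    if ($anchorPos === false) {
        return null;
    }
    $endPos = stripos($html, $end, $anchorPos);
    if ($endPos === false) {
        return null;
    }
    $part2Start = $anchorPos + strlen($anchor);

    return [
        'day'   => $m[1][0],
        'month' => $m[2][0],
        'year'  => $m[3][0],
        'title' => $m[4][0],
        'part1' => trim(substr($html, $afterTitle, $anchorPos - $afterTitle)),
        'part2' => trim(substr($html, $part2Start, $endPos - $part2Start)),
    ];
}

// Example 2 from above.
$page = "<p class=\"subtitel\">    14-04-03 : this is the title\n\n\n\n  </p>\n   Part 1\n\n"
      . "    <A name=\"documents\"></a>\n    Part 2\n <!-- END-->";
print_r(extractParts($page));
```

Because part 1 and part 2 are taken with substr(), they can contain any HTML, including tags and newlines.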

 
ZhaawZ (Software Developer) commented:
1) At the beginning you said that the string starts with ``<p class="myclass">``, now you say that it starts with ``<p class="subtitel">`` ("myclass" and "subtitel" are different strings).
2) At the beginning there was ``<br>`` in front of ``<A name="documents"></a>``; this "<br>" is gone from your last examples.

$count = preg_match_all('/<p class="subtitel">\s*(\d\d)-(\d\d)-(\d\d) : (.*)\s*<\/p>\s*(.*)\s*<a name="\w*"><\/a>\s*(.*)\s*<!-- END-->/i', $str, $matches);
This works fine with the last 2 examples.
 
us111 (Author) commented:
>>>1) At the beginning you said that string starts with ``<p class="myclass">``, now you say that it starts with ``<p class="subtitel">`` ("myclass" and "subtitel" are different strings).

Yes, it's the same; I just have to replace subtitel with myclass.

>>>2) At the beginning there was ``<br>`` in front of ``<A name="documents"></a>`` - this "<br>" is gone from your last examples.

I can remove it later.
 
us111 (Author) commented:
OK, it works for this easy example.

Try it with:


 <p class="myclass">

19-11-02 : ipsum dolor sit amet, consectetuer adipiscing elit. Sed commodo. Curabitur se
</p>
<p>

<p>

<p>
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Sed commodo. Curabitur sed ante. Nam ut sem. Proin mattis justo a dui. Nulla varius. In ut mi. Donec sit amet risus ac lorem posuere

interdum. Vestibulum vestibulum lacus sed ligula. In rutrum rhoncus elit. Integer aliquam tincidunt mi. Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Nunc a tortor. Nunc consectetuer.
</p>
<p>
Aenean ut enim. Praesent et pede. Nulla imperdiet mollis mi. Nullam dignissim. Ut placerat posuere turpis. Donec aliquam lacus vel dui. Proin laoreet iaculis mi. Fusce placerat dapibus mauris.

Quisque euismod sem sed odio. Sed id sapien eu massa ultricies posuere.
</p>
<p>
Aliquam semper faucibus lacus. Donec eleifend arcu nec quam. Nunc tristique accumsan ante. Praesent neque velit, pretium in, rhoncus aliquam, volutpat ut, enim. Morbi lorem sem, elementum non,
</p></p><br>
                    <br><A name="documents"></a>
               
                    <center>
                   <table border="0" cellspacing="0" cellpadding="0">
                    <tr>
                        <td class="subtitel"></td>
                        <td width="5"><img src="/images/shared/x.gif" width="5" height="30"></td>
                        <td class="subtitel" align="center" colspan="2"></td>
                    </tr>
                       
                    </table>
                    </center>
     


<!--END-->
                </td>
                <td width="16"><img src="/images/shared/x.gif" width="16" height="1"></td>
 
us111 (Author) commented:
As mentioned, the fixed parts are always the same, but the HTML code inside is... very bad...
 
ZhaawZ (Software Developer) commented:
What about this?

$count = preg_match_all('/<p class="\w*">\s*(\d\d)-(\d\d)-(\d\d) : (.*)\s*<\/p>\s*(\S.*)\s*<br><a name="\w*"><\/a>\s*(\S.*)\s*<!--\s*end\s*-->/isU', $str, $matches);

Remove "<br>" from the pattern (before "<a") if it is not needed.
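The three modifiers in that pattern do different jobs: i makes the match case-insensitive, s lets . match newlines, and U makes quantifiers lazy (ungreedy) by default. A small sketch on a made-up string showing how s and U change what (.*) captures:

```php
<?php
$html = "<p>line one\nline two</p><p>second</p>";

// Without s, '.' does not match "\n", so the two-line paragraph is skipped
// and only the second paragraph is found.
$noS = preg_match('#<p>(.*)</p>#', $html, $m0);

// With s, '.' also matches newlines; the greedy (.*) then runs to the LAST </p>.
preg_match('#<p>(.*)</p>#s', $html, $m1);

// Adding U makes quantifiers lazy, so (.*) stops at the FIRST </p>.
preg_match('#<p>(.*)</p>#sU', $html, $m2);

var_dump($noS, $m0[1], $m1[1], $m2[1]);
```

This is why the pattern above needs both s (part 1 and part 2 span several lines) and U (each part must stop at the nearest marker, not the last one).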
 
us111 (Author) commented:
Better, it seems to work.
I'll try it tomorrow. You're on the right track to get the points :)
 
ZhaawZ (Software Developer) commented:
Don't forget about others (who assisted) when you give away points ;)
 
ahoffmann commented:
Keep in mind: in general it's not possible to write unambiguous regular expressions to parse HTML (or SGML in general). They may work for very simple examples but not for more complicated ones, and they fail miserably for most nested tags and for invalid syntax.
In this case you'd better go with an HTML parser.

As your examples above show, you have already run into these cases.
We can write a regex for your initial question, and probably extend it to your further requirements, but will you then be able to understand it, let alone change a single . in it when it fails miserably because the input differs by one character?
That said, I'd recommend using PHP's functions for parsing HTML rather than such a sophisticated regex. It's up to you.
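One such set of functions is PHP's DOM extension: DOMDocument::loadHTML() uses libxml's HTML parser, which tolerates markup that is not well formed, and DOMXPath can then select the subtitle paragraph by its class. A sketch, assuming the dom extension is enabled, run on a shortened copy of the messy example above; only the paragraph's text is left for a small regex.

```php
<?php
$html = '<p class="myclass">

19-11-02 : ipsum dolor sit amet
</p>
<p>
<p>
Lorem ipsum dolor sit amet.
</p></p><br>
<br><A name="documents"></a>
<!--END-->';

// Collect libxml's complaints about the broken markup instead of emitting warnings.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$p = $xpath->query('//p[@class="myclass"]')->item(0);

// Only the short, predictable text of one element is matched with a regex.
if ($p !== null && preg_match('#(\d\d)-(\d\d)-(\d\d)\s*:\s*(.+)#', trim($p->textContent), $m)) {
    echo "day=$m[1] month=$m[2] year=$m[3] title=$m[4]\n";
}
```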
 
us111 (Author) commented:
I know that, but I was optimistic :). PHP's functions for parsing HTML? What do you mean?
 
ahoffmann commented:
IIRC there is an xml_parse() or similar, which should work for (well-formed) HTML too.
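xml_parse() (from PHP's xml extension) only accepts well-formed markup, so everyday HTML such as an unclosed <br> makes it fail. A quick check, assuming the xml extension is enabled, with a hypothetical isWellFormed() helper:

```php
<?php
/** Hypothetical helper: true if $markup parses as well-formed XML. */
function isWellFormed(string $markup): bool
{
    $parser = xml_parser_create();
    // true marks this as the final chunk; xml_parse() returns 1 on success, 0 on error.
    return xml_parse($parser, $markup, true) === 1;
}

var_dump(isWellFormed('<p class="myclass">14-04-03 : this is the title</p>')); // bool(true)
var_dump(isWellFormed('<p>HTML code here part 1<br></p>'));                       // bool(false): <br> is never closed
```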
 
us111 (Author) commented:
Yes, but... the HTML is not well formed...

OK, thanks for your help. I think I will figure it out.