Solved

Regex help, please

Posted on 2009-07-09
Last Modified: 2012-05-07
I need a regex to capture the text between 'asm' and 'end;' in this text:

asm
  MOV AX, 1234H
  MOV Number, AX
end;

Question by:EddieShipman
27 Comments
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24818275
Are you sure you need a regexp for that? Couldn't you just remove the first three and the last four characters?

Anyway, try this:
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";

preg_match('/asm(.+)end;/s',$string,$matches);
$asm = $matches[1];
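
For comparison, the non-regex route mentioned above (dropping the first three and the last four characters) might look roughly like this. It is only a sketch and assumes the input starts with exactly "asm" and ends with exactly "end;", which is an assumption about the data rather than something the question guarantees.

```php
<?php
// Sketch only: assumes the input starts with "asm" and ends with "end;".
$string = "asm\n  MOV AX, 1234H\n  MOV Number, AX\nend;";

// Skip the leading "asm" (3 characters) and drop the trailing "end;" (4 characters).
$asm = substr($string, 3, -4);

echo $asm;
```

For this particular input, the result should be the same text that the capture group in the pattern above picks up.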


0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818322
This is to be used in the GeSHi highlighter written in PHP so I do need a regex. The above won't work for me.
I am testing in Expresso and the regex you posted returns nothing.

Thanks for trying, I guess I should have been a little more straightforward in my original post...
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 24818346
Here is the non-REGEX solution.  I often find it is faster to just write a little bit of code for these simple extractions, versus taking the time to figure out a complicated (and possibly faulty) REGEX.

Best of luck with your project, ~Ray
<?php // RAY_temp_non_regex.php
error_reporting(E_ALL);
echo "<pre>\n";

// FROM THE OP
$txt = 'asm
  MOV AX, 1234H
  MOV Number, AX
end;';

// DELIMITERS
$a = 'asm';
$z = 'end;';

// GET AN ARRAY OF ELEMENTS BROKEN ON $a
$arr = explode($a, $txt);

// VISUALIZE THE RAW DATA
var_dump($arr);
echo "\n";

// ITERATE OVER THE ARRAY
foreach ($arr as $key => $val)
{
    if (empty($val)) continue;
    // STRIP THE TRAILING DELIMITER (ORIGINALLY ereg_replace(), WHICH WAS REMOVED IN PHP 7)
    $val = preg_replace('/' . preg_quote($z, '/') . '$/', '', trim($val));
    echo "\n$val";
}


0
 
LVL 14

Expert Comment

by:profya
ID: 24818397
cxr's solution is only missing an echo; otherwise it works:
<?php
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";

preg_match('/asm(.+)end;/s',$string,$matches);
$asm = $matches[1];
echo $asm;
?>
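
One caveat worth checking: `.+` is greedy, so if the input ever contains more than one asm block, the capture would run from the first asm to the last end;. The sketch below uses a made-up two-block input (not from the question) to compare the greedy pattern with a lazy `.+?` variant.

```php
<?php
// Hypothetical input with two asm blocks and a Pascal statement in between.
$string = "asm\n  MOV AX, 1\nend;\nx := 1;\nasm\n  MOV BX, 2\nend;";

// Greedy: one match spanning everything from the first "asm" to the last "end;".
preg_match('/asm(.+)end;/s', $string, $greedy);

// Lazy: each block is matched separately.
preg_match_all('/asm(.+?)end;/s', $string, $lazy);

var_dump($greedy[1]);
var_dump($lazy[1]);
```

With a single block, as in the original question, both variants should capture the same text.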


0
 
LVL 14

Expert Comment

by:profya
ID: 24818423
Use echo nl2br($asm); instead, so that the output looks like the input string when rendered as HTML.
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818511
Those options won't work for me as I stated earlier.
0
 
LVL 108

Expert Comment

by:Ray Paseur
ID: 24818542
@EddieShipman: We need you to explain a little more why these options will not work.  Is there more test data you can show us?  All of the examples here seem either 100% OK or 99% OK.

RSVP, ~Ray
0
 
LVL 14

Expert Comment

by:profya
ID: 24818568
Try this:
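<?php
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";

// Originally ereg('asm(.+)end;', ...); the ereg functions were removed in PHP 7,
// so preg_match() with the "s" modifier is used here instead.
if (preg_match('/asm(.+)end;/s', $string, $matches))
{
    $asm = $matches[1];
    echo nl2br($asm);
}
?>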


0
 
LVL 14

Expert Comment

by:profya
ID: 24818581
As Ray says, if this is also not working, please tell us much more about the problem. I have now tested all the solutions the guys provided, and they all work fine. We are here to help.
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818589
They do not work in either Expresso or the GeSHi highlighter for the text supplied.

As I stated, this is a regex for use inside the GeSHi highlighter, which uses preg_replace to
highlight based on the regex passed.

The other options, as far as stripping out the asm and end; using explode, etc, aren't going to work
because this is for a language file for GeSHi.



0
 
LVL 14

Expert Comment

by:profya
ID: 24818620
Do you think the line breaks affect how the above solutions work?
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818659
Take, for instance, the CSS highlighter in GeSHi.

These regexes:

'REGEXPS' => array(
            0 => '\#[a-zA-Z0-9\-]+\s+\{',
            1 => '\.[a-zA-Z0-9\-]+\s',
            2 => ':[a-zA-Z0-9\-]+\s'
            ),


highlight this CSS:
#wrapheader {
      min-height: 120px;
      height: auto !important;
      height: 120px;
/*      background-image: url('./images/background.gif');
      background-repeat: repeat-x;*/
/*      padding: 0 25px 15px 25px;*/
      padding: 0;
}

#wrapcentre {
      margin: 15px 25px 0 25px;
}


like this:
<span style="color: #cc00cc;">#wrapheader <span style="color: #66cc66;">&#123;</span></span>
      min-<span style="color: #000000; font-weight: bold;">height</span>: 120px;
      <span style="color: #000000; font-weight: bold;">height</span>: <span style="color: #993333;">auto</span> !important;
      <span style="color: #000000; font-weight: bold;">height</span>: 120px;

<span style="color: #808080; font-style: italic;">/*      background-image: url('./images/background.gif');
      background-repeat: repeat-x;*/</span>
<span style="color: #808080; font-style: italic;">/*      padding: 0 25px 15px 25px;*/</span>
      <span style="color: #000000; font-weight: bold;">padding</span>: <span style="color: #cc66cc;">0</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;
<span style="color: #cc00cc;">#wrapcentre <span style="color: #66cc66;">&#123;</span></span>
      <span style="color: #000000; font-weight: bold;">margin</span>: 15px 25px <span style="color: #cc66cc;">0</span> 25px;

<span style="color: #66cc66;">&#125;</span>

based on these values:
'REGEXPS' => array(
            0 => 'color: #cc00cc;',
            1 => 'color: #6666ff;',
            2 => 'color: #3333ff;',
            )
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818665
There are other styles for keywords such as background-color, font-weight, etc but they are not controlled
by the same parsing.
0

 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24819729
I have no experience with GeSHi, but based on the documentation:

http://qbnz.com/highlighter/geshi-doc.html#language-file-regexps

...I think you need something like this:

 array(
    GESHI_SEARCH => '(asm)(.*)(end;)',
    GESHI_REPLACE => '\\2',
    GESHI_MODIFIERS => 'si',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '\\3'
    )
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24827851
Well, that's very close. It makes the assembly code purple but the asm and end; lose their bold attributes.
What does the si modifier do?
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24828019
The "s" modifier lets the .* match linefeeds, and the "i" modifier means ignore case, so ASM ... END; and Asm...End; would also be matched. Try adding bold like this:

 array(
    GESHI_SEARCH => '(asm)(.*)(end;)',
    GESHI_REPLACE => '<b>\\2</b>',
    GESHI_MODIFIERS => 'si',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '<b>\\3</b>'
    )
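
To see what the two modifiers do outside of GeSHi, here is a small standalone check with plain preg_match. The upper-case sample input is made up for the illustration, and the `/` delimiters are added by hand, since preg_match needs them around the pattern.

```php
<?php
// Made-up sample: upper-case keywords and a line break inside the block.
$code = "ASM\n  MOV AX, 1234H\nEND;";

// No modifiers: the case differs and "." stops at newlines, so nothing matches.
$plain = preg_match('/(asm)(.*)(end;)/', $code);

// "i" only: the case is ignored, but ".*" still cannot cross the newline.
$caseOnly = preg_match('/(asm)(.*)(end;)/i', $code);

// "s" only: "." matches newlines, but lower-case "asm" is not in the input.
$dotAllOnly = preg_match('/(asm)(.*)(end;)/s', $code);

// "si": both problems are handled, so the block matches.
$both = preg_match('/(asm)(.*)(end;)/si', $code, $m);

var_dump($plain, $caseOnly, $dotAllOnly, $both);
var_dump($m[2]); // the text between the keywords
```

Dropping the "i" keeps the multi-line behaviour but makes the keywords case-sensitive again.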
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24828713
Is there any way to ignore those matches?
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24828760
"those"? Do you mean ASM ... END; and Asm...End; ? If so, just drop the "i" modifier.
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24830478
Well, that one doesn't seem to help either. Here's the result:

<span style="color: #000000; font-weight: bold;">asm</span>
  MOV AX<span style="color: #000000;">,</span> 1234H
  MOV Number<span style="color: #000000;">,</span> AX
<span style="color: #000000; font-weight: bold;">end</span><span style="color: #000000;">;</span>
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24830614
Where is the purple color you talked about earlier?

The <b> </b> tags are not there, did you use <span style="color: #000000; font-weight: bold;"> instead?
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24831886
No, the color purple comes from the color values for the 2nd regex style:

'STYLES' => array(
   .
   .
   .
  'REGEXPS' => array(
    0 => 'color: #0000ff;',
    1 => 'color: #ff0000;',
    2 => 'color: #800080;'
    ),
   .
   .
   .
),

The 2nd style in the array goes with the 2nd regex in the REGEXPS array. See how CSS is done in post 24818659
above.
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24863977
>> The 2nd style in the array goes with the 2nd regex in the REGEXPS array.

I don't know what your second regexp is. Which language are you creating/modifying a language file for? Could you post your complete language file and a sample source code that is supposed to work with that language file?
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24864125
Below is the entire language file and a code snippet.
Needs to highlight like this:
http://www.delphicommunity.com/syntax.png
<?php
/*************************************************************************************
 * delphi.php
 * ----------
 * Author: Járja Norbert (jnorbi@vipmail.hu)
 * Copyright: (c) 2004 Járja Norbert, Nigel McNie (http://qbnz.com/highlighter)
 * Release Version: 1.0.7.1
 * CVS Revision Version: $Revision: 1.2 $
 * Date Started: 2004/07/26
 * Last Modified: $Date: 2005/07/26 05:23:30 $
 *
 * Delphi (Object Pascal) language file for GeSHi.
 *
 * CHANGES
 * -------
 * 2004/11/27 (1.0.1)
 *  -  Added support for multiple object splitters
 * 2004/10/27 (1.0.0)
 *   -  First Release
 *
 * TODO (updated 2004/11/27)
 * -------------------------
 *
 *************************************************************************************
 *
 *   This file is part of GeSHi.
 *
 *   GeSHi is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   GeSHi is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with GeSHi; if not, write to the Free Software
 *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 ************************************************************************************/
// 2 => '/asm(.*?)end;/is'

$language_data = array (
  'LANG_NAME' => 'Delphi',
  'COMMENT_SINGLE' => array(1 => '//'),
  'COMMENT_MULTI' => array('(*' => '*)', '{' => '}'),
  'CASE_KEYWORDS' => 0,
  'QUOTEMARKS' => array("'", '"'),
  'ESCAPE_CHAR' => '',
  'KEYWORDS' => array(
    1 => array(
      'And', 'Array', 'As', 'Asm', 'Begin', 'Case', 'Class', 'Constructor', 'Destructor', 'Div', 'Do', 'DownTo', 'Else',
      'End', 'Except', 'File', 'Finally', 'For', 'Function', 'Goto', 'If', 'Implementation', 'In', 'Inherited', 'Interface',
      'Is', 'Mod', 'Not', 'Object', 'Of', 'On', 'Or', 'Packed', 'Private', 'Procedure', 'Program', 'Property', 'Protected',
      'Published', 'Public', 'Raise', 'Record', 'Repeat', 'Set', 'Shl', 'Shr', 'Then', 'ThreadVar', 'To', 'Try', 'Unit',
      'Until', 'Uses', 'While', 'With', 'Xor', 'Strict'
      ),
    2 => array(
      'nil', 'false', 'true', 'var', 'type', 'const'
      ),
    3 => array(
      'Abs', 'Addr', 'AnsiCompareStr', 'AnsiCompareText', 'AnsiContainsStr', 'AnsiEndsStr', 'AnsiIndexStr', 'AnsiLeftStr',
      'AnsiLowerCase', 'AnsiMatchStr', 'AnsiMidStr', 'AnsiPos', 'AnsiReplaceStr', 'AnsiReverseString', 'AnsiRightStr',
      'AnsiStartsStr', 'AnsiUpperCase', 'ArcCos', 'ArcSin', 'ArcTan', 'Assigned', 'BeginThread', 'Bounds', 'CelsiusToFahrenheit',
      'ChangeFileExt', 'Chr', 'CompareStr', 'CompareText', 'Concat', 'Convert', 'Copy', 'Cos', 'CreateDir', 'CurrToStr',
      'CurrToStrF', 'Date', 'DateTimeToFileDate', 'DateTimeToStr', 'DateToStr', 'DayOfTheMonth', 'DayOfTheWeek', 'DayOfTheYear',
      'DayOfWeek', 'DaysBetween', 'DaysInAMonth', 'DaysInAYear', 'DaySpan', 'DegToRad', 'DeleteFile', 'DiskFree', 'DiskSize',
      'DupeString', 'EncodeDate', 'EncodeDateTime', 'EncodeTime', 'EndOfADay', 'EndOfAMonth', 'Eof', 'Eoln', 'Exp', 'ExtractFileDir',
      'ExtractFileDrive', 'ExtractFileExt', 'ExtractFileName', 'ExtractFilePath', 'FahrenheitToCelsius', 'FileAge',
      'FileDateToDateTime', 'FileExists', 'FilePos', 'FileSearch', 'FileSetDate', 'FileSize', 'FindClose', 'FindCmdLineSwitch',
      'FindFirst', 'FindNext', 'FloatToStr', 'FloatToStrF', 'Format', 'FormatCurr', 'FormatDateTime', 'FormatFloat', 'Frac',
      'GetCurrentDir', 'GetLastError', 'GetMem', 'High', 'IncDay', 'IncMinute', 'IncMonth', 'IncYear', 'InputBox',
      'InputQuery', 'Int', 'IntToHex', 'IntToStr', 'IOResult', 'IsInfinite', 'IsLeapYear', 'IsMultiThread', 'IsNaN',
      'LastDelimiter', 'Length', 'Ln', 'Lo', 'Log10', 'Low', 'LowerCase', 'Max', 'Mean', 'MessageDlg', 'MessageDlgPos',
      'MonthOfTheYear', 'Now', 'Odd', 'Ord', 'ParamCount', 'ParamStr', 'Pi', 'Point', 'PointsEqual', 'Pos', 'Pred',
      'Printer', 'PromptForFileName', 'PtInRect', 'RadToDeg', 'Random', 'RandomRange', 'RecodeDate', 'RecodeTime', 'Rect',
      'RemoveDir', 'RenameFile', 'Round', 'SeekEof', 'SeekEoln', 'SelectDirectory', 'SetCurrentDir', 'Sin', 'SizeOf',
      'Slice', 'Sqr', 'Sqrt', 'StringOfChar', 'StringReplace', 'StringToWideChar', 'StrToCurr', 'StrToDate', 'StrToDateTime',
      'StrToFloat', 'StrToInt', 'StrToInt64', 'StrToInt64Def', 'StrToIntDef', 'StrToTime', 'StuffString', 'Succ', 'Sum', 'Tan',
      'Time', 'TimeToStr', 'Tomorrow', 'Trunc', 'UpCase', 'UpperCase', 'VarType', 'WideCharToString', 'WrapText', 'Yesterday',
      'Append', 'AppendStr', 'Assign', 'AssignFile', 'AssignPrn', 'Beep', 'BlockRead', 'BlockWrite', 'Break',
      'ChDir', 'Close', 'CloseFile', 'Continue', 'DateTimeToString', 'Dec', 'DecodeDate', 'DecodeDateTime',
      'DecodeTime', 'Delete', 'Dispose', 'EndThread', 'Erase', 'Exclude', 'Exit', 'FillChar', 'Flush', 'FreeAndNil',
      'FreeMem', 'GetDir', 'GetLocaleFormatSettings', 'Halt', 'Inc', 'Include', 'Insert', 'MkDir', 'Move', 'New',
      'ProcessPath', 'Randomize', 'Read', 'ReadLn', 'ReallocMem', 'Rename', 'ReplaceDate', 'ReplaceTime',
      'Reset', 'ReWrite', 'RmDir', 'RunError', 'Seek', 'SetLength', 'SetString', 'ShowMessage', 'ShowMessageFmt',
      'ShowMessagePos', 'Str', 'Truncate', 'Val', 'Write', 'WriteLn'
      ),
    4 => array(
      'AnsiChar', 'AnsiString', 'Boolean', 'Byte', 'Cardinal', 'Char', 'Comp', 'Currency', 'Double', 'Extended',
      'Int64', 'Integer', 'LongInt', 'LongWord', 'PAnsiChar', 'PAnsiString', 'PChar', 'PCurrency', 'PDateTime',
      'PExtended', 'PInt64', 'Pointer', 'PShortString', 'PString', 'PVariant', 'PWideChar', 'PWideString',
      'Real', 'Real48', 'ShortInt', 'ShortString', 'Single', 'SmallInt', 'String', 'TBits', 'TConvType', 'TDateTime',
      'Text', 'TextFile', 'TFloatFormat', 'TFormatSettings', 'TList', 'TObject', 'TOpenDialog', 'TPoint',
      'TPrintDialog', 'TRect', 'TReplaceFlags', 'TSaveDialog', 'TSearchRec', 'TStringList', 'TSysCharSet',
      'TThreadFunc', 'Variant', 'WideChar', 'WideString', 'Word'
      ),
    ),
  'CASE_SENSITIVE' => array(
    GESHI_COMMENTS => true,
      1 => false,
      2 => false,
      3 => false,
      4 => false,
    ),
  'STYLES' => array(
    'KEYWORDS' => array(
      1 => 'color: #000000; font-weight: bold;',
      2 => 'color: #000000; font-weight: bold;',
      3 => 'color: #000000;',
      4 => 'color: #000000;',
      5 => 'color: #C0C0C0;'
      ),
    'COMMENTS' => array(
      1 => 'color: #808080; font-style: italic;',
      'MULTI' => 'color: #808080; font-style: italic;'
      ),
    'ESCAPE_CHAR' => array(
      ),
    'BRACKETS' => array(
      0 => 'color: #000000;'
      ),
    'STRINGS' => array(
      0 => 'color: #ff0000;'
      ),
    'NUMBERS' => array(
      0 => 'color: #0000ff;'
      ),
    'METHODS' => array(
      1 => 'color: #000000;'
      ),
    'REGEXPS' => array(
      0 => 'color: #0000ff;',
      1 => 'color: #ff0000;',
      2 => 'color: #800080; font-weight: bold;'
      ),
    'SYMBOLS' => array(
      0 => 'color: #000000;'
      ),
    'SCRIPT' => array(
      )
    ),
  'URLS' => array(
    1 => '',
    2 => '',
    3 => '',
    4 => ''
    ),
  'OOLANG' => true,
  'OBJECT_SPLITTERS' => array(
    1 => '.'
    ),
  'REGEXPS' => array(
    0 => '\$[0-9a-fA-F]+',
    1 => '\#\$?[0-9]{1,3}',
    2 => array(
      GESHI_SEARCH => '(\asm)([a-z]+)(end;)',
      GESHI_REPLACE => '\\2',
      GESHI_MODIFIERS => '',
      GESHI_BEFORE => '\\1',
      GESHI_AFTER => '\\3'
      )
    ),
  'STRICT_MODE_APPLIES' => GESHI_NEVER,
  'SCRIPT_DELIMITERS' => array(
    ),
  'HIGHLIGHT_STRICT_BLOCK' => array(
    )
);

?>
 
 

Delphi code snippet:

procedure TForm1.Button1Click(Sender:TObject);
var
  Number, I, X: Integer;
begin
  Number := 12356;
  Caption := 'The number is ' + IntToStr(Number);
  for I := 0 to Number do
  begin
    Inc(X);
    Dec(X);
    X := X + 1.0;
    ListBox1.Items.Add(IntToStr(X));
  end;
  asm
    MOV AX, 1234H
    MOV Number, AX
  end;
end;


0
 
LVL 39

Accepted Solution

by:
Roger Baklund earned 500 total points
ID: 24875773
I have tried to make this work without luck. The closest I get is with this rule:

array(
    GESHI_SEARCH => '(asm)(.*)(end)',
    GESHI_REPLACE => '\\2',
    GESHI_MODIFIERS => 'sU',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '\\3'
    )

Note that I have removed the ; after end. This is because by the time this rule is applied, a previous rule has already changed the ; into <span style="color: #000000;">;</span>, so there is no match for "end;".

I suspect that you cannot do this with GeSHi; it seems unable to apply styles correctly to a section of code that already has styles applied.
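
The effect can be reproduced outside GeSHi with a rough stand-in. In the sketch below, a plain str_replace plays the role of the earlier symbol rule; that is an assumption made for the illustration, and GeSHi's real processing is more involved.

```php
<?php
$source = "asm\n  MOV AX, 1234H\nend;";

// Hypothetical stand-in for the earlier pass that wraps ";" in a span.
$afterSymbols = str_replace(';', '<span style="color: #000000;">;</span>', $source);

// "end;" no longer exists as plain text, so the pattern with the ";" fails.
$withSemicolon = preg_match('/(asm)(.*)(end;)/sU', $afterSymbols);

// Without the ";" the pattern matches; "U" makes ".*" ungreedy, so it stops at the first "end".
$withoutSemicolon = preg_match('/(asm)(.*)(end)/sU', $afterSymbols, $m);

var_dump($withSemicolon, $withoutSemicolon);
var_dump($m[2]); // the assembly lines between asm and end
```

If the keyword rule has also wrapped asm and end in their own spans by that point, the same reasoning would apply to them.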
0
 
LVL 26

Author Comment

by:EddieShipman
ID: 24877907
You know, now that you mention it, the asm and end are also being handled by another rule.
Is there a way to make it not "grab" the asm and end in the regex?
0
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24877946
>> Is there a way to make it not "grab" the asm and end in the regex?

Not sure if I understand what you mean. You have to match them in the regexp to find them, but you are not grabbing them in the rule above, they are placed in GESHI_BEFORE and GESHI_AFTER, only the part in between (\\2) is placed in GESHI_REPLACE.
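
As a rough picture of that split, the sketch below mimics it with preg_replace_callback. This is not GeSHi's actual code; it assumes, going by the documentation linked earlier, that only the GESHI_REPLACE part gets wrapped in the rule's style, while the GESHI_BEFORE and GESHI_AFTER parts are passed through unchanged. The style string is the purple one from the REGEXPS styles posted above.

```php
<?php
// Simplified illustration, not GeSHi's implementation.
$style  = 'color: #800080;';
$source = "asm\n  MOV AX, 1234H\nend";

$result = preg_replace_callback(
    '/(asm)(.*)(end)/sU',
    function (array $m) use ($style) {
        // $m[1] plays GESHI_BEFORE, $m[2] GESHI_REPLACE, $m[3] GESHI_AFTER.
        return $m[1] . '<span style="' . $style . '">' . $m[2] . '</span>' . $m[3];
    },
    $source
);

echo $result;
```

Under that assumption, asm and end are needed in the pattern to locate the block, but they end up outside the styled span.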
0
 
LVL 26

Author Closing Comment

by:EddieShipman
ID: 31601824
Well, it still isn't working so I'm going to post to the geshi-dev mailing list to see if I can get them to correct it.
It now comes out very weird. Thanks for the help. If I get a solution, I'll come back and post it here for you to see.
0
