Regex help, please

I need a regex to capture the text between 'asm' and 'end;' in this text:

asm
  MOV AX, 1234H
  MOV Number, AX
end;

Eddie Shipman (All-around developer) Asked:

Roger Baklund Commented:
Are you sure you need a regexp for that? Couldn't you just remove the first three and the last four characters?

Anyways, try this:
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";
preg_match('/asm(.+)end;/s',$string,$matches);
$asm = $matches[1];
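
One thing worth checking with this pattern is greediness. The sketch below is not from the thread; it assumes a made-up input with two asm blocks and compares the greedy (.+) with a lazy (.+?) variant.

```php
<?php
// Two asm blocks in one source string (sample input made up for illustration).
$source = "asm\n  MOV AX, 1234H\nend;\nx := 1;\nasm\n  MOV BX, AX\nend;";

// Greedy: (.+) runs to the LAST 'end;', so one match swallows both blocks.
preg_match('/asm(.+)end;/s', $source, $greedy);

// Lazy: (.+?) stops at the FIRST 'end;' after each 'asm'.
preg_match_all('/asm(.+?)end;/s', $source, $lazy);

echo trim($greedy[1]), "\n---\n";
foreach ($lazy[1] as $body) {
    echo trim($body), "\n";
}
```

With a single block, as in the question, both variants capture the same text; the difference only shows up once a file contains more than one block.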

Eddie Shipman (All-around developer, Author) Commented:
This is to be used in the GeSHi highlighter written in PHP so I do need a regex. The above won't work for me.
I am testing in Expresso and the regex you posted returns nothing.

Thanks for trying, I guess I should have been a little more straightforward in my original post...

Ray Paseur Commented:
Here is the non-REGEX solution.  I often find it is faster to just write a little bit of code for these simple extractions, versus taking the time to figure out a complicated (and possibly faulty) REGEX.

Best of luck with your project, ~Ray
<?php // RAY_temp_non_regex.php
error_reporting(E_ALL);
echo "<pre>\n";
 
// FROM THE OP
$txt = 'asm
  MOV AX, 1234H
  MOV Number, AX
end;';
 
// DELIMITERS
$a = 'asm';
$z = 'end;';
 
// GET AN ARRAY OF ELEMENTS BROKEN ON $a
$arr = explode($a, $txt);
// VISUALIZE THE RAW DATA
var_dump($arr);
echo "\n";
 
// ITERATE OVER THE ARRAY
foreach ($arr as $key => $val)
{
    if (empty($val)) continue;
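    // ereg_replace() was removed in PHP 7; preg_replace() with a quoted delimiter does the same job
    $val = preg_replace('/' . preg_quote($z, '/') . '$/', '', trim($val));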
    echo "\n$val";
}
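
The same non-regex idea can also be sketched with strpos() and substr() instead of explode(). The helper name text_between() is hypothetical and not part of the thread; it returns only the first block it finds.

```php
<?php
// Hypothetical helper (not from the thread): returns the text between the
// first $start and the next $end, or null when either delimiter is missing.
function text_between(string $text, string $start, string $end): ?string
{
    $from = strpos($text, $start);
    if ($from === false) {
        return null;
    }
    $from += strlen($start);
    $to = strpos($text, $end, $from);
    if ($to === false) {
        return null;
    }
    return substr($text, $from, $to - $from);
}

$txt = "asm\n  MOV AX, 1234H\n  MOV Number, AX\nend;";
echo trim(text_between($txt, 'asm', 'end;')), "\n";
```

Returning null for a missing delimiter keeps "no block found" distinct from "empty block".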

profya Commented:
cxr's solution is only missing an echo; it works:
<?php
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";
preg_match('/asm(.+)end;/s',$string,$matches);
$asm = $matches[1];
echo $asm;
?>

profya Commented:
Use echo nl2br($asm); to make the output look like the input string.

Eddie Shipman (All-around developer, Author) Commented:
Those options won't work for me as I stated earlier.

Ray Paseur Commented:
@EddieShipman: We need you to explain a little more why these options will not work.  Is there more test data you can show us?  All of the examples here seem either 100% OK or 99% OK.

RSVP, ~Ray

profya Commented:
Try this:
<?php
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";
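// ereg() was removed in PHP 7; preg_match() needs the "s" modifier so the dot also matches linefeeds
if (preg_match('/asm(.+)end;/s',$string,$matches))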
{
	$asm = $matches[1];
	echo nl2br($asm);
}
?>

profya Commented:
As Ray says, if this is also not working, please tell us much more about the problem. I have now tested all the solutions the guys provided, and they all work fine. We are here to help.

Eddie Shipman (All-around developer, Author) Commented:
They work in neither Expresso nor the GeSHi highlighter for the text supplied.

As I stated, this is a regex for use inside the GeSHi highlighter, which uses preg_replace to
highlight based on the regex passed.

The other options, as far as stripping out the asm and end; using explode, etc, aren't going to work
because this is for a language file for GeSHi.



profya Commented:
Do you think that the line break affects the work of the above solutions?

Eddie Shipman (All-around developer, Author) Commented:
Take, for instance, the CSS highlighter in GeSHi.

These regexes:

'REGEXPS' => array(
            0 => '\#[a-zA-Z0-9\-]+\s+\{',
            1 => '\.[a-zA-Z0-9\-]+\s',
            2 => ':[a-zA-Z0-9\-]+\s'
            ),


highlight this CSS:
#wrapheader {
      min-height: 120px;
      height: auto !important;
      height: 120px;
/*      background-image: url('./images/background.gif');
      background-repeat: repeat-x;*/
/*      padding: 0 25px 15px 25px;*/
      padding: 0;
}

#wrapcentre {
      margin: 15px 25px 0 25px;
}


like this:
<span style="color: #cc00cc;">#wrapheader <span style="color: #66cc66;">&#123;</span></span>
      min-<span style="color: #000000; font-weight: bold;">height</span>: 120px;
      <span style="color: #000000; font-weight: bold;">height</span>: <span style="color: #993333;">auto</span> !important;
      <span style="color: #000000; font-weight: bold;">height</span>: 120px;

<span style="color: #808080; font-style: italic;">/*      background-image: url('./images/background.gif');
      background-repeat: repeat-x;*/</span>
<span style="color: #808080; font-style: italic;">/*      padding: 0 25px 15px 25px;*/</span>
      <span style="color: #000000; font-weight: bold;">padding</span>: <span style="color: #cc66cc;">0</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;
<span style="color: #cc00cc;">#wrapcentre <span style="color: #66cc66;">&#123;</span></span>
      <span style="color: #000000; font-weight: bold;">margin</span>: 15px 25px <span style="color: #cc66cc;">0</span> 25px;

<span style="color: #66cc66;">&#125;</span>

based on these values:
'REGEXPS' => array(
             0 => 'color: #cc00cc;',
            1 => 'color: #6666ff;',
            2 => 'color: #3333ff;',
            )
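
The pairing by index can be imitated with plain preg_replace. The sketch below is a simplification and not GeSHi's actual code: it assumes that every match of regex N is simply wrapped in a span carrying style N, and it leaves out the separate brace handling visible in the real output.

```php
<?php
// Index 0 of both arrays, copied from the CSS language file quoted above.
$regexps = array(0 => '\#[a-zA-Z0-9\-]+\s+\{');
$styles  = array(0 => 'color: #cc00cc;');

$css = "#wrapcentre {\n      margin: 15px 25px 0 25px;\n}";

// Assumption: wrap every match of regex N in a span carrying style N.
foreach ($regexps as $n => $pattern) {
    $css = preg_replace(
        '/' . $pattern . '/',
        '<span style="' . $styles[$n] . '">$0</span>',
        $css
    );
}

echo $css, "\n";
```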

Eddie Shipman (All-around developer, Author) Commented:
There are other styles for keywords such as background-color, font-weight, etc but they are not controlled
by the same parsing.

Roger Baklund Commented:
I have no experience with GeSHi, but based on the documentation:

http://qbnz.com/highlighter/geshi-doc.html#language-file-regexps

...I think you need something like this:

 array(
    GESHI_SEARCH => '(asm)(.*)(end;)',
    GESHI_REPLACE => '\\2',
    GESHI_MODIFIERS => 'si',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '\\3'
    )
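
As a rough illustration of what such a rule is meant to produce, the effect can be imitated with plain preg_replace. This is a sketch, not GeSHi's code: it assumes that only the GESHI_REPLACE part gets the style while GESHI_BEFORE and GESHI_AFTER are emitted unstyled, and the $style value is a made-up placeholder.

```php
<?php
// Imitation of the rule's intended effect. $style is a placeholder value;
// GeSHi's internals may differ from this one-step replacement.
$style  = 'color: #800080;';
$source = "asm\n  MOV AX, 1234H\n  MOV Number, AX\nend;";

$highlighted = preg_replace(
    '/(asm)(.*)(end;)/si',
    '$1<span style="' . $style . '">$2</span>$3',
    $source
);

echo $highlighted, "\n";
```

Group 1 and group 3 are copied through unchanged, so any styling of asm and end; would have to come from another rule.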

Eddie Shipman (All-around developer, Author) Commented:
Well, that's very close. It makes the assembly code purple but the asm and end; lose their bold attributes.
What does the si modifier do?

Roger Baklund Commented:
The "s" modifier lets .* match linefeeds, and "i" means ignore case, so ASM ... END; and Asm...End; would also be matched. Try adding bold like this:

 array(
    GESHI_SEARCH => '(asm)(.*)(end;)',
    GESHI_REPLACE => '<b>\\2</b>',
    GESHI_MODIFIERS => 'si',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '<b>\\3</b>'
    )
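
The effect of the two modifiers can be checked outside GeSHi with preg_match. The inputs below are made up for illustration.

```php
<?php
$lower = "asm\n  MOV AX, 1234H\nend;";
$upper = "ASM\n  MOV AX, 1234H\nEND;";

$results = array(
    // Without "s", the dot stops at the first linefeed, so a multi-line block fails.
    'none on lower' => preg_match('/(asm)(.*)(end;)/', $lower),
    // With "s", the dot also matches linefeeds.
    's on lower'    => preg_match('/(asm)(.*)(end;)/s', $lower),
    // "s" alone is case-sensitive, so the upper-case block is not matched.
    's on upper'    => preg_match('/(asm)(.*)(end;)/s', $upper),
    // "si" matches regardless of case.
    'si on upper'   => preg_match('/(asm)(.*)(end;)/si', $upper),
);

var_dump($results);
```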

Eddie Shipman (All-around developer, Author) Commented:
Is there any way to ignore those matches?

Roger Baklund Commented:
"those"? Do you mean ASM ... END; and Asm...End; ? If so, just drop the "i" modifier.

Eddie Shipman (All-around developer, Author) Commented:
Well, that one doesn't seem to help either. Here's the result:

<span style="color: #000000; font-weight: bold;">asm</span>
  MOV AX<span style="color: #000000;">,</span> 1234H
  MOV Number<span style="color: #000000;">,</span> AX
<span style="color: #000000; font-weight: bold;">end</span><span style="color: #000000;">;</span>

Roger Baklund Commented:
Where is the purple color you talked about earlier?

The <b> </b> tags are not there, did you use <span style="color: #000000; font-weight: bold;"> instead?

Eddie Shipman (All-around developer, Author) Commented:
No, the color purple comes from the color values for the 2nd regex style:

'STYLES' => array(
   .
   .
   .
  'REGEXPS' => array(
    0 => 'color: #0000ff;',
    1 => 'color: #ff0000;',
    2 => 'color: #800080;'
    ),
   .
   .
   .
),

The 2nd style in the array goes with the 2nd regex in the REGEXPS array. See how CSS is done in post 24818659
above.

Roger Baklund Commented:
>> The 2nd style in the array goes with the 2nd regex in the REGEXPS array.

I don't know what your second regexp is. Which language are you creating/modifying a language file for? Could you post your complete language file and a sample source code that is supposed to work with that language file?

Eddie Shipman (All-around developer, Author) Commented:
Below is the entire language file, followed by a code snippet.
The snippet needs to be highlighted like this:
http://www.delphicommunity.com/syntax.png
<?php
/*************************************************************************************
 * delphi.php
 * ----------
 * Author: Járja Norbert (jnorbi@vipmail.hu)
 * Copyright: (c) 2004 Járja Norbert, Nigel McNie (http://qbnz.com/highlighter)
 * Release Version: 1.0.7.1
 * CVS Revision Version: $Revision: 1.2 $
 * Date Started: 2004/07/26
 * Last Modified: $Date: 2005/07/26 05:23:30 $
 *
 * Delphi (Object Pascal) language file for GeSHi.
 *
 * CHANGES
 * -------
 * 2004/11/27 (1.0.1)
 *  -  Added support for multiple object splitters
 * 2004/10/27 (1.0.0)
 *   -  First Release
 *
 * TODO (updated 2004/11/27)
 * -------------------------
 *
 *************************************************************************************
 *
 *   This file is part of GeSHi.
 *
 *   GeSHi is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   GeSHi is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with GeSHi; if not, write to the Free Software
 *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 ************************************************************************************/
// 2 => '/asm(.*?)end;/is'
 
$language_data = array (
  'LANG_NAME' => 'Delphi',
  'COMMENT_SINGLE' => array(1 => '//'),
  'COMMENT_MULTI' => array('(*' => '*)', '{' => '}'),
  'CASE_KEYWORDS' => 0,
  'QUOTEMARKS' => array("'", '"'),
  'ESCAPE_CHAR' => '',
  'KEYWORDS' => array(
    1 => array(
      'And', 'Array', 'As', 'Asm', 'Begin', 'Case', 'Class', 'Constructor', 'Destructor', 'Div', 'Do', 'DownTo', 'Else', 
      'End', 'Except', 'File', 'Finally', 'For', 'Function', 'Goto', 'If', 'Implementation', 'In', 'Inherited', 'Interface', 
      'Is', 'Mod', 'Not', 'Object', 'Of', 'On', 'Or', 'Packed', 'Private', 'Procedure', 'Program', 'Property', 'Protected',
      'Published', 'Public', 'Raise', 'Record', 'Repeat', 'Set', 'Shl', 'Shr', 'Then', 'ThreadVar', 'To', 'Try', 'Unit',
      'Until', 'Uses', 'While', 'With', 'Xor', 'Strict'
      ),
    2 => array(
      'nil', 'false', 'true', 'var', 'type', 'const'
      ),
    3 => array(
      'Abs', 'Addr', 'AnsiCompareStr', 'AnsiCompareText', 'AnsiContainsStr', 'AnsiEndsStr', 'AnsiIndexStr', 'AnsiLeftStr',
      'AnsiLowerCase', 'AnsiMatchStr', 'AnsiMidStr', 'AnsiPos', 'AnsiReplaceStr', 'AnsiReverseString', 'AnsiRightStr',
      'AnsiStartsStr', 'AnsiUpperCase', 'ArcCos', 'ArcSin', 'ArcTan', 'Assigned', 'BeginThread', 'Bounds', 'CelsiusToFahrenheit',
      'ChangeFileExt', 'Chr', 'CompareStr', 'CompareText', 'Concat', 'Convert', 'Copy', 'Cos', 'CreateDir', 'CurrToStr',
      'CurrToStrF', 'Date', 'DateTimeToFileDate', 'DateTimeToStr', 'DateToStr', 'DayOfTheMonth', 'DayOfTheWeek', 'DayOfTheYear',
      'DayOfWeek', 'DaysBetween', 'DaysInAMonth', 'DaysInAYear', 'DaySpan', 'DegToRad', 'DeleteFile', 'DiskFree', 'DiskSize',
      'DupeString', 'EncodeDate', 'EncodeDateTime', 'EncodeTime', 'EndOfADay', 'EndOfAMonth', 'Eof', 'Eoln', 'Exp', 'ExtractFileDir',
      'ExtractFileDrive', 'ExtractFileExt', 'ExtractFileName', 'ExtractFilePath', 'FahrenheitToCelsius', 'FileAge',
      'FileDateToDateTime', 'FileExists', 'FilePos', 'FileSearch', 'FileSetDate', 'FileSize', 'FindClose', 'FindCmdLineSwitch',
      'FindFirst', 'FindNext', 'FloatToStr', 'FloatToStrF', 'Format', 'FormatCurr', 'FormatDateTime', 'FormatFloat', 'Frac',
      'GetCurrentDir', 'GetLastError', 'GetMem', 'High', 'IncDay', 'IncMinute', 'IncMonth', 'IncYear', 'InputBox',
      'InputQuery', 'Int', 'IntToHex', 'IntToStr', 'IOResult', 'IsInfinite', 'IsLeapYear', 'IsMultiThread', 'IsNaN',
      'LastDelimiter', 'Length', 'Ln', 'Lo', 'Log10', 'Low', 'LowerCase', 'Max', 'Mean', 'MessageDlg', 'MessageDlgPos',
      'MonthOfTheYear', 'Now', 'Odd', 'Ord', 'ParamCount', 'ParamStr', 'Pi', 'Point', 'PointsEqual', 'Pos', 'Pred',
      'Printer', 'PromptForFileName', 'PtInRect', 'RadToDeg', 'Random', 'RandomRange', 'RecodeDate', 'RecodeTime', 'Rect',
      'RemoveDir', 'RenameFile', 'Round', 'SeekEof', 'SeekEoln', 'SelectDirectory', 'SetCurrentDir', 'Sin', 'SizeOf',
      'Slice', 'Sqr', 'Sqrt', 'StringOfChar', 'StringReplace', 'StringToWideChar', 'StrToCurr', 'StrToDate', 'StrToDateTime',
      'StrToFloat', 'StrToInt', 'StrToInt64', 'StrToInt64Def', 'StrToIntDef', 'StrToTime', 'StuffString', 'Succ', 'Sum', 'Tan',
      'Time', 'TimeToStr', 'Tomorrow', 'Trunc', 'UpCase', 'UpperCase', 'VarType', 'WideCharToString', 'WrapText', 'Yesterday',
      'Append', 'AppendStr', 'Assign', 'AssignFile', 'AssignPrn', 'Beep', 'BlockRead', 'BlockWrite', 'Break',
      'ChDir', 'Close', 'CloseFile', 'Continue', 'DateTimeToString', 'Dec', 'DecodeDate', 'DecodeDateTime',
      'DecodeTime', 'Delete', 'Dispose', 'EndThread', 'Erase', 'Exclude', 'Exit', 'FillChar', 'Flush', 'FreeAndNil',
      'FreeMem', 'GetDir', 'GetLocaleFormatSettings', 'Halt', 'Inc', 'Include', 'Insert', 'MkDir', 'Move', 'New',
      'ProcessPath', 'Randomize', 'Read', 'ReadLn', 'ReallocMem', 'Rename', 'ReplaceDate', 'ReplaceTime',
      'Reset', 'ReWrite', 'RmDir', 'RunError', 'Seek', 'SetLength', 'SetString', 'ShowMessage', 'ShowMessageFmt',
      'ShowMessagePos', 'Str', 'Truncate', 'Val', 'Write', 'WriteLn'
      ),
    4 => array(
      'AnsiChar', 'AnsiString', 'Boolean', 'Byte', 'Cardinal', 'Char', 'Comp', 'Currency', 'Double', 'Extended',
      'Int64', 'Integer', 'LongInt', 'LongWord', 'PAnsiChar', 'PAnsiString', 'PChar', 'PCurrency', 'PDateTime',
      'PExtended', 'PInt64', 'Pointer', 'PShortString', 'PString', 'PVariant', 'PWideChar', 'PWideString',
      'Real', 'Real48', 'ShortInt', 'ShortString', 'Single', 'SmallInt', 'String', 'TBits', 'TConvType', 'TDateTime',
      'Text', 'TextFile', 'TFloatFormat', 'TFormatSettings', 'TList', 'TObject', 'TOpenDialog', 'TPoint',
      'TPrintDialog', 'TRect', 'TReplaceFlags', 'TSaveDialog', 'TSearchRec', 'TStringList', 'TSysCharSet',
      'TThreadFunc', 'Variant', 'WideChar', 'WideString', 'Word'
      ),
                
    ),
  'CASE_SENSITIVE' => array(
    GESHI_COMMENTS => true,
      1 => false,
      2 => false,
      3 => false,
      4 => false,
    ),
  'STYLES' => array(
    'KEYWORDS' => array(
      1 => 'color: #000000; font-weight: bold;',
      2 => 'color: #000000; font-weight: bold;',
      3 => 'color: #000000;',
      4 => 'color: #000000;',
      5 => 'color: #C0C0C0;'
      ),
    'COMMENTS' => array(
      1 => 'color: #808080; font-style: italic;',
      'MULTI' => 'color: #808080; font-style: italic;'
      ),
    'ESCAPE_CHAR' => array(
      ),
    'BRACKETS' => array(
      0 => 'color: #000000;'
      ),
    'STRINGS' => array(
      0 => 'color: #ff0000;'
      ),
    'NUMBERS' => array(
      0 => 'color: #0000ff;'
      ),
    'METHODS' => array(
      1 => 'color: #000000;'
      ),
    'REGEXPS' => array(
      0 => 'color: #0000ff;',
      1 => 'color: #ff0000;',
      2 => 'color: #800080; font-weight: bold;'
      ),
    'SYMBOLS' => array(
      0 => 'color: #000000;'
      ),
    'SCRIPT' => array(
      )
    ),
  'URLS' => array(
    1 => '',
    2 => '',
    3 => '',
    4 => ''
    ),
  'OOLANG' => true,
  'OBJECT_SPLITTERS' => array(
    1 => '.'
    ),
  'REGEXPS' => array(
          0 => '\$[0-9a-fA-F]+',
          1 => '\#\$?[0-9]{1,3}',
          2 => array(
                 GESHI_SEARCH => '(\asm)([a-z]+)(end;)', 
                 GESHI_REPLACE => '\\2',            
                 GESHI_MODIFIERS => '',             
                 GESHI_BEFORE => '\\1',             
                 GESHI_AFTER => '\\3'              
          )
    ),
  'STRICT_MODE_APPLIES' => GESHI_NEVER,
  'SCRIPT_DELIMITERS' => array(
    ),
  'HIGHLIGHT_STRICT_BLOCK' => array(
    )
);
 
?>
 
 
Delphi code snippet:
procedure TForm1.Button1Click(Sender:TObject);
var
  Number, I, X: Integer;
begin
  Number := 12356;
  Caption := 'The number is ' + IntToStr(Number);
  for I := 0 to Number do
  begin
    Inc(X);
    Dec(X);
    X := X + 1.0;
    ListBox1.Items.Add(IntToStr(X));
  end;
  asm
    MOV AX, 1234H
    MOV Number, AX
  end;
end;

Roger Baklund Commented:
I have tried to make this work without luck. The closest I get is with this rule:

array(
    GESHI_SEARCH => '(asm)(.*)(end)',
    GESHI_REPLACE => '\\2',
    GESHI_MODIFIERS => 'sU',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '\\3'
    )

Note that I have removed the ; after end. This is because when this rule is applied, a previous rule has already changed the ; into <span style="color: #000000;">;</span>, so there is no match for "end;".

I suspect that you cannot do this with GeSHi; it seems unable to apply styles correctly to a section of code that already has styles applied.
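
The missing match can be reproduced outside GeSHi. The sketch below assumes a simplified input in which only the semicolon has already been wrapped, using the span markup quoted earlier in this thread; what GeSHi really passes to the rule may contain more markup.

```php
<?php
// Simplified source after an earlier pass has (by assumption) already wrapped
// the semicolon in a span.
$processed = "asm\n  MOV AX, 1234H\nend<span style=\"color: #000000;\">;</span>";

// 'end;' no longer appears literally, so the original pattern fails.
$withSemicolon = preg_match('/(asm)(.*)(end;)/sU', $processed);

// Dropping the ';' lets the rule match again. "U" makes .* ungreedy, so it
// stops at the first 'end' instead of the last one.
$withoutSemicolon = preg_match('/(asm)(.*)(end)/sU', $processed, $m);

var_dump($withSemicolon, $withoutSemicolon);
echo trim($m[2]), "\n";
```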

Eddie Shipman (All-around developer, Author) Commented:
You know, now that you mention it, the asm and end are also being handled by another rule.
Is there a way to make it not "grab" the asm and end in the regex?

Roger Baklund Commented:
>> Is there a way to make it not "grab" the asm and end in the regex?

Not sure if I understand what you mean. You have to match them in the regexp to find them, but you are not grabbing them in the rule above, they are placed in GESHI_BEFORE and GESHI_AFTER, only the part in between (\\2) is placed in GESHI_REPLACE.
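
For reference, PCRE can also require the delimiters without making them part of the match, using lookaround assertions. The sketch below shows this with plain preg_replace and a placeholder style; whether GeSHi's own pipeline copes with such a pattern was not tested in this thread.

```php
<?php
$source = "asm\n  MOV AX, 1234H\n  MOV Number, AX\nend;";

// (?<=asm) requires 'asm' just before the match; (?=end;) requires 'end;'
// just after it. Neither delimiter is part of the matched text.
$highlighted = preg_replace(
    '/(?<=asm)(.*?)(?=end;)/s',
    '<span style="color: #800080;">$1</span>',
    $source
);

echo $highlighted, "\n";
```

Because asm and end; are never consumed, they stay available to whatever other rule styles them.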

Eddie Shipman (All-around developer, Author) Commented:
Well, it still isn't working so I'm going to post to the geshi-dev mailing list to see if I can get them to correct it.
It now comes out very weird. Thanks for the help. If I get a solution, I'll come back and post it here for you to see.