Solved

Regex help, please

Posted on 2009-07-09
Last Modified: 2012-05-07
I need a regex to capture the text between 'asm' and 'end;' in this text:

asm
  MOV AX, 1234H
  MOV Number, AX
end;

Question by:EddieShipman
27 Comments
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24818275
Are you sure you need a regexp for that? Couldn't you just remove the first three and the last four characters?

Anyways, try this:
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";
preg_match('/asm(.+)end;/s',$string,$matches);
$asm = $matches[1];
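
Editor's note: the pattern above uses a greedy (.+). That is fine for a single block, but it matters once a source file holds more than one asm block. A minimal sketch of the difference, assuming a made-up two-block input (the $source string and variable names are illustrative, not from the thread):

```php
<?php
// Two asm blocks in one source string (made-up test input).
$source = "asm\n  MOV AX, 1234H\nend;\nX := 1;\nasm\n  MOV BX, AX\nend;";

// Greedy: (.+) runs from the first "asm" to the LAST "end;".
preg_match('/asm(.+)end;/s', $source, $greedy);

// Lazy: (.+?) stops at the first "end;" that follows each "asm".
preg_match_all('/asm(.+?)end;/s', $source, $lazy);

$greedyBody = trim($greedy[1]);
$lazyBodies = array_map('trim', $lazy[1]);

echo $greedyBody, "\n---\n", implode("\n---\n", $lazyBodies), "\n";
```

The greedy capture swallows the Pascal statement between the two blocks, while the lazy version returns each block body separately.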

 
LVL 26

Author Comment

by:EddieShipman
ID: 24818322
This is to be used in the GeSHi highlighter written in PHP so I do need a regex. The above won't work for me.
I am testing in Expresso and the regex you posted returns nothing.

Thanks for trying, I guess I should have been a little more straightforward in my original post...
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 24818346
Here is the non-REGEX solution.  I often find it is faster to just write a little bit of code for these simple extractions, versus taking the time to figure out a complicated (and possibly faulty) REGEX.

Best of luck with your project, ~Ray
<?php // RAY_temp_non_regex.php
error_reporting(E_ALL);
echo "<pre>\n";
 
// FROM THE OP
$txt = 'asm
  MOV AX, 1234H
  MOV Number, AX
end;';
 
// DELIMITERS
$a = 'asm';
$z = 'end;';
 
// GET AN ARRAY OF ELEMENTS BROKEN ON $a
$arr = explode($a, $txt);
// VISUALIZE THE RAW DATA
var_dump($arr);
echo "\n";
 
// ITERATE OVER THE ARRAY
foreach ($arr as $key => $val)
{
    if (empty($val)) continue;
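    // ereg_replace() was removed in PHP 7; preg_replace() with a quoted delimiter does the same job
    $val = preg_replace('/' . preg_quote($z, '/') . '$/', '', trim($val));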
    echo "\n$val";
}


 
LVL 14

Expert Comment

by:profya
ID: 24818397
cxr's solution is only missing an echo; with that added, it works:
<?php
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";
preg_match('/asm(.+)end;/s',$string,$matches);
$asm = $matches[1];
echo $asm;
?>

 
LVL 14

Expert Comment

by:profya
ID: 24818423
Use echo nl2br($asm); to make the output look like the input string.
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818511
Those options won't work for me as I stated earlier.
 
LVL 109

Expert Comment

by:Ray Paseur
ID: 24818542
@EddieShipman: We need you to explain a little more why these options will not work.  Is there more test data you can show us?  All of the examples here seem either 100% OK or 99% OK.

RSVP, ~Ray
 
LVL 14

Expert Comment

by:profya
ID: 24818568
Try this:
<?php
$string = "asm
  MOV AX, 1234H
  MOV Number, AX
end;";
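// ereg() was removed in PHP 7; preg_match() with the "s" modifier is a close equivalent here
if (preg_match('/asm(.+)end;/s',$string,$matches))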
{
	$asm = $matches[1];
	echo nl2br($asm);
}
?>

 
LVL 14

Expert Comment

by:profya
ID: 24818581
As Ray says, if this is also not working, please tell us more about the problem. I have now tested all the solutions the guys provided, and they all work fine. We are here to help.
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818589
They do not work in either Expresso or the GeSHi highlighter for the text supplied.

As I stated, this regex is for use inside the GeSHi highlighter, which uses preg_replace to
highlight based on the regex passed.

The other options, such as stripping out the asm and end; using explode, etc., aren't going to work
because this is for a language file for GeSHi.



 
LVL 14

Expert Comment

by:profya
ID: 24818620
Do you think the line break is affecting how the above solutions work?
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818659
Take, for instance, the CSS highlighter in GeSHi.

These regexes:

'REGEXPS' => array(
            0 => '\#[a-zA-Z0-9\-]+\s+\{',
            1 => '\.[a-zA-Z0-9\-]+\s',
            2 => ':[a-zA-Z0-9\-]+\s'
            ),


highlight this CSS:
#wrapheader {
      min-height: 120px;
      height: auto !important;
      height: 120px;
/*      background-image: url('./images/background.gif');
      background-repeat: repeat-x;*/
/*      padding: 0 25px 15px 25px;*/
      padding: 0;
}

#wrapcentre {
      margin: 15px 25px 0 25px;
}


like this:
<span style="color: #cc00cc;">#wrapheader <span style="color: #66cc66;">&#123;</span></span>
      min-<span style="color: #000000; font-weight: bold;">height</span>: 120px;
      <span style="color: #000000; font-weight: bold;">height</span>: <span style="color: #993333;">auto</span> !important;
      <span style="color: #000000; font-weight: bold;">height</span>: 120px;

<span style="color: #808080; font-style: italic;">/*      background-image: url('./images/background.gif');
      background-repeat: repeat-x;*/</span>
<span style="color: #808080; font-style: italic;">/*      padding: 0 25px 15px 25px;*/</span>
      <span style="color: #000000; font-weight: bold;">padding</span>: <span style="color: #cc66cc;">0</span>;
<span style="color: #66cc66;">&#125;</span>
&nbsp;
<span style="color: #cc00cc;">#wrapcentre <span style="color: #66cc66;">&#123;</span></span>
      <span style="color: #000000; font-weight: bold;">margin</span>: 15px 25px <span style="color: #cc66cc;">0</span> 25px;

<span style="color: #66cc66;">&#125;</span>

based on these values:
'REGEXPS' => array(
            0 => 'color: #cc00cc;',
            1 => 'color: #6666ff;',
            2 => 'color: #3333ff;',
            )
 
LVL 26

Author Comment

by:EddieShipman
ID: 24818665
There are other styles for keywords such as background-color, font-weight, etc but they are not controlled
by the same parsing.
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24819729
I have no experience with GeSHi, but based on the documentation:

http://qbnz.com/highlighter/geshi-doc.html#language-file-regexps

...I think you need something like this:

 array(
    GESHI_SEARCH => '(asm)(.*)(end;)',
    GESHI_REPLACE => '\\2',
    GESHI_MODIFIERS => 'si',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '\\3'
    )
 
LVL 26

Author Comment

by:EddieShipman
ID: 24827851
Well, that's very close. It makes the assembly code purple but the asm and end; lose their bold attributes.
What does the si modifier do?
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24828019
The "s" modifier lets .* match across linefeeds, and the "i" modifier means ignore case, so ASM ... END; and Asm...End; would also be matched. Try adding bold like this:

 array(
    GESHI_SEARCH => '(asm)(.*)(end;)',
    GESHI_REPLACE => '<b>\\2</b>',
    GESHI_MODIFIERS => 'si',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '<b>\\3</b>'
    )
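
Editor's note: the effect of the two modifiers can be checked outside GeSHi with plain preg_match. This is only a sketch using the same pattern under PHP's PCRE functions; the test strings and variable names are made up for illustration.

```php
<?php
$lower = "asm\n  MOV AX, 1234H\nend;";
$upper = "ASM\n  MOV AX, 1234H\nEND;";

// Without "s", "." does not match a newline, so a multi-line block fails.
$noModifiers = preg_match('/(asm)(.*)(end;)/', $lower);   // 0

// With "s", "." also matches newlines.
$dotAll = preg_match('/(asm)(.*)(end;)/s', $lower);       // 1

// "s" alone is still case-sensitive, so the upper-case block fails.
$caseSensitive = preg_match('/(asm)(.*)(end;)/s', $upper); // 0

// "si" crosses newlines and ignores case.
$both = preg_match('/(asm)(.*)(end;)/si', $upper);        // 1

var_dump($noModifiers, $dotAll, $caseSensitive, $both);
```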
 
LVL 26

Author Comment

by:EddieShipman
ID: 24828713
Is there any way to ignore those matches?
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24828760
"those"? Do you mean ASM ... END; and Asm...End; ? If so, just drop the "i" modifier.
 
LVL 26

Author Comment

by:EddieShipman
ID: 24830478
Well, that one doesn't seem to help either. Here's the result:

<span style="color: #000000; font-weight: bold;">asm</span>
  MOV AX<span style="color: #000000;">,</span> 1234H
  MOV Number<span style="color: #000000;">,</span> AX
<span style="color: #000000; font-weight: bold;">end</span><span style="color: #000000;">;</span>
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24830614
Where is the purple color you talked about earlier?

The <b> </b> tags are not there, did you use <span style="color: #000000; font-weight: bold;"> instead?
 
LVL 26

Author Comment

by:EddieShipman
ID: 24831886
No, the color purple comes from the color values for the 2nd regex style:

'STYLES' => array(
   .
   .
   .
  'REGEXPS' => array(
    0 => 'color: #0000ff;',
    1 => 'color: #ff0000;',
    2 => 'color: #800080;'
    ),
   .
   .
   .
),

The 2nd style in the array goes with the 2nd regex in the REGEXPS array. See how CSS is done in post 24818659
above.
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24863977
>> The 2nd style in the array goes with the 2nd regex in the REGEXPS array.

I don't know what your second regexp is. Which language are you creating/modifying a language file for? Could you post your complete language file and a sample source code that is supposed to work with that language file?
 
LVL 26

Author Comment

by:EddieShipman
ID: 24864125
Below is entire language file and code snippet.
Needs to highlight like this:
http://www.delphicommunity.com/syntax.png
<?php
/*************************************************************************************
 * delphi.php
 * ----------
 * Author: Járja Norbert (jnorbi@vipmail.hu)
 * Copyright: (c) 2004 Járja Norbert, Nigel McNie (http://qbnz.com/highlighter)
 * Release Version: 1.0.7.1
 * CVS Revision Version: $Revision: 1.2 $
 * Date Started: 2004/07/26
 * Last Modified: $Date: 2005/07/26 05:23:30 $
 *
 * Delphi (Object Pascal) language file for GeSHi.
 *
 * CHANGES
 * -------
 * 2004/11/27 (1.0.1)
 *  -  Added support for multiple object splitters
 * 2004/10/27 (1.0.0)
 *   -  First Release
 *
 * TODO (updated 2004/11/27)
 * -------------------------
 *
 *************************************************************************************
 *
 *   This file is part of GeSHi.
 *
 *   GeSHi is free software; you can redistribute it and/or modify
 *   it under the terms of the GNU General Public License as published by
 *   the Free Software Foundation; either version 2 of the License, or
 *   (at your option) any later version.
 *
 *   GeSHi is distributed in the hope that it will be useful,
 *   but WITHOUT ANY WARRANTY; without even the implied warranty of
 *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 *   GNU General Public License for more details.
 *
 *   You should have received a copy of the GNU General Public License
 *   along with GeSHi; if not, write to the Free Software
 *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 *
 ************************************************************************************/
// 2 => '/asm(.*?)end;/is'
 
$language_data = array (
  'LANG_NAME' => 'Delphi',
  'COMMENT_SINGLE' => array(1 => '//'),
  'COMMENT_MULTI' => array('(*' => '*)', '{' => '}'),
  'CASE_KEYWORDS' => 0,
  'QUOTEMARKS' => array("'", '"'),
  'ESCAPE_CHAR' => '',
  'KEYWORDS' => array(
    1 => array(
      'And', 'Array', 'As', 'Asm', 'Begin', 'Case', 'Class', 'Constructor', 'Destructor', 'Div', 'Do', 'DownTo', 'Else', 
      'End', 'Except', 'File', 'Finally', 'For', 'Function', 'Goto', 'If', 'Implementation', 'In', 'Inherited', 'Interface', 
      'Is', 'Mod', 'Not', 'Object', 'Of', 'On', 'Or', 'Packed', 'Private', 'Procedure', 'Program', 'Property', 'Protected',
      'Published', 'Public', 'Raise', 'Record', 'Repeat', 'Set', 'Shl', 'Shr', 'Then', 'ThreadVar', 'To', 'Try', 'Unit',
      'Until', 'Uses', 'While', 'With', 'Xor', 'Strict'
      ),
    2 => array(
      'nil', 'false', 'true', 'var', 'type', 'const'
      ),
    3 => array(
      'Abs', 'Addr', 'AnsiCompareStr', 'AnsiCompareText', 'AnsiContainsStr', 'AnsiEndsStr', 'AnsiIndexStr', 'AnsiLeftStr',
      'AnsiLowerCase', 'AnsiMatchStr', 'AnsiMidStr', 'AnsiPos', 'AnsiReplaceStr', 'AnsiReverseString', 'AnsiRightStr',
      'AnsiStartsStr', 'AnsiUpperCase', 'ArcCos', 'ArcSin', 'ArcTan', 'Assigned', 'BeginThread', 'Bounds', 'CelsiusToFahrenheit',
      'ChangeFileExt', 'Chr', 'CompareStr', 'CompareText', 'Concat', 'Convert', 'Copy', 'Cos', 'CreateDir', 'CurrToStr',
      'CurrToStrF', 'Date', 'DateTimeToFileDate', 'DateTimeToStr', 'DateToStr', 'DayOfTheMonth', 'DayOfTheWeek', 'DayOfTheYear',
      'DayOfWeek', 'DaysBetween', 'DaysInAMonth', 'DaysInAYear', 'DaySpan', 'DegToRad', 'DeleteFile', 'DiskFree', 'DiskSize',
      'DupeString', 'EncodeDate', 'EncodeDateTime', 'EncodeTime', 'EndOfADay', 'EndOfAMonth', 'Eof', 'Eoln', 'Exp', 'ExtractFileDir',
      'ExtractFileDrive', 'ExtractFileExt', 'ExtractFileName', 'ExtractFilePath', 'FahrenheitToCelsius', 'FileAge',
      'FileDateToDateTime', 'FileExists', 'FilePos', 'FileSearch', 'FileSetDate', 'FileSize', 'FindClose', 'FindCmdLineSwitch',
      'FindFirst', 'FindNext', 'FloatToStr', 'FloatToStrF', 'Format', 'FormatCurr', 'FormatDateTime', 'FormatFloat', 'Frac',
      'GetCurrentDir', 'GetLastError', 'GetMem', 'High', 'IncDay', 'IncMinute', 'IncMonth', 'IncYear', 'InputBox',
      'InputQuery', 'Int', 'IntToHex', 'IntToStr', 'IOResult', 'IsInfinite', 'IsLeapYear', 'IsMultiThread', 'IsNaN',
      'LastDelimiter', 'Length', 'Ln', 'Lo', 'Log10', 'Low', 'LowerCase', 'Max', 'Mean', 'MessageDlg', 'MessageDlgPos',
      'MonthOfTheYear', 'Now', 'Odd', 'Ord', 'ParamCount', 'ParamStr', 'Pi', 'Point', 'PointsEqual', 'Pos', 'Pred',
      'Printer', 'PromptForFileName', 'PtInRect', 'RadToDeg', 'Random', 'RandomRange', 'RecodeDate', 'RecodeTime', 'Rect',
      'RemoveDir', 'RenameFile', 'Round', 'SeekEof', 'SeekEoln', 'SelectDirectory', 'SetCurrentDir', 'Sin', 'SizeOf',
      'Slice', 'Sqr', 'Sqrt', 'StringOfChar', 'StringReplace', 'StringToWideChar', 'StrToCurr', 'StrToDate', 'StrToDateTime',
      'StrToFloat', 'StrToInt', 'StrToInt64', 'StrToInt64Def', 'StrToIntDef', 'StrToTime', 'StuffString', 'Succ', 'Sum', 'Tan',
      'Time', 'TimeToStr', 'Tomorrow', 'Trunc', 'UpCase', 'UpperCase', 'VarType', 'WideCharToString', 'WrapText', 'Yesterday',
      'Append', 'AppendStr', 'Assign', 'AssignFile', 'AssignPrn', 'Beep', 'BlockRead', 'BlockWrite', 'Break',
      'ChDir', 'Close', 'CloseFile', 'Continue', 'DateTimeToString', 'Dec', 'DecodeDate', 'DecodeDateTime',
      'DecodeTime', 'Delete', 'Dispose', 'EndThread', 'Erase', 'Exclude', 'Exit', 'FillChar', 'Flush', 'FreeAndNil',
      'FreeMem', 'GetDir', 'GetLocaleFormatSettings', 'Halt', 'Inc', 'Include', 'Insert', 'MkDir', 'Move', 'New',
      'ProcessPath', 'Randomize', 'Read', 'ReadLn', 'ReallocMem', 'Rename', 'ReplaceDate', 'ReplaceTime',
      'Reset', 'ReWrite', 'RmDir', 'RunError', 'Seek', 'SetLength', 'SetString', 'ShowMessage', 'ShowMessageFmt',
      'ShowMessagePos', 'Str', 'Truncate', 'Val', 'Write', 'WriteLn'
      ),
    4 => array(
      'AnsiChar', 'AnsiString', 'Boolean', 'Byte', 'Cardinal', 'Char', 'Comp', 'Currency', 'Double', 'Extended',
      'Int64', 'Integer', 'LongInt', 'LongWord', 'PAnsiChar', 'PAnsiString', 'PChar', 'PCurrency', 'PDateTime',
      'PExtended', 'PInt64', 'Pointer', 'PShortString', 'PString', 'PVariant', 'PWideChar', 'PWideString',
      'Real', 'Real48', 'ShortInt', 'ShortString', 'Single', 'SmallInt', 'String', 'TBits', 'TConvType', 'TDateTime',
      'Text', 'TextFile', 'TFloatFormat', 'TFormatSettings', 'TList', 'TObject', 'TOpenDialog', 'TPoint',
      'TPrintDialog', 'TRect', 'TReplaceFlags', 'TSaveDialog', 'TSearchRec', 'TStringList', 'TSysCharSet',
      'TThreadFunc', 'Variant', 'WideChar', 'WideString', 'Word'
      ),
                
    ),
  'CASE_SENSITIVE' => array(
    GESHI_COMMENTS => true,
      1 => false,
      2 => false,
      3 => false,
      4 => false,
    ),
  'STYLES' => array(
    'KEYWORDS' => array(
      1 => 'color: #000000; font-weight: bold;',
      2 => 'color: #000000; font-weight: bold;',
      3 => 'color: #000000;',
      4 => 'color: #000000;',
      5 => 'color: #C0C0C0;'
      ),
    'COMMENTS' => array(
      1 => 'color: #808080; font-style: italic;',
      'MULTI' => 'color: #808080; font-style: italic;'
      ),
    'ESCAPE_CHAR' => array(
      ),
    'BRACKETS' => array(
      0 => 'color: #000000;'
      ),
    'STRINGS' => array(
      0 => 'color: #ff0000;'
      ),
    'NUMBERS' => array(
      0 => 'color: #0000ff;'
      ),
    'METHODS' => array(
      1 => 'color: #000000;'
      ),
    'REGEXPS' => array(
      0 => 'color: #0000ff;',
      1 => 'color: #ff0000;',
      2 => 'color: #800080; font-weight: bold;'
      ),
    'SYMBOLS' => array(
      0 => 'color: #000000;'
      ),
    'SCRIPT' => array(
      )
    ),
  'URLS' => array(
    1 => '',
    2 => '',
    3 => '',
    4 => ''
    ),
  'OOLANG' => true,
  'OBJECT_SPLITTERS' => array(
    1 => '.'
    ),
  'REGEXPS' => array(
    0 => '\$[0-9a-fA-F]+',
    1 => '\#\$?[0-9]{1,3}',
    2 => array(
      GESHI_SEARCH => '(\asm)([a-z]+)(end;)',
      GESHI_REPLACE => '\\2',
      GESHI_MODIFIERS => '',
      GESHI_BEFORE => '\\1',
      GESHI_AFTER => '\\3'
      )
    ),
  'STRICT_MODE_APPLIES' => GESHI_NEVER,
  'SCRIPT_DELIMITERS' => array(
    ),
  'HIGHLIGHT_STRICT_BLOCK' => array(
    )
);
 
?>
 
 
Delphi code snippet:
procedure TForm1.Button1Click(Sender:TObject);
var
  Number, I, X: Integer;
begin
  Number := 12356;
  Caption := 'The number is ' + IntToStr(Number);
  for I := 0 to Number do
  begin
    Inc(X);
    Dec(X);
    X := X + 1.0;
    ListBox1.Items.Add(IntToStr(X));
  end;
  asm
    MOV AX, 1234H
    MOV Number, AX
  end;
end;
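
Editor's note: independent of how GeSHi orders its rules, the GESHI_SEARCH pattern in the language file above, '(\asm)([a-z]+)(end;)', does not appear able to match this snippet on its own. The sketch below runs the raw pattern through preg_match. GeSHi may wrap or alter the pattern before it reaches PCRE, so treat this as a check of the pattern only, not of GeSHi's behaviour.

```php
<?php
// An asm block like the one in the Delphi snippet above.
$snippet = "asm\n    MOV AX, 1234H\n    MOV Number, AX\n  end;";

// Pattern as posted. In PCRE, \a is the BEL control character (0x07),
// so (\asm) looks for BEL followed by "sm", not for the text "asm".
$posted = preg_match('/(\asm)([a-z]+)(end;)/', $snippet);          // 0

// Even without the backslash, [a-z]+ accepts only lowercase letters and
// cannot cross the spaces, digits, commas and newlines of the asm body.
$noBackslash = preg_match('/(asm)([a-z]+)(end;)/', $snippet);      // 0

// The form suggested earlier in the thread does match the raw snippet.
$suggested = preg_match('/(asm)(.*)(end;)/si', $snippet, $m);      // 1

var_dump($posted, $noBackslash, $suggested);
```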

 
LVL 39

Accepted Solution

by:
Roger Baklund earned 500 total points
ID: 24875773
I have tried to make this work without luck. The closest I get is with this rule:

array(
    GESHI_SEARCH => '(asm)(.*)(end)',
    GESHI_REPLACE => '\\2',
    GESHI_MODIFIERS => 'sU',
    GESHI_BEFORE => '\\1',
    GESHI_AFTER => '\\3'
    )

Note that I have removed the ; after end. This is because by the time this rule is applied, a previous rule has already changed the ; into <span style="color: #000000;">;</span>, so there is no match for "end;".

I suspect that you cannot do this with GeSHi; it does not seem able to apply styles correctly to a section of code that already has styles applied.
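
Editor's note: the effect described here can be reproduced with plain preg_match, assuming (as stated above) that the rule runs against text that earlier rules have already wrapped in spans. The input below is the highlighted output posted earlier in the thread (ID 24830478); the variable names are illustrative.

```php
<?php
// Already-highlighted text, as posted in comment 24830478.
$highlighted = '<span style="color: #000000; font-weight: bold;">asm</span>' . "\n"
    . '  MOV AX<span style="color: #000000;">,</span> 1234H' . "\n"
    . '  MOV Number<span style="color: #000000;">,</span> AX' . "\n"
    . '<span style="color: #000000; font-weight: bold;">end</span>'
    . '<span style="color: #000000;">;</span>';

// "end" is now followed by "</span>", not ";", so this cannot match.
$withSemicolon = preg_match('/(asm)(.*)(end;)/s', $highlighted);        // 0

// Dropping the ";" matches. "U" makes .* ungreedy, so it stops at the first "end".
$withoutSemicolon = preg_match('/(asm)(.*)(end)/sU', $highlighted, $m); // 1

// The captured middle part now contains span markup from the earlier rules.
$containsMarkup = strpos($m[2], '<span') !== false;

var_dump($withSemicolon, $withoutSemicolon, $containsMarkup);
```

Because group 2 begins with the closing tag of the asm keyword and ends inside the opening tag of end, wrapping it in a further span would interleave tags. That may be one reason the combined output looks wrong, though the thread does not confirm it.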
 
LVL 26

Author Comment

by:EddieShipman
ID: 24877907
You know, now that you mention it, the asm and end are also being handled by another rule.
Is there a way to make it not "grab" the asm and end in the regex?
 
LVL 39

Expert Comment

by:Roger Baklund
ID: 24877946
>> Is there a way to make it not "grab" the asm and end in the regex?

I'm not sure I understand what you mean. You have to match them in the regexp to find them, but the rule above does not grab them: they are placed in GESHI_BEFORE and GESHI_AFTER, and only the part in between (\\2) is placed in GESHI_REPLACE.
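
Editor's note: the split between the three capture groups can be pictured with a plain preg_replace call. This is a hypothetical stand-in, not GeSHi's actual implementation: it only assumes that the GESHI_REPLACE part is what gets wrapped in the style, while the GESHI_BEFORE and GESHI_AFTER parts are emitted unstyled. The $style value is the REGEXPS colour quoted earlier in the thread.

```php
<?php
$code  = "asm\n  MOV AX, 1234H\n  MOV Number, AX\nend;";
$style = 'color: #800080;';

// \1 (before) and \3 (after) pass through untouched; only \2 is wrapped.
$out = preg_replace(
    '/(asm)(.*)(end)/sU',
    '\\1<span style="' . $style . '">\\2</span>\\3',
    $code
);

echo $out, "\n";
```

In this sketch asm and end are still needed in the pattern to locate the block, but they stay outside the styled span.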
 
LVL 26

Author Closing Comment

by:EddieShipman
ID: 31601824
Well, it still isn't working so I'm going to post to the geshi-dev mailing list to see if I can get them to correct it.
It now comes out very weird. Thanks for the help. If I get a solution, I'll come back and post it here for you to see.


Question has a verified solution.
