scotru

Translate perl regular expression parsing to C#

I'm new to both Perl and regular expressions.  I have a messy data file that I need to "neaten".  I've been given a simple Perl script that accomplishes the task and need to translate this script into C# code.  Here's the important part of the script:

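while (<INPUTFILE>) {    # read each line into $_
    # print groups 2, 4, 5 joined by '+', only if the m@...@ match on $_ succeeds
    print OUTPUTFILE "$2+$4+$5\n"
        if m@([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})@;
}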

With no Perl background and very little regular-expression background, I'm having trouble figuring out what's what here. C# has regular expression support via the Regex class, but I'm having trouble converting this script to use it. Any help is much appreciated!
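For reference, here is a minimal C# sketch of the whole Perl loop, not just one line. It assumes the input and output are passed in as a `TextReader` and `TextWriter`; the class and method names (`Neatener`, `Neaten`) are hypothetical. Each Perl construct maps to one C# statement: `while (<INPUTFILE>)` becomes a `ReadLine` loop, `m@...@` becomes `Regex.Match`, the trailing `if` becomes a check on `Match.Success`, and `$2`, `$4`, `$5` become `Groups[2]`, `Groups[4]`, `Groups[5]`.

```csharp
using System;
using System.IO;
using System.Text.RegularExpressions;

public static class Neatener
{
    // Same pattern as the Perl script. A C# verbatim string (@"...") keeps the
    // slashes and braces literal, so nothing needs escaping.
    static readonly Regex LinePattern = new Regex(
        @"([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})");

    // Sketch of: while (<INPUTFILE>) { print OUTPUTFILE "$2+$4+$5\n" if m@...@; }
    public static void Neaten(TextReader input, TextWriter output)
    {
        string line;
        // while (<INPUTFILE>): read one line at a time (Perl puts it in $_).
        while ((line = input.ReadLine()) != null)
        {
            // m@...@ searches the current line for the pattern.
            Match m = LinePattern.Match(line);

            // The statement-modifier "if": print only when the match succeeded.
            if (m.Success)
            {
                // "$2+$4+$5\n": groups 2, 4 and 5 joined by literal '+' characters.
                // "\n" is written explicitly to mirror the Perl string exactly.
                output.Write(m.Groups[2].Value + "+" + m.Groups[4].Value + "+"
                             + m.Groups[5].Value + "\n");
            }
        }
    }

    public static void Main()
    {
        // Made-up sample input: one matching line, one non-matching line.
        var input = new StringReader("%E?;100000640632?2007/08/28 11:05:53\nno match here\n");
        var output = new StringWriter();
        Neaten(input, output);
        Console.Write(output.ToString()); // prints: 100000640632+2007/08/28+11:05:53
    }
}
```

With real files you could pass `File.OpenText(inputPath)` and `File.CreateText(outputPath)`, each wrapped in a `using` block, where `inputPath` and `outputPath` are placeholders for your own paths.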
.NET Programming, Regular Expressions, Perl

Adam314


use YAPE::Regex::Explain;
 
print YAPE::Regex::Explain->new(qr@([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})@)->explain;
 
 
 
The regular expression:
 
(?-imsx:([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2}))
 
matches as follows:
  
NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    [^0-9]*                  any character except: '0' to '9' (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    1                        '1'
----------------------------------------------------------------------
    [0-9]{11}                any character of: '0' to '9' (11 times)
----------------------------------------------------------------------
   |                        OR
----------------------------------------------------------------------
    [0-9]{11}                any character of: '0' to '9' (11 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    [^0-9]*                  any character except: '0' to '9' (0 or
                             more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    [0-9]{4}                 any character of: '0' to '9' (4 times)
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    /                        '/'
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
                           ' '
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
    :                        ':'
----------------------------------------------------------------------
    [0-9]{2}                 any character of: '0' to '9' (2 times)
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------
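As a quick C# check of the group 2 alternation described above (the sample lines are made up for illustration): `1[0-9]{11}` is tried first, so a 12-digit number beginning with 1 is captured whole; when that alternative fails, `[0-9]{11}` takes exactly 11 digits. The class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AlternationDemo
{
    const string Pattern =
        @"([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})";

    // Returns the text captured by group 2 (Perl's $2), or null if the line does not match.
    public static string Group2(string line)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Success ? m.Groups[2].Value : null;
    }

    public static void Main()
    {
        // 12 digits starting with '1': the first alternative matches all of them.
        Console.WriteLine(Group2("%E?;100000640632?2007/08/28 11:05:53")); // 100000640632
        // Only 11 digits: the first alternative fails, so the second one is used.
        Console.WriteLine(Group2("ID:12345678901 2007/08/28 11:05:53"));   // 12345678901
        // No date and time after the number: the whole pattern fails.
        Console.WriteLine(Group2("ID:12345678901") ?? "(no match)");        // (no match)
    }
}
```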


ASKER CERTIFIED SOLUTION
shanikawm

scotru

ASKER

Thanks, both of you, for your responses. I think my original question was a little unclear (and the answer is probably obvious to someone with a Perl background): it's really the language syntax that has me confused. I understand what the regular expression does to a line of text, but coming from a C background, I can't even find the assignment operators in this code. Shanikawm, your response helps.

Am I correct in understanding that what this code does is execute this regular expression
([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})
and store the text matched by each unnamed group in $1, $2, $3, $4, etc.?

[update] I think I've got equivalent C# code working below. This is for a single line (not quite as compact as the Perl version!):
// Requires: using System; using System.Text.RegularExpressions;
string line = "%E?;100000640632?2007/08/28 11:05:53";
string matchPattern = @"([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})";

Regex re = new Regex(matchPattern);
// Match (rather than Split) gives direct access to the capture groups;
// m.Groups[n].Value corresponds to Perl's $n.
Match m = re.Match(line);

if (m.Success)
{
    // The Perl script prints "$2+$4+$5\n", i.e. the groups joined by literal '+'.
    Console.WriteLine(m.Groups[2].Value + "+" + m.Groups[4].Value + "+" + m.Groups[5].Value);
    Console.ReadKey();
}

shanikawm

Yes, you are correct. Perl stores the matched groups in $1, $2, $3, ...
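In .NET the same values live on the `Match` object: `Groups[n].Value` corresponds to Perl's `$n`, and `Groups[0]` is the whole match (like Perl's `$&`). Perl only sets the capture variables on a successful match, which is why the script guards its `print` with `if`; in C# the equivalent guard is `Match.Success`. A minimal sketch (the class and method names are hypothetical):

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupsDemo
{
    static readonly Regex Re = new Regex(
        @"([^0-9]*)(1[0-9]{11}|[0-9]{11})([^0-9]*)([0-9]{4}/[0-9]{2}/[0-9]{2}) ([0-9]{2}:[0-9]{2}:[0-9]{2})");

    // Returns { $&, $1, $2, $3, $4, $5 } as strings, or null when the line doesn't match.
    public static string[] Captures(string line)
    {
        Match m = Re.Match(line);
        // Check Success before reading groups (Perl's "if m@...@").
        if (!m.Success) return null;

        // Groups[0] is the whole match; Groups[n] is Perl's $n.
        var result = new string[m.Groups.Count];
        for (int n = 0; n < m.Groups.Count; n++)
            result[n] = m.Groups[n].Value;
        return result;
    }

    public static void Main()
    {
        string[] caps = Captures("%E?;100000640632?2007/08/28 11:05:53");
        for (int n = 0; n < caps.Length; n++)
            Console.WriteLine("Groups[{0}] = \"{1}\"", n, caps[n]);
    }
}
```

For the sample line this yields "%E?;" in group 1, "100000640632" in group 2, "?" in group 3, "2007/08/28" in group 4 and "11:05:53" in group 5.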
scotru

ASKER

Thanks for your help!