Regular Expression to Split a String

Working in Pascal, I would like a regular expression that can split a string into 3 parts. The string to be split will always be 1 or more digits, followed by 1 or 2 letters, followed by 1 or more digits.

Examples are '1A1', '12B6', '4AA17'.

These should be split into [1, A, 1], [12, B, 6] and [4, AA, 17].

The Pascal code will look like this:
var
  ResultArray: TArray<string>;
  MyRegEx: string;
begin
  MyRegEx := 'The RegEx I am looking for';
  ResultArray := nil;
  try
    ResultArray := TRegEx.Split(MyInputString, MyRegEx);

Is it possible to do this with a single regular expression?
plumothyAsked:

frankhelkCommented:
Hi,

I'm not that much an expert with regular expressions, but

(\d+)([A-Z]{1,3})(\d+)


should return three captured groups as specified. Since I've never done anything substantial in Pascal, you're on your own with extracting the results ... ;-)
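To illustrate what "three captured groups" means here, the following is a minimal sketch in Python rather than Pascal (an assumption made purely for illustration; the helper name `capture_parts` is hypothetical). It reads the groups from a match object instead of splitting the string, using the expression exactly as posted above.

```python
import re

# The expression as posted above: digits, 1 to 3 uppercase letters, digits.
PATTERN = re.compile(r"(\d+)([A-Z]{1,3})(\d+)")


def capture_parts(text):
    """Return the three captured groups as a list, or None if nothing matches."""
    m = PATTERN.search(text)
    return list(m.groups()) if m else None


if __name__ == "__main__":
    for sample in ["1A1", "12B6", "4AA17"]:
        print(sample, "->", capture_parts(sample))
```

The same idea should carry over to any engine that exposes numbered groups on a match result; only the API names differ.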

I've designed and tested that RegEx with the free tool "Expresso" (search for it on the web ...), which I strongly recommend for such puzzles. The following image is a screenshot of Expresso with a test run of your example strings:

Expresso screenshot of test run
frankhelkCommented:
Addendum: Expresso even supplies some code to use that RegEx, but it's limited to C#, VB.Net, Managed C++ and C++/CLI. Here's the C# code as generated by Expresso:

//  using System.Text.RegularExpressions;

/// <summary>
///  Regular expression built for C# on: Do, Sep 24, 2015, 11:46:51 
///  Using Expresso Version: 3.0.4750, http://www.ultrapico.com
///  
///  A description of the regular expression:
///  
///  [1]: A numbered capture group. [\d+]
///      Any digit, one or more repetitions
///  [2]: A numbered capture group. [[A-Z]{1,3}]
///      Any character in this class: [A-Z], between 1 and 3 repetitions
///  [3]: A numbered capture group. [\d+]
///      Any digit, one or more repetitions
///  
///
/// </summary>
public static Regex regex = new Regex(
      "(\\d+)([A-Z]{1,3})(\\d+)",
    RegexOptions.CultureInvariant
    | RegexOptions.Compiled
    );


// This is the replacement string
public static string regexReplace = 
      "$& [${Day}-${Month}-${Year}]";


//// Replace the matched text in the InputText using the replacement pattern
// string result = regex.Replace(InputText,regexReplace);

//// Split the InputText wherever the regex matches
// string[] results = regex.Split(InputText);

//// Capture the first Match, if any, in the InputText
// Match m = regex.Match(InputText);

//// Capture all Matches in the InputText
// MatchCollection ms = regex.Matches(InputText);

//// Test to see if there is a match in the InputText
// bool IsMatch = regex.IsMatch(InputText);

//// Get the names of all the named and numbered capture groups
// string[] GroupNames = regex.GetGroupNames();

//// Get the numbers of all the named and numbered capture groups
// int[] GroupNumbers = regex.GetGroupNumbers();

frankhelkCommented:
Errata:

Oops ... just spotted an error. The expression above would work too, but it allows up to three letters, which is not exactly as specified. It should read:

(\d+)([A-Z]{1,2})(\d+)


Sorry ;-)
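The difference between the two quantifiers only shows up on input with three letters. Here is a small sketch in Python (chosen for illustration, not the asker's Delphi; the names `LOOSE`, `STRICT` and `accepts` are hypothetical) that checks whole strings against both versions.

```python
import re

LOOSE = re.compile(r"(\d+)([A-Z]{1,3})(\d+)")   # first version: up to 3 letters
STRICT = re.compile(r"(\d+)([A-Z]{1,2})(\d+)")  # corrected version: up to 2 letters


def accepts(pattern, text):
    """True if the whole of text matches the compiled pattern."""
    return pattern.fullmatch(text) is not None


if __name__ == "__main__":
    for sample in ["4AA17", "4AAA17"]:
        print(sample, "loose:", accepts(LOOSE, sample), "strict:", accepts(STRICT, sample))
```

For input that always follows the stated format, both expressions capture the same groups; the corrected one simply rejects strings with a third letter.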


plumothyAuthor Commented:
frankhelk,

Thanks for that. Unfortunately, it doesn't quite work. I am using the RegularExpressions unit in Delphi XE4.
The ResultArray does have three elements as expected but the first and third ones are always empty, and the 2nd element contains the digit(s) after the letter. (So, '4AA17' returns ['', '17', ''])
frankhelkCommented:
Hmmm ... I've tested that expression again at http://www.regextester.com/, and it works as expected (see attached screenshot - the substitution shows the correct split). As you might notice, I've selected the PCRE variant of the RegEx engine, which I saw cited on some website as the basis of the RegEx processing in Delphi XE4.

I've included the flags used for testing; maybe you could supply them and play around with that a bit. It could be that the defaults are somewhat picky. You could play around with the flags on regextester.com; I've found that some flags block the detection of the parts.

You might find some more tips at http://www.regular-expressions.info/delphi.html, maybe you have the wrong engine for your character set (UTF8, UTF16, UNICODE, etc.).
ScreenshotTester.png
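As one example of a flag changing the outcome, here is a hedged sketch in Python (not Delphi; the helper name `parts` is hypothetical) showing the case-insensitive option. Other engines expose a comparable option under a different name, so treat this only as an illustration of the effect.

```python
import re

PATTERN = r"(\d+)([A-Z]{1,2})(\d+)"


def parts(text, flags=0):
    """Return the captured groups if the whole text matches, else None."""
    m = re.fullmatch(PATTERN, text, flags)
    return m.groups() if m else None


if __name__ == "__main__":
    print(parts("4aa17"))                 # lowercase letters, default flags
    print(parts("4aa17", re.IGNORECASE))  # same input, case-insensitive
```

With default flags the class `[A-Z]` does not accept lowercase letters, so the first call finds nothing; with the case-insensitive flag the same input matches.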
plumothyAuthor Commented:
Correction - yes the original solution did work (once I correctly gathered the results!)

Many thanks.
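The confusion earlier in the thread comes from the difference between splitting and matching: a split treats each match as a delimiter and returns the text around it, whereas the parts wanted here are inside the match. The sketch below uses Python's `re` module as a stand-in; how captured groups appear in a split result varies between engines, so Delphi's `TRegEx.Split` should not be assumed to return exactly the same list.

```python
import re

TEXT = "4AA17"

# Without capture groups, the whole string is one delimiter:
# only the empty text before and after it is left.
around_match = re.split(r"\d+[A-Z]{1,2}\d+", TEXT)

# With capture groups, Python also inserts the groups into the split result,
# still surrounded by the empty leading and trailing pieces.
with_groups = re.split(r"(\d+)([A-Z]{1,2})(\d+)", TEXT)

# Reading the groups from a match gives just the three parts.
from_match = list(re.fullmatch(r"(\d+)([A-Z]{1,2})(\d+)", TEXT).groups())

if __name__ == "__main__":
    print(around_match)  # ['', '']
    print(with_groups)   # ['', '4', 'AA', '17', '']
    print(from_match)    # ['4', 'AA', '17']
```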
plumothyAuthor Commented:
For the sake of completeness, the Delphi code which correctly extracted the results was:
// Requires the RegularExpressions and SysUtils units in the uses clause.
function ParseFMRef(const Ref: string; var FRef: Integer; var FFRef: string; var FMRef: Integer): Boolean;
var
  FMRefRegEx: TRegEx;
  MatchResults: TMatch;
begin
  Result := False;
  try
    FMRefRegEx := TRegEx.Create('(\d+)([A-Z]{1,2})(\d+)', [roIgnoreCase]);
    MatchResults := FMRefRegEx.Match(Ref);
    // Report success only when the pattern actually matched.
    if MatchResults.Success then
    begin
      // Groups[0] is the whole match; 1..3 are the captured parts.
      FRef := StrToInt(MatchResults.Groups[1].Value);
      FFRef := MatchResults.Groups[2].Value;
      FMRef := StrToInt(MatchResults.Groups[3].Value);
      Result := True;
    end;
  except
    // StrToInt raises an exception if a digit run does not fit in an Integer.
    Result := False;
  end;
end;
