Hi experts,
I have some long lines of data, and in the middle of each of these lines are occasional pseudo-fixed-length numbers. By that I mean that they always occupy a fixed number of characters (e.g. 5 chars), but they may or may not be padded on the left with spaces (e.g. 2 spaces and 3 digits). For example, I might have some lines of data like this (the pound signs represent other alphanumeric data; I'm just highlighting the portion that I'm referring to):
#####12345#####
#####  123#####
##### 1234#####
So you see the first line has some data followed by the number 12345 followed by some more data. The second line has some data, followed by 2 spaces and 3 digits, and then more data. Finally the third line has data, 1 space and 4 digits, and again more data.
I need a regex that will create a consistent back reference to *just the number part* of that and exclude the spaces. My first thought was, of course, to use something along the lines of this:
/\s*(\d+)/
If that worked, the number would be put into backreference \1. However, it doesn't always work, because the pound signs represent other bits of data which could themselves be numeric. I run into a problem when the data just beyond this number is also numeric: the expression above can't tell where my number ends and the next piece of data begins. Example:
abcde  1234a4a4
So this is the string abcde, followed by 2 spaces and then the number 123, followed by the string 4a4a4. I would want to just match 123, but my expression above would spill into the next piece of data and give me 1234.
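To make the spill-over concrete, here is a minimal Python sketch of the problem. Python is just my choice for illustration, and the variable names are made up for the example; the pattern itself is the one shown above.

```python
import re

# "abcde", 2 spaces, the number 123, then the unrelated data "4a4a4"
line = "abcde  1234a4a4"

# The naive pattern: optional whitespace, then as many digits as possible.
match = re.search(r"\s*(\d+)", line)
captured = match.group(1)

# The greedy \d+ runs past the 5-character field into the next piece of data.
print(captured)  # 1234, not the 123 I want
```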
So what I really want to do is something more like this (this is only a pseudo-regular expression):
/\s*(\d{ (5 - number of spaces matched) })/
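Outside of a single regex, the same idea can be sketched in two steps: cut out the 5-character field first, then match the padding and digits inside it. This is only a sketch under assumptions that are not part of my real data: it assumes I already know the index where the field starts (the hypothetical `start` argument), and `FIELD_WIDTH` and `extract_number` are names invented for the example.

```python
import re

# Assumption: the field is always exactly 5 characters wide.
FIELD_WIDTH = 5

def extract_number(line, start):
    """Return the digits of the fixed-width field beginning at index `start`.

    `start` is a hypothetical, already-known offset of the field in the line.
    Returns None if the field is not optional spaces followed by digits.
    """
    field = line[start:start + FIELD_WIDTH]
    # fullmatch forces spaces + digits to cover the whole field and nothing
    # more, so the digit count is effectively (5 - number of spaces).
    m = re.fullmatch(r" *(\d+)", field)
    return m.group(1) if m else None

print(extract_number("abcde  1234a4a4", 5))  # 123
```

Because the slice limits what the pattern can see, the digits that belong to the following data never get a chance to match.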
Except I don't know if anything like that can be done with regex. I even thought of compiling a grouping of a lot of different possibilities OR'ed together, but if I do that, I'm not sure how to consistently retrieve the back reference there either. Something like this:
/(?:(\d{5})|\s(\d{4})|\s{2}(\d{3})|\s{3}(\d{2})|\s{4}(\d))/
That would match perfectly every time and would not spill over into the following data, but it also creates a new problem: the number would be stored in \1, \2, \3, \4, or \5 depending on how many digits it has. So if I go to do a replacement, I'm not sure which one to use. I would like it to always be in the same place so that I can actually do something with it.
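One possible workaround, at least in Python, is to let the alternation stand and ask the match object which group actually took part. The sketch below relies on `Match.lastindex` and on passing a function as the replacement to `sub`; the names `ALTERNATION`, `number_from`, and `bracket_number` are invented for the example, and the angle brackets are just a stand-in for whatever replacement I really need.

```python
import re

# The OR'ed pattern from above: exactly one of the five groups takes part.
ALTERNATION = re.compile(
    r"(?:(\d{5})|\s(\d{4})|\s{2}(\d{3})|\s{3}(\d{2})|\s{4}(\d))"
)

def number_from(match):
    """Return the number, whichever of groups 1-5 captured it."""
    # lastindex is the index of the last capturing group that matched;
    # with this alternation that is the only group that matched.
    return match.group(match.lastindex)

def bracket_number(line):
    """Replace the padded field with the bare number in angle brackets."""
    # A function replacement receives the match object, so it can pick
    # the right group instead of hard-coding \1 through \5.
    return ALTERNATION.sub(lambda m: "<" + number_from(m) + ">", line)

m = ALTERNATION.search("abcde  1234a4a4")
print(number_from(m))                     # 123
print(bracket_number("abcde  1234a4a4"))  # abcde<123>4a4a4
```

Note that the padding spaces are part of the overall match, so they are consumed by the replacement along with the digits.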
Let me know if any of this is unclear. Thanks in advance!