psycopath

asked on

search and replace

Hey everyone,

I have a string in which I would like to replace spaces with colons. I am getting hung up in a few places.

ex. $string = "0010863 2 0.00 0.00 Closed as complete GMB DX"

I would like to place the colons like this:

$string = "0010863:2:0.00:0.00:Closed as complete:GMB:DX"

I was hoping to do this with a regex, but I am not very good with them. Any help would be great :).
jkr0605

here ya go:

$string =~ s/\s/:/g;

will replace every whitespace character (e.g. spaces, tabs) with a colon. If you want to replace each run of whitespace with a single colon, e.g. '0.00    0.00' -> '0.00:0.00', then use:

$string =~ s/\s+/:/g;
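A small sketch contrasting the two substitutions on a string that contains a run of spaces. The sample string is invented for illustration; the `(my $copy = $original) =~ s///` idiom just edits a copy so both results can be compared:

```perl
use strict;
use warnings;

my $sample = "0.00    0.00 GMB";    # invented sample: four spaces, then one

# s/\s/:/g replaces each whitespace character with its own colon
(my $each = $sample) =~ s/\s/:/g;

# s/\s+/:/g collapses each run of whitespace into a single colon
(my $runs = $sample) =~ s/\s+/:/g;

print "$each\n";    # 0.00::::0.00:GMB
print "$runs\n";    # 0.00:0.00:GMB
```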

hth.

$string =~ s/\s/:/g

But in your example, not all the spaces are converted to colons (the ones inside the text are kept).

Is that right?  If so, can you give a few more examples of the lines?
Whoops - just reread your post and spotted the lack of colons in the 'Closed as complete' section. This needs a bit more work.

Is the format specified above fixed, i.e.:

<zero padded number><whitespace><number><whitespace><decimal number><whitespace><decimal number><whitespace><some text comment w. whitespace><whitespace><3-letter code><whitespace><2-letter code>

or could the fields be anything?

ASKER

Each line will be the same except "<some text comment>", which needs to be kept in a separate colon-delimited section. That field will vary from line to line.
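Since the layout is fixed apart from the comment, one option is a single anchored regex that captures every field and rejoins them with colons. This is only a sketch, not the approach the thread settled on: the field patterns (digits, decimals, a free-text comment, two trailing words) are assumptions taken from the format described above, and `colonize` is a hypothetical helper name:

```perl
use strict;
use warnings;

# Hypothetical helper: split one fixed-format line into its seven fields
# and join them with colons. Returns undef (or an empty list) on no match.
sub colonize {
    my ($line) = @_;
    my @fields = $line =~ m{
        ^ (\d+) \s+ (\d+) \s+        # zero-padded id, count
          ([\d.]+) \s+ ([\d.]+) \s+  # two decimal numbers
          (.+?) \s+                  # comment, may contain spaces
          (\w+) \s+ (\w+) $          # the two trailing codes
    }x or return;
    return join ':', @fields;
}

print colonize("0010863 2 0.00 0.00 Closed as complete GMB DX"), "\n";
# 0010863:2:0.00:0.00:Closed as complete:GMB:DX
```

The non-greedy `(.+?)` lets the comment swallow as little as possible, so the last two words are always left for the code fields.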
ASKER CERTIFIED SOLUTION
jkr0605

So far, so good... jkr0605, your mojo works. Would you be so kind as to provide a little bit of explanation of what you are doing? I have been trying to get the hang of regular expressions for a month now, and I'm still a little in the dark.
Right ho, I've knocked out the (?:...) bit in the expressions above - it's actually redundant and left over from my attempt to solve the whole thing in one go. This leaves us with two expressions:

$string =~ s/(\d)\s+/$1:/g;                # (1)
$string =~ s/\s+(\w+)\s+(\w+)$/:$1:$2/;    # (2)

OK - we'll start by mentioning the brackets () and the $<number> bits in the expressions. Brackets on the left hand side of the regexp (the stuff between the first two /'s) don't mean that we want to match bracket characters - they mean that we want to remember the text that was matched inside the brackets. The $<number> bits on the right hand side put one of those remembered strings in place of the $<number>, where <number> means the <number>th bracketed group on the left hand side, counting opening brackets from the left. (Perl also accepts \1, \2 and so on here, but $1, $2 is the preferred spelling in the replacement, and 'use warnings' complains about \1.)
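A tiny example of remembering matched text with brackets and reusing it, written with $1 and $2 (the preferred spelling of \1 and \2 on the right hand side). The input strings are invented for illustration:

```perl
use strict;
use warnings;

my $s = "GMB DX";    # invented sample

# Each (\w+) remembers one word; $2 $1 puts them back in swapped order
(my $swapped = $s) =~ s/(\w+)\s+(\w+)/$2 $1/;
print "$swapped\n";    # DX GMB

# After a successful plain match, the same captures are available too
if ("0010863 2" =~ /(\d+)\s+(\d+)/) {
    print "first=$1 second=$2\n";    # first=0010863 second=2
}
```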

With that in mind, let's start with (1) - it's a search and replace (the s/// bit).

The first thing we do is look for a single digit (\d is shorthand for any character 0-9) and remember it, followed by one or more whitespace characters (\s is shorthand for whitespace, and the + following it means 'one or more'). That's the bit between the first two /'s. The second section says we want to replace the whole match with the digit we remembered, followed by a colon. In English: 'look for a run of whitespace that follows a digit, remember the digit, and then replace the digit and the whitespace with the digit and a colon'.
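Running expression (1) on its own against the example line from the question shows the intermediate result: a colon appears after every digit that is followed by whitespace, while the comment and the two codes are left alone. (This sketch writes the replacement as $1:, which Perl prefers over \1:.)

```perl
use strict;
use warnings;

my $string = "0010863 2 0.00 0.00 Closed as complete GMB DX";

# (1): a digit followed by one or more whitespace characters becomes
#      the same digit followed by a colon, everywhere in the string (/g)
$string =~ s/(\d)\s+/$1:/g;

print "$string\n";
# 0010863:2:0.00:0.00:Closed as complete GMB DX
```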

Now for (2). Again it's a search and replace, and again we're looking for one or more whitespace characters with the \s+, followed by one or more 'word' characters (\w is shorthand for a-zA-Z0-9 plus the underscore, and the + serves the same 'one or more' purpose as before). We remember these using the surrounding brackets: (\w+). Then we look for the same pattern again: one or more whitespace characters followed by one or more word characters, and remember the word characters we matched. Finally, we put '$' at the end of the pattern - which doesn't mean 'match the character $', but 'the end of the string' (or just before a trailing newline). The right hand side of the expression now falls into place - replace what we found with a colon, the first remembered set of word characters, another colon and then the second set of word characters.
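Feeding the output of expression (1) into expression (2) shows the last two words being split off. The starting string here is simply the intermediate result from step (1) typed in by hand:

```perl
use strict;
use warnings;

# Start from the output of expression (1)
my $string = "0010863:2:0.00:0.00:Closed as complete GMB DX";

# (2): the final two whitespace-separated words, anchored to the end
#      of the string with $, become ":word1:word2"
$string =~ s/\s+(\w+)\s+(\w+)$/:$1:$2/;

print "$string\n";
# 0010863:2:0.00:0.00:Closed as complete:GMB:DX
```

Without the $ anchor, the regex would stop at the first place two words follow whitespace (" as complete"), which is why the anchor matters here.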

In each expression there's a trick which lets us work with your format. In the first it's the 'g' at the end of the expression: it makes the substitution apply to every match in the string, not just the first one, which covers the first 'half' of your string, made up entirely of numbers separated by whitespace. Because nothing else in the string has a digit followed by whitespace, this safely formats only the first section - provided the comment never contains one (a comment like 'Closed 2 days late' would get a colon after the 2). In the second expression, it's the $ and the fact that we know the string ends with two groups of word characters separated by whitespace.
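A sketch of what happens when the comment does contain a digit followed by whitespace, using a made-up line, plus one possible alternative that anchors at the start and only converts the first four separators. The alternative assumes the first four fields never contain spaces, as in the format described earlier in the thread:

```perl
use strict;
use warnings;

my $line = "0010864 1 5.00 0.00 Closed 2 days late ABC QX";    # invented sample

# The two-step version from the thread: the "2 " in the comment is hit too
(my $two_step = $line) =~ s/(\d)\s+/$1:/g;
$two_step =~ s/\s+(\w+)\s+(\w+)$/:$1:$2/;
print "$two_step\n";    # 0010864:1:5.00:0.00:Closed 2:days late:ABC:QX

# Alternative: anchor at ^ and convert only the first four separators
(my $anchored = $line) =~ s/^(\S+)\s+(\S+)\s+(\S+)\s+(\S+)\s+/$1:$2:$3:$4:/;
$anchored =~ s/\s+(\w+)\s+(\w+)$/:$1:$2/;
print "$anchored\n";    # 0010864:1:5.00:0.00:Closed 2 days late:ABC:QX
```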

Anyway - if you're going to be working much more with regexps, I heartily recommend both 'perldoc perlre' and 'perldoc perlfaq6' - the Perl documentation covering regular expressions and the Perl FAQ on regexps. Also Jeffrey Friedl's 'Mastering Regular Expressions' from O'Reilly - it's a masterpiece and one of the finest and most referred-to books on my desk.