Split a text file using awk or sed based on tags

I need to split a text file into multiple files.

Here is the text file

£ 1.2.3.4
random data including special characters
random data including special characters
random data including special characters
£ end_1.2.3.4
£ 4.5.6.7
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
£ end_4.5.6.7
£ 7.8.9.10
even more random data including special characters
more random data including special characters
£ end_7.8.9.10


The desired output:

The first file will be named 1.2.3.4 and will contain the following:

£ 1.2.3.4
random data including special characters
random data including special characters
random data including special characters
£ end_1.2.3.4


The second file will be named 4.5.6.7 and will contain the following:
£ 4.5.6.7
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
£ end_4.5.6.7

etc ...

Please use sed to solve this problem or, if that is too complicated, awk, or bash as a last resort.


Thanks for your help

PA
pierre-alexAsked:

Bill PrewCommented:
When you say the data is:

more random data including special characters

can there ever be a "£" in the first position of a line in that data?

~bp
pierre-alexAuthor Commented:
No
Bill PrewCommented:
Give this AWK a try.

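{
  # A line starting with "£" is a block delimiter
  if (substr($0, 1, 1) == "£") {
    # A start delimiter (not "£ end...") names the output file
    if (substr($0, 1, 5) != "£ end") {
       fileout = $2
    }
  }
  # Append every line, delimiters included, to the current block's file
  print $0>>fileout
}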

~bp
pierre-alexAuthor Commented:
Thanks. How do I integrate your code into a bash script? I am used to one-liners, so I'm not sure what the syntax is.

#!/usr/bin/bash

awk ??? input file

{
  if (substr($0, 1, 1) == "£") {
    if (substr($0, 1, 5) != "£ end") {
       fileout = $2
    }
  }
  print $0>>fileout
}
Bill PrewCommented:
Save the script as a file, like myscript.awk, and then use the -f option on the awk command line to reference that script.
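
As a rough sketch of those steps, something like the following should work. The names input.txt and myscript.awk are placeholders, not names from the question. The start-of-block test here compares the first field ($1) instead of using substr; that is an assumed variation, chosen so the result does not depend on whether the awk in use treats "£" as one character or as two bytes. The close() call is also an addition, so that only one output file is open at a time.

```shell
#!/usr/bin/env bash
# Sketch: save the awk program to a file, then run it with -f.
# input.txt and myscript.awk are placeholder names.

# Sample input in the format from the question.
cat > input.txt <<'EOF'
£ 1.2.3.4
random data
£ end_1.2.3.4
£ 4.5.6.7
more random data
more random data
£ end_4.5.6.7
EOF

# The awk program: one output file per block.
cat > myscript.awk <<'EOF'
# A start-of-block line has "£" as field 1 and a name that does not begin with "end_".
$1 == "£" && $2 !~ /^end_/ {
  if (fileout != "") close(fileout)   # release the previous output file
  fileout = $2
}
# Every line goes to the current block's file (skipped until a block has started).
fileout != "" { print $0 >> fileout }
EOF

# Remove results of earlier runs, because >> appends to existing files.
rm -f 1.2.3.4 4.5.6.7

awk -f myscript.awk input.txt
```

For a one-liner, the same program text can be passed inline in single quotes in place of -f myscript.awk.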

~bp
mccrackyCommented:
This very much sounds like an assignment question.  So, according to the EE policies:  What have you done so far to answer this question yourself?
pierre-alexAuthor Commented:
Hi mccracky, quite a bit of work was actually done before I turned to Experts Exchange. The entire code itself is 2 pages long. It may sound like an assignment because I spent many hours thinking about how to solve it, so I knew exactly which options would fit correctly with the rest of the code.

regards

PA
pierre-alexAuthor Commented:
Hi billprew

It's working great, thanks!!!

Just two small questions for my understanding:

1- I suppose $2 refers to the block of data between the delimiters. Why $2, why not $3 or $4?

2- I understand that print $0 means print the line and fileout = $2 means assign the block to the variable fileout - I am missing the relationship between the two elements.

Rgds

PA



Bill PrewCommented:
==> 1- I suppose $2 refers to the block of data between the delimiters.
==> Why $2, why not $3 or $4?


AWK parses the input one line at a time and, by default, splits each
line into fields at whitespace.  We are looking for the "start of block"
lines, which are identified by having "£" in the first position
but not the word "end" after it.  A start of block line
looks like:

£ 1.2.3.4

When AWK processes that line, it splits it at the space and sets:

$1 = £
$2 = 1.2.3.4

We need to grab the $2 value and save that so that we can use it as the
name of the output file to write that block of data to.
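
To see the splitting for yourself, a quick check along these lines should do (just an illustrative command, not part of the solution):

```shell
# Print the number of fields (NF) and the fields awk sees on a start-of-block line.
printf '£ 1.2.3.4\n' | awk '{ print "NF=" NF; print "$1=" $1; print "$2=" $2 }'
# NF=2
# $1=£
# $2=1.2.3.4
```

Since the line has only two fields, $3 and $4 would be empty here, which is why $2 is the one that holds the name.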

==> 2- I understand that print $0 means print the line and fileout = $2 means
==> assign the block to the variable fileout - I am missing the relationship between the two elements.


Building on the above answer, now that we have the name of the file we want to
write this block's data to, we can use the

  print $0>>fileout

statement to do this.  $0 refers to the entire line of data just
read in.  ">>" is the same append redirection notation used by the shell
(and DOS); it indicates that we want the print statement to append its
output to the file whose name is stored in the fileout variable.  Because
fileout keeps its value until the next start of block line changes it,
every line of the block, including the end line, goes to that file.  For
the block mentioned above this would write to a file named 1.2.3.4
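
One consequence of ">>" worth knowing: it appends to whatever is already in the file, so running the script twice doubles the output. A small sketch of that behaviour, where demo.out is a placeholder name used only for this illustration:

```shell
#!/usr/bin/env bash
# Sketch: ">>" in awk appends, so a second run adds to the existing file.
# demo.out is a placeholder name.
rm -f demo.out

# Two identical runs, each writing two lines to the file named in fileout.
printf 'one\ntwo\n' | awk -v fileout=demo.out '{ print $0 >> fileout }'
printf 'one\ntwo\n' | awk -v fileout=demo.out '{ print $0 >> fileout }'

grep -c '' demo.out   # prints 4: the lines from both runs are in the file
```

With ">" instead, awk empties the file the first time it opens it in a run and then keeps writing to it for the rest of that run, so the file would hold only the two lines from the last run. Either way, deleting old output files before rerunning the split avoids surprises.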

~bp
pierre-alexAuthor Commented:
Hi bp,

Thanks for the explanation.

PA
pierre-alexAuthor Commented:
GREAT HELP. THANKS
Bill PrewCommented:
Very welcome.

~bp