Split a text file using awk or sed based on tags

I need to split a text file into multiple files.

Here is the text file

£ 1.2.3.4
random data including special characters
random data including special characters
random data including special characters
£ end_1.2.3.4
£ 4.5.6.7
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
£ end_4.5.6.7
£ 7.8.9.10
even more random data including special characters
more random data including special characters
£ end_7.8.9.10


The desired output:

The first file will be named 1.2.3.4 and will contain the following:

£ 1.2.3.4
random data including special characters
random data including special characters
random data including special characters
£ end_1.2.3.4


The second file will be named 4.5.6.7 and will contain the following:

£ 4.5.6.7
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
more random data including special characters
£ end_4.5.6.7

etc ...

Please use sed to solve this problem or, if that is too complicated, use awk or, as a last resort, bash.


Thanks for your help

PA
pierre-alex Asked:
 
Bill Prew Commented:
Give this AWK a try.

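# Split the input into one file per block.  A block starts with a
# "£ <name>" line and ends with a "£ end_<name>" line.
{
  # A start of block line has "£" in the first position but is not
  # an "£ end" line.
  if (substr($0, 1, 1) == "£") {
    if (substr($0, 1, 5) != "£ end") {
       fileout = $2   # second field is the block name, used as the file name
    }
  }
  # Append the current line to the file for the current block.
  print $0>>fileout
}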


~bp
 
Bill Prew Commented:
When you say the data is:

more random data including special characters

can there ever be a "£" in the first position of a line in that data?

~bp
 
pierre-alex (Author) Commented:
No
 
pierre-alex (Author) Commented:
Thanks. How do I integrate your code into a bash script? I am used to one-liners, so I am not sure what the syntax is.

#!/usr/bin/bash

awk ??? input file

{
  if (substr($0, 1, 1) == "£") {
    if (substr($0, 1, 5) != "£ end") {
       fileout = $2
    }
  }
  print $0>>fileout
}
 
Bill Prew Commented:
Save the script as a file, like myscript.awk, and then use the -f option on the awk command line to reference that script.
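
For anyone following along, here is a minimal sketch of how that could look inside a bash script. The file names myscript.awk and input.txt and the scratch directory are only examples, and the sample data is shortened. The awk program is a slight variation on the one posted above, not the original: it tests the first field ($1 == "£") instead of using substr(), because some awk implementations count bytes rather than characters and "£" is more than one byte in UTF-8.

```shell
#!/usr/bin/env bash
# Sketch: save the awk program to a file and run it with "awk -f".
set -eu

workdir=$(mktemp -d)     # scratch directory (example only)
cd "$workdir"

# The awk program, saved to a file. The file name is an example.
cat > myscript.awk <<'EOF'
{
  # Start of block: first field is "£" and second field is not "end_...".
  if ($1 == "£" && $2 !~ /^end_/) {
    fileout = $2
  }
  # Append the current line to the file for the current block.
  print $0 >> fileout
}
EOF

# Shortened sample input in the same layout as the question.
cat > input.txt <<'EOF'
£ 1.2.3.4
random data including special characters
£ end_1.2.3.4
£ 4.5.6.7
more random data including special characters
£ end_4.5.6.7
EOF

# -f tells awk to read the program from the named file.
awk -f myscript.awk input.txt

# Same program as a one-liner (not run here, because >> would append
# a second copy of every block to the files just written):
#   awk '$1 == "£" && $2 !~ /^end_/ { fileout = $2 } { print >> fileout }' input.txt

ls "$workdir"
```

After the run, the scratch directory should hold one file per block, named after the block, next to myscript.awk and input.txt.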

~bp
 
mccracky Commented:
This very much sounds like an assignment question.  So, according to the EE policies:  What have you done so far to answer this question yourself?
 
pierre-alex (Author) Commented:
Hi mccracky, quite a bit of work was actually done before I turned to Experts Exchange. The entire code itself is 2 pages long. It may sound like an assignment because I spent many hours thinking about how to solve it, so I knew exactly which options would fit correctly with the rest of the code.

regards

PA
 
pierre-alex (Author) Commented:
Hi billprew

It's working great, thanks!

Just two small questions for my understanding:

1- I suppose $2 refers to the block of data between the delimiters. Why $2, why not $3 or $4?

2- I understand that print $0 means print the line and fileout = $2 means assign the block to the variable fileout. I am missing the relationship between the two elements.

Rgds

PA



 
Bill Prew Commented:
==> 1- I suppose $2 refers to the block of data between the delimiters.
==> Why $2, why not $3 or $4?


AWK reads the input one line at a time and, by default, splits each line
into fields at whitespace.  We are looking for the "start of block"
lines, which are identified by having the "£" in the first position
but not the word "end" after it.  So when a start of block line is
read it looks like:

£ 1.2.3.4

When AWK processes that line it splits it at the space and sets:

$1 = £
$2 = 1.2.3.4

We need to grab the $2 value and save that so that we can use it as the
name of the output file to write that block of data to.
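
A quick way to see the field splitting for yourself, as a small sketch. The shell variable names are just for illustration; NF is awk's built-in count of fields on the current line.

```shell
#!/usr/bin/env bash
# Sketch: show how awk splits a start of block line into fields.
line='£ 1.2.3.4'

# NF is the number of fields awk found on the line.
nf=$(printf '%s\n' "$line" | awk '{ print NF }')
first=$(printf '%s\n' "$line" | awk '{ print $1 }')
name=$(printf '%s\n' "$line" | awk '{ print $2 }')
third=$(printf '%s\n' "$line" | awk '{ print $3 }')   # no third field, so empty

echo "NF=$nf  \$1=$first  \$2=$name  \$3=[$third]"
```

There are only two fields on that line, so $3 and $4 would simply be empty strings, which is why $2 is the one to use.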

==> 2- I understand that print $0 means print the line and fileout = $2 means
==> assign the block to the variable fileout. I am missing the relationship between the two elements.


Building on the above answer, now that we have the name of the file we want to
write this block's data to, we can use the

  print $0>>fileout

statement to do this.  $0 refers to the entire line of data just
read in.  ">>" is the same append notation used for redirection in the
shell (and in DOS): it tells the print statement to append its output
to the file whose name is stored in the fileout variable.  Because fileout
keeps its value until the next start of block line changes it, every line
of the block, including the "£ end" line, goes to the same file.  For the
block mentioned above this would write to a file named 1.2.3.4
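
One practical consequence of appending is that running the script twice over the same input adds a second copy of each block to the existing files. The sketch below shows that, along with one possible variation that uses ">" and close() so each run starts the files fresh. The block name "a", the file names and the shell variables are made up for the demo, and the program tests $1 rather than using substr(), which is a variation and not the code posted earlier.

```shell
#!/usr/bin/env bash
# Sketch: compare ">>" (append) with ">" (truncate on first open) in awk.
set -eu

workdir=$(mktemp -d)     # scratch directory (example only)
cd "$workdir"
printf '%s\n' '£ a' 'data' '£ end_a' > input.txt

# Appending version: lines are added to whatever the file already holds.
append_prog='$1 == "£" && $2 !~ /^end_/ { fileout = $2 } { print >> fileout }'
awk "$append_prog" input.txt
first_run=$(wc -l < a | tr -d ' ')
awk "$append_prog" input.txt
second_run=$(wc -l < a | tr -d ' ')

# Truncating version: ">" empties the file when awk first opens it, then
# keeps writing to it for the rest of the run. close() releases the
# previous block's file before moving on to the next one.
fresh_prog='$1 == "£" && $2 !~ /^end_/ { if (fileout != "") close(fileout); fileout = $2 } { print > fileout }'
awk "$fresh_prog" input.txt
third_run=$(wc -l < a | tr -d ' ')

echo "append, run 1: $first_run lines"
echo "append, run 2: $second_run lines"
echo "truncate:      $third_run lines"
```

With the appending version it is enough to delete the old output files before each run. The truncating version assumes each block name appears only once in the input, since reopening a closed file with ">" would empty it again.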

~bp
 
pierre-alex (Author) Commented:
Hi bp,

Thanks for the explanation.

PA
 
pierre-alex (Author) Commented:
GREAT HELP. THANKS
 
Bill Prew Commented:
Very welcome.

~bp
Question has a verified solution.
