Solved

modify/sort data in text file

Posted on 2011-02-19
Last Modified: 2012-05-11
Hi,

I have a 30,000 line text file and need some help with data formatting.

Let me use an example; say I have lines as below:
C
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201
 host_10_80_40_202
A
 10_14_60_0_22       
 10_14_63_0_24       
 10_14_64_0_24       
B
 host_10_13_5_116
 host_10_13_5_117

I want to modify it as below:

set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22       
set A address 10_14_63_0_24       
set A address 10_14_64_0_24       

set B address host_10_13_5_116
set B address host_10_13_5_117

Thank you for all the help.
Question by:dpk_wal
19 Comments
 
LVL 16

Expert Comment

by:sjklein42
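while ( <> )
{
	s/[\r\n]//g;			# strip CR/LF line endings

	if ( /^([A-Z])/ )		# name line: starts with an upper-case letter
	{
		if ( $letter ne '' ) { print "\n"; }	# blank line between sets
		$letter = $1;
	}
	elsif ( /^ / )			# address line: starts with a blank
	{
		$addr = $';		# $' holds the text after the matched blank
		print "set $letter address $addr\n";
	}
}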



>perl foo.pl foo.txt
set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22
set A address 10_14_63_0_24
set A address 10_14_64_0_24

set B address host_10_13_5_116
set B address host_10_13_5_117

 
LVL 16

Expert Comment

by:sjklein42
Not sure where the "sorting" part of your project (see title) is supposed to come in.
 
LVL 3

Expert Comment

by:gopisera
The easy way is to use the sed command to replace the text with the common lines.
 
LVL 32

Author Comment

by:dpk_wal
Thank you for the suggestions; I will try them on Monday and get back to you.

sjklein42:
It kind of gets sorted for me, as C is a set which contains addresses! :)

gopisera:
Sorry, but you would need to give the full command rather than a suggestion; I am a total zero with scripting.

Regards.
 
LVL 32

Author Comment

by:dpk_wal
Hi sjklein42,

This works for my sample file, which has A, B, C as address names; in the actual file the names are longer and are a combination of letters, numbers, and hyphen (-) or underscore (_).

The file format still remains the same: I have the name on the first line, followed by host_ or 10_ entries [all ending with a newline character].

All entries below the address set name always start with host_ or 10_, if that helps you.

Can you please post a modification to your code?

Thank you.
 
LVL 16

Expert Comment

by:sjklein42
I am not sure if there is a problem or not.

Please post a new sample input file with the expanded format.  Can you include an example where my solution program does not give the right output?
 
LVL 32

Author Comment

by:dpk_wal
Here's the output:
-bash-2.05b$ perl foo.pl d
set  address src_57949
set  address   10_14_60_0_22
set  address   10_14_63_0_24
set  address   10_14_64_0_24
set  address src_CR56066
set  address   host_10_57_31_66
set  address   host_10_57_31_67
set  address   host_10_80_40_201
-bash-2.05b$ cat d
 src_57949
   10_14_60_0_22
   10_14_63_0_24
   10_14_64_0_24
 src_CR56066
   host_10_57_31_66
   host_10_57_31_67
   host_10_80_40_201

where the script works:
-bash-2.05b$ perl foo.pl foo.txt
set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22
set A address 10_14_63_0_24
set A address 10_14_64_0_24

set B address host_10_13_5_116
set B address host_10_13_5_117

-bash-2.05b$ cat foo.txt
C
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201
 host_10_80_40_202
A
 10_14_60_0_22
 10_14_63_0_24
 10_14_64_0_24
B
 host_10_13_5_116
 host_10_13_5_117

As I understand it, line 5 of the code does:
       if ( /^([A-Z])/ )
As far as I can tell, this only matches upper-case A-Z; I am not 100% sure though.

Thank you.
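
To check that reading of the pattern, here is a small sketch (not part of the posted script) that runs the upper-case-only test and a wider, case-insensitive character class against a few names. The helper match_name and the two labels are made up for illustration; the sample strings are taken from the two input files above.

```perl
use strict;
use warnings;

# Hypothetical helper: return the captured name, or undef when the
# pattern does not match at the start of the line.
sub match_name {
    my ($pattern, $line) = @_;
    return $line =~ $pattern ? $1 : undef;
}

my @patterns = (
    [ 'upper only', qr/^([A-Z])/ ],            # one upper-case letter at line start
    [ 'wider',      qr/^([A-Z0-9\-_]+)/i ],    # letters, digits, '-' and '_', any case
);

for my $line ('C', 'src_57949', 'src_CR56066', ' src_57949') {
    for my $entry (@patterns) {
        my ($label, $pattern) = @$entry;
        my $name = match_name($pattern, $line);
        printf "%-15s %-12s %s\n", "'$line'", $label,
            defined $name ? "name = $name" : 'no match';
    }
}
```

With these inputs, only 'C' matches the upper-case-only pattern. The wider pattern also matches 'src_57949' and 'src_CR56066', but neither pattern matches ' src_57949', because the leading blank stops the match at the '^' anchor.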
 
LVL 16

Accepted Solution

by:
sjklein42 earned 500 total points
This version should handle names with digits, dashes and underscores, lower and uppercase.

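while ( <> )
{
	s/[\r\n]//g;			# strip CR/LF line endings

	if ( /^([A-Z0-9\-\_]+)/i )	# name line: starts with a letter, digit, '-' or '_'
	{
		if ( $name ne '' ) { print "\n"; }	# blank line between sets
		$name = $1;
	}
	elsif ( /^ / )			# address line: starts with a blank
	{
		$addr = $';		# $' holds the text after the matched blank
		print "set $name address $addr\n";
	}
}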

 
LVL 32

Author Comment

by:dpk_wal
Still doesn't work :(

-bash-2.05b$ perl foo.pl d
set  address src_57949
set  address   10_14_60_0_22
set  address   10_14_63_0_24
set  address   10_14_64_0_24
set  address src_CR56066
set  address   host_10_57_31_66
set  address   host_10_57_31_67
set  address   host_10_80_40_201
-bash-2.05b$ cat foo.pl
#!/usr/local/bin/perl
while ( <> )
{
        s/[\r\n]//g;

        if ( /^([A-Z0-9\-\_]+)/i )
        {
                if ( $name ne '' ) { print "\n"; }
                $name = $1;
        }
        elsif ( /^ / )
        {
                $addr = $';
                print "set $name address $addr\n";
        }
}

 
LVL 16

Expert Comment

by:sjklein42
Please, I don't have the input data you are using, so I can't try it myself.  Can you post the input data that is not working right?
 
LVL 32

Author Comment

by:dpk_wal
I have posted my input data file named "d".

-bash-2.05b$ cat d
 src_57949
   10_14_60_0_22
   10_14_63_0_24
   10_14_64_0_24
 src_CR56066
   host_10_57_31_66
   host_10_57_31_67
   host_10_80_40_201
 
LVL 16

Expert Comment

by:sjklein42
The problem is that the new "d" file  has a blank character at the beginning of the "name" lines, and three blank characters at the beginning of each of the address lines.

Your original data had no blanks on the name lines and one blank on the address lines.

Is there a consistent rule to be followed here?  How are we to recognize the name lines?  The rule I was using was that there were no blanks at the beginning of the name lines, but that is not the case in your "d" file.

What are the rules for blanks at the beginning of the lines in the input file?
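
One possible rule, going by the asker's earlier note that address entries always start with host_ or 10_, is to ignore leading blanks entirely and classify each line by its prefix. The following is only a sketch under that assumption; the sub name to_set_commands is made up for illustration and this is not the script posted in this thread.

```perl
use strict;
use warnings;

# Turn raw input lines into "set NAME address ADDR" lines.
# Assumption (from the asker's description): every address entry starts
# with "host_" or "10_"; any other non-blank line is a set name.
sub to_set_commands {
    my @out;
    my $name;
    for my $line (@_) {
        (my $text = $line) =~ s/^\s+|\s+$//g;   # drop leading/trailing blanks and CR/LF
        next if $text eq '';                    # skip empty lines
        if ( $text =~ /^(?:host_|10_)/ ) {
            die "address '$text' appears before any set name\n" unless defined $name;
            push @out, "set $name address $text";
        }
        else {
            push @out, '' if defined $name;     # blank line between sets
            $name = $text;
        }
    }
    return @out;
}

# Run as: perl foo.pl d   (does nothing when no file name is given)
if (@ARGV) {
    print "$_\n" for to_set_commands(<>);
}
```

Because the leading blanks are stripped before the test, this should give the same output for zero, one, or three blanks in front of a line. The trade-off is that a set name which itself begins with host_ or 10_ would be treated as an address.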
 
LVL 16

Expert Comment

by:sjklein42
This is what I expected to see for input.  I can adjust but we need to be able to describe the right "rule" for leading space characters.

src_57949
 10_14_60_0_22
 10_14_63_0_24
 10_14_64_0_24
src_CR56066
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201 

 
LVL 32

Author Comment

by:dpk_wal
I removed all leading spaces; now I do not get anything at all:
-bash-2.05b$ perl foo.pl d







-bash-2.05b$ cat d
src_57949
10_14_60_0_22
10_14_63_0_24
10_14_64_0_24
src_CR56066
host_10_57_31_66
host_10_57_31_67
host_10_80_40_201

-bash-2.05b$ cat foo.pl
#!/usr/local/bin/perl
while ( <> )
{
        s/[\r\n]//g;

        if ( /^([A-Z0-9\-\_]+)/i )
        {
                if ( $name ne '' ) { print "\n"; }
                $name = $1;
        }
        elsif ( /^ / )
        {
                $addr = $';
                print "set $name address $addr\n";
        }
}
 
LVL 16

Expert Comment

by:sjklein42
I provided the solution program and debugged the problem with your input.  Why no points?
 
LVL 16

Expert Comment

by:sjklein42
dpk_wal,

I am sorry you are getting frustrated, but  if you look at the data I posted, there is one leading space on the lines with addresses, and no leading spaces on the lines with names.  You must have changed that when you copied it into your test.

Why do you keep changing the format of the input data?
 
LVL 32

Author Comment

by:dpk_wal
Oops, looks like an error on my part; I wanted to give you 500 points! I will check; thank you for objecting.
 
LVL 16

Expert Comment

by:sjklein42
dpk_wal,

Thank you, friend.  But I think you may have clicked the wrong button again.
 
LVL 32

Author Comment

by:dpk_wal
Sorry, it's been a crazy day; I think I just clicked a post with code, and in both cases it was the output I was putting forward; sorry again!
