Solved

modify/sort data in text file

Posted on 2011-02-19
Last Modified: 2012-05-11
Hi,

I have a 30,000 line text file and need some help with data formatting.

Let me use an example; say I have lines as below:
C
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201
 host_10_80_40_202
A
 10_14_60_0_22       
 10_14_63_0_24       
 10_14_64_0_24       
B
 host_10_13_5_116
 host_10_13_5_117

I want to modify it as below:

set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22       
set A address 10_14_63_0_24       
set A address 10_14_64_0_24       

set B address host_10_13_5_116
set B address host_10_13_5_117

Thank you for all the help.
Question by:dpk_wal
19 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 34932610
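my $letter = '';	# current set name; empty until the first name line is seen
while ( <> )
{
	s/[\r\n]//g;	# strip CR/LF line endings

	if ( /^([A-Z])/ )	# name line: capital letter in column 1 (only that letter is captured)
	{
		if ( $letter ne '' ) { print "\n"; }	# blank line between sets
		$letter = $1;
	}
	elsif ( /^ / )	# address line: starts with a blank
	{
		my $addr = $';	# everything after that leading blank
		print "set $letter address $addr\n";
	}
}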



>perl foo.pl foo.txt
set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22
set A address 10_14_63_0_24
set A address 10_14_64_0_24

set B address host_10_13_5_116
set B address host_10_13_5_117

 
LVL 16

Expert Comment

by:sjklein42
ID: 34932614
Not sure where the "sorting" part of your project (see title) is supposed to come in.
 
LVL 3

Expert Comment

by:gopisera
ID: 34933195
The easy way is to use the sed command and replace the code with common lines.

 
LVL 32

Author Comment

by:dpk_wal
ID: 34935428
Thank you for the suggestions; I will try them on Monday and get back to you.

sjklein42:
It kind of gets sorted for me, as C is a set which contains addresses! :)

gopisera:
Sorry, but you would need to give the full command rather than a suggestion; I am a total zero with scripting.

Regards.
 
LVL 32

Author Comment

by:dpk_wal
ID: 34940237
Hi sjklein42,

This works for my sample file, which has A, B, C as address names; in the actual file the names are longer and are a combination of letters, numbers and hyphens (-) or underscores (_).

The sample file format still remains the same: I have the name in the first line, followed by host_ or 10_ entries [all ending with a newline character].

All entries below the address set name always start with host_ or 10_, if that helps you.

Can you please post a modification to your code?

Thank you.
 
LVL 16

Expert Comment

by:sjklein42
ID: 34940269
I am not sure if there is a problem or not.

Please post a new sample input file with the expanded format.  Can you include an example where my solution program does not give the right output?
 
LVL 32

Author Comment

by:dpk_wal
ID: 34940401
Here's the output:
-bash-2.05b$ perl foo.pl d
set  address src_57949
set  address   10_14_60_0_22
set  address   10_14_63_0_24
set  address   10_14_64_0_24
set  address src_CR56066
set  address   host_10_57_31_66
set  address   host_10_57_31_67
set  address   host_10_80_40_201
-bash-2.05b$ cat d
 src_57949
   10_14_60_0_22
   10_14_63_0_24
   10_14_64_0_24
 src_CR56066
   host_10_57_31_66
   host_10_57_31_67
   host_10_80_40_201

where the script works:
-bash-2.05b$ perl foo.pl foo.txt
set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22
set A address 10_14_63_0_24
set A address 10_14_64_0_24

set B address host_10_13_5_116
set B address host_10_13_5_117

-bash-2.05b$ cat foo.txt
C
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201
 host_10_80_40_202
A
 10_14_60_0_22
 10_14_63_0_24
 10_14_64_0_24
B
 host_10_13_5_116
 host_10_13_5_117

As I understand it, line 5 of the code does:
       if ( /^([A-Z])/ )
Per my understanding we are only matching upper case A-Z; I am not 100% sure though.

Thank you.
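
The reading of that pattern can be checked with a small test. The sketch below is illustrative only: the helper names `match_single_letter` and `match_full_name` are made up for this example, and the broader character class is an assumption based on the name format described above (letters, digits, hyphen, underscore). Note that neither pattern matches a line that begins with a blank, which is why the name lines in "d" fall through to the address branch and print with an empty name.

```perl
use strict;
use warnings;

# Hypothetical helper: the original pattern, one capital letter in column 1.
# Returns the captured text, or undef when the line does not match.
sub match_single_letter {
    my ($line) = @_;
    return $line =~ /^([A-Z])/ ? $1 : undef;
}

# Hypothetical helper: a broader pattern (letters of either case, digits,
# '_' and '-'), still anchored to column 1. Assumed from the thread's
# description of the real names, not taken from the original script.
sub match_full_name {
    my ($line) = @_;
    return $line =~ /^([A-Za-z0-9_-]+)/ ? $1 : undef;
}

# Try both patterns on a sample name, a real name, and a name with a leading blank.
for my $line ('C', 'src_57949', ' src_57949') {
    my $single = match_single_letter($line);
    my $full   = match_full_name($line);
    printf "%-14s single=%-8s full=%s\n",
        "'$line'",
        defined $single ? $single : '(none)',
        defined $full   ? $full   : '(none)';
}
```

Only 'C' satisfies the single-letter pattern; 'src_57949' needs the broader class, and ' src_57949' matches neither because of the leading blank.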
 
LVL 16

Accepted Solution

by:
sjklein42 earned 2000 total points
ID: 34940458
This version should handle names with digits, dashes and underscores, lower and uppercase.

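my $name = '';	# current set name; empty until the first name line is seen
while ( <> )
{
	s/[\r\n]//g;	# strip CR/LF line endings

	if ( /^([A-Z0-9\-\_]+)/i )	# name line: letters, digits, '-' or '_' starting in column 1
	{
		if ( $name ne '' ) { print "\n"; }	# blank line between sets
		$name = $1;
	}
	elsif ( /^ / )	# address line: starts with a blank
	{
		my $addr = $';	# everything after that leading blank
		print "set $name address $addr\n";
	}
}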

 
LVL 32

Author Comment

by:dpk_wal
ID: 34941450
Still doesn't work :(

-bash-2.05b$ perl foo.pl d
set  address src_57949
set  address   10_14_60_0_22
set  address   10_14_63_0_24
set  address   10_14_64_0_24
set  address src_CR56066
set  address   host_10_57_31_66
set  address   host_10_57_31_67
set  address   host_10_80_40_201
-bash-2.05b$ cat foo.pl
#!/usr/local/bin/perl
while ( <> )
{
        s/[\r\n]//g;

        if ( /^([A-Z0-9\-\_]+)/i )
        {
                if ( $name ne '' ) { print "\n"; }
                $name = $1;
        }
        elsif ( /^ / )
        {
                $addr = $';
                print "set $name address $addr\n";
        }
}
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941497
Please, I don't have the input data you are using, so I can't try it myself.  Can you post the input data that is not working right?
 
LVL 32

Author Comment

by:dpk_wal
ID: 34941503
I have posted my input data file named "d".

-bash-2.05b$ cat d
 src_57949
   10_14_60_0_22
   10_14_63_0_24
   10_14_64_0_24
 src_CR56066
   host_10_57_31_66
   host_10_57_31_67
   host_10_80_40_201
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941538
The problem is that the new "d" file  has a blank character at the beginning of the "name" lines, and three blank characters at the beginning of each of the address lines.

Your original data had no blanks on the name lines and one blank on the address lines.

Is there a consistent rule to be followed here?  How are we to recognize the name lines?  The rule I was using was that there were no blanks at the beginning of the name lines, but that is not the case in your "d" file.

What are the rules for blanks at the beginning of the lines in the input file?
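
One possible rule that does not depend on leading blanks at all is to classify lines by content, using the author's earlier statement that every address entry starts with host_ or 10_. The sketch below is only an illustration under that assumption; `format_sets` is a made-up name, and the rule would misclassify any set name that itself begins with host_ or 10_.

```perl
use strict;
use warnings;

# Hypothetical helper: turn name/address lines into "set NAME address ADDR".
# Assumption: after trimming, a line starting with "host_" or "10_" is an
# address; any other non-empty line is a set name.
sub format_sets {
    my @lines = @_;                    # copy, so the caller's data is untouched
    my ($name, @out);
    for my $line (@lines) {
        $line =~ s/^\s+|\s+$//g;       # trim blanks, CR and LF at both ends
        next if $line eq '';           # skip empty lines
        if ($line =~ /^(?:host_|10_)/) {
            # Address line: emit it under the most recent set name, if any.
            push @out, "set $name address $line" if defined $name;
        }
        else {
            # Name line: separate consecutive sets with a blank line.
            push @out, '' if defined $name;
            $name = $line;
        }
    }
    return @out;
}

# Sample copied from the "d" file posted in the thread (irregular indentation).
my @sample = (
    ' src_57949',
    '   10_14_60_0_22',
    '   10_14_63_0_24',
    '   10_14_64_0_24',
    ' src_CR56066',
    '   host_10_57_31_66',
    '   host_10_57_31_67',
    '   host_10_80_40_201',
);
print "$_\n" for format_sets(@sample);
```

To run it against a real file instead of the embedded sample, the last statement could read the lines from `<>` and pass them to `format_sets`.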
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941581
This is what I expected to see for input.  I can adjust but we need to be able to describe the right "rule" for leading space characters.

src_57949
 10_14_60_0_22
 10_14_63_0_24
 10_14_64_0_24
src_CR56066
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201 

 
LVL 32

Author Comment

by:dpk_wal
ID: 34941613
I removed all leading spaces; now I do not get anything at all:
-bash-2.05b$ perl foo.pl d







-bash-2.05b$ cat d
src_57949
10_14_60_0_22
10_14_63_0_24
10_14_64_0_24
src_CR56066
host_10_57_31_66
host_10_57_31_67
host_10_80_40_201

-bash-2.05b$ cat foo.pl
#!/usr/local/bin/perl
while ( <> )
{
        s/[\r\n]//g;

        if ( /^([A-Z0-9\-\_]+)/i )
        {
                if ( $name ne '' ) { print "\n"; }
                $name = $1;
        }
        elsif ( /^ / )
        {
                $addr = $';
                print "set $name address $addr\n";
        }
}
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941662
I provided the solution program and debugged the problem with your input.  Why no points?
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941719
dpk_wal,

I am sorry you are getting frustrated, but  if you look at the data I posted, there is one leading space on the lines with addresses, and no leading spaces on the lines with names.  You must have changed that when you copied it into your test.

Why do you keep changing the format of the input data?
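
To see why the file with every leading space removed produced nothing but blank lines, it may help to classify a few lines the way the script does. This is a sketch for illustration; `classify` is a hypothetical helper that mirrors the order of the two tests in the posted script, not part of it.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the script's two tests, in the same order:
# the name pattern is tried first; only a line that fails it and starts
# with a blank is treated as an address.
sub classify {
    my ($line) = @_;
    return 'name'    if $line =~ /^([A-Z0-9\-\_]+)/i;
    return 'address' if $line =~ /^ /;
    return 'ignored';
}

# The same address with and without its leading blank, plus a name line.
for my $line ('src_57949', '10_14_60_0_22', ' 10_14_60_0_22') {
    printf "%-18s => %s\n", "'$line'", classify($line);
}
```

Without its leading blank, an address such as 10_14_60_0_22 satisfies the name pattern, so every line is taken as a new set name and the script only ever prints the blank separator between sets.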
 
LVL 32

Author Comment

by:dpk_wal
ID: 34941796
Oops, looks like an error on my part; I wanted to give you 500 points! I will check; thank you for objecting.
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941816
dpk_wal,

Thank you, friend.  But I think you may have clicked the wrong button again.
 
LVL 32

Author Comment

by:dpk_wal
ID: 34942185
Sorry, it has been a crazy day; I think I just clicked a post with code, and in both cases it was the output I was putting forward. Sorry again!