Solved

modify/sort data in text file

Posted on 2011-02-19
Last Modified: 2012-05-11
Hi,

I have a 30,000 line text file and need some help with data formatting.

Let me use an example; say I have lines as below:
C
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201
 host_10_80_40_202
A
 10_14_60_0_22       
 10_14_63_0_24       
 10_14_64_0_24       
B
 host_10_13_5_116
 host_10_13_5_117

I want to modify it as below:

set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22       
set A address 10_14_63_0_24       
set A address 10_14_64_0_24       

set B address host_10_13_5_116
set B address host_10_13_5_117

Thank you for all the help.
Question by:dpk_wal
19 Comments
 
LVL 16

Expert Comment

by:sjklein42
ID: 34932610
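while ( <> )
{
	s/[\r\n]//g;		# strip line endings (CR and LF)

	if ( /^([A-Z])/ )	# name line: starts with an uppercase letter
	{
		if ( $letter ne '' ) { print "\n"; }	# blank line between sets
		$letter = $1;
	}
	elsif ( /^ / )		# address line: starts with a blank
	{
		$addr = $';	# everything after the leading blank
		print "set $letter address $addr\n";
	}
}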



>perl foo.pl foo.txt
set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22
set A address 10_14_63_0_24
set A address 10_14_64_0_24

set B address host_10_13_5_116
set B address host_10_13_5_117

 
LVL 16

Expert Comment

by:sjklein42
ID: 34932614
Not sure where the "sorting" part of your project (see title) is supposed to come in.
 
LVL 3

Expert Comment

by:gopisera
ID: 34933195
The easy way: use the sed command to prefix the lines with the common text.
 
LVL 32

Author Comment

by:dpk_wal
ID: 34935428
Thank you for the suggestions; I will try on Monday and get back to you.

sjklein42:
It kind of gets sorted for me, as C is a set which contains addresses! :)

gopisera:
Sorry, but you would need to give the full command rather than a suggestion; I am a total zero with scripting.

Regards.
 
LVL 32

Author Comment

by:dpk_wal
ID: 34940237
Hi sjklein42,

This works for my sample file, which has A, B, C as address names; in the actual file the names are longer and are a combination of letters, numbers, and hyphens (-) or underscores (_).

The sample file format still remains the same: I have the name on the first line, followed by host_ or 10_ entries [all ending with a newline character].

All entries below the address set name always start with host_ or 10_, if that helps you.

Can you please post a modification to your code?

Thank you.
 
LVL 16

Expert Comment

by:sjklein42
ID: 34940269
I am not sure if there is a problem or not.

Please post a new sample input file with the expanded format.  Can you include an example where my solution program does not give the right output?
 
LVL 32

Author Comment

by:dpk_wal
ID: 34940401
Here's the output:
-bash-2.05b$ perl foo.pl d
set  address src_57949
set  address   10_14_60_0_22
set  address   10_14_63_0_24
set  address   10_14_64_0_24
set  address src_CR56066
set  address   host_10_57_31_66
set  address   host_10_57_31_67
set  address   host_10_80_40_201
-bash-2.05b$ cat d
 src_57949
   10_14_60_0_22
   10_14_63_0_24
   10_14_64_0_24
 src_CR56066
   host_10_57_31_66
   host_10_57_31_67
   host_10_80_40_201

where the script works:
-bash-2.05b$ perl foo.pl foo.txt
set C address host_10_57_31_66
set C address host_10_57_31_67
set C address host_10_80_40_201
set C address host_10_80_40_202

set A address 10_14_60_0_22
set A address 10_14_63_0_24
set A address 10_14_64_0_24

set B address host_10_13_5_116
set B address host_10_13_5_117

-bash-2.05b$ cat foo.txt
C
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201
 host_10_80_40_202
A
 10_14_60_0_22
 10_14_63_0_24
 10_14_64_0_24
B
 host_10_13_5_116
 host_10_13_5_117

As I understand it, in the code we are doing (line 5):
5:       if ( /^([A-Z])/ )
Per my understanding this only matches an uppercase letter A-Z at the start of the line; I am not 100% sure though.

Thank you.
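
For anyone following along, here is a minimal sketch of what that pattern captures on a few sample lines. The helper name original_capture is made up for this illustration and is not part of the posted script.

```perl
use strict;
use warnings;

# Hypothetical helper: returns what /^([A-Z])/ captures, or undef if the
# line does not start with an uppercase letter.
sub original_capture {
    my ($line) = @_;
    return $line =~ /^([A-Z])/ ? $1 : undef;
}

for my $line ( 'C', 'SRC_1', 'src_57949', ' src_57949' ) {
    my $got = original_capture($line);
    printf "%-14s -> %s\n", "'$line'", defined $got ? "'$got'" : 'no match';
}
```

So the reading above looks right: the pattern accepts only an uppercase first character and captures just that one character ('C' gives 'C', 'SRC_1' gives 'S'), while lowercase names and lines with a leading blank do not match at all.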
 
LVL 16

Accepted Solution

by:
sjklein42 earned 2000 total points
ID: 34940458
This version should handle names with digits, dashes and underscores, lower and uppercase.

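while ( <> )
{
	s/[\r\n]//g;		# strip line endings (CR and LF)

	# name line: starts (no leading blank) with letters, digits, '-' or '_'
	if ( /^([A-Z0-9\-\_]+)/i )
	{
		if ( $name ne '' ) { print "\n"; }	# blank line between sets
		$name = $1;
	}
	elsif ( /^ / )		# address line: starts with a blank
	{
		$addr = $';	# everything after the leading blank
		print "set $name address $addr\n";
	}
}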

 
LVL 32

Author Comment

by:dpk_wal
ID: 34941450
Still doesn't work :(

-bash-2.05b$ perl foo.pl d
set  address src_57949
set  address   10_14_60_0_22
set  address   10_14_63_0_24
set  address   10_14_64_0_24
set  address src_CR56066
set  address   host_10_57_31_66
set  address   host_10_57_31_67
set  address   host_10_80_40_201
-bash-2.05b$ cat foo.pl
#!/usr/local/bin/perl
while ( <> )
{
        s/[\r\n]//g;

        if ( /^([A-Z0-9\-\_]+)/i )
        {
                if ( $name ne '' ) { print "\n"; }
                $name = $1;
        }
        elsif ( /^ / )
        {
                $addr = $';
                print "set $name address $addr\n";
        }
}
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941497
Please, I don't have the input data you are using, so I can't try it myself.  Can you post the input data that is not working right?
 
LVL 32

Author Comment

by:dpk_wal
ID: 34941503
I have posted my input data file named "d".

-bash-2.05b$ cat d
 src_57949
   10_14_60_0_22
   10_14_63_0_24
   10_14_64_0_24
 src_CR56066
   host_10_57_31_66
   host_10_57_31_67
   host_10_80_40_201
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941538
The problem is that the new "d" file  has a blank character at the beginning of the "name" lines, and three blank characters at the beginning of each of the address lines.

Your original data had no blanks on the name lines and one blank on the address lines.

Is there a consistent rule to be followed here?  How are we to recognize the name lines?  The rule I was using was that there were no blanks at the beginning of the name lines, but that is not the case in your "d" file.

What are the rules for blanks at the beginning of the lines in the input file?
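
One possible rule, sketched below, ignores leading blanks entirely and relies on the asker's earlier statement that address entries always start with host_ or 10_. This is only a sketch under that assumption: it would misclassify a set name that itself begins with host_ or 10_, and the function name convert_lines is invented for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: turn input lines into "set NAME address ADDR" lines.
# Assumption: after trimming, address lines start with "host_" or "10_";
# every other non-empty line is treated as a set name.
sub convert_lines {
    my @out;
    my $name;
    for my $line (@_) {
        ( my $text = $line ) =~ s/^\s+|\s+$//g;    # trim blanks, CR, LF
        next if $text eq '';
        if ( $text =~ /^(?:host_|10_)/ ) {
            my $set = defined $name ? $name : '';
            push @out, "set $set address $text";
        }
        else {
            push @out, '' if defined $name;        # blank line between sets
            $name = $text;
        }
    }
    return @out;
}

# Sample lines copied from the "d" file, with its leading blanks.
my @sample = (
    ' src_57949',   '   10_14_60_0_22',    '   10_14_63_0_24',
    ' src_CR56066', '   host_10_57_31_66', '   host_10_57_31_67',
);
print "$_\n" for convert_lines(@sample);
```

To run it against a real file, the sample list could be replaced with convert_lines(<>) and the script called as perl foo.pl d. Because the lines are trimmed before being classified, the same input with no leading blanks at all should give the same result.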
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941581
This is what I expected to see for input.  I can adjust but we need to be able to describe the right "rule" for leading space characters.

src_57949
 10_14_60_0_22
 10_14_63_0_24
 10_14_64_0_24
src_CR56066
 host_10_57_31_66
 host_10_57_31_67
 host_10_80_40_201 

 
LVL 32

Author Comment

by:dpk_wal
ID: 34941613
I removed all leading spaces; now I do not get anything at all:
-bash-2.05b$ perl foo.pl d







-bash-2.05b$ cat d
src_57949
10_14_60_0_22
10_14_63_0_24
10_14_64_0_24
src_CR56066
host_10_57_31_66
host_10_57_31_67
host_10_80_40_201

-bash-2.05b$ cat foo.pl
#!/usr/local/bin/perl
while ( <> )
{
        s/[\r\n]//g;

        if ( /^([A-Z0-9\-\_]+)/i )
        {
                if ( $name ne '' ) { print "\n"; }
                $name = $1;
        }
        elsif ( /^ / )
        {
                $addr = $';
                print "set $name address $addr\n";
        }
}
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941662
I provided the solution program and debugged the problem with your input.  Why no points?
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941719
dpk_wal,

I am sorry you are getting frustrated, but if you look at the data I posted, there is one leading space on the lines with addresses, and no leading spaces on the lines with names.  You must have changed that when you copied it into your test.

Why do you keep changing the format of the input data?
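
To illustrate why the run with all leading spaces removed printed only blank lines, here is a small sketch that mirrors the branching of the accepted script but collects the output in a list instead of printing it. The function name run_accepted_logic is invented for this example.

```perl
use strict;
use warnings;

# Hypothetical re-statement of the accepted script's logic as a function.
sub run_accepted_logic {
    my @out;
    my $name = '';
    for my $line (@_) {
        ( my $text = $line ) =~ s/[\r\n]//g;       # strip line endings
        if ( $text =~ /^([A-Z0-9\-\_]+)/i ) {      # no leading blank: a name
            push @out, '' if $name ne '';
            $name = $1;
        }
        elsif ( $text =~ /^ (.*)/ ) {              # one leading blank: an address
            push @out, "set $name address $1";
        }
    }
    return @out;
}

# With no leading blanks, every line looks like a name line.
my @unindented = ( "src_57949\n", "10_14_60_0_22\n", "10_14_63_0_24\n" );
my @result = run_accepted_logic(@unindented);
printf "%d output lines, all empty: %s\n", scalar @result,
    ( grep { $_ ne '' } @result ) ? 'no' : 'yes';
```

Every unindented line matches the name pattern, so each line after the first only produces the separator blank line and no set statement is ever emitted. With one leading blank on the address lines, the same function returns the expected set lines.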
 
LVL 32

Author Comment

by:dpk_wal
ID: 34941796
Oops, looks like an error on my part; I wanted to give you 500 points! I will check; thank you for objecting.
 
LVL 16

Expert Comment

by:sjklein42
ID: 34941816
dpk_wal,

Thank you, friend.  But I think you may have clicked the wrong button again.
 
LVL 32

Author Comment

by:dpk_wal
ID: 34942185
Sorry, it's been a crazy day; I think I just clicked on a post with code, and in both cases it was one where I was posting the output; sorry again!