?
Solved

File Manipulation: Sed and unique value replacement

Posted on 2005-03-08
12
Medium Priority
?
271 Views
Last Modified: 2013-12-26
I recently have been looking for a way to help ease a process, I have been working on it via some bash scripting: but I believe I am failing to go in the right direction and need some advice in which direction I should be pointed!

Problem: CSV File modification
1. Needs additional column (Simple)
2. Host Column Field contains dynamic data located within the DESC field (Where HOST data is)
3. Any duplicate DESC fields need to be removed **worry about later
--------------------------------------------------------

Simple example of what the file presently looks like:
ID, IP, PORT, DESC

What the file needs to look like:
ID, IP, HOST, PORT, DESC

(simple sed s/IP/&\"\, \"$HOST_&/g) ...to keep $HOST and $IP unique

now I need to remove the "." with "_" to each $HOST  (simplest way to do this would be greatly appreciated)

I can also: sed -n 's/[0-9]{3} [parse ip]/&/p' | sed -n 's/DESC/&/p'  | sed -n 's/REMOVE garbage/\=/p'  .... until it leaves me with “UNIQUE_IP=CORRESPEONDING_UNIQUE_HOST”

eg: 10.10.10.1=John.thinker.net (sed again to get it to HOST_10_10_10_1=John.thinker.net)

Here is where I am having a problem trying to associate the UNIQUE_HOST with the new HOST field to the corresponding UNIQUE_IP. It's already configured; but what is the easier way to search/replace through the file until all unique hosts are replaced (besides manually)

Original:

“334”,”10.10.10.1”,””,”John.thinker.net”
“335”,”10.10.10.2”,”23”,”Telnet”

Currently:
“334”,”10.10.10.1”,”$HOST_10_10_10_1”,””,”John.thinker.net”
“335”,”10.10.10.2”,”$HOST_10_10_10_2”,”23”,”Telnet

Need:
“334”,”10.10.10.1”,”John.thinker.net”,””,”John.thinker.net”
“335”,”10.10.10.2”,”Mark.thinker.net”,”23”,”Telnet”




0
Comment
Question by:mattlal
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 3
  • 3
12 Comments
 
LVL 51

Expert Comment

by:ahoffmann
ID: 13491201
> now I need to remove the "." with "_" to each $HOST
you mean replace . by _
  tr '.' '_' <file >newfile

hmm can't follow your other description ..
is this what you want:
  echo '"334","10.10.10.1","","John.thinker.net"' | awk -F, '{print $1","$2","$4","$3","$4}'
0
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13493429
How do you generate a list of
UNIQUE_IP=CORRESPEONDING_UNIQUE_HOST
for example, where do you resolve 10.10.10.2 to mark.thinker.net??

as this information is not in your file?? Or is it the description field that you want to be put into the host field??

0
 

Author Comment

by:mattlal
ID: 13495031
Sorry for the confusion:

the $HOST information can be found in the DESC field.


as for the $HOST(HOST_$IP[without "."])  [eg.$HOST10.10.1.1 cannot work must be $HOST_10_10_1_1; but how do i manipulate only one field without changing the other fields in the file?) [I was thinking use sed ...replace the 4-7th occurance only ..but i tried s/\./\{4,7\}/\_/g with no luck (I am more than likely just using a function wrong)]

as for generating unique_ip = unique_hostname that's easy: just use muliple sed and save to a serpate file; I can have it say anything "export HOST_10_10_1_1=john.thinker.net" (although here I would probably be best off with the s/\./\_/g on first 3! same problem as before; but still not the real issue; more concerend on $HOST_($IP) replacement with correct value according to the sed'd file (or however else it can be done :) )
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 51

Expert Comment

by:ahoffmann
ID: 13495185
confusing, very confusing ...
could you please just post your input file (2 or 3 lines) and the output you expect for that, nothing else
0
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13495190
What we can do in Perl is
- make one pass thru the file, storing all ip->host mapping in memory (read)
- make another pass thru the file, reading the correct host entry corresponding to a record's IP from the memory, and insert the host value into a record....that will save us the intermediate process of creating HOST_10_10_2 type files.

Is this what you want??

Manav
0
 

Author Comment

by:mattlal
ID: 13496597
Manav: that could work as well,

I was trying for a generic script to work almost anywhere; but perl has it's place to in this: Originally I thought it would have done simply with bash/sed/awk/etc. (guess not!)

ahoffmann:
------------------------------------------
Original:
IP, PORT, DESC
"10.10.10.1”,””,”John.thinker.net”
”10.10.10.2”,”23”,”Telnet”

Need:
IP,HOST,PORT,DESC
”10.10.10.1”,”John.thinker.net”,””,”John.thinker.net”
”10.10.10.2”,”Mark.thinker.net”,”23”,”Telnet”


if this helps: it's basically a custom nmap output....the needs an extra field from the resolved hostname
0
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13496916
The only thing I dont uderstand is where do you get "mark.thinker.net" corresponding to 10.10.10.2.....does it occur somewhere down in the input file??

use strict ;
use warnings ;
my %host; #this will maintain our ip->host mappings
open(INFILE,"< input") or die("$! : cant open input") ;
while(<INFILE>) {
chomp ;
my ($id,$ip,$port,$desc) = split(/,/$_,4) ;
$hash{$ip}=$desc if !$hash{$ip}; ##map ip->host if entry for tthat ip doesn;t exist
}
seek(INFILE,0,0) ;
while(<INFILE>) {
my ($id,$ip,$port,$desc) = split(/,/$_,4) ;
print "$id,$ip,$host{$ip},$port,$desc" ; ##insert host for the corresonding ip
}
close(INFILE) ;
 
 

 
0
 

Author Comment

by:mattlal
ID: 13497160
oh sorry should have posted a few more lines:

"10.10.10.2”,”443”,”https"
"10.10.10.2”,”21”,”ftp”
"10.10.10.2”,””,”mark.thinker.net”

(and 10.10.10.3 may not have a hostname associated with it; but neeeds the "field" to maintain the CSV integrity).

The split should probably be done with ' "," '  [There might be ',' in the DESC field]

The DESC field contains many "banners"


0
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13497274
use strict ;
use warnings ;
my %host; #this will maintain our ip->host mappings
open(INFILE,"< input") or die("$! : cant open input") ;
while(<INFILE>) {
chomp ;
my ($id,$ip,$port,$desc) = split(/,/$_,4) ;
$hash{$ip}=$desc if($desc =~ /^(?:\d{1,3}\.){3}\d+$/) ; ##map ip->host if desc matched aa.bb.cc.dd
}
seek(INFILE,0,0) ;
while(<INFILE>) {
my ($id,$ip,$port,$desc) = split(/,/$_,4) ;
print "$id,$ip,$host{$ip},$port,$desc" ; ##insert host for the corresonding ip
}
close(INFILE) ;

as split limit has been set as 4, any commas in desc will not be considered for splitting as long as there is no comma in $id,$ip,$port
0
 
LVL 16

Accepted Solution

by:
manav_mathur earned 2000 total points
ID: 13497487
Ok, is this what you want

SCRIPT
=====
use strict ;
use warnings ;
my %host; #this will maintain our ip->host mappings
open(INFILE,"< input_file") or die("$! : cant open input") ;
while(<INFILE>) {
chomp ;
my ($id,$ip,$port,$desc) = split(/,/,$_,4) ;
$host{$ip}=$desc if($desc =~ /^\"\w+\.\w+\.\w+\"$/) ;
}
seek(INFILE,0,0) ;
while(<INFILE>) {
my ($id,$ip,$port,$desc) = split(/,/,$_,4) ;
print "$id,$ip,$host{$ip},$port,$desc" ; ##insert host for the corresonding ip
}
close(INFILE) ;


INPUT
====
"334","10.10.10.1","","John.thinker.net"
"335","10.10.10.2","23","Telnet"
"336","10.10.10.2","23","Telnet"
"336","10.10.10.2","443","https"
"336","10.10.10.2","21","ftp"
"336","10.10.10.2","","mark.thinker.net"

OUTPUT
=====
"334","10.10.10.1","John.thinker.net","","John.thinker.net"
"335","10.10.10.2","mark.thinker.net","23","Telnet"
"336","10.10.10.2","mark.thinker.net","23","Telnet"
"336","10.10.10.2","mark.thinker.net","443","https"
"336","10.10.10.2","mark.thinker.net","21","ftp"
"336","10.10.10.2","mark.thinker.net","","mark.thinker.net"
0
 
LVL 16

Expert Comment

by:manav_mathur
ID: 13497547
tested it with different inputs
===================

E:\>type file1
"334","10.10.10.1","","John.thinker.net"
"335","10.10.10.2","23","Telnet"
"336","10.10.10.2","23","Telnet"
"336","10.10.10.1","443","https" ##notice this changes ip address
"336","10.10.10.1","21","ftp"      ##and here again
"336","10.10.10.2","","mark.thinker.net"

E:\>tr.pl
"334","10.10.10.1","John.thinker.net","","John.thinker.net"
"335","10.10.10.2","mark.thinker.net","23","Telnet"
"336","10.10.10.2","mark.thinker.net","23","Telnet"
"336","10.10.10.1","John.thinker.net","443","https"
"336","10.10.10.1","John.thinker.net","21","ftp"
"336","10.10.10.2","mark.thinker.net","","mark.thinker.net"
0
 
LVL 51

Expert Comment

by:ahoffmann
ID: 13498508
hmm, my suggestion was nearly perfect :-(
awk -F, '{if($3~/""/){print $1","$2","$4","$4}else{print $1","$2",\"mark.thinker.net\","$2","$3}}'

or do I miss something?
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Introduction: Database storage, where is the exe actually on the disc? Playing a game selected randomly (how to generate random numbers).  Error trapping with try..catch to help the code run even if something goes wrong. Continuing from the seve…
In this post we will learn different types of Android Layout and some basics of an Android App.
This video will show you how to get GIT to work in Eclipse.   It will walk you through how to install the EGit plugin in eclipse and how to checkout an existing repository.
In this video, Percona Solution Engineer Rick Golba discuss how (and why) you implement high availability in a database environment. To discuss how Percona Consulting can help with your design and architecture needs for your database and infrastr…
Suggested Courses

764 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question