  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 476

string parsing

Hi all,

This is my input.
1000,1010,2000,3000,4
1001,1010,2000,3000,4
1002,1010,2000,3000,4
1003,1011,2010,3000,4
1004,1011,2010,3000,4
1005,1012,2010,3000,4
1006,1012,2010,3000,4
1007,1012,2010,3000,4
1008,1012,2010,3000,4
1010,2000,3000,,3
1011,2010,3000,,3
1012,2010,3000,,3
2000,3000,,,2
2010,3000,,,2

This is what I want as output.
3000,2000,1010,1000,4
3000,2000,1010,1001,4
3000,2000,1010,1002,4
3000,2010,1011,1003,4
3000,2010,1011,1004,4
3000,2010,1012,1005,4
3000,2010,1012,1006,4
3000,2010,1012,1007,4
3000,2010,1012,1008,4
3000,2000,1010,,3
3000,2010,1011,,3
3000,2010,1012,,3
3000,2000,,,2
3000,2010,,,2

It is important that every output line keeps exactly the same number of commas as the input.

Can somebody suggest an awk script (or anything similar) for this?
Perl is not preferred, but a simple Perl solution would certainly be appreciated.

Manav
0
Asked by: manav_mathur
3 Solutions
 
manav_mathurAuthor Commented:
I came up with this solution:
perl -aF, -ne 'chomp; @arr=split /,/ ; my $max=$arr[$#arr] ;for (my $i=0;$i<($max-1)/2 ;$i++) {my $temp=$arr[$i] ;$arr[$i]=$arr[$max-1-$i] ;$arr[$max-1-$i]=$temp ;} local $"="," ;print "@arr\n" ;'

Can anybody come up with a more elegant one?

Manav
0
 
lynxlupodianCommented:
awk '{FS=",";print $4","$3","$2","$1","$5}'

This works, but the commas end up at the start because of the shuffling. Is just the number of them important, or the order too?
0
 
manav_mathurAuthor Commented:
Yup.
And I had already tried the solution you proposed.
I require the exact output format shown above, because the software that processes
this data treats ,, as a null field. So within each record, the null fields must be in the correct positions.

Manav
0
 
manav_mathurAuthor Commented:
I still maintain that a UNIX-based solution would be preferable to a Perl solution.

Manav
0
 
lynxlupodianCommented:
Are the fields always in growing/falling order? Then a sort would be possible ...
0
 
manav_mathurAuthor Commented:
Although, I am thinking now, with ~120000+ of these rows, perl might prove to be more efficient.

How do you put a pointer question here?

Manav
0
 
manav_mathurAuthor Commented:
>Are the fields always in growing/falling order?

Not necessarily. And that reminds me that I haven't tried out my Perl solution with unordered fields.

Manav
0
 
manav_mathurAuthor Commented:
One point regarding the data: the last field in a row always contains the number of non-null fields in the rest of the record (excluding the last field itself).

If that helps.

Manav
0
 
lynxlupodianCommented:
hmm, if the commas always "gather" at the end, you could do it with a sed script over the awk line (or advanced awk).
0
 
manav_mathurAuthor Commented:
And in the input data, leaving the last field aside, all null fields are always placed after all non-null fields. E.g. there can never be a null field at position $2 and a non-null field at $3.
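
With ~120000 rows it may be worth checking that both stated properties really hold before relying on them. A minimal sketch in awk, wrapped in a shell function; the name check_rows is made up for illustration and does not come from this thread:

```shell
# check_rows: hypothetical helper, not from the thread.
# Prints every row that breaks one of the two stated properties:
#  - a non-empty field appears after an empty one (last field excluded), or
#  - the last field differs from the number of non-empty fields before it.
check_rows() {
  awk -F, '{
    filled = 0; gap = 0; bad = 0
    for (i = 1; i < NF; i++) {
      if ($i == "") gap = 1                  # first empty field seen
      else { filled++; if (gap) bad = 1 }    # non-empty after an empty one
    }
    if (bad || filled != $NF + 0) print "line " NR ": " $0
  }' "$@"
}

# The second row has a null at $2 and a value at $3, so it gets reported.
printf '%s\n' '1010,2000,3000,,3' '1010,,3000,,2' | check_rows
```

If the function prints nothing for the real file, the "read n, reverse the first n fields" idea should be safe to apply.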

Manav
0
 
manav_mathurAuthor Commented:
How do you propose to do so??

Manav
0
 
lynxlupodianCommented:
Search for commas at the beginning and delete them, then add them at the end.
0
 
manav_mathurAuthor Commented:
lynxlupodian,

my awk/sed is not that strong. Can you put down a sample code and maybe we can tune it??

Manav
0
 
lynxlupodianCommented:
Yeah, I'm trying. Mine isn't either, but there is some way to do it. Reading on sed atm.
0
 
lynxlupodianCommented:
I had to put a blank line at the top of temp (your input), since awk seems to garble the first line, and removed it from the output afterwards.

awk '{FS=",";print $4","$3","$2","$1","$5}' temp | tail -n +2 | sed -r -e 's/^,+.*,/&,,/g' -e 's/^,+//g'
This creates too many or too few commas, depending on how many you put in the replacement of the first sed expression. :/
You could solve that by checking that each output line has exactly 4 commas and deleting the extras as needed, but that complicates the matter further, which means even less efficiency.
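
The garbled first line is most likely because FS is assigned inside the main action: by the time that assignment runs, awk has already split the first record on whitespace. A small sketch, assuming a POSIX awk, that sets the separator with -F, so no blank line is needed; swap_fixed is a made-up name:

```shell
# swap_fixed: hypothetical name; the same fixed four-column swap as above,
# but with the separator set by -F, before any input is read.
swap_fixed() {
  awk -F, '{ print $4 "," $3 "," $2 "," $1 "," $5 }' "$@"
}

printf '%s\n' '1000,1010,2000,3000,4' '1010,2000,3000,,3' | swap_fixed
```

This only cures the first-line problem. Rows with fewer than four values still come out with the commas at the front, so the comma shuffling is still needed.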
0
 
manav_mathurAuthor Commented:
What does -r do? I don't think it is supported on my system.

Manav
0
 
manav_mathurAuthor Commented:
The logic is something like,

1) read the last column. let it be n.
2) reverse the array formed by the first n fields in the record. leave the rest of the fields in the record as they are.
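
These two steps can be sketched directly in awk, which should run the same under ksh. This is only a sketch under the assumption that the last field always holds a usable count; the function name reverse_first_n is invented for illustration:

```shell
# reverse_first_n: hypothetical helper name, not from the thread.
# Reads comma-separated records; the last field n says how many leading
# fields to reverse. All other fields stay where they are.
reverse_first_n() {
  awk 'BEGIN { FS = OFS = "," }
  {
    n = $NF + 0                      # step 1: count stored in the last field
    if (n > NF - 1) n = NF - 1       # guard: never touch the count field itself
    for (i = 1; i <= n / 2; i++) {   # step 2: swap pairs from both ends of 1..n
      tmp = $i; $i = $(n - i + 1); $(n - i + 1) = tmp
    }
    print
  }' "$@"
}

printf '%s\n' '1000,1010,2000,3000,4' '1010,2000,3000,,3' '2000,3000,,,2' | reverse_first_n
```

Because only fields 1..n are swapped in place, the empty fields never move and the number of commas per line is unchanged. It also does not depend on the number of columns.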

Manav
0
 
lynxlupodianCommented:
-r enables extended Regular Expressions for sed, so the + I used works.

Well, bash knows arrays, but I haven't used them yet. Look at this:
http://www.tldp.org/LDP/abs/html/arrays.html
... if you're using bash, of course. I don't know whether other shells support them, or how.
0
 
manav_mathurAuthor Commented:
I'm on ksh :(

Manav
0
 
lynxlupodianCommented:
0
 
manav_mathurAuthor Commented:
Tried here but no luck :/

Manav

0
 
lynxlupodianCommented:
Too bad, I'm at the end of my capabilities now. :/

In awk, you could test if the $vars are empty and make a big if/then test. Awk scripts aren't hard though, look at man awk for details. Here's a slow intro to (g)awk:
http://tille.soti.org/training/bash/ch06.html
0
 
ahoffmannCommented:
sed -e 's/\(.*\),,,/,,\1,/' -e 's/\(.*\),,/,\1,/' file|awk -F, '{printf"%s,%s,%s,%s,%s\n",$4,$3,$2,$1,$5}'

# tested with GNU sed and awk; the AT&T ones may behave slightly differently
0
 
gisellla_igrCommented:
My solution is general: it works no matter how many columns (fields) there are.
awk -v OFS="," '
{
  revLine=""
  commas=""
  numFields=split($0, arrLine, OFS)
  numNotNullFields=numFields-1
  for (i=numNotNullFields; i>=1;i--) {
     if (arrLine[i] == "") {
        commas=commas OFS
     }
     else {
        revLine=revLine arrLine[i]
        if (i!= 1) {
           revLine=revLine OFS
        }
     }
  }
  revLine = revLine commas
  sub(/$/,OFS arrLine[numFields],revLine)
  print revLine
  #print  numFields "=" arrLine[1] "-" arrLine[2] "-"arrLine[3] "-"arrLine[4] "-"arrLine[5]

}
' temp
0
 
manav_mathurAuthor Commented:
Thanks to all for your suggestions.

1) Is gisella's solution more efficient than the one I proposed in Perl? I mean, I'll go with Perl only if there is a big difference in efficiency (I have ~120000 rows). In all other cases, I'll go with a UNIX-based solution.

2) ahoffmann - thanks for your solution. Looks good to me. The only problem I see is when, for example, the number of fields changes. That's not in scope now though, so your solution will work.


I'll get back to you in 2 days with the solution worked out. I don't have access to my system.

And special thanks to lynxlupodian for his persistence.

Manav
0
 
lynxlupodianCommented:
you can see what's more efficient (faster in my interpretation) by running both with "time", if you have it, then comparing the results. See man time for details.

example:
--------------------------------------
navaden@lynxlynxsp psCVS $ time ls
a.out  cal3d/  planeshift/  temp

real    0m0.132s
user    0m0.002s
sys     0m0.004s
--------------------------------------
0
 
lynxlupodianCommented:
And don't worry about ahoffmann's solution - if you needed more fields, you would just add more of the similar sed expressions to match.
0
 
ozoCommented:
perl -aF, -ne '$b=1;@F[0..$c]=reverse@F[0..$c]if$c=(grep{$b&&=$_}@F[0..$#F-1])-1;print join",",@F'
0
 
manav_mathurAuthor Commented:
Ozo's solution leaves me mostly in awe.
Can you explain the main part

if$c=(grep{$b&&=$_}@F[0..$#F-1])-1

step by step please......

Manav
0
 
ozoCommented:
scalar(grep{$.&&=$_}@F[0..$#F-1])
counts the number of nonblank fields at the beginning of the line, stopping at the first false field, and not counting the last field.
($. will be set true on each line, so you don't need the $b=1;)
scalar(grep{$_}@F)
would count the total number of true fields
@F[0..$#F-1] omits the last field
$.&&=$_
starts true, and becomes false the first time $_ is false.
$c=(grep{$.&&=$_}@F[0..$#F-1])-1
subtracts 1 and assigns the value to $c
STATEMENT if EXPR
evaluates EXPR, and executes STATEMENT if EXPR is true

since '0' is false, the above could fail on
0,2010,,,2
To fix that, you could use either
$c=(grep{$.&&=/./}@F[0..$#F-1])-1
or
$c=(grep{$.&&=length}@F[0..$#F-1])-1
since the last field will end with a newline, you could also use
$c=(grep{$.&&=/.\z/}@F)-1
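
For anyone staying with awk, the same pitfall exists there: a bare if ($i) is also false for a field holding 0. A hedged sketch of the same count (non-empty fields before the first empty one, last field ignored) using a string comparison instead; count_leading is a made-up name:

```shell
# count_leading: hypothetical helper. For each row, print how many non-empty
# fields precede the first empty one, ignoring the last (count) field.
count_leading() {
  awk -F, '{
    c = 0
    # $i != "" is a string test, so a field holding 0 still counts as filled;
    # a bare if ($i) would treat it as false, like the first Perl version.
    for (i = 1; i < NF && $i != ""; i++) c++
    print c
  }' "$@"
}

printf '%s\n' '0,2010,,,2' '1010,2000,3000,,3' | count_leading
```

Note that this returns the count itself, so there is no "-1" as in the Perl expressions, which compute the last array index.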
0
 
ahoffmannCommented:
> ( I have ~120000 rows).
you'll get problems with sed on most *nix (if not Gnu sed).
go with awk or perl when dealing with huge data
0
 
ahoffmannCommented:
is ozo an alias for Larry?
perfect explanation :-))
0
 
ozoCommented:
No, Larry would not make as many mistakes as I do.
0
 
ozoCommented:
> 1) read the last column. let it be n.
> 2) reverse the array formed by the first n fields in the record. leave the rest of the fields in the record as they are.
Sorry, I missed that.
I thought it was the blank fields which delimited the portion of the record to be reversed.

This makes the problem easier:
 perl -aF, -ne '@F[0..$F[-1]-1]=reverse@F[0..$F[-1]-1];print join",",@F'
0
 
manav_mathurAuthor Commented:
Sorry for the points split, people, if any of you think it's unfair. Everybody chipped in and everybody's solution was correct.

2 days ago, I was doing jumping jacks all over town because I had learnt to write Perl one-liners (Ozo inspired), or sed/awk like ahoffmann and all the other brilliant minds here. But the simplicity of your solutions is amazing.

ahoffmann, sorry for no points, but I thought that you, being the page editor, didn't need any :)

thanks a lot and cheers,
Manav
0
 
manav_mathurAuthor Commented:
Ozo,

I believe the last comment you posted is the best possible solution.

Manav
0
 
ozoCommented:
well, if you're golfing, there's also

perl -aF, -pe '@F[0..$_]=reverse@F[0..$_]for$F[-1]-1;$_=join",",@F'
0
 
manav_mathurAuthor Commented:
Nice one, but I don't have the need to golf right now. Satisfied with your last solution.

The problem was, I didn't know the reverse function in Perl. If you look at my code (the very first comment), most of it is basic array reversal. :/


Manav
0
 
manav_mathurAuthor Commented:
Hey Ozo,

while on the topic of golfing:

Isn't $"=,;print @F better than print join",",@F ?? Of course we would have to take care that it doesn't affect other print statements. But as long as there is only one print in a single block (mostly the case), ........

Manav


0
 
manav_mathurAuthor Commented:
What does it take to become a PE/moderator here??

Manav
0
 
ozoCommented:
well,
$"=,;print @F
would have to be
$"=',';print"@F"
or
$,=",";print@F
which still doesn't beat
$_=join',',@F

But while I do try to simplify the program by eliminating unnecessary statements, unless I'm golfing I don't usually tend to bring in extra special variables for the sole purpose of reducing character count.
0
 
ahoffmannCommented:
>  What does it take to become a PE/moderator here??
you act as an expert, earn points (or not), then an admin might get informed about you and check your postings ... then probably decide to invite you as volunteer, moderator, etc. ... I'm not really sure, nor am I involved in such selections.
If you're interested, have a look at the Customer Support TA, lot of admins, mods, PEs post there ...
0
 
manav_mathurAuthor Commented:
Okay!!

But I guess my postings are not accurate enough to make me qualify for that.

Anyway, thanks ahoffmann.
0
 
manav_mathurAuthor Commented:
Right Ozo,

I was talking about the '-ne' solution. As you have posted a solution with -pe already, you win!!

Manav

0
