Want to protect your cyber security and still get fast solutions? Ask a secure question today.Go Premium

  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 188
  • Last Modified:

Parsing Files

I have many large files that contain data like the below example.  I would like to get some input on the best way to parse these files so that each part of the data (item number, description, price, etc) can be separated and then packed into an excel file (where the item number would be in like cell A1 then the description in cell A2).

Just fishing for some ideas on what the most efficient way of doing this may be.

Example 1:
1  1  020-140  CS  26 X 36 REG BLK GARBAGE BAG  12.80  YY  12.80  13-E1-04   250/CASE  91/SK

Example 2:
1  1  375-590  PC  MA612 12"BLUE HD PLST.DUSTPAN  2.30  YY  2.30  18-C2-02  EACH (12PCS/CS MA612)

  • 4
  • 3
  • 2
  • +2
1 Solution
Let's say you have a fixed length file, and
field one, from char 1 to char2
field two , from char 3 to char5

so here is shell script

f1=`cat file.txt| cut -c1-2`
f2=`cat file.txt| cut -c3-5`
echo $f1,$f2

Why not just convert it to a CSV file and import it into Excel?

you can seperate each item based on spaces in to variables(if this is what you are looking for)
($item_no,$description, $price $something ... ) = split;

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Thinking about it, you can open the data as is into Excel (open as a txt file).  No conversions required.

Perhaps you're not telling us the full story.
TyrantAuthor Commented:
HamdyHassan: the spaces between fields is different.

Tintin: convert to CSV? text file is no good...need to I have each part of the data in its own cell

prady21: theres spaces in the description...which is also varried length

What I need to do is break each entry down to item #, description, price, package size, etc.  
Once I've got those individual values im going to be adding them into an excel spreadsheet into the matching cells.

Its 7:30am...sorry if i haven't explained it any clearer..try to once i get a couple of cups of coffee into me ;)

Thanks fors posting
"the spaces between fields is different" means it's not fixed length records? yes/no?

Do you have spaces inside the same field?
Like spaces in desc field for example?
If yes, let me know
If no, then we could use perl that split with one or more spaces

TyrantAuthor Commented:
no they're not fixed length records
and yes there is spaces in the description field
basically a script for this would look similar to the following ready-to-go script.
it will parse your file line by line and match it against
a regular expression that splits the fields.
the fields are written into a new file, stripped of leading/trailing spaces and delimited by a semicolon.

you can easily import this new file into excel or a darabase:

#!perl -w
use strict;

my $file = $ARGV[0] or die "No input-file specified!";
die "file <$file> not found!" unless -e $file;
die "<$file> is not a file!" unless -f $file;

my $ofile = $ARGV[1] or die "No output-file specified!";
die "<$ofile> is a directory!" if -d $ofile;
die "file <$ofile> already exists!" if -e $ofile;

my $line="";

open IN, "<$file" or die "Cannot open file <$file>!\n$!";
open OUT, ">$ofile" or die "Cannot open file <$ofile> for writing!\n$!";

while ($line=<IN>)
  # example: $line = "1  5  020-140  CS  26 X 36 REG BLK GARBAGE BAG  12.80  YY  12.80  13-E1-04   250/CASE  91/SK";

  my @values = ( $line =~ m%^(\d)\s+                        #will match the first field (1)
                            (\d)\s+                        #will match the second field (5)
                            (\d{3}-\d{3})\s+               #will match the third field (020-140)
                            (\w\w)\s+                      #will match the fourth field (CS)
                            (.+?)\s{2,}                    #will math everything up to the price
                            (\d+\.\d+)\s+YY\s+             #will match the price, >YY< is ignored because it is always the same?
                            (\d+\.\d+)\s+                  #will match the second price
                            ([\d\w]+-[\d\w]+-[\d\w]+)\s+   #will match 13-E1-04
                            (.+)$%xi);                    #will match the rest

  print "line skipped: $line" unless @values;
  print OUT join ";", @values, "\n"; #replace ; by \t for copy and paste to excel

close IN;
close OUT;

running this script on a file with the examples you provided, the following output is generated:

1;1;020-140;CS;26 X 36 REG BLK GARBAGE BAG;12.80;12.80;13-E1-04;250/CASE  91/SK;
1;1;375-590;PC;MA612 12"BLUE HD PLST.DUSTPAN;2.30;2.30;18-C2-02;EACH (12PCS/CS MA612);

please replace

print "line skipped: $line" unless @values;
 print OUT join ";", @values, "\n"; #replace ; by \t for copy and paste to excel


if (@values)
  { print OUT join ";", @values, "\n"; #replace ; by \t for copy and paste to excel }
  { print "line skipped: $line"; }
TyrantAuthor Commented:
no they're not fixed length records
and yes there is spaces in the description field
TyrantAuthor Commented:
Thanks for the input guy but i found another solution...i didn't relize that each field had a set amount of characters to use.  
so now i break the strings apart and directly input them into they're excel cells

Ive never dealt with a Maytech system, (which this particular script logs into, gets the price quote, and parses it into a spreadsheet on the local users box), so Im still learning about it....quite quickly as much to my dismay :/

now...next question...how should i distribute the points?
count the characters each expert typed and calculate the percentage ;-)


Featured Post

Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

  • 4
  • 3
  • 2
  • +2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now