?
Solved

Parsing Files

Posted on 2003-03-24
12
Medium Priority
?
187 Views
Last Modified: 2010-03-05
I have many large files that contain data like the below example.  I would like to get some input on the best way to parse these files so that each part of the data (item number, description, price, etc) can be separated and then packed into an excel file (where the item number would be in like cell A1 then the description in cell A2).

Just fishing for some ideas on what the most efficient way of doing this may be.

Example 1:
1  1  020-140  CS  26 X 36 REG BLK GARBAGE BAG  12.80  YY  12.80  13-E1-04   250/CASE  91/SK

Example 2:
1  1  375-590  PC  MA612 12"BLUE HD PLST.DUSTPAN  2.30  YY  2.30  18-C2-02  EACH (12PCS/CS MA612)


Thanks
0
Comment
Question by:Tyrant
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +2
12 Comments
 
LVL 9

Expert Comment

by:HamdyHassan
ID: 8199661
Let's say you have a fixed length file, and
field one, from char 1 to char2
field two , from char 3 to char5
....

so here is shell script


f1=`cat file.txt| cut -c1-2`
f2=`cat file.txt| cut -c3-5`
echo $f1,$f2




0
 
LVL 48

Expert Comment

by:Tintin
ID: 8199785
Why not just convert it to a CSV file and import it into Excel?

0
 
LVL 3

Expert Comment

by:prady_21
ID: 8199796
you can seperate each item based on spaces in to variables(if this is what you are looking for)
($item_no,$description, $price $something ... ) = split;

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
LVL 48

Expert Comment

by:Tintin
ID: 8199920
Thinking about it, you can open the data as is into Excel (open as a txt file).  No conversions required.

Perhaps you're not telling us the full story.
0
 

Author Comment

by:Tyrant
ID: 8202319
HamdyHassan: the spaces between fields is different.

Tintin: convert to CSV? text file is no good...need to I have each part of the data in its own cell

prady21: theres spaces in the description...which is also varried length

What I need to do is break each entry down to item #, description, price, package size, etc.  
Once I've got those individual values im going to be adding them into an excel spreadsheet into the matching cells.

Its 7:30am...sorry if i haven't explained it any clearer..try to once i get a couple of cups of coffee into me ;)

Thanks fors posting
0
 
LVL 9

Expert Comment

by:HamdyHassan
ID: 8203641
"the spaces between fields is different" means it's not fixed length records? yes/no?




Do you have spaces inside the same field?
Like spaces in desc field for example?
If yes, let me know
If no, then we could use perl that split with one or more spaces


0
 

Author Comment

by:Tyrant
ID: 8204896
no they're not fixed length records
and yes there is spaces in the description field
0
 
LVL 6

Expert Comment

by:holli
ID: 8205806
basically a script for this would look similar to the following ready-to-go script.
it will parse your file line by line and match it against
a regular expression that splits the fields.
the fields are written into a new file, stripped of leading/trailing spaces and delimited by a semicolon.

you can easily import this new file into excel or a darabase:

#!perl -w
use strict;

my $file = $ARGV[0] or die "No input-file specified!";
die "file <$file> not found!" unless -e $file;
die "<$file> is not a file!" unless -f $file;

my $ofile = $ARGV[1] or die "No output-file specified!";
die "<$ofile> is a directory!" if -d $ofile;
die "file <$ofile> already exists!" if -e $ofile;

my $line="";

open IN, "<$file" or die "Cannot open file <$file>!\n$!";
open OUT, ">$ofile" or die "Cannot open file <$ofile> for writing!\n$!";

while ($line=<IN>)
{
  # example: $line = "1  5  020-140  CS  26 X 36 REG BLK GARBAGE BAG  12.80  YY  12.80  13-E1-04   250/CASE  91/SK";

  my @values = ( $line =~ m%^(\d)\s+                        #will match the first field (1)
                            (\d)\s+                        #will match the second field (5)
                            (\d{3}-\d{3})\s+               #will match the third field (020-140)
                            (\w\w)\s+                      #will match the fourth field (CS)
                            (.+?)\s{2,}                    #will math everything up to the price
                            (\d+\.\d+)\s+YY\s+             #will match the price, >YY< is ignored because it is always the same?
                            (\d+\.\d+)\s+                  #will match the second price
                            ([\d\w]+-[\d\w]+-[\d\w]+)\s+   #will match 13-E1-04
                            (.+)$%xi);                    #will match the rest

  print "line skipped: $line" unless @values;
  print OUT join ";", @values, "\n"; #replace ; by \t for copy and paste to excel
}

close IN;
close OUT;


running this script on a file with the examples you provided, the following output is generated:

1;1;020-140;CS;26 X 36 REG BLK GARBAGE BAG;12.80;12.80;13-E1-04;250/CASE  91/SK;
1;1;375-590;PC;MA612 12"BLUE HD PLST.DUSTPAN;2.30;2.30;18-C2-02;EACH (12PCS/CS MA612);



holli
0
 
LVL 6

Expert Comment

by:holli
ID: 8205827
please replace

print "line skipped: $line" unless @values;
 print OUT join ";", @values, "\n"; #replace ; by \t for copy and paste to excel


by

if (@values)
  { print OUT join ";", @values, "\n"; #replace ; by \t for copy and paste to excel }
else
  { print "line skipped: $line"; }
 
0
 

Author Comment

by:Tyrant
ID: 8206138
no they're not fixed length records
and yes there is spaces in the description field
0
 

Author Comment

by:Tyrant
ID: 8207919
Thanks for the input guy but i found another solution...i didn't relize that each field had a set amount of characters to use.  
so now i break the strings apart and directly input them into they're excel cells

Ive never dealt with a Maytech system, (which this particular script logs into, gets the price quote, and parses it into a spreadsheet on the local users box), so Im still learning about it....quite quickly as much to my dismay :/

now...next question...how should i distribute the points?
;)
0
 
LVL 6

Accepted Solution

by:
holli earned 500 total points
ID: 8211804
count the characters each expert typed and calculate the percentage ;-)

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
0

Featured Post

Enroll in August's Course of the Month

August's CompTIA IT Fundamentals course includes 19 hours of basic computer principle modules and prepares you for the certification exam. It's free for Premium Members, Team Accounts, and Qualified Experts!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Email validation in proper way is  very important validation required in any web pages. This code is self explainable except that Regular Expression which I used for pattern matching. I originally published as a thread on my website : http://www…
I have been pestered over the years to produce and distribute regular data extracts, and often the request have explicitly requested the data be emailed as an Excel attachement; specifically Excel, as it appears: CSV files confuse (no Red or Green h…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

777 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question