Solved

Shell script needed to parse fields of text files, fastest way possible.

Posted on 2006-06-08
Last Modified: 2007-12-19
I have a bunch of text files from which I need to parse only a few items. For example:

A sample text file:
   X4066414  APPLES                    2.500      LB
1231      Acme Mart Co                                 36.1200      90.3000      90.3000
1414      Foliage Acres, Inc.                           37.9700      94.9250      94.9250

  Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100

I am checking hundreds of these .txt files. The only thing I need from them is the list
of companies and their IDs, in the following format, so I can import it into a database or spreadsheet.

1231;Acme Mart Co
1414;Foliage Acres, Inc.

Some of these files have 10, 20, or more entries like the above.

Some will also have more than two companies per entry, like:
 Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100
1414      Grover Brand, Inc.                           34.4400      94.7100      94.7100

There is a blank line between each of the groups of items (APPLES, ORANGES), as shown in the example above, if that helps to pull out just the main companies and their respective IDs.

Any help would be appreciated, thanks in advance!



Question by:cybrthug
LVL 84

Expert Comment

by:ozo
ID: 16867370
What marks the end of "Foliage Acres, Inc." that tells you that "34.4400" is not part of the company name?
How do you know that "APPLES" and "ORANGES" are not company names?

Author Comment

by:cybrthug
ID: 16867463
The four-digit ID is the only thing that will help in locating where to start.
From there you'd have to check how many four-digit IDs there are, then grab up to about 40 characters starting one space after each ID. Does that help?
The first line can always be skipped, as each file will start like this:
Z1064411  ORANGES                  2.750      LB

Always drop to the second line and count the company IDs until the next blank line, which marks the start of the next group (APPLES here). I will not need that group. As long as I can get the first group from each file, that is all that matters.
LVL 7

Accepted Solution

by:glassd (earned 125 total points)
ID: 16868442
Assuming the lines of interest always start with a digit, and always contain three value fields at the end:

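awk '/^[0-9]/{
  # $1 is the ID; the last three fields are the numeric values
  name = $2
  for (i = 3; i <= NF - 3; i++) {
    name = name " " $i   # rebuild the name without a trailing space
  }
  print $1 ";" name
}' <filename> | sort -u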
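
Since the asker only needs the first group from each file and has hundreds of files, the same idea can be sketched as a function that takes many files at once and stops reading a file's data at the first blank line after its company lines. This is only a sketch under the same assumptions as above (company lines start with a digit and end with exactly three value fields); the function name first_group_companies is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print "ID;Company" for the first group of each file.
# Assumes company lines start with a digit and end with three value fields,
# and that a blank line separates one group from the next.
first_group_companies() {
  awk '
    FNR == 1 { seen = 0; done = 0 }   # first line of a file: reset the flags
    done     { next }                 # first group of this file already read
    /^[0-9]/ {
      seen = 1
      name = $2
      for (i = 3; i <= NF - 3; i++) name = name " " $i
      print $1 ";" name
      next
    }
    seen && /^[ \t]*$/ { done = 1 }   # blank line after company lines ends the group
  ' "$@" | sort -u
}
```

It could be called as first_group_companies *.txt > companies.txt, where companies.txt is just an example output name. Running awk once over all files avoids starting a new process per file, which should help with speed.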

Author Comment

by:cybrthug
ID: 16875279
Excellent, thank you!

Question has a verified solution.
