
Shell script needed to parse fields of text files, fastest way possible.

Posted on 2006-06-08
Last Modified: 2007-12-19
I have a bunch of text files that I need to parse only a few items from. For example:

A sample text file:
   X4066414  APPLES                    2.500      LB
1231      Acme Mart Co                                 36.1200      90.3000      90.3000
1414      Foliage Acres, Inc.                           37.9700      94.9250      94.9250

  Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100

I am checking hundreds of these .txt files. The only thing I need from them is the list
of companies and their IDs, in the following format, so I can import them into a database or spreadsheet:

1231;Acme Mart Co
1414;Foliage Acres, Inc.

Some of these files have 10, 20, or more entries per .txt file like the above.

Some will also have more than 2 companies, like:
 Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100
1414      Grover Brand, Inc.                           34.4400      94.7100      94.7100

There is a blank line between the groups of items (i.e., APPLES, ORANGES), as shown in the example above, if that helps to pull out just the main companies and their respective IDs.

Any help would be appreciated, thanks in advance!
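Since the groups are separated by blank lines, awk's "paragraph mode" can use that structure directly: setting RS to the empty string makes each blank-line-separated group one record. The sketch below prints only the company lines of the first group. The function name `first_group_companies` is hypothetical, and it assumes the separator lines are truly empty (no stray spaces or tabs).

```shell
#!/bin/sh
# Hypothetical helper: print the company lines of the FIRST group only.
# With RS set to the empty string, awk treats each run of lines between
# blank lines as one record, so record 1 is the first group: the product
# header line followed by its company lines.
# Assumes the separator lines are truly empty (no spaces or tabs).
first_group_companies() {
  awk 'BEGIN { RS = "" }
       NR == 1 {
         n = split($0, lines, "\n")               # one element per physical line
         for (i = 2; i <= n; i++) print lines[i]  # skip lines[1], the product header
         exit                                     # later groups are not needed
       }' "$1"
}
```

For example, `first_group_companies sample.txt` would print the `1231 ...` and `1414 ...` lines of the first group unchanged, ready for further splitting into ID and name.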



Question by:cybrthug
4 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 16867370
What marks the end of "Foliage Acres, Inc." that tells you that "34.4400" is not part of the company name?
How do you know that "APPLES" and "ORANGES" are not company names?
 

Author Comment

by:cybrthug
ID: 16867463
The 4-digit ID is the only thing that will help locate where to start.
From there you'd have to check how many 4-digit IDs there are and then take up to about 40 characters, starting one space after each ID. Does that help?
The first line can always be skipped, as each file will start like this:
Z1064411  ORANGES                  2.750      LB

Always drop to the second line and count how many company IDs there are before the next blank line, which starts the next group (e.g. APPLES) that I will not need. As long as I can get the first group from each file, that is all that matters.
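The approach described above (skip the header, read the 4-digit-ID lines until the first blank line, ignore later groups) can be sketched in awk as follows. Rather than cutting a fixed 40 characters, this sketch assumes every company line ends in exactly three numeric columns and drops them. The function name `first_group_ids` is hypothetical; FNR (awk's per-file line number) lets one invocation handle many files.

```shell
#!/bin/sh
# Hypothetical helper: print ID;Company for the first group of each file.
# Assumes line 1 of every file is the product header and every company
# line ends in exactly three numeric columns.
first_group_ids() {
  awk '
    FNR == 1 { done = 0; next }          # line 1 of each file: product header
    done { next }                        # group 1 already finished in this file
    /^[ \t]*$/ { done = 1; next }        # first blank line closes group 1
    /^[0-9][0-9][0-9][0-9][ \t]/ {       # company line: 4-digit ID, then whitespace
      name = $2
      for (i = 3; i <= NF - 3; i++)      # rejoin the name, dropping the 3 numbers
        name = name " " $i
      print $1 ";" name
    }
  ' "$@"
}
```

Usage would be something like `first_group_ids *.txt > companies.txt`. Note that rebuilding the name from fields collapses runs of spaces inside a name into single spaces.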
 
LVL 7

Accepted Solution

by:glassd (earned 125 total points)
ID: 16868442
Assuming the lines of interest always start with a digit and always end with three numeric value fields:

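awk '/^[0-9]/ {
  # $1 is the company ID; the last three fields are the numeric values
  name = $2
  for (i = 3; i <= NF - 3; i++) {
    # rejoin the words of the name without leaving a trailing space
    name = name " " $i
  }
  print $1 ";" name
}' <filename> | sort -u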
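A possible variant of the accepted script for hundreds of files at once: instead of rebuilding the name from fields (which turns any run of spaces inside a name into a single space), this sketch deletes the leading ID and the three trailing numeric columns from the raw line with awk's sub(), so the name is kept as typed. Like the accepted answer, it reads every group, not only the first. The function name `all_companies` is hypothetical, and the three-trailing-numbers layout is an assumption taken from the sample data.

```shell
#!/bin/sh
# Hypothetical helper: ID;Company for every company line in every file,
# de-duplicated. Assumes company lines start with a digit and end with
# exactly three numeric columns.
all_companies() {
  awk '
    /^[0-9]/ {
      id = $1
      line = $0
      # strip the three trailing numeric columns and the whitespace before them
      sub(/[ \t]+[0-9.]+[ \t]+[0-9.]+[ \t]+[0-9.]+[ \t]*$/, "", line)
      # strip the leading ID and the padding after it
      sub(/^[0-9]+[ \t]+/, "", line)
      print id ";" line
    }
  ' "$@" | sort -u
}
```

Usage would be something like `all_companies *.txt > companies.txt`; passing all files to one awk process avoids starting a new process per file.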
 

Author Comment

by:cybrthug
ID: 16875279
Excellent, thank you!