Shell script needed to parse fields of text files, fastest way possible.

Posted on 2006-06-08
Last Modified: 2007-12-19
I have a bunch of text files from which I need to parse only a few items. For example:

A sample text file:
   X4066414  APPLES                    2.500      LB
1231      Acme Mart Co                                 36.1200      90.3000      90.3000
1414      Foliage Acres, Inc.                           37.9700      94.9250      94.9250

  Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100

I am checking hundreds of these .txt files. The only thing I need from them is the list
of companies and their IDs, in the following format, so I can import it into a database or spreadsheet:

1231;Acme Mart Co
1414;Foliage Acres, Inc.

Some of these files have 10, 20, or more entries like the above.

Some will also have more than two companies, like:
 Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100
1414      Grover Brand, Inc.                           34.4400      94.7100      94.7100

There is a blank line between each group of items (e.g. APPLES, ORANGES), as shown in the example above, if that helps to pull out just the companies and their respective IDs.

Any help would be appreciated, thanks in advance!



Question by:cybrthug

Expert Comment

by:ozo
ID: 16867370
What marks the end of "Foliage Acres, Inc." that tells you that "34.4400" is not part of the company name?
How do you know that "APPLES" and "ORANGES" are not company names?
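One possible answer to both questions, assuming the columns are always separated by at least two spaces and that company names never contain two spaces in a row: split each line on runs of two or more spaces, and keep only lines that start with a 4-digit ID (which skips header lines such as APPLES and ORANGES). A minimal POSIX awk sketch; the function name `company_ids` is hypothetical:

```shell
#!/bin/sh
# Sketch: treat a run of two or more spaces as the column separator,
# so single spaces inside a company name stay part of field $2.
# Assumption: names never contain two consecutive spaces.
company_ids() {
  # Lines starting with four digits and a space are company lines;
  # product header lines start with spaces or a letter and are ignored.
  awk -F '  +' '/^[0-9][0-9][0-9][0-9] / { print $1 ";" $2 }' "$@"
}
```

For example, `company_ids *.txt` would print lines such as `1231;Acme Mart Co`. With no file arguments it reads standard input.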

Author Comment

by:cybrthug
ID: 16867463
The 4-digit ID is the only thing that will help locate the region to start with.
From there you'd have to check how many 4-digit IDs there are and then take up to about 40 characters, starting one space after each ID. Does that help any?
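The fixed-width idea described above could be sketched as follows, assuming the ID always occupies columns 1-4 and the name starts within the 40 characters that follow. Because the numeric columns might still fall inside that window, the sketch also cuts the name at the first run of two or more spaces (another assumption). The function name `ids_fixed_width` is hypothetical:

```shell
#!/bin/sh
# Sketch of the fixed-width approach: find lines that start with a 4-digit ID,
# take up to 40 characters starting one space after the ID, then trim.
# Assumptions: the name starts inside that window, and any run of two or more
# spaces marks the end of the name.
ids_fixed_width() {
  awk '/^[0-9][0-9][0-9][0-9] / {
    name = substr($0, 6, 40)      # characters 6..45: one space after the ID
    sub(/^ +/, "", name)          # drop the padding before the name
    sub(/  .*$/, "", name)        # cut at the first run of two or more spaces
    sub(/ +$/, "", name)          # drop a single trailing space, if any
    print substr($0, 1, 4) ";" name
  }' "$@"
}
```

A name longer than roughly 35 characters would be truncated by the 40-character window, so the field-splitting approach may be safer if such names occur.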
The first line can always be skipped, as each file will start like this:
Z1064411  ORANGES                  2.750      LB

Always drop to the second line and count how many company IDs there are before hitting the next blank line, which starts the next group (e.g. APPLES) that I will not need. As long as I can get the first group from each file, that is all that matters.
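To keep only the first group of each file, as described above, awk's `FNR` (the line number within the current file) can detect each file's header line, and a flag can stop output at the first blank line. A sketch under the stated assumptions (header on line 1, two or more spaces between columns); `first_group_ids` is a hypothetical name:

```shell
#!/bin/sh
# Sketch: print ID;name only for the first group of each input file.
# Assumptions: line 1 of every file is the product header, the first
# blank line ends the first group, and 2+ spaces separate the columns.
first_group_ids() {
  awk -F '  +' '
    FNR == 1 { in_group = 1; next }             # new file: skip its header line
    !in_group { next }                          # already past the first group
    /^[[:space:]]*$/ { in_group = 0; next }     # blank line closes the group
    /^[0-9][0-9][0-9][0-9] / { print $1 ";" $2 }
  ' "$@"
}
```

Passing all files to a single call, e.g. `first_group_ids *.txt > companies.txt`, runs one awk process for the whole batch instead of starting a new process per file.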

Accepted Solution

by:glassd
ID: 16868442
Assuming the lines of interest always start with a digit and always end with three numeric value fields:

awk '/^[0-9]/{
  # $1 is the ID; the last three fields are the numeric values
  name = $2
  for (i = 3; i <= NF-3; i++)
    name = name " " $i
  print $1 ";" name
}' <filename> | sort -u

Author Comment

by:cybrthug
ID: 16875279
Excellent, thank you!