Solved

Shell script needed to parse fields of text files, fastest way possible.

Posted on 2006-06-08
4
1,644 Views
Last Modified: 2007-12-19
I have a bunch of text files i need to parse only a few items from, for example:

A sample text file:
   X4066414  APPLES                    2.500      LB
1231      Acme Mart Co                                 36.1200      90.3000      90.3000
1414      Foliage Acres, Inc.                           37.9700      94.9250      94.9250

  Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100

I am checking hundreds of these .txt files, the only thing i need from them is the list
of companies and their ID in the following format so I can import into a database or spreadsheet.

1231;Acme Mart Co
1414;Foliage Acres, Inc.

Some of these files have 10, 20, or more entries per .txt file like the above.

Some will also have more then 2 companies like:
 Z1064411  ORANGES                  2.750      LB
1231      Acme Mart Co                                 33.2300      91.3825      91.3825
1414      Foliage Acres, Inc.                           34.4400      94.7100      94.7100
1414      Grover Brand, Inc.                           34.4400      94.7100      94.7100

There is a space between each of the groups of items, ie (APPLES, ORANGES) as shown in the above example if that helps to just pull the main companies and their respective ID's out.

Any help would be appreciated, thanks in advance!



0
Comment
Question by:cybrthug
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 16867370
What marks the end of "Foliage Acres, Inc." that tells you that "34.4400" is not part of the company name?
How do you know that "APPLES" and "ORANGES" are not company names?
0
 

Author Comment

by:cybrthug
ID: 16867463
The 4 Digit ID is the only thing that will help in locating the region to start with.
From there you'd have to check how many 4 digit ID's there are and then rip maybe up to 40 characters one space after it. Does that help any?
The first line can always be skipped as each file will start like this:
Z1064411  ORANGES                  2.750      LB

drop to the second line always and start to count how many company ID's there are maybe before it hits the next blank line to start the next group for APPLES, which I will not need. As long as I can get the first group from each file that is all that is important.
0
 
LVL 7

Accepted Solution

by:
glassd earned 125 total points
ID: 16868442
Assuming the lines of interest always start with a digit, and always contain four value fields at the end:

awk '/^[0-9]/{
  printf("%s;",$1)
  for(i=2;i<=(NF-3);i++) {
    printf("%s ",$i)
  }
  print ""
}' <filename> | sort -u
0
 

Author Comment

by:cybrthug
ID: 16875279
Excellent, thank you!
0

Featured Post

On Demand Webinar: Networking for the Cloud Era

Did you know SD-WANs can improve network connectivity? Check out this webinar to learn how an SD-WAN simplified, one-click tool can help you migrate and manage data in the cloud.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Let's say you need to move the data of a file system from one partition to another. This generally involves dismounting the file system, backing it up to tapes, and restoring it to a new partition. You may also copy the file system from one place to…
I have been running these systems for a few years now and I am just very happy with them.   I just wanted to share the manual that I have created for upgrades and other things.  Oooh yes! FreeBSD makes me happy (as a server), no maintenance and I al…
Learn several ways to interact with files and get file information from the bash shell. ls lists the contents of a directory: Using the -a flag displays hidden files: Using the -l flag formats the output in a long list: The file command gives us mor…
Learn how to find files with the shell using the find and locate commands. Use locate to find a needle in a haystack.: With locate, check if the file still exists.: Use find to get the actual location of the file.:
Suggested Courses

622 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question