Solved

Read in data from 100,000+ files via command line

Posted on 2014-02-06
2
304 Views
Last Modified: 2014-02-08
I had originally asked the following question:
http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_28329553.html

This at first seemed like it work exactly the way I needed, however I just discovered that lines with spaces were not correctly read in.

Below is the last iteration of the code:
#!/bin/sh

echo `date`
find . -name "*.arf" | while read f; do
  newpath="$(basename $(dirname "$f"))"
#/$(basename $f)"
  cat "$f" | gawk -v p="$newpath" '{ 
    attname=substr($1,1,length($1)-1); nlist=nlist"`, `"attname;
    attvalue= substr($2,2,length($2)-2); vlist=vlist", '\''"attvalue"'\''";
  }
  END { 
    printf "insert into `mydatabase`.`archives` (`NEWPATH%s`) values ('\''%s'\''%s);\n", nlist, p, vlist;
  }' >> myinsertfile.sql
#| tee -a myinsertfile.sql
  cnt=$((cnt+1))
  [ $(($cnt%100)) -eq 0 ] && echo "File #$cnt: $f"
done

echo "Total Files: $cnt"

echo `date`

Open in new window


For the following .arf file:
FILEID: "TIF490336"
PATH: "/optical/incoming/TIF490336"
TYPE: "TIF"
SECLEV: "10"
STATID: ""
USRID: "admin"
REQDATE: "08/02/2012"
REQTIME: "09:02:32"
GENDATE: "08/03/2012"
GENTIME: "09:02:32"
PROGID: ""
GROUPID: "Check Stubs"
DESC: "August"

Open in new window

It produced the following SQL statement:
insert into `mydatabase`.`archives` (NEWPATH,FILEID,PATH,TYPE,SECLEV,STATID,USRID,REQDATE,REQTIME,GENDATE,GENTIME,PROGID,GROUPID,`DESC`) values ('TIF18','TIF490336','/optical/incoming/TIF490336','TIF','10','','admin','08/02/2012','09:02:32','08/03/2012','09:02:32','','Chec','August');

Open in new window


Which resulted in the GROUPID column containing incorrect values.  How can the code above be adjusted to process spaces as well?
0
Comment
Question by:bdhtechnology
2 Comments
 
LVL 19

Accepted Solution

by:
simon3270 earned 500 total points
Comment Utility
The problem is that gawk is splitting fields on spaces (its default).  If you change it to split on : or " characters, you can change 4 lines of the code to:
  cat "$f" | gawk -F'[:"]' -v p="$newpath" '{ 
      nlist=nlist "`, `" $1;
      vlist=vlist ", '\''" $3 "'\''";
      }

Open in new window

The rest of the script stays the same.  This assumes that the format of the input file is consistent (in particular, that there is one colon immediately after the first name, and the value field is always enclosed by double quotes).
0
 
LVL 1

Author Comment

by:bdhtechnology
Comment Utility
Perfect, that's exactly what I needed!
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

Suggested Solutions

PL/SQL can be a very powerful tool for working directly with database tables. Being able to loop will allow you to perform more complex operations, but can be a little tricky to write correctly. This article will provide examples of basic loops alon…
I. Introduction There's an interesting discussion going on now in an Experts Exchange Group — Attachments with no extension (http://www.experts-exchange.com/discussions/210281/Attachments-with-no-extension.html). This reminded me of questions tha…
Learn how to get help with Linux/Unix bash shell commands. Use help to read help documents for built in bash shell commands.: Use man to interface with the online reference manuals for shell commands.: Use man to search man pages for unknown command…
Learn how to find files with the shell using the find and locate commands. Use locate to find a needle in a haystack.: With locate, check if the file still exists.: Use find to get the actual location of the file.:

763 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

11 Experts available now in Live!

Get 1:1 Help Now