?
Solved

Read in data from 100,000+ files via command line

Posted on 2014-02-06
2
Medium Priority
?
313 Views
Last Modified: 2014-02-08
I had originally asked the following question:
http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_28329553.html

This at first seemed like it work exactly the way I needed, however I just discovered that lines with spaces were not correctly read in.

Below is the last iteration of the code:
#!/bin/sh

echo `date`
find . -name "*.arf" | while read f; do
  newpath="$(basename $(dirname "$f"))"
#/$(basename $f)"
  cat "$f" | gawk -v p="$newpath" '{ 
    attname=substr($1,1,length($1)-1); nlist=nlist"`, `"attname;
    attvalue= substr($2,2,length($2)-2); vlist=vlist", '\''"attvalue"'\''";
  }
  END { 
    printf "insert into `mydatabase`.`archives` (`NEWPATH%s`) values ('\''%s'\''%s);\n", nlist, p, vlist;
  }' >> myinsertfile.sql
#| tee -a myinsertfile.sql
  cnt=$((cnt+1))
  [ $(($cnt%100)) -eq 0 ] && echo "File #$cnt: $f"
done

echo "Total Files: $cnt"

echo `date`

Open in new window


For the following .arf file:
FILEID: "TIF490336"
PATH: "/optical/incoming/TIF490336"
TYPE: "TIF"
SECLEV: "10"
STATID: ""
USRID: "admin"
REQDATE: "08/02/2012"
REQTIME: "09:02:32"
GENDATE: "08/03/2012"
GENTIME: "09:02:32"
PROGID: ""
GROUPID: "Check Stubs"
DESC: "August"

Open in new window

It produced the following SQL statement:
insert into `mydatabase`.`archives` (NEWPATH,FILEID,PATH,TYPE,SECLEV,STATID,USRID,REQDATE,REQTIME,GENDATE,GENTIME,PROGID,GROUPID,`DESC`) values ('TIF18','TIF490336','/optical/incoming/TIF490336','TIF','10','','admin','08/02/2012','09:02:32','08/03/2012','09:02:32','','Chec','August');

Open in new window


Which resulted in the GROUPID column containing incorrect values.  How can the code above be adjusted to process spaces as well?
0
Comment
Question by:bdhtechnology
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
2 Comments
 
LVL 19

Accepted Solution

by:
simon3270 earned 2000 total points
ID: 39841695
The problem is that gawk is splitting fields on spaces (its default).  If you change it to split on : or " characters, you can change 4 lines of the code to:
  cat "$f" | gawk -F'[:"]' -v p="$newpath" '{ 
      nlist=nlist "`, `" $1;
      vlist=vlist ", '\''" $3 "'\''";
      }

Open in new window

The rest of the script stays the same.  This assumes that the format of the input file is consistent (in particular, that there is one colon immediately after the first name, and the value field is always enclosed by double quotes).
0
 
LVL 1

Author Comment

by:bdhtechnology
ID: 39844053
Perfect, that's exactly what I needed!
0

Featured Post

Veeam Task Manager for Hyper-V

Task Manager for Hyper-V provides critical information that allows you to monitor Hyper-V performance by displaying real-time views of CPU and memory at the individual VM-level, so you can quickly identify which VMs are using host resources.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Recently, an awarded photographer, Selina De Maeyer (http://www.selinademaeyer.com/), completed a photo shoot of a beautiful event (http://www.sintjacobantwerpen.be/verslag-en-fotoreportage-van-de-sacramentsprocessie-door-antwerpen#thumbnails) in An…
Google Drive is extremely cheap offsite storage, and it's even possible to get extra storage for free for two years.  You can use the free account 15GB, and if you have an Android device..when you install Google Drive for the first time it will give…
This demo shows you how to set up the containerized NetScaler CPX with NetScaler Management and Analytics System in a non-routable Mesos/Marathon environment for use with Micro-Services applications.
How to Install VMware Tools in Red Hat Enterprise Linux 6.4 (RHEL 6.4) Step-by-Step Tutorial
Suggested Courses

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question