Solved

Read in data from 100,000+ files via command line

Posted on 2014-02-06
Last Modified: 2014-02-08
I had originally asked the following question:
http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_28329553.html

At first this seemed to work exactly the way I needed; however, I just discovered that values containing spaces are not read in correctly.

Below is the last iteration of the code:
#!/bin/sh

date
cnt=0
# The loop and the final count are grouped in { } because a piped while loop
# may run in a subshell, where changes to $cnt would be lost after "done".
find . -name "*.arf" | {
  while IFS= read -r f; do
    # Name of the directory containing the file (quoted so paths with spaces survive)
    newpath="$(basename "$(dirname "$f")")"
    cat "$f" | gawk -v p="$newpath" '{
      attname=substr($1,1,length($1)-1); nlist=nlist"`, `"attname;
      attvalue= substr($2,2,length($2)-2); vlist=vlist", '\''"attvalue"'\''";
    }
    END {
      printf "insert into `mydatabase`.`archives` (`NEWPATH%s`) values ('\''%s'\''%s);\n", nlist, p, vlist;
    }' >> myinsertfile.sql
    cnt=$((cnt+1))
    [ $((cnt%100)) -eq 0 ] && echo "File #$cnt: $f"
  done
  echo "Total Files: $cnt"
}

date

For the following .arf file:
FILEID: "TIF490336"
PATH: "/optical/incoming/TIF490336"
TYPE: "TIF"
SECLEV: "10"
STATID: ""
USRID: "admin"
REQDATE: "08/02/2012"
REQTIME: "09:02:32"
GENDATE: "08/03/2012"
GENTIME: "09:02:32"
PROGID: ""
GROUPID: "Check Stubs"
DESC: "August"

It produced the following SQL statement:
insert into `mydatabase`.`archives` (NEWPATH,FILEID,PATH,TYPE,SECLEV,STATID,USRID,REQDATE,REQTIME,GENDATE,GENTIME,PROGID,GROUPID,`DESC`) values ('TIF18','TIF490336','/optical/incoming/TIF490336','TIF','10','','admin','08/02/2012','09:02:32','08/03/2012','09:02:32','','Chec','August');

This resulted in the GROUPID column containing a truncated value ('Chec' instead of 'Check Stubs').  How can the code above be adjusted to handle values that contain spaces as well?
Question by:bdhtechnology
2 Comments
 
Accepted Solution

by: simon3270 (earned 500 total points)
ID: 39841695
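The problem is that gawk splits fields on whitespace by default, so for GROUPID: "Check Stubs" the second field is only "Check, and substr() then strips its first and last characters, leaving Chec.  If you change the field separator so that gawk splits on : or " characters instead, you can replace 4 lines of the code with: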
  cat "$f" | gawk -F'[:"]' -v p="$newpath" '{ 
      nlist=nlist "`, `" $1;
      vlist=vlist ", '\''" $3 "'\''";
      }

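The rest of the script stays the same.  This assumes that the format of the input file is consistent (in particular, that the only colon on a line is the one immediately after the field name, and that the value is always enclosed in double quotes).  Note that a value which itself contains a colon, such as REQTIME: "09:02:32", would be cut off at that colon with this separator.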
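
One way to relax the single-colon assumption is to split on the double quote alone, so that a value such as "09:02:32" stays intact. The following is only a minimal sketch, assuming every line holds exactly one quoted value with no embedded double or single quotes. The function name arf_to_insert is hypothetical and not part of the original script, and plain awk is used so the sketch runs without gawk (gawk accepts the same program).

```shell
#!/bin/sh
# Sketch: build one INSERT statement from .arf-style lines read on stdin.
# arf_to_insert is a hypothetical helper name used for illustration only.
arf_to_insert() {
  # $1 = value for the NEWPATH column
  awk -F'"' -v p="$1" '
    NF >= 3 {
      # With " as the separator, $1 is the name plus ": " and $2 is the value.
      name = $1
      sub(/: *$/, "", name)            # drop the trailing colon and spaces
      nlist = nlist "`, `" name
      vlist = vlist ", '\''" $2 "'\''"
    }
    END {
      printf "insert into `mydatabase`.`archives` (`NEWPATH%s`) values ('\''%s'\''%s);\n", nlist, p, vlist
    }'
}

# Example: a value with a colon and a value with a space
printf '%s\n' 'REQTIME: "09:02:32"' 'GROUPID: "Check Stubs"' | arf_to_insert TIF18
```

Inside the loop this could be called as arf_to_insert "$newpath" < "$f" >> myinsertfile.sql. Values containing quote characters would still need escaping before the statement is safe to run.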

Author Comment

by:bdhtechnology
ID: 39844053
Perfect, that's exactly what I needed!