Solved

Read in data from 100,000+ files via command line

Posted on 2014-02-06
Last Modified: 2014-02-08
I had originally asked the following question:
http://www.experts-exchange.com/Programming/Languages/Scripting/Shell/Q_28329553.html

At first this seemed to work exactly the way I needed, but I just discovered that values containing spaces are not read in correctly.

Below is the last iteration of the code:
#!/bin/sh

echo `date`
find . -name "*.arf" | while read f; do
  newpath="$(basename $(dirname "$f"))"
#/$(basename $f)"
  cat "$f" | gawk -v p="$newpath" '{ 
    attname=substr($1,1,length($1)-1); nlist=nlist"`, `"attname;
    attvalue= substr($2,2,length($2)-2); vlist=vlist", '\''"attvalue"'\''";
  }
  END { 
    printf "insert into `mydatabase`.`archives` (`NEWPATH%s`) values ('\''%s'\''%s);\n", nlist, p, vlist;
  }' >> myinsertfile.sql
#| tee -a myinsertfile.sql
  cnt=$((cnt+1))
  [ $(($cnt%100)) -eq 0 ] && echo "File #$cnt: $f"
done

echo "Total Files: $cnt"

echo `date`

For the following .arf file:
FILEID: "TIF490336"
PATH: "/optical/incoming/TIF490336"
TYPE: "TIF"
SECLEV: "10"
STATID: ""
USRID: "admin"
REQDATE: "08/02/2012"
REQTIME: "09:02:32"
GENDATE: "08/03/2012"
GENTIME: "09:02:32"
PROGID: ""
GROUPID: "Check Stubs"
DESC: "August"

It produced the following SQL statement:
insert into `mydatabase`.`archives` (NEWPATH,FILEID,PATH,TYPE,SECLEV,STATID,USRID,REQDATE,REQTIME,GENDATE,GENTIME,PROGID,GROUPID,`DESC`) values ('TIF18','TIF490336','/optical/incoming/TIF490336','TIF','10','','admin','08/02/2012','09:02:32','08/03/2012','09:02:32','','Chec','August');

As a result, the GROUPID column contains a truncated value ('Chec' instead of 'Check Stubs'). How can the code above be adjusted to handle values with spaces as well?
Question by:bdhtechnology
2 Comments
 
Accepted Solution

by:
simon3270 earned 2000 total points
ID: 39841695
The problem is that gawk splits fields on whitespace by default, so a value such as "Check Stubs" ends up in two separate fields. If you tell it to split on : or " characters instead, you can change 4 lines of the code to:
  cat "$f" | gawk -F'[:"]' -v p="$newpath" '{ 
      nlist=nlist "`, `" $1;
      vlist=vlist ", '\''" $3 "'\''";
      }

The rest of the script stays the same.  This assumes that the format of the input file is consistent (in particular, that there is one colon immediately after the first name, and the value field is always enclosed by double quotes).
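One case where that assumption does not hold in the sample file: REQTIME and GENTIME contain colons inside the quoted value, so with -F'[:"]' the value is split again and $3 holds only the hours (09). The following is a minimal sketch of one possible workaround, not part of the accepted answer: it cuts each line at the first colon-space-quote sequence and strips the closing quote. parse_arf_line is a hypothetical helper name, the name|value output is only there to make the result visible, and plain awk is used for portability (gawk should accept the same program).

```shell
#!/bin/sh
# Sketch only: parse_arf_line is a hypothetical helper, not part of the original script.
# It prints "name|value" for each line of the form NAME: "value".
parse_arf_line() {
  awk '{
    sep = index($0, ": \"")          # position of the first colon-space-quote
    if (sep == 0) next               # skip lines that do not match the format
    name  = substr($0, 1, sep - 1)   # text before the colon
    value = substr($0, sep + 3)      # text after the opening quote
    sub(/"[ \t\r]*$/, "", value)     # drop the closing quote and trailing whitespace
    print name "|" value
  }'
}

# Values with colons and spaces come through whole.
printf '%s\n' 'REQTIME: "09:02:32"' 'GROUPID: "Check Stubs"' | parse_arf_line
```

In the original script, the same name and value variables could be appended to nlist and vlist in place of $1 and $3. Values that contain a single quote would still need escaping before they go into the SQL statement.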
Author Comment

by:bdhtechnology
ID: 39844053
Perfect, that's exactly what I needed!