Solved

Parse a Flat file into a MySQL DB

Posted on 2001-08-15
8
537 Views
Last Modified: 2012-05-04
Hello people.
I am a complete novice when it comes to perl. I am in a hard spot right now and need a script. I don't have the time to learn the new language or else I would. Can someone help me or write for me a script that will take a Flat File (Pipe line delimited) and parse it into an existing MySQL DB, It needs to run from a shell script.
Also need it to e-mail a message to someone if there is an error.
I will give 660 points for a complete script. Thank you.
(You can look at my previous questions and see that I am good for the points)
0
Comment
Question by:Deathead
  • 4
  • 2
  • 2
8 Comments
 
LVL 6

Accepted Solution

by:
christopher sagayam earned 300 total points
ID: 6388526
you really dont need a script at all if you have access to the command prompt


show me your text file and I will send you an exact statement

or else read the docs below

7.19 LOAD DATA INFILE syntax
LOAD DATA [LOW_PRIORITY] [LOCAL] INFILE 'file_name.txt' [REPLACE | IGNORE]
    INTO TABLE tbl_name
    [FIELDS
        [TERMINATED BY '\t']
        [OPTIONALLY] ENCLOSED BY '']
        [ESCAPED BY '\\' ]]
    [LINES TERMINATED BY '\n']
    [IGNORE number LINES]
    [(col_name,...)]

The LOAD DATA INFILE statement reads rows from a text file into a table at a very high speed. If the LOCAL keyword is specified, the file is read from the client host. If LOCAL is not specified, the file must be located on the server. (LOCAL is available in MySQL 3.22.6 or later.)

For security reasons, when reading text files located on the server, the files must either reside in the database directory or be readable by all. Also, to use LOAD DATA INFILE on server files, you must have the file privilege on the server host. See section 6.7 Privileges provided by MySQL.

If you specify the keyword LOW_PRIORITY, execution of the LOAD DATA statement is delayed until no other clients are reading from the table.

Using LOCAL will be a bit slower than letting the server access the files directly, because the contents of the file must travel from the client host to the server host. On the other hand, you do not need the file privilege to load local files.

You can also load data files by using the mysqlimport utility; it operates by sending a LOAD DATA INFILE command to the server. The --local option causes mysqlimport to read data files from the client host. You can specify the --compress option to get better performance over slow networks if the client and server support the compressed protocol.

When locating files on the server host, the server uses the following rules:

If an absolute pathname is given, the server uses the pathname as is.
If a relative pathname with one or more leading components is given, the server searches for the file relative to the server's data directory.
If a filename with no leading components is given, the server looks for the file in the database directory of the current database.
Note that these rules mean a file given as `./myfile.txt' is read from the server's data directory, whereas a file given as `myfile.txt' is read from the database directory of the current database. For example, the following LOAD DATA statement reads the file `data.txt' from the database directory for db1 because db1 is the current database, even though the statement explicitly loads the file into a table in the db2 database:

mysql> USE db1;
mysql> LOAD DATA INFILE "data.txt" INTO TABLE db2.my_table;

The REPLACE and IGNORE keywords control handling of input records that duplicate existing records on unique key values. If you specify REPLACE, new rows replace existing rows that have the same unique key value. If you specify IGNORE, input rows that duplicate an existing row on a unique key value are skipped. If you don't specify either option, an error occurs when a duplicate key value is found, and the rest of the text file is ignored.

If you load data from a local file using the LOCAL keyword, the server has no way to stop transmission of the file in the middle of the operation, so the default bahavior is the same as if IGNORE is specified.

LOAD DATA INFILE is the complement of SELECT ... INTO OUTFILE. See section 7.15 SELECT syntax. To write data from a database to a file, use SELECT ... INTO OUTFILE. To read the file back into the database, use LOAD DATA INFILE. The syntax of the FIELDS and LINES clauses is the same for both commands. Both clauses are optional, but FIELDS must precede LINES if both are specified.

If you specify a FIELDS clause, each of its subclauses (TERMINATED BY, [OPTIONALLY] ENCLOSED BY and ESCAPED BY) is also optional, except that you must specify at least one of them.

If you don't specify a FIELDS clause, the defaults are the same as if you had written this:

FIELDS TERMINATED BY '\t' ENCLOSED BY '' ESCAPED BY '\\'

If you don't specify a LINES clause, the default is the same as if you had written this:

LINES TERMINATED BY '\n'

In other words, the defaults cause LOAD DATA INFILE to act as follows when reading input:

Look for line boundaries at newlines
Break lines into fields at tabs
Do not expect fields to be enclosed within any quoting characters
Interpret occurrences of tab, newline or `\' preceded by `\' as literal characters that are part of field values
Conversely, the defaults cause SELECT ... INTO OUTFILE to act as follows when writing output:

Write tabs between fields
Do not enclose fields within any quoting characters
Use `\' to escape instances of tab, newline or `\' that occur within field values
Write newlines at the ends of lines
Note that to write FIELDS ESCAPED BY '\\', you must specify two backslashes for the value to be read as a single backslash.

The IGNORE number LINES option can be used to ignore a header of column names at the start of the file:

mysql> LOAD DATA INFILE "/tmp/file_name" into table test IGNORE 1 LINES;

When you use SELECT ... INTO OUTFILE in tandem with LOAD DATA INFILE to write data from a database into a file and then read the file back into the database later, the field and line handling options for both commands must match. Otherwise, LOAD DATA INFILE will not interpret the contents of the file properly. Suppose you use SELECT ... INTO OUTFILE to write a file with fields delimited by commas:

mysql> SELECT * INTO OUTFILE 'data.txt'
           FIELDS TERMINATED BY ','
           FROM ...

To read the comma-delimited file back in, the correct statement would be:

mysql> LOAD DATA INFILE 'data.txt' INTO TABLE table2
           FIELDS TERMINATED BY ',';

If instead you tried to read in the file with the statement shown below, it wouldn't work because it instructs LOAD DATA INFILE to look for tabs between fields:

mysql> LOAD DATA INFILE 'data.txt' INTO TABLE table2
           FIELDS TERMINATED BY '\t';

The likely result is that each input line would be interpreted as a single field.

LOAD DATA INFILE can be used to read files obtained from external sources, too. For example, a file in dBASE format will have fields separated by commas and enclosed in double quotes. If lines in the file are terminated by newlines, the command shown below illustrates the field and line handling options you would use to load the file:

mysql> LOAD DATA INFILE 'data.txt' INTO TABLE tbl_name
           FIELDS TERMINATED BY ',' ENCLOSED BY '"'
           LINES TERMINATED BY '\n';

Any of the field or line handling options may specify an empty string (''). If not empty, the FIELDS [OPTIONALLY] ENCLOSED BY and FIELDS ESCAPED BY values must be a single character. The FIELDS TERMINATED BY and LINES TERMINATED BY values may be more than one character. For example, to write lines that are terminated by carriage return-linefeed pairs, or to read a file containing such lines, specify a LINES TERMINATED BY '\r\n' clause.

FIELDS [OPTIONALLY] ENCLOSED BY controls quoting of fields. For output (SELECT ... INTO OUTFILE), if you omit the word OPTIONALLY, all fields are enclosed by the ENCLOSED BY character. An example of such output (using a comma as the field delimiter) is shown below:

"1","a string","100.20"
"2","a string containing a , comma","102.20"
"3","a string containing a \" quote","102.20"
"4","a string containing a \", quote and comma","102.20"

If you specify OPTIONALLY, the ENCLOSED BY character is used only to enclose CHAR and VARCHAR fields:

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a \" quote",102.20
4,"a string containing a \", quote and comma",102.20

Note that occurrences of the ENCLOSED BY character within a field value are escaped by prefixing them with the ESCAPED BY character. Also note that if you specify an empty ESCAPED BY value, it is possible to generate output that cannot be read properly by LOAD DATA INFILE. For example, the output just shown above would appear as shown below if the escape character is empty. Observe that the second field in the fourth line contains a comma following the quote, which (erroneously) appears to terminate the field:

1,"a string",100.20
2,"a string containing a , comma",102.20
3,"a string containing a " quote",102.20
4,"a string containing a ", quote and comma",102.20

For input, the ENCLOSED BY character, if present, is stripped from the ends of field values. (This is true whether or not OPTIONALLY is specified; OPTIONALLY has no effect on input interpretation.) Occurrences of the ENCLOSED BY character preceded by the ESCAPED BY character are interpreted as part of the current field value. In addition, duplicated ENCLOSED BY characters occurring within fields are interpreted as single ENCLOSED BY characters if the field itself starts with that character. For example, if ENCLOSED BY '"' is specified, quotes are handled as shown below:

"The ""BIG"" boss"  -> The "BIG" boss
The "BIG" boss      -> The "BIG" boss
The ""BIG"" boss    -> The ""BIG"" boss

FIELDS ESCAPED BY controls how to write or read special characters. If the FIELDS ESCAPED BY character is not empty, it is used to prefix the following characters on output:

The FIELDS ESCAPED BY character
The FIELDS [OPTIONALLY] ENCLOSED BY character
The first character of the FIELDS TERMINATED BY and LINES TERMINATED BY values
ASCII 0 (what is actually written following the escape character is ASCII '0', not a zero-valued byte)
If the FIELDS ESCAPED BY character is empty, no characters are escaped. It is probably not a good idea to specify an empty escape character, particularly if field values in your data contain any of the characters in the list just given.

For input, if the FIELDS ESCAPED BY character is not empty, occurrences of that character are stripped and the following character is taken literally as part of a field value. The exceptions are an escaped `0' or `N' (e.g., \0 or \N if the escape character is `\'). These sequences are interpreted as ASCII 0 (a zero-valued byte) and NULL. See below for the rules on NULL handling.

For more information about `\'-escape syntax, see section 7.1 Literals: how to write strings and numbers.

In certain cases, field and line handling options interact:

If LINES TERMINATED BY is an empty string and FIELDS TERMINATED BY is non-empty, lines are also terminated with FIELDS TERMINATED BY.
If the FIELDS TERMINATED BY and FIELDS ENCLOSED BY values are both empty (''), a fixed-row (non-delimited) format is used. With fixed-row format, no delimiters are used between fields. Instead, column values are written and read using the ``display'' widths of the columns. For example, if a column is declared as INT(7), values for the column are written using 7-character fields. On input, values for the column are obtained by reading 7 characters. Fixed-row format also affects handling of NULL values; see below. Note that fixed size format will not work if you are using a multi-byte character set.
Handling of NULL values varies, depending on the FIELDS and LINES options you use:

For the default FIELDS and LINES values, NULL is written as \N for output and \N is read as NULL for input (assuming the ESCAPED BY character is `\').
If FIELDS ENCLOSED BY is not empty, a field containing the literal word NULL as its value is read as a NULL value (this differs from the word NULL enclosed within FIELDS ENCLOSED BY characters, which is read as the string 'NULL').
If FIELDS ESCAPED BY is empty, NULL is written as the word NULL.
With fixed-row format (which happens when FIELDS TERMINATED BY and FIELDS ENCLOSED BY are both empty), NULL is written as an empty string. Note that this causes both NULL values and empty strings in the table to be indistinguishable when written to the file because they are both written as empty strings. If you need to be able to tell the two apart when reading the file back in, you should not use fixed-row format.
Some cases are not supported by LOAD DATA INFILE:

Fixed-size rows (FIELDS TERMINATED BY and FIELDS ENCLOSED BY both empty) and BLOB or TEXT columns.
If you specify one separator that is the same as or a prefix of another, LOAD DATA INFILE won't be able to interpret the input properly. For example, the following FIELDS clause would cause problems:
FIELDS TERMINATED BY '"' ENCLOSED BY '"'

If FIELDS ESCAPED BY is empty, a field value that contains an occurrence of FIELDS ENCLOSED BY or LINES TERMINATED BY followed by the FIELDS TERMINATED BY value will cause LOAD DATA INFILE to stop reading a field or line too early. This happens because LOAD DATA INFILE cannot properly determine where the field or line value ends.
The following example loads all columns of the persondata table:

mysql> LOAD DATA INFILE 'persondata.txt' INTO TABLE persondata;

No field list is specified, so LOAD DATA INFILE expects input rows to contain a field for each table column. The default FIELDS and LINES values are used.

If you wish to load only some of a table's columns, specify a field list:

mysql> LOAD DATA INFILE 'persondata.txt'
           INTO TABLE persondata (col1,col2,...);

You must also specify a field list if the order of the fields in the input file differs from the order of the columns in the table. Otherwise, MySQL cannot tell how to match up input fields with table columns.

If a row has too few fields, the columns for which no input field is present are set to default values. Default value assignment is described in section 7.7 CREATE TABLE syntax.

An empty field value is interpreted differently than if the field value is missing:

For string types, the column is set to the empty string.
For numeric types, the column is set to 0.
For date and time types, the column is set to the appropriate ``zero'' value for the type. See section 7.3.6 Date and time types.
TIMESTAMP columns are only set to the current date and time if there is a NULL value for the column, or (for the first TIMESTAMP column only) if the TIMESTAMP column is left out from the field list when a field list is specified.

If an input row has too many fields, the extra fields are ignored and the number of warnings is incremented.

LOAD DATA INFILE regards all input as strings, so you can't use numeric values for ENUM or SET columns the way you can with INSERT statements. All ENUM and SET values must be specified as strings!

If you are using the C API, you can get information about the query by calling the API function mysql_info() when the LOAD DATA INFILE query finishes. The format of the information string is shown below:

Records: 1  Deleted: 0  Skipped: 0  Warnings: 0

Warnings occur under the same circumstances as when values are inserted via the INSERT statement (see section 7.17 INSERT syntax), except that LOAD DATA INFILE also generates warnings when there are too few or too many fields in the input row. The warnings are not stored anywhere; the number of warnings can only be used as an indication if everything went well. If you get warnings and want to know exactly why you got them, one way to do this is to use SELECT ... INTO OUTFILE into another file and compare this to your original input file.

If you need LOAD DATA INFILE to read from a pipe, you can use the following trick:

mkfifo /mysql/db/x/x
chmod 666 /mysql/db/x/x
cat < /dev/tcp/10.1.1.12/4711 > /nt/mysql/db/x/x
mysql -e "LOAD DATA INFILE 'x' INTO TABLE x" x

For more information about the efficiency of INSERT versus LOAD DATA INFILE and speeding up LOAD DATA INFILE, See section 12.5.6 Speed of INSERT queries.

0
 

Author Comment

by:Deathead
ID: 6389082
I don't have access to the command line, What is happening is that I am setting up a cron job on a shared hosting environment and the why that the cron works that it can only call a script. Therefore, I need the script to be able to do everything.
0
 
LVL 6

Expert Comment

by:christopher sagayam
ID: 6389112
ok give me a sample of the datafile with the FIELDNAMES of the mysql db and I will write a script for you
0
 

Author Comment

by:Deathead
ID: 6389130
cool.
Here you go:
<Sample file>

FGM|OTC |00172572810|Famotidine 20mg UD Tab|Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |AWP|173|144|24X24
FGM|OTC |00172572810|Famotidine 20mg UD Tab|Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |SRP|173|144|24X24
FGM|OTC |00172572860|Famotidine 20mg Tab |Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |AWP|173|144|24X24
FGM|OTC |00172572860|Famotidine 20mg Tab |Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |SRP|173|144|24X24
FGM|OTC |00172572870|Famotidine 20mg Tab |Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |AWP|865|144|24X24
FGM|OTC |00172572870|Famotidine 20mg Tab |Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |SRP|865|144|24X24
FGM|OTC |00172572880|Famotidine 20mg Tab |Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |AWP|1730|144|24X24
FGM|OTC |00172572880|Famotidine 20mg Tab |Pepcid AR |20MG|20/5728|BEIGE |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |SRP|1730|144|24X24
FGM|OTC |00172572910|Famotidine 40mg UD Tab|Pepcid AR |40MG|40/5729|TAN |ROUND |NONE|IVAX PHARMACEUTICALS|PENDIN|OTC |AWP|334.4|144|24X24
</sample file>

Here is the Table Construct:
CREATE TABLE web_prod (
   FIN_GOODS_TYPE char(3),
   IVAX_RATING varchar(6),
   NDC_NUM varchar(25) NOT NULL,
   NDC_DESC varchar(30),
   BRAND_COMP varchar(30),
   SZE varchar(10),
   IMPRINT varchar(15),
   COLOR varchar(10),
   SHAPE varchar(10),
   FLAVOR varchar(10),
   MANUFACTURER varchar(50),
   THERAPEUTIC_CAT varchar(6),
   FDA_RATING varchar(6),
   AWP_SRP_CD char(3),
   AWP_SRP_PRC varchar(20),
   CASE_QTY varchar(10),
   CASE_DIMENSIONS varchar(30)
);

0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 3

Expert Comment

by:dalehend
ID: 6391663
#/usr/bin/perl
# required modules include AnyData, DBD-AnyData, DBD-Mysql
# I did not test this and there might be small tweaking
# required for assigning SELECTED row to table fields
use DBI;
my $dbh = DBI->connect('dbi:AnyData(RaiseError=>1):');
my $mysql_dbh = DBI->connect("DBI:mysql:dbname:localhost", "user", "passwd") or die;

 $dbh->func( $tablename, 'pipe', $filename 'ad_catalog');
 my $sth = $dbh->prepare("SELECT $field1,$field2,$field3 ... WHERE 1");

$sth->execute();
 my $mysql_sth = $dbh->prepare( q{
      INSERT INTO $mysqltable ($field1, $field2, $field3 ...) VALUES (?, ?)
  });

 while (my $row = $sth->fetch) {
   ($field1, $field2, $field3 ...) = @row;
   $mysql_sth->execute($field1, $field2, field3 ...);
 }
$mysql_dbh->disconnect;
 exit;

0
 

Author Comment

by:Deathead
ID: 6392524
dalehend,
Thank you, but 1) This does not open the flat file to pull the strings, 2)chris18 has already started doing the script for me, and 3) You proposed an answer instead of posting a comment which locked the question.
0
 
LVL 3

Expert Comment

by:dalehend
ID: 6393581
no prob
I thought you might be looking for something short and sweet.
Using AnyData, the following statement will open the pipe delimited file where $filename is the variable for the actual filename and $tablename is any value.
$dbh->func( $tablename, 'pipe', $filename 'ad_catalog');
0
 

Author Comment

by:Deathead
ID: 6531207
Thank you.
0

Featured Post

Find Ransomware Secrets With All-Source Analysis

Ransomware has become a major concern for organizations; its prevalence has grown due to past successes achieved by threat actors. While each ransomware variant is different, we’ve seen some common tactics and trends used among the authors of the malware.

Join & Write a Comment

I've just discovered very important differences between Windows an Unix formats in Perl,at least 5.xx.. MOST IMPORTANT: Use Unix file format while saving Your script. otherwise it will have ^M s or smth likely weird in the EOL, Then DO NOT use m…
Many time we need to work with multiple files all together. If its windows system then we can use some GUI based editor to accomplish our task. But what if you are on putty or have only CLI(Command Line Interface) as an option to  edit your files. I…
Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Access reports are powerful and flexible. Learn how to create a query and then a grouped report using the wizard. Modify the report design after the wizard is done to make it look better. There will be another video to explain how to put the final p…

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

26 Experts available now in Live!

Get 1:1 Help Now