Joel Buhr asked:

Perl Script - Import Multiple CSV Files into MySQL with an Update Query In Between

I have never worked with Perl, but I understand it is probably what I should be using for these imports, given the file sizes I am dealing with.

Environment: Linux MariaDB server.

I have 7 data files of roughly 24 million rows each.

Normally I import these over SSH with the mysql client after moving the ".csv" files to the server, running the following command for each file.

LOAD DATA LOCAL INFILE '/home/APR17/20161003_FULL FILE_01.csv' INTO TABLE FD_MASTER FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;



Then I run this query to code the batch that was just loaded.

UPDATE FD_MASTER
SET LoadInputFile = '01'
WHERE LoadInputFile IS NULL;



So my question is: how do I create a Perl script that will load each of the 7 files one at a time, running the update query between loads?

It would also be nice if, at the end, it produced a simple report of what it did, including how long each load took.

After this there is a series of queries that I also run on that master table, which I would like to add to this process eventually, but those can wait.
Tags: Linux, Perl, MySQL Server

arnold

You would use a text editor such as vi, emacs, nano, pico, etc.
The first line must be #!/usr/bin/perl (the path to your perl interpreter).

You would use the DBI and MySQL (DBD::mysql) modules to build on features others have already written, so you do not need to start from scratch.
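A minimal sketch of connecting with DBI; the database name, host, user, and password here are placeholders (substitute your own), and mysql_local_infile=1 is needed if you want LOAD DATA LOCAL to work through the connection:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder connection details -- substitute your own database,
# host, user, and password.  mysql_local_infile=1 enables
# LOAD DATA LOCAL INFILE through this connection.
my $dbh = DBI->connect(
    'DBI:mysql:database=yourdb;host=localhost;mysql_local_infile=1',
    'youruser', 'yourpassword',
    { RaiseError => 1 },
) or die "Cannot connect: $DBI::errstr";

# do() runs a statement that returns no result set.
$dbh->do(q{UPDATE FD_MASTER SET LoadInputFile = '01'
           WHERE LoadInputFile IS NULL});

$dbh->disconnect;
```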

The path you specify must be accessible to the MySQL user; /home might or might not be suitable depending on your Linux security (SELinux) settings.

The coding is not that complex; the complexity is in deciding whether you want the import to be all-or-nothing, or whether you are OK with importing what can be imported and discarding what fails.

Run the command perldoc DBD::mysql (or perldoc DBI).
If memory serves right, this will display an example of how to use the module and connect to MySQL.
Since you have not attempted this before, I would suggest you learn and practice Perl scripting as time permits, while performing the import task manually.

Unless the imports have to be done in sequence, you could use screen (a Linux command that gives a user multiple sessions (virtual windows) over a single connection).
You can then run each import in its own window and, when it completes, run your second query.

If you are familiar with shell scripting (bash, sh, ksh, zsh) and you do not require transactional loads:

echo "load data infile ............... your command" | mysql -u root

You can then test the exit code and, if it succeeded, send the next command.

Perl would be useful when you need to individually process data from the csv file.
Joel Buhr

ASKER
Ideally I am looking for someone who can write the Perl code, and I can pay them for it. I do appreciate the above, but time is money right now.
ASKER CERTIFIED SOLUTION
arnold

Joel Buhr

ASKER
That appears to be a good start. The field layout of the files I am importing exactly matches the first part of my table layout in MySQL. In this situation I have 7 CSV files to import, of about 24 million rows each, so for the "LoadInputFile" update I have simply been coding them sequentially:

20161003_FULL FILE_0.csv = 00
20161003_FULL FILE_1.csv = 01
20161003_FULL FILE_2.csv = 02
20161003_FULL FILE_3.csv = 03
20161003_FULL FILE_4.csv = 04
20161003_FULL FILE_5.csv = 05
20161003_FULL FILE_6.csv = 06

Ideally I want the script to work through each file one at a time and, when done, tell me how many rows it imported for each file.

Is there a way to do that with no manual intervention?
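Putting the pieces together, a sketch of what such a script might look like using DBI. The credentials are placeholders, and the file-name pattern, table name, and LOAD DATA options are copied from the commands shown earlier in this question (adjust if your real names differ):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Time::HiRes qw(time);

# Placeholder credentials -- replace with your own.
my $dbh = DBI->connect(
    'DBI:mysql:database=yourdb;host=localhost;mysql_local_infile=1',
    'youruser', 'yourpassword',
    { RaiseError => 1 },
);

my @report;
for my $n (0 .. 6) {
    # File names and two-digit codes as listed above.
    my $file = sprintf '/home/APR17/20161003_FULL FILE_%d.csv', $n;
    my $code = sprintf '%02d', $n;

    my $t0 = time();

    # do() returns the number of affected rows for LOAD DATA.
    my $rows = $dbh->do(qq{
        LOAD DATA LOCAL INFILE '$file'
        INTO TABLE FD_MASTER
        FIELDS TERMINATED BY ',' ENCLOSED BY '"'
        LINES TERMINATED BY '\\r\\n'
        IGNORE 1 LINES
    });

    # Code the batch that was just loaded.
    $dbh->do(q{UPDATE FD_MASTER SET LoadInputFile = ?
               WHERE LoadInputFile IS NULL}, undef, $code);

    push @report, sprintf '%s: %s rows in %.1f s (coded %s)',
        $file, $rows, time() - $t0, $code;
}
$dbh->disconnect;

# Simple end-of-run report.
print "$_\n" for @report;
```

With RaiseError set, any failed load or update stops the script, so a later file is never loaded on top of an uncoded earlier batch.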
arnold

The filename and LoadInputFile pairs can be kept in a file, space separated, and read in a loop:

while read filename inputfile; do
    echo "$filename" "$inputfile"
done < file_list

This would replace the outer data collection loop.

Try this on the command line:

DATA=$(echo "select count(*) from local.FD_master;" | mysql -u root --password=<password>)
echo "$DATA"

This is how you can run the command and capture the response.

I'll look at providing a perl example.
Joel Buhr

ASKER
Where is the script getting the value it is going to use to update the field "LoadInputFile"? Is it updating it with the full file name? or just the ending numeric portion with leading "0"?
arnold

It will read it in from the reference file, where each line holds the filename and its reference.
A sequential read (read a b) puts the filename in $a and the reference in $b.
Another way is read a, which puts the entire line 'filename reference' into $a; then the command

set -- $a

puts the filename in $1 and the reference in $2. The subsequent variable assignments

FILENAME=$1
INPUTFILE=$2

will work with the prior shell script example.
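In Perl the same mapping can be read with split. A small sketch, using an in-memory list in place of the real reference file; note that the file names in this thread contain a space, so the sketch separates the two fields with a tab rather than arbitrary whitespace:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Stand-in for the reference file: one "filename<TAB>code" pair per line.
my @lines = (
    "20161003_FULL FILE_0.csv\t00",
    "20161003_FULL FILE_1.csv\t01",
);

my %code_for;
for my $line (@lines) {
    # Split on the tab only, since the file name itself contains a space.
    my ($filename, $code) = split /\t/, $line, 2;
    $code_for{$filename} = $code;
    print "$filename -> $code\n";
}
```

The %code_for hash then gives the LoadInputFile value for each file name.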
Tomas Helgi Johannsson

Hi!

You mentioned in your post that you have MariaDB as the database system.
In MariaDB there is a storage engine called CONNECT, with which you can easily create a table connected directly to a CSV file in a split second.
This greatly simplifies importing CSV files, as well as any other work with the CSV data.

In that case your import script only needs to create a temporary table that connects to the CSV file, then execute an SQL script or stored procedure that takes the data further into other tables as your application requires, and do so for all files (a simple loop through them) that need to be imported. :)

You can run a simple "select count(*) from yourtemptablename" to see how many rows are in the file/table after you create it.
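A sketch of that approach driven from DBI; it assumes the CONNECT storage engine is installed, and the credentials, column definitions, and table name here are placeholders for illustration (the columns must match the real CSV layout):

```perl
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder credentials -- replace with your own.
my $dbh = DBI->connect('DBI:mysql:database=yourdb;host=localhost',
                       'youruser', 'yourpassword', { RaiseError => 1 });

# Expose the CSV file directly as a table via the CONNECT engine.
# col1/col2 are placeholders; they must match the CSV columns.
$dbh->do(q{
    CREATE TABLE csv_import (
        col1 VARCHAR(50),
        col2 VARCHAR(50)
    ) ENGINE=CONNECT TABLE_TYPE=CSV
      FILE_NAME='/home/APR17/20161003_FULL FILE_0.csv'
      HEADER=1 SEP_CHAR=',' QUOTED=1
});

# Row count comes straight from the file-backed table.
my ($rows) = $dbh->selectrow_array('SELECT COUNT(*) FROM csv_import');
print "rows in file: $rows\n";

# Remove the table definition when done (the CSV file itself stays).
$dbh->do('DROP TABLE csv_import');
$dbh->disconnect;
```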

Regards,
    Tomas Helgi
Joel Buhr

ASKER
Thank you for your help!