Perl Script - Import Multiple CSV Files into MySQL with an Update Query in Between
I have never messed with Perl, but I understand it is probably what I should be using for this import, given the file sizes I am dealing with.
Environment: Linux MariaDB server.
I have 7 data files of roughly 24 million rows each.
Normally I import these over SSH with the mysql client after moving the .csv files to the server, running the following command for each file:
LOAD DATA LOCAL INFILE '/home/APR17/20161003_FULL FILE_01.csv' INTO TABLE FD_MASTER FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\r\n' IGNORE 1 LINES;
So my question is: how do I create a Perl script that will load each of the 7 files one at a time and run the update query between each load?
It would be nice if, at the end, it provided a simple report of what it did, including how long each segment took to load.
After this there is a series of queries that I also run on that master table, which I would like to add to this process as well, but those can wait.
Tags: Linux, Perl, MySQL Server
arnold
You would use a text editor such as vi, emacs, nano, pico, etc.
The first line must be #!/usr/bin/perl
You would use the DBI and DBD::mysql modules to take advantage of features others have already written, so you do not need to start from scratch.
The path you specify must be accessible to the MySQL user.
/home may or may not be suitable depending on your Linux security and SELinux settings.
The coding is not that complex; the complexity lies in whether you want the import to be all-or-nothing, or whether you are OK with importing what can be imported and discarding what fails.
Run the command perldoc DBD::mysql
If memory serves right, this will display the module's documentation, including an example of how to connect to MySQL.
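For example, a minimal connection sketch (untested; the database name comes from the local.FD_master reference later in this thread, and the credentials are placeholders you would substitute):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Placeholder credentials -- substitute your own.
my $dbh = DBI->connect("DBI:mysql:database=local;host=localhost",
                       "root", "yourpassword", { RaiseError => 1 });
my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM FD_MASTER");
print "FD_MASTER currently has $count rows\n";
$dbh->disconnect;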
Since you've not attempted this before, I would suggest you learn/practice Perl scripting as time permits while manually performing the import task.
Unless the imports have to be done in sequence, you could use screen (a Linux command that gives a user multiple sessions/virtual windows over a single connection).
You can then run each import in its own window and, when it completes, run your second query.
If you are familiar with shell scripting (bash, sh, ksh, zsh) and do not require transactional loads:
echo "LOAD DATA LOCAL INFILE ... your command ..." | mysql -u root
You can then test the exit code ($?) and, if it indicates success, send through the next command.
Perl would be useful when you need to individually process data from the csv file.
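For the per-row case, a minimal sketch using the Text::CSV module (not core Perl, so it would need to be installed; the file name is one of yours, but the per-row handling is only illustrative):

#!/usr/bin/perl
use strict;
use warnings;
use Text::CSV;

my $file = "/home/APR17/20161003_FULL FILE_01.csv";
my $csv  = Text::CSV->new({ binary => 1, auto_diag => 1 });

open my $fh, "<", $file or die "Cannot open $file: $!";
my $header = $csv->getline($fh);    # skip the header row
while (my $row = $csv->getline($fh)) {
    # Each field is individually addressable here, e.g. $row->[0];
    # apply per-row validation or transformation before loading.
}
close $fh;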
Joel Buhr
ASKER
I am ideally looking for someone who can perhaps write the Perl code, and I can pay them for this. I do appreciate the above, but time is money right now.
That appears to be a good start. The field layout of the files I am importing exactly matches the layout of the first part of my table in MySQL. In this situation I have 7 CSV files to import of about 24 million rows each, so on the update of "LoadInputFile" I have simply been coding that sequentially.
The list of filename/inputfile pairs can be kept in a file and read space-separated:
cat file_list | while read filename inputfile; do
    echo "$filename" "$inputfile"
done
This would replace the outer data-collection loop.
Try this on the command line:
Data=$(echo "select count(*) from local.FD_master;" | mysql -u root --password=<password>)
echo "$Data"
This is how you can run the command and get the response.
I'll look at providing a Perl example.
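In the meantime, a rough untested sketch of the shape it could take: it assumes DBI and DBD::mysql are installed, uses placeholder credentials, reads the space-separated file_list described above, and the UPDATE's WHERE clause is only a guess at how you identify the rows just loaded -- adjust all of those to your setup.

#!/usr/bin/perl
use strict;
use warnings;
use DBI;
use Time::HiRes qw(time);

# Placeholder credentials -- substitute your own.
my $dbh = DBI->connect(
    "DBI:mysql:database=local;host=localhost;mysql_local_infile=1",
    "root", "yourpassword", { RaiseError => 1 });

# One "filename reference" pair per line, space separated.
open my $list, "<", "file_list" or die "Cannot open file_list: $!";

my @report;
while (my $line = <$list>) {
    chomp $line;
    my ($file, $reference) = split ' ', $line;
    my $start = time;

    # Bulk-load one CSV file, with the same options as the manual command.
    $dbh->do(qq{LOAD DATA LOCAL INFILE '$file' INTO TABLE FD_MASTER
                FIELDS TERMINATED BY ',' ENCLOSED BY '"'
                LINES TERMINATED BY '\\r\\n' IGNORE 1 LINES});

    # The in-between update; the WHERE clause is an assumption --
    # change it to however you mark the rows that were just loaded.
    $dbh->do("UPDATE FD_MASTER SET LoadInputFile = ? WHERE LoadInputFile IS NULL",
             undef, $reference);

    push @report, sprintf "%s: loaded and updated in %.1f seconds",
                          $file, time - $start;
}
close $list;

my ($count) = $dbh->selectrow_array("SELECT COUNT(*) FROM FD_MASTER");
print "$_\n" for @report;
print "FD_MASTER now has $count rows\n";
$dbh->disconnect;

The per-file timing in @report covers the "how long each segment took" part of the question. One caution: a space-separated file_list will break if the file names themselves contain spaces, as your example file name does, so you may want a different separator.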
Joel Buhr
ASKER
Where is the script getting the value it will use to update the field "LoadInputFile"? Is it updating it with the full file name, or just the ending numeric portion with the leading "0"?
arnold
It will read it in from the reference file, where each line is 'filename reference'.
A sequential read such as read a b will put the filename (the first field) in $a and the reference in $b.
Another way is read a, which puts the entire 'filename reference' line in $a.
Then, using the command:
set -- $a
the filename will be set in $1 while the reference will be set in $2, with any further fields assigned to subsequent variables.
You mentioned in your post that you have MariaDB as the database system.
In MariaDB there is a storage engine called CONNECT with which you can easily create a table and connect it directly to a CSV file in a "split second".
This greatly simplifies the import of CSV files, as well as any work with the CSV data.
In that case your import script only needs to create a temporary table that connects to the CSV file and then execute an SQL script or stored procedure that takes the data on into other tables as your application requires, looping through all the files that need to be imported. :)
You can do a simple "select count(*) from yourtemptablename" to see how many rows are in the file/table after you create the table.
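As a hedged sketch of the DDL (the table name and columns here are made-up placeholders; the CONNECT engine plugin must be installed, and the real column list would mirror your CSV layout):

CREATE TABLE tmp_fd_import (
    col1 VARCHAR(64),
    col2 VARCHAR(64)
) ENGINE=CONNECT TABLE_TYPE=CSV
  FILE_NAME='/home/APR17/20161003_FULL FILE_01.csv'
  HEADER=1 SEP_CHAR=',' QUOTED=1;

SELECT COUNT(*) FROM tmp_fd_import;

Dropping the table afterwards leaves the CSV file itself untouched, since the table only points at the file.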