Using Perl to extract, style and email data from a MySQL Database as an Excel email attachement

I have been pestered over the years to produce and distribute regular data extracts, and often the request have explicitly requested the data be emailed as an Excel attachement; specifically Excel, as it appears: CSV files confuse (no Red or Green highlighting), Google Docs links confound (potentially require the recall and entry of a password) and Business Object, Crystal, iReports and the like all require some thought and action by the recipient (a non starter).

Anyway as the source of the data has generally been a remote Oracle / MySQL / DB2 / Mongo... DB, and the requirements generally included a statement mandating the extract be available first thing Monday morning ,the obvious action would be to write and schedule (crontab) a script to:

Connect to a DB
Submit a query or two to Extract some Data
Apply a bit of logic to reformat, add a few formula, filters, graphs and colour (Senior managers like Green)
Output the results as an Excel sheet
Encode and Email the sheet (as if you'd been in all weekend) to one ore more recipients.

...and all while you're still seeking your morning caffeine hit.

The above actions can be performed reasonably easily in any of the half dozen languages you'll likely find on the average (non-hardened) Unix box: Perl (Practical Extraction and Reporting Language), PHP, Java, Python......  but as the purpose of this script resembles one of the languages acronym's I'll attempt to scribble down a few hints / code snippets to get you started in Perl.

Assuming your Server / PC / Mac has the applicable developer tools installed, along with access to your database server, and to an open email relay, you'll first want to pull down, from CPAN ( - make sure any firewall's open), the necessary libraries to connect to a Database, create an Excel workbook and possibly encode a workbook.

The following one liner should more than suffice:
 cpan YAML Data::Dumper Spreadsheet::WriteExcel Tie::IxHash Encode Scalar::Util DBI DBD::mysql JSON MongoDB File::Basename Excel::Writer::XLSX IO::File MIME::Base64 Encode::Encoding

Open in new window

Ideally run the above as root (sudo)
Alter the DBI::xxxxxxx  include to suite.
Generally just hit enter when prompted
You'll loose half an hour, or so, of your life while the above download / compile / install.

Next you'll want a few lines of code, ideally stuck in a separate package (file), to establish and return a connection to a DB. As this script is likely to contain some hard coded credentials, so  make sure the package is ONLY accessible to applicable / calling user(s) , also externalising the DB connection code encourages re-use, and possibly prevents the proliferation of DB credentials:
# MySQL database information
                      use strict;
                      use DBI;
                      package myConnectDB;
                      # Assign a variable, as the body must do something in lue of a proper constructor.
                      our $someVarName="myConnectDB";
                      return 1; # True
                      # Return a connection to database
                      sub db_connect {
                      # Default - Localhost
                      my $db="testDatabase";
                      my $host="";
                      my $port="3306";
                      my $userid="someUser";
                      my $passwd="somePassword";
                      my $connectionInfo="DBI:mysql:database=$db;$host:$port";
                        if ( @_ == 3 ) {
                              ($connectionInfo,$userid,$passwd) = @_;
                      # database information
                      my $dbh = DBI->connect($connectionInfo,$userid,$passwd)
                               or die "Couldn't connect to database: " . DBI->errstr;
                      return $dbh;
                      # #### END ####

Open in new window

Now we have some mechanism of connecting to the database we'll want some logic to create an Excel workbook. Again as your more likely than not to be asked to produce another / similar extract it is best to stick this and any other reusable / associated code in a separate package.  So below is some fairly general purpose, but rather messy package stub that includes enough sample code to style and write out and order some or all of the columns in a passed data object, and either XLS or XLSX format workbook. Please note you'll need to go down the XLSX route if your extract has > 64K and < 2M rows.

The required functionality is provided by a simple call to myExcelLib::convertToExcel() with what  I hope are some reasonably self explanatory parameters:  
$sFilename  - a name for the generated Excel workbook.
$oJsonData  -  a  JSON style object, containing an ARRAY of HASHs (ROWS), with each HASH /  containing a series of KEY (Column Name) / VALUE pairs (The VALUE can be a simple scalar / Boolean / timestamp VALUE or a HASH supplying a value and optional styling information).
$raOutFields  - an Array listing the Keys / Columns, from the JSON Document that are to appear in the Sheet. The columns are added to the sheet in the order they appear in the array.
$sFileFormat - Either XLS or XLSX
                      # Purpose: Assorted Excel related functions
                      use strict;
                      use Spreadsheet::WriteExcel;
                      use Excel::Writer::XLSX;
                      use Tie::IxHash;
                      use Scalar::Util 'reftype';
                      use JSON;
                      use File::Basename;
                      use Data::Dumper;
                      use IO::File;
                      package myExcelLib;
                      sub getValue;
                      my $myExcelLib = 1;
                      return 1; # True
                      # --------------------------------------------
                      sub base64_encode_file {
                         # see:       (have been using this)
                         # alternatively:
                         # Or:
                      # --------------------------------------------
                      sub getValue{
                        my ($sKey, $rhDoc,$bReturnRef)=@_;
                        my ($sKeyPt1,$sKeyPt2,$sKeyPt3,$refType);
                        if ( ($sKey =~ /^[^.]*[.]/) ){
                                      ($sKeyPt1, $sKeyPt2) = ($sKey =~  /^([^.]+)\.(.+)$/ );
                                      $refType = reftype($rhDoc->{$sKeyPt1});
                                      if ( $refType && $refType eq 'HASH' ) {
                                              return getValue($sKeyPt2, $rhDoc->{$sKeyPt1});
                                      elsif ( $refType && $refType eq 'ARRAY' ) {
                                              if ( ($sKeyPt2 =~ /^[0-9]+$/) ){
                                                      return ($rhDoc->{$sKeyPt1}[$sKeyPt2])           if ( isNotARef($rhDoc->{$sKeyPt1}[$sKeyPt2]));
                                              } else {
                                                      if ( ($sKeyPt2 =~ /^[0-9]+[.]/) ){
                                                              ($sKeyPt2, $sKeyPt3) = ($sKeyPt2 =~  /^([^.]+)\.(.+)$/ );
                                                              return getValue($sKeyPt3, $rhDoc->{$sKeyPt1}[$sKeyPt2]);
                                                      } else {  #Default to first element of ARRAY, if no index given
                                                              return getValue($sKeyPt2, $rhDoc->{$sKeyPt1}[0]);
                        } else {
                              if ( ref($rhDoc->{$sKey}) eq "DateTime") {
                                      my $dt = $rhDoc->{$sKey};
                                      return ($dt->datetime);
                              return  $rhDoc->{$sKey}                                                              if ( $bReturnRef || isNotARef($rhDoc->{$sKey}));
                        return ("");
                      } ## End sub
                      # --------------------------------------------
                      sub isNotARef($){
                        my ($refToTest)=@_;
                              if( ! ref($refToTest) || ref($refToTest) eq 'boolean'){
                                      return 1;
                              return 0;
                      # --------------------------------------------
                      sub convertToExcel($$$$){
                        my ($sFilename, $oJsonData, $raOutFields, $sFileFormat)=@_;
                        my ($iRow,$iCol,$rhDoc,$oTitleFormat,$oTotalFormat,$oTotalPCTFormat,$oPCTFormat,$sKey,$workbook,$sVal,$oFormat);
                        # Create a new workbook and add a worksheet.
                        $sFileFormat = 'xls' if ( ! $sFileFormat );
                        if ( $sFileFormat =~ /^xlsx$/i ){
                              $workbook  = Excel::Writer::XLSX->new("$sFilename") || die ("Couldn't create: $sFilename");
                        elsif ( $sFileFormat =~ /^xls$/i ){
                              $workbook  = Spreadsheet::WriteExcel->new("$sFilename") || die ("Couldn't create: $sFilename");
                        else {
                              die ("Error: Unknown file format");
                        my $worksheet = $workbook->add_worksheet();
                        # Define a few commonly used pre-set Formats /Styles that can be applied by NAME
                        # Title format
                        $oTitleFormat = $workbook->add_format(
                                                              bg_color => 12,   
                                                              color    => 22,   
                                                              pattern  => 1,
                                                              border   => 1,
                                                              bold     => 1
                        $oTotalFormat = $workbook->add_format(
                                                  bold        => 1,
                                                  top         => 1,
                                                  num_format  => '#,##0'
                        $oTotalPCTFormat = $workbook->add_format(
                                                  bold        => 1,
                                                  top         => 1,
                                                  num_format  => 10
                        $oPCTFormat = $workbook->add_format(
                                                  num_format  => 10
                        #Write the Column headings
                        foreach $sKey ( @{$raOutFields} ) {
                         $worksheet->write($iRow, $iCol++, $sKey, $oTitleFormat);
                        #Freeze the first Row
                        $worksheet->freeze_panes(1, 0);
                        #Write any passed JSON Data 
                        if ($oJsonData){ 
                          foreach $rhDoc ( @{$oJsonData} ) {
                              foreach $sKey ( @{$raOutFields} ) {
                                      $sVal = getValue($sKey, $rhDoc, 'true');
                                      if ( ref($sVal) eq 'HASH' ) {
                                              $oFormat = undef;
                                              if ( $sVal->{ 'format' } ) {
                                                      if ( ref($sVal->{ 'format' }) eq 'HASH') {
                                                              $oFormat = $workbook->add_format($sVal->{ 'format' });
                                                      } elsif ( $sVal->{ 'format' } eq 'Total' ) {
                                                              $oFormat = $oTotalFormat;
                                                      } elsif ( $sVal->{ 'format' } eq 'TotalPCT' ) {
                                                              $oFormat = $oTotalPCTFormat;
                                                      } elsif ( $sVal->{ 'format' } eq 'PCT' ) {
                                                              $oFormat = $oPCTFormat;
                                              $worksheet->write($iRow, $iCol, $sVal->{ 'value' }, $oFormat );
                                      } else {
                                              $worksheet->write($iRow, $iCol, $sVal);   
                              } # End - foreach $sKey
                          } #End Foreach row (JSON object)
                        return $iRow;
                      } ## End sub

Open in new window

Note: The above code has been cannibalised / butchered from a larger package I created some time ago, it's not complete, but the links will point you at a code snippet you can use to complete the Base64 encode logic / re-write. I have also cut out a load of other named Styles I personally use for my own purposes (left just enough to provide a few ideas). If you want to add you own names styles the spreadsheet::WriteExcel page provides plenty of info and examples. I have also omitted a few lines of code that walk the passed JSON style object, for all keys when $raOutFields is null.

As we now have a mechanism to connect to a Database and to convert a Data object into an Excel workbook it's time to glue the two together by creating the Calling / Reporting script.
The code below I hope is fairly self explanatory / vanilla, with the possible exception of the: get_report_info() function. Essentially this function takes a passed DB connection object and returns an Array containing the results from running one or more queries against a database, along with any local post processing, additional data or formula and styling you require.  Each member of the returned array represents one Row of results.

The Rows themselves are encoded as a Hash, with the Column names as the Keys and the Value being either:
A simple scalar value, probably returned from the database
An Excel formula string e.g.  =SUM(A2:A10)
A HASH containing two keys:
1) "value" - With its value being either a scalar or an Excel formula
2) "format" - Either the Name of a locally predefined style or a Hash defining the styling elements, see spreadsheet::WriteExcel

Once the results have been converted to an Excel workbook we simply Base64 / mime encode the file and send it as an attachment to a predefined distribution list.
#!/usr/bin/perl -w
                      # Purpose: Example - Query a Database, play with then output, write to an Excel sheet, encode and email
                      use strict;
                      use DBI;
                      use myConnectDB;
                      use myExcelLib;
                      sub mail_monthly_report($);
                      # make a connection to the database DB
                      my $dbh = myConnectDB::db_connect();
                      print mail_report($dbh)."\n\n";
                      # disconnect from database
                      # -------------------------------------------
                      # Retrieve the data for the report
                      # -------------------------------------------
                      sub get_report_info($) {
                      my ($dbh)  = @_;
                      my ($sQuery, $sth, @aResults, $rhDoc, $iRow);
                      my ($dbSummary_seller, $dbSummary_producttype, $dbSummary_completed, $dbSummary_suspended, $dbSummary_cancelled, $dbSummary_count);
                      #  PSEUDO SQL - Replace with something sensible
                      # prepare and execute query
                      $sQuery = "SELECT as 'Seller', 
                                      IF((s.cancelled <> 0), 'N', s.completed) as Completed, 
                                      IF((s.cancelled <> 0), 'Y', 'N') as Cancelled, 
                                FROM  sales s, 
                                      login l 
                                WHERE s.Seller = 
                                GROUP BY Seller,  
                                ORDER BY Seller, 
                      $sth = $dbh->prepare($sQuery)
                              or die "Couldn't prepare statement: " . $dbh->errstr;
                              or die "Couldn't execute statement: " . $sth->errstr;
                      # assign fields to variables
                      #Make sure we have a a few numbers.
                      if ($sth->rows > 0) {
                              # output member list to browser as drop-down listings
                              while($sth->fetch()) {
                                      $rhDoc = undef;
                                      $rhDoc->{ 'Seller' }            = $dbSummary_seller;
                                      $rhDoc->{ 'Product Type' }      = $dbSummary_producttype;
                                      $rhDoc->{ 'Completed' }            = $dbSummary_completed;
                                      $rhDoc->{ 'Suspended' } 		= $dbSummary_suspended;
                                      $rhDoc->{ 'Cancelled' }            = $dbSummary_cancelled;
                                      $rhDoc->{ 'Product Count' }     = $dbSummary_count;
                                      # Add a formula to calculate the Percentage of the total
                                      $rhDoc->{ 'Percentage Of All' }->{ 'value' } = "=(F$iRow/F".($sth->rows + 2).")";
                                      $rhDoc->{ 'Percentage Of All' }->{ 'format' }  = 'PCT';
                                      # Add a Styles Column Summary / Total Row to the Doc
                                      $rhDoc = undef;
                                      $rhDoc->{ 'Product Count' }->{ 'value' }   = "=SUM(F2:F$iRow)";
                                      $rhDoc->{ 'Product Count' }->{ 'format' }  = 'Total';
                                      $rhDoc->{ 'Percentage Of All' }->{ 'value' } = "=SUM(G2:G$iRow)";
                                      $rhDoc->{ 'Percentage Of All' }->{ 'format' }  = 'TotalPCT';
                             return \@aResults;
                      return undef
                      # -------------------------------------------
                      # Extract and Email a file
                      # -------------------------------------------
                      sub mail_report($) {
                      my $raResults   = get_report_info($_[0]);
                      if ($raResults) {
                          my $sFileName       = "Some_Report.xls";
                          my $sTmpDir         = '/tmp/';
                          my $sOriginator     = "doNotReply\";
                          my $sRecipient      = "some.user\";
                          my $sMessage         = "\n\nSome report detailing some random statistics:\n\n";
                          my $sBoundary       = "====" . time() . "====";
                          my $sendmail        = "/usr/sbin/sendmail -t";
                          my $reply_to        = "Reply-to: $sOriginator\n";
                          my $subject         = "Subject: $sMessage\n";
                          my $send_to         = "To: $sRecipient\n";
                          my $cc              = "Cc: $sOriginator\n";
                          my $raOutFields     = ['Seller', 'Product Type', 'Completed', 'Suspended', 'Cancelled', 'Product  Count','Percentage Of All'];
                          my $iRowsWritten    = myExcelLib::convertToExcel ("$sTmpDir$sFileName", undef, $raResults, $raOutFields, 'xls');
                          open(DLFILE, "<$sTmpDir$sFileName") || Error('open', 'file');
                          my @fileholder = <DLFILE>;
                          close (DLFILE) || Error ('close', 'file');
                          open(SENDMAIL, "|$sendmail") or die "Cannot open $sendmail: $!";
                          print SENDMAIL $reply_to;
                          print SENDMAIL $send_to;
                          print SENDMAIL $cc;
                          print SENDMAIL $subject;
                          print SENDMAIL "Mime-Version: 1.0\n";
                          print SENDMAIL "Content-Type: multipart/mixed; boundary=\"$sBoundary\"\n";
                          print SENDMAIL "Content-Disposition: inline\n\n";
                          print SENDMAIL "--$sBoundary\n";
                          print SENDMAIL "Content-type: text/plain\n\n";
                          print SENDMAIL $sMessage;
                          print SENDMAIL "--$sBoundary\n";
                          print SENDMAIL "Content-Type: application/octet-stream; name=\"$sFileName\"\n";
                          print SENDMAIL "Content-Disposition: attachment; filename=\"$sFileName\"\n";
                          print SENDMAIL "Content-Transfer-Encoding: base64\n\n";
                          print SENDMAIL myExcelLib::base64_encode_file("$sTmpDir$sFileName")."\n";
                          print SENDMAIL "\n--$sBoundary--\n";
                          # Delete the local copy of the Excel sheet
                          return "$sMessage - has been emailed\n";
                      return 'Error: Unable to send email report';
                      ## End ###

Open in new window

Note: You'll obviously need to alter the: SQL query, email recipients, email originator, email subject, and message. You may also need to alter the location of the sendmail binary and the instantiation of $sTmpDir

Comments (0)

Have a question about something in this article? You can receive help directly from the article author. Sign up for a free trial to get started.