Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17


Using Perl to extract, style and email data from a MySQL Database as an Excel email attachement

Published on
12,479 Points
Last Modified:
I have been pestered over the years to produce and distribute regular data extracts, and often the request have explicitly requested the data be emailed as an Excel attachement; specifically Excel, as it appears: CSV files confuse (no Red or Green highlighting), Google Docs links confound (potentially require the recall and entry of a password) and Business Object, Crystal, iReports and the like all require some thought and action by the recipient (a non starter).

Anyway as the source of the data has generally been a remote Oracle / MySQL / DB2 / Mongo... DB, and the requirements generally included a statement mandating the extract be available first thing Monday morning ,the obvious action would be to write and schedule (crontab) a script to:

Connect to a DB
Submit a query or two to Extract some Data
Apply a bit of logic to reformat, add a few formula, filters, graphs and colour (Senior managers like Green)
Output the results as an Excel sheet
Encode and Email the sheet (as if you'd been in all weekend) to one ore more recipients.

...and all while you're still seeking your morning caffeine hit.

The above actions can be performed reasonably easily in any of the half dozen languages you'll likely find on the average (non-hardened) Unix box: Perl (Practical Extraction and Reporting Language), PHP, Java, Python......  but as the purpose of this script resembles one of the languages acronym's I'll attempt to scribble down a few hints / code snippets to get you started in Perl.

Assuming your Server / PC / Mac has the applicable developer tools installed, along with access to your database server, and to an open email relay, you'll first want to pull down, from CPAN (ftp.perl.org - make sure any firewall's open), the necessary libraries to connect to a Database, create an Excel workbook and possibly encode a workbook.

The following one liner should more than suffice:
 cpan YAML Data::Dumper Spreadsheet::WriteExcel Tie::IxHash Encode Scalar::Util DBI DBD::mysql JSON MongoDB File::Basename Excel::Writer::XLSX IO::File MIME::Base64 Encode::Encoding

Open in new window

Ideally run the above as root (sudo)
Alter the DBI::xxxxxxx  include to suite.
Generally just hit enter when prompted
You'll loose half an hour, or so, of your life while the above download / compile / install.

Next you'll want a few lines of code, ideally stuck in a separate package (file), to establish and return a connection to a DB. As this script is likely to contain some hard coded credentials, so  make sure the package is ONLY accessible to applicable / calling user(s) , also externalising the DB connection code encourages re-use, and possibly prevents the proliferation of DB credentials:

# MySQL database information
use strict;
use DBI;
package myConnectDB;

# Assign a variable, as the body must do something in lue of a proper constructor.
our $someVarName="myConnectDB";

return 1; # True

# Return a connection to database
sub db_connect {

# Default - Localhost
my $db="testDatabase";
my $host="";
my $port="3306";
my $userid="someUser";
my $passwd="somePassword";
my $connectionInfo="DBI:mysql:database=$db;$host:$port";
  if ( @_ == 3 ) {
        ($connectionInfo,$userid,$passwd) = @_;

# database information
my $dbh = DBI->connect($connectionInfo,$userid,$passwd)
         or die "Couldn't connect to database: " . DBI->errstr;

return $dbh;

# #### END ####

Open in new window

Now we have some mechanism of connecting to the database we'll want some logic to create an Excel workbook. Again as your more likely than not to be asked to produce another / similar extract it is best to stick this and any other reusable / associated code in a separate package.  So below is some fairly general purpose, but rather messy package stub that includes enough sample code to style and write out and order some or all of the columns in a passed data object, and either XLS or XLSX format workbook. Please note you'll need to go down the XLSX route if your extract has > 64K and < 2M rows.

The required functionality is provided by a simple call to myExcelLib::convertToExcel() with what  I hope are some reasonably self explanatory parameters:  
$sFilename  - a name for the generated Excel workbook.
$oJsonData  -  a  JSON style object, containing an ARRAY of HASHs (ROWS), with each HASH /  containing a series of KEY (Column Name) / VALUE pairs (The VALUE can be a simple scalar / Boolean / timestamp VALUE or a HASH supplying a value and optional styling information).
$raOutFields  - an Array listing the Keys / Columns, from the JSON Document that are to appear in the Sheet. The columns are added to the sheet in the order they appear in the array.
$sFileFormat - Either XLS or XLSX

# Purpose: Assorted Excel related functions
use strict;
use Spreadsheet::WriteExcel;
use Excel::Writer::XLSX;
use Tie::IxHash;
use Scalar::Util 'reftype';
use JSON;
use File::Basename;
use Data::Dumper;
use IO::File;
package myExcelLib;

sub getValue;

my $myExcelLib = 1;

return 1; # True

# --------------------------------------------
sub base64_encode_file {

   # see:  http://perl.find-info.ru/perl/003/base64.htm       (have been using this)
   # alternatively:   http://perldoc.perl.org/Encode/Encoder.html#Example%3a-base64-transcoder
   # Or:   http://perldoc.perl.org/MIME/Base64.html


# --------------------------------------------
sub getValue{
  my ($sKey, $rhDoc,$bReturnRef)=@_;
  my ($sKeyPt1,$sKeyPt2,$sKeyPt3,$refType);

  if ( ($sKey =~ /^[^.]*[.]/) ){
                ($sKeyPt1, $sKeyPt2) = ($sKey =~  /^([^.]+)\.(.+)$/ );
                $refType = reftype($rhDoc->{$sKeyPt1});
                if ( $refType && $refType eq 'HASH' ) {
                        return getValue($sKeyPt2, $rhDoc->{$sKeyPt1});
                elsif ( $refType && $refType eq 'ARRAY' ) {
                        if ( ($sKeyPt2 =~ /^[0-9]+$/) ){
                                return ($rhDoc->{$sKeyPt1}[$sKeyPt2])           if ( isNotARef($rhDoc->{$sKeyPt1}[$sKeyPt2]));
                        } else {
                                if ( ($sKeyPt2 =~ /^[0-9]+[.]/) ){
                                        ($sKeyPt2, $sKeyPt3) = ($sKeyPt2 =~  /^([^.]+)\.(.+)$/ );
                                        return getValue($sKeyPt3, $rhDoc->{$sKeyPt1}[$sKeyPt2]);
                                } else {  #Default to first element of ARRAY, if no index given
                                        return getValue($sKeyPt2, $rhDoc->{$sKeyPt1}[0]);
  } else {
        if ( ref($rhDoc->{$sKey}) eq "DateTime") {
                my $dt = $rhDoc->{$sKey};
                return ($dt->datetime);
        return  $rhDoc->{$sKey}                                                              if ( $bReturnRef || isNotARef($rhDoc->{$sKey}));
  return ("");
} ## End sub

# --------------------------------------------
sub isNotARef($){
  my ($refToTest)=@_;

        if( ! ref($refToTest) || ref($refToTest) eq 'boolean'){
                return 1;
        return 0;
# --------------------------------------------
sub convertToExcel($$$$){
  my ($sFilename, $oJsonData, $raOutFields, $sFileFormat)=@_;
  my ($iRow,$iCol,$rhDoc,$oTitleFormat,$oTotalFormat,$oTotalPCTFormat,$oPCTFormat,$sKey,$workbook,$sVal,$oFormat);

  # Create a new workbook and add a worksheet.
  $sFileFormat = 'xls' if ( ! $sFileFormat );
  if ( $sFileFormat =~ /^xlsx$/i ){
        $workbook  = Excel::Writer::XLSX->new("$sFilename") || die ("Couldn't create: $sFilename");
  elsif ( $sFileFormat =~ /^xls$/i ){
        $workbook  = Spreadsheet::WriteExcel->new("$sFilename") || die ("Couldn't create: $sFilename");
  else {
        die ("Error: Unknown file format");
  my $worksheet = $workbook->add_worksheet();

  # Define a few commonly used pre-set Formats /Styles that can be applied by NAME
  # Title format
  $oTitleFormat = $workbook->add_format(
                                        bg_color => 12,   
                                        color    => 22,   
                                        pattern  => 1,
                                        border   => 1,
                                        bold     => 1
  $oTotalFormat = $workbook->add_format(
                            bold        => 1,
                            top         => 1,
                            num_format  => '#,##0'

  $oTotalPCTFormat = $workbook->add_format(
                            bold        => 1,
                            top         => 1,
                            num_format  => 10
  $oPCTFormat = $workbook->add_format(
                            num_format  => 10

  #Write the Column headings
  foreach $sKey ( @{$raOutFields} ) {
   $worksheet->write($iRow, $iCol++, $sKey, $oTitleFormat);
  #Freeze the first Row
  $worksheet->freeze_panes(1, 0);

  #Write any passed JSON Data 
  if ($oJsonData){ 
    foreach $rhDoc ( @{$oJsonData} ) {
        foreach $sKey ( @{$raOutFields} ) {
                $sVal = getValue($sKey, $rhDoc, 'true');
                if ( ref($sVal) eq 'HASH' ) {
                        $oFormat = undef;
                        if ( $sVal->{ 'format' } ) {
                                if ( ref($sVal->{ 'format' }) eq 'HASH') {
                                        $oFormat = $workbook->add_format($sVal->{ 'format' });
                                } elsif ( $sVal->{ 'format' } eq 'Total' ) {
                                        $oFormat = $oTotalFormat;
                                } elsif ( $sVal->{ 'format' } eq 'TotalPCT' ) {
                                        $oFormat = $oTotalPCTFormat;
                                } elsif ( $sVal->{ 'format' } eq 'PCT' ) {
                                        $oFormat = $oPCTFormat;
                        $worksheet->write($iRow, $iCol, $sVal->{ 'value' }, $oFormat );
                } else {
                        $worksheet->write($iRow, $iCol, $sVal);   
        } # End - foreach $sKey
    } #End Foreach row (JSON object)

  return $iRow;
} ## End sub

Open in new window

Note: The above code has been cannibalised / butchered from a larger package I created some time ago, it's not complete, but the links will point you at a code snippet you can use to complete the Base64 encode logic / re-write. I have also cut out a load of other named Styles I personally use for my own purposes (left just enough to provide a few ideas). If you want to add you own names styles the spreadsheet::WriteExcel page provides plenty of info and examples. I have also omitted a few lines of code that walk the passed JSON style object, for all keys when $raOutFields is null.

As we now have a mechanism to connect to a Database and to convert a Data object into an Excel workbook it's time to glue the two together by creating the Calling / Reporting script.
The code below I hope is fairly self explanatory / vanilla, with the possible exception of the: get_report_info() function. Essentially this function takes a passed DB connection object and returns an Array containing the results from running one or more queries against a database, along with any local post processing, additional data or formula and styling you require.  Each member of the returned array represents one Row of results.

The Rows themselves are encoded as a Hash, with the Column names as the Keys and the Value being either:
A simple scalar value, probably returned from the database
An Excel formula string e.g.  =SUM(A2:A10)
A HASH containing two keys:
1) "value" - With its value being either a scalar or an Excel formula
2) "format" - Either the Name of a locally predefined style or a Hash defining the styling elements, see spreadsheet::WriteExcel

Once the results have been converted to an Excel workbook we simply Base64 / mime encode the file and send it as an attachment to a predefined distribution list.

#!/usr/bin/perl -w
# Purpose: Example - Query a Database, play with then output, write to an Excel sheet, encode and email

use strict;
use DBI;
use myConnectDB;
use myExcelLib;

sub mail_monthly_report($);

# make a connection to the database DB
my $dbh = myConnectDB::db_connect();

print mail_report($dbh)."\n\n";

# disconnect from database


# -------------------------------------------
# Retrieve the data for the report
# -------------------------------------------
sub get_report_info($) {
my ($dbh)  = @_;

my ($sQuery, $sth, @aResults, $rhDoc, $iRow);
my ($dbSummary_seller, $dbSummary_producttype, $dbSummary_completed, $dbSummary_suspended, $dbSummary_cancelled, $dbSummary_count);

#  PSEUDO SQL - Replace with something sensible
# prepare and execute query
$sQuery = "SELECT l.name as 'Seller', 
                IF((s.cancelled <> 0), 'N', s.completed) as Completed, 
                IF((s.cancelled <> 0), 'Y', 'N') as Cancelled, 
          FROM  sales s, 
                login l 
          WHERE s.Seller = l.id 
          GROUP BY Seller,  
          ORDER BY Seller, 

$sth = $dbh->prepare($sQuery)
        or die "Couldn't prepare statement: " . $dbh->errstr;

        or die "Couldn't execute statement: " . $sth->errstr;

# assign fields to variables

#Make sure we have a a few numbers.
if ($sth->rows > 0) {
        # output member list to browser as drop-down listings
        while($sth->fetch()) {
                $rhDoc = undef;
                $rhDoc->{ 'Seller' }            = $dbSummary_seller;
                $rhDoc->{ 'Product Type' }      = $dbSummary_producttype;
                $rhDoc->{ 'Completed' }            = $dbSummary_completed;
                $rhDoc->{ 'Suspended' } 		= $dbSummary_suspended;
                $rhDoc->{ 'Cancelled' }            = $dbSummary_cancelled;
                $rhDoc->{ 'Product Count' }     = $dbSummary_count;
                # Add a formula to calculate the Percentage of the total
                $rhDoc->{ 'Percentage Of All' }->{ 'value' } = "=(F$iRow/F".($sth->rows + 2).")";
                $rhDoc->{ 'Percentage Of All' }->{ 'format' }  = 'PCT';
                # Add a Styles Column Summary / Total Row to the Doc
                $rhDoc = undef;
                $rhDoc->{ 'Product Count' }->{ 'value' }   = "=SUM(F2:F$iRow)";
                $rhDoc->{ 'Product Count' }->{ 'format' }  = 'Total';
                $rhDoc->{ 'Percentage Of All' }->{ 'value' } = "=SUM(G2:G$iRow)";
                $rhDoc->{ 'Percentage Of All' }->{ 'format' }  = 'TotalPCT';
       return \@aResults;
return undef

# -------------------------------------------
# Extract and Email a file
# -------------------------------------------
sub mail_report($) {

my $raResults   = get_report_info($_[0]);

if ($raResults) {
    my $sFileName       = "Some_Report.xls";
    my $sTmpDir         = '/tmp/';
    my $sOriginator     = "doNotReply\@some.host.com";
    my $sRecipient      = "some.user\@some.host.com";
    my $sMessage         = "\n\nSome report detailing some random statistics:\n\n";
    my $sBoundary       = "====" . time() . "====";
    my $sendmail        = "/usr/sbin/sendmail -t";
    my $reply_to        = "Reply-to: $sOriginator\n";
    my $subject         = "Subject: $sMessage\n";
    my $send_to         = "To: $sRecipient\n";
    my $cc              = "Cc: $sOriginator\n";

    my $raOutFields     = ['Seller', 'Product Type', 'Completed', 'Suspended', 'Cancelled', 'Product  Count','Percentage Of All'];
    my $iRowsWritten    = myExcelLib::convertToExcel ("$sTmpDir$sFileName", undef, $raResults, $raOutFields, 'xls');

    open(DLFILE, "<$sTmpDir$sFileName") || Error('open', 'file');
    my @fileholder = <DLFILE>;
    close (DLFILE) || Error ('close', 'file');

    open(SENDMAIL, "|$sendmail") or die "Cannot open $sendmail: $!";
    print SENDMAIL $reply_to;
    print SENDMAIL $send_to;
    print SENDMAIL $cc;
    print SENDMAIL $subject;
    print SENDMAIL "Mime-Version: 1.0\n";
    print SENDMAIL "Content-Type: multipart/mixed; boundary=\"$sBoundary\"\n";
    print SENDMAIL "Content-Disposition: inline\n\n";
    print SENDMAIL "--$sBoundary\n";
    print SENDMAIL "Content-type: text/plain\n\n";
    print SENDMAIL $sMessage;
    print SENDMAIL "--$sBoundary\n";
    print SENDMAIL "Content-Type: application/octet-stream; name=\"$sFileName\"\n";
    print SENDMAIL "Content-Disposition: attachment; filename=\"$sFileName\"\n";
    print SENDMAIL "Content-Transfer-Encoding: base64\n\n";
    print SENDMAIL myExcelLib::base64_encode_file("$sTmpDir$sFileName")."\n";
    print SENDMAIL "\n--$sBoundary--\n";

    # Delete the local copy of the Excel sheet
    return "$sMessage - has been emailed\n";
return 'Error: Unable to send email report';

## End ###

Open in new window

Note: You'll obviously need to alter the: SQL query, email recipients, email originator, email subject, and message. You may also need to alter the location of the sendmail binary and the instantiation of $sTmpDir
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Join & Write a Comment

Explain concepts important to validation of email addresses with regular expressions. Applies to most languages/tools that uses regular expressions. Consider email address RFCs: Look at HTML5 form input element (with type=email) regex pattern: T…
Six Sigma Control Plans

Keep in touch with Experts Exchange

Tech news and trends delivered to your inbox every month