Using Perl to convert an Excel Workbook into a MongoDB, with each Worksheet converted into a separate MongoDB collection

Published on
13,638 Points
1 Endorsement
Last Modified:
A year or so back I was asked to have a play with MongoDB; within half an hour I had downloaded,  installed and started the daemon, and had a console window open.

After an hour or two of playing at the command line I created a database or two, a couple of collections and a number of handcrafted JSON documents. At which point I went in search of a GUI and found RockMongo.

Another half hour of playing and I had a Web based GUI, that's great for ad-hoc queries and admin tasks, but your still to left to manually handcraft and enter your own JSON documents, via a textarea box. At which point I realised that if I was to evaluate the map-reduce functionality,  attempt to join data from two collections, let alone identify and evaluate any Business Reporting tools, that I would need to start cutting some code and convert an existing data source.

Data wasn't really an issue as the Companies business was data, so a simple choice  of either hooking up to a DB or using one or more of the many Excel reports kicking around and picking a language.  After a quick surf of the MongoDB site and read of the Perl tutorial I chose the latter and Perl.

After installing the necessary Perl libraries, enter:
cpan YAML Data::Dumper Spreadsheet::ParseExcel Tie::IxHash Encode Scalar::Util  JSON MongoDB MongoDB::OID File::Basename 

Open in new window

A quick play with both the MongoDB and Spreadsheet::ParseExcel examples and a little bit of thought, I had hacked together a very basic (and slightly naughty - blindly inserts without checking the status) command line tool that will happily convert an Excel Workbook (XLS not XLSX) into:
A Database - named after the file
A Series of collections - One per Worksheet present in the  Workbook, and named accordingly
A Series of Documents in each Collection, where each document represents one row from a Worksheet, with it's Key names taken from the Cell (Column) names in Row 1 of the sheet

Anyway enough of a rant, some code.
#!/usr/bin/perl -w
# Purpose: Insert each Worksheet, in an Excel Workbook, into an existing MongoDB, of the same name as the Excel(.xls).
#          The worksheet names are mapped to the collection names, and the column named to the document hash labels.
#          Assumes each sheet is named and that the first ROW on each sheet contains the hash(field) names.
use strict;
use Spreadsheet::ParseExcel;
use MongoDB;
use MongoDB::OID;
use Tie::IxHash;
die "You must provide a filename to $0 to be parsed as an Excel file" unless @ARGV;
my $sDbName              = $ARGV[0];
   $sDbName              =~ s/\.xls//i;
my $oExcel               = new Spreadsheet::ParseExcel;
my $oBook                = $oExcel->Parse($ARGV[0]);
my $oConn                = MongoDB::Connection->new(host => 'some.server:27017');
my $oDB                  = $oConn->$sDbName;
my ($sColName, %hNewDoc, $hColToInsertInto, $sFieldName, $iR, $iC, $oWkS, $oWkC);
print "FILE  :", $oBook->{File} , "\n";
print "DB: $sDbName\n";
print "Collection Count :", $oBook->{SheetCount} , "\n";
for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++)
 $oWkS                   = $oBook->{Worksheet}[$iSheet];
 $sColName               = $oWkS->{Name};
 $hColToInsertInto       = $oDB->$sColName;
 print "Collection(WorkSheet name):", $sColName, "\n";
 for(my $iR   = $oWkS->{MinRow} ; defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ;  $iR++)
  tie ( %hNewDoc, "Tie::IxHash");
  for(my $iC = $oWkS->{MinCol} ; defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ; $iC++)
   $sFieldName           = $oWkS->{Cells}[$oWkS->{MinRow}][$iC]->Value;
   $oWkC                 = $oWkS->{Cells}[$iR][$iC];
   $hNewDoc{$sFieldName} = $oWkC->Value if($oWkC && $sFieldName);
 print "Documents inserted(Rows):", ($oWkS->{MaxRow} - $oWkS->{MinRow}), "\n";

Open in new window

Change the connection ($oConn) string to suit, and if needed add a user-id and password to the arguments.
If you need XLSX support a quick switch to Spreadsheet::XLSX is all that's needed. Alternatively it only takes a few lines of code, to detect the filetype and call the appropriate library.
The above is a simple hack, assumes everything in a cell is a string / scalar, if preserving type is important, a little function with a few regexp can be used in conjunction with a few if statements to ensure numbers / dates remain in the applicable format when written to the DB

Apparently the command line is scarey, so if asked to share consider wrapping your logic in a CGI script / upload form :)

The script should return some output along the following lines:
arober11@wibble:~/src/perl> ./mongoTST.pl testData.xls
FILE  :testData.xls
DB: testData
Collection Count :1
Collection(WorkSheet name):Sheet1
Documents inserted(Rows):244

Open in new window

Enjoy this complimentary article view.

Get unlimited access to our entire library of technical procedures, guides, and tutorials written by certified industry professionals.

Get 7 days free
Click here to view the full article

Using this article for work? Experts Exchange can benefit your whole team.

Learn More
Experts Exchange is a tech solutions provider where users receive personalized tech help from vetted certified professionals. These industry professionals also write and publish relevant articles on our site.
Ask questions about what you read
If you have a question about something within an article, you can receive help directly from the article author. Experts Exchange article authors are available to answer questions and further the discussion.
Learn from the best.