Link to home
Start Free TrialLog in
Avatar of jgrammer42
jgrammer42

asked on

Advice needed on storage of unstructured text data

I have over 60 stores that use cash registers for sales.  I have a method of collecting every transaction from a cash register by capturing the cash registers electronic journal through the serial port.  The data is all text based, but very unstructured.  (I am including a sample file of this data capture to this post.)

The question I have is this:  IF you were tasked with storing, then retrieving and analyzing these data; what type of database storage system would you use?  And why that one?  

I am waffling on whether to use a SQL or NoSQL solution.

Thank you,
Jeff

Here is a sample of the raw data capture.  (Note: about half of the way down there is something called a "Z report"; which is run when a cash register is closed out.  That zero's out all of the balances and totals during that period of time it was operating.)

DATE  10/09/2019 WED  TIME 02:17
                               
2X             @ 4.29          
Purses/Backpack T1         $8.58
Childrens Clothing T1      $1.99
TAX1                       $0.63
TOTAL                     $11.20
CASH                      $12.00
CHANGE                     $0.80
CLERK 1        No.000072   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:18
                               
5X             @ 5.79          
Jeans T1                  $28.95
Sweater/Sweatshirt T1      $4.59
3X             @ 2.59          
T-Shirt/Cami T1            $7.77
50% OFF                 -50.000%
AMOUNT T1                  -3.89
Furniture T1              $25.00
TAX1                       $3.75
TOTAL                     $66.17
CASH                      $67.00
CHANGE                     $0.83
CLERK 1        No.000073   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:18
                               
Winter Coat T1             $6.29
2X             @ 4.59          
Sweater/Sweatshirt T1      $9.18
Elec/Mech T1              $15.00
TAX1                       $1.83
TOTAL                     $32.30
Credit Card               $32.30
--------------------------------
--------------------------------
CLERK 1        No.000074   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:19
                               
Shoes T1                  $25.00
4X             @ 1.99          
Childrens Clothing T1      $7.96
Newborn - 5t T1            $0.99
Shirt T1                   $4.29
2X             @ 2.59          
T-Shirt/Cami T1            $5.18
DONATION                   $0.92
DONATION                   $0.05
TAX1                       $2.61
TOTAL                     $47.00
Credit Card               $47.00
--------------------------------
--------------------------------
CLERK 1        No.000075   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:19
                               
Dress T1                   $5.79
Jacket/Blazer T1           $6.29
Skirts T1                  $3.99
3X             @ 2.59          
Bra/Slip T1                $7.77
TAX1                       $1.43
TOTAL                     $25.27
CASH                      $30.00
CHANGE                     $4.73
CLERK 1        No.000076   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:20
                               
Z 1  REPORT                00007
LAST REPORT           10/08/2019
--------------------------------
FINANCIAL                      
                               
+PLU TTL                      42
                         $217.61
ADJST TTL                     42
                         $217.61
--------------------------------
NONTAX                     $0.97
Sales                    $216.64
Tax                       $13.01
% 1                            1
                           -3.89
Customer #                     7
                         $230.62
GROSS SALES              $234.51
CASH SALES                     5
                         $151.32
CASH-IN-D                      5
                         $151.32
CHG1-IN-D                      2
                          $79.30
Credit Sales                   2
                          $79.30
DRWR TTL                 $230.62
AVG $/CUST                $32.94
OPEN CHECK TOTAL           $0.00
********************************
                               
GRAND                   $1429.83
                               
CLERK 1        No.000077   00000
                               
                               
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:20
                               
Z 1  REPORT                00007
LAST REPORT           10/08/2019
--------------------------------
ALL CLERK                      
                               
Management                      
NET SALE                       7
                         $230.62
DRWR TTL                 $230.62
--------------------------------
AP-200793                      
--------------------------------
BT-200803                      
--------------------------------
CP-201588                      
--------------------------------
DM-402789                      
--------------------------------
JH-406479                      
--------------------------------
KR-200543                      
--------------------------------
LB-404475                      
--------------------------------
PH-200890                      
--------------------------------
C10                            
--------------------------------
C11                            
--------------------------------
C12                            
--------------------------------
C13                            
--------------------------------
C14                            
--------------------------------
CLERK 15                        
--------------------------------
CLERK 16                        
--------------------------------
CLERK 17                        
--------------------------------
CLERK 18                        
--------------------------------
CLERK 19                        
--------------------------------
CLERK 20                        
--------------------------------
CLERK 21                        
--------------------------------
CLERK 22                        
--------------------------------
CLERK 23                        
--------------------------------
CLERK 24                        
--------------------------------
CLERK 25                        
--------------------------------
                               
CLERK 1        No.000078   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:20
                               
Z 1  REPORT                00007
LAST REPORT           10/08/2019
--------------------------------
GROUP                          
                               
GROUP : 1 TEXTILES              
 CNT                          35
 SALES AMT               $143.06
GROUP : 2 FURNITURE            
 CNT                           1
 SALES AMT                $25.00
GROUP : 3 ELEC/MECH            
 CNT                           1
 SALES AMT                $15.00
GROUP : 4 ACCESSORIES          
 CNT                           3
 SALES AMT                $33.58
GROUP : 10 DONATION            
 CNT                           2
 SALES AMT                 $0.97
********************************
TOTAL CNT                     42
TOTAL AMT                $217.61
                               
CLERK 1        No.000079   00000
                               
                               
                               
                               
                               
                               
                               
DATE  10/09/2019 WED  TIME 02:21
                               
Z 1  REPORT                00007
LAST REPORT           10/08/2019
--------------------------------
ALL PLU                        
                               
Shirt                          
 CNT 1                     $4.29
T-Shirt/Cami                    
 CNT 5                     $9.06
Sweater/Sweatshirt              
 CNT 5                    $22.95
Winter Coat                    
 CNT 5                    $31.45
Jeans                          
 CNT 7                    $40.53
Skirts                          
 CNT 1                     $3.99
Dress                          
 CNT 1                     $5.79
Jacket/Blazer                  
 CNT 1                     $6.29
Bra/Slip                        
 CNT 3                     $7.77
Childrens Clothing              
 CNT 5                     $9.95
Newborn - 5t                    
 CNT 1                     $0.99
Purses/Backpack                
 CNT 2                     $8.58
Furniture                      
 CNT 1                    $25.00
Elec/Mech                      
 CNT 1                    $15.00
Shoes                          
 CNT 1                    $25.00
DONATION                        
 CNT 2                     $0.97
********************************
TOTAL CNT                     42
TOTAL AMT                $217.61
                               
CLERK 1        No.000080   00000
Avatar of Dustin Saunders
Dustin Saunders
Flag of United States of America image

I assume these will be parsed and broken out into actual data?  

But for storing anything like this where you have transactions and are undoubtedly going to need reporting and aggregation, you'll be best served with a relational SQL database like SQL Server or MySQL.  The only advantage NoSQL has on SQL Server is the ability to scale horizontally which isn't going to be a concern for the vast majority of applications (you can even do "schemaless" implementations in SQL Server with a few tables to support the schemaless schema).

How you build those tables would be based on what you need to parse-- but you'd want to store things such as the quantity, individual cost, product name, the discount that was applied, the clerk involved, etc.  and probably the rest of the transaction such as the tax collected and total costs.

Either way, you will want to parse out the data, but on NoSQL you're going to eat it on aggregation and reporting, and you'll end up storing much more data than is required as opposed to doing something like creating a new product record for each existing product and then referencing and indexing that id.

My rule of thumb (and this isn't everyone's):  NoSQL is good for static, non-changing, non-reporting data.  So something like tons of log files that could have varying schemas, or chat conversations that won't be a part of reporting.
Keeping unstructured data means that in the end you don't have anything more like a folder full of little papers...you can't get any useful reports...you can't monitor stock, client trends...you name it...but if you want to dump them just somehow for future reference even the simple file system would be just fine (thousands of little 1K files...given that you have 60 stores x 300 days = 18000 files per year.. assuming that the stores work 6 days a week minus the holidays roughly )
What you really need is a good "parser" ...an application that will read your input and based on predefined rules split the "data" to useful data and store them in an appropriate database engine (doesn't really matter which one).
60 stores x 300 days = 18000 files per year.. assuming that the stores work 6 days a week minus the holidays roughly

Assuming of course each store only has one register. Out of curiosity, how many registers are at each location (approx)

Do the stores have a "controller/server" which report to a primary?

Or are you pulling each report manually?
ASKER CERTIFIED SOLUTION
Avatar of dpearson
dpearson

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
This makes #5 to support a SQL type solution over NoSQL.  :)

The data looks like it will actually parse rather easily.  Each transaction seems to start with a date/time stamp, followed by a list of sold items (and the transaction close-out items) or a "Z 1 REPORT" header.  There's not a lot of challenge in writing something that would convert the text into something that a load routine could handle.

Use the date/time as a transaction ID (assuming that your cashier isn't so fast that he/she can run multiple customers in under a second).  You might want to prepend that with a store-id.  Then list the sales.  You've got several special items that you'll have to decide how to handle -- "2X @ 2.29", "50 % OFF", etc. but they aren't that much of a challenge.

What you really want is the register to post a UPC or product code on every sales line.  That would give you much better reporting and analysis.  But what you have will still let you categorize sales and volumes.


Kent
Avatar of jgrammer42
jgrammer42

ASKER

kenfcamp:
Answers to your questions:   On average each store has 3 cash registers.

What I have at each location is a fanless, miniPC, with 4Gb Ram and 128Gb SSD running Linux with a USB to 4-port serial hydra cable which connects to a serial port of each cash register.   These localized servers can and will upload to a master server.  (but I have not built that yet as I am waffling on just what I should do there as a clearing house DB.)

NOTE TO ALL: This capture is for audit trails.  It will not be used to track inventory or even "live sales data".  How these data will be used on the backend is for historical reporting, trend analysis and financial audits.
Just setup a cloud VPS with MySQL and push everything there.....for the good data..
For the text part...i would setup a Firebird server...do a bit of parsing and store the Text as BLOBs..
Thank you all so far for your very valuable insight and expertise!!

This is greatly helping me think this through!  

I truly cannot thank you all enough.
It looks like you have both transaction detail segments and end-of-day reports.  I'd probably separate those two types of data segments.
Hi,

I would look at the MariaDB database and the Connect Storage Engine and create some sort of parser that can be called using REST API to move the data easily from unstructured to structured data.
The connect storage engine has also other forms to read data like this
Then use InnoDB or Aria storage engines to hold the relational tables.

Regards,
    Tomas Helgi
Should not be too difficult to write something in Perl or some other text processing language to turn this into a series of structured transaction, that will give you much better insight into your sales data than storing as free text