Link to home
Start Free TrialLog in
Avatar of day6
day6Flag for United States of America

asked on

Need help adding files to a MySQL table, but want to simply export an html table as a xls or csv file.

Can't seem to figure out how to save this html table as an excel file or csv. My preference is to be able to write it directly to a MySQL table using ColdFusion but can't figure out how to do that either because the html table is just an output of a complex set of queries that include summary calculations, etc. I am ultimately trying to consolidate a MySQL table that has nearly 100,000 rows into a new table that calculates and summarizes the overall rows into a summary table like this page.

http://day6management.com/Scripts/distro_comp.cfm

I intend to simply import the excel or csv to my MySQL database as a new table using PHPMyAdmin's import tool. I hate that I need this extra step, but I am not a good enough coldfusion programmer to know how to build an array, etc. It's just too complex for my experience. So this is my possible workaround.
Avatar of _agx_
_agx_
Flag of United States of America image

Can you show how are you generating that html table? ie Outputting a cfquery?

<cfoutput query="yourQuery">
   <tr><td>#artist#</td> 
            <td>#year#</td> 
            <td>#song#</td>
            ... more columns
    </tr>
</cfoutput>

Open in new window

Avatar of day6

ASKER

<cfoutput>


<cfquery datasource="#Mydsn#" name="qARTIST">
SELECT * FROM distro
WHERE 1=1
GROUP BY artist
ORDER BY artist
Limit 1
</cfquery>
<table width="100%" id="distro_comp">
<tr>
<td>artist</td><td>year</td><td>song</td><td>method</td><td>store</td><td>catg</td><td>jan</td><td>feb</td><td>mar</td><td>apr</td><td>may</td><td>jun</td><td>jul</td><td>aug</td><td>sep</td><td>oct</td><td>nov</td><td>dec</td><td>total</td></tr>
<cfloop query="qARTIST">
<!--- Find release year --->
<cfquery name="qYR" datasource="#mydsn#">
SELECT Extract(YEAR from reporting_date) as YR FROM distro
WHERE artist='#qARTIST.artist#'
GROUP BY YR
ORDER BY YR DESC
</cfquery>
<cfset yr1=qYR.recordcount>
<!--- After YEAR is determined, then find what songs released that year and loop over them --->
<cfquery name="qSONGS" datasource="#mydsn#">
SELECT title FROM distro
WHERE artist='#qARTIST.artist#'
AND extract(YEAR from reporting_date)='#qYR.yr#'
GROUP BY title
ORDER BY title
</cfquery>

<cfloop query="qYR">

<cfloop query="qSONGS">
<!--- Now that we know the song, choose what stores sold them and by what method stream or download. First we determine method and then decide qty and then revenue by month --->

<cfquery name="dwnld" datasource="#mydsn#">
SELECT store as STRE, SUM(quantity) as SQTY FROM distro
WHERE artist='#qARTIST.artist#'
AND extract(YEAR from reporting_date)='#qYR.yr#'
AND title='#qSONGS.title#'
AND (store='Amazon' OR store='iTunes' OR store='Google Play' OR store='MediaNet' OR store='Groove(Downloads)')
GROUP BY store
ORDER BY store
</cfquery>

<cfquery name="strm" datasource="#mydsn#">
SELECT store as STRE, SUM(quantity) as SQTY FROM distro
WHERE artist='#qARTIST.artist#'
AND extract(YEAR from reporting_date)='#qYR.yr#'
AND title='#qSONGS.title#'
AND store<>'Amazon' 
AND store<>'iTunes' 
AND store<>'Google Play' 
AND store<>'MediaNet' 
AND store<>'Groove(Downloads)'
GROUP BY store
ORDER BY store
</cfquery>

<cfloop list="dwnld|strm" index="qry" delimiters="|">

<cfset meth=#qry#>


<cfloop query="#qry#">
<!--- Now that we have a loop of stores, we must sort by month a sum total of qtys and revenues. QTY first and the REV --->

  <tr>
    <td>#Ucase(qARTIST.artist)# </td>
    <td>#qYR.yr#</td>
    <td>#qSONGS.title#</td>
    <td>#meth#</td>
    <td>#STRE#</td>
    <td>qty</td>
    <cfset row=0>
<cfloop from="1" to="12" index="mn">
<cfset row=row+1>
<cfquery name="qMon" datasource="#mydsn#">
SELECT Sum(quantity) as QTY from Distro
WHERE artist='#qARTIST.ARTIST#'
AND title='#qSONGS.TITLE#'
AND store='#stre#'
AND Extract(YEAR from reporting_date)='#qYR.yr#'
AND Extract(MONTH from reporting_date)='#row#'
</cfquery>
<cfset cat="qty">
<cfif qMON.QTY is "">
<cfset qt=0>
<cfelse>
<cfset qt=qMON.QTY>
</cfif>
	
    <cfif mn is 1><td>#QT#</td></cfif>
    <cfif mn is 2><td>#QT#</td></cfif>
    <cfif mn is 3><td>#QT#</td></cfif>
    <cfif mn is 4><td>#QT#</td></cfif>
    <cfif mn is 5><td>#QT#</td></cfif>
    <cfif mn is 6><td>#QT#</td></cfif>
    <cfif mn is 7><td>#QT#</td></cfif>
    <cfif mn is 8><td>#QT#</td></cfif>
    <cfif mn is 9><td>#QT#</td></cfif>
    <cfif mn is 10><td>#QT#</td></cfif>
    <cfif mn is 11><td>#QT#</td></cfif>
    <cfif mn is 12><td>#QT#</td></cfif>
    </cfloop>
    <td>#sqty#</td></tr>
    </cfloop>

</cfloop>
</cfloop>
</cfloop>
</cfloop>
</table>

</cfoutput>

Open in new window

SOLUTION
Avatar of _agx_
_agx_
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of day6

ASKER

@agx  Thank you so much for the easy to follow instructions. I'm going to test it out tonight. I wanted to just clarify that I was correct in it not being able to just be written to my MySQL database table straight from <cfoutput> but, indeed, I need to create the TSV file as a middleman document and then import that or append the table from that. I hate that I have to create a file to import versus just using CFML to directly write to the table, but I just couldn't see how to do that with so many queries and cfloops required to get the output I have.

I'll post my results asap.
Technically you could run a bunch of cfquery's, within the cfloop's, but it's not a great way to bulk insert data IMO. Running 1 query per record can be significantly slower than LOAD DATA. Also, if 1 of the queries crashes within the loop, it's hard to reconstruct what happened, then figure out how to insert the remaining data.
Avatar of day6

ASKER

@agx

This is NOT a one time script. It is something I need to do, perhaps, one time a month or maybe twice a month, but the source DB table is updated daily with new rows to calculate. I just don't need to consolidate it daily.

HOWEVER... I'm working this as a beta solution for possibly expanding it to more than one table and likely tables that have many more than 100,000 rows. (millions of rows). I'm hoping to create a workflow that doesn't tax my server memory or CF11 or MySQL processing times. It's just a horrendous time eater doing it manually and being able to write various reports and output them to TSV or XLS format will be a huge help for me since I share the results with others who aren't able to do code or read tables and make sense of them.
If it's a recurring task, I'd probably try and simplify the queries, or create a stored procedure to do it - but I figured you were most interested in getting the import working  first.  Then improve it.

I'm hoping to create a workflow that doesn't tax my server memory or CF11 or MySQL processing times

Good, then LOAD DATA or a stored proc is the way to go.  I don't have time to do it tonight, but maybe open a new thread and post the table DLL's and some sample data and I could help you streamline it for this table. Then you could use it as a model for other tables.
I'm going to head out, but the example above worked on my machine (CF11 & MySQL 5+) so I'm pretty confident it'll work for you.
Avatar of day6

ASKER

Tested everything and it works as expected. I have a few tweaks to make to things, but overall the solution is great!
Avatar of day6

ASKER

@agx

Unfortunately, when I do the entire database with all the artists, it times out in the browser even though I have the CF Admin set to timeout after 1 hour. Not sure how to make this work. I tried it as a scheduled task in CF11 and that didn't run the script either. I looked at the logs and don't see the script throwing any errors, but I assume it's a browser timeout. I just don't know how to get the script to run full way through.
Updated:
Did the original version (that generated an HTML table) work with the whole database (probably no)? I'd guess all of querying within a loop is taking too long, and it needs to either

a) broken up into smaller batches OR
b) the queries refactored.

Questions:

1. Since all of the data seems to be from one table, can you post the CREATE TABLE statement?
2. How many records total (approximately)?
Avatar of day6

ASKER

I tested the same script in chrome and it didn't timeout there. For some reason, Firefox times out. I even modified Firefox timeout through the about:config menu. I ended up adding the scripts timeoutrequest to 0 which I read in cf11 into mean NO timeout at all.

So it takes about 45-50 mins to process.

My starting DB  table has 86,000 rows and it consolidated down to 3500 rows for my new table. It will continue to grow well past 85k rows next time so I imagine this will only become a longer process.

The only thought may be to reduce that original table by purging rows dating back three and four years since now it is unlikely that reporting will change. It is highly possible data from 1-2 years back could be changed so I imagine that needs to stay.

The initial compiler script seems to be the main concern since I will likely do this same routine on additional tables that start out much larger than 85k rows.
>> So it takes about 45-50 mins to process.

Well, 86K records isn't a lot, so that's a long time.  Consolidating shouldn't take that long.  Most likely it's due to all the loops and queries. Looping tends to slow everything down.  I suspect the time would be a lot less without all those inner cfqueries.  If you want to post the CREATE TABLE dll, I'd be happy to take a look. See what improvements I can suggest.

My shop mostly uses SQL Server, but in some of our apps we've queried tables with millions of records, without an issue.  Though creating normalized tables and applying good indexes have a lot to do with that.  If your app primarily does reporting, then consolidating isn't necessarily a bad idea, but I'm curious what problem is it trying to solve? Query performance issues or something else?
Avatar of day6

ASKER

@agx

I don't know what you mean by CREATE TABLE dll. I don't know what you're asking for.
I'd like to look at the SQL used to create your "distro" table (and create the table in my db so I can test queries).  Different tools have different options. From CF, the simplest way is probably:

<cfquery name="qry" datasource="YourDatasourceName">
	SHOW CREATE TABLE distro
</cfquery>
<cfdump var="#qry#">

Open in new window


The one of the columns in that dump will contain the "CREATE TABLE" SQL statement I'm looking for.  For example, it'll look something like this:

CREATE TABLE `distro` ( `recordID` int(11) NOT NULL AUTO_INCREMENT, `artist` varchar(100) DEFAULT NULL, `year` int(11) DEFAULT NULL, `song` varchar(100) DEFAULT NULL, `method` varchar(100) DEFAULT NULL, `catg` varchar(100) DEFAULT NULL, `jan` int(11) DEFAULT NULL, `feb` int(11) DEFAULT NULL, `mar` int(11) DEFAULT NULL, `apr` int(11) DEFAULT NULL, `may` int(11) DEFAULT NULL, `jun` int(11) DEFAULT NULL, `jul` int(11) DEFAULT NULL, `aug` int(11) DEFAULT NULL, `sep` int(11) DEFAULT NULL, `oct` int(11) DEFAULT NULL, `nov` int(11) DEFAULT NULL, `dec` int(11) DEFAULT NULL, `total` int(11) DEFAULT NULL, PRIMARY KEY (`recordID`) ) ENGINE=InnoDB AUTO_INCREMENT=25 DEFAULT CHARSET=latin1 

Open in new window

Avatar of day6

ASKER

http://day6management.com/scripts/distrotablestruct.cfm

I can tell you now that there is no index or primary key column because I use PHPMyAdmin to import a TSV file of 85k records and that file has no key field. I've tried to import the TSV and unless my column count is identical to the table I'M importing it into, the import fails (in PHPMyAdmin).

I'm going to give you a sample TSV file so you know what I'm importing to the distro table in MySQL.
Dec2017.txt
Thanks.

Yeah, not every tool let's you import only select columns.  

So is `distro` a table you regularly import into from an outside source, or is it updated through your application?  Also, what was the issue you're ultimately trying to solve by consolidating the data into another table (minimize total records, improve performance, ...)?
Avatar of day6

ASKER

I only import TSV external files into the DISTRO table. I occasionally have to tweak column data in Google Sheets as a TSV if for some reason the file comes to me with a different header column added or taken away. I make sure the data in the TSV is accurate to my MySQL table columns.

My original goal was to make creating PDF reports much quicker (on the fly) than to have to crunch through 85k records each time a unique report query was being generated by the user of the interface. Based on knowing that each report query may be subject to conditional summaries, it would be much faster to query a smaller table than the growing table DISTRO.  It won't be the same report everytime. It may involve date ranges, specific artists, specific songs, specific retail stores, etc. So the report builder will allow the user to add what information they want to see as their custom report.

I will, however, have my own internal standard reports that do repeat each time, so those queries won't be a headache.
Edit: Added ORDER BY clause to query

Can you try this query (for a single artist like 'Spencer Kane')?  If it works, compare the totals to what you get in the original HTML table. That's just to verify the SQL logic is ccrrect.
http://day6management.com/Scripts/distro_comp.cfm

If the totals for a single artist seem correct, comment out the WHERE  cause and see if the query runs with for all 86k rows .

SELECT Artist
	, Extract(YEAR from Reporting_Date) AS Yr
    , Title AS Song
    , CASE WHEN Store IN ('Amazon', store='iTunes', 'Google Play','MediaNet','Groove(Downloads)') THEN 'dwnld'
		   WHEN Store NOT IN ('Amazon','iTunes','Google Play','MediaNet','Groove(Downloads)') THEN 'strm'
           ELSE 'other'
	   END AS Meth
	, Store 
    , 'qty' AS Catg
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 1 THEN Quantity ELSE 0 END) AS Jan
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 2 THEN Quantity ELSE 0 END) AS Feb    
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 3 THEN Quantity ELSE 0 END) AS Mar        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 4 THEN Quantity ELSE 0 END) AS Apr        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 5 THEN Quantity ELSE 0 END) AS May        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 6 THEN Quantity ELSE 0 END) AS Jun        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 7 THEN Quantity ELSE 0 END) AS Jul        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 8 THEN Quantity ELSE 0 END) AS Aug        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 9 THEN Quantity ELSE 0 END) AS Sep        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 10 THEN Quantity ELSE 0 END) AS Oct        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 11 THEN Quantity ELSE 0 END) AS Nov        
    , SUM(CASE WHEN Extract(MONTH from reporting_date) = 12 THEN Quantity ELSE 0 END) AS `Dec`
FROM  distro
WHERE  artist = 'Spencer Kane'
GROUP BY     
		Artist
		, Extract(YEAR from Reporting_Date) 
		, Title
		, CASE WHEN Store IN ('Amazon', store='iTunes', 'Google Play','MediaNet','Groove(Downloads)') THEN 'dwnld'
			   WHEN Store NOT IN ('Amazon','iTunes','Google Play','MediaNet','Groove(Downloads)') THEN 'strm'
			   ELSE 'other'
		   END 
		, Store 
ORDER BY Artist
		, Yr
        , Song 
        , Meth
		, Store
    

Open in new window

Avatar of day6

ASKER

@agx

So this sample query replaces the CFLOOP and multiple QUERIES on my original CF code I posted? The output of this query generating an HTML table will work with this above new query you typed? I'm just double checking.
so this sample query replaces the CFLOOP and multiple QUERIES on my original CF code I posted?

In theory, yes.   The one thing it doesn't include is a "total" column, but that can easily be done either in code or with an UPDATE.  If the new query works, you could ultimately skip generating a temp file and the LOAD DATA altogether.  Just insert the records directly into the other table with SQL like this:

         INSERT INTO yourtargettable (... column names ....)
         SELECT ... column names
         FROM    distro
         GROUP BY ....

But let's see how it works first.

The output of this query generating an HTML table will work with this above new query you typed? I'm just double checking.

Yes, it should have all the same columns except for "total".
Sorry, just caught a logic error in the SQL. Change this:

CASE WHEN Store IN ('Amazon', store='iTunes', 'Google Play'

to this:

CASE WHEN Store IN ('Amazon', 'iTunes', 'Google Play'
Avatar of day6

ASKER

@agx

Everything worked in less than 30 seconds for the entire 85k search, compilation and insert into my other table. For ALL artists. I added another query to total $sales by month by modifying your query to hunt for earnings_usd instead of Quantity on the monthly summaries.

I've recreated my results page and will now be beefing up the GUI for a few betatesters. This is awesome. Thank you.
Great.   Just fyi, if needed, you can also run ALL artists for a specific year by adding a WHERE clause.  

Say you wanted to query for the entire year of 2017, just calculate the 1st of that year:

                <cfset firstDayOfYear = createdate(2017, 1, 1)>

Then use the date in the WHERE clause.  ie Find all records where the "reporting_date" is between Jan 1, 2017 and Dec 31, 2017

   SELECT   .... columns
   FROM     distro
   WHERE  reporting_date >= <cfqueryparam value="#firstDayOfYear#" cfsqltype="cf_sql_date">
   AND       reporting_date < <cfqueryparam value="#dateAdd('yyyy', 1, firstDayOfYear)#" cfsqltype="cf_sql_date">
   GROUP BY .... columns
One last thing. Looking over the query I realized it can include the "Total" column too.  Just add a SUM right after the `Dec` column

SELECT .... other columns
         , SUM(CASE WHEN Extract(MONTH from reporting_date) = 12 THEN Quantity ELSE 0 END) AS `Dec`
   , SUM(Quantity) AS Total
FROM  distro
GROUP BY ....