• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 663
  • Last Modified:

Import CSV data into mysql conditionally

Hi, I am importing data into MySQL contained in a CSV file. I am using a simple php script for that. The import is working fine, no problems there.

Orderx is a column in my CSV file that has some identical values.

What I need to do "before" importing that data into mysql is to be able to identify how many rows of identical value records of field 'Orderx' I have and then for each instance of those identical set of rows, instead of uploading those rows as separate rows in mySQL I want to be able to upload these rows as just one row.

Also since I am trimming the 4 rows into one, I am looking for the script to perform some mathematical operations for remaining fields.

For each occurance of identical value of Orderx being imported as one row, the field Qty has to be added and then imported, for example in my sample data for values of field 'Qty' (1200+300+1200+33) respectively, the value '3000' would be imported.

Same addition has to be performed for the field 'Total' (2615.98+653.99+2615.98+646.99) = 6532.94. So the value '6532.94' will be imported for the field 'Total'.

The 'Symbol' field is also going to be identical so that would show once.

The 'Price' field will be averaged (2.18+2.18+2.18+2.18)/4 , therefore the value '2.18' would be imported.

'TradeDate' is going to be same so that can be imported once.

'TimeCompleted' will have almost similar values, so the first one from the identical rows can be picked.

I am therefore looking for code that would enable the following and I am assuming the code will reside along with the php import code snippet that I have provided.

Thank you very much in advance.
The php code used to import CSV into Mysql
 
<?
require_once("conn.php");
require_once("access.php");
require_once("includes.php");
if(isset($_POST['submit']))
 
   {
 
     $filename=$_POST['filename'];
 
     $handle = fopen("$filename", "r");
 
     while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
 
     {
 
       $import="INSERT into re_listings(AgentID,Action,Qty,Symbol,Price,Total,Orderx,TradeDate,TimeCompleted) values('$data[0]','$data[1]','$data[2]','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]','$data[8]')";
 
       mysql_query($import) or die(mysql_error());
 
     }
 
     fclose($handle);
 
     print "Import done";
 
   }
 
   else
 
   {
 
      print "<form action='import.php' method='post'>";
 
      print "Type file name to import:<br>";
 
      print "<input type='text' name='filename' size='20'><br>";
 
      print "<input type='submit' name='submit' value='submit'></form>";
 
   }
?>
 
 
 
 
AS FOR WHAT I AM TRYING TO DO:
 
 
I WOULD LIKE THE FOLLOWING:
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:06 PM
91	SELL	300	QTWW	2.18	653.99	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	300	QTWW	2.18	646.99	XMC913020080609	6/2/2008	3:58:37 PM
 
 
 
TO BE IMPORTED AS:
 
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	3000	QTWW	2.18	6532.94	XMC913020080609	6/2/2008	3:58:37 PM
 
Sorry about the caps.
 
 
Remember there are multiple instances of similar values of 'Orderx' among hundreds of rows of data so I am thinking a loop has to be performed to pull out all different instances of identical values of 'OrderX'  and upload each one of those instances as one row each.
 
For example :
In my data I could have:
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:06 PM
91	SELL	300	QTWW	2.18	653.99	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	300	QTWW	2.18	646.99	XMC913020080609	6/2/2008	3:58:37 PM
 
followed by another instance of similar values of Orderx
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	500	ABCD	5.22	5615.98	ABC838385862829	6/2/2008	1:59:06 PM
91	SELL	300	ABCD	5.22	753.99	ABC838385862829	6/2/2008	1:59:03 PM
91	SELL	1400	ABCD	5.22	415.98	ABC838385862829	6/2/2008	1:59:03 PM
 
 
For both above instances of similar values of Orderx I would like 2 unique records imported based on the rules I mentioned above.
 
Thanks in advance.

Open in new window

0
rts5678
Asked:
rts5678
  • 6
  • 4
1 Solution
 
rts5678Author Commented:
Thank you very much for your help. If you have any questions, please send me message.
0
 
Marcus BointonCommented:
How many rows of data do you have?

Have you considered keeping the data as it is and doing the calculations in the DB?
0
 
rts5678Author Commented:
I have hundreds of rows of data on an ongoing basis. I could do calculations in the database, but the multiple rows with similar Orderx value still have to be imported in database as 1 row.

How do I go about that?
0
Free Backup Tool for VMware and Hyper-V

Restore full virtual machine or individual guest files from 19 common file systems directly from the backup file. Schedule VM backups with PowerShell scripts. Set desired time, lean back and let the script to notify you via email upon completion.  

 
Marcus BointonCommented:
Why do they need to be one row? It sounds like a textbook case to implement as two tables and do a join to get the averages/sum/count etc
0
 
rts5678Author Commented:
What it is that one unique value of Orderx represents a buy, or sell for a stock market trade.

In my sample data (which is reality based), a customer bought 3000 shares of a stock of the symbol QTWW. Now the stock market sometimes executes that one order in parts, in this case  4 parts (selling 1200,300,1200 and then again 300), however, since it was one order by the customer, the order number remained the same on all 4 transactions (XMC913020080609).

Now if the customer sold those 3000 shares, that order could be executed as one transaction, or again in multiple partial transactions, but as always, it will be assigned just 1 unique Order number.

Ok so now, we have a full trade. Buying of 3000 shares, and then selling of 3000 shares. Once the buy and sell can be imported as 1 row each, the analytics of trade can be calculated afterwards via mySQL and php.

But the import is the big issue right now.

I am hoping you understand what I am after.

I am open to ideas. Let me know what could be the best way to accomplish this along with code suggestions on your end.  

Thanks so much again. I'll await your response.

I am providing a larger data sample this time.
AgentID	Action	Qty	Symbol	Price	Total	Orderx	Trade Date
91	SELL	200	CRDC	$9.90 	$1,979.98 	XAA988020080610	6/10/2008
91	SELL	500	CRDC	$9.90 	$4,942.97 	XAA988020080610	6/10/2008
91	SELL	100	CRDC	$9.90 	$982.99 	XPD612020080610	6/10/2008
91	SELL	100	CRDC	$9.90 	$989.99 	XKD857420080610	6/10/2008
91	SELL	100	CRDC	$9.90 	$982.99 	XKD857420080610	6/10/2008
91	BUY	400	CRDC	$9.70 	$3,880.00 	XGC140820080610	6/10/2008
91	BUY	600	CRDC	$9.70 	$5,827.00 	XGC140820080610	6/10/2008
91	SELL	600	CRDC	$9.78 	$5,860.96 	XFC118820080610	6/10/2008
91	SELL	400	CRDC	$9.78 	$3,911.97 	XFC118820080610	6/10/2008
91	BUY	1000	CRDC	$9.68 	$9,686.90 	XBA941320080610	6/10/2008
91	SELL	180	CRDC	$9.28 	$1,663.39 	XIA577920080610	6/10/2008
91	SELL	120	CRDC	$9.27 	$1,112.40 	XIA577920080610	6/10/2008
91	SELL	700	CRDC	$9.35 	$6,537.96 	XTA833120080610	6/10/2008
91	BUY	1000	CRDC	$9.44 	$9,446.90 	XIB977220080610	6/10/2008
91	SELL	300	CRDC	$9.27 	$2,774.01 	XPB800220080610	6/10/2008
91	SELL	1000	CRDC	$9.15 	$9,142.94 	XLC818420080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	150	CRDC	$9.00 	$1,350.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	1323	CRDC	$9.00 	$11,907.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	27	CRDC	$8.99 	$249.73 	XCD354320080610	6/10/2008

Open in new window

0
 
rts5678Author Commented:
Also understand that we cannot lump (add up) all sells and all buys for the same symbol and try to calculate profit/loss.

Instead we have to perform those analytics on an order by order basis. Hence the importance of sorting these transactions out by Orderx.
0
 
rts5678Author Commented:
"What it is that one unique value of Orderx represents a buy, or sell for a stock market trade."

My above statement in an earlier comment for sake of clarity is supposed to be read like this.

"What it is that one set of unique values of Orderx represents a buy, or sell for a stock market trade."
0
 
Marcus BointonCommented:
OK, after everything you describe, it still seems like a very good fit for two tables. You would have one listing order numbers, the date, whether it was a buy or a sell (essentially everything that is common to all of the individual transactions), then another table indexed by transaction id with all the elements linked to that trade. The problem with working everything out on import is that you lose the history of the individual parts of the trade. If that information is not of value, then fine, but if you're only throwing away information because you don't know how to keep it, that's another.
Also, you can't work out the average priced based only on the price because the number of shares bought at that price alters the overall value - if you bought 100 at 2.0 and 1000 at 2.10, the average price would be 2.09, not 2.05. The total price should match the average price * the total number of shares.
0
 
rts5678Author Commented:
Thanks for your helps so far.

So from the fields I have in the above data, from your suggestions can you tell me exactly what fields go in 'table 1', and what fields go in 'table2' ??
0
 
Marcus BointonCommented:
Well, I can only guess as I don't have all the facts. I'll call the two tables trades and transactions, where one trade can have many transactions. From the data presented, it looks like this:

Trade:
orderx
agentid
action
symbol

Transaction:
orderx
qty
price
date
time

I'm assuming that not all transactions for a trade have to occur on the same day, also that all transactions are for the same symbol, have the same action (i.e. you can't mix buy and sell in one trade), and are owned by one agent. You don't need to store total as you can just multiply price by qty.

One other reason for maintaining the original data is what happens if you add more transactions to a trade on two separate imports?

After all that, import code is pretty similar to what you have already (not tested):
<?
require_once("conn.php");
require_once("access.php");
require_once("includes.php");
if(isset($_POST['submit'])) {
 
	$filename=$_POST['filename'];
 
	$handle = fopen("$filename", "r");
 
	$lastorderx = '';
	while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
		if ($lastorderx != $data[6]) { //This is a new order
			//INSERT IGNORE so we can skip duplicates
			mysql_query("INSERT IGNORE INTO trade SET orderx = '$data[6]', action = '$data[1]', symbol = '$data[3]', agentid = '$data[0]'");
			$lastorderx = $data[6];
		}
		$import="INSERT INTO trans SET orderx = '$data[6]', price = '$data[4]', qty = '$data[3]', tradedate = '$data[7]', timecompleted = '$data[8]'";
		 mysql_query($import) or die(mysql_error());
	}
	fclose($handle);
	print "Import done";
 } else {
	 print "<form action='import.php' method='post'>";
	 print "Type file name to import:<br>";
	 print "<input type='text' name='filename' size='20'><br>";
	 print "<input type='submit' name='submit' value='submit'></form>";
 }
?>

Open in new window

0

Featured Post

Granular recovery for Microsoft Exchange

With Veeam Explorer for Microsoft Exchange you can choose the Exchange Servers and restore points you’re interested in, and Veeam Explorer will present the contents of those mailbox stores for browsing, searching and exporting.

  • 6
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now