Solved

Import CSV data into mysql conditionally

Posted on 2008-06-10
12
597 Views
Last Modified: 2013-12-13
Hi, I am importing data into MySQL contained in a CSV file. I am using a simple php script for that. The import is working fine, no problems there.

Orderx is a column in my CSV file that has some identical values.

What I need to do "before" importing that data into mysql is to be able to identify how many rows of identical value records of field 'Orderx' I have and then for each instance of those identical set of rows, instead of uploading those rows as separate rows in mySQL I want to be able to upload these rows as just one row.

Also since I am trimming the 4 rows into one, I am looking for the script to perform some mathematical operations for remaining fields.

For each occurance of identical value of Orderx being imported as one row, the field Qty has to be added and then imported, for example in my sample data for values of field 'Qty' (1200+300+1200+33) respectively, the value '3000' would be imported.

Same addition has to be performed for the field 'Total' (2615.98+653.99+2615.98+646.99) = 6532.94. So the value '6532.94' will be imported for the field 'Total'.

The 'Symbol' field is also going to be identical so that would show once.

The 'Price' field will be averaged (2.18+2.18+2.18+2.18)/4 , therefore the value '2.18' would be imported.

'TradeDate' is going to be same so that can be imported once.

'TimeCompleted' will have almost similar values, so the first one from the identical rows can be picked.

I am therefore looking for code that would enable the following and I am assuming the code will reside along with the php import code snippet that I have provided.

Thank you very much in advance.
The php code used to import CSV into Mysql
 
<?
require_once("conn.php");
require_once("access.php");
require_once("includes.php");
if(isset($_POST['submit']))
 
   {
 
     $filename=$_POST['filename'];
 
     $handle = fopen("$filename", "r");
 
     while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)
 
     {
 
       $import="INSERT into re_listings(AgentID,Action,Qty,Symbol,Price,Total,Orderx,TradeDate,TimeCompleted) values('$data[0]','$data[1]','$data[2]','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]','$data[8]')";
 
       mysql_query($import) or die(mysql_error());
 
     }
 
     fclose($handle);
 
     print "Import done";
 
   }
 
   else
 
   {
 
      print "<form action='import.php' method='post'>";
 
      print "Type file name to import:<br>";
 
      print "<input type='text' name='filename' size='20'><br>";
 
      print "<input type='submit' name='submit' value='submit'></form>";
 
   }
?>
 
 
 
 
AS FOR WHAT I AM TRYING TO DO:
 
 
I WOULD LIKE THE FOLLOWING:
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:06 PM
91	SELL	300	QTWW	2.18	653.99	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	300	QTWW	2.18	646.99	XMC913020080609	6/2/2008	3:58:37 PM
 
 
 
TO BE IMPORTED AS:
 
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	3000	QTWW	2.18	6532.94	XMC913020080609	6/2/2008	3:58:37 PM
 
Sorry about the caps.
 
 
Remember there are multiple instances of similar values of 'Orderx' among hundreds of rows of data so I am thinking a loop has to be performed to pull out all different instances of identical values of 'OrderX'  and upload each one of those instances as one row each.
 
For example :
In my data I could have:
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:06 PM
91	SELL	300	QTWW	2.18	653.99	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:03 PM
91	SELL	300	QTWW	2.18	646.99	XMC913020080609	6/2/2008	3:58:37 PM
 
followed by another instance of similar values of Orderx
 
AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted
91	SELL	500	ABCD	5.22	5615.98	ABC838385862829	6/2/2008	1:59:06 PM
91	SELL	300	ABCD	5.22	753.99	ABC838385862829	6/2/2008	1:59:03 PM
91	SELL	1400	ABCD	5.22	415.98	ABC838385862829	6/2/2008	1:59:03 PM
 
 
For both above instances of similar values of Orderx I would like 2 unique records imported based on the rules I mentioned above.
 
Thanks in advance.

Open in new window

0
Comment
Question by:rts5678
  • 6
  • 4
12 Comments
 

Author Comment

by:rts5678
ID: 21749082
Thank you very much for your help. If you have any questions, please send me message.
0
 
LVL 25

Expert Comment

by:Marcus Bointon
ID: 21755255
How many rows of data do you have?

Have you considered keeping the data as it is and doing the calculations in the DB?
0
 

Author Comment

by:rts5678
ID: 21755296
I have hundreds of rows of data on an ongoing basis. I could do calculations in the database, but the multiple rows with similar Orderx value still have to be imported in database as 1 row.

How do I go about that?
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 25

Expert Comment

by:Marcus Bointon
ID: 21755617
Why do they need to be one row? It sounds like a textbook case to implement as two tables and do a join to get the averages/sum/count etc
0
 

Author Comment

by:rts5678
ID: 21755709
What it is that one unique value of Orderx represents a buy, or sell for a stock market trade.

In my sample data (which is reality based), a customer bought 3000 shares of a stock of the symbol QTWW. Now the stock market sometimes executes that one order in parts, in this case  4 parts (selling 1200,300,1200 and then again 300), however, since it was one order by the customer, the order number remained the same on all 4 transactions (XMC913020080609).

Now if the customer sold those 3000 shares, that order could be executed as one transaction, or again in multiple partial transactions, but as always, it will be assigned just 1 unique Order number.

Ok so now, we have a full trade. Buying of 3000 shares, and then selling of 3000 shares. Once the buy and sell can be imported as 1 row each, the analytics of trade can be calculated afterwards via mySQL and php.

But the import is the big issue right now.

I am hoping you understand what I am after.

I am open to ideas. Let me know what could be the best way to accomplish this along with code suggestions on your end.  

Thanks so much again. I'll await your response.

I am providing a larger data sample this time.
AgentID	Action	Qty	Symbol	Price	Total	Orderx	Trade Date
91	SELL	200	CRDC	$9.90 	$1,979.98 	XAA988020080610	6/10/2008
91	SELL	500	CRDC	$9.90 	$4,942.97 	XAA988020080610	6/10/2008
91	SELL	100	CRDC	$9.90 	$982.99 	XPD612020080610	6/10/2008
91	SELL	100	CRDC	$9.90 	$989.99 	XKD857420080610	6/10/2008
91	SELL	100	CRDC	$9.90 	$982.99 	XKD857420080610	6/10/2008
91	BUY	400	CRDC	$9.70 	$3,880.00 	XGC140820080610	6/10/2008
91	BUY	600	CRDC	$9.70 	$5,827.00 	XGC140820080610	6/10/2008
91	SELL	600	CRDC	$9.78 	$5,860.96 	XFC118820080610	6/10/2008
91	SELL	400	CRDC	$9.78 	$3,911.97 	XFC118820080610	6/10/2008
91	BUY	1000	CRDC	$9.68 	$9,686.90 	XBA941320080610	6/10/2008
91	SELL	180	CRDC	$9.28 	$1,663.39 	XIA577920080610	6/10/2008
91	SELL	120	CRDC	$9.27 	$1,112.40 	XIA577920080610	6/10/2008
91	SELL	700	CRDC	$9.35 	$6,537.96 	XTA833120080610	6/10/2008
91	BUY	1000	CRDC	$9.44 	$9,446.90 	XIB977220080610	6/10/2008
91	SELL	300	CRDC	$9.27 	$2,774.01 	XPB800220080610	6/10/2008
91	SELL	1000	CRDC	$9.15 	$9,142.94 	XLC818420080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	150	CRDC	$9.00 	$1,350.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	1323	CRDC	$9.00 	$11,907.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008
91	BUY	27	CRDC	$8.99 	$249.73 	XCD354320080610	6/10/2008

Open in new window

0
 

Author Comment

by:rts5678
ID: 21755726
Also understand that we cannot lump (add up) all sells and all buys for the same symbol and try to calculate profit/loss.

Instead we have to perform those analytics on an order by order basis. Hence the importance of sorting these transactions out by Orderx.
0
 

Author Comment

by:rts5678
ID: 21755733
"What it is that one unique value of Orderx represents a buy, or sell for a stock market trade."

My above statement in an earlier comment for sake of clarity is supposed to be read like this.

"What it is that one set of unique values of Orderx represents a buy, or sell for a stock market trade."
0
 
LVL 25

Expert Comment

by:Marcus Bointon
ID: 21757918
OK, after everything you describe, it still seems like a very good fit for two tables. You would have one listing order numbers, the date, whether it was a buy or a sell (essentially everything that is common to all of the individual transactions), then another table indexed by transaction id with all the elements linked to that trade. The problem with working everything out on import is that you lose the history of the individual parts of the trade. If that information is not of value, then fine, but if you're only throwing away information because you don't know how to keep it, that's another.
Also, you can't work out the average priced based only on the price because the number of shares bought at that price alters the overall value - if you bought 100 at 2.0 and 1000 at 2.10, the average price would be 2.09, not 2.05. The total price should match the average price * the total number of shares.
0
 

Author Comment

by:rts5678
ID: 21759665
Thanks for your helps so far.

So from the fields I have in the above data, from your suggestions can you tell me exactly what fields go in 'table 1', and what fields go in 'table2' ??
0
 
LVL 25

Accepted Solution

by:
Marcus Bointon earned 400 total points
ID: 21760981
Well, I can only guess as I don't have all the facts. I'll call the two tables trades and transactions, where one trade can have many transactions. From the data presented, it looks like this:

Trade:
orderx
agentid
action
symbol

Transaction:
orderx
qty
price
date
time

I'm assuming that not all transactions for a trade have to occur on the same day, also that all transactions are for the same symbol, have the same action (i.e. you can't mix buy and sell in one trade), and are owned by one agent. You don't need to store total as you can just multiply price by qty.

One other reason for maintaining the original data is what happens if you add more transactions to a trade on two separate imports?

After all that, import code is pretty similar to what you have already (not tested):
<?
require_once("conn.php");
require_once("access.php");
require_once("includes.php");
if(isset($_POST['submit'])) {
 
	$filename=$_POST['filename'];
 
	$handle = fopen("$filename", "r");
 
	$lastorderx = '';
	while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {
		if ($lastorderx != $data[6]) { //This is a new order
			//INSERT IGNORE so we can skip duplicates
			mysql_query("INSERT IGNORE INTO trade SET orderx = '$data[6]', action = '$data[1]', symbol = '$data[3]', agentid = '$data[0]'");
			$lastorderx = $data[6];
		}
		$import="INSERT INTO trans SET orderx = '$data[6]', price = '$data[4]', qty = '$data[3]', tradedate = '$data[7]', timecompleted = '$data[8]'";
		 mysql_query($import) or die(mysql_error());
	}
	fclose($handle);
	print "Import done";
 } else {
	 print "<form action='import.php' method='post'>";
	 print "Type file name to import:<br>";
	 print "<input type='text' name='filename' size='20'><br>";
	 print "<input type='submit' name='submit' value='submit'></form>";
 }
?>

Open in new window

0

Featured Post

Ransomware-A Revenue Bonanza for Service Providers

Ransomware – malware that gets on your customers’ computers, encrypts their data, and extorts a hefty ransom for the decryption keys – is a surging new threat.  The purpose of this eBook is to educate the reader about ransomware attacks.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Developers of all skill levels should learn to use current best practices when developing websites. However many developers, new and old, fall into the trap of using deprecated features because this is what so many tutorials and books tell them to u…
Password hashing is better than message digests or encryption, and you should be using it instead of message digests or encryption.  Find out why and how in this article, which supplements the original article on PHP Client Registration, Login, Logo…
The viewer will learn how to dynamically set the form action using jQuery.
The viewer will learn how to create a basic form using some HTML5 and PHP for later processing. Set up your basic HTML file. Open your form tag and set the method and action attributes.: (CODE) Set up your first few inputs one for the name and …

733 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question