Solved

Import CSV data into mysql conditionally

Posted on 2008-06-10
12
576 Views
Last Modified: 2013-12-13
Hi, I am importing data into MySQL contained in a CSV file. I am using a simple php script for that. The import is working fine, no problems there.

Orderx is a column in my CSV file that has some identical values.

What I need to do "before" importing that data into mysql is to be able to identify how many rows of identical value records of field 'Orderx' I have and then for each instance of those identical set of rows, instead of uploading those rows as separate rows in mySQL I want to be able to upload these rows as just one row.

Also since I am trimming the 4 rows into one, I am looking for the script to perform some mathematical operations for remaining fields.

For each occurance of identical value of Orderx being imported as one row, the field Qty has to be added and then imported, for example in my sample data for values of field 'Qty' (1200+300+1200+33) respectively, the value '3000' would be imported.

Same addition has to be performed for the field 'Total' (2615.98+653.99+2615.98+646.99) = 6532.94. So the value '6532.94' will be imported for the field 'Total'.

The 'Symbol' field is also going to be identical so that would show once.

The 'Price' field will be averaged (2.18+2.18+2.18+2.18)/4 , therefore the value '2.18' would be imported.

'TradeDate' is going to be same so that can be imported once.

'TimeCompleted' will have almost similar values, so the first one from the identical rows can be picked.

I am therefore looking for code that would enable the following and I am assuming the code will reside along with the php import code snippet that I have provided.

Thank you very much in advance.
The php code used to import CSV into Mysql

 

<?

require_once("conn.php");

require_once("access.php");

require_once("includes.php");

if(isset($_POST['submit']))

 

   {

 

     $filename=$_POST['filename'];

 

     $handle = fopen("$filename", "r");

 

     while (($data = fgetcsv($handle, 1000, ",")) !== FALSE)

 

     {

 

       $import="INSERT into re_listings(AgentID,Action,Qty,Symbol,Price,Total,Orderx,TradeDate,TimeCompleted) values('$data[0]','$data[1]','$data[2]','$data[3]','$data[4]','$data[5]','$data[6]','$data[7]','$data[8]')";

 

       mysql_query($import) or die(mysql_error());

 

     }

 

     fclose($handle);

 

     print "Import done";

 

   }

 

   else

 

   {

 

      print "<form action='import.php' method='post'>";

 

      print "Type file name to import:<br>";

 

      print "<input type='text' name='filename' size='20'><br>";

 

      print "<input type='submit' name='submit' value='submit'></form>";

 

   }

?>

 

 

 

 

AS FOR WHAT I AM TRYING TO DO:

 

 

I WOULD LIKE THE FOLLOWING:

 

AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted

91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:06 PM

91	SELL	300	QTWW	2.18	653.99	XMC913020080609	6/2/2008	3:59:03 PM

91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:03 PM

91	SELL	300	QTWW	2.18	646.99	XMC913020080609	6/2/2008	3:58:37 PM

 

 

 

TO BE IMPORTED AS:

 

 

AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted

91	SELL	3000	QTWW	2.18	6532.94	XMC913020080609	6/2/2008	3:58:37 PM

 

Sorry about the caps.
 
 

Remember there are multiple instances of similar values of 'Orderx' among hundreds of rows of data so I am thinking a loop has to be performed to pull out all different instances of identical values of 'OrderX'  and upload each one of those instances as one row each.
 

For example :

In my data I could have:
 

AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted

91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:06 PM

91	SELL	300	QTWW	2.18	653.99	XMC913020080609	6/2/2008	3:59:03 PM

91	SELL	1200	QTWW	2.18	2615.98	XMC913020080609	6/2/2008	3:59:03 PM

91	SELL	300	QTWW	2.18	646.99	XMC913020080609	6/2/2008	3:58:37 PM
 

followed by another instance of similar values of Orderx
 

AgentID	Action	Qty	Symbol	Price	Total	Orderx		TradeDate	TimeCompleted

91	SELL	500	ABCD	5.22	5615.98	ABC838385862829	6/2/2008	1:59:06 PM

91	SELL	300	ABCD	5.22	753.99	ABC838385862829	6/2/2008	1:59:03 PM

91	SELL	1400	ABCD	5.22	415.98	ABC838385862829	6/2/2008	1:59:03 PM
 
 

For both above instances of similar values of Orderx I would like 2 unique records imported based on the rules I mentioned above.
 

Thanks in advance.

Open in new window

0
Comment
Question by:rts5678
  • 6
  • 4
12 Comments
 

Author Comment

by:rts5678
ID: 21749082
Thank you very much for your help. If you have any questions, please send me message.
0
 
LVL 25

Expert Comment

by:Squinky
ID: 21755255
How many rows of data do you have?

Have you considered keeping the data as it is and doing the calculations in the DB?
0
 

Author Comment

by:rts5678
ID: 21755296
I have hundreds of rows of data on an ongoing basis. I could do calculations in the database, but the multiple rows with similar Orderx value still have to be imported in database as 1 row.

How do I go about that?
0
 
LVL 25

Expert Comment

by:Squinky
ID: 21755617
Why do they need to be one row? It sounds like a textbook case to implement as two tables and do a join to get the averages/sum/count etc
0
 

Author Comment

by:rts5678
ID: 21755709
What it is that one unique value of Orderx represents a buy, or sell for a stock market trade.

In my sample data (which is reality based), a customer bought 3000 shares of a stock of the symbol QTWW. Now the stock market sometimes executes that one order in parts, in this case  4 parts (selling 1200,300,1200 and then again 300), however, since it was one order by the customer, the order number remained the same on all 4 transactions (XMC913020080609).

Now if the customer sold those 3000 shares, that order could be executed as one transaction, or again in multiple partial transactions, but as always, it will be assigned just 1 unique Order number.

Ok so now, we have a full trade. Buying of 3000 shares, and then selling of 3000 shares. Once the buy and sell can be imported as 1 row each, the analytics of trade can be calculated afterwards via mySQL and php.

But the import is the big issue right now.

I am hoping you understand what I am after.

I am open to ideas. Let me know what could be the best way to accomplish this along with code suggestions on your end.  

Thanks so much again. I'll await your response.

I am providing a larger data sample this time.
AgentID	Action	Qty	Symbol	Price	Total	Orderx	Trade Date

91	SELL	200	CRDC	$9.90 	$1,979.98 	XAA988020080610	6/10/2008

91	SELL	500	CRDC	$9.90 	$4,942.97 	XAA988020080610	6/10/2008

91	SELL	100	CRDC	$9.90 	$982.99 	XPD612020080610	6/10/2008

91	SELL	100	CRDC	$9.90 	$989.99 	XKD857420080610	6/10/2008

91	SELL	100	CRDC	$9.90 	$982.99 	XKD857420080610	6/10/2008

91	BUY	400	CRDC	$9.70 	$3,880.00 	XGC140820080610	6/10/2008

91	BUY	600	CRDC	$9.70 	$5,827.00 	XGC140820080610	6/10/2008

91	SELL	600	CRDC	$9.78 	$5,860.96 	XFC118820080610	6/10/2008

91	SELL	400	CRDC	$9.78 	$3,911.97 	XFC118820080610	6/10/2008

91	BUY	1000	CRDC	$9.68 	$9,686.90 	XBA941320080610	6/10/2008

91	SELL	180	CRDC	$9.28 	$1,663.39 	XIA577920080610	6/10/2008

91	SELL	120	CRDC	$9.27 	$1,112.40 	XIA577920080610	6/10/2008

91	SELL	700	CRDC	$9.35 	$6,537.96 	XTA833120080610	6/10/2008

91	BUY	1000	CRDC	$9.44 	$9,446.90 	XIB977220080610	6/10/2008

91	SELL	300	CRDC	$9.27 	$2,774.01 	XPB800220080610	6/10/2008

91	SELL	1000	CRDC	$9.15 	$9,142.94 	XLC818420080610	6/10/2008

91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008

91	BUY	150	CRDC	$9.00 	$1,350.00 	XCD354320080610	6/10/2008

91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008

91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008

91	BUY	1323	CRDC	$9.00 	$11,907.00 	XCD354320080610	6/10/2008

91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008

91	BUY	100	CRDC	$9.00 	$900.00 	XCD354320080610	6/10/2008

91	BUY	27	CRDC	$8.99 	$249.73 	XCD354320080610	6/10/2008

Open in new window

0
Why You Should Analyze Threat Actor TTPs

After years of analyzing threat actor behavior, it’s become clear that at any given time there are specific tactics, techniques, and procedures (TTPs) that are particularly prevalent. By analyzing and understanding these TTPs, you can dramatically enhance your security program.

 

Author Comment

by:rts5678
ID: 21755726
Also understand that we cannot lump (add up) all sells and all buys for the same symbol and try to calculate profit/loss.

Instead we have to perform those analytics on an order by order basis. Hence the importance of sorting these transactions out by Orderx.
0
 

Author Comment

by:rts5678
ID: 21755733
"What it is that one unique value of Orderx represents a buy, or sell for a stock market trade."

My above statement in an earlier comment for sake of clarity is supposed to be read like this.

"What it is that one set of unique values of Orderx represents a buy, or sell for a stock market trade."
0
 
LVL 25

Expert Comment

by:Squinky
ID: 21757918
OK, after everything you describe, it still seems like a very good fit for two tables. You would have one listing order numbers, the date, whether it was a buy or a sell (essentially everything that is common to all of the individual transactions), then another table indexed by transaction id with all the elements linked to that trade. The problem with working everything out on import is that you lose the history of the individual parts of the trade. If that information is not of value, then fine, but if you're only throwing away information because you don't know how to keep it, that's another.
Also, you can't work out the average priced based only on the price because the number of shares bought at that price alters the overall value - if you bought 100 at 2.0 and 1000 at 2.10, the average price would be 2.09, not 2.05. The total price should match the average price * the total number of shares.
0
 

Author Comment

by:rts5678
ID: 21759665
Thanks for your helps so far.

So from the fields I have in the above data, from your suggestions can you tell me exactly what fields go in 'table 1', and what fields go in 'table2' ??
0
 
LVL 25

Accepted Solution

by:
Squinky earned 400 total points
ID: 21760981
Well, I can only guess as I don't have all the facts. I'll call the two tables trades and transactions, where one trade can have many transactions. From the data presented, it looks like this:

Trade:
orderx
agentid
action
symbol

Transaction:
orderx
qty
price
date
time

I'm assuming that not all transactions for a trade have to occur on the same day, also that all transactions are for the same symbol, have the same action (i.e. you can't mix buy and sell in one trade), and are owned by one agent. You don't need to store total as you can just multiply price by qty.

One other reason for maintaining the original data is what happens if you add more transactions to a trade on two separate imports?

After all that, import code is pretty similar to what you have already (not tested):
<?

require_once("conn.php");

require_once("access.php");

require_once("includes.php");

if(isset($_POST['submit'])) {
 

	$filename=$_POST['filename'];
 

	$handle = fopen("$filename", "r");
 

	$lastorderx = '';

	while (($data = fgetcsv($handle, 1000, ",")) !== FALSE) {

		if ($lastorderx != $data[6]) { //This is a new order

			//INSERT IGNORE so we can skip duplicates

			mysql_query("INSERT IGNORE INTO trade SET orderx = '$data[6]', action = '$data[1]', symbol = '$data[3]', agentid = '$data[0]'");

			$lastorderx = $data[6];

		}

		$import="INSERT INTO trans SET orderx = '$data[6]', price = '$data[4]', qty = '$data[3]', tradedate = '$data[7]', timecompleted = '$data[8]'";

		 mysql_query($import) or die(mysql_error());

	}

	fclose($handle);

	print "Import done";

 } else {

	 print "<form action='import.php' method='post'>";

	 print "Type file name to import:<br>";

	 print "<input type='text' name='filename' size='20'><br>";

	 print "<input type='submit' name='submit' value='submit'></form>";

 }

?>

Open in new window

0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

Popularity Can Be Measured Sometimes we deal with questions of popularity, and we need a way to collect opinions from our clients.  This article shows a simple teaching example of how we might elect a favorite color by letting our clients vote for …
Introduction Since I wrote the original article about Handling Date and Time in PHP and MySQL (http://www.experts-exchange.com/articles/201/Handling-Date-and-Time-in-PHP-and-MySQL.html) several years ago, it seemed like now was a good time to updat…
The viewer will learn how to dynamically set the form action using jQuery.
This tutorial will teach you the core code needed to finalize the addition of a watermark to your image. The viewer will use a small PHP class to learn and create a watermark.

706 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now