Link to home
Start Free TrialLog in
Avatar of steva

asked on

Loading a new database

I want to create a new MySQL database that will be loaded from about 50,000 current PayPal records. The PayPal records have the customer's information plus the product information.  I want to split this into a Customer table and an Order table.

My current thinking is to get a CSV from PayPal and input it into a Raw table in MySQL, using phpMyAdmin. Then I could do a SELECT on the Raw table to build a table that just has the customer information fields, and If I add a DISTINCT I should have a table, or "view"  of just the unique customers with their information.  I would then like to feed that data to the Customer table, which will have an auto incrementing  Primary key of Customer_ID and then all the fields for the customer info.  I guess I could output that "view" as a CSV and thne input the CSV file to the new Customer table, but it seems like there should be a better way.

I'm just starting with MySQL and phpMyAdmin so perhaps someone could suggest the "proper" way to build this Customer table from the raw PayPal data.

Then I need to build an Order table, that contains just a single Customer_ID column, in place of all the customer information fields in the Raw data, with the details of each order in the rest of the columns.

Any suggestions on the proper road to down to get this all done would be most welcome.

Avatar of paulwquinn
Flag of Canada image

If the raw data is well-defined, i.e. the source fields are all easily identifiable, you can use an INSERT...SELECT statement to extract the data directly from the raw data table into your desired target, e.g. the CUSTOMER table. Not knowing the formats you're using it's hard to be more specific, but here's a simple example;

CREATE TABLE raw_data (raw_text VARCHAR(255);
INSERT INTO raw_data VALUES('John Smith');

This gives us a raw_data table with a single record and a target customer table containing (for the purposes of our simple example) only the custid primary key and first and last name fields.

Now we use an INSERT...SELECT statement:

INSERT INTO customer (id, fname,lname) SELECT DISTINCT NULL, SUBSTRING_INDEX(raw_text,' ',2) FROM raw_data;

Another technique often used by DBAs for handling these types of "one off" conversions is to use SQL to generate SQL. You basically use a combination of queries and literal text strings to get the info you need and build the required INSERT statements in a text file. The basic query, using the above simple example would be:

SELECT CONCAT('INSERT INTO customer (id, fname, lname) VALUES (NULL,\'', LEFT(raw_text,INSTR(raw_text,' ')-1 ),'\',\'',RIGHT(raw_text, INSTR(raw_text,' ')+1),'\');' FROM raw_data;

The benefits of this technique include having a permanent record of the operation (the text file with the SQL In it), and being able to review the intermediate results in the text file and correct any 'gotchas' using your favorite text editor.

Using this technicque along with the data dictionary (e.g. the INFORMATION_SCHEMA database in MySQL) can be very powerful. For example, you can build queries that will create the SQL to recreate your entire schema!

Using this second technique is not quite as easy in MySQL as in some other databases, but there is a good discussion of several ways to use the technique for MySQL given on
Anders Karlsson's blog

Hope that helps!
BTW, building your Orders table can use similar techniques as above. You can join to your new Customer table using whatever unique combination of columns you used to extract the individual Customer data to go and get the corresponding custid primary key to use as a foreign key in your Orders table.
Avatar of steva


Thanks, Paul, for all the information.

I was thinking later, though, why break the raw data into two tables at all? I know if you were starting out with a new database and no data you'd want a customer table that was separate from the order table so you could just have one column in the order table representing the customer.  That is, you wouldn't want to take up the space in every order record for all of the customer info. But space seems to be the only reason not to use the raw PayPal data as is.  It seems that I could certainly do any queries on the single  raw table that I could on the two tables, so if I don't mind using a few extra megaybtes  why bother at all splitting the raw table into two tables?

Avatar of paulwquinn
Flag of Canada image

Link to home
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of steva


Thanks Paul, again,  for the great information.

I was hoping to use DISTINCT in the queries to at least avoid the duplicate problem you mentioned.  But I agree.  Once a customer changes his address it's certainly easier to change it in one record in the customer table than in every record for them in the order table.  

It's going to come down to how difficult it is to split the current data into the two tables. I'll try to split them first, using the suggestions from your first post.

I gave you the points.

I hope you're around the next time I post a MySQL question.