Avatar of G0ggy

asked on

Remove quotes from large text file, for import into MySQL

I have a rather large text file of 1,637,676 lines. The text is comma-delimited, but each field between the commas is wrapped in double quotes.

I can easily import this into MySQL, but I want rid of the quotes before I do. I have tried removing the quotes with TextPad, a superb little utility, but even that is struggling, and who can blame it: 1,637,676 lines with 58 quotes per line is 94,985,208 replacements, if you were wondering.

Can anyone think of a way to get rid of the quotes without me having to kill the editor from Task Manager and reboot my PC every time?
ASKER CERTIFIED SOLUTION
Avatar of akshah123

Avatar of Steve Bink
akshah123's solution is the way to go. For future reference though, check out UltraEdit. It could have handled this with no problem.

http://www.ultraedit.com
The only correction to akshah123's otherwise brilliant comment is:

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
  FIELDS TERMINATED BY ','
  ENCLOSED BY '"'
  LINES TERMINATED BY '\n';


LINES TERMINATED BY is what I would use, as I do not want any trailing junk ...
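(As a side note, and only a sketch using the same placeholder file and table names as above: if the CSV was saved by a Windows editor, the lines may end in \r\n rather than \n, in which case the terminator would change accordingly.)

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
  FIELDS TERMINATED BY ','
  ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n';   -- '\r\n' for files written by Windows editors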
Avatar of G0ggy

ASKER

Thanks for all your comments, especially akshah123, who pointed me to the relevant article on the MySQL website. The LOAD DATA statement is currently running, and has been running overnight; I will leave it today and see what the outcome is.

Does anyone think it should have processed 1.6m+ lines quicker than this, or does that sound about right?

It wasn't a smooth process by any means. As the file was .CSV, I already knew about the '\n' that yqoutham mentioned, although the MySQL comments say to use '\r\n'. I then had a further problem with an "invalid integer value at line 36500 for column country_code". Looking at the data and the corresponding table column, the field was set up as VARCHAR(2). I tried almost every data type I could think of, to no avail, even allowing NULL for the column. The problem was that the country code was generally:

C0
C3
F4

And the value at line 36500 was "  ". In the end, and at the end of my tether, I used TextPad to save the file as .TXT using Unicode encoding; that is the state we're at now with the processing. But at least it hasn't errored!
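(For reference, a sketch of one way to map blank country codes to NULL at load time, assuming a MySQL version that supports user variables and a SET clause in LOAD DATA; the column list here is hypothetical.)

LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
  FIELDS TERMINATED BY ','
  ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n'
  (col1, col2, @cc)                          -- read country code into a user variable
  SET country_code = NULLIF(TRIM(@cc), '');  -- blank strings become NULL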

I'll keep you posted.
1.6M rows should not take so much time if the number of fields is only 29 (going by your figure of 58 quotes per line); it should have happened much faster.

If you saved the file using a Windows text editor such as TextPad, then the lines would be terminated with \r\n.

If it were a Linux text editor like pico, nano, or KWrite, then there would only be a \n.

While the LOAD DATA is running, you can always open a second MySQL CLI and check the number of rows added. I have done LOAD DATA for about 900K lines within minutes (hardly 4-5 minutes) for something around 70-odd fields, with a text field carrying huge chunks of data. A couple of hours is unheard of, unless your MySQL server is too busy catering to clients.
Avatar of G0ggy

ASKER

No, not really; it's a local development version, so no clients at the moment. I will check again tonight. Cheers for keeping on this topic.
>>>No not really, it's a local development version, so no clients at the moment. I will check again tonight, cheers for keeping on this topic.
Are you running Windows for the local development version? If yes, which option did you select while installing MySQL? For certain options, MySQL limits itself to only a small percentage of CPU and memory so that it does not stop you working on other applications. That may explain why it is taking such a long time.
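(A quick sketch, if it helps: you can check from the MySQL CLI how much memory the server was actually configured with; the variable names below are the standard MySQL 5.x ones.)

SHOW VARIABLES LIKE 'key_buffer_size';
SHOW VARIABLES LIKE 'bulk_insert_buffer_size';
SHOW VARIABLES LIKE '%buffer_pool_size%';   -- InnoDB buffer pool, if the table is InnoDB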
Avatar of G0ggy

ASKER

Damn, the operation failed with an out-of-memory error.
Are you by any chance running PHP or some other program to do this? MySQL should not fall over with out-of-memory errors on its own.

What is the OS on the server?
Avatar of G0ggy

ASKER

It's running on my local machine and the import was started through the MySQL command-line utility; I find it a bit bizarre, to be honest.
Avatar of G0ggy

ASKER

XP Media Centre
Do one thing: back up the table if you feel like it, then truncate the table.

Start two MySQL command-line sessions and log in to both.

In one, start the import from the text file.

In the other, check the row count with something like:

SELECT COUNT(primary_key_field) FROM table_name;

Keep running it to see how many rows are being added roughly every second or minute. This should give you an idea of when the whole import will finish; give it that much time. Check the number of rows periodically and see where it hangs. It would probably also be a good idea to troubleshoot with two pieces of the text file: cut the file in two based on the number of trouble-free rows, with the second piece starting at the troublesome part, so you can narrow the error down.
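(A minimal sketch of the two-session approach described above; the file, table, and column names are placeholders.)

-- Session 1: run the import
LOAD DATA INFILE '/tmp/test.txt' INTO TABLE test
  FIELDS TERMINATED BY ','
  ENCLOSED BY '"'
  LINES TERMINATED BY '\r\n';

-- Session 2: poll progress while the import runs
SELECT COUNT(primary_key_field) FROM test;
SHOW FULL PROCESSLIST;   -- confirms the LOAD DATA thread is still working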
Avatar of G0ggy

ASKER

Ok, I'll give it a whirl tonight, cheers for your help.
UltraEdit costs about £35 from http://www.ultraedit.com/ and allows Perl regular expressions in its search and replace.

We use it all the time for cleaning up text files before importing into other apps.

It's also a damn good text editor; beats the pants off any MS rubbish.