Postgresql: Import csv constant errors, how to fix?

Anthony Mellor
Anthony Mellor used Ask the Experts™
on
I am using Navicat and I (finally) have Postgresql running.

I accept defaults as
Record delimiter LF
Text qualifier " (there aren't any - maybe there should be)

Error Message: ERROR:  invalid input syntax for type timestamp: "01:28:00"
LINE 1: ...','180480','','','1','2','1','1','1987-01-05','2','01:28:00'...
                                                             ^

[Msg] [Imp] Processed: 49, Added: 0, Updated: 0, Deleted: 0, Error: 1
[Msg] [Imp] Finished - Unsuccessfully

Open in new window


So I make both date and time field varchar and try again:

Error Message: ERROR:  malformed array literal: "5/1/1987"
LINE 1: ...1AD00002','529950','180480','','','1','2','1','1','5/1/1987'...
                                                             ^
DETAIL:  Array value must start with "{" or dimension information.

[Msg] [Imp] Processed: 49, Added: 0, Updated: 0, Deleted: 0, Error: 1
[Msg] [Imp] Finished - Unsuccessfully

Open in new window


and the query is I think: (not written by me I hasten to add, by Navicat)

[Msg] [Imp] Import start
[Msg] [Imp] Import type - CSV
[Msg] [Imp] Import from - /Users/ADM/Sync/6-Process01/accXtract.csv
[Msg] [Imp] Create table [accXtract]
[Msg] [Imp] Import table [accXtract]
[Err] [Imp] Cannot insert rows 1 - 49:
Query: insert into "public"."accXtract" ("Accident_Index","Location_Easting_OSGR","Location_Northing_OSGR","Longitude","Latitude","Police_Force","Accident_Severity","Number_of_Vehicles","Number_of_Casualties","Date","Day_of_Week","Time","Local_Authority_District","Local_Authority_Highway","1st_Road_Class","1st_Road_Number","Road_Type","Speed_limit","Junction_Detail","Junction_Control","2nd_Road_Class","2nd_Road_Number","Pedestrian_Crossing-Human_Control","Pedestrian_Crossing-Physical_Facilities","Light_Conditions","Weather_Conditions","Road_Surface_Conditions","Special_Conditions_at_Site","Carriageway_Hazards","Urban_or_Rural_Area","Did_Police_Officer_Attend_Scene_of_Accident","LSOA_of_Accident_Location","ACCREF","ACCMTH","A8M","ACCDAY","A8H","ACCYR","MONTH","Location_5Easting_OSGR","Location_5Northing_OSGR") values 

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
It seems that you have a (typical, classical and awful) problem of dates handling.

On your first example, the target field expects a time stamp while you deliver a (hour) string, but the day field seems fine
On your second example, you have changed the day format to a string format different from your first example, and this creates the error.

My suggestion:
- keep the input fields unchanged
- keep the (export) day field as it was in the first example
- test the impact of changing the (export) hour format to a timestamp value insted of a date value
Anthony MellorChartered Accountant

Author

Commented:
Can't I set the dates and times to some other type? I don't need them. What type though? I tried "varchar" (thinking that might be a sort of catchall)  and that made no difference. It sounds like the system is deciding and I have to comply properly whereas I'd like to comply not at all.

I don't understand the bit where you write about "export" because I am trying to import only.
I did notice it didn't seem to like the seconds part of the time; I could delete those I think (AWK to the rescue)

Maybe the penny will drop over a cuppa tea.
If a column is defined as a timestamp type then it is expecting both a date and a time.  There is a specific TIME type for time on its own.
Ensure you’re charging the right price for your IT

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

Anthony MellorChartered Accountant

Author

Commented:
ah! thank you. It's being auto set, so when it's seeing either a date OR a time it's assuming both - I guess. I'll see.

wahay!

[Msg] [Imp] Import start
[Msg] [Imp] Import type - CSV
[Msg] [Imp] Import from - /Users/ADM/Sync//5-CombineBOTH/accEXT.csv
[Msg] [Imp] Create table [accEXT]
[Msg] [Imp] Import table [accEXT]
[Msg] [Imp] Processed: 585, Added: 585, Updated: 0, Deleted: 0, Error: 0
[Msg] [Imp] Finished - Successfully

Open in new window


That was on an extract of the file.. but the one that failed also works. Will try the full size file.

Tables 1
Processed 5,434,484
Errors 150
Added 5,425,230
Update 0
Deleted 0
Time 1523.64s

and the extract form the end is:

insert into "public"."acc" ("Accident_Index","Location_Easting_OSGR","Location_Northing_OSGR","Longitude","Latitude","Police_Force","Accident_Severity","Number_of_Vehicles","Number_of_Casualties","Date","Day_of_Week","Time","Local_Authority_(District)","Local_Authority_(Highway)","1st_Road_Class","1st_Road_Number","Road_Type","Speed_limit","Junction_Detail","Junction_Control","2nd_Road_Class","2nd_Road_Number","Pedestrian_Crossing-Human_Control","Pedestrian_Crossing-Physical_Facilities","Light_Conditions","Weather_Conditions","Road_Surface_Conditions","Special_Conditions_at_Site","Carriageway_Hazards","Urban_or_Rural_Area","Did_Police_Officer_Attend_Scene_of_Accident","LSOA_of_Accident_Location","ACCREF","ACCMTH","A8M","ACCDAY","A8H","ACCYR","MONTH","Location_5Easting_OSGR","Location_5Northing_OSGR") values ('2011921200188','323460','870490','-3.286473','57.717724','92','3','3','2','2011-12-18','1','21:35:00','930','S12000020','3','941','6','30','6','4','6','0','0','0','4','1','2','0','0','2','1','','921200188','12','35','18','21','2011','24150','32346','87049');
insert into "public"."acc" ("Accident_Index","Location_Easting_OSGR","Location_Northing_OSGR","Longitude","Latitude","Police_Force","Accident_Severity","Number_of_Vehicles","Number_of_Casualties","Date","Day_of_Week","Time","Local_Authority_(District)","Local_Authority_(Highway)","1st_Road_Class","1st_Road_Number","Road_Type","Speed_limit","Junction_Detail","Junction_Control","2nd_Road_Class","2nd_Road_Number","Pedestrian_Crossing-Human_Control","Pedestrian_Crossing-Physical_Facilities","Light_Conditions","Weather_Conditions","Road_Surface_Conditions","Special_Conditions_at_Site","Carriageway_Hazards","Urban_or_Rural_Area","Did_Police_Officer_Attend_Scene_of_Accident","LSOA_of_Accident_Location","ACCREF","ACCMTH","A8M","ACCDAY","A8H","ACCYR","MONTH","Location_5Easting_OSGR","Location_5Northing_OSGR") values ('2011921200193','394230','806020','-2.096969','57.145101','92','3','1','1','2011-11-21','2','17:36:00','910','S12000033','3','93','3','30','0','-1','-1','0','0','0','4','1','1','0','0','1','1','','921200193','11','36','21','17','2011','24153','39423','80602')
Error Message: ERROR:  invalid input syntax for type time: ""
LINE 5: ...66','57.142932','92','3','2','1','2011-12-03','7','','910','...
                                                                  ^

[Msg] [Imp] Processed: 5434484, Added: 5425230, Updated: 0, Deleted: 0, Error: 150
[Msg] [Imp] Finished - Unsuccessfully

and this is the line from the csv file:

"2011921200193","394230","806020","-2.096969","57.145101","92","3","1","1","21/11/2011","2","17:36","910","S12000033","3","93","3","30","0","-1","-1","0","0","0","4","1","1","0","0","1","1","","921200193","11","36","21","17","2011","24153","39423","80602"
the receiving part is expecting a time (not date) value at this place...
Anthony MellorChartered Accountant

Author

Commented:
that's what I was thinking, but how has it accepted 5 million previously ?

I'll just add the error line and the two before it here to compare:

"2011921200082","340210","853490","-3.001145","57.567589","92","2","2","1","20/12/2011","3","08:50","930","S12000020","4","9016","6","40","0","-1","-1","0","0","0","1","1","4","0","0","2","1","","921200082","12","50","20","08","2011","24152","34021","85349"
"2011921200188","323460","870490","-3.286473","57.717724","92","3","3","2","18/12/2011","1","21:35","930","S12000020","3","941","6","30","6","4","6","0","0","0","4","1","2","0","0","2","1","","921200188","12","35","18","21","2011","24150","32346","87049"
"2011921200193","394230","806020","-2.096969","57.145101","92","3","1","1","21/11/2011","2","17:36","910","S12000033","3","93","3","30","0","-1","-1","0","0","0","4","1","1","0","0","1","1","","921200193","11","36","21","17","2011","24153","39423","80602"
"2011921200318","387300","806150","-2.211498","57.146128","92","3","3","1","23/12/2011","6","14:00","910","S12000033","4","944","3","50","0","-1","-1","0","0","0","1","1","1","0","0","2","2","","921200318","12","00","23","14","2011","24155","38730","80615"

Open in new window


I cannot see any difference between lines 1 and 2 versus 3


can the text as seen at the hex level have an impact? and looking at the file in hex I can't see any odd characters around the date of the time: but I have plagued myself with file compatibility errors during this work.


comparison from the navicat import log above

previous line accepted, second line rejected

('2011921200188','323460','870490','-3.286473','57.717724','92','3','3','2','2011-12-18','1','21:35:00','930','S12000020','3','941','6','30','6','4','6','0','0','0','4','1','2','0','0','2','1','','921200188','12','35','18','21','2011','24150','32346','87049');
('2011921200193','394230','806020','-2.096969','57.145101','92','3','1','1','2011-11-21','2','17:36:00','910','S12000033','3','93','3','30','0','-1','-1','0','0','0','4','1','1','0','0','1','1','','921200193','11','36','21','17','2011','24153','39423','80602')

Open in new window


I'm stripping out the colons from the whole file and will try an import without specifying the time format.

edit 6 million "find and replaces" is taking a while...

light dawns. Removing the colons reduces the errors from 150 to 44. There were 44 files of one type incorporated in to this file. A quick search on their (old) headings reveals that I have combined i.e. not excluded line one of each file and within my 6 million records are 44 surplus headers. I suppose thanks are to Navicat for spotting this.
Anthony MellorChartered Accountant

Author

Commented:
In the end I deleted the date and time fields altogether, then the import worked.
Glad you eventually solved the problem. The repeated / extra headers have probablu not made things easier.

Note also that depending on how you import such a huge file, problems might arise from various time limits: if possible, avoid making a single huge import, that will help you to have at least not that possible source of trouble.
Anthony MellorChartered Accountant

Author

Commented:
Thanks - hadn't occurred to me there may be timeouts.
I'm still perplexed about how the headers got in there, because I combined  52 files all with headers and there were only 44 duplicates.. so either I set the ignore row one or I didn't and yet it seems I did both!

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial