rowek
asked on
SQL*Loader How To strip out CR/LF in one column
We have a very simple need and have received a lot of conflicting advice. The simple question is "How do I remove a CR or LF from one column contained in a CSV text INFILE?" I want to do this inside the SQL*Loader 10g CTL file below. The rest of this process has run for four months as is, but we now want to remove these characters from the COMMENTS column so that its data can be loaded as well.
Note that MS Access is the source database; we run a TransferText to produce the CSV text file. CR/LF is legal in MS Access but not in Oracle, so we need to get SQL*Loader to strip these out. It must be done here, not upstream. Thanks!
LOAD DATA
INFILE 'Refer to app.config file'
BADFILE 'Refer to app.config file'
DISCARDFILE 'Refer to app.config file'
APPEND
INTO TABLE dairy.insp
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"' TRAILING NULLCOLS
(
insp_ID INTEGER EXTERNAL,
Insp_Empl_ID CHAR,
.
. <some detail removed>
.
SA_Date_Hazard_Analysis_Issue "to_date(:SA_Date_Hazard_Analysis_Issue,'MM/DD/YYYY HH24:MI:SS')",
SA_Date_HACCP_Plan_Issue "to_date(:SA_Date_HACCP_Plan_Issue,'MM/DD/YYYY HH24:MI:SS')",
SA_Date_Prereq_Programs_issue "to_date(:SA_Date_Prereq_Programs_issue,'MM/DD/YYYY HH24:MI:SS')",
Insp_Order INTEGER EXTERNAL,
COMMENTS CHAR, <=========== Need to remove CR/LFs from this column
CREATE_DATE "to_date(:CREATE_DATE,'MM/DD/YYYY HH24:MI:SS')",
CREATE_USER CHAR,
MODIFY_DATE "to_date(:MODIFY_DATE,'MM/DD/YYYY HH24:MI:SS')",
MODIFY_USER CHAR,
PROGRAM_USE CHAR,
Upload_Date "to_date(:Upload_Date,'MM/DD/YYYY HH24:MI:SS')",
Press_Insp CHAR
)
ASKER
Actually I think I will need an OR.
My fault, but it can be either a CR or a LF or both. I will look at the text file with a hex editor and post more detail, but if you could help me with an OR situation I should be able to award points. Thanks!
In that case, use translate
translate(:COMMENTS, chr(13)||chr(10), '  ') -- replaces each CR or LF with a space (the third argument holds two spaces, one per character being replaced)
Again, the important factor is that you don't use newlines as record delimiters. If you do, then this method will not work and you will most likely have to either modify the export process from Access or do some preprocessing on the CSV file.
ASKER
Okay, this may sound silly, but what is the best way to tell if that is the case? Use a hex editor to look at the file? MS Access runs on Windows XP and creates a standard CSV text file. If I do have newlines as record delimiters, what else can I do? The change must take place in this process, as the export process cannot be changed. Thanks. Testing now.
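One way to answer this without a hex editor is to let a CSV parser decide which line breaks sit inside quoted fields. The sketch below is an illustration only and not something proposed in the thread: the helper name `summarize_line_breaks` and the `cp1252` encoding are assumptions, and it presumes fields are enclosed in double quotes as in the control file above.

```python
import csv
import io


def summarize_line_breaks(raw: bytes, encoding: str = "cp1252") -> dict:
    """Count CRLF pairs, logical CSV records, and fields holding CR or LF.

    Hypothetical helper; the default encoding is an assumption about a
    Windows export, not something stated in the thread.
    """
    text = raw.decode(encoding)
    # newline="" passes line endings through untranslated, so a CR/LF inside
    # a double-quoted field stays part of that field.
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"')
    records = 0
    fields_with_breaks = 0
    for row in reader:
        records += 1
        fields_with_breaks += sum(1 for f in row if "\r" in f or "\n" in f)
    return {
        "crlf_pairs": raw.count(b"\r\n"),
        "records": records,
        "fields_with_breaks": fields_with_breaks,
    }


# Made-up sample: one record with a CRLF inside its last field, one without.
sample = b'1,"FRED","line one\r\nline two"\r\n2,"SUE","ok"\r\n'
summary = summarize_line_breaks(sample)
```

If every row ends in CRLF and `crlf_pairs` is larger than `records`, at least some CRLFs are sitting inside quoted fields rather than terminating rows, which is the situation a line-oriented loader cannot distinguish on its own.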
ASKER
I just did extensive testing. The text file is dependent on the CRLF at the end of each row to terminate the row. Looks like the REPLACE and TRANSLATE will not work. Is there another way to handle this INSIDE of the SQL*Loader Control file? Thank you for your efforts so far.
To the best of my knowledge, no. What I've done in situations like this was to run a small app that would pre-process the file and remove the newlines inside the text field.
Can you completely bypass SQL loader and just write an app that connects to the Access database, pulls the data and inserts it into Oracle?
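For readers who are allowed to add a pre-processing step, a rough sketch of such a small app follows. It is not the code used by anyone in this thread: the function name `strip_breaks_in_fields` and the `cp1252` encoding are assumptions, and it relies on the fields being double-quoted so that a CSV parser can tell embedded line breaks from row terminators.

```python
import csv
import io


def strip_breaks_in_fields(raw: bytes, encoding: str = "cp1252") -> bytes:
    """Replace CR/LF inside CSV fields with a space; keep CRLF row endings.

    Hypothetical helper; encoding and quoting conventions are assumptions.
    """
    text = raw.decode(encoding)
    # newline="" keeps embedded line breaks intact while parsing.
    reader = csv.reader(io.StringIO(text, newline=""), quotechar='"')
    out = io.StringIO(newline="")
    writer = csv.writer(out, quotechar='"', lineterminator="\r\n")
    for row in reader:
        writer.writerow(
            [f.replace("\r\n", " ").replace("\r", " ").replace("\n", " ") for f in row]
        )
    return out.getvalue().encode(encoding)


# Made-up sample modelled on the kind of record described in the question.
cleaned = strip_breaks_in_fields(
    b'1,"FRED","I have only myself to blame.\r\nIf only I had looked right."\r\n'
)
```

Note that the writer regenerates the quoting, so fields that do not need quotes come out unquoted; that should be compatible with OPTIONALLY ENCLOSED BY '"', but it is worth checking against a real export before relying on it.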
ASKER
I would love to write that small app, but the situation has gone political and additional programming is not allowed. If we could do it in the CTL file then that would be okay, but no new code. The solution we decided on was to use an aliasing query on the export in order to replace the COMMENTS column contents with NULL. Not elegant, but they do not really care about that column for their application. I guess if there was a silver bullet they would go for it, but the project is out of funding and we need to close it out quickly.
I did confirm with Visual Notepad++ that there are CRLFs at the end of each row and in the records that are getting thrown to the BAD file.
Thanks again for your help.
rowek,
The problem is that SQL*Loader can't tell that you are still on the same row. The CR/LF looks like a new row. Try taking out the OPTIONALLY in the "OPTIONALLY ENCLOSED BY" and see if that allows it to scan multiple records. I'll look and see if I can find an example of multiple physical records becoming one logical record.
Good luck!
rowek,
OK, I think this is the right path. Put the OPTIONALLY back in if you have any data not enclosed by quotes (like INSP_ID, I'm guessing). Then add this as the next line:
CONTINUEIF LAST PRESERVE != ''''
As long as the last column in the record is a string (and enclosed by a quote), I think this will work. Here's where I got it from (10g, but not much changed in this area): http://download.oracle.com/docs/cd/B19306_01/server.102/b14215/ldr_control_file.htm#i1005509
Good luck!
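To make the idea behind CONTINUEIF LAST concrete, here is a rough Python emulation of the joining rule: keep appending physical records until one ends (ignoring trailing blanks) with the enclosure character. This is only a sketch of the concept, not SQL*Loader's implementation; the function name `assemble_logical_records` is made up, joining the pieces with no separator is an assumption, and the enclosure is a parameter because the control file in the question encloses fields with a double quote.

```python
def assemble_logical_records(physical_records, enclosure='"'):
    """Join physical records until one ends with the enclosure character.

    Hypothetical emulation of the "continue if the last non-blank character
    is not the enclosure" rule; not SQL*Loader's actual code.
    """
    logical = []
    pending = ""
    for record in physical_records:
        pending += record  # assumption: pieces are joined with no separator
        if record.rstrip().endswith(enclosure):
            logical.append(pending)
            pending = ""
    if pending:
        # Leftover text that never reached a closing enclosure.
        logical.append(pending)
    return logical


# Made-up physical records: the first logical row is broken across two lines.
lines = [
    '1,"FRED","I have only myself to blame.',
    'If only I had looked right."',
    '2,"SUE","ok"',
]
joined = assemble_logical_records(lines)
```

The emulation also shows the limits of the approach: it only works when the last field of every row is enclosed, and a comment line that happens to end with the enclosure character in the middle of the text would end the logical record too early.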
ASKER
DrSQL: the politics have won out on this one. We elected to drop the column, but I will do this... I will try your suggestion tomorrow, and if it works I will award points. I want to leave a solution. gatorvip was very close, but we needed the CRLF at the end of the stream. Must run now... late for appt, but will try in the morning. Thank you both for trying. Keith
ASKER CERTIFIED SOLUTION
awking00,
From what I gather, the problem is that the content of the data file might look like:
1,'FRED','I have only myself to blame.<CR>
<LF>If only I had looked right when I stepped off the curb on my visit to England.'
And SQL*Loader sees that as two records, when it should be one. I could be wrong, but that's the problem I was trying to solve for rowek.
Good luck!
ASKER
DrSQL, that's exactly the issue: rows are being "broken" sometimes two or three times. Sorry I cannot test at the moment, must run payroll for my troops. I hope to confirm your technique later today. Cheers.
Thanks, DrSQL
Now, I see the problem. I gather it only gets worse if the comments data also contains commas.
ASKER
awking00, do you still think your method is worth pursuing? I did reveal your code and it appears to only strip out those characters for columns that I pass to it. This means I would not harm the CRLF needed at the end of each row. Will test after payroll run.
I would say it's worth a try.
ASKER
My data has been changed. The PM asked us to put "NR" in the COMMENTS field so we could close out the project. If I can locate a copy of the old data (the one with the CRLFs in it) then I will see this thru. It's important for me to know and for the other folks that are searching out a solution like I was.
Keith
>>CONTINUEIF LAST PRESERVE != ''''
This might work. Last year I tried to get it to run in an external table but was unable to do so, then I took a different approach.
>>Create a function to remove cr/lf, then only apply it to the comments column.
I don't think that's going to work, because by the time the COMMENTS field is passed to the function, the record has already been parsed by SQL*Loader. If you test the function separately it will work, of course, but not when it's invoked from the control file.
>> It's important for me to know and for the other folks that are searching out a solution like I was.
I'm definitely interested to see if you're able to find a viable solution here.
ASKER
I am out all this week for surgery...it may be a while. Thank you both for trying hard.
ASKER
Good news, I may be able to locate an old copy of the data and test out this suggestion:
<<<awking00:Create a function to remove cr/lf, then only apply it to the comments column.>>>
Will know in the morning. I want to know just as bad as you guys. This is a real weakness of Oracle SQL*Loader, IMHO.
ASKER
Thank you for your patience on this one, surgery had me down for a while. I gave you an excellent for effort. The solution is partial. It seems to work well when only one CRLF is found in the COMMENTS field. If multiple CRLFs are found then I get "second enclosure string not present". In the BAD file I see a double quote to start the string but no ending double quote. Thank you again for the solution and the patience. Keith
REPLACE(COMMENTS ,chr(13)||chr(10),' ') COMMENTS