Solved

SQL - Processing text files without CR

Posted on 2011-09-06
14
231 Views
Last Modified: 2012-05-12
Hi,
can you please help.
I am loading and processing the contents of text files into a table.
However, some of my files have LF instead of CR.

I've been told that if I open the file in notpad and then save it, it will resolve the problem.

Has anyone come across a similar problem and if so a little guidance on how to do this would be really appreciated.

Regards

0
Comment
Question by:EWHTLC
  • 8
  • 5
14 Comments
 
LVL 40

Expert Comment

by:RQuadling
ID: 36489268
What are you using to load the data?

I use
 TRUNCATE TABLE TripleCDataConversion.dbo.Addresses
 BULK INSERT TripleCDataConversion.dbo.Addresses
 FROM 'D:\CCC2SQL\AD.TAB'
 WITH
  (
  FIELDTERMINATOR = '\t',
  KEEPIDENTITY,
  KEEPNULLS,
  LASTROW =               3313,
  ORDER (UniqueID ASC)
  )

Open in new window


The FIELDTERMINATOR is the tab character.

By default the ROWTERMINATOR is '\r\n' (http://msdn.microsoft.com/en-us/library/ms188365.aspx).

If you added or used ROWTERMINATOR = '\n', then that should all work for you.
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 36489284
http://msdn.microsoft.com/en-us/library/ms191485.aspx disagrees with the previous link.

Here it says "\n" is the default ROWTERMINATOR.

Either way, play around with these to find what matches your data.
0
 
LVL 42

Expert Comment

by:dqmq
ID: 36489466
Different platforms use conventions for marking a new line in text files  For example, Unix uses LF and Windows uses CRLF.  Some older systems use CR, but I think it's pretty rare today.

It's generally best to convert to the native format when the file is first loaded onto the platform.  FTP will do so, provided the transfer is done in ASCII or character mode rather than binary mode.  Free commands utilities are also available that can do the conversion for most platforms.

Some text editors are capable of translating different standards and even translating to the "correct" standard during a save operation. Notepad, I kind of doubt--you may have better luck with Wordpad a command line tool designed for the purpose.
 
0
 

Author Comment

by:EWHTLC
ID: 36489875
Hi There
Thanks for this it looked great to me.
However, although the syntax looks fine it doesn't seem to do anything.
I've tried /t /n and /r and nothing seems to have any effect.

here's my code

  EXEC ('bulk INSERT temp_EPOS_Ord_File FROM "' + 'V:\ewh\Website\Live\ErroredFiles\' + @filename + '" WITH (FIELDTERMINATOR = ' + '''' + '\n' + '''' + ', KEEPIDENTITY, KEEPNULLS, LASTROW = 3313)')

Any ideas
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 36489944
ROWTERMINATOR.

Not FIELDTERMINATOR.

Different purposes.

column FIELDTERMINATOR column FIELDTERMINATOR column ROWTERMINATOR
0
 

Author Comment

by:EWHTLC
ID: 36490525
oops.
Cheers but still not joy

  EXEC ('bulk INSERT temp_EPOS_Ord_File FROM "' + 'V:\ewh\Website\Live\ErroredFiles\' + @filename + '" WITH (ROWTERMINATOR = ' + '''' + '\n' + '''' + ', KEEPIDENTITY, KEEPNULLS, LASTROW = 3313)')
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 36491355
Using the following code
create table FamilyNames (Name varchar(50), Age smallint)
bulk insert FamilyNames from 'D:\FamilyNamesCR.csv' with (CODEPAGE='RAW', DATAFILETYPE='CHAR', FIELDTERMINATOR='\t', ROWTERMINATOR='\r')
bulk insert FamilyNames from 'D:\FamilyNamesLF.csv' with (CODEPAGE='RAW', DATAFILETYPE='CHAR', FIELDTERMINATOR='\t', ROWTERMINATOR='\n')
bulk insert FamilyNames from 'D:\FamilyNamesCRLF.csv' with (CODEPAGE='RAW', DATAFILETYPE='CHAR', FIELDTERMINATOR='\t', ROWTERMINATOR='\r\n')
select * from FamilyNames
drop table FamilyNames

Open in new window


And 3 different CSV files (Windows CRLF, Unix LF and Mac CR).

I'm getting ...

(5 row(s) affected)
Msg 4864, Level 16, State 1, Line 3
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (Age).
Msg 4864, Level 16, State 1, Line 4
Bulk load data conversion error (type mismatch or invalid character for the specified codepage) for row 1, column 2 (Age).

(5 row(s) affected)

Open in new window


So 5 rows imported from the first file using the \r terminator, but that's it.

Changing the terminator to be used for the third file to \n (rather than the correct \r\n) and that file is imported.

Argh! http://msdn.microsoft.com/en-us/library/ms191485.aspx
Using Row Terminators
The row terminator can be the same character as the terminator for the last field. Generally, however, a distinct row terminator is useful. For example, to produce tabular output, terminate the last field in each row with the newline character (\n) and all other fields with the tab character (\t). To place each data record on its own line in the data file, specify the combination \r\n as the row terminator.

Note:  
When you use bcp interactively and specify \n (newline) as the row terminator, bcp automatically prefixes it with a \r (carriage return) character, which results in a row terminator of \r\n.

I SEEM to be getting this error using BULK INSERT.

Seems like MS SQL won't let you import data from a unix machine.

But no. Example C at http://msdn.microsoft.com/en-us/library/ms188365.aspx

C. Using line feed as a row terminator
The following example imports a file that uses the line feed as a row terminator such as a UNIX output:
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'BULK INSERT AdventureWorks2008R2.Sales.SalesOrderDetail
FROM ''<drive>:\<path>\<filename>'' 
WITH (ROWTERMINATOR = '''+CHAR(10)+''')'
EXEC(@bulk_cmd)

Open in new window

FamilyNames.zip
0
What Should I Do With This Threat Intelligence?

Are you wondering if you actually need threat intelligence? The answer is yes. We explain the basics for creating useful threat intelligence.

 
LVL 40

Accepted Solution

by:
RQuadling earned 500 total points
ID: 36491406
Here you go. Using the same zip files as previously supplied.

create table FamilyNames (Name varchar(50), Age smallint)

-- Successful statement to import data where the row terminator is CR \r
bulk insert FamilyNames from 'D:\FamilyNamesCR.csv' with (FIELDTERMINATOR='\t', ROWTERMINATOR='\r')


-- Successful statement to import data where the row terminator is LF \n
DECLARE @bulk_cmd varchar(1000)
SET @bulk_cmd = 'bulk insert FamilyNames from ''D:\FamilyNamesLF.csv'' with (FIELDTERMINATOR=''\t'', ROWTERMINATOR='''+CHAR(10)+''')'
EXEC(@bulk_cmd)


-- Successful statement to import data where the row terminator is CRLF \r\n
bulk insert FamilyNames from 'D:\FamilyNamesCRLF.csv' with (FIELDTERMINATOR='\t', ROWTERMINATOR='\n')

-- Show the lengths to show that no trailing odd characters were added
select *, LEN(name) from FamilyNames order by Name

drop table FamilyNames

Open in new window


outputs
Name	Age	(No column name)
Hesper	8	6
Hesper	8	6
Hesper	8	6
Oscar	3	5
Oscar	3	5
Oscar	3	5
Richard	43	7
Richard	43	7
Richard	43	7
Sally	37	5
Sally	37	5
Sally	37	5
Thomas	6	6
Thomas	6	6
Thomas	6	6

Open in new window

with message of
(5 row(s) affected)

(5 row(s) affected)

(5 row(s) affected)

(15 row(s) affected)

Open in new window

0
 

Author Comment

by:EWHTLC
ID: 36494269
Thanks for this you are being really helpful.
I can see the theory is absolutely correct but it still doesn't seem to work.

I don't have a field delimiter. I'm inserting the record as a single string.
I've tried 9,10 and 13 but still no success.

The files are being sent to me by a third party so I have no control over format. i'm told that they are in 'linex' format so it should need the /n (char(10)) options you provided.
It would be nice if they were consistent. It appears some are line terminited and some CR.
(just my luck)

All I know if that if I open the file in notepad and then save it, the terminator corrects iteaslf and all is OK withthe file.
Obviously I can't do this in batch mode.

I will keep trying. Thanks again for your help.
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 36494315
OK. So you need only to define the ROWTERMINATOR.

Can you zip a file and send it to me.

As you are on windows, can you try the following.

1 - Open a command prompt
2 - Change directory to the location of the file.
3 - Type in the following command
edit /078 <filename>

Open in new window


That will give a screen along the lines of An example screen showing the binary mode, wrapped to 78 characters.
If you look carefully, moving the cursor to the end of the word "UPDATE" (the file I'm showing is a saved SQL statement), the next character is musical note. It has a value (look in the bottom right corner of the screen) "Line:1   Col:7   Value:13". Next to it "Line:1   Col:8   Value:10" and finally, some Value:9.

This is telling me that the lines terminator with CR(13 / 0x0D) LF (10 / 0x0A).

The 9's are tab characters for the next line.

The actual text looks like
UPDATE
                                CONTRACTS.dbo.POP_HEADER
                        SET
                                POH_RECHARGE_DATETIME = GetDate(), -- Now
                                POH_RECHARGE_STATUS = 6 -- Recharged/NoDamage/Ignore
                        WHERE
                                POH_PRIMARY = 1269460;

Open in new window


Once you can confirm how the lines are terminated, then we can see why you are not getting the desired results.

How are you executing the SQL?

0
 
LVL 40

Expert Comment

by:RQuadling
ID: 36494328
If you look through the rest of the file, you will see the same 13 / 10 pairing for each line ending.
0
 

Author Comment

by:EWHTLC
ID: 36498443
Well thanks to you it is now working.
I did everything you said which confirmed that it was indeed a LF being used as a row terminator.
So I went bak to your last solution and actually found out that it was the wat I was displaying the data to check the results.

In fact it works fine

Firstly I must apologies for not picking this up before your last post but must must also add that I really benefitted from the extra posts you made. That's some really useful stuff
0
 

Author Closing Comment

by:EWHTLC
ID: 36498453
What an abolutely brilliant response.
some much useful informaton in here
0
 
LVL 40

Expert Comment

by:RQuadling
ID: 36501459
Your question made me examine a LOT of things I took for granted. That is one of the joys of Experts Exchange. I may know a lot, but I don't know everything and every question can be a learning opportunity for both of us.

Really pleased you got it working and very chuffed with your comments.

I just read your profile. If you fancy a chat, drop me a line. My e-addy is on my profile.
0

Featured Post

Control application downtime with dependency maps

Visualize the interdependencies between application components better with Applications Manager's automated application discovery and dependency mapping feature. Resolve performance issues faster by quickly isolating problematic components.

Join & Write a Comment

Introduction: When running hybrid database environments, you often need to query some data from a remote db of any type, while being connected to your MS SQL Server database. Problems start when you try to combine that with some "user input" pass…
INTRODUCTION: While tying your database objects into builds and your enterprise source control system takes a third-party product (like Visual Studio Database Edition or Red-Gate's SQL Source Control), you can achieve some protection using a sing…
This video gives you a great overview about bandwidth monitoring with SNMP and WMI with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're looking for how to monitor bandwidth using netflow or packet s…
This tutorial demonstrates a quick way of adding group price to multiple Magento products.

746 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

10 Experts available now in Live!

Get 1:1 Help Now