File copy using bcp with newlines in data

Posted on 2005-05-14
Last Modified: 2008-01-09
I am dumping data that I would like to import into Excel. Unfortunately, there are many newlines in the text fields, so I am resorting to the -r switch to change the record separator, so that the embedded newlines can be distinguished.

This makes it hard to import into Excel.

So far I have tried regex replacement, but have not yet found the right combination.

To solve this, I either need a way to escape the newlines as they are exported, or a set of regex replacements that I can do to get the data into Excel. It would be OK to have the newlines replaced by some character while in Excel. Unfortunately, every combination I have tried so far, even including multi-character sequences, leads to some conflict somewhere along the line.

Question by:jasimon9
    LVL 13

    Accepted Solution

Can't you use Coll001 = REPLACE(REPLACE(Coll001,CHAR(13),' '),CHAR(10),' ') directly in the SELECT statement?
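For anyone following along, the nested REPLACE amounts to the following per-value transformation (a Python sketch of the same logic; the function name is mine, not part of the T-SQL):

```python
def flatten_newlines(value: str) -> str:
    """Replace CR (CHAR(13)) and LF (CHAR(10)) with spaces,
    mirroring the nested T-SQL REPLACE(REPLACE(...)) above."""
    return value.replace("\r", " ").replace("\n", " ")

# A CRLF pair becomes two spaces, since CR and LF are replaced independently.
print(flatten_newlines("line one\r\nline two"))  # line one  line two
```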

    Author Comment

While it would be possible on a case-by-case basis to use a queryout with bcp and the suggested transformation, it would be too tedious in general, as there are lots of tables with lots of columns that would need to be handled this way.

    Author Comment

Actually, all it takes sometimes is writing out the problem for the solution to jump into your mind. A simpler solution has just occurred to me that should work:

1. Copy via bcp and substitute a multi-character string guaranteed not to occur in the data for the record separator.

2. Replace all newlines in the text output file with a symbol that does not occur in the file, such as a pipe. (Unfortunately, such a symbol differs by file, and there is probably not a generic symbol that works for every file.) As stated in my question, it is OK to have such a character left in the Excel file.

    3. Replace the multi-character string with newlines.

    4. Import into Excel.
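Steps 2 and 3 above can be sketched like this (a hypothetical Python sketch; "@@@" stands in for whatever multi-character string is passed to bcp's -r switch, and the pipe is the stand-in symbol):

```python
# Hypothetical record separator; must match the string given to bcp's -r switch.
RECORD_SEP = "@@@"

def restore_records(raw: str) -> str:
    # Step 2: replace embedded newlines with a symbol absent from the data.
    flattened = raw.replace("\r\n", "|").replace("\n", "|")
    # Step 3: turn the multi-character record separator back into real newlines.
    return flattened.replace(RECORD_SEP, "\n")

# Two records; the first contains an embedded line break.
sample = "id1\tsome text\nwith a break@@@id2\tplain text@@@"
print(restore_records(sample))
```

After this, each physical line of the output is one logical record, which is what step 4's Excel import needs.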

    I am going to try this approach and report back if it works.

    Author Comment

While my approach works in principle, it is bogging down in practice. I usually use a regex-enabled text editor, but it is choking on files of production size (several MB and up).

So I tried a version of sed to replace the newlines, but cannot get that to work. So I still need either more help with this approach, or a new approach.
    LVL 13

    Expert Comment

    This script processes roughly 10 MB/min.

    If you want to make it faster, you must rewrite the script logic in VB6 or C++ code.

    start /wait CSV2excel.vbs testin.txt testout.txt

    ' Rejoin physical lines that were split by embedded newlines: accumulate
    ' lines until the number of text delimiters (quotes) seen is even, then
    ' write out the completed logical record.
    Const ForReading = 1
    Const ForWriting = 2
    Const TEXTDELIMITER = """"   ' the text-qualifier character in the file

    Dim fso, txtin, txtout, str, stra, cnt, cnta

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set txtin = fso.OpenTextFile(WScript.Arguments(0), ForReading, False)
    Set txtout = fso.OpenTextFile(WScript.Arguments(1), ForWriting, True)

    stra = ""
    cnta = 0
    Do While Not txtin.AtEndOfStream
      str = txtin.ReadLine
      ' count the delimiter characters on this physical line
      cnt = (Len(str) - Len(Replace(str, TEXTDELIMITER, ""))) / Len(TEXTDELIMITER)
      cnta = cnta + cnt
      If stra = "" Then
        stra = str
      Else
        stra = stra & " " & str   ' continuation of a record split by a newline
      End If
      If cnta Mod 2 = 0 Then      ' delimiters balanced: the record is complete
        txtout.WriteLine stra
        stra = ""
        cnta = 0
      End If
    Loop

    txtin.Close
    txtout.Close
    Set fso = Nothing
    LVL 1

    Expert Comment

    Have you considered using DTS with the Excel provider to export the data directly to an Excel file?
    LVL 13

    Expert Comment

With DTS, you are able to fill the Excel sheet with the correct rows, but the contents of the output rows are exactly the same as the input rows. Excel needs LF as the in-cell row delimiter (Alt+Enter); for a CRLF delimiter it displays two "unknown" characters. DTS can handle this in an ActiveX transformation; it is similar to the REPLACE directly in the SELECT, but it uses the DTS server's memory and CPU.

    Author Comment

    The new "bcp" procedure is intended to replace the current direct export using DTS into Excel files. I did not give this background before, but a Q & D export into excel has been in production for several years (and is still).

    The reasons to avoid the export into Excel:

1. It is an ugly process. There are some tables which exceed the 64K row limit, which require special processing.

2. It is slow. The Excel export takes many minutes -- at least 10, whereas the bcp export takes less than a minute.

    3. The ability to use a batch process which also includes additional post-processing means that ultimately it can be more highly automated.

4. There are probably security issues with the current process that I am not going to get into here, but they are completely solved by the new method.

    5. Better data handling. Some of the large text fields with embedded linefeeds are not handled properly in Excel. The troubles seem to be erratic.

    In any case, the Excel method is really Q & D (quick and dirty) and the newer method is much more robust and elegant.

    I have been considering the queryout suggestion from ispaleny. It actually may work because as it turns out, the embedded linefeeds are probably only in 3 fields. So with some custom work on the bcp script for those specific areas, it may provide the solution. When I have a chance I will explore this avenue further.

    Author Comment

    I was finally able to finish this project, and last weekend actually replaced the Q & D export into Excel with a bcp-based export. It is working nicely and is now fully automated.

After all was said and done, the queryout method suggested by ispaleny was the only one that worked, with special processing for the fields where it was needed.

Therefore, points to ispaleny.
