Solved

SSIS CSV conversion to Pipe Flat File

Posted on 2014-01-21
6
548 Views
Last Modified: 2016-02-26
Implemented the solution below to convert extra csv commas to a pipe flat file, now getting an error regarding commas at the end of the file or outside of the delimiters:  



Solution for SSIS’s Flat File Connection which does not support alternating text qualifiers.
Replace the comma as our delimiter to the vertical pipe (|) character thus leaving commas intact. The quotes become unnecessary. For Tab Use:      '''' + REPLACE(REPLACE([Column1], '''', ''''''), CHAR(13) + CHAR(10), CHAR(9)) + ''''

Created a new DFT to reformat the input file
 
Configured the Flat File Source to “see” the input file as a single column file:
 
Added the output that will be the modified row:
 
      
     Imports System
     Imports System.Data
     Imports System.Math
     Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
     Imports Microsoft.SqlServer.Dts.Runtime.Wrapper
     
    Public Class ScriptMain
        Inherits UserComponent
     
   Public Overrides Sub UnTransformedInput_ProcessInputRow(ByVal Row As UnTransformedInputBuffer)
            '
            ' Unquote and  replace the comma to pipe (|) to become the delimiter
            '
     
            Dim raw_string As String = String.Empty
            Dim seacrhedPos As Int32
            Dim firstCommaPos As Int32
            Dim nextQuotePos As Int32
            Dim section_to_modify As String
     
            raw_string = Row.UnTransformedOrgRow
     
            'MesssageBox to see the initial value
            System.Windows.Forms.MessageBox.Show(raw_string)
     
            If raw_string.Length > 0 Then
     
                seacrhedPos = raw_string.IndexOf(",""", 0)
    
                While seacrhedPos <> -1
                    firstCommaPos = raw_string.IndexOf(",", seacrhedPos + 1)
     
                    nextQuotePos = raw_string.IndexOf(""",", firstCommaPos + 1)
     
                    If nextQuotePos > 0 Then
                        section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - (raw_string.Length - nextQuotePos))
     
                        section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
     
                        'Replace 
                        raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify & _
                        raw_string.Substring(nextQuotePos + 1, raw_string.Length - nextQuotePos - 1)
     
                    Else
                        section_to_modify = raw_string.Substring(seacrhedPos + 1, raw_string.Length - seacrhedPos - 1)
     
                        section_to_modify = section_to_modify.Replace(",", "~").Replace("""", "")
     
                        'Replace 
                        raw_string = raw_string.Substring(0, seacrhedPos + 1) & section_to_modify
     
                    End If
     
                    'Get the next set
                    seacrhedPos = raw_string.IndexOf(",""", 0)
                End While
     
       raw_string = raw_string.Replace(",", "|").Replace("~", ",")
     
            End If
     
           'MesssageBox to see the final value
           System.Windows.Forms.MessageBox.Show(raw_string)
     
            Row.TransformedRow = raw_string
        End Sub
     
    End Class

Open in new window

0
Comment
Question by:JGH5
6 Comments
 

Author Comment

by:JGH5
ID: 39798549
Name             Number
Should Be:        Apple Inc            12345
Getting Results Of:       Apple                 Inc
0
 
LVL 37

Accepted Solution

by:
ValentinoV earned 500 total points
ID: 39799730
Sounds to me like you're making this way too complicated.  If all you want/need to do is replace the comma separator with a pipe then I wouldn't bother implementing a data flow transformation.  Just use a Script Task in the Control Flow with following code (or similar).  It assumes that there are two package variables: Input that contains full path + name of input file and Output with full path + name for output file.

//additional using
using System.IO;

        public void Main()
        {
            string input = Dts.Variables["Input"].Value.ToString();
            string output = Dts.Variables["Output"].Value.ToString();

            string fileContent = File.ReadAllText(input);
            fileContent = fileContent.Replace(",", "|");
            File.WriteAllText(output, fileContent);

            Dts.TaskResult = (int)ScriptResults.Success;
        }

Open in new window

BTW: I don't have the time to analyze your code but have the impression, based on the length and the comment at the top, that it's doing more than just replacing comma with pipe.  You should be able to achieve this through my code as well, by adding additional calls to .Replace.  Please post more details (in words and with examples, not code) on what you need to achieve if you need further assistance with this.
0
 
LVL 44

Expert Comment

by:Rainer Jeschor
ID: 39800857
Hi,
which version of SQL Server / SSIS?

Can you perhaps please explain your original issue?
As far as I understand, you have a CSV file with comma as delimiter and the quotes as text separators to encapsulate commas inside text columns.
You want now to import / read this CSV and export it as flat file with the pipe as delimiter without text separators - correct?

Thanks.
Rainer
0
 

Author Comment

by:JGH5
ID: 39801852
@Rainer,

2008 R2

Yes read CSV and export to a pipe delimiter without text separators.
0
 

Expert Comment

by:tbruinsma
ID: 40202417
This solution has one failure point, Escaped text qualifiers (0,"This is a ""test""!", 0.00) will cause it to fail. I put this script in the middle of a for each file loop to process all of the files before i began the import. Not the most elegant, but it solves the problem for me.

using System;
using System.IO;
using System.Data;
using Microsoft.SqlServer.Dts.Runtime;
using System.Windows.Forms;
using System.Text.RegularExpressions;

        public void Main()
        {
            try {
                string _srcFile = Dts.Variables["CurrentFile"].Value.ToString();
                string _tmpFile = _srcFile + ".tmp";
                string _inputLine = "";
                string _outputLine = "";

                //build the regex
                string _expression = String.Format("{0}(?=(?:[^{1}]*{1}[^{1}]*{1})*(?![^{1}]*{1}))",Regex.Escape(","), Regex.Escape("\""));
                RegexOptions _regexOptions = RegexOptions.Compiled | RegexOptions.Multiline;
                Regex _regex = new Regex(_expression, _regexOptions);

                //rename the file to tmp
                File.Move(_srcFile, _tmpFile);

                using (StreamReader sr = new StreamReader(_tmpFile))
                {
                      //open the output file
                    using (StreamWriter sw = new StreamWriter(_srcFile))
                    {
                            //read the file in line by line
                        while (sr.Peek() >= 0)
                        {
                            _inputLine = sr.ReadLine();

                                  try {
                                        //rip, strip and replace
                                        _outputLine = string.Join("|", _regex.Split(_inputLine));
                                sw.WriteLine(_outputLine);
                                  } catch (Exception p_Ex) {
                                        Console.WriteLine(p_Ex);
                                Console.WriteLine(_inputLine);
                                  }
                            }

                            //close the output file
                        sw.Close();
                      }

                      //close the input file
                      sr.Close();
                }

                //delete the input file
            } catch (Exception p_Ex) {
                Dts.Log(p_Ex.Message,0,new byte[0]);
                Dts.TaskResult = (int)ScriptResults.Failure;
                return;
            }

           Dts.TaskResult = (int)ScriptResults.Success;
            return;
        }
    }
0

Featured Post

Enabling OSINT in Activity Based Intelligence

Activity based intelligence (ABI) requires access to all available sources of data. Recorded Future allows analysts to observe structured data on the open, deep, and dark web.

Join & Write a Comment

My client has a dictionary table. They're defining a list of standard naming convention. Now, they are requiring my team to provide us a mechanism how to match new incoming data with existing data in their system.
This article explains how to reset the password of the sa account on a Microsoft SQL Server.  The steps in this article work in SQL 2005, 2008, 2008 R2, 2012, 2014 and 2016.
Using examples as well as descriptions, and references to Books Online, show the documentation available for datatypes, explain the available data types and show how data can be passed into and out of variables.
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns over…

760 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

18 Experts available now in Live!

Get 1:1 Help Now