Strip carriage returns / line feeds from database records before they are written to csv

I have the following code that reads lines from a database and writes the matching results to a csv file. The problem I am running into is there are occasionally carriage returns / line feeds in some of the fields in different rows which cause the csv file to be unusable because there are bogus rows.

For example, here is a sample of what happens when there are carriage returns / line feeds in the SQL data and the affect it has on the file. ... Sample content of messed up file:

field1|field2|field3|field4|field5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|value 3|value 4|value 5
value 1|value 2|val
ue 3|value 4|value 5
value 1|value 2|value 3|va
lue 4|value 5

Here is the code that writes the results of a SQL query to the output file. What I am trying to do is strip any results that have carriage returns / line feeds.


'''
    While loop to read each row.  if compares row[2] (updated) against the last record processed
    '''
    latest = params #Declare 'latest' variable for consumption by while loop
    while row:
        if row[2] > latest:
            latest = row[2]
        logger.debug("[%s] - Writing row %s", correlationId, row)
        writer.writerow(row)
        row = cursor.fetchone()

    logger.info("[%s] - last letter date %s " % (correlationId, lastProcessed))
    lastProcessedLog = open(LAST_PROCESSED_LOGFILE , 'wt')
    lastProcessedString = str(latest)
    lastProcessedString = lastProcessedString[0:19]
    lastProcessedLog.write(lastProcessedString)
    lastProcessedLog.close()

    conn.close()
    ofile.close()
    logger.info("[%s] - Copying %s to root for loadBackflow as %s", correlationId, writeFile, outfile)
    shutil.copyfile(writeFile, outfile)
    logger.info("[%s] - Moving %s to completion folder %s", correlationId, writeFile, completionFolder)
    shutil.move(writeFile, completionFolder)

Open in new window


I have tried changing the writer.write(row) line to include a replace but I get an error. Similarly, I get errors when trying to use replace with row = row.replace("\r\n", "") ... I have pasted my attempts and the corresponding errors below.

Any insights on how I can strip carriage returns / line feeds at the time they are being read from the SQL query results into the data file are much appreciated.

Thanks in advance! :)

# Attempt1:
writer.writerow(row).replace("\r\n", "")
# Error:
Unexpected error: 'NoneType' object has no attribute 'replace'

# Attempt2:
row = row.replace("\r\n", "")
#Error:
Unexpected error: 'tuple' object has no attribute 'replace'

#Attempt3:
row = row.replace("\r", "")
row = row.replace("\n", "")
#Error:
Unexpected error: 'tuple' object has no attribute 'replace'

#Attempt 4:
        '''
        While loop to read each row.  if compares row[2] (updated) against the last record processed
        '''
        latest = params #Declare 'latest' variable for consumption by while loop
        def preprocess_item(item):
            if isinstance(item, str):
                return item.replace("\r\n", "")
                #return item.replace("\n", "").replace("\r", "")
            return item

        def preprocess_row(row):
            return tuple(preprocess_item(item) for item in row)

        while row:
            if row[2] > latest:
                #row = cursor.fetchone()
                
                latest = row[2]
            logger.debug("[%s] - Writing row %s", correlationId, row)
            row = cursor.fetchone()
            row = preprocess_row(row)
            writer.writerow(row)

Open in new window

tnesavich2Asked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Brian CroweDatabase AdministratorCommented:
REPLACE(REPLACE(mystring, CHAR(10), ''), CHAR(13), '')

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Vitor MontalvãoMSSQL Senior EngineerCommented:
What are those fields data types in SQL Server?
tnesavich2Author Commented:
They are strings.  Tried Brian's suggestion but it does not seem to have an effect on the data that is put out. ... .Interestingly enough the thought was spot on as I have used a similar technique to identify the fields with carriage returns or line feeds in the past.

IE
                                          (table.field1 LIKE '%'+ char(13 )+'%') OR
                                          (table.field1 LIKE '%'+char (13)+ '%') OR
Big Business Goals? Which KPIs Will Help You

The most successful MSPs rely on metrics – known as key performance indicators (KPIs) – for making informed decisions that help their businesses thrive, rather than just survive. This eBook provides an overview of the most important KPIs used by top MSPs.

Vitor MontalvãoMSSQL Senior EngineerCommented:
In SQL Server doesn't exist STRING data type. Can you check in the database what's really the data type there?
tnesavich2Author Commented:
varchar(80)
Vitor MontalvãoMSSQL Senior EngineerCommented:
You can avoid that in SQL Server. No need to control it in you code.
Can you post the SQL statement that you are running?
tnesavich2Author Commented:
Sure thing Victor. ... There are only certain records that return carraige returns and they all come from the location field.

Here is the SQL Select statement:


SELECT dwcustom .AuditTable_Hazards.operation AS "operation" ,
               Table.Topic.UpdatedBy AS "updatedBy" ,
               Table.Topic.UpdatedDate AS "updated" ,
               Table2UDF10 AS "customerGroup",
               Table2SiteUse AS "isolations",
               Table2StreetNo AS "streetNumber",
               Table.Topic.Installed AS "deviceStatus" ,
               Table.Topic.SerialNum AS "serialNumber" ,
               Table.Topic.MFG AS "manufacturer" ,
               Table.Topic.Model AS "model" ,
               Table.Topic.Type AS "equipmentType" ,
               Table.Topic.devSize AS "size" ,
               Table.Topic.Orientation AS "orientation" ,
               Table.Topic.UDh2 AS "protection" ,
               Table.Topic.UDh3 AS "typeOfUse" ,
               Table.Topic.Location AS "location" ,
               Table.Topic.UDh6 AS "previousDevice" ,
               Table.Topic.UDh8 AS "variance" ,
               Table.Topic.LastTest AS "lastTestDate" ,
               Table.Topic.LastTestStatus AS "lastTestStatus" ,
               Table.Topic.LastLetter AS "lastLetterDate" ,
               Table.Topic.LastLetterCode AS "lastLetterCode" ,
               Table.Topic.UDh10 AS "sosStatus" ,
               Table.Topic.ShutOffDate AS "sosDate" ,
               Table.Topic.ServiceNo AS "serviceNo" ,
               Table.Topic.LastTestBy AS "lastTestBy" ,
               Table.Topic.LastTestCompany AS "lastTestCompany" ,
               Table2LastSurveyStatus AS "lastSurveyStatus",
               Table2LastSurvey AS "lastSurvey",
               Table2LocationId AS "premiseId",
               Table.Topic.HazID AS "hazardId" ,
               Table.Topic.SiteID AS "siteId"              
      FROM Table.Topic INNER JOIN Sites ON Table.Topic .SiteID = Sites .SiteID
      WHERE (Table.Topic.Installed = '' OR
               Table.Topic.Installed = 'Installed' OR
               Table.Topic.Installed = 'Installation Not Required' OR
               Table.Topic.Installed IS NULL)
Vitor MontalvãoMSSQL Senior EngineerCommented:
Which field has the issue?
Here's a solution for "deviceStatus" (you need to change and apply it for the field you want):
SELECT dwcustom .AuditTable_Hazards.operation AS "operation" ,
                Table.Topic.UpdatedBy AS "updatedBy" ,
                Table.Topic.UpdatedDate AS "updated" ,
                Table2UDF10 AS "customerGroup",
                Table2SiteUse AS "isolations",
                Table2StreetNo AS "streetNumber",
                SELECT REPLACE(Table.Topic.Installed, CHAR(13)+CHAR(10), '') AS "deviceStatus" , --> Replace by an empty character
                Table.Topic.SerialNum AS "serialNumber" ,
                Table.Topic.MFG AS "manufacturer" ,
                Table.Topic.Model AS "model" ,
                Table.Topic.Type AS "equipmentType" ,
                Table.Topic.devSize AS "size" ,
                Table.Topic.Orientation AS "orientation" ,
                Table.Topic.UDh2 AS "protection" ,
                Table.Topic.UDh3 AS "typeOfUse" ,
                Table.Topic.Location AS "location" ,
                Table.Topic.UDh6 AS "previousDevice" ,
                Table.Topic.UDh8 AS "variance" ,
                Table.Topic.LastTest AS "lastTestDate" ,
                Table.Topic.LastTestStatus AS "lastTestStatus" ,
                Table.Topic.LastLetter AS "lastLetterDate" ,
                Table.Topic.LastLetterCode AS "lastLetterCode" ,
                Table.Topic.UDh10 AS "sosStatus" ,
                Table.Topic.ShutOffDate AS "sosDate" ,
                Table.Topic.ServiceNo AS "serviceNo" ,
                Table.Topic.LastTestBy AS "lastTestBy" ,
                Table.Topic.LastTestCompany AS "lastTestCompany" ,
                Table2LastSurveyStatus AS "lastSurveyStatus",
                Table2LastSurvey AS "lastSurvey",
                Table2LocationId AS "premiseId",
                Table.Topic.HazID AS "hazardId" ,
                Table.Topic.SiteID AS "siteId"              
       FROM Table.Topic INNER JOIN Sites ON Table.Topic .SiteID = Sites .SiteID
       WHERE (Table.Topic.Installed IN ('','Installed','Installation Not Required')
              OR   Table.Topic.Installed IS NULL) 

Open in new window

tnesavich2Author Commented:
Tried that, ... got an error though:

Msg 156, Level 15, State 1, Line 17
Incorrect syntax near the keyword 'SELECT'.
Msg 102, Level 15, State 1, Line 17
Incorrect syntax near 'CHAR'.
Msg 103, Level 15, State 4, Line 33
The identifier that starts with '
               FROM Table.Topic INNER JOIN Sites ON Table.Topic.SiteID = Sites.SiteID
     ' is too long. Maximum length is 128.
Msg 105, Level 15, State 1, Line 33
Unclosed quotation mark after the character string '
               FROM Table.Topic INNER JOIN Sites ON Table.Topic.SiteID = Sites.SiteID
      '.
tnesavich2Author Commented:
Thanks everyone,

Figured out what worked with a smaller set of data.  We have a "Notes" field in another database that is full of carriage returns and line feeds.  the SELECT statement below solves the issue. Brian was on the money!

SELECT        REPLACE(REPLACE(Notes, CHAR(10), ''), CHAR(13), '') AS TrimmedNotes,
                        Table.Value
FROM            HazardNotes INNER JOIN
                         Hazards ON HazardNotes.HazID = Hazards.HazID
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Server OS

From novice to tech pro — start learning today.