Cleansing CSV File

I need help cleansing and sorting the the attached file (extract only).  The following actions need to occur for my data to be cleansed:

1. Column A - remove all data after the first email address i.e. remove all data including the space after the email address leaving you with just a simple email address and not mention of SMTP etc

2. Column B,C,D - Extract Only First Name (where one exists) OR If there is not a first name and only Title + Surname THEN return Title + Surname in a single cell.

4. Column E - Select only those people in the list whose criteria in this column = 'DSA'

I am looking to create a comma separated file with the following syntax of all data in my list - with only those people whose 'category' = 'DSA':


email[separator]name[new line]
email[separator]name[new line] etc etc

The objective is that my newsletter application imports all data and the is able to personalise all letters/newsletters

Example

Dear David, (see syntax above) - assuming first name avalable
Dear Mr Scott, (see syntax above) - assuming no first name available

Contacts-CSV-File-Sample.CSV
davidascottAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

answer_dudeCommented:
If you put the code below in another workbook, keep it open, then toggle to your CSV file then run the code from there this should work

Caution:  this code deletes all rows that do not have "DSA" in column E

Sub ReformatCSV()

    Dim FindLastRow As Long
    Dim FindLastColumn As Long
    Dim x As Long
    Dim rRange As Range
        
    ' Delete all rows that do not have Categories = DSA
    With ActiveSheet
         If .AutoFilterMode = False Then .Cells(1, 1).AutoFilter
                .Range("A1").AutoFilter Field:=5, Criteria1:="<>DSA"
                .Range("A1").CurrentRegion.Offset(1, 0).SpecialCells _
                        (xlCellTypeVisible).EntireRow.Delete

        .AutoFilterMode = False

    End With
        
    FindLastRow = 0
    If WorksheetFunction.CountA(Cells) > 0 Then
        'Search for any entry, by searching backwards by Rows.
        FindLastRow = Cells.Find(What:="*", After:=[A1], _
                      SearchOrder:=xlByRows, _
                      SearchDirection:=xlPrevious).Row
    End If
    FindLastColumn = 0
    If WorksheetFunction.CountA(Cells) > 0 Then
        'Search for any entry, by searching backwards by Columns.
        FindLastColumn = Cells.Find(What:="*", After:=[IV1], _
                         SearchOrder:=xlByColumns, _
                         SearchDirection:=xlPrevious).Column
    End If

    Set rRange = ActiveSheet.Range(Cells(1, 1), Cells(FindLastRow, FindLastColumn))
    x = 2
    With rRange
        Do While x <= FindLastRow
        
            .Cells(x, 1).Value = Left(.Cells(x, 1), InStr(1, .Cells(x, 1), " ") - 1)
            .Cells(x, 6) = IIf(IsEmpty(.Cells(x, 3)), "Dear " & .Cells(x, 2) & " " & .Cells(x, 4) & ", ", "Dear " & .Cells(x, 3) & ", ")
            x = x + 1
            
        Loop
    End With

End Sub

Open in new window

0
answer_dudeCommented:
I should add that the above snippet puts your salutation in column F.
0
davidascottAuthor Commented:
Sorry for the delay in responding

Thanks, the code works a treat.  Because the newsletter in the CMS inserts the 'dear' itself.  Do I simply remove line 40 code to give me a result that just ends with (a) first name only or (b) if no first name given then Mr/Mrs Snell as a result.

Thanks in advance
0
Exploring SQL Server 2016: Fundamentals

Learn the fundamentals of Microsoft SQL Server, a relational database management system that stores and retrieves data when requested by other software applications.

davidascottAuthor Commented:
Note 1:
Just realised that when searching through the entire list, many of the records have multiple categories and are 'semi-colon' delimited.  I have attached a revised sample. Please would you take this into account as well as the last update above.

Note 2
Please would you change the 'category' that the search looks to 'ERS' as the ERS 'category' would also be required in terms of producing the final output

Note 3
Can the code also produce just the final result in the syntax required by the CMS

email[separator]name[new line]
email[separator]name[new line]

Thanks, David
0
davidascottAuthor Commented:
Sorry....see revised sample file attached.

Best, David
Contacts-CSV-File-Sample-v2.CSV
0
davidascottAuthor Commented:
The separator as referred to in the comment before last should remain comma delimited
0
davidascottAuthor Commented:
Please ignore last updated sample file. Please refer to the one attached to this post i.e. v3
Contacts-CSV-File-Sample-v3.CSV
0
answer_dudeCommented:
RE:   removing "Dear" -- you would not delete the entire line, just the portion of the line that inserts "Dear" -- the new snippet below does that.

RE:  ERS category... do you want both the ERS category and the DSA category or just ERS?  (snippet attached does both).  HOWEVER -- you have categories such as "Framers" and "Suppliers" which also contain the letters "ERS" -- I've added another block to remove these, specifically... but if you ever have other categories that contain "ERS" this code would fail.

RE:  Final result... did you want the new email address and the salutation name to be in the same field with a comma separating the two?  That's what the new snippet does.  However, there's no reason why you can't have both in their own field, when you save as .csv file Excel will put commas between them -- nonetheless, the code below should work.

Finally, just a warning... the last sample file you posted has a different format -- the code I've given you assumes the same format each time -- it can be made more robust (by tying to column heading names) but even then there needs to be consistency across files..
Sub ReformatCSV()
On Error Resume Next
    Dim FindLastRow As Long
    Dim FindLastColumn As Long
    Dim x As Long
    Dim rRange As Range
        
    ' Delete all rows that do not have Categories containing DSA or ERS
    With ActiveSheet
         If .AutoFilterMode = False Then .Cells(1, 1).AutoFilter
                .Range("A1").AutoFilter Field:=6, Criteria1:="<>*DSA*", Operator:=xlAnd, _
                                                  Criteria2:="<>*ERS*"
                .Range("A1").CurrentRegion.Offset(1, 0).SpecialCells _
                        (xlCellTypeVisible).EntireRow.Delete

        .AutoFilterMode = False

    End With
    
    'Delete known categories with "ERS" in the name
    With ActiveSheet
         If .AutoFilterMode = False Then .Cells(1, 1).AutoFilter
                .Range("A1").AutoFilter Field:=6, Criteria1:="=Suppliers", Operator:=xlOr, _
                                                  Criteria2:="=Framers"
                .Range("A1").CurrentRegion.Offset(1, 0).SpecialCells _
                        (xlCellTypeVisible).EntireRow.Delete

        .AutoFilterMode = False
    End With
    
    FindLastRow = 0
    If WorksheetFunction.CountA(Cells) > 0 Then
        'Search for any entry, by searching backwards by Rows.
        FindLastRow = Cells.Find(What:="*", After:=[A1], _
                      SearchOrder:=xlByRows, _
                      SearchDirection:=xlPrevious).Row
    End If
    FindLastColumn = 0
    If WorksheetFunction.CountA(Cells) > 0 Then
        'Search for any entry, by searching backwards by Columns.
        FindLastColumn = Cells.Find(What:="*", After:=[IV1], _
                         SearchOrder:=xlByColumns, _
                         SearchDirection:=xlPrevious).Column
    End If

    Set rRange = ActiveSheet.Range(Cells(1, 1), Cells(FindLastRow, FindLastColumn))
    x = 2
    With rRange
        Do While x <= FindLastRow
        
            .Cells(x, 7).Value = Left(.Cells(x, 1), InStr(1, .Cells(x, 1), " ") - 1)
            .Cells(x, 7) = .Cells(x, 7) & "," & IIf(IsEmpty(.Cells(x, 4)), .Cells(x, 3) & " " & .Cells(x, 5), .Cells(x, 4))
            x = x + 1
            
        Loop
    End With

End Sub

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
davidascottAuthor Commented:
Your output was great - thank you
0
davidascottAuthor Commented:
Great solution and a real time saver. Thank You.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Excel

From novice to tech pro — start learning today.