CSV Extraction

Hello,

I am looking for some help with a script that can read a random CSV file. Extract all the unique data under the heading called ‘AuditTItle’ not including any duplicates.

Produce a text file with this data in it

Regards

Nick
Nick CollinsAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

KimputerCommented:
Sounds fairly straightforward. Here's just the steps explaining it, you should be able to figure out the code by yourself:

Read first line of the file. Split by comma. Find which field holds AuditTitle.
With this number, use a loop to read all next lines till the end of the file.
Inside the loop, split by comma, get the field nr entry, and store it in an arraylist (after checking it's already in the list or not).
Next, just output the array list to a new text file.

Done!
0
Nick CollinsAuthor Commented:
That would be fairly straight forward but I am not an developer .
0
aikimarkCommented:
Have you tried using ADODB?  You can invoke a "Select Distinct AuditTItle From " & whateverCSVfilename query against a file.  The database engine will do the heavy lifting for you.

Just iterate the resulting recordset and write out the data.
2
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
You may try something like this...


Sub UniqueDataFromSpecificColumn()
Dim csvFile As String, txtFile As String
Dim lineStr As String, str() As String
Dim i As Long, n As Long
Dim dict, x(), AuditTitle

'Path of the CSV and Text Files
'Assuming both are saved on Desktop
'Assuming name of CSV file is Audit.csv and name of Text file is AuditTitle.txt

csvFile = Environ("UserProfile") & "\Desktop\Audit.csv"
txtFile = Environ("UserProfile") & "\Desktop\AuditTitle.txt"

If Len(Dir(csvFile)) = 0 Then
    MsgBox "The CSV file doesn't exist.", vbExclamation
    Exit Sub
End If

Open csvFile For Input As #1
Line Input #1, lineStr
str = Split(lineStr, ",")
For i = 0 To UBound(str)
    If LCase(str(i)) = "audittitle" Then
        n = i
        Exit For
    End If
Next i

i = 1
Set dict = CreateObject("Scripting.Dictionary")
ReDim x(1 To i)
x(i) = "AuditTitle"

Do
    Line Input #1, lineStr
    AuditTitle = Split(lineStr, ",")(n)
    If AuditTitle <> "" Then
        If Not dict.exists(AuditTitle) Then
            dict.Item(AuditTitle) = ""
            i = i + 1
            ReDim Preserve x(1 To i)
            x(i) = AuditTitle
        End If
    End If
Loop Until EOF(1)
Close #1

Open txtFile For Output As #1
For i = 1 To UBound(x)
    Print #1, x(i)
Next i
Close #1

End Sub

Open in new window

0
aikimarkCommented:
You should never ReDim an array inside a loop.  This is an updated version of the Neeraj VBA routine without the ReDim operations.
Sub UniqueDataFromSpecificColumn()
    Dim csvFile As String, txtFile As String
    Dim lineStr As String, str() As String
    Dim i As Long, n As Long
    Dim dict, AuditTitle    'x(),
    
    'Path of the CSV and Text Files
    'Assuming both are saved on Desktop
    'Assuming name of CSV file is Audit.csv and name of Text file is AuditTitle.txt
    
    csvFile = Environ("UserProfile") & "\Desktop\Audit.csv"
    txtFile = Environ("UserProfile") & "\Desktop\AuditTitle.txt"
    
    If Len(Dir(csvFile)) = 0 Then
        MsgBox "The CSV file doesn't exist.", vbExclamation
        Exit Sub
    End If
    
    Open csvFile For Input As #1
    Line Input #1, lineStr
    str = Split(lineStr, ",")
    For i = 0 To UBound(str)
        If LCase(str(i)) = "audittitle" Then
            n = i
            Exit For
        End If
    Next i
    
    i = 1
    Set dict = CreateObject("Scripting.Dictionary")
'    ReDim x(1 To i)
'    x(i) = "AuditTitle"
    
    Do
        Line Input #1, lineStr
        AuditTitle = Split(lineStr, ",")(n)
        If AuditTitle <> "" Then
            If Not dict.exists(AuditTitle) Then
                dict.Item(AuditTitle) = 1
'                i = i + 1
'                ReDim Preserve x(1 To i)
'                x(i) = AuditTitle
            End If
        End If
    Loop Until EOF(1)
    Close #1
    
    Open txtFile For Output As #1
    Print #1, "AuditTitle"
    Print #1, Join(dict.keys, vbCrLf)

'    For i = 1 To UBound(x)
'        Print #1, x(i)
'    Next i
    
    Close #1

End Sub

Open in new window


Note: This is VBA, not VBScript.
1
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
@Aikimark
Nice. +1
0
Fabrice LambertFabrice LambertCommented:
And you should not use the Dir() function as it store its search criterias in an unknown and unreachable global scope, thus is unreliable.

Better use the FileSystemObject Library, plus this Library's objects and functions are meaningfull.
See the 2nd updated version below:
Sub UniqueDataFromSpecificColumn()
    Dim csvFile As String, txtFile As String
    Dim lineStr As String, str() As String
    Dim i As Long, n As Long
    Dim dict, AuditTitle    'x(),
    
    'Path of the CSV and Text Files
    'Assuming both are saved on Desktop
    'Assuming name of CSV file is Audit.csv and name of Text file is AuditTitle.txt
    
    csvFile = Environ("UserProfile") & "\Desktop\Audit.csv"
    txtFile = Environ("UserProfile") & "\Desktop\AuditTitle.txt"
    
    '// If Len(Dir(csvFile)) = 0 Then
    If (Not fileExist(csvFile)) Then
        MsgBox "The CSV file doesn't exist.", vbExclamation
        Exit Sub
    End If
    
    Open csvFile For Input As #1
    Line Input #1, lineStr
    str = Split(lineStr, ",")
    For i = 0 To UBound(str)
        If LCase(str(i)) = "audittitle" Then
            n = i
            Exit For
        End If
    Next i
    
    i = 1
    Set dict = CreateObject("Scripting.Dictionary")
'    ReDim x(1 To i)
'    x(i) = "AuditTitle"
    
    Do
        Line Input #1, lineStr
        AuditTitle = Split(lineStr, ",")(n)
        If AuditTitle <> "" Then
            If Not dict.Exists(AuditTitle) Then
                dict.item(AuditTitle) = 1
'                i = i + 1
'                ReDim Preserve x(1 To i)
'                x(i) = AuditTitle
            End If
        End If
    Loop Until EOF(1)
    Close #1
    
    Open txtFile For Output As #1
    Print #1, "AuditTitle"
    Print #1, Join(dict.Keys, vbCrLf)

'    For i = 1 To UBound(x)
'        Print #1, x(i)
'    Next i
    
    Close #1
End Sub

Public Function fileExist(ByVal path As String) As Boolean
    Dim fso As Object       '// Scripting.FileSystemObject
    Set fso = CreateObject("Scripting.FileSystemObject")
    
    fileExist = fso.FileExists(path)
    Set fso = Nothing
End Function

Open in new window

0
Nick CollinsAuthor Commented:
For some reason it didn't extract the column I wanted but instead another column..

I will attach the file for you..
Audit.csv
0
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
That's because there is no column header called "AuditTitle" in the file you attached.
If you want to extract data from a column from the file you attached, just change the header in line#23

If LCase(str(i)) = "audittitle" Then

Make sure to input it in lower case.
0
Nick CollinsAuthor Commented:
My correction the heading is auditname..

So I am expecting the following results

Test Data Homes Ltd - Testing Town - Job Number: 1844 / 5 / HV_000017_968424 / 11 Apr 2018 / User1
Demo Trust - Castle Site - Job Number: 1854 / Building / User1
0
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
Please give this a try...

Sub UniqueDataFromSpecificColumn()
    Dim csvFile As String, txtFile As String
    Dim lineStr As String, str() As String
    Dim i As Long, n As Long
    Dim dict, AuditTitle    'x(),
    
    'Path of the CSV and Text Files
    'Assuming both are saved on Desktop
    'Assuming name of CSV file is Audit.csv and name of Text file is AuditTitle.txt
    
    csvFile = Environ("UserProfile") & "\Desktop\Audit.csv"
    txtFile = Environ("UserProfile") & "\Desktop\AuditTitle.txt"
    
    If Len(Dir(csvFile)) = 0 Then
        MsgBox "The CSV file doesn't exist.", vbExclamation
        Exit Sub
    End If
    
    Open csvFile For Input As #1
    Line Input #1, lineStr
    str = Split(lineStr, ",")
    For i = 0 To UBound(str)
        If InStr(LCase(str(i)), "auditname") > 0 Then
            n = i
            Exit For
        End If
    Next i
    
    If n = 0 Then
        MsgBox "The column was not found in CSV File.", vbExclamation
        Close #1
        Exit Sub
    End If
    
    Set dict = CreateObject("Scripting.Dictionary")
    
    Do
        Line Input #1, lineStr
        AuditTitle = Split(lineStr, ",")(n)
        If AuditTitle <> "" Then
            If Not dict.exists(AuditTitle) Then
                dict.Item(AuditTitle) = 1
            End If
        End If
    Loop Until EOF(1)
    Close #1
    
    Open txtFile For Output As #1
    Print #1, "AuditTitle"
    Print #1, Join(dict.keys, vbCrLf)
  
    Close #1

End Sub

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Nick CollinsAuthor Commented:
That's much better but for some reason it's producing 3 lines in the results - see attachment
AuditTitle.txt
0
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
The problem is, some of the strings in Label column contain comma so the splitting is not working as desired.

You may give this a try but don't think it will be reliable in all the cases....

Sub UniqueDataFromSpecificColumn()
    Dim csvFile As String, txtFile As String
    Dim lineStr As String, str() As String
    Dim i As Long, n As Long
    Dim dict, AuditTitle    'x(),
    
    'Path of the CSV and Text Files
    'Assuming both are saved on Desktop
    'Assuming name of CSV file is Audit.csv and name of Text file is AuditTitle.txt
    
    csvFile = Environ("UserProfile") & "\Desktop\Audit.csv"
    txtFile = Environ("UserProfile") & "\Desktop\AuditTitle.txt"
    
    If Len(Dir(csvFile)) = 0 Then
        MsgBox "The CSV file doesn't exist.", vbExclamation
        Exit Sub
    End If
    
    Open csvFile For Input As #1
    Line Input #1, lineStr
    str = Split(lineStr, ",")
    For i = 0 To UBound(str)
        If InStr(LCase(str(i)), "auditname") > 0 Then
            n = i
            Exit For
        End If
    Next i
    
    If n = 0 Then
        MsgBox "The column was not found in CSV File.", vbExclamation
        Close #1
        Exit Sub
    End If
    
    Set dict = CreateObject("Scripting.Dictionary")
    
    Do
        Line Input #1, lineStr
        AuditTitle = Split(lineStr, ",")(n)
        If AuditTitle <> "" And UBound(Split(lineStr, ",")) <= 36 Then
            If Not dict.exists(AuditTitle) Then
                dict.Item(AuditTitle) = 1
            End If
        End If
    Loop Until EOF(1)
    Close #1
    
    Open txtFile For Output As #1
    Print #1, "AuditTitle"
    Print #1, Join(dict.keys, vbCrLf)
  
    Close #1

End Sub

Open in new window

0
aikimarkCommented:
The first line is the (original spec) title of the column.  Comment the first Print statement to remove that line from the output.
0
aikimarkCommented:
I think the non-header SPLIT() would behave if you split on "," character sequence.  As long as the desired column (auditname) doesn't contain a quote character, this would parse correctly.  If it is possibly the first or last column, then include a REPLACE() function to remove any quote characters.
0
Nick CollinsAuthor Commented:
On the output file is there a way to remove the double quotes at the beginning and end of each line.
0
Subodh Tiwari (Neeraj)Excel & VBA ExpertCommented:
Replace line#39 with the following line...

AuditTitle = Replace(Split(lineStr, ",")(n), Chr(34), "")

Open in new window

0
aikimarkCommented:
This is an example of my prior comment.
AuditTitle = Replace(Split(lineStr, """,""")(n), Chr(34), "")

Open in new window

You only need to replace a quote character on the column of interest, not the first and last columns.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
VB Script

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.