Solved

Re Format CSV file

Posted on 2014-03-27
7
306 Views
Last Modified: 2014-03-28
Can an expert help me out with VBA code that will split out the info I need please.

I have a large CSV file in this format

Cat          Reference W/Held COI Tax W/Held B/U Tax    Charges Tax Deducted Type      Admin     Net Amount   Amount to Client
PKNHL0002 (111) CHTY      HTCU-AMPHLETT SCHLR(WAAA)
Grp1       5860.0000           0.00           0.00       0.00                         0.00         666.90
ASHBY0014 (111) CHTY      HTCU-ASHBY A J H W/T(WSSS)
Grp1       3788.0000           0.00           0.00       0.00                         0.00         431.09

Open in new window




I am not bothered about the headings, what I need to end up with this

AMPHL0002	5860
ASHBY0014               3788

Open in new window


I have tried Text to column but this does not work [for me]

There are over 5000 lines in the CSV file  and each time there is a GRP1,  to the right is the amount and above that is the data I need.

All data is in column A

Thanks
0
Comment
Question by:Jagwarman
7 Comments
 
LVL 34

Expert Comment

by:Dan Craciun
ID: 39959397
You don't need Excel for that. Just open your csv in Notepad++, press CTRL-H (or press Search->Replace), under "Search mode" check "Regular expressions" then use this in the Find box:
(\w+)\s+\(\w+\).*\r\nGrp1\s+(\d+).*$

Open in new window

and "$1 $2" in the Replace box, then click on Replace all.
replace on notepad
HTH,
Dan
0
 
LVL 39

Accepted Solution

by:
nutsch earned 450 total points
ID: 39959437
Assuming your actual data (not the header starts in row 3), the attached FixCSV macro will do that for you:

Thomas

Option Explicit
Option Compare Text
 
Sub FixCSV()
Dim shtDest As Worksheet, shtOrg As Worksheet
Dim lLastRow As Long, lRowLoop As Long

'turn off updates to speed up code execution
With Application
    .ScreenUpdating = False
    .EnableEvents = False
    .Calculation = xlCalculationManual
    .DisplayAlerts = False
End With


lLastRow = Cells(Rows.Count, 1).End(xlUp).Row

Columns(1).Insert
Range("A3:A" & lLastRow).FormulaR1C1 = _
    "=IF(LEFT(RC2,4)=""Grp1"",--regexpfind(RC2,""\s\d+"",1),IF(LEFT(R[1]C2,4)=""Grp1"",LEFT(RC2,FIND("" "",RC2)-1),""""))"

Set shtOrg = ActiveSheet
Set shtDest = Sheets.Add

shtDest.Range("A1:A" & lLastRow - 2).Value = shtOrg.Range("A3:A" & lLastRow).Value
shtOrg.Columns(1).Delete

TurnRowsToColumns

With Application
    .ScreenUpdating = True
    .EnableEvents = True
    .Calculation = xlCalculationAutomatic
    .DisplayAlerts = True
End With

End Sub

Function RegExpFind(LookIn As String, PatternStr As String, Optional Pos, _
    Optional MatchCase As Boolean = True, Optional ReturnType As Long = 0, _
    Optional MultiLine As Boolean = False)
    
    ' Function written by Patrick G. Matthews.  You may use and distribute this code freely,
    ' as long as you properly credit and attribute authorship and the URL of where you
    ' found the code
    
    ' This function relies on the VBScript version of Regular Expressions, and thus some of
    ' the functionality available in Perl and/or .Net may not be available.  The full extent
    ' of what functionality will be available on any given computer is based on which version
    ' of the VBScript runtime is installed on that computer
    
    ' This function uses Regular Expressions to parse a string (LookIn), and return matches to a
    ' pattern (PatternStr).  Use Pos to indicate which match you want:
    ' Pos omitted               : function returns a zero-based array of all matches
    ' Pos = 1                   : the first match
    ' Pos = 2                   : the second match
    ' Pos = <positive integer>  : the Nth match
    ' Pos = 0                   : the last match
    ' Pos = -1                  : the last match
    ' Pos = -2                  : the 2nd to last match
    ' Pos = <negative integer>  : the Nth to last match
    ' If Pos is non-numeric, or if the absolute value of Pos is greater than the number of
    ' matches, the function returns an empty string.  If no match is found, the function returns
    ' an empty string.  (Earlier versions of this code used zero for the last match; this is
    ' retained for backward compatibility)
    
    ' If MatchCase is omitted or True (default for RegExp) then the Pattern must match case (and
    ' thus you may have to use [a-zA-Z] instead of just [a-z] or [A-Z]).
    
    ' ReturnType indicates what information you want to return:
    ' ReturnType = 0            : the matched values
    ' ReturnType = 1            : the starting character positions for the matched values
    ' ReturnType = 2            : the lengths of the matched values
    
    ' If MultiLine = False, the ^ and $ match the beginning and end of input, respectively.  If
    ' MultiLine = True, then ^ and $ match the beginning and end of each line (as demarcated by
    ' new line characters) in the input string
    
    ' If you use this function in Excel, you can use range references for any of the arguments.
    ' If you use this in Excel and return the full array, make sure to set up the formula as an
    ' array formula.  If you need the array formula to go down a column, use TRANSPOSE()
    
    ' Note: RegExp counts the character positions for the Match.FirstIndex property as starting
    ' at zero.  Since VB6 and VBA has strings starting at position 1, I have added one to make
    ' the character positions conform to VBA/VB6 expectations
    
    ' Normally as an object variable I would set the RegX variable to Nothing; however, in cases
    ' where a large number of calls to this function are made, making RegX a static variable that
    ' preserves its state in between calls significantly improves performance
    
    Static RegX As Object
    Dim TheMatches As Object
    Dim Answer()
    Dim Counter As Long
    
    ' Evaluate Pos.  If it is there, it must be numeric and converted to Long
    
    If Not IsMissing(Pos) Then
        If Not IsNumeric(Pos) Then
            RegExpFind = ""
            Exit Function
        Else
            Pos = CLng(Pos)
        End If
    End If
    
    ' Evaluate ReturnType
    
    If ReturnType < 0 Or ReturnType > 2 Then
        RegExpFind = ""
        Exit Function
    End If
    
    ' Create instance of RegExp object if needed, and set properties
    
    If RegX Is Nothing Then Set RegX = CreateObject("VBScript.RegExp")
    With RegX
        .Pattern = PatternStr
        .Global = True
        .IgnoreCase = Not MatchCase
        .MultiLine = MultiLine
    End With
        
    ' Test to see if there are any matches
    
    If RegX.test(LookIn) Then
        
        ' Run RegExp to get the matches, which are returned as a zero-based collection
        
        Set TheMatches = RegX.Execute(LookIn)
        
        ' Test to see if Pos is negative, which indicates the user wants the Nth to last
        ' match.  If it is, then based on the number of matches convert Pos to a positive
        ' number, or zero for the last match
        
        If Not IsMissing(Pos) Then
            If Pos < 0 Then
                If Pos = -1 Then
                    Pos = 0
                Else
                    
                    ' If Abs(Pos) > number of matches, then the Nth to last match does not
                    ' exist.  Return a zero-length string
                    
                    If Abs(Pos) <= TheMatches.Count Then
                        Pos = TheMatches.Count + Pos + 1
                    Else
                        RegExpFind = ""
                        GoTo Cleanup
                    End If
                End If
            End If
        End If
        
        ' If Pos is missing, user wants array of all matches.  Build it and assign it as the
        ' function's return value
        
        If IsMissing(Pos) Then
            ReDim Answer(0 To TheMatches.Count - 1)
            For Counter = 0 To UBound(Answer)
                Select Case ReturnType
                    Case 0: Answer(Counter) = TheMatches(Counter)
                    Case 1: Answer(Counter) = TheMatches(Counter).FirstIndex + 1
                    Case 2: Answer(Counter) = TheMatches(Counter).Length
                End Select
            Next
            RegExpFind = Answer
        
        ' User wanted the Nth match (or last match, if Pos = 0).  Get the Nth value, if possible
        
        Else
            Select Case Pos
                Case 0                          ' Last match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(TheMatches.Count - 1)
                        Case 1: RegExpFind = TheMatches(TheMatches.Count - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(TheMatches.Count - 1).Length
                    End Select
                Case 1 To TheMatches.Count      ' Nth match
                    Select Case ReturnType
                        Case 0: RegExpFind = TheMatches(Pos - 1)
                        Case 1: RegExpFind = TheMatches(Pos - 1).FirstIndex + 1
                        Case 2: RegExpFind = TheMatches(Pos - 1).Length
                    End Select
                Case Else                       ' Invalid item number
                    RegExpFind = ""
            End Select
        End If
    
    ' If there are no matches, return empty string
    
    Else
        RegExpFind = ""
    End If
    
Cleanup:
    ' Release object variables
    
    Set TheMatches = Nothing
    
End Function


Sub TurnRowsToColumns()
Dim shtOrg As Worksheet, shtDest As Worksheet
Dim lLastRow As Long, lRowLoop As Long

Const lRowsCount As Long = 2

'turn off updates to speed up code execution
With Application
    .ScreenUpdating = False
    .EnableEvents = False
    .Calculation = xlCalculationManual
End With


Set shtOrg = ActiveSheet
Set shtDest = Sheets.Add
 lLastRow = shtOrg.Cells(Rows.Count, 1).End(xlUp).Row

lRowLoop = 0

Do While lRowLoop * lRowsCount < lLastRow
    shtOrg.Cells(lRowLoop * lRowsCount + 1, 1).Resize(lRowsCount).Copy
    shtDest.Cells(lRowLoop + 1, 1).PasteSpecial Paste:=xlPasteAll, Operation:=xlNone, SkipBlanks:= _
        False, Transpose:=True

    lRowLoop = lRowLoop + 1
Loop

With Application
    .ScreenUpdating = True
    .EnableEvents = True
    .Calculation = xlCalculationAutomatic
End With

End Sub

Open in new window

0
 
LVL 31

Assisted Solution

by:Rob Henson
Rob Henson earned 50 total points
ID: 39959460
Assuming your data is in the format above eg:

Header      Header   Header
Data Label
Grp1          DataValue
Data Label
Grp1          DataValue
Data Label
Grp1          DataValue

ie your Data Value is always one row below and one column to the right of the Data label

You can still use the text to columns. Once Imported highlight the first row of headers all but the first column and then delete cells and choose the Move Up option. This will then move the Data Value up one row alongside the Data Label.

You can then use Auto Filter to select the rows you don't want and delete them.

Thanks
Rob H
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 39

Expert Comment

by:nutsch
ID: 39959468
Quite a few ways to skin this cat indeed.
0
 
LVL 31

Expert Comment

by:Rob Henson
ID: 39959484
Alternative formula driven cat skinning.

If Cat(egory) values are unique:

In separate column after TTC import, assuming labels in A and values in B:

=INDEX(B:B,MATCH(A2,A:A,0)+1)

Thanks
Rob H
0
 

Author Comment

by:Jagwarman
ID: 39961097
nutsch

Would you be able to tell me how to change the macro so that where an amount is say   5860.0000 it picks up 5860.000

thanks
0
 
LVL 39

Expert Comment

by:nutsch
ID: 39961620
If you do that, you'll have to either change the formatting to show those zeroes or transfer the data as string. To pick those zeroes, change \s\d+ in the regexpfind line to \s\d+\.?\d+

Sorry, I wasn't able to copy the whole line from my phone, but it's in the formular1c1 line.

Thomas
0

Featured Post

6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

Join & Write a Comment

Improved? Move/Copy Add-in Replacement - How to avoid the annoying, “A formula or sheet you want to move or copy contains the name XXX, which already exists on the destination worksheet.” David Miller (dlmille)  It was one of those days… I wa…
This article descibes how to create a connection between Excel and SAP and how to move data from Excel to SAP or the other way around.
The viewer will learn how to create a normally distributed random variable in Excel, use a normal distribution to simulate the return on an investment over a period of years, Create a Monte Carlo simulation using a normal random variable, and calcul…
This Micro Tutorial demonstrates in Microsoft Excel how to consolidate your marketing data by creating an interactive charts using form controls. This creates cool drop-downs for viewers of your chart to choose from.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

26 Experts available now in Live!

Get 1:1 Help Now