Cleanse exported data and summarize and group by specific field

I have taskcard program that outputs work accomplished at an auto repair facility and tracks by a specific workcard.  Unfortunately, I need data rolled up to a higher level.  Output is sorted by column B - Car Nbr and then by Column G - Workcard.  I would like to generate a new tab in each spreadsheet that rolls up similar workcards into 1 unique number for each car and then repeated for the next car and any labor hours in columns D and J would be summed together

Pattern Examples:
  • 2308
  • 2308
- These two unique lines would only show up as 1 entry in the output tab with any labor hours for both lines summed together

  • 0018_ZIP01
  • 0018_ZIP01
-These items would show up as one entry "0018" with the the "_xxx" removed and the hours summed together

  • 1855/001
  • 1855/001
-These items would show up as one entry "1855" with the the "/001" removed and the hours summed together

  • 21-97-01
  • 21-97-01
  • 21-97-01/001
-These would show up as one entry "21-97-01" with the "/001" removed and hours summed together

  • C1234AB_0001
  • C1234AB_0002
  • C1234AB_0003
  • C1234AB_0004
-These would show up as one entry "C1234AB" with the "_XXXX" removed and hours summed together

See attached sheet where I show the input and the desired output.

Thank youData-Prep.xlsx
robbdfwAsked:
Who is Participating?

[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More

x
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

aikimarkCommented:
I started to get the base data of the work code with this function, creating a new column with the formula:
=GetBaseData(G3)

Open in new window

Public Function GetBaseData(ByVal parmText) As String
    Static oRE As Object
    Dim oMatches As Object
    If oRE Is Nothing Then
        Set oRE = CreateObject("vbscript.regexp")
        oRE.Global = False
        oRE.Pattern = "^([^/_]+)"
    End If
    Set oMatches = oRE.Execute(parmText)
    GetBaseData = oMatches(0).submatches(0)
End Function

Open in new window

While that worked well, the mapping of the pivot table results against what you needed ran into some snags when you have multiple different values for the same base work code.
For instance, base work code 1855 has two different values for Data Plate and Description.  In your example, there doesn't appear to be a standard rule for what happens in such circumstances.
0
robbdfwAuthor Commented:
Data Plate is a random value that has no bearing on the output (could be ignored).  Do I place the formula in each cell and copy down:
=GetBaseData(G3)   =GetBaseData(G4) = =GetBaseData(G5)
0
aikimarkCommented:
This comes close.  There is some difference due to missing group E.  How does it look to you?
Data-Prep.xlsm
0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
Determine the Perfect Price for Your IT Services

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden with our free interactive tool and use it to determine the right price for your IT services. Download your free eBook now!

robbdfwAuthor Commented:
The outputfound in Sheet 1 looks great.  I think that will work.  How do I run the macro or start the code?  I appreciate your help.
0
aikimarkCommented:
1. Do you need to preserve the original data?
2. Is this a one-time process?
0
robbdfwAuthor Commented:
I would like to preserve the original data for reference and it would be a one time process for every new spreadsheet.  Receive a new one every week.
0
aikimarkCommented:
yes.  the =getbasedata() formula is in every row in a new column.  The parameter should be the work code column on that row.
0
aikimarkCommented:
This comes close to automating the necessary steps.
Public Sub Q_29093577()
'---------------------------------------------------------------------------------------
' Method : Q_29093577
' Author : Aikimark
' Date   : 4/10/2018
' Purpose: Cleanse exported data and summarize and group by specific field
'---------------------------------------------------------------------------------------
    Dim wksSrc As Worksheet
    Dim wksTgt As Worksheet
    Dim rngSrc As Range
    Dim rngTgt As Range
    
    Application.ScreenUpdating = False
    
    Set wksSrc = Worksheets("DataInput")
    Set wksTgt = Sheets.Add(After:=Sheets(Sheets.Count))
    Set rngTgt = wksTgt.Range("A1")
    Set rngSrc = wksSrc.Range("A1").CurrentRegion
    rngSrc.Copy rngTgt
    
    wksTgt.Rows(1).Delete
    
    wksTgt.Columns("H:H").Insert
    'Selection.Insert Shift:=xlToRight, CopyOrigin:=xlFormatFromLeftOrAbove
    wksTgt.Range("H1").FormulaR1C1 = "WkCardBase"
    Set rngTgt = wksTgt.Range("H2")
    rngTgt.FormulaR1C1 = "=GetBaseData(RC[-1])"
    rngTgt.AutoFill Destination:=wksTgt.Range(wksTgt.Range("H2"), wksTgt.Range("I2").End(xlDown).Offset(0, -1))
    
    rngTgt.Subtotal GroupBy:=9, Function:=xlSum, TotalList:=Array(4, 11), _
        Replace:=True, PageBreaks:=False, SummaryBelowData:=True
    rngTgt.ClearOutline
    wksTgt.Range("I:I").Find(What:="Grand Total").Activate
    Selection.EntireRow.Delete
    
    Set rngSrc = wksTgt.Range("A1").End(xlDown).Offset(1)
    rngSrc.FormulaR1C1 = "=R[-1]C"
    Set rngTgt = rngTgt.CurrentRegion
    Set rngTgt = rngTgt.SpecialCells(xlCellTypeBlanks)
    rngSrc.Copy rngTgt

    Set rngTgt = rngTgt.CurrentRegion
    rngTgt.Value = rngTgt.Value
    
    wksTgt.Range("$A$1").CurrentRegion.AutoFilter Field:=9, Criteria1:="<>*Total*" _
        , Operator:=xlAnd
    
    wksTgt.Range(rngTgt.Rows(2), rngTgt.SpecialCells(xlCellTypeLastCell)).SpecialCells(xlCellTypeVisible).EntireRow.Delete
    
    wksTgt.Range("$A$1").CurrentRegion.AutoFilter
    
    Set rngTgt = wksTgt.Range(wksTgt.Range("I1"), wksTgt.Range("I1").End(xlDown))
    rngTgt.Replace What:=" Total", Replacement:="", LookAt:=xlPart, _
        SearchOrder:=xlByRows, MatchCase:=False, SearchFormat:=False, _
        ReplaceFormat:=False
    
    Set rngTgt = rngTgt.CurrentRegion
    rngTgt.ClearFormats
    
    Application.ScreenUpdating = True

End Sub

Open in new window

0
robbdfwAuthor Commented:
I ran it on the sample data input and it all worked except I noted the following:

25-93-05
25-93-05

showed up twice in the output
0
aikimarkCommented:
That's what I mentioned in the earlier comment about multiple values for a given WorkCodeBase.
0
aikimarkCommented:
If you need to sum SOLELY by the (base) work code, then you'll need to sort by the original workcode column before invoking the VBA code.  Alternatively, you can tweak the VBA code to do the sort.
0
robbdfwAuthor Commented:
Thank you for all your help on this.
0
robbdfwAuthor Commented:
Ability to provide the error cleansing and extract with such a limited set of lines of code.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft Office

From novice to tech pro — start learning today.