• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 212
  • Last Modified:

Code to remove, duplicate and add information to Excel spreadsheet

All

I have a large data set (???) that needs tidying before being 'uploaded' to stats program.

As this data set will be updated on a bi annual basis, I am looking for a solution to a main data tidying exercise.

In one field of data, names of authors, there can be a single author or up to 25 authors in a single cell.

What I need to do is to insert duplicate rows for each of these authors.  That is if there are 10 authors, I need to insert 9 rows BUT each row will only contain one name.

My solution for this round was to insert rows and then manually remove names so that each author had one row.

(All other data is repeated for each author)

In addition, where there is an * after the name I need to add Int to column D and where there is a # I need to add Ext to column D

See attached example

Any ideas or thoughts would be much appreciated as I don't necessarily want to be copying and pasting and deleting for two weeks again to get data in usable format.

TIA

Shaz



ERA-EE.xlsx
0
shaz0503
Asked:
shaz0503
  • 11
  • 7
  • 2
1 Solution
 
balatheexpertCommented:
Cool, all of the above can be done!!! but i would need some time.. when would u need it by?

and only thing i would do is, will add the cleansed data in a new sheet rather than adding new rows inbetween and clean it. Coz, it might cause some disturbance to existing data.

-Bala
0
 
shaz0503Author Commented:
Thanks

Any help you are able to offer is appreciated.... Not really urgent, would just like to get it before the next round of data is out (prob in July) so I can see that it works.

Adding a new sheet is not a problem

rgds
0
 
SiddharthRoutCommented:
Is this what you want?

Please run the macro "Sample" in the attachment. The output will be generated in a new sheet called Output.

HTH

Sid

Code Used

Sub Sample()
    Dim i As Long, j As Long, k As Long, LastRow As Long, r As Long, pos As Long
    Dim ws1 As Worksheet, ws2 As Worksheet
    Dim HashArray() As String, StarArray() As String
    Dim strTemp As String
    
    On Error Resume Next
    Application.DisplayAlerts = False
    Sheets("Output").Delete
    Application.DisplayAlerts = True
    On Error GoTo 0
    
    Set ws1 = Sheets("Raw Data")
    Set ws2 = Sheets.Add
    ws2.Name = "Output"
    
    LastRow = ws1.Range("A" & Rows.Count).End(xlUp).Row
    
    r = 1
    
    For i = 1 To LastRow
        If InStr(1, ws1.Range("B" & i).Value, "#") Or InStr(1, ws1.Range("B" & i).Value, "*") Then
            strTemp = ws1.Range("A" & i).Value
            ws2.Range("A" & r).Value = ws1.Range("A" & i).Value
            HashArray = Split(ws1.Range("B" & i).Value, "#")
            For j = 0 To UBound(HashArray) - 1
                If InStr(1, HashArray(j), "*") Then
                    StarArray = Split(HashArray(j), "*")
                    For k = 0 To UBound(StarArray)
                        If Left(StarArray(k), 1) = "," Then
                            ws2.Range("B" & r).Value = Mid(StarArray(k), 3)
                        Else
                            ws2.Range("B" & r).Value = StarArray(k)
                        End If
                        pos = InStr(1, ws1.Range("B" & i).Value, StarArray(k)) + Len(StarArray(k))
                        If Mid(ws1.Range("B" & i).Value, pos, 1) = "#" Then
                            ws2.Range("C" & r).Value = "EXT"
                        ElseIf Mid(ws1.Range("B" & i).Value, pos, 1) = "*" Then
                            ws2.Range("C" & r).Value = "INT"
                        End If
                        ws2.Range("A" & r).Value = strTemp
                        r = ws2.Range("B" & Rows.Count).End(xlUp).Row + 1
                    Next
                Else
                    If Left(HashArray(j), 1) = "," Then
                        ws2.Range("B" & r).Value = Mid(HashArray(j), 3)
                    Else
                        ws2.Range("B" & r).Value = HashArray(j)
                    End If
                    pos = InStr(1, ws1.Range("B" & i).Value, HashArray(k)) + Len(HashArray(k))
                    If Mid(ws1.Range("B" & i).Value, pos, 1) = "#" Then
                        ws2.Range("C" & r).Value = "EXT"
                    ElseIf Mid(ws1.Range("B" & i).Value, pos, 1) = "*" Then
                        ws2.Range("C" & r).Value = "INT"
                    End If
                    ws2.Range("A" & r).Value = strTemp
                    r = ws2.Range("B" & Rows.Count).End(xlUp).Row + 1
                End If
            Next
        End If
        r = ws2.Range("B" & Rows.Count).End(xlUp).Row + 1
    Next i
End Sub

Open in new window

ERA-EE.xlsm
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
shaz0503Author Commented:
Thanks so much --

This worked in excellent in part...see attached file

There was an error saying 'subscript out of range'

You will see that only 64 rows of data are in the Output sheet.  There should be in excess of 3000.....

Also seems to be skipping some rows - see row three on teh raw data sheet is missing and so are rows 9, 10, 11.

Copy-of-ERA-2010---HASS-master-l.xls
0
 
SiddharthRoutCommented:
Looking at it now. Sorry I didn't have much data to test the macro with :)

Sid
0
 
SiddharthRoutCommented:
I found the error. I was assuming that the text will contain both # and * Where as in Line 12 it was just * :)

Please try this file. let me know if you see any discrepancies.

Sid
Raw-Data.xls
0
 
SiddharthRoutCommented:
No. Some are still missing. Let me recheck it

Sid
0
 
SiddharthRoutCommented:
Ok amended a small piece of code. Please try this file.

Sid
Raw-Data.xls
0
 
balatheexpertCommented:
SiddarthRout, u  r amazing yaar!!! whichever questions i open, u have already done it!! awesome...

u r the Expert!!!!
0
 
SiddharthRoutCommented:
Thanks Bala Bhai :)

Sid
0
 
shaz0503Author Commented:
SiddarthRout

Thank you so much - you have saved me so much time....I will now try this with all the columns of data (30 in total) and get back if any problems...

The only one I can see from what you have done is that the last row does not record and INT or EXT .... Does part of the code need to be 'extended'

rgds
0
 
SiddharthRoutCommented:
>>>I will now try this with all the columns of data (30 in total) and get back if any problems...

The will have to be amended for 30 columns :)

>>>The only one I can see from what you have done is that the last row does not record and INT or EXT .... Does part of the code need to be 'extended'

Let me check that for you.

Sid
0
 
shaz0503Author Commented:
Thanks Sid

I didn't want to send the entire file - should have though

The file actually has data in coulmns A - AJ

If this is what is needed

I am off home for the day - will await your reply... talk tomorrow

rgds
0
 
SiddharthRoutCommented:
>>>The file actually has data in coulmns A - AJ

That is good enough. There is just a minor change which I will upload in the next few moments

Sid
0
 
SiddharthRoutCommented:
>>>The only one I can see from what you have done is that the last row does not record and INT or EXT .... Does part of the code need to be 'extended'

I just noticed that it created that extra row when it shouldn't it is just a duplicate of the above row :)

I will incorporate that in the code.

Sid
0
 
SiddharthRoutCommented:
>>>The file actually has data in coulmns A - AJ

Quick question.

What is in Col C or Rawdata? The reason I am asking is because e are going to put INT/EXT in Col C in the output file.

Sid
0
 
shaz0503Author Commented:
Sid

Apologies for not getting back to you...

Column A has ID numbers; Column B Title of publication; Column C has the authors (this is the one to be 'split/duplicated';

I need to add a column at Column D to insert the 'INT/EXT'

and then columns E - AK contain data

Sorry for the confusion on my part
0
 
SiddharthRoutCommented:
>>>Column A has ID numbers; Column B Title of publication; Column C has the authors
>>>And then columns E - AK contain data

I just created some sample data based on the above. Please see sample file attached.

Is this how you want it? Please run the macro "Sample". the data will be populated in a new sheet called "output".

HTH

Sid

Code Used

Sub Sample()
    Dim i As Long, j As Long, k As Long, LastRow As Long, r As Long, pos As Long
    Dim ws1 As Worksheet, ws2 As Worksheet
    Dim HashArray() As String, StarArray() As String
    Dim strTemp As String
    Dim rng As Range
    
    On Error Resume Next
    Application.DisplayAlerts = False
    Sheets("Output").Delete
    Application.DisplayAlerts = True
    On Error GoTo 0
    
    Set ws1 = Sheets("Raw Data")
    Set ws2 = Sheets.Add
    ws2.Name = "Output"
    
    LastRow = ws1.Range("A" & Rows.Count).End(xlUp).Row
    
    r = 1
    For i = 1 To LastRow
        Set rng = ws1.Range("E" & i & ":AK" & i)
        If InStr(1, ws1.Range("C" & i).Value, "#") Then
            strTemp = ws1.Range("B" & i).Value
            ws2.Range("A" & r).Value = ws1.Range("A" & i).Value
            ws2.Range("B" & r).Value = ws1.Range("B" & i).Value
            HashArray = Split(ws1.Range("C" & i).Value, "#")
            For j = 0 To UBound(HashArray)
                If InStr(1, HashArray(j), "*") Then
                    StarArray = Split(HashArray(j), "*")
                    For k = 0 To UBound(StarArray)
                        If Left(StarArray(k), 1) = "," Then
                            ws2.Range("C" & r).Value = Mid(StarArray(k), 3)
                        Else
                            ws2.Range("C" & r).Value = StarArray(k)
                        End If
                        pos = InStr(1, ws1.Range("C" & i).Value, StarArray(k)) + Len(StarArray(k))
                        If Mid(ws1.Range("C" & i).Value, pos, 1) = "#" Then
                            ws2.Range("D" & r).Value = "EXT"
                        ElseIf Mid(ws1.Range("C" & i).Value, pos, 1) = "*" Then
                            ws2.Range("D" & r).Value = "INT"
                        End If
                        ws2.Range("B" & r).Value = strTemp
                        rng.Copy ws2.Range("E" & r & ":AK" & r)
                        r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
                    Next
                Else
                    If Left(HashArray(j), 1) = "," Then
                        ws2.Range("C" & r).Value = Mid(HashArray(j), 3)
                    Else
                        ws2.Range("C" & r).Value = HashArray(j)
                    End If
                    pos = InStr(1, ws1.Range("C" & i).Value, HashArray(j)) + Len(HashArray(j))

                    If Mid(ws1.Range("C" & i).Value, pos, 1) = "#" Then
                        ws2.Range("D" & r).Value = "EXT"
                    ElseIf Mid(ws1.Range("C" & i).Value, pos, 1) = "*" Then
                        ws2.Range("D" & r).Value = "INT"
                    End If
                    ws2.Range("B" & r).Value = strTemp
                    rng.Copy ws2.Range("E" & r & ":AK" & r)
                    r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
                End If
            Next
        ElseIf InStr(1, ws1.Range("C" & i).Value, "*") Then
            strTemp = ws1.Range("B" & i).Value
            ws2.Range("A" & r).Value = ws1.Range("A" & i).Value
            ws2.Range("B" & r).Value = ws1.Range("B" & i).Value
            StarArray = Split(ws1.Range("C" & i).Value, "*")
            For k = 0 To UBound(StarArray)
                If Len(Trim(StarArray(k))) = 0 Then Exit For
                If Left(StarArray(k), 1) = "," Then
                    ws2.Range("C" & r).Value = Mid(StarArray(k), 3)
                Else
                    ws2.Range("C" & r).Value = StarArray(k)
                End If
                pos = InStr(1, ws1.Range("C" & i).Value, StarArray(k)) + Len(StarArray(k))
                If Mid(ws1.Range("C" & i).Value, pos, 1) = "#" Then
                    ws2.Range("D" & r).Value = "EXT"
                ElseIf Mid(ws1.Range("C" & i).Value, pos, 1) = "*" Then
                    ws2.Range("D" & r).Value = "INT"
                End If
                ws2.Range("B" & r).Value = strTemp
                rng.Copy ws2.Range("E" & r & ":AK" & r)

                r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
            Next
        End If
        r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
    Next i
End Sub

Open in new window

Raw-Data.xls
0
 
shaz0503Author Commented:
Sid

Sorry for not getting back sooner...

I ran this and looks ok although got stuck in a loop and had to 'stop'...

Will try again today at some stage and get back to you -
0
 
shaz0503Author Commented:
Sid...

Thank you so much - this has saved me a lot of time (and sores eyes)...I am now going to try and understand the code to apply to other workbooks....

rgds
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

  • 11
  • 7
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now