and only thing i would do is, will add the cleansed data in a new sheet rather than adding new rows inbetween and clean it. Coz, it might cause some disturbance to existing data.

-Bala

Solved

Posted on 2011-04-26

All

I have a large data set (???) that needs tidying before being 'uploaded' to stats program.

As this data set will be updated on a bi annual basis, I am looking for a solution to a main data tidying exercise.

In one field of data, names of authors, there can be a single author or up to 25 authors in a single cell.

What I need to do is to insert duplicate rows for each of these authors. That is if there are 10 authors, I need to insert 9 rows BUT each row will only contain one name.

My solution for this round was to insert rows and then manually remove names so that each author had one row.

(All other data is repeated for each author)

In addition, where there is an * after the name I need to add Int to column D and where there is a # I need to add Ext to column D

See attached example

Any ideas or thoughts would be much appreciated as I don't necessarily want to be copying and pasting and deleting for two weeks again to get data in usable format.

TIA

Shaz

ERA-EE.xlsx

I have a large data set (???) that needs tidying before being 'uploaded' to stats program.

As this data set will be updated on a bi annual basis, I am looking for a solution to a main data tidying exercise.

In one field of data, names of authors, there can be a single author or up to 25 authors in a single cell.

What I need to do is to insert duplicate rows for each of these authors. That is if there are 10 authors, I need to insert 9 rows BUT each row will only contain one name.

My solution for this round was to insert rows and then manually remove names so that each author had one row.

(All other data is repeated for each author)

In addition, where there is an * after the name I need to add Int to column D and where there is a # I need to add Ext to column D

See attached example

Any ideas or thoughts would be much appreciated as I don't necessarily want to be copying and pasting and deleting for two weeks again to get data in usable format.

TIA

Shaz

ERA-EE.xlsx

20 Comments

and only thing i would do is, will add the cleansed data in a new sheet rather than adding new rows inbetween and clean it. Coz, it might cause some disturbance to existing data.

-Bala

Any help you are able to offer is appreciated.... Not really urgent, would just like to get it before the next round of data is out (prob in July) so I can see that it works.

Adding a new sheet is not a problem

rgds

Please run the macro "Sample" in the attachment. The output will be generated in a new sheet called Output.

HTH

Sid

```
Sub Sample()
Dim i As Long, j As Long, k As Long, LastRow As Long, r As Long, pos As Long
Dim ws1 As Worksheet, ws2 As Worksheet
Dim HashArray() As String, StarArray() As String
Dim strTemp As String
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Output").Delete
Application.DisplayAlerts = True
On Error GoTo 0
Set ws1 = Sheets("Raw Data")
Set ws2 = Sheets.Add
ws2.Name = "Output"
LastRow = ws1.Range("A" & Rows.Count).End(xlUp).Row
r = 1
For i = 1 To LastRow
If InStr(1, ws1.Range("B" & i).Value, "#") Or InStr(1, ws1.Range("B" & i).Value, "*") Then
strTemp = ws1.Range("A" & i).Value
ws2.Range("A" & r).Value = ws1.Range("A" & i).Value
HashArray = Split(ws1.Range("B" & i).Value, "#")
For j = 0 To UBound(HashArray) - 1
If InStr(1, HashArray(j), "*") Then
StarArray = Split(HashArray(j), "*")
For k = 0 To UBound(StarArray)
If Left(StarArray(k), 1) = "," Then
ws2.Range("B" & r).Value = Mid(StarArray(k), 3)
Else
ws2.Range("B" & r).Value = StarArray(k)
End If
pos = InStr(1, ws1.Range("B" & i).Value, StarArray(k)) + Len(StarArray(k))
If Mid(ws1.Range("B" & i).Value, pos, 1) = "#" Then
ws2.Range("C" & r).Value = "EXT"
ElseIf Mid(ws1.Range("B" & i).Value, pos, 1) = "*" Then
ws2.Range("C" & r).Value = "INT"
End If
ws2.Range("A" & r).Value = strTemp
r = ws2.Range("B" & Rows.Count).End(xlUp).Row + 1
Next
Else
If Left(HashArray(j), 1) = "," Then
ws2.Range("B" & r).Value = Mid(HashArray(j), 3)
Else
ws2.Range("B" & r).Value = HashArray(j)
End If
pos = InStr(1, ws1.Range("B" & i).Value, HashArray(k)) + Len(HashArray(k))
If Mid(ws1.Range("B" & i).Value, pos, 1) = "#" Then
ws2.Range("C" & r).Value = "EXT"
ElseIf Mid(ws1.Range("B" & i).Value, pos, 1) = "*" Then
ws2.Range("C" & r).Value = "INT"
End If
ws2.Range("A" & r).Value = strTemp
r = ws2.Range("B" & Rows.Count).End(xlUp).Row + 1
End If
Next
End If
r = ws2.Range("B" & Rows.Count).End(xlUp).Row + 1
Next i
End Sub
```

ERA-EE.xlsm
This worked in excellent in part...see attached file

There was an error saying 'subscript out of range'

You will see that only 64 rows of data are in the Output sheet. There should be in excess of 3000.....

Also seems to be skipping some rows - see row three on teh raw data sheet is missing and so are rows 9, 10, 11.

Copy-of-ERA-2010---HASS-master-l.xls

Please try this file. let me know if you see any discrepancies.

Sid

Raw-Data.xls

u r the Expert!!!!

Thank you so much - you have saved me so much time....I will now try this with all the columns of data (30 in total) and get back if any problems...

The only one I can see from what you have done is that the last row does not record and INT or EXT .... Does part of the code need to be 'extended'

rgds

The will have to be amended for 30 columns :)

>>>The only one I can see from what you have done is that the last row does not record and INT or EXT .... Does part of the code need to be 'extended'

Let me check that for you.

Sid

I didn't want to send the entire file - should have though

The file actually has data in coulmns A - AJ

If this is what is needed

I am off home for the day - will await your reply... talk tomorrow

rgds

That is good enough. There is just a minor change which I will upload in the next few moments

Sid

I just noticed that it created that extra row when it shouldn't it is just a duplicate of the above row :)

I will incorporate that in the code.

Sid

Quick question.

What is in Col C or Rawdata? The reason I am asking is because e are going to put INT/EXT in Col C in the output file.

Sid

Apologies for not getting back to you...

Column A has ID numbers; Column B Title of publication; Column C has the authors (this is the one to be 'split/duplicated';

I need to add a column at Column D to insert the 'INT/EXT'

and then columns E - AK contain data

Sorry for the confusion on my part

>>>And then columns E - AK contain data

I just created some sample data based on the above. Please see sample file attached.

Is this how you want it? Please run the macro "Sample". the data will be populated in a new sheet called "output".

HTH

Sid

```
Sub Sample()
Dim i As Long, j As Long, k As Long, LastRow As Long, r As Long, pos As Long
Dim ws1 As Worksheet, ws2 As Worksheet
Dim HashArray() As String, StarArray() As String
Dim strTemp As String
Dim rng As Range
On Error Resume Next
Application.DisplayAlerts = False
Sheets("Output").Delete
Application.DisplayAlerts = True
On Error GoTo 0
Set ws1 = Sheets("Raw Data")
Set ws2 = Sheets.Add
ws2.Name = "Output"
LastRow = ws1.Range("A" & Rows.Count).End(xlUp).Row
r = 1
For i = 1 To LastRow
Set rng = ws1.Range("E" & i & ":AK" & i)
If InStr(1, ws1.Range("C" & i).Value, "#") Then
strTemp = ws1.Range("B" & i).Value
ws2.Range("A" & r).Value = ws1.Range("A" & i).Value
ws2.Range("B" & r).Value = ws1.Range("B" & i).Value
HashArray = Split(ws1.Range("C" & i).Value, "#")
For j = 0 To UBound(HashArray)
If InStr(1, HashArray(j), "*") Then
StarArray = Split(HashArray(j), "*")
For k = 0 To UBound(StarArray)
If Left(StarArray(k), 1) = "," Then
ws2.Range("C" & r).Value = Mid(StarArray(k), 3)
Else
ws2.Range("C" & r).Value = StarArray(k)
End If
pos = InStr(1, ws1.Range("C" & i).Value, StarArray(k)) + Len(StarArray(k))
If Mid(ws1.Range("C" & i).Value, pos, 1) = "#" Then
ws2.Range("D" & r).Value = "EXT"
ElseIf Mid(ws1.Range("C" & i).Value, pos, 1) = "*" Then
ws2.Range("D" & r).Value = "INT"
End If
ws2.Range("B" & r).Value = strTemp
rng.Copy ws2.Range("E" & r & ":AK" & r)
r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
Next
Else
If Left(HashArray(j), 1) = "," Then
ws2.Range("C" & r).Value = Mid(HashArray(j), 3)
Else
ws2.Range("C" & r).Value = HashArray(j)
End If
pos = InStr(1, ws1.Range("C" & i).Value, HashArray(j)) + Len(HashArray(j))
If Mid(ws1.Range("C" & i).Value, pos, 1) = "#" Then
ws2.Range("D" & r).Value = "EXT"
ElseIf Mid(ws1.Range("C" & i).Value, pos, 1) = "*" Then
ws2.Range("D" & r).Value = "INT"
End If
ws2.Range("B" & r).Value = strTemp
rng.Copy ws2.Range("E" & r & ":AK" & r)
r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
End If
Next
ElseIf InStr(1, ws1.Range("C" & i).Value, "*") Then
strTemp = ws1.Range("B" & i).Value
ws2.Range("A" & r).Value = ws1.Range("A" & i).Value
ws2.Range("B" & r).Value = ws1.Range("B" & i).Value
StarArray = Split(ws1.Range("C" & i).Value, "*")
For k = 0 To UBound(StarArray)
If Len(Trim(StarArray(k))) = 0 Then Exit For
If Left(StarArray(k), 1) = "," Then
ws2.Range("C" & r).Value = Mid(StarArray(k), 3)
Else
ws2.Range("C" & r).Value = StarArray(k)
End If
pos = InStr(1, ws1.Range("C" & i).Value, StarArray(k)) + Len(StarArray(k))
If Mid(ws1.Range("C" & i).Value, pos, 1) = "#" Then
ws2.Range("D" & r).Value = "EXT"
ElseIf Mid(ws1.Range("C" & i).Value, pos, 1) = "*" Then
ws2.Range("D" & r).Value = "INT"
End If
ws2.Range("B" & r).Value = strTemp
rng.Copy ws2.Range("E" & r & ":AK" & r)
r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
Next
End If
r = ws2.Range("C" & Rows.Count).End(xlUp).Row + 1
Next i
End Sub
```

Raw-Data.xls
By clicking you are agreeing to Experts Exchange's Terms of Use.

Title | # Comments | Views | Activity |
---|---|---|---|

How to calculate the QUARTILE ? | 5 | 25 | |

Average across mutiple sheets | 5 | 31 | |

Sum based on custom date ranges | 4 | 25 | |

Using Pop Up Calculator in Excel | 4 | 27 |

Join the community of 500,000 technology professionals and ask your questions.

Connect with top rated Experts

**8** Experts available now in Live!