Calyx Teren
asked on
Excel: Concatenate multiple cell values in rows with common ID, and reverse
We prototyped a new taxonomy for files by using pivot tables in Excel. There are several categories in the taxonomy, and some frequently have multiple values for the same file, while others remain the same. To make the pivot tables work, we have to have a separate row for each value. However, to upload the data into the real tool, the multiple values have to be in one cell for each file, separated by a semicolon with no spaces.
We anticipate a few revisions, and so it would be useful to have macros that can go in both directions: one to concatenate the rows and delete duplicates, and the other to split the rows up to make one value per row. There are about 5000 files in the data set.
I have found a number of scripts online that do almost what I need, but my VBA skills aren't up to finishing the job. Thanks in advance for your help. I've attached a mini sample file that contains the common variations. There's one worksheet showing the starting point (split rows), and a second one showing the desired end point (concatenated rows).
We anticipate a few revisions, and so it would be useful to have macros that can go in both directions: one to concatenate the rows and delete duplicates, and the other to split the rows up to make one value per row. There are about 5000 files in the data set.
I have found a number of scripts online that do almost what I need, but my VBA skills aren't up to finishing the job. Thanks in advance for your help. I've attached a mini sample file that contains the common variations. There's one worksheet showing the starting point (split rows), and a second one showing the desired end point (concatenated rows).
ASKER
Thanks for the instructions. That field is subtle indeed. Even after I knew it was there somewhere I had to look around for it.
Sample-File.xlsm
Sample-File.xlsm
Here is the code for concatenating and deleting rows. Install it in a regular module sheet.
As written, the macro works on the active worksheet, starting in row 2. The macro figures out how many rows of data exist. It concatenates the unique values in columns C and E, putting the concatenated results in the corresponding column of the first row of the Content ID in column A. Finally, the macro deletes the extra rows.
It's rather late for me, so I will revert to this thread for the deconcatenation macro tomorrow--unless someone else has already taken care of that for you.
Brad
Sample-FileQ28324185.xlsm
As written, the macro works on the active worksheet, starting in row 2. The macro figures out how many rows of data exist. It concatenates the unique values in columns C and E, putting the concatenated results in the corresponding column of the first row of the Content ID in column A. Finally, the macro deletes the extra rows.
Sub Concatenater()
Dim Col As Variant, v As Variant
Dim i As Long, iCol As Long, iKeep As Long, j As Long, n As Long
Dim rg As Range
Dim s As String, sConcat As String, sDelimiter As String
Application.ScreenUpdating = False
sDelimiter = ";" 'Delimiter for concatenated string
With ActiveSheet
Set rg = .Range("A2")
Set rg = Range(rg, .Cells(.Rows.Count, rg.Column).End(xlUp)) 'All the data in column A
End With
n = rg.Rows.Count
For Each Col In Array("C", "E") 'Columns to be concatenated
sConcat = sDelimiter
iCol = Range(Col & "1").Column
v = ""
For i = 1 To n
s = rg.Cells(i, iCol).Value
If s <> "" Then s = s & sDelimiter
If v = rg.Cells(i, 1).Value Then
If s <> "" Then
If InStr(1, sConcat, sDelimiter & s) = 0 Then sConcat = sConcat & s
End If
Else
If v <> "" Then rg.Cells(iKeep, iCol) = Mid(sConcat, Len(sDelimiter) + 1, Len(sConcat) - 2 * Len(sDelimiter))
iKeep = i
sConcat = sDelimiter
If s <> "" Then sConcat = sConcat & s
v = rg.Cells(i, 1).Value
End If
Next
If v <> "" Then rg.Cells(iKeep, iCol) = Mid(sConcat, Len(sDelimiter) + 1, Len(sConcat) - 2 * Len(sDelimiter))
Next
v = ""
For i = n To 1 Step -1
If rg.Cells(i, 1).Value = v Then
rg.Cells(i + 1).EntireRow.Delete
Else
v = rg.Cells(i, 1).Value
End If
Next
End Sub
It's rather late for me, so I will revert to this thread for the deconcatenation macro tomorrow--unless someone else has already taken care of that for you.
Brad
Sample-FileQ28324185.xlsm
ASKER
Thank you, Brad! I just ran it and got an "Invalid procedure call or argument" error pointing to line 25, starting with "rg." Haven't got time to try debugging that right at the moment but will see if I can tomorrow or soon. Off on a trip tomorrow so I might not be able to get back to this until tomorrow night or the next day, but I'm definitely going to be on it soon. Again, thank you for your quick help.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Here is a macro to deconcatenate your concatenated columns. Please note that the deconcatenated results may not be the same as the original data. This is because duplicates are omitted from the concatenation.
In addition, values in one column to be concatenated may not line up with values in another column. For example, suppose Category values for four rows of a certain Content ID are Poultry, Poultry, Meat, and Meat. Suppose also that Cuisine values for those four rows are Chinese, French, Chinese, and Chinese. The deconcatenated rows (two of them) will be: Poultry/Chinese and Meat/French. Besides having two rows that cannot be reconstructed, the second deconcatenated row has a combination Meat/French that never existed in the original data.
If the deconcatenated data must be exactly the same as the original, then it would be far easier to keep the original data (perhaps on a different worksheet) than to modify the data structure so it can be reconstructed.
In addition, values in one column to be concatenated may not line up with values in another column. For example, suppose Category values for four rows of a certain Content ID are Poultry, Poultry, Meat, and Meat. Suppose also that Cuisine values for those four rows are Chinese, French, Chinese, and Chinese. The deconcatenated rows (two of them) will be: Poultry/Chinese and Meat/French. Besides having two rows that cannot be reconstructed, the second deconcatenated row has a combination Meat/French that never existed in the original data.
If the deconcatenated data must be exactly the same as the original, then it would be far easier to keep the original data (perhaps on a different worksheet) than to modify the data structure so it can be reconstructed.
Sub Deconcatenater()
Dim Col As Variant, ConcatColumns As Variant, v As Variant
Dim i As Long, iCol As Long, k As Long, nRows As Long, nCols As Long
Dim rg As Range
Dim s As String, sConcat As String, sDelimiter As String
Application.ScreenUpdating = False
sDelimiter = ";" 'Delimiter for concatenated string
ConcatColumns = Array("C", "E") 'Concatenated columns
With ActiveSheet
Set rg = .Range("A2") 'First cell with Content ID. Header labels assumed to be in previous row.
nCols = .Cells(rg.Row - 1, .Columns.Count).End(xlToLeft).Column - rg.Column + 1
Set rg = Range(rg, .Cells(.Rows.Count, rg.Column).End(xlUp)) 'All the data in column A
nRows = rg.Rows.Count
Set rg = rg.Resize(nRows, nCols) 'All the data in table
End With
For i = nRows To 1 Step -1
k = 0 'Number of rows to be added
For Each Col In ConcatColumns
iCol = Range(Col & "1").Column
s = rg.Cells(i, iCol).Value
k = Application.Max(k, Len(s) - Len(Replace(s, sDelimiter, "")))
Next
If k > 0 Then
rg.Rows(i + 1).Resize(k).EntireRow.Insert
rg.Rows(i + 1).Resize(k).Value = rg.Rows(i).Value
For Each Col In ConcatColumns
iCol = Range(Col & "1").Column
s = rg.Cells(i, iCol).Value
If s <> "" Then rg.Cells(i, iCol).Resize(k + 1, 1).Value = Application.Transpose(Split(s, sDelimiter))
Next
End If
Next
End Sub
Sample-FileQ28324185.xlsm
ASKER
Thank you! Turned out that I didn't have any internet access during my trip, but I will try this tonight or tomorrow and test it. Can't wait to see the results.
ASKER
The Concatenater script works!!! You are brilliant. Thank you. You can imagine how much time this will save me.
Your suggestion to use the concatenater as a one-way trip, and just maintain the original deconcatenated table separately, is probably the best suggestion. I can work with that.
That said, when I tried the deconcatenater on the example you suggested, it didn't work quite the way I expected even with your caution.
Starting point (though from your description, two are complete duplicates):
Content ID Title Category Sodium Cuisine
123 Title Poultry 5 Chinese
123 Title Poultry 5 French
123 Title Meat 5 Chinese
123 Title Meat 5 Chinese
Concatenated:
Content ID Title Category Sodium Cuisine
123 Title Poultry;Meat 5 Chinese;French
Deconcatenated:
Content ID Title Category Sodium Cuisine
123 Title Poultry 5 Chinese
123 Title Meat 5 French
That gives the combination that doesn't exist in the original (Meat; French), but is missing a combination that does exist in the original--Meat;Chinese. It would be useful if the deconcatenater did the full permutation into Poultry;Chinese - Poultry;French - Meat;Chinese - Meat;French.
Like I said, I can work with the one-way script, using just the concatenater. You've already helped me out to the tune of probably dozens of hours, so no more is necessary. However, if you feel like going farther, let me know. If I don't hear back in a day or so, I'll come back and close this with full points. Again, thank you, and happy new year!
Your suggestion to use the concatenater as a one-way trip, and just maintain the original deconcatenated table separately, is probably the best suggestion. I can work with that.
That said, when I tried the deconcatenater on the example you suggested, it didn't work quite the way I expected even with your caution.
Starting point (though from your description, two are complete duplicates):
Content ID Title Category Sodium Cuisine
123 Title Poultry 5 Chinese
123 Title Poultry 5 French
123 Title Meat 5 Chinese
123 Title Meat 5 Chinese
Concatenated:
Content ID Title Category Sodium Cuisine
123 Title Poultry;Meat 5 Chinese;French
Deconcatenated:
Content ID Title Category Sodium Cuisine
123 Title Poultry 5 Chinese
123 Title Meat 5 French
That gives the combination that doesn't exist in the original (Meat; French), but is missing a combination that does exist in the original--Meat;Chinese. It would be useful if the deconcatenater did the full permutation into Poultry;Chinese - Poultry;French - Meat;Chinese - Meat;French.
Like I said, I can work with the one-way script, using just the concatenater. You've already helped me out to the tune of probably dozens of hours, so no more is necessary. However, if you feel like going farther, let me know. If I don't hear back in a day or so, I'll come back and close this with full points. Again, thank you, and happy new year!
The "lossy" method used for concatenation means that the original can not be reconstructed faithfully, and that artifacts such as you pointed out are likely.
If you would be willing to accept both duplicates and blanks in the concatenation, then the deconcatenation will work perfectly. For example, the Category would be concatenated as Poultry;Poultry;Meat;Meat. Even if this "loss-free" concatenation were stored in a cell comment or an auxiliary column, it could be used to reconstruct the original.
But since I suspect that your real problem is more complex than the cooking example posted in your question, I stand by my previous suggestion that the better alternative is to preserve the original worksheet before concatenation.
Brad
If you would be willing to accept both duplicates and blanks in the concatenation, then the deconcatenation will work perfectly. For example, the Category would be concatenated as Poultry;Poultry;Meat;Meat.
But since I suspect that your real problem is more complex than the cooking example posted in your question, I stand by my previous suggestion that the better alternative is to preserve the original worksheet before concatenation.
Brad
ASKER
Quick, helpful, willing to explain and even do extra work--I truly appreciate Brad's help.
I decided to store a "lossless" concatenation in a comment for each cell being concatenated. I could then use the text of that comment to perform a perfect reconstruction during deconcatenation. Text in the cells being concatenated is not affected.
Sample-FileQ28324185.xlsm
Sub Concatenater()
Dim Col As Variant, v As Variant
Dim i As Long, iCol As Long, iKeep As Long, j As Long, n As Long
Dim rg As Range
Dim s As String, sConcat As String, sDelimiter As String, sFull
Application.ScreenUpdating = False
sDelimiter = ";" 'Delimiter for concatenated string
With ActiveSheet
Set rg = .Range("A2")
Set rg = Range(rg, .Cells(.Rows.Count, rg.Column).End(xlUp)) 'All the data in column A
End With
n = rg.Rows.Count
For Each Col In Array("C", "E") 'Columns to be concatenated
sConcat = sDelimiter
iCol = Range(Col & "1").Column
v = ""
For i = 1 To n
If Not rg.Cells(i, iCol).Comment Is Nothing Then rg.Cells(i, iCol).Comment.Delete
s = rg.Cells(i, iCol).Value
If s <> "" Then s = s & sDelimiter
If v = rg.Cells(i, 1).Value Then
sFull = sFull & rg.Cells(i, iCol).Value & sDelimiter
If s <> "" Then
If InStr(1, sConcat, sDelimiter & s) = 0 Then sConcat = sConcat & s
End If
Else
If (v <> "") And (Len(sConcat) > 2 * Len(sDelimiter)) Then _
rg.Cells(iKeep, iCol) = Mid(sConcat, Len(sDelimiter) + 1, Len(sConcat) - 2 * Len(sDelimiter))
If Len(sFull) > 2 * Len(sDelimiter) Then
rg.Cells(iKeep, iCol).AddComment Mid(sFull, Len(sDelimiter) + 1, Len(sFull) - 2 * Len(sDelimiter))
End If
iKeep = i
sConcat = sDelimiter
sFull = sDelimiter & rg.Cells(i, iCol).Value & sDelimiter
If s <> "" Then sConcat = sConcat & s
v = rg.Cells(i, 1).Value
End If
Next
If Len(sFull) > 2 * Len(sDelimiter) Then
rg.Cells(iKeep, iCol).AddComment Mid(sFull, Len(sDelimiter) + 1, Len(sFull) - 2 * Len(sDelimiter))
End If
If (v <> "") And (Len(sConcat) > 2 * Len(sDelimiter)) Then _
rg.Cells(iKeep, iCol) = Mid(sConcat, Len(sDelimiter) + 1, Len(sConcat) - 2 * Len(sDelimiter))
Next
v = ""
For i = n To 1 Step -1
If rg.Cells(i, 1).Value = v Then
rg.Cells(i + 1).EntireRow.Delete
Else
v = rg.Cells(i, 1).Value
End If
Next
End Sub
Sub Deconcatenater()
Dim Col As Variant, ConcatColumns As Variant, v As Variant
Dim i As Long, iCol As Long, k As Long, nRows As Long, nCols As Long
Dim rg As Range
Dim s As String, sConcat As String, sDelimiter As String
Application.ScreenUpdating = False
sDelimiter = ";" 'Delimiter for concatenated string
ConcatColumns = Array("C", "E") 'Concatenated columns
With ActiveSheet
Set rg = .Range("A2") 'First cell with Content ID. Header labels assumed to be in previous row.
nCols = .Cells(rg.Row - 1, .Columns.Count).End(xlToLeft).Column - rg.Column + 1
Set rg = Range(rg, .Cells(.Rows.Count, rg.Column).End(xlUp)) 'All the data in column A
nRows = rg.Rows.Count
Set rg = rg.Resize(nRows, nCols) 'All the data in table
End With
For i = nRows To 1 Step -1
k = 0 'Number of rows to be added
For Each Col In ConcatColumns
iCol = Range(Col & "1").Column
If Not rg.Cells(i, iCol).Comment Is Nothing Then
s = rg.Cells(i, iCol).Comment.Text
k = Application.Max(k, Len(s) - Len(Replace(s, sDelimiter, "")))
End If
Next
If k > 0 Then
rg.Rows(i + 1).Resize(k).EntireRow.Insert
rg.Rows(i + 1).Resize(k).Value = rg.Rows(i).Value
For Each Col In ConcatColumns
iCol = Range(Col & "1").Column
If Not rg.Cells(i, iCol).Comment Is Nothing Then
s = rg.Cells(i, iCol).Comment.Text
If s <> "" Then rg.Cells(i, iCol).Resize(k + 1, 1).Value = Application.Transpose(Split(s, sDelimiter))
End If
Next
End If
For Each Col In ConcatColumns
iCol = Range(Col & "1").Column
If Not rg.Cells(i, iCol).Comment Is Nothing Then rg.Cells(i, iCol).Comment.Delete
Next
Next
End Sub
BradSample-FileQ28324185.xlsm
ASKER
Wow! Can't wait to test this!
ASKER
I've been using the non-lossy concatenate and deconcatenate scripts for a couple of days. They work perfectly. Thank you!
calyx_teren,
I'm glad you are finding the lossless concatenate and deconcatenate macros so helpful.
The question you asked is one that is likely to come up again, so I added this thread to my Experts Exchange Knowledgebase. For this reason, I'd appreciate your continuing to post in this thread if issues come up with the code. Doing so will make this question even more useful as a future reference for people.
Thanks for posting back with your results!
Brad
I'm glad you are finding the lossless concatenate and deconcatenate macros so helpful.
The question you asked is one that is likely to come up again, so I added this thread to my Experts Exchange Knowledgebase. For this reason, I'd appreciate your continuing to post in this thread if issues come up with the code. Doing so will make this question even more useful as a future reference for people.
Thanks for posting back with your results!
Brad
1. Click the Attach File link
2. Click the Browse... button to find the file in your computer or network
3. After you have selected the file, click the "Open" button in the file browser
4. Click the Upload File button in the Experts Exchange Comment (or question)
5. Add some descriptive text in the field with light gray font that says "Enter a brief description of your file (required)"
Omitting step 5 is a common mistake. I usually say "Sample file" because the field doesn't display very many words.