Link to home
Start Free TrialLog in
Avatar of CAE5942
CAE5942

asked on

Split cell contents into multiple cells

Hi everyone,

In column A of my spreadsheet, I have data that looks like the following:

<p>Some text is here</p>
<p> A name is here <br />
A title is here <br />
A company name is here <br />
<span class="dated">A date is here</span>
</p>
<a href="http://www.somesite.com/"><img alt="" src="/img/logo.jpg" style="float: right; height: 60px; margin-top: -75px; top: -70px;" /></a>

I need to separate out the information so that the data in each row of column A is separated out as follows:

Cell B = <p>Some text is here</p>
Cell C = <p> A name is here <br />
Cell D = A title is here <br />
Cell E = A company name is here <br />
Cell F = <span class="dated">A date is here</span>
Cell G = http://www.somesite.com/
Cell H = /img/logo.jpg

I don't need the html elements, so the above would look like this:

Cell B = Some text is here
Cell C = A name is here
Cell D = A title is here
Cell E = A company name is here
Cell F = A date is here
Cell G = http://www.somesite.com/
Cell H = /img/logo.jpg

The html elements, I guess, would just be used in the formula to target the different information.

I wondered if anyone could tell me how this could be done, ie. whether a formula could be used or whether it would need a VBA script?

Also note that in the example that I gave above, there is a line for company name and also a line for the image but in some instances, both those two lines are not there so it may look like this:

<p>Some text is here</p>
<p> A name is here <br />
A title is here <br />
<span class="dated">A date is here</span>
</p>

Or another variation of the above is that it would have the company name but not the image as follows:

<p>Some text is here</p>
<p> A name is here <br />
A title is here <br />
A company name is here <br />
<span class="dated">A date is here</span>
</p>

So I'm not sure if the code could cater to these types of variations in the data.

Thanks so  much in advance.
Avatar of Roy Cox
Roy Cox
Flag of United Kingdom of Great Britain and Northern Ireland image

An example workbook would help
Avatar of CAE5942
CAE5942

ASKER

Thanks for the reply,

I've attached some sample data ...
Avatar of CAE5942

ASKER

Sorry - it's attached now
sample-data.xlsx
try this...

I use a function to get the content in between.

Function getText(s As String, Optional delimiter As String = "</p>", Optional idx As Integer = 0, Optional replaceStartText As String = "<p>")
    On Error GoTo Err
    Dim tmp As String
    tmp = Split(s, delimiter)(idx)
    lastIdx = InStrRev(tmp, replaceStartText)
    If lastIdx > 0 Then
        getText = Trim(Replace(Mid(tmp, lastIdx + Len(replaceStartText)), vbLf, ""))
    Else
        getText = Trim(Replace(tmp, vbLf, ""))
    End If
    Exit Function
Err:
    getText = ""
End Function

Open in new window

sample-data_b.xlsm
Avatar of CAE5942

ASKER

Thanks for the code. I opened up your file and then went under the Tools menu and then under Macros but I couldn't see the macro there in order to run it. I can see it in the visual basic editor but I don't know how to run it from there. Can you let me know what I should do?
I have put the function into the cell's formula... like:

=getText(A1)

you can download the sample file I had uploaded in my previous comment.
Avatar of CAE5942

ASKER

Ok thanks. I've tested the formula but it doesn't seem to be working correctly.

In Row 1:

(1) column F should say "Company Name1A" but instead it's showing the date
(2) column G should be the date but instead it's showing the url
(3) column H should be the url but instead it's showing the image path
(4) column I should be the image path

In Row 2:

(1) column E is showing the span around the date when it should only show the date itself
(2) column F is showing the date by itself when it should be in column E
(3) column G is showing the following:

"<p>Testimonial text is here</p>
<p> Firstname2 Lastname2 <br />
    Position2<br />
<span class=""dated"">22/9/2014</span>
</p>"

(4) column H is showing the following:

"<p>Testimonial text is here</p>
<p> Firstname2 Lastname2 <br />
    Position2<br />
<span class=""dated"">22/9/2014</span>
</p>"

The other rows are not displaying correctly either.

Are you able to help further?
Avatar of CAE5942

ASKER

Perhaps the problem is that the content of the cells differs. I thought that you could use the html elements to formulate some code but maybe it's not possible?
>>(2) column G should be the date but instead it's showing the url
>>(3) column H should be the url but instead it's showing the image path
>>(4) column I should be the image path

this is different from what you have defined in your question...

Cell F = A date is here
Cell G = http://www.somesite.com/
Cell H = /img/logo.jpg
>>Perhaps the problem is that the content of the cells differs. I thought that you could use the html elements to formulate some code but maybe it's not possible?

some of the content is not "formatted" with proper value, for example: Cell A3 do not have the Position.

<p>Testimonial text is here</p>
<p> Firstname3 Lastname3 <br />
class="dated">14/8/2014</span>
</p>

so we do need a better handler for different scenarios
Avatar of CAE5942

ASKER

Sorry when I first posted the question I didn't realise the variations in Column A. I've since worked out that there will be 5 variations, ie. the rows in Column A will either be broken into 3, 4, 6, 7 or 8 columns.

Due to these variation, will this still be possible?
can you post the latest Excel you have and then we shall work from there onwards...
Avatar of CAE5942

ASKER

The latest one that I have is the one that you're working from. If you look at the contents in Column A you can see the variations. For example Row 1 has the following information:

<p>Testimonial text is here</p>
<p> Firstname1 Lastname1 <br />
Position1<br />
Company Name1<br />
Company Name1A <br />
<span class="dated">1/9/2015</span>
</p>
<a href="http://www.someonesite.com1/"><img alt="" src="/img/logo1.jpg" style="float: right; height: 60px; margin-top: -75px; top: -70px;" /></a>

The above row would need to be split into 8 columns as there is 8 sets of information.

The following is Row B:

<p>Testimonial text is here</p>
<p> Firstname2 Lastname2 <br />
Position2<br />
<span class="dated">22/9/2014</span>
</p>

The above row would only be split into 4 columns as there is only 4 sets of information.

The other 3 rows are also different.

Are you able to work from that same spreadsheet?
>>Are you able to work from that same spreadsheet?
that's the reason why I suggest you to upload the one you have to avoid discrepancies and waste of efforts. cheers
Avatar of CAE5942

ASKER

Ok attached is the sample data - thanks
sample-data.xlsx
it's seems that there are some conflict in your data, for example:

in cell A2:

<p>Testimonial text is here</p>
<p> Firstname2 Lastname2 <br />
   Position2<br />
<span class="dated">22/9/2014</span>
</p>

in cell A4:

<p>Testimonial text is here</p>
<p> Firstname4 Lastname4 <br />
Company Name4<br />
<span class="dated">19/6/2014</span>
</p>
<a href="http://www.someonesite.com4/"><img alt="" src="/img/logo4.jpg" style="float: right; height: 60px; margin-top: -75px; top: -70px;" /></a>

both content hold different values but are in same position. How's your Text Mining logic to handle this?
Avatar of CAE5942

ASKER

Hi, yeah I realise that there are conflicts but to be honest I don't know how to handle them. As long as the information is in its only cell and column, I can then just go through manually and move things around after the code has run.

Are you ok with that? I understand if it's not possible to do.
a decision has to be made either by yourself or by your immediate supervisor to firm out the text mining logic. Without a specific tag or logic in text content, we may need to sacrifice some data accuracy when we do the mining
Avatar of CAE5942

ASKER

Maybe the logic could be:

Anything between <p> and </p>, copy to new cell
Anything between <p> and <br />, copy to new cell
Anything before a <br /> tag, copy to new cell
Anything between span tags, copy to new cell

And the same for the url and image file path
Avatar of CAE5942

ASKER

If you can't do that, then could we revise the scope and just copy the date, the url and image file path. That would be enough I think.
Avatar of CAE5942

ASKER

The rest I can do manually.
ASKER CERTIFIED SOLUTION
Avatar of Ryan Chong
Ryan Chong
Flag of Singapore image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of CAE5942

ASKER

Ok thanks for your help - I appreciate it.