Link to home
Start Free TrialLog in
Avatar of Tom Crowfoot
Tom CrowfootFlag for United Kingdom of Great Britain and Northern Ireland

asked on

formula to extract info from structured data

I am looking for a couple of formulas in excel to extract the date that someone joined their current company and where they joined from in the 2 adjacent cells

The data is all in 1 cell and the data lists their employer, job title and dates chronologically and each job is separated by a line break. EG:

•  Company 3 – Job 4 (May 2018 – Present)
 - Job 3 (Mar 2017 – May 2018)
•  Company 2 – Job 2 (2005 – Feb 2017)
•  Company 1 – Job 1 (May 2000 – Jun 2004)

A change in company is always preceded by a bullet point and 2 spaces.  They may well have been promoted in their current company and where this is the case there is no bullet point just a space hyphen space then the details (as per line 2 in the above example)

So in the above example this person joined company 1 from company 2 in March 2017 (he also happened to be promoted in May 2018, but that’s not the date we’re after).

Can anybody help?
Avatar of Rob Henson
Rob Henson
Flag of United Kingdom of Great Britain and Northern Ireland image

person joined company 1 from company 2 in March 2017
I think you mean joined Company 3 in March 2017.

So, starting with Company, this will split out "Company 2" from above example with text in A2:

=TRIM(MID(A2,FIND("|",SUBSTITUTE(A2,"•","|",2),1)+1,FIND("–",A2,FIND("|",SUBSTITUTE(A2,"•","|",2),1))-FIND("|",SUBSTITUTE(A2,"•","|",2),1)-1))

Still working on the date bit.
And this will give the range of dates from Company 2:

=MID(A2,FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))+1,FIND(")",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-1)

From sample it will return "2005 – Feb 2017"

I know that's not quite what you asked for but will it suffice?
Avatar of Tom Crowfoot

ASKER

That bit works beautifully, but includes their job title (this is always separated by a space hyphen space), if it helps the date bit will always be 4 characters or 8.  Whilst I'd love to avoid it, maybe we have a holding cell with that entire line extracted and put the date and company extract formulas based on that?
When I copied the sample I ended up with just "Company 2" and not the Job Title, maybe its a different hyphen to what was copied from the web page.

The relevant hyphen is bold in this version of the formula:

=TRIM(MID(A2,FIND("|",SUBSTITUTE(A2,"•","|",2),1)+1,FIND("–",A2,FIND("|",SUBSTITUTE(A2,"•","|",2),1))-FIND("|",SUBSTITUTE(A2,"•","|",2),1)-1))

You'll probably get the same issue on the date formula as well.

Enter the text cell and copy the hyphen used and then paste it into the relevant place in the formulas.
Yep that's sorted that bit, thank you
What about the dates bit??

To convert the above date range formula to the month in which they started at Company 3, use:

=EOMONTH(DATEVALUE("1 "&MID(D2,FIND("–",D2,1)+1,LEN(D2))),0)+1

Where the above date range formula is in D2. This assumes that the last part of the date range is in format "mmm yyyy" and changing to "1 mmm yyyy" by adding "1 " to the front is recognised as a date in your locale. The DATEVALUE then converts to a proper date and the EOMONTH rounds that date to the end of the month, the +1 then makes it the first of the next month.

If you want to skip the helper cell D2 and just show the Company 3 Start date it is a lengthy formula:

=EOMONTH(DATEVALUE("1 "&MID(MID(A2,FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))+1,FIND(")",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-1),FIND("–",MID(A2,FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))+1,FIND(")",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-1),1)+1,LEN(MID(A2,FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))+1,FIND(")",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-FIND("(",SUBSTITUTE(A2,"•","|",2),FIND("|",SUBSTITUTE(A2,"•","|",2),1))-1)))),0)+1
One potential glitch would be if the person leaves Company 2 on a particular date and has a break in employment before starting in Company 3. For example, if the sample person left Company 2 in Feb 2017 but didn't start in Company 3 until May 2017.
ASKER CERTIFIED SOLUTION
Avatar of Rob Henson
Rob Henson
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Shorter alternative and gets over the break in employment issue and allows for there being no promotion line:

=IF(ISERROR(FIND(")"&CHAR(10)&" - ",A2,1)),
MID(A2,FIND("|",SUBSTITUTE(A2,"(","|",1),1)+1,FIND("–",A2,FIND("|",SUBSTITUTE(A2,"(","|",1),1))-FIND("|",SUBSTITUTE(A2,"(","|",1),1)-1),
MID(A2,FIND("|",SUBSTITUTE(A2,"(","|",2),1)+1,FIND("–",A2,FIND("|",SUBSTITUTE(A2,"(","|",2),1))-FIND("|",SUBSTITUTE(A2,"(","|",2),1)-1))

The error check at the start looks for the string ") <<Carriage return>> - " (without the quotes). If it doesn't find that string (no promotion) it looks at the date after the first opening bracket. If it does find the string it looks at the date after the second opening bracket. This will be the start date of employment at Company 3 rather than relying on finish date from Company 2.

See attached with formulas in place. Two lots of data, each with and without promotion and break in employment, one with extended hyphen, one with short hyphen.
Text-extract.xlsx
Hi Rob

That’s amazing, thank you, I can’t test it out fully as I’m out at present, but that’s all looks 100% spot on and thank you for thinking of the gap in employment piece
Hi Rob

Thank you so much for your help with this - all tested and works perfectly, clearly a genius !