Link to home
Start Free TrialLog in
Avatar of orerockon
orerockonFlag for United States of America

asked on

excel strip repeated text string from a cell

Hello all, in excel 2010 I need to find and strip the second occurrence of a repeated text string from a cell. This is not the same as stripping a single repeated word. I have a program that spits out repeated text strings en masse; there are many thousands of lines and I need to find only the repeated strings, delete these occurrences, and see the result in the next column.

E.g.: A1 contains "dog cat dog cat"
The result in B1 needs to be "dog cat " (note the trailing space). In my case there is a semicolon after the end of the repeated text but I suppose there could be a space as well. It would be nice if this is applicable to both instances, but I don't need the latter.

The string I need to strip can be more than one word long but it only occurs twice in the source cell. I would like to do this without a macro or add-in, if possible. makes things simpler since I further manipulate the text before copying it back to the other program.

Thanks in advance for helping me with this!
Avatar of redmondb
redmondb
Flag of Afghanistan image

Hi,

So would an actual cell look like "dog;cat;dog;cat", "dog;cat;dog;cat;", or something else?

Will a cell always be "First string;Second String;First String;Second String"? Or may the strings come in a different order?

Thanks,
Brian.
Apologies, I reckon I misread the question...

Is the format "dog cat dog cat;"  (with sometimes a space after the semi-colon)?

Regards,
Brian.
With the string in A1, please try the following in B1...
=TRIM(IF(MOD(LEN(A1),2)=0,LEFT(A1,LEN(A1)/2),LEFT(A1,(LEN(A1)-1)/2)))&" "

Regards,
Brian.
Avatar of orerockon

ASKER

Sorry, I wasn't clear enough in my question. The string could contain: "dog cat dog cat"(semicolon)(any characters). All I need to do is strip the second "dog cat", and leave the rest of the string intact.
OK, even easier...
=TRIM(LEFT(A1,FIND(";",A1,1)/2))&" "

Regards,
Brian
Nope, forget the remaining. Just a moment!
=TRIM(LEFT(A1,8.5))&MID(A1,FIND(";",A1,1),9999)&""
Aargh!

=TRIM(LEFT(A1,8.5))&MID(A1,FIND(";",A1,1),9999)&" "
I'd like to have a reasonable excuse for the last few posts, but I think I'll have to settle for stupidity...

=TRIM(LEFT(A1,FIND(";",A1,1)/2))&MID(A1,FIND(";",A1,1),9999)&" "

If you don't want the semi-colon...
=TRIM(LEFT(A3,FIND(";",A3,1)/2))&MID(A3,FIND(";",A3,1)+1,9999)&" "

Regards,
Brian.
Is the repetition always at the start of the string? Is the repeated text of variable length? It would be nice to see some examples of the actual strings with required results:

If you have this text

xyz abc xyz abc; def

then this formula will remove the first xyz abc......but it relies on there always being two instance of the exact same text (length could be variable) followed by a semi-colon and then more text

=REPLACE(A1,1,FIND(";",A1)/2,"")

that would also work, for instance, to change

AAAAA FFFFF AAAAA FFFFF;123

to this

AAAAA FFFFF;123

regards, barry
Barry: Yes, the replace(find()/2) works for a specific case, as you pointed out it has to be in the beginning of the string. Here's a real world example which I should have just included earlier:

44.643100000 -118.663500000  NAD83; JDA 094.01 JDA 094.01_01;
44.642700000 -118.662000000  NAD83; JDA 094.01 JDA 094.01_02;
44.641616260 -118.661310347  NAD83; JDA 094.01 JDA 094.01_03;
44.641500000 -118.664100000  NAD83; JDA 094.01 JDA 094.01_04;

If I can strip out the second " JDA 094.01" then I can take care of the "_XY" characters. The result would be:

44.643100000 -118.663500000  NAD83; JDA 094.01;
44.642700000 -118.662000000  NAD83; JDA 094.01;
44.641616260 -118.661310347  NAD83; JDA 094.01;
44.641500000 -118.664100000  NAD83; JDA 094.01;
orerockon,

(1) Will the repeating strings always immediately follow the first semi-colon?
(2) Will they always immediately precede the last semi-colon?
(3) Will they always be of the particular formating your examples - "AAA nnn.nn AAA nnn.nn"?
(4) Any (other) rules?

If none of the above is true, then a larger number of examples would be very useful.

Thanks,
Brian.
barry, here's a more general format:

example: 44.643100000 -118.663500000  NAD83; JDA 094.01 JDA 094.01_01;

general format: nn.nnnnnnnnn[space][minus]nnn.nnnnnnnnn[space]NAD83[semicolon][space][repeated phrase][space][repeated phrase][underscore][two characters][semicolon]

So the string I want to strip can be a phrase of repeated characters _generally_ of the above specific example but varies somewhat in length and can have a period or not. If if helps I can remove the spaces from the repeated string but not the periods.

If this is getting too complicated then forget it, I will live with it and not waste too much of your time!

Thanks, orerocon. Please try the following...

=MID(A1,1,FIND(";",A1,1)+1)&TRIM(MID(MID(A1,FIND(";",A1,1)+1,FIND(";",SUBSTITUTE(A1,";"," ",1),1)-FIND(";",A1,1)-4),1,LEN(MID(A1,FIND(";",A1,1)+1,FIND(";",SUBSTITUTE(A1,";"," ",1),1)-FIND(";",A1,1)-4))/2))&MID(A1,FIND(";",SUBSTITUTE(A1,";"," ",1),1)-3,9999)

It's a total abomination, but I've no doubt Barry will do his usual magic and come up with something a fraction of the size!

Regards,
Brian.
ASKER CERTIFIED SOLUTION
Avatar of barry houdini
barry houdini
Flag of United Kingdom of Great Britain and Northern Ireland image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Our formulas give different results for some cases (see below). Are any of these possible values?

(1) Remove periods from initial string...
44643100000 -118663500000  NAD83; JDA 094.01 JDA 094.01_01;
(2) Extra digit in initial value...
144.643100000 -118.663500000  NAD83; JDA 094.01 JDA 094.01_01;
(3) Two fewer digits in initial string (one is OK)...
4.643100000 -18.663500000  NAD83; JDA 094.01 JDA 094.01_01;

Regards,
Brian.
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
Agreed, teylyn