excel strip repeated text string from a cell

Hello all, in excel 2010 I need to find and strip the second occurrence of a repeated text string from a cell. This is not the same as stripping a single repeated word. I have a program that spits out repeated text strings en masse; there are many thousands of lines and I need to find only the repeated strings, delete these occurrences, and see the result in the next column.

E.g.: A1 contains "dog cat dog cat"
The result in B1 needs to be "dog cat " (note the trailing space). In my case there is a semicolon after the end of the repeated text but I suppose there could be a space as well. It would be nice if this is applicable to both instances, but I don't need the latter.

The string I need to strip can be more than one word long but it only occurs twice in the source cell. I would like to do this without a macro or add-in, if possible. makes things simpler since I further manipulate the text before copying it back to the other program.

Thanks in advance for helping me with this!
orerockonAsked:
Who is Participating?
 
barry houdiniConnect With a Mentor Commented:
If you want the result to be this

44.643100000 -118.663500000  NAD83; JDA 094.01;

then try this formula

=LEFT(A1,(LEN(A1)-39)/2+35)&RIGHT(A1,1)

if you need the _XY from the end then change the 1 at the end of that to a 4

=LEFT(A1,(LEN(A1)-39)/2+35)&RIGHT(A1,4)

regards, barry
0
 
redmondbCommented:
Hi,

So would an actual cell look like "dog;cat;dog;cat", "dog;cat;dog;cat;", or something else?

Will a cell always be "First string;Second String;First String;Second String"? Or may the strings come in a different order?

Thanks,
Brian.
0
 
redmondbCommented:
Apologies, I reckon I misread the question...

Is the format "dog cat dog cat;"  (with sometimes a space after the semi-colon)?

Regards,
Brian.
0
Free Tool: ZipGrep

ZipGrep is a utility that can list and search zip (.war, .ear, .jar, etc) archives for text patterns, without the need to extract the archive's contents.

One of a set of tools we're offering as a way to say thank you for being a part of the community.

 
redmondbCommented:
With the string in A1, please try the following in B1...
=TRIM(IF(MOD(LEN(A1),2)=0,LEFT(A1,LEN(A1)/2),LEFT(A1,(LEN(A1)-1)/2)))&" "

Regards,
Brian.
0
 
orerockonAuthor Commented:
Sorry, I wasn't clear enough in my question. The string could contain: "dog cat dog cat"(semicolon)(any characters). All I need to do is strip the second "dog cat", and leave the rest of the string intact.
0
 
redmondbCommented:
OK, even easier...
=TRIM(LEFT(A1,FIND(";",A1,1)/2))&" "

Regards,
Brian
0
 
redmondbCommented:
Nope, forget the remaining. Just a moment!
0
 
redmondbCommented:
=TRIM(LEFT(A1,8.5))&MID(A1,FIND(";",A1,1),9999)&""
0
 
redmondbCommented:
Aargh!

=TRIM(LEFT(A1,8.5))&MID(A1,FIND(";",A1,1),9999)&" "
0
 
redmondbCommented:
I'd like to have a reasonable excuse for the last few posts, but I think I'll have to settle for stupidity...

=TRIM(LEFT(A1,FIND(";",A1,1)/2))&MID(A1,FIND(";",A1,1),9999)&" "

If you don't want the semi-colon...
=TRIM(LEFT(A3,FIND(";",A3,1)/2))&MID(A3,FIND(";",A3,1)+1,9999)&" "

Regards,
Brian.
0
 
barry houdiniCommented:
Is the repetition always at the start of the string? Is the repeated text of variable length? It would be nice to see some examples of the actual strings with required results:

If you have this text

xyz abc xyz abc; def

then this formula will remove the first xyz abc......but it relies on there always being two instance of the exact same text (length could be variable) followed by a semi-colon and then more text

=REPLACE(A1,1,FIND(";",A1)/2,"")

that would also work, for instance, to change

AAAAA FFFFF AAAAA FFFFF;123

to this

AAAAA FFFFF;123

regards, barry
0
 
orerockonAuthor Commented:
Barry: Yes, the replace(find()/2) works for a specific case, as you pointed out it has to be in the beginning of the string. Here's a real world example which I should have just included earlier:

44.643100000 -118.663500000  NAD83; JDA 094.01 JDA 094.01_01;
44.642700000 -118.662000000  NAD83; JDA 094.01 JDA 094.01_02;
44.641616260 -118.661310347  NAD83; JDA 094.01 JDA 094.01_03;
44.641500000 -118.664100000  NAD83; JDA 094.01 JDA 094.01_04;

If I can strip out the second " JDA 094.01" then I can take care of the "_XY" characters. The result would be:

44.643100000 -118.663500000  NAD83; JDA 094.01;
44.642700000 -118.662000000  NAD83; JDA 094.01;
44.641616260 -118.661310347  NAD83; JDA 094.01;
44.641500000 -118.664100000  NAD83; JDA 094.01;
0
 
redmondbCommented:
orerockon,

(1) Will the repeating strings always immediately follow the first semi-colon?
(2) Will they always immediately precede the last semi-colon?
(3) Will they always be of the particular formating your examples - "AAA nnn.nn AAA nnn.nn"?
(4) Any (other) rules?

If none of the above is true, then a larger number of examples would be very useful.

Thanks,
Brian.
0
 
orerockonAuthor Commented:
barry, here's a more general format:

example: 44.643100000 -118.663500000  NAD83; JDA 094.01 JDA 094.01_01;

general format: nn.nnnnnnnnn[space][minus]nnn.nnnnnnnnn[space]NAD83[semicolon][space][repeated phrase][space][repeated phrase][underscore][two characters][semicolon]

So the string I want to strip can be a phrase of repeated characters _generally_ of the above specific example but varies somewhat in length and can have a period or not. If if helps I can remove the spaces from the repeated string but not the periods.

If this is getting too complicated then forget it, I will live with it and not waste too much of your time!

0
 
redmondbCommented:
Thanks, orerocon. Please try the following...

=MID(A1,1,FIND(";",A1,1)+1)&TRIM(MID(MID(A1,FIND(";",A1,1)+1,FIND(";",SUBSTITUTE(A1,";"," ",1),1)-FIND(";",A1,1)-4),1,LEN(MID(A1,FIND(";",A1,1)+1,FIND(";",SUBSTITUTE(A1,";"," ",1),1)-FIND(";",A1,1)-4))/2))&MID(A1,FIND(";",SUBSTITUTE(A1,";"," ",1),1)-3,9999)

It's a total abomination, but I've no doubt Barry will do his usual magic and come up with something a fraction of the size!

Regards,
Brian.
0
 
redmondbCommented:
Our formulas give different results for some cases (see below). Are any of these possible values?

(1) Remove periods from initial string...
44643100000 -118663500000  NAD83; JDA 094.01 JDA 094.01_01;
(2) Extra digit in initial value...
144.643100000 -118.663500000  NAD83; JDA 094.01 JDA 094.01_01;
(3) Two fewer digits in initial string (one is OK)...
4.643100000 -18.663500000  NAD83; JDA 094.01 JDA 094.01_01;

Regards,
Brian.
0
 
Ingeborg Hawighorst (Microsoft MVP / EE MVE)Microsoft MVP ExcelCommented:
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.
0
 
redmondbCommented:
Agreed, teylyn
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.