Has anybody written a function which compares two strings, where you can specifie a match % betweem the two strings or something like that.

It should be quite simple.

The following matchStr( s1, s2) compare s1 and s2 then compare the number of characters matched to the longer of the string:

function matchStr( s1, s2: string): real;

var c, first, last, matched: word;

begin

first:= length( s1);

last:= length( s2);

if first> last then begin

c:= first;

first:= last;

last:= c;

end;

matched:= 0;

for c:= 1 to first do

if s1[c]= s2[c] then

inc( matched);

result:= matched/last*100;

end;

let me know if this is what you want.

TMask allows the comparison of strings containing wildcards to a mask.

Source in Delphi 3\Source\Internet\masks.pa

Function FindErrors( St1, St2: ShortString; var ErrStr: ShortString): integer; gives you the error count beetwen St1 and St2 and the ErrStr give you description of error;

for your example St1:='abcd' and St2:='acde' you receive Errcount:=2 and ErrStr := 'M2; A5;'

M2 - Missing character at pos:=2

A5 - Added character at pos:=5;

However you need some modifications. You can after call to ErrCount delete all 'A' tokens and decrease the value of err cout.

Then you receive the goodcount as ' Length(St1)-NewErCount' and this is what you probably need. If you need more than ask.

This is my old proc with example how to call.

function FindErrors(St1, St2: ShortString; var ErrStr: ShortString): integer;

const MaxWidth = 255; { max width of strings }

type TErrTable = array[1..MaxWidth, 1..MaxWidth] of byte;

var

ETable: ^TErrTable;

ErrPos, ErrCount, ErrType, CurrErrPos: Integer;

L1 : byte absolute St1;

L2 : byte absolute St2;

function Err(p1, p2: integer): integer;

var e1, e2, e3: integer;

begin

result := ETable^[p1, p2];

if result = 255 then { nie sprawdzany jak sprawdzany to zwarca result }

begin

result := 0;

while (L1 >= p1) and (L2 >= p2) and (st1[p1] = st2[p2]) do begin

inc(p1); inc(p2); end; { omija te co sie zgadzaja }

if (L1 >= p1) and (L2 >= p2) then

begin

e1 := Err(p1 + 1, p2 + 1); { incorrect char }

if e1 = 0 then begin

ErrType := 1;

result := e1 + 1

end

else

begin

e2 := Err(p1, p2 + 1); { added char }

if e2 = 0 then begin

ErrType := 2;

result := e2 + 1

end

else

begin

e3 := Err(p1 + 1, p2); { mising char }

if e1 <= e2 then

if e1 <= e3 then begin

ErrType := 1;

result := e1 + 1

end

else

begin

ErrType := 3;

result := e3 + 1

end

else

if e2 <= e3 then

begin

ErrType := 2;

result := e2 + 1

end

else

begin

ErrType := 3;

result := e3 + 1

end

end;

end;

end

else

if L1 = (p1 - 1) then { s1 ended }

begin

Result := abs(L2 - p2 + 1);

if Result <> 0 then

ErrType := 2;

end

else

if L2 = (p2 - 1) then { s2 ended }

begin

Result := abs(L1 - p1 + 1);

if Result <> 0 then

ErrType := 3;

end

end;

ETable^[p1, p2] := result;

if Result <> 0 then

ErrPos := p1;

end;

begin

St1 := Copy(St1, 1, MaxWidth);

St2 := Copy(St2, 1, MaxWidth);

New(ETable);

ErrStr := '';

ErrCount := 0;

CurrErrPos := 0;

while (Length(st1) <> 0) or (Length(St2) <> 0) do

begin

ErrPos := 0;

ErrType := 0;

FillChar(ETable^, SizeOf(TErrTable), 255);

if Err(1, 1) > 0 then

begin

case ErrType of

1:begin

ErrStr := ErrStr + 'I' + IntToStr(ErrPos + CurrErrPos) + '; ';

st1 := Copy(st1, ErrPos + 1, L1); st2 := Copy(st2, ErrPos + 1, L2);

end;

2:begin

ErrStr := ErrStr + 'A' + IntToStr(ErrPos + CurrErrPos) + '; ';

st1 := Copy(st1, ErrPos, L1); st2 := Copy(st2, ErrPos + 1, L2);

Dec(CurrErrPos);

end;

3:begin

ErrStr := ErrStr + 'M' + IntToStr(ErrPos + CurrErrPos) + '; ';

st1 := Copy(st1, ErrPos + 1, L1); st2 := Copy(st2, ErrPos, L2);

end;

end;

CurrErrPos := CurrErrPos + ErrPos;

inc(ErrCount);

end

else begin

St1 := '';

St2 := '';

end;

end;

result := ErrCount;

Dispose(ETable);

end;

procedure TForm1.Button1Click(Sender

var

ErrStr: Shortstring;

begin

Edit3.Text := 'errors = ' +

IntToStr( FindErrors( ShortString(Edit1.Text), ShortString(Edit2.Text), ErrStr) );

Edit4.Text := ErrStr;

end;

With regards

mirek

:illegal character in input file ' ' ($A0)"

mayby this is you err ? try to remove the leading blanks and insert the new one. This is Ex-Ex mistake with formatting the text. However when I remove this spaces the proc will compile and work fine. I don't know what is going on.

I just can't put focus on koger's requirement.

What *exactly* is the function supposed 2 do?

Differentiations needed!

Which 1 of the compared strings is 2 b taken as the reference?

The longer 1?

What is the sense of the whole thing?

Mayb there's a different way of realizing a function 2 cum 2 the same result.

Black Death.

Look at my proc. This proc compares Str1 to Str2 and give you result lik :

Added Char (in second str) at pos ...

Missing Char at pos ...

Deleted Char at pos ...

However the main advantage is than if you compare :

ABCDEFG to ABBCDFG this proc give you result :

Added Char at pos 3 and missing Char at pos 5. This is not easy to prepare - please try :)

I think so Koger need this king of compare because he(she) need to know % of chars which is correct int the second string and of course comparing :

AAABBB to BBBAAA must return exactly 3 errors.

it's 07-28-98 01:55 a.m. now & i've fumbled around the whole evenin with a problem that's not my own.

guess what - right. it's your's. i didn't want 2 find this all out by myself, cos i'm as lazy as hell.

that's why i claimed not 2 b able 2 focus your intention in my last message. nevermind -

here's what i came out with:

1st of all i make the following assumptions (your code really ain't obliging the inclined reader 2 much!):

- Edit1.Text contains the percentage the strings in comparison have 2 match at least by

- Hit is true, if Edit1.Text % of either string is contained in the other 1

so let's have an example:

if u have "afbcee" in listbox1, "fbcf" in listbox2 & specify 66 in edit1,

then the algorithm is of the opinion that he has 2 highlight "afbcee" in listbox1 & "fbcf" in listbox2.

66% means 2 third. so far, so good.

the comparison that leads 2 the questionable hit is in the loop which leads 2 searching "fbc" in "fbcf".

but with the strings "fbcf" & "afbcee" u have a maximum match of 50% which is "fbc".

o.k.- the other way round it would make:

75% of "fbcf" r containded in "afbcee".

susceptible 2 errors.

the reason 4 this proneness is your formula 4 evaluating tegnafg:

in this case, it cumz out with 3. which can't b.

if i got your algorithm right, tegnafg (whatever the f*** that may stand 4)

should contain the number of characters of the longer string that equals 100% minus specified amount of percentage.

if i'm wrong in this, tell me.

in our example, this had 2 b 2 (longer string is "afbcee" with 100% = 6 chars, 66% = 4, 6 - 4 = 2).

but is 3.

let's switch on the floodlight 2 make this more clear - let's have a look at this:

"afbcee" compared 2 "acegg" cumz out to b highlighted.

now:

the algorithm says that he found "ace" from "afbcee" in "acegg".

that is indubitably correct.

but that isn't 66% on either side.

it's more like 50% 4 "afbcee" & 60% 4 "acegg".

agreed?

consider this:

i've tried "round" instead of "trunc" (4 evaluating tegnagnagna...).

result:

"fbcf" is not highlighted any longer - correctemundo

"acegg" the same

so far, so good. (this creepy epigram again)

try the following:

"acee" is highlighted - well done, my darling

"aceeg" the same

"aceeggg" is not highlighted - right, A, sit down

(-paragraph-)

& now 2 your 2nd problem: u said not all matches r found.

that's right.

look at:

S1 = "afbcee" & S2 = "aceegg" with 66%.

hum.

"acee" is subset of "afbcee" & equals 66%.

"acee" is subset of "aceegg" & equals 66%.

so what's the problem.

it's the way u find the longer string.

if there ain't no longer string (i.e. Length(S1) = Length(S2), S2 is taken 4 the longer string automatically.

why is this the problem?

this string is used to generate the searchstring.

which makes the other 1, S1 in this case, the reference string.

this leads 2 a search of "acee" in "afbcee".

which turns out with Pos(S2, S1) = 0.

solution:

the way 2 work around this problem should b 2 compare both directions when strings equal in length.

if in one comparison u turn out with true, u should highlight.

so far concerning function.

relating 2 your speed problem:

*pooh*!

i'm gonna give it a try.

2morrow.

not now.

if it turns out to b quick i'll let u know.

but as odd as your code loox - it's not that bad.

some obsolete variables, mayb.

& kinda disgusting layout.

but what does that matter?

seems quite clever 2 me, though.

gotta work & it does.

so - let's c what i can do 4 u. hope this bit helped a little (& hit your point)

so long,

Black Death.

Black Death

1st some remarx:

- nuthin is impossible.

- i do not think that your proc which is a function hits the point at all, becos it doesn't work the way u stated (c below)

- give me just 1 reason why i should try 2 write something like that ??

- why (?) should delphi complain bout leading blanx? how do U indent 4 layout? (!)

- $A0 is 160 decimal - has got nuthin 2 do with blanx... it's more like "Ã¡" on my machine.

but main point is this (concerning your message 2 me - 07-27-98 08:13 a.m.):

* i say: don't u fool around with me *

i dunno what's goin on in your head:

following your description of your function (yeah - i actually read it):

"AAABBB" compared 2 "BBBAAA" CANNOT turn out with exactly 3 errors at all (as u took "of course" 4 granted)

not even approximately 3.

yeah - i did it. i looked at your code. well. that was evident:

your program turns out with EXACTLY *SIX* ERRORS which r: I1; I2; I3; I4; I5; I6.

if u don't believe - try it. apparently u don't know your own code. (sorry - is it?)

what do we learn from this ... ?

concluding this remains to state the simple fact that the *king of compare* surely is in no way what koger was looking 4.

it's not even able 2 recognize a simple palindrome.

(by the way: what's it good 4? i mean - which problem can b solved by it? is it 4 a word quiz or something like that?)

anyway.mirek:

goodbye & farewell, braggart. go ballyhoo your crap elsewhere.

yours sincerely,

Black Death.

p.s.: nevermind - i don't care either.

p.p.s.: 2 all:

i wanna apologize in advance 4 this lttle outburst. normally, i'm calmness itself (my girlfriend tends 2 call me phlegmatic). but i can't stand complacent gossip guys obsessed by their image.

i do know my comments aren't brilliant either, but i give it a try & i try 2 find out how 2 solve the problem & not 2 hit the questioner with dud code.

sorry again & have a nice day,

Black Death.

Koger - you can get faster code when you decrease the MaxWidth. If you need string with length only up to 20Chars then setting the MaxWidth to 20 giver you the proc to run some faster. I can't prepare now fastest implementation because your problem need many compares. You can try to remove the recursion however I was try to create fastest proc in this way one year ago without result.

Yesterday I was prepared proc where you can say how many words need be compared in one pass and if you sat this to 5 you have fastest proc but if you put strings where more than 5 charscters is missing than you can receive bad results. When I Have some time to prepare this better to you I will post this here.

Of course you can prepare easiest proc which give you good results and they will be acceptable.

It is nice to hear so you was try my proc, however this is for compare strings , not exactly for koger question, but this is good start point for create modified algorith which need to give excellent results for koger question.

The result after compare AAABBB to BBBAA is very good in my proc, however this sounds like a joke. I will try to explain more why but with shortest example which is the same in the subject.

If you compare AB to BA than you can receive 3 answers with my proc which have the same errcount=2 and which is returned depends of priority of the mistake. My proc was created for my problem and this is not koger problem, however you can easy made same modifications to receive good proc for koger. OK but this is the errors :

1.Incorrect Char at pos 1 and incorrect char at pos 2 >>> errcount=2

2.Added Char at pos1 and missing char at pos 2 >>> errcount=2

3.Missing char at pos1 and missing char at pos 1 >> errcount=2

Probably you can't understand this but I will now try to explain how the proc work, and remember so you don't know in which string the char was missed or added (this is the difference of koger question !)

This proc compares by recursion 3 substrings

e1:=ErrCount( copy(st1,2,length(st1)), st2 )

e2:=ErrCount( St1, copy(st2,2,length(st2)) )

e3:=ErrCount( copy(st1,2,length(st1)), copy(st2,2,length(st2)) )

and after this we chose smales e from e1,e2,e3

This is very fine for comparring real words, however if e1,e2 and e3 is the same the result is sometimes not good for koger question.

I now work for some modification exactly for the koger problem, and the results come soon, however tell me if you undestand the algorithm and the idea - I think so this is the only one way to resolve this problem, but mayby I am on wrong way ?

BTW - what is nuthin ?

plase say in easiest word this :

* i say: don't u fool around with me *

i dunno what's goin on in your head:

because I can't understand.

You say :

apparently u don't know your own code. (sorry - is it?)

what do we learn from this ... ?

Mayby you now know so I know my code and mayby you can learn some from this ?

And of course I think so we will try here to talk about problems and resolve them as many as possible an your words about me is not fair. I can' say some words to you in the same look, but I think so you know what I mean about you.

Shortly : you must learn and learn and learn .... :)))

Sorry for my english.

tell me how maches you need for this examples :

s1:='abcd'

s2:='acde'

s1:='abcd'

s2:='acdb'

s1:='abcd'

s2:='acdeb'

.I think so Koger need this king of compar...."

so

BlackDeath - you was probably tell about this badly written text

please replase this as :

.I think so Koger need this kind of compar...."

I can't speak english well and this is my mistake on keyboard. When I write in other language then I sometime typing wrong characters. Sorry for the mistake I never think so I can prepare KING of compare I mean KIND of compare.

This is Black Death.

i will try and express my self as easy as possible.

"nuthin"

means: "nothing"

* i say: don't u fool around with me *

means: do not play games with me

i dunno what's goin on in your head:

means: i do not know what you had in mind (when telling me the thing about "AAABBB" and "BBBAAA"") but i think it must have been something quite strange.

i will now try to explain the point with "AAABBB" and "BBBAAA" again (maybe i have not got you right):

you told us that "AAABBB" compared to "BBBAAA" by your function will return 3 errors.

i took your function and tried it.

the result was: 6 errors: i1, i2, i3, i4, i5, i6

so - how comes you state 3 errors should be the result and your own function comes out with 6?

-> this is the point i would like to be answered. <-

as far as the king of compare is concerned:

that was the point that drove me ****ing mad.

and obviously it was a misunderstanding.

i now understood that you meant to say: kind of compare.

i think it's my turn now:

i sincerely apologize to you for being so rude. maybe i should have asked first if this could have been a mistake. apart from this, it was not one of the best days for me, that day.

i am NOT choleric, i've just been done with my nerves.

(seldomly occurs)

so - sorry again.

i'll try to show more consideration for broken english.

(mainly because mine is not brilliant either)

Have a nice day,

Black Death.

glad 2 hear it helped.

btw: chum of mine (SkyGood) is r&w - d.dynamite, 2.

i think he must have joined us already.

Black Death.

i was thinking a little bit about this problem yesterday night & came out with the following:

1st.

i just saw mirek doin a similar approach, so this isn't new any longer:

the thing is to count the matching hits of charactersfrom string1 in string2 & then u can evaluate the percentage as already proposed by mirek.

2nd.

the runtime problem:

both your approach & mirek's r using double iteration. u know this leads 2 runtimes raising exponential. 2 solve this, try the following:

function GetHitPercentage(sTst, sRef: string): real;

var

i: Integer;

s: string;

begin

s := sRef;

for i := 1 to Length(sTst) do

if Length(s) > 0 then

Delete(s, Pos(sTst[i], s), 1)

else

Break;

Result := (Length(sRef) - Length(s)) * 100 / Length(sTst);

end;

procedure TForm1.Button1Click(Sender

begin

Edit3.Text := FloatToStrF(GetHitPercenta

end;

i do not think it could b done vastly more efficient.

this func finds all chars, regardless of their order. i.e., it would give a result which would include the palindromes of the pars. but i think, that is what it must do 4 correct result.

if order is 2 b taken in2 consideration, u gotta have an index marker. the char 2 delete must have an index higher than the actual index marker. On chars not 2 b deleted, index marker is increased. i think this 1's clear enough.

watch out: this one is case sensitive,

so u gotta uppercase the paramters on function call.

the result represent, the percentage of chars of par1 included in par2.

4 reverse comparison, swap parameters.

u can try & test around with pointers. but i don't think u will increase speed significantly. this 1's relatively compact.

so - tell me what u think of it - then i can stop thinking about it (i'm sleepin shitty the last few days).

bye,

Black Death.

in this case, it could b worth an effort 2 implement pointer opertaion, indeed.

Black Death

i've performed some measurements with the turbo profiler (the function isn't delphi specific mainly (o.k. - result doesn't exist in turbo)) & came 2 interesting results. essence:

this 1 is ****ingfast.

it should solve your speed problem.

(with the strings i've tested, at the crucial point my function iterates as often as the string is long (27 times -> linear repetition); while with mirek's (& your old function similar) it is 837 times 4 the inner loop due 2 the nesting of iterations!)

that's it 4 me 4 now.

so long,

Black Death.

------------

--> ach, mirek.

your statement:

"so if you compare

ABCD to A0B1C2E3 you receive result 4

abcd to agbcbcd you receive result 4"

tell me, why would ABCD compared to A0B1C2E3 result in 4?

isn't it rather 3?

could it be you meant to write A0B1C2D3?

(which would indeed result in 4)

i mean, this isn't your bad english - it's the simple fact you don't care what you've written when submitting, am i right?

.

btw - are you interested in the result of the profiler runs i've performed to compare our algorithms?

maybe you'll reconsider your statement concerning the necessity for *me* to "learn, learn, learn..."

(actually: from you?)

Black Death.

Don't worry about this learn, learn .... I write some bad words - sorry for this, because whole problems is created only by my mistakes.

However comparing result is not needed - I was write so my last proc is very stupid and this is only for resolve of the problem. I can write faster proc and of course your proc is not very efficient because delete and copy is very slow procedures. If you replace delete and copy by other than mayby this will be good :)))

and now my answers to your questions

Q1.could it be you meant to write A0B1C2D3?

Answ :- yes I was made mistake when typing.

Q2.i will now try to explain the point with "AAABBB" and "BBBAAA" again (maybe i have not got you right):

you told us that "AAABBB" compared to "BBBAAA" by your function will return 3 errors.

i took your function and tried it.

the result was: 6 errors: i1, i2, i3, i4, i5, i6

so - how comes you state 3 errors should be the result and your own function comes out with 6?

Answ : I say : " Koger need ......comparing :

AAABBB to BBBAAA must return exactly 3 errors. " not My function returs 3 errors !

and of course My previus function (which need modification for koger) give you correct return. Because you can receive errors as :

1.I1,I2,I3,I4,I5,I6

2.A1,A1,A1,M4,M4,M4

3.M1,M2,M3,A4,A4,A4

and in this 3 results the err count = 6 so proc return the first.

My proc is created for correction between two words where the first is OK and the second have mistakes. For this problem the proc works fine, but for koger proc need some modification. My Last proc is the example of better proc for koger.

Q2."maybe you'll reconsider your statement concerning the necessity for *me* to "learn, learn, learn..." "

Answ : Why ? we all must learn, learn and learn :))

sorry for mistakes and english (This is very long answer and I can't read this twice :))

apparently there is a regrettable general misunderstanding between the both of us caused by the different ways we make use of the english language.

taking this into consideration, i suggest forgetting this whole thing in order to keep on going with the pleasant parts of life.

regards

Black Death.

p.s.: if you would have read all of my comments you would have noticed that i already took into consideration to use pointer operations in order to speed the whole mess up. (Delete and Copy surely are slow)

fact is, the runtime spent on those (when having midrange strings) is marginal (as you colud have seen in the profiler report). Nevertheless - speeding up is possible, out of question (see above).

p.s.: i did not suggest reconsidering your statement concerning "learn, learn..." because i think i'm fit to not having to learn anymore, but the way you expressed it, it had a devaluable touch.

of course we mus not stop learning all the time - agreed 100%.

btw: are you possibly from poland ?

(with regard to your syntax - no attack intended!!!)

ok - bye now

Black Death.

Please provide your email to receive a free trial preview!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.