Solved

What string comparison does Microsoft allow in VB?

Posted on 2004-04-29
Last Modified: 2010-05-02
Where in Microsoft's documentation is it stated, and guaranteed for the future, that the syntax:

  if "aaa" < "aba" then msgbox "I guess"

is allowed and will output "I guess"?

Perhaps it is hidden in some early documentation for VB3 and
enforced in later versions for backward compatibility.
Perhaps there are some official examples from Microsoft.

Please don't give answers like "this works". For example, it is known that the
following program works; that is not the question. The question is to
find a statement from the vendor, which is the most reliable guarantee.

Option Compare Binary
.....
Private Sub Form_Load()

t = "aaaa" < "aaba"
t = t And " aa" < "aaa"
t = t And "aa" > "aA"
t = t And Chr(0) & "aa" < "aaa"
t = t And "aa" < "aaa"
t = t And "" < "a"

f = "aaaa" > "aaba"
f = f Or "aaaa" >= "aaba"
f = f Or "aaa" < "aa" & Chr(0)

If t And Not f Then MsgBox "Works"
'The real output is "Works"

End Sub

Thank you very much.
Beaverstone.
Question by:beaverstone

Expert Comment

by:Javin007
ID: 10952955

Author Comment

by:beaverstone
ID: 10953693
Thank you Javin007.

This is very close. But I don't see where VB is said to compare the second through n-th characters
when the first through (n-1)-th characters are equal. All the examples I see are restricted to comparing
the first character (e.g., "printer" comes before "scanner" because p is less than s).
But will "aaa" be less than "aab"?

Thank you.
Beaverstone.

Expert Comment

by:Javin007
ID: 10954446
Actually, it's not even comparing the characters, Beaverstone.  What it's doing is taking the bit values of the whole string:

abc = 011000010110001001100011
aaa = 011000010110000101100001

And it orders the string according to the bits, left to right.
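A minimal Python sketch of this model, assuming one byte per character as in the bit strings above (32-bit versions of VB store strings internally as 16-bit Unicode, so this illustrates the idea rather than VB's implementation). For strings of the same length, reading the bytes as one big-endian number gives the same order as comparing character by character; the helper name is hypothetical.

```python
def as_big_endian_int(s: str) -> int:
    # Concatenate the 8-bit codes left to right, as in the bit strings above
    # (assumes every character fits in one byte).
    return int.from_bytes(s.encode("latin-1"), "big")


for s in ("abc", "aaa"):
    print(s, format(as_big_endian_int(s), "024b"))

# Same length: numeric order and character-by-character order agree.
print(as_big_endian_int("aaa") < as_big_endian_int("abc"), "aaa" < "abc")  # True True
```

The agreement only holds when both strings have the same length; strings of different lengths need an extra rule.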

-Javin

Expert Comment

by:Javin007
ID: 10954458
Side note:

It also treats a missing character as a null character (Chr$(0), or 00000000)

So if
aaa=011000010110000101100001 and
  aa=011000010110000100000000, then

aa < aaa

-Javin

Expert Comment

by:Javin007
ID: 10954509
God this thing needs an edit function...

Rereading what I wrote, I noticed that this could be confusing:  "And it orders the string according to the bits, left to right."

I don't mean it orders the string according to the physical bits, but according to the total value of the bits.  This is why capital letters are considered "smaller" than their lowercase counterparts.  Thus "a" > "AA".

Strangely enough, Microsoft's own ListBoxes and controls don't use this method to alphabetize.

-Javin

Expert Comment

by:Javin007
ID: 10954531
:/  I'm not explaining this very well.  It made more sense to me before I tried to explain it.

-Javin

Expert Comment

by:Javin007
ID: 10954562
"Option Compare Binary results in string comparisons based on a sort order derived from the internal binary representations of the characters. In Microsoft Windows, sort order is determined by the code page. A typical binary sort order is shown in the following example:"

I think that's pretty much all you're going to find Microsoft saying on the subject.  But suffice it to say that comparisons like the ones you were doing are binary comparisons by default.  Maybe someone else can explain it better.  The concept itself is very simple (and basic to most languages; this isn't just a Microsoft thing), but the result is that "aab" will always be greater than "aaa".
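To illustrate the two Option Compare modes, here is a hedged Python sketch. Binary mode is modelled as comparing character codes in order; Text mode is only approximated by a case-insensitive comparison with str.casefold(). Real VB Text comparison uses the system locale's collation rules, and both function names are hypothetical.

```python
def binary_less(a: str, b: str) -> bool:
    # Python compares str values code point by code point,
    # which mirrors a binary (ordinal) sort order.
    return a < b


def text_less(a: str, b: str) -> bool:
    # Rough stand-in for Option Compare Text: ignore case only.
    return a.casefold() < b.casefold()


print(ord("A"), ord("a"))          # 65 97: capitals have smaller codes
print(binary_less("AA", "a"))      # True: "a" > "AA" in binary order
print(text_less("AA", "a"))        # False: compares "aa" with "a"
print(binary_less("aab", "aaa"))   # False: "aab" > "aaa" in binary order
```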

-Javin

Author Comment

by:beaverstone
ID: 10954982
Thank you, Javin; you are doing more work than asked.

The question is not about how Microsoft implements string comparison.
The question is not about how to describe the string comparison algorithm in
the equivalent language of "bytes" or any other representation.

The question is about finding a statement from Microsoft that it will work as stated in the question.
Why doesn't Microsoft offer an example where the second or n-th characters are compared?
Why can't you seem to find such an example in MSDN or in the VB3, VB4, or VB6 documentation?
All the examples I see are restricted to comparing the first character
(e.g., "printer" comes before "scanner" because p is less than s).
Perhaps "aaa" is less than "aab" only accidentally, because some Microsoft programmer
did extra work that is not documented and not intended to be supported?

Although the particular implementation of the algorithm is not the subject of the question,
for accuracy let me note that adding a Chr(0) at the end seems incorrect, as this
program shows:

If "aa" < "aa" & Chr(0) Then MsgBox "Not At the end."
If "aa" > Chr(0) & "aa" Then MsgBox "Not At the beginning"
If Chr(0) > "" Then MsgBox "Not chr(0)"

Perhaps the rule is that any character at the end is greater than an unfilled position at the end.
But again, does MS guarantee this detail along with the entire algorithm?
The only reason we can rely on this is that "common sense" suggests this algorithm,
and MS may be forced to follow common practice in order to be "in the crowd".
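The observation above can be checked with a small Python sketch. It contrasts a hypothetical "pad the shorter string with Chr(0)" model with the rule that a proper prefix sorts before the longer string. Python's built-in str comparison follows the prefix rule, which matches the VB results reported in this thread; both helper names are made up for illustration.

```python
def pad_compare(a: str, b: str) -> int:
    # Hypothetical model: pad the shorter string with Chr(0), then compare.
    n = max(len(a), len(b))
    a, b = a.ljust(n, "\0"), b.ljust(n, "\0")
    return (a > b) - (a < b)


def prefix_compare(a: str, b: str) -> int:
    # Lexicographic rule: a proper prefix is smaller than the longer string.
    return (a > b) - (a < b)


print(pad_compare("aa", "aa\0"))     # 0: padding makes the two strings equal
print(prefix_compare("aa", "aa\0"))  # -1: "aa" < "aa" & Chr(0), as reported in VB
print(prefix_compare("aa", "\0aa"))  # 1: "aa" > Chr(0) & "aa"
print(prefix_compare("\0", ""))      # 1: Chr(0) > ""
```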

Thank you.
Beaverstone.



Expert Comment

by:Javin007
ID: 10955522
I don't believe you're going to find ANY documentation on this, though, because it's not a Microsoft implementation.  It's something that comes with any language you use: C++, C#, Delphi, Basica, etc.  When a language looks at the string, it sees an array of bytes, not a string.  More accurately, it sees a very long string of bits.  Thus, the language DOESN'T see it when you compare "abc."  It doesn't SEE abc.  What it sees is the bit-value equivalent of abc, which is 6513249.  That's why I was trying to explain how it works, and why you won't find the "guarantee" you're wanting.  Microsoft would spend no more time explaining that function than they would spend explaining the low-level details of what AND does.
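For reference, a quick Python check of the number above, assuming one byte per character. Reading the bytes of "abc" left to right as a big-endian number, which matches the bit string 011000010110001001100011 given earlier in the thread, yields 6382179; 6513249 is the value of the same three bytes read as a little-endian number, with "c" as the most significant byte.

```python
data = "abc".encode("ascii")             # bytes 0x61 0x62 0x63
big = int.from_bytes(data, "big")        # 0x616263
little = int.from_bytes(data, "little")  # 0x636261
print(big, little)                       # 6382179 6513249
```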

-Javin

Author Comment

by:beaverstone
ID: 10962633
Thank you, Javin, for your comments.

The idea described in your explanation,
"It's something that comes with any language you use", is perhaps right.
I understand the idea that there is a certain "unspoken agreement" or
"programming culture" that generalizes the string comparison algorithm to include all the characters,
not only the first. You are probably trying to point out that Microsoft implicitly follows
this culture. But if so, this culture must have left traces in magazines, journals,
documentation, or examples.
Reading documentation is part of a programmer's real job.

Nothing can force Microsoft to follow this culture.
Even if there were "low-level requirements", nothing prevents a programmer
from using the C function getchar or the assembler instruction "sub" to
implement a character-by-character or word-by-word string comparison.

Your explanation tried to support the algorithm with the idea of a string as an "undivided entity".
In particular, your explanation tries to model a string as a number
and states that the programming language compares numbers, not characters.
That particular model is incorrect. It is a myth that programmers sometimes have
to create in their minds to picture the back end of a system whose low level they don't
know and do not need to know.

Indeed, if we follow this model, "b" = x42, "ab" = x4142 and "ab" > "b"
(xNN is hexadecimal notation). If we try
to fix this flaw in the model by adding Chr(0) at the end, the model still does
not work: compare s1 = "b" and s2 = "b" & Chr(0). If we add Chr(0) to s1 to
make the lengths equal, the model gives s1 >= s2, which is not the case in VB.
(I pointed this out in my previous comment, but it seems to have been ignored.)
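Both failure modes can be checked in Python, assuming one byte per character. Note that "B" is 0x42 and "b" is 0x62. Reading a whole string as one unpadded big-endian number puts "AB" after "B", while character-by-character comparison puts "AB" first; padding "b" with Chr(0) instead makes it equal to "b" & Chr(0), although VB reports "b" as the smaller one. Both helper names are hypothetical.

```python
def as_number(s: str) -> int:
    # Unpadded model: the whole string read as one big-endian number,
    # assuming one byte per character.
    return int.from_bytes(s.encode("latin-1"), "big")


def as_padded_number(s: str, width: int) -> int:
    # Padded model: append Chr(0) up to `width` characters first.
    return as_number(s.ljust(width, "\0"))


print(hex(as_number("B")), hex(as_number("AB")))                  # 0x42 0x4142
print(as_number("AB") > as_number("B"), "AB" > "B")               # True False
print(as_padded_number("b", 2) == as_number("b\0"), "b" < "b\0")  # True True
```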

Finally, there are absolutely no low-level requirements to follow this algorithm.
Indeed, if the string in the memory of a 16-bit Intel-based computer is
   "ABCDE",
then one way to submit this string to the CPU for comparison is to read it
from memory via the computer bus and use the CPU's "subtract" instruction.
In this case, the string is read in the sequence:

  B A   D C   E

Not only is the string split into two-byte fragments, but each fragment is flipped.
And a C or assembler programmer must loop over words; moreover,
the handling inside the loop will be different for a whole word and a half word.
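The point can be sketched in Python by simulating a little-endian 16-bit load with int.from_bytes. Loading two bytes as one word swaps them, so comparing raw words can disagree with character order; swapping each word back and handling a leftover half word restores the expected result. This is a hypothetical illustration of what a word-wise routine would have to do, assuming one byte per character; it is not VB's actual code, and all function names are made up.

```python
def load_word_le(chunk: bytes) -> int:
    # What a little-endian CPU sees when it loads two bytes as one 16-bit word.
    return int.from_bytes(chunk, "little")


def swap16(w: int) -> int:
    # Exchange the low and high bytes of a 16-bit word.
    return ((w & 0xFF) << 8) | (w >> 8)


def word_compare(a: bytes, b: bytes) -> int:
    """Compare two byte strings one 16-bit word at a time; return -1, 0 or 1."""
    n = min(len(a), len(b))
    i = 0
    while i + 2 <= n:  # whole words
        wa = swap16(load_word_le(a[i:i + 2]))
        wb = swap16(load_word_le(b[i:i + 2]))
        if wa != wb:
            return -1 if wa < wb else 1
        i += 2
    if i < n and a[i] != b[i]:  # a leftover half word
        return -1 if a[i] < b[i] else 1
    # Every compared position matched: the shorter string is smaller.
    return (len(a) > len(b)) - (len(a) < len(b))


print(hex(load_word_le(b"AB")))                   # 0x4241: "AB" arrives as "BA"
print(load_word_le(b"AB") < load_word_le(b"BA"))  # False: raw words give the wrong order
print(word_compare(b"ABCDE", b"ABCDF"), word_compare(b"AB", b"BA"))  # -1 -1
```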

To return to my question:
what I expect from an Expert is to find evidence of the algorithm in the literature, or to
disassemble the piece of VB's string comparison code to
reveal the truth (which would still be incidental; a statement from the vendor
is much more reliable).

Thank you.
Beaverstone.



Expert Comment

by:Javin007
ID: 10962671
Well, then I bow out of this one, because, as I've said, I don't think it exists.

-Javin

Expert Comment

by:Javin007
ID: 10962802
By the way, I don't see the logic behind your argument that my explanation of the bit comparison "doesn't work" with the added Chr$(0).

All of the statements are the exact same, with the exact same values, and all are true:

"a" & chr$(0) > a
0110000100000000 > 01100001
 24832 > 97

I have absolutely NO clue what you were talking about when you got into your argument about hexadecimal and flipping string values, but it made no sense to me whatsoever.  Not even logically.

-Javin

Expert Comment

by:Javin007
ID: 10963841
As you can tell, this question has been bugging me.  :)  I hate not getting a satisfactory answer, and it won't leave me alone.

Maybe this is what you're looking for:

http://www.microsoft.com/globaldev/getwr/steps/wrg_sort.mspx

Accepted Solution

by:
Javin007 earned 500 total points
ID: 10964014
Well, I've exhausted the search engines of both Microsoft and Google, and have determined that you won't find an authoritative answer on WHY binary string comparison does what it does.  This would be the equivalent of asking someone to explain why 2+2=4 on a binary level.  People just seem to assume nobody's going to ask that question.  So in answer to your question:

"Where in Microsoft documentation is stated and guaranteed for the future that syntax:"

The answer is simply: nowhere.  The syntax you're asking about is basic binary string comparison.  I've searched high and low through Microsoft's site, and there's no explanation of HOW or WHY binary string comparison works.

The closest you are going to find is the following, where Microsoft quotes from a book ("Faster Smarter Beginning Programming" by Jim Buyens):

>Comparing Strings
>When comparing two strings, Visual Basic .NET starts by comparing the first character of each operand, then the next character of each operand, and so forth, until it finds two unequal characters or until one string runs out of characters.
>
>If it finds two unequal characters, the result of comparing them becomes the result of the entire operation. For example, the string "abcDEF" is less than "abcXA" because D (Unicode 0044) comes before X (Unicode 0058).
>
>If one string runs out of characters before the other, the longer string is greater. Thus, "abcd" is greater than "abc". The string "abc " (which includes a trailing space) is also greater than "abc".
>
>If both strings run out of characters at the same time, then they are equal.

http://www.microsoft.com/mspress/books/sampchap/6189.asp
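The rule in the quoted passage translates almost line for line into code. Below is a minimal Python sketch, assuming ordinal (binary) comparison of character codes; the function name compare_strings is hypothetical. Python's own < on str follows the same rule, so the sketch can be checked against it and against the cases from the test program in the question.

```python
def compare_strings(a: str, b: str) -> int:
    """Return -1, 0 or 1 following the rule described in the quoted passage."""
    for ca, cb in zip(a, b):
        # The first pair of unequal characters decides the whole comparison.
        if ca != cb:
            return -1 if ord(ca) < ord(cb) else 1
    # One string ran out of characters: the longer string is greater;
    # if both ran out together, they are equal.
    return (len(a) > len(b)) - (len(a) < len(b))


print(compare_strings("abcDEF", "abcXA"))  # -1: "D" (U+0044) < "X" (U+0058)
print(compare_strings("abcd", "abc"))      # 1: the longer string is greater
print(compare_strings("abc ", "abc"))      # 1: a trailing space still counts
print(compare_strings("aaa", "aab"))       # -1
```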

-Javin

Author Comment

by:beaverstone
ID: 10964045
Yes, I made a mistake.
I wrote:
"b" = x42, "ab" = x4142 and "ab" > "b".
But I meant
"B" = x42, "AB" = x4142 and "AB" > "B".

In your comment you wrote:

"a" & chr$(0) > a
0110000100000000 > 01100001
 24832 > 97

The string on the left side of the comparison, "a" & Chr$(0) = "a\x00",
is one character longer than
the string on the right side.
According to your algorithm, Chr(0) = "\x00" must be appended
to the shorter string "a" before the comparison is made.
For example, in an earlier comment you wrote
"So if
aaa=011000010110000101100001 and
  aa=011000010110000100000000, then ...",
that is, you appended Chr(0) to the end of "aa".

Thus, before you compare "a" & Chr(0) > "a",
Chr(0) must be appended to the "a" on the right side, which
makes both strings equal; so the algorithm gives "false" for this comparison.

Without appending Chr(0), your algorithm does not work either:

  "B" = x42 = 66 = 1000010 and "AB" = x4142 = 16706 = 100000101000010, so the model gives "AB" > "B",

which is incorrect.


>I have absolutely NO clue what you were talking
>about when you got into your argument about
>hexadecimal and flipping string values, but it
>made no sense to me whatsoever.  Not even logically.

Flipping the byte order of string fragments is a basic part of how the Intel platform operates at the
"low level".

In one of your earlier comments, there is a string:

 "abc" = 011000010110001001100011

which is stored in computer memory as you correctly wrote

 011000010110001001100011

 which is x61 x62 x63  in hexadecimal notation.

When the CPU takes this string for comparison (by the method described in my comment),
the CPU reads a word from memory. A word on a 16-bit platform is a two-byte chunk
of data. The CPU cannot take the chunk "abc"; there is no room in a CPU register to hold that chunk.
So the CPU places the chunk "ab" in one of its registers. But the FACT is that
it flips it. In computer memory this word is "ab" = x6162 = 0110000101100010 = 24930;
in the CPU register, say ax, this chunk is stored as "ba".
The content of register ax is the number ax = x6261 = 0110001001100001 = 25185.

Then the CPU takes the first word of the other string, the one on the right side of the comparison
expression, flips it, and puts it in a register, say bx. Then the program looks like:

      sub ax,bx
      jb  label

This means that the CPU subtracts bx from ax and jumps to "label" depending
on the result (jb jumps when ax was below bx, i.e. when the subtraction borrowed).

You could say that we now use a 32-bit or 64-bit platform.
That is unlikely to change this example in principle; the
CPU will simply take 32-bit words, and the string "abcd" = 1633837924 will be flipped to
"dcba" = 1684234849 before being compared.

Thank you.
Beaverstone.






Expert Comment

by:Javin007
ID: 10966187
Well, in all honesty, I'm a paying member, so I couldn't care less about the points.  But if you don't accept my last answer as the "correct" answer, then you're simply a troll in my book.

-Javin

Author Comment

by:beaverstone
ID: 10979711
Thank you for all your effort Javin.

In addition to the beautiful Visual Basic .NET passage you found,
I looked closely at the "Comparison Operators" section in the VB4 and Visual Studio 6.0 documentation and found:

   MSDN VS6.0 ... Visual Basic Documentation\ Reference\ Language Reference\ Operators
        scroll to the line:
        Both expressions are String | Perform a string comparison.
        open up:
 string comparison
 A comparison of two sequences of characters. Use Option Compare to specify binary or text comparison. In English-U.S., binary comparisons are case sensitive; text comparisons are not.

It says "sequences of characters." This is good enough.

The same reference is:
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/veendf98/html/defstringcomparison.asp

All of your answers (plus my comments) form the entire answer.
The closest comment is chosen as accepted.

Thank you very much.
Beaverstone.