What string comparison does Microsoft allow in VB?

Posted on 2004-04-29
Last Modified: 2010-05-02
Where in Microsoft documentation is it stated, and guaranteed for the future, that the syntax:

  if "aaa" < "aba" then msgbox "I guess"

is allowed and will display "I guess"?

Perhaps it is hidden in some early documentation for VB3 and
carried into later versions through backward compatibility.
Perhaps there are some official examples from Microsoft.

Please don't give answers like "this works". For example, it is known that the
following program works. That is not the question. The question is meant to
find a word from the vendor, which is the most reliable guarantee.

Option Compare Binary
Private Sub Form_Load()

t = "aaaa" < "aaba"
t = t And " aa" < "aaa"
t = t And "aa" > "aA"
t = t And Chr(0) & "aa" < "aaa"
t = t And "aa" < "aaa"
t = t And "" < "a"

f = "aaaa" > "aaba"
f = f Or "aaaa" >= "aaba"
f = f Or "aaa" < "aa" & Chr(0)

If t And Not f Then MsgBox "Works"
'The real output is "Works"

End Sub
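For readers without a VB6 environment at hand, here is a rough Python transcription of the same checks. It assumes that Option Compare Binary behaves like plain code-point-by-code-point comparison; Python's built-in `str` ordering works that way, so it serves only as a stand-in model, not as evidence of what VB guarantees. `run_checks` is a hypothetical name.

```python
def run_checks() -> bool:
    """Mirror the t/f expressions from the VB Form_Load test."""
    nul = chr(0)  # equivalent of VB's Chr(0)
    # Expressions the VB program expects to be True
    t = ("aaaa" < "aaba"
         and " aa" < "aaa"
         and "aa" > "aA"
         and nul + "aa" < "aaa"
         and "aa" < "aaa"
         and "" < "a")
    # Expressions the VB program expects to be False
    f = ("aaaa" > "aaba"
         or "aaaa" >= "aaba"
         or "aaa" < "aa" + nul)
    return t and not f


if run_checks():
    print("Works")
```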

Thank you very much.
Question by: beaverstone

Expert Comment

ID: 10952955

Author Comment

ID: 10953693
Thank you Javin007.

This is very close. But I don't see where VB allows comparing the second and n-th characters
when the first through (n-1)-th characters are equal. All the examples I have seen are restricted to comparing
the first character (e.g. "printer" comes before "scanner" because p is less than s).
But will "aaa" be less than "aab"?

Thank you.

Expert Comment

ID: 10954446
Actually, it's not even comparing the characters, Beaverstone.  What it's doing is taking the bit values of the whole string:

abc = 011000010110001001100011
aaa = 011000010110000101100001

And it orders the string according to the bits, left to right.
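A quick Python check of the two bit patterns above. Comparing the 0/1 strings left to right, as text, gives the same answer as comparing the characters one at a time, because every character occupies exactly eight bits. The `bit_string` helper is hypothetical and assumes single-byte characters (code points below 256).

```python
def bit_string(s: str) -> str:
    """Concatenate the 8-bit binary form of each character (assumes code points < 256)."""
    return "".join(format(ord(ch), "08b") for ch in s)


print(bit_string("abc"))  # 011000010110001001100011
print(bit_string("aaa"))  # 011000010110000101100001

# Reading the two bit strings left to right, the first differing bit decides the order.
print(bit_string("aaa") < bit_string("abc"))  # True
print("aaa" < "abc")                          # True
```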



Expert Comment

ID: 10954458
Side note:

It also treats a missing character as a null character (Chr$(0), or 00000000).

So if
aaa=011000010110000101100001 and
  aa=011000010110000100000000, then

aa < aaa


Expert Comment

ID: 10954509
God this thing needs an edit function...

Rereading what I wrote, I noticed that this could be confusing:  "And it orders the string according to the bits, left to right."

I don't mean it orders the string according to the physical bits, but according to the total value of the bits.  This is why capital letters are considered "smaller" than their lower-case counterparts.  Thus "a" > "AA".

Strangely enough, Microsoft's own list boxes and controls don't use this method to alphabetize.
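In code-point (binary) order, every upper-case ASCII letter (65 to 90) comes before every lower-case one (97 to 122), so the first character alone decides "a" versus "AA". A Python illustration, assuming binary order matches code-point order for ASCII text:

```python
words = ["b", "AA", "a", "B"]

# Binary order: all upper-case letters sort ahead of all lower-case letters.
print(sorted(words))       # ['AA', 'B', 'a', 'b']
print(ord("A"), ord("a"))  # 65 97

# Decided at position 0: 'a' (97) > 'A' (65), regardless of string length.
print("a" > "AA")          # True
```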


Expert Comment

ID: 10954531
:/  I'm not explaining this very well.  It made more sense to me before I tried to explain it.


Expert Comment

ID: 10954562
"Option Compare Binary results in string comparisons based on a sort order derived from the internal binary representations of the characters. In Microsoft Windows, sort order is determined by the code page. A typical binary sort order is shown in the following example:"

I think that's pretty much all you're going to find Microsoft saying on the subject.  But suffice it to say that comparisons like the ones you were doing are binary comparisons by default.  Maybe someone else can explain it better.  The concept itself is very simple (and basic to most languages; this isn't just a Microsoft thing), but the result is that "aab" will always be greater than "aaa".
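The quoted sentence says the binary sort order is determined by the code page. A small Python sketch makes that concrete, assuming "binary order" means comparing the bytes a string encodes to in a single-byte code page (VB6 itself stores strings as Unicode internally, so this is a model, not VB's code). In Windows-1252, é is 0xE9 and ü is 0xFC; in the old DOS code page 437, é is 0x82 and ü is 0x81, so the same pair sorts the other way. `binary_less` is a hypothetical helper.

```python
def binary_less(a: str, b: str, codepage: str) -> bool:
    """Compare two strings by the bytes they encode to in the given code page."""
    return a.encode(codepage) < b.encode(codepage)


for cp in ("cp1252", "cp437"):
    print(cp, binary_less("é", "ü", cp))
# cp1252 True   (é = 0xE9, ü = 0xFC)
# cp437 False   (é = 0x82, ü = 0x81)

print(binary_less("aaa", "aab", "cp1252"))  # True: plain ASCII is the same in both
```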


Author Comment

ID: 10954982
Thank you, Javin, you are doing more work than asked.

The question is not about how Microsoft implements string comparison.
The question is not about how to describe the string comparison algorithm in
the equivalent language of "bytes" or some other representation.

The question is to find a word from Microsoft that it will work as stated in the question.
Why does Microsoft not offer an example where the second or n-th characters are compared?
Why can't you seem to find such an example in MSDN, or in the VB3, VB4, or VB6 documentation?
All the examples I have seen are restricted to comparing the first character
(e.g. "printer" comes before "scanner" because p is less than s).
Perhaps "aaa" is less than "aab" only accidentally, because some Microsoft programmer
did extra work that is not documented and not intended to be supported?

Although the particular implementation of the algorithm is not the subject of the question,
for accuracy let me note that adding a Chr(0) at the end seems incorrect, as the following shows:

If "aa" < "aa" & Chr(0) Then MsgBox "Not At the end."
If "aa" > Chr(0) & "aa" Then MsgBox "Not At the beginning"
If Chr(0) > "" Then MsgBox "Not chr(0)"

Perhaps the rule is that any character at the end is greater than an unfilled position at the end.
But again, does MS guarantee these details along with the entire algorithm?
The only reason we can rely on this is that "common sense" suggests this algorithm,
and MS may be forced to follow common practice in order to be "in the crowd".
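The three MsgBox tests above can be replayed in Python against two models: padding the shorter string with Chr(0) before comparing (the model from the earlier comment), and plain lexicographic order in which a proper prefix sorts first (Python's own `str` ordering, used here as a stand-in for VB's binary compare). Both function names are illustrative only.

```python
NUL = chr(0)


def padded_less(a: str, b: str) -> bool:
    """Hypothetical model: pad the shorter string with Chr(0), then compare."""
    width = max(len(a), len(b))
    return a.ljust(width, NUL) < b.ljust(width, NUL)


def lexicographic_less(a: str, b: str) -> bool:
    """Code-point comparison in which a proper prefix sorts first."""
    return a < b


cases = [("aa", "aaa"), ("aa", "aa" + NUL), ("", NUL)]
for a, b in cases:
    # The two models agree on the first pair and disagree on the other two.
    print(repr(a), repr(b), padded_less(a, b), lexicographic_less(a, b))
```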

Thank you.


Expert Comment

ID: 10955522
I don't believe you're going to find ANY documentation on this, though, because it's not a Microsoft implementation.  It's something that comes with any language you use: C++, C#, Delphi, BASICA, etc.  When a language looks at a string, it sees an array of bytes, not a string.  More accurately, it sees a very long string of bits.  Thus, the language DOESN'T see "abc" when you compare it.  What it sees is the bit-value equivalent of "abc", which is 6382179 (0x616263).  That's why I was trying to explain how it works, and why you won't find the "guarantee" you're wanting.  Microsoft would spend no more time explaining that function than they would spend explaining the low-level details of what AND does.
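Reading the three bytes of "abc" as one integer gives different values depending on byte order. This is plain Python arithmetic, not a description of what VB does internally: the big-endian reading (first character most significant, matching the bit string written out earlier in the thread) gives 6382179, while the little-endian reading, which is what an x86 load produces, gives 6513249.

```python
data = "abc".encode("ascii")              # bytes 0x61 0x62 0x63, in memory order
big = int.from_bytes(data, "big")         # first character is most significant
little = int.from_bytes(data, "little")   # first character is least significant

print(hex(big), big)        # 0x616263 6382179
print(hex(little), little)  # 0x636261 6513249
```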


Author Comment

ID: 10962633
Thank you, Javin, for your comments.

The idea in your explanation,
"It's something that comes with any language you use", is perhaps right.
I understand the idea that there is a certain "unspoken agreement" or
"programming culture" of generalizing the string comparison algorithm to all the characters,
not only the first. You are probably trying to point out that Microsoft implicitly follows
this culture. But if so, this culture must have left traces in magazines, journals,
documentation, or examples.
This is part of a programmer's real job: reading documentation.

Nothing forces Microsoft to follow this culture.
Even if there were "low-level requirements", nothing prevents a programmer
from using the C getchar function, or the assembler "sub" instruction, to
implement character-by-character or word-by-word string comparison.

Your explanation tries to support the algorithm with the idea of a string as an "undivided entity".
In particular, your explanation models a string as a number
and states that the programming language compares numbers, not characters.
That particular model is incorrect. It is a myth that programmers sometimes have
to create in their minds to picture the back end of a system whose low level they don't
know and do not need to know.

Indeed: if we follow this model, "b" = x42, "ab" = x4142 and "ab" > "b"
(xNN is hexadecimal notation). If we try
to fix this flaw in the model by adding chr(0) at the end, the model still does
not work: compare s1 = "b" and s2 = "b" & chr(0). If chr(0) is added to s1 to
make it equal in length, the model gives s1 >= s2, which is not the case in VB.
(I pointed this out in my previous comment, but it seems to have been ignored.)
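The objection to the "string as one number" model can be checked with the pair "B" and "AB" (upper case, so that 'B' really is 0x42). The hypothetical `as_number` helper reads the string big-endian, one byte per character; Python's own `str` comparison stands in for character-by-character order.

```python
def as_number(s: str) -> int:
    """Hypothetical 'string as one big number' model (big-endian, one byte per char)."""
    return int.from_bytes(s.encode("ascii"), "big")


a, b = "AB", "B"
print(hex(as_number(a)), as_number(a))  # 0x4142 16706
print(hex(as_number(b)), as_number(b))  # 0x42 66
print(as_number(a) > as_number(b))      # True: the number model says "AB" > "B"
print(a > b)                            # False: character order says "AB" < "B"
```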

Finally, there are absolutely no low-level requirements to follow this algorithm.
Indeed: if a string in the memory of a 16-bit Intel-based computer is A B C D E,
then one way to submit this string to the CPU for comparison is to read it
from memory via the computer bus and use the CPU instruction "subtract".
In this case, the string is read in the sequence:

  B A   D C   E  

Not only is the string split into two-byte fragments, each fragment is flipped.
And a C or assembler programmer must loop over words; moreover,
the handling inside the loop will differ for a full word and a half word.
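The memory-versus-register picture can be sketched in Python with the standard `struct` module, assuming a little-endian 16-bit load (format `"<H"`). This does not disassemble VB; it only shows that, read as numbers, each loaded word has its two characters in swapped order of significance, and that an odd trailing byte needs separate handling. `register_view` is a hypothetical helper.

```python
import struct


def register_view(s: str) -> list[str]:
    """Show how the bytes of s land in 16-bit registers on a little-endian CPU.

    Each full pair of bytes is loaded as one word; reading that word's value from
    its high byte to its low byte shows the two characters swapped. A trailing odd
    byte would need separate, byte-sized handling, so it is kept on its own.
    """
    data = s.encode("ascii")
    even = len(data) - len(data) % 2
    chunks = []
    for (word,) in struct.iter_unpack("<H", data[:even]):
        high, low = divmod(word, 256)
        chunks.append(chr(high) + chr(low))
    if even < len(data):
        chunks.append(chr(data[-1]))
    return chunks


print(register_view("ABCDE"))  # ['BA', 'DC', 'E']
```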

I am returning to my question:
what I expect from an Expert is to find evidence of the algorithm in the literature, or to
disassemble the piece of VB string comparison code to
reveal the truth (which would still be incidental; the word from the vendor
is much more reliable).

Thank you.


Expert Comment

ID: 10962671
Well, then I bow out of this one.  Because as I've said, I don't think it exists.


Expert Comment

ID: 10962802
By the way, I don't see your logic behind the argument that my explanation of the bits comparison "doesn't work" with the added Chr$(0)

All of the statements are the exact same, with the exact same values, and all are true:

"a" & chr$(0) > a
0110000100000000 > 01100001
 24832 > 97

I have absolutely NO clue what you were talking about when you got into your argument about hexadecimals and flipping string values, but it made no sense to me whatsoever.  Not even logically.


Expert Comment

ID: 10963841
As you can tell, this question has been bugging me.  :)  I hate not getting a satisfactory answer, and it won't leave me alone.

Maybe this is what you're looking for:

Accepted Solution

Javin007 earned 500 total points
ID: 10964014
Well, I've exhausted the search engines at both Microsoft and Google, and have determined that you won't find an authoritative answer on WHY binary string comparison does what it does.  This would be the equivalent of asking someone to explain why 2+2=4 on a binary level.  People just seem to assume nobody's going to ask that question.  So, in answer to your question:

"Where in Microsoft documentation is stated and guaranteed for the future that syntax:"

The answer is simply: nowhere.  The syntax you're asking about is basic binary string comparison.  I've searched high and low through Microsoft's documentation, and there's no explanation of HOW or WHY binary string comparison works.

The closest you are going to find is the following, where Microsoft quotes from a book ("Faster Smarter Beginning Programming" by Jim Buyens):

>Comparing Strings
>When comparing two strings, Visual Basic .NET starts by comparing the first character of each operand, then the next character of each operand, and so forth, until it finds two unequal characters or until one string runs out of characters.
>If it finds two unequal characters, the result of comparing them becomes the result of the entire operation. For example, the string "abcDEF" is less than "abcXA" because D (Unicode 0044) comes before X (Unicode 0058).
>If one string runs out of characters before the other, the longer string is greater. Thus, "abcd" is greater than "abc". The string "abc " (which includes a trailing space) is also greater than "abc".
>If both strings run out of characters at the same time, then they are equal.
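The four rules in the quoted passage translate almost line for line into code. This Python sketch is written from the quote (which describes Visual Basic .NET), uses Unicode code points as the character order, and `compare_strings` is an illustrative name, not a VB API.

```python
def compare_strings(a: str, b: str) -> int:
    """Return -1, 0 or 1, following the character-by-character rules quoted above."""
    for ca, cb in zip(a, b):
        if ca != cb:
            # The first unequal pair decides the whole comparison.
            return -1 if ord(ca) < ord(cb) else 1
    # One string ran out first: the longer string is greater.
    if len(a) != len(b):
        return -1 if len(a) < len(b) else 1
    # Both ran out at the same time: equal.
    return 0


print(compare_strings("abcDEF", "abcXA"))  # -1: D (0044) comes before X (0058)
print(compare_strings("abcd", "abc"))      # 1
print(compare_strings("abc ", "abc"))      # 1: the trailing space still counts
print(compare_strings("abc", "abc"))       # 0
```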


Author Comment

ID: 10964045
Yes, I made a mistake:
I wrote
"b" = x42, "ab" = x4142 and "ab" > "b".
But I meant
"B" = x42, "AB" = x4142 and "AB" > "B".

In your comment you wrote:

"a" & chr$(0) > a
0110000100000000 > 01100001
 24832 > 97

The string on the left side of the comparison, "a" & chr$(0) = "a\0x00",
is one character longer than
the string on the right side.
According to your algorithm, chr(0) = "\0x00" must be added
to the shorter string "a" before making the comparison.
For example, in an earlier comment you wrote
"So if
aaa=011000010110000101100001 and
  aa=011000010110000100000000, then ...",
that is, you added chr(0) to the end of "aa".
Thus, before comparing "a" & chr(0) > "a",
chr(0) must be added to "a" on the right side, which
makes both strings equal; so the algorithm gives "false" for this comparison.

Without adding chr(0), your algorithm does not work either:

  "B" = x42 = 66 = 1000010, "AB" = x4142 = 16706 = 100000101000010, and so "AB" > "B", which is the opposite of what VB returns.


>I have absolutely NO clue what you were talking
>about when you got into your argument about
>hexadecimals and flipping string values, but it
>made no sense to me whatsoever.  Not even logically.

Flipping string fragment values is a basic part of how the Intel platform operates in

In your previous comment, there is a string:

 "abc" = 011000010110001001100011

which is stored in computer memory as you correctly wrote


 which is x61 x62 x63  in hexadecimal notation.

When the CPU takes this string for comparison (by the method described in my comment),
it reads a word from memory. A word on a 16-bit platform is a two-byte chunk
of data. The CPU cannot take the chunk "abc"; there is no room in a CPU register to hold it.
So the CPU places the chunk "ab" in one of its registers. But the FACT is that
it flips it: in computer memory this word is "ab" = x6162 = 0110000101100010 = 24930,
while in a register, say AX, this chunk is stored as "ba".
The content of register AX is the number x6261 = 0110001001100001 = 25185.

Then the CPU takes the first word of the other string, on the right side of the comparison
expression, flips it, and puts it in a register, say BX. Then the program looks like:

      sub ax,bx
      jb  label    ; taken when ax was below bx (unsigned borrow)

This means that the CPU subtracts BX from AX and jumps to "label" depending
on the result.
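A Python sketch of that subtract-and-branch compare, applied to the first word of each string. It assumes both strings have at least two ASCII characters and ignores the rest; `naive_word_compare` is a hypothetical name. The point is only that subtracting little-endian words compares the second character before the first, so a word-based routine would have to swap the bytes back (or fall back to byte-by-byte comparison) to get character order.

```python
import struct


def naive_word_compare(a: str, b: str) -> int:
    """Hypothetical compare of the first 16-bit word of each string, loaded
    little-endian and subtracted (like 'sub ax,bx'); returns the sign of ax - bx."""
    ax = struct.unpack("<H", a.encode("ascii")[:2])[0]
    bx = struct.unpack("<H", b.encode("ascii")[:2])[0]
    diff = ax - bx
    return (diff > 0) - (diff < 0)


print(hex(struct.unpack("<H", b"ab")[0]))  # 0x6261 (25185)
print(naive_word_compare("ab", "ba"))      # 1: the register values say "ab" > "ba"
print(("ab" > "ba") - ("ab" < "ba"))       # -1: character order says "ab" < "ba"
```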

You could say that we now use 32-bit or 64-bit platforms.
That is unlikely to change this example in principle; rather, the
CPU will take 32-bit words, and the string "abcd" = 1633837924 will be flipped to
"dcba" = 1684234849 and then compared.

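The two 32-bit figures for "abcd" can be reproduced with Python's `int.from_bytes`: the big-endian reading follows memory order, and the little-endian reading is the value a 32-bit register would hold after an x86 load. This only checks the arithmetic; it says nothing about how VB itself compares strings.

```python
data = b"abcd"                            # 0x61 0x62 0x63 0x64 in memory order
big32 = int.from_bytes(data, "big")       # 0x61626364, memory order
little32 = int.from_bytes(data, "little") # 0x64636261, as loaded into a 32-bit register

print(big32)     # 1633837924
print(little32)  # 1684234849
```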
Thank you.


Expert Comment

ID: 10966187
Well, in all honesty, I'm a paying member, so I couldn't care less about the points.  But if you don't accept my last answer as the "correct" answer, then you're simply a troll in my book.


Author Comment

ID: 10979711
Thank you for all your effort, Javin.

In addition to this beautiful fragment about Visual Basic .NET which you discovered,
I looked closely at the section "Comparison Operators" in the VB4 and Visual Studio 6.0 documentation and found:

   MSDN VS6.0 ... Visual Basic Documentation\ Reference\ Language Reference\ Operators
        scroll to the line:
        Both expressions are String    Perform a string comparison.
        open up:
 string comparison
 A comparison of two sequences of characters. Use Option Compare to specify binary or text comparison. In English-U.S., binary comparisons are case sensitive; text comparisons are not.

It says "sequences of characters." This is good enough.

The same reference is:

All of your answers (plus my comments) form the complete answer.
The closest comment has been chosen as the accepted answer.

Thank you very much.
