An example of Operator Overloading in Delphi XE2

developmentguru, President
In my programming career I have only rarely run into situations where operator overloading would be of any use in my work.  Normally those situations involved math with either very large numbers (hundreds of thousands of digits of accuracy required) or matrix math.  Since I rarely need either, operator overloading has remained one of those areas that I thought would be interesting to explore someday.

I recently had another coding requirement.  I mention it here because that need collided with the idea of operator overloading and made me consider it for a reason that was entirely new to me.  I am writing a language parser.  This involves reading in characters one at a time and breaking them into tokens.  A big part of my coding along these lines, in the past, has been to make use of a Set Of Char.  This allows me to write very naturally flowing, easy to read code such as:

function SkipWhiteSpace(var C : PChar) : boolean;
const
  WhiteSpace = [#9, #10, #13, ' '];

begin
  While (C^ <> #0) and (C^ in WhiteSpace) do
    inc(C);
  Result := C^ <> #0;  //fail if at the end
end;

This is simply an example of how things might have looked.  There are several things wrong with this code since Delphi 2009 introduced Unicode strings.

If you use a set of characters to do the same kind of test on characters in the newer versions of Delphi, you get a warning:

W1050 WideChar reduced to byte char in set expressions.  Consider using 'CharInSet' function in 'SysUtils' unit.

To avoid this, you could change:

(C^ in WhiteSpace)

to

(CharInSet(C^, WhiteSpace))

Different varieties of Unicode strings may contain characters that are 1-4 bytes wide.  For this reason, instead of

Inc(C);

you could use
C := StrNextChar(C);
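
Putting both changes together, a minimal sketch of the updated routine might look like the following. It assumes a Unicode-enabled compiler (Delphi 2009 or later) and uses only CharInSet and StrNextChar from SysUtils; the surrounding demo program and its test string are illustrative additions, not part of the original code.

```pascal
program SkipWhiteSpaceDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Same routine as before, with the two Unicode-related changes applied.
function SkipWhiteSpace(var C: PChar): Boolean;
const
  WhiteSpace: TSysCharSet = [#9, #10, #13, ' '];
begin
  while (C^ <> #0) and CharInSet(C^, WhiteSpace) do
    C := StrNextChar(C);   // advance by one character rather than one code unit
  Result := C^ <> #0;      // fail if at the end
end;

var
  S: string;
  P: PChar;
begin
  S := '  '#9'abc';        // two spaces and a tab before the text (illustrative)
  P := PChar(S);
  if SkipWhiteSpace(P) then
    Writeln('Stopped at: ', P^)
  else
    Writeln('Nothing but white space');
end.
```

Note that the set constant itself is still a set of AnsiChar, so this removes the warning without lifting the 256-element limit.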

The code could also have been written to make use of streams, but keep in mind that it is written to focus on the issue at hand.  While there are valid concerns about the code, try to keep focused on where I am heading and leave the analysis for later.

A Delphi set type is limited to 256 elements due to the fact that each element is represented by one bit (resulting in a 32 byte type).  With 65,536 possibilities in a 2 byte character this would result in an 8k set variable since a set requires all of the space, even when empty.
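
The arithmetic can be checked with a few lines. This is only a sketch: TAnsiCharSet is a type name introduced here for illustration, and the second figure is computed rather than measured, because the compiler will not accept a set with 65,536 elements.

```pascal
program SetSizeDemo;
{$APPTYPE CONSOLE}

type
  TAnsiCharSet = set of AnsiChar;   // 256 possible elements, one bit each

begin
  // 256 bits / 8 bits per byte = 32 bytes
  Writeln('set of AnsiChar occupies ', SizeOf(TAnsiCharSet), ' bytes');
  // A hypothetical set over every 2-byte character value:
  // 65,536 bits / 8 bits per byte = 8,192 bytes
  Writeln('a 65,536-element set would need ', (High(Word) + 1) div 8, ' bytes');
end.
```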

While the new notation works, I find it makes the code less natural to read.  Doing things the old way generates warnings, which you could ignore, but it also limits the sets to Ansi (8-bit) characters.  I suppose I am like most people in that I wanted to have my cake and eat it too.

To that end I decided to use operator overloading to create a new kind of set.  The idea was to use a record that stores a string containing the characters that are in the set while still supporting the normal set operations.  You start by declaring a normal record type.  In this case I chose not to start the name with the usual capital T, since this is intended to act as a base type and is not a class.

type
  WideCharSet = record
    Chars : string;
    class operator Add(a, b: WideCharSet): WideCharSet;
    class operator Implicit(a : WideCharSet) : string;
    class operator Implicit(a : string) : WideCharSet;
  end;

I started with a string member variable called Chars to hold the list of characters in my set.  The implicit class operators allow me to assign back and forth between a string and a WideCharSet.  I decided to start with the Add class operator as my first test.  When assigning from a string I did not want to just copy the string's characters into the Chars member variable.  I wanted to assume I was assigning a string containing something like a normal set declaration.  To make it easier to work with, I allowed the user to denote characters with single or double quotes.  Here is the code for the Implicit conversion from string to WideCharSet.  The full source code is at the end of the article so, for now, don't worry about any missing code.

class operator WideCharSet.Implicit(a : string) : WideCharSet;
var
  I : integer;
  C, D, E : Char;

begin
  Result.Chars := '';
  //have to have at least []
  assert(Length(a) >= 2, 'Invalid string used in convert to SetOfWideChar');

  I := 1;
  SkipWhiteSpace(a, I);
  assert(a[I] = '[', 'Invalid string used in convert to SetOfWideChar, ' +
    'missing opening square bracket');

  inc(I);

  while (I < Length(a)) and (a[I] <> ']') do
    begin
      C := ReadChar(a, I);
      SkipWhiteSpace(a, I);
      if a[I] = '.' then
        begin
          inc(I);
          if I > Length(a) then
            raise Exception.Create(InvalidSetOfWideCharString);
          if a[I] <> '.' then
            raise Exception.Create(InvalidSetOfWideCharString);
          inc(I);
          D := ReadChar(a, I);

          for E := C to D do
            InsertSorted(E, Result.Chars);
        end
      else
        InsertSorted(C,Result.Chars);

      if a[I] = ',' then
        begin
          inc(I);
          if I > Length(a) then
            raise Exception.Create(InvalidSetOfWideCharString);
        end;
    end;

  Assert(I <= Length(A), InvalidSetOfWideCharString);
  Assert(a[I] = ']', InvalidSetOfWideCharString);
end;

There is quite a bit to talk about in this method.  I used assertions to enforce the style of the string passed into the conversion routine, that way they could be compiled out easily.  I included the skipping of white space to make it more friendly for developers to use.  Keep in mind that this is a first attempt.  I may change the assertions to exceptions if I find that to be more useful in some situations.

I am doing some parsing in this method.  This allows me to be flexible regarding the precise format of the string being passed in.  To that end the class includes methods such as SkipWhiteSpace and ReadChar.  SkipWhiteSpace just moves past any spaces, tabs, carriage returns, and line feeds.  ReadChar does a bit more than it might seem.  It reads in the definition of a character as defined in a string.  This means either a single character in quotes, a # followed by a decimal number, or #$ followed by a hex definition.  I later realized that just adding the characters to the end of the Chars member variable would not allow me to compare sets.  Realizing this, I added the method InsertSorted.  If all of the characters are added in a sorted manner then, no matter how the set was built, comparisons should work accurately.
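
To make the three accepted character forms concrete, here is a rough sketch of what a routine like ReadChar could look like. It is not the code from uWideCharSet.pas: ReadCharSketch, its error messages, and the decision to skip leading white space inside the routine are all assumptions made for illustration.

```pascal
program ReadCharDemo;
{$APPTYPE CONSOLE}

uses
  SysUtils;

// Hypothetical stand-in for ReadChar. Reads one character definition from S
// starting at index I and leaves I just past it. Accepted forms:
// 'x' or "x", #65 (decimal), #$41 (hex).
function ReadCharSketch(const S: string; var I: Integer): Char;
var
  Quote: Char;
  Start: Integer;
  IsHex: Boolean;
begin
  // Assumption: leading white space is skipped here.
  while (I <= Length(S)) and CharInSet(S[I], [#9, #10, #13, ' ']) do
    Inc(I);
  if I > Length(S) then
    raise Exception.Create('Character definition expected');

  if CharInSet(S[I], ['''', '"']) then
  begin
    // Opening quote, one character, matching closing quote.
    Quote := S[I];
    if (I + 2 > Length(S)) or (S[I + 2] <> Quote) then
      raise Exception.Create('Unterminated character literal');
    Result := S[I + 1];
    Inc(I, 3);
  end
  else if S[I] = '#' then
  begin
    Inc(I);
    Start := I;
    IsHex := (I <= Length(S)) and (S[I] = '$');
    if IsHex then
      Inc(I);
    while (I <= Length(S)) and
      (CharInSet(S[I], ['0'..'9']) or
       (IsHex and CharInSet(S[I], ['A'..'F', 'a'..'f']))) do
      Inc(I);
    // StrToInt accepts the '$' hex prefix and raises EConvertError
    // if no digits were found.
    Result := Char(StrToInt(Copy(S, Start, I - Start)));
  end
  else
    raise Exception.Create('Character definition expected');
end;

var
  Def: string;
  P: Integer;
  C: Char;
begin
  Def := '#$41, "b", #10';          // illustrative input
  P := 1;
  C := ReadCharSketch(Def, P);
  Writeln('First code point: ', Ord(C));
  Inc(P);                           // step over the comma
  C := ReadCharSketch(Def, P);
  Writeln('Second character: ', C);
end.
```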

Of course I could not define the class to accept strings in and not allow the set to be assigned to a string.  This assignment would require the parsing of the current set of characters to create a minimal definition for the set.  Here is the code for the implicit assignment of a WideCharSet to a string.

class operator WideCharSet.Implicit(a : WideCharSet) : string;
var
  Line : string;
  I : integer;
  FirstChar, LastChar : Char;
  Start : integer;
  R1, R2 : string; //possible range representations for characters

begin
  Line := '';

  {Start at the first character and see how far we can go until we run into a
   non sequential character}
  I := 1;
  //<= rather than <, so a final character that is not part of a run
  //(or a set holding a single character) is not dropped
  While I <= Length(a.Chars) do
    begin
      Start := I;
      FirstChar := a.Chars[I];
      repeat
        inc(I);
      until (I > Length(a.Chars)) or
        (ord(a.Chars[I]) <> Ord(FirstChar) + I - Start);

      LastChar := a.Chars[Pred(I)];

      if Line <> '' then
        Line := Line + ', ';

      R1 := CharToStringRep(FirstChar);
      Line := Line + R1;

      //more than one character in the run, so write it as a range
      if I > Start + 1 then
        begin
          R2 := CharToStringRep(LastChar);
          Line := Line + '..' + R2;
        end;
    end;

  Result := '[' + Line + ']';
end;

The only piece of this code I would think might need some extra explanation is CharToStringRep.  Not all characters are legible or identifiable as they are.  Some work much better as a literal numeric definition.  This function returns the character as it should be represented.  The first part inside the loop is used to find the end of a run of characters.  If no run is found then the character is added to the line.  If a run was found then it adds the first character, '..' and the last character to the line.  The process is repeated until all characters have been accounted for.

Next I wanted to be able to test for membership with the IN operator.  

class operator WideCharSet.In(a : Char; b : WideCharSet) : Boolean;
begin
  Result := SortedPos(a, b.Chars) <> 0;
end;

While I could have used the Pos function, I thought it made more sense to use the same CharSearch function I use when adding to a set.  To that end I added the SortedPos function, which acts just like the Pos function except that it works on sorted strings of characters, using a binary search.  On large sets it should be a lot faster.  On smaller sets the speed should be similar.  If you know that your code will only be using extremely small sets you might want to switch it to use the Pos function.  Since the CharSearch function is intended to return either the index of the character or the index where it should be inserted, SortedPos needed to take a couple of extra precautions (checking that the index was not past the end of the string, and that the character found there matched).
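
The relationship between the three helpers can be sketched as follows. These are not the routines from uWideCharSet.pas: the names carry a Sketch suffix to mark them as hypothetical, and the lower-bound style of binary search is simply one way to meet the description above.

```pascal
program SortedSearchDemo;
{$APPTYPE CONSOLE}

// Hypothetical stand-in for CharSearch: binary search over a sorted string.
// Returns the index of C if present, otherwise the index at which C would
// have to be inserted to keep S sorted (anywhere from 1 to Length(S) + 1).
function CharSearchSketch(C: Char; const S: string): Integer;
var
  First, Last, Middle: Integer;
begin
  First := 1;
  Last := Length(S);
  while First <= Last do
  begin
    Middle := (First + Last) div 2;
    if S[Middle] < C then
      First := Middle + 1
    else
      Last := Middle - 1;
  end;
  Result := First;
end;

// Like Pos for a single character, but relies on S being sorted.
function SortedPosSketch(C: Char; const S: string): Integer;
begin
  Result := CharSearchSketch(C, S);
  // The two precautions: index past the end, or a different character there.
  if (Result > Length(S)) or (S[Result] <> C) then
    Result := 0;
end;

// Adds C at its sorted position unless it is already present.
procedure InsertSortedSketch(C: Char; var S: string);
var
  Idx: Integer;
begin
  Idx := CharSearchSketch(C, S);
  if (Idx > Length(S)) or (S[Idx] <> C) then
    Insert(string(C), S, Idx);
end;

var
  S: string;
begin
  S := '';
  InsertSortedSketch('c', S);
  InsertSortedSketch('a', S);
  InsertSortedSketch('b', S);
  InsertSortedSketch('a', S);   // duplicate, leaves S unchanged
  Writeln(S);
  Writeln(SortedPosSketch('b', S));
  Writeln(SortedPosSketch('z', S));
end.
```

Because every insertion goes through the same search, two sets built in different orders end up with identical Chars strings, which is what makes a plain string comparison usable for equality.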

At this point I can already do some basic code using my new type.

Procedure MyTest;
var
  Numeric : WideCharSet;
  WhiteSpace : WideCharSet;
  C : Char;

begin
  WhiteSpace := '[#9..#10, #13, " "]';
  Numeric := '["0".."9"]';
  C := '6';
  if C in WhiteSpace then
    begin
      //obviously not
    end;

  if C in Numeric then
    begin
      //Bingo
    end;
end;

Here you can see assignment from strings as well as tests for membership.

From this point, the remainder of the work is in implementing more of the operator overloading class methods.  The ones we still need in order to do most of the same work as a Delphi set of char would be:

class operator Add(a, b: WideCharSet): WideCharSet;
class operator Subtract(a, b: WideCharSet): WideCharSet;
class operator Multiply(a, b : WideCharSet) : WideCharSet;
class operator Equal(a, b: WideCharSet) : Boolean;
class operator LessThanOrEqual(a, b : WideCharSet) : Boolean;
class operator GreaterThanOrEqual(a, b : WideCharSet) : Boolean;
class operator NotEqual(a, b : WideCharSet) : Boolean;

Here is a more complete test of the functionality.

Procedure MyTest;
var
  Numeric : WideCharSet;
  WhiteSpace : WideCharSet;
  LowerCaseAlpha : WideCharSet;
  UpperCaseAlpha : WideCharSet;
  Alpha : WideCharSet;
  AlphaNumeric : WideCharSet;
  CommonNumericAlphaNumeric : WideCharSet;
  CommonUpperAndAlphaNumeric : WideCharSet;
  AlphaNumericMinusNumeric : WideCharSet;
  AlphaNumericDefinition : string;

begin
  WhiteSpace := '[#9..#10, #13, " "]';
  Numeric := '["0".."9"]';
  LowerCaseAlpha := '["a".."z"]';
  UpperCaseAlpha := '["A".."Z"]';
  Alpha := LowerCaseAlpha + UpperCaseAlpha;
  AlphaNumeric := Alpha + Numeric;
  CommonNumericAlphaNumeric := Numeric * AlphaNumeric;
  CommonUpperAndAlphaNumeric := UpperCaseAlpha * AlphaNumeric;
  AlphaNumericMinusNumeric := AlphaNumeric - Numeric;

  AlphaNumericDefinition := AlphaNumeric + '["&"]';
end;

I will leave it to the reader to go over the individual methods as they are relatively common programming.
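
For readers who want a starting point before opening the full source, here is a minimal sketch of how three of these operators could be written over a sorted string. It is an illustration under stated assumptions, not the code from uWideCharSet.pas: WideCharSetSketch is a cut-down hypothetical record, and it assumes Chars is always sorted and free of duplicates.

```pascal
program SetOperatorDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical, cut-down record used only for this illustration.
  WideCharSetSketch = record
    Chars: string;   // assumed sorted, no duplicates
    class operator Add(a, b: WideCharSetSketch): WideCharSetSketch;
    class operator Multiply(a, b: WideCharSetSketch): WideCharSetSketch;
    class operator Equal(a, b: WideCharSetSketch): Boolean;
  end;

// Union: merge two sorted strings, emitting shared characters once.
class operator WideCharSetSketch.Add(a, b: WideCharSetSketch): WideCharSetSketch;
var
  I, J: Integer;
begin
  Result.Chars := '';
  I := 1;
  J := 1;
  while (I <= Length(a.Chars)) and (J <= Length(b.Chars)) do
  begin
    if a.Chars[I] < b.Chars[J] then
    begin
      Result.Chars := Result.Chars + a.Chars[I];
      Inc(I);
    end
    else if a.Chars[I] > b.Chars[J] then
    begin
      Result.Chars := Result.Chars + b.Chars[J];
      Inc(J);
    end
    else
    begin
      Result.Chars := Result.Chars + a.Chars[I];
      Inc(I);
      Inc(J);
    end;
  end;
  // One input is exhausted; append whatever is left of the other.
  Result.Chars := Result.Chars + Copy(a.Chars, I, MaxInt) + Copy(b.Chars, J, MaxInt);
end;

// Intersection: keep the characters of a that also occur in b.
class operator WideCharSetSketch.Multiply(a, b: WideCharSetSketch): WideCharSetSketch;
var
  I: Integer;
begin
  Result.Chars := '';
  for I := 1 to Length(a.Chars) do
    if Pos(a.Chars[I], b.Chars) > 0 then
      Result.Chars := Result.Chars + a.Chars[I];
end;

// Equality: valid as a plain string comparison only because Chars is sorted.
class operator WideCharSetSketch.Equal(a, b: WideCharSetSketch): Boolean;
begin
  Result := a.Chars = b.Chars;
end;

var
  X, Y, U, N: WideCharSetSketch;
begin
  X.Chars := 'abc';
  Y.Chars := 'bcd';
  U := X + Y;
  N := X * Y;
  Writeln(U.Chars);
  Writeln(N.Chars);
  Writeln(X + Y = Y + X);
end.
```

Multiply uses Pos here for brevity; a sorted merge like the one in Add, or a binary search, would be the natural choice for large sets.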

One thing this type of set cannot do (when compared to a Delphi set) is work with the built-in Include and Exclude procedures.  Those could be added as methods on the record if the functionality is desired.  This would not produce code identical to the set of char version, but it should be familiar enough to provide the functionality well.  I will leave that as an exercise for the reader.
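
As a hint for that exercise, one possible shape is sketched below. Everything in it is an assumption made for illustration: WideCharSetSketch is a hypothetical cut-down record, and the linear scans are used only to keep the example short.

```pascal
program IncludeExcludeDemo;
{$APPTYPE CONSOLE}

type
  // Hypothetical, cut-down record used only for this illustration.
  WideCharSetSketch = record
    Chars: string;   // kept sorted, no duplicates
    procedure Include(C: Char);
    procedure Exclude(C: Char);
    class operator In(C: Char; S: WideCharSetSketch): Boolean;
  end;

// Adds C at its sorted position unless it is already present.
procedure WideCharSetSketch.Include(C: Char);
var
  I: Integer;
begin
  I := 1;
  while (I <= Length(Chars)) and (Chars[I] < C) do
    Inc(I);
  if (I > Length(Chars)) or (Chars[I] <> C) then
    Insert(string(C), Chars, I);
end;

// Removes C if present; does nothing otherwise.
procedure WideCharSetSketch.Exclude(C: Char);
var
  I: Integer;
begin
  I := Pos(C, Chars);
  if I > 0 then
    Delete(Chars, I, 1);
end;

class operator WideCharSetSketch.In(C: Char; S: WideCharSetSketch): Boolean;
begin
  Result := Pos(C, S.Chars) > 0;
end;

var
  Digits: WideCharSetSketch;
  Ch: Char;
begin
  Digits.Include('2');
  Digits.Include('1');
  Digits.Include('2');   // already present, ignored
  Digits.Exclude('1');
  Writeln(Digits.Chars);
  Ch := '2';
  Writeln(Ch in Digits);
end.
```

The call syntax becomes Digits.Include('2') rather than Include(Digits, '2'), which is the difference from the built-in set procedures mentioned above.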

This article should have shown you that using operator overloading in Delphi is not difficult, but it does have some limitations.  One is not being able to dynamically create space for member variables.  It is also limited to being used with records (one of the reasons that it cannot dynamically create space for member variables).  The help on operator overloading in Delphi is inconsistent on this point: it says both that it only works with records and that it can be used with records and classes.  I can tell you that I tried adding an operator overloading method to a class and got a compiler error.

The help on the topic in Delphi XE2 can be found using the help address:
ms-help://embarcadero.rs_xe2/rad/Operator_Overloading_(Delphi).html
 
The help states that you should not provide implicit conversions both from type A to type B and from type B to type A.  In my own code it did not present a problem, but that may be due to the fact that one of the types was a Delphi base type.
uWideCharSet.pas