
Microsoft's C# Language Implementation

 

ZDS Languages
By Dave Goodall

C# is a generally well conceived modern OO language, but Microsoft's implementation is not without its quirks and rough spots. Here is Dave's tour of the low lights.



Strings

It is a truism that computers don't compute.

They just shuffle and transform strings from one place to another.

Consequently, the most fundamental issue in developing a language system that executes fast and makes minimal use of system resources is an efficient implementation of the string class.

Unfortunately, the .NET Framework System.String class (or its alias string) is immutable.

Once set, the content of a String object cannot be modified. Methods that APPEAR to modify a string's value succeed, but they actually leave the old String object untouched and return a new string containing the modification. The old object becomes garbage as soon as nothing references it.

This behavior is a gross waste of resources, because if you make modifications to a string you end up churning several generations of copies of the original through the heap.

Instead of fixing the original bad behaviour, Redmond have created yet another class as a band-aid: the StringBuilder class in the System.Text namespace. This essentially does what the string class should have done in the first place: an initial allocation on the heap, modifications in place, and reallocation and copying ONLY as necessary for extensions.
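As a minimal sketch of the in-place style described above, the helper below builds a comma-separated line in one StringBuilder buffer. The class and method names (CsvJoiner, Join) are made up for illustration and are not part of the framework.

```csharp
using System;
using System.Text;

public static class CsvJoiner
{
    // Joins the items with commas using one growing buffer, instead of
    // creating a new string object for every concatenation.
    public static string Join(string[] items)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < items.Length; i++)
        {
            if (i > 0)
            {
                sb.Append(',');
            }
            sb.Append(items[i]);
        }
        // A single string object is created here, at the end.
        return sb.ToString();
    }
}
```

Calling CsvJoiner.Join(new string[] { "a", "b", "c" }) returns "a,b,c".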

Budi Kurniawan has a useful article on the subject, "Efficient String Manipulations with StringBuilder".

The string class in the .NET Framework is sealed. You can't derive from it (which is just as well, given its implementation). StringBuilder is sealed too, so if you need string method extensions you have to wrap a StringBuilder in a class of your own.

The system performs sneaky string interning with the un-brained notion of improving bind-time space efficiency (minimally if at all - how many identical constant strings do you define in your programs?) at the major cost of run-time (heap) space and (object generation and scavenging) time inefficiency.

For more on this see Inside C# by Tom Archer and Andrew Whitechapel.

For an actual example: PadRight does not work as you would expect, because it returns the padded copy and leaves s itself alone:

  string s = "123456";
  s.PadRight(12);
  Before            : s=[123456]
  Intuitive result  : s=[123456      ]
  Actual result     : s=[123456]
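The padded value only shows up if the return value is kept. A minimal sketch, with PadDemo and Pad as illustrative names only:

```csharp
using System;

public static class PadDemo
{
    // PadRight never changes the string it is called on; it returns
    // a new, padded string that the caller has to keep.
    public static string Pad(string s, int width)
    {
        return s.PadRight(width);
    }
}
```

With string s = "123456"; s = PadDemo.Pad(s, 12); the variable s then holds [123456      ], at the price of one more string object on the heap.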

To cure this, if we just use strings, we have to generate more generations of strings under the covers. In the example below we generate (r times c) 'sp' string objects to hold the padded strings, thoroughly thrashing the heap.

  // Each pass through the inner loop allocates a new 'sp' string.
  string s = " ";
  for (int r = 1; r < wTable.Rows.Count; r++) {
       for (int c = 0; c < wTable.Columns.Count; c++) {
           s = (string)wTable.Rows[r][c];
           string sp = s.PadRight(wTable.Columns[c].MaxLength);
           Debug.WriteLine("ROW c after pad s=[" + s + "]  sp=[" + sp + "]");
       }
  }

And the moral is: don't use strings. Use StringBuilder objects instead.

There's a useful article at http://www.dotnetjunkies.com on the performance gains.

You'll need to register and then search for tutorialid=427.


StringBuilder

string to StringBuilder conversions

The existence of the (hopefully) efficiently implemented StringBuilder band-aid class is not per se going to get you out of jail free.

Practically every .NET provided class deals in the basic string object type, so you are still going to incur the conversion overhead of constructing and destroying on the heap first a string object, then a StringBuilder object, and then back to a string object again.

You can't simply assign a StringBuilder to a string or vice versa. Instead you have to use the clumsy syntax:

  myStringBuilder.Insert(0, myString);      // string to StringBuilder
  myString = myStringBuilder.ToString();    // StringBuilder to string
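For what it is worth, the round trip can also be written with the StringBuilder(string) constructor and ToString(). The sketch below is only an illustration; Conversions and AppendSuffix are invented names.

```csharp
using System;
using System.Text;

public static class Conversions
{
    // string -> StringBuilder -> string, with one edit in between.
    public static string AppendSuffix(string text, string suffix)
    {
        StringBuilder sb = new StringBuilder(text); // copy the string into a buffer
        sb.Append(suffix);                          // modify the buffer in place
        return sb.ToString();                       // copy back out to a new string
    }
}
```

Each direction still costs an allocation and a copy, which is exactly the conversion overhead complained about above.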

How long is a piece of StringBuilder?

If we start with a range whose value is:

  objRange.Value=[  ]

A string assignment results in the same thing being stored:

  sContent = objRange.Value.ToString();
  sContent len=[2] value=[  ]

But a string to StringBuilder assignment stores a whole lot more!

  sbContent.Insert(0, objRange.Value.ToString());
  sbContent len=[112802] value=[ a whole LOT of stuff  .....

Surprisingly (or maybe not), StringBuilder only has two 'copy' methods available: Insert and Append.

The behaviour we expected when we initialized the new StringBuilder object with Insert() from 0 was that the Length property would be set to the actual length. Instead it seems to be left at the end of the heap (which has interesting implications for pooching ...)

Sorry, it's on you to set the length explicitly. A Length() method is not provided.

The tool tip says sbContent.Length 'gets or sets the length of this instance'. In other words Length is a read/write property, which is irritating if you expect a property only to expose (i.e. get) an attribute.

So direct assignment (which, give them that, is efficient) must be used to alter the StringBuilder Length property.

  sbContent.Length = sContent.Length;
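The sketch below shows how Length behaves on a builder that is known to start out empty, and how assigning to Length truncates the content. LengthDemo and its methods are invented for the illustration. On a freshly constructed builder, Insert at position 0 leaves Length equal to the inserted text's length, so a much larger figure like the one above may mean the builder already held text at that point, though that is only a guess about the original session.

```csharp
using System;
using System.Text;

public static class LengthDemo
{
    // Length of a brand-new builder after inserting content at position 0.
    public static int LengthAfterInsert(string content)
    {
        StringBuilder sb = new StringBuilder(); // starts with Length == 0
        sb.Insert(0, content);
        return sb.Length;
    }

    // Assigning a smaller value to Length discards the tail of the buffer.
    public static string Truncate(StringBuilder sb, int length)
    {
        sb.Length = length;
        return sb.ToString();
    }
}
```

Setting sb.Length = 0 is also a cheap way to empty a builder before reusing it.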

Syntax Annoyances

Gotta be different #1

  wTable.bProperty = TRUE;  // The name 'TRUE' does not exist in the namespace
  wTable.bProperty = 1;     // Constant value '1' cannot be converted to a 'bool'
  wTable.bProperty = true;  // Works

Gotta be different #2

This passes the compiler but prints 'fozzie=[%s]', because C-style %s placeholders mean nothing to Console.WriteLine:

  Console.WriteLine(" fozzie=[%s]", args[0]);

This prints what you would expect: 'fozzie=[bear]'

  Console.WriteLine(" fozzie=[" + args[0] + "]");
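The .NET counterpart of printf-style placeholders is the numbered {0}, {1} form, which Console.WriteLine and string.Format both accept. A small sketch, with FormatDemo and Describe as illustrative names:

```csharp
using System;

public static class FormatDemo
{
    // {0} is replaced by the first argument after the format string.
    public static string Describe(string arg)
    {
        return string.Format(" fozzie=[{0}]", arg);
    }
}
```

Console.WriteLine(" fozzie=[{0}]", args[0]); prints the same text directly, without the string concatenation.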

Gotta be different #3

Unlike C, args[0] is NOT the program name but the first argument.
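If the program name is wanted, Environment.GetCommandLineArgs() returns the C-style array, with the executable in element 0. A minimal sketch; ProgramInfo and CommandLine are names chosen only for this example:

```csharp
using System;

public static class ProgramInfo
{
    // Returns the full command line, C style: element 0 is the
    // program, the remaining elements are what Main receives as args.
    public static string[] CommandLine()
    {
        return Environment.GetCommandLineArgs();
    }
}
```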


Poor/Broken implementations

The Split method of the String class is broken.

If the string to be split contains substrings that are separated by multiple instances of a delimiter character, the second and up instances are wrongly parsed out as tokens.

For example, if we are using space as a delimiter, and have more than one space between any of the words in the string, we'll get these results:

  string u = "Once   Upon A Time In   America";
  char[] sep3 = new char[]{' '};
  foreach (string ss in u.Split(sep3))
      Console.WriteLine(ss);

Here's the output:

Once
          ... a blank line
          ... a blank line
Upon
A
Time
In
          ... a blank line
          ... a blank line
America

So, because of an egregiously obvious bug that even minimal testing would have caught, everyone has to write their own version of strtok.
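Later versions of the framework (.NET 2.0 onward) added a Split overload that takes StringSplitOptions.RemoveEmptyEntries, which drops the empty tokens. It was not available when this article was written. A sketch, with Tokenizer and Words as made-up names:

```csharp
using System;

public static class Tokenizer
{
    // Splits on spaces and discards the empty entries that runs of
    // consecutive delimiters would otherwise produce.
    public static string[] Words(string text)
    {
        char[] separators = new char[] { ' ' };
        return text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
    }
}
```

Tokenizer.Words("Once   Upon A Time In   America") yields six tokens: Once, Upon, A, Time, In, America.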

DataTable class implementation

As a DataTable is filled, the maximum character width of each column in the table could easily have been tracked and, for virtually no cost, written into the MaxLength attribute of the DataColumn member objects of the DataColumnCollection whenever the table was filled or updated.

Instead, (DataSet) wTable.Columns[c].MaxLength is set to -1 (we know but we ain't gonna tell you) for all the table columns, making it necessary for the poor user to crawl through the completed table to figure this out for hisself.

This information is going to be very frequently used when processing DataTables to text displays and reports, and should have been supported as default behaviour of the DataTable object's implementation.
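The crawl through the completed table can be sketched as follows. ColumnWidths and Measure are hypothetical names, and the sketch assumes that the width wanted is the length of each cell's ToString() text.

```csharp
using System;
using System.Data;

public static class ColumnWidths
{
    // Returns, for each column, the length of the longest cell text.
    public static int[] Measure(DataTable table)
    {
        int[] widths = new int[table.Columns.Count];
        foreach (DataRow row in table.Rows)
        {
            for (int c = 0; c < table.Columns.Count; c++)
            {
                // Empty cells hold DBNull.Value, whose ToString() is "",
                // so they count as width 0.
                int len = row[c].ToString().Length;
                if (len > widths[c])
                {
                    widths[c] = len;
                }
            }
        }
        return widths;
    }
}
```

The resulting array can then stand in for MaxLength in calls such as s.PadRight(widths[c]).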


Seriously evil things

Run time errors resulting from the compiler failing silently

This compiles without a murmur:

  string sContent =  objRange.Value2.ToString();

but at run-time the program aborts with message:

  Object reference not set to an instance of an object.

What this means is that at run-time, if the spreadsheet cell is empty, the value on the right-hand side is not an empty string object, which is what you would expect, but null. Calling ToString() on that null is what fails, so instead of a string object being created with an internal empty string value we get this funky run-time message and a run-time failure.

This is a bad compiler oversight. If the compiler knows that an expression being used to instantiate a new object can evaluate to null, then at the very least it should issue a warning, NOT pass this silently.

This also means that it's on you as a programmer to check for NULL and create the empty string object the compiler should have done.

Since we don't in general know what expressions may evaluate to null and which to empty objects this is not an easy proposition.

Something is bound to get missed and lead to run-time failures, since the compiler puts the onus entirely on the programmer to anticipate possible NULLs and handle them.

This code style should handle the problem but won't, because ToString() is called on the null value before the comparison is ever made:

 // if ( objRange.Value.ToString() == null )
 //    sContent = "";
 // else
 //    sContent =  objRange.Value.ToString();

Instead you must use this style:

  try {
      sContent =  objRange.Value.ToString();
  }
  catch( NullReferenceException e ) {
      Console.WriteLine( "Caught error: {0}.", e);
      sContent = "";
  }
 

The run-time error output of this is as follows:

  Caught error: System.NullReferenceException:
  Object reference not set to an instance of an object.
  at ExcelUnicodeConsole.Class1.Main(String[] args) in
  ...class1.cs:line 255.
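An alternative to catching the exception is to test the value itself for null before calling ToString() on it. The sketch below uses a plain object parameter as a stand-in for objRange.Value; SafeText and FromCell are names invented for the example.

```csharp
using System;

public static class SafeText
{
    // Returns "" for an empty (null) cell value, otherwise its text.
    // cellValue stands in for objRange.Value in the article's code.
    public static string FromCell(object cellValue)
    {
        if (cellValue == null)
        {
            return "";
        }
        return cellValue.ToString();
    }
}
```

With this helper the assignment becomes sContent = SafeText.FromCell(objRange.Value); and no exception is raised for an empty cell.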

Definite assignment

The compiler will generate a 'Use of unassigned local variable wTable' error for any local variable that is read before it has been assigned.

This is very, very, evil when all you want to declare is a reference!

An additional refinement of this torture is that the compiler evidently evaluates ALL paths that may be taken before the variable is used.
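The usual way to pacify the compiler is to give the reference an explicit initial value, typically null, at the point of declaration. A sketch under that assumption; DefiniteAssignment and CountRows are illustrative names:

```csharp
using System;
using System.Data;

public static class DefiniteAssignment
{
    // Returns the row count, or -1 if no table was created.
    public static int CountRows(bool load)
    {
        // Without "= null" the read below is rejected, because the
        // path where load is false never assigns wTable.
        DataTable wTable = null;
        if (load)
        {
            wTable = new DataTable();
        }
        return wTable == null ? -1 : wTable.Rows.Count;
    }
}
```

Of course this trades the compile-time complaint for the obligation to check for null yourself at run-time.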


   

This article updated September 12, 2002.