http://www.codeguru.com/csharp/csharp/cs_data/streaming/article.php/c4223/Streams-and-NET.htm
In this article I will show you the classes that .NET provides for working with streams. I will start by looking at basic stream access, which will lead me to explain encoding and stream readers and writers, then serializing objects to streams, and finally I will explain how to create a stream.
Row, row, row the boat...
Let's start right at the beginning: the System.IO.Stream class. This is an abstract class that defines the basic functionality that should be implemented by a concrete stream class. This class has properties to determine what you can do to the stream (is it readable, writeable; does it support random access?), and information about the size of the stream and the current seek position in the stream. Stream has methods to read and write single bytes and arrays of bytes; if you choose to read or write arrays of bytes, this can be done synchronously or asynchronously.
A stream can be used like this (assuming that GetInputStream() and GetOutputStream() are methods that return stream references):
- Stream outStr = GetOutputStream();
- byte[] outBuf = new byte[7]{82, 105, 99, 104, 97, 114, 100};
- outStr.Write(outBuf, 0, outBuf.Length);
- outStr.Close(); // don't need it any more
- Stream inStr = GetInputStream();
- byte[] inBuf = new byte[(int)inStr.Length];
- inStr.Read(inBuf, 0, inBuf.Length);
- inStr.Close(); // don't need it any more
(Out of interest, notice that Stream.Length is a long, whereas the value passed to Array.CreateInstance() when declaring the size of the array with new, is an int, so I have to do an explicit cast.)
In this code, it does not matter what the stream is based on (it could be a file or a socket, for example), the same methods are used. Notice that I call Close() as soon as I have finished using the stream. This is generally a good practice with .NET because it ensures that resources are not held longer than they are required.
For example, if the outStr reference is a stream based upon a file, the file will be open until Close() is called, and typically this means that there will be an exclusive lock on the file, preventing other code from accessing it. Also, file streams are buffered, so any data you write may not reach the file until the stream is flushed; Close() will do this. You could argue that the lock on the file will be released when the stream is garbage collected. However, in most cases you do not know when that will happen: unless you explicitly tell the garbage collector to do its work, it runs only when the runtime decides that memory needs to be reclaimed, which could be much later. It is far better to explicitly indicate that you are finished with the stream by calling Close().
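The point about closing promptly can be sketched with the C# using statement, which calls Dispose() (and therefore closes the stream) even if an exception is thrown. This is only a minimal sketch: a MemoryStream stands in for the streams returned by GetOutputStream() and GetInputStream(), and the StreamBasics class and ReadFully() helper are names invented for the illustration. The helper also allows for the fact that Read() may return fewer bytes than were asked for.

```csharp
using System;
using System.IO;

public static class StreamBasics
{
    // Hypothetical helper: keeps calling Read() until the buffer is full
    // or the stream ends, and returns the number of bytes actually read.
    public static int ReadFully(Stream input, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = input.Read(buffer, total, buffer.Length - total);
            if (n == 0) break; // end of stream
            total += n;
        }
        return total;
    }

    // Writes the data to a stream, rewinds, and reads it back.
    // A MemoryStream is an assumption made so the sketch runs anywhere.
    public static byte[] RoundTrip(byte[] data)
    {
        using (MemoryStream store = new MemoryStream())
        {
            store.Write(data, 0, data.Length);
            store.Position = 0; // rewind to the start before reading
            byte[] inBuf = new byte[(int)store.Length];
            ReadFully(store, inBuf);
            return inBuf;
        } // store is closed here, even if an exception was thrown
    }

    public static void Main()
    {
        byte[] outBuf = new byte[] { 82, 105, 99, 104, 97, 114, 100 };
        byte[] inBuf = RoundTrip(outBuf);
        Console.WriteLine(inBuf.Length);
    }
}
```

With a file or socket stream the same pattern applies; only the way the stream is obtained changes.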
Byte-ing the Bullet
Dealing in bytes is a bit of a pain: have you noticed what the data in the outBuf example above represents? It is the ASCII text "Richard". The base class library helps you to create buffers of bytes, but before I talk about these classes I need to point out that byte and char are not the same: a C# byte (System.Byte) is a single, unsigned byte, whereas a C# char (System.Char) is a 2-byte Unicode (UTF-16) character. As you can see from the code above, this makes writing strings to streams very inconvenient; what is needed is a class to convert a string to a byte array.
The System.Text namespace has a class called Encoding that allows you to convert between chars, bytes and strings. There are actually several encoding classes, used to convert to ASCII, UNICODE (big endian and little endian), UTF7 and UTF8. The Encoding class has static properties that will return a reference to one of these classes. For example, if I am interested in converting an array of bytes that represents an ASCII string to a System.String I can use the ASCII property of Encoding to return a reference to an ASCIIEncoding class:
- byte[] buf = new byte[7]{82, 105, 99, 104, 97, 114, 100};
- string str;
- str = System.Text.Encoding.ASCII.GetString(buf);
In this code, str will be initialised with the string "Richard". This covers making the data read from a stream useful, but what about writing data to streams? The appropriate Encoding classes have methods for that too:
- string str = "Grimes";
- byte[] b = new byte[str.Length];
- Encoding.ASCII.GetBytes(str.ToCharArray(),
- 0,
- str.Length,
- b,
- 0);
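As a rough sketch of the same conversions in a shorter form, the Encoding classes also have a GetBytes() overload that takes a string directly and allocates the array for you, and a GetByteCount() method that reports how many bytes a string needs. The latter matters because sizing the buffer from str.Length is only safe for single-byte encodings such as ASCII. The EncodingDemo class and its method names are invented for this illustration.

```csharp
using System;
using System.Text;

public static class EncodingDemo
{
    // string -> ASCII bytes, letting the encoder allocate the buffer
    public static byte[] ToAscii(string s)
    {
        return Encoding.ASCII.GetBytes(s);
    }

    // ASCII bytes -> string
    public static string FromAscii(byte[] b)
    {
        return Encoding.ASCII.GetString(b);
    }

    public static void Main()
    {
        byte[] ascii = ToAscii("Grimes");
        byte[] utf16 = Encoding.Unicode.GetBytes("Grimes");

        Console.WriteLine(ascii.Length);     // 6: one byte per character
        Console.WriteLine(utf16.Length);     // 12: two bytes per character
        Console.WriteLine(FromAscii(ascii)); // Grimes

        // One char, but two bytes in UTF-8, so str.Length would be too small
        Console.WriteLine(Encoding.UTF8.GetByteCount("\u00e9")); // 2
    }
}
```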
So, now you are happy, you can read strings from streams and write strings to streams. But are you completely happy? The code looks rather cluttered, and anyway what about other data types?
Readers and Writers
To make life much easier for you the designers of the base class library have provided reader and writer classes. These are based on streams and allow you to read and write data types other than arrays of bytes. These classes can be found in the System.IO namespace:
Class | Base Class | Description
BinaryReader | Object | Allows you to read data from a stream as the various base class data types
BinaryWriter | Object | Allows you to write data to a stream as the various base class data types
StreamReader | TextReader | Allows you to read data from a stream as lines or characters; you can specify the encoding or allow the class to determine it. The class can also open a stream based on a file.
StreamWriter | TextWriter | Allows you to write data to a stream as lines or characters. The class can also open a stream based on a file.
The names are a little misleading because they are both used with streams, and they both convert between binary data and .NET data types. The StreamReader/Writer classes allow you to treat a stream as a series of characters arranged in lines, thus, given a stream reference in the variable stm, you can do this:
- StreamReader reader;
- reader = new StreamReader(stm, Encoding.ASCII);
- string str;
- // ReadLine() returns null at the end of the stream
- while ((str = reader.ReadLine()) != null)
- {
- Console.WriteLine(str);
- }
The StreamReader class has many constructors, and the one that I have chosen takes a stream and an encoding class. The StreamReader class doesn't have to be created on a stream, indeed, you can create the object based on a file by giving the name of the file:
- StreamReader file;
- file = new StreamReader("test.txt");
This opens the file test.txt and provides access to it via the StreamReader. I will return later to the issue of opening a stream based on a file. If you want, you can allow the StreamReader class to determine the encoding of the stream. To do this you should call the constructor that takes four parameters:
- StreamReader file;
- file = new StreamReader("test.txt",
- Encoding.ASCII,
- true,
- 1024);
The first parameter is either an existing stream or the name of the file (as in this example), the second parameter is the default encoding that should be used if the class cannot determine the encoding to use, the third is a boolean, and the final parameter specifies the size of the buffer. If the boolean is true, the StreamReader class will look for a byte order mark in the first few bytes of the stream to determine the encoding to use. If it cannot determine the encoding it uses the default that you pass to the constructor.
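To see the detection at work, here is a small sketch that writes a line with one encoding and reads it back while deliberately passing ASCII as the fallback. It assumes the writer emits a byte order mark, which StreamWriter does when given Encoding.UTF8 or Encoding.Unicode explicitly. The DetectEncodingDemo class and DetectedName() method are invented names, and a MemoryStream is used in place of a file so that nothing is left on disk.

```csharp
using System;
using System.IO;
using System.Text;

public static class DetectEncodingDemo
{
    // Writes one line using writeWith, then reads it back with detection
    // switched on and ASCII as the (wrong) fallback encoding.
    public static string DetectedName(Encoding writeWith)
    {
        byte[] bytes;
        using (MemoryStream ms = new MemoryStream())
        {
            using (StreamWriter w = new StreamWriter(ms, writeWith))
            {
                w.WriteLine("hello"); // the byte order mark is written first
            }
            bytes = ms.ToArray(); // ToArray() still works on a closed MemoryStream
        }

        using (StreamReader r = new StreamReader(
            new MemoryStream(bytes), Encoding.ASCII, true, 1024))
        {
            r.ReadLine(); // detection happens on the first read
            return r.CurrentEncoding.WebName;
        }
    }

    public static void Main()
    {
        Console.WriteLine(DetectedName(Encoding.UTF8));
        Console.WriteLine(DetectedName(Encoding.Unicode));
    }
}
```

Note that CurrentEncoding only reflects the detected encoding after the first read, which is why the sketch calls ReadLine() before looking at it.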
The StreamWriter works similarly to the StreamReader. The writer class, however, allows you to write data as lines, characters and as strings to the stream.
The BinaryReader and BinaryWriter classes are created only on streams, so if you want to base them on a file, you have to open a stream on a file using IO.File.OpenRead() or IO.File.OpenWrite() as explained later. These classes have a plethora of Read and Write methods, one for each of the .NET basic data types. There are a few things to note about these classes.
The BinaryWriter class has two ways of writing text: the Write() overload that takes a string, and the Write() overloads that take a char or an array of chars. All of them write bytes according to the encoding the class is using, but the string overload prefixes the string's bytes with their length, whereas the char overloads write the encoded characters with no prefix. Length-prefixed strings are read back in via BinaryReader.ReadString().
The interesting thing about the length value is that as few bytes as possible are used to hold it: it is stored as a type called a 7-bit encoded integer. If the length fits in 7 bits a single byte is used; if it is greater than this then the high bit on the first byte is set and a second byte is created by shifting the value by 7 bits. This is repeated with successive bytes until there are enough bytes to hold the value. This mechanism is used to make sure that the length does not become a significant portion of the size taken up by the serialized string. BinaryWriter and BinaryReader have methods to read and write 7-bit encoded integers, but they are protected and so you can use them only if you derive from these classes.
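The 7-bit scheme is easy to reproduce by hand. The following is a sketch rather than the library's own code: the SevenBit class with its Encode(), Decode() and WrittenBytes() methods is invented for the illustration. It also writes a 200-character string through a BinaryWriter on a MemoryStream so that the prefix bytes can be compared, assuming the Write() overload that takes a string is the one that emits the prefix.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SevenBit
{
    // Encodes a non-negative value 7 bits at a time, low bits first.
    public static byte[] Encode(int value)
    {
        List<byte> bytes = new List<byte>();
        uint v = (uint)value;
        while (v >= 0x80)
        {
            bytes.Add((byte)(v | 0x80)); // low 7 bits plus the "more follows" flag
            v >>= 7;
        }
        bytes.Add((byte)v); // last byte has the high bit clear
        return bytes.ToArray();
    }

    // Reverses Encode(): stops at the first byte with the high bit clear.
    public static int Decode(byte[] bytes)
    {
        int result = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            result |= (b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break;
        }
        return result;
    }

    // Returns the raw bytes that BinaryWriter produces for a string.
    public static byte[] WrittenBytes(string s)
    {
        using (MemoryStream ms = new MemoryStream())
        using (BinaryWriter w = new BinaryWriter(ms))
        {
            w.Write(s);
            w.Flush();
            return ms.ToArray();
        }
    }

    public static void Main()
    {
        Console.WriteLine(BitConverter.ToString(Encode(300))); // AC-02

        byte[] raw = WrittenBytes(new string('a', 200));
        // 200 needs two prefix bytes (C8 01), followed by the 200 characters
        Console.WriteLine(raw.Length); // 202
        Console.WriteLine(BitConverter.ToString(raw, 0, 2)); // C8-01
    }
}
```

The prefix counts encoded bytes rather than characters; the two are the same here only because every character in the test string takes one byte in the writer's default encoding.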
Formatters
Using the readers and writers mentioned in the last section you can read and write the various .NET data types to and from streams. However, they do not take into account perhaps the most important data type to pass through a stream: objects.
How do you serialize an object to a stream? One option would be to add a ToString() method onto your object that converts the object's state to a string by converting each field to a string and concatenating them. You can then use BinaryWriter.Write() to serialize the object's state to a stream. To allow you to read an object from a stream you will have to have a constructor on the object that takes a System.String parameter and in this constructor extract the values from the string to initialize the object's fields. There are two main problems with this: firstly, the constructor will have to parse the string to extract the field values and this is not a trivial task; secondly, it uses up a constructor and the ToString() method, so a naive user of the class could try to pass their own string to the constructor or pass an object to a method that requires a string, like Console.WriteLine(), which will trigger a call to ToString(); in both cases your code will not be used in the way it is intended.
The solution is to use .NET serialization and a formatter object. As the name suggests, serialization means that an item is serialized into a stream of bytes, which is just what we are looking for. It is the formatter class that does this work, but it needs some information to know what it should serialize. This information is metadata, and is controlled by the attributes [Serializable] and [NonSerialized]. The [Serializable] attribute can be applied to classes, delegates, enums and structs; it effectively sets the serializable metadata for all fields in the item to which it is applied. If you decide that some fields should not be serialized (for example they correspond to temporary or intermediate values) then you can turn off the serializable metadata by applying the [NonSerialized] attribute to the field. For example:
- [Serializable]
- public class Point
- {
- private double xVal;
- private double yVal;
- [NonSerialized] private double len = 0;
- public Point(int x, int y)
- {
- xVal = x;
- yVal = y;
- }
- public double x{get{return xVal;}}
- public double y{get{return yVal;}}
- public double Length{
- get{
- if (len == 0)
- len = Math.Sqrt(x*x + y*y);
- return len;
- }
- }
- }
This is a read-only class that represents a point; it has three properties: the x and y coordinates and the length of the vector from the origin to the point. These properties are based on three fields: xVal, yVal and len. When the Length property is accessed the code checks to see if len is zero, in which case the length is calculated and cached in the field len. Because the vector length is calculated, there is no reason to serialize it and so it is marked with the [NonSerialized] attribute. The class is used like this:
- Point p1 = new Point(1, 2);
- Point p2 = new Point(3, 4);
- Point p3 = new Point(5, 6);
- BinaryFormatter bf = new BinaryFormatter();
- bf.Serialize(stm, p1);
- bf.Serialize(stm, p2);
- bf.Serialize(stm, p3);
- stm.Close();
stm represents some stream that has been opened for writing. The BinaryFormatter object reads the metadata on the fields of an object and if the serializable metadata is set, the field is serialized to the stream. The interesting point to note is that the fields can be private, and yet the BinaryFormatter class can still obtain the value of the field and serialize it.
Reading an object from a stream also involves a BinaryFormatter object:
- Point p4, p5, p6;
- // stm is now a stream opened for reading over the same data
- p4 = (Point)bf.Deserialize(stm);
- p5 = (Point)bf.Deserialize(stm);
- p6 = (Point)bf.Deserialize(stm);
The Deserialize() method creates an instance of the object that was serialized into the stream and initializes the fields that are not marked with [NonSerialized] with the serialized values. The fields that are marked with [NonSerialized] are given a value of zero appropriate to that data type. Again, the fields of the class can be private, and yet the BinaryFormatter is still able to write to them. Clearly this class has code which normal C# programmers are not permitted to write.
The question remains, how does Deserialize() know what class the stream holds? The reason is that the Serialize() method places in the stream the name of the assembly, the complete name of the class and its version, the names of the fields that are serialized and finally, the values of those fields. This information is determined by a class called SerializationInfo, and if you want to control this you can implement the ISerializable interface on your object. This interface has a single method called GetObjectData() which allows you to determine the information that is serialized to the stream. This method is called during serialization. The interface is unusual because, as well as requiring that your class implements the method in the interface, it requires that your class also implements a specific constructor. This constructor takes a SerializationInfo reference that contains the values that were serialized, and this constructor is called during deserialization. The issue of object serialization and deserialization is very interesting, but unfortunately I don't have the space to go into further details.
As I have already mentioned, the stream that you use can be one of many types, and if it is based on an HTTP socket the stream could be used to pass the object via SOAP. To accommodate this, the System.Runtime.Serialization.Formatters.Soap namespace provides the SoapFormatter class. Instead of serializing an object as a stream of bytes in a binary format, this class provides a SOAP-compliant XML representation of the object. To use this, all you have to do to the previous code is replace BinaryFormatter with SoapFormatter.
Formatters will serialize an entire graph of objects. By graph I mean that if the object you pass to Serialize() has fields that are references to other objects and those objects have references to other objects, then all objects will be serialized. A graph is not always as simple as this, because two objects may refer to the same object and this presents a problem when the base object is deserialized, because it should only create one instance for this shared object. This is accomplished with a class called the ObjectManager which keeps track of all objects as they are deserialized, and so if a request is made to deserialize an object that has already been deserialized, the existing instance will be used. The ObjectManager and its associated classes are flexible and configurable, and again, I will leave a complete description to another time.
Streams
Finally I come to the issue of how to obtain a stream. The IO.Stream class is abstract and the following table lists some of the more common classes derived from it:
Class | Description
FileStream | A buffered stream based on a disk file
NetworkStream | An unbuffered stream based on a socket
BufferedStream | A wrapper class that adds buffering to an existing unbuffered stream
MemoryStream | A stream based on memory
In addition to these, there are streams returned by classes in the System.Data and System.Data.SqlClient namespaces.
How these streams are created depends upon the class that is creating them. For example, a FileStream object is created by the static methods IO.File.OpenRead() and IO.File.OpenWrite():
- StreamReader read;
- read = new StreamReader(File.OpenRead("sourcefile.txt"));
- StreamWriter write;
- write = new StreamWriter(File.OpenWrite("destfile.txt"));
- // copy one file to the other, adding line numbers
- int line = 0;
- while (true)
- {
- string str = read.ReadLine();
- if (str == null) break;
- line++;
- write.WriteLine("{0:D4} {1}", line, str);
- }
- read.Close();
- write.Close();
The read and write references are based on files. The while loop reads each line from read, prefixes it with a line number, and writes it to the destination file through the write reference.
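The same copy can be wrapped in using blocks so that both files are closed even if something goes wrong part way through. This is only a sketch: the LineNumberer class and NumberLines() method are invented names, the temporary files exist purely so that the example can run, and File.Create() has been swapped in for File.OpenWrite() because OpenWrite() does not truncate an existing file, which can leave stale text at the end of the destination.

```csharp
using System;
using System.IO;

public static class LineNumberer
{
    // Copies sourcePath to destPath, prefixing each line with a
    // four-digit line number. Returns the number of lines copied.
    public static int NumberLines(string sourcePath, string destPath)
    {
        int line = 0;
        using (StreamReader read = new StreamReader(File.OpenRead(sourcePath)))
        using (StreamWriter write = new StreamWriter(File.Create(destPath)))
        {
            string str;
            while ((str = read.ReadLine()) != null)
            {
                line++;
                write.WriteLine("{0:D4} {1}", line, str);
            }
        } // both files are flushed and closed here
        return line;
    }

    public static void Main()
    {
        // Temporary files are an assumption made to keep the sketch runnable
        string source = Path.GetTempFileName();
        string dest = Path.GetTempFileName();
        try
        {
            File.WriteAllText(source, "first\nsecond\n");
            Console.WriteLine(NumberLines(source, dest));
            Console.Write(File.ReadAllText(dest));
        }
        finally
        {
            File.Delete(source);
            File.Delete(dest);
        }
    }
}
```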
A NetworkStream is returned by calling TcpClient.GetStream() on the client-side of a socket:
- // connect to port 2048 on the local machine
- TcpClient client = new TcpClient("localhost", 2048);
- // get the NetworkStream and wrap it in a BinaryWriter
- BinaryWriter writer = new BinaryWriter(client.GetStream());
On the other hand, the developer explicitly creates a NetworkStream on the server-side of the socket by passing the underlying socket as a constructor parameter to NetworkStream:
- // listen on port 2048
- TcpListener listener = new TcpListener(IPAddress.Any, 2048);
- listener.Start();
- // a Socket is returned when the client connects
- Socket socket = listener.AcceptSocket();
- // create a stream
- NetworkStream str = new NetworkStream(socket);
- // wrap it up in a BinaryReader
- BinaryReader reader = new BinaryReader(str);
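To try both ends without two programs, the client and the server can be put in one process on the loopback address. This is a sketch under a few assumptions: the LoopbackDemo class and SendAndReceive() method are invented names, port 0 is used so that the operating system picks a free port rather than the fixed 2048, and the example relies on the listener queuing the pending connection so that a single thread can connect first and accept afterwards.

```csharp
using System;
using System.IO;
using System.Net;
using System.Net.Sockets;

public static class LoopbackDemo
{
    // Sends one length-prefixed string from a client to a server
    // socket in the same process and returns what the server read.
    public static string SendAndReceive(string message)
    {
        // Port 0 asks the operating system for any free port
        TcpListener listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        try
        {
            int port = ((IPEndPoint)listener.LocalEndpoint).Port;
            using (TcpClient client = new TcpClient())
            {
                client.Connect(IPAddress.Loopback, port);

                // Server side: accept the queued connection and wrap the socket
                using (Socket socket = listener.AcceptSocket())
                using (NetworkStream serverStream = new NetworkStream(socket))
                {
                    // Client side: GetStream() returns the NetworkStream
                    BinaryWriter writer = new BinaryWriter(client.GetStream());
                    writer.Write(message);
                    writer.Flush();

                    BinaryReader reader = new BinaryReader(serverStream);
                    return reader.ReadString();
                }
            }
        }
        finally
        {
            listener.Stop();
        }
    }

    public static void Main()
    {
        Console.WriteLine(SendAndReceive("hello over TCP"));
    }
}
```

In a real server the accept call would normally sit in its own loop or thread, since it blocks until a client connects.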
One final example: the System.Net.WebResponse class represents a response from a server. Once you have a WebResponse object you can ask it for a stream, and access the data of the response via the stream:
- WebRequest req = null;
- req = WebRequest.Create("http://www.microsoft.com/");
- WebResponse resp = req.GetResponse();
- StreamReader reader = new StreamReader(resp.GetResponseStream());
- while(true)
- {
- string str;
- str = reader.ReadLine();
- if (str == null) break;
- Console.WriteLine(str);
- }
This will print out the HTML of the default page on www.microsoft.com.
And Finally...
Throughout this article you may have noticed that the stream reader and writer classes have a similarity to the static methods of the System.Console class. The reason is that Console implements three streams which are accessed through three static properties called Error, In and Out: In is a TextReader, and the other two are TextWriters. This means that you can choose at runtime where the output from your application goes, and where the input comes from, for example:
- TextWriter tw = null;
- if (bToLogFile)
- tw = new StreamWriter(File.OpenWrite("myapp.log"));
- else
- tw = Console.Out;
- tw.WriteLine("text for the current output stream");
- tw.Flush();
The bToLogFile flag can be set at runtime. The Console class has another way to set the input and output streams, with methods called SetOut() (that takes a TextWriter parameter) and SetIn() (that takes a TextReader parameter). Once you have called SetOut(), every call you make to the static Console.WriteLine() or Console.Write() will go to your specified stream. Notice that if the output stream is based on a FileStream the writes will be buffered, so you have to call either Flush() or Close() to flush the buffer to the file.
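As a short sketch of SetOut() in use, the code below temporarily redirects the console into a StringWriter (a TextWriter that collects its output in memory, chosen here so that no file is needed) and then restores the original writer. The RedirectDemo class and Capture() method are invented for the illustration.

```csharp
using System;
using System.IO;

public static class RedirectDemo
{
    // Runs the action with Console.Out redirected and returns
    // everything the action wrote to the console.
    public static string Capture(Action action)
    {
        TextWriter original = Console.Out;
        StringWriter buffer = new StringWriter();
        Console.SetOut(buffer);
        try
        {
            action();
        }
        finally
        {
            Console.SetOut(original); // always restore the real console
        }
        return buffer.ToString();
    }

    public static void Main()
    {
        string captured = Capture(delegate
        {
            Console.Write("text for the current output stream");
        });
        Console.WriteLine("captured: " + captured);
    }
}
```

Restoring the original writer in a finally block matters because SetOut() changes the console for the whole process, not just for the calling code.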