16BPP.net

Compression

Before we start with UDP, I want to talk about compression a little bit.  Transmitting information over the data tubes we call the internet can take a while.  This depends on many different factors that are out of your control like connection speed, where point A and B are (geography), hardware, etc.  But there is one factor that you have sway over and that's the size of the data you want to send.  Using compression, you can make the data you want to send slimmer.

Luckily for us, the .NET Framework includes some compression utilities in the System.IO.Compression namespace.  We're going to look at GZipStream and DeflateStream in particular.  Both streams use the DEFLATE algorithm to compress data losslessly, and (since .NET Framework 4.5) it's implemented under the hood with the ever-popular zlib library.

DeflateStream vs. GZipStream

Looking at the two streams, you might be a little confused about which one to use.  They both use DEFLATE underneath, so what's the difference?  Well, DeflateStream will only compress your data, whereas GZipStream will do that too but also wraps the result in a small header and trailer (including a CRC-32 checksum), so you can save the resulting stream directly as a .gz file.
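
To make the difference a bit more concrete, here is a minimal sketch that compresses the same bytes with both streams and prints the sizes.  The class and method names (DeflateVsGZip, Deflate, GZip) and the sample data are my own assumptions for illustration.  You should see the GZip result come out a little larger because of the extra header and trailer, and it begins with the gzip magic bytes 0x1F 0x8B:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class DeflateVsGZip
{
    // Raw DEFLATE: just the compressed data, no header or checksum.
    public static byte[] Deflate(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionMode.Compress))
            {
                deflate.Write(data, 0, data.Length);
            }
            // ToArray() still works after the MemoryStream has been closed.
            return output.ToArray();
        }
    }

    // Same DEFLATE data, wrapped in the gzip header and trailer.
    public static byte[] GZip(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static void Main()
    {
        // Hypothetical sample data: very repetitive, so it compresses well.
        byte[] data = Encoding.UTF8.GetBytes(new string('a', 1000) + "hello, compression");
        byte[] raw = Deflate(data);
        byte[] gz = GZip(data);

        Console.WriteLine("Original: {0} bytes", data.Length);
        Console.WriteLine("Deflate:  {0} bytes", raw.Length);
        Console.WriteLine("GZip:     {0} bytes", gz.Length);

        // Every gzip file starts with the magic bytes 0x1F 0x8B.
        Console.WriteLine("GZip magic bytes: {0:X2} {1:X2}", gz[0], gz[1]);
    }
}
```

Writing the GZip bytes out with File.WriteAllBytes() and a .gz extension should give you a file that standard gzip tools can open.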

Please note that you are not restricted to using only DEFLATE compression in your application.  You can always implement a different algorithm in your code or grab something off of NuGet.  GZipStream and DeflateStream are built in and just the easiest to use.

Below is the source for a program that shows you how to use the two streams on some data from a file.  It also provides a little bit of information about the compressed data:
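
What follows is only a minimal sketch of such a program, not the original listing.  The names CompressionBenchmark, ToMegabytes, Compress, and Report are my own assumptions, and unlike the program described here it keeps the GZip result in memory instead of saving a .gz file:

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.IO.Compression;

public static class CompressionBenchmark
{
    // Converts a byte count to megabytes (1 MB = 1024 * 1024 bytes here).
    public static double ToMegabytes(long bytes)
    {
        return bytes / (1024.0 * 1024.0);
    }

    // Compresses `data` into memory using whatever stream `wrap` builds
    // around the destination.
    public static byte[] Compress(byte[] data, Func<Stream, Stream> wrap)
    {
        using (var output = new MemoryStream())
        {
            using (Stream compressor = wrap(output))
            {
                compressor.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    // Times one compression run and prints its size, ratio, and duration.
    public static void Report(string label, byte[] data, Func<Stream, Stream> wrap)
    {
        Stopwatch timer = Stopwatch.StartNew();
        byte[] compressed = Compress(data, wrap);
        timer.Stop();

        Console.WriteLine("Compressed using {0}: {1:0.0000} MB [{2:0.00}%] in {3}ms",
            label,
            ToMegabytes(compressed.Length),
            100.0 * compressed.Length / data.Length,
            timer.ElapsedMilliseconds);
    }

    public static void Main(string[] args)
    {
        if (args.Length < 1)
        {
            Console.WriteLine("Usage: CompressionBenchmark <file>");
            return;
        }

        byte[] data = File.ReadAllBytes(args[0]);
        Console.WriteLine("{0} uncompressed is {1:0.0000} MB large.",
            args[0], ToMegabytes(data.Length));

        Report("DeflateStream (Optimal)", data,
            s => new DeflateStream(s, CompressionLevel.Optimal));
        Report("DeflateStream (Fast)", data,
            s => new DeflateStream(s, CompressionLevel.Fastest));
        Report("GZipStream", data,
            s => new GZipStream(s, CompressionMode.Compress));
    }
}
```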

Here is some of my output.  I used both a .bmp image and a small .mp4 to test it.  It worked very nicely with the .bmp because the image data in it isn't compressed at all.  The .mp4, on the other hand, is already compressed in some fashion, so we couldn't squeeze much more out of it.

image.bmp uncompressed is 3.1239 MB large.
Compressed using DeflateStream (Optimal): 0.3509 MB [11.23%] in 70ms
Compressed using DeflateStream (Fast): 0.3509 MB [11.23%] in 50ms
Compressed using GZipStream: 0.3509 MB [11.23%] in 46ms

...

film.mp4 uncompressed is 35.7181 MB large.
Compressed using DeflateStream (Optimal): 35.3482 MB [98.96%] in 1176ms
Compressed using DeflateStream (Fast): 35.3482 MB [98.96%] in 1123ms
Compressed using GZipStream: 35.3482 MB [98.96%] in 1155ms

It might look like all three produced exactly the same compressed data and performed similarly, but keep in mind that the printed sizes are truncated to four decimal places.  It's very possible that the actual data created by the Fast and Optimal levels differ (as well as their true sizes).

So what's going on?

The code here is pretty simple (and duplicated) so there isn't too much to explain.  At the top we have a little helper function to compute a file size in megabytes for us.  At the beginning of the Main() function we read in all of the bytes of our test file.  First we apply Deflate/Optimal to it, then Deflate/Fast, and lastly we compress it with GZip and try to save it to a .gz file.  (There's a little bit of benchmarking code too.)

Working with these streams can be a little tricky at first but here's the procedure:

  1. Create a stream (e.g. MemoryStream) where the data will be written to
  2. Create a DeflateStream/GZipStream, passing that first stream as the destination, specify the mode/level, and (optionally) tell it to keep the destination stream open after the compression stream is closed.
  3. Write the data to the compression stream
  4. Close() (and thus Flush()) the compression stream
    • This can be a really odd thing to grasp at first.  The compression stream buffers data internally, so the destination stream may be missing some (or all) of the compressed data until the compression stream has been closed.  Do this before reading from the destination; otherwise you'll get incomplete data, or nothing at all.
  5. Our compressed/uncompressed data is now ready for whatever we want to do with it.
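
The steps above can be sketched roughly like this.  It's a minimal example that assumes a MemoryStream destination and a GZipStream; the class name CompressToMemory and the sample text are made up for illustration.  The numbered comments match the steps in the list:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class CompressToMemory
{
    public static byte[] Compress(byte[] data)
    {
        // 1. The destination stream that receives the compressed bytes.
        using (var destination = new MemoryStream())
        {
            // 2. The compression stream.  The last argument (leaveOpen: true)
            //    keeps `destination` open after the GZipStream is disposed.
            using (var gzip = new GZipStream(destination, CompressionLevel.Optimal, true))
            {
                // 3. Write the uncompressed data into the compression stream.
                gzip.Write(data, 0, data.Length);
            } // 4. Leaving the using block closes the GZipStream, which
              //    pushes any remaining compressed data to `destination`.

            // 5. `destination` is still open, so we can inspect and read it.
            Console.WriteLine("Destination now holds {0} bytes", destination.Length);
            return destination.ToArray();
        }
    }

    public static void Main()
    {
        // Hypothetical sample data.
        byte[] data = Encoding.UTF8.GetBytes(new string('x', 5000));
        byte[] compressed = Compress(data);
        Console.WriteLine("{0} bytes in, {1} bytes out", data.Length, compressed.Length);
    }
}
```

Without leaveOpen, closing the GZipStream would also close the MemoryStream, and asking for destination.Length afterwards would throw an ObjectDisposedException.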

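I didn't include an example of decompressing the data.  To do that, pass in CompressionMode.Decompress as the second argument of the compression stream's constructor.  The stream you hand it is then the source of the compressed data, and you read the uncompressed bytes out of the compression stream instead of writing to it.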
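
Here's a minimal sketch of a round trip, assuming the data was compressed with GZipStream (swap in DeflateStream on both sides if that's what you used).  The class name RoundTrip and its methods are my own names, not part of the original program:

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

public static class RoundTrip
{
    public static byte[] Compress(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var gzip = new GZipStream(output, CompressionMode.Compress))
            {
                gzip.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }

    public static byte[] Decompress(byte[] compressed)
    {
        // The wrapped stream is now the *source* of compressed bytes.
        using (var source = new MemoryStream(compressed))
        using (var gzip = new GZipStream(source, CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            // Reading from the GZipStream yields the uncompressed data.
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }

    public static void Main()
    {
        // Hypothetical sample message.
        string message = "Hello from the other side of the data tubes!";
        byte[] compressed = Compress(Encoding.UTF8.GetBytes(message));
        string restored = Encoding.UTF8.GetString(Decompress(compressed));

        Console.WriteLine(restored);
        Console.WriteLine("Round trip intact: {0}", restored == message);
    }
}
```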

To Compress or Not to Compress

There are some things you need to consider before compressing data (and sending it over a network).

  1. Is the data large enough that it's even worth it?  Let's say that your original data is about 12 bytes long but compressed it's 10 bytes.  Shaving off about 16.7% of your data sounds pretty good, but when that only amounts to 2 bytes, it's too little to have a big impact.  If you're sending 12 bytes every millisecond (which is about 11.7 KB/s), it might be more worthwhile.
  2. Are you actually saving someone time?  Saving disk space is always a good reason for compression, but in terms of networking, time is an important factor to consider.  Using the 12 byte example, let's say that compressing and decompressing the data each take two seconds, and the network speed is one byte per second (these are obviously not real world numbers).  It would take 12 seconds to transmit the data uncompressed (0 + 12 + 0), but 14 seconds for the compressed data (2 + 10 + 2).  In this case compressing wastes more time than it saves.  Remember, compression isn't free.
  3. Does it improve the experience of your users?  This is a lot more relevant for people who are making real time applications (such as video games), and it's a bit of a combination of the above two.  Consider how much data is being compressed/decompressed and the time that it takes.  Will the user notice a difference?  Will it improve or interfere with the necessary "real time feel" of your app?
  4. Is the data already compressed in some fashion?  If you want to compress some images, it's very possible that they have already been compressed and can't be squeezed any further (e.g. PNG).  So when you try to apply compression to them, you might end up with a bigger file in the end.
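
One way to act on points 1 and 4 is to only keep the compressed version when it actually comes out smaller.  Below is a minimal sketch of that idea; everything in it is an assumption for illustration: the class name ConditionalCompression, the 256 byte MinimumSize threshold, and the one byte flag in front of the payload (1 = compressed, 0 = raw) that tells the receiver what it got:

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class ConditionalCompression
{
    // Hypothetical cutoff: don't bother compressing anything smaller.
    public const int MinimumSize = 256;

    // Returns a packet whose first byte is a flag: 1 = DEFLATE, 0 = raw.
    public static byte[] Pack(byte[] data)
    {
        byte flag = 0;
        byte[] body = data;

        if (data.Length >= MinimumSize)
        {
            byte[] compressed = Deflate(data);
            // Already-compressed input (PNG, MP4, ...) may not shrink at all.
            if (compressed.Length < data.Length)
            {
                flag = 1;
                body = compressed;
            }
        }

        var packet = new byte[body.Length + 1];
        packet[0] = flag;
        Buffer.BlockCopy(body, 0, packet, 1, body.Length);
        return packet;
    }

    // Reverses Pack(): strips the flag and decompresses if needed.
    public static byte[] Unpack(byte[] packet)
    {
        using (var source = new MemoryStream(packet, 1, packet.Length - 1))
        using (var result = new MemoryStream())
        {
            if (packet[0] == 1)
            {
                using (var inflate = new DeflateStream(source, CompressionMode.Decompress))
                {
                    inflate.CopyTo(result);
                }
            }
            else
            {
                source.CopyTo(result);
            }
            return result.ToArray();
        }
    }

    private static byte[] Deflate(byte[] data)
    {
        using (var output = new MemoryStream())
        {
            using (var deflate = new DeflateStream(output, CompressionLevel.Fastest))
            {
                deflate.Write(data, 0, data.Length);
            }
            return output.ToArray();
        }
    }
}
```

This doesn't answer points 2 and 3 for you.  Whether the time spent compressing is worth it is something you'd still have to measure for your own data and network.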

A Quick Note about MessagePack

When I posted the last section on Reddit, a user commented that serializing the Packet information to MessagePack would have been more efficient than using JSON.  If you don't know what that is, it's essentially a compact binary counterpart to JSON (and it's fast too).  So if you're sending JSON-like information over a network, you might want to look into using MessagePack instead.
