GZipStream and buffering - DOTNET

  1. GZipStream and buffering

    I wrote a client/server networking application in which all communication
    between the two is compressed using GZipStream. However, I found a weird
    buffering problem. No matter how I configured the network stream,
    buffering still occurred (on a small scale). I pinpointed the problem to
    GZipStream: it seems to buffer content, and calling Flush has no effect.

    Below is a small sample application that demonstrates this. In two
    command prompts, run "App.exe L" to listen and just "App.exe" to start the
    client. You will notice that the GZipStream is buffering. Simply uncommenting
    "//#define DONT_USE_GZIP_STREAM_READ" and "//#define
    DONT_USE_GZIP_STREAM_WRITE" makes the application work. I understand I am
    not getting any gain from compression here because it is just 1 byte. That
    is beside the point; this is just a demonstration of the problem (the
    original app has sizable messages to transmit). Regardless of size,
    GZipStream.Flush should flush data to the network stream, and it does not.
    Also, I know the problem is on the sending side, because if I uncomment only
    "//#define DONT_USE_GZIP_STREAM_READ", the stream still reads nothing for
    seconds at a time, implying nothing was sent over the network.

    How do I get this demo app to work as expected?

    Thanks!

    // BUGGY CODE USED FOR DEMONSTRATION ONLY
    //#define DONT_USE_GZIP_STREAM_READ
    //#define DONT_USE_GZIP_STREAM_WRITE
    using System;
    using System.IO.Compression;
    using System.Net;
    using System.Net.Sockets;

    class Program
    {
        static void Main(String[] args)
        {
            if (args.Length > 0 && args[0] == "L")
            {
                TcpListener listener = new TcpListener(IPAddress.Loopback, 40);
                listener.Start();
                using (TcpClient client = listener.AcceptTcpClient())
                {
                    client.NoDelay = true;
                    client.ReceiveBufferSize = 1;
    #if DONT_USE_GZIP_STREAM_READ
                    NetworkStream stream = client.GetStream();
    #else
                    GZipStream stream = new GZipStream(
                        client.GetStream(), CompressionMode.Decompress, false);
    #endif
                    while (true)
                    {
                        Console.WriteLine("{0}", stream.ReadByte());
                    }
                }
            }
            else
            {
                using (TcpClient client = new TcpClient())
                {
                    client.Connect(IPAddress.Loopback, 40);
                    client.NoDelay = true;
                    client.SendBufferSize = 1;
    #if DONT_USE_GZIP_STREAM_WRITE
                    NetworkStream stream = client.GetStream();
    #else
                    GZipStream stream = new GZipStream(
                        client.GetStream(), CompressionMode.Compress, false);
    #endif
                    for (Byte b = 0;; ++b)
                    {
                        stream.WriteByte(b);
                        stream.Flush();
                        Console.WriteLine("{0}", b);
                        System.Threading.Thread.Sleep(1000);
                    }
                }
            }
        }
    }


  2. Re: GZipStream and buffering

    On Mon, 21 Sep 2009 18:19:01 -0700, Agendum
    <Agendumatdiscussionsdotmicrosoft.com> wrote:

    > I wrote a client / server networking application and all communication
    > between the two is compressed using GZipStream. However, I found a weird
    > buffering problem. Despite how I had the network stream configured,
    > buffering would still occur (on a small scale). I pinpointed the problem
    > to GZipStream. Basically it seems to buffer content and calling Flush has
    > no effect.


    Not that I think it's such a great idea to:

    -- Set NoDelay to true,
    -- Set the send and receive buffers to 1 byte in length, or
    -- Flush the GZipStream after each write

    But, basically the problem here is that you expect there to be no
    buffering when it's impossible for there to be no buffering.

    The job of GZipStream is to take a stream of bytes and turn it into a
    shorter stream of bytes. Since you get fewer bytes on the receiving end,
    it should be obvious that for at least some of the bytes you send, you
    will not receive a byte on the output of GZipStream.

    Likewise, at the receiving end, the job of the class there is to take a
    short stream of bytes and turn it back into the longer stream. Thus,
    there it should also be obvious that for every byte you actually do
    receive on the network, for at least some of them, you will get more than
    one byte on the output of the GZipStream.

    In other words, GZipStream is doing exactly what it's supposed to, as is
    each TcpClient given how you've configured them (however obscenely that
    may be).

    In general, trying to disable buffering on a network stream is a really
    bad idea. But at the very least, it is simply impossible to avoid at
    least some buffering within the compression/decompression stages, because
    that's a fundamental aspect of how compression works (it's essentially a
    corollary to the pigeon-hole principle...you only have so many
    "pigeon-holes" on the output of the GZipStream to put the input, which has
    more elements than there are "pigeon-holes", so obviously some of the
    input elements don't have their own unique output "pigeon-hole").

    Pete

  3. Re: GZipStream and buffering

    The fact that I use NoDelay, have a 1 byte buffer, and flush is just a
    demonstration that there are no options other than for the byte to be
    transmitted. Also, I don't Flush after each write (each byte!) in the
    original app -- it is just a demonstration here.

    In any case, I understand what you are saying about the GZipStream.
    Basically to apply a reasonable amount of compression GZipStream reads a
    minimum amount of bytes. The fact GZipStream has a Flush method is
    irrelevant... it just flushes the already-compressed bytes to the stream. I
    was incorrectly assuming it would compress any remaining bytes in the stream
    and write it out. Apparently there is no method for doing that.

    I mentioned the original application sends messages of a sizeable amount and
    I am experiencing the same problem. I guess I can conclude from this that:

    1) GZipStream compresses bytes on some internally defined byte boundary.
    This would explain why just a "minimum number of bytes" is not enough.

    2) The only way to invoke the call of "compress any remaining bytes in the
    stream" is to actually close the GZipStream itself.

    Thanks for your response.


  4. Re: GZipStream and buffering

    On Mon, 21 Sep 2009 20:06:01 -0700, Agendum
    <Agendumatdiscussionsdotmicrosoft.com> wrote:

    > The fact that I use NoDelay, have a 1 byte buffer, and flush is just a
    > demonstration that there are no options other than for the byte to be
    > transmitted.


    Obviously, there _are_ other options other than for the byte to be
    transmitted. The GZipStream instance can (and does) buffer it.

    > Also, I don't Flush after each write (each byte!) in the
    > original app -- it is just a demonstration here.


    Okay, that's a relief.

    > In any case, I understand what you are saying about the GZipStream.
    > Basically to apply a reasonable amount of compression GZipStream reads a
    > minimum amount of bytes.


    It's not really about being "reasonable". It's simply how that particular
    compression algorithm works. It builds a dictionary as it goes, and when
    certain conditions are fulfilled (e.g. some new sequence of bytes not
    already in the dictionary is seen, or a given sequence of bytes seen does
    match something in the dictionary, etc.) the compression algorithm emits
    bytes on the output end.

    Depending on the input, this may in fact result in unreasonable amounts of
    compression, or even inflation of the stream. "Reasonable" doesn't come
    into play; it's basically a dynamic state machine, and at certain states,
    bytes are emitted, hopefully (but not always) in a compressed state as
    compared to the input.
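
The point about inflation can be checked with a short sketch. It is only an illustration, not part of the original posts: the class name InflationDemo and the helper Compress are made up here, and a MemoryStream stands in for the network stream. It compresses a single byte and a long run of zero bytes and prints the resulting sizes. The gzip container adds a fixed header and trailer, so a one-byte input should come out larger than it went in, while the highly repetitive input should shrink.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class InflationDemo
{
    // Hypothetical helper: gzip-compresses 'input' fully in memory.
    public static byte[] Compress(byte[] input)
    {
        using (var sink = new MemoryStream())
        {
            // Disposing the GZipStream finishes the compressed stream.
            using (var gzip = new GZipStream(sink, CompressionMode.Compress))
            {
                gzip.Write(input, 0, input.Length);
            }
            // ToArray is still allowed after the MemoryStream has been closed.
            return sink.ToArray();
        }
    }

    static void Main()
    {
        byte[] tiny = { 7 };
        byte[] repetitive = new byte[10000]; // all zeros

        Console.WriteLine("1 byte in      -> {0} bytes out", Compress(tiny).Length);
        Console.WriteLine("10000 bytes in -> {0} bytes out", Compress(repetitive).Length);
    }
}
```

The exact output sizes depend on the runtime and compression level, so none are quoted here.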

    > The fact GZipStream has a Flush method is
    > irrelevant... it just flushes the already-compressed bytes to the
    > stream. I was incorrectly assuming it would compress any remaining bytes
    > in the stream and write it out. Apparently there is no method for doing
    > that.


    Allowing that would be counter-productive from a compression point of
    view, but would prevent the decompression side from working in any case.

    > I mentioned the original application sends messages of a sizeable amount
    > and I am experiencing the same problem. I guess I can conclude from this
    > that:
    >
    > 1) GZipStream compresses bytes on some internally defined byte boundary.
    > This would explain why just a "minimum number of bytes" is not enough.


    It's not "some internally defined byte boundary". It has to do with the
    progress of the compression algorithm in matching the input to the current
    state of its dictionary. The compression algorithm is documented. If you
    care how it works, you should read about how it works.

    > 2) The only way to invoke the call of "compress any remaining bytes in
    > the stream" is to actually close the GZipStream itself.


    Yes. That is the only way for that particular compression algorithm to
    work.

    Pete
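
The effect described in this exchange can be observed without a network. The following is a minimal sketch under stated assumptions: BufferingDemo and Measure are hypothetical names, a MemoryStream stands in for the NetworkStream, and the GZipStream is created with leaveOpen set to true so the underlying stream can still be inspected after the compressor is disposed. It records how many bytes have reached the underlying stream right after a write and again after Dispose. Flush is deliberately left out, because what it does may differ between runtime versions.

```csharp
using System;
using System.IO;
using System.IO.Compression;

static class BufferingDemo
{
    // Hypothetical helper: returns how many compressed bytes are visible in
    // the underlying stream before and after the GZipStream is disposed.
    public static (long BeforeDispose, long AfterDispose) Measure(byte[] payload)
    {
        using (var sink = new MemoryStream())
        {
            long before;
            // leaveOpen: true keeps 'sink' usable after the GZipStream is disposed.
            using (var gzip = new GZipStream(sink, CompressionMode.Compress, true))
            {
                gzip.Write(payload, 0, payload.Length);
                before = sink.Length; // the compressor may still be holding data
            }
            // Dispose has finished the compressed stream and written its trailer.
            return (before, sink.Length);
        }
    }

    static void Main()
    {
        var (before, after) = Measure(new byte[] { 42 });
        Console.WriteLine("before dispose: {0} bytes, after dispose: {1} bytes",
            before, after);
    }
}
```

Whatever the first number turns out to be on a given runtime, the second should be larger, which matches the observation that the receiver only sees a complete message once the sending GZipStream has been closed.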

  5. Re: GZipStream and buffering

    In my opinion, the best way to make this work is to compress the data
    first, and then send the compressed data over the wire as if it were a
    message.

    For that you have two options:
    1) Create the message beforehand by writing it to a MemoryStream, then
    stream the contents of that over the network. The problem with this
    approach is that it requires more memory.
    2) Create the GZipStream with the second constructor, the one that takes a
    leaveOpen argument (see
    http://msdn.microsoft.com/en-us/library/27ck2z1y.aspx), and specify
    true for that third parameter. This allows you to close the GZipStream,
    forcing it to send its contents along to the network while leaving the
    connection open. This means that you would have to create a new
    GZipStream to send a new message over the wire. The problem with this
    approach is that the receiving end needs to know when the end of one
    zipped object has been received, so that it in turn can also create a
    new GZipStream to decompress the message on the other end. That means
    you'll have to add some protocol handling on both ends.

    I don't know exactly what you're trying to do here, but I wonder if it
    wouldn't be a better idea to use WCF or some other existing
    communication stack to solve your problem.

    --
    Jesse Houwing
    jesse.houwing at sogeti.nl
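
One way the "compress first, then send as a message" idea could look is sketched below. Everything in it is an assumption made for illustration: the class FramedGzip, the methods WriteMessage, ReadMessage and ReadExactly, and the 4-byte length prefix are not from the thread, and a MemoryStream stands in for the NetworkStream. Each message is compressed completely in memory, then sent as a length prefix followed by the compressed bytes, so the receiver knows where one compressed message ends and the next begins.

```csharp
using System;
using System.IO;
using System.IO.Compression;
using System.Text;

static class FramedGzip
{
    // Compresses one message in memory, then writes a 4-byte length prefix
    // followed by the compressed bytes.
    public static void WriteMessage(Stream output, byte[] message)
    {
        byte[] compressed;
        using (var buffer = new MemoryStream())
        {
            // leaveOpen: true so 'buffer' is still open when we read it back.
            using (var gzip = new GZipStream(buffer, CompressionMode.Compress, true))
            {
                gzip.Write(message, 0, message.Length);
            }
            compressed = buffer.ToArray();
        }
        // Assumption: both ends use the same byte order for the prefix.
        byte[] prefix = BitConverter.GetBytes(compressed.Length);
        output.Write(prefix, 0, prefix.Length);
        output.Write(compressed, 0, compressed.Length);
        output.Flush();
    }

    // Reads one length-prefixed message and returns the decompressed bytes.
    public static byte[] ReadMessage(Stream input)
    {
        int length = BitConverter.ToInt32(ReadExactly(input, 4), 0);
        if (length < 0) throw new InvalidDataException("negative length prefix");
        byte[] compressed = ReadExactly(input, length);
        using (var gzip = new GZipStream(new MemoryStream(compressed), CompressionMode.Decompress))
        using (var result = new MemoryStream())
        {
            gzip.CopyTo(result);
            return result.ToArray();
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    static byte[] ReadExactly(Stream input, int count)
    {
        var data = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int read = input.Read(data, offset, count - offset);
            if (read == 0) throw new EndOfStreamException();
            offset += read;
        }
        return data;
    }

    static void Main()
    {
        using (var wire = new MemoryStream())
        {
            WriteMessage(wire, Encoding.UTF8.GetBytes("first message"));
            WriteMessage(wire, Encoding.UTF8.GetBytes("second message"));
            wire.Position = 0;
            Console.WriteLine(Encoding.UTF8.GetString(ReadMessage(wire)));
            Console.WriteLine(Encoding.UTF8.GetString(ReadMessage(wire)));
        }
    }
}
```

The trade-off is the one already mentioned for option 1: each message is held in memory in compressed form before it is sent. In return, the compressor is always closed before anything goes on the wire, so nothing is left sitting in its internal buffer.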
