
Data compression in C#/J# and Java

Posted on 2004-10-28
Last Modified: 2012-05-05
Environment:
Windows, Java 1.4, .NET Framework 1.1

Hi,

I'm distributing dynamic content (always strings) from a .NET web page and consuming it with a Java client. I'd like to compress this content to reduce delivery time.

By referencing the J# library in the .NET project, I can use Microsoft's java.util.zip package. So on both sides I'm using the Java classes: Deflater on the server and Inflater on the client. The Java and C# code are nearly identical, and compression/decompression works fine on each platform by itself.
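For reference, here is a minimal Java sketch of a Deflater/Inflater round trip on byte arrays. It assumes UTF-8 for the string-to-byte step, and the class and method names are hypothetical. Note that what passes between the two halves is a byte[], not a String.

```java
import java.io.ByteArrayOutputStream;
import java.io.UnsupportedEncodingException;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class: a Deflater/Inflater round trip on byte arrays.
class ZlibRoundTrip {

    // Compresses the UTF-8 bytes of a string into zlib-format bytes.
    static byte[] compress(String text) throws UnsupportedEncodingException {
        Deflater deflater = new Deflater();
        deflater.setInput(text.getBytes("UTF-8"));
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Inflates zlib-format bytes and decodes the result as UTF-8.
    static String decompress(byte[] compressed)
            throws DataFormatException, UnsupportedEncodingException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[1024];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && !inflater.finished()
                    && (inflater.needsInput() || inflater.needsDictionary())) {
                // Stop instead of looping forever on truncated or unexpected data.
                throw new DataFormatException("truncated input or dictionary required");
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return new String(out.toByteArray(), "UTF-8");
    }

    public static void main(String[] args) throws Exception {
        String original = "Gr\u00fc\u00dfe aus .NET";
        byte[] compressed = compress(original);
        System.out.println(decompress(compressed).equals(original)); // true
    }
}
```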

I've made sure the strings to be decompressed on the client exactly match the strings compressed on the server (as mentioned above, decompressing them on the server does work).


PROBLEM:

In general, the Java client is able to partly decompress the string compressed by the C# server.
But the longer the string gets, the worse the decompression success rate. E.g., the first 20 characters are decompressed 100% correctly, 20% of the next 20 characters are either not decompressed or decompressed wrongly, then 30% of the next 20 characters, and so on.


I'd appreciate any well-directed hints.
Thanks indeed.


---
Additional information:
1. I'm aware of GZIPInputStream, etc. I'm not using it because I ran into problems converting Java byte arrays to strings. In C# web applications, I can output strings or byte[], but not sbyte[] (the equivalent of the Java byte[] type).

2. Also, the GZIPOutputStream complained that it needed a dictionary (setDictionary()). I guess the solution lies somewhere around there.
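For context: in java.util.zip, preset dictionaries belong to Deflater and Inflater. GZIPOutputStream has no setDictionary() method. Inflater.needsDictionary() reports true only when the zlib stream header says the compressor used a preset dictionary (Deflater.setDictionary()), and the decompressor must then supply the same bytes. The following is a minimal sketch with a made-up dictionary; the class name is hypothetical. If no dictionary was set on the server, an unexpected dictionary request more likely means the received bytes differ from what was sent.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example: compressor and decompressor share one preset dictionary.
class PresetDictionaryDemo {

    static byte[] compress(byte[] input, byte[] dictionary) {
        Deflater deflater = new Deflater();
        deflater.setDictionary(dictionary); // set before the first deflate() call
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[512];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    static byte[] decompress(byte[] compressed, byte[] dictionary) throws DataFormatException {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[512];
        while (!inflater.finished()) {
            int n = inflater.inflate(buffer);
            if (n == 0 && !inflater.finished()) {
                if (inflater.needsDictionary()) {
                    // The stream header says a preset dictionary was used: supply the same bytes.
                    inflater.setDictionary(dictionary);
                } else if (inflater.needsInput()) {
                    throw new DataFormatException("compressed data is truncated");
                }
            }
            out.write(buffer, 0, n);
        }
        inflater.end();
        return out.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        byte[] dictionary = "text/plain; charset=utf-8".getBytes("US-ASCII");
        byte[] data = "Content-Type: text/plain; charset=utf-8".getBytes("US-ASCII");
        byte[] restored = decompress(compress(data, dictionary), dictionary);
        System.out.println(new String(restored, "US-ASCII"));
    }
}
```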

3. I'm also aware that IIS 6.0 can be set up to deliver gzipped dynamic content. I didn't want to go that way because, AFAIK, every IIS website on this server would be affected.

Thanks again.
Question by:robbert
     

    Expert Comment

    by:lcwiding
    How are you passing the compressed data from the server to the client? I have used both libraries to compress/decompress data with no problems. If you are streaming the data and processing it as it streams down, you may be hitting timeout conditions and not handling them correctly.

    Some example code from both sides might help figure out the issue as well.
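    One common pitfall when data is processed as it streams down is assuming that a single read() returns the whole body. Below is a minimal sketch that buffers the complete body before it is handed to the decompressor. The class name is hypothetical. In the real client the stream would presumably come from HttpURLConnection.getInputStream(); here a ByteArrayInputStream stands in so the sketch runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: buffer the complete response body before decompressing it.
class ResponseReader {

    // Reads until end of stream; one read() call may return only part of the data.
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for connection.getInputStream() so the sketch runs offline.
        InputStream body = new ByteArrayInputStream(new byte[] { 0x78, (byte) 0x9C, 0x03, 0x00 });
        System.out.println(readAll(body).length); // 4
    }
}
```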
     

    Author Comment

    by:robbert
    I'm using Response.Write(), after setting the content-encoding to UTF-8.

    I'm stepping through the server-side and client-side code at run time with the Visual Studio .NET and Eclipse debuggers: the string on the client is literally the same as the one sent from the server. Also, my test projects are kept simple to eliminate other possible sources of error.

    Remember: the first portions of the string get decompressed fine. In general, there is no fixed position at which decompression fails; the problem is that the longer the string gets, the more single characters fail to decompress. Even near the end of the string, some random characters may get "translated" correctly.

    Thanks.
     

    Expert Comment

    by:sciuriware
    Rest assured that GZIPInputStream and GZIPOutputStream are safe (up to 4 GB!!!!).
    The problem is certainly not in there; I have done a lot of compressing and expanding,
    so much that I hit the 4 GB barrier.
    ;JOOP!
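    For comparison, here is a minimal in-memory GZIPOutputStream/GZIPInputStream round trip. The class and method names are hypothetical. Like the Deflater code, it only works if the compressed bytes reach the reader unchanged. Also note that GZIPInputStream expects the GZIP format, which wraps deflate data differently from the zlib format Deflater produces by default, so the two APIs cannot be mixed between server and client.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical helper: GZIP round trip entirely in memory.
class GzipRoundTrip {

    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(bytes);
        gz.write(data);
        gz.close(); // finishes the deflate stream and writes the GZIP trailer
        return bytes.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = gz.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        gz.close();
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] original = "Gr\u00fc\u00dfe".getBytes("UTF-8");
        byte[] compressed = gzip(original);
        System.out.println(new String(gunzip(compressed), "UTF-8").equals("Gr\u00fc\u00dfe")); // true
    }
}
```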
     

    Author Comment

    by:robbert
    I guess the problem is the string encoding.

    HttpURLConnection.getContentType() returns "text/plain; charset=utf-8".
    I have to convert the string received to a byte array. I was using:

          byte[] bytes = new byte[str.length()]; // length is 35
          for (int i = 0; i < str.length(); i++)
          {
              char c = str.charAt(i);
              int charCode = c;
              bytes[i] = (byte)charCode; // keeps only the low 8 bits of each char
          }

    That way, ASCII characters get decoded fine, but umlauts (äöü), for example, get decoded as U+FFE4, U+FFF6 and U+FFFC.
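    Those three values are the Latin-1 codes of ä, ö and ü (0xE4, 0xF6, 0xFC) sign-extended to 16 bits. Java's byte is signed, the counterpart of C#'s sbyte (C#'s byte holds the same bit patterns as unsigned values), so casting a byte straight to char fills the high bits with ones. The sketch below shows the effect and the usual 0xFF mask; the class and method names are hypothetical. The mask only fixes this one symptom, and a String is still not a safe container for compressed data.

```java
// Hypothetical demo: Java's byte is signed, so byte-to-char conversion needs a mask.
class SignedByteDemo {

    // Sign-extends: 0xE4 (-28 as a Java byte) becomes U+FFE4.
    static char naiveToChar(byte b) {
        return (char) b;
    }

    // Masks to 0..255 first: 0xE4 becomes U+00E4 (a-umlaut).
    static char maskedToChar(byte b) {
        return (char) (b & 0xFF);
    }

    public static void main(String[] args) {
        byte b = (byte) 0xE4; // Latin-1 code of a-umlaut
        System.out.println(Integer.toHexString(naiveToChar(b)));  // ffe4
        System.out.println(Integer.toHexString(maskedToChar(b))); // e4
    }
}
```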

    I also tried java.nio.charset:

              Charset charset = Charset.forName("UTF-8");
              ByteBuffer byteBuffer = charset.encode(str);
              byte[] bytes = byteBuffer.array();  // length is 76

    but that resulted in the error java.util.zip.DataFormatException: invalid distance code
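    One detail in the snippet above: ByteBuffer.array() returns the entire backing array, which can be longer than the encoded data. That may partly explain 76 bytes for a 35-character string. The minimal sketch below copies only the bytes the encoder produced; the class name is hypothetical. It fixes the length issue but not the underlying problem of carrying binary data in a string.

```java
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical helper: copy exactly the encoded bytes out of the ByteBuffer.
class EncodeExactly {

    static byte[] encodeUtf8(String str) {
        ByteBuffer byteBuffer = Charset.forName("UTF-8").encode(str);
        // remaining() covers position..limit, i.e. only the bytes actually encoded
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(encodeUtf8("\u00e4\u00f6\u00fc").length); // 6
    }
}
```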
     

    Author Comment

    by:robbert
    BTW, String.getBytes() and getBytes(charsetName) resulted in other DataFormatExceptions.
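    Decoding compressed bytes as text is generally not reversible. Java's UTF-8 decoder replaces byte sequences that are not valid UTF-8 with U+FFFD, so getBytes() cannot restore the original bytes. A small sketch (class name hypothetical); 0x78 0x9C is the header of a zlib stream at the default compression level.

```java
import java.io.UnsupportedEncodingException;
import java.util.Arrays;

// Hypothetical demo: binary data does not survive a bytes -> String -> bytes trip.
class Utf8RoundTripDemo {

    // Decodes arbitrary bytes as UTF-8 and encodes them back again.
    static byte[] roundTrip(byte[] data) throws UnsupportedEncodingException {
        return new String(data, "UTF-8").getBytes("UTF-8");
    }

    public static void main(String[] args) throws Exception {
        // 0x9C and 0xFF cannot start a UTF-8 sequence, so they become U+FFFD.
        byte[] data = { 0x78, (byte) 0x9C, (byte) 0xFF };
        System.out.println(Arrays.equals(data, roundTrip(data))); // false
    }
}
```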
     

    Accepted Solution

    by:
    From what you have said here, it sounds like you are taking the output from the compressor and writing it out as a string to be read by the client application. The compressor's output is arbitrary binary data, not text, so without some form of binary-to-text encoding (the standard HTTP character encodings will not do), this will fail.

    The easiest encoding would be to convert each byte into two hexadecimal characters, write that stream out, and convert them back to bytes in the Java client (using Integer.parseInt(s, 16)).
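    Here is the Java half of the hexadecimal approach as a minimal sketch; the class and method names are hypothetical. The server would need equivalent C# code. The encoding doubles the size of the compressed data.

```java
import java.util.Arrays;

// Hypothetical hex codec: two hexadecimal characters per byte.
class HexCodec {

    private static final char[] DIGITS = "0123456789abcdef".toCharArray();

    // Encodes each byte as two lowercase hexadecimal characters.
    static String encode(byte[] data) {
        StringBuffer sb = new StringBuffer(data.length * 2);
        for (int i = 0; i < data.length; i++) {
            int v = data[i] & 0xFF; // treat the signed Java byte as 0..255
            sb.append(DIGITS[v >>> 4]).append(DIGITS[v & 0x0F]);
        }
        return sb.toString();
    }

    // Decodes pairs of hexadecimal characters back into bytes (expects an even length).
    static byte[] decode(String hex) {
        byte[] out = new byte[hex.length() / 2];
        for (int i = 0; i < out.length; i++) {
            out[i] = (byte) Integer.parseInt(hex.substring(2 * i, 2 * i + 2), 16);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] data = { 0x78, (byte) 0x9C, (byte) 0xFF };
        String hex = encode(data);
        System.out.println(hex);                             // 789cff
        System.out.println(Arrays.equals(data, decode(hex))); // true
    }
}
```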

    For a more compact conversion, you can look at Base64 encoding. Here is a link to some Java code that supports it: http://mindprod.com/products.html#BASE64
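    On Java 8 and later, the standard library includes java.util.Base64, so no third-party code is needed. It was not available in the Java 1.4 environment of this question. On the .NET side, Convert.ToBase64String produces standard Base64. A minimal round-trip sketch (class name hypothetical):

```java
import java.util.Arrays;
import java.util.Base64;

// Requires Java 8 or later; java.util.Base64 did not exist in Java 1.4.
class Base64Transport {

    static String encode(byte[] compressed) {
        return Base64.getEncoder().encodeToString(compressed);
    }

    static byte[] decode(String text) {
        return Base64.getDecoder().decode(text);
    }

    public static void main(String[] args) {
        byte[] compressed = { 0x78, (byte) 0x9C, (byte) 0xFF, 0x00 };
        String text = encode(compressed);
        System.out.println(text);                                    // eJz/AA==
        System.out.println(Arrays.equals(compressed, decode(text))); // true
    }
}
```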

    There may be easier ways to implement this, perhaps directly through a MIME type, but I have not worked with those much myself.
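    The binary route could look like the sketch below. It assumes the server writes the compressed bytes unchanged, for example with ASP.NET's Response.BinaryWrite and a Content-Type such as application/octet-stream; both are assumptions about the server side, not details from the thread. The client then never turns the compressed data into a String. The class and method names are hypothetical, and zlib() only stands in for the server so the sketch runs offline.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.DeflaterOutputStream;
import java.util.zip.InflaterInputStream;

// Hypothetical client-side helper: inflate a binary zlib body straight from the stream.
class BinaryBodyInflater {

    // Inflates the raw response bytes and only then decodes the text as UTF-8.
    static String inflateBody(InputStream body) throws IOException {
        InflaterInputStream in = new InflaterInputStream(body);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        in.close();
        return new String(out.toByteArray(), "UTF-8");
    }

    // Stand-in for the server: zlib-compresses the UTF-8 bytes of a string.
    static byte[] zlib(String text) throws IOException {
        ByteArrayOutputStream zipped = new ByteArrayOutputStream();
        DeflaterOutputStream deflate = new DeflaterOutputStream(zipped);
        deflate.write(text.getBytes("UTF-8"));
        deflate.close();
        return zipped.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        // In the real client this stream would be connection.getInputStream().
        InputStream body = new ByteArrayInputStream(zlib("Gr\u00fc\u00dfe"));
        System.out.println(inflateBody(body));
    }
}
```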

     

    Author Comment

    by:robbert
    > it sounds like you are taking the output from the compressor, and writing that out as a string to be read by the client application. Without some form of encoding (and standard HTTP encoding will not work), this will fail.

    That's what I'm doing, and in the meantime it has started working!
    For the record, on the server the data is compressed as in this sample: http://msdn.microsoft.com/msdnmag/issues/03/06/ZipCompression/default.aspx

    On the client side, I first read the bytes returned by the HttpURLConnection. Then:
    String x = new String(bytes, "UTF-8");  // that's all
    Then I decompress the string with Java code that is nearly identical to the code in the link above.

    > look at Base64 encoding

    Sorry to contradict you, but I've tried that. Base64 encoding blows up the input strings: even after compression, the result is larger than the original string.
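    Here is the rough arithmetic behind that observation. Base64 emits 4 characters per started group of 3 bytes (4 * ceil(n/3) in total) and hex emits 2 per byte. A zlib stream also adds a 2-byte header and a 4-byte Adler-32 trailer. For short strings that barely compress, the encoded result can therefore exceed the original. The sketch below measures this for a given input; the class, method names and sample string are hypothetical.

```java
import java.util.zip.Deflater;

// Hypothetical size check: raw bytes vs. zlib-compressed vs. text-encoded lengths.
class SizeComparison {

    // Returns the number of bytes Deflater produces for the input.
    static int deflatedLength(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        deflater.setInput(input);
        deflater.finish();
        byte[] buffer = new byte[256]; // output is only counted, so the buffer is reused
        int total = 0;
        while (!deflater.finished()) {
            total += deflater.deflate(buffer);
        }
        deflater.end();
        return total;
    }

    // Base64 turns every started group of 3 bytes into 4 characters.
    static int base64Length(int n) {
        return 4 * ((n + 2) / 3);
    }

    // Hex uses 2 characters per byte.
    static int hexLength(int n) {
        return 2 * n;
    }

    public static void main(String[] args) throws Exception {
        byte[] raw = "Short strings barely compress at all".getBytes("UTF-8");
        int z = deflatedLength(raw);
        System.out.println("raw=" + raw.length + " zlib=" + z
                + " base64=" + base64Length(z) + " hex=" + hexLength(z));
    }
}
```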

    - Thank you for your support, though.  Getting input is very valuable to me.
