Best way to see if a file is compressed

ryno71
ryno71 asked
Last Modified: 2010-08-05
Hi

If I want to see if a file is compressed, what is the best way? Checking whether the ZipEntry is null might work, but if the file is corrupted, that wouldn't be accurate.


Thanks
ryno71
Test for an extension .zip or .ZIP

If a zip-file is named another way somebody is cheating.

;JOOP!

Author

Commented:
That's just it: what if I don't know the name? If the file was sent to me as a byte array, say, I wouldn't know.
File f;   // Obtained from a directory listing, so its name is not known in advance.

     if(f.getAbsolutePath().toUpperCase().endsWith(".ZIP"))
     {
         // compressed
     }

;JOOP!

Author

Commented:
If I am the one naming the new file I receive, and I don't know what the original name was, this won't work.
OK then, if the first 8 bytes are (octal):

 0120, 0113, 003, 004, 024, 000, 002, 000

then you may assume it's a ZIP compressed file.

;JOOP!
You can read those 8 bytes from a raw InputStream from the file.

;JOOP!
Correction: I did some research on other zip files:

only 4 bytes are enough:

0120, 0113, 003, 004

;JOOP!
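
For the byte-array case raised earlier in the thread, the four-byte check could be sketched as below. The class and method names (ZipSignature, looksLikeZip) are hypothetical, not from the thread. A match only means the data starts with a ZIP signature; it says nothing about whether the rest of the archive is intact. The sketch also accepts "PK\005\006", the end-of-central-directory signature that an empty archive starts with, which is an addition beyond what was posted above.

```java
import java.util.Arrays;

public class ZipSignature
{
   // "PK\003\004": local file header signature at the start of a normal ZIP archive.
   private static final byte[] LOCAL_HEADER = {0x50, 0x4B, 0x03, 0x04};
   // "PK\005\006": end-of-central-directory signature; an empty archive starts with it.
   private static final byte[] EMPTY_ARCHIVE = {0x50, 0x4B, 0x05, 0x06};

   /** Hypothetical helper: true if the data begins with a ZIP signature. */
   public static boolean looksLikeZip(byte[] data)
   {
      if(data == null || data.length < 4)
      {
         return false;
      }
      // Compare contents with Arrays.equals; byte[].equals only compares references.
      byte[] head = Arrays.copyOf(data, 4);
      return Arrays.equals(head, LOCAL_HEADER) || Arrays.equals(head, EMPTY_ARCHIVE);
   }

   public static void main(String[] args)
   {
      byte[] zipLike = {0x50, 0x4B, 0x03, 0x04, 0x14, 0x00};
      byte[] text = "hello".getBytes();
      System.out.println(looksLikeZip(zipLike));
      System.out.println(looksLikeZip(text));
   }
}
```

Octal 0120, 0113, 003, 004 and hex 0x50, 0x4B, 0x03, 0x04 are the same four bytes.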

Author

Commented:
Something like this?

I keep getting false

public class BytesFromFile
{

public static byte[] getBytesFromFile(File file) throws IOException {
        InputStream is = new FileInputStream(file);
   
        // Get the size of the file
        long length = file.length();
   
        // You cannot create an array using a long type.
        // It needs to be an int type.
        // Before converting to an int type, check
        // to ensure that file is not larger than Integer.MAX_VALUE.
        if (length > Integer.MAX_VALUE) {
            // File is too large
        }
   
        // Create the byte array to hold the data
        byte[] bytes = new byte[4];
            
            is.read(bytes);
            
      
            
            byte[] bytes1= new byte[] {0120,0113,003,004};

            boolean bo = Arrays.equals(bytes, bytes1);
            boolean bo1 = bytes.equals(bytes1);
            
            System.out.println("result is "+ result);
            System.out.println("bo is "+ bo);
            System.out.println("bo1 is "+ bo1);
        // Read in the bytes
        int offset = 0;
        int numRead = 0;
        while (offset < bytes.length
               && (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
            offset += numRead;
        }
   
        // Ensure all the bytes have been read in
        if (offset < bytes.length) {
            throw new IOException("Could not completely read file "+file.getName());
        }
   
        // Close the input stream and return bytes
        is.close();
        return bytes;
    }

      
      
      
      
      public static void main(String args[]) {
               
                    BytesFromFile da = new BytesFromFile();
               try
               {
                    System.out.println(" ");
                              System.out.println("Zip1 ");
                              System.out.println(" ");
                              String fileName=args[0];
                              byte[] bytes =null;
                              File temp=null;
                              temp = new File(fileName);
                    bytes=da.getBytesFromFile(temp);
                        
                              System.out.println("bytes are "+bytes);
                              
                              
               }
               catch (Exception e1)
               {
                    e1.printStackTrace();
               }
          }
      }
better:

import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class TestForCompression
{
   /**
    * Program entry point.
    *
    * @param commandLine program command line vector.
    * @throws IOException
    */
   public static void main(String[] commandLine) throws IOException
   {
      byte[] bytes = new byte[] {0120,0113,003,004};   // "PK\003\004" in octal
      byte[] data = new byte[4];

      if(commandLine.length > 0)
      {
         // try-with-resources closes the file on every path, including the early returns.
         try(RandomAccessFile ra = new RandomAccessFile(new File(commandLine[0]), "r"))
         {
            if(ra.read(data) != data.length)
            {
               System.out.println("Short file, not compressed.");
               return;
            }

            for(int i = 0;  i < data.length;  ++i)
            {
               if(data[i] != bytes[i])
               {
                  System.out.println("Not compressed.");
                  return;
               }
            }
            System.out.println("Compressed.");
         }
      }
   }
}

Always test your code!

;JOOP!

Author

Commented:
Thanks. That will work better!
:)

Author

Commented:
sciuriware

Where did you find that you need to look at four bytes for a compressed file (PKZIP or GZIP)? Couldn't it be two?

ryno71
From the SUN JAVA sources:

ZIP      "PK\003\004"
GZIP    0x8B1F

ZIP is 4 bytes all the time. The GZIP value is a 2-byte magic number stored low byte first, so a GZIP file starts with 0x1F, 0x8B.

;JOOP!
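
The two signatures quoted from the JDK sources could be checked together roughly as follows. This is a minimal sketch: the class and method names (MagicNumbers, detect) are made up for illustration, and the only JDK symbol relied on is the public constant GZIPInputStream.GZIP_MAGIC. It reports what the leading bytes suggest, nothing more.

```java
import java.util.zip.GZIPInputStream;

public class MagicNumbers
{
   /** Hypothetical helper: names the format suggested by the leading bytes. */
   public static String detect(byte[] head)
   {
      if(head == null || head.length < 2)
      {
         return "unknown";
      }
      // GZIP_MAGIC is 0x8b1f; the file stores it low byte first: 0x1F, 0x8B.
      int firstTwo = (head[0] & 0xFF) | ((head[1] & 0xFF) << 8);
      if(firstTwo == GZIPInputStream.GZIP_MAGIC)
      {
         return "gzip";
      }
      // ZIP needs all four bytes: 'P', 'K', 3, 4.
      if(head.length >= 4 && head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4)
      {
         return "zip";
      }
      return "unknown";
   }

   public static void main(String[] args)
   {
      System.out.println(detect(new byte[] {0x1F, (byte) 0x8B, 0x08, 0x00}));
      System.out.println(detect(new byte[] {'P', 'K', 3, 4}));
   }
}
```

The & 0xFF masks matter because Java bytes are signed; without them 0x8B would be sign-extended and the comparison with GZIP_MAGIC would fail.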

Author

Commented:
Where did you find this? I've been looking and can't seem to find it!

Thanks a lot!
The JDK is accompanied by "src.zip"

;JOOP!