Dorababu M (India)

asked on

iTextSharp.text.exceptions.InvalidPdfException: PDF header signature not found

I have the following method, in which I convert the bytes to a PDF:
public static byte[] concatAndAddContent(List<byte[]> pdf)
    {
        byte [] all;

        using(MemoryStream ms = new MemoryStream())
        {
            Document doc = new Document();

            PdfWriter writer = PdfWriter.GetInstance(doc, ms);

            doc.SetPageSize(PageSize.LETTER);
            doc.Open();
            PdfContentByte cb = writer.DirectContent;
            PdfImportedPage page;

            PdfReader reader;
            foreach (byte[] p in pdf)
            {
                reader = new PdfReader(p);
                int pages = reader.NumberOfPages;

                // loop over document pages
                for (int i = 1; i <= pages; i++)
                {
                    doc.SetPageSize(PageSize.LETTER);
                    doc.NewPage();
                    page = writer.GetImportedPage(reader, i);
                    cb.AddTemplate(page, 0, 0);
                }
            }

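            doc.Close();

            // ToArray() returns only the bytes actually written and still works after
            // PdfWriter has closed the stream; GetBuffer() returns the whole internal
            // buffer, which can include unused trailing bytes. The using block disposes
            // ms, so no explicit Flush() or Dispose() is needed.
            all = ms.ToArray();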
        }

        return all;
    }


I'm getting an error saying "PDF header signature not found".
Jonathan D. (Israel)

Which line throws the exception? If possible, please also post a screenshot of the exception dialog.
Dorababu M (ASKER)

This line throws the exception:

reader = new PdfReader(p);
A wild guess (before I go in depth): you're creating a new PdfReader instance on each iteration of the loop while reading the document's byte content, which means each instance gets a different region of the document rather than a complete document. You need to read the entire document's bytes and pass them, as a memory stream, to a single PdfReader instance that parses the document. Create that instance outside the scope of the foreach loop.
But new PdfReader() won't compile, because the constructor requires arguments.
According to the documentation, the constructor reads an entire PDF document, either as a file on disk (by path) or as a series of bytes (a byte array, or a stream such as a memory stream or a file stream), among other options.

Please tell me more about the static method you're working with, concatAndAddContent. What is List<byte[]> pdf? Is each entry the raw bytes of a PDF document? If so, you can wrap it in a MemoryStream and pass that stream to the PdfReader constructor. Please provide as much information as you can.
I am developing a Windows application in which I get images from Azure Blob Storage and convert them to streams.
So basically, you want to create a new PDF file consisting of a series of images you retrieve from the blob server, right? What format are the images? How are they represented in the program's code? Please provide a little more code so we can help you restructure the logic of your program. I have a feeling I'll solve this issue for you in my next comment; just bear with us.
The images can be GIF, JPEG, or PNG.
@Dorababu M Please give me full answers to my questions; you're answering only one of them. I need your full cooperation, so please reread what I've asked.
Yes, I need to create a new PDF file from the series of files I get from Azure. The images are mostly GIFs.

Here is the code I am trying:
// Download one blob into memory and keep its raw bytes.
// Note: these bytes are the original image file (e.g. a GIF), not a PDF.
List<byte[]> l2 = new List<byte[]>();
string blobstorageconnection = "connection";
CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(blobstorageconnection);
CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference("container");
CloudBlockBlob blockBlob = cloudBlobContainer.GetBlockBlobReference(Process(rdr));

using (MemoryStream memoryStream = new MemoryStream()) {
  // DownloadToStreamAsync already fetches the whole blob, so reuse its bytes
  // instead of opening and reading the blob a second time.
  await blockBlob.DownloadToStreamAsync(memoryStream);
  l2.Add(memoryStream.ToArray());
}


I pass that list of byte arrays to the function I shared earlier.
Did you verify that the images are being retrieved correctly? I want to narrow down the process step by step with you until we reach the solution. We're really close, so pay attention to what I'm asking :)

Can you tell me what each byte[] in the list you accept as a parameter holds? Is it a raw image or a PDF file?

Edit: Naming conventions are really important too. You've named the parameter (the list of byte arrays) pdf, which suggests you're expecting raw PDF files rather than images from the blob server. Good names are very important for understanding the flow of a program's logic.
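PdfReader reports "PDF header signature not found" when it cannot find the %PDF- marker at the start of the data it is given. One quick way to answer "what does each byte[] hold?" is to test for that marker before constructing a PdfReader. This is a minimal sketch: PdfSniffer and LooksLikePdf are hypothetical names, and the 1024-byte search window is an assumption modeled on the common practice of tolerating a little leading junk before the header.

```csharp
using System;
using System.Text;

public static class PdfSniffer
{
    // "%PDF-" is the header marker a PDF file begins with.
    private static readonly byte[] PdfMarker = Encoding.ASCII.GetBytes("%PDF-");

    // Hypothetical helper: true if "%PDF-" occurs within the first 1024 bytes
    // (an assumed tolerance for leading junk before the header).
    public static bool LooksLikePdf(byte[] data)
    {
        if (data == null) return false;
        ReadOnlySpan<byte> head = data.AsSpan(0, Math.Min(data.Length, 1024));
        return head.IndexOf(new ReadOnlySpan<byte>(PdfMarker)) >= 0;
    }
}
```

Calling PdfSniffer.LooksLikePdf(p) for each entry before new PdfReader(p) would show immediately whether the list contains PDFs or something else, such as raw GIF bytes.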
Hi, yes. When I test the code with the following action, the image does get downloaded:

public async Task<IActionResult> Download(string blobName)
{
    CloudBlockBlob blockBlob;
    await using (MemoryStream memoryStream = new MemoryStream())
    {
        string blobstorageconnection = _configuration.GetValue<string>("blobstorage");
        CloudStorageAccount cloudStorageAccount = CloudStorageAccount.Parse(blobstorageconnection);
        CloudBlobClient cloudBlobClient = cloudStorageAccount.CreateCloudBlobClient();
        CloudBlobContainer cloudBlobContainer = cloudBlobClient.GetContainerReference("filescontainers");
        blockBlob = cloudBlobContainer.GetBlockBlobReference(blobName);
        await blockBlob.DownloadToStreamAsync(memoryStream);
    }

    // Await the read stream instead of blocking the request thread with .Result.
    Stream blobStream = await blockBlob.OpenReadAsync();
    return File(blobStream, blockBlob.Properties.ContentType, blockBlob.Name);
}


But when I try to convert the same data to bytes and then to PDF, it doesn't work as expected.
Ah, I see what you mean. Earlier we stored the data in a varbinary column, so the code worked; now that the data has moved to Azure, I am trying to convert the GIFs to PDF.
The Download action downloads a single blob (an image file) from Azure Blob Storage and is called multiple times; each result is converted to a byte array, and the list of byte arrays is then passed to concatAndAddContent. Right? Is this the correct order in the chain?
I wrote that action only to test, from the MVC application, whether the image gets downloaded. In the real flow it will be called once to serialize the images and convert them to PDF.
What is the purpose of concatAndAddContent? I see you're creating a PDF document there, hopefully containing the images, but you're returning a byte array. So where does the PDF document come into play? Answer me this, as I'm preparing a solution for you.

Edit: I've concluded that PdfReader is throwing the exception because it expects a document in PDF format, while what you're passing it is the bytes of a GIF image. I will restructure the block of code to match your needs.
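The diagnosis above can be confirmed from the data itself: image files begin with well-known "magic" bytes, so each byte[] in the list can be identified before it reaches PdfReader. A minimal sketch covering only the three formats mentioned in this thread; FileSignature and Detect are hypothetical names, and a real check might also want to handle other formats.

```csharp
using System;

public static class FileSignature
{
    // Leading "magic" bytes of the image formats mentioned in this thread.
    private static readonly byte[] Gif87a = { 0x47, 0x49, 0x46, 0x38, 0x37, 0x61 }; // "GIF87a"
    private static readonly byte[] Gif89a = { 0x47, 0x49, 0x46, 0x38, 0x39, 0x61 }; // "GIF89a"
    private static readonly byte[] Png = { 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A };
    private static readonly byte[] Jpeg = { 0xFF, 0xD8, 0xFF };

    // Hypothetical helper: returns "gif", "png", "jpeg", or "unknown".
    public static string Detect(byte[] data)
    {
        ReadOnlySpan<byte> d = data ?? Array.Empty<byte>();
        if (HasSignature(d, Gif87a) || HasSignature(d, Gif89a)) return "gif";
        if (HasSignature(d, Png)) return "png";
        if (HasSignature(d, Jpeg)) return "jpeg";
        return "unknown";
    }

    // True if the data starts with the given signature bytes.
    private static bool HasSignature(ReadOnlySpan<byte> data, byte[] signature) =>
        data.StartsWith(new ReadOnlySpan<byte>(signature));
}
```

Anything reported as gif, png, or jpeg is image data, which PdfReader will reject with the "PDF header signature not found" error; it needs to be decoded as an image instead.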
Hi Jonathan, a user can have multiple files associated with them, so I get the details from the database, loop through them page by page, and attach them all to the PDF. For example, if I get 10 images from the database, I loop through all of them, fetch each one, and convert them into a PDF.
Why are you working with raw bytes instead of an abstract object like Bitmap or Image and attaching them directly to the document? You can check out the Image class in the documentation for this purpose. Consider rethinking the logic of your program.
But from Azure we get a path; how can I place that on the PDF?
How are you receiving it from blob storage? As a byte array per image? If so, you can decode each image and then attach it to the single document.
Can I get the code? When I write the stream or byte array to an image, the image doesn't load properly. I used SixLabors.ImageSharp:

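using SixLabors.ImageSharp;
using SixLabors.ImageSharp.PixelFormats;
using SixLabors.ImageSharp.Processing;

// Copy the blob into memory, then decode it with ImageSharp.
using (Stream blobStream1 = await blockBlob.OpenReadAsync())
using (MemoryStream ms = new MemoryStream()) {
  byte[] buffer = new byte[16 * 1024];
  int read;
  while ((read = blobStream1.Read(buffer, 0, buffer.Length)) > 0) {
    ms.Write(buffer, 0, read);
  }
  // Image<Rgba32> is IDisposable; note that Grayscale() drops all color information.
  using (Image<Rgba32> image = Image.Load<Rgba32>(ms.ToArray())) {
    image.Mutate(x => x.Grayscale());
    try {
      image.Save(@"C:\Bugs\test.gif");
    } catch (Exception ex) {
      // Don't swallow the error silently.
      Console.WriteLine(ex);
    }
  }
}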


But after saving, the image can't be viewed properly; it looks compressed.
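The 16 KB read loop in the ImageSharp snippet copies blobStream1 into a MemoryStream by hand. Stream.CopyToAsync performs the same buffered copy, so the loop can shrink to one call. A minimal sketch; StreamBytes and ReadAllBytesAsync are hypothetical names, and the helper reads from the stream's current position to its end.

```csharp
using System.IO;
using System.Threading.Tasks;

public static class StreamBytes
{
    // Hypothetical helper: reads a stream from its current position to the end
    // and returns the bytes. CopyToAsync does the buffered read/write loop.
    public static async Task<byte[]> ReadAllBytesAsync(Stream source)
    {
        using (var ms = new MemoryStream())
        {
            await source.CopyToAsync(ms);
            return ms.ToArray();
        }
    }
}
```

Usage in the snippet would be byte[] bytes = await StreamBytes.ReadAllBytesAsync(blobStream1); followed by Image.Load<Rgba32>(bytes).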
ASKER CERTIFIED SOLUTION
Jonathan D. (Israel)

Btw, did you read this?

PLEASE NOTE: iTextSharp is EOL, and has been replaced by iText 7. Only security fixes will be added

We HIGHLY recommend customers use iText 7 for new projects, and to consider moving existing projects from iTextSharp to iText 7 to benefit from the many improvements such as:
- HTML to PDF (PDF/A) conversion
- PDF Redaction
- SVG support
- Better language support (Indic, Thai, Khmer, Arabic, Hebrew)
- PDF Debugging for your IDE
- Data Extraction
- Better continued support and bugfixes
- More modular, extensible handling of your document workflow
- Extra practical add-ons
- Encryption, hashing and digital signatures
Looks like there is no Bitmap in .NET Core.
Still no luck. The image is getting compressed, and I'm not sure why.
Looks like there is no Bitmap in .NET Core.

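According to the documentation, Bitmap is available starting with .NET Core 3.0 (and 3.1 after it) through the System.Drawing.Common package. So unless you're on an earlier version of .NET Core it should be available; if you are, you should consider upgrading.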
Still no luck. The image is getting compressed, and I'm not sure why.

What do you mean by "image getting compressed"? Is the image quality degrading? Apart from that, are you able to attach the image to the PDF document successfully?
I tried saving the Bitmap without even attaching it to the PDF, and the image quality doesn't match the original image.
Ah, I found the problem: I was getting the bytes of the thumbnail image, which is why it wasn't working. Now, can you tell me how to write those bytes to a PDF?
Now, can you tell me how to write those bytes to a PDF?

You mean how to attach the images (Bitmap class) to the PDF document? Take a look at this snippet from Stack Overflow and try to figure out how to implement such logic. Basically, you use Image.GetInstance to read the image into an Image object, and then add that Image object to the Document object.
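The approach described above (read each image with Image.GetInstance, then add it to the Document) can be sketched as follows. This is a minimal sketch, assuming iTextSharp 5.x and that each byte[] is a complete GIF, JPEG, or PNG file rather than a thumbnail; ImagePdfBuilder and ImagesToPdf are hypothetical names, and the Letter page size and half-inch margins simply mirror the original concatAndAddContent. It skips System.Drawing.Bitmap entirely and hands the raw bytes to iTextSharp.

```csharp
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class ImagePdfBuilder
{
    // Hypothetical helper: one Letter-size page per image, each image scaled
    // to fit inside the margins. Each byte[] must be a complete GIF/JPEG/PNG file.
    public static byte[] ImagesToPdf(IEnumerable<byte[]> imageFiles)
    {
        const float margin = 36f; // 0.5 inch (PDF user space has 72 units per inch)
        float maxWidth = PageSize.LETTER.Width - 2 * margin;
        float maxHeight = PageSize.LETTER.Height - 2 * margin;

        using (var ms = new MemoryStream())
        {
            var doc = new Document(PageSize.LETTER, margin, margin, margin, margin);
            PdfWriter.GetInstance(doc, ms);
            doc.Open();

            foreach (byte[] imageBytes in imageFiles)
            {
                // Image.GetInstance decodes the image bytes themselves;
                // PdfReader would reject them because they are not a PDF.
                Image image = Image.GetInstance(imageBytes);
                image.ScaleToFit(maxWidth, maxHeight);
                doc.NewPage(); // no-op while the current page is still empty
                doc.Add(image);
            }

            // Close() writes the PDF trailer; iTextSharp throws here if no
            // image was added, because a PDF needs at least one page.
            doc.Close();
            return ms.ToArray();
        }
    }
}
```

In the asker's flow, the List<byte[]> of blob bytes could be passed straight to ImagesToPdf in place of the PdfReader loop in concatAndAddContent.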