C# Concatenated text or tiff files

This is a unique project. We are trying to separate a concatenated TIFF or text file. When we read the file, we only get the first image even though there are more in it. This is not a multi-page TIFF file; it is several TIFF files concatenated together. Is there a special character that I must look for to move to the next file? When I look at the file in Notepad++, I can see all the binary data, with each file starting with "II*". All help is appreciated.

kaufmed   ( ⚆ _ ⚆ )Patches? We Ain't Got No Patches! We Don't Need No Patches! I Don't Have to Push You No Stinkin' Patches!Commented:
I'm confused:  Where does the text file come into play?
rkspence (Author) Commented:
When you view the tiff file in binary format it acts just like a text file.
kaufmed Commented:
Ah, well that's a function of the text editor you are viewing the file with, not the file itself.

According to Wiki, the "magic bytes" for tiff files can come in two varieties (textually):  II*. and MM.*. If these files were truly concatenated together at a binary level, then you should be able to loop the bytes until you find these markers. Once you find one, you are at the next file.

When working with binary files (like images), you generally work with the bytes, and you do not treat them as characters. So the corresponding byte representations of the magic numbers I referenced above are (respectively): 49 49 2A 00 and 4D 4D 00 2A.
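As a rough illustration of the byte-level search described above, here is a minimal sketch that scans a byte array for both magic numbers. The class and method names (TiffMagicScanner, FindMagicOffsets) are made up for this example and are not from any library.

```csharp
using System;
using System.Collections.Generic;

public static class TiffMagicScanner
{
    // TIFF header magic numbers: "II*\0" (little-endian) and "MM\0*" (big-endian).
    private static readonly byte[] LittleEndianMagic = { 0x49, 0x49, 0x2A, 0x00 };
    private static readonly byte[] BigEndianMagic = { 0x4D, 0x4D, 0x00, 0x2A };

    // Returns every offset in data at which either magic number begins.
    public static List<int> FindMagicOffsets(byte[] data)
    {
        var offsets = new List<int>();
        for (int i = 0; i + 4 <= data.Length; i++)
        {
            if (MatchesAt(data, i, LittleEndianMagic) || MatchesAt(data, i, BigEndianMagic))
            {
                offsets.Add(i);
            }
        }
        return offsets;
    }

    // Compares pattern against data starting at index start (caller guarantees it fits).
    private static bool MatchesAt(byte[] data, int start, byte[] pattern)
    {
        for (int j = 0; j < pattern.Length; j++)
        {
            if (data[start + j] != pattern[j])
            {
                return false;
            }
        }
        return true;
    }
}
```

Calling TiffMagicScanner.FindMagicOffsets(File.ReadAllBytes(path)) would give candidate start positions. The same four bytes can also occur by chance inside compressed image data, so the offsets are best treated as candidates rather than guaranteed file boundaries.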
rkspence (Author) Commented:
Yes, you are absolutely correct, and this is what we are trying to do. Can you please give us an example in C# of how to loop for this configuration? We are only getting the first file as we stream the file, and we are not sure how to handle the binary format.
kaufmed Commented:
What is the end result? What are you trying to do with each image?
rkspence (Author) Commented:
These are check images from the bank, and we are trying to split each front and back into separate TIFF files to import into a document imaging system. We used Notepad++ to see if multiple image files really were there. We were able to delete binary data but not copy and paste it in that program. By deleting all the binary data from the second instance of II back to the beginning of the file, we were able to see the next image, so it really is a concatenated TIFF file and not a multi-page one (viewers only show the first image).

We thought the easiest way to break them apart again would be to look for the "II" sequence in the stream. The bank also provided a text data file with start positions and lengths which appear to be byte related. Here is some sample data for the first ten images:

Start Position     Length    Image
000000000000000000 000011486 Check#1 Front
000000000000011486 000007010 Check#1 Back
000000000000018496 000012370 Check#2 Front
000000000000030866 000014120 Check#2 Back
000000000000044986 000011618 Check#3 Front
000000000000056604 000008887 Check#3 Back
000000000000065491 000011522 Check#4 Front
000000000000077013 000009119 Check#4 Back
000000000000086132 000017028 Check#5 Front
000000000000103160 000016444 Check#5 Back

Sorry I can't post the TIFF file but it contains all 356 live check images and I don't know how to mock one up as a fake sample. It would be neater if the start position and length data could be used in C# to process the TIFF files for each image. That file also contains check#, account#, etc which we would use to rename the files as we process them.
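To make the idea of driving the split from the bank's data file concrete, here is a small sketch that parses one line of the sample shown above. It assumes the layout is exactly "start position, length, free-text label" separated by whitespace; the real file also carries check number, account number, and so on, so the field handling would need adjusting. ImageIndexEntry and ImageIndexParser are hypothetical names.

```csharp
using System;
using System.Globalization;

// One row of the bank's index file (assumed layout: start, length, label).
public sealed class ImageIndexEntry
{
    public long StartPosition { get; set; }
    public int Length { get; set; }
    public string Label { get; set; }
}

public static class ImageIndexParser
{
    // Parses a line such as "000000000000011486 000007010 Check#1 Back".
    public static ImageIndexEntry ParseLine(string line)
    {
        // Split into at most three pieces so the label keeps its inner spaces.
        string[] parts = line.Trim().Split(new[] { ' ', '\t' }, 3, StringSplitOptions.RemoveEmptyEntries);
        if (parts.Length < 2)
        {
            throw new FormatException("Expected a start position and a length: " + line);
        }

        return new ImageIndexEntry
        {
            // Leading zeros are accepted by the integer parsers.
            StartPosition = long.Parse(parts[0], CultureInfo.InvariantCulture),
            Length = int.Parse(parts[1], CultureInfo.InvariantCulture),
            Label = parts.Length > 2 ? parts[2].Trim() : string.Empty
        };
    }
}
```

In the sample data each start position equals the previous start plus the previous length (for example 11486 + 7010 = 18496), which is a cheap consistency check to run over the parsed entries before splitting anything.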

Once they are split into individual TIFF files we would like to combine the front and back image pairs into single true multipage TIFFs; one for each check.
Joe Winograd (Developer) Commented:
I've never seen a TIFF file concatenated like that and I have no idea why the bank would do it that way. If it were me, I'd press the bank for a more standard file format. That said, my first question is if the byte string beginning at [Start Position] and running for [Length] bytes is a well-formed TIFF file. If it is, then it should be easy to read the entire file into a string variable and then loop through the string, selecting each substring beginning at [Start Position] and for [Length] bytes, writing each such substring to a file. You don't have to "handle the binary format" in any way, since you know the starting byte number and the number of bytes for each file – just parse the string that way, ignoring any meaning to the binary format. The text data file also seems to provide a file name for each file, e.g., <check_1_front.tif>, <check_1_back.tif>, <check_2_front.tif>, <check_2_back.tif>, etc. I'm not a C# programmer, but could write such a program in a language that I know, and I'm sure C# has enough functionality as a programming language for this to be done. Of course, if each substring from [Start Position] for [Length] bytes is not a well-formed TIFF file, then the whole idea is for naught.

Once you have the individual TIFF files for front and back, there are various ways to combine each front-and-back set into a multi-page (in this case, two-page) TIFF. I'd probably use the "/multitif" option of IrfanView, as shown in this EE article. Regards, Joe
rkspence (Author) Commented:
Unfortunately the bank is not being helpful. Their handbook for using the files appears to have been written for mainframe programmers and actually says they will offer no support for parsing or splitting the data and that customers will need the expertise to do those functions.

We are able to split off the first image file using both VBA and C#. The current problem appears to be that BinaryReader.ReadBytes is not reading the entire file. Like the TIFF viewer programs, it only sees the first image. We know it is working, as we have split off a viewable TIFF file of 15 KB out of the full 4 MB file, but we still can't move to the next image.

I'm guessing it is hitting some type of End of File marker. Is there some way to tell ReadBytes to ignore EOF and read the entire concatenated file?
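On the EOF question: a file opened in binary mode has no in-band end-of-file byte, and BinaryReader.ReadBytes(count) simply returns up to count bytes from the current position. A likely explanation, offered as a guess, is that only the first image's length was requested or the stream position was never advanced. The sketch below (WholeFileReader is a made-up name) reads the whole file through a BinaryReader so the byte count can be compared with the size on disk.

```csharp
using System.IO;

public static class WholeFileReader
{
    // Reads every byte of the file through a BinaryReader.
    public static byte[] ReadAll(string path)
    {
        using (FileStream input = File.OpenRead(path))
        using (BinaryReader reader = new BinaryReader(input))
        {
            // ReadBytes returns at most the requested count, so ask for the full length.
            // The checked cast throws if the file is too large for a single array read.
            return reader.ReadBytes(checked((int)input.Length));
        }
    }
}
```

If WholeFileReader.ReadAll(path).Length matches new FileInfo(path).Length, nothing is stopping the read early, and the remaining work is only a matter of picking the right offsets.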
Joe Winograd (Developer) Commented:
Sorry...don't know ReadBytes. Perhaps a C# expert familiar with it will jump in. In the meantime, if you're willing to try another language, I recommend AutoHotkey (AHK), an excellent (free!) programming/scripting language. There have been several forks of the original language and my preferred one now is AutoHotkey_L. It comes with a Windows installer, as well as a compiler that turns the AHK source code (plain text) into a stand-alone/no-install executable (an EXE file).

If you're willing to give it a try, attached is an AHK program (the source code in a text file) that will read a binary file into a variable and then write the variable out to a binary file. I just ran it on a multi-page TIFF file and it worked perfectly – the output file is identical to the input file (determined by using a binary file comparison program). Change these two lines of code to point to your test files:


It would be very interesting to know if this AHK program gets your entire check file or if it also stops at an EOF marker. I'd be happy to test it for you, but you said that you can't post the TIFF file. So to test it yourself, all you need to do is install AutoHotkey and it will own the AHK file type – just double-click the attached AHK source code after downloading it and AutoHotkey will run it (no need to compile it into an EXE until and unless you want a stand-alone executable). Since the source code is provided, you may see for yourself that the code is not malicious.

Note that the AHK function that reads the file into a variable (readfiletovar) takes as parameters a starting location and a number of bytes to read. So you could use the bank's text data file to drive the file-reading once we know that the basic concept is sound. But the attached AHK program simply reads (and writes) the whole file just to see if there's an EOF issue. Regards, Joe
rkspence (Author) Commented:
Joe, Thanks for the suggestion of AutoHotKey. We need to end up with a .exe we can call from a script. I really prefer C# as we are trying to standardize on it. We were able to do a proof of concept in VBA using the Start Position and Byte Length provided in the Bank's text file:

Private Sub Command0_Click()
    Dim FileIn As Long
    Dim FileOut As Long
    Dim StartPos As Long
    Dim Length As Long
    Dim ImageData As String

    'I hard coded in sample data for testing
    StartPos = 2613076
    Length = 10786

    'This next line was apparently the trick.
    'I created an empty string the exact length needed for the image file.
    ImageData = String(Length, " ")

    FileIn = FreeFile()
    Open "C:\input.tiff" For Binary Access Read As #FileIn
    'The +1 below was needed to correct start position
    '(Get counts bytes from 1, while the bank's positions count from 0)
    Get #FileIn, StartPos + 1, ImageData
    Close #FileIn

    'Ask for the output file number only after the input file is closed
    FileOut = FreeFile()
    Open "C:\Output.tiff" For Binary Access Write As #FileOut
    Put #FileOut, , ImageData
    Close #FileOut
End Sub

This successfully writes a TIFF file with the single image specified.
Does anyone know what the equivalent of this would be in C#?
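A rough C# counterpart to the VBA routine above might look like the following sketch. TiffSliceExtractor and ExtractSlice are hypothetical names, and the paths and numbers are whatever the caller supplies. Stream offsets in .NET are zero-based, so the +1 correction from the VBA version should not be needed here.

```csharp
using System.IO;

public static class TiffSliceExtractor
{
    // Copies `length` bytes starting at zero-based offset `startPosition`
    // from inputPath into a new file at outputPath.
    public static void ExtractSlice(string inputPath, string outputPath, long startPosition, int length)
    {
        using (FileStream input = File.OpenRead(inputPath))
        using (BinaryReader reader = new BinaryReader(input))
        {
            // Move to the start of the image; no +1, unlike VBA's Get.
            input.Seek(startPosition, SeekOrigin.Begin);

            // ReadBytes returns fewer bytes only if the stream ends first.
            byte[] imageData = reader.ReadBytes(length);
            if (imageData.Length != length)
            {
                throw new EndOfStreamException("The requested range extends past the end of the input file.");
            }

            File.WriteAllBytes(outputPath, imageData);
        }
    }
}
```

With the hard-coded test values from the VBA version, the call would be TiffSliceExtractor.ExtractSlice(@"C:\input.tiff", @"C:\Output.tiff", 2613076, 10786).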
Joe Winograd (Developer) Commented:
> We need to end up with a .exe we can call from a script.

That's exactly what the AutoHotkey compiler will create.

> I really prefer C# as we are trying to standardize on it.

Let's hope a C# expert jumps in...that's not me.

> We were able to do a proof of concept in VBA using the Start Position and Byte Length provided in the Bank's text file

Good to hear! I'm guessing you could complete the effort in VB (and I'm pretty sure I could do it in AHK), but if you're committed to doing it in C#, so be it. In any case, I was happy to try to help. Regards, Joe
kaufmed Commented:
Here's a quick-and-dirty approach to separating the files:

namespace _28332667
{
    class Program
    {
        static void Main(string[] args)
        {
            using (System.IO.FileStream inputStream = System.IO.File.Open(@"C:\path\to\file.tiff", System.IO.FileMode.Open, System.IO.FileAccess.Read))
            {
                System.IO.FileStream outputStream = null;
                int fileCount = 0;
                int datum;

                // ReadByte returns -1 once the end of the input is reached.
                while ((datum = inputStream.ReadByte()) >= 0)
                {
                    if (datum == (int)'I')
                    {
                        // No Peek on FileStream: remember where we are, then read ahead.
                        long mark = inputStream.Position;
                        bool isMagic = inputStream.ReadByte() == (int)'I'
                                    && inputStream.ReadByte() == (int)'*'
                                    && inputStream.ReadByte() == (int)'\0';

                        // Move back so the bytes we looked at are copied normally.
                        inputStream.Position = mark;

                        if (isMagic)
                        {
                            // Magic number found: close the current output file and start a new one.
                            if (outputStream != null)
                            {
                                outputStream.Close();
                            }

                            outputStream = System.IO.File.Create(@"C:\path\to\file" + fileCount.ToString() + ".tiff");
                            fileCount++;
                        }
                    }

                    // Any bytes before the first magic number are skipped.
                    if (outputStream != null)
                    {
                        outputStream.WriteByte((byte)datum);
                    }
                }

                if (outputStream != null)
                {
                    outputStream.Close();
                }
            }
        }
    }
}


The idea is that you loop through all the bytes of the source file looking for the magic number. If you find it, then you close the current output file and start a new one. The FileStream class doesn't expose a Peek method, so we simulate one by reading ahead when we find magic number characters, and if we don't find the complete magic number, then we move the read position backward however many characters forward we moved.

The above is based on the TIFF magic number of 49 49 2A 00. If the format you are provided uses the other magic number instead (4D 4D 00 2A), then you simply need to adjust the four character values that the code compares against ('I', 'I', '*', and '\0' become 'M', 'M', '\0', and '*'). It would be odd for the bank to mix the two magic numbers, so it should be consistent throughout the file.


In case you're not aware, the character "\0" is one character (even though it looks like two). It is the null character, and it has a numerical value of zero.


The above simply writes out each image to its own file. I am not familiar enough with the multipage TIFF file specification to show how to create outputs of that type.

Joe Winograd (Developer) Commented:
> I am not familiar enough with the multipage TIFF file specification to show how to create outputs of that type.

This is where the "/multitif" option of IrfanView can help. I mentioned an EE article earlier, but here's just the command showing the IrfanView command line call:

i_view32.exe /multitif=(c:\path\front_back_in_2page_file.tif,c:\path\front.tif,c:\path\back.tif) /killmesoftly /silent /tifc=4

The "/multitif" syntax is such that the first file is the name of the multi-page output (combined/merged) TIFF file and all of the subsequent files are the input files. The "/killmesoftly" and "/silent" params are nice for calling it in a program. The "/tifc" param is for the TIFF file compression. Its values may be:

3=ITU-T Group 3
4=ITU-T Group 4

I have experimented extensively with them and unless you have a reason for picking something else, I strongly recommend ITU-T (previously known as CCITT) Group 4. Btw, all of the IrfanView command line parameters are documented in the file <i_options.txt> that is created in the IrfanView install directory. I have attached it to this thread for the latest version of IrfanView (4.37). Regards, Joe
rkspence (Author) Commented:

Thanks, the Q&D based on the magic numbers appears to work fine. We tried using the start position and length in VBA and got it to work pulling out individual images, but could not get it to loop through all the images.

Using the magic numbers to break them apart does not provide any easy way to rename them with the check number and front/back designation. We would have to read the text data file separately and rename the checks in order. I think that will work OK, but we will need to test and make sure we have the number of images we are expecting. The bank recently reminded me that some images may be missing. I think they will still be in the text file with 0 length but still showing check#, date, and amount.

Is there any option for using Start Position, and length in C# or will we run into the same problem when we try to run it in a loop?
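There does not seem to be anything about a loop that should cause trouble in C#, as long as the stream is repositioned for each image. The sketch below keeps one input stream open, seeks to each start position, and writes one file per entry. It assumes, as guessed above, that a missing image appears with a length of 0; those entries are skipped and reported. Segment and ConcatenatedTiffSplitter are hypothetical names, and how FileName is built from check number and front/back is left to the caller.

```csharp
using System.Collections.Generic;
using System.IO;

// One image to extract: where it starts, how long it is, and what to call it.
public sealed class Segment
{
    public long Start { get; set; }
    public int Length { get; set; }
    public string FileName { get; set; }
}

public static class ConcatenatedTiffSplitter
{
    // Writes one file per non-empty segment; returns the names of skipped zero-length segments.
    public static List<string> SplitAll(string inputPath, string outputDirectory, IEnumerable<Segment> segments)
    {
        var skipped = new List<string>();
        using (FileStream input = File.OpenRead(inputPath))
        {
            foreach (Segment segment in segments)
            {
                if (segment.Length == 0)
                {
                    // Assumption: zero length marks a missing image.
                    skipped.Add(segment.FileName);
                    continue;
                }

                byte[] buffer = new byte[segment.Length];
                input.Seek(segment.Start, SeekOrigin.Begin);

                // Stream.Read may return fewer bytes than requested, so loop until full.
                int total = 0;
                while (total < buffer.Length)
                {
                    int read = input.Read(buffer, total, buffer.Length - total);
                    if (read == 0)
                    {
                        throw new EndOfStreamException(segment.FileName + " extends past the end of the input file.");
                    }
                    total += read;
                }

                File.WriteAllBytes(Path.Combine(outputDirectory, segment.FileName), buffer);
            }
        }
        return skipped;
    }
}
```

Comparing the number of files written plus the number skipped against the number of rows in the data file gives the count check mentioned above.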
Hey kaufmed,

I have a similar issue...except I want to do it in reverse!
I want to create that type of file vs reading from it.
I work for a Bank and need to create a file just like this!

Please see this question I submitted:

kaufmed Commented:
Hi smithmrk,

I just saw your post. I'm actually off to bed now, but I'll take a look at your question tomorrow.