Avatar of olmy
olmy

Who could repair an ASCII-transferred binary file?

I had a binary file that was accidentally transferred with the ASCII option, and now the file obviously no longer works. I need to recover that file, but I can't find or contact an expert who could do it. I can produce a similar (but not exactly the same) file both with and without the ASCII "corruption", and I know the problem can be fixed with that information.

Who would be available to do it now and what kind of compensation would you like to have for this kind of work?

Best Regards
  Janne

ps. There was a discussion about the same issue in experts-exchange. Click the link below.
https://www.experts-exchange.com/questions/21787847/FTP-ascii-versus-binary.html?sfQueryTermInfo=1+ascii+binari+corrupt+ftp+transfer&anchorAnswerId=16284397#a16284397
Avatar of ravenpl
ravenpl

Since an ASCII transfer loses some information (!! the most significant bit !!) and possibly adds some control characters, the file can't be fixed or reproduced.
Sorry, but that's the answer.
Avatar of olmy
olmy

ASKER

Dear R-k

I'm trying to reach you, since I have a broken binary that was transferred in ASCII mode.
https://www.experts-exchange.com/questions/22954045/Who-could-repair-ASCII-transfered-binary-file.html?anchorAnswerId=20270092#a20270092
Avatar of olmy

ASKER

Eh, my comment was intended for another question...
Hi olmy,

I saw your comment in the other thread. It has been a while since I worked on that problem, so I am rusty. Also, a lot depends on the particular client and server systems used, as they are not always consistent. However, if you want me to take a look, please transfer a test binary file both ways, i.e. as ascii and then binary, then zip them both into a single zip file and post that at http://www.ee-stuff.com/. I think there is a size limit there, so do this with a file no greater than 1 or 2 MB.

I will be glad to take a look, keeping in mind that I am always behind on things and in any case not every file can be recovered.

Avatar of Jeff Darling
This is a very touchy subject.  I cannot tell you how many files have been corrupted by that "feature".

You will probably get arguments from different people about ASCII mode and its benefits.

Some benefits that I've seen of ASCII mode are:

1. Conversion of the end-of-record delimiter across platforms.  UNIX uses a line feed only (hex 0A), while IBM PC DOS/MS-DOS and all Microsoft Windows versions use a carriage return followed by a line feed (hex 0D 0A).

2. Conversion of encoding from EBCDIC to ASCII.  I've seen IBM's FTP client for the mainframe, and it offers an EBCDIC to ASCII translation as part of the ASCII mode transfer.  This is handy, but the file must contain only character text data.  If there are any packed data fields in the file, then your file will be problematic. Oh sure, it might transfer without abending, but you or your recipient will most likely regret it.  Please convert any packed data fields to character format before transferring, or else use binary mode, then perform field-by-field conversions on the text data later and leave the packed data intact.

Let the user beware: if you are transferring a binary file such as a ZIP, gzip, spreadsheet, word-processing document or any other file that isn't plain text, then use BINARY mode to prevent your file from becoming corrupted.
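
To make the line-ending conversion in point 1 concrete, here is a minimal Python sketch of what a Unix-to-Windows ASCII-mode transfer might do to a binary file. The two rules and the function names are assumptions made for illustration; real clients and servers are not always consistent, and nothing here is taken from a particular FTP implementation.

```python
import re

def add_cr_always(data: bytes) -> bytes:
    """Rule A (assumed): put a CR in front of every LF, no exceptions."""
    return data.replace(b"\n", b"\r\n")

def add_cr_unless_present(data: bytes) -> bytes:
    """Rule B (assumed): put a CR in front of an LF only if none is there yet."""
    return re.sub(rb"(?<!\r)\n", b"\r\n", data)

def strip_cr_before_lf(data: bytes) -> bytes:
    """Naive repair: drop the CR from every CR LF pair."""
    return data.replace(b"\r\n", b"\n")

# Two different "binary files": one holds a bare LF, the other a genuine CR LF.
bare_lf = bytes([0x50, 0x4B, 0x0A, 0x00, 0xFF])
real_crlf = bytes([0x50, 0x4B, 0x0D, 0x0A, 0x00, 0xFF])

# Under rule A the naive repair restores both files exactly.
repaired_a = [strip_cr_before_lf(add_cr_always(f)) for f in (bare_lf, real_crlf)]

# Under rule B both files come out byte-for-byte identical after the transfer,
# so the damaged copy alone cannot say which original it came from.
damaged_b = [add_cr_unless_present(f) for f in (bare_lf, real_crlf)]
```

Under rule A the damage is fully reversible. Under rule B a CR LF pair in the damaged file may be either an original pair or an inserted CR, and that ambiguity is where the real recovery work comes from.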
Avatar of olmy

ASKER

Thanks r-k. I have been trying to reach you via email. I will produce the test file and send it to you, but I think that the size limit is a problem. Perhaps we could exchange emails. You find me at janne[ dot ]timmerbacka[ at ]mechaul.com.
You may want to try with a test file first. Just take any file (e.g. a jpg image) and transfer it both in ascii and binary mode. The file need not be very large. Here are the instructions for posting to the ee-stuff web site:

Go to http://www.ee-stuff.com/ with your web browser.
Click on the small "login" link at the top-right section of that page.
Enter your Experts-Exchange username and password
and click on Login
Click on the "Expert Area" tab
Click on the "Upload a File" link
Scroll down a bit to where you can click on the "Browse" button
Locate your file to be uploaded, etc.
In the "Question" field, enter the question ID (e.g. 22954045)
Click on "Upload"
Let us know when it has been uploaded, or even better, post the URL of the upload page.

The advantage is that other people can take a look as well, rather than sending to me by email. I should be able to get a feel for whether we can reconstruct the file by looking at even the trial transfers, but do include both (ascii and binary).

Avatar of olmy

ASKER

Thank you r-k.
Now I understand. Here is a link for the zip packet:

https://filedb.experts-exchange.com/incoming/ee-stuff/5584-testpictures.zip

Janne
Janne,
I looked at the two files, and I _think_ we can reconstruct the binary file from the one transferred in ascii mode. The file is not damaged, but the extra CR byte has been inserted at somewhat arbitrary locations rather than the regular pattern I was hoping for. However, it is not random, and I believe it can be stripped out to reconstruct the original file. The good news is that nothing seems to have been stripped off during the transfer, just additional bytes added.

At the same time, I hope you're not in a rush, because I may not be able to get to it until the weekend. One thing that would help a lot is if you can upload another file to the ee-stuff web site. This time, upload a different file, and only the (damaged) ascii version. The file can be another image, jpg e.g., and the size should be about 1 MB, not much less than that. I can try reconstructing it in a few days and send it back to you, and you can tell me whether it looks correct. Can you also tell us what type of systems (i.e. what OS) were used as the source and destinations for the ftp transfer?

Thanks.
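
For anyone who wants to repeat this kind of comparison on their own test pair, a rough Python sketch follows. It assumes the ascii copy is the binary copy plus inserted bytes only (nothing removed or changed); `find_inserted_bytes` is a made-up helper name, and the sample bytes stand in for the contents of the two transferred files.

```python
from typing import List, Tuple

def find_inserted_bytes(original: bytes, damaged: bytes) -> List[Tuple[int, int]]:
    """Return (offset_in_damaged, byte_value) for every byte found only in `damaged`.

    Walks both copies in step; a byte in `damaged` that does not match the
    next expected byte of `original` is counted as an insertion.
    """
    inserted = []
    i = 0  # position in the original (binary-mode) copy
    for j, value in enumerate(damaged):
        if i < len(original) and original[i] == value:
            i += 1
        else:
            inserted.append((j, value))
    if i != len(original):
        raise ValueError("damaged copy is not simply the original plus insertions")
    return inserted

# Stand-ins for the binary-mode and ascii-mode copies of the same test file.
binary_copy = b"PK\n\x00\xff\n"
ascii_copy = b"PK\r\n\x00\xff\r\n"

extra = find_inserted_bytes(binary_copy, ascii_copy)
only_cr_before_lf = all(
    value == 0x0D and ascii_copy[offset + 1 : offset + 2] == b"\n"
    for offset, value in extra
)
```

If `only_cr_before_lf` comes out true for a test pair, the damage is limited to CR bytes placed in front of LF bytes. If the function raises, something was removed or altered as well and this simple comparison is not enough.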
Avatar of olmy

ASKER

Thank you r-k

I'm really glad for your efforts and I understand the timetable. Weekend is ok, but I wonder if transfer via paypal could help you to make the weekend timetable work, because we really need to fix this file as soon as possible?

Server
Unix operating system and ProFTPD FTP Server (215 UNIX Type: L8).

Client
Windows XP and the ftp.exe program operated via commandline.


I just added a new file which is very similar to the broken file
  https://filedb.experts-exchange.com/incoming/ee-stuff/5590-testiascii.zip

  Janne
Please don't get your hopes up.
Refer http://www.nwe.ufl.edu/writing/help/remote/ftp/binary_ascii.shtml
If the file was binary (even one byte was greater than 127) the 8th bit information was lost. It can't be reproduced!
Hi Janne,
Thanks for the new file. It seems to be a .bck file rather than .jpg, but that is OK. The good news is that no bit is stripped out - I took a quick look.
Chances of success are good, but I can't promise 100%. Will only know when I fix it and send it back to you. I will try as soon as possible, but like I mentioned I am always behind. Will hope for no later than this weekend.
Please don't worry about paypal because I don't need any money.

Will try to post an update once I take a closer look at both files.
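
A quick look of this kind can be sketched in a few lines of Python. This is only an illustration, not the tool actually used here; `transfer_damage_report` is a hypothetical name. The idea is that a file stripped to 7 bits would contain no bytes of 0x80 or above, while a file with CRs inserted would contain few or no LF bytes that lack a CR in front.

```python
def transfer_damage_report(data: bytes) -> dict:
    """Count the byte patterns that hint at what an ASCII-mode transfer did."""
    lf_total = data.count(b"\n")
    crlf_pairs = data.count(b"\r\n")
    return {
        "size": len(data),
        # Bytes with the 8th bit set; zero in a large binary file is suspicious.
        "high_bit_bytes": sum(1 for b in data if b >= 0x80),
        "lf_total": lf_total,
        # LF bytes with no CR directly before them.
        "bare_lf": lf_total - crlf_pairs,
    }

# Stand-in for a binary file: every byte value once.
sample = bytes(range(256))
before = transfer_damage_report(sample)
after = transfer_damage_report(sample.replace(b"\n", b"\r\n"))
```

On a real file the data would come from `open(path, "rb").read()`. Plenty of high-bit bytes together with zero bare LFs points to CR insertion rather than bit stripping.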
Avatar of olmy

ASKER

Thanks. The (target) broken backup file is about 2,3 in size, so we need to think how to transfer that to you.
You don't need to transfer the file. I will write a program and send it to you, and you can fix it on your computer.
BTW, I am assuming the file exists on the PC side, or at least can be copied there, because my program will be for Windows.
And by 2,3 in size, did you mean 2.3 GB?
Avatar of olmy

ASKER

Ok, sweet. I meant 2,3 MB - sorry about that. Windows software is ok. What kind of programming and file comparison tools are you using?

You rank pretty high in top experts list, so I began to wonder how your worklife is organized, how can you arrange time for experts-exchange "work"  ?
ASKER CERTIFIED SOLUTION
Avatar of r-k
r-k

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Avatar of olmy

ASKER

Thank you r-k. I understand the problem. Based on your information I began to work with plan B. The bck file is a backup with compression. I have the source code and I hope I could track the location where the missing char should be and fix it. I'll let you know how it goes.
Avatar of olmy

ASKER

Thank you very much. With your information I was able to restore the backup file ;)

Million thanks
  Janne
Avatar of olmy

ASKER

Thank you r-k. Plan B worked. I was able to restore the backup file. With the information you provided I could locate the missing chars. I have the source code for the restore routine, so I could pinpoint the correct locations of the missing chars. It took quite a lot of work, but after 10 hours the backup file was restored successfully.

Million thanks
  Janne
Janne,

That is the most wonderful news. After I posted my last message I was rather discouraged, and knew it would take a very large amount of work and skill, and some luck. I had meant to post some additional thoughts after knowing that you were trying anyway, but have been very busy with non-EE things the last month. So I am delighted to hear of your success. Well done.

Wishing everyone best wishes for the New Year.

r-k
Avatar of olmy

ASKER

Without your help r-k I would have never even tried to fix the backup file. You can only imagine how happy our customer is ;)

Happy Christmas
  Janne
The problem of transferring compressed files (typically zip or gz or gzip) in ftp ascii mode instead of binary mode can be fixed but it is very difficult.

The first problem is that there is a loss of information, and in a compressed file, every bit counts.  Corrupted bits send the decompressor into the woods pretty rapidly.

The files can be repaired by doing a search that tries every possible combination of repairs, runs the decompressor, and determines whether the repairs were right, e.g. based on getting a correct zip checksum at the end.

The second problem is that for anything non-trivial in size, trying every possible repair is computationally infeasible -- the number of combinations grows exponentially, as 2^(#repairs).  But that problem can be solved too, by guiding the search and trying the most-likely repairs first (this requires some knowledge of what the output file "should" look like; that's used to build a search heuristic function).
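
As a toy illustration of that search (not the actual recovery tool, whose heuristic is custom per job), the Python sketch below handles a gzip stream in which a CR was inserted before each LF that did not already have one. Every CR LF pair is a possible repair site, candidates are tried with the fewest kept CRs first on the assumption that genuine CR LF pairs are rare in compressed data, and a candidate is accepted when `gzip.decompress` succeeds, which includes its CRC check. The function names and the `max_sites` cap are invented for this example.

```python
import gzip
import itertools
import re
import zlib
from typing import List, Optional, Set

def cr_sites(damaged: bytes) -> List[int]:
    """Offsets of every CR that is directly followed by an LF."""
    return [m.start() for m in re.finditer(rb"\r\n", damaged)]

def build_candidate(damaged: bytes, sites: List[int], keep: Set[int]) -> bytes:
    """Remove the CR at every site except those whose index is in `keep`."""
    out = bytearray()
    prev = 0
    for index, pos in enumerate(sites):
        if index in keep:
            continue
        out += damaged[prev:pos]
        prev = pos + 1  # skip the CR itself
    out += damaged[prev:]
    return bytes(out)

def recover(damaged: bytes, max_sites: int = 16) -> Optional[bytes]:
    """Search repair combinations; return the decompressed data or None."""
    sites = cr_sites(damaged)
    if len(sites) > max_sites:
        raise ValueError(f"2^{len(sites)} combinations is too many to brute-force")
    # Fewest kept CRs first: the most likely repairs are tried before the rest.
    for kept in range(len(sites) + 1):
        for keep in itertools.combinations(range(len(sites)), kept):
            candidate = build_candidate(damaged, sites, set(keep))
            try:
                return gzip.decompress(candidate)  # verifies CRC and length
            except (OSError, EOFError, zlib.error):
                continue
    return None

# Small demonstration: compress some log-like text, damage it, recover it.
payload = b"".join(b"GET /page/%d HTTP/1.1 200\n" % i for i in range(300))
damaged = re.sub(rb"(?<!\r)\n", b"\r\n", gzip.compress(payload))
restored = recover(damaged)
```

The `max_sites` guard is the whole point of the paragraph above: a stream of even a few hundred kilobytes has far too many sites for this exhaustive loop, which is why a real recovery needs a heuristic that prunes the search instead of enumerating it.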

I've done it successfully, have recovered gigabytes of compressed then corrupted web server logs.
I've also seen it work on compressed tar files containing source code.

To my knowledge nobody else has done this successfully.  Period.

But it requires custom coding of the heuristic function, which makes it a consulting project, not a turn-key program.  So, really, it's only affordable by someone who has lost something pretty valuable, and does not have any other means of replacing it.

My web site http://www.bukys.com/services/recovery/ gives some more information on this topic.

P.S. The various "zip repair" programs that are out there generally do not fix the data; they just overwrite checksums so that error messages go away, though the corrupt data remains.  For some minimally-damaged zip files, that's all some people need.  But for recovering a systematically damaged compressed file, stronger medicine is necessary.