Solved

Need to programmatically convert binary files to text

Posted on 2013-01-28
347 Views
Last Modified: 2013-02-04
Hello,

We have a few gigs of binary files (full of plain text data) that we need to convert to text-only files. What's the easiest way to do this programmatically?

To give you an idea of how we can do it manually: we can open each file in Notepad and then "Save As" a text file (before anyone asks, simply changing the extension does not do the trick). In case you are wondering why text files are being treated as binary: it's the fault of our FTP process. Since the files were originally uploaded as binary, they are saved as binary on our Windows system.

Thanks in advance.
Question by:CodeWrangler
8 Comments
 
LVL 35

Accepted Solution

by:Robert Schutt
Robert Schutt earned 500 total points
ID: 38828430
There's a tool for this (unix2dos); try here: http://waterlan.home.xs4all.nl/dos2unix.html#UNIX2DOS
 
LVL 29

Expert Comment

by:anarki_jimbel
ID: 38828920
Hmm... Everything, including any text, is binary in the end.

I believe the encoding is ASCII, isn't it? If so, try the Encoding.ASCII.GetString method:

http://msdn.microsoft.com/en-us/library/38b953c8.aspx

See, for example, this solution:

http://stackoverflow.com/questions/6006425/binary-to-corresponding-ascii-string-conversion
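A minimal sketch of that suggestion, assuming the files really are 7-bit ASCII. The `AsciiDecodeSketch` class and its method names are hypothetical, not taken from the linked pages. It decodes the raw bytes with `Encoding.ASCII.GetString` and writes them back out as text; note that decoding alone copies the line endings exactly as they were.

```csharp
using System.IO;
using System.Text;

// Hypothetical helper illustrating Encoding.ASCII.GetString (assumes 7-bit ASCII input).
public static class AsciiDecodeSketch
{
    // Decodes raw bytes as ASCII. Bytes above 0x7F are not ASCII and are
    // replaced by the encoding's decoder fallback character.
    public static string Decode(byte[] raw)
    {
        return Encoding.ASCII.GetString(raw);
    }

    // Reads inputPath as bytes and writes the decoded text to outputPath.
    // Line endings are copied unchanged: an LF-only file stays LF-only.
    public static void Convert(string inputPath, string outputPath)
    {
        byte[] raw = File.ReadAllBytes(inputPath);
        File.WriteAllText(outputPath, Decode(raw), Encoding.ASCII);
    }
}
```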
 
LVL 55

Expert Comment

by:Jaime Olivares
ID: 38830113
What do you mean by 'saved as binary file'?
If you can open them in Notepad, they are not binary.
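FTP in binary (image) mode copies bytes unchanged, while ASCII mode translates line endings, so a text file uploaded in binary mode from a Unix or Mac system keeps its original line endings. One way to find out what "binary" means for a given file is to count the kinds of line ending it contains. A minimal sketch; the `LineEndingCensus` name is hypothetical:

```csharp
using System;

// Hypothetical helper: counts each kind of line ending in a file's raw bytes.
public static class LineEndingCensus
{
    // Returns the number of CR/LF pairs, lone LFs and lone CRs.
    public static (int crlf, int lf, int cr) Count(byte[] data)
    {
        int crlf = 0, lf = 0, cr = 0;
        for (int i = 0; i < data.Length; i++)
        {
            if (data[i] == (byte)'\r')
            {
                if (i + 1 < data.Length && data[i + 1] == (byte)'\n')
                {
                    crlf++;
                    i++; // skip the LF that belongs to this CR/LF pair
                }
                else
                {
                    cr++; // lone CR (classic Mac OS style)
                }
            }
            else if (data[i] == (byte)'\n')
            {
                lf++; // lone LF (Unix style)
            }
        }
        return (crlf, lf, cr);
    }
}
```

Calling it on `File.ReadAllBytes(path)` and getting non-zero `lf` (or `cr`) with zero `crlf` would indicate plain text with non-Windows line endings rather than genuinely binary data.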

 
LVL 35

Assisted Solution

by:Robert Schutt
Robert Schutt earned 500 total points
ID: 38830155
I'm assuming it's a problem with line endings, hence my post. You could do it yourself in C# (replace LF with CR/LF), but it's a fairly common problem with a 'standard' solution (at least these kinds of utilities are common on Unix/Linux). If the source OS is the classic Mac OS, you need a slightly different solution (I think: replace CR with CR/LF), but it could well be included in those tools; I haven't checked.
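Doing the replacement yourself in C# could look like the following minimal sketch (the `LineEndingNormalizer` name is hypothetical). It assumes the file content has already been decoded into a string and handles both the Unix (LF) and the Mac (CR) case in one method:

```csharp
// Hypothetical helper: normalizes every line ending in a string to Windows CR/LF.
public static class LineEndingNormalizer
{
    public static string ToCrLf(string text)
    {
        // 1. Collapse existing CR/LF pairs to LF so they are not doubled later.
        // 2. Turn lone CRs (classic Mac) into LF.
        // 3. Expand every LF to CR/LF.
        return text.Replace("\r\n", "\n")
                   .Replace("\r", "\n")
                   .Replace("\n", "\r\n");
    }
}
```

The three passes matter: replacing LF with CR/LF directly would turn an existing CR/LF pair into CR CR LF, whereas collapsing everything to LF first leaves text that already uses CR/LF unchanged.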
 
LVL 55

Expert Comment

by:Jaime Olivares
ID: 38830169
If there is a CR/LF problem, I think there is no need to write a C# application. This can be done with a batch file. Here are some alternatives:
http://stackoverflow.com/questions/3110031/batch-file-convert-lf-to-crlf
or you can use a tool like:
http://cleansofts.org/unix2dos.html
 
LVL 10

Expert Comment

by:Monica P
ID: 38830210
 
LVL 35

Assisted Solution

by:Robert Schutt
Robert Schutt earned 500 total points
ID: 38830245
If you really want to, you could use a little console application like this (I only tested it with a small file, so use it only on a copy of the files, or after testing with a big file):
using System;
using System.IO;
using System.Text;

namespace EE_Q_28011639
{
    class Program
    {
        const string strLE_Unix = "\n";  // LF
        const string strLE_Mac = "\r";   // CR
        const string strLE_Dos = "\r\n"; // CR/LF (Windows)

        static void Main(string[] args) {
            // Visit every *.txt file below the current directory.
            foreach (string fn in Directory.GetFiles(".", "*.txt", SearchOption.AllDirectories)) {
                try {
                    // ReadAllBytes returns the whole file; a single FileStream.Read call is not guaranteed to.
                    byte[] fbi = File.ReadAllBytes(fn);
                    // Decode the whole file at once so no block boundary can split a character.
                    // Encoding.Default is the ANSI code page on .NET Framework (UTF-8 on .NET Core).
                    string text = Encoding.Default.GetString(fbi);
                    string strLE;
                    if (text.Contains(strLE_Dos)) {
                        Console.WriteLine("Not converting file '{0}', CR/LF detected", fn);
                        continue;
                    } else if (text.Contains(strLE_Unix)) {
                        strLE = strLE_Unix;
                        Console.WriteLine("Converting unix file '{0}'", fn);
                    } else if (text.Contains(strLE_Mac)) {
                        strLE = strLE_Mac;
                        Console.WriteLine("Converting mac file '{0}'", fn);
                    } else {
                        Console.WriteLine("Not converting file '{0}', no line endings detected", fn);
                        continue;
                    }
                    // Replace the single-character line ending with CR/LF.
                    // WriteAllBytes creates or truncates the file (File.OpenWrite would not truncate)
                    // and always closes it, even if an exception is thrown.
                    byte[] fbo = Encoding.Default.GetBytes(text.Replace(strLE, strLE_Dos));
                    File.WriteAllBytes(fn, fbo);
                }
                catch (Exception ex) {
                    Console.WriteLine("Error while processing file '{0}': {1}", fn, ex.Message);
                }
            }
        }
    }
}
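The program above reads each file completely into memory. For very large files a streaming variant may be preferable. A minimal sketch (the `StreamingUnix2Dos` helper is hypothetical, not part of the original answer) that copies one stream to another in fixed-size chunks and turns lone LF and lone CR into CR/LF. It works on bytes, so it assumes an ASCII-compatible encoding (an ANSI code page or UTF-8), not UTF-16, and it remembers whether the previous chunk ended in CR so a CR/LF pair split across two reads is not doubled:

```csharp
using System.IO;

// Hypothetical helper: streams input to output, turning lone LF and lone CR
// into CR/LF without loading the whole file into memory.
public static class StreamingUnix2Dos
{
    public static void Convert(Stream input, Stream output, int bufferSize = 65536)
    {
        byte[] inBuf = new byte[bufferSize];
        byte[] outBuf = new byte[bufferSize * 2]; // worst case: every byte becomes CR/LF
        bool afterCr = false; // previous byte (possibly in the previous chunk) was CR

        int read;
        while ((read = input.Read(inBuf, 0, inBuf.Length)) > 0)
        {
            int o = 0;
            for (int i = 0; i < read; i++)
            {
                byte b = inBuf[i];
                if (b == (byte)'\r')
                {
                    // Every CR is written as CR/LF right away.
                    outBuf[o++] = (byte)'\r';
                    outBuf[o++] = (byte)'\n';
                    afterCr = true;
                }
                else if (b == (byte)'\n')
                {
                    // An LF right after a CR was already emitted with that CR;
                    // a lone LF becomes CR/LF.
                    if (!afterCr)
                    {
                        outBuf[o++] = (byte)'\r';
                        outBuf[o++] = (byte)'\n';
                    }
                    afterCr = false;
                }
                else
                {
                    outBuf[o++] = b;
                    afterCr = false;
                }
            }
            output.Write(outBuf, 0, o);
        }
        output.Flush();
    }
}
```

To convert a file in place with this approach, write to a temporary file first and then swap it in (for example with `File.Replace`), rather than reading and writing the same file at the same time.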


 

Author Closing Comment

by:CodeWrangler
ID: 38853065
Awarding three posts from the same user because the combination of those posts got me where I needed to be.

