inet2xtreme

asked on

File Transfer - Ascii vs Binary

I am setting up a new web server to replace the one we have now.  The current web server is Windows NT and the new one is Linux.  I need a way to transfer the files from the NT server to the new Linux server in a way that treats .txt and .html files as ascii and the .gif and .jpg files as binary.  I wanted to find a tool that could probe a file before downloading it and determine what mode is best for that file.  We are talking about 12G of data spread across about 1,200 user accounts.  I tried using smbclient, but that does only binary transfers.  The ncftpget does have ascii options, but as far as I can tell it doesn't know how to autodetect the filetype on the fly.  If worst comes to worst, I'll just do the whole thing in binary, since most of the text files look fine when displayed in a web browser even though they are not absolutely correct.  But when I edit text files downloaded in binary mode from the NT server with vi, they have those ^M characters at the end of each line.  That is what I'm trying to avoid.
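For reference, the ^M that vi shows is a carriage return (\r): NT text files end lines with CRLF, while Linux expects a bare LF.  A quick way to spot them on the Linux side (cat -v prints carriage returns as ^M; page.html is just an example name):

$ cat -v page.html | head -2
<html>^M
<head>^M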
RobWMartin

Go ahead and do the transfer, then test each file with the file command.  Most ascii files will cause file to return the word text somewhere in its output.  Then, if file says it's text, you can safely remove the ^M with tr.  Thus:

if [ -n "`file "$thefile" | grep 'text'`" ]
then
    # keep the original around, then strip the carriage returns
    cp "$thefile" "$thefile.bak"
    tr -d "\r" < "$thefile.bak" > "$thefile"
fi

Hope this helps!

Rob
BTW:  This is a bash script, so #!/bin/bash should be the first line of the script file you incorporate the above segment into.  Also, the quote characters in the if condition are very important.  There are double quotes, backticks, and single quotes.

For example, create a file (with your fav editor) called un-nt and put this in it:

#!/bin/bash

# for each file named on the command line: if file(1)
# says it's text, strip the carriage returns
for thefile in "$@"
do
  if [ -n "`file "$thefile" | grep 'text'`" ]
  then
      cp "$thefile" "$thefile.bak"
      tr -d "\r" < "$thefile.bak" > "$thefile"
  fi
done

This will walk through any files you mention on the command line, doing the conversion if necessary.

e.g.

un-nt /home/rob/html/*

would test and convert all files in /home/rob/html

Rob

Oh, one more thing.  You'll need to do this before the script will execute:

chmod 755 un-nt

Another thing: did you notice the script retains a backup of each file, in case something screwy happens?

That's all, fer real :)

Rob
inet2xtreme

ASKER

Sorry, I hate to reject answers, but I had to.  Many of the html files don't contain the word "text" in them.  I know it is possible for ftp clients to detect when I'm transferring binary files in ascii mode, since they warn me that the file contained some bare end-of-line characters or something like that.  I really want an automatic transmission, so to speak, that will get the files and fix them on the fly.  The sheer number of files is probably 10,000 or more.  If you can come up with a surefire easy way, you can still have the points.  Thanks for trying.
You misunderstood (or I didn't explain it right :)

We are not looking for the word "text" in the file itself; we are looking for it in the output of the file command.  IOW, there is a utility called "file" that examines a file and tries to determine its type.  Try running it on one of your html files.  Somewhere in the output of this command you should see 'text' if it is an ascii file.  That's what we're looking for.  Do 'man file' for more info.
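For example (the exact wording varies between versions of file, but ascii files get 'text' somewhere in the description; the filenames here are just examples):

$ file index.html
index.html: HTML document text
$ file logo.gif
logo.gif: GIF image data, version 89a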

Rob
ASKER CERTIFIED SOLUTION
RobWMartin

Actually, the *.bak files are the originals.  The corresponding files (i.e. without .bak) are the converted versions of the file.
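If a conversion ever goes wrong, the originals can be put back from the backups with something like this (a sketch; it assumes you run it in the directory holding the .bak files):

for b in *.bak; do mv "$b" "${b%.bak}"; done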

Rob
I'll give that a try.  Maybe I can incorporate that into the download script.  In pseudocode:

#Log in to the server
smbclient //server/resource -U username || exit 1

#load up the file list I saved/processed from the source server

@filenames=<filelist>;

lcd /tmp/un-nt/
get filenames[fileindex]

call your script with /tmp/un-nt/filenames[fileindex] on the command line:

if [ -n "`file "$1" | grep 'text'`" ]
then
    # don't redirect straight back onto $1 -- the shell would
    # truncate it before tr reads it -- so go through a temp file
    tr -d "\r" < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
fi

Then copy the fixed or not fixed file from that location to its final path in the /home tree.  

I think this will work.  Thanks for the help.  Sorry for my misunderstanding.  I thought you were simply grepping the file itself for the word "text".  I've not used the "file" command before, except to test whether a file exists in a shell script.
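Here is roughly what I'm aiming for, all tied together (an untested sketch: //server/resource, username, and the /tmp/un-nt staging directory are placeholders, and instead of fetching files one at a time it uses smbclient's recurse/prompt/mget to pull the whole tree in one pass):

#!/bin/bash
# stage everything locally in binary mode, then fix the text files

mkdir -p /tmp/un-nt
cd /tmp/un-nt || exit 1

# recurse + prompt let mget walk subdirectories
# without asking about every file
smbclient //server/resource -U username \
    -c 'recurse; prompt; mget *' || exit 1

# strip CRs from anything file(1) identifies as text,
# skipping the .bak copies we create along the way
find /tmp/un-nt -type f ! -name '*.bak' | while read -r f
do
    if [ -n "`file "$f" | grep 'text'`" ]
    then
        cp "$f" "$f.bak"
        tr -d "\r" < "$f.bak" > "$f"
    fi
done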
Can you think of any way to make the method you listed above recursive, so it steps through a directory tree and fixes the broken files?  Instead of checking each file one by one, I'm downloading all of them as you suggested and running the un-nt fix on them, but they all have subdirectories with them.  If there's no good answer, I can process the output of ls -R > files, cut out only the directories, and hack up some script to run un-nt on each dir listed.
Try this:

find /root/path -type d -exec sh -c 'un-nt "$1"/*' sh {} \;

I haven't tested this particular invocation, but I've done similar many times before.  The trick is that -exec doesn't run its command through a shell, so the * glob has to be handed to an explicit sh -c to get expanded.

First try it without the -exec ....

Make sure it lists the subdirectories you're interested in.  Then add the -exec ....  back in.  I would try it on a test directory tree first.
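For example, if the tree is staged under /tmp/un-nt:

find /tmp/un-nt -type d
find /tmp/un-nt -type d -exec sh -c 'un-nt "$1"/*' sh {} \;

The first command just lists the directories so you can sanity-check them; the second actually runs un-nt on the contents of each one.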

Rob