File Transfer - ASCII vs Binary

I am setting up a new web server to replace the one we have now. The current web server runs Windows NT and the new one runs Linux. I need a way to transfer the files from the NT server to the new Linux server that treats .txt and .html files as ASCII and .gif and .jpg files as binary. I wanted to find a tool that could probe a file before downloading it and determine which mode is best for that file. We are talking about 12 GB of data spread across about 1,200 user accounts. I tried smbclient, but it only does binary transfers. ncftpget does have ASCII options, but as far as I can tell it doesn't know how to autodetect the file type on the fly. If worst comes to worst, I'll just do the whole thing in binary, since most of the text files look fine in a web browser even though they are not strictly correct. When I use vi to edit text files that I downloaded in binary mode from the NT server, there is a ^M character at the end of each line. That is what I'm trying to avoid.
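The extension rule described above can be written down as a small shell function. This is only a sketch: the name transfer_mode and the extension lists are assumptions made for illustration, and neither smbclient nor ncftpget offers such a switch as far as this thread establishes.

```shell
#!/bin/sh
# Hypothetical helper: pick a transfer mode from the file extension alone.
# The extension lists are assumptions; extend them to match your content.
transfer_mode() {
    # Lower-case the name so that INDEX.HTM and index.htm are treated alike
    name=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
    case "$name" in
        *.txt|*.htm|*.html) echo ascii ;;
        *.gif|*.jpg|*.jpeg) echo binary ;;
        *)                  echo binary ;;   # unknown types: leave the bytes alone
    esac
}

transfer_mode "index.html"
transfer_mode "LOGO.GIF"
```

Defaulting unknown extensions to binary is a deliberate choice here: a text file fetched in binary mode only keeps its ^M characters, while a binary file fetched in ASCII mode can be corrupted.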
inet2xtremeAsked:

RobWMartinCommented:
Go ahead and do the transfer, then test each file with the file command.  Most ascii files will cause file to return the word text somewhere in its output.  Then, if file says it's text, you can safely remove the ^M with tr.  Thus:

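# -b leaves the file name out of file's output, so a name containing "text" cannot match
if [ -n "`file -b "$thefile" | grep 'text'`" ]
then
    cp "$thefile" "$thefile.bak"
    tr -d "\r" < "$thefile.bak" > "$thefile"
fi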

Hope this helps!

Rob
RobWMartinCommented:
BTW:  This is a bash script, so #!/bin/bash should be the first line of the script file you incorporate the above segment into.  Also, the quote characters in the if condition are very important.  There are double quotes, backticks, and single quotes.

For example, create a file (with your fav editor) called un-nt and put this in it:

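#!/bin/bash

# Process every file named on the command line
for thefile in "$@"
do
  # -b leaves the file name out of file's output, so only the type is tested
  if [ -n "`file -b "$thefile" | grep 'text'`" ]
  then
      cp "$thefile" "$thefile.bak"     # keep the original as a backup
      tr -d "\r" < "$thefile.bak" > "$thefile"
  fi
done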

This will walk thru any files you mention on the command line, doing the conversion if necessary.

e.g.

un-nt /home/rob/html/*

would test and convert all files in /home/rob/html

Rob

RobWMartinCommented:
Oh, one more thing.  You'll need to do this before the script will execute:

chmod 755 un-nt

Another thing: did you notice that the script keeps a backup of each file in case something screwy happens?

That's all, fer real :)

Rob

inet2xtremeAuthor Commented:
Sorry, I hate to reject an answer, but I had to. Many of the html files don't contain the word "text" in them. I know that it is possible for FTP clients to detect when I'm transferring binary files in ASCII mode, since they warn me that the file contained some bare end-of-line characters or something like that. I really want an automatic transmission, so to speak, that will get the files and fix them on the fly. The sheer number of files is probably 10,000 or more. If you can come up with a surefire easy way, you can still have the points. Thanks for trying.
RobWMartinCommented:
You misunderstood ( or I didn't explain right :)

We are not looking for the word text in the file itself; we are looking for the word text in the output of the file command. IOW, there is a utility called "file" that examines a file and tries to determine its type. Try running it on one of your html files. Somewhere in the output of this command you should see 'text' if it is an ASCII file. That's what we're looking for. Do 'man file' for more info.

Rob
RobWMartinCommented:
To test the solution in a safe manner, create the script I mentioned above (i.e. un-nt).  Grab some good examples of files that should be converted and some that shouldn't.  Put A COPY of those files in a temporary directory, say /var/tmp/un-nt. Then run the command as follows:

/usr/local/bin/un-nt /var/tmp/un-nt/*

This assumes you put the script file in /usr/local/bin and made it executable.

Then, ls /var/tmp/un-nt and look for *.bak files.  Every one of those should have been converted.

Rob
RobWMartinCommented:
Actually, the *.bak files are the originals.  The corresponding files (i.e. without .bak) are the converted versions of the file.
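A quick way to confirm which copy is which is to count the carriage returns in each. The following is a minimal sketch using only tr and wc; count_cr is a hypothetical helper name, and the demo files are throwaway examples created in a temporary directory.

```shell
#!/bin/sh
# count_cr is a hypothetical helper: print the number of carriage returns in a file.
count_cr() {
    # Delete every byte that is not a CR, then count what is left
    tr -cd '\r' < "$1" | wc -c | tr -d ' '
}

# Demo on a throwaway file with two CRLF-terminated lines
demo=$(mktemp -d)
printf 'one\r\ntwo\r\n' > "$demo/page.html.bak"
tr -d '\r' < "$demo/page.html.bak" > "$demo/page.html"

echo "original:  $(count_cr "$demo/page.html.bak")"   # prints "original:  2"
echo "converted: $(count_cr "$demo/page.html")"       # prints "converted: 0"
rm -rf "$demo"
```

A .bak file that reports a non-zero count next to a converted file that reports 0 indicates the conversion did what was intended.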

Rob
inet2xtremeAuthor Commented:
I'll give that a try.  Maybe I can incorporate that into the download script.  In pseudocode:

#Login to the server
smbclient //server/resource -U username || exit 0

#load up the file list I saved/processed from the source server

@filenames=<filelist>;

lcd /tmp/un-nt/
get filenames[fileindex]

call your script with /tmp/un-nt/filenames[fileindex] on the command line:

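if [ -n "`file -b "$1" | grep 'text'`" ]
then
    # tr cannot read and write the same file: "> $1" would empty it first
    tr -d "\r" < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
fi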

Then copy the fixed or not fixed file from that location to its final path in the /home tree.  
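When a file is fixed in place like this, the output has to go to a separate file first, because the shell truncates the target of > before tr gets to read it. A minimal sketch of a safe in-place conversion follows, assuming mktemp is available; strip_cr is a hypothetical helper name.

```shell
#!/bin/sh
# strip_cr is a hypothetical helper: remove carriage returns from one file in place.
strip_cr() {
    tmp=$(mktemp) || return 1
    # Convert into a temporary file, never directly onto the input
    if tr -d '\r' < "$1" > "$tmp"; then
        cat "$tmp" > "$1"     # overwrite the contents, keeping the file's owner and mode
        rm -f "$tmp"
    else
        rm -f "$tmp"
        return 1
    fi
}

# Usage (the path is only an example):
# strip_cr /tmp/un-nt/index.html
```

Writing the result back with cat rather than mv keeps the permissions of the downloaded file, which matters if the files are later copied into user home directories.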

I think this will work.  Thanks for the help.  Sorry for my misunderstanding.  I thought you were simply grepping for the word "text".  I've not used the "file" command before except to test if a file exists in a shell script.
inet2xtremeAuthor Commented:
Can you think of any way to make the method you listed above recursive, so that it steps through a directory tree and fixes the files that are broken?  Instead of checking each file one by one, I'm downloading all of them as you suggested and I'm going to run the un-nt fix on them; however, they all have subdirectories.  If there is no good answer, I can process the output of ls -R >files, cut out only the directories, and hack up some script to run un-nt on each dir listed.
RobWMartinCommented:
Try this:

find /root/path -type f -exec un-nt {} +

I haven't tested this particular invocation, but I've done similar many times before.  Note that find does not pass the command through a shell, so a wildcard such as {}/* would reach un-nt unexpanded; handing it the files themselves avoids that.  un-nt has to be on your PATH, otherwise give its full path.

First try it without the -exec ....

Make sure it lists the files you're interested in.  Then add the -exec ....  back in.  I would try it on a test directory tree first.
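If it is acceptable to pick files by extension rather than by content, the whole recursive pass can also be written as one function. The sketch below makes that assumption so that it does not depend on the file command; fix_tree is a hypothetical name, and the extension list is only an example.

```shell
#!/bin/sh
# fix_tree is a hypothetical name. Selecting files by extension is an
# assumption made for this sketch; it is not what the un-nt script does.
fix_tree() {
    # -name is case sensitive, so add patterns such as '*.HTM' if needed.
    # Each matching file is copied to FILE.bak, then rewritten without CRs.
    find "$1" -type f \( -name '*.txt' -o -name '*.htm' -o -name '*.html' \) \
        -exec sh -c '
            for f do
                cp "$f" "$f.bak" && tr -d "\r" < "$f.bak" > "$f"
            done
        ' sh {} +
}

# Usage (the path is only an example):
# fix_tree /var/tmp/un-nt
```

Because find passes each path as a separate argument, file names containing spaces, which are common on an NT server, are handled without extra escaping.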

Rob