inet2xtreme
asked on
File Transfer - Ascii vs Binary
I am setting up a new web server to replace the one we have now. The current web server is Windows NT and the new one is Linux. I need a way to transfer the files from the NT server to the new Linux server in a way that treats .txt and .html files as ascii and the .gif and .jpg files as binary. I wanted to find a tool that could probe a file before downloading it and determine what mode is best for that file. We are talking about 12G of data spread across about 1,200 user accounts. I tried using smbclient, but that does only binary transfers. ncftpget does have ascii options, but as far as I can tell it doesn't know how to autodetect the filetype on the fly. If worst comes to worst, I'll just do the whole thing in binary, since most of the text files look fine when displayed in a web browser even though they are not absolutely correct. When I edit text files that I downloaded in binary mode from the NT server with vi, they have those ^M characters at the end of each line. That is what I'm trying to avoid.
BTW: This is a bash script, so #!/bin/bash should be the first line of the script file you put this segment into. Also, the quote characters in the if condition are very important: there are double quotes, backticks, and single quotes.
For example, create a file (with your fav editor) called un-nt and put this in it:
#!/bin/bash
for thefile in "$@"
do
    if [ -n "`file "$thefile" | grep 'text'`" ]
    then
        cp "$thefile" "$thefile.bak"
        tr -d "\r" < "$thefile.bak" > "$thefile"
    fi
done
This will walk through any files you mention on the command line, doing the conversion if necessary.
e.g.
un-nt /home/rob/html/*
would test and convert all files in /home/rob/html
Rob
Oh, one more thing. You'll need to do this before the script will execute:
chmod 755 un-nt
Another thing: did you notice the script retains a backup of each file, in case something screwy happens?
That's all, fer real :)
Rob
ASKER
Sorry, I hate to give rejections, but I had to. Many of the html files don't contain the word "text" in them. I know that it is possible for ftp clients to detect when I'm transferring binary files in ascii mode, since they warn me that the file contained some bare end-of-line characters or something like that. I really want an automatic transmission, so to speak, that will get the files and fix them on the fly. The sheer number of files is probably 10,000 or more. If you can come up with a surefire easy way, you can still have the points. Thanks for trying.
You misunderstood ( or I didn't explain right :)
We are not looking for the word "text" in the file itself; we are looking for the word "text" in the output from the file command. IOW, there is a utility called "file" that examines a file and tries to determine its type. Try running it on one of your html files. Somewhere in the output of this command you should see 'text' if it is an ascii file. That's what we're looking for. Do 'man file' for more info.
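For example, you can try the check by hand on a couple of scratch files (the filenames here are made up just for the demo; the exact wording of file's output varies between versions, but ascii files should include "text" somewhere in it):

```shell
#!/bin/bash
# Hypothetical demo files: one text file with CRLF line endings,
# and one file starting with binary (PNG signature) bytes.
printf 'hello\r\nworld\r\n' > sample.html
printf '\x89PNG\r\n\x1a\n' > sample.png

# `file` describes each one; grep for "text" picks out the ascii files.
for f in sample.html sample.png; do
    if [ -n "`file "$f" | grep 'text'`" ]; then
        echo "$f: text"
    else
        echo "$f: binary"
    fi
done
```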
Rob
Actually, the *.bak files are the originals. The corresponding files (i.e. without the .bak extension) are the converted versions.
Rob
ASKER
I'll give that a try. Maybe I can incorporate that into the download script. In pseudocode:
#Login to the server
smbclient //server/resource -U username || exit 0
#load up the file list I saved/processed from the source server
@filenames=<filelist>;
lcd /tmp/un-nt/
get filenames[fileindex]
call your script with /tmp/un-nt/filenames[fileindex] on the command line:
if [ -n "`file "$1" | grep 'text'`" ]
then
tr -d "\r" < "$1" > "$1".tmp && mv "$1".tmp "$1"
fi
(The output has to go to a temp file first, since redirecting back onto "$1" directly would truncate it before tr reads it.)
Then copy the fixed (or not fixed) file from that location to its final path in the /home tree.
I think this will work. Thanks for the help. Sorry for my misunderstanding. I thought you were simply grepping for the word "text". I've not used the "file" command before, except to test if a file exists in a shell script.
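That plan can be sketched in bash roughly like this (the server name, share, and paths are all hypothetical; the smbclient step is commented out so the local fix-up part can be tried on its own):

```shell
#!/bin/bash
# Hypothetical staging area for files pulled off the NT server.
stage=$(mktemp -d)

# Step 1 (not run here): fetch one file into the staging area, e.g.:
# smbclient //ntserver/wwwroot -U username -c "lcd $stage; get index.html"

# Simulate a downloaded file that still has DOS (CRLF) line endings:
printf 'line one\r\nline two\r\n' > "$stage/index.html"

# Step 2: strip carriage returns only if `file` says it is text.
f="$stage/index.html"
if [ -n "`file "$f" | grep 'text'`" ]; then
    tr -d '\r' < "$f" > "$f.tmp" && mv "$f.tmp" "$f"
fi

# Step 3: move the (possibly fixed) file to its final home-directory path.
dest=$(mktemp -d)   # stands in for /home/<user>/html
mv "$f" "$dest/index.html"
```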
ASKER
Can you think of any way to make the method you listed above recursive, so it steps through a directory tree and fixes the files that are broken? Instead of checking each file one by one, I'm downloading all of them as you suggested and running the un-nt fix on them; however, they all have subdirectories under them. If there's no good answer, I can process the output of ls -R > files, cut out only the directories, and hack up some script to run it on each dir listed.
Try this:
find /root/path -type f -exec un-nt \{\} \;
I haven't tested this particular invocation, but I've done similar many times before. Note that find does not expand shell globs like \*, so it's simplest to hand un-nt the files themselves rather than each directory plus a wildcard.
First try it without the -exec ....
Make sure it lists the files you're interested in. Then add the -exec .... back in. I would try it on a test directory tree first.
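A self-contained way to rehearse the recursive pass on a throwaway tree (all names below are made up), before pointing it at real data:

```shell
#!/bin/bash
# Build a scratch tree: a CRLF text file and a binary file in a subdirectory.
root=$(mktemp -d)
mkdir -p "$root/sub"
printf 'a\r\nb\r\n' > "$root/sub/page.html"
printf '\x00\x01\x02' > "$root/sub/pic.bin"

# Recurse over every regular file; fix only the ones `file` calls text,
# keeping a .bak copy of each original, just like un-nt does.
find "$root" -type f | while read -r f; do
    if [ -n "`file "$f" | grep 'text'`" ]; then
        cp "$f" "$f.bak"
        tr -d '\r' < "$f.bak" > "$f"
    fi
done
```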
Rob
if [ -n "`file "$thefile" | grep 'text'`" ]
then
    cp "$thefile" "$thefile.bak"
    tr -d "\r" < "$thefile.bak" > "$thefile"
fi
Hope this helps!
Rob