Sean Scissors asked:

Need help understanding how to copy files via python

I have the following code and am trying to figure out why my copy isn't working. For the record, I have tried shutil.copy, shutil.copy2, and shutil.copyfile. In my current code I am trying to build a cp command by string concatenation and run it with the older os.system function. When I use copy, copy2, or copyfile I get one error or another: either "Not a directory" or, oddly enough, if I switch functions, "Is a directory". It's like I can't win. My code is below. The only part I am having trouble with is this: once the files have been recursively downloaded to my local drive, I want to copy them to another directory on my local drive.

#!/usr/bin/python
import os
import pysftp
import paramiko
import shutil
import os.path

HOST=""
USER=""
#PASSWORD=""
SUFFIX_TO_FETCH__AND_DELETE=".csv"
DESTINATION_PATH= "/mnt/sas/ftp"
DESTINATION_PATH2="/mnt/sas/ftp_ncdfs_coll"

cnopts=pysftp.CnOpts()
cnopts.hostkeys.load('/home/boro/.ssh/id_rsa')

srv = pysftp.Connection(host=HOST, username=USER, cnopts=cnopts)#, password=PASSWORD)


def do_nothing(fname):
        return " "

def fetch_and_remove(fname):
    if not fname.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename = os.path.split(fname)[1]
    dst_fname = os.path.join(DESTINATION_PATH, filename)
    dst_dirname = os.path.dirname(dst_fname)
    if os.path.isfile(dst_fname):
        return " "
    else:  # THIS IS WHERE THE PROBLEM OCCURS
        srv.get(fname, dst_fname)
#     dst_fname = os.path.split(fname)[0]
        os.system('cp ' + dst_fname + ' ' + DESTINATION_PATH2)
        srv.remove(fname)

srv.walktree('/tolab/sdt', fetch_and_remove, do_nothing, do_nothing)

def fetch_and_remove2(fname2):
    if not fname2.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename2 = os.path.split(fname2)[1]
    dst_fname2 = os.path.join(DESTINATION_PATH, filename2)
    dst_dirname2 = os.path.dirname(dst_fname2)
    if os.path.isfile(dst_fname2):
        return " "
    else:
        srv.get(fname2, dst_fname2)
        srv.remove(fname2)

srv.walktree('/tolab', fetch_and_remove2, do_nothing, do_nothing)


aikimark

Check your indentation. The two statements after the comment may not be recognized as part of the else: block.
Member_2_8038612

If I understand correctly, there are two possible solutions:

Sol 1: You split fname into the variable filename, but you still pass the original fname to the get() method.
filename = os.path.split(fname)[1]

Try changing fname to the filename variable that you defined on line no. 27.

Sol 2: I could be wrong about the above solution (I don't know the actual format of fname), so I looked into PySFTP. According to the PySFTP documentation, the only other argument get() accepts is preserve_mtime. In your code, you pass both fname and a destination file name:
srv.get(fname, dst_fname)

So you must change get() to get_d(), which also takes the local path where the file is saved.
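
For a single-file download, the local target is normally a full file path built from the remote file's name, which is what `srv.get(fname, dst_fname)` in the question passes as its second argument. The traceback posted later in this thread also shows pysftp's `get()` handing a `localpath` through to paramiko, so that call form appears to be accepted. A minimal sketch of the path-building step follows; `local_path_for` is a hypothetical helper name, not part of pysftp, and it assumes remote paths use forward slashes as SFTP paths do.

```python
import os
import posixpath


def local_path_for(remote_path, local_dir):
    """Return the local file path for a remote file saved into local_dir.

    Hypothetical helper for illustration; it does not touch the network.
    """
    # SFTP paths use "/" regardless of the client OS, so split with posixpath.
    filename = posixpath.basename(remote_path)
    # The local side follows the conventions of the machine running the script.
    return os.path.join(local_dir, filename)


# Paths taken from the question; nothing is created or downloaded here.
print(local_path_for("/tolab/sdt/test.csv", "/mnt/sas/ftp"))
```

The result is a file path, not a directory, so it suits a call that downloads one file at a time.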
Sean Scissors

ASKER

I will give this a try tomorrow and get back to you. Sorry for the delay and thanks for the responses.
Still no luck using get_d. It's not an indentation error; that was just an artifact of pasting into EE. When I switch between copy and copy2 I get the opposite errors. DESTINATION_PATH2 is a fixed variable, but for some reason the shutil functions append to that path. I have tried printing the name and it looks correct, but it still doesn't work.

All I am trying to do is copy a file from one Windows directory to another via a Python script on a Linux VM where I have the directory mounted permanently.

#!/usr/bin/python
import os
import pysftp
import paramiko
import shutil
import os.path

HOST=""
USER=""
#PASSWORD=""
SUFFIX_TO_FETCH__AND_DELETE=".csv"
DESTINATION_PATH= "/mnt/sas/ftp"
DESTINATION_PATH2="/mnt/sas/ftp_ncdfs_coll"

cnopts=pysftp.CnOpts()
cnopts.hostkeys.load('/home/boro/.ssh/id_rsa')

#for name, value in sorted(os.environ.items()):
#    print("export %s='%s'" % (name, value))

srv = pysftp.Connection(host=HOST, username=USER, cnopts=cnopts)#, password=PASSWORD)

def do_nothing(fname):
        return " "

def fetch_and_remove(fname):
    if not fname.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename = os.path.split(fname)[1]
    dst_fname = os.path.join(DESTINATION_PATH, filename)
    dst_dirname = os.path.dirname(dst_fname)
    if os.path.isfile(dst_fname):
        return " "
    else:
        srv.get_d(fname, dst_fname)
        print("dst_fname= " +dst_fname, "fname= " +fname)
        shutil.copy2(dst_fname, DESTINATION_PATH2)
        print("dst_fname= " +dst_fname, "fname= " +fname)
        print("do I see this")
        srv.remove(fname)

srv.walktree('/tolab/sdt', fetch_and_remove, do_nothing, do_nothing)

def fetch_and_remove2(fname2):
    if not fname2.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename2 = os.path.split(fname2)[1]
    dst_fname2 = os.path.join(DESTINATION_PATH, filename2)
    dst_dirname2 = os.path.dirname(dst_fname2)
    if os.path.isfile(dst_fname2):
        return " "
    else:
        srv.get(fname2, dst_fname2)
        srv.remove(fname2)

srv.walktree('/tolab', fetch_and_remove2, do_nothing, do_nothing)



(u'dst_fname= /mnt/sas/ftp/test.csv', u'fname= /tolab/sdt/test.csv')
Traceback (most recent call last):
  File "new_ncdfs_sftp.py", line 44, in <module>
    srv.walktree('/tolab/sdt', fetch_and_remove, do_nothing, do_nothing)
  File "/usr/local/lib/python2.7/dist-packages/pysftp/__init__.py", line 919, in walktree
    fcallback(pathname)
  File "new_ncdfs_sftp.py", line 39, in fetch_and_remove
    shutil.copy2(dst_fname, DESTINATION_PATH2)
  File "/usr/lib/python2.7/shutil.py", line 130, in copy2
    copyfile(src, dst)
  File "/usr/lib/python2.7/shutil.py", line 83, in copyfile
    with open(dst, 'wb') as fdst:
IOError: [Errno 20] Not a directory: u'/mnt/sas/ftp_ncdfs_coll/test.csv'

If I use shutil.copy2 I get the "Not a directory" error, and for some reason the file name is being appended to my DESTINATION_PATH2. If I switch to shutil.copy I get the "Is a directory" error, and it doesn't work that way either. It's odd to get opposite errors.
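
On the appended file name: shutil.copy and shutil.copy2 join the source file's base name onto the destination when the destination is an existing directory, so seeing test.csv at the end of DESTINATION_PATH2 in the traceback is expected behaviour. shutil.copyfile does no such joining and needs a full destination file path. The sketch below illustrates the difference in a throwaway temporary directory; `copy_into_directory` is a hypothetical helper, and the directory names only mirror the ones in the question. It does not reproduce the mount problem itself.

```python
import os
import shutil
import tempfile


def copy_into_directory(src_file, dst_dir):
    """Copy src_file into dst_dir under the same name and return the new path."""
    if not os.path.isdir(dst_dir):
        raise ValueError("destination is not a directory: %s" % dst_dir)
    # Build the full destination file path explicitly instead of relying on
    # shutil to append the file name.
    dst_file = os.path.join(dst_dir, os.path.basename(src_file))
    shutil.copy2(src_file, dst_file)
    return dst_file


# Demonstration in a temporary tree (stand-ins for /mnt/sas/ftp and friends).
work = tempfile.mkdtemp()
src_dir = os.path.join(work, "ftp")
dst_dir = os.path.join(work, "ftp_ncdfs_coll")
os.mkdir(src_dir)
os.mkdir(dst_dir)
src_file = os.path.join(src_dir, "test.csv")
with open(src_file, "w") as handle:
    handle.write("a,b\n1,2\n")

copied = copy_into_directory(src_file, dst_dir)
print("copied to %s" % copied)

# copyfile treats its second argument as a file path, so a directory fails.
try:
    shutil.copyfile(src_file, dst_dir)
except (IOError, OSError) as exc:
    print("copyfile rejected the directory destination: %s" % exc)

shutil.rmtree(work)
```

If the explicit-path version still fails with "Not a directory" on the real mount, the problem most likely lies with the destination folder rather than with the choice of shutil function.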
I think there is a larger underlying issue. I just checked our root user and it's empty. I tried to run ls -l as the normal user I log in as and somehow got a "permission denied" error. Something is definitely up with our server, and since I haven't touched it I blame my coworker, haha. Is there any way to "revert" a Linux OS to an earlier day, the way Windows has System Restore?
So I was looking for a way to debug my code line by line, like debugging Java in NetBeans. There is a built-in Python module, pdb, but it's not very easy on the eyes. After more research, since I have VS2013, I am now trying PTVS 2.2 for that. Will see what I come up with.
ASKER CERTIFIED SOLUTION
Member_2_8038612

@Aseem This has been a living nightmare, I tell you. I installed PTVS 2.2 for VS 2013. The problem now is updating Python on my Windows OS. I have Python 2.7 because that's what is used on the Linux VM (don't ask why). I upgraded pip using easy_install. I had some issues but eventually got them sorted. I have successfully (I believe) installed the Paramiko, Pysftp, and possibly (not sure) Cryptography modules. However, when I run the debugger it says the "BCrypt" module is still missing, which I guess is supposed to come with Paramiko. Maybe it didn't install fully?

Long story short, I can't get past my imports because the modules aren't installing on Windows as expected and the failures are not easy to figure out. In short, Python + Windows = ugh....

So I bit the bullet and decided to stop trying to use the copy function. It was working and now it's not, so instead of taking a shortcut I just wrote another near-identical function.

#!/usr/bin/python
import os
import pysftp
import paramiko
import os.path

HOST=""
USER=""
#PASSWORD=""
SUFFIX_TO_FETCH__AND_DELETE=".txt"
DESTINATION_PATH= "/mnt/sas/ftp"
DESTINATION_PATH2="/mnt/sas/ftp_ncdfs_coll"

cnopts=pysftp.CnOpts()
cnopts.hostkeys.load('/home/boro/.ssh/id_rsa')

#for name, value in sorted(os.environ.items()):
#    print("export %s='%s'" % (name, value))

srv = pysftp.Connection(host=HOST, username=USER, cnopts=cnopts)#, password=PASSWORD)



def do_nothing(fname):
        return " "

def fetch_and_remove(fname):
    if not fname.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename = os.path.split(fname)[1]
    dst_fname = os.path.join(DESTINATION_PATH, filename)
    dst_dirname = os.path.dirname(dst_fname)
    if os.path.isfile(dst_fname):
        return " "
    else:
        srv.get(fname, dst_fname)

srv.walktree('/tolab/sdt', fetch_and_remove, do_nothing, do_nothing)

def fetch_and_remove2(fname2):
    if not fname2.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename2 = os.path.split(fname2)[1]
    dst_fname2 = os.path.join(DESTINATION_PATH2, filename2)
    dst_dirname2 = os.path.dirname(dst_fname2)
    if os.path.isfile(dst_fname2):
        return " "
    else:
        srv.get(fname2, dst_fname2)
        srv.remove(fname2)

srv.walktree('/tolab/sdt', fetch_and_remove2, do_nothing, do_nothing)

def fetch_and_remove3(fname3):
    if not fname3.endswith(SUFFIX_TO_FETCH__AND_DELETE):
        return # skip files with wrong suffix
    filename3 = os.path.split(fname3)[1]
    dst_fname3 = os.path.join(DESTINATION_PATH, filename3)
    dst_dirname3 = os.path.dirname(dst_fname3)
    if os.path.isfile(dst_fname3):
        return " "
    else:
        srv.get(fname3, dst_fname3)
        srv.remove(fname3)

srv.walktree('/tolab', fetch_and_remove3, do_nothing, do_nothing)



It gave me an error on ONE of the functions, yet they are nearly identical:
Traceback (most recent call last):
  File "new_ncdfs_sftp_test.py", line 53, in <module>
    srv.walktree('/tolab/sdt', fetch_and_remove2, do_nothing, do_nothing)
  File "/usr/local/lib/python2.7/dist-packages/pysftp/__init__.py", line 919, in walktree
    fcallback(pathname)
  File "new_ncdfs_sftp_test.py", line 50, in fetch_and_remove2
    srv.get(fname2, dst_fname2)
  File "/usr/local/lib/python2.7/dist-packages/pysftp/__init__.py", line 249, in get
    self._sftp.get(remotepath, localpath, callback=callback)
  File "/usr/local/lib/python2.7/dist-packages/paramiko/sftp_client.py", line 756, in get
    with open(localpath, 'wb') as fl:
IOError: [Errno 20] Not a directory: u'/mnt/sas/ftp_ncdfs_coll/test2.txt'

What doesn't make sense is that I am splitting the name into filename, filename2, and filename3 respectively. Yet for some reason filename2 didn't split right, since the error states that the path is not a directory. Everything else works, though. So weird.
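
Since the three callbacks differ only in destination directory and in whether the remote file is removed, they could be produced by one factory, which makes it easier to see that the only thing varying in the failing case is the destination. The following is a rough sketch, not the original script: `make_fetcher` and its parameters are hypothetical names, and `connection` is assumed to offer `get(remote, local)` and `remove(remote)` the way `srv` is used above.

```python
import os
import posixpath


def make_fetcher(connection, destination_dir, suffix, remove_remote):
    """Build a walktree file callback that downloads matching files.

    connection      -- object with get(remote, local) and remove(remote)
    destination_dir -- local directory that receives the files
    suffix          -- only remote paths ending with this are fetched
    remove_remote   -- delete the remote file after a successful download
    """
    def fetch(remote_path):
        if not remote_path.endswith(suffix):
            return  # skip files with the wrong suffix
        local_path = os.path.join(destination_dir,
                                  posixpath.basename(remote_path))
        if os.path.isfile(local_path):
            return  # already downloaded
        connection.get(remote_path, local_path)
        if remove_remote:
            connection.remove(remote_path)
    return fetch


# Intended use with the names from the script above (not executed here):
# srv.walktree('/tolab/sdt',
#              make_fetcher(srv, DESTINATION_PATH, ".txt", False),
#              do_nothing, do_nothing)
# srv.walktree('/tolab/sdt',
#              make_fetcher(srv, DESTINATION_PATH2, ".txt", True),
#              do_nothing, do_nothing)
```

With the logic shared, an error raised for only one destination points at that directory rather than at the name splitting.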
So this was the closest to the correct answer. The underlying issue is that something on our shared drive or Linux VM changed, and folders with "_" underscores in their names are no longer recognized. We renamed the folder without underscores and the script works as expected. Aseem, if you have any insight on that it would be appreciated; otherwise, thank you for the troubleshooting help all the same.
So after removing the "_" from the directory names, is the IOError with filename2 still there, or were you able to resolve that too?
@Aseem

It seems the error was only being thrown because it couldn't access the folder. Once we removed the underscores from the name it works without a hitch. If you have any idea what would cause underscores to suddenly become an issue, that would be great, but otherwise all is working well now.
Going by the naming conventions that different paradigms follow, "_" should never be an issue compared to other special characters or symbols. However, in VM environments there is a reported issue when the hostname or machine name contains "_". For example, this one

On the other hand, this post shows that "_" and other special characters can be used in display names for datastores, files, etc.
So maybe in your case an issue similar to the one in the former link is resurfacing. It is unusual, but most people may not have noticed it because they haven't tried something like that. I would suggest using a different SFTP library, or another programming language if possible, just to access the path, so you can come to a conclusion.

Hope this helps.
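
One way to check the path without involving any SFTP library at all is to probe it with the standard library only. This is a minimal sketch under that assumption; `probe_directory` is a hypothetical helper, and the path in the example call is simply the one from the script.

```python
import errno
import os
import tempfile


def probe_directory(path):
    """Report what basic filesystem calls say about path, without pysftp."""
    report = {
        "exists": os.path.exists(path),
        "is_dir": os.path.isdir(path),
        "listable": False,
        "writable": False,
        "error": None,
    }
    try:
        os.listdir(path)
        report["listable"] = True
        # Create and delete a scratch file to confirm writes really work.
        handle, probe_file = tempfile.mkstemp(dir=path)
        os.close(handle)
        os.remove(probe_file)
        report["writable"] = True
    except (IOError, OSError) as exc:
        # Keep the symbolic errno name (ENOTDIR, EACCES, ...) for comparison
        # with the tracebacks.
        name = errno.errorcode.get(exc.errno, exc.errno)
        report["error"] = "%s: %s" % (name, exc.strerror)
    return report


print(probe_directory("/mnt/sas/ftp_ncdfs_coll"))
```

If the probe reports the same errno as the tracebacks, the problem sits in the mount or the share rather than in pysftp or shutil.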