Stephen Forero

Reading XML from SFTP download: "Unexpected end of file while parsing Name has occurred"

Good Afternoon All,

I am creating a C# WPF project that connects to an FTP site, downloads an XML file, and then uses LINQ to read the XML file and present charts/graphs/statistics.
Then, every minute on a timer, I reconnect to the server, download the file, and read it again.

I have all this working correctly.
I download the file using the third-party library Tamir.SharpSsh:
sftpClient.Get("/" + fName, localSourcePath);



Here is the problem: if I leave this program running for 8 hours straight, I get the following error roughly once every 4 hours (screenshot of the error attached):
"A first chance exception of type 'System.Xml.XmlException' occurred in System.Xml.dll
Additional information: Unexpected end of file while parsing Name has occurred. Line 65069, position 14."
I've been searching online for a couple of days now. At first I thought I wasn't leaving the FTP connection open long enough, but I don't think that's the problem.

I believe the file has not fully downloaded before I read it in for processing.
Does anyone have any thoughts?

Thanks in advance


The two methods in question:
        private void HandleNewSourceFile()
        {
            Sftp sftpClient = new Sftp(InputAddress.Text, InputUserName.Text, InputPassword.Text);
            bool connected = false;

            try
            {
                sftpClient.Connect();
                statusBarContent.Content = "connection established successfully...";
                connected = true;
            }
            catch (Exception e)
            {
                statusBarContent.Content = "error on login, connection terminated...";
            }


            if (connected == true)
            {
                ArrayList filesObject = sftpClient.GetFileList(".");
                foreach (string fName in filesObject)
                {
                    if (fName == targetFileName)
                    {
                        CreateDirectoryFolder();
                        sftpClient.Get("/" + fName, localSourcePath);
                        
                        statusBarContent.Content = "file located...";

                        var directory = new DirectoryInfo(localSourcePath);
                        var myFile = directory.GetFiles()
                            .OrderByDescending(f => f.LastWriteTime)
                            .First();
                        myFileDetailsClass.CurrentName = myFile.Name;

                        statusBarContent.Content = "loading data...";


                        //check if xml file is ready to be used
                        bool fileCompleted = false;
                        while (true)
                        {
                            fileCompleted = checkXMLfile(localSourcePath + @"\" + targetFileName);
                            if (fileCompleted == true)
                            {
                                myDataRefClass.LoadAllData(myDataRefClass.libraryXdoc);
                                statusBarContent.Content = "data loaded successfully at " + DateTime.Now.ToLongTimeString();
                                break;
                            }
                            else
                            {
                                statusBarContent.Content = "file creating in progress at " + DateTime.Now.ToLongTimeString();
                            }
                        }

                        sftpClient.Close();
                        break;
                    }
                }
            }
        }



        private bool checkXMLfile(string sampleXmlFile)
        {
            try
            {
                myDataRefClass.libraryXdoc = null;
                myDataRefClass.libraryXdoc = XDocument.Load(sampleXmlFile);
                Debug.Print("file created successfully");
                return true;
            }
            catch (Exception ex)
            {
                Debug.Print("file creation failed" + ex);
                return false;
            }
        }


kaufmed

You generally don't need to be concerned with "first chance exceptions." Is this exception actually causing your application to halt? Are your catch blocks actually catching this exception?
Stephen Forero

ASKER

My program does come to a halt. The catch blocks are catching it.

The reason I have the "while (true)" is that I thought that if it finds the XML file is not ready, it will just keep looping while the file downloads until it is complete. But for some reason, once the file hits that first error, it's almost as if the download stops, because no matter how many times it loops, the file is never loaded properly.
Well, you've got what is considered to be a "spinlock" in that your code is continually looping, checking for a condition, without yielding the processor to another process that may be waiting. It is generally advised to avoid using spinlocks, if possible. I'm not sure if you will be able to here, but at the very least I would think you should sleep your thread just to yield the processor to other processes while you are waiting for this file to arrive.
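As a minimal sketch of the "sleep instead of spin" suggestion, here is a bounded retry around XDocument.Load. The names XmlRetryLoader and TryLoadWithRetry are hypothetical, and the attempt count and delay are placeholders to tune. It treats XmlException as "probably truncated" and IOException as "probably missing or still locked", sleeps between attempts, and gives up rather than looping forever.

```csharp
using System;
using System.IO;
using System.Threading;
using System.Xml;
using System.Xml.Linq;

public static class XmlRetryLoader
{
    // Tries to parse the file up to maxAttempts times, sleeping between
    // attempts so the thread yields the CPU instead of spinning.
    // Returns null if the file never becomes parseable.
    public static XDocument TryLoadWithRetry(string path, int maxAttempts, TimeSpan delay)
    {
        for (int attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                return XDocument.Load(path);
            }
            catch (XmlException)
            {
                // Typically a truncated file ("unexpected end of file").
            }
            catch (IOException)
            {
                // The file may be missing or still locked by the writer.
            }

            if (attempt < maxAttempts)
            {
                Thread.Sleep(delay);
            }
        }
        return null;
    }
}
```

In HandleNewSourceFile this would take the place of the while (true) loop. Two caveats: Thread.Sleep on the WPF UI thread freezes the window, so this belongs on a background thread; and if the download has genuinely stopped, retrying cannot help, which is why the number of attempts is capped.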

As to your issue, I think you may be encountering a file that is not completely written. If you only have part of a file, then your call to XDocument.Load would fail. What behavior do you witness if you step through your code in the debugger?

Also, have you looked into using the FileSystemWatcher component at all?
I don't mind removing the "spinlock" that I put in.

My best guess so far is that the file is not completely written. When I run it in the debugger, it works fine for, say, the first hundred minutes, but then on minute 101 it fails at XDocument.Load with the error message in the screenshot I attached:
"Unexpected end of file."

I do not know how FileSystemWatcher would help. Do you know of any other way I can wait for the file to fully download?
Download to a temp file.  Compare size of downloaded file to size of file on server.  Overwrite the last good file with the temp file only if the sizes match.
Thanks Alex. The file on the server gets larger approximately every 10 minutes. How would I go about implementing your method?
Download the file into a staging folder and then, after the download is complete, get the size of the local file and compare it to the size of the file on the remote server. If they match, move the downloaded local file into the actual destination folder where it will be processed.

This way, the logic that deals with processing files will not have to worry about a partial file... that possibility is handled by the download process.  
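The staging-folder approach could be sketched like this: download into a staging path, compare the local size with the server's size, and only then replace the last good file. It assumes the download hook blocks until the transfer ends and that the remote size can be obtained somehow; both are passed in as parameters because the SFTP library in use may not expose a size call. StagedDownload and DownloadAndPromote are hypothetical names.

```csharp
using System;
using System.IO;

public static class StagedDownload
{
    // download:   hypothetical hook that writes the remote file to the local path it is
    //             given and is assumed to block until the transfer ends.
    // remoteSize: size of the file on the server in bytes (assumed obtainable somehow).
    // Returns true only if a complete copy was moved into finalPath.
    public static bool DownloadAndPromote(Action<string> download, long remoteSize,
                                          string stagingPath, string finalPath)
    {
        if (File.Exists(stagingPath))
        {
            File.Delete(stagingPath);            // discard any earlier partial copy
        }

        download(stagingPath);

        if (!File.Exists(stagingPath) || new FileInfo(stagingPath).Length != remoteSize)
        {
            if (File.Exists(stagingPath))
            {
                File.Delete(stagingPath);        // incomplete: keep the last good file
            }
            return false;
        }

        // Complete: replace the last good file with the new copy.
        if (File.Exists(finalPath))
        {
            File.Delete(finalPath);
        }
        File.Move(stagingPath, finalPath);
        return true;
    }
}
```

A caller might pass p => sftpClient.Get("/" + fName, p) as the download hook, assuming Get accepts a full local file path as its second argument. Delete-then-move leaves a brief moment where the final file does not exist, so the processing side should treat a missing file as "not ready yet".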

You could make it even safer by adding logic to the download process so that the download will not begin unless the remote file size has been static for a while, indicating that it is not being modified by the remote process. In pseudocode:
bool bDone = false;
while (!bDone)
{
  long x = RemoteFileSize("MyFile");   // RemoteFileSize: placeholder for your library's size call
  Thread.Sleep(100);
  long y = RemoteFileSize("MyFile");
  if (x == y)
    bDone = true;
}
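The loop above waits forever if the size never settles, and with a file that grows roughly every 10 minutes, a 100 ms sample says little about whether a write is in progress. A bounded variant could look like the sketch below; getSize is a hypothetical delegate standing in for whatever remote-size call is available, and SizeStability/WaitForStableSize are illustrative names. The interval and the number of checks are parameters to tune.

```csharp
using System;
using System.Threading;

public static class SizeStability
{
    // getSize: hypothetical hook returning the current size of the remote file.
    // Polls until two consecutive readings taken 'interval' apart agree,
    // or gives up after maxChecks comparisons. Returns the stable size, or -1.
    public static long WaitForStableSize(Func<long> getSize, TimeSpan interval, int maxChecks)
    {
        long previous = getSize();
        for (int i = 0; i < maxChecks; i++)
        {
            Thread.Sleep(interval);
            long current = getSize();
            if (current == previous)
            {
                return current;
            }
            previous = current;
        }
        return -1;
    }
}
```

Returning the stable size also gives the download step something to compare the local file against afterwards.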


So I'm trying to implement your idea, and I'm using the Tamir.SharpSsh.Scp class library.
I can't seem to find anything that lets me check the size of the file on the remote server.

http://www.databasecure.com/sharpssh/class_tamir_1_1_sharp_ssh_1_1_scp.html

Any thoughts?
Hmm, perhaps that SFTP library doesn't support a method for determining the size of the file on the server.  Is there another one you could use?

Another possibility is to download the file in a separate single-purpose service and only process the file with your tool.  I'm thinking specifically about using a scriptable SFTP client to compare file sizes to be sure the file is not still being written on the remote server BEFORE downloading it and then only moving it into the folder where your app picks it up if the download is successful.  This removes two possible causes of incomplete files and simplifies your project.  The script might look like this:
SET file_to_get = "TheFile.xml"
SET staging_folder = "c:\myproject\scratch\"
SET output_folder = "c:\myproject\"  
SET sftp_address = "sftp.thedomain.org"
SET sftp_username = "UserID"
SET sftp_password = "Secret"

:begin
WORKINGDIR staging_folder /create 
ftplogon sftp_address /user=sftp_username /pw=sftp_password /servertype=SFTP /trust=all
:try_again
getsitefile file_to_get
IFERROR= $error_no_file_found GOTO done
set size1 = %sitefilesize
pause /for=10  ;; wait 10 seconds 
getsitefile file_to_get
set size2 = %sitefilesize
IFNUM!= size1 size2 goto pause_and_try_again
rcvfile file_to_get
IFERROR GOTO done 
MOVE file_to_get output_folder
goto done
:pause_and_try_again
;; we only get here if the file sizes didn't match
pause /for=5
goto try_again
:done
FTPLOGOFF
PAUSE /for=300 ;; wait 5 minutes (300 seconds) 
GOTO begin 


That example is an infinite loop that does its work every 5 minutes. You could change the frequency by changing the pause time. I wrote it with the idea that it would be installed as a Windows service, so it would just always be on. You could add logic to skip the download if the size and timestamp of the remote file did NOT change since the last loop iteration. Anyway, the script syntax I used is for Robo-FTP, since that is the scriptable client I know best, but if you have another favorite it should be easy enough to port. The only SFTP library I've used in C# has an $800 license for one developer, so for something like this it is cheaper not to roll your own, especially when you consider the cost of programming and testing hours.
Thanks for all the input.
As this project is small scale, I do not have the budget for a license.
To make things more complicated, company policy restricts admin rights, which prevents me from creating Windows services.

Here's another thought: my XML file has three tables.
For example:
Table 1... 2000 rows of data, each with 10 nodes.
Table 2... 30,000 rows of data, each with 5 nodes
Table 3... 1 row of data, with 5 nodes.

Since Table3 is the last table and it is tiny (it will only ever be 1 row with 5 nodes), what if I just loop until it finds Table3 and then check whether Table3 has properly closed tags?

What do you think?  Too much of a bad hack?
Can you reliably identify the very last line of the document? Because if you could, you wouldn't even really need to check whether it was parseable XML; just read the end of the stream.
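Reading only the end of the stream could look like the following sketch. It assumes a UTF-8 (or plain ASCII) file and a known closing tag such as "</NewDataSet>", the root element checked for later in the thread. XmlTail and EndsWithClosingTag are hypothetical names. Opening with FileShare.ReadWrite lets the check run even while the downloader still has the file open for writing.

```csharp
using System;
using System.IO;
using System.Text;

public static class XmlTail
{
    // Reads only the last few bytes of the file and reports whether the text
    // ends (ignoring trailing whitespace) with the expected closing tag.
    // Assumes UTF-8 or ASCII; a UTF-16 file would need a different encoding.
    public static bool EndsWithClosingTag(string path, string closingTag)
    {
        int tailLength = Encoding.UTF8.GetByteCount(closingTag) + 64; // room for trailing whitespace
        using (var stream = new FileStream(path, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            long start = Math.Max(0L, stream.Length - tailLength);
            stream.Seek(start, SeekOrigin.Begin);
            var buffer = new byte[(int)(stream.Length - start)];
            int read = 0;
            while (read < buffer.Length)
            {
                int n = stream.Read(buffer, read, buffer.Length - read);
                if (n == 0) break;
                read += n;
            }
            string tail = Encoding.UTF8.GetString(buffer, 0, read);
            return tail.TrimEnd().EndsWith(closingTag, StringComparison.Ordinal);
        }
    }
}
```

A matching closing tag shows that the end of the file arrived; since a download is written front to back, that is usually what matters for truncation, but it does not validate the XML as a whole.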
So this is what I did. It's running now, and I'm just waiting to see if I get an error message.

On a background worker thread, I download the file from ftp.
Then on main thread, I have the continuous loop until BREAK.

while (true)
{
    //Thread.Sleep(500);
    if (backgroundCompleted == true)
    {
        fileCompleted = checkXMLfile(localSourcePath + @"\" + targetFileName);
        if (fileCompleted == true)
        {
            var configured = myDataRefClass.libraryXdoc.Descendants("Variables");
            if (configured != null)
            {
                myDataRefClass.LoadAllData(myDataRefClass.libraryXdoc);
                statusBarContent.Content = "data loaded successfully at " + DateTime.Now.ToLongTimeString();
                backgroundCompleted = false;
                break;
            }
        }
    }
    else
    {
        statusBarContent.Content = "file creating in progress at " + DateTime.Now.ToLongTimeString();
    }
}



Basically, the main check is this line:
var configured = myDataRefClass.libraryXdoc.Descendants("Variables");



"Variables" is the name of the last table, which is only 1 line with 4-5 nodes.
If it is not null, I am assuming everything is okay.
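One caveat with that check: Descendants never returns null. When no element matches, it returns an empty sequence, so configured != null is always true. Also, if XDocument.Load has already succeeded, the root element was closed, so a truncated file would have thrown before this line is reached. A sketch of a check that actually tests for the table (VariablesCheck and HasVariables are hypothetical names):

```csharp
using System.Linq;
using System.Xml.Linq;

public static class VariablesCheck
{
    // Descendants() returns an empty sequence, never null, when nothing
    // matches, so test for at least one match with Any().
    public static bool HasVariables(XDocument doc)
    {
        return doc.Descendants("Variables").Any();
    }
}
```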

Guess we'll see.

What do you think?
Perhaps run it in Debug mode and feed it a file that you know will fail, so you can make sure it handles failure gracefully.
Okay, I think I've whittled down the problem a bit more. When I am downloading the file from SFTP, every once in a while the rest of the program tries to use the downloaded file before it's fully downloaded, as we expected.

And I think the main problem is that once the program tries to use the file before it is complete, the file stops downloading, so there is no point waiting for completion because the download itself has stopped.

So I moved the file download into a BackgroundWorker on a background thread, and had that working.

This is how I tested whether the file was ready to be used:
string lastLine = File.ReadLines(localSourcePath + @"\" + targetFileName).LastOrDefault();
// Trim() instead of Substring(0, lastLineLength): Substring throws if the line is shorter.
if (lastLine != null && lastLine.Trim() == "</NewDataSet>")



Basically, I grab the last line and make sure it equals what I know it should be.

If the file is not ready to be used, I cancel the background worker and re-do it.
But I cannot get the cancelling part to work.

else
{
    backgroundCompleted = false;
    downloadFTP.WorkerSupportsCancellation = true;

    if (downloadFTP.IsBusy)
    {
        downloadFTP.CancelAsync();
    }

    while (downloadFTP.IsBusy)
    {
        Thread.Sleep(500);
        //continuous loop until not busy
    }

    downloadFTP.RunWorkerAsync(objectHolder);
}



The background worker never gets cancelled, and if I try to start a new one, it says it can't start a new one until the old one is cancelled.

I seem to be spinning my wheels here... any thoughts?
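A likely reason the cancel loop hangs: CancelAsync only sets CancellationPending, and the DoWork handler has to check that flag itself, which a single blocking sftpClient.Get call never does. In addition, IsBusy is cleared just before RunWorkerCompleted is raised, and in WPF that notification is posted to the UI thread, so a while (downloadFTP.IsBusy) Thread.Sleep(500); loop running on the UI thread can wait indefinitely. Rather than cancelling, the sketch below lets each run finish and reports the outcome through RunWorkerCompleted. DownloadWorkerFactory and the downloadAndValidate hook are hypothetical; the hook would perform the download into a staging file plus whatever completeness check is chosen.

```csharp
using System;
using System.ComponentModel;

public static class DownloadWorkerFactory
{
    // downloadAndValidate: hypothetical hook that performs the (blocking) download
    //                      and returns true if the result is complete.
    // onCompleted: receives the outcome; in WPF it runs on the UI thread.
    public static BackgroundWorker Create(Func<bool> downloadAndValidate, Action<bool> onCompleted)
    {
        var worker = new BackgroundWorker();

        // Runs on a thread-pool thread, so blocking here does not freeze the UI.
        worker.DoWork += (sender, e) =>
        {
            e.Result = downloadAndValidate();
        };

        worker.RunWorkerCompleted += (sender, e) =>
        {
            // Check Error and Cancelled first: reading Result after an
            // exception in DoWork rethrows that exception.
            bool ok = e.Error == null && !e.Cancelled && (bool)e.Result;
            onCompleted(ok);
        };

        return worker;
    }
}
```

From onCompleted it is safe to update statusBarContent, load the data on success, or call RunWorkerAsync again to retry, because the worker is no longer busy at that point and nothing has to poll IsBusy.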
Make the downloading thread put the file in a staging folder and make it responsible for moving the file from the local staging folder to the local processing folder... this way the processing thread CAN NEVER SEE an incomplete file.
Sorry, but I'm not quite sure I follow. That brings me back to the original problem: if I have it downloading to a staging folder, how will I know the download is done before moving it over?
ASKER CERTIFIED SOLUTION
AlexPace
Thanks for all your help, Alex.