Solved

How to Recursively Search Drive for Specific File Types

Posted on 2004-04-05
Last Modified: 2013-12-03
Hi,
I'm writing a utility program that will allow LAN Managers to remotely back up files on remote PCs as a preparatory step to the user either moving or receiving a new PC. Actually the backup is not really a backup, but a copy to a folder of their choice. The main functionality behind this piece is based upon some code I found, I think on CodeGuru or CodeProject. Anyway, it worked fine when I passed the method the folder to start searching in, along with the file type to search for. However, a common complaint was that the LAN Managers would need to run this multiple times to capture the different files. For instance, they may have .doc files in ..\My Documents, and .xls files in \CurrentProjects. They would then have to check the "doc" extension check box, populate the start search text box and run the program, and then check the "xls" check box and populate its start search text box. Well, one approach would have been to place a text box to start searching from next to each check box. However, I was trying to search from the root of the drive, and compare each file's extension to that being sought, and if there was a match, a copy would follow. Needless to say, I have had some major issues trying to get it to work.

I'm posting the most current version of what DOESN'T work. If anyone has some ideas regarding how I can recursively find file types starting from the root, I would appreciate it.

Thanks,
Jeff


BOOL CFileBackupDlg::CopyExtension(CString csRoot, CString csDest, CString csCriteria)
{
      BOOL                        bRet = FALSE;
      BOOL                        bRes = FALSE;
      CString                        csPathMask;
      CString                        csFullPath;
      CString                        csNewFullPath;
      CString                        csNewPath;
      CString                        csMsg;
      CString                        strFirst;
      CString                        strSecond;
      CString                        csBuf;
      CString                        csSub;
      CString                        csToMatch;
      DWORD                        dwRet = 0;
      WIN32_FIND_DATA            fd;
      HANDLE                        hFind;
      int                              nchar = 0;

      ////////////////////////////////////////////////////////////////
      csRoot += _T("\\");
      csDest += _T("\\");

      csNewPath = csDest + csCriteria.Mid(2);
      CreateDirectory (csNewPath, NULL);
      csPathMask = csRoot + _T("*.*");

      hFind = FindFirstFile (csPathMask, &fd);
      if (hFind == INVALID_HANDLE_VALUE)
      {
            MessageBox ("The file criteria did not return any matches. Ensure the folder selection is correct and try again.",
                              "File Find Error", MB_OK);
            return FALSE;
      }
      else
      {
            // strip the extension from the incoming criteria
            nchar = csCriteria.Find (_T("."));
            if (nchar != -1)
                  csToMatch = csCriteria.Mid (nchar + 1);
            // see if the file found is one we're looking for
            csBuf = fd.cFileName;
            //strip the extension from the file we found
            nchar = csBuf.Find (_T("."));
            if (nchar != -1)
            {
                  csSub = csBuf.Mid (nchar + 1);
                  if (csSub.CompareNoCase (csToMatch) == 0)
                  {
                        csFullPath = csRoot + fd.cFileName;
                        csNewFullPath = csDest + fd.cFileName;
                        CopyFile (csFullPath, csNewFullPath, FALSE);
                        strFirst = GetCRC (csFullPath, dwRet);
                        strSecond = GetCRC (csNewFullPath, dwRet);
                        if (strFirst != strSecond)
                        MessageBox ("File copy integrity check failed! The file may be corrupt or missing!",
                                          csFullPath, MB_OK);

                        csMsg = "Copying: " + csFullPath;
                        SetDlgItemText (IDC_STATUS, csMsg);

                  }
            }
            while (hFind && FindNextFile (hFind, &fd))
            {
                  //need to add checking to check for matching extensions

                  nchar = csCriteria.Find (_T("."));
                  if (nchar != -1)
                        csToMatch = csCriteria.Mid (nchar + 1);
                  // see if the file found is one we're looking for
                  csBuf = fd.cFileName;
                  //strip the extension from the file we found
                  nchar = csBuf.Find (_T("."));
                  if (nchar != -1)
                  {
                        csSub = csBuf.Mid (nchar + 1);
                        if (csSub.CompareNoCase (csToMatch) == 0)
                        {
                              csFullPath = csRoot + fd.cFileName;
                              csNewFullPath = csDest + fd.cFileName;
                              // see if it's a file or a folder
                              if (!(fd.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY))
                              {
                                    bRes = CopyFile (csFullPath, csNewFullPath, FALSE);
                                    strFirst = GetCRC (csFullPath, dwRet);
                                    strSecond = GetCRC (csNewFullPath, dwRet);
                                    if (strFirst != strSecond)
                                          MessageBox ("File copy integrity check failed! The file may be corrupt or missing!",
                                                csFullPath, MB_OK);
                                    if (!bRes)
                                    {
                                          bRet = FALSE;
                                    }
                                    csMsg = "Copying: " + csFullPath;
                                    SetDlgItemText (IDC_STATUS, csMsg);
                              }
                              else // probably a directory
                              {
                                    if ((_tcscmp (fd.cFileName, _T(".")) != 0) &&
                                          (_tcscmp (fd.cFileName, _T("..")) != 0))
                                    {
                        
                                          if (!CopyFolder (csFullPath, csNewFullPath, csCriteria))
                                          {
                                                bRet = FALSE;
                                          }
                                    }
                              }
                        }

                  }
                  else // probably a directory
                  {
                        if ((_tcscmp (fd.cFileName, _T(".")) != 0) &&
                                    (_tcscmp (fd.cFileName, _T("..")) != 0))
                        {
                              if (!CopyFolder (csFullPath, csNewFullPath, csCriteria))
                              {
                                    bRet = FALSE;
                              }
                        }
                  }

            }
      }
      SetDlgItemText (IDC_STATUS, "");
      FindClose (hFind);
      return bRet;
                        
}

Question by:jpetter
12 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 10759782
Could you be a bit more specific about what exactly is not working?
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 10759814
I would create a string that contains all extensions you are interested in:
string extensions = ".doc.xls.pdf.txt";

Create this when the user clicks the "OK" button, after all the extensions that need to be backed up are selected. Then, in your loop that searches for the files, after you extract the extension of the current file (stored in the variable currentExtension), do something like this:

if (extensions.find(currentExtension) != string::npos)
{
    // file needs to be backed up
}
else
{
    // file does not need to be backed up - you may not need this else branch
}
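
One caveat with a single concatenated string: `find` also succeeds on a partial extension such as ".do". A minimal sketch of an exact, case-insensitive variant follows, assuming standard C++ strings rather than CString. The names `extensionOf` and `shouldBackUp` are made up for illustration and are not part of the original program. The extension is taken from the last dot, so a name like "report.final.doc" still yields ".doc".

```cpp
#include <algorithm>
#include <cctype>
#include <set>
#include <string>

// Return the lower-cased extension of a bare file name (no folder part),
// including the leading dot, or an empty string when there is none.
// The LAST dot is used, so "report.final.doc" yields ".doc".
std::string extensionOf(const std::string& fileName)
{
    const std::string::size_type dot = fileName.find_last_of('.');
    if (dot == std::string::npos || dot == 0)
        return "";
    std::string ext = fileName.substr(dot);
    std::transform(ext.begin(), ext.end(), ext.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return ext;
}

// True when the file's extension is one of the wanted ones.
// `wanted` holds lower-case extensions with the dot, e.g. ".doc".
bool shouldBackUp(const std::string& fileName, const std::set<std::string>& wanted)
{
    const std::string ext = extensionOf(fileName);
    return !ext.empty() && wanted.count(ext) != 0;
}
```

The set would be filled once from the checked boxes, and `shouldBackUp` called for every file the search returns, so a single pass over the drive can handle all selected types.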
 

Author Comment

by:jpetter
ID: 10759894
Sure, it does iterate through the files in the root, and through the debugger (BTW, MS VC++ v 6) I can see it comparing the extensions to the search criteria. When the current file is a folder, and I try to "step into" the recursive call, I can't for some reason. However, it bombs out at the beginning of the method where I check to see if I have a valid file handle returned from FindFirstFile. I wish I was more competent with the debugger, because I would like to see what is being passed for the values. Normally I have no problem, but when it's called recursively I can't see it.

Thanks,
Jeff
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 10759941
Looks like I misunderstood your question :-)

Can you please provide the CopyFolder() code as well?
 
LVL 44

Assisted Solution

by:Karl Heinz Kremer
Karl Heinz Kremer earned 135 total points
ID: 10759949
BTW: Usually when you put your mouse over a variable name in the debugger, you will get a tooltip with the current value.
 
LVL 86

Assisted Solution

by:jkr
jkr earned 135 total points
ID: 10759954
>> When the current file is a folder, and I try to "step into" the recursive call, I can't for some reason.

The reason seems to be that in

                        else // probably a directory
                        {
                             if ((_tcscmp (fd.cFileName, _T(".")) != 0) &&
                                   (_tcscmp (fd.cFileName, _T("..")) != 0))
                             {
                   
                                  if (!CopyFolder (csFullPath, csNewFullPath, csCriteria))
                                  {
                                       bRet = FALSE;
                                  }
                             }
                        }

you are

a) calling a different function ('CopyFolder()' instead of 'CopyExtension()')
b) not appending fd.cFileName to csFullPath before making the call

 

Author Comment

by:jpetter
ID: 10760038
jkr &  khkremer,

You guys have given me some great help. Let me play around with this a second.

One problem, as both of you caught. I modified my CopyFolder function to end up with CopyExtension, but never changed the call when I go to call it recursively. And that is probably why I don't append the csFullPath, because the variable names are different.

If this turns out to be the solution, what a total tool I'll feel like. How careless.

Thanks, and I'll get right back.
 

Author Comment

by:jpetter
ID: 10760612
Well, those eye openers that caught my oversights did allow me to get further. Now I just have to rework the function a little so that it doesn't die when I hit an empty folder.

Thanks,
Jeff
 
LVL 44

Expert Comment

by:Karl Heinz Kremer
ID: 10761989
Where does it die in an empty folder?
 
LVL 12

Assisted Solution

by:Salte
Salte earned 55 total points
ID: 10763599
I think you might have problems due to the find file handle?

Note that each search creates a find handle and there is a point in minimizing those.

It might therefore NOT be a good idea to simply scan through each file and if it is a directory you recurse into it immediately and if it is a file you copy it.

If it is a file you can copy it alright but if it is a directory I suggest you just put it in a queue or stack and process that dir after you are done with current dir. If you use a queue the copy will be breadth first and if you use a stack it will be depth first.

So, I suggest you make a simple stack or queue, the queue can be as simple as a queue of strings holding the directory names (full path) of the directory to scan.

step 1. Get a stack class. STL has a vector that can be used as a stack, or you can make your own, or (since I see you use MFC a lot) you can use MFC's array class as a stack.

step 2. Initialize the stack with the root directory you want to search.

step 3. Loop while the stack is non-empty: pop the top element from the stack and scan that directory. For each element in the scan, if it is a file, copy it, creating directories as necessary to do the copy. If it is a directory, push it onto the stack.

When the loop is done your copying is done.

Also, since you use an explicit stack you don't even have to recurse in your program, a simple loop is all:


stack.push(RootDir);
while (! stack.is_empty()) {
    CString dirname = stack.pop();
    // open a find handle and scan the dir here.
    // for each file entry check if it should be copied, scanned or ignored.
    while (FindNextFile(....)) {
        if (current file is dir) {
            stack.push(full_file_name); // May have to construct that name first.
        } else if (current file should be copied) {
            copy_file(....);
        } // else it should be ignored.
    }
    close_search_handle();
}

Try this and see if it works better. The key is to not have too many open search handles at the same time. This code never has more than ONE search handle open, since it closes the handle before it searches the next directory.

Hope this is of help.

Alf
 
LVL 3

Accepted Solution

by:akalmani
akalmani earned 175 total points
ID: 10764013
//Win32 way no MFC usage....
//szDir must end with a backslash, e.g. _T("C:\\")
void DoIt(LPCTSTR szDir)
{
   WIN32_FIND_DATA FileData;
   HANDLE hSearch = INVALID_HANDLE_VALUE;
   _TCHAR szPath[MAX_PATH] = _T("");     //MAX_PATH is defined as 260 in the Windows headers

   _tcscpy(szPath, szDir);
   _tcscat(szPath, _T("*"));

   hSearch = FindFirstFile(szPath, &FileData);
   if(INVALID_HANDLE_VALUE == hSearch)
   {
      return;   //Nothing found, so there is no handle to close
   }

   do   //do/while so the entry returned by FindFirstFile is processed as well
   {
      //Skip the . and .. special entries
      if((_tcsicmp(FileData.cFileName, _T(".")) == 0) ||
         (_tcsicmp(FileData.cFileName, _T("..")) == 0))
      {
         continue;
      }

      //Check if it is a directory. dwFileAttributes is a bit mask, so test the bit
      if(FileData.dwFileAttributes & FILE_ATTRIBUTE_DIRECTORY)
      {
         _tcscpy(szPath, szDir);
         _tcscat(szPath, FileData.cFileName);
         _tcscat(szPath, _T("\\"));

         DoIt(szPath);//Recursive call
      }
      else
      {
         //szDir contains the folder and FileData.cFileName the file name.
         //Do your file copy after checking the extension here
      }
   } while(FindNextFile(hSearch, &FileData));

   //Close the search handle.
   FindClose(hSearch);
}


//Give credit to original author Nonubik. This uses MFC
void DoIt(LPCTSTR szDir)
{
  CFileFind   Finder;
  CString     strPath(szDir);
  strPath += _T("\\*");
  BOOL bFind = Finder.FindFile(strPath);
  while(bFind)
  {
    bFind = Finder.FindNextFile();
   
    // skip . and .. files; otherwise, we'd
    // recur infinitely!
    if(Finder.IsDots())
         continue;

    // if it's a directory, recursively search it
    if (Finder.IsDirectory())
      DoIt(Finder.GetFilePath());
    else
    {
      //Use Finder.GetFilePath() to get the path. Copy file after extension check here.
    }
  }
}

//Refer to recurse a directory
http://www.experts-exchange.com/Programming/Programming_Languages/Cplusplus/Q_20935224.html
 

Author Comment

by:jpetter
ID: 10768073
I can't thank all of you enough for all of your help. The support I've received on this has been terrific!

I'll split up the points, and wish I had more to go around.

Thanks again,
Jeff