Find missing files using PHP

I have a folder "e:\my documents\myfolder" with about 400000 files.

The folder has only 2 types of file extensions:
1. *.txt
2. *.key

Each .txt file must have a corresponding .key file

e.g. abc.txt should also have a file called abc.key in the same folder

But I know there are some .txt files that do not have a .key file.

I need to find those .txt files and move them to a folder called "NoLink".

Please help me with a PHP script that can handle this for such a large number of files.
Asked by: nainil
Marco Gasi (Freelancer) commented:
I just tested my code and this is the working one:

<?php

define('DS', DIRECTORY_SEPARATOR);
$filelist = array();
$fromDir = 'actualDir';
$toDir = $fromDir . DS . 'NoLink';

if ($handle = opendir($fromDir)) {
  while (($file = readdir($handle)) !== false) {
    // use the full path: is_dir() and file_exists() would otherwise
    // resolve bare names against the current working directory
    if ($file != "." && $file != ".." && !is_dir($fromDir . DS . $file)) {
      // pathinfo() also copes with names that contain more than one dot
      $info = pathinfo($file);
      echo $file . "<br>";
      $isTxt = isset($info['extension']) && $info['extension'] == 'txt';
      if ($isTxt && !file_exists($fromDir . DS . $info['filename'] . '.key')) {
        $filelist[] = $file;
        echo " only txt file is $file <br>";
      }
    }
  }
  closedir($handle);
}
echo "<pre>";
var_dump($filelist);
echo "</pre>";
if (!file_exists($toDir)) mkdir($toDir);
foreach ($filelist as $file) {
  copy($fromDir . DS . $file, $toDir . DS . $file);
//  unlink($fromDir . DS . $file);
}

Uncomment the unlink line to delete the orphan .txt files from the original folder after they have been copied.
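
As an alternative to the copy and unlink pair, here is a minimal sketch that moves each file with PHP's rename(). The helper name moveFiles() and its parameters are made up for illustration and are not part of the script above; it assumes both folders are writable and returns the names it could not move.

```php
<?php
// Sketch only: moveFiles() is a hypothetical helper, not part of the posted script.
function moveFiles($fromDir, $toDir, array $files)
{
    // Create the target folder on first use; give up if that is not possible.
    if (!is_dir($toDir) && !mkdir($toDir, 0777, true)) {
        return $files;
    }

    $failed = array();
    foreach ($files as $file) {
        $source = $fromDir . DIRECTORY_SEPARATOR . $file;
        $target = $toDir . DIRECTORY_SEPARATOR . $file;
        // rename() moves the file in a single call; collect the names that fail.
        if (!@rename($source, $target)) {
            $failed[] = $file;
        }
    }
    return $failed;
}
```

It could be called as moveFiles($fromDir, $toDir, $filelist) in place of the final foreach loop, and an empty return value would mean every file was moved.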

Cheers
GhostScripter commented:
Morning!

I assume the following directory structure for the script below:

+ base dir where the script is
++ pairs => here are all *.txt and *.key files
++ nolink => here should all *.txt files without a *.key partner go

<?php

$_ORIG_DIR = 'pairs';
$_NOLINK_DIR = 'nolink';

// walk over all *.txt files in the base dir
foreach(glob($_ORIG_DIR.'/*.txt') as $file) {
	// check if the *.txt file has a *.key partner
	// if not move the *.txt file to nolink dir

	// extract only the filename without ending (also works for names with several dots)
	$filename = pathinfo($file, PATHINFO_FILENAME);
	
	// check if the file has a partner 
	// if no partner is found the file is copied to the nolink dir and deleted from the original dir
	if(file_exists($_ORIG_DIR.'/'.$filename.'.key')) {
		echo "Pair found<br />";
	} else {
		// delete the original only if the copy succeeded
		if(copy($file,$_NOLINK_DIR.'/'.$filename.'.txt')) {
			unlink($file);
			echo "File <b>$filename.txt</b> moved<br />";
		}
	}
}
?>

This script is just a quick shot. It worked for me on my LAMP system.

Greets,
GhostScripter
Marco Gasi (Freelancer) commented:
You can try this untested code:
<?php
define('DS', DIRECTORY_SEPARATOR);
$filelist = array();
$fromDir = 'your_original_dir';
$toDir = $fromDir . DS . 'NoLink';

if ($handle = opendir($fromDir)) {
    while (($file = readdir($handle)) !== false) {
        // skip dot entries and subdirectories (checked with the full path)
        if (substr($file, 0, 1) != "." && !is_dir($fromDir . DS . $file)) {
            $info = pathinfo($file);
            // a .txt file is an orphan when no .key file shares its base name
            if (isset($info['extension']) && $info['extension'] == 'txt'
                && !file_exists($fromDir . DS . $info['filename'] . '.key')) {
                $filelist[] = $file;
            }
        }
    }
    closedir($handle);
}
if (!is_dir($toDir)) mkdir($toDir);
foreach ($filelist as $file) {
    copy($fromDir . DS . $file, $toDir . DS . $file);
    unlink($fromDir . DS . $file);
}

If you want to test it first, comment out the unlink statement and echo the results to check that it works, like this:
<?php
define('DS', DIRECTORY_SEPARATOR);
$filelist = array();
$fromDir = 'your_original_dir';
$toDir = $fromDir . DS . 'NoLink';

if ($handle = opendir($fromDir)) {
    while (($file = readdir($handle)) !== false) {
        // skip dot entries and subdirectories (checked with the full path)
        if (substr($file, 0, 1) != "." && !is_dir($fromDir . DS . $file)) {
            $info = pathinfo($file);
            // a .txt file is an orphan when no .key file shares its base name
            if (isset($info['extension']) && $info['extension'] == 'txt'
                && !file_exists($fromDir . DS . $info['filename'] . '.key')) {
                $filelist[] = $file;
            }
        }
    }
    closedir($handle);
}
echo "<pre>";
var_dump($filelist);
echo "</pre>";

if (!is_dir($toDir)) mkdir($toDir);
foreach ($filelist as $file) {
    copy($fromDir . DS . $file, $toDir . DS . $file);
//    unlink($fromDir . DS . $file);
}

Unfortunately, I can't test it right now.

Cheers
Insoftservice commented:
Please try this one:
<?php

$path   = "/var/tmp/";
$nolink = "/var/tmp/NoLink/";

// create the target folder if it does not exist yet
if (!is_dir($nolink)) {
    mkdir($nolink);
}

$notavail = array();
$data = getDirList($path);
if (is_array($data))
{
    foreach ($data as $val)
    {
        // only handle names that end in ".txt"
        if (substr($val, -4) === ".txt")
        {
            // swap the trailing ".txt" for ".key"
            $replace = substr($val, 0, -4) . ".key";
            if (!file_exists($path . $replace))
            {
                $notavail[] = $val;
                echo $path . $val . " => " . $nolink . $val . "<br>";
                // delete the original only if the copy succeeded
                if (copy($path . $val, $nolink . $val))
                {
                    unlink($path . $val);
                }
            }
        }
    }
}
echo "<pre>"; print_r($data); echo "</pre>";
echo "<pre>"; print_r($notavail); echo "</pre>";

// returns a sorted list of the plain files in $dirpath, or false on failure
function getDirList($dirpath)
{
    $tmp_arr_dirlist = array();
    if (is_dir($dirpath) && ($dh = opendir($dirpath)))
    {
        while (($file = readdir($dh)) !== false)
        {
            if ($file != "." && $file != ".." && !is_dir($dirpath . '/' . $file))
            {
                $tmp_arr_dirlist[] = $file;
            }
        }
        closedir($dh);
        sort($tmp_arr_dirlist);
        return $tmp_arr_dirlist;
    }
    return false;
}

?>

käµfm³d 👽 commented:
Yet another :)

<?php

    function compareFilenames ($file1, $file2) { return strcmp(pathinfo($file1, PATHINFO_FILENAME), pathinfo($file2, PATHINFO_FILENAME)); }

    $txts = glob('test/*.txt');
    $keys = glob('test/*.key');

    $missing_key = array_udiff($txts, $keys, "compareFilenames");

    foreach ($missing_key as $txt)
    {
        $fileParts = pathinfo($txt);
        $newFilename = 'NoLink/' . $fileParts['basename'];
        echo "<li>" . $txt . "</li>";
        copy($txt, $newFilename);
        unlink($txt);
    }

?>

Ray Paseur commented:
"with about 400000 files ... such a large file base"
That is not really a very large number of files, but to be on the safe side, consider calling set_time_limit(1) somewhere inside the loop. Each call restarts the timeout counter, which may help you avoid a timeout in your script.
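
A minimal sketch of that idea, assuming a single pass over the folder is acceptable: the timeout counter is reset on every iteration, and the .txt and .key base names are collected in arrays so no extra file_exists() call is needed per file. The function name findOrphans() and the $secondsPerFile parameter are hypothetical, not from this thread, and the sketch assumes the folder holds no subdirectories whose names end in .txt or .key.

```php
<?php
// Sketch only: findOrphans() and $secondsPerFile are hypothetical names.
function findOrphans($dir, $secondsPerFile = 1)
{
    $txt = array();   // base name => .txt file name
    $keys = array();  // base names that have a .key file

    $handle = opendir($dir);
    if ($handle === false) {
        return array();
    }
    while (($file = readdir($handle)) !== false) {
        // Restart the timeout counter on every iteration.
        set_time_limit($secondsPerFile);
        $info = pathinfo($file);
        if (!isset($info['extension'])) {
            continue;
        }
        if ($info['extension'] === 'txt') {
            $txt[$info['filename']] = $file;
        } elseif ($info['extension'] === 'key') {
            $keys[$info['filename']] = true;
        }
    }
    closedir($handle);

    // Keep only the .txt entries whose base name has no .key entry.
    $orphans = array_values(array_diff_key($txt, $keys));
    sort($orphans);
    return $orphans;
}
```

The returned list could then be fed to whichever copy or move loop you prefer from the answers above.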

Best to all, ~Ray