Solved

Looking for a quicker way to check files

Posted on 2014-11-11
12
132 Views
Last Modified: 2014-11-11
We have several folders containing 3000+ files each.  They contain scanned pdfs.  Some of our staff have been scanning in blank pages to get around the system, and I have just finished going through them all by thumbnailing them via CFPDF and marking the offending files.  I have all the filenames (good and bad) in a table.

I now need to keep up this task, looking at all new files added to the folders each week.  I thought of the dateLastModified of CFDirectory, but some of the files will have old dates on them.  Is there a quicker/better way (instead of going through a CFDirectory listing and querying the table for each file to see if it has already been processed)?
0
Comment
Question by:jdthedj
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 6
  • 6
12 Comments
 
LVL 52

Expert Comment

by:_agx_
ID: 40435913
Unfortunately, I don't think there's a simple method for doing this.

Edit: I agree that querying file by file is not the way to go. What I've done in the past is to generate a list of file names and save it to a .txt file. Then use db tools (like sql server's BULK INSERT) to import the file into a temporary table. Finally, use a query to compare the file names and find out what's missing.  It sounds more complicated than it really is.  I think I posted an example of this somewhere.  I'll see if I can find it.
0
 
LVL 3

Author Comment

by:jdthedj
ID: 40435949
Thanks agx - a sample would be great
0
 
LVL 52

Expert Comment

by:_agx_
ID: 40435955
Quick questions -
1) I forgot, are you on windows or *nix?
2) Do you have cfexecute access on your server? While you could generate the file list with cfdirectory, o/s commands like LS or DIR are faster.
3) Which db - SQL Server or MySQL?
0
Resolve Critical IT Incidents Fast

If your data, services or processes become compromised, your organization can suffer damage in just minutes and how fast you communicate during a major IT incident is everything. Learn how to immediately identify incidents & best practices to resolve them quickly and effectively.

 
LVL 3

Author Comment

by:jdthedj
ID: 40435970
We are on windows and using SQL server 2008 r2
Yes, I have access to cfexecute
0
 
LVL 52

Accepted Solution

by:
_agx_ earned 500 total points
ID: 40436066
I couldn't find the post I was thinking of (It'll probably turn up later - now that I've created a new one ;-)

Anyway, here's the basic idea:

1. Set the basic file paths (folder to check and path to output file)

2. Use DIR to generate the list of file names

3. If the file is successfully was generated, import it into a #temp table. Then do some sort of JOIN with your main table to identify the "new" files.  For example, I used an OUTER JOIN to identify files that exist on disk but NOT in the existing db table: "YourMainTable"
	SELECT TOP 100 tmp.*
	FROM   ##TempImport tmp LEFT JOIN YourMainTable main
				ON main.FileName = tmp.FileName
	WHERE  main.FileName IS NULL;

Open in new window


Full code:

<!--- folder to check for files --->
<cfset folderToCheck = "c:\somefolder\">
<!--- output file containing the file names --->
<cfset saveToFile = ExpandPath("./"& CreateUUID() &".txt")>

<!--- generate list of files --->
<cfexecute name = "C:\windows\System32\cmd.exe" 
    arguments = '/C dir /a-d /b "#folderToCheck#" '
    outputFile="#saveToFile#" 
    timeout = "60"
    errorVariable="errorMessage"></cfexecute>

<!--- import into db and compare --->
<cfif FileExists(saveToFile) AND NOT Len(errorMessage)>
	<cfquery name="qImport" datasource="MSSQL">
		<!--- temp storage --->
		CREATE TABLE ##TempImport
		(
			FileName VARCHAR(500)
		);
	
		<!--- import file names --->
		BULK INSERT ##TempImport 
		FROM '#saveToFile#';
		
		<!--- query to find file names that do NOT exist in the existing db table --->
		SELECT TOP 100 tmp.*
		FROM   ##TempImport tmp LEFT JOIN YourMainTable main
					ON main.FileName = tmp.FileName
		WHERE  main.FileName IS NULL;
		
		<!--- cleanup --->
		DROP ##TempImport 
	</cfquery>		
	
	<cfdump var="#qImport#">

	<!--- cleanup temp file --->
	<cfset FileDelete(saveToFile)>
<cfelse>
	ERROR: Temp file not created
	<cfoutput>#errorMessage#</cfoutput>
</cfif>	

Open in new window

0
 
LVL 52

Expert Comment

by:_agx_
ID: 40436071
BTW, the above is a quick and dirty example to demonstrate the concept. In real code, I'd put this logic in a stored procedure and pass in the file name as a parameter:
0
 
LVL 3

Author Comment

by:jdthedj
ID: 40436082
WOW! Thanks agx - I will test and come back soon
0
 
LVL 3

Author Comment

by:jdthedj
ID: 40436115
I am getting the following error on line 30

Error Executing Database Query.  
[Macromedia][SQLServer JDBC Driver][SQLServer]Unknown object type '#TempImport' used in a CREATE, DROP, or ALTER statement.  

<!--- query to find file names that do NOT exist in the existing db table --->
29 :             SELECT TOP 100 tmp.*
30 :             FROM   ##TempImport tmp LEFT JOIN CF_Temp#Sfx#.dbo.CheckPPTbl main
31 :                               ON main.FileName = tmp.FileName
32 :             WHERE  main.FileName IS NULL;
0
 
LVL 52

Expert Comment

by:_agx_
ID: 40436171
(Edit) Oh wait... I pasted the wrong version. My bad.  It didn't seem to like the UUID file name, so I used this instead

<cfset saveToFile = ExpandPath("./fileNames.txt")>
0
 
LVL 3

Author Closing Comment

by:jdthedj
ID: 40436199
Thanks very much agx - it works now, and I have made it a stored procedure.
0
 
LVL 52

Expert Comment

by:_agx_
ID: 40436272
Cool.  You probably figured this out already :) but .. for the archives, to use dynamic file names you need to build a SQL string and execute it via exec or sp_executesql. See the BOL for an example
0
 
LVL 3

Author Comment

by:jdthedj
ID: 40436280
Thanks agx - I looked at that BOL page while I was getting things to work.
0

Featured Post

DevOps Toolchain Recommendations

Read this Gartner Research Note and discover how your IT organization can automate and optimize DevOps processes using a toolchain architecture.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Hi. There are several upload tutorials using jquery and coldfusion. I found a very interesting one here Upload Your Files using Jquery & ColdFusion and Preview them (http://www.randhawaworld.com/) . I did keep the main js functions but made sever…
Recently while working on a project I got a very annoying cfdocument has no body error message. I had never seen this error before. So I checked the code. The code was pretty simple; it was Just showing me the cfdocumnt tag and inside that tag a …
In a recent question (https://www.experts-exchange.com/questions/29004105/Run-AutoHotkey-script-directly-from-Notepad.html) here at Experts Exchange, a member asked how to run an AutoHotkey script (.AHK) directly from Notepad++ (aka NPP). This video…
Attackers love to prey on accounts that have privileges. Reducing privileged accounts and protecting privileged accounts therefore is paramount. Users, groups, and service accounts need to be protected to help protect the entire Active Directory …

696 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question