• Status: Solved

Delete small images with cfdirectory and cffile.



<cfquery name="StoryInfo" datasource="nNews">
  select *
  from story
  where id > 680000
</cfquery>



<cfdirectory
    directory="C:\serverroot\nserver\inews"
    name="mydirectory"
    filter="*.jpg"
    sort="datelastmodified DESC">


<cfquery name="mydirectory" dbtype="query">
    select *
    from mydirectory
    where size < 2048
</cfquery>

<cfoutput query="mydirectory">
  <cffile action="delete" file="C:\serverroot\nserver\inews\#mydirectory.name#">
</cfoutput>



News thumbnails are automatically generated from a news feed using cffile. Once in a while, a corrupted image is created. It's always less than about 2 KB in size. I've wasted too much time trying to discover the reason, and now I only want to delete those images on a schedule using ColdFusion.

I want to use the CF Administrator's Scheduled Tasks to run this code and delete all files smaller than the given size.

But with hundreds of thousands of files in the same directory, this takes far too long. How can I look at only the last 100 or so images in the directory so the process goes much faster? I mean, why look at all the files in the folder when all I want are the last 100 or so created? And they are all in numerical order.

I included a cfquery just in case we can use it to shorten the time. Use my snippet as the example. Show me what needs to be changed to make it work.
Asked by: Qsorb
1 Solution
 
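Gurpreet Singh Randhawa (Web Developer) Commented:
Without seeing your exact code, this is the approach I usually follow:

<!--- queryname and thumbnailfile are placeholders for your query and column names --->
<cfset images = ValueList(queryname.thumbnailfile)>
<!--- The list attribute needs the variable's value, so wrap it in pound signs --->
<cfloop list="#images#" index="i">
    <cfset imagePath = ExpandPath("/images/thumbnails/#i#")>
    <cfif FileExists(imagePath)>
        <cffile action="delete" file="#imagePath#">
    </cfif>
</cfloop>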
 
Qsorb (Author) Commented:
You said, "Without seeing your exact code ..."

What? This does not help at all.

I posted my code, all of it. Didn't you look at my code snippet?

My code works. It's just far too slow because it looks at all the images, not just the top 100.

Your suggestion does not take into account these two factors:

1) Size of the image (filter out all images with a size greater than 2048 bytes.)
2) Last 100 records.

Please look at my code snippet then show me what you mean, based on my code.

Or anyone else who can look at my code and make a suggestion.
 
_agx_ Commented:
It seems like you're asking 2 different questions. Each has a different answer. Which are you trying to do ultimately?  

1) How to remove ALL files in the directory less than 2K --OR
2) How to check the last 100 images only and remove any of them that are less than 2K

If #2, do you have some way of identifying "new" files other than looking at the date on the file system - like a time stamp column or incrementing ID? If you can determine the "last 100" using a db query, the process would be much faster because you don't have to build a list of 100K files every time.

> But with hundreds of thousands of files in the same directory, this takes far too long.


If it's #1, with that many files it's always going to take a while. There might be ways to speed it up a little. But first, which step in your original process is taking the most time: 1) generating the directory listing, 2) the QoQ, or 3) deleting files within a loop? I'd guess #1.


> generated from a news feed using cffile.

Going forward, that's the best time to check for corruption. So if you haven't done it already - I'd recommend modifying the process to check for bad images when they're 1st created. Doing it after the fact is always going to be slower because like you said - there's tons of files to examine.
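
One way to act on this, as a rough sketch: wrap the size check in a small helper and call it right after each thumbnail is written. The function name `deleteIfTooSmall`, the `thumbPath` variable and the example file name are made up for illustration; the 2048-byte threshold comes from the question.

```cfml
<!--- Hypothetical helper: deletes a file if it is smaller than minBytes.
      Returns true when a file was deleted, false otherwise. --->
<cffunction name="deleteIfTooSmall" returntype="boolean" output="false">
    <cfargument name="filePath" type="string" required="true">
    <cfargument name="minBytes" type="numeric" default="2048">
    <cfset var info = "">

    <!--- Nothing to do if the file was never written --->
    <cfif NOT FileExists(arguments.filePath)>
        <cfreturn false>
    </cfif>

    <cfset info = GetFileInfo(arguments.filePath)>
    <cfif info.size GTE arguments.minBytes>
        <cfreturn false>
    </cfif>

    <cffile action="delete" file="#arguments.filePath#">
    <cfreturn true>
</cffunction>

<!--- Assumed usage: thumbPath is wherever the feed code just wrote the thumbnail --->
<cfset thumbPath = "C:\serverroot\nserver\inews\680001.jpg">
<cfset wasDeleted = deleteIfTooSmall(thumbPath)>
```

Because the check touches only the one file that was just created, it never needs a directory listing.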
 
Qsorb (Author) Commented:
I need both:

1) Size of the image (filter out all images with a size greater than 2048 bytes.)
2) Last 100 records.

<cfquery name="StoryInfo" datasource="nNews">
  select top 100 *
  from story
</cfquery>

Wouldn't that give me the last 100 records? The record ID and the image name are the same, that  is, the image name is the id record number, plus the .jpg extension.

Generating the directory takes up nearly all the time.

I gave the query so that someone could show me how to use my cfquery to limit the search for the last 100 records (images).  We can do it by ID greater than but I assume top 100 would be easier as I'd not need to update the code.

Does this make sense?
 
_agx_ Commented:
> Wouldn't that give me the last 100 records?

If you sort the results by the ID in descending order, yes:

          SELECT TOP 100 *
          FROM   story
          ORDER BY RecordIDColumn DESC

> Does this make sense?

Yes and no. The query above (with a little more code) would let you check the last 100 records. Then you could loop through the query and discard the corresponding image if it's < 2K. But obviously there are still the other 99,900 images in your directory to check.

So my question is - are you ultimately trying to check ALL 100K images?  If yes, you'll need a different strategy.  There's no way to do it in a single shot. Generating a directory list of 100K files is going to take a while - no matter what tool you use.

You need to break the process into batches. For example, create a scheduled task that processes 100 images or so at a time, marks the db records as processed, and calls itself again until all of them are processed.

Keep in mind it should be a 1-time cleanup task. Going forward the process should check images when they're created. So you never have to perform a monstrous cleanup task again.
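
A minimal sketch of one such batch, under stated assumptions: `ImageChecked` is a hypothetical BIT column (default 0) added to the story table to record which rows have been processed, and `imageDir` is an illustrative variable holding the image folder. Neither exists in the posted code. Each run of the scheduled task would handle the next 100 unchecked records.

```cfml
<!--- Assumed image folder; adjust to the real location --->
<cfset imageDir = "C:\serverroot\nserver\inews\">

<!--- ImageChecked is a hypothetical column marking rows already processed --->
<cfquery name="getBatch" datasource="nNews">
    SELECT TOP 100 ID
    FROM   story
    WHERE  ImageChecked = 0
    ORDER BY ID DESC
</cfquery>

<cfloop query="getBatch">
    <cfset pathToImage = imageDir & getBatch.ID & ".jpg">
    <cfif FileExists(pathToImage)>
        <cfset info = GetFileInfo(pathToImage)>
        <!--- Delete the image if it is smaller than 2K --->
        <cfif info.size LT 2048>
            <cffile action="delete" file="#pathToImage#">
        </cfif>
    </cfif>
</cfloop>

<!--- Mark the whole batch as processed so the next run picks up new rows --->
<cfif getBatch.recordCount GT 0>
    <cfquery datasource="nNews">
        UPDATE story
        SET    ImageChecked = 1
        WHERE  ID IN (<cfqueryparam value="#ValueList(getBatch.ID)#" cfsqltype="cf_sql_integer" list="true">)
    </cfquery>
</cfif>
```

When the first query returns no rows, the cleanup is finished and the scheduled task can be removed.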
 
Qsorb (Author) Commented:
Agx: I only want to check the last 100 images/records. I have the query limiting to the last 100 records, but my point is, and has been, that I don't know how to apply it to the code snippet I gave.

That has, all along, been the question. Can you show me how to add this query:

SELECT TOP 100 *
FROM   story
ORDER BY ID DESC

To my CFDIRECTORY, and/or CFFILE, I'll give it a try.

ID is INT PK with identity. And again, the ID number is also the filename of the image.
 
_agx_ Commented:
Ah, ok. I was a little confused because the cfdirectory and QoQ code above are doing the exact opposite.

Anyway, first run your query:

<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 100 *
    FROM   story
    ORDER BY ID DESC
</cfquery>

Loop through it and use getFileInfo() to get the image size. Then delete it if it's < 2K. I'm guessing about your column names, so change as needed.

Edit: corrected typo in code:

<!--- assumes YourImageColumnName stores file name:  ie "5.jpg" or "someFile.jpg" --->
<cfloop query="getLatestImages">
       <cfset pathToImage = "C:\serverroot\nserve\"& YourImageColumnName>
       <cfif FileExists(pathToImage)>
           <cfset info = getFileInfo(pathToImage)>
           <!--- automatically delete the image if it's smaller than 2K --->
           <cfif info.size lt 2048>
                <cffile action="delete" file="#info.path#">
            </cfif>
       </cfif>
</cfloop>
 
Qsorb (Author) Commented:
It was a bit confusing at first because I had never used GetFileInfo, didn't know it existed.

But with your help, I got this working very well with the code below, exactly as I needed. And now I've learned about a new function.


<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 200 *
    FROM story
    ORDER BY ID DESC
</cfquery>

<cfloop query="getLatestImages">
 <cfset pathToImage = "C:\serverroot\nserve\"& #ID# & '.jpg'>
 
 <cfif FileExists(pathToImage)>
  <cfset MyFile="C:\serverroot\nserve\#STORYID#.jpg">
  <cfset FileInfo=GetFileInfo(MyFile)>  
 
  <cfif FileInfo.size lt 2048>  
 <cffile action="delete" file="C:\serverroot\nserve\<cfoutput>#STORYID#</cfoutput>.jpg">
  </cfif>
 
 </cfif>
</cfloop>
 
Qsorb (Author) Commented:
Thanks for your help.
 
_agx_ Commented:
> It was a bit confusing at first because I had never used GetFileInfo, didn't know it existed.

Yeah, it was one of those hidden jewels introduced in CF8. Thank goodness. No more doing a directory listing just to get the size of 1 file, or resorting to Java (what a pain). GetFileInfo also returns a lot of other good info, like file type, path, parent, ...
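
For a quick look at what GetFileInfo returns, here is a small sketch. It inspects the currently running template, chosen only because that file is sure to exist; any other existing file path would work the same way.

```cfml
<!--- Inspect the currently running template, which is guaranteed to exist --->
<cfset info = GetFileInfo(GetCurrentTemplatePath())>

<cfoutput>
    Name: #info.name#<br>
    Path: #info.path#<br>
    Parent: #info.parent#<br>
    Type: #info.type#<br>
    Size (bytes): #info.size#<br>
    Last modified: #info.lastmodified#<br>
</cfoutput>

<!--- Or dump the whole struct to see every key it contains --->
<cfdump var="#info#">
```

Note that GetFileInfo throws an error for a missing file, which is why the loop code in this thread checks FileExists first.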

> <cffile action="delete" file="C:\serverroot\nserve\<cfoutput>#STORYID#</cfoutput>.jpg">

FYI, you never have to use <cfoutput> tags when a variable is inside a core CF tag like cffile, cfinput, cfquery, etc. Variables there are always evaluated automatically.

i.e., you only need:     <cffile action="delete" file="C:\serverroot\nserve\#STORYID#.jpg">
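
Applying that tip to the working code, a tidied version might look like the sketch below. It assumes, as stated earlier in the thread, that ID is also the image file name, so only ID is selected and the path is built once. `imageDir` is an illustrative variable; the folder is the one from the posted code.

```cfml
<!--- Folder taken from the posted code; adjust if the real path differs --->
<cfset imageDir = "C:\serverroot\nserve\">

<!--- Only the ID column is needed, since the file name is ID + ".jpg" --->
<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 200 ID
    FROM   story
    ORDER BY ID DESC
</cfquery>

<cfloop query="getLatestImages">
    <cfset pathToImage = imageDir & getLatestImages.ID & ".jpg">
    <cfif FileExists(pathToImage)>
        <cfset fileInfo = GetFileInfo(pathToImage)>
        <!--- Delete the image if it is smaller than 2K --->
        <cfif fileInfo.size LT 2048>
            <cffile action="delete" file="#pathToImage#">
        </cfif>
    </cfif>
</cfloop>
```

Building the path in one variable avoids the mismatch risk of assembling it three separate times.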
 
Qsorb (Author) Commented:
> Fyi, you never have to use <cfoutput> tags when a variable is inside a core CF tag

Good thing to remember. You're just full of all kinds of helpful tidbits. Thanks so much!