
Delete small images with cfdirectory and cffile.

Posted on 2012-03-29
Last Modified: 2012-06-27


<cfquery name="StoryInfo" datasource="nNews">
  select *
  from story
  where id > 680000
</cfquery>



<cfdirectory
    directory="C:\serverroot\nserver\inews"
    name="mydirectory"
    filter="*.jpg"
    sort="datelastmodified DESC">


<cfquery name="mydirectory" dbtype="query">
    select *
    from mydirectory
    where size < 2048
</cfquery>

<cfoutput query="mydirectory">
  <cffile action="delete" file="C:\serverroot\nserver\inews\#mydirectory.name#">
</cfoutput>



News thumbnails are automatically generated from a news feed using cffile. Once in a while a corrupted image is created, and it's always smaller than about 2K. I've wasted too much time trying to discover the cause and now just want to delete those images on a schedule using ColdFusion.

I want to use CF Administrator's Scheduled Tasks to run this code and delete all files smaller than the given size.
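If you'd rather register the task in code than through the CF Administrator screens, the `cfschedule` tag (available in CF8) can create an equivalent scheduled task. This is a minimal sketch: the task name `DeleteSmallThumbnails` and the URL of the cleanup page are hypothetical placeholders, so point `url` at wherever you actually save the cleanup template. `interval` is given in seconds here (hourly).

```cfml
<!--- Hypothetical task name and URL: adjust both to your own server --->
<cfschedule action="update"
    task="DeleteSmallThumbnails"
    operation="HTTPRequest"
    url="http://localhost/tasks/deleteSmallThumbnails.cfm"
    startDate="#dateFormat(now(), 'mm/dd/yyyy')#"
    startTime="12:00 AM"
    interval="3600">
```

Running this once is enough; the task then appears under Scheduled Tasks in the Administrator like any task created there by hand.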

But with hundreds of thousands of files in the same directory, this takes far too long. How can I look at only the last 100 or so images in the directory so the process goes much faster? I mean, why look at all the files in the folder when all I want are the last 100 or so created? And they are all in numerical order.

I included a cfquery just in case we can use it to shorten the time. Use my snippet as the example. Show me what needs to be changed to make it work.
Question by:Qsorb

Expert Comment

by:Gurpreet Singh Randhawa
ID: 37786917
Without seeing your exact code:

This is the approach I usually follow:

<!--- build a list of thumbnail file names from the query, then delete each one that exists --->
<cfset images = valueList(queryname.thumbnailfile)>
<cfloop list="#images#" index="i">
    <cfif fileExists(expandPath("/images/thumbnails/" & i))>
        <cffile action="delete" file="#expandPath('/images/thumbnails/' & i)#">
    </cfif>
</cfloop>

Author Comment

by:Qsorb
ID: 37789123
You said, "Without seeing your exact code ..."

What? This does not help at all.

I posted my code, all of it. Didn't you look at my code snippet?

My code works. It's just far too slow because it looks at all the images, not just the top 100.

Your suggestion does not take into account these two factors:

1) Size of the image (filter out all images with a size greater than 2048 bytes.)
2) Last 100 records.

Please look at my code snippet then show me what you mean, based on my code.

Or anyone else who can look at my code and make a suggestion.

Expert Comment

by:_agx_
ID: 37789357
It seems like you're asking 2 different questions. Each has a different answer. Which are you trying to do ultimately?  

1) How to remove ALL files in the directory less than 2K --OR
2) How to check the last 100 images only and remove any of them that are less than 2K

If #2, do you have some way of identifying "new" files other than looking at the date on the file system - like a time stamp column or incrementing ID? If you can determine the "last 100" using a db query, the process would be much faster because you don't have to build a list of 100K files every time.

> But with hundreds of thousands of files in the same directory, this takes far too long.


If it's #1, with that many files it's always going to take a while. There might be ways to speed it up a little, but first: which step in your original process takes the most time? 1) Generating the directory listing, 2) the QoQ, or 3) deleting files within the loop? I'd guess #1.
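One way to answer that question is to time each step with `getTickCount()`, which returns milliseconds. A rough sketch, reusing the directory path from the question; it deliberately leaves out the delete step so you can measure without removing anything.

```cfml
<!--- Time the directory listing --->
<cfset t0 = getTickCount()>
<cfdirectory action="list" directory="C:\serverroot\nserver\inews" name="mydirectory" filter="*.jpg">
<cfset t1 = getTickCount()>

<!--- Time the query-of-queries filter on file size --->
<cfquery name="smallFiles" dbtype="query">
    SELECT name, size
    FROM   mydirectory
    WHERE  size < 2048
</cfquery>
<cfset t2 = getTickCount()>

<cfoutput>
Directory listing: #t1 - t0# ms (#mydirectory.recordCount# files)<br>
QoQ: #t2 - t1# ms (#smallFiles.recordCount# files under 2K)<br>
</cfoutput>
```

If the listing dominates, as suspected, no amount of tuning the QoQ or the delete loop will help much; the fix is to avoid listing the whole folder.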


> generated from a news feed using cffile.

Going forward, that's the best time to check for corruption. So if you haven't done it already, I'd recommend modifying the process to check for bad images when they're first created. Doing it after the fact is always going to be slower because, like you said, there are tons of files to examine.
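Checking at creation time might look roughly like this: right after the feed import writes a thumbnail, inspect just that one file. `newStoryID` and `thumbPath` are hypothetical names standing in for whatever your import code already uses, and the `isImageFile()` check (a CF8 image function) is an optional extra beyond the 2K size test described in the thread.

```cfml
<!--- Hypothetical placeholder: set this to the ID of the story whose thumbnail was just written --->
<cfparam name="newStoryID" default="0">
<cfset thumbPath = "C:\serverroot\nserver\inews\#newStoryID#.jpg">

<!--- ...your existing cffile call that writes the thumbnail runs here... --->

<!--- Check only the file that was just written, instead of rescanning the folder later --->
<cfif fileExists(thumbPath)>
    <cfset thumbInfo = getFileInfo(thumbPath)>
    <!--- Corrupt thumbnails reportedly come out under ~2K; isImageFile() also rejects unreadable images --->
    <cfif thumbInfo.size lt 2048 OR NOT isImageFile(thumbPath)>
        <cffile action="delete" file="#thumbPath#">
    </cfif>
</cfif>
```

Because this touches a single known file, its cost stays constant no matter how many thumbnails accumulate in the folder.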

Author Comment

by:Qsorb
ID: 37789454
I need both:

1) Size of the image (filter out all images with a size greater than 2048 bytes.)
2) Last 100 records.

<cfquery name="StoryInfo" datasource="nNews">
  select top 100 *
  from story
</cfquery>

Wouldn't that give me the last 100 records? The record ID and the image name are the same; that is, the image name is the record's ID number plus the .jpg extension.

Generating the directory takes up nearly all the time.

I gave the query so that someone could show me how to use my cfquery to limit the search for the last 100 records (images).  We can do it by ID greater than but I assume top 100 would be easier as I'd not need to update the code.

Does this make sense?

Expert Comment

by:_agx_
ID: 37789548
> Wouldn't that give me the last 100 records?

If you sort the results by the ID in descending order, yes:

          SELECT TOP 100 *
          FROM   story
          ORDER BY RecordIDColumn DESC

> Does this make sense?

Yes and no. The query above (with a little more code) would let you check the last 100 records. Then you could loop through the query and delete the corresponding image if it's < 2K. But obviously there are still the other 99,900 images in your directory to check.

So my question is - are you ultimately trying to check ALL 100K images?  If yes, you'll need a different strategy.  There's no way to do it in a single shot. Generating a directory list of 100K files is going to take a while - no matter what tool you use.

You need to break the process into batches. For example, create a scheduled task that processes 100 or so images at a time, marks the db records as processed, and then calls itself again until all of them are processed.
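A sketch of one batch, assuming you add a hypothetical bit column `imageChecked` (default 0) to the `story` table; that column is not part of the original schema. Instead of having the page call itself, this variant simply lets the scheduled task run repeatedly, each run picking up the next 100 unchecked records. The file name is the story ID plus `.jpg`, as stated in the thread, and the folder is copied from the question.

```cfml
<!--- Assumes a hypothetical bit column story.imageChecked (default 0) that you add yourself --->
<cfquery name="batch" datasource="nNews">
    SELECT TOP 100 ID
    FROM   story
    WHERE  imageChecked = 0
    ORDER BY ID DESC
</cfquery>

<cfloop query="batch">
    <cfset pathToImage = "C:\serverroot\nserver\inews\#batch.ID#.jpg">
    <cfif fileExists(pathToImage)>
        <cfset imgInfo = getFileInfo(pathToImage)>
        <!--- delete thumbnails under 2K, the size of the corrupt ones --->
        <cfif imgInfo.size lt 2048>
            <cffile action="delete" file="#pathToImage#">
        </cfif>
    </cfif>
</cfloop>

<!--- Mark this batch as done so the next scheduled run moves on to the next 100 --->
<cfif batch.recordCount gt 0>
    <cfquery datasource="nNews">
        UPDATE story
        SET    imageChecked = 1
        WHERE  ID IN (<cfqueryparam value="#valueList(batch.ID)#" cfsqltype="cf_sql_integer" list="true">)
    </cfquery>
</cfif>
```

Once every row is marked, the SELECT returns nothing and each run becomes a no-op, at which point the task can be deleted.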

Keep in mind it should be a one-time cleanup task. Going forward, the process should check images when they're created, so you never have to perform a monstrous cleanup task again.

Author Comment

by:Qsorb
ID: 37789593
Agx: I only want to check the last 100 images/records. I have the query limiting to the last 100 records, but my point is, and has been, that I don't know how to apply it to the code snippet I gave.

That has, all along, been the question. Can you show me how to add this query:

SELECT TOP 100 *
FROM   story
ORDER BY ID DESC

to my CFDIRECTORY and/or CFFILE code? If so, I'll give it a try.

ID is INT PK with identity. And again, the ID number is also the filename of the image.

Accepted Solution

by:_agx_
ID: 37789628
Ah, ok. I was a little confused because the cfdirectory and QoQ code above are doing the exact opposite.

Anyway, first run your query

<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 100 *
    FROM   story
    ORDER BY ID DESC
</cfquery>

Loop through it and use getFileInfo() to get the image size. Then delete it if it's < 2K. I'm guessing about your column names, so change as needed.

Edit: corrected a typo in the code.

<!--- assumes YourImageColumnName stores the file name, i.e. "5.jpg" or "someFile.jpg" --->
<cfloop query="getLatestImages">
    <cfset pathToImage = "C:\serverroot\nserve\" & YourImageColumnName>
    <cfif FileExists(pathToImage)>
        <cfset info = getFileInfo(pathToImage)>
        <!--- automatically delete the image if it's smaller than 2K --->
        <cfif info.size lt 2048>
            <cffile action="delete" file="#info.path#">
        </cfif>
    </cfif>
</cfloop>

Author Comment

by:Qsorb
ID: 37790009
It was a bit confusing at first because I had never used GetFileInfo, didn't know it existed.

But with your help I got it working very well with this code, exactly as I needed. And now I've learned about a new function.


<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 200 *
    FROM story
    ORDER BY ID DESC
</cfquery>

<cfloop query="getLatestImages">
 <cfset pathToImage = "C:\serverroot\nserve\"& #ID# & '.jpg'>
 
 <cfif FileExists(pathToImage)>
  <cfset MyFile="C:\serverroot\nserve\#STORYID#.jpg">
  <cfset FileInfo=GetFileInfo(MyFile)>  
 
  <cfif FileInfo.size lt 2048>  
 <cffile action="delete" file="C:\serverroot\nserve\<cfoutput>#STORYID#</cfoutput>.jpg">
  </cfif>
 
 </cfif>
</cfloop>

Author Closing Comment

by:Qsorb
ID: 37790064
Thanks for your help.

Expert Comment

by:_agx_
ID: 37791546
> It was a bit confusing at first because I had never used GetFileInfo, didn't know it existed.

Yeah, it was one of those hidden jewels introduced in CF8. Thank goodness. No more doing a directory listing just to get the size of one file, or resorting to Java (what a pain). GetFileInfo returns a lot of other useful info too, like the file type, path, parent directory, ...
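To see everything `getFileInfo()` hands back, here is a small self-contained sketch: it writes a throwaway temp file (the `"fi"` prefix and the `"hello"` content are arbitrary), prints the fields of the returned structure, and cleans up afterwards.

```cfml
<!--- Create a throwaway file so the example doesn't depend on any real thumbnail --->
<cfset tmpPath = getTempFile(getTempDirectory(), "fi")>
<cfset fileWrite(tmpPath, "hello")>

<!--- getFileInfo returns a structure describing the file --->
<cfset info = getFileInfo(tmpPath)>
<cfoutput>
name: #info.name#<br>
path: #info.path#<br>
parent: #info.parent#<br>
type: #info.type#<br>
size: #info.size# bytes<br>
lastModified: #info.lastModified#<br>
canRead: #info.canRead#, canWrite: #info.canWrite#, isHidden: #info.isHidden#
</cfoutput>

<!--- Remove the temp file; the info structure is already captured --->
<cfset fileDelete(tmpPath)>
```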

> <cffile action="delete" file="C:\serverroot\nserve\<cfoutput>#STORYID#</cfoutput>.jpg">

FYI, you never have to use <cfoutput> tags when a variable is inside a core CF tag like cffile, cfinput, cfquery, etc. Variables in pound signs there are always evaluated automatically.

i.e., you only need:

<cffile action="delete" file="C:\serverroot\nserve\#STORYID#.jpg">

Author Comment

by:Qsorb
ID: 37798546
> Fyi, you never have to use <cfoutput> tags when a variable is inside a core CF tag

Good thing to remember. You're just full of all kinds of helpful tidbits. Thanks so much!