Solved

Delete small images with cfdirectory and cffile.

Posted on 2012-03-29
Last Modified: 2012-06-27


<cfquery name="StoryInfo" datasource="nNews">
  select *
  from story
  where id > 680000
</cfquery>



<cfdirectory
    directory="C:\serverroot\nserver\inews"
    name="mydirectory"
    filter="*.jpg"
    sort="datelastmodified DESC">


<cfquery name="mydirectory" dbtype="query">
    select *
    from mydirectory
    where size < 2048
</cfquery>

<cfoutput query="mydirectory">
  <cffile action="delete" file="C:\serverroot\nserver\inews\#mydirectory.name#">
</cfoutput>



News thumbnails are automatically generated from a news feed using cffile. Once in a while an image is created that's corrupted. It's always less than about 2K in size. I've wasted too much time attempting to discover the reason and now only wish to delete those images on a schedule using ColdFusion.

I want to use the ColdFusion Administrator's Scheduled Tasks to run this code and delete all files smaller than the given size.

But with hundreds of thousands of files in the same directory, this takes far too long. How can I look at only the last 100 or so images in the directory so the process goes much faster? I mean, why look at all the files in the folder when all I want are the last 100 or so created? And they are all in numerical order.

I included a cfquery just in case we can use it to shorten the time. Use my snippet as the example. Show me what needs to be changed to make it work.
Question by:Qsorb
11 Comments
 
LVL 15

Expert Comment

by:myselfrandhawa
Without seeing your exact code:

This is the approach I usually follow:

<cfset images = ValueList(queryname.thumbnailfile)>
<!--- the list attribute needs the value (#images#), not the literal string "images" --->
<cfloop list="#images#" index="i">
    <cfset thumbPath = ExpandPath("/images/thumbnails/#i#")>
    <cfif FileExists(thumbPath)>
        <cffile action="delete" file="#thumbPath#">
    </cfif>
</cfloop>
 

Author Comment

by:Qsorb
You said, "Without seeing your exact code ..."

What? This does not help at all.

I posted my code, all of it. Didn't you look at my code snippet?

My code works. It's just far too slow because it looks at all the images, not just the top 100.

Your suggestion does not take into account these two factors:

1) Size of the image (filter out all images with a size greater than 2048 bytes.)
2) Last 100 records.

Please look at my code snippet then show me what you mean, based on my code.

Or anyone else who can look at my code and make a suggestion.
 
LVL 52

Expert Comment

by:_agx_
It seems like you're asking 2 different questions. Each has a different answer. Which are you trying to do ultimately?  

1) How to remove ALL files in the directory less than 2K --OR
2) How to check the last 100 images only and remove any of them that are less than 2K

If #2, do you have some way of identifying "new" files other than looking at the date on the file system - like a time stamp column or incrementing ID? If you can determine the "last 100" using a db query, the process would be much faster because you don't have to build a list of 100K files every time.

> But with hundreds of thousands of files in the same directory, this takes far too long.


If it's #1, with that many files it's always going to take a while. There might be ways to speed it up a little. But first, which step in your original process is taking up the most time: 1) generating the directory listing, 2) the QoQ, or 3) deleting files within a loop? I'd guess #1.
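
One way to find out which step dominates is to time each one with getTickCount(). This is only a minimal sketch: it reuses the directory path and the mydirectory name from the question, while startTick, listMs, qoqMs and the smallFiles query name are made up for illustration.

```cfml
<!--- Time the directory listing --->
<cfset startTick = getTickCount()>
<cfdirectory
    action="list"
    directory="C:\serverroot\nserver\inews"
    name="mydirectory"
    filter="*.jpg">
<cfset listMs = getTickCount() - startTick>

<!--- Time the query of queries that keeps only files under 2K --->
<cfset startTick = getTickCount()>
<cfquery name="smallFiles" dbtype="query">
    select *
    from mydirectory
    where size < 2048
</cfquery>
<cfset qoqMs = getTickCount() - startTick>

<cfoutput>
    Directory listing: #listMs# ms<br>
    Query of queries: #qoqMs# ms (#smallFiles.recordCount# small files)
</cfoutput>
```

If the first number dwarfs the second, the listing itself is the bottleneck and trimming the QoQ or the delete loop will not help much.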


> generated from a news feed using cffile.

Going forward, that's the best time to check for corruption. So if you haven't done it already - I'd recommend modifying the process to check for bad images when they're 1st created. Doing it after the fact is always going to be slower because like you said - there's tons of files to examine.
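
A minimal sketch of that kind of check, run right after the feed import writes a thumbnail. The variable newImagePath and the file name in it are hypothetical placeholders; the 2048-byte threshold comes from the question, and GetFileInfo() is a built-in function available since CF8.

```cfml
<!--- Hypothetical: full path of the thumbnail that was just written --->
<cfset newImagePath = "C:\serverroot\nserver\inews\680001.jpg">

<cfif FileExists(newImagePath)>
    <cfset newImageInfo = GetFileInfo(newImagePath)>
    <!--- Corrupted thumbnails are reported to be under about 2K --->
    <cfif newImageInfo.size lt 2048>
        <cffile action="delete" file="#newImagePath#">
    </cfif>
</cfif>
```

Because only one known file is inspected, no directory listing is needed at all.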
 

Author Comment

by:Qsorb
I need both:

1) Size of the image (filter out all images with a size greater than 2048 bytes.)
2) Last 100 records.

<cfquery name="StoryInfo" datasource="nNews">
  select top 100 *
  from story
</cfquery>

Wouldn't that give me the last 100 records? The record ID and the image name are the same, that  is, the image name is the id record number, plus the .jpg extension.

Generating the directory takes up nearly all the time.

I gave the query so that someone could show me how to use my cfquery to limit the search for the last 100 records (images).  We can do it by ID greater than but I assume top 100 would be easier as I'd not need to update the code.

Does this make sense?
 
LVL 52

Expert Comment

by:_agx_
> Wouldn't that give me the last 100 records?

If you sort the results by the ID in descending order, yes:

          SELECT TOP 100 *
          FROM   story
          ORDER BY RecordIDColumn DESC

> Does this make sense?

Yes and no. The query above (with a little more code) would let you check the last 100 records. Then you could loop through the query and discard the corresponding image if it's < 2K. But obviously there are still the other 99,900 images in your directory to check.

So my question is - are you ultimately trying to check ALL 100K images?  If yes, you'll need a different strategy.  There's no way to do it in a single shot. Generating a directory list of 100K files is going to take a while - no matter what tool you use.

You need to break the process into batches. For example, create a scheduled task that processes 100 images or so at a time, marks those db records as processed, and calls itself again until all of them are processed.
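
A rough sketch of one such batch run, under stated assumptions: the column story.imageChecked (a bit column defaulting to 0) is hypothetical and would have to be added, the getBatch query name is made up, and the image path is the one from the question. It also assumes, as stated earlier in the thread, that the file name is the ID plus .jpg. The "calls itself again" part is left to the scheduler, which simply runs the page repeatedly.

```cfml
<!--- Hypothetical column: story.imageChecked (bit, default 0) --->
<cfquery name="getBatch" datasource="nNews">
    SELECT TOP 100 ID
    FROM   story
    WHERE  imageChecked = 0
    ORDER BY ID DESC
</cfquery>

<cfloop query="getBatch">
    <cfset pathToImage = "C:\serverroot\nserver\inews\#getBatch.ID#.jpg">
    <cfif FileExists(pathToImage)>
        <cfset info = GetFileInfo(pathToImage)>
        <!--- Delete thumbnails smaller than 2K --->
        <cfif info.size lt 2048>
            <cffile action="delete" file="#pathToImage#">
        </cfif>
    </cfif>
</cfloop>

<!--- Mark the whole batch as processed so the next run picks up the next 100 --->
<cfif getBatch.recordCount gt 0>
    <cfquery datasource="nNews">
        UPDATE story
        SET    imageChecked = 1
        WHERE  ID IN (<cfqueryparam value="#ValueList(getBatch.ID)#" cfsqltype="cf_sql_integer" list="true">)
    </cfquery>
</cfif>
```

Each run touches at most 100 files by name, so no run ever has to list the whole directory.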

Keep in mind it should be a 1-time cleanup task. Going forward the process should check images when they're created. So you never have to perform a monstrous cleanup task again.
 

Author Comment

by:Qsorb
Agx: I only want to check the last 100 images/records. I have the query limiting to the last 100 records, but my point is, and has been, that I don't know how to apply it to the code snippet I gave.

That has, all along, been the question. Can you show me how to add this query:

SELECT TOP 100 *
FROM   story
ORDER BY ID DESC

To my CFDIRECTORY, and/or CFFILE, I'll give it a try.

ID is INT PK with identity. And again, the ID number is also the filename of the image.
 
LVL 52

Accepted Solution

by:
_agx_ earned 500 total points
Ah, ok. I was a little confused because the cfdirectory and QoQ code above are doing the exact opposite.

Anyway, first run your query

<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 100 *
    FROM   story
    ORDER BY ID DESC
</cfquery>

Loop through it and use getFileInfo() to get the image size. Then delete it if it's < 2K. I'm guessing about your column names, so change as needed.

Edit: corrected typo in code:

<!--- assumes YourImageColumnName stores file name:  ie "5.jpg" or "someFile.jpg" --->
<cfloop query="getLatestImages">
    <cfset pathToImage = "C:\serverroot\nserve\" & YourImageColumnName>
    <cfif FileExists(pathToImage)>
        <cfset info = GetFileInfo(pathToImage)>
        <!--- automatically delete the image if it's smaller than 2K --->
        <cfif info.size lt 2048>
            <cffile action="delete" file="#info.path#">
        </cfif>
    </cfif>
</cfloop>
 

Author Comment

by:Qsorb
It was a bit confusing at first because I had never used GetFileInfo, didn't know it existed.

But with your help, I got this to work very well, exactly as I needed. And now I've learned about a new function.


<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 200 *
    FROM story
    ORDER BY ID DESC
</cfquery>

<cfloop query="getLatestImages">
 <cfset pathToImage = "C:\serverroot\nserve\"& #ID# & '.jpg'>
 
 <cfif FileExists(pathToImage)>
  <cfset MyFile="C:\serverroot\nserve\#STORYID#.jpg">
  <cfset FileInfo=GetFileInfo(MyFile)>  
 
  <cfif FileInfo.size lt 2048>  
 <cffile action="delete" file="C:\serverroot\nserve\<cfoutput>#STORYID#</cfoutput>.jpg">
  </cfif>
 
 </cfif>
</cfloop>
 

Author Closing Comment

by:Qsorb
Thanks for your help.
 
LVL 52

Expert Comment

by:_agx_
> It was a bit confusing at first because I had never used GetFileInfo, didn't know it existed.

Yeah, it was one of those hidden jewels introduced in CF8. Thank goodness. No more doing a directory listing just to get the size of 1 file, or resorting to Java (what a pain). GetFileInfo also returns a lot of other good info, like file type, path, parent, ...

> <cffile action="delete" file="C:\serverroot\nserve\<cfoutput>#STORYID#</cfoutput>.jpg">

Fyi, you never have to use <cfoutput> tags when a variable is inside a core CF tag like cffile, cfinput, cfquery, etc. They're always evaluated automatically.

I.e., you only need     <cffile action="delete" file="C:\serverroot\nserve\#STORYID#.jpg">
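
Putting that tip together with the working code above, a tidier version might look like the sketch below. It assumes the file name is the ID column plus .jpg, as stated earlier in the thread, and it builds the path once so the same value is used for the existence check, the size check, and the delete.

```cfml
<cfquery name="getLatestImages" datasource="nNews">
    SELECT TOP 200 ID
    FROM   story
    ORDER BY ID DESC
</cfquery>

<cfloop query="getLatestImages">
    <!--- Build the path once and reuse it --->
    <cfset pathToImage = "C:\serverroot\nserve\#getLatestImages.ID#.jpg">

    <cfif FileExists(pathToImage)>
        <cfset fileInfo = GetFileInfo(pathToImage)>
        <!--- Delete thumbnails smaller than 2K --->
        <cfif fileInfo.size lt 2048>
            <cffile action="delete" file="#pathToImage#">
        </cfif>
    </cfif>
</cfloop>
```

Selecting only the ID column instead of * also keeps the query result small, since nothing else from the story table is used.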
 

Author Comment

by:Qsorb
> Fyi, you never have to use <cfoutput> tags when a variable is inside a core CF tag

Good thing to remember. You're just full of all kinds of helpful tidbits. Thanks so much!