Question

Wildcard cfquery & delete files?

Asked by: cmistre

I have a situation where I have a very large directory of filenames, that are regulary imported via FTP automatically.  

Since they are not imported via coldfusion, there isnt a database associated with which files belong to which records.  The filenames themselves do contain names that allow me to associate them with their corresponding records, so I can query & retrieved the files as needed, but I am having trouble figuring out how to delete the files when the records become outdated.

I need to be able to do a an entire directory search of what is in the FTP folder against the records that are actually active in my database.  

Is there a way I can wildcard delete files? For instance if my records are labeled from 1000-2000.  The associated records for record 1000 are labeled 1000-1, 1000-2, 1000-3, etc.

What I need to do is check the files in the directory against record numbers and delete any files in the directory that do not have a corresponding record in the database.  So if I run the code every week, it can wildcard delete any file that contains a record number in the filename that no longer exists in the live database.

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2009-07-03 at 14:50:01ID24543148
Topics

Cold Fusion Markup Language

,

ColdFusion Studio

,

ColdFusion Application Server

Participating Experts
1
Points
500
Comments
27

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. Wildcards
    I have a program that will allow the user to retrieve occurrences of words in a large body of text that has been separated into sentences using the full stop as a delimiter. The user willl enter a word eg. shoulder and every sentence containing that word is printed to a file....
  2. wildcards
    I need a way to tell whether or not a filename passes specific wildcard search criteria (for example, does *.cpp include filespd.h) I can't use the standard find functions since none of my files actually exist on disk. They are all compressed and stored within a large proj...
  3. Wildcards
    I'm using vb6 to create a sort of search engine.....I want the users to be able to use wildcards for searches.....ex: St*....which will bring up Steve, Stock, etc..I'm not sure how to do this, any help would be greatly appreciated. Here's the code where I'll have to add the w...
  4. WildCards
    Does anyone know how to use wildcards in btrieve? Something like the following (which works in MS SQL 2000) select * from table1 where id like 'bob%' thanks

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: cmistrePosted on 2009-07-03 at 16:27:40ID: 24774928

Here is the concept for what I need to do, but I am not sure how to query based on a filename?

Or the best way to loop through and delete the records that are basically outdated.  The first 10 characters of the filename will always match the record id of the active records.  This is the only way I could think of to do this, but of course it doesnt work since I dont know how to actually reference the name of the file in the directory.

 I used **filename** to show where I need to reference this name.

<cfloop>
 
<cfquery name="get_filename">
select **filename** from
**directory**
</cfquery>
 
<cfquery name="active_records">
select id
from active_records
where id like '%filename%'
</cfquery>
 
<cfif #active_records.id# not like #trim(left(**Filename**,10))#>
DELETE FILE FROM DIRECTORY
(i know the code to delete a file)
</cfif>
 
</cfloop>

                                              
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:

Select allOpen in new window

 

by: _agx_Posted on 2009-07-03 at 16:55:49ID: 24774984

Assuming the files are on a local drive, use cfdirectory to get a listing of the files in a particular directory. You can use "filter" to restrict which files are returned.

Example: Find all files starting with:  "1000-"
<cfdirectory directory="c:\someFolder\yourFiles"
     action="list"
    filter="1000-*"
    name="fileList"
    type="file">

The result is a query and the file name is stored in the "Name" column
<!--- show all files found --->
<cfoutput query="fileList">
      File name = #Name#<br>
</cfoutput>

 

by: cmistrePosted on 2009-07-03 at 17:09:40ID: 24775017

That shows how to display the file names, but how to delete?

Can you give an example with code on how to delete "non-matching" records?

The directory itself may have as many as a few hundred thousand files in it, and the filter way looks like a good way to only bring up those files that match a certain file number, but I need to go a step further and delete ALL files of a record number that do NOT match a record id in the active_records database.

So I need to

1. query the id#'s of all records in the current working database.
2. query the filenames in the directory
3. delete files in the directory that do not have a matching record in the current database any longer.

 

by: _agx_Posted on 2009-07-03 at 17:20:34ID: 24775040

What database type are you using?

 

by: cmistrePosted on 2009-07-03 at 17:21:24ID: 24775041

Its mysql 5.0

 

by: _agx_Posted on 2009-07-03 at 17:22:16ID: 24775044

Also, do you have full control over the server or are you using a shared server (ie with limited access to tags like cfexecute, etc..)?

 

by: _agx_Posted on 2009-07-03 at 18:23:37ID: 24775237

Since there are potentially hundreds of thousands of records you should definitely do this in chunks.  Here is an outline of one way to do it, that would most certainly be more efficient than looping w/CF

http://www.experts-exchange.com/Web_Development/Web_Languages-Standards/Cold_Fusion_Markup_Language/Q_24287703.html

Also, going forward you might consider adding a column to your table that would store the related file name.  You could set up a scheduled task that would run periodically. It could check all records with a NULL file name and update the record.  That would drastically reduce the problem of orphaned files, as you could delete related  physical files at the same time as the database records.

 

by: cmistrePosted on 2009-07-03 at 19:43:41ID: 24775475

This is on a dedicated server so I do have full control.  However I don't have a really good way to change the way I recieve the information so I have to modify the way I update rather than the way the information is stored.

I recieve information 2 different updates.  One update completely replaces the old records, while the other update is the images associated with the records, but its only the latest images.  When the text update no longer contains that product, I am left with these "orphaned" image files which eats up alot of space.

I have read the article you linked but I dont know how to work with dos directories etc.  I have thought of looping through text database, and moving matching image files into different folder and deleting the old, but since these images are actually used for display on a website, if they were moved, I would also have to always update the images url in the website everytime I updated.

Does this new information give you any other ideas?  Thanks for your help

 

by: _agx_Posted on 2009-07-03 at 21:10:54ID: 24775677

The image updates make sense, but I am not quite clear on how your database is updated yet.  So I will focus on the idea in the other thread.

Using DOS to get a listing of the files in a directory is very simple.  You can use the "dir" (directory list) command and send the output to a file.   You can either run it from a command line, or from cfexecute. An example would be something like this.  

--- the "/b" means show only the file names, not the extra information
c:\>  DIR /b  c:\yourDirectory

Then you will have a file containing all of the image names.  You can create a table in your MySQL database and use LOAD DATA INFILE to import the file names into the new table.  Then do a JOIN with your other table, and "flag" all of the images that have no associated record.  It is not much different than a regular update.  Then you can do a select to find all image records that are outdated and loop to delete them.  

Of course you should do this in chunks as it may take a while with so many files and records.

 

by: _agx_Posted on 2009-07-03 at 21:12:28ID: 24775680

Correction:  I forgot the "output to file" syntax

--- Save results to a file named "myFileList.txt"
c:\>  DIR /b  c:\yourDirectory > myFileList.txt

 

by: cmistrePosted on 2009-07-03 at 21:40:51ID: 24775736

I am hoping to be able to automate this cleanup process, so I'm probably going to want to set this up via cfexecute to build the table from filenames.  I've never used CFExecute, and not familiar with outer join either.  I'll do some looking up but hope you can help me write a little code if I can't figure it out.

If the process of how the site updates will help sort this out, I'll give you some more details.

There is one master database for the information my website needs and I use a client that is scheduled  to automatically grab the data on a pretty frequent basis.  Coldfusion also has a scheduled task which completely truncates the entire data table, and re-imports the new data download from scratch.  This was so I didn't have to cross-reference the old table with the new table for changes. It was easier to simply drop the old and re-import the data for the freshest update.

The images however are just FTP'd directly into the directory that displays them on the website.  I can't drop that whole images folder and re-download the matching images every time, so this is how I end up with old stored image files that wouldn't belong to any current records.

I'm gonna give the dos / cfexecute / outer join a try.  I played with cfdirectory a bit, but I think its going to crash my server, its been unable to resolve for hours and still hasnt stopped as of now.

 

by: cmistrePosted on 2009-07-03 at 21:55:36ID: 24775749

Hmm, OK I am pretty confused just with cfexecute already.  I really want this to be a scheduled procedure or something I can just click inside the admin panel to begin the cleanup.

Can you please give me an example of how to run the dos command via cfexecute?

 

by: _agx_Posted on 2009-07-03 at 22:04:00ID: 24775773

>  I played with cfdirectory a bit, but I think its going to crash my server, its been unable
> to resolve for hours and still hasnt stopped as of now.

Yes, that is why I said you have to do this in chunks ;-)  Hundreds of thousands of files is just too much information to process at one time.  It would take a while - even using a command tool.  Adding CF on top of that definitely won't help.

> and re-imports the new data download from scratch.  This was so I didn't have to
> cross-reference the old table with the new table for changes.

That may be easier in terms of data, but what about the file information? I really think you need to find a way a way to do this incrementally.  To try and run this full routine on 100K+ every time you get an update will take too long and is just not feasible.

 

by: _agx_Posted on 2009-07-03 at 22:11:26ID: 24775797

The best approach it is to:

1)  Create a process to do the initial cleanup.  This will be less efficient, because it only needs to run once.  But with this much information, you are probably better off doing this at the o/s level .. rather than adding CF to the mix.  There is absolutely NO way you will be able to process 100K files/records from a single cfm script.  The server will time out or crash first.

2)  Create a more efficient method (schedule task) to do the incremental updates



 

by: cmistrePosted on 2009-07-03 at 22:11:42ID: 24775798

Even incrementally, I would always need to compare the old table to the new one and delete the image files that exist in the directory that don't exist in the new table.  Wouldnt it still have to reference the same exact directory with hundreds of thousands of records?  Its not like they give me a list of photos that need to be deleted...

So... I know the procedure is long and ugly, but it has to be done.  Can you help me with cfexecute?

 

by: _agx_Posted on 2009-07-03 at 22:29:48ID: 24775886

> Even incrementally, I would always need to compare the old table to the new
> one and delete the image files that exist in the directory that don't exist in the new table.

That is only because you are truncating the table every time.  If you used a more efficient method to identify

1) New
2) Changed and
3) Deleted Items

You would have a much smaller set of information to work with each time.  

Unless you have a _LOT_ of hardware on the server, you will not be able to run this kind of process at all, let alone on a regular basis.  At worst it will bring down the server, at best the site will be unusable during that time.  Not to sound harsh, but it is just not feasible. You have to break up the process into smaller pieces.

 

by: _agx_Posted on 2009-07-03 at 22:31:20ID: 24775893

Can you post a screen shot of the database table structure and a sample of the data?  Maybe I can help you figure out a better way to update the table.

 

by: cmistrePosted on 2009-07-03 at 22:48:11ID: 24775967

that would probably complicate things even more since the tables are pretty huge with alot of information and remarks about the items.

The best way to understand is this...  Each record does have an ID and thats how I associate the images with the record.  All that is given to me is a record id and a photo count.  So if record ID is 1000, and the photo count is 3, then I know that the filenames are 1000-1.jpeg, 1000-2.jpeg, 1000-3.jpeg.

Since the new image files are automatically imported directly into the folder, I dont think there is a way to keep a table of filenames current since the downloading client cant really run a mysql procedure to update my table when it adds a file.  This leaves me always having to compare the directory files to the record ID's and look for the orphans.

 

by: _agx_Posted on 2009-07-03 at 23:13:47ID: 24776029

So the relationship between ID's and images is one-to-many, not one-to-one (ie One record ID can have many images, not just one).

But if you at least knew which ID's were deleted when you get the updates, you would know exactly which files you needed to delete. You would not have to do a full comparison each time.  

How many database records are you talking about?

 

by: _agx_Posted on 2009-07-03 at 23:17:12ID: 24776037

> But if you at least knew which ID's were deleted when you get the updates, you would know
> exactly which files you needed to delete. You would not have to do a full comparison each time.

Again, think two stages: initial cleanup and maintenance actions going forward.

(Unfortunately, I have to head out soon.  Holiday weekend)

 

by: cmistrePosted on 2009-07-03 at 23:19:49ID: 24776041

If I didnt truncate the data table, it too would grow out of control in the same way the images are.  The cumulative update option will only add new records.  It has no capabilty to cross-reference my personal MySql table and delete the old records that are no longer in existence.

One thought is if I didnt truncate, then I would need to do this...

1.  remove data records as well as their associated images in the current database that do not exist in the new update file.
2.  do NOT insert records that are already in existence in the database.

Aside from that, I'm pretty sure I can come up with a way to keep an accurate table of the filenames in the directory so I can avoid having to use some type of directory list function after I make the initial table.

 

by: cmistrePosted on 2009-07-03 at 23:29:23ID: 24776064

We kinda crossed paths in the last response.

There are somewhere between 30,000 & 50,000 total records in the database, and each record has anywhere from 1-10 images, some have even more.  Managing the images is a real pain.

Im thinking of the possibilities of the cumulative update...

My thought process is that I have import the new data into a totally different table, but then perform some type of loop by the record ID to flag records that meet one of 3 criteria.

1.  Records that exist in update that dont exist in current table - ADD
2.  Records that exist in both places - OVERWRITE OLD new since it could contain changes to price, etc.
3.  Records that do NOT exist in UPDATE that exist in table - DELETE record and all filenames that begin with the prefix of the id#

This sounds more manageable but I had originally thought the loop idea was not a good idea for the 50,000ish data records, but its definitely a better alternative than the hundreds of thousands of images.

 

by: _agx_Posted on 2009-07-03 at 23:39:01ID: 24776084

> One thought is if I didnt truncate, then I would need to do this...

Yes, but as long as you have a method to uniquely identify common records in the both sets of data, identifying changes is very simple.  It is just a matter of a few JOINS and adding a few "flags" to the table.   In psuedo-code something like

1. Insert the "updates" into a temp table
2. Do an INNER JOIN to identify changes

        UPDATE  tmp
        SET         tmp.IsChanged = true
        FROM     OldTable old INNER JOIN TempTable tmp ON old.ID = tmp.ID

3. Do an OUTER JOIN to identify deletions

        UPDATE  old
        SET        old.IsDeleted = true
        FROM     OldTable old LEFT JOIN TempTable tmp ON old.ID = tmp.ID
        WHERE   temp.ID IS NULL

4. Do an OUTER JOIN to identify new records

        UPDATE  tmp
        SET        tmp.IsNewRecord = true
        FROM     OldTable old RIGHT JOIN TempTable tmp ON old.ID = tmp.ID
        WHERE   old.ID IS NULL

Once you have flagged the new, changed and deleted information it is much easier to process it in batches.

A)  INSERT the new records into the old (ie main table)
B)  Update the changed records in the main table
C)  Have a scheduled task delete the old records from the database (and files) in batches

 

by: _agx_Posted on 2009-07-03 at 23:43:03ID: 24776088

>  We kinda crossed paths in the last response.

And in this one too ;-)

Our timing was off again, but yes. Now we are on the same page.  EXCEPT - do not do the sql updates in a loop.  SQL is much better at processing whole sets of data. Plus looping will take much longer.   The only place you must use looping is for the delete of the image files.  But again, you can do that in a scheduled task, which will make the processing smoother.

 

by: _agx_Posted on 2009-07-04 at 00:02:12ID: 24776123

I have to head out now.  I have about 5 hours to get some sleep before traveling ;-)

 

by: cmistrePosted on 2009-07-04 at 00:11:16ID: 24776142

have fun, man thx for the help so far.

I'm still lost on how to write the real code for the joins but will play around.

 

by: _agx_Posted on 2009-07-06 at 12:06:37ID: 24788210

> I'm still lost on how to write the real code for the joins but will play around.

The examples posted above should be pretty close to the real code.  I just said "psuedo-sql" because they were not tested.

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...