Autoupdate a database of file info

I would like to know if there is a program or an easy way to do this. I have an IIS 5 FTP server, and files are uploaded to it constantly. I would like a bot or program that monitors the files being written to a directory/drive and updates a database of the files already in it. It should track changes in file names, delete a record when a file is deleted, and so forth.

So far it's simple SQL statements being sent to the database, and it's not hard reading file info such as path, file name, and size (and, in my case, ID3 tags too). But I can't figure out how to have a program automate the update process.

Any ideas/help/solution would be greatly appreciated.


SunBow commented:
A) For autoupdate, you can use wait or sleep commands, or keep the program constantly running and assessing, either through the OS or programmatically. I recommend a process like the AT command to schedule the event, since this is IMO easier to manage and can be better controlled independently and remotely.

B) Program on the taskbar (or systray): while it's running, fine. In general I would oppose having it constantly running without other safeguards. For who knows what reason, it could lock up or cease functioning, so I'd go for an automatic restart of the tool.

C) At this point I still vote for getting a periodic snapshot.
> since you are not recreating the data
Part of the problem is anticipating changes, frequency, quantity, and the need (if any) for preservation. Suppose one directory gets wiped. It had essential content; backups are maintained. It would help (me) to have the snapshot to best assess what may have gone wrong, when, and how best to recover. This can be done programmatically easily enough, in VC or many other tools. Again, I opt for the best info, adding fields for things like when the program ran, when changes were detected, and who made changes, but only as appropriate (as in actually useful; you don't need useless overhead, for it is overhead after all).

D) FTP. IMO very useful for quick transmission and for international transfers, especially compared to HTTP, and when considering liabilities incurred with some caching and filtering implementations.

E) Consider management and control. Depending on who gets to add what:
> have them use a browser to call up a server-side
I concur with payperpage that the best first-choice option is to have a mechanism of permissions, or at least something that triggers CGI possibilities, so your code only has to run when needed. Consideration has to be given to frequency, but in general one would expect less overhead in CPU cycles, runtime, and programming, as well as in managing storage and access. The drawback is more overhead at the user end. I'd still opt for the total snapshot, but run that assessment less frequently, or only on demand (of the admin or programmer).

F) tracking
> part is comparison between the database and the actual file
But what about history: whether a file is updated every 5 minutes, hourly, daily, or less frequently, and whether one has anticipated that very frequency? Perhaps the frequency applies to a collection of several files, each updated at about the same time. Rather than trigger a new run for each file, one could improve efficiency by running separately for each collection or package of file updates. Any database, or even a text log, can be considered viable for one situation or another.

CJ_S> I would create a program that would look at a specified directory

So I concur that this can facilitate the process for situations like having several district offices: each gets its own directory, and you run your program against the directory based on the office involved, getting back some control.

G) Overhead
> u r dealing with millions of files
This knocks out many options due to size. For example, Excel drags at 30K and older Access at 300K. Later Access versions handle it better if managed, which of course requires more thought and control. In some experiments I've run, NT itself will bog you down for certain implementations. I therefore emphasize managing the structure: number of directories, number of files per directory, length of filenames, and depth of the directory tree.

H) Consider building your own CRC-type records:
>comparison between the database and the actual file
While NT will give you a collection of dates to compare, size in itself says little about content. While there are a few command-line tools that perform a truer file comparison, bit by bit, I don't know how many old files you'd want to keep around to do that. Programmatically this can be trivial: first by simply adding up all the bytes mathematically, then, if need be, making it a few different numbers over personalized (known only to you) ranges, say four blocks of byte ranges you specify in code, invisibly.
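SunBow's byte-summing idea can be sketched in a few lines. This is a minimal illustration, not his actual tool: the choice of four equal-width blocks and the function name are assumptions made here for demonstration.

```cpp
#include <array>
#include <cstdint>
#include <vector>

// Cheap content fingerprint, as SunBow describes: sum the bytes over
// four fixed ranges ("blocks") chosen privately in code. Equal-width
// blocks are an illustrative choice; the real ranges would be yours alone.
std::array<std::uint64_t, 4> block_sums(const std::vector<unsigned char>& data) {
    std::array<std::uint64_t, 4> sums{};  // one running total per block
    const std::size_t n = data.size();
    for (std::size_t i = 0; i < n; ++i) {
        // Map position i into one of the four blocks.
        std::size_t block = (i * 4) / n;
        if (block > 3) block = 3;
        sums[block] += data[i];
    }
    return sums;
}
```

Storing the four sums alongside name and size in the database gives a quick way to flag content changes without keeping old copies of the files around.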
I know of the CTMon program for monitoring a network. Something like that may be available for this.
hotsextoy (Author) commented:
I am not out to monitor my network or site.
I want a program that updates a database by monitoring files as they are uploaded or deleted.


I did something similar: running a DOS command to output to a file, then moving what I wanted into the DB after a few processing steps.

You must have NT, so I recommend using the AT command to schedule jobs that run at your convenience: daily, hourly, whenever. I settled on once a day, shortly before my anticipated login, so I could see the latest results. I further recommend the Windows version, WinAT, which is on the Resource Kit or out on the web, and consider KiXtart as well.

So, designing a DIR command, e.g.:
   DIR *.txt > DirList.txt
can get you the list. Use switches like /O for your preferred sort.

Now, what the scheduler runs is a .bat file listing the programs to execute. One of mine ran MS Access from the command line; it ran a single (auto)macro to move the text into the DB. Other queries would format it differently, but Access also leads to SQL and ODBC, so I recommend you consider it as well.
Please note that SunBow's comment runs the command at certain intervals, and is probably the best suggestion too :-/

It's really hard to attach your program to IIS; however, you could create a filter which will be called whenever a file is uploaded.

hotsextoy (Author) commented:
That is a good suggestion for initially creating the database, but I don't think it will be a good permanent fix.

DIR /B /ON /S > update.txt

This creates a good list with no information except the path name, but I also need the file size, and including that gives me extra info like date/time, volume info, directory info, and so on.
With the first method I'd have to write code to get the file size of each file; with the second I'd still have to write code to parse out the stuff I don't need. After I have a suitable text file I'd have to import it into Access, which creates a new database every time. This seems a bit much if you are dealing with millions of files and have to extract other information too.

I think a good solution would be to just write a program in VC++ to scan a drive and get the file name and size (pretty easy), extract info from the files (source is available), and update a database using ODBC (also easy). This has little overhead since you are not recreating the data; the only hard parts are the comparison between the database and the actual files, and having it run minimized in the taskbar. This seems like the best method. What do you guys think?

It's pretty easy to build a file checker as a separate program. I could build it for you...

hotsextoy (Author) commented:
How would you build this, CJ? And what would it check for?


Just another thought: why not mediate access to the FTP server through an ASP or similar? Instead of people using FTP clients to maintain the server, have them use a browser to call up a server-side script. When someone submits a request to the script to insert/update/delete a file from the server, the script also maintains the database records. Isn't this easier than having a database that is out of date as soon as you create it, or is there a compelling business reason for allowing FTP client access to your server?
I would create a program that would watch a specified directory and see when and which files have been uploaded, along with any other changes to the filesystem. Whenever a change is detected it could be added to a specified database.

1) a good list with no information

- then you do not want the brief switch:
DIR /ON /S > update.txt

There are of course advantages and disadvantages to any method. Downside: DOS DIR output requires massaging of fields and titles, assessing whether an entry is a directory or a file, still misses some info that the (Win2K?) OS may have, and should be expected to be noticeably slower in execution. But it leaves you a definitive snapshot of conditions at a fixed point in time for auditing and debugging, and is very quick to get going.

Methods like DIR do need parsing, or at minimum aligning. I, for example, would do both before and after getting the info into Access (example: insert commas appropriately, and more options become available earlier). One preprocess was to add a datestamp and source to each line, allowing collections from multiple servers, dumping of unnecessary info, presorting, and merging.
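The preprocess described above (insert commas, prepend a datestamp and source) might look like the sketch below. The function name, the comma-joined output format, and the whitespace-based field splitting are all assumptions for illustration; real DIR output varies by OS version and locale.

```cpp
#include <sstream>
#include <string>

// Sketch of SunBow's preprocess: split a whitespace-delimited DIR line
// into fields, re-join them with commas, and prepend a datestamp and
// source tag so dumps from several servers can be merged and presorted.
std::string to_csv_line(const std::string& dir_line,
                        const std::string& datestamp,
                        const std::string& source) {
    std::istringstream in(dir_line);
    std::string field;
    std::string out = datestamp + "," + source;
    while (in >> field)  // operator>> skips runs of whitespace
        out += "," + field;
    return out;
}
```

One caveat: splitting on whitespace breaks filenames containing spaces, so a real version would treat everything after the size column as a single name field.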

2) Part of the problem, of course, is knowing the data and methods. How often will how many files be deleted? Modified? Added? Changed? Viewed? Moved? Overwritten? Is this limited to a single directory? How big/deep is the tree? (beware) How many files per directory? (beware) How much control is there, and would there come a need to cleanse?

The simple case of one directory leads to problems of organization and size constraints. Allowing changes to multiple directories, as the /S implies, allows theoretically infinite copies of someone's favorite calendar photo, movie, or soundtrack. In a controlled business environment, the filename(s) and locations can be managed, so one can anticipate a specific format for, say, a daily status report, leaving the goal of perhaps just looking for a complete collection of input files before updating a corporate summary, or merging them for forwarding on to a mainframe or whatever.

3) Not a new DB.
> import it into access which creates a new database everytime

While there are advantages to a separate database, I just stuffed the data into a table, usually just overwriting the prior one, which was no longer needed. The alternative of creating a new table with a name change (say, adding a datestamp for uniqueness and recordkeeping) I found to be rarely needed and cumbersome. Queries kicked off afterwards condensed the collected info (e.g. number of records for this or that) so at a glance I could tell how well or poorly the process was going.

If Access is available, then VB should be highly considered, for it interfaces very well in terms of development time and capabilities.

> if u r dealing with millions of files and have to extract other information too

4) So much for defense. As above, knowing information on quantities, structures, requirements, and capabilities is very much the key to evaluating processes, directions, and designs. I'll try to find time to return and comment on the VC issue you raised, but certainly, one can always benefit from the tools and skills currently available.
If you have few changes (not millions per minute), and also admin a shop with Exchange 2000, another consideration could be using its Workflow capabilities. These have the typical actions, events, and triggers. On its own, it should detect when new items are added to folders and more, which can then trigger your own code for tracking, monitoring, or whatever, beginning with either VBS or a compiled COM object.

While Microsoft seems to claim this would do our deed, I have no personal experience to comment pro or con, other than that it looks to be a fit for your scenario. But I don't know what tools (or budget) you have available. For those who could use background on this, a couple of links for starters:

Anyone with experience, feel free to contribute:
done here?
hotsextoy (Author) commented:
Sorry, my account had issues so I couldn't log in, and so much time passed that I had forgotten I still had this question unanswered... Anyhow, this is old news; I moved on to bigger and better things...
BTW, I solved it using CJ's idea. I moved it all to ASP, and from there it wasn't difficult to update the database. The only drawback is that multiple file uploads and downloads have become more of a tedious process now, but I suppose I could solve that with a shopping-cart format... Oh well, sorry about the time lag.

The user clearly states that CJ_S answered this question, and will create a question linking back to here to award his points to him, as duly noted by the user.

CJ_S, be looking in this TA for a question for you in the amount of 250 points.

Doing what is right,
CS Admin @ EE