Splunk. It can handle a lot, and big companies are using it for their infrastructure:
http://www.splunk.com/cust
If NASA uses it...
I have a client that needs to consolidate ~100 million log entries per day from different sources (switches, Blue Coat boxes, Windows systems, etc.). They need to store these logs for at least 90 days and generate reports based on them.
What solution would you recommend for such a task?
Lars
by: mark_wills, posted on 2008-11-28 at 21:30:29 (ID: 23058673)
Well, that certainly is a considerable volume - a bit surprising actually, unless there are a great many different events being captured at each point... Do you really want to store the "noise" that is also logged by switches?
The only real way (in my book) is to import each individual log into an MS SQL database, first into "staging tables", where it is cleansed / filtered before being put away into "live" tables, hopefully reducing the volume so that it reflects the more important events. You can keep the individual logs in an archive area, or possibly on-line for a week or so, but then you need to start compressing and archiving.
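To make the staging idea a bit more concrete, here is a minimal sketch of the "load into staging, filter, put away into live" step. It is only an illustration: it uses Python's built-in sqlite3 as a stand-in for MS SQL, and the table names, columns and the severity cut-off are all made up for the example, not anything from the asker's environment.

```python
import sqlite3

# Hypothetical cut-off: keep syslog-style severities 0-4 (warning and worse),
# and treat anything above that as "noise".
MAX_SEVERITY = 4

def load_and_filter(conn, rows):
    """Load raw rows into a staging table, then copy only the
    important ones into the live table and empty the staging table."""
    cur = conn.cursor()
    for table in ("staging_events", "live_events"):
        cur.execute(
            f"CREATE TABLE IF NOT EXISTS {table} "
            "(ts TEXT, source TEXT, severity INTEGER, message TEXT)"
        )
    # Step 1: bulk load everything into staging, unfiltered.
    cur.executemany("INSERT INTO staging_events VALUES (?, ?, ?, ?)", rows)
    # Step 2: count, then put away, only the rows worth keeping.
    kept = cur.execute(
        "SELECT COUNT(*) FROM staging_events WHERE severity <= ?",
        (MAX_SEVERITY,),
    ).fetchone()[0]
    cur.execute(
        "INSERT INTO live_events "
        "SELECT ts, source, severity, message FROM staging_events "
        "WHERE severity <= ?",
        (MAX_SEVERITY,),
    )
    # Step 3: clear staging so it is ready for the next batch.
    cur.execute("DELETE FROM staging_events")
    conn.commit()
    return kept
```

In a real setup the filter would be whatever rules the client agrees on for "important" events, and the load step would be a proper bulk import rather than row-by-row inserts.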
SQL Server can import text files quite happily, can do the ETL-type processes (extract, transform, load) that pick up relevant pieces of information (such as location, site, etc.), and can certainly provide a good database from which a variety of reporting tools and/or extracts are possible - such as Excel, Reporting Services, Crystal, etc. You could even go the whole hog and get into BI-style reporting down the track, including dashboards (SharePoint). But you need to visualise what you want out first; then we know the inputs (i.e. the logs), and can decide the best tool to deliver the outputs given the inputs. SQL Server is a pretty good platform for that, and is highly scalable. You might start with the free version, though it is a bit limited, then migrate / upgrade (you can keep the same database) all the way up to the Enterprise version if needed.
With that type of volume, your database will grow fairly quickly up to the 90-day mark, then start to even out in terms of size, but it will still be highly volatile - adding a hundred million rows a day, and deleting / archiving the same. So transactional activity is very high. For a production environment, you are going to need a "real" database, and that will mean licenses somewhere. Also, I would look toward a minimally logged environment - data is pretty much batch-loaded at a point in time, after which a full backup should be taken, and then there is not too much activity other than volume-constrained "read" access. So reduce the overheads with minimal log activity, such as the "simple" recovery model, optimise indexing for common "read" queries, and you should be able to maintain reasonable read performance.
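As a rough back-of-envelope for where the size evens out, the arithmetic can be sketched like this. The daily volume and retention come from the question; the average row size is purely an assumption for illustration (measure real logs before sizing anything), and the figures assume nothing is filtered out in staging.

```python
ROWS_PER_DAY = 100_000_000   # from the question: ~100 million entries per day
RETENTION_DAYS = 90          # from the question: keep at least 90 days
AVG_ROW_BYTES = 200          # ASSUMPTION for illustration only

SECONDS_PER_DAY = 24 * 60 * 60

# Rows held once the 90-day window is full and daily loads equal daily purges.
steady_state_rows = ROWS_PER_DAY * RETENTION_DAYS

# Average arrival rate if the load were spread evenly over the day.
avg_rows_per_second = ROWS_PER_DAY / SECONDS_PER_DAY

# Raw data size only; indexes and database overhead come on top of this.
raw_bytes = steady_state_rows * AVG_ROW_BYTES

print(f"steady-state rows: {steady_state_rows:,}")
print(f"average rows/second: {avg_rows_per_second:,.0f}")
print(f"raw data at {AVG_ROW_BYTES} bytes/row: {raw_bytes / 1e12:.1f} TB")
```

Filtering out the noise in staging, or storing aggregates instead of raw rows, shrinks these numbers proportionally.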
The physical server will be fairly important. With that much transactional data, a day's worth is impossible for a human to even count in under 10 days (think about counting to 10 million in a 10-hour working day: that means a million per hour, good luck), let alone read and make an informed decision about. That will most likely mean plenty of on-line aggregation. You may want to consolidate some of the data in the staging area before putting it away into live tables. The logs will no doubt have millisecond time intervals, so you may want to snapshot at one-minute or five-minute intervals...
Anyway, plenty to think about, and I will be interested to see how it unfolds...
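The interval snapshot idea could look something like the sketch below: round each millisecond timestamp down to the start of its five-minute bucket and count events per bucket and source. The function names and the (timestamp, source) event shape are hypothetical, and the rounding as written only works for bucket sizes that divide evenly into 60 minutes.

```python
from collections import Counter
from datetime import datetime

BUCKET_MINUTES = 5  # could equally be 1 for per-minute snapshots

def bucket_start(ts, minutes=BUCKET_MINUTES):
    """Round a timestamp down to the start of its N-minute bucket."""
    return ts.replace(
        minute=ts.minute - ts.minute % minutes, second=0, microsecond=0
    )

def aggregate(events, minutes=BUCKET_MINUTES):
    """Count events per (bucket start, source) pair.

    `events` is an iterable of (datetime, source_name) tuples.
    """
    counts = Counter()
    for ts, source in events:
        counts[(bucket_start(ts, minutes), source)] += 1
    return counts
```

Storing one row per bucket and source instead of one row per event is what brings the live tables down to something a report (or a person) can actually digest.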