Software to (visually) represent software/hardware inter-dependencies

Hi all,

I'm looking for suggestions for software I can use to "map out" the numerous inter-dependencies within our large(ish) infrastructure base.  The overall aim is to understand the impact of a failure of one or more components, be it a server, core switch, WAN link, DNS, IIS, etc.

Some background: We've currently got quite a diverse range of both hardware and software out in the field, with several hundred servers, several thousand users and numerous geographically separated offices, most of which depend upon several 'hub' sites for delivery of various applications/DBs.  Obviously, there are numerous WAN links in the mix, many critical applications depend upon multiple geographically dispersed servers to function fully, and there's the complication of the glue that binds the rest of the infrastructure together, such as our DNS config, the Directory Service and links to the outside world.  While some of the inter-dependencies are obvious, some aren't or, at least, are difficult to quantify.

I'd like to be able to 'describe' the relationships and dependencies between two or more services and, ideally, the 'software' would give a visual representation of the same along with showing/indicating the effect of a specific failure in the chain.

Answer-wise, I'm looking for recommendations based upon specific prior practical experience, rather than the results of a quick Google search!  Also, note that I'm NOT looking for something that will traverse our network infrastructure and build a map based on subnet etc.; I've already got that.  The inter-dependencies of the hardware AND software, and the ability to manually 'relate' disparate components, are the critical things here.

giltjr Commented:
My understanding is that he wants to point software at a box that is running "application X" and have that software figure out everything that application is dependent on.  That is, if it accesses database Y, database W and application B, the software figures this out and then maps the "network" (switch/router/firewall) dependencies.  It should also figure out which DNS servers, LDAP servers and security/authentication servers (RADIUS, AD, etc.) it uses, and what the network dependencies are for those as well.  It should also be able to figure out that table 1 in database Y is on server 97 and table 2 in database Y is on server 132, along with all of the network dependencies there.

Unicenter can map networks to get a topology, just as many other (much less expensive) products can.  Neither it, nor anything else I know of, can read the code or configuration files of an application to figure out what the application is dependent on.

Mapping the network topology of the dependencies is actually the easy part.  You are thinking at a different level than what I believe he is thinking.  Say you have a Web server, which is a "service": how can Unicenter figure out all of the dependencies that Web server has?

With the proper agents installed in the proper places Unicenter can get a network topology, but it can't tell you that Apache running on serverX is connecting to LDAP on serverY, which is actually redirecting some of the LDAP requests to LDAP on serverZ.  Once you figure this out, Unicenter can tell you the network path between the servers.  Even with that there are limitations.  If you have non-managed switches, these will be transparent.

MS Visio is the quick, simple answer.  The visuals it creates are a very helpful aid for helping people and techs understand overall network flow and, of course, the specific flow of an application.

Now, I am uncertain as to what you specifically need with regard to the interdependencies.  Representing a problem that may arise and how it cascades through the infrastructure is mind-bogglingly difficult: a failure can have so many unique or not-so-unique repercussions that there is no easy way to represent them, if that was the goal of the question.

Working in a large networked environment, the closest thing that vaguely resembles what you may be looking for is a monitoring app such as HP OpenView, NetIQ or MS's MOM, which will report all of the major and minor occurrences.  The engineers and techs then use that information to resolve the issue at hand.

It would be nice to have something that could predict the chain of events leading to the eventual catastrophic failure of something, as an exercise, but that does appear to get into the realm of AI.  At present, you can definitely represent the flow of how things are supposed to work...

Sorry we missed your Q earlier.

Years ago, CA (Computer Associates) made a phenomenal networking program; I think it was called Unicenter, but I'm not 100% sure.  I got a chance to use it on a global network of tens of thousands of computers, and it was truly impressive.  It cost a fortune, as I recall, but you could, from California, investigate the traffic going across a switch in Brussels, Belgium, and see all the user nodes and topology connected to it.  I don't know if this product is still available, but it was a potent experience, indeed.

Other than a distributed network monitoring tool, this is going to involve a lot of legwork and cataloging on your part.  Since the topology is always changing, I suggest not getting involved with fixed flowcharting software, which "freezes" things in time.  You want to be able to see what is happening, day by day.  Keep in mind, the hardware cataloging will have to be done at each site to be accurate.

scrathcyboy Commented:
Also, here is a plethora of spin-off apps and comments on its use or effectiveness, so you don't have to rely just on the sales pitch -

Unicenter from CA, HP's OpenView, IBM's NetView, IR Prognosis (and other similar software) help monitor and manage the environment AFTER you have mapped the interdependencies, but they don't help you map it.

I don't know of any software that will do this for you automatically.  This requires being able to read the configuration files for all operating systems, software (both OEM-installed and custom-written applications) and devices, and making sense of them.

Actually you can "map" it from Unicenter.  It is not a one-button solution, but you can get the status of each node, and you can print those reports.  We ran a screen capture utility, which worked out even better, because you could immediately "snapshot" the state at any one time and archive it.  Those snapshots or printouts can give you the real-time running history of the network.  I think there is also a save function in Unicenter and a way to build this data into reports; in fact I am sure of it, but it has been a while since I used the program and I can't pinpoint it now.  However, YES, it will take work to map out the full topology of a global network; in fact, a LOT of work.

fostejo (Author) Commented:

Thanks for your suggestions so far.

giltjr is interpreting my requirements the most accurately at the moment.  So that he and everyone else can advise further, let me clarify:

a. I'm *not* looking for something to merely map out the topology - as giltjr says there are plenty of tools to do that out there.
b. I'm also *not* expecting a fully automatic 'one-click' solution - I don't expect any software to wander off into my environment and come back with a perfect listing of all the hardware and software inter-dependencies - I know that isn't going to happen.  I'm happy to spend the time manually putting the info in and "describing" the inter-dependencies to the software.  To use giltjr's example:  I'm happy to 'tell' the software that "Application X" is hosted on "Server Y" and that it uses a SQL back-end hosted on "Server Z" - I would also have 'told' the software that "Server Y" is connected to a particular switch stack, is at the end of a remote WAN link and relies upon "Server W" for DNS (etc. etc.).
c. I'm *not* looking for a monitoring solution to tell me what's happening in my environment - I've already got MOM etc. for that.

The main reasoning behind this is to be in a position to play "what if" - both for Disaster Recovery purposes and to gain a deeper understanding of the effect of a change that's made to the environment.  For instance, I'd like to be able to indicate exactly what services will be impacted when we take the WAN link down in two weeks' time or if our Primary DNS server were to fail.
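
For illustration only, the manual "describe it, then ask what-if" idea above can be sketched in a few lines of Python.  This is a minimal sketch and not any particular product: it assumes the dependencies are typed in by hand as a simple mapping, and it treats every dependency as hard (no redundancy).  The component names follow the example above; "Switch Stack 1", "WAN Link A" and the function name impacted_by are made up for the sketch.

```python
from collections import defaultdict, deque

# Hand-entered dependencies: component -> things it directly depends on.
# Names follow the example above; "Switch Stack 1" and "WAN Link A" are
# hypothetical placeholders.
DEPENDS_ON = {
    "Application X": ["Server Y", "SQL on Server Z"],
    "SQL on Server Z": ["Server Z"],
    "Server Y": ["Switch Stack 1", "WAN Link A", "DNS on Server W"],
    "DNS on Server W": ["Server W"],
}


def impacted_by(failed, depends_on):
    """Return every component that directly or indirectly depends on `failed`."""
    # Invert the mapping so we can walk from a component to its dependents.
    dependents = defaultdict(set)
    for component, requirements in depends_on.items():
        for requirement in requirements:
            dependents[requirement].add(component)

    # Breadth-first walk "upwards" from the failed component.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in impacted:
                impacted.add(dependent)
                queue.append(dependent)
    return impacted


if __name__ == "__main__":
    for failure in ("WAN Link A", "Server W"):
        print(failure, "->", sorted(impacted_by(failure, DEPENDS_ON)))
```

The answer is only as good as the relationships that were entered, which is why keeping the mapping up to date would matter more than the traversal itself.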

ECNSSMT: I take your very valid point about the infrastructure being dynamic, and you're quite right about it being very difficult (/impossible) to know for sure exactly what is impacted under all circumstances - this exercise will be limited to the information given to the "software" (i.e. the individual components, be they an application, a piece of hardware or a service) and their inter-relationships.

scrathcyboy: I also take your point about the info in the 'Hardware Catalog' being pretty accurate - in theory, as long as the inter-dependencies are kept up to date in the software as part of our normal "Change Process", which should be followed during every (controlled) change to our infrastructure, this shouldn't cause us too much of a headache.

Hope that helps clarify the requirements and thanks for your time so far !
Not sure about these, but they may work:

As I am sure you know, this is a big task.  Getting it set up for the first go-around will be difficult at best.  Depending on your size, it may be impossible to maintain.

ECNSSMT Commented:
Well, if you are looking for software to do this, it looks like I will not be getting the points for this one.  Assessing loss in this context is a chapter in implementing Disaster Recovery.  Due to the unpredictability of damaging or catastrophic events, it becomes a cost assessment of revenue lost per unit of downtime of the affected services.  You can probably build an inverse tree to show dependencies and attach a day-by-day cost in terms of lost productivity.  When I was working for the banks in the NYC financial sector, the two main thoughts were bringing failed services back online (either by replacing a component or the whole box) and maintaining zero downtime through replicated services.  And although servers did go down (best case being maintenance, worst case being an actual outage), the normal worst-case outage did not last more than 4 hours.  On 9/11 we were up and running at our DR site (which was our NJ office) within 24-48 hours, depending on who you talked to.
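
As a rough sketch of the "inverse tree with a day-by-day cost" idea mentioned above, the following Python adds up the daily cost of every service that depends on a failed component.  Everything in it is hypothetical: the service names, the dependency entries, the cost figures and the function names are made up for illustration, and it assumes the dependencies contain no loops.

```python
# Hypothetical daily cost of lost productivity per service (made-up figures).
DAILY_COST = {"Order Entry": 50_000, "Email": 8_000, "Intranet": 2_000}

# Hypothetical hand-entered dependencies: item -> what it directly needs.
NEEDS = {
    "Order Entry": ["DB Server", "Hub WAN Link"],
    "Email": ["Mail Server", "Hub WAN Link"],
    "Intranet": ["Web Server"],
    "DB Server": ["Core Switch"],
    "Mail Server": ["Core Switch"],
    "Web Server": ["Core Switch"],
}


def relies_on(item, target, needs):
    """True if `item` is `target` or needs it directly or indirectly.

    Assumes the dependency data is acyclic.
    """
    if item == target:
        return True
    return any(relies_on(dep, target, needs) for dep in needs.get(item, []))


def outage_cost(failed, days, needs, daily_cost):
    """Total cost of `failed` being down for `days`, summed over affected services."""
    affected = [svc for svc in daily_cost if relies_on(svc, failed, needs)]
    return sum(daily_cost[svc] for svc in affected) * days


if __name__ == "__main__":
    print(outage_cost("Hub WAN Link", 2, NEEDS, DAILY_COST))
    print(outage_cost("Core Switch", 1, NEEDS, DAILY_COST))
```

A real assessment would need agreed cost figures per service; the arithmetic is the easy part.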

From the information presented by giltjr it appears that this may be a fledgling technology that is packaged with various products.  It's a very interesting item to explore.  It looks very nice...

>The main reasoning behind this is to be in a position to play "what if" - both for Disaster Recovery purposes along with gaining a deeper understanding of
>the effect of a change that's made to the environment.  For instance, I'd like to be able to indicate exactly what services will be impacted when we take the
>WAN link down in two weeks time or if our Primary DNS server were to fail.

In terms of the what-ifs and the whole of your paragraph, that is what I believe the system analysts, system engineers and test systems are for; portions of this are part of what I do.

ECNSSMT, I think "fledgling" is being kind. I have no clue what software packages they support (WebSphere, WebLogic, IIS, Oracle, Apache, etc.) or what DBMSs they support and so on.

I can't remember the product, but I know I have used one where you map your network and then do "what if ..."  But this was real network-type stuff: WAN and LAN connections.  I don't think it knew anything about servers or server services.  More like, what happens if this link goes down?  What happens if I lose 1 T1 where I have 4 bonded together?

This type of "what if ..." is really more of a manual process.  This is also called finding your SPOFs (Single Points of Failure).

This gets down to even looking at power, like having one supplier of electricity, only one power feed and only one generator.  How about all of the telco services having a single entrance into your building?
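
To make the SPOF hunt above a little more concrete, here is a minimal Python sketch that takes redundancy into account.  It assumes each item lists "requirement groups", where any one member of a group is enough to keep things running, and it then fails each component in turn to see whether the service survives.  All names in the data (Web App, Server A, Utility Feed and so on) and both function names are hypothetical, and the data is assumed to have no circular dependencies.

```python
# Hypothetical, hand-entered data: item -> list of requirement groups.
# Each group is a set of alternatives; ONE working member satisfies the group.
REQUIRES = {
    "Web App": [{"Server A", "Server B"}, {"DNS 1"}],
    "Server A": [{"Utility Feed", "Generator"}],
    "Server B": [{"Utility Feed", "Generator"}],
    "DNS 1": [{"Utility Feed", "Generator"}, {"Telco Entrance"}],
}


def is_up(component, failed, requires):
    """A component is up if it has not failed and every requirement group
    has at least one member that is up.  Assumes no circular dependencies."""
    if component in failed:
        return False
    return all(
        any(is_up(member, failed, requires) for member in group)
        for group in requires.get(component, [])
    )


def single_points_of_failure(service, requires):
    """List components whose failure, on its own, takes `service` down."""
    components = set(requires)
    for groups in requires.values():
        for group in groups:
            components.update(group)
    components.discard(service)  # the service failing itself is not interesting
    return sorted(c for c in components if not is_up(service, {c}, requires))


if __name__ == "__main__":
    print(single_points_of_failure("Web App", REQUIRES))
```

In this made-up data the redundant servers and the power pair drop out of the list, while the lone DNS server and the single telco entrance remain, which is the kind of result a manual SPOF review is looking for.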

DR planning is still mainly a manual process.  The speed of your recovery depends on how well you plan and the scope of the disaster.  I know some companies that can be up and running in 15 minutes at their DR site.  However, they spend more on DR than most companies spend on their entire IT budget.

fostejo (Author) Commented:

While you've not given a specific 'answer' as such, your input is appreciated and triggered several other lines of enquiry, so I've split the points between the three of you.

I've managed to bump into a product that potentially fits the bill (according to its marketing blurb anyway!) so will be having a closer look at it - if you're interested, it's IBM Tivoli Application Dependency Discovery Manager, which seems to be able to carry out some form of automated dependency discovery, for applications at least.

Once again, thanks for your help !

You need to look at this very carefully.  At one time there was a limited number of products that it could "read" the configuration information from and find stuff.  It was also supposed to be a bear and a half to actually use, and I had heard there were other issues.

Hi giltjr,

Sorry for the long time in between responses.  If fostejo didn't PAQ I would have never responded.  (Side gripe: I can't believe all the notifications I get from this site.  I think I'm spamming myself, so I have the notifications off per the threads.)

You are right, DR is still a manual process and you are only as good as what you set up....  DR IT budgets ... sheesh... that's a discipline unto itself.  I think we are only good for a 2 - 4 hour turnaround ourselves.

And fostejo; thanks for the points.

Question has a verified solution.
