Hadoop is a giant redundant distributed file storage system
I am guessing its main use is when a large number of clients sending http requests to Apache servers, the apache ervers can store the requests in the HDFS and send the requests asynchronously to the back end servers. The back end servers can then send those requests back through the HDFS to the apache servers. Therefore this configuration allow High Availability as it allows the HDFS to contain the messages and therefore requests will not fail if a single (or even multiple apache servers go down.
Is this more or less correct ?