[Product update] Infrastructure Analysis Tool is now available with Business Accounts.Learn More
Experts Exchange Solution brought to you by
"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.
Map-Reduce jobs are used to extract, transform and load data from one store to another. Hadoop can act as a complex ETL mechanism to migrate data in various forms via one or more MapReduce jobs that pull the data from one store, apply multiple transformations (applying new data layouts or other aggregation) and loading the data to another store. This approach can be used to move data from or to MongoDB, depending on the desired result.
I started with a simple example of taking 1 minute time series intervals of stock prices with the opening (first) price, high (max), low (min), and closing (last) price of each time interval and turning them into 5 minute intervals (called OHLC bars). The 1-minute data is stored in MongoDB and is then processed in Hive or Spark via the MongoDB Hadoop Connector, which allows MongoDB to be an input or output to/from Hadoop and Spark.
Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.
I saw the appeal of Spark from my first introduction. It was pretty easy to use. It is also especially nice in that it has operations that run on all elements in a list or a matrix of data. .....
The downside is that it certainly is new and I seemed to run into a non-trival bug (SPARK-5361 now fixed in 1.2.2+) that prevented me from writing from pyspark to a Hadoop file (writing to Hadoop & MongoDB in Java & Scala should work). Also I found it hard to visualize the data as I was manipulating it. It reminded me of my college days being frustrated debugging matrices
Probably more importantly is that, once you analyze data in Hadoop, the work of reporting and operationalizing the results often need to be done. The MongoDB Hadoop Connector makes it easy to process results and put them into MongoDB, for blazing fast reporting and querying with all the benefits of an operational database.....
Overall, the benefit of the MongoDB Hadoop Connector, is combining the benefits of highly parallel analysis in Hadoop with low latency, rich querying for operational purposes from MongoDB and allowing technology teams to focus on data analysis rather than integration.
From novice to tech pro — start learning today.
Premium members can enroll in this course at no extra cost.