Solved

Multiple instances of Node.js script

Posted on 2014-12-23
Last Modified: 2016-02-10
I am starting my first Node.js project and want to try building a web scraper. I plan to use a MongoDB instance to store the URLs and filters for what I want to scrape, and Node.js to process the tasks I send to it. I imagine the process as follows:

1. I manually add URLs to a task queue table in MongoDB.
2. My script runs constantly in a loop and checks the database for new tasks.
3. If a new task is found, the Node.js script starts an instance of my downloader script to start downloading data from the task URL.
4. While the first task is working, I want the main script to check whether the database has any additional records and start the downloader script as a new instance. Let's assume I want up to 5 instances running at a time.
5. After an instance of the downloader script finishes, it stores the downloaded data for later processing by a different script and marks the task queue item as complete.

As a Node.js beginner I still have much to learn, but is there a particular design pattern I should follow to allow this multi-threaded/asynchronous operation?
Question by:OriNetworks

Assisted Solution

by:Dave Baldwin
Node doesn't run your JavaScript on multiple threads; it uses a single-threaded event loop with asynchronous I/O (see http://nodejs.org/about/). Also, many sites these days use JavaScript to load the page content after the page is loaded in the browser. That makes them nearly impossible to scrape because, even though Node.js runs JavaScript, it will not by itself run the JavaScript that exists in a web page.
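
To illustrate the single-threaded point: because I/O in Node is asynchronous, one process can still have several downloads in flight at once. Below is a minimal sketch, not a definitive implementation, of capping that at 5 as the question asks. The names runWithLimit and worker are made up for illustration; worker stands in for whatever function downloads one URL and returns a promise.

```javascript
// Run worker(item) for every item, with at most `limit` calls in flight.
// Results come back in the same order as `items`.
async function runWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item nobody has picked up yet

  // Each "lane" pulls the next unclaimed item until none are left.
  // No locking is needed: JavaScript runs on one thread, so `next++`
  // cannot be interleaved with another lane.
  async function lane() {
    while (next < items.length) {
      const i = next++;
      results[i] = await worker(items[i]);
    }
  }

  const laneCount = Math.min(limit, items.length);
  const lanes = Array.from({ length: laneCount }, () => lane());
  await Promise.all(lanes);
  return results;
}

// Usage (downloadUrl is a hypothetical function returning a promise):
// runWithLimit(urls, 5, downloadUrl).then((pages) => { /* store pages */ });
```

Note that if worker rejects, the returned promise rejects too, so a real scraper would probably catch errors inside worker and record them per task.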

Accepted Solution

by:clockwatcher
Pages that are built dynamically with JavaScript are sometimes easier to scrape than traditionally rendered pages. In many cases, JSON is now used to serialize and transfer data back and forth, and JSON is much easier to deal with than HTML. The difficult part nowadays is usually session management and endpoint determination. But the process is just different; I wouldn't say it's any harder or easier. It just depends on the page and what you're after capturing.
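
As a rough illustration of why JSON is easier to handle: assuming a page loads its data from an endpoint that returns JSON, the scraper can request that endpoint directly and parse the body instead of picking apart HTML. The response shape ({ items: [{ title, url }] }) and the function names are hypothetical, and fetchBody relies on the global fetch available in Node 18 and later.

```javascript
// Fetch the raw response body from an endpoint (requires Node 18+ for fetch).
async function fetchBody(endpointUrl) {
  const res = await fetch(endpointUrl, {
    headers: { Accept: 'application/json' },
  });
  if (!res.ok) {
    throw new Error('Request failed with status ' + res.status);
  }
  return res.text();
}

// Pull the fields of interest out of a JSON body.
// The { items: [{ title, url }] } shape is an assumption for this sketch;
// a real endpoint will have its own structure.
function extractItems(body) {
  const data = JSON.parse(body);
  const items = Array.isArray(data.items) ? data.items : [];
  return items.map((item) => ({ title: item.title, url: item.url }));
}

// Usage (the endpoint URL is whatever the page itself calls):
// fetchBody(endpointUrl).then(extractItems).then(console.log);
```

Finding the endpoint, and keeping the session cookies or tokens it expects, is the part that still has to be worked out per site.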

As for the question, I would suggest simply having a set of worker processes that run constantly and poll your MongoDB task table, rather than a master process that polls and then spawns new ones. If your master process goes down, your whole system goes down. If you're going to have a master process, it should be one that simply maintains the workers: it keeps a count of the workers and, if one of them dies, spawns a new one. It shouldn't be polling. Let the workers do that.
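
A master that only maintains the workers could look something like the sketch below. This is one possible shape, not a definitive implementation: superviseWorkers is a made-up name, and spawn is any function that returns an object emitting an 'exit' event, which is what child_process.fork() gives you. worker.js in the usage comment is a hypothetical script that does the polling.

```javascript
// Keep `count` workers alive: start them, and replace any that exits.
// `spawn` must return an EventEmitter that emits 'exit' when the worker dies
// (a ChildProcess returned by child_process.fork() does).
function superviseWorkers(spawn, count) {
  const workers = new Set();
  let stopped = false;

  function startOne() {
    const child = spawn();
    workers.add(child);
    child.once('exit', () => {
      workers.delete(child);
      // Replace the dead worker unless we are shutting down.
      if (!stopped) startOne();
    });
  }

  for (let i = 0; i < count; i++) startOne();

  return {
    size: () => workers.size,       // how many workers are currently tracked
    stop: () => { stopped = true; } // stop replacing workers that exit
  };
}

// Usage (worker.js is hypothetical):
// const { fork } = require('child_process');
// superviseWorkers(() => fork('./worker.js'), 5);
```

The master never touches the database here. One thing this sketch leaves out is a delay before respawning, so a worker that crashes immediately on startup would be restarted in a tight loop.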

Google around for a publisher/subscriber (pub/sub with multiple subscribers) setup for MongoDB using Node.js. There has to be an example or two out there.
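
Whatever setup you end up with, the part that makes multiple subscribers safe is that each worker claims a task atomically, so two workers never pick up the same one. Here is a minimal sketch under stated assumptions: tasks is a Collection from the official MongoDB Node.js driver, version 6 or later (where findOneAndUpdate resolves to the document or null); the field names status, claimedAt, data and error, and the download function, are hypothetical.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Atomically flip one pending task to "running" and return it.
// Assumes driver v6+, where this resolves to the updated document or null.
async function claimTask(tasks) {
  return tasks.findOneAndUpdate(
    { status: 'pending' },
    { $set: { status: 'running', claimedAt: new Date() } },
    { returnDocument: 'after' }
  );
}

// Poll for tasks until shouldStop() returns true.
// `download` is a hypothetical async function: url -> downloaded data.
async function runWorker(tasks, download, options = {}) {
  const pollMs = options.pollMs || 1000;
  const shouldStop = options.shouldStop || (() => false);

  while (!shouldStop()) {
    const task = await claimTask(tasks);
    if (!task) {
      await sleep(pollMs); // nothing to do, wait before polling again
      continue;
    }
    try {
      const data = await download(task.url);
      // Store the data for later processing and mark the task complete.
      await tasks.updateOne(
        { _id: task._id },
        { $set: { status: 'complete', data: data } }
      );
    } catch (err) {
      await tasks.updateOne(
        { _id: task._id },
        { $set: { status: 'failed', error: String(err) } }
      );
    }
  }
}
```

Because the find and the status change happen in one operation, running five copies of this worker gives you the five concurrent downloads without a polling master. A task whose worker dies mid-download would stay "running"; the claimedAt timestamp is there so something could later detect and reset such tasks.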