Hi, does anyone have suggestions for collecting data from the web? I want to collect layman and/or professional discussions of illnesses that people are experiencing. Lots of discussion groups have this kind of information, and there are probably many other sites as well. For starters, I want to collect all the text in a discussion group thread and treat that as one item, then collect all the text in the next thread as item 2, and so on, to build up a large database of, say, 10,000 or 20,000 items/threads.
I will need some way to collect the information and some way to store it, with each thread/item being a row or case in the database.
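To make the storage half concrete, here is a rough sketch of the one-row-per-thread layout, assuming Python and its built-in sqlite3 module. The table name `threads`, its columns, and the helper names are all made up for illustration, and the collection step (downloading and parsing the pages) is not shown; the posts are assumed to arrive as a list of plain-text strings.

```python
import sqlite3


def thread_to_item(posts):
    """Join all posts of one thread into a single text item, skipping blanks."""
    return "\n\n".join(p.strip() for p in posts if p.strip())


def open_store(path=":memory:"):
    """Open (or create) the database; ':memory:' is only for trying it out."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS threads ("
        " id INTEGER PRIMARY KEY,"   # item number, assigned automatically
        " source_url TEXT UNIQUE,"   # where the thread came from
        " body TEXT NOT NULL)"       # all text of the thread as one item
    )
    return conn


def save_thread(conn, source_url, posts):
    """Store one thread as one row; a URL that is already stored is skipped."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR IGNORE INTO threads (source_url, body) VALUES (?, ?)",
            (source_url, thread_to_item(posts)),
        )


if __name__ == "__main__":
    conn = open_store()
    save_thread(conn, "http://example.org/thread/1", ["First post.", "A reply."])
    for row in conn.execute("SELECT id, source_url, body FROM threads"):
        print(row)
```

Keeping the source URL unique means re-running a collection job should not create duplicate items, which seems useful when building up 10,000 or more threads over several runs.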
Does anyone have any recommendations about how to do this or where to start?
Thanks very much!