how to make a sparse database

Asked by onyourmark in Python Scripting Language

Tags: Python

Hi, I am trying to represent a table (like an Excel sheet, for example). The project is to crawl the web, pull in the text of web pages, and insert all the words into a table. The crawler reads a page and then breaks it up into a list of all the words on the page. I am using Beautiful Soup and urllib2.
The top row of the table is a list, without duplicates, of all the words in all the pages. Each row represents one URL, and the entry in each column is the number of times that word appears on that particular page (the URL represented by that row). I will crawl many (maybe 5000) pages, so the top row of this table is the list of all the words in those 5000 pages. There may be something like 4 or 5 thousand different words, but any particular page, and therefore any particular row of this table, will probably contain only a small subset (say 400) of these words. Thus this table will be a sparse table or matrix.
I don't know if I should use a dictionary, a shelf, or some other structure to do this. If it is a dictionary, would the keys be the URLs and the values be lists? Or should the keys be the 4 or 5 thousand different words, and the values be another dictionary with the keys being the URLs and the values being a list of the ???
I am confused. Should I just have a dictionary with the keys being the URLs and the values being a list of the frequencies? That sounds like what I want, but it does not include the list of all the possible words (the 4 or 5 thousand words).
Any help would be appreciated.
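
To make the table described above concrete, here is a minimal sketch of one possible layout, not a definitive answer: a dictionary keyed by URL whose values are small dictionaries of word counts, plus a separate sorted list of all words for the "top row". The names `build_counts`, `dense_row`, and the sample `pages` input are hypothetical and are not part of the crawler code.

```python
from collections import defaultdict

def build_counts(pages):
    """pages: dict mapping url -> list of words on that page (hypothetical input)."""
    counts = {}          # url -> {word: count}; words not present are implicit zeros
    vocabulary = set()   # the "top row": every distinct word seen on any page
    for url, words in pages.items():
        row = defaultdict(int)
        for word in words:
            row[word] += 1
        counts[url] = dict(row)
        vocabulary.update(words)
    return counts, sorted(vocabulary)

def dense_row(counts, vocabulary, url):
    """Expand one sparse row into a full list of counts, in vocabulary order."""
    row = counts[url]
    return [row.get(word, 0) for word in vocabulary]

# Made-up sample data, only to show the shape of the structure
pages = {
    'http://example.com/a': ['python', 'web', 'python'],
    'http://example.com/b': ['web', 'crawler'],
}
counts, vocabulary = build_counts(pages)
```

With this layout each row stores only the words that actually occur on the page (about 400 entries rather than 4 or 5 thousand), and a full row can still be produced on demand with `dense_row`.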

Here is some of the code (it is not mine)

def addtoindex(self, url, soup):
    # Skip pages that have already been indexed
    if self.isindexed(url):
        return
    print('Indexing ' + url)

    # Get the individual words
    text = self.gettextonly(soup)
    words = self.separatewords(text)

    # Get the URL id
    urlid = self.getentryid('urllist', 'url', url)

    # Link each word to this url, recording its position on the page
    for location, word in enumerate(words):
        if word in ignorewords:
            continue
        wordid = self.getentryid('wordlist', 'word', word)
        self.con.execute(
            "insert into wordlocation(urlid,wordid,location) "
            "values (%d,%d,%d)" % (urlid, wordid, location))
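
Since the code above stores one `wordlocation` row per word occurrence, the per-page counts could in principle be derived by grouping those rows rather than by building a separate table. The following is only a sketch under assumptions: it uses an in-memory `sqlite3` database (the original does not say which database `self.con` is), made-up ids, and a hypothetical helper named `word_counts`.

```python
import sqlite3

# Assumption: sqlite3 stands in for whatever database self.con refers to
con = sqlite3.connect(':memory:')
con.execute('create table wordlocation(urlid integer, wordid integer, location integer)')

# Made-up sample rows: (urlid, wordid, location)
rows = [(1, 10, 0), (1, 11, 1), (1, 10, 2), (2, 11, 0)]
con.executemany('insert into wordlocation(urlid,wordid,location) values (?,?,?)', rows)

def word_counts(con):
    """Return {urlid: {wordid: count}}, holding only the nonzero entries."""
    counts = {}
    query = ('select urlid, wordid, count(*) from wordlocation '
             'group by urlid, wordid')
    for urlid, wordid, n in con.execute(query):
        counts.setdefault(urlid, {})[wordid] = n
    return counts

counts = word_counts(con)
```

The result has the same sparse shape as a dictionary of dictionaries keyed by URL, only with ids in place of the URL and word strings.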
03/02/08 07:36 AM, ID: 21026291, Accepted Solution

About this solution

Zone: Python Scripting Language
Tags: Python
Solution Provided By: ramrom
Participating Experts: 1
Solution Grade: A

02/29/08 08:56 AM, ID: 21015096, Expert Comment
02/29/08 09:16 AM, ID: 21015316, Author Comment
02/29/08 09:26 AM, ID: 21015434, Administrative Comment
02/29/08 09:29 AM, ID: 21015471, Expert Comment
02/29/08 09:40 AM, ID: 21015584, Author Comment
03/01/08 05:32 AM, ID: 21021498, Expert Comment
03/01/08 05:55 AM, ID: 21021641, Expert Comment
03/01/08 05:10 PM, ID: 21024475, Author Comment
03/01/08 07:59 PM, ID: 21024878, Author Comment
03/03/08 03:39 AM, ID: 21030537, Author Comment
 