• Status: Solved

Macports Import beautifulsoup4 Problem

The screenshot below shows my list of installs, and beautifulsoup4 is one of them. But when I try to import it, I get an error. Please help.

https://gyazo.com/3415f2e1e28c095cf2c225ab7dcdbc79

Thanks,
sharingsunshine
1 Solution
 
Eoin OSullivan (Consultant) Commented:
Check which modules are installed in Python and what they are called.
In the Python shell, type:
help('modules')
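
The full help('modules') listing can be long. As a narrower check, a minimal sketch like the one below filters the importable top-level modules by name. The helper name find_modules is made up for illustration, and the "soup" search term is only an assumption about what to look for.

```python
import pkgutil


def find_modules(fragment):
    """Return sorted names of importable top-level modules containing `fragment`."""
    fragment = fragment.lower()
    names = set()
    # iter_modules() yields (finder, name, is_package) for everything on sys.path
    for _finder, name, _is_package in pkgutil.iter_modules():
        if fragment in name.lower():
            names.add(name)
    return sorted(names)


# Beautiful Soup 3 installs a module named "BeautifulSoup";
# Beautiful Soup 4 installs a package named "bs4".
print(find_modules("soup"))
print(find_modules("bs4"))
```

Whichever of the two names shows up is the one to use in the import statement.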


I think that the beautifulsoup version in macports is v3:
https://trac.macports.org/browser/trunk/dports/python/py-beautifulsoup/Portfile

In that case, the command should probably be:
import BeautifulSoup
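
Since it is not certain which release is installed, one hedged way to find out is to try the Beautiful Soup 4 package name (bs4) first and fall back to the version 3 module name (BeautifulSoup). The BS_MAJOR variable is only an illustrative name, not part of either library.

```python
# Try Beautiful Soup 4 first, then fall back to Beautiful Soup 3.
try:
    from bs4 import BeautifulSoup  # Beautiful Soup 4 package name
    BS_MAJOR = 4
except ImportError:
    try:
        from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 module name
        BS_MAJOR = 3
    except ImportError:
        # Neither release is importable from this interpreter.
        BeautifulSoup = None
        BS_MAJOR = None

print(BS_MAJOR)
```

If this prints None, the interpreter running the script probably is not the one the package was installed for.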

 
sharingsunshine (Author) Commented:
Here is what I see
https://gyazo.com/f57bd11a6165e5991a7f9e66a841fcde

When I do this
import BeautifulSoup

I get this

>>> 
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 25, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 12, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
TypeError: 'module' object is not callable
>>> 

Here is the code:
from requests import get
import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print(str(len(totText)) + ' posts were found')
    (url, nextPageNum) = getPage(html, nextPageNumRE)

 
Eoin OSullivan (Consultant) Commented:
Try changing the import line to
from BeautifulSoup import BeautifulSoup
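
The reason this form is needed: a plain "import BeautifulSoup" binds the name to the module, and "TypeError: 'module' object is not callable" is what Python raises when a module is called like a function. The standard library's datetime module has the same layout (a module and a class sharing one name), so it serves as a stand-in in this sketch and runs without Beautiful Soup installed. The variable names are illustrative.

```python
import datetime as datetime_module

# Calling the module itself fails, just like BeautifulSoup(...) after
# a plain "import BeautifulSoup".
try:
    datetime_module(2020, 1, 1)
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# Importing the class out of the module gives a callable name.
from datetime import datetime

moment = datetime(2020, 1, 1)
print(moment.year)
```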

 
sharingsunshine (Author) Commented:
Here is the output from that change:
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 26, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 13, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 174, in goahead
    k = self.parse_declaration(i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1463, in parse_declaration
    j = SGMLParser.parse_declaration(self, i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1448, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1381, in _toStringSubclass
    self.endData(subclass)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'


Here is the changed code:
from requests import get
# import BeautifulSoup
from BeautifulSoup import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print(str(len(totText)) + ' posts were found')
    (url, nextPageNum) = getPage(html, nextPageNumRE)


This may be past my original issue, but I can't tell, since it references a line number that is outside the range of my code. If it is, say the word and I will award you the points, since you got me past the BeautifulSoup hurdle.
 
Eoin OSullivan (Consultant) Commented:
Your code is now successfully calling the BeautifulSoup module. The error now lies INSIDE the BeautifulSoup code, which is why the line numbers refer to that module (BeautifulSoup.py, line 1251).

  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
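
One plausible reading of this traceback, offered as an assumption rather than a verified diagnosis: in Beautiful Soup 3 the second positional argument of the constructor is parseOnlyThese, not a parser name, so the string 'html5lib' lands where the library expects an object with a .text attribute. The class below is a hypothetical stand-in written only to reproduce that failure mode. It is not the real library code, and the names LegacySoupStandIn and end_data are invented.

```python
class LegacySoupStandIn(object):
    """Hypothetical stand-in, NOT the real Beautiful Soup 3 class."""

    def __init__(self, markup, parseOnlyThese=None):
        self.markup = markup
        # A parser name passed as the second positional argument ends up here.
        self.parseOnlyThese = parseOnlyThese

    def end_data(self):
        # Loosely mirrors the attribute access shown in the traceback.
        if self.parseOnlyThese and not self.parseOnlyThese.text:
            return "skipped"
        return "kept"


try:
    LegacySoupStandIn("<p>hi</p>", "html5lib").end_data()
except AttributeError as exc:
    message = str(exc)

print(message)
```

Under that reading, dropping the 'html5lib' argument would get past this particular error on version 3, though later calls written for version 4 may still fail.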

I'm afraid I'm not in a position to debug this, as I'm not running beautifulsoup on my Mac and I'd have to do that to replicate it. It could well be that the code is written for beautifulsoup4 but macports is using beautifulsoup3.
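
If Beautiful Soup 4 does turn out to be importable, the parsing step from the script might look like the sketch below. It assumes the bs4 package is present, uses the built-in "html.parser" instead of html5lib so that no extra parser is required, and runs on an inline sample string rather than the live site. SAMPLE_HTML and extract_entries are illustrative names. When bs4 is missing, the function returns None instead of failing.

```python
try:
    from bs4 import BeautifulSoup  # Beautiful Soup 4
except ImportError:
    BeautifulSoup = None  # bs4 is not installed for this interpreter

# Inline sample standing in for the downloaded page (an assumption).
SAMPLE_HTML = (
    '<div class="entry-body">First post</div>'
    '<div class="entry-body">Second post</div>'
)


def extract_entries(html_string):
    """Return the text of each entry-body div, or None if bs4 is unavailable."""
    if BeautifulSoup is None:
        return None
    soup = BeautifulSoup(html_string, "html.parser")
    tags = soup.find_all("div", {"class": "entry-body"})
    return [tag.get_text() for tag in tags]


entries = extract_entries(SAMPLE_HTML)
print(entries)
```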
 
sharingsunshine (Author) Commented:
Thanks for getting me this far and letting me know where to look next.