Solved

Macports Import beautifulsoup4 Problem

Posted on 2016-07-30
Last Modified: 2016-08-01
The screenshot below shows my list of installed ports, and beautifulsoup4 is one of them. But when I try to import it, I get an error. Please help.

https://gyazo.com/3415f2e1e28c095cf2c225ab7dcdbc79

Thanks,
Question by:sharingsunshine
 
LVL 40

Expert Comment

by:Eoin OSullivan
ID: 41736249
Check which modules are installed in Python and what they are called.
In the Python shell, type:
help('modules')


I think the BeautifulSoup version in MacPorts is v3:
https://trac.macports.org/browser/trunk/dports/python/py-beautifulsoup/Portfile

In that case the command should probably be
import BeautifulSoup
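
To confirm which generation of Beautiful Soup the interpreter can actually see, a quick check along these lines may help. This is only a sketch: the helper name `available_soup_modules` is made up for illustration, and it assumes the usual import names, `bs4` for Beautiful Soup 4 and `BeautifulSoup` for version 3.

```python
import importlib


def available_soup_modules(names=("bs4", "BeautifulSoup")):
    """Return {import_name: True/False} for each candidate module name.

    Hypothetical helper, not part of any library.
    """
    found = {}
    for name in names:
        try:
            importlib.import_module(name)
            found[name] = True
        except Exception:
            # Any failure to import is treated as "not usable here".
            found[name] = False
    return found


# Shows which of the two import names work in this interpreter.
print(available_soup_modules())
```

Run it with the same Python that runs the failing script; MacPorts, the system Python, and IDLE can each point at a different site-packages directory.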

 

Author Comment

by:sharingsunshine
ID: 41736884
Here is what I see
https://gyazo.com/f57bd11a6165e5991a7f9e66a841fcde

When I do this
import BeautifulSoup

I get this

>>> 
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 25, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 12, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
TypeError: 'module' object is not callable
>>> 

Here is the code:
from requests import get
import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print (str(len(totText))) + ' posts were found'
    (url, nextPageNum) = getPage(html, nextPageNumRE)

 
LVL 40

Accepted Solution

by:
Eoin OSullivan earned 500 total points
ID: 41737355
Try changing the import line to
from BeautifulSoup import BeautifulSoup
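
The reason this helps: in version 3 the module and the class inside it share the name BeautifulSoup, so a plain `import BeautifulSoup` binds the name to the module, and calling a module raises the TypeError shown above. The same behaviour can be reproduced with the standard library's `datetime`, which also has a class named after its module. The sketch below uses `datetime` only as an analogy; `try_call` is a made-up helper.

```python
import datetime as datetime_module            # the module object
from datetime import datetime as datetime_class  # the class inside it


def try_call(obj):
    """Call obj(2016, 7, 30) and report 'ok' or the TypeError message."""
    try:
        obj(2016, 7, 30)
        return "ok"
    except TypeError as exc:
        return str(exc)


module_result = try_call(datetime_module)  # calling a module fails
class_result = try_call(datetime_class)    # calling the class works

print(module_result)
print(class_result)
```

The first call fails with a "not callable" message because a module is not a function or class; the second succeeds because the name now refers to the class itself.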


 

Author Comment

by:sharingsunshine
ID: 41737379
Here is the output from that change:
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 26, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 13, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 174, in goahead
    k = self.parse_declaration(i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1463, in parse_declaration
    j = SGMLParser.parse_declaration(self, i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1448, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1381, in _toStringSubclass
    self.endData(subclass)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'



Here is the changed code:
from requests import get
# import BeautifulSoup
from BeautifulSoup import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print (str(len(totText))) + ' posts were found'
    (url, nextPageNum) = getPage(html, nextPageNumRE)



This may be past my original issue, but I can't tell, since the traceback references a line number that is outside the range of my code. If it is, say the word and I will award you the points, since you got me past the BeautifulSoup hurdle.
 
LVL 40

Expert Comment

by:Eoin OSullivan
ID: 41737403
Your code is now successfully calling the BeautifulSoup module. The error now lies inside the BeautifulSoup code, which is why the line numbers refer to that module (BeautifulSoup.py, line 1251):

  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'

I'm afraid I'm not in a position to debug this, as I'm not running BeautifulSoup on my Mac and I'd have to do that to replicate it. It could well be that the code is written for beautifulsoup4 but MacPorts is using beautifulsoup3.
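
One way to see how a version mismatch could produce exactly this message: in Beautiful Soup 4 the second positional argument is the parser name ('html5lib' here), whereas the traceback suggests the version 3 constructor stores its second positional argument as `parseOnlyThese` and later reads `.text` from it. The sketch below is a hypothetical stand-in that imitates only that behaviour; `Bs3LikeParser` and `describe_failure` are invented names and none of this is the library's actual code.

```python
class Bs3LikeParser(object):
    """Hypothetical stand-in for a v3-style constructor (NOT library code)."""

    def __init__(self, markup="", parseOnlyThese=None):
        self.markup = markup
        # A string such as 'html5lib' passed second ends up here.
        self.parseOnlyThese = parseOnlyThese
        self._end_data()

    def _end_data(self):
        # Imitates the attribute access on the line shown in the traceback.
        if self.parseOnlyThese and not self.parseOnlyThese.text:
            pass


def describe_failure(*args):
    """Return the AttributeError message, or None if construction works."""
    try:
        Bs3LikeParser(*args)
        return None
    except AttributeError as exc:
        return str(exc)


print(describe_failure("<p>hi</p>", "html5lib"))  # string used as a filter
print(describe_failure("<p>hi</p>"))              # no second argument
```

If that is what is happening, the script also relies on `find_all` and `get_text`, which are version 4 method names, so installing Beautiful Soup 4 for this Python 2.7 and importing it with `from bs4 import BeautifulSoup` is probably a better fix than adapting the code to version 3.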
 

Author Closing Comment

by:sharingsunshine
ID: 41737459
Thanks for getting me this far and letting me know where to look next.