Solved

MacPorts: Problem Importing beautifulsoup4

Posted on 2016-07-30
Last Modified: 2016-08-01
The screenshot below shows my list of installed ports, and beautifulsoup4 is one of them. But when I try to import it, I get an error. Please help.

https://gyazo.com/3415f2e1e28c095cf2c225ab7dcdbc79

Thanks,
Question by:sharingsunshine
 
LVL 39

Expert Comment

by:Eoin OSullivan
ID: 41736249
Check which modules are installed in Python and what they are called.
In the Python shell, type:
help('modules')
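
As a rough alternative to scanning the full help('modules') listing, the sketch below tries to import a short list of candidate names and reports which ones succeed. The helper name `which_importable` and the candidate list are illustrative assumptions, not part of the original advice; `bs4` is the import name used by Beautiful Soup 4 and `BeautifulSoup` is the one used by Beautiful Soup 3.

```python
import importlib


def which_importable(names):
    """Return the module names from `names` that import without ImportError."""
    found = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # Not installed for this interpreter (or not on its path).
            continue
        found.append(name)
    return found


# Candidate import names: Beautiful Soup 4 first, then Beautiful Soup 3.
candidates = ["bs4", "BeautifulSoup"]
print(which_importable(candidates))
```

Run it with the same interpreter that runs the failing script, since each Python installation has its own site-packages.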



I think the Beautiful Soup version in this MacPorts port is v3:
https://trac.macports.org/browser/trunk/dports/python/py-beautifulsoup/Portfile

In that case the import statement should probably be:
import BeautifulSoup

 

Author Comment

by:sharingsunshine
ID: 41736884
Here is what I see
https://gyazo.com/f57bd11a6165e5991a7f9e66a841fcde

When I do this
import BeautifulSoup

I get this

>>>
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 25, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 12, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
TypeError: 'module' object is not callable
>>>

Here is the code:
from requests import get
import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print (str(len(totText))) + ' posts were found'
    (url, nextPageNum) = getPage(html, nextPageNumRE)

 
LVL 39

Accepted Solution

by:
Eoin OSullivan earned 500 total points
ID: 41737355
Try changing the import line to:
from BeautifulSoup import BeautifulSoup
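
Why this probably helps: `import BeautifulSoup` binds the name to the module, and calling a module raises the TypeError shown earlier, whereas `from BeautifulSoup import BeautifulSoup` binds the name to the class inside it. The sketch below reproduces the same pattern with the standard library's `datetime` module, which also contains a class of the same name, so it runs without Beautiful Soup installed. The helper `try_call` and the alias names are made up for illustration.

```python
import datetime as datetime_module               # the module itself
from datetime import datetime as datetime_class  # the class inside the module


def try_call(obj):
    """Call obj(2016, 7, 30) and describe what happened."""
    try:
        obj(2016, 7, 30)
    except TypeError as exc:
        return "TypeError: %s" % exc
    return "ok"


# A module is not callable, so this reports a TypeError.
print(try_call(datetime_module))
# The class is callable, so this reports "ok".
print(try_call(datetime_class))
```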


 

Author Comment

by:sharingsunshine
ID: 41737379
Here is the output from that change:
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 26, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 13, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 174, in goahead
    k = self.parse_declaration(i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1463, in parse_declaration
    j = SGMLParser.parse_declaration(self, i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1448, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1381, in _toStringSubclass
    self.endData(subclass)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'



Here is the changed code:
from requests import get
# import BeautifulSoup
from BeautifulSoup import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print (str(len(totText))) + ' posts were found'
    (url, nextPageNum) = getPage(html, nextPageNumRE)



This may be past my original issue, but I can't tell since the traceback references a line number that is outside the range of my code. If it is, say the word and I will award you the points, since you got me past the BeautifulSoup hurdle.
 
LVL 39

Expert Comment

by:Eoin OSullivan
ID: 41737403
Your code is now successfully importing and calling the BeautifulSoup class. The error now occurs INSIDE the BeautifulSoup code, which is why the line numbers refer to that module (BeautifulSoup.py, line 1251):

  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'

I'm afraid I'm not in a position to debug this, as I'm not running Beautiful Soup on my Mac and I'd have to install it to replicate the problem. It could well be that the code is written for beautifulsoup4 but MacPorts is using beautifulsoup3.
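
The traceback is consistent with that guess: the failing line reads `self.parseOnlyThese.text`, which suggests that the second positional argument, 'html5lib', was taken as Beautiful Soup 3's `parseOnlyThese` filter rather than as a parser name. The script also calls `find_all` and `get_text`, which are the Beautiful Soup 4 spellings. A minimal sketch of an import that prefers version 4 and falls back to version 3 follows; the helper names `load_beautifulsoup` and `make_soup`, and the default parser 'html.parser', are assumptions for illustration and do not come from the thread.

```python
def load_beautifulsoup():
    """Return (BeautifulSoup class, major version), or (None, None) if neither is installed."""
    try:
        from bs4 import BeautifulSoup  # Beautiful Soup 4 import name
        return BeautifulSoup, 4
    except ImportError:
        pass
    try:
        from BeautifulSoup import BeautifulSoup  # Beautiful Soup 3 import name
        return BeautifulSoup, 3
    except ImportError:
        return None, None


def make_soup(markup, parser="html.parser"):
    """Parse markup with whichever version is available; raise if none is."""
    soup_class, major = load_beautifulsoup()
    if soup_class is None:
        raise ImportError("neither bs4 nor BeautifulSoup is installed")
    if major == 4:
        # Version 4 accepts a parser name as the second argument.
        return soup_class(markup, parser)
    # Version 3: pass only the markup, with no parser name.
    return soup_class(markup)


soup_class, major = load_beautifulsoup()
print("Beautiful Soup major version found: %s" % major)
```

If the loader reports version 3, the rest of the script would still need the version 3 method names, so installing version 4 for the same Python that runs the script is likely the simpler route.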
 

Author Closing Comment

by:sharingsunshine
ID: 41737459
Thanks for getting me this far and letting me know where to look next.