Solved

MacPorts Import beautifulsoup4 Problem

Posted on 2016-07-30
Last Modified: 2016-08-01
This screenshot shows my list of installed ports, and beautifulsoup4 is one of them. But when I try to import it, I get an error. Please help.

https://gyazo.com/3415f2e1e28c095cf2c225ab7dcdbc79

Thanks,
Question by:sharingsunshine
 
LVL 39

Expert Comment

by:Eoin OSullivan
ID: 41736249
Check which modules are installed in Python and what they are called.
In the Python shell, type:
help('modules')
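
`help('modules')` lists every installed module and can take a while. As a narrower check, the sketch below simply tries to import the two names Beautiful Soup is usually installed under: `bs4` for version 4 and `BeautifulSoup` for version 3. The helper name `available_soup_modules` is made up for this illustration.

```python
def available_soup_modules(candidates=("bs4", "BeautifulSoup")):
    """Return the candidate module names that can be imported by this interpreter."""
    found = []
    for name in candidates:
        try:
            __import__(name)  # import by name; raises ImportError if missing
        except ImportError:
            continue
        found.append(name)
    return found


# An empty list means neither version is visible to this Python installation.
print(available_soup_modules())
```

Run it with the same interpreter that runs the failing script, since each Python installation has its own site-packages.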



I think the Beautiful Soup version in MacPorts is v3:
https://trac.macports.org/browser/trunk/dports/python/py-beautifulsoup/Portfile

In that case, the command should probably be:
import BeautifulSoup

 

Author Comment

by:sharingsunshine
ID: 41736884
Here is what I see
https://gyazo.com/f57bd11a6165e5991a7f9e66a841fcde

When I do this
import BeautifulSoup

I get this

>>> 
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 25, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 12, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
TypeError: 'module' object is not callable
>>> 

Here is the code:
from requests import get
import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print (str(len(totText))) + ' posts were found'
    (url, nextPageNum) = getPage(html, nextPageNumRE)

 
LVL 39

Accepted Solution

by:
Eoin OSullivan earned 500 total points
ID: 41737355
Try changing the import line to:
from BeautifulSoup import BeautifulSoup
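
Why this helps: `import BeautifulSoup` binds the name `BeautifulSoup` to the module, and calling a module raises the `TypeError: 'module' object is not callable` shown above. `from BeautifulSoup import BeautifulSoup` binds the name to the class inside that module, which can be called. The sketch below reproduces the same situation with the standard library's `datetime`, which also has a module and a class sharing one name. It is only an analogy, chosen so it runs without Beautiful Soup installed; the alias names are illustrative.

```python
import datetime as datetime_module               # the module, like "import BeautifulSoup"
from datetime import datetime as datetime_class  # the class, like "from BeautifulSoup import BeautifulSoup"

try:
    # Calling the module itself fails, just as in the traceback above.
    datetime_module(2016, 7, 30)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Calling the class works.
posted = datetime_class(2016, 7, 30)
print(posted.year)
```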


 

Author Comment

by:sharingsunshine
ID: 41737379
Here is the output from that change:
============== RESTART: /Users/rjw/Documents/Python/test_web.py ==============

Traceback (most recent call last):
  File "/Users/rjw/Documents/Python/test_web.py", line 26, in <module>
    (html, newText) = getText(url)
  File "/Users/rjw/Documents/Python/test_web.py", line 13, in getText
    html = BeautifulSoup(htmlString, 'html5lib')
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1522, in __init__
    BeautifulStoneSoup.__init__(self, *args, **kwargs)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1147, in __init__
    self._feed(isHTML=isHTML)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1189, in _feed
    SGMLParser.feed(self, markup)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 104, in feed
    self.goahead(0)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/sgmllib.py", line 174, in goahead
    k = self.parse_declaration(i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1463, in parse_declaration
    j = SGMLParser.parse_declaration(self, i)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/markupbase.py", line 109, in parse_declaration
    self.handle_decl(data)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1448, in handle_decl
    self._toStringSubclass(data, Declaration)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1381, in _toStringSubclass
    self.endData(subclass)
  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'



Here is the changed code:
from requests import get
# import BeautifulSoup
from BeautifulSoup import BeautifulSoup
from re import compile, search
url = 'http://sethgodin.typepad.com'
nextPageNumRE = compile(r'page/(\d+?)/')
nextPageNum = '1'
maxPage = 4
totText = []

def getText(url):
    htmlString = get(url).text
    html = BeautifulSoup(htmlString, 'html5lib')
    tags = html.find_all('div', {'class':'entry-body'})
    text = [e.get_text() for e in tags]
    return (html, text)

def getPage(html, regex):
    nextPageTag = html.find('span', {'class':'pager-right'})
    nextPageATag = nextPageTag.find_next('a')
    nextPageURL = nextPageATag.attrs['href']
    nextPageNum = regex.search(nextPageURL).group(1)
    return (nextPageURL, nextPageNum)

while int(nextPageNum) <= maxPage:
    (html, newText) = getText(url)
    totText = totText + newText
    print (str(len(totText))) + ' posts were found'
    (url, nextPageNum) = getPage(html, nextPageNumRE)



This may be past my original issue, but I can't tell, since it is referencing a line number that is outside the range of my code. If it is, say the word and I will award you the points, since you got me past the BeautifulSoup hurdle.
 
LVL 39

Expert Comment

by:Eoin OSullivan
ID: 41737403
Your code is now successfully calling the BeautifulSoup module. The error now lies inside the BeautifulSoup code itself, which is why the line numbers refer to that module (BeautifulSoup.py, line 1251):

  File "/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/BeautifulSoup.py", line 1251, in endData
    (not self.parseOnlyThese.text or \
AttributeError: 'str' object has no attribute 'text'
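
To confirm which file and which Python installation a module is loaded from, a quick check like the one below may help. It is a sketch: `module_location` is a hypothetical helper, and `json` stands in for `BeautifulSoup` so that the example runs anywhere.

```python
import importlib
import sys


def module_location(name):
    """Import a module by name and return the path of the file it was loaded from."""
    module = importlib.import_module(name)
    # Built-in modules such as sys have no __file__, so fall back to a placeholder.
    return getattr(module, "__file__", "(built-in)")


print(sys.executable)           # the interpreter running this script
print(module_location("json"))  # swap "json" for "BeautifulSoup" or "bs4"
```

If the path printed for `BeautifulSoup` sits under a different Python version than the one the beautifulsoup4 port was installed for, that would explain why the version 4 package is not being picked up.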

I'm afraid I'm not in a position to debug this, as I'm not running Beautiful Soup on my Mac and I'd have to do that to replicate the problem. It could well be that the code is written for beautifulsoup4 but MacPorts is using Beautiful Soup 3.
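
The traceback is consistent with that guess. The failing line reads `self.parseOnlyThese.text`, which suggests that the version 3 constructor took the second positional argument, the string `'html5lib'`, as `parseOnlyThese` rather than as a parser name. The script also calls `find_all` and `get_text`, which are the version 4 method names. A minimal sketch of the same extraction written against version 4 is shown below, assuming the `bs4` package is importable by the interpreter in use. `entry_texts` and `HAVE_BS4` are illustrative names, and the built-in `"html.parser"` is used instead of `'html5lib'` only because html5lib is a separate install.

```python
try:
    from bs4 import BeautifulSoup  # Beautiful Soup 4 lives in the bs4 package
    HAVE_BS4 = True
except ImportError:
    HAVE_BS4 = False


def entry_texts(html_string, parser="html.parser"):
    """Return the text of every <div class="entry-body"> in the given HTML."""
    if not HAVE_BS4:
        raise RuntimeError("bs4 is not importable by this Python installation")
    # In version 4 the second argument names the parser to use.
    soup = BeautifulSoup(html_string, parser)
    tags = soup.find_all("div", {"class": "entry-body"})
    return [tag.get_text() for tag in tags]


if HAVE_BS4:
    print(entry_texts('<div class="entry-body">First post</div>'))
else:
    print("bs4 not found; install beautifulsoup4 for this interpreter first")
```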
 

Author Closing Comment

by:sharingsunshine
ID: 41737459
Thanks for getting me this far and letting me know where to look next.
