07.25.2008 at 08:32PM PDT, ID: 23597247 | Points: 500

Getting the Title of a web page in a clearer format using ruby, open-uri and

Asked by Yavor_01126 in Ruby Scripting Language

Hello!

I am trying to clean up a title I get from a web page when using open-uri and, of course, Ruby on Rails.

Example:

If the url is: http://digg.com/travel_places/7_Dizzying_Cliff_and_Mountain_Houses_and_Dwellings_PICS
The title is: Digg - 7 Dizzying Cliff and Mountain Houses and Dwellings [PICS]

The result we want is to have a title like this: 7 Dizzying Cliff and Mountain Houses and Dwellings [PICS]

In short: the title without the DOMAIN in it.
Without the nasty "Digg - ", or whatever the site name is for the web page being accessed. I want to cut it from the front or the back of the string, and I want to cut it whether it appears as Digg, Digg.com or www.Digg.com.

My code so far doesn't do much to clean the title up, but it does fetch what I need from the web page.

Hope I explained it well and that someone can help!

    if request.post?
      @note = Note.new(params[:note])
      #Load requirements
      require "open-uri"
      require "hpricot"
      #Get the content of the URL
      document_html = open(@note.url, "User-Agent" => "Fanopic")
      #Check to see if it is a HTML Page
      if not document_html.content_type == "text/html"
        title = _("The page is not TEXT/HTML")
        flash[:error] = notice("error", title)
        redirect_to :action => "create" and return
      end
      #Take the URL of the content (if there were any redirects)
      @note.url = document_html.base_uri.to_s
      #Get the stream open (make it an Hpricot object)
      document_html = Hpricot(document_html)
      #Scan the page for needed elements
      document_title        =  document_html.search("title")
      document_description  =  document_html.search("//meta[@name='description']")
      #Take the needed elements without nasty symbols
      if not document_title.blank?
        @note.title = document_title.html.strip
      else
        title = _("We couldn't acquire the page title.")
        flash[:error] = notice("error", title)
        redirect_to :action => "create" and return
      end
      if not document_description.blank?
        @note.content = document_description[0].attributes["content"].strip
      else
        title = _("We couldn't acquire a page description.")
        flash[:error] = notice("error", title)
        redirect_to :action => "create" and return
      end
    end
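
One possible way to approach the cleanup described above, as a minimal sketch rather than a definitive answer: take the host from the URL, reduce it to a bare site name, and strip that name (with an optional "www." prefix and domain suffix) from the front or the back of the title. The helper name strip_site_name and the list of separator characters are assumptions made for illustration; they are not part of the original code. The sketch also assumes the site name is the first label of the host once "www." is removed, which will not hold for every site (for example, hosts with other subdomains).

```ruby
require "uri"

# Hypothetical helper (not from the original post): removes the site name
# from the start or the end of a page title, e.g. "Digg - Foo" -> "Foo".
def strip_site_name(title, url)
  host = URI.parse(url).host.to_s.downcase   # e.g. "www.digg.com"
  host = host.sub(/\Awww\./, "")             # drop a leading "www."
  name = host.split(".").first.to_s          # assumed site name, e.g. "digg"
  return title.strip if name.empty?

  # Matches "digg", "digg.com" or "www.digg.com", ignoring case.
  site = /(?:www\.)?#{Regexp.escape(name)}(?:\.[a-z]{2,})*/i
  # Assumed separators between the site name and the real title.
  sep = /\s*[\-|:]+\s*/

  cleaned = title.strip
  cleaned = cleaned.sub(/\A#{site}#{sep}/, "")  # site name at the front
  cleaned = cleaned.sub(/#{sep}#{site}\z/, "")  # site name at the back

  # Fall back to the original title if stripping left nothing.
  cleaned.empty? ? title.strip : cleaned
end

url = "http://digg.com/travel_places/7_Dizzying_Cliff_and_Mountain_Houses_and_Dwellings_PICS"
puts strip_site_name("Digg - 7 Dizzying Cliff and Mountain Houses and Dwellings [PICS]", url)
# prints: 7 Dizzying Cliff and Mountain Houses and Dwellings [PICS]
```

In the controller code above, this could be applied where the title is assigned, for instance as strip_site_name(document_title.html.strip, @note.url). Note that URI.parse raises an error on a malformed URL, so a real implementation would probably want to rescue that.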