  • Status: Solved
  • Priority: Medium
  • Security: Public

How to pull information from websites automatically

I am developing a database in which assumptions are recorded. I need to build in the ability to base an assumption on a variable factor, for example an exchange rate: a person gives the URL of a website, the information is pulled in automatically, and if the exchange rate spikes, an email is sent to a person saying that the rate has spiked and that the assumption made is no longer valid.

I don't know how to begin making this flexible for different clients. Different people will make different assumptions based on information from different websites. How do I begin to code something of this nature?
Asked by: PParuman

Sjef Bosman (Groupware Consultant) commented:
Boy, you have some great questions!

Some building blocks for your database:
- agents can be set to run on a schedule
- a Domino server can act as a web browser (see web.nsf)
- thus, the agent can retrieve pages from the Internet
- the content of the retrieved page can be interpreted (text only, of course)
- the agent can compare the content with personal assumptions
- the agent can send mail
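A minimal LotusScript sketch of the last two building blocks (compare, then mail), assuming the live value has already been pulled out of the retrieved page as a number. `PercentChange` and `SendSpikeWarning` are hypothetical helper names and the percentage rule is an assumption; `NotesDatabase.CreateDocument` and `NotesDocument.Send` are the standard back-end calls.

```lotusscript
' Hypothetical helper: absolute percentage change of live relative to assumed.
' assumed must be non-zero.
Function PercentChange(ByVal assumed As Double, ByVal live As Double) As Double
    PercentChange = Abs(live - assumed) / Abs(assumed) * 100
End Function

' Hypothetical helper: creates a plain memo in db and mails it to recipient.
Sub SendSpikeWarning(db As NotesDatabase, ByVal recipient As String, ByVal assumed As Double, ByVal live As Double)
    Dim memo As NotesDocument
    Set memo = db.CreateDocument
    memo.Form = "Memo"
    memo.Subject = "Assumption no longer valid: " & Cstr(assumed) & " -> " & Cstr(live)
    memo.Body = "The value this assumption is based on changed by " & Format$(PercentChange(assumed, live), "0.00") & "%."
    ' False = do not attach the form to the mail
    Call memo.Send(False, recipient)
End Sub
```

In a scheduled agent the two could be combined as `If PercentChange(7.25, liveRate) > 5 Then Call SendSpikeWarning(session.CurrentDatabase, "someone@example.com", 7.25, liveRate)`, where the rate, threshold and address are placeholders.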

Does this help you somewhat?
 
Bozzie4 commented:
Phew. It's possible, but difficult.

1. You know the structure of each site and parse the HTML according to that knowledge to get the info you need.
2. You open the website in a browser and create a (JavaScript?) app that lets the user select the part of the screen he wants to use; then you just copy that data (or use the selection to configure the agent).
3. Try to get the data in plain text or XML format from the website owners.
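Option 1 can be sketched in LotusScript as a helper that cuts out the text between two markers known for a given site, so each client's configuration only has to store the URL and the two markers. `ExtractBetween` and the sample markers are hypothetical, and any change to the site's layout will break the extraction.

```lotusscript
' Hypothetical helper: returns the text between the first startMarker and the
' next endMarker after it (case-insensitive), or "" if either marker is missing.
Function ExtractBetween(ByVal html As String, ByVal startMarker As String, ByVal endMarker As String) As String
    Dim p1 As Long
    Dim p2 As Long
    ExtractBetween = ""
    ' compMethod 1 = case-insensitive comparison
    p1 = Instr(1, html, startMarker, 1)
    If p1 = 0 Then Exit Function
    p1 = p1 + Len(startMarker)
    p2 = Instr(p1, html, endMarker, 1)
    If p2 = 0 Then Exit Function
    ExtractBetween = Mid$(html, p1, p2 - p1)
End Function
```

The result is still a string; converting it to a number (for example with `Cdbl`) depends on the decimal separator the site uses.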

cheers,

Tom
 
RanjeetRain commented:
Interesting! I'd love to help you with this.

Components that your app will/should have:

(1) Data Agents - Responsible for fetching data from the Internet and keeping the built-in repository up to date. Without this component your system will have to be a REAL-TIME system, which may not be a success. Web services will be the key providers, and SOAP and XML may be your transport layer.

(2) Data Repository - You will store the intermediate data here; otherwise you are back to real time. This will not only serve as a local repository but will also be very useful when it comes to trend analysis. (I know you didn't ask for this, but tomorrow you will need it.)

(3) Presentation Manager - Be ready to spend a good amount of time on this. It had better be neat. Java may be useful for you (it works well with Domino and lets you run it anywhere, with any client, without a headache).

(4) Event Manager - The most interesting aspect may be this layer. Who orchestrates the show? The three layers above cannot be put to best use until you have some smart cookie sitting there to make them do the work. Hence, the Event Manager.
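Applied to the original question (different clients, different assumptions, different sites), the Event Manager needs one configuration record per assumption that it can loop over. A minimal LotusScript sketch of such a record, assuming a URL, an assumed value, a percentage threshold and a recipient are enough; `AssumptionRule` and its members are hypothetical names, and in a real database the values would come from fields on a configuration document rather than being hard-coded.

```lotusscript
' Hypothetical per-assumption rule: one instance per value a client wants watched.
Class AssumptionRule
    Public SourceUrl As String
    Public AssumedValue As Double
    Public ThresholdPct As Double
    Public Recipient As String

    Sub New(ByVal url As String, ByVal assumed As Double, ByVal pct As Double, ByVal mailTo As String)
        SourceUrl = url
        AssumedValue = assumed
        ThresholdPct = pct
        Recipient = mailTo
    End Sub

    ' True when the live value differs from the assumed value by more than
    ' ThresholdPct percent; written as a multiplication so an assumed value
    ' of 0 does not cause a division by zero.
    Function IsBrokenBy(ByVal live As Double) As Boolean
        IsBrokenBy = Abs(live - AssumedValue) > Abs(AssumedValue) * ThresholdPct / 100
    End Function
End Class
```

Keeping the rule data separate from the fetching code means adding a client is a matter of adding a record, not changing the agent.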
 
PParuman (author) commented:
Hi everyone...

I have tried to use your suggestion, Sjef, but I keep getting an error back in the body of the document.

This is what is displayed in the body field of the document that is returned :

<H1>ERROR</H1>The requested document (URL http://www.lotus.com/) could not be accessed.<p>The remote server either is not accessible or is refusing to serve the document.

Any ideas on how to fix this? We have started the web service on the server, etc.

This is the code in my scheduled agent:

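      ' Scheduled agent: ask the Web Navigator database (Web.nsf) to fetch a page
      Messagebox("****************** RUNNING AGENT**************************")
      
      Dim session As New NotesSession
      Dim db As NotesDatabase
      Dim webDb As NotesDatabase
      Dim doc As NotesDocument
      Set db = session.CurrentDatabase
      ' "" as the server name opens Web.nsf on the machine where the agent runs
      Set webDb = session.GetDatabase("", "Web.nsf")
      Set doc = webDb.GetDocumentByURL("http://www.lotus.com")
      If doc Is Nothing Then
            Messagebox("GetDocumentByURL returned no document")
      Else
            Messagebox(doc.Body(0))
      End If
      Messagebox("*********************** FINISHED RUNNING AGENT*************")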
 
RanjeetRain commented:
Check your proxy settings; maybe they are incorrect. Try navigating to the document manually using the Personal Web Navigator.
 
Sjef Bosman (Groupware Consultant) commented:
The web.nsf you're using might not be the one on the server. Did you run your agent locally?

To test whether the web-service works, do the following:
- open your Location document (or create a new one)
- tab Servers: set the InterNotes server to the server you're testing with
- tab Internet Browser: select Notes and from InterNotes server
- save and close
- open the database web.nsf on the server
- click on Open URL and type your URL
- alternative: click on references and double-click some page

Should work...
 
PParuman (author) commented:
I have found another way to get the HTML from a URL. I used the following code:

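      Dim objHttp As Variant
      Dim url As String
      Dim response As String
      ' Microsoft.XMLHTTP is a Windows COM object, so this only works on Windows
      Set objHttp = CreateObject("Microsoft.XMLHTTP")
      
      url = |http://www.resbank.co.za/sarbdata/rates/rates.asp?type=HPR|
      ' Synchronous GET (third argument False): the call returns once the page has arrived
      objHttp.open "GET", url, False
      objHttp.send
      
      response = objHttp.ResponseText
      Messagebox(response)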

It works fine. Now I just need a way to strip the HTML tags from the text.
Does anybody have code to strip the HTML tags off?
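A minimal LotusScript answer to that question, assuming simple character scanning is good enough: it drops everything between `<` and the next `>`. It does not decode entities such as `&amp;` or `&nbsp;`, and it keeps the contents of `<script>` and `<style>` elements, so treat it as a starting point; `StripTags` is a hypothetical name.

```lotusscript
' Hypothetical helper: removes everything from "<" up to the next ">".
' Entities such as &amp; are left as-is, and text inside <script> or
' <style> elements is kept, because only the tags themselves are dropped.
Function StripTags(ByVal html As String) As String
    Dim i As Long
    Dim ch As String
    Dim inTag As Boolean
    Dim result As String
    inTag = False
    result = ""
    For i = 1 To Len(html)
        ch = Mid$(html, i, 1)
        If ch = "<" Then
            inTag = True
        Elseif ch = ">" And inTag Then
            inTag = False
        Elseif Not inTag Then
            ' A stray ">" outside a tag is kept as ordinary text
            result = result & ch
        End If
    Next
    StripTags = result
End Function
```

After the XMLHTTP call it would be used as `Messagebox(StripTags(response))`. String concatenation in a loop is slow for very large pages, but is fine for a single rates table.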

 
p_partha commented:
PParuman,
when I read your question, the first thing that came into my head was XMLHTTP... too bad you found it yourself. Anyway, since you are now stuck on stripping out the HTML tags, let me jump in.

It's not easy to get the text out of HTML. What you need to do is assign the response text to an object and get the innerText of just the body.

Or the best alternative is RSS; most sites provide it. Check whether the site you are referring to serves XML. You can parse that in a jiffy.

Partha
 
p_partha commented:
By assigning the response text to an object I mean this:

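var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
// loadXML only succeeds if the page is well-formed XML (such as XHTML); most HTML pages are not
xmlDoc.loadXML(objHttp.responseText);


Then you can just try:

var bodyTag = xmlDoc.getElementsByTagName("body").item(0);
// MSXML nodes expose .text; innerText belongs to the browser's HTML DOM
alert(bodyTag.text);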

Partha
 
Sjef Bosman (Groupware Consultant) commented:
Partha, bro, that's cheating! Using a Microsoft object where Notes has the perfect (hmm...) tools to do this. I'm glad to say that the object method won't work on Linux. Web.nsf has worked ever since 4.6 (or so); it is not the best in the world, but it has everything you need. You can even see your web page as rich text and as HTML!
 
p_partha commented:
I agree, bro, but the XMLHTTP object has a lot of methods to manipulate the output. RSS is definitely the best answer for this, though.

Partha
 
PParuman (author) commented:
Do you guys know how to eliminate newline characters from a string in LotusScript?
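A minimal LotusScript sketch that walks the string once and replaces each line break with a chosen string (pass "" to delete them). It treats a carriage return followed by a line feed (`Chr$(13) & Chr$(10)`, the usual Windows line ending) as a single break; `ReplaceNewlines` is a hypothetical name.

```lotusscript
' Hypothetical helper: replaces every CR (Chr$(13)), LF (Chr$(10)) and
' CR+LF pair in text with replacement; pass "" to remove the line breaks.
Function ReplaceNewlines(ByVal text As String, ByVal replacement As String) As String
    Dim i As Long
    Dim ch As String
    Dim result As String
    result = ""
    i = 1
    Do While i <= Len(text)
        ch = Mid$(text, i, 1)
        If ch = Chr$(13) Or ch = Chr$(10) Then
            result = result & replacement
            ' Skip the LF of a CR+LF pair so it counts as one break
            If ch = Chr$(13) And Mid$(text, i + 1, 1) = Chr$(10) Then i = i + 1
        Else
            result = result & ch
        End If
        i = i + 1
    Loop
    ReplaceNewlines = result
End Function
```

Replacing with a space rather than "" avoids gluing the last word of one line to the first word of the next.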
 
Sjef Bosman (Groupware Consultant) commented:
RSS will only work if the other party supplies it (if I'm correctly informed). What if they don't?

PParuman: did you already get the web.nsf basics to work?
 
p_partha commented:
PParuman,
Did you try what I suggested?

Partha
 
Sjef Bosman (Groupware Consultant) commented:
Partha,

Did you try what you suggested?

Sjef ;)

PS: You probably know my approach; why is yours better?
 
PParuman (author) commented:
Hi guys, I have hit a problem with the XML object code... It works beautifully when the agent is run manually from the agents list in Designer, but when it is scheduled, it gives an error: OLE Automation Error... Any ideas?
 
Sjef Bosman (Groupware Consultant) commented:
Well, if the agent runs on the server, you have to install the object code on the server... As I said before, it must be a Windows server; Linux is out of the question. There may be other tools to do this on Linux.

Would you please be so kind as to tell me what you did with my comments about web.nsf?
 
p_partha commented:
It's a simple installation on the server. And please comment about web.nsf; my bro has given his valuable comments...

Partha
 
Sjef Bosman (Groupware Consultant) commented:
Hi bro, I'm going to post some nice feedback for you some day :)

Still, my question remains: why is your solution better? Or is it just different?
 
PParuman (author) commented:
Hi guys...

Sorry I am taking so long to reply to your replies... I am at clients the whole day and only get to dial in once in a while.

You mentioned that I have to install something on the server. What is it that I need to install, and where do I get it?

Comments about the solutions...

I think the web.nsf solution would be brilliant if I could get it to work, but I tried and got frustrated trying... that's what led me to the XML solution. I really can't tell which solution is better until I have both working to test them :)
Guess you guys have to fight this one out... can I have pics to see who my bet's on? ;p
 
p_partha commented:
To make my bro's solution work, we need to have the Web Navigator running, but there is a potential security breach (that's what my admin said when we asked)...

I have no other reason to go for XMLHTTP.

As for the download, I will try to give you the link.

Partha


P.S.: we are not fighting; that's the way we enjoy giving answers.
 
PParuman (author) commented:
Hi Partha...

Thanks for the quick posting. I would really appreciate it if you could send me the link to download the stuff I need...

What is it that I need, by the way?
 
p_partha commented:
It's installed by default with IE5 and above. Do you have IE5 on your server?

Partha
 
PParuman (author) commented:
Don't really know what version of Internet Explorer we are using...

What exactly needs to be installed? Is it a service or something?
 
Sjef Bosman (Groupware Consultant) commented:
What a pleasure it is to know nothing about xmlhttptmlip, or whatever the name is!

About web.nsf: a few parameters are required. First, in the Server document, see the Server Tasks tab, Web Retriever sub-tab; usually the defaults are okay (web.nsf, HTTP, FTP, GOPHER, 50, None, Once per session).

Then, open the web.nsf database, click on Database Views and then File/Database/Access control. You need the WebMaster role to adapt the configuration. If you have the role, click on Actions/Administration. This profile document is REQUIRED.

Last one, the proxy (see also above). Your system has to know how to reach the outside world. Is there an inside server you could do some testing with? Do you have a firewall checking on username?

What are your results until now?
 
PParuman (author) commented:
Up to now, my attempts have only returned the following error...

<H1>ERROR</H1>The requested document (URL http://www-306.ibm.com/software/lotus) could not be accessed.<p>The remote server either is not accessible or is refusing to serve the document.

I have no clue why it would not work... I have read your reply now. I have to wait until Monday for my administrator to give me the WebMaster role so that I can go and check whether there is a profile document. If I use a proxy server when connecting to the Internet, does that mean that the server uses the same proxy server?

If so, where do I specify the proxy settings so that the server can resolve the URL I send it?

When I connect to the Internet, I am not prompted for a username and password; it just takes me straight to the URL I require.

Thanks for your reply, Sjef.
 
PParuman (author) commented:
Hi all...

Sjef... I was told by our administrators that web.nsf etc. is not supported in version 6.5 of Notes. That makes the web.nsf solution useless, as it has to work on future versions and not just on version 5.

Partha, is there anything you can think of that will make the XMLHTTP code work in the scheduled agent? I found out that we are using version 6 of IE... so I don't understand why it is not working...

Please help... I need to find a solution to my problem soon...
 
Sjef Bosman (Groupware Consultant) commented:
Not supported?? By whom: Lotus, or your administrators?

Judging from these links, it is there:
    http://www.waresource.com/kurchak/ka1.nsf/0/adcd1f339ed1d5cc85256dc7005bb1ae?OpenDocument
    https://www-10.lotus.com/ldd/beta/nd7pubbeta.nsf/0/3337042698898c2785256ebb007111f8?OpenDocument (R7 beta!)

Please ask them again.
 
PParuman (author) commented:
Thanks for your posting, Sjef... I am going to be in the office tomorrow and will be playing around with the code again. I will definitely speak to our administrators and find out what is going on...
 
Sjef Bosman (Groupware Consultant) commented:
Partha, or anyone else, ever heard of discontinued support for WEB.EXE and WEB.NSF??
 
p_partha commented:
It's definitely there in R6 and will be there in R7 and R8... If they had removed it, they would probably deprecate the GetDocumentByURL method itself, which has not been done...

Partha
 
Sjef Bosman (Groupware Consultant) commented:
Brilliant deduction. Thanks!
 
PParuman (author) commented:
Hi guys...


This is my code that I have used in my agent ....

      Messagebox("****************** RUNNING AGENT**************************")
      
      Dim session As New NotesSession
      Dim db As NotesDatabase
      Dim webDb As NotesDatabase
      Dim doc As NotesDocument
      Set db = session.CurrentDatabase
      Set webDB = session.getDatabase(db.Server,"Web.nsf")
      Set doc = webDB.GetDocumentByURL("http://www.lotus.com")
      Messagebox(doc.body(0))
      Messagebox("*********************** FINISHED RUNNING AGENT*************")

This is what is written to the log after the scheduled agent has run... Please can you help me and tell me what I am doing wrong? I have created the profile document within the web.nsf database. Are there any special rights I need on the server in order for my agent to run?

09/30/2004 09:18:28   AMgr: Agent ('TestWebnsf' in 'New Depict\StrategyTracker.nsf') message box: ****************** RUNNING AGENT**************************
09/30/2004 09:18:28   WEB(2): Loading additional WEB task
09/30/2004 09:18:28   WEB(3): Initializing
09/30/2004 09:18:49   AMgr: Agent ('TestWebnsf' in 'New Depict\StrategyTracker.nsf') message box: <H1>ERROR</H1>The requested document (URL http://www.lotus.com/) could not be accessed.<p>The remote server either is not accessible or is refusing to serve the document.
09/30/2004 09:18:49   AMgr: Agent ('TestWebnsf' in 'New Depict\StrategyTracker.nsf') message box: *********************** FINISHED RUNNING AGENT*************
 
Sjef Bosman (Groupware Consultant) commented:
This tells me that the code above runs, since all MessageBox calls are printed, and I assume you have no On Error Resume Next in your code. The ERROR message printed actually comes from the web.nsf database, so a document has been created for you. Try opening the web.nsf database on the server, not by double-clicking it, but as follows: in the workspace, click once on the database icon, then choose View, Go To, All Documents. You will see some documents "loaded" from the Internet, among which there should be one for www.lotus.com. You can look at the Document Properties to see whether errors are mentioned there, in fields like HTTPStatus and URL.

You don't have to do this programmatically. Test it manually first, by opening the server's web.nsf database as usual (dbl-click), and type a URL. The page that should result will also be shown in the Database Views, so you can study the properties if need be.

Most likely the server doesn't know how to get through to the Internet. Is it possible, from the server console, to start an Internet Exploder and get info from the Internet? What are the settings in IE? Is there a proxy? Maybe your firewall blocks requests, but are they logged somewhere? I can confirm that the request DID go out, absolutely.

There are proxy settings in the Server document in the N&A-book. See your N&A-book, open the server's Server document, 3rd tab Ports, 3rd sub-tab Proxies.
 
PParuman (author) commented:
Hi Sjef...

I opened the database on the server as you said, but the database is completely blank; there are no documents.

I do connect to the Internet through a proxy... I will check the Internet settings on the server later on. Our administrators are very sticky and refuse to give us info about what is on the server or about the server settings...

Will keep you posted about the settings once I find them out.
 
Sjef Bosman (Groupware Consultant) commented:
Probably you're not allowed to read the documents in the database, you need some privileges.

To open a URL yourself, you might have to change your Location settings, or create a copy of the current one (Office?) and call it "Office (InterNotes)". The modified settings:
- under Internet Browser, select Notes, then select from InterNotes server
- under Servers, add the name of the server you're using as the InterNotes server

Now, go into the web.nsf database and try to retrieve a URL.
 
PParuman (author) commented:
I have found out that the server was not set up as an InterNotes server... Under the Basics tab of the server configuration there is a section for the server location, and it has a field where you can enter the InterNotes server... that field is empty.

What performance factors are there if we make our Domino server an InterNotes server?
 
Sjef Bosman (Groupware Consultant) commented:
The Basic tab/Location section contains very few interesting fields. The server I know doesn't have the InterNotes servername filled in, but if you fill it it will be the default setting for all users. Usually, you have to use your own Location document in the Notes client, and make it use the InterNotes server, as described earlier.

Starting the WEB.EXE retriever process on the server will effectively turn your server into an InterNotes server. Performance? I know it doesn't come for free. I'd have to guess, but it all depends on the number of pages you intend to retrieve. The settings for the WEB process can be found in the WEB.NSF database itself, visible only to a user with the WebMaster role.

Need to know more? Open the Administrator Help database, and look for the document called "Setting up a Web Navigator server", look in the Index under InterNotes server.