Solved

How to pull information off websites automatically

Posted on 2004-09-22
Last Modified: 2013-12-18
I am developing a database that records assumptions. I need to build in the following ability: if an assumption depends on a variable factor such as an exchange rate, a person can supply the URL of a website from which the information can be pulled in automatically. If the exchange rate spikes, an email must be sent to a person informing them of the spike and that the assumption is no longer valid.

I don't know how to begin making this flexible for different clients. Different people will make different assumptions, based on information from different websites. How do I begin to code something of this nature?
0
Question by:PParuman
43 Comments
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12120603
Boy, you have some great questions!

Some building blocks for your database:
- agents can be set to run on a schedule
- a Domino server can be used as a web browser (see web.nsf)
- thus, the agent can retrieve pages from the Internet
- the content of the page retrieved can be interpreted (text only of course)
- the agent can compare the content with personal assumptions
- the agent can send mail

Does this help you somewhat?
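
The last two building blocks (compare against the assumption, then send mail) can be sketched independently of Notes. The snippet below is in Python purely for illustration, not LotusScript; the function names, the percentage-based definition of a "spike", and the message wording are all assumptions rather than anything prescribed in this thread.

```python
from email.message import EmailMessage


def has_spiked(assumed: float, current: float, tolerance_pct: float) -> bool:
    """Return True when `current` deviates from `assumed` by more than tolerance_pct percent."""
    if assumed == 0:
        raise ValueError("assumed value must be non-zero")
    deviation_pct = abs(current - assumed) / abs(assumed) * 100
    return deviation_pct > tolerance_pct


def build_alert(recipient: str, name: str, assumed: float, current: float) -> EmailMessage:
    """Build (but do not send) the notification for an invalidated assumption."""
    msg = EmailMessage()
    msg["To"] = recipient
    msg["Subject"] = f"Assumption no longer valid: {name}"
    msg.set_content(
        f"The assumption '{name}' was based on a value of {assumed}, "
        f"but the current value is {current}."
    )
    return msg
```

In a scheduled job, `has_spiked` would be called with the freshly retrieved value, and the message would only be built and handed to a mail routine when it returns True.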
0
 
LVL 15

Expert Comment

by:Bozzie4
ID: 12120766
Pfoe.  It's possible, but difficult.  

1. You know the structure of each site, and parse the html according to that knowledge, to get the info you need
2. You open the website in a browser and create a (JavaScript?) app that lets the user select the part of the screen he wants to use, and you just copy that data (or use it to configure the agent)
3. Try to get the data in simple text or xml format from the website owners

cheers,

Tom
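
Option 1 can be made flexible per client by storing, next to each assumption, the URL and a pattern describing where the number sits on that page. A minimal sketch in Python (used for illustration only, not LotusScript); the `Source` class, its field names, and the choice of a regular expression with one capture group are hypothetical, and commas are assumed to be thousands separators.

```python
import re
from dataclasses import dataclass


@dataclass
class Source:
    """One client-configured data source (hypothetical field names)."""
    url: str
    pattern: str  # regular expression with exactly one capture group around the number


def extract_value(page_text: str, source: Source) -> float:
    """Pull the configured number out of the page text, or raise if the layout changed."""
    match = re.search(source.pattern, page_text)
    if match is None:
        raise ValueError(f"pattern not found for {source.url}")
    # Assumption: a comma is a thousands separator, so it is dropped before conversion.
    return float(match.group(1).replace(",", ""))
```

Raising an error when the pattern no longer matches matters in practice: a site redesign should produce a visible failure rather than a silently stale value.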
0
 
LVL 19

Expert Comment

by:RanjeetRain
ID: 12120845
Interesting! I'd love to help you with this.

Components that your app will/should have:

(1) Data Agents - Responsible for fetching data from the Internet and keeping the built-in repository up to date. Without this component your system will have to be a REAL TIME system, which may not be a success. Web Services will be the key providers, and SOAP and XML may serve as your messaging format.

(2) Data Repository - You will store the intermediate data here; otherwise you are back to real-time. It will not only serve as a local repository but will also be very useful when it comes to trend analysis. (I know you didn't ask for this, but tomorrow you will need it.)

(3) Presentation Manager - Be ready to spend a good amount of time on this. It had better be neat. Java may be useful for you (it works well with Domino and will ensure you can run it anywhere, with any client, without a headache).

(4) Event Manager - This layer may be an interesting aspect. Who orchestrates the show? The three layers mentioned above may not be put to best use until you have some smart cookie sitting there to make them do the work. Hence, the Event Manager.
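
To make the Data Repository idea concrete, here is a minimal sketch in Python (for illustration only; in Notes the readings would more likely be stored as documents). The table name, column names, and function names are hypothetical, and timestamps are assumed to be ISO 8601 strings so that they sort chronologically as text.

```python
import sqlite3


def open_repository(path: str = ":memory:") -> sqlite3.Connection:
    """Open the repository and create the readings table if it does not exist yet."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS readings ("
        "source TEXT NOT NULL, fetched_at TEXT NOT NULL, value REAL NOT NULL)"
    )
    return conn


def record_reading(conn: sqlite3.Connection, source: str, fetched_at: str, value: float) -> None:
    """Store one retrieved value; called by the data agent after each fetch."""
    conn.execute("INSERT INTO readings VALUES (?, ?, ?)", (source, fetched_at, value))
    conn.commit()


def latest_readings(conn: sqlite3.Connection, source: str, limit: int = 2) -> list:
    """Return the most recent values for a source, newest first."""
    rows = conn.execute(
        "SELECT value FROM readings WHERE source = ? ORDER BY fetched_at DESC LIMIT ?",
        (source, limit),
    ).fetchall()
    return [row[0] for row in rows]
```

Keeping every reading rather than only the latest one is what makes the trend analysis mentioned in (2) possible later on.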
0
 

Author Comment

by:PParuman
ID: 12121080
Hi everyone...

I have tried to use your suggestion, Sjef, but I keep getting an error back in the body of the document.

This is what is displayed in the body field of the document that is returned :

<H1>ERROR</H1>The requested document (URL http://www.lotus.com/) could not be accessed.<p>The remote server either is not accessible or is refusing to serve the document.

any ideas on how to fix this ?  We have started the web service on the server etc....

This is my code in my scheduled agent :

Messagebox("****************** RUNNING AGENT**************************")
      
      Dim session As New NotesSession
      Dim db As NotesDatabase
      Dim webDb As NotesDatabase
      Dim doc As NotesDocument
      Set db = session.CurrentDatabase
      Set webDB = session.getDatabase("","Web.nsf")
      Set doc = webDB.GetDocumentByURL("http://www.lotus.com")
      Messagebox(doc.body(0))
Messagebox("*********************** FINISHED RUNNING AGENT*************")
0
 
LVL 19

Expert Comment

by:RanjeetRain
ID: 12121450
Check your proxy settings. Maybe your settings are incorrect. Try navigating to the document manually using the Personal Web Navigator.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12121961
The webdb you're using might not be the one on the server. Did you run your agent locally??

To test whether the web-service works, do the following:
- open your Location document (or create a new one)
- tab Servers: set the InterNotes server to the server you're testing with
- tab Internet Browser: select Notes and from InterNotes server
- save and close
- open the database web.nsf on the server
- click on Open URL, and type your URL
- alternative: click on references and double-click some page

Should work...
0
 

Author Comment

by:PParuman
ID: 12122038
I have found another way to get the HTML from a URL. I used the following code:

      Dim test As String
      Set objHttp = CreateObject("Microsoft.XMLHTTP")
      
      url = |http://www.resbank.co.za/sarbdata/rates/rates.asp?type=HPR|
      objHttp.open "POST", url, False, "", ""
      objHttp.send
      
      response = objHttp.ResponseText
      test = response
      Messagebox(test)

It works fine. Now I just need to find a way to strip the HTML tags from the text.
Does anybody have code to strip the HTML tags off?
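
For reference, the tag-stripping step can be sketched as follows. This is Python rather than LotusScript, shown only to illustrate the technique: feed the page to an HTML parser, keep the text nodes, skip script and style content, and collapse all whitespace (including newlines) to single spaces. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes, ignoring anything inside <script> or <style>."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # split() with no argument drops newlines, tabs, and repeated spaces
        return " ".join(" ".join(self._chunks).split())


def strip_html(html: str) -> str:
    """Return the visible text of an HTML page as one whitespace-normalized line."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

A parser-based approach tends to cope better with attributes containing `>` and with entities such as `&amp;` than a search-and-delete loop over `<` and `>` characters.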

0
 
LVL 14

Expert Comment

by:p_partha
ID: 12122075
Pparuman
When I read your question, the first thing that came into my head was XMLHTTP... too bad that you found it yourself. Anyway, since you are now stuck with stripping out HTML tags, let me jump in.

It's not easy to get the text out of an HTML page. What you need to do is assign this response text to an object and get the text of just the body.

Or the best alternative is RSS. Most sites offer that. Check whether the site you are referring to publishes XML. You can parse it in a jiffy.

Partha
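
If the site does publish a feed, reading it is mostly a matter of walking the `item` elements. A minimal sketch in Python (for illustration only, not LotusScript), assuming a plain RSS 2.0 document without XML namespaces; the function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def rss_items(feed_xml: str) -> list:
    """Return (title, description) pairs for every <item> in an RSS document."""
    root = ET.fromstring(feed_xml)
    items = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        description = item.findtext("description", default="").strip()
        items.append((title, description))
    return items
```

Unlike scraped HTML, a feed has a stable structure, so the extraction code does not need to change when the site's page layout does.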
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12122105
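I meant assigning the response text to an object, right? This is how you do it:

var xmlDoc = new ActiveXObject("Microsoft.XMLDOM");
// loadXML only succeeds if the page is well-formed XML (e.g. XHTML); it returns false otherwise
xmlDoc.loadXML(objHttp.responseText);


Then you can just try:

// MSXML nodes expose their text through .text (innerText belongs to the browser's HTML DOM)
var bodytag = xmlDoc.getElementsByTagName("body").item(0);
alert(bodytag.text);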

Partha
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12122142
Partha, bro, that's cheating! Using a Microsoft object where Notes has the perfect (hmm...) tools to do this. I'm glad to say that the object method won't work on Linux. Web.nsf has worked ever since 4.6 (or so); it is not the best in the world but has everything you need. You can even see your web page as rich text and as HTML!
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12122826
I agree, bro, but the XMLHTTP object has a lot of methods to manipulate the output. Still, RSS is definitely the best answer for this.

Partha
0
 

Author Comment

by:PParuman
ID: 12122978
Do you guys know how to eliminate newline characters from a string in LotusScript ?
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12123011
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12123352
RSS will only work if the other party supplies it (if I'm correctly informed). What if not?

PParuman: did you already get the web.nsf basics to work?
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12124364
Pparuman
Did you try what i suggested?

Partha
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12125045
Partha,

Did you try what you suggested?

Sjef ;)

PS You know my approach probably, why is yours better?
0
 

Author Comment

by:PParuman
ID: 12131088
Hi guys, I have hit a problem with the XML object code. It works beautifully when I run the agent manually from the agents list in Designer, but when it is scheduled it gives an error: OLE Automation Error. Any ideas?
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12132414
Well, if the agent runs on the server, you have to install the objectcode on the server... As I said before, it must be a Windows server. Linux is out of the question. There may be other tools to do this in Linux.

Would you please be so kind to tell me what you did with my comments about web.nsf?
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12132438
it's a simple installation on the server, and pls comment abt web.nsf, my bro has given his valuable comments ...

Partha
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12132507
Hi bro, I'm going to post some nice feedback for you some day :)

Still, my q. remains: why is your solution better? Or is it just different?
0
 

Author Comment

by:PParuman
ID: 12133354
Hi Guys....

Sorry I am taking so long to reply to your replies.... I am at clients the whole day & I only get to dialin once in a while .

You guys mentioned that I have to install something on the server , What is it that I need to install ? where do I get what I have to install ???

Comments about the solutions ....

I think the web.nsf solution would be brilliant if I could get it to work, but I tried and got frustrated trying... that's what led me to the XML solution. I really can't tell which solution is better until I have both working to test them :)
Guess you guys have to fight this one out... can I have pics to see who to put my bets on ;p
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12133440
To make my bro's solution work, we need to have the webnavigator running, but there is a potential security breach (that's what my admin said when we asked ) ...

I have no other reason to go for XMLhttp.

As for the download, I will try to give you the link.

Partha


P.S: we are not fighting, that's the way we enjoy giving answers


0
 

Author Comment

by:PParuman
ID: 12133505
Hi Partha....

Thanks for the quick posting , I would really appreciate it if you could send me the link to download the stuff I need...

What is it that I need by the way ?
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12133550
It's installed by default with IE5 and above. Do you have IE5 in your Server?

Partha
0
 

Author Comment

by:PParuman
ID: 12133591
Dont really know what version of explorer we are using...

What exactly is it that needs to be installed ?  Is it a service or something ?
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12134051
What a pleasure it is to know nothing about xmlhttptmlip, or whatever the name is!

About web.nsf: there are some parameters required. Firstly, in the Server Document, see the tab Server Tasks, sub-tab Web Retriever, but usually the defaults are okay (web.nsf, HTTP, FTP, GOPHER, 50, None, Once per session).

Then, open the web.nsf database, click on Database Views and then File/Database/Access control. You need the WebMaster role to adapt the configuration. If you have the role, click on Actions/Administration. This profile document is REQUIRED.

Last one, the proxy (see also above). Your system has to know how to reach the outside world. Is there an inside server you could do some testing with? Do you have a firewall checking on username?

What are your results until now?
0
 

Author Comment

by:PParuman
ID: 12141323
Up to now , my results have only returned the following error ...

<H1>ERROR</H1>The requested document (URL http://www-306.ibm.com/software/lotus) could not be accessed.<p>The remote server either is not accessible or is refusing to serve the document.

I have no clue why it does not work. I have read your reply now. I have to wait until Monday for my administrator to give me the WebMaster role and for me to check whether there is a profile document. If I use a proxy server when connecting to the Internet, does that mean the server uses the same proxy server?

If so where do I specify the proxy settings in order for the server to resolve the url I send it ?

When I connect to the internet , I am not prompted for a username & password , it just takes me straight to the url I require.

Thanks for your reply sJef.
0
 

Author Comment

by:PParuman
ID: 12176898
Hi All...

Sjef, I was told by our administrators that web.nsf is not supported in version 6.5 of Notes. That makes the web.nsf solution useless, as it has to work on versions to come and not just on version 5.

Partha, is there anything you can think of that will make the XMLHTTP code work in the scheduled agent? I found out that we are using version 6 of IE, so I don't understand why it is not working.

Please help.... I need to find a solution to my problem soon...
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12177505
Not supported?? By whom: Lotus, or your administrators?

Judging from these links, it is there:
    http://www.waresource.com/kurchak/ka1.nsf/0/adcd1f339ed1d5cc85256dc7005bb1ae?OpenDocument
    https://www-10.lotus.com/ldd/beta/nd7pubbeta.nsf/0/3337042698898c2785256ebb007111f8?OpenDocument (R7 beta!)

Please ask them again.
0
 

Author Comment

by:PParuman
ID: 12179401
Thanks for your posting sJef... I am going to be in the office tomorrow & I will be playing around with the code again... I will definitely speak to our administrators & find out what is going on ....

0
 
LVL 14

Expert Comment

by:p_partha
ID: 12179862
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12179970
Partha, or anyone else, ever heard of discontinued support for WEB.EXE and WEB.NSF??
0
 
LVL 14

Expert Comment

by:p_partha
ID: 12180193
It's pretty much there in R6 and will be there in R7, R8... If they had removed that, then they would probably have deprecated the GetDocumentByURL method itself, which has not been done...

Partha
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12181894
Brilliant deduction. Thanks!
0
 

Author Comment

by:PParuman
ID: 12187638
Hi guys...


This is my code that I have used in my agent ....

      Messagebox("****************** RUNNING AGENT**************************")
      
      Dim session As New NotesSession
      Dim db As NotesDatabase
      Dim webDb As NotesDatabase
      Dim doc As NotesDocument
      Set db = session.CurrentDatabase
      Set webDB = session.getDatabase(db.Server,"Web.nsf")
      Set doc = webDB.GetDocumentByURL("http://www.lotus.com")
      Messagebox(doc.body(0))
      Messagebox("*********************** FINISHED RUNNING AGENT*************")

This is what is thrown out in the logs after the scheduled agent has run.... Please can you help me & tell me what I am doing wrong... I have created the profile doc within the web.nsf database... is there any special rights I need to have on the server in order for my agent to run ?

09/30/2004 09:18:28   AMgr: Agent ('TestWebnsf' in 'New Depict\StrategyTracker.nsf') message box: ****************** RUNNING AGENT**************************
09/30/2004 09:18:28   WEB(2): Loading additional WEB task
09/30/2004 09:18:28   WEB(3): Initializing
09/30/2004 09:18:49   AMgr: Agent ('TestWebnsf' in 'New Depict\StrategyTracker.nsf') message box: <H1>ERROR</H1>The requested document (URL http://www.lotus.com/) could not be accessed.<p>The remote server either is not accessible or is refusing to serve the document.
09/30/2004 09:18:49   AMgr: Agent ('TestWebnsf' in 'New Depict\StrategyTracker.nsf') message box: *********************** FINISHED RUNNING AGENT*************
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12188113
This tells me that the code above runs, for all MessageBox calls are printed and I assume you have no On Error Resume Next in your code. The ERROR message printed actually comes from the web.nsf database, so a document has been created for you. Try opening the web.nsf database on the server, not by double-clicking it, but as follows: in the workspace, click once on the db icon, then on View, Go To, All Documents. You will see some documents "loaded" from the Internet, among which there should be one about www.lotus.com. You can look at the Document Properties to see if there are errors mentioned there, in fields like HTTPStatus and URL.

You don't have to do this programmatically. Test it manually first, by opening the server's web.nsf database as usual (dbl-click), and type a URL. The page that should result will also be shown in the Database Views, so you can study the properties if need be.

Most likely the server doesn't know how to get through to the Internet. Is it possible, from the server console, to start an Internet Exploder and get info from the Internet? What are the settings in IE? Is there a proxy? Maybe your firewall blocks requests, but are they logged somewhere? I can confirm that the request DID go out, absolutely.

There are proxy settings in the Server document in the N&A-book. See your N&A-book, open the server's Server document, 3rd tab Ports, 3rd sub-tab Proxies.
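
One way to check whether the proxy is the missing piece is to request the same URL from a small script run on the server machine, outside Domino, once with and once without the proxy. A sketch in Python (for illustration only; this is not a Domino setting); the function names are hypothetical, and the proxy address used in any call must come from your administrators.

```python
import urllib.request


def build_opener_with_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Create an opener that sends HTTP and HTTPS requests through the given proxy."""
    proxy_handler = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy_handler)


def fetch(url: str, opener: urllib.request.OpenerDirector, timeout: float = 20.0) -> str:
    """Fetch a page through the opener and return its body as text."""
    with opener.open(url, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

If the page only comes back when the proxy is supplied, the server needs the same proxy entered in its own settings before the retriever can reach the outside world.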
0
 

Author Comment

by:PParuman
ID: 12188341
Hi Sjef...

I opened the database on the server like you said , but the database is completely blank.. there are no documents .

I do connect through a proxy for the internet... will be checking later on at the server about the internet settings there.... our administrators are very sticky & refuse to give us info about things contained on the server & about the server settings...

Will Keep you posted about the settings once I find them out.
0
 
LVL 46

Expert Comment

by:Sjef Bosman
ID: 12188819
Probably you're not allowed to read the documents in the database, you need some privileges.

To open a URL yourself, you might have to change your Location settings, or create a copy of the current one (Office?) and call it "Office (InterNotes)". The modified settings:
- under Internet Browser, select Notes, then select from InterNotes server
- under Servers, add the name of the server you're using as the InterNotes server

Now, go into the web.nsf database and try to retrieve a URL.
0
 

Author Comment

by:PParuman
ID: 12189010
I have found out that the server was not setup as an internotes server .... under the basics tab of the server configuration There is a section for the server location & there is a field where you can add the internotes server .... that field seems to be empty....

What performance factors are there if we have to make our domino server an internotes server ?
0
 
LVL 46

Accepted Solution

by:
Sjef Bosman earned 500 total points
ID: 12189302
The Basics tab/Location section contains very few interesting fields. The server I know doesn't have the InterNotes server name filled in, but if you fill it in, it will be the default setting for all users. Usually, you have to use your own Location document in the Notes client, and make it use the InterNotes server, as described earlier.

Starting the WEB.EXE retriever process on the server will effectively turn your server into an InterNotes server. Performance? I know it doesn't come for free. I'd have to guess, but it all depends on the number of pages you intend to retrieve. The settings for the WEB process can be found in the WEB.NSF database itself, visible only to a user with the WebMaster role.

Need to know more? Open the Administrator Help database, and look for the document called "Setting up a Web Navigator server", look in the Index under InterNotes server.
0
