Solved

Pulling XML from Web Site and Storing in a Database

Posted on 2003-11-19
15
1,493 Views
Last Modified: 2007-12-19
Hello,

I going to attempt to get the department of Fish and Wildlife in my area to post some data in XML format on one of their web pages.

They currently post this information manually by having one of the clerks edit a static web page in a wysiwyg editor.

I can't imagine getting them to set up a web service, but I think I can at least get them to post a data island, or a plain .xml file with the data.

I would like to be able to automate the process where my server would contact thier server and pull that data, and then store it in my DB.

I am looking for a "best practice" type of application here.

I know there are ways to write an asp.net application and then schedule it to run say once a day, but I would suspect there would also be a way to do it within an MS SQL 2k stored procedure that was also scheduled to run periodically.

What is the best way to accomplish this task?

Thanks

Sean
0
Comment
Question by:Keyman
  • 8
  • 4
15 Comments
 
LVL 26

Expert Comment

by:rdcpro
ID: 9780909
I've always avoided extended stored procedures, as it seems that often the choke point is sql server.  Of course, part of my reasoning is that I don't exactly know how to write an extended stored proc! ;^)  

I assume your platform is .NET?  I've done exactly what you're asking but with a javascript running under windows scripting host. It runs on a regular windows scheduled task.  It queries the SQL Server (using the SQLXML integration) and gets the configuration information, among which is the set of URL's I'm looking to query, and the regex I'm going to use on each URL to extract the data.  Once I have the data, I persist it back to SQL Server using an Updategram.

If you're simply looking for a static XML file, it's much easier.  Even if you can get them to post the data in some other "standard" format such as a CSV file (assuming they have the data in Excel, this would be easy for them to do), you can pretty easily automate the process.  Screen scraping a WYSIWYG-edited page is risky, though.  Too easy for a clerk to change something that blows up your regex.

I use the ServerXMLHTTPRequest object (for server-side use...doh), or the XMLHTTPRequest object for client-side requests.  The server safe version implements it's own TCP stack, and even allows you to set userid and password, if necessary.  If you're getting back XML, it will be available as a fully parsed XML DOM object in the responseXML property.  If it's not XML, you can get the complete response using the responseText property.  It's a Very Useful Object.


Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9781159
Hmm...

I have so many uses for this application...

I am afraid to use a windows scheduled task because there are so many things I need from different sites that I think it would become a major headache running multiple tasks and keeping track of everything.

It would be very nice to come up with some way to do it within MS SQL. That way I would have to worry about keeping track of script files... scheduling them and so on.

Sean
0
 
LVL 26

Expert Comment

by:rdcpro
ID: 9783937
In my case there was only one script, and it queried 40 or 50 sites (checking the status of test servers, really).  I was mainly interested in extracting version and build information which was stored in various ways.  I used a separate admin UI (exposed through Sharepoint, actually) that allowed me (or another admin) to add new sites, reconfigure existing ones, that sort of thing.  I used asynchronous calls, so the load on the server running the tasks was insignificant, even though it took nearly 20 minutes to hit all the sites (if many of them were down).  But yes, the task scheduler is kind of a pain in some ways...the task runs with a specific user or machine credentials, and if the network security requires new passwords periodically, the task will cease to run when the password is updated.

It would be better to do it as a windows service, I guess...I'd still be wary of hogging SQL Server resources for the task, but if you're comfortable writing extended procedures, I guess it would work too.  I'd probably still use that ServerXMLHTTP Request object in the sproc, though, as it's pretty safe.   I'm afraid I don't know much about how to construct a reliable extended procedure, though, so I can't advise you very much there.

Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9791098
Well I guess this question should now be in the MS SQL Database area of the site...

I have come up with something but its being cut off at 4k bytes even though I set the variable to 8000
Any xml document that I grab that is over 4k results in this Error.

-2147211494
 
OLE Automation Error Information
  HRESULT: 0x8004271a
  Source: ODSOLE Extended Procedure
  Description: Error in srv_convert.
=========================================================================
Here is my sproc


CREATE PROCEDURE XMLGrabber

AS
set nocount on
declare @objWinHttp int
declare @strLine varchar(8000)
declare @hr int

exec @hr =sp_OACreate 'WinHttp.WinHttpRequest.5', @objWinHttp out
sp_OAMethod @objWinHttp, 'Open',NULL,'GET','http://localhost:85/DOTNET/XmlDocument2.aspx'
sp_OAMethod @objWinHttp, 'Send',NULL
exec @hr =sp_OAGetProperty @objWinHttp, 'ResponseText',@strLine OUtPUT
print @hr
print @strLine
exec sp_OADestroy @objWinHttp

0
 
LVL 26

Expert Comment

by:rdcpro
ID: 9791943
Can you use text or ntext?  I've had situations (though I don't really think it applies here) where my data was utf-16 unicode, and was double byte.  But the 4k limit seems to ring a bell, but I can't remember where I've seen that issue.

Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9793428
I cannot use text or ntext because that data type cannot be assigned within the sproc itself.

It can only be assigned as a parameter, and that would not work unless I go back to the way I used to do this and use an ASP.net script to send the xml data to the sproc.



0
3 Use Cases for Connected Systems

Our Dev teams are like yours. They’re continually cranking out code for new features/bugs fixes, testing, deploying, testing some more, responding to production monitoring events and more. It’s complex. So, we thought you’d like to see what’s working for us.

 

Author Comment

by:Keyman
ID: 9793511
Oh this is not good... I just discovered the reason the limit it 4k....

MS SQL uses Transact-SQL data types, and OLE Automation uses VB data types.

The corelation ends up being that an nvarchar or varchar that is 8k in sql ends up being a 4k string in VB


Its amazing that this is so hard to accomplish!

Sean
0
 
LVL 26

Expert Comment

by:rdcpro
ID: 9793934
Can you use the extended procedure to call a custom object written in VB that does the HTTPRequest?  That way the VB com object could update SQL Server separately using updategrams or OpenXML.  

Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9800056
Hmm.. I will have to read up on updategrams.


Sean
0
 

Author Comment

by:Keyman
ID: 9977042
As shown I have requested a refund of points for this question as it went un answered and I have figured out a solution on my own.

Sean
0
 

Author Comment

by:Keyman
ID: 9979368
I have used the sp_OACreate (and others) and the WinHttp5.1 object to retrieve the XML data from a URL.

I then used OPENXML  along with XPATH to parse the XML data for insertion into my tables.

The sproc is then executed under a job that is run once a day.

Sean
0
 

Author Comment

by:Keyman
ID: 9979377
opps.. I guess I got lazy and didnt read the whole page here... I for got that I did NOT get past the VB datatype limit!!


my bad!! I am just trying to clean up all these haning questions.


Sean
0
 
LVL 5

Accepted Solution

by:
Netminder earned 0 total points
ID: 9981712
User resolved; points refunded and question closed.

Netminder
EE Admin
0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

I was working on a PowerPoint add-in the other day and a client asked me "can you implement a feature which processes a chart when it's pasted into a slide from another deck?". It got me wondering how to hook into built-in ribbon events in Office.
Many times as a report developer I've been asked to display normalized data such as three rows with values Jack, Joe, and Bob as a single comma-separated string such as 'Jack, Joe, Bob', and vice versa.  Here's how to do it. 
Sending a Secure fax is easy with eFax Corporate (http://www.enterprise.efax.com). First, just open a new email message. In the To field, type your recipient's fax number @efaxsend.com. You can even send a secure international fax — just include t…
Video by: Mark
This lesson goes over how to construct ordered and unordered lists and how to create hyperlinks.

867 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now