Solved

Pulling XML from Web Site and Storing in a Database

Posted on 2003-11-19
15
1,488 Views
Last Modified: 2007-12-19
Hello,

I going to attempt to get the department of Fish and Wildlife in my area to post some data in XML format on one of their web pages.

They currently post this information manually by having one of the clerks edit a static web page in a wysiwyg editor.

I can't imagine getting them to set up a web service, but I think I can at least get them to post a data island, or a plain .xml file with the data.

I would like to be able to automate the process where my server would contact thier server and pull that data, and then store it in my DB.

I am looking for a "best practice" type of application here.

I know there are ways to write an asp.net application and then schedule it to run say once a day, but I would suspect there would also be a way to do it within an MS SQL 2k stored procedure that was also scheduled to run periodically.

What is the best way to accomplish this task?

Thanks

Sean
0
Comment
Question by:Keyman
  • 8
  • 4
15 Comments
 
LVL 26

Expert Comment

by:rdcpro
ID: 9780909
I've always avoided extended stored procedures, as it seems that often the choke point is sql server.  Of course, part of my reasoning is that I don't exactly know how to write an extended stored proc! ;^)  

I assume your platform is .NET?  I've done exactly what you're asking but with a javascript running under windows scripting host. It runs on a regular windows scheduled task.  It queries the SQL Server (using the SQLXML integration) and gets the configuration information, among which is the set of URL's I'm looking to query, and the regex I'm going to use on each URL to extract the data.  Once I have the data, I persist it back to SQL Server using an Updategram.

If you're simply looking for a static XML file, it's much easier.  Even if you can get them to post the data in some other "standard" format such as a CSV file (assuming they have the data in Excel, this would be easy for them to do), you can pretty easily automate the process.  Screen scraping a WYSIWYG-edited page is risky, though.  Too easy for a clerk to change something that blows up your regex.

I use the ServerXMLHTTPRequest object (for server-side use...doh), or the XMLHTTPRequest object for client-side requests.  The server safe version implements it's own TCP stack, and even allows you to set userid and password, if necessary.  If you're getting back XML, it will be available as a fully parsed XML DOM object in the responseXML property.  If it's not XML, you can get the complete response using the responseText property.  It's a Very Useful Object.


Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9781159
Hmm...

I have so many uses for this application...

I am afraid to use a windows scheduled task because there are so many things I need from different sites that I think it would become a major headache running multiple tasks and keeping track of everything.

It would be very nice to come up with some way to do it within MS SQL. That way I would have to worry about keeping track of script files... scheduling them and so on.

Sean
0
 
LVL 26

Expert Comment

by:rdcpro
ID: 9783937
In my case there was only one script, and it queried 40 or 50 sites (checking the status of test servers, really).  I was mainly interested in extracting version and build information which was stored in various ways.  I used a separate admin UI (exposed through Sharepoint, actually) that allowed me (or another admin) to add new sites, reconfigure existing ones, that sort of thing.  I used asynchronous calls, so the load on the server running the tasks was insignificant, even though it took nearly 20 minutes to hit all the sites (if many of them were down).  But yes, the task scheduler is kind of a pain in some ways...the task runs with a specific user or machine credentials, and if the network security requires new passwords periodically, the task will cease to run when the password is updated.

It would be better to do it as a windows service, I guess...I'd still be wary of hogging SQL Server resources for the task, but if you're comfortable writing extended procedures, I guess it would work too.  I'd probably still use that ServerXMLHTTP Request object in the sproc, though, as it's pretty safe.   I'm afraid I don't know much about how to construct a reliable extended procedure, though, so I can't advise you very much there.

Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9791098
Well I guess this question should now be in the MS SQL Database area of the site...

I have come up with something but its being cut off at 4k bytes even though I set the variable to 8000
Any xml document that I grab that is over 4k results in this Error.

-2147211494
 
OLE Automation Error Information
  HRESULT: 0x8004271a
  Source: ODSOLE Extended Procedure
  Description: Error in srv_convert.
=========================================================================
Here is my sproc


CREATE PROCEDURE XMLGrabber

AS
set nocount on
declare @objWinHttp int
declare @strLine varchar(8000)
declare @hr int

exec @hr =sp_OACreate 'WinHttp.WinHttpRequest.5', @objWinHttp out
sp_OAMethod @objWinHttp, 'Open',NULL,'GET','http://localhost:85/DOTNET/XmlDocument2.aspx'
sp_OAMethod @objWinHttp, 'Send',NULL
exec @hr =sp_OAGetProperty @objWinHttp, 'ResponseText',@strLine OUtPUT
print @hr
print @strLine
exec sp_OADestroy @objWinHttp

0
 
LVL 26

Expert Comment

by:rdcpro
ID: 9791943
Can you use text or ntext?  I've had situations (though I don't really think it applies here) where my data was utf-16 unicode, and was double byte.  But the 4k limit seems to ring a bell, but I can't remember where I've seen that issue.

Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9793428
I cannot use text or ntext because that data type cannot be assigned within the sproc itself.

It can only be assigned as a parameter, and that would not work unless I go back to the way I used to do this and use an ASP.net script to send the xml data to the sproc.



0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:Keyman
ID: 9793511
Oh this is not good... I just discovered the reason the limit it 4k....

MS SQL uses Transact-SQL data types, and OLE Automation uses VB data types.

The corelation ends up being that an nvarchar or varchar that is 8k in sql ends up being a 4k string in VB


Its amazing that this is so hard to accomplish!

Sean
0
 
LVL 26

Expert Comment

by:rdcpro
ID: 9793934
Can you use the extended procedure to call a custom object written in VB that does the HTTPRequest?  That way the VB com object could update SQL Server separately using updategrams or OpenXML.  

Regards,
Mike Sharp
0
 

Author Comment

by:Keyman
ID: 9800056
Hmm.. I will have to read up on updategrams.


Sean
0
 

Author Comment

by:Keyman
ID: 9977042
As shown I have requested a refund of points for this question as it went un answered and I have figured out a solution on my own.

Sean
0
 

Author Comment

by:Keyman
ID: 9979368
I have used the sp_OACreate (and others) and the WinHttp5.1 object to retrieve the XML data from a URL.

I then used OPENXML  along with XPATH to parse the XML data for insertion into my tables.

The sproc is then executed under a job that is run once a day.

Sean
0
 

Author Comment

by:Keyman
ID: 9979377
opps.. I guess I got lazy and didnt read the whole page here... I for got that I did NOT get past the VB datatype limit!!


my bad!! I am just trying to clean up all these haning questions.


Sean
0
 
LVL 5

Accepted Solution

by:
Netminder earned 0 total points
ID: 9981712
User resolved; points refunded and question closed.

Netminder
EE Admin
0

Featured Post

How to run any project with ease

Manage projects of all sizes how you want. Great for personal to-do lists, project milestones, team priorities and launch plans.
- Combine task lists, docs, spreadsheets, and chat in one
- View and edit from mobile/offline
- Cut down on emails

Join & Write a Comment

The Confluence of Individual Knowledge and the Collective Intelligence At this writing (summer 2013) the term API (http://dictionary.reference.com/browse/API?s=t) has made its way into the popular lexicon of the English language.  A few years ago, …
Many times as a report developer I've been asked to display normalized data such as three rows with values Jack, Joe, and Bob as a single comma-separated string such as 'Jack, Joe, Bob', and vice versa.  Here's how to do it. 
It is a freely distributed piece of software for such tasks as photo retouching, image composition and image authoring. It works on many operating systems, in many languages.
This video shows how to remove a single email address from the Outlook 2010 Auto Suggestion memory. NOTE: For Outlook 2016 and 2013 perform the exact same steps. Open a new email: Click the New email button in Outlook. Start typing the address: …

708 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now