I checked and I cannot publish directly on the forum. I do not have XML but HTML, so I need an HTML-to-data parser.
Need to develop a C# program that goes over HTML and inserts the extracted data into a database
The HTML contains data about areas (over 30,000 areas). The data about each area contains the area ID and size, followed by lists of other data for that area (development plan, special regions, etc.).
We planned to have around 5 tables to store this data.
I checked the Gold project and it seems complicated. Are there any simple C# HTML parsers/data extractors? I can attach part of the HTML file if that would make it easier to say which parsing approach is best.
This question has been solved and verified by the asker.
http://msdn.microsoft.com/
http://blogs.msdn.com/tims
I hope this little bit of information helps.
An HTML document is not XML (so one article is not related to HTML), and the other article works on the currently loaded document.
The HTML I have is on the disk drive and I need to open and parse it, that's all (no Silverlight working on the active document, which is what the other link is about).
I can have a console or Windows application that takes the input file name and the database connection string as arguments and loads the data from the HTML into the database.
I need to match a regular expression, e.g. "Area", and then take the name that appears after this text and before the next text, which is "SubArea". That gives me one record in the AREA table. Then I continue to read "SubArea", of which there might be several per area, and place the corresponding number of records into the database table "SubArea", etc.
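A minimal sketch of such a loader, under stated assumptions: the table `Area(AreaId, Name)`, the `@name` parameter style, and the `connect` and `parseAreas` hooks are all hypothetical, since the thread does not say which database or schema is used. It relies only on the provider-neutral `System.Data` interfaces, so any ADO.NET provider could be plugged in.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.IO;

static class HtmlLoader
{
    // connect:    hypothetical hook that turns a connection string into a connection.
    // parseAreas: hypothetical hook that yields (area ID, area name) pairs from the HTML.
    public static int Run(string[] args,
                          Func<string, IDbConnection> connect,
                          Func<string, IEnumerable<KeyValuePair<int, string>>> parseAreas)
    {
        if (args.Length != 2)
        {
            Console.Error.WriteLine("usage: loader <input.html> <connection-string>");
            return 1;
        }

        // Reads the whole file; pass an explicit Encoding if the file is not UTF-8.
        string html = File.ReadAllText(args[0]);

        using (IDbConnection conn = connect(args[1]))
        {
            conn.Open();
            using (IDbTransaction tx = conn.BeginTransaction())
            {
                foreach (KeyValuePair<int, string> area in parseAreas(html))
                    InsertArea(conn, tx, area.Key, area.Value);

                // Commit only after every row was written; if an exception escapes
                // first, the transaction is disposed without being committed.
                tx.Commit();
            }
        }
        return 0;
    }

    // Assumed schema: Area(AreaId, Name). The "@" parameter prefix is provider-specific.
    static void InsertArea(IDbConnection conn, IDbTransaction tx, int areaId, string name)
    {
        using (IDbCommand cmd = conn.CreateCommand())
        {
            cmd.Transaction = tx;
            cmd.CommandText = "INSERT INTO Area (AreaId, Name) VALUES (@id, @name)";
            AddParameter(cmd, "@id", areaId);
            AddParameter(cmd, "@name", name);
            cmd.ExecuteNonQuery();
        }
    }

    static void AddParameter(IDbCommand cmd, string name, object value)
    {
        IDbDataParameter p = cmd.CreateParameter();
        p.ParameterName = name;
        p.Value = value;
        cmd.Parameters.Add(p);
    }
}
```

If the target happened to be SQL Server, `connect` could be something like `cs => new SqlConnection(cs)`. Using parameters rather than string concatenation keeps names containing quotes from breaking the INSERT.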
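The label-driven approach described above could be sketched roughly as follows. The markup layout is an assumption (a label "Area" or "SubArea", optionally followed by a colon and some tags, then the value up to the next tag); the real file may look different, and `AreaRecord` and `AreaExtractor` are illustrative names only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class AreaRecord
{
    public string Name = "";
    public List<string> SubAreas = new List<string>();
}

static class AreaExtractor
{
    // "SubArea" is listed first, and \b keeps "Area" from matching inside "SubArea".
    // After the label: optional colon, any number of tags, then the text value.
    static readonly Regex Label = new Regex(
        @"\b(?<kind>SubArea|Area)\b\s*:?\s*(?:<[^>]+>\s*)*(?<value>[^<]+)",
        RegexOptions.IgnoreCase);

    public static List<AreaRecord> Parse(string html)
    {
        var areas = new List<AreaRecord>();
        AreaRecord current = null;

        foreach (Match m in Label.Matches(html))
        {
            string value = m.Groups["value"].Value.Trim();
            if (value.Length == 0)
                continue; // label without a usable value

            bool isArea = m.Groups["kind"].Value
                .Equals("Area", StringComparison.OrdinalIgnoreCase);

            if (isArea)
            {
                // A new "Area" label starts a new record.
                current = new AreaRecord { Name = value };
                areas.Add(current);
            }
            else if (current != null)
            {
                // "SubArea" values attach to the most recent area.
                current.SubAreas.Add(value);
            }
        }
        return areas;
    }
}

static class Program
{
    static void Main()
    {
        string html =
            "<TABLE><tr><td>Area</td><td>North</td></tr>" +
            "<tr><td>SubArea</td><td>N1</td></tr>" +
            "<tr><td>subarea</td><td>N2</td></TR></table>";

        foreach (AreaRecord area in AreaExtractor.Parse(html))
            Console.WriteLine(area.Name + ": " + string.Join(", ", area.SubAreas));
        // prints: North: N1, N2
    }
}
```

Each `AreaRecord` then maps to one row in the AREA table and each entry in `SubAreas` to one row in the SubArea table.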
HTML is a subset of XML. All rules apply.
In other words, you have a database in HTML? If so, you'd have to declare table sizes and fields for each of your table-creatable tags, or create an SQL query generator that would conjure up a query depending on the fields in your table (in your file). I'm afraid I'm not aware of such a tool right now.
It's a tricky task actually. I'm sorry I'm no help. Cheers
Regular expressions are the way to go. You have to know how the data will always look, and since it seems homegrown, that should be fairly easy. But as piotroxp stated, it is hard to get you further without an example. The best way forward is to post an example file, along with your non-working regexp, so that we can help you get it working.
Wow. First thing, there will be a language barrier for me, but it should be doable.
Second thing is that the HTML is non-compliant throughout. All the HTML code should be lowercase, and if it is not, the opening and closing tags should at least match in case; there are lots of <table width=590>.......</TABLE> type things in there. Make sure you don't do case-sensitive matching in the regex engine.
Got a family night, but will look at this later and tomorrow.
After a few more minutes of looking, I don't think one single all-powerful regex will do this, not easily anyway. There are multiple formats for the data tables, which doesn't make much sense to me (i.e., I can't read it).
I thought this would get me far:
<table[^>]+>(\s+<tr><td[^>]*>([^<]+
but it only matches 6 of the roughly 100 tables.
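One way around a single all-in-one pattern is to match in stages: tables first, then rows inside each table, then cells inside each row, all case-insensitively. The sketch below is an illustration under assumptions, not something tested against the actual file: `TableScanner` is a made-up name, nested tables are not handled, and missing closing `</tr>` or `</td>` tags are tolerated only in a simple way.

```csharp
using System;
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class TableScanner
{
    // IgnoreCase copes with <table ...>...</TABLE>; Singleline lets "." span line breaks.
    const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    static readonly Regex Table = new Regex(
        @"<table\b[^>]*>(?<body>.*?)</table\s*>", Opts);

    // A row ends at its closing tag, at the next <tr>, or at the end of the table body.
    static readonly Regex Row = new Regex(
        @"<tr\b[^>]*>(?<body>.*?)(?=</tr\s*>|<tr\b|$)", Opts);

    // Same idea for <td>/<th> cells.
    static readonly Regex Cell = new Regex(
        @"<t[dh]\b[^>]*>(?<body>.*?)(?=</t[dh]\s*>|<t[dh]\b|$)", Opts);

    static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Returns one entry per table; each entry is a list of rows,
    // and each row is the array of its cell texts.
    public static List<List<string[]>> Scan(string html)
    {
        var tables = new List<List<string[]>>();
        foreach (Match t in Table.Matches(html))
        {
            var rows = new List<string[]>();
            foreach (Match r in Row.Matches(t.Groups["body"].Value))
            {
                var cells = new List<string>();
                foreach (Match c in Cell.Matches(r.Groups["body"].Value))
                {
                    // Drop inline markup, then decode entities such as &nbsp; and &amp;.
                    string text = Tag.Replace(c.Groups["body"].Value, " ");
                    cells.Add(WebUtility.HtmlDecode(text).Trim());
                }
                if (cells.Count > 0)
                    rows.Add(cells.ToArray());
            }
            tables.Add(rows);
        }
        return tables;
    }
}
```

With the cells available as plain strings, the different table formats can be told apart by their first cell or column count in ordinary C# code, which is usually easier to debug than one large pattern.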
by: piotroxp, posted on 2009-01-18 at 13:43:50, ID: 23406682
An example of the actual file would be welcome.
There is a multitude of XML-to-data parsers. You may use them to extract the data, and then insert the data into a database.
Nonetheless, w/o the file, this conversation is pointless. The actual material is needed to make a suitable judgement.