matthew phung
 asked on

Extracting data from HTML using C#

Hi,
I have an Active Directory report in HTML that I need to extract data from. The HTML file contains a lot of tables, but I'm only interested in tables that have the following header row:
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>


If a table contains the header above, I want to add its rows to a master list. The end result should be a single master list that contains the data from every table with that header. Below is an example of the HTML I need to process:
<table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Enforce password history</td><td>4 passwords remembered</td><td>Default Domain Policy</td></tr>
<tr><td>Maximum password age</td><td>0 days</td><td>Default Domain Policy</td></tr>
<tr><td>Minimum password age</td><td>1 days</td><td>Default Domain Policy</td></tr>
<tr><td>Minimum password length</td><td>8 characters</td><td>Default Domain Policy</td></tr>
<tr><td>Password must meet complexity requirements</td><td>Enabled</td><td>Default Domain Policy</td></tr>
<tr><td>Store passwords using reversible encryption</td><td>Disabled</td><td>Default Domain Policy</td></tr>
</table>
</div></div><div class="he3"><span class="sectionTitle" tabindex="0">Account Policies/Account Lockout Policy</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4i"><table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Account lockout duration</td><td>30 minutes</td><td>Default Domain Policy</td></tr>
<tr><td>Account lockout threshold</td><td>6 invalid logon attempts</td><td>Default Domain Policy</td></tr>
<tr><td>Reset account lockout counter after</td><td>30 minutes</td><td>Default Domain Policy</td></tr>
</table>
</div></div><div class="he3"><span class="sectionTitle" tabindex="0">Local Policies/Audit Policy</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4i"><table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Audit process tracking</td><td>Success, Failure</td><td>Workstations Audit Policies</td></tr>
</table>
</div></div><div class="he3"><span class="sectionTitle" tabindex="0">Local Policies/Security Options</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4h"><span class="sectionTitle" tabindex="0">Interactive Logon</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4i"><table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Interactive logon: Do not display last user name</td><td>Enabled</td><td>Default Domain Policy</td></tr>
<tr><td>Interactive logon: Message text for users attempting to log on</td><td>This is a private enterprise computer system limited to business use.  Access to and use of this system requires explicit and current authorization.  All users expressly consent to monitoring by system personnel to detect improper access or use.  If such monitoring reveals possible criminal activity or improper access or use,system personnel may provide evidence of such conduct to law enforcement officials and/or company management.</td><td>Workstations</td></tr>
<tr><td>Interactive logon: Message title for users attempting to log on</td><td>Important Notice:</td><td>Workstations</td></tr>
<tr><td>Interactive logon: Number of previous logons to cache (in case domain controller is not available)</td><td>10 logons</td><td>Workstations</td></tr>
</table>
</div></div><div class="he4h"><span class="sectionTitle" tabindex="0">Network Security</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4i"><table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Network security: Force logoff when logon hours expire</td><td>Enabled</td><td>Default Domain Policy</td></tr>
</table>
</div></div><div class="he4h"><span class="sectionTitle" tabindex="0">Other</span><a class="expando" href="#"></a></div>
<div class="container"><div class="he4i"><table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Network security: Allow Local System to use computer identity for NTLM</td><td>Enabled</td><td>Workstations Tablet Windows 81 - GPO WF Deny</td></tr>
<tr><td>Network security: Allow LocalSystem NULL session fallback</td><td>Disabled</td><td>Workstations Tablet Windows 81 - GPO WF Deny</td></tr>
</table>
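
A minimal sketch of the requirement described above, assuming the third-party HtmlAgilityPack NuGet package is available (nothing in the question says which parser is in use). The names `GpoReportParser`, `ExtractPolicyRows`, and `WantedHeader` are hypothetical, made up for this illustration. It keeps only tables whose first row is exactly the Policy / Setting / Winning GPO header and appends their three-cell data rows to one master list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: third-party NuGet package, not part of .NET itself

public static class GpoReportParser
{
    // Header captions that identify a table of interest (taken from the question).
    private static readonly string[] WantedHeader = { "Policy", "Setting", "Winning GPO" };

    // Returns one string[] (Policy, Setting, Winning GPO) per data row,
    // collected from every table whose first row matches WantedHeader.
    public static List<string[]> ExtractPolicyRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var masterList = new List<string[]>();

        // SelectNodes returns null (not an empty collection) when nothing matches.
        var tables = doc.DocumentNode.SelectNodes("//table");
        if (tables == null) return masterList;

        foreach (var table in tables)
        {
            // Elements("tr") yields direct child rows only.
            var rows = table.Elements("tr").ToList();
            if (rows.Count == 0) continue;

            // Compare the <th> captions of the first row with the wanted header.
            var header = rows[0].Elements("th").Select(CellText).ToArray();
            if (!header.SequenceEqual(WantedHeader)) continue;

            foreach (var row in rows.Skip(1))
            {
                var cells = row.Elements("td").Select(CellText).ToArray();
                // Keep only rows that have exactly one cell per header column.
                if (cells.Length == WantedHeader.Length) masterList.Add(cells);
            }
        }

        return masterList;
    }

    // Decodes HTML entities and trims surrounding whitespace.
    private static string CellText(HtmlNode cell) =>
        HtmlEntity.DeEntitize(cell.InnerText).Trim();
}
```

Calling `GpoReportParser.ExtractPolicyRows(File.ReadAllText(path))` on the report should then give a flat list where each entry holds the three column values of one row. The header comparison is exact and case-sensitive, which may need loosening if the report varies.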


C#, HTML, .NET Programming

Last comment: Aaron Jabamani

8/22/2022 - Mon
matthew phung

ASKER
Some of the tables contain nested tables too. I would like to ignore the nested tables. An example is below:
<table class="info3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Policy</th><th scope="col">Setting</th><th scope="col">Winning GPO</th></tr>
<tr><td>Automatic certificate management</td><td>Enabled</td><td>[Default setting]</td></tr>
<tr><td colspan="3"><table class="subtable3" cellpadding="0" cellspacing="0">
<tr><th scope="col">Option</th><th scope="col">Setting</th></tr>
<tr><td scope="row">Enroll new certificates, renew expired certificates, process pending certificate requests and remove revoked certificates</td><td>Disabled</td></tr>
<tr><td scope="row">Update and manage certificates that use certificate templates from Active Directory</td><td>Disabled</td></tr>
</table></td></tr></table>
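
One possible way to ignore the nested tables, again sketched with the third-party HtmlAgilityPack package as an assumption. `NestedTableFilter`, `IsNested`, and `OwnDataRows` are hypothetical names for this illustration. The idea is to treat a table as nested when it has a `table` ancestor, and to drop any row of the outer table that wraps another table.

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // assumption: third-party NuGet package

public static class NestedTableFilter
{
    // True when the table sits inside another table,
    // as the class="subtable3" table does in the example above.
    public static bool IsNested(HtmlNode table) =>
        table.Ancestors("table").Any();

    // Rows that are direct children of this table and do not
    // themselves contain a nested table anywhere below them.
    public static IEnumerable<HtmlNode> OwnDataRows(HtmlNode table) =>
        table.Elements("tr")
             .Where(row => !row.Descendants("table").Any());
}
```

For the example above, `OwnDataRows` on the outer `info3` table should yield the header row and the "Automatic certificate management" row, while the `colspan="3"` row holding the sub-table is skipped.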


SOLUTION
Ioannis Paraskevopoulos

ASKER CERTIFIED SOLUTION
Aaron Jabamani
