Avatar of Leo Torres
Leo TorresFlag for United States of America asked on

Need a Powershell Parse of some HTML code

I have an HTML similar type file. The file gets generated from an app so its not exactly HTML, but I need all the values exported into an csv file.

The file has a Header section and detail section. I don't mind having all the data into 1 table if  makes it easier.

Below I have included sample file to parse and small Powershell script the only thing I can do with Powershell i an still too green. Tried using the HTMLAgilty pack and was unsuccessful.

Here are the column details:
Header Info
codebase, instance_id,guid, id, name, version, registrar, risk_index, force_start, locale_id, template_version, status, score, date, source_object_type,source_object_id, calling_hra_id, external_page_id, in_external_page, from_other_hra, doctorsession_id, obxid, case_id,
loggedin_userid, patient_id, patientname, allow_void,

--Detail Rows
SKIPPED_HISTORY,PAGE id, flush_concepts, update_date, valid,QUESTION id, Score, name, value
Sample2.html
GetHTMLData.txt
PowershellC#.NET ProgrammingXML

Avatar of undefined
Last Comment
Leo Torres

8/22/2022 - Mon
Jason Ryberg

PowerShell has a builtin XML object type that will parse the XML content for you:

[xml]$xml = Get-Content C:\users\Jason\Desktop\test.xml

Open in new window


Once the XML object is loaded, you can traverse the object similarly to other PS objects!
ASKER
Leo Torres

once its xml Content any good tutorials on how to extract data from this variable.

And I also see you are passing an xml file.  Could I just rename the file below as xml? Does not seem to have proper tags?
Jason Ryberg

About halfway down this post is a good overview of how navigating XML objects works:

https://blogs.technet.com/b/heyscriptingguy/archive/2013/04/01/working-with-xml.aspx
I started with Experts Exchange in 2004 and it's been a mainstay of my professional computing life since. It helped me launch a career as a programmer / Oracle data analyst
William Peck
aikimark

@ltorres321

What would you like your CSV file to look like, given this input?  I concur with the assertion that you are working with XML data.
aikimark

Another way to get at (parse) the question data would be to apply a regular expression pattern to the contents of the XML file.  The following pattern worked for the question data in the file you posted.
<QUESTION id="(\d+)" Score="(\d+)" name="(.*?)" (?:value="){0,1}(.*?)(?:" ){0,1}/>

Open in new window

ASKER
Leo Torres

aikimark your line of code where do I incorporate it into my code?
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
aikimark

where do I incorporate it into my code?
Please answer my prior question about what your output needs to look like?
ASKER
Leo Torres

The out needs to in a table Format so I can import into a SQL table
Jason Ryberg

Using my method, it would be fairly simple:

[xml]$xml = Get-Content C:\tmp\test.xml
$xml.RESULTSCACHE | export-csv test.csv

Open in new window

Your help has saved me hundreds of hours of internet surfing.
fblack61
aikimark

@Jason

I don't think that CSV file is what ltorres321 had in mind.

You posted an elegant PS script, however.  It is a bit too simple for the detail I expect is needed.
aikimark

The out needs to in a table Format so I can import into a SQL table

Please post a couple of lines of CSV data from that XML file.  Do this editing by hand.
Jason Ryberg

I'll be interested to see what kind of table ltorres321 would prefer to use to import into SQL if it's not CSV... seems pretty obvious, but I could be wrong!
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
aikimark

Yes.  If you have the data in memory, why persist it to CSV?  Just pump it directly into SQL Server.

However, if the import process is SSIS and the data gets validated, merged or split, I can see a reason why the existing SSIS code might be retained.
ASKER
Leo Torres

There is Header data and detail data in this file.

Jason Ryberg : as per your code that actually returns the Header data
which is between  <RESULTSCACHE  and ends at the <SKIPPED_HISTORY value=""/>

There is more data after that that is the Line Detail for this header.

As for CSV it was just to make it easier to get into the database. I was going to use SSIS to move move that data into tables but if Powershell can do it then so be it as well.
ASKER
Leo Torres

Jason your code is nice and very clean but how about the other fields?
All of life is about relationships, and EE has made a viirtual community a real community. It lifts everyone's boat
William Peck
SOLUTION
Jason Ryberg

Log in or sign up to see answer
Become an EE member today7-DAY FREE TRIAL
Members can start a 7-Day Free trial then enjoy unlimited access to the platform
Sign up - Free for 7 days
or
Learn why we charge membership fees
We get it - no one likes a content blocker. Take one extra minute and find out why we block content.
See how we're fighting big data
Not exactly the question you had in mind?
Sign up for an EE membership and get your own personalized solution. With an EE membership, you can ask unlimited troubleshooting, research, or opinion questions.
ask a question
Qlemo

You'll need to separate RESULTSCACHE data and QUESTION data, so I would use this approach:
[xml] $xml = Get-Content C:\tmp\test.xml
$xml.ResultsCache | export-csv -NoType C:\tmp\Results.csv
$xml.ResultsCache.Page | % { $_.Question } | export-csv -NoType C:\Tmp\Questions.csv

Open in new window

Qlemo

Note that Jason's approach only works starting with PS 3. The equivalent code by naming the properties (as in my code snippet) is
$xml.ResultsCache.Page.Question | export-csv -NoType c:\temp\Questions.csv

Open in new window

ASKER
Leo Torres

The data just needs to go into a table table. That's just rows and Columns.

Qlemo: that's impressive. Now how do I include the page number as well in the data.

And is there an ID I can create on the fly so when I put this into a table I can tie the header to the details?  Hope that makes sense.

Thanks for the help. I thought this was going to be couple hundred lines of code you did most if it in 3!
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
Leo Torres

FYI PS 3 is fine I am using that.

As I also Compare data to what you are producinf Qlemo;

I see this strings of data is missing

Page ID
Flush_Concepts
Update_Date
valid


<PAGE id="1" flush_concepts="0" update_date="7/25/2012 8:43:58 AM" valid="1">
ASKER CERTIFIED SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
SOLUTION
Log in to continue reading
Log In
Sign up - Free for 7 days
Get an unlimited membership to EE for less than $4 a week.
Unlimited question asking, solutions, articles and more.
ASKER
Leo Torres

Wow, Excellent I will close this here for now. I will try to get this data into SQL tables. If I cant I will post another question to be fair. Thanks to all for the exceptional HELP! I need every bit of it.
ASKER
Leo Torres

Experts Exchange is like having an extremely knowledgeable team sitting and waiting for your call. Couldn't do my job half as well as I do without it!
James Murphy