How to import the HTML page body contents into a SQL data table

The Page_Details column is declared as nvarchar(max) in the SQL table. How can I import the HTML page body contents into the SQL data table?

Thanks
KavyaVS asked:
 
Rikin Shah (Microsoft Dynamics CRM Consultant) commented:
Hi,

I'm not proficient in SQL, but you can do something like this:

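DECLARE @xml NVARCHAR(MAX)

-- OPENROWSET(BULK ...) returns one row with a single column named BulkColumn.
-- SINGLE_CLOB reads the file as text (varchar(max)); SINGLE_BLOB would return raw bytes.
SELECT @xml = x.BulkColumn
FROM OPENROWSET(
   BULK 'C:\SampleFolder\SampleData3.txt',
   SINGLE_CLOB
) AS x

UPDATE [Content_Site].[dbo].t_Page_List
SET Page_Details = @xml
WHERE PageID = 1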

 
Rikin Shah (Microsoft Dynamics CRM Consultant) commented:
Hi,

Where exactly is the HTML page being loaded from?
 
KavyaVS (Author) commented:
The HTML page is on the C: drive of the SQL Server machine.

Thanks
 
Rikin Shah (Microsoft Dynamics CRM Consultant) commented:
And you want the whole HTML file to be dumped into the SQL column?

I think you already have the code to read the content of the file. All you need to do is remove the HTML tags from the content. Here is a function that will help you get plain text from the HTML string:

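// Requires: using System.Text.RegularExpressions;
private string GetPlainTextFromHtml(string htmlString)
{
    // Matches any single HTML tag, such as <div> or </p>
    string htmlTagPattern = "<.*?>";
    // Matches whole <script> and <style> blocks, including their contents
    var regexCss = new Regex("(\\<script(.+?)\\</script\\>)|(\\<style(.+?)\\</style\\>)", RegexOptions.Singleline | RegexOptions.IgnoreCase);
    htmlString = regexCss.Replace(htmlString, string.Empty);
    // Strip the remaining tags
    htmlString = Regex.Replace(htmlString, htmlTagPattern, string.Empty);
    // Drop lines that contain only whitespace
    htmlString = Regex.Replace(htmlString, @"^\s+$[\r\n]*", "", RegexOptions.Multiline);
    // Remove non-breaking space entities
    htmlString = htmlString.Replace("&nbsp;", string.Empty);

    return htmlString;
}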

 
KavyaVS (Author) commented:
I don't want to remove the HTML tags from the HTML page. I want to save it as it is into the SQL table column. I don't want the whole HTML page; I want to save only the contents of the body tag in the SQL column (data type nvarchar(max)).
Any suggestions, please?


The following query inserts the HTML page content into the SQL table when the Page_Details column type is declared as (XML(.), null). (The content inside the body tags of the .aspx page was saved as an XML file.)

Ex:
<PageContents>
  <![CDATA[
<div>
</div>
  ]]>
</PageContents>

Now the Page_Details column type is declared as nvarchar(max). The query below is not inserting data. The column type cannot be changed. How can I insert the HTML data there?

UPDATE [Content_Site].[dbo].t_Page_List
SET Page_Details = (
    SELECT * FROM OPENROWSET(
        BULK 'C:\PagedETAILS_Xml\Page1content.xml',
        SINGLE_BLOB
    ) AS x
)
WHERE PageID = 1
GO

Thanks
 
Rikin Shah (Microsoft Dynamics CRM Consultant) commented:
You might need to cast the x to nvarchar.
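
A minimal sketch of what that cast might look like, applied to the query above. It assumes the file is plain ANSI text, so it reads the file with SINGLE_CLOB (which returns varchar(max)) instead of SINGLE_BLOB (which returns varbinary(max)), and then converts the single BulkColumn value to nvarchar(max). The path, table, and column names are the ones from the question; the encoding of the file is an assumption.

```sql
-- Sketch only: assumes Page1content.xml is ANSI text.
-- For a file saved as UTF-16, SINGLE_NCLOB would be the option to try instead.
UPDATE [Content_Site].[dbo].t_Page_List
SET Page_Details = (
    -- OPENROWSET(BULK ... SINGLE_CLOB) yields one row with one column, BulkColumn
    SELECT CONVERT(nvarchar(max), x.BulkColumn)
    FROM OPENROWSET(
        BULK 'C:\PagedETAILS_Xml\Page1content.xml',
        SINGLE_CLOB
    ) AS x
)
WHERE PageID = 1;
```

Reading the file as text first matters because converting varbinary directly to nvarchar treats the raw bytes as UTF-16, which would likely garble a file stored in a single-byte encoding.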
 
KavyaVS (Author) commented:
I've requested that this question be closed as follows:

Accepted answer: 167 points for rikin_shah's comment #a39739659
Assisted answer: 166 points for rikin_shah's comment #a39739075
Assisted answer: 0 points for KavyaVS's comment #a39739076
Assisted answer: 167 points for rikin_shah's comment #a39739090

for the following reason:

Thanks
 
KavyaVS (Author) commented:
Thanks
 
Safak KAYA commented:
Hello, I am new to SQL, but I have the same issue. I want to import particular data from a web page's HTML source code into a SQL table.

Is it possible?

Thanks
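
For the case discussed earlier in this thread (keeping only the contents of the body tag), one possible approach is sketched below. It assumes the HTML has already been saved as an ANSI text file on the SQL Server machine; the path is the placeholder from the first reply, the table and column come from the question, and the variable names are made up for illustration. The tag search is plain string matching, not a real HTML parser, so it is only suitable for simple, well-formed pages.

```sql
-- Sketch only: @html, @open, @start and @end are hypothetical names.
DECLARE @html nvarchar(max);

-- Read the whole file as text (use SINGLE_NCLOB for a UTF-16 file).
SELECT @html = CONVERT(nvarchar(max), x.BulkColumn)
FROM OPENROWSET(BULK 'C:\SampleFolder\SampleData3.txt', SINGLE_CLOB) AS x;

-- Find the opening <body ...> tag, the end of that tag, and the closing </body>.
-- Whether the match is case-sensitive depends on the database collation.
DECLARE @open  bigint = CHARINDEX(N'<body', @html);
DECLARE @start bigint = CASE WHEN @open > 0
                             THEN CHARINDEX(N'>', @html, @open) + 1
                             ELSE 0 END;
DECLARE @end   bigint = CHARINDEX(N'</body>', @html);

-- Keep only the body contents when both tags are found;
-- otherwise the full file text is stored unchanged.
IF @open > 0 AND @end >= @start
    SET @html = SUBSTRING(@html, @start, @end - @start);

UPDATE [Content_Site].[dbo].t_Page_List
SET Page_Details = @html
WHERE PageID = 1;
```

The HTML tags inside the body are stored as they are; nothing is stripped apart from everything outside the body element.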
Question has a verified solution.
