karinos57
asked on
convert html tags into text
Hi,
i am pulling a data from wordpress and one of the columns has html tags content in the column. I am also copying the data from wordpress into sql server. so i would like to format the data in sql server or in wordpress but my goal is to clean up and convert in text so it could be something readable. Here is what it look like the data:
Note, i have researched this but could not get anything useful so i am hoping the EE can help me on this. thanks for your time.
i am pulling a data from wordpress and one of the columns has html tags content in the column. I am also copying the data from wordpress into sql server. so i would like to format the data in sql server or in wordpress but my goal is to clean up and convert in text so it could be something readable. Here is what it look like the data:
Note, i have researched this but could not get anything useful so i am hoping the EE can help me on this. thanks for your time.
<font size="1">
<table border=3>
<tr><b>
<td>Chg #</td>
<td>Planned Start</td>
<td>Planned End</td>
<td>Risk Level</td>
<td>Brief Description</td>
<td>Assign Dept</td>
<td>Requested by</td>
<td>Status</td> </b>
</tr>
<tr>
<td>C10269</td>
<td>9/16/2011 3:00 PM</td>
<td>9/17/2011 11:00 PM</td>
<td>-</td>
<td>Backup for VNETS Refresh</td>
<td>DBA</td>
<td>GKERN</td>
<td>pending</td>
</tr>
<tr>
<td>C10274</td>
<td>9/16/2011 8:30 PM</td>
<td>9/17/2011 10:00 PM</td>
<td>2</td>
<td>EAI Adapter outage</td>
<td>Application-Middleware Tech</td>
<td>janders4</td>
<td>pending</td>
</tr>
<tr>
<td>C10273</td>
<td>9/17/2011 8:00 AM</td>
<td>9/17/2011 2:30 PM</td>
<td>-</td>
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
i am also using this sql code to modify the data
SELECT distinct [post_date]
,convert(varchar(MAX),[post_content]) [post_content]
Looks like you're trying to take the data from one field in WordPress and turn it into a table in SQL. That's probably possible in SQL, but it's probably not the best place to put that logic. It'd be better to use whatever programming language you're using to get the data from WordPress to SQL. Then, you could loop through the lines one by one. If you hit a "<tr>" you know that you're about to grab data for the next record. A "<td>" is one of the fields for that record. A "</tr>" means that you're done with that record and should add it to SQL.
How are you getting the data out of WordPress?
How are you getting the data out of WordPress?
I think I see, based on your last comment, but just to clarify. You're trying to take a field from WordPress and add it to a new table in the same SQL database that WordPress is using?
ASKER
actually the original data is sitting in wordpress and i am getting the data from there into sql server using mysql code. Once i get the data in the sql server then i would like to clean it up. I saw some example on this website but could not understand how to implement it. I have 3 columns in this table and only one of the columns has the html tags. Also not all the cells in this column has html tags; some of them are text. If u can explain to me how can i modify this code would be great. thanks
http://lazycoders.blogspot.com/2007/06/stripping-html-from-text-in-sql-server.html
http://lazycoders.blogspot.com/2007/06/stripping-html-from-text-in-sql-server.html
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
By my quick scan, I agree with mwvisa1 that it's going to transform that text into just one big block of text. Is your goal to have all of that information in a new table or as one field in the existing table, but in a more readable format?
ASKER
well, i would like to test it. how would i tell the code to run this table and check this column? For instance, i don't see in the code below where they specify db name/table name and column name.
CREATE FUNCTION [dbo].[udf_StripHTML]
(@HTMLText VARCHAR(MAX))
RETURNS VARCHAR(MAX)
AS
BEGIN
DECLARE @Start INT
DECLARE @End INT
DECLARE @Length INT
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
WHILE @Start > 0
AND @End > 0
AND @Length > 0
BEGIN
SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')
SET @Start = CHARINDEX('<',@HTMLText)
SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))
SET @Length = (@End - @Start) + 1
END
RETURN LTRIM(RTRIM(@HTMLText))
END
In SQL Server, create a new user function. To do this, you can click the "New Query" button and select your database from the drop down menu. Paste the function above in that window and then click the "Execute" button.
Then, in your select statement, instead of "select MyField" you need "select dbo.udf_StripHTML(MyField) "
Then, in your select statement, instead of "select MyField" you need "select dbo.udf_StripHTML(MyField)
ASKER
thnx
You are welcome! Did you need something additional? Just curious since you marked this a "B". The grade is fine, but if there is something you feel would help you, please do ask or if you run into trouble later you can use a related question; however, do not be afraid to ask for clarification in the future as I would have been happy to keep commenting. I had not responded to your last comment because it looked like you were still going down the stripHTML path and http:#36927707 explained what to do there.
Anyway, glad that I could help in some way.
Best regards and happy coding,
Kevin
Anyway, glad that I could help in some way.
Best regards and happy coding,
Kevin
ASKER
thanks mwvisa1, appreciate your efforts. i am good now. thanks anyway
ASKER
Open in new window