Modify HTML content saved in a DB table

Jorge Maldonado
I have a table in a DB with a general text field which contains HTML-formatted text. I need to parse the content of that field, find all the "img" tags and perform 2 operations (only for "img" tags):

1) Completely remove the "style" attribute (if there is one).
2) Add a class="img-responsive" attribute.

For example, a simple string to be parsed could be as follows:

<div>
<p>This is some text</p>
<img src="http://www.mywebsite.com/myImage.jpg" alt = "" style="width:600px; height: 400px;"/>
</div>

Or, something more complex:

<p style="margin: 0px 0px 10px; background: white; font-size: 14.6667px; font-family: Calibri, sans-serif;"><span style="font-size: 14pt; font-family: Georgia, serif; color: #404040;">
<img src="http://www.mywebsite.com/myImage.jpg" alt="" style="width:600px; height: 898px;" />
<br /></span></p>  

In both cases, the "img" tag should result in:

<img src="http://www.mywebsite.com/myImage.jpg" alt="" class="img-responsive" />

I know that one option is to use regular expressions, but I do not have any experience with them. I will be using C# to perform this task.
I would very much appreciate your help.

Respectfully,
Jorge Maldonado
Chinmay Patel, Chief Technology Ninja
Distinguished Expert 2018

Commented:
Hi Jorge,

I would take a different route if I had to do it myself. How about keeping just the src and alt attributes, getting rid of everything else, and then adding the img-responsive class attribute? Is that something we can do here?

Regards,
Chinmay.

Author

Commented:
It sounds good. I need to get rid of all the attributes for every "img" tag except the following:

<img src="http://www.mywebsite.com/myImage1.jpg" alt = ""/>

And then insert class="img-responsive".
What about using HTML Agility Pack?
Does anybody know how to use it?

Regards.
ste5an, Senior Developer

Commented:
Is it an existing project or a new one?

In the latter case I would consider using XHTML as the content format, because then it can be processed as XML and you don't need regex.
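
To illustrate the XML route, here is a minimal sketch using System.Xml.Linq. It assumes the stored fragment is well-formed XML with a single root element and no XHTML namespace; unclosed tags or HTML entities such as &nbsp; would make XElement.Parse throw. The class and method names are hypothetical and do not come from the thread.

```csharp
using System.Xml.Linq;

public static class XhtmlImgCleaner
{
    // Hypothetical helper: expects a well-formed fragment with one root element.
    public static string MakeImagesResponsive(string xhtmlFragment)
    {
        // PreserveWhitespace keeps the original line breaks and indentation.
        XElement root = XElement.Parse(xhtmlFragment, LoadOptions.PreserveWhitespace);

        // DescendantsAndSelf also covers a fragment whose root is the <img> itself.
        foreach (XElement img in root.DescendantsAndSelf("img"))
        {
            // Setting an attribute to null removes it; nothing happens if it is absent.
            img.SetAttributeValue("style", null);
            img.SetAttributeValue("class", "img-responsive");
        }

        // DisableFormatting avoids re-indenting the markup on output.
        return root.ToString(SaveOptions.DisableFormatting);
    }
}
```

Because the existing records may not be strict XHTML, this would only be safe for content that is known to parse as XML.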

Author

Commented:
It is an existing project and the DB table already has many records with HTML formatted data.

Regards,
Jorge Maldonado
Chinmay Patel, Chief Technology Ninja
Distinguished Expert 2018

Commented:
Hi Jorge,

I don't think we need to use HTML Agility Pack for this, but you are free to do so. It is as straightforward to use as they claim.
If you get stuck somewhere, do let me know.

Regards,
Chinmay.

Author

Commented:
Hi Chinmay,

I do not have a final solution yet; I am performing some tests with HTML Agility Pack.
I would appreciate it if you could help with another approach.

Respectfully,
Jorge Maldonado
Chinmay Patel, Chief Technology Ninja
Distinguished Expert 2018

Commented:
Hi Jorge,

A non-HTML Agility Pack approach would be to use string manipulation:
1. Find the <img> tag using IndexOf.
2. Extract the src and alt attributes using either IndexOf or String.Split('='), then pick the src and alt names and their values (which come right after them) from the resulting array.
3. Build your string with the above info, add CSS class.
4. Rinse and repeat for the next string.
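
The steps above can be sketched roughly as follows, using only IndexOf. This is an illustration rather than tested production code: the names ImgTagRewriter, RewriteImgTags and GetAttribute are hypothetical, and the sketch assumes every attribute is preceded by a space, its value is quoted, and no attribute value contains a ">" character.

```csharp
using System;
using System.Text;

public static class ImgTagRewriter
{
    // Rebuilds every <img> tag with only src, alt and class="img-responsive".
    public static string RewriteImgTags(string html)
    {
        var result = new StringBuilder();
        int position = 0;

        while (true)
        {
            // Step 1: find the next <img tag and the '>' that closes it.
            int start = html.IndexOf("<img", position, StringComparison.OrdinalIgnoreCase);
            if (start < 0) break;
            int end = html.IndexOf('>', start);
            if (end < 0) break; // unterminated tag: leave the rest untouched

            string tag = html.Substring(start, end - start + 1);

            // Step 2: pull out the two attributes we want to keep.
            string src = GetAttribute(tag, "src");
            string alt = GetAttribute(tag, "alt");

            // Step 3: copy the text before the tag, then emit the rebuilt tag.
            result.Append(html, position, start - position);
            result.Append("<img src=\"").Append(src)
                  .Append("\" alt=\"").Append(alt)
                  .Append("\" class=\"img-responsive\" />");

            // Step 4: continue after the tag just processed.
            position = end + 1;
        }

        // Copy whatever follows the last <img> tag.
        result.Append(html, position, html.Length - position);
        return result.ToString();
    }

    // Returns the quoted value of the named attribute, or "" if it is not found.
    private static string GetAttribute(string tag, string name)
    {
        int i = tag.IndexOf(" " + name, StringComparison.OrdinalIgnoreCase);
        if (i < 0) return "";
        i += name.Length + 1;

        // Allow whitespace around '=', as in: alt = ""
        while (i < tag.Length && char.IsWhiteSpace(tag[i])) i++;
        if (i >= tag.Length || tag[i] != '=') return "";
        i++;
        while (i < tag.Length && char.IsWhiteSpace(tag[i])) i++;
        if (i >= tag.Length || (tag[i] != '"' && tag[i] != '\'')) return "";

        char quote = tag[i];
        int close = tag.IndexOf(quote, i + 1);
        return close < 0 ? "" : tag.Substring(i + 1, close - i - 1);
    }
}
```

Hand-rolled parsing like this is fragile with unusual markup, which is the main argument for a real HTML parser such as HTML Agility Pack.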

Regards,
Chinmay.
You can also generate an Excel or CSV file from the DB table and then use search and replace ...
I finally managed to make HTML Agility Pack work. What I need is to process the HTML before it is displayed in an ASP.NET MVC website. The approach I took is to:

a) Process the information from the DB in the controller.
b) Use HTML Agility Pack to process the HTML in question.
c) Step (b) is done only with data in memory and not saved to the DB.
d) Call the View and pass the data.
e) Display the data in the View.

In this way, all the information in my DB remains unchanged. So, basically I only apply the required changes to the HTML data at run-time.
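
The thread does not include the final code, so the following is only a minimal sketch of what step (b) might look like with HTML Agility Pack (the HtmlAgilityPack NuGet package). The class and method names are hypothetical; the calls used are HtmlDocument.LoadHtml, SelectNodes, Attributes.Remove and SetAttributeValue.

```csharp
using HtmlAgilityPack; // NuGet package: HtmlAgilityPack

public static class HtmlImageHelper
{
    // Hypothetical helper: strips "style" from every <img> and sets the class.
    public static string MakeImagesResponsive(string html)
    {
        if (string.IsNullOrEmpty(html)) return html;

        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null (not an empty collection) when nothing matches.
        HtmlNodeCollection images = doc.DocumentNode.SelectNodes("//img");
        if (images == null) return html;

        foreach (HtmlNode img in images)
        {
            img.Attributes.Remove("style");                   // no effect if absent
            img.SetAttributeValue("class", "img-responsive"); // adds or overwrites
        }

        // Serialize the modified document; the value read from the DB is untouched.
        return doc.DocumentNode.OuterHtml;
    }
}
```

In the controller, the value read from the DB would be passed through this helper before being handed to the View, so the stored records are never modified.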

Best regards,
Jorge Maldonado
