Solved

Strip unsafe tags from HTML in JavaScript

Posted on 2016-09-06
Last Modified: 2016-10-06
Hi,
There is a Java library called Jsoup that I can use to remove unsafe tags such as <script></script> from HTML.
I want to do the same thing, but I need to strip the unsafe tags in JavaScript.
A Jsoup implementation doesn't seem to exist for JavaScript.

Is there any library, or any other way, to strip unsafe tags from HTML using JavaScript?

For example, if I have

# rohit
<script>alert(10)</script>

I want to get

# rohit

The use case for this is:
I am writing a Markdown editor. The user enters Markdown in a textarea, then switches modes, and I show the corresponding HTML in another pane.
This all happens on the client side.
What happens in my case is that the user can type something like # rohit, and when they switch to the other tab I convert it to HTML with a library called marked, which causes any unsafe HTML tags present, such as <script>alert(10)</script>, to execute.
marked does have a sanitize option, but it just replaces < and > with &lt; and &gt;.
That does prevent the script tag from executing. The issue is that if I type something like <b> rohit </b> in the raw Markdown, the converted HTML shows it in bold, but after sanitization it is shown literally, which is wrong.
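The trade-off described above can be reproduced with a plain escaping function. The sketch below is a hypothetical stand-in written for illustration, not marked's actual implementation: because escaping turns every markup character into an entity, it cannot tell <script> apart from <b>.

```javascript
// Illustrative stand-in for an escaping "sanitizer" (not marked's own code):
// every markup character becomes an entity, so no tag survives, safe or not.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

// The dangerous tag is neutralized...
console.log(escapeHtml('<script>alert(10)</script>'));
// -> &lt;script&gt;alert(10)&lt;/script&gt;

// ...but so is the harmless one, which is why <b> rohit </b> is shown literally.
console.log(escapeHtml('<b> rohit </b>'));
// -> &lt;b&gt; rohit &lt;/b&gt;
```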

Thanks
Question by: Rohit Bajaj

Accepted Solution

by: Alexandre Simões (earned 250 total points)
ID: 41786850
Hi mate,
In my opinion, the sanitization is working properly and you should keep it.

Your <b> rohit </b> example is actually behaving correctly, because it is not Markdown.
In Markdown, bold is written as **rohit**.

Remember that Markdown is an abstraction language that can be used to generate formats other than HTML. This means that <b> rohit </b> will appear exactly like this if you use a Markdown-to-PDF converter, for instance.

Bottom line: force users to use the Markdown syntax when they are in the Markdown editor, and you won't have any problems.

Cheers,
Alex

Author Comment

by: Rohit Bajaj
ID: 41787265
Hi,
But GitHub Gist, which uses GitHub Flavored Markdown, does exactly that: if you type any unsafe tag it strips it, and it shows <b> rohit </b> in bold. I want my application to be in line with GitHub Flavored Markdown.

Assisted Solution

by: tr0gd0r (earned 250 total points)
ID: 41788589
I suggest using a JavaScript library that converts Markdown to HTML, such as marked, or alternatively a full HTML parser.

Academically speaking, you can use the browser's native ability to parse HTML by doing something like this:

// create a div in memory
var div = document.createElement('div');
// set the html
div.innerHTML = '<script>alert("pwnd")</script><img src="http://hacker/virus.png" onclick="alert(\'pwnd onclick\')" onerror="alert(\'pwned onerror\')">';
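// log the html: the <script> element is still there (scripts inserted via innerHTML are not executed), and the img keeps its event handlers
console.log(div.innerHTML); // <script>alert("pwnd")</script><img src="http://hacker/virus.png" onclick="alert('pwnd onclick')" onerror="alert('pwned onerror')">

A <script> inserted this way is never executed, but it is not stripped out either: it stays in the markup, and a <style> element takes effect as soon as the node is attached to the page.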

HOWEVER, even though the div only exists in memory, the browser will still request resources such as image sources and fire events such as onload and onerror, running their inline handlers. Plus, if you append the in-memory node to the DOM without cleaning it first, clicking the img would run its onclick code.
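The attributes that make the img in the example dangerous can be flagged with a simple check, sketched below. `findRiskyAttributes` is a hypothetical helper that takes an element's attributes as a plain name/value object. It is a denylist written for illustration only: it does not stop the remote image request, and a real sanitizer should keep an allowlist of known-safe tags and attributes instead.

```javascript
// Hypothetical helper: report attributes that would let the browser run code.
// Attributes are passed as a plain { name: value } object.
// Denylist for illustration only; it does not block remote requests (src).
function findRiskyAttributes(attrs) {
  const risky = [];
  for (const [name, value] of Object.entries(attrs)) {
    const key = name.toLowerCase();
    if (key.startsWith('on')) {
      // onclick, onerror, onload, ... are all inline event handlers
      risky.push(key);
    } else if (/^\s*javascript:/i.test(String(value))) {
      // e.g. href="javascript:alert(1)"
      risky.push(key);
    }
  }
  return risky;
}

// The attributes of the img from the example above.
const flagged = findRiskyAttributes({
  src: 'http://hacker/virus.png',
  onclick: "alert('pwnd onclick')",
  onerror: "alert('pwned onerror')",
});
console.log(flagged); // [ 'onclick', 'onerror' ]
```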

Expert Comment

by: Alexandre Simões
ID: 41788734
In that case, use an HTML sanitizer instead and disable the marked sanitizer.
An HTML sanitizer won't touch anything other than the unsafe tags.
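What such a sanitizer does can be sketched as an allowlist filter: keep known-safe tags and attributes, drop <script> and <style> together with their content, unwrap any other tag, and escape all text. The sketch assumes the HTML has already been parsed into a tree of plain objects (in a browser, a parser such as DOMParser could produce it; that step is left out). `sanitize`, the node shape and the allowlists are hypothetical and deliberately small; this is a sketch, not a replacement for a maintained sanitizer.

```javascript
// Allowlist sanitizer sketch. Input is an already-parsed tree of plain objects:
//   element: { tag: 'b', attrs: { ... }, children: [ ... ] }
//   text:    a plain string
// The allowlists are illustrative and intentionally small.
const ALLOWED_TAGS = new Set([
  'p', 'br', 'b', 'i', 'em', 'strong', 'code', 'pre',
  'h1', 'h2', 'h3', 'ul', 'ol', 'li', 'blockquote', 'a',
]);
const ALLOWED_ATTRS = { a: new Set(['href', 'title']) };
const DROP_WITH_CONTENT = new Set(['script', 'style']); // content is never kept
const VOID_TAGS = new Set(['br']); // serialized without a closing tag

function escapeText(text) {
  return text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');
}

function escapeAttr(value) {
  return escapeText(value).replace(/"/g, '&quot;');
}

// Relative URLs (no scheme) pass; otherwise only http(s) and mailto.
function isSafeUrl(url) {
  const value = url.trim();
  return !value.includes(':') || /^(https?|mailto):/i.test(value);
}

function sanitize(node) {
  if (typeof node === 'string') return escapeText(node);
  const tag = node.tag.toLowerCase();
  if (DROP_WITH_CONTENT.has(tag)) return '';
  const inner = (node.children || []).map(sanitize).join('');
  // Unknown tags are unwrapped: the tag is dropped, its sanitized content kept.
  if (!ALLOWED_TAGS.has(tag)) return inner;
  const allowed = ALLOWED_ATTRS[tag] || new Set();
  let attrs = '';
  for (const [name, value] of Object.entries(node.attrs || {})) {
    const key = name.toLowerCase();
    if (!allowed.has(key)) continue; // drops onclick, onerror, style, ...
    if (key === 'href' && !isSafeUrl(String(value))) continue;
    attrs += ` ${key}="${escapeAttr(String(value))}"`;
  }
  if (VOID_TAGS.has(tag)) return `<${tag}${attrs}>`;
  return `<${tag}${attrs}>${inner}</${tag}>`;
}

// Usage: a tree for "# rohit", the script, and some inline HTML.
const tree = [
  { tag: 'h1', children: ['rohit'] },
  { tag: 'script', children: ['alert(10)'] },
  { tag: 'b', children: [' rohit '] },
  { tag: 'img', attrs: { src: 'http://hacker/virus.png', onerror: "alert('x')" } },
  { tag: 'a', attrs: { href: 'javascript:alert(1)', onclick: 'alert(2)' }, children: ['link'] },
];
console.log(tree.map(sanitize).join(''));
// -> <h1>rohit</h1><b> rohit </b><a>link</a>
```

Because the output is rebuilt from the allowlist rather than patched, anything the filter does not recognize, such as the img with its onerror handler, simply disappears, while <b> rohit </b> keeps its formatting.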

Another thing you should consider is to "re-sanitize" on the server side.
You shouldn't trust the client, ever. It's OK to sanitize on the client side to be more user-friendly, but before saving you should check the content with your server-side code.

Cheers

Expert Comment

by: tr0gd0r
ID: 41788745
@Alexandre Exactly. Ideally you would use the JavaScript-based sanitization only for the preview and send just the Markdown to the server. The server would then parse and clean the Markdown itself and not store HTML at all.
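The flow described here can be sketched as follows. `buildSaveBody`, `parseSaveBody` and the length limit are hypothetical names and values chosen for illustration; the point is that the request carries only the Markdown string, the server rejects anything else, and HTML is generated (and sanitized) only when the note is rendered.

```javascript
// Hypothetical client/server sketch: only raw Markdown crosses the wire,
// and HTML is never stored.
const MAX_MARKDOWN_LENGTH = 100000; // illustrative limit

// Client side: build the request body from the textarea contents.
function buildSaveBody(markdown) {
  return JSON.stringify({ markdown });
}

// Server side: accept exactly one string field and nothing else.
function parseSaveBody(body) {
  let data;
  try {
    data = JSON.parse(body);
  } catch {
    throw new Error('body is not valid JSON');
  }
  if (data === null || typeof data !== 'object' || Array.isArray(data)) {
    throw new Error('body must be a JSON object');
  }
  if (Object.keys(data).length !== 1 || typeof data.markdown !== 'string') {
    throw new Error('expected exactly one string field: markdown');
  }
  if (data.markdown.length > MAX_MARKDOWN_LENGTH) {
    throw new Error('markdown is too long');
  }
  // Store this string; render and sanitize it again whenever it is displayed.
  return data.markdown;
}

// Usage: the raw Markdown round-trips unchanged.
console.log(parseSaveBody(buildSaveBody('# rohit\n<b> rohit </b>')));
```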
Question has a verified solution.
