Solved

Strip unsafe tags from HTML in JavaScript

Posted on 2016-09-06
Last Modified: 2016-10-06
Hi,
There is a Java library called Jsoup that I can use to remove unsafe tags from HTML, such as <script></script>.
I want to do the same thing, but I need to strip the unsafe tags in JavaScript.
A Jsoup implementation doesn't seem to exist for JavaScript.

Is there any library, or any other way, to strip unsafe tags from HTML using JavaScript?

E.g., if I have
# rohit
<script>alert(10)</script>
I want to get
# rohit

The use case for this is:
I am writing a Markdown editor. The user enters Markdown in a textarea, then switches modes, and I show the corresponding HTML in another pane.
This all happens on the client side.
In my case, the user can type something like # rohit, and when they switch to the other tab I convert it to HTML using a library called marked. Any unsafe HTML tags that are present, like <script>alert(10)</script>, then get executed.
marked does have a sanitize option, but it just replaces < and > with &lt; and &gt;, which does prevent the script tag from executing.
The problem is that if I type something like <b> rohit </b> in the raw Markdown, the converted HTML should show it as bold, but after sanitization it is shown literally as typed, which is wrong.
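To illustrate why escaping blocks the script but also breaks <b>, here is a minimal sketch of entity escaping. The helper name escapeHtml is hypothetical and this is not marked's actual code, only the general technique the sanitize option is described as using:

```javascript
// Hypothetical helper: turns every markup character into an HTML entity,
// so the browser displays tags as text instead of interpreting them.
function escapeHtml(text) {
  return text
    .replace(/&/g, '&amp;') // must run first so later entities are not double-escaped
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&#39;');
}

console.log(escapeHtml('<script>alert(10)</script>')); // &lt;script&gt;alert(10)&lt;/script&gt;
console.log(escapeHtml('<b> rohit </b>'));             // &lt;b&gt; rohit &lt;/b&gt;
```

Escaping cannot tell a harmless <b> from a dangerous <script>; both end up as visible text, which is the behaviour described above.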

Thanks
Question by:Rohit Bajaj
5 Comments
 
LVL 30

Accepted Solution

by:
Alexandre Simões earned 250 total points
ID: 41786850
Hi mate,
in my opinion, the sanitization is working properly and you should keep it.

Your example with the <b> rohit </b> is actually behaving properly because it is not Markdown.
In Markdown, if you want to do bold you do **rohit**

Remember that Markdown is an abstraction language that can be used to generate formats other than HTML. This means that <b> rohit </b> will appear exactly like this if you use a Markdown-to-PDF converter, for instance.

Bottom line: force users to use Markdown syntax when they are in the Markdown editor, and you won't have any problems.

Cheers,
Alex
0
 

Author Comment

by:Rohit Bajaj
ID: 41787265
Hi,
But GitHub Gist, which uses GitHub Flavored Markdown, does so. If you type in an unsafe tag it strips it, and it still shows text written as <b> rohit </b> in bold. I want my application to be in line with GitHub Flavored Markdown...
0
 
LVL 1

Assisted Solution

by:tr0gd0r
tr0gd0r earned 250 total points
ID: 41788589
I suggest using a javascript library that converts markdown to html such as marked. Or alternatively a full HTML parser.

Academically speaking, you can use the browser's native ability to parse HTML by doing something like this:

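// create a div in memory
var div = document.createElement('div');
// set the html; the browser parses it, but a <script> inserted through innerHTML is never executed
div.innerHTML = '<script>alert("pwnd")</script><img src="http://hacker/virus.png" onclick="alert(\'pwnd onclick\')" onerror="alert(\'pwned onerror\')">';
// log the html; the <script> element is still in the markup, it just did not run
console.log(div.innerHTML); // <script>alert("pwnd")</script><img src="http://hacker/virus.png" onclick="alert('pwnd onclick')" onerror="alert('pwned onerror')">

A <script> inserted this way is not executed, but it is not stripped from the markup either, so it still has to be removed explicitly. A <style> element is likewise kept, and its rules apply once the node is attached to the document.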

HOWEVER, the browser will make requests to things like image sources and run events like onload and onerror. Plus if you append the in-memory node to the DOM without cleaning it first, clicking the img would run any onclick code.
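One way around those side effects is to parse into an inert document and keep only allowlisted tags and attributes. The following is a rough sketch, not a vetted sanitizer: the names ALLOWED_TAGS, ALLOWED_ATTRS, isSafeAttribute and sanitizeHtml are made up for this example, the lists are deliberately tiny, and it assumes a browser where DOMParser is available. A document produced by DOMParser does not run scripts or fire event handlers while it is being cleaned.

```javascript
// Hypothetical allowlists for illustration only; a real sanitizer needs far more care.
const ALLOWED_TAGS = new Set(['B', 'I', 'EM', 'STRONG', 'P', 'A', 'CODE', 'PRE',
  'UL', 'OL', 'LI', 'H1', 'H2', 'H3', 'BLOCKQUOTE', 'BR']);
const ALLOWED_ATTRS = new Set(['href', 'title']);

// Keep an attribute only if it is allowlisted and, for links, uses a harmless scheme.
function isSafeAttribute(name, value) {
  const lower = name.toLowerCase();
  if (!ALLOWED_ATTRS.has(lower)) return false; // drops onclick, onerror, style, src, ...
  if (lower === 'href') {
    // accept http(s), mailto, site-relative paths and fragments; rejects javascript: and the rest
    return /^(https?:\/\/|mailto:|\/|#)/i.test(value.trim());
  }
  return true;
}

// Browser-only: parse into an inert document, prune it, and serialize the result.
function sanitizeHtml(html) {
  const doc = new DOMParser().parseFromString(html, 'text/html');
  // snapshot the element list first, because the tree is mutated while iterating
  for (const el of Array.from(doc.body.querySelectorAll('*'))) {
    if (!ALLOWED_TAGS.has(el.tagName)) {
      el.remove(); // removes the element together with its contents
      continue;
    }
    for (const attr of Array.from(el.attributes)) {
      if (!isSafeAttribute(attr.name, attr.value)) el.removeAttribute(attr.name);
    }
  }
  return doc.body.innerHTML;
}
```

With these lists, <script> and <img> elements are dropped entirely while <b> rohit </b> passes through unchanged. A maintained sanitizer library is still the safer choice for anything beyond a preview pane.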
0
 
LVL 30

Expert Comment

by:Alexandre Simões
ID: 41788734
In that case, use an HTML sanitizer instead and disable the Marked sanitizer.
It won't touch anything other than the unsafe tags.
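As a sketch of that ordering (convert first, sanitize second), the converter and the sanitizer can be passed in as plain functions. makePreviewRenderer is a hypothetical name, and which libraries get plugged in is left open here on purpose:

```javascript
// Hypothetical wiring: toHtml and sanitize are injected, so this sketch
// does not depend on any particular Markdown or sanitizer library.
function makePreviewRenderer(toHtml, sanitize) {
  return function renderPreview(markdown) {
    // 1. Convert with the converter's own escaping turned off,
    //    so raw HTML such as <b> survives this step.
    const rawHtml = toHtml(markdown);
    // 2. Remove only the unsafe tags and attributes from the result.
    return sanitize(rawHtml);
  };
}

// Possible usage in a browser, assuming marked and a sanitizer such as
// DOMPurify are loaded (check the versions you use for the exact calls):
//   const render = makePreviewRenderer(md => marked.parse(md),
//                                      html => DOMPurify.sanitize(html));
//   previewPane.innerHTML = render(textarea.value);
```

The point of the split is that the sanitizer always sees the final HTML, whatever the Markdown converter produced.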

Another thing you should consider is to "re-sanitize" on server-side.
You shouldn't trust the client, ever. It's Ok to do the sanitizing client-side in order to be more user friendly, but before saving you should check it with your server-side code.

Cheers
1
 
LVL 1

Expert Comment

by:tr0gd0r
ID: 41788745
@Alexandre Exactly. Hopefully you would use the JavaScript-based sanitation for the preview and only send the markdown to the server. Then the server would parse and clean the markdown and not store HTML at all.
0
