Consider dropping/modifying markdown-it-sanitizer to stay in accordance with CommonMark

To make sure our HTML output does not get messed up, we use the markdown-it-sanitizer gem by @svbergerem. This is a good idea, but it unfortunately causes us to violate CommonMark.

As reported in this bug, diaspora* fails to correctly parse content like

Foo <h1> bar

Here, we ignore the <h1> tag because it’s unbalanced, but we probably should not. CommonMark passes the tag through as raw inline HTML, so this should be rendered as “Foo”, followed by a level 1 headline “bar”. This is also what markdown-it and commonmark.js do.

To keep our Markdown payloads compatible with other applications and the rendering predictable for our users, we should follow the spec here.

However, we should probably still keep track of unbalanced tags somehow and close them at the end of the block, so that they do not break the rendering of the rest of the stream. I’m not too sure how to achieve this yet, so here’s a discussion. If someone has an idea, please drop by.
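
To make the "track and close at the end of the block" idea a bit more concrete, here is a rough sketch of the bookkeeping. It is not how markdown-it-sanitizer works and not a proposed patch: it is plain Python using only the standard library, the names OpenTagTracker and close_unbalanced are made up for illustration, and it assumes it gets the inline content of a single block before the renderer wraps it in a <p>.

```python
from html.parser import HTMLParser

# HTML void elements never take a closing tag, so they are never "open".
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class OpenTagTracker(HTMLParser):
    """Records which tags are still open after feeding one block."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.open_tags = []  # stack, innermost tag last

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the most recent matching opener; ignore stray closers.
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass


def close_unbalanced(block_html):
    """Hypothetical helper: append closers for tags left open in the block."""
    tracker = OpenTagTracker()
    tracker.feed(block_html)
    tracker.close()
    # Close the innermost tag first so the appended closers nest correctly.
    closers = "".join(f"</{tag}>" for tag in reversed(tracker.open_tags))
    return block_html + closers


print(close_unbalanced("Foo <h1> bar"))
```

With this, the example from above keeps its <h1> and gets a closing tag appended at the end of the block, while already balanced input passes through unchanged. Stray closing tags and a wrapping <p> are deliberately not handled here; a real solution would have to deal with both.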

I don’t have a good idea for this at the moment. Simply parsing the <h1> as text when it is unbalanced is very convenient, but as we see here, it causes problems elsewhere. BTW: what happens with this on mobile? AFAIK the Markdown parsing process is different there, isn’t it?

For obvious reasons, I would like to see these unbalanced tags handled in some way before they are sent to other systems, but of course it’s not that easy.