To make sure our HTML output does not get messed up, we use the markdown-it-sanitizer
gem by @svbergerem. This is a good idea, but it unfortunately causes us to violate CommonMark.
As reported in this bug, diaspora* fails to correctly parse content like
Foo <h1> bar
Here, we ignore the <h1> tag because it’s unbalanced, but we probably should not. According to CommonMark, this should be rendered as “Foo”, followed by a level 1 headline “bar”. This is also what markdown-it and commonmark.js are doing.
To keep the Markdown payloads compatible with other applications and predictable for our users, we should follow the spec here.
However, we should probably still keep track of unbalanced tags somehow and close them at the end of the block, so that they do not break the rendering of the rest of the stream. I’m not too sure how to achieve this yet, so here’s a discussion. If someone has an idea, please drop by.
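To give the discussion something concrete to poke at, here is a rough sketch of the "track and close" idea. It is not how markdown-it-sanitizer works today, and `closeUnbalancedTags` and `VOID_TAGS` are hypothetical names made up for this post. It assumes we get the rendered HTML of a single block as a string, and it uses a deliberately simple regex that would trip over a `>` inside an attribute value.

```typescript
// Hypothetical helper, not part of markdown-it or markdown-it-sanitizer.
// Elements that never take a closing tag (abbreviated list, for illustration).
const VOID_TAGS = new Set(["br", "hr", "img", "input", "wbr"]);

function closeUnbalancedTags(blockHtml: string): string {
  // Matches opening and closing tags; group 1 is "/" for closing tags,
  // group 2 is the tag name, group 3 is everything up to the ">".
  const tagPattern = /<(\/?)([a-zA-Z][a-zA-Z0-9]*)([^>]*)>/g;
  const openTags: string[] = [];

  let match: RegExpExecArray | null;
  while ((match = tagPattern.exec(blockHtml)) !== null) {
    const isClosing = match[1] === "/";
    const name = match[2].toLowerCase();
    const isSelfClosing = match[3].trim().endsWith("/");

    if (VOID_TAGS.has(name) || isSelfClosing) {
      continue; // nothing to balance
    }

    if (isClosing) {
      // Close the nearest matching open tag. Anything opened after it is
      // treated as implicitly closed, which is a simplification.
      const index = openTags.lastIndexOf(name);
      if (index !== -1) {
        openTags.length = index;
      }
      // A stray closing tag with no matching open tag is left untouched.
    } else {
      openTags.push(name);
    }
  }

  // Append the missing closing tags, innermost first.
  const closers = openTags
    .slice()
    .reverse()
    .map((tag) => `</${tag}>`)
    .join("");
  return blockHtml + closers;
}

// The example from the bug report:
closeUnbalancedTags("Foo <h1> bar"); // → "Foo <h1> bar</h1>"
```

With something like this, the <h1> would be passed through as CommonMark expects, and the dangling tag would still be contained within its own block. Whether this should happen in the sanitizer or in a separate step after rendering is an open question.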