Should escape/preserve < and > within <code>

Describe the problem you’re having:

When an article contains text of the form <...> inside <code> or <pre>, that text is treated like a regular HTML tag and apparently stripped out if it is not a valid tag. It should be escaped and preserved instead.
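
To illustrate the behaviour described above, here is a minimal sketch using Python's standard-library HTML parser. It is not what tt-rss runs (tt-rss is PHP), and the sample markup and the `TagAndTextCollector` / `parse` names are made up for the example. It only shows that a generic HTML parser reports an unescaped `<domain>` inside `<code>` as a tag rather than as text, while the escaped form survives as text.

```python
from html.parser import HTMLParser


class TagAndTextCollector(HTMLParser):
    """Records every start tag and every text chunk the parser reports."""

    def __init__(self):
        super().__init__()
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


def parse(markup):
    """Return (list of start tags, concatenated text) for the markup."""
    collector = TagAndTextCollector()
    collector.feed(markup)
    collector.close()
    return collector.tags, "".join(collector.text)


# Hypothetical sample: an unescaped placeholder inside a code block.
raw_tags, raw_text = parse("<pre><code>dig <domain> NS</code></pre>")

# The same content with the angle brackets encoded as entities.
escaped_tags, escaped_text = parse("<pre><code>dig &lt;domain&gt; NS</code></pre>")

print(raw_tags, repr(raw_text))
print(escaped_tags, repr(escaped_text))
```

In the unescaped case `domain` shows up in the tag list and disappears from the text, which matches the missing code seen in the article.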

If possible include steps to reproduce the problem:

Subscribe to this feed URL: Hacker News - Newest: "nameservers"

The first article there (Some .io nameservers are returning wrong results again | Hacker News) has a couple of code blocks. Loading the feed URL directly in Firefox renders them correctly, and so does https://fakecake.org/myfeedsucks/, but the same article is missing a bunch of code between < and > when viewed through TT-RSS.

tt-rss version (including git commit id):

tt-rss git (af13f3009c59c3db338b719b09335a472383d11c)

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions:

Ubuntu 16.04, PHP 7.0.22-0ubuntu0.16.04.1, PostgreSQL 9.5.8

Please provide any additional information below:

Debug log:

[21:39:14/10685] article processed
[21:39:14/10685] guid 2,https://news.ycombinator.com/item?id=15293578 / SHA1:4c8795d931bd62d325f501f79f95f797e4ce6eea
[21:39:14/10685] orig date: 1505914846
[21:39:14/10685] date 1505914846 [2017/09/20 13:40:46]
[21:39:14/10685] title Some .io nameservers are returning wrong results again
[21:39:14/10685] link https://news.ycombinator.com/item?id=15293578
[21:39:14/10685] author JelteF
[21:39:14/10685] num_comments: 0
[21:39:14/10685] looking for tags...
[21:39:14/10685] tags found: 
[21:39:14/10685] done collecting data.
[21:39:14/10685] article hash: da23c5f5f652a25f1209325f51cc1e048505bb8b [stored=da23c5f5f652a25f1209325f51cc1e048505bb8b]
[21:39:14/10685] hash differs, applying plugin filters:
[21:39:14/10685] plugin data: 
[21:39:14/10685] matched filter rules: 
[21:39:14/10685] filter actions: 
[21:39:14/10685] article labels:
[21:39:14/10685] force catchup: 
[21:39:14/10685] base guid found, checking for user record
[21:39:14/10685] initial score: 0 [including plugin modifier: 0]
[21:39:14/10685] user record FOUND
[21:39:14/10685] RID: 61331, IID: 61330
[21:39:14/10685] assigning labels [other]...
[21:39:14/10685] assigning labels [filters]...
[21:39:14/10685] looking for enclosures...
[21:39:14/10685] article enclosures:
Array
(
)
[21:39:14/10685] filtered article tags:
Array
(
)

All HTML special characters must be encoded. If a web site is serving literal <, >, and & characters inside any HTML tags (including pre and code) then clients are correct to interpret them as HTML code. Characters should be encoded as &lt;, &gt;, and &amp; (e.g. PHP has htmlspecialchars() function for this purpose).
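
As a rough sketch of the encoding step described above, here it is in Python, where `html.escape` plays a role similar to PHP's `htmlspecialchars()`. The sample string is invented for the example, and this is only what a feed producer might do before emitting content, not code from any project discussed here.

```python
import html

# Hypothetical text that a site wants to show inside <code>.
snippet = 'dig <domain> NS & echo "done"'

# For element content, encoding <, > and & is enough.
in_element = html.escape(snippet, quote=False)

# For attribute values, quotes are encoded too (the default).
in_attribute = html.escape(snippet)

print(in_element)
print(in_attribute)

# Decoding the entities gives back the original text.
assert html.unescape(in_element) == snippet
```

Once the text is encoded this way, an HTML parser sees character data instead of something that looks like a tag.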

Characters should be encoded as &lt;, &gt;, and &amp; (e.g. PHP has htmlspecialchars() function for this purpose).

Can still cause trouble. A recent example I could pull up on zero notice: "Yes! More! Yes, yes, yes!"

Specifically, &quot;&amp;lt;Kobayashi-san chi no maid dragon&amp;gt;&quot; and &quot;&amp;lt;Nichijou&amp;gt;&quot; will present a blank title in TT-RSS, at the time of this post.

It’s late enough and I’m tired enough that I could be wrong about whether they’re escaping properly. Also, code tags because the discourse preview is interpreting quot; to be helpful to me, I guess.
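
One possible reading of the title example, sketched in Python: if the raw title goes through two rounds of entity decoding, the second round produces something that looks like a tag. Whether the feed or TT-RSS actually decodes twice is an assumption here, not something this thread confirms.

```python
import html

# Raw title text as quoted in the post above.
raw_title = "&quot;&amp;lt;Nichijou&amp;gt;&quot;"

# First decode: what one layer of entity decoding yields.
once = html.unescape(raw_title)

# Second decode: the angle brackets become literal.
twice = html.unescape(once)

print(once)
print(twice)
```

After the second pass the title is `"<Nichijou>"`, and a sanitizer that drops unknown tags would leave little or nothing of it.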

rss content is parsed as a valid xml document, i’m not sure if html special snowflake stuff is applicable

even if it is, i’m not going to add any special processing for content instead of feeding it to DOMDocument, because that would be a security nightmare, so this behavior is not going to change

tldr: no