When I try to add a regular expression containing the symbol “<” (which is necessary for e.g. lookbehind assertions) as a filtering condition, this is the result that I get:
I am running the latest Git version of tt-rss on macOS 10.14.5, PHP 7.3.7, MySQL 8.0.16.
What is your actual filter?
Are you putting it inside ()?
I used to use pos & neg lookbehinds all the time and had no issues.
Fox fixed the issue I was having with HTML being stripped.
It’s been a few years, but I remember having an issue and you tweaked something, because after that I no longer had any problems with my filters using lookbehinds and lookaheads.
Now I no longer use look (ahead|behind), so I wasn’t aware of any issues.
I was trying to find the issue I submitted, but it was before the switch to Discourse. I’ll try to find it. Maybe I kept something locally on my box.
yeah i’m afraid this will get filtered currently, it was changed sometime after the PDO overhaul i think.
btw as a terrible workaround you can add (or update) whatever regular expression directly in the database, stripping only happens in the actual editor UI. as long as you don’t edit the filter afterwards it’ll work.
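To illustrate why a lookbehind does not survive the editor, here is a minimal Python sketch (not tt-rss code). It assumes the sanitizer behaves roughly like a tag stripper that drops everything from "<" up to the next ">", or to the end of the input if there is none. `naive_strip_tags` is a hypothetical stand-in, not the actual tt-rss function.

```python
import re


def naive_strip_tags(text: str) -> str:
    """Hypothetical stand-in for a tag-stripping sanitizer (assumption):
    remove everything from '<' up to the next '>', or to the end of the
    string when no '>' follows."""
    return re.sub(r"<[^>]*(?:>|$)", "", text)


pattern = "(?<!@)example.com"
stripped = naive_strip_tags(pattern)
print(repr(stripped))  # what would end up stored under this assumption

# Check whether the leftover text is still a valid regular expression.
try:
    re.compile(stripped)
    still_compiles = True
except re.error:
    still_compiles = False
print(still_compiles)
```

Under that assumption, everything from the "<" onwards is lost, so the saved rule is no longer the expression that was typed.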
<item>
<title>Regular expressions containing the symbol "&lt;" do not work in filters</title>
<dc:creator><![CDATA[@Avoozl]]></dc:creator>
<description><![CDATA[ <p>So how do I use lookbehind assertions in regular expressions? Is there a way to escape them?</p> ]]></description>
<link>https://discourse.tt-rss.org/t/regular-expressions-containing-the-symbol-do-not-work-in-filters/2609/3</link>
<pubDate>Sun, 14 Jul 2019 17:17:50 +0000</pubDate>
<guid isPermaLink="false">discourse.tt-rss.org-post-9248</guid>
</item>
<item>
<title>Regular expressions containing the symbol "&lt;" do not work in filters</title>
<dc:creator><![CDATA[@fox]]></dc:creator>
<description><![CDATA[ <p>this is because ‘<’ opens html tags which are stripped out</p> ]]></description>
<link>https://discourse.tt-rss.org/t/regular-expressions-containing-the-symbol-do-not-work-in-filters/2609/2</link>
<pubDate>Sun, 14 Jul 2019 17:17:09 +0000</pubDate>
<guid isPermaLink="false">discourse.tt-rss.org-post-9247</guid>
</item>
<item>
<title>Regular expressions containing the symbol "&lt;" do not work in filters</title>
I just hit this issue with a negative lookbehind when attempting to filter topics discussing a domain name but without matching email addresses at that domain: (?<!@)example.com.
It looks like it’s getting stripped right away in newrule():
Absent a dedicated workflow or system for controlling the lifecycle of user input to protect against stored XSS, I wonder whether this string could be stored HTML-encoded in the DB, with htmlspecialchars_decode applied to it only when building the regex. That way, if it leaks, it leaks encoded, but it is still usable for filters that use lookbehinds or match characters that would otherwise be filtered.
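A minimal sketch of that idea, in Python rather than PHP to keep it self-contained. `html.escape` and `html.unescape` stand in for `htmlspecialchars` and `htmlspecialchars_decode` (note that `html.unescape` decodes more entities than the PHP function does). The helper names `encode_for_storage` and `compile_stored` are hypothetical and not part of tt-rss.

```python
import html
import re


def encode_for_storage(reg_exp: str) -> str:
    """Encode the user's expression before it is saved (hypothetical helper).
    If the stored value leaks into a page, it leaks as inert entities."""
    return html.escape(reg_exp, quote=True)


def compile_stored(stored: str) -> re.Pattern:
    """Decode the stored value only at the point where the regex is built."""
    return re.compile(html.unescape(stored))


stored = encode_for_storage("(?<!@)example.com")
print(stored)  # (?&lt;!@)example.com

rule = compile_stored(stored)
print(bool(rule.search("new thread about example.com")))  # True
print(bool(rule.search("mail me at bob@example.com")))    # False
```

The stored value never contains a raw "<", yet the lookbehind is intact once the rule is compiled.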
so, if you remove clean there, does it save properly? i remember poking at this when this was originally posted but for some reason decided against changing anything, don’t remember why though.
having markup there shouldn’t really be that big a deal, the worst you could do is somehow script inject yourself, i think.