Troublesome hooks - or not?

banned thread made me aware of the potential issues with plugins having access to entire feed data with no sane controls on data processing, tt-rss normally has.

while i value flexibility and a swiss army knife nature of tt-rss i don’t want it to become a really shitty denial of service vector.

therefore, let’s think on what should be done with the following:

const HOOK_FEED_PARSED = 6; -- ?maybe? not sure
const HOOK_FEED_FETCHED = 12;
const HOOK_FETCH_FEED = 22;

anything else i’ve missed?

bundled plugins would be exempt of any potential limitations because there’s at least some measure of quality control and responsibility involved. for third party plugins, though, i’m simply not sure what to do.

e: i can already see a “place this plugin in a system plugins directory instead of plugins.local because it won’t work otherwise” or “adjust whitelist here source.php:123”

this approach probably won’t work

At the end of the day there’s no way to stop someone from intentionally circumventing restrictions, but with the goal being to stop and end user from mindlessly enabling a random plugin without understanding the consequences I have the following ideas (not suggesting all of them, just presenting them as potential solutions):

  • Limit enabling of these plugins to admin-level users.
  • Requiring the user to provide their password to enable the plugin, with a description of the potential problems.
  • Generate a SHA1 sum of the plugin (plugin/class name, or file_get_contents(), whatever). Present this sum in the Preferences section. Require the sum to be copied to a comma-separated constant in config.php.
  • Add a static variable to fetch_file_contents(). A multi-dimensional array with the base domain as the key and an integer as the number of times that domain has been hit. Limit the number of requests with a defined constant.

I rather like the last one because it completely prevents other avenues of DOS (media cache, etc.). Because it’s just a static variable it would only last for the duration of that update session. Tuning it would be difficult though.

i like the last one, but what if you need to process all this data? what happens when you’ve exceeded this limit?

it would be confusing if readability stopped working suddenly after ten articles.

also, this limit should probably be per time period , not just one update session, which implies persistence, which might be tricky to implement correctly. without something like memcached or redis.

Yeah… It would need persistence now that I think about it because if someone is using the multi-threaded updater it would be running multiple instances concurrently, as well centralized blog sites (Medium, WordPress dot com) wouldn’t be tracked properly.

And media… Imagine if half of the articles in a feed had the media cached and the other half didn’t.

Lots of stuff to potentially break with this.

well, on the other hand, this would be normally limited to a initial feed update, not hitting origin server too hard on every subscription would probably be for the best.

but yes, this should be shared between update processes somehow. i’d really prefer to not use the database for it though, it just sucks.

also:

  1. https://git.tt-rss.org/fox/tt-rss/commit/0d7b10469bbd1e0923fd044283ab9948fb887e57
  2. https://git.tt-rss.org/fox/tt-rss/commit/63ce7ea705fbe47930d1d3a2792d0c7d09a0a109
  3. https://git.tt-rss.org/fox/tt-rss/wiki/FeedHandlerPlugins

additionally, https://git.tt-rss.org/fox/tt-rss/commit/cd4b7f1988633272fc46655e9cf2796a75668e07

this doesn’t block any http requests for the time being, it would be interesting to see the feedback though.

this should also apply for entire update batches (i.e. per invocation of updater worker process) so this is somewhat shared between users, who were unlucky to end up at the same batch.

tweaking this number might be, uh, complicated.

These are some good changes.

In one of the latest commit, a warning have been added about the use of HOOK_FETCH_FEED in plugins.

Maybe it’s reliable to rely on a flag placed on a plugin to tell tt-rss the plugin take care of the remote server by not requesting excessively. For example, the plugin tumblr_gdpr do not request the page content (it’s not a scrapper) but simply change the UA before requesting the server. That’s all.

On counterpart, the very usefull feediron is a scrapper and can overload the server (if misconfigured, as seen in this feed ). But, because it don’t use FETCH_FEED (but ARTICLE_FEED) this particular plugin don’t throw any warning.

Users, what’s your opinion on that ?
Kind regards

this isn’t about people trying to create excessive requests on purpose, i hope there’s no plugins like that for tt-rss. it’s mostly about unintended consequences.

which hook do you mean exactly? i’ve limited the warning to per-feed hooks because per-article plugins are only called if it has been considered modified by normal update process which they have no control over.

i mean obviously you can request the entirety of the internet from any hook, there’s no functional limitations there, but i’m assuming plugins are not being malicious on purpose.

e: i’m gonna merge this with the main discussion thread in development

Tiny Tiny RSS v19.8 (cd4b7f1)
no modification

Ubuntu+Apache+MySQL+PHP

Message:
Exceeded fetch request quota for promodj.com: 28

  1. include/functions.php(238): user_error(Exceeded fetch request quota for promodj.com: 28, 512)
  2. classes/rssutils.php(397): fetch_file_contents(Array)
  3. classes/rssutils.php(148): update_rss_feed(499, 1, )
  4. update.php(250): update_daemon_common(500)

just because you saw a message in the event log doesn’t mean you have to create a thread about it.

it’s not your first time so i’m going to probate you for a week. lurk more. put more effort in your posts here.

TT-RSS v19.8 (cd4b7f1)

Ubuntu 18.04 Nginx 1.14.0 PostgreSQL 12.0-2

Hello, i tried searching for other threads about it and it lead me here :thinking:

i was abit worried to post though as im seeing almost the same messages as mazzy, only in my setup

im seeing quite alot of them,however mine seem to be related to the af_redditimgur plugin directly, also

when i disabled it the messages seems to quiet down,i peeked into the functions.php and also found that

if i increased MAX_FETCH_REQUESTS_PER_HOST, 25 to about 100 or so event log quiets down then

aswell, ive done some changes into that file before but when im doing updates ,im using git with reset

–hard as on rare occasions regular pull doesn’t work ,then my changes get lost , im not well versed with

how git

works but changes to the config.php seems to persist though, when i read the thread i thought restrictions

was intended for third party plugins and not bundled also i thought the plugin im using is bundled? I’m

having two domains in log reddit.il (i got 60 reddit feeds,with some being pure picture feeds,might be

reason) but also another goo.gl domain

which i don’t quite understand why is present. Apologizes if i posted this the wrong place

Below is latest error

E_USER_WARNING (512)|include/functions.php:238
Exceeded fetch request quota for redd.it: 60

  1. include/functions.php(238): user_error(Exceeded fetch request quota for redd.it: 60, 512)
  2. plugins/af_redditimgur/init.php(335): fetch_file_contents(Array)
  3. plugins/af_redditimgur/init.php(434): inline_stuff(Array, [DOMDocument], [DOMXPath])
  4. classes/rssutils.php(716): hook_article_filter(Array)
  5. classes/rssutils.php(148): update_rss_feed(16, 1, )
  6. update.php(250): update_daemon_common(50)|

first of all, this is only a warning message right now, it doesn’t really do anything else.

second, yes, stuff like many reddit feeds in one batch can easily exceed this limit, this is something that i didn’t think about. maybe this quota should be reset between feed updates or we need to rethink the whole thing.

Its ok a little bit unique maybe to have so many feeds from one domain i think,normally its the one only,just

got a little worried,in all the time i’ve used tt-rss ive never seen warnings or errors except if i manually

deploy somewhere (always forget stuff so user errors appear) so that’s what had me puzzled,my log gets

filled with theses so its a bit tough to find other messages ,ill manage though, could set syslog as log

target and have some other program search for stuff of meaning or just grep around the log every now

and then ,as long as it doesn’t affect performance of server im all good :slightly_smiling_face:


Edit: Nice latest update silence all the reddit domain related warnings seems,although now i seem to get

alot of warnings of same sort for the goo.gl domain ,don’t know what to make out of this domain though as

to which feeds it links to,it seems to be googles url shortener the domain itself

Edit: If needed i can keep updating here

,else i think ill start log to syslog instead and see if can get going some sort of filter with syslog daemon to

warn of critical errors although i dont really know what to look for as ive yet to see any real errors

https://git.tt-rss.org/fox/tt-rss/commit/8c3efd51ecd76f4e9ffeb111bbbe2c5862aa3bed

how about with this change?

I am sorry for resuming this thread, but since you asked for feedback, here it goes :grinning:

I have 284 lines in the error log table, generated over 24 hours, and created by only 6 articles, in 6 different feeds.

While it might be useful to know, I believe it defies the purpose of the error log, as what might actually be an error is buried between all those warnings.

In my case I use readability to fetch long articles with lots of images.
Do I stop fetching those because of the warnings? Of course not. I just empty the error log from time to time.

My suggestions would be (in order of preference):

  • remove the whole thing (which might be quite extreme);
  • allow to (dis)enable it in the preferences;
  • have one line per feed per update;
  • have one line per article per update;

I have no idea how hard would it be to implement any of the above, I am actually ok with the current situation.

Just my 2 cents.

yeah, since we’re not actually enforcing those limits, i might as well disable this warning for the time being.

e: https://git.tt-rss.org/fox/tt-rss/commit/01513aa41b29d9b7c748ce86b8232c73901573bf

Thank you. Error_log is much cleaner now

I am thankful these hooks are still there, I was fighting with a nasty feed that had some non printable characters in it that made libxml shit bricks.

The ability to write a small plugin to preg_replace them before libxml parsed the feed was great.

yeah, i can’t argue that there are no legitimate uses for those hooks, can’t really limit plugins to per-article processing if the feed refuses to parse in the first place.