Insane amounts of incoming traffic on every feed update

I have a dedicated VPS running the latest Tiny Tiny RSS v18.8 (79c5035) with Nginx, PHP 7 and PostgreSQL. I have about 150 feeds on one account, which is regularly read and purged, and another user which managed to rack up some 50,000 unread posts, which I've since purged. I'm running update_daemon2.php as a service, with the built-in plugins plus:
embed_original
feediron
ff_xmllint
tumblr_gdpr_ua
videoframes

Everything was working fine for years until nine days ago, when feed updates began generating insane amounts of incoming traffic on each run (every 60 seconds). I noticed when I got an e-mail from my provider warning me that I'd used up 0.78 TB of my 1 TB allotment, which never happens.

The traffic spike (some 60-100 MB) happens at the very end of the feed update, after all the feeds have been checked, i.e. somewhere in this part of the log:
[18:25:48/11293] Processed 21 feeds in 17.6000 (sec), 0.8381 (sec/feed avg)
[18:25:48/11293] Running housekeeping tasks for user 5…
[18:25:48/11293] Sending digests, batch of max 15 users, headline limit = 1000
[18:25:48/11293] All done.
[18:25:48/11293] cache/feeds: removed 0 files.
[18:25:48/11293] cache/images: removed 0 files.
[18:25:48/11293] cache/export: removed 0 files.
[18:25:48/11293] cache/upload: removed 0 files.
[18:25:48/11293] Removed 0 old lock files.
[18:25:48/11293] Removing old error log entries…
[18:25:48/11293] Feedbrowser updated, 110 feeds processed.
[18:25:48/11293] Purged 1 orphaned posts.
[18:25:48/11293] Removed 0 (feeds) 0 (cats) orphaned counter cache entries.

For the life of me, I can't remember updating anything right before this weirdness started. I also can't be sure which version was running at the time, because I've since updated to the latest GitHub version.
I've switched off image caching for all feeds and purged a lot of stuff, but no change. 60 MB might not seem like much, but 60 MB × 60 runs per hour × 24 hours comes to almost 85 GB of unaccounted-for traffic per day.
The email digest plugin is disabled globally.
Both update_daemon2.php and manual runs of update.php do this. I’ve switched from a service to updating every 15 minutes via cron to limit the traffic, but I’d love to find out the root cause of this.
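For reference, the crontab entry is something like this (the install path is just an example, adjust to taste):

# update all feeds every 15 minutes instead of running the daemon
*/15 * * * * php /var/www/tt-rss/update.php --feeds --quiet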
An example of one such event: the smaller SSH window is running nload; I've manually run update.php --feeds on the left.
[screenshot: nload showing the incoming traffic spike]

The bandwidth chart for my VPS:
[bandwidth usage chart]

So the question is: which function, run at the end of the update, could be generating incoming traffic?

maybe you should start with disabling third party plugins and go on from there

I did that when things went sideways. I've now disabled the internal plugins as well, and it seems that cache_starred_images is the culprit, but I'm not sure why it would repeatedly cache something that's already in the cache folder. Judging by the size, it could be trying to cache a video.

Does the update script have any options for verbose logging?

yes, DAEMON_EXTENDED_DEBUG in config.php i think.
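e.g. something like this in config.php (going from memory, so double-check the constant):

// print extended debug output during feed updates
define('DAEMON_EXTENDED_DEBUG', true);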

if you’re using curl there should be (?) built-in protection against downloading too much, if not, i’m afraid all bets are off.
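to illustrate, capping a download with php curl could look roughly like this (a sketch of the general technique, not necessarily what tt-rss does internally):

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// only enforced when the server sends Content-Length up front
curl_setopt($ch, CURLOPT_MAXFILESIZE, 16 * 1024 * 1024);
// abort mid-transfer too, for servers that don't send a length
curl_setopt($ch, CURLOPT_NOPROGRESS, false);
curl_setopt($ch, CURLOPT_PROGRESSFUNCTION,
	function ($ch, $dltotal, $dlnow, $ultotal, $ulnow) {
		return ($dlnow > 16 * 1024 * 1024) ? 1 : 0; // nonzero return aborts
	});
$data = curl_exec($ch);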

I've managed to identify the offending post: it was a starred reddit post with an embedded video. I guess it was being downloaded over and over again.

that’s interesting, why would it download over and over? was the download unsuccessful somehow?

i guess this isn’t the best plugin to enable if you have limited traffic.

DAEMON_EXTENDED_DEBUG has nothing to report about the cache_starred_images plugin. It clearly downloads the video each time, but it never gets saved to cache/starred-images. Anything else I can try?

[20:15:06/14675] Processed 2 feeds in 0.5322 (sec), 0.2661 (sec/feed avg)
[20:15:06/14675] Running housekeeping tasks for user 2...
[20:15:06/14675] Sending digests, batch of max 15 users, headline limit = 1000
[20:15:06/14675] All done.
[20:15:06/14675] cache/feeds: removed 0 files.
[20:15:06/14675] cache/images: removed 0 files.
[20:15:06/14675] cache/export: removed 0 files.
[20:15:06/14675] cache/upload: removed 0 files.
[20:15:06/14675] Removed 0 old lock files.
[20:15:06/14675] Removing old error log entries...
[20:15:06/14675] Feedbrowser updated, 110 feeds processed.
[20:15:06/14675] Purged 2 orphaned posts.
[20:15:06/14675] Removed 0 (feeds) 0 (cats) orphaned counter cache entries.

if there’s a write error or something it should go somewhere, maybe tt-rss error log

if there’s nothing there, i dunno. i guess the only way is adding more debugging manually. :(

e: check the plugin itself, maybe there are debugging knobs in there.

There’s only a basic
_debug("cache_images: downloading: $src to $local_filename");
line, commented out, which proves that the download does get triggered but gives no feedback on whether it succeeded or failed.
In any case, thanks for the assist. I've decided to disable the plugin to prevent any potential issues in the future.
It might be a good idea to add at least a minimal log entry to the plugin noting when images/videos are being cached; it would have greatly simplified my investigation.
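Something as simple as this around the fetch would have done it; a rough sketch reusing the variable names from the debug line above, not the actual plugin code:

$file_content = fetch_file_contents($src);
if ($file_content) {
	file_put_contents($local_filename, $file_content);
	_debug("cache_images: saved $src to $local_filename");
} else {
	_debug("cache_images: failed to download $src");
}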

With Reddit, could it be the user-tracking links tricking TT-RSS into thinking it's a new video each time? That would be my first guess.
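If so, normalizing the URL before deriving the cache filename would sidestep it. A hypothetical sketch (the naming scheme here is a guess, not the plugin's actual code):

// strip the query string so per-user tracking parameters don't make the
// same video look like a brand-new URL on every update
$parts = parse_url($src);
$cache_key = sha1($parts['host'] . $parts['path']);
$local_filename = CACHE_DIR . '/starred-images/' . $cache_key . '.png';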

downloading stuff repeatedly is bad for whatever reason, i’ll add some kind of status tracking to cache_starred so that won’t happen, allow one or two attempts per-article and that’s it.
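roughly this idea (names made up, not the final code):

// remember failed attempts per article, stop retrying after a couple
$attempts = (int) $this->get_attempts($article_id); // hypothetical helper
if ($attempts >= 2) return; // give up instead of re-downloading every run

if (!$this->cache_article_media($article_id)) // hypothetical helper
	$this->set_attempts($article_id, $attempts + 1);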

e: there’s some already (via database article marking) but it wasn’t enough.

e2: https://git.tt-rss.org/fox/tt-rss/commit/758752684c68efd179071cd77c92f78879e68f6d

It's not this; the video never gets saved, although I am able to download it manually using curl.

Thanks, I’ll test this out next week when I find the time.

Works as intended: the video was downloaded (or rather, wasn't) only once. I didn't think to enable logging, though, so I haven't found out the underlying cause.