Feed update and filtering not working reliably

count0 · November 29, 2017, 2:06pm

The updater daemon fails to update feeds in a rather random way, i.e. some feeds are updated while others are not. For those that are not updated, tt-rss claims they have been updated, but the new articles are not there.

If I use --force-refetch the articles are updated correctly, so I don’t think this is a problem with the feeds.

However, when I force the fetching, the filters are not applied correctly: only a random subset of the articles is correctly selected.

The logs don’t show anything suspicious, and this issue has been occurring for the past few months.

How can I go about debugging this?

tt-rss version (including git commit id): 17.4 (820873de9f), cloned from git master

Platform (i.e. Linux distro, PHP, PostgreSQL, etc) versions: Arch GNU/Linux, PHP 7.1.12, PostgreSQL 10.1

fox · November 29, 2017, 4:15pm

try feed debugger (f D or --debug-feed X) on affected feeds. one possible reason is http 304 not modified: either server reacts improperly to the if-modified-since request or for some reason tt-rss stores last-modified incorrectly.

count0 · November 29, 2017, 5:07pm

Indeed it shows “unable to fetch: HTTP/1.1 304 Not modified [304]” for feeds that are, in fact, not up to date.
This happens for all arxiv.org feeds for example, and several other journal feeds. I’m sceptical that all these feeds are wrong, so I’m inclined to believe that tt-rss is not doing things right.

How could tt-rss incorrectly interpret last-modified? Is it possible to make tt-rss ignore this altogether, and attempt to fetch anyway? (--force-refetch needs to be applied on a per-feed basis, not globally).

And what does any of this have to do with the fact that the filters are also not behaving properly, even if --force-refetch is used?

fox · November 29, 2017, 5:28pm

tt-rss stores last-modified value returned by the server as-is and then sends it when doing the conditional request. as far as i know this is how this should work according to http spec. you can try doing same thing with curl.
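As a rough illustration of the conditional request described above (a minimal sketch in Python, not tt-rss code; the function names `build_conditional_request` and `fetch_feed` are made up for this example): the stored Last-Modified value is sent back unchanged in an If-Modified-Since header, and a 304 response means the server claims nothing has changed.

```python
import urllib.error
import urllib.request


def build_conditional_request(url, last_modified=None):
    """Build a GET request; add If-Modified-Since only if a value is stored."""
    req = urllib.request.Request(url)
    if last_modified:
        # The value is echoed back exactly as the server sent it earlier.
        req.add_header("If-Modified-Since", last_modified)
    return req


def fetch_feed(url, last_modified=None):
    """Return (status, last_modified, body); body is None on a 304."""
    req = build_conditional_request(url, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.headers.get("Last-Modified"), resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # urllib reports 304 as an HTTPError; treat it as "not modified".
            return 304, last_modified, None
        raise
```

Calling `fetch_feed(url, stored_value)` and then `fetch_feed(url)` on an affected feed, and comparing the two results, is one way to see whether the server answers 304 while actually having newer content. The same check can be done with curl by passing the header via `-H`.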

if this is somehow wrong you’ll have to demonstrate how and why. with logs.

really appreciating the vote of confidence here

i don’t know what “filters not behaving properly” means. be specific, include debugging logs, filters, feed urls, everything that is relevant.

count0 · November 29, 2017, 5:55pm

I’m not trying to debate what is the proper way of doing things, I just want to understand what is happening.

I have two simple questions: 1) Suppose the feeds are reporting the wrong information, is there something that can be done from tt-rss side? 2) How can I even find out if tt-rss is not doing something correctly? The logs are not being very helpful.

This is entirely beside the point. Even if all these feeds are doing everything wrong, and tt-rss is following the specs to the letter, I’m still left with the fact that I have a bunch of feeds from different sources that are not working, for no obvious reason. Asking them all to fix their feeds is unrealistic, and it would be nice if tt-rss were robust against the last-modified information being incorrect. (Assuming this is really what is happening).

It is very simple: I have a filter that selects if a word appears in the title. Only some articles with that word are picked up by the filter. If I search for articles with that word in that feed, I find many more. I don’t know why these are not being picked.
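One way to narrow this down outside tt-rss is to run the filter expression against the titles by hand. The following is a minimal sketch in Python, assuming the filter is a regular expression matched case-insensitively against the title (whether tt-rss matches exactly this way is an assumption here, and `titles_matching` is a made-up helper):

```python
import re


def titles_matching(pattern, titles):
    """Return the titles in which the regular expression is found."""
    rx = re.compile(pattern, re.IGNORECASE)
    return [title for title in titles if rx.search(title)]


# Made-up sample titles, for illustration only.
titles = ["Graph inference on networks", "A note on GRAPHS", "Unrelated"]

# A bare word matches anywhere in the title, including inside longer words.
print(titles_matching("graph", titles))

# Word boundaries restrict the match to the exact word.
print(titles_matching(r"\bgraph\b", titles))
```

If the hand-run match agrees with the search results but not with what the filter picked up, the difference is more likely in when or whether the filter ran than in the expression itself.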

If I look in the logs I do not see any errors.

I know it is frustrating to simply hear that “it does not work” but I don’t know what else to say.

JustAMacUser · November 29, 2017, 6:07pm

Why? As someone who runs web sites, I know I would appreciate being informed if I missed something that was causing my visitors problems.

Exactly how much extra code should be devoted to this sort of thing? TT-RSS is asking a site for articles newer than the date the site last told it. If the site doesn’t return results why should TT-RSS assume the site is wrong? What would TT-RSS do then? Request the feed again without the last-modified? If so, then why use last-modified in the first place?

count0 · November 29, 2017, 6:23pm

Big publishing websites are very inefficient in this regard, and I’m not looking forward to sending emails to several journals asking them to fix their feeds.

I’m not even sure they are really broken. The problem could lie somewhere else.

My main concern here is to understand what is happening, which I still don’t.

A simple option to ignore the information for some feeds would do it. The current hack I have now is to call “--force-refetch” on all relevant feeds using a cron job, which does the trick, but is just ugly.

fox · November 29, 2017, 6:25pm

first of all i’m going to state that personally i’m completely not surprised that a supposed linux-power user with a literal arch install can’t actually think independently if his life depended on it. now, with that out of the way, let’s go over your post:

i like how asking for tt-rss to accommodate literally broken servers which don’t respect http specification is, on the other hand, completely normal and realistic. because all responsibility somehow falls on my shoulders.

oh i’m not frustrated at all. i will however note that so far i have asked you for information twice. you have decided to post your semi-related musings and doubts and feelings instead, while noticing that logs are unhelpful, yet refusing to provide them.

i’m going to give you one more chance to redeem your horrible posting itt and provide concrete information related to the problems you are having.

since you seemingly can’t figure out how to do this basic shit on your own, probably by being a helpless fucking adolescent, i’m going to give you a basic template which might help you to finally turn on your brain. read carefully, an ability to diagnose and formulate your problems might help you later in life, when you are an adult and need to get actual shit done, because your mommy won’t be there anymore.

1. my filters don’t work

i made a filter for keyword K (provide the regexp or the screenshot of filter editor) it did not apply to the post X of feed Y (provide feed debugger logs which show that filter is not applying).

2. my feed won’t update right

here’s the debugging logs for feed X (URL Y). you see that tt-rss provided last-modified timestamp Z to the server which returned http not modified for whatever godforsaken reason even though it probably shouldn’t have.
i have tried the request manually in curl while substituting this last-modified timestamp from tt-rss logs and it resulted in this: i have also compared the stored value to the actual http header the server returned and the result was as follows:

see, we can’t just telepathically solve all your problems with literally nothing to go on. now go forth and learn to use your brain, instead of ricing your animu i3 setup or whatever.

cripes

JustAMacUser · November 29, 2017, 6:38pm

I guess you’ll never know if you don’t try. My experience in doing this has been rather positive though.

So something like:

[ ] Ignore HTTP Spec for this Feed

But it brings me back to my point:

You can actually fix it yourself by simply writing a plugin to hook the feed fetch routine and override how TT-RSS gets the feed data.

Obviously.

count0 · November 29, 2017, 7:06pm

Wow. You should see a therapist.

Never mind, I’ll try to figure it out on my own. Thanks for nothing.

fox · November 29, 2017, 7:25pm

good luck, going by the cognitive abilities you’ve demonstrated here so far you’re going to need all of it and then some

SleeperService · November 30, 2017, 2:05am

It always amazes me how THEIR abject failure to brain is YOUR fault…

Mebbe you should code up TinyTiny AI and see about installing it in some of those empty heads. I’m sure it’d be a cognitive improvement.

fox · November 30, 2017, 3:47am

on a more productive note maybe conditional requests should be limited to a time period, i.e. force update every few hours regardless. i really don’t want to add configurable options for this.

JustAMacUser · November 30, 2017, 3:53am

Maybe once or twice per day or every x updates (where x is a generous multiplier). After all, the whole point of a 304 response is to fast-track the request if nothing’s changed.

fox · November 30, 2017, 9:42am

i thought maybe once every ~six hours or so, to keep broken servers mostly functional

once a day seems a bit low

e: https://git.tt-rss.org/git/tt-rss/commit/e50c8eaa4e21599272565612a576435e6c0763ba
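The idea discussed above, using conditional requests only within a limited time window and forcing a full fetch otherwise, could be sketched roughly as follows. This is a hypothetical illustration in Python, not the code from the linked commit; the six-hour constant comes from the discussion, while `use_conditional_request` and its parameters are made up for the example.

```python
from datetime import datetime, timedelta

# Assumed window, taken from the "~six hours" suggested in the thread.
MAX_CONDITIONAL_INTERVAL = timedelta(hours=6)


def use_conditional_request(last_unconditional_fetch, now, stored_last_modified):
    """Decide whether to send If-Modified-Since for this update.

    Returns False (do a full fetch) when there is no stored Last-Modified
    value, when no full fetch has happened yet, or when the last full fetch
    is at least MAX_CONDITIONAL_INTERVAL old.
    """
    if not stored_last_modified or last_unconditional_fetch is None:
        return False
    return now - last_unconditional_fetch < MAX_CONDITIONAL_INTERVAL


now = datetime(2017, 11, 30, 12, 0)
stamp = "Thu, 30 Nov 2017 03:00:00 GMT"
print(use_conditional_request(now - timedelta(hours=2), now, stamp))
print(use_conditional_request(now - timedelta(hours=7), now, stamp))
```

With this shape, a server that wrongly answers 304 can delay new articles by at most the length of the window, while well-behaved servers still get the cheaper conditional request most of the time.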

count0 · November 30, 2017, 6:12pm

It’s amazing; after throwing a hissy fit and hurling childish insults, you address the simple and obvious problem without the help of a bunch of log files and case studies.

Seriously, it is like 4chan in here. Pretty harrowing.

fox · November 30, 2017, 6:31pm

just imagine being this assblasted to come back itt and post this, lol

implying, etc

shabble · November 30, 2017, 6:38pm

He’s still able to post? You mellowing @Fox?

JustAMacUser · November 30, 2017, 6:57pm

Imagine how much faster it would have been resolved if said process included your actual cooperation.

fox · November 30, 2017, 6:59pm

well this entire forum is a honeypot

wait delet this

any idiot can whine himself a feature while not being a contrarian babby just for the kicks tbh