oh im fairly certain that’s not how it works
Me? I don’t have a clue how it works. I never heard the term GDPR before this discussion. My last comment was totally based on the other replies herein – in particular, the full error message text posted by mamil, and the response I received from the newspaper’s editor.
Assuming the content apps she referred me to live on servers in the US (or anywhere other than the EU), then requests sent by those apps to the newspaper’s RSS server would not be denied. Likewise, any RSS reader that’s not based in the EU could serve up their RSS feeds to me. Or I can display them in my browser, in which case the request comes through my local ISP. I think what she’s saying is that only requests from the EU are being blocked. Whether or not that’s an appropriate use of the 451 error is a different question.
What I find weird is that it’s impossible to collect private user information from RSS feeds because A) no cookies are sent, B) no JS is executed, and C) the IP address is that of the RSS-reader server, not the end user’s IP address. So why do these sites break their RSS feeds when accessed from Europe?
Oh, I know the answer. Incompetence.
A GET request for an RSS feed can just as easily include a cookie as a request for anything else on a web server. Anyway, they’re surely blocking all requests coming from some list of European IPs; they’re not going to add another check to permit European requests if they don’t include cookies.
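To make that concrete, here is a minimal sketch of what such a coarse block might look like on the server side. It is only a guess at the general approach, not the newspaper's actual setup: the function name `status_for_request` is made up, and the block list uses documentation-only address ranges rather than real European allocations.

```python
import ipaddress

# Hypothetical block list. These are documentation-only ranges (RFC 5737),
# standing in for whatever "European" list a site might really use.
BLOCKED_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def status_for_request(client_ip: str, headers: dict) -> int:
    """Return the HTTP status a coarse geo-block would send back."""
    addr = ipaddress.ip_address(client_ip)
    if any(addr in net for net in BLOCKED_NETWORKS):
        # The headers (cookies included) are never consulted here.
        return 451  # Unavailable For Legal Reasons
    return 200


# Same blocked address, with and without a cookie, then an unblocked address.
print(status_for_request("192.0.2.10", {}))
print(status_for_request("192.0.2.10", {"Cookie": "id=1"}))
print(status_for_request("203.0.113.5", {}))
```

The decision depends only on the source address, so a cookie-free feed reader gets the same 451 as a browser.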
There’s a lot of uncertainty about whether an IP address is enough to count as personally identifying. The fact that the request for the feed is coming from a server is immaterial; to the hosting site, TTRSS on a server is just as much a client as a feed reader on a desktop or mobile device. The TTRSS server could also be used by a single individual (which is often the case), so the server’s IP address could be just as personal as, if not more so than, the IP assigned to you at home by your ISP.
Yes, that’s technically possible. But are there any RSS clients that save and send back cookies by default?
I don’t know but it doesn’t matter, it’s all just HTTP.
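A quick illustration of that point, using only the Python standard library. The feed URL, the user-agent string and the cookie value are placeholders, and nothing is actually sent over the network; the sketch only builds the requests to show that a feed fetch is an ordinary GET that may or may not carry a `Cookie` header.

```python
import urllib.request

FEED_URL = "https://example.com/feed.xml"  # placeholder, not a real feed


def build_feed_request(url, cookie=None):
    """Build (but do not send) a GET request for a feed."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("User-Agent", "example-feed-reader/0.1")  # made-up name
    if cookie is not None:
        # Nothing about RSS prevents a client from sending this header.
        req.add_header("Cookie", cookie)
    return req


plain = build_feed_request(FEED_URL)
tracked = build_feed_request(FEED_URL, cookie="session=abc123")

print(plain.get_method(), plain.has_header("Cookie"))
print(tracked.get_method(), tracked.get_header("Cookie"))
```

Whether a given reader stores cookies and sends them back is up to that client; the server sees the same kind of request either way.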
Interesting idea. I just tried to set up a proxy for the feed linked in my original post. Unfortunately, I was stumped at the step where it asks to define the required “Item (repeatable) search pattern” macro. The “?” pop-up wasn’t very helpful, at least not for a noob like myself. If it’s not too much trouble, could you take a look? Feed43 New Feed
Thanks, Homlett. Unfortunately I still get the 451 error. I checked the location of feed43’s IP address and it’s in the US. Go figure.
EDIT: The next time I opened TT-RSS, there was no error. The proxy feed worked! There must have been a delay before the proxy URL took effect. Thanks!
well some people use reader apps, as opposed to something client-server like feedly or tt-rss
@homlett, the feed43 home page says it can be used to create an RSS feed from any page. I could use something like that for the Associated Press news feeds. They discontinued RSS support last fall. Instead, the AP news feeds are now presented as web pages (e.g., https://www.apnews.com/tag/apf-topnews). There was a plugin that fixed this but it no longer works (see this discussion).
I found a tutorial @ feed43 that explains how to set up the extraction rules. However, the AP uses JS to generate the article feed so there’s nothing to use for the extraction rules. Does that mean it’s not possible to create a proxy RSS feed in this case?
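For anyone else puzzling over this, here is a rough Python approximation of how a Feed43-style item pattern behaves, assuming the usual placeholders (`{%}` captures a piece of text, `{*}` skips one). It is not Feed43's own code, and the two HTML snippets are invented: one is a static list of links, the other is the kind of empty shell a JS-rendered page serves.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a Feed43-style pattern into a regular expression."""
    parts = re.split(r"(\{%\}|\{\*\})", pattern)
    out = []
    for part in parts:
        if part == "{%}":
            out.append("(.*?)")  # capture a variable piece of text
        elif part == "{*}":
            out.append(".*?")  # skip a variable piece of text
        else:
            out.append(re.escape(part))  # literal markup must match exactly
    return re.compile("".join(out), re.DOTALL)


def extract_items(html: str, pattern: str) -> list:
    """Return one tuple of captured values per repeated match."""
    return [m.groups() for m in pattern_to_regex(pattern).finditer(html)]


ITEM_PATTERN = '<a href="{%}">{%}</a>'

STATIC_HTML = (
    '<ul><li><a href="/a1">First</a></li>'
    '<li><a href="/a2">Second</a></li></ul>'
)
JS_SHELL_HTML = '<div id="root"></div><script src="/app.js"></script>'

print(extract_items(STATIC_HTML, ITEM_PATTERN))
print(extract_items(JS_SHELL_HTML, ITEM_PATTERN))
```

The pattern only sees the markup the server sends. If the article list is filled in later by JavaScript, the downloaded page has nothing for the pattern to match, unless the data can be found in some other URL the page loads.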
There is even the full text of the articles, but unfortunately with some ugly non-HTML line breaks.
Also, because the json file is quite heavy, Feed43 keeps only a tiny part of it and you won’t be able to “extract” more than 9 entries. That should be enough with a high refresh rate, I guess (depends on Feed43).
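As an aside on that 9-entry limit: reading the json directly avoids it. Below is a rough Python sketch of the idea. The field names (`cards`, `headline`, `url`, `body`) and the sample data are invented for illustration and will not match the real AP json, so treat it as a shape to adapt rather than something that works as-is.

```python
import json
import xml.etree.ElementTree as ET

# Invented sample data; the real json has a different, larger structure.
SAMPLE_JSON = """
{"cards": [
  {"headline": "First story", "url": "https://example.com/1",
   "body": "Line one\\nLine two"},
  {"headline": "Second story", "url": "https://example.com/2",
   "body": "Only line"}
]}
"""


def json_to_rss(raw: str, title: str, link: str) -> str:
    """Build a minimal RSS 2.0 document from the (hypothetical) json."""
    data = json.loads(raw)
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = title
    ET.SubElement(channel, "link").text = link
    ET.SubElement(channel, "description").text = title
    for card in data["cards"]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = card["headline"]
        ET.SubElement(item, "link").text = card["url"]
        # Replace plain line breaks with <br>; ElementTree escapes the
        # markup, which is how HTML is normally carried in <description>.
        body_html = card["body"].replace("\n", "<br>")
        ET.SubElement(item, "description").text = body_html
    return ET.tostring(rss, encoding="unicode")


feed_xml = json_to_rss(SAMPLE_JSON, "Example feed", "https://example.com/")
print(feed_xml)
```

Every entry in the json ends up in the feed, and the plain line breaks in the article text become `<br>` tags along the way.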
Building a complete feed from this json would be easy with a tool like Huginn or even a dedicated ttrss plugin. Anyway, here it is:
luckily for all of us frontend developers are incapable of doing anything without a nice JSON provided by someone who has functional brain matter
Thanks @homlett. That works, within the limitations you mentioned. With the free version of feed43, the refresh rate is 6 hours. However, the AP Top News feed often produces a lot more than 9 items within that period, mostly repeats, so I’ll have to see how much it misses.
The author of the AP News plugin for TT-RSS replied to the other discussion I linked in my previous reply. He says the plugin still works, so we’re trying to figure out why @mamil and I are getting the "unable to download URL" error.
@mamil, is your instance of TT-RSS running on an EU server?
@mamil, so we can’t rule out GDPR. If so, I guess there’s not much to do about it other than stand up my own instance of TT-RSS here in the US. I’m afraid I’m not up for the technical challenge, nor do I have time to learn. Maybe next year. I recently built my first Linux box and I’ve had to spend way more time than I imagined learning how to do all the things I took for granted on my aging (but still productive) XP box.
@homlett, as I suspected, the feed43 proxy doesn’t pick up nearly all the AP top news articles. I guess my best option is to subscribe to the paid version, which updates every hour and has a larger page size allowance (250k vs 100k). Together those features should handle whatever AP throws at it. I really appreciate your effort to set that up!
@homlett, the feed43 link you provided (https://feed43.com/4546230130273742.xml) no longer works. Since July 9 it only loads help wanted listings. I don’t understand how that’s possible, since the original URL (as shown in my original post) still works directly in my browser. Any idea what could cause the feed43 proxy to generate an entirely different feed, or how to fix it? I appreciate your assistance!
You can edit here if needed:
@homlett, when I opened that link, I noticed the URL is wrong: it points to some site that lists oil and gas jobs (the original domain was tucson.com). That would explain the problem I’ve been having. The script worked perfectly until July 9. The only explanation I can think of is that the feed43 database got corrupted, causing two unrelated feed scripts to be conflated.
I changed the URL back to the newspaper feed page but your original extraction rules and output format were overwritten by entries associated with the oil & gas jobs feed. I would be grateful if you could fix the script as I have no idea how to do that. I should have saved it to my computer.