Newspaper feed blocked in ttrss? [SOLVED]

ginahoy · June 17, 2018, 1:56am

Thanks, Homlett. Unfortunately I still get the 451 error. I checked the location of feed43’s IP address and it’s in the US. Go figure.

EDIT: The next time I opened TT-RSS, there was no error. The proxy feed worked! There must have been a delay before the proxy URL took effect. Thanks!

fox · June 17, 2018, 11:14am

well some people use reader apps, as opposed to something client-server like feedly or tt-rss

ginahoy · June 17, 2018, 7:21pm

@homlett, the feed43 home page says it can be used to create an RSS feed from any page. I could use something like that for the Associated Press news feeds. They discontinued RSS support last fall. Instead, the AP news feeds are now presented as web pages (e.g., Top News: US & International Top News Stories Today | AP News). There was a plugin that fixed this but it no longer works (see this discussion).

I found a tutorial @ feed43 that explains how to set up the extraction rules. However, the AP uses JS to generate the article feed so there’s nothing to use for the extraction rules. Does that mean it’s not possible to create a proxy RSS feed in this case?

homlett · June 18, 2018, 12:22am

This kind of page is a real pain. However, if you look at the requests made by the JavaScript (in your browser’s developer tools, “Network” tab), you can find a JSON file with everything you need to build your feed:
https://afs-prod.appspot.com/api/v2/feed/tag?tags=apf-topnews

It even contains the full text of the articles, but unfortunately with some ugly non-HTML line breaks.
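If those breaks turn out to be plain newline characters (an assumption, I haven't checked every article), something like this rough Python sketch could turn the text into HTML paragraphs. The function name `text_to_html` is made up for illustration:

```python
import html


def text_to_html(text: str) -> str:
    """Wrap each non-empty line of plain text in an HTML <p> element.

    Assumes the article breaks are ordinary newline characters.
    """
    # Normalise Windows and old Mac line endings to "\n".
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Drop blank lines and surrounding whitespace.
    paragraphs = [line.strip() for line in text.split("\n") if line.strip()]
    # Escape &, <, > and quotes so the text is safe inside a feed description.
    return "".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)
```

Escaping matters here because the result ends up inside a feed description, where a stray `<` or `&` would otherwise break the markup.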

Also, because the JSON file is quite heavy, Feed43 keeps only a tiny part of it and you won’t be able to “extract” more than 9 entries. That should be enough with a high refresh rate, I guess (depends on Feed43).

Building a complete feed from this JSON would be easy with a tool like Huginn or even a dedicated ttrss plugin. Anyway, here it is:

https://feed43.com/feed.html?name=ap-top-news for editing
https://feed43.com/ap-top-news.xml for subscribing
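For anyone who would rather build the feed themselves, here is a minimal Python sketch of the JSON-to-RSS step. It is not what Feed43 does internally, and the keys `title`, `link` and `description` are hypothetical: the real AP JSON is laid out differently, so you would first have to map its fields onto this shape. The function name `build_rss` and the sample entry are also invented for illustration.

```python
import json
import xml.etree.ElementTree as ET


def build_rss(items, feed_title, feed_link):
    """Build a minimal RSS 2.0 document from a list of dicts.

    Each dict is assumed (hypothetically) to have "title" and "link"
    keys and an optional "description" key.
    """
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # RSS 2.0 requires title, link and description on the channel.
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = feed_title
    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        ET.SubElement(item, "description").text = entry.get("description", "")
    # ElementTree escapes special characters in text nodes for us.
    return ET.tostring(rss, encoding="unicode")


# Made-up sample data standing in for the already-mapped JSON entries.
sample = json.loads(
    '[{"title": "Example headline", '
    '"link": "https://example.com/a1", '
    '"description": "Body text"}]'
)
xml_text = build_rss(sample, "AP Top News (unofficial)", "https://apnews.com/")
```

Because this works on the whole JSON rather than a truncated page, it would not be limited to 9 entries the way the Feed43 version is.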

fox · June 18, 2018, 5:46am

luckily for all of us frontend developers are incapable of doing anything without a nice JSON provided by someone who has functional brain matter

ginahoy · June 18, 2018, 6:07am

Thanks @homlett. That works, within the limitations you mentioned. With the free version of feed43, the refresh rate is 6 hours. However, the AP Top News feed often produces a lot more than 9 entries within that period, mostly repeats, so I’ll have to see how much it misses.

The author of the AP News plugin for TT-RSS replied to the other discussion I linked in my previous reply. He says the plugin still works, so we’re trying to figure out why @mamil and I are getting the "unable to download URL" error.

@mamil, is your instance of TT-RSS running on an EU server?

mamil · June 18, 2018, 7:19pm

Yes.
…

ginahoy · June 19, 2018, 6:32am

@mamil, so we can’t rule out GDPR. If so, I guess there’s not much to do about it other than stand up my own instance of TT-RSS here in the US. I’m afraid I’m not up for the technical challenge, nor do I have time to learn. Maybe next year. I recently built my first Linux box and I’ve had to spend way more time than I imagined learning how to do all the things I took for granted on my aging (but still productive) XP box.

@homlett, as I suspected, the feed43 proxy doesn’t pick up nearly all the AP top news articles. I guess my best option is to subscribe to the paid version, which updates every hour and has a larger page size allowance (250k vs 100k). Together those features should handle whatever AP throws at it. I really appreciate your effort to set that up!

ginahoy · July 27, 2018, 7:37pm

@homlett, the feed43 link you provided (https://feed43.com/4546230130273742.xml) no longer works. Since July 9 it only loads help wanted listings. I don’t understand how that’s possible, since the original URL (as shown in my original post) still works directly in my browser. Any idea what could cause the feed43 proxy to generate an entirely different feed, or how to fix it? I appreciate your assistance!

homlett · July 27, 2018, 11:56pm

@ginahoy The feed source was referencing itself in an infinite loop for some reason. I fixed it. You can edit here if needed:
http://feed43.com/feed.html?name=4546230130273742

ginahoy · July 28, 2018, 7:07pm

You can edit here if needed:
http://feed43.com/feed.html?name=4546230130273742

@homlett, when I opened that link, I noticed the URL is wrong… some site that lists oil and gas jobs (the original domain was tucson.com). That would explain the problem I’ve been having. The script worked perfectly until July 9. The only explanation I can think of is that the feed43 database got corrupted, causing two unrelated feed scripts to be conflated.

I changed the URL back to the newspaper feed page but your original extraction rules and output format were overwritten by entries associated with the oil & gas jobs feed. I would be grateful if you could fix the script as I have no idea how to do that. I should have saved them to my computer.