Newspaper feed blocked in ttrss? [SOLVED]


#21

Thanks, Homlett. Unfortunately I still get the 451 error. I checked the location of feed43’s IP address and it’s in the US. Go figure.

EDIT: The next time I opened TT-RSS, there was no error. The proxy feed worked! There must have been a delay before the proxy URL took effect. Thanks!
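
For anyone wondering what the 451 means: it is the standard HTTP status "Unavailable For Legal Reasons". Below is a rough Python sketch for checking which status a feed URL returns from a given machine. The helper names `explain_status` and `fetch_status` and the User-Agent string are made up for illustration, and nothing here is part of TT-RSS or feed43.

```python
import urllib.error
import urllib.request
from http import HTTPStatus


def explain_status(code: int) -> str:
    """Return the standard reason phrase for an HTTP status code."""
    try:
        return HTTPStatus(code).phrase
    except ValueError:
        # Not a status code known to the standard library.
        return "Unknown status"


def fetch_status(url: str, timeout: float = 10.0) -> int:
    """Request the URL and return its HTTP status code, including error codes."""
    # The User-Agent value is an arbitrary placeholder.
    request = urllib.request.Request(url, headers={"User-Agent": "feed-check/0.1"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return err.code


print(451, explain_status(451))
```

Running `fetch_status` with the feed URL on the server that hosts TT-RSS and again on a home machine would show whether the two locations get different answers.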


#22

well some people use reader apps, as opposed to something client-server like feedly or tt-rss


#23

@homlett, the feed43 home page says it can be used to create an RSS feed from any page. I could use something like that for the Associated Press news feeds. They discontinued RSS support last fall. Instead, the AP news feeds are now presented as web pages (e.g., https://www.apnews.com/tag/apf-topnews). There was a plugin that fixed this but it no longer works (see this discussion).

I found a tutorial @ feed43 that explains how to set up the extraction rules. However, the AP uses JS to generate the article feed so there’s nothing to use for the extraction rules. Does that mean it’s not possible to create a proxy RSS feed in this case?


#24

This kind of page is a real pain. However, if you look at the requests made by the JavaScript (in your browser’s developer tools, “Network” tab), you can find a JSON file with everything you need to build your feed:
https://afs-prod.appspot.com/api/v2/feed/tag?tags=apf-topnews

There is even the full text of the articles, but unfortunately with some ugly non-HTML line breaks.
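
If the article text really is plain text with raw newlines (an assumption, since the exact format of the field isn't shown here), a small Python sketch like this could turn those breaks into HTML before the text goes into a feed. The function name `text_to_html` is made up for illustration.

```python
import html
import re


def text_to_html(text: str) -> str:
    """Turn plain-text line breaks into HTML paragraphs and <br> tags."""
    # Normalise Windows and old Mac line endings first.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    # Blank lines separate paragraphs; single newlines become <br>.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    return "\n".join(
        "<p>" + html.escape(p).replace("\n", "<br>") + "</p>" for p in paragraphs
    )


print(text_to_html("First line\nsecond line\n\nNew paragraph & more"))
```

Note that `html.escape` would mangle any markup already in the text, so this only fits the plain-text case.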

Also, because the JSON file is quite heavy, Feed43 keeps only a small part of it and you won’t be able to “extract” more than 9 entries. That should be enough with a high refresh rate, I guess (depends on Feed43).

Building a complete feed from this JSON would be easy with a tool like Huginn or even a dedicated ttrss plugin. Anyway, here it is:

https://feed43.com/feed.html?name=ap-top-news for editing
https://feed43.com/ap-top-news.xml for subscribing
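
For those who would rather build the complete feed themselves, here is a minimal Python sketch of the JSON-to-RSS step. It assumes the entries have already been pulled out of the JSON into a list of dicts with `title`, `link` and optional `description` keys. Those key names and the `build_rss` function are hypothetical; the real field names in the AP JSON would have to be mapped first and are not shown here.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_rss(channel_title: str, channel_link: str, entries: List[Dict[str, str]]) -> str:
    """Build a bare-bones RSS 2.0 document from a list of entry dicts."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_title
    for entry in entries:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, "link").text = entry["link"]
        # Reuse the link as the GUID so readers can spot repeated articles.
        ET.SubElement(item, "guid").text = entry["link"]
        ET.SubElement(item, "description").text = entry.get("description", "")
    return ET.tostring(rss, encoding="unicode")


# Made-up sample entry; real entries would come from the downloaded JSON.
sample_entries = [
    {"title": "Example headline", "link": "https://example.com/article-1"},
]
print(build_rss("AP Top News", "https://www.apnews.com/tag/apf-topnews", sample_entries))
```

A script like this would not be subject to the 9-entry limit, but it needs somewhere to run on a schedule and a place to serve the resulting XML.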


#25

luckily for all of us frontend developers are incapable of doing anything without a nice JSON provided by someone who has functional brain matter


#26

Thanks @homlett. That works, within the limitations you mentioned. With the free version of feed43, the refresh rate is 6 hours. However, the AP Top News feed often produces a lot more than 9 entries within that period, mostly repeats, so I’ll have to see how much it misses.

The author of the AP News plugin for TT-RSS replied to the other discussion I linked in my previous reply. He says the plugin still works, so we’re trying to figure out why @mamil and I are getting the "unable to download URL" error.

@mamil, is your instance of TT-RSS running on an EU server?


#27

Yes.


#28

@mamil, so we can’t rule out GDPR. If so, I guess there’s not much to do about it other than stand up my own instance of TT-RSS here in the US. I’m afraid I’m not up for the technical challenge, nor do I have time to learn. Maybe next year. I recently built my first Linux box and I’ve had to spend way more time than I imagined learning how to do all the things I took for granted on my aging (but still productive) XP box.

@homlett, as I suspected, the feed43 proxy doesn’t pick up nearly all the AP top news articles. I guess my best option is to subscribe to the paid version, which updates every hour and has a larger page size allowance (250k vs 100k). Together those features should handle whatever AP throws at it. I really appreciate your effort to set that up!


#29

@homlett, the feed43 link you provided (https://feed43.com/4546230130273742.xml) no longer works. Since July 9 it only loads help wanted listings. I don’t understand how that’s possible, since the original URL (as shown in my original post) still works directly in my browser. Any idea what could cause the feed43 proxy to generate an entirely different feed, or how to fix it? I appreciate your assistance!


#30

@ginahoy The feed source was referencing itself in an infinite loop for some reason. I fixed it. You can edit here if needed:
http://feed43.com/feed.html?name=4546230130273742


#32

You can edit here if needed:
http://feed43.com/feed.html?name=4546230130273742

@homlett, when I opened that link, I noticed the URL is wrong… some site that lists oil and gas jobs (the original domain was tucson.com). That would explain the problem I’ve been having. The script worked perfectly until July 9. The only explanation I can think of is that the feed43 database got corrupted, causing two unrelated feed scripts to conflate.

I changed the URL back to the newspaper feed page, but your original extraction rules and output format were overwritten by entries associated with the oil & gas jobs feed. I would be grateful if you could fix the script, as I have no idea how to do that. I should have saved it to my computer :slightly_frowning_face: