Tiny Tiny RSS: Community

Filtering duplicates


#1

From what I’m gathering from searches, I’m guessing the only way to filter out duplicate posts from feeds is to use PostgreSQL and enable the plugin? I’m currently using MySQL for a few different things and would like to avoid migrating everything to a new DB server.


#2

“duplicate posts” can mean any number of things, try being more specific.

tt-rss has built-in deduplication which works by article IDs, this is supported on any database and works automatically. there’s also a postgres-specific plugin which works on similar titles. note how “similar title” =/= “duplicate post”.
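To make the "similar title" vs. "duplicate post" distinction concrete, here is a minimal Python sketch of title similarity scoring. It is not the plugin's code: trigram overlap is just one common way to score how alike two strings are, and the function names and the scoring formula below are assumptions made for illustration.

```python
def trigrams(text):
    """Return the set of 3-character substrings of a normalized title."""
    # Lowercase and collapse whitespace so trivial differences don't matter.
    normalized = " ".join(text.lower().split())
    # Pad so that word starts and ends also produce trigrams.
    padded = f"  {normalized} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def title_similarity(a, b):
    """Jaccard similarity of the two titles' trigram sets (0.0 to 1.0)."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)
```

Two titles that differ by a word or two score high without being identical, which is why a similarity match can catch posts that are related but not actually the same post.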

regardless of whether you need the postgres-specific functionality in tt-rss, upgrading to a real database server is never a bad idea tbh


#3

To be more specific: I have several RSS feeds, including about 4 or 5 feeds each from a couple of sites, for example Woot and SlickDeals (there are more, but that’s a small sample). SlickDeals, for instance, has a FrontPage Deals feed and a Popular Deals feed, and identical posts often show up on both. I’m guessing tt-rss doesn’t pick them up as duplicates because they’re on different feeds, even though the feeds originate from the same site. And I’m also guessing this is where the plugin for PostgreSQL would come in handy.


#4

if they use different article IDs they won’t be detected as duplicates, yes. feeds don’t actually matter, each article should have a globally unique identifier. it makes sense for the same article to have the same ID even if the feeds are different.
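As a rough illustration of ID-based deduplication (a Python sketch only, not tt-rss code; the entry layout, feed names, and IDs below are made up):

```python
def dedupe_by_id(entries):
    """Keep the first entry seen for each article ID, regardless of feed."""
    seen_ids = set()
    unique = []
    for entry in entries:
        if entry["id"] in seen_ids:
            continue  # same ID already stored, so this one is a duplicate
        seen_ids.add(entry["id"])
        unique.append(entry)
    return unique


# Hypothetical entries: the same post published on two feeds.
entries = [
    {"feed": "Frontpage", "id": "deal-1001", "title": "Widget for $5"},
    {"feed": "Popular", "id": "deal-1001", "title": "Widget for $5"},
    {"feed": "Popular", "id": "popular-77", "title": "Widget for $5"},
]
result = dedupe_by_id(entries)
# The second entry is dropped (same ID as the first); the third is kept,
# because its ID differs even though the title is identical.
```

The feed name never enters the comparison, so whether a cross-posted article is caught depends entirely on whether the site reuses the ID.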

postgresql plugin would help but it would be better if the site in question generated the feeds properly in the first place.

also, you can make a very simple plugin which would work on those specific feeds and go by content or exact title match, i guess.
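The matching logic such a plugin would need is small. The sketch below is in Python purely to show the idea; real tt-rss plugins are written in PHP, and nothing here reflects the tt-rss plugin API. The entry layout and the `watched_feeds` parameter are assumptions for illustration.

```python
def normalize_title(title):
    """Case-fold and collapse whitespace so trivial differences don't matter."""
    return " ".join(title.casefold().split())


def filter_exact_title_duplicates(entries, watched_feeds):
    """Drop entries whose title was already seen, but only within watched feeds."""
    seen_titles = set()
    kept = []
    for entry in entries:
        if entry["feed"] not in watched_feeds:
            kept.append(entry)  # leave all other feeds untouched
            continue
        key = normalize_title(entry["title"])
        if key in seen_titles:
            continue  # exact title already seen on one of the watched feeds
        seen_titles.add(key)
        kept.append(entry)
    return kept
```

Restricting the check to a known set of feeds keeps unrelated articles that happen to share a title from being dropped elsewhere.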


#5

Yeah, that’s what I figured. I’m guessing the article IDs are unique even though it’s the exact same post. I haven’t looked closely at it, considering I’m a n00b to tt-rss, moving over from Inoreader (and previously a couple of others).

I might wind up just biting the bullet and moving to PostgreSQL. Everything I’ve read about what I’m using MySQL for says it’ll perform better on PostgreSQL. Guess I’ll be taking some snapshots and doing some testing in the near future.


#6

By “exact same post” I mean the same headline/subject/body, and even the same URL associated with it.


#7

btw the plugin works on titles only, so if you need exact content matching it’s not going to be of huge help.


#8

Yeah, that’s fine. In the cases I’m looking at (multiple feeds from a single site), the titles are identical, so I should be good.