What is updated if the GUID already exists?

Organizer · May 1, 2018, 4:45pm

I have a site that uses <guid> where an article occasionally gets messed up, which makes me wonder how the process for updating an article works. I’d also like to know if the isPermaLink attribute makes any difference to TT, so as to better understand and learn.

In short what I have in my TT DB is an entry with this headline, link and text:

Fortnite artık tüm zamanların en büyük ücretsiz konsol oyunu!
https://www.psoyun.com/oyun/sonic-mania-plus-cikis-tarihi.html
Fortnite, rekor kıran gelirleri ve aylık aktif kullanıcı sayısı ile resmi olarak tüm zamanların […]

Looking at the above you will notice that the link does not match the text or headline (which is the problem). I can’t know for sure exactly what caused it, so I am just looking to understand the TT process better.

Does Tiny ever update the headline of an article if it detects a change (with or without <guid>)?
Does Tiny always make a content hash to check for updates (both with and without <guid>)?
Does Tiny never check or update the article link, even when <guid> is present?
In general, does isPermaLink being false or true matter to how Tiny works?

I thought the core of the <guid> system is intended to help RSS readers understand it’s the same article as before and thereby allow all sorts of corrections (headline, link and content), as without <guid> it would simply not know and would thereby create the article as a new one?

Any insight would be super helpful

fox · May 1, 2018, 5:21pm

tt-rss identifies articles by GUIDs
all information related to specific GUID is updated if change is detected within originating feed
your site most likely reuses article GUIDs or they are otherwise not identifying the articles properly
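
As a rough illustration of the three points above, here is a minimal sketch of GUID-keyed storage. It is not tt-rss code: the `store` dict, the `upsert` helper and the example.com URLs are all hypothetical. It only shows why a reused GUID overwrites the earlier article instead of creating a new one.

```python
# Hypothetical in-memory model of GUID-keyed storage; not the tt-rss schema.
store = {}

def upsert(store, guid, title, link, content):
    """Insert a new article, or update the one already stored under this GUID."""
    incoming = {"title": title, "link": link, "content": content}
    if guid not in store:
        store[guid] = incoming
        return "inserted"
    if store[guid] != incoming:
        store[guid] = incoming  # same GUID, changed data: update in place
        return "updated"
    return "unchanged"

# The same GUID is published twice with different article data.
first = upsert(store, "https://example.com/?p=1", "Sonic",
               "https://example.com/sonic.html", "sonic text")
second = upsert(store, "https://example.com/?p=1", "Fortnite",
                "https://example.com/fortnite.html", "fortnite text")
```

In this model the second call replaces the first article entirely, so only one entry exists for the GUID afterwards.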

if you want to get into details, the code is available on gogs. i’m not going to write you an essay on feed update process, sorry.

JustAMacUser · May 1, 2018, 5:57pm

For your convenience, here’s where to start in the code:

RSSUtils::update_rss_feed()

Organizer · May 3, 2018, 9:46am

@fox Cheers, no essay required, quick and short is all good. @JustAMacUser I am no coder but will try to look at the code as well, thank you.

As far as I checked the GUID is the same, as that was what prompted this thread (it was even in the feed so I could see it, force a refresh and observe), as the link was not updated.

Ref. point two; “all information is updated” seems not to happen in this case but if it should I will try to take a closer look.

Ref. point three “re-used an article GUID”; that is what I expect they did, but this is a WordPress site so the GUID system is highly likely not at fault. So a re-used (changed) article in this case, as per point two, should simply result in a full update of article data (link, headline, content) I’d presume.

Thanks.

Added info, this is what a debug run shows and what my confusion is about:

[09:49:50/13252] guid 1,https://www.psoyun.com/?p=5695 / SHA1:4113982db117c3f331e808f178be57d43244c4ad
[09:49:50/13252] orig date: 1525181706
[09:49:50/13252] date 1525181706 [2018/05/01 13:35:06]
[09:49:50/13252] title Fortnite artık tüm zamanların en büyük ücretsiz konsol oyunu!
[09:49:50/13252] link https://www.psoyun.com/oyun/fortnite-mart-2018.html
[09:49:50/13252] author Ozan Baki
[09:49:50/13252] num_comments: 0
[09:49:50/13252] looking for tags...
[09:49:50/13252] tags found: oyun,slider,fortnite,oyunlar,mart 2018
[09:49:50/13252] done collecting data.
[09:49:50/13252] article hash: 2dd8751fe6f59639bb621bffbd2179a08cd09e56 [stored=2dd8751fe6f59639bb621bffbd2179a08cd09e56]
[09:49:50/13252] stored article seems up to date [IID: 8484142], updating timestamp only

Per the above debug output the link is fetched correctly and relates to Fortnite. However, in my DB I have the correct Fortnite headline and Fortnite content, but the article LINK is still that of a Sonic Mania article, which I presume is what the article briefly was before the content got changed.
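
The "article hash ... [stored=...]" and "updating timestamp only" lines in the debug output suggest a hash comparison along the following lines. This is a minimal sketch under assumptions: which fields tt-rss actually feeds into its hash is not stated in this thread, and `article_hash`, `refresh` and `last_seen` are hypothetical names.

```python
import hashlib
import time

def article_hash(fields):
    """SHA-1 over a stable serialization of the fields (the field set is an assumption)."""
    joined = "\n".join(f"{key}={fields[key]}" for key in sorted(fields))
    return hashlib.sha1(joined.encode("utf-8")).hexdigest()

def refresh(stored, fetched, now=None):
    """Return 'timestamp-only' when the hash matches the stored one, else 'reprocess'."""
    now = time.time() if now is None else now
    new_hash = article_hash(fetched)
    stored["last_seen"] = now  # the timestamp is refreshed in both cases
    if new_hash == stored["hash"]:
        return "timestamp-only"
    stored["hash"] = new_hash
    return "reprocess"

fetched = {"title": "Fortnite", "content": "fortnite text"}
stored = {"hash": article_hash(fetched), "last_seen": 0}

same = refresh(stored, fetched, now=1)
changed = refresh(stored, {**fetched, "title": "x" + fetched["title"]}, now=2)
```

With an identical item the sketch only touches the timestamp. Any change to a hashed field, such as a one-character title edit, takes the full reprocessing path.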

Update 2: Forcing a change of the title triggers a new content hash, but like before the LINK remains wrong in the DB and is not updated.

[10:27:51/19992] guid 1,https://www.psoyun.com/?p=5695 / SHA1:4113982db117c3f331e808f178be57d43244c4ad
[10:27:51/19992] orig date: 1525181706
[10:27:51/19992] date 1525181706 [2018/05/01 13:35:06]
[10:27:51/19992] title xFortnite artık tüm zamanların en büyük ücretsiz konsol oyunu!
[10:27:51/19992] link https://www.psoyun.com/oyun/fortnite-mart-2018.html
[10:27:51/19992] author Ozan Baki
[10:27:51/19992] num_comments: 0
[10:27:51/19992] looking for tags...
[10:27:51/19992] tags found: oyun,slider,fortnite,oyunlar,mart 2018
[10:27:51/19992] done collecting data.
[10:27:51/19992] article hash: 2307168f88577c437715f13e58f0ae72c30f24d8 [stored=2dd8751fe6f59639bb621bffbd2179a08cd09e56]
[10:27:51/19992] hash differs, applying plugin filters:
[10:27:51/19992] ... Af_Readability
[10:27:51/19992] === 0.0000 (sec)
[10:27:51/19992] ... Af_Unburn
[10:27:51/19992] === 0.0000 (sec)
[10:27:51/19992] plugin data: af_readability,af_unburn,
[10:27:51/19992] matched filter rules: 
[10:27:51/19992] filter actions: 
[10:27:51/19992] article labels:
[10:27:51/19992] force catchup: 
[10:27:51/19992] base guid found, checking for user record
[10:27:51/19992] initial score: 0 [including plugin modifier: 0]
[10:27:51/19992] user record FOUND: RID: 8484142, IID: 6737434
[10:27:51/19992] resulting RID: 8484142, IID: 6737434
[10:27:51/19992] assigning labels [other]...
[10:27:51/19992] assigning labels [filters]...
[10:27:51/19992] looking for enclosures...
[10:27:51/19992] article enclosures:
Array
(
)
[10:27:51/19992] filtered article tags:
Array
(
    [0] => oyun
    [1] => slider
    [2] => fortnite
    [3] => oyunlar
    [4] => mart 2018
    [5] => çıkış tarihi
    [6] => sega
    [7] => sonic mania plus
)
[10:27:51/19992] article processed

I am hoping I’m just being stupid or have done something weird on my side, but at least in my mind it seems wrong that the link is not updated, as indeed when we get a GUID we need to trust it.

Update 3: If I am right code wise this would be the part, which indeed if so is not updating the link and explains my issue (or am I missing something?)
https://git.tt-rss.org/fox/tt-rss/src/master/classes/rssutils.php#L986-L1006
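
The symptom can be reproduced with a toy model. This is a hypothetical sketch of the behaviour described in this thread, not a copy of the linked tt-rss code. The `apply_update` helper and the example.com URLs are invented for illustration. An update step that refreshes the title and content but leaves the link out of its field list keeps the old link.

```python
def apply_update(stored, fetched, fields):
    """Copy only the listed fields from the fetched item onto the stored row."""
    for name in fields:
        stored[name] = fetched[name]
    return stored

stored = {"title": "Sonic Mania Plus",
          "link": "https://example.com/sonic.html",
          "content": "sonic text"}
fetched = {"title": "Fortnite",
           "link": "https://example.com/fortnite.html",
           "content": "fortnite text"}

# Update that skips the link: headline and content change, link stays stale.
without_link = apply_update(dict(stored), fetched, ["title", "content"])

# Update that includes the link: the whole entry follows the feed.
with_link = apply_update(dict(stored), fetched, ["title", "content", "link"])
```

In the first case the row ends up with a Fortnite headline and a Sonic link, which matches the mismatch seen in the DB entry at the top of the thread.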

fox · May 3, 2018, 11:31am

you’re right, looks like article link is the exception here. it should also be updated, i think.

Organizer · May 3, 2018, 11:55am

In cases where the <guid> is provided I’d say so, from my point of view at least.

fox · May 3, 2018, 12:37pm

i think link is used as one of the candidates for guid generation if it is missing so treating link as volatile could cause duplicates (which could be more annoying than stale link on an article that had been changed)

in all honesty i’m leaning towards disallowing changing this because feeds which misuse guids are broken and should fix their shit instead: just make a new post, assholes. don’t reuse guids.

Organizer · May 3, 2018, 10:25pm

Well yes, when there’s no feed guid defined you use the link as guid internally. Though in those cases, if the link (for any reason) changes, you would be inserting it as a new one, seeing as TT would not know that it’s just an update.
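
The duplicate risk in the no-guid case can be sketched as follows. This is a hypothetical model: `effective_guid` and `ingest` are invented names, and the link-as-fallback rule is taken from the discussion above rather than from the tt-rss source.

```python
def effective_guid(item):
    """Use the feed's guid when present, else fall back to the link (assumed fallback)."""
    return item.get("guid") or item["link"]

def ingest(store, item):
    """Store the item under its effective GUID and return the number of stored articles."""
    store[effective_guid(item)] = item
    return len(store)

# No guid in the feed: a changed link looks like a brand-new article.
no_guid_store = {}
ingest(no_guid_store, {"link": "https://example.com/old-slug.html", "title": "Post"})
count_no_guid = ingest(no_guid_store, {"link": "https://example.com/new-slug.html", "title": "Post"})

# Stable guid in the feed: the same link change is just an update.
guid_store = {}
ingest(guid_store, {"guid": "https://example.com/?p=1",
                    "link": "https://example.com/old-slug.html", "title": "Post"})
count_with_guid = ingest(guid_store, {"guid": "https://example.com/?p=1",
                                      "link": "https://example.com/new-slug.html", "title": "Post"})
```

Without a guid the link change yields two stored articles. With a stable guid the count stays at one and the stored link follows the feed.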

It’s not just in the case of a re-use of an article; it also affects well-behaved feeds (WordPress included, which is a big portion of the internet, or at least it feels like it) when they maybe correct the headline and thereby SEO after going live (which the GUID is supposed to handle).

Yes, there are the idiots who misuse feed guids, mainly by not making them “unique”, but TT doesn’t handle that problem (as that’s their fault). I can’t at a glance think of what could cause duplicates in light of the feed guid (as mentioned, duplicates should only happen when they don’t use it). Maybe there are others with experience who can chime in with their two cents.

Personally I think core TT policy should be to do what’s logical (fully update the existing entry), and if sites misuse the feed guid then that’s their problem. I’m updating my core file now and will see how it works out with my feeds.

JustAMacUser · May 4, 2018, 12:28am

Last I read WordPress runs about 30% of sites that use a content management system. Not most of the Internet, but still a lot. To the best of my knowledge, WordPress on its own does not change the guid after an article is published, though a user could override it for sure. Therefore the GUID would not change if the article title were later updated. But even if it does…

To the best of my knowledge the guid tag is optional and is there to provide aggregators a system for identifying a unique item. Declaring a guid is optional. Using the guid is optional. So at the end of the day, it doesn’t really matter.

When I read the spec I see that the isPermaLink attribute of the guid tag is optional but defaults to true. That tells me the specification’s authors probably prefer the guid to be a permanent and unique link to the article in question; so falling back to the article link makes perfect sense.
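
For anyone who wants to check what a feed actually declares, here is a minimal sketch that reads the guid and its isPermaLink attribute with Python's standard `xml.etree.ElementTree`. The sample item and the `read_guid` helper are made up for illustration. The `?p=` style value only resembles the GUID in the debug output earlier in the thread.

```python
import xml.etree.ElementTree as ET

ITEM_XML = """<item>
  <title>Example</title>
  <link>https://example.com/oyun/example.html</link>
  <guid isPermaLink="false">https://example.com/?p=5695</guid>
</item>"""

def read_guid(item):
    """Return (guid_text, is_permalink); isPermaLink counts as true when omitted."""
    guid = item.find("guid")
    if guid is None:
        return None, False
    is_permalink = guid.get("isPermaLink", "true").strip().lower() == "true"
    return (guid.text or "").strip(), is_permalink

# Explicit isPermaLink="false": the guid is an identifier, not a URL to open.
guid_text, is_permalink = read_guid(ET.fromstring(ITEM_XML))

# Attribute omitted: the default of true applies.
default_case = read_guid(ET.fromstring("<item><guid>https://example.com/a</guid></item>"))
```

When the attribute says false, the guid is just an opaque identifier and the separate link element is the only URL for the article.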