Tiny Tiny RSS: Community

Getting data on "inactive" feeds


#1

I’ve for a long time been using the following query to get some data on “inactive feeds”. However, I am currently getting a timeout from the server after a minute or so. I know it is heavy with my data, but I used to run 3 in a row in the same PHP file that would do a 24h, 1 week and a 1 month check and save the results (without timeout).

Suddenly a week ago even running a single of them gives me a timeout. Any suggesting on debugging/repairing or improving the speed of the query welcome.

SELECT ttrss_feeds.title, ttrss_feeds.site_url,
ttrss_feeds.feed_url, ttrss_feeds.id, MAX(updated) AS last_article
FROM ttrss_feeds, ttrss_entries, ttrss_user_entries 
WHERE
  ( SELECT MAX(updated) FROM ttrss_entries, ttrss_user_entries 
   WHERE ttrss_entries.id = ref_id 
   AND ttrss_user_entries.feed_id = ttrss_feeds.id) < NOW() - INTERVAL '24 hours'
  
  AND ttrss_feeds.owner_uid = 1 
  AND ttrss_user_entries.feed_id = ttrss_feeds.id 
  AND ttrss_entries.id = ref_id
  AND ttrss_feeds.update_interval <> '-1'
  
  GROUP BY ttrss_feeds.title, ttrss_feeds.id, ttrss_feeds.site_url, ttrss_feeds.feed_url
  ORDER BY last_article

TTRSSv17.12 (1ddf3a2), Ubuntu 14.04.5 LTS‬, PHP 7.0.27, PostgreSQL 9.6.6, curl 7.35.0, 2000+ feeds


#2

i suggest starting with postgresql explain analyze statement


#3

Explain Analyze wont work seeing it times out via PHPPGADMIN at least (I will try via SSH later today and update). Just explain wise for now:

Sort  (cost=103792154.63..103792156.65 rows=805 width=87)
  Sort Key: (max(ttrss_entries.updated))
  ->  GroupAggregate  (cost=2572256.22..103792115.78 rows=805 width=87)
        Group Key: ttrss_feeds.id
        ->  Merge Join  (cost=2572256.22..103783206.08 rows=1780330 width=87)
              Merge Cond: (ttrss_feeds.id = ttrss_user_entries.feed_id)
              ->  Index Scan using ttrss_feeds_pkey on ttrss_feeds  (cost=0.28..101526787.18 rows=805 width=79)
                    Filter: ((update_interval <> '-1'::integer) AND (owner_uid = 1) AND ((SubPlan 1) < (now() - '24:00:00'::interval)))
                    SubPlan 1
                      ->  Aggregate  (cost=37655.87..37655.88 rows=1 width=8)
                            ->  Nested Loop  (cost=61.72..37647.85 rows=3207 width=8)
                                  ->  Bitmap Heap Scan on ttrss_user_entries ttrss_user_entries_1  (cost=61.29..11076.63 rows=3207 width=4)
                                        Recheck Cond: (feed_id = ttrss_feeds.id)
                                        ->  Bitmap Index Scan on ttrss_user_entries_feed_id  (cost=0.00..60.48 rows=3207 width=0)
                                              Index Cond: (feed_id = ttrss_feeds.id)
                                  ->  Index Scan using ttrss_entries_pkey on ttrss_entries ttrss_entries_1  (cost=0.43..8.28 rows=1 width=12)
                                        Index Cond: (id = ttrss_user_entries_1.ref_id)
              ->  Materialize  (cost=2572255.94..2602068.17 rows=5962447 width=12)
                    ->  Sort  (cost=2572255.94..2587162.05 rows=5962447 width=12)
                          Sort Key: ttrss_user_entries.feed_id
                          ->  Hash Join  (cost=260715.06..1697459.86 rows=5962447 width=12)
                                Hash Cond: (ttrss_entries.id = ttrss_user_entries.ref_id)
                                ->  Seq Scan on ttrss_entries  (cost=0.00..1236432.60 rows=5947160 width=12)
                                ->  Hash  (cost=162893.47..162893.47 rows=5962447 width=8)
                                      ->  Seq Scan on ttrss_user_entries  (cost=0.00..162893.47 rows=5962447 width=8)
25 row(s)

Total runtime: 7.936 ms

#4

we should probably summon @SleeperService here

i’m not really any good at this but i can suggest either logically splitting the query or adding harder limits to the outer parts to make explain analyze work so that you can figure out the actual bottleneck

i.e. filter out a smaller subset of articles to work with either by id or date so it would be indexed and fast and then go from there


#5

Huh? what? I was napping.

At this point, I don’t know what’s going on with Organizer, when I run the same query above, I get:

explain analyze SELECT ttrss_feeds.title, ttrss_feeds.site_url,
ttrss_feeds.feed_url, ttrss_feeds.id, MAX(updated) AS last_article
FROM ttrss_feeds, ttrss_entries, ttrss_user_entries 
WHERE
  ( SELECT MAX(updated) FROM ttrss_entries, ttrss_user_entries 
   WHERE ttrss_entries.id = ref_id 
   AND ttrss_user_entries.feed_id = ttrss_feeds.id) < NOW() - INTERVAL '24 hours'
  
  AND ttrss_feeds.owner_uid = 1 
  AND ttrss_user_entries.feed_id = ttrss_feeds.id 
  AND ttrss_entries.id = ref_id
  AND ttrss_feeds.update_interval <> '-1'
  
  GROUP BY ttrss_feeds.title, ttrss_feeds.id, ttrss_feeds.site_url, ttrss_feeds.feed_url
  ORDER BY last_article;
                                                                                           QUERY PLAN                                                                                            
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=594.15..594.15 rows=1 width=112) (actual time=0.057..0.057 rows=0 loops=1)
   Sort Key: (max(ttrss_entries.updated))
   Sort Method: quicksort  Memory: 25kB
   ->  GroupAggregate  (cost=593.25..594.14 rows=1 width=112) (actual time=0.034..0.034 rows=0 loops=1)
         Group Key: ttrss_feeds.id
         ->  Sort  (cost=593.25..593.54 rows=117 width=112) (actual time=0.031..0.031 rows=0 loops=1)
               Sort Key: ttrss_feeds.id
               Sort Method: quicksort  Memory: 25kB
               ->  Nested Loop  (cost=0.98..589.23 rows=117 width=112) (actual time=0.013..0.013 rows=0 loops=1)
                     ->  Nested Loop  (cost=0.56..513.49 rows=117 width=108) (actual time=0.012..0.012 rows=0 loops=1)
                           ->  Index Scan using ttrss_feeds_owner_uid_index on ttrss_feeds  (cost=0.27..395.98 rows=1 width=104) (actual time=0.011..0.011 rows=0 loops=1)
                                 Index Cond: (owner_uid = 1)
                                 Filter: ((update_interval <> '-1'::integer) AND ((SubPlan 1) < (now() - '24:00:00'::interval)))
                                 SubPlan 1
                                   ->  Aggregate  (cost=393.67..393.68 rows=1 width=8) (never executed)
                                         ->  Nested Loop  (cost=0.70..393.37 rows=118 width=8) (never executed)
                                               ->  Index Scan using ttrss_user_entries_feed_id on ttrss_user_entries ttrss_user_entries_1  (cost=0.29..116.34 rows=118 width=4) (never executed)
                                                     Index Cond: (feed_id = ttrss_feeds.id)
                                               ->  Index Scan using ttrss_entries_pkey on ttrss_entries ttrss_entries_1  (cost=0.41..2.35 rows=1 width=12) (never executed)
                                                     Index Cond: (id = ttrss_user_entries_1.ref_id)
                           ->  Index Scan using ttrss_user_entries_feed_id on ttrss_user_entries  (cost=0.29..116.34 rows=118 width=8) (never executed)
                                 Index Cond: (feed_id = ttrss_feeds.id)
                     ->  Index Scan using ttrss_entries_pkey on ttrss_entries  (cost=0.41..0.65 rows=1 width=12) (never executed)
                           Index Cond: (id = ttrss_user_entries.ref_id)
 Planning time: 2.341 ms
 Execution time: 0.237 ms
(26 rows)

Time: 4.650 ms

Though it did take 26ms the first time, but gave no result.


#6

@SleeperService I guess in your case all feeds had been active last 24 hour as far as zero results? I’ll checking from SSH this weekend, see if mine executes at all (if given more time).

Update: I did revisit when this happened, it seems it has been 3 months since it worked consistently. While the DB grows slowly maybe the overall script even for a single check quickly got too heavy with this approach. My query currently returns 1183 rows (as in feeds with inactivity last 24 hours)

Could or would a full re-indexing be relevant tor is that not so much an required thing. Not sure if that would fix anything or if it is even likely anything is corrupt in this case.

Updated with full explain analyse via psql cmd:

explain analyze SELECT ttrss_feeds.title, ttrss_feeds.site_url,
ttrss_feeds.feed_url, ttrss_feeds.id, MAX(updated) AS last_article
FROM ttrss_feeds, ttrss_entries, ttrss_user_entries
WHERE
  ( SELECT MAX(updated) FROM ttrss_entries, ttrss_user_entries
   WHERE ttrss_entries.id = ref_id
   AND ttrss_user_entries.feed_id = ttrss_feeds.id) < NOW() - INTERVAL '24 hours'

  AND ttrss_feeds.owner_uid = 1
  AND ttrss_user_entries.feed_id = ttrss_feeds.id
  AND ttrss_entries.id = ref_id
  AND ttrss_feeds.update_interval <> '-1'

  GROUP BY ttrss_feeds.title, ttrss_feeds.id, ttrss_feeds.site_url, ttrss_feeds.feed_url
  ORDER BY last_article;
                                                                                          QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=102028908.87..102028910.88 rows=805 width=87) (actual time=77173.156..77173.327 rows=1178 loops=1)
   Sort Key: (max(ttrss_entries.updated))
   Sort Method: quicksort  Memory: 229kB
   ->  GroupAggregate  (cost=2575439.77..102028870.02 rows=805 width=87) (actual time=15261.880..77171.259 rows=1178 loops=1)
         Group Key: ttrss_feeds.id
         ->  Merge Join  (cost=2575439.77..102019937.07 rows=1784979 width=87) (actual time=14526.267..77103.022 rows=621493 loops=1)
               Merge Cond: (ttrss_feeds.id = ttrss_user_entries.feed_id)
               ->  Index Scan using ttrss_feeds_pkey on ttrss_feeds  (cost=0.28..99753563.08 rows=805 width=79) (actual time=2084.407..61979.092 rows=1178 loops=1)
                     Filter: ((update_interval <> '-1'::integer) AND (owner_uid = 1) AND ((SubPlan 1) < (now() - '24:00:00'::interval)))
                     Rows Removed by Filter: 1518
                     SubPlan 1
                       ->  Aggregate  (cost=36998.14..36998.15 rows=1 width=8) (actual time=25.653..25.653 rows=1 loops=2415)
                             ->  Nested Loop  (cost=61.26..36990.27 rows=3148 width=8) (actual time=0.566..25.410 rows=2400 loops=2415)
                                   ->  Bitmap Heap Scan on ttrss_user_entries ttrss_user_entries_1  (cost=60.83..10894.19 rows=3148 width=4) (actual time=0.551..6.586 rows=2400 loops=2415)
                                         Recheck Cond: (feed_id = ttrss_feeds.id)
                                         Heap Blocks: exact=3474378
                                         ->  Bitmap Index Scan on ttrss_user_entries_feed_id  (cost=0.00..60.04 rows=3148 width=0) (actual time=0.329..0.329 rows=2402 loops=2415)
                                               Index Cond: (feed_id = ttrss_feeds.id)
                                   ->  Index Scan using ttrss_entries_pkey on ttrss_entries ttrss_entries_1  (cost=0.43..8.28 rows=1 width=12) (actual time=0.007..0.008 rows=1 loops=5795417)
                                         Index Cond: (id = ttrss_user_entries_1.ref_id)
               ->  Materialize  (cost=2575439.49..2605329.57 rows=5978015 width=12) (actual time=12432.244..14432.749 rows=5972769 loops=1)
                     ->  Sort  (cost=2575439.49..2590384.53 rows=5978015 width=12) (actual time=12432.240..13866.672 rows=5972769 loops=1)
                           Sort Key: ttrss_user_entries.feed_id
                           Sort Method: external merge  Disk: 128528kB
                           ->  Hash Join  (cost=261396.34..1698246.99 rows=5978015 width=12) (actual time=2106.035..9119.963 rows=5972769 loops=1)
                                 Hash Cond: (ttrss_entries.id = ttrss_user_entries.ref_id)
                                 ->  Seq Scan on ttrss_entries  (cost=0.00..1236443.55 rows=5948255 width=12) (actual time=0.013..3789.900 rows=5972688 loops=1)
                                 ->  Hash  (cost=163319.15..163319.15 rows=5978015 width=8) (actual time=2104.735..2104.735 rows=5972769 loops=1)
                                       Buckets: 131072  Batches: 128  Memory Usage: 2850kB
                                       ->  Seq Scan on ttrss_user_entries  (cost=0.00..163319.15 rows=5978015 width=8) (actual time=0.009..1107.206 rows=5972769 loops=1)
 Planning time: 4.742 ms
 Execution time: 77196.137 ms
(32 rows)