How to help Tiny Tiny RSS development?

Hello.
I’m a DBA (10 years of experience) and I’ve noticed that many queries could be rewritten to run faster and scan fewer rows.
I’d like to know how to contact the devs and maybe contribute my experience :slight_smile:

For example, by rewriting a simple update query from:

explain UPDATE ttrss_user_entries SET unread = false, last_read = NOW() WHERE ref_id IN (SELECT id FROM (SELECT DISTINCT id FROM ttrss_entries, ttrss_user_entries WHERE ref_id = id AND owner_uid = '1' AND unread = true AND feed_id = '31' AND true AND true) as tmp);
+------+--------------------+--------------------+----------------+---------------------------------------+---------+---------+--------------------------------+---------+------------------------------+
| id   | select_type        | table              | type           | possible_keys                         | key     | key_len | ref                            | rows    | Extra                        |
+------+--------------------+--------------------+----------------+---------------------------------------+---------+---------+--------------------------------+---------+------------------------------+
|    1 | PRIMARY            | ttrss_user_entries | index          | NULL                                  | PRIMARY | 4       | NULL                           | 1068183 | Using where                  |
|    2 | DEPENDENT SUBQUERY | <derived3>         | index_subquery | key0                                  | key0    | 4       | func                           | 1       |                              |
|    3 | DERIVED            | ttrss_user_entries | ref            | feed_id,owner_uid,idx_tue_1,idx_tue_2 | feed_id | 5       | const                          | 264     | Using where; Using temporary |
|    3 | DERIVED            | ttrss_entries      | eq_ref         | PRIMARY                               | PRIMARY | 4       | rss2.ttrss_user_entries.ref_id | 1       | Using index                  |
+------+--------------------+--------------------+----------------+---------------------------------------+---------+---------+--------------------------------+---------+------------------------------+
4 rows in set (0.001 sec)

to:

explain UPDATE ttrss_user_entries SET unread = false, last_read = NOW() WHERE ref_id IN ( SELECT te.id FROM ttrss_entries te inner join ttrss_user_entries tue on tue.ref_id = te.id  WHERE tue.owner_uid = '1' AND tue.unread = true AND tue.feed_id = '31');
+------+-------------+--------------------+--------+---------------------------------------+-----------+---------+-----------------+------+------------------------------+
| id   | select_type | table              | type   | possible_keys                         | key       | key_len | ref             | rows | Extra                        |
+------+-------------+--------------------+--------+---------------------------------------+-----------+---------+-----------------+------+------------------------------+
|    1 | PRIMARY     | tue                | ref    | feed_id,owner_uid,idx_tue_1,idx_tue_2 | feed_id   | 5       | const           | 264  | Using where; Start temporary |
|    1 | PRIMARY     | ttrss_user_entries | ref    | idx_tue_1,idx_tue_2                   | idx_tue_1 | 4       | rss2.tue.ref_id | 1    |                              |
|    1 | PRIMARY     | te                 | eq_ref | PRIMARY                               | PRIMARY   | 4       | rss2.tue.ref_id | 1    | Using index; End temporary   |
+------+-------------+--------------------+--------+---------------------------------------+-----------+---------+-----------------+------+------------------------------+
3 rows in set (0.001 sec)

On MariaDB the best query is:

EXPLAIN UPDATE ttrss_user_entries tue
    INNER JOIN ttrss_entries te ON te.id = tue.ref_id
    SET tue.unread = false, tue.last_read = NOW()
    WHERE tue.owner_uid = '1' AND tue.unread = true AND tue.feed_id = '31';
+------+-------------+-------+--------+---------------------------------------+---------+---------+-----------------+------+-------------+
| id   | select_type | table | type   | possible_keys                         | key     | key_len | ref             | rows | Extra       |
+------+-------------+-------+--------+---------------------------------------+---------+---------+-----------------+------+-------------+
|    1 | SIMPLE      | tue   | ref    | feed_id,owner_uid,idx_tue_1,idx_tue_2 | feed_id | 5       | const           | 264  | Using where |
|    1 | SIMPLE      | te    | eq_ref | PRIMARY                               | PRIMARY | 4       | rss2.tue.ref_id | 1    | Using index |
+------+-------------+-------+--------+---------------------------------------+---------+---------+-----------------+------+-------------+
2 rows in set (0.001 sec)

264 rows examined instead of 1,068,183.
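For what it’s worth, PostgreSQL has its own join-update form via UPDATE ... FROM. This is only a sketch with the same conditions copied over from the MariaDB version above, and it is not a drop-in replacement, since MySQL/MariaDB don’t accept this syntax (they use the multi-table UPDATE ... INNER JOIN form instead):

```sql
-- PostgreSQL flavor of the same update (not valid in MySQL/MariaDB):
UPDATE ttrss_user_entries tue
SET unread = false, last_read = NOW()
FROM ttrss_entries te
WHERE te.id = tue.ref_id
  AND tue.owner_uid = '1' AND tue.unread = true AND tue.feed_id = '31';
```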

Nice find.

I’m interested in helping out, too. My professional experience is mostly with Oracle, but I think much of it carries over to other RDBMSs, so I’ll add some thoughts to this thread.

I’ve peeked at the application code a little, and these statements appear as hard-coded SQL rather than generated from an ORM, so it shouldn’t be too hard to change specific queries.

This pattern of SELECT id FROM (SELECT DISTINCT id ... occurs in a few places throughout classes/feeds.php. In these cases it may be enough to simply remove the DISTINCT keyword to reduce the number of rows visited. Logically, DISTINCT doesn’t affect which rows the outer query alters/projects (an IN predicate ignores duplicates anyway), but it forces the inner query to retrieve all of its rows and then sort them to remove duplicates.
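The equivalence is easy to sanity-check outside the app. Here is a small sketch using SQLite via Python with a toy schema that only loosely mirrors ttrss_entries/ttrss_user_entries (column subset and sample data are made up for illustration): it runs the catchup-style update with and without DISTINCT in the subquery and confirms the same rows get marked read.

```python
import sqlite3

def setup():
    """Build an in-memory toy DB: 10 entries, 5 in feed 31, 5 in feed 7."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE ttrss_entries (id INTEGER PRIMARY KEY);
        CREATE TABLE ttrss_user_entries (
            ref_id INTEGER, owner_uid INTEGER, feed_id INTEGER, unread INTEGER
        );
    """)
    db.executemany("INSERT INTO ttrss_entries VALUES (?)",
                   [(i,) for i in range(1, 11)])
    rows = [(i, 1, 31, 1) for i in range(1, 6)] + \
           [(i, 1, 7, 1) for i in range(6, 11)]
    db.executemany("INSERT INTO ttrss_user_entries VALUES (?,?,?,?)", rows)
    return db

def marked_read(db, distinct):
    """Run the catchup update, with or without DISTINCT in the subquery."""
    kw = "DISTINCT " if distinct else ""
    db.execute(f"""
        UPDATE ttrss_user_entries SET unread = 0
        WHERE ref_id IN (SELECT {kw}id
                         FROM ttrss_entries, ttrss_user_entries
                         WHERE ref_id = id AND owner_uid = 1
                           AND unread = 1 AND feed_id = 31)
    """)
    return sorted(r[0] for r in db.execute(
        "SELECT ref_id FROM ttrss_user_entries WHERE unread = 0"))

a = marked_read(setup(), distinct=True)
b = marked_read(setup(), distinct=False)
assert a == b          # identical rows updated either way
print(a)               # [1, 2, 3, 4, 5]
```

Since IN (...) only asks whether a value appears in the subquery result, duplicates in that result are irrelevant, which is why the DISTINCT is pure overhead here.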

While I welcome your enthusiasm, I must say that I’m not interested in any MySQL-specific code or optimizations in ttrss.

tldr:

  • adding a MySQL-specific query where there wasn’t one before - no
  • changing something resulting in pgsql working worse - no
  • changing stuff so pgsql and optionally mysql works better - yes

that’s the gist of it

Yes, we’re talking about just rewriting the existing MySQL queries.
It can dramatically speed up both retrieving data and updating it.
I have a million rows in the entries table after switching to ttrss and importing data from another RSS reader.

If there is a mysql variant in place already, sure making it better is a great idea.

You’d probably want to register on Gogs etc. - see contributing.md

e: I should also ask that commits be kept logically separated, i.e. don’t change 20 queries at once in a single commit.
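In practice that workflow looks something like the following sketch (a scratch repo with placeholder file contents and illustrative commit messages, not the real patches): each query fix lands as its own commit so it can be reviewed and reverted independently.

```shell
#!/bin/sh
# Illustrative only: "one logical change per commit" in a throwaway repo.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email dev@example.com
git config user.name dev
mkdir classes
# First fix, committed on its own:
echo "-- query 1 (placeholder)" > classes/feeds.php
git add classes/feeds.php
git commit -qm "feeds: drop redundant DISTINCT from catchup subquery"
# Second fix, as a separate commit:
echo "-- query 2 (placeholder)" >> classes/feeds.php
git add classes/feeds.php
git commit -qm "feeds: rewrite headline update as a plain join"
git log --oneline   # two separate, independently revertable commits
```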

This change should help execution time regardless of the database used. Here are the relevant plans for postgres, run against a test db with ~4k feed entries loaded.

postgres=# explain analyze UPDATE ttrss_user_entries SET unread = false, last_read = NOW() WHERE ref_id IN (SELECT id FROM (SELECT DISTINCT id FROM ttrss_entries, ttrss_user_entries WHERE ref_id = id AND owner_uid = '1' AND unread = true AND feed_id = '31' AND true AND true) as tmp);
                                                                                                QUERY PLAN                                         
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on ttrss_user_entries  (cost=12.89..20.94 rows=1 width=143) (actual time=25.670..25.670 rows=0 loops=1)
   ->  Nested Loop  (cost=12.89..20.94 rows=1 width=143) (actual time=25.667..25.668 rows=0 loops=1)
         ->  Subquery Scan on tmp  (cost=12.61..12.63 rows=1 width=32) (actual time=25.666..25.666 rows=0 loops=1)
               ->  Unique  (cost=12.61..12.62 rows=1 width=4) (actual time=25.663..25.663 rows=0 loops=1)
                     ->  Sort  (cost=12.61..12.61 rows=1 width=4) (actual time=25.661..25.661 rows=0 loops=1)
                           Sort Key: ttrss_entries.id
                           Sort Method: quicksort  Memory: 25kB
                           ->  Nested Loop  (cost=0.56..12.60 rows=1 width=4) (actual time=25.387..25.387 rows=0 loops=1)
                                 ->  Index Scan using ttrss_user_entries_feed_id on ttrss_user_entries ttrss_user_entries_1  (cost=0.28..8.30 rows=1 width=4) (actual time=25.384..25.384 rows=0 loops=1)
                                       Index Cond: (feed_id = 31)
                                       Filter: (unread AND (owner_uid = 1))
                                 ->  Index Only Scan using ttrss_entries_pkey on ttrss_entries  (cost=0.28..4.30 rows=1 width=4) (never executed)
                                       Index Cond: (id = ttrss_user_entries_1.ref_id)
                                       Heap Fetches: 0
         ->  Index Scan using ttrss_user_entries_ref_id_index on ttrss_user_entries  (cost=0.28..8.30 rows=1 width=106) (never executed)
               Index Cond: (ref_id = tmp.id)
 Planning Time: 6.106 ms
 Execution Time: 26.542 ms
(18 rows)

postgres=# explain analyze UPDATE ttrss_user_entries SET unread = false, last_read = NOW() WHERE ref_id IN (SELECT id FROM ttrss_entries, ttrss_user_entries WHERE ref_id = id AND owner_uid = '1' AND unread = true AND feed_id = '31' AND true AND true);
                                                                                         QUERY PLAN                                                
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Update on ttrss_user_entries  (cost=16.88..17.00 rows=1 width=127) (actual time=0.018..0.018 rows=0 loops=1)
   ->  Nested Loop  (cost=16.88..17.00 rows=1 width=127) (actual time=0.017..0.017 rows=0 loops=1)
         Join Filter: (ttrss_user_entries.ref_id = ttrss_user_entries_1.ref_id)
         ->  HashAggregate  (cost=16.60..16.61 rows=1 width=20) (actual time=0.016..0.016 rows=0 loops=1)
               Group Key: ttrss_entries.id
               ->  Nested Loop  (cost=0.56..16.60 rows=1 width=20) (actual time=0.014..0.015 rows=0 loops=1)
                     ->  Index Scan using ttrss_user_entries_feed_id on ttrss_user_entries ttrss_user_entries_1  (cost=0.28..8.30 rows=1 width=10) (actual time=0.014..0.014 rows=0 loops=1)
                           Index Cond: (feed_id = 31)
                           Filter: (unread AND (owner_uid = 1))
                     ->  Index Scan using ttrss_entries_pkey on ttrss_entries  (cost=0.28..8.30 rows=1 width=10) (never executed)
                           Index Cond: (id = ttrss_user_entries_1.ref_id)
         ->  Index Scan using ttrss_user_entries_ref_id_index on ttrss_user_entries  (cost=0.28..0.36 rows=1 width=106) (never executed)
               Index Cond: (ref_id = ttrss_entries.id)
 Planning Time: 1.351 ms
 Execution Time: 0.160 ms
(15 rows)

Just by removing the unnecessary DISTINCT, the new plan can skip the costly Unique/Sort steps and executes much more quickly. On larger databases this difference would become even more stark.

I appreciate this thread, thanks everyone. However, could someone maybe change its title, since it’s about SQL optimization rather than helping development in general? TIA!