Archiving/splitting the database


#1

If I would want to archive part of my database to lighten the server load (let’s say archiving pre-2017 data into another database for offline access only). What would the basic principle be for that as to ensure full database integrity and everything still working smoothly?

For the “archive version” would it essentially be to dump all tables and then simply delete ttrss_enrtries based the “update” field and keep all other tables “as is” ? And similarly for the active database, just delete older entries based on the “updated” date and still have a perfectly working system?

Would something like this be possible to “restore/merge” again at one point pending I’ve not touched all the existing relations for the data moved out or will Tiny clean some of that up (making it impossible)?

Setup: TTRSSv17.12 (1ddf3a2), Ubuntu 14.04.5 LTS‬, PHP 7.0.27, PostgreSQL 9.6.6, curl 7.35.0, 2000+ feeds.


#2

the answer you don’t want to hear is that you should stop hoarding useless transient data your hardware literally can’t even handle anymore.

that said, if you just can’t stop hoarding, your best bet is doing something like this: every year you create a new independent database and copy over the schema with relevant data (i.e. your user record, feeds and preferences) while excluding actual feed entries. then you just switch tt-rss between those databases as needed. this way you could also do stuff like dump older backups for archival and restore on your live database server when needed only.

trying to split a relational database with multiple interdependent constraints into separate parts is a fool’s errand.


#3

Thanks for the update by the way, I wish I was not a horder but unfortunately I need to be able to run statistics on several years of data (which can require analyzing the content of articles as well). So it’s valuable data to me.

As you said I likely in my case just need to dump the data in a more simple way to achieve this. I think the DB is about 70 GB and it seems generally fine, that said searching it can become slow (might need to look into Sphinx soon).


Use less traffic & Do not embed images config options
#4


#5

@shabble he-he, all considered though it seems to be running just fine beyond certain badly written queries (which I am good at) and some full text searches if grabbing more than 30 days of data.I don’t know what if any limits there are to PGSQL and what to expect if it does grow to 100GB and beyond (or if it generally if the hardware is good enough that should be just fine).


#6

There can be some issues. One I had was the following:

Another (probably related) was it used to take ages to add new feeds, which sped up again when I’d removed a lot of cruft from the DB.