CREATE EXTENSION zhparser;
CREATE TEXT SEARCH CONFIGURATION chinese_zh (PARSER = zhparser);
ALTER TEXT SEARCH CONFIGURATION chinese_zh ADD MAPPING FOR n,v,a,i,e,l,t WITH simple;
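To confirm the parser is actually wired up after this setup, a quick sanity check (my own sketch, assuming the chinese_zh configuration above was created successfully) is to segment a phrase directly:

```sql
-- sanity check: the zhparser-backed config should split the phrase into separate lexemes
SELECT to_tsvector('chinese_zh', '底层支撑');
```

If the output contains separate lexemes rather than the whole phrase as one token, segmentation is working.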
I found a problem when debugging with the extension.
I tried
select title from ttrss_entries where to_tsvector(content) @@ plainto_tsquery('simple', '底层支撑');
directly in postgresql, and nothing was found. This result is as expected, since “底层支撑” does not occur as a single word in the contents.
Then I tried
SET default_text_search_config = 'chinese_zh';
select title from ttrss_entries where to_tsvector(content) @@ plainto_tsquery('底层支撑');
and there are some results. The results are as expected, since ‘底层支撑’ is now split into “底层” and “支撑” by the extension, and there are documents that contain these two words.
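The split can be inspected directly (a small sketch of mine, again assuming the chinese_zh configuration from above):

```sql
-- show how plainto_tsquery segments the phrase under the zhparser config
SELECT plainto_tsquery('chinese_zh', '底层支撑');
```

The result should be a tsquery ANDing the two segmented words, rather than one single-token query.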
However, when I try it from the search dialog on the tt-rss side, nothing is found. I also tried selecting “Chinese_zh” in the list of search languages, but it doesn’t make any difference.
I wonder what the cause of the problem could be. Can I change the default search language setting of tt-rss?
you can try adding some debugging or enabling query logging in postgresql to see what exactly tt-rss generates
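for example, with standard postgresql settings (nothing tt-rss specific; revert afterwards, since logging every statement is noisy):

```sql
-- log every statement so the queries tt-rss generates show up in the server log
ALTER SYSTEM SET log_statement = 'all';
SELECT pg_reload_conf();
```

then run a search from the tt-rss dialog and check the postgresql log file for the generated query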
i’d say it’s either case conversions or tsvector_combined not being filled correctly for this language (you might need to set a per-feed language in the feed editor) AND then run it through the feed debugger with force rehash afterwards so that the index updates
try running searches against tsvector_combined instead of to_tsvector(content), because that’s what tt-rss uses for performance reasons.
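i.e. something like this (a sketch, assuming the chinese_zh config from earlier in the thread):

```sql
-- query the precomputed column the way tt-rss does, instead of re-parsing content
select title from ttrss_entries
  where tsvector_combined @@ plainto_tsquery('chinese_zh', '底层支撑');
```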
yes, the index is updated when articles are processed, so unless you run the feed through the feed debugger it will only apply to articles added afterwards
you can easily rebuild the index through postgresql console btw
update ttrss_entries set tsvector_combined = to_tsvector('chinese_zh', content);
this will update everything for chinese_zh; you’ll need to limit the query to specific feeds if you want to
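a sketch of the per-feed variant (assuming the stock schema, where ttrss_user_entries links entries to feeds via ref_id/feed_id; the feed id here is a placeholder):

```sql
-- reindex only the entries belonging to one feed (42 is a placeholder feed id)
update ttrss_entries set tsvector_combined = to_tsvector('chinese_zh', content)
  where id in (select ref_id from ttrss_user_entries where feed_id = 42);
```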
i suggest going through the feed debugger instead, because there’s a possibility of other minor issues if you try to create a tsvector index from complete articles (tsvectors are length-limited, so you may get errors on very long articles, etc)
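if you do go the SQL route anyway, a hedged workaround for overlong articles is to truncate before indexing (the cutoff below is an arbitrary example of mine, not what tt-rss itself uses):

```sql
-- truncate very long articles so the resulting tsvector stays under postgres' size limits
update ttrss_entries
  set tsvector_combined = to_tsvector('chinese_zh', substring(content for 450000));
```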