Commit graph

6 commits

Author SHA1 Message Date
David Yip
74141adca1 Move KeywordMute into Glitch namespace.
There are two motivations for this:

1. It looks like we're going to add other features that require
   server-side storage (e.g. user notes).

2. Namespacing glitchsoc modifications is a good idea anyway: even if we
   do not end up doing (1), if upstream introduces a keyword-mute feature
   that also uses a "KeywordMute" model, we can avoid some merge
   conflicts this way and work on the more interesting task of
   choosing which implementation to use.
2017-10-21 14:54:36 -05:00
David Yip
05ee0aeb8b Allow keywords to match either substrings or whole words.
Word-boundary matching only works as intended in English and languages
that use similar word-breaking characters; it doesn't work so well in
(say) Japanese, Chinese, or Thai.  It's unacceptable to have a feature
that doesn't work as intended for some languages.  (Moreso especially
considering that it's likely that the largest contingent on the Mastodon
bit of the fediverse speaks Japanese.)

There are rules specified in Unicode TR29[1] for word-breaking across
all languages supported by Unicode, but the rules deliberately do not
cover all cases.  In fact, TR29 states

    For example, reliable detection of word boundaries in languages such
    as Thai, Lao, Chinese, or Japanese requires the use of dictionary
    lookup, analogous to English hyphenation.

So we aren't going to be able to make word detection work with regexes
within Mastodon (or glitchsoc).  However, for a first pass (even if it's
kind of punting) we can allow the user to choose whether they want word
or substring detection and warn about the limitations of this
implementation in, say, docs.

[1]: https://unicode.org/reports/tr29/
     https://web.archive.org/web/20171001005125/https://unicode.org/reports/tr29/
2017-10-21 14:54:36 -05:00
David Yip
2c11e95efd Fix case-insensitive match scenario; test some word ornamentation. #164. 2017-10-21 14:54:36 -05:00
David Yip
a3ee8592a8 Rework KeywordMute interface to use a matcher object; spec out matcher. #164.
A matcher object that builds a match from KeywordMute data and runs it
over text is, in my view, one of the easier ways to write examples for
this sort of thing.
2017-10-21 14:54:36 -05:00
David Yip
ddcb129101 Spec out KeywordMute interface. #164. 2017-10-21 14:54:21 -05:00
David Yip
c123b710ad Add KeywordMute model.
Gist of the proposed keyword mute implementation:

Keyword mutes are represented server-side as one keyword per record.
For each account, there exists a keyword regex that is generated as one
big alternation of all keywords.  This regex is cached (in Redis, I
guess) so we can quickly get it when filtering in FeedManager.
2017-10-21 14:53:41 -05:00