src/dict/dictionary: better filter mecab and deconjugator duplicates
Better filter queries that are produced by both the deconjugator and MeCab.

Previously, MeCab results were generally not filtered if they were duplicates of deconjugator results. This led to duplicate search results, some with conjugation information and some without, which is generally undesirable.

This changes the behavior to filter out MeCab results that are duplicates of deconjugator results when compiled with MECAB_SUPPORT.

The filtering algorithm is not trivial. It required adding metadata indicating the source algorithm that each query came from. Without this metadata, it would have been difficult to discriminate MeCab queries from Exact queries.

The algorithm first collects every deconj string that the deconjugator came up with. It then tosses any MeCab queries whose deconj string the deconjugator already found. Unfortunately, this is not behavior that can easily be implemented with std::unique.
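The two passes described in the message can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the names `QuerySource`, `SearchQuery`, and `filterDuplicates` are hypothetical, and the real query type likely carries more fields than the deconj string and source tag assumed here.

```cpp
#include <algorithm>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical source tag; stands in for the metadata the commit adds.
enum class QuerySource { Exact, Deconjugator, MeCab };

// Hypothetical query type; only the fields needed for filtering are shown.
struct SearchQuery {
    std::string deconj;  // deconjugated string to look up
    QuerySource source;  // which algorithm produced this query
};

// Drops MeCab queries whose deconj string the deconjugator also produced.
// Exact and Deconjugator queries are always kept, and order is preserved.
void filterDuplicates(std::vector<SearchQuery> &queries)
{
    // Pass 1: collect every deconj string found by the deconjugator.
    std::unordered_set<std::string> seen;
    for (const SearchQuery &q : queries) {
        if (q.source == QuerySource::Deconjugator) {
            seen.insert(q.deconj);
        }
    }

    // Pass 2: erase MeCab queries that repeat one of those strings.
    queries.erase(
        std::remove_if(
            queries.begin(), queries.end(),
            [&seen](const SearchQuery &q) {
                return q.source == QuerySource::MeCab
                    && seen.count(q.deconj) > 0;
            }),
        queries.end());
}
```

One reason std::unique is a poor fit here is that it only collapses consecutive equivalent elements, so it would need the queries sorted first and gives no direct way to say that the Deconjugator copy is the one to keep.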
Showing 6 changed files with 50 additions and 2 deletions.