Searching with non-latin characters in English sources is not working #1223

Open
sananjalka opened this issue Oct 3, 2024 · 4 comments

@sananjalka

Hello.

I am on Kiwix 2.3.1 on Linux Mint 21.2.

My problem is that in the ZIM files of English Wikipedia or English Wiktionary, searching for any word that contains non-Latin characters returns no results. This happens both when searching inside the application and when searching in server mode in a browser.

Searching with non-Latin characters does work, however, in Finnish or Swedish language ZIM files, for example.

@kelson42
Collaborator

kelson42 commented Oct 4, 2024

@sananjalka Can you please share an example: what do you get and what do you expect?

@sananjalka
Author

All right, I started testing this to take screenshots and show examples, and I ran into interesting behaviour. If I, for example, quickly type "Düsseldorf" (where the non-Latin letter is the "ü") and press Enter, I get the following screen:

Image

But if I first type "Düsseldorf" and wait a couple of seconds, I get suggestions in the search bar:

Image

If I then press Enter after getting the suggestions, I land on the Wikipedia page "Düsseldorf".

I did not expect to have to wait after typing before pressing Enter to get results, especially since this does not happen when searching for English-language words with Latin-only letters. Is this a known phenomenon?

@sananjalka
Author

After this I noticed that even if I wait for those suggestions to show up, selecting "Düsseldorf (Fulltext search)" yields no results. But if I type "London", select "London (Fulltext search)" and press Enter, I get a list of pages that contain the word "London" (although the list is not very long, so it probably does not contain all the pages containing that word).
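
For anyone trying to reproduce this in server mode, a minimal Python sketch of the different byte-level forms the same search term can take may help. It is only a diagnostic aid, not a statement about how Kiwix handles queries: the helper name `query_forms` is hypothetical, and whether Unicode normalization or URL encoding plays any role in this bug is an assumption that has not been confirmed here.

```python
import unicodedata
from urllib.parse import quote


def query_forms(term: str) -> dict[str, str]:
    """Return the composed/decomposed and percent-encoded forms of a term.

    Hypothetical helper for illustration; not part of Kiwix.
    """
    nfc = unicodedata.normalize("NFC", term)  # "ü" as one code point
    nfd = unicodedata.normalize("NFD", term)  # "u" + combining diaeresis
    return {
        "nfc": nfc,
        "nfd": nfd,
        # quote() percent-encodes the UTF-8 bytes of non-ASCII characters
        "nfc_percent": quote(nfc),
        "nfd_percent": quote(nfd),
    }


forms = query_forms("Düsseldorf")
for name, value in forms.items():
    print(f"{name}: {value!r} ({len(value)} characters)")
```

The two percent-encoded strings differ (`D%C3%BCsseldorf` versus `Du%CC%88sseldorf`), so comparing them with what the browser actually sends in the address bar could show which form reaches the server.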

@sananjalka
Author

sananjalka commented Oct 5, 2024

Testing in Wiktionary-En:

If I enter the search string "head" and immediately press Enter, I get a full-text search page of articles which contain the word "head", including the article "head".

If I enter the search string "head" and wait a couple of seconds, I get the following recommendations, and if I press Enter after they have shown up, I go directly to the article "Head".
Image

On the other hand, if I enter the search string "pää" and immediately press Enter, I get the following result:
Image

If I enter the search string "pää" and wait a couple of seconds, I get the following recommendations:
Image
First of all, they display the letter "ä" as some kind of code. But even setting that aside, the article "pää" (which exists) is not in that recommendation list at all, even though it would be the most accurate and simplest result. The recommendation list contains idioms that include the word "pää", but not the article for the word "pää" itself.

As with Wikipedia-En, selecting the full-text search option from the recommendations for a word with non-Latin letters yields no results.
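
To help identify what kind of code the "ä" is being shown as, here is a small Python sketch that prints a few common encoded forms of "pää" for comparison with the screenshot. The helper name `encoded_forms` is hypothetical, and which of these forms (if any) matches what Kiwix displays is an assumption to be checked, not something established in this report.

```python
from urllib.parse import quote


def encoded_forms(text: str) -> dict[str, str]:
    """Return a few common ways non-ASCII text ends up looking like 'code'.

    Hypothetical helper for illustration; not part of Kiwix.
    """
    return {
        # Percent-encoding of the UTF-8 bytes, as used in URLs
        "percent_utf8": quote(text),
        # Numeric HTML character references, as seen in unescaped HTML
        "html_numeric": text.encode("ascii", "xmlcharrefreplace").decode("ascii"),
        # UTF-8 bytes misread as Latin-1 (classic mojibake)
        "mojibake_latin1": text.encode("utf-8").decode("latin-1"),
    }


for name, value in encoded_forms("pää").items():
    print(f"{name}: {value}")
```

If the suggestion list shows something like `p&#228;&#228;`, the text is probably being HTML-escaped without being unescaped again; if it shows `p%C3%A4%C3%A4`, it is more likely URL-encoded.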

@kelson42 kelson42 added this to the 2.5.0 milestone Oct 5, 2024
@kelson42 kelson42 self-assigned this Oct 5, 2024