Files uploaded from MacOS with umlauts in names cannot be accessed #72

Open
Tronic opened this issue Nov 11, 2022 · 5 comments

Comments

@Tronic

Tronic commented Nov 11, 2022

MacOS encodes filenames using Unicode normalization form NFD, where ä is encoded as two code points (Latin a followed by a combining diaeresis), whereas Windows and Linux typically use normalization form NFC (a single code point). Apparently DroppyJS converts URLs or something to NFC, making it impossible to access or even delete such files, even from the Mac. The only way to access the file is to ssh into the server and rename/delete it from there. In particular, if one already has broken filenames, these can be repaired by running within the storage folder:

convmv -f UTF-8 -t UTF-8 --nfc --notest -r .
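For illustration, the mismatch between the two forms can be reproduced in Node.js. This is a minimal sketch, not DroppyJS code; the helper name `sameNameIgnoringNormalization` is hypothetical and only serves the example.

```javascript
// "ä" written with escapes so the encoding of this source file does not matter.
const nfc = "\u00e4";   // one code point: LATIN SMALL LETTER A WITH DIAERESIS
const nfd = "a\u0308";  // two code points: "a" + COMBINING DIAERESIS

// Hypothetical helper: compare two names after bringing both to NFC.
function sameNameIgnoringNormalization(a, b) {
  return a.normalize("NFC") === b.normalize("NFC");
}

console.log(nfc === nfd);                              // false: different code point sequences
console.log(sameNameIgnoringNormalization(nfc, nfd));  // true: same text once normalized
console.log(nfc.length, nfd.length);                   // 1 2 (UTF-16 code units)
console.log(Buffer.from(nfc).length, Buffer.from(nfd).length); // 2 3 (UTF-8 bytes)
```

A filesystem that compares names byte for byte treats these as two different files, which is why a lookup with the NFC form does not find a file stored under the NFD form.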

I suggest doing filename.normalize("NFKC") on all uploaded files' names to avoid this problem during uploads, and to prevent incorrectly encoded filenames from being stored on the (Linux server) filesystem. Possibly the opposite, filename.normalize("NFKD"), should be done for downloads on Mac clients, but I am not aware of whether this is needed or if the Mac does it locally anyway. Also, ideally DroppyJS should be able to access such files if they are already stored on the server, but this is less critical than the handling of uploads.

@markhughes
Collaborator

Sorry for the delay in getting to tickets, I'm dealing with some pretty savage health issues. It's on my mind.

@markhughes
Collaborator

I think either:

  1. we add support to configure this value
  2. we add a better support layer for file system types

@Tronic
Author

Tronic commented Dec 12, 2022

I don't think it should be configurable; rather, always convert any uploaded/created filenames to NFC (or NFKC), which should solve this issue with Mac clients and a Windows or Linux server. This doesn't need any OS detection, just always do it.

Possibly it could also convert to NFD (or NFKD) any filenames stored on Mac filesystem (i.e. on a Mac server) and/or convert any filenames sent to Mac clients to one of those forms. I have not tested whether browsers already handle that automatically for Mac clients. If you'd like, I can do the experiments to find you the correct values to use.

Either way, none of these have any use for config options; it just needs to be done in all cases for proper Mac interop (even if there are only Macs involved, because some part of the current implementation uses NFC/NFKC, making files that use the D type encoding inaccessible). For the purposes of this discussion, iPhone/iOS are the same as MacOS, using the D encoding for all Unicode.

@Tronic
Author

Tronic commented Dec 13, 2022

Quick testing results (with NodeJS Express, Formidable and Chrome all on Mac):

  • HTML input text fields are automatically NFC (no conversion needed for filenames or text entered on Web UI)
  • File uploads on HTML forms use NFD (this is what breaks Droppy)
  • Downloads with NFC names are converted by the browser into NFD and stored as such on filesystem
  • NFKC/NFKD is not used anywhere (don't use the K, only C/D).
  • I did no testing on invalid Unicode (8 bit gibberish or WTF-8).
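As a side note on the C versus K forms, the following sketch shows why the compatibility forms are riskier for filenames: NFKC rewrites characters such as ligatures and superscripts, while NFC only composes. The sample names and the helper `changedByCompatibility` are made up for the example.

```javascript
// Hypothetical helper: true if NFKC would alter the name beyond what NFC does.
function changedByCompatibility(name) {
  return name.normalize("NFC") !== name.normalize("NFKC");
}

const samples = [
  "a\u0308.txt",     // NFD "ä": NFC and NFKC both compose it to U+00E4
  "\ufb01le.txt",    // LATIN SMALL LIGATURE FI: NFKC turns it into "fi"
  "x\u00b2.txt",     // SUPERSCRIPT TWO: NFKC turns it into "2"
];

for (const name of samples) {
  console.log(
    JSON.stringify(name.normalize("NFC")),
    JSON.stringify(name.normalize("NFKC")),
    changedByCompatibility(name)
  );
}
```

With NFKC, two visually distinct names such as "x²" and "x2" would collapse into one, so plain NFC is the less surprising choice here.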

TLDR: filename = uploaded_filename.normalize('NFC') will fix this for you and it doesn't affect any non-Apple clients. If you use a middleware that directly stores the file on disk under the incorrect name (encoded as NFD), you may need to rename it or patch the middleware instead.
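A rough sketch of both halves of that advice in Node.js follows. Neither function exists in DroppyJS; `toNfcName` and `renameToNfc` are hypothetical names, and the rename pass assumes a filesystem that compares names byte for byte (a typical Linux server), handles a single directory only, and skips any entry whose NFC name is already taken.

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Hypothetical helper: normalize a client-supplied filename before storing it.
function toNfcName(uploadedFilename) {
  return uploadedFilename.normalize("NFC");
}

// Hypothetical repair pass for files a middleware already wrote under NFD names.
// Assumes a normalization-sensitive filesystem; not recursive.
function renameToNfc(dir) {
  const renamed = [];
  for (const name of fs.readdirSync(dir)) {
    const fixed = name.normalize("NFC");
    if (fixed === name) continue;            // already NFC, nothing to do
    const target = path.join(dir, fixed);
    if (fs.existsSync(target)) continue;     // never overwrite an existing file
    fs.renameSync(path.join(dir, name), target);
    renamed.push(fixed);
  }
  return renamed; // the new names of the entries that were renamed
}
```

Calling `toNfcName` on the name reported by the upload library, before building the destination path, avoids the need for the repair pass on new uploads.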

@markhughes
Collaborator

Thanks for that!
