Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Expose convertEncoding convenience function to controller #13772

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

christophehenry
Copy link
Contributor

Allow to convert from UTF-8 to whatever encoding the device supports

Allow to convert from UTF-8 to whatever encoding the device supports
@christophehenry christophehenry changed the title Expose convertEncoding convenience function to controller [WIP] Expose convertEncoding convenience function to controller Oct 15, 2024
src/controllers/controller.h Outdated Show resolved Hide resolved
Comment on lines 190 to 191
class Charsets : public QObject {
};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to expose some common charsets as constants to JS, maybe moving convertEncoding to that new object too so that from JS, it would look like: engine.charsets.convertEncoding(engine.charsets.WELL_KNOWN_CHARSETS.LATIN_9), for instance. But I still havent figured out how to do that. I feel like I may be trying to bit off more that I can chew here, provided my C++ skills…

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think making it part of the engine class is unnecessary. you can simply add it to the global object, it requires a little bit of boilerplate but not much. See ControllerScriptEngineBase::initialize().

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh you mean have charsets exposed in the global namespace? Aye. I could do that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To add an enum, simply add it to the Charsets class and make it a Q_ENUM. Seems fairly doable IMO, I'm sure you can figure it out. If not, feel free to ask.

src/controllers/controller.h Outdated Show resolved Hide resolved
@Swiftb0y
Copy link
Member

Swiftb0y commented Oct 15, 2024

I think design-wise, it would make most sense to just return a QByteArray (which is automatically a JS TypedArray), since if you're converting into a different format, you'll likely want to send out the bytes soon after without manipulating them more (at least in terms of characters). Also putting it back into a QString would either convert it back to UTF-16 or create a (likely invalid, UB causing) string.

@christophehenry
Copy link
Contributor Author

it would make most sense to just return a QByteArray

Isn't what I'm doing here?

Also putting it back into a QString

I don't understand… Where am I doing that?

@Swiftb0y
Copy link
Member

I don't understand… Where am I doing that?

whoops, yeah, that assumption was based on the ucnv_fromUChars call that currently has its arguments swapped.

Comment on lines 217 to 233
icu::UnicodeString unicodeString;
UErrorCode errorCode;
UConverter* latin9Converter = ucnv_open(targetCharset.toLocal8Bit().data(), &errorCode);

if (!U_FAILURE(errorCode)) {
ucnv_close(latin9Converter);
return QJSValue::UndefinedValue;
}

char* result = nullptr;
ucnv_fromUChars(latin9Converter,
result,
0,
&value.data()->unicode(),
value.length(),
&errorCode);
ucnv_close(latin9Converter);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Qt has a wrapper for all this. I think it should be:

Suggested change
icu::UnicodeString unicodeString;
UErrorCode errorCode;
UConverter* latin9Converter = ucnv_open(targetCharset.toLocal8Bit().data(), &errorCode);
if (!U_FAILURE(errorCode)) {
ucnv_close(latin9Converter);
return QJSValue::UndefinedValue;
}
char* result = nullptr;
ucnv_fromUChars(latin9Converter,
result,
0,
&value.data()->unicode(),
value.length(),
&errorCode);
ucnv_close(latin9Converter);
QTextCodec* codec = QTextCodec::codecForName(targetCharset);
if (!codec) {
return QJSValue::UndefinedValue;
}
QByteArray result = codec->fromUnicode(value);

(Not tested)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unfortunately this nice API was replaced in Qt6 by QStringConverter, which supports much less codecs. The old API is still available in the "Qt 5 Core Compat module" for Qt6, but I guess this is nothing that should be used in new code.

Copy link
Member

@daschuer daschuer Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm interesting.

In Qt6 there is a common base class for QStringConverterBase for QStringConverter and QTextCodec
https://github.com/qt/qtbase/blob/30d90b4ccad83ab1f23dab7cd72b7e228c299895/src/corelib/text/qstringconverter.cpp#L1772

And it is also using ucnv under the hood:
https://github.com/qt/qtbase/blob/30d90b4ccad83ab1f23dab7cd72b7e228c299895/src/corelib/text/qstringconverter.cpp#L2085

Suggested change
icu::UnicodeString unicodeString;
UErrorCode errorCode;
UConverter* latin9Converter = ucnv_open(targetCharset.toLocal8Bit().data(), &errorCode);
if (!U_FAILURE(errorCode)) {
ucnv_close(latin9Converter);
return QJSValue::UndefinedValue;
}
char* result = nullptr;
ucnv_fromUChars(latin9Converter,
result,
0,
&value.data()->unicode(),
value.length(),
&errorCode);
ucnv_close(latin9Converter);
QStringEncoder encoder = QStringEncoder(targetCharset);
if (!encoder.isValid()) {
return QJSValue::UndefinedValue;
}
QByteArray result = encoder.encode(value);

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@daschuer thank you for pointing this up. If I can't use QTextCodec, there's definitely some code I can copy here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if I'm not mistaken, this should look like on my last commit. This, however doesn't work as expected. From JS side, convertEncoding returns an object that, printed using console.log, display the original string. Not a TypedArray. I can't figure out what's happening. I'm leaving it ofr tonight.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

@Swiftb0y Swiftb0y Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe do some type introspection (instanceof) to confirm if it is an ArrayBuffer.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah you're right. I expected a TypedArray and thus a .forEach(). So I'd need to do new Uint8Array(midi.convertEncoding("ISO-8859-15", "Thing to display")) from JS? Is there a way to directly return a TypedArray?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can technically execute some JS code in C++ that essentially does that and return the resulting QJSValue, but thats quite ugly IMO. I think it should up to the caller how they want to interpret that buffer. If thats too much typing for you, you can always create a utility function in your mapping, but lets not bake that into the API. It will cause API inconsistencies and unnecessary overhead.

// The length parameter is here for backwards compatibility for when scripts
// were required to specify it.
Q_INVOKABLE virtual void send(const QList<int>& data, unsigned int length = 0) {
Q_UNUSED(length);
m_pController->send(data, data.length());
}

// Available charsets should be available here:
// http://www.iana.org/assignments/character-sets/character-sets.xhtml
Q_INVOKABLE QByteArray convertEncoding(const QString& targetCharset, const QString& value) {
Copy link
Contributor Author

@christophehenry christophehenry Oct 18, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I can expose well known charsets as an enum to QJSEngine but C++ enum only accepts ints as value. Is there a way that I can use strings as value or modify the signature of this function to accepts either a QString or an enum?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you can take a QJSValue (which acts as a QVariant) and then switch internally depending on the type of the contained value. I'm not sure if that complexity is worth it though. I would prefer to have an enum exposed. A string would make more sense if the set of encoding is open, but in this case its closed because we're limited by QStringConverter::Encoding.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, since 6.4, the set is opened with QStringConverter. That's why my last commit uses QTextCodec prior to 6.4. Virtually any encoding supported by UCI should be available. See my last comment, I used Latin-9 which is not available in QStringConverter::Encoding.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, there is probably a little more we can do here. Apparently there are more optional codecs available using QStringConverter::availableCodecs() which we might want to support. In this case we should probably mirror the QStringEncoder api a little and have a function with the enum and a function with the string. I wouldn't try to emulate that via the QJSValue overload stuff though because the manual dispatch is annoying. Instead I would create two different Q_INVOKEABLE methods that are implemented in terms of the correct QTextEncoder overload. I'd all add a another Q_VERSION_CHECK, so we can skip the UTF16 to UTF8 conversion for > Qt 6.8.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh no, since 6.4, the set is opened with QStringConverter. That's why my last commit uses QTextCodec prior to 6.4. Virtually any encoding supported by UCI should be available. See my last comment, I used Latin-9 which is not available in QStringConverter::Encoding.

Yeah, you're right. I didn't see that comment when I wrote the last one.

@christophehenry
Copy link
Contributor Author

Ok, so I can confirm this code works.

1000013918_1.gif

Copy link
Member

@daschuer daschuer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some nit picks.

@@ -16,7 +18,7 @@ class Controller : public QObject {
Q_OBJECT
public:
explicit Controller(const QString& deviceName);
~Controller() override; // Subclass should call close() at minimum.
~Controller() override; // Subclass should call close() at minimum.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note, no need for a change: Make sure you editor does not reformat the whole file. That adds only noise to PRs and may conflict unnecessarily with other PRs.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm… This is a pre-commit fix, isn't it?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, but pre-commit only touches the lines in the diff.

@@ -193,13 +196,33 @@ class ControllerJSProxy : public QObject {
: m_pController(m_pController) {
}

QHash<QString, QString> m_cache;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are in the controller class this cache needs a more significant name.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What, is this unused?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah no, sorry. This was left from a previous experimentation. This must be removed.

// http://www.iana.org/assignments/character-sets/character-sets.xhtml
Q_INVOKABLE QByteArray convertEncoding(const QString& targetCharset, const QString& value) {
#if QT_VERSION < QT_VERSION_CHECK(6, 4, 0)
auto* codec = QTextCodec::codecForName(targetCharset.toUtf8());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please prefix pointers with p: pCodec

// The length parameter is here for backwards compatibility for when scripts
// were required to specify it.
Q_INVOKABLE virtual void send(const QList<int>& data, unsigned int length = 0) {
Q_UNUSED(length);
m_pController->send(data, data.length());
}

// Available charsets should be available here:
// http://www.iana.org/assignments/character-sets/character-sets.xhtml
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the supported charsets depend on the Qt build options and the ICU version, we should only allow charsets from a positive list.
The charsets on this positive list can than be probed in the CMake configuration step, to ensure that nobody builds a Mixxx version that don't support this functionality of the mapping API:

# Add a custom command to compile the code
add_custom_command(
    OUTPUT list_codecs
    COMMAND ${CMAKE_CXX_COMPILER} -o list_codecs -xc++ - <<EOF
#include <QStringConverter>
#include <QStringList>
#include <QTextStream>

int main(int argc, char* argv[]) {
    if (argc != 2) {
        return 1; // Invalid number of arguments
    }
    QString charset = argv[1];
    QStringList codecs = QStringConverter::availableCodecs();
    if (codecs.contains(charset)) {
        return 0; // Charset is available
    } else {
        return 1; // Charset is not available
    }
}
EOF
    WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
    COMMENT "Compiling list_codecs"
)

# List of specified charsets to check
set(SPECIFIED_CHARSETS "UTF-8;ISO-8859-1;ISO-8859-15")

# Check for each specified charset
foreach(CHARSET IN LISTS SPECIFIED_CHARSETS)
    execute_process(
        COMMAND ${CMAKE_BINARY_DIR}/list_codecs ${CHARSET}
        WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
        RESULT_VARIABLE RETURN_CODE
    )
    if(${RETURN_CODE} EQUAL 0)
        message(STATUS "Charset ${CHARSET} is available.")
    else()
        message(FATAL_ERROR "Charset ${CHARSET} is not available.")
    endif()
endforeach()

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alternatively we just let the mapping handle it. Not sure whats better.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Meh. I'm not in favor of that. The idea is to allow controller developers to use charsets if available without having to open a PR to make it available. I can, however check against QStringConverter::availableCodecs() and QTextCodec::availableCodecs().

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Every Mixxx installation must support each mapping. The mapping API must be stable accross installations. There is no problem, to add a long list of charsets to the positive list.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure if packagers are very happy about this though. Letting the mapping handle it (disable the corresponding feature) may make more sense. It entirely depends on the list.

@@ -16,7 +18,7 @@ class Controller : public QObject {
Q_OBJECT
public:
explicit Controller(const QString& deviceName);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this functionality should not be in the controller class itself, which contains the protocol specific controller IO functions.
Either add it to the engine API src/controllers/scripting/legacy/controllerscriptinterfacelegacy.h or add an own charSetMapper class like src/controllers/scripting/colormapperjsproxy.h

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it makes sense since this is computing data that it supposed to be send through midi. Isn't it adding complexity to add yet another interface in the global object; in particular an interface with only one function?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The engine class is already there, and used for nearly every function in the mapping API other than raw IO. There is no overhead, just a different existing class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes less sense to me but I won't fight on this and move to the engine object.

@christophehenry
Copy link
Contributor Author

christophehenry commented Oct 18, 2024

Should I expose QStringConverter::availableCodecs()/QTextCodec::availableCodecs() to JS too, so that developers can chek what is available on their environment?

@JoergAtGithub
Copy link
Member

JoergAtGithub commented Oct 18, 2024

Should I expose QStringConverter::availableCodecs()/QTextCodec::availableCodecs() to JS too, so that developers can chek what is available on their environment?

No, the API must guarantee to support the same charsets on all platforms and builds of Mixxx.

@Swiftb0y
Copy link
Member

No, the API must guarantee to support the same charsets on all platforms and builds of Mixxx.

That'll limit us to a somewhat small common subset though. Is that a price we're willing to pay? Do we want to tell a contributor "no we can not accept your mapping because one niche feature doesn't work on one particular platform"?

@JoergAtGithub
Copy link
Member

That'll limit us to a somewhat small common subset though.

I don't think so, if you use a Qt build with the same ICU version, you will get always the same charsets. We just need to ensure, that the configuration of the Mixxx build fails, if someone uses an inproper Qt build.

@christophehenry
Copy link
Contributor Author

christophehenry commented Oct 18, 2024

Plus, it's not as if it poses the risk of a crash. At worst, the function would return undefined and the mapping can propose an alternative behavior. If the charset becomes mainstream, we can always add it to the well-known charsets later.

Comment on lines +310 to +312
* @param {string} targetCharset The charset to encode the string into.
* @param {string} value The string to encode
* @returns {ArrayBuffer | undefined}The converted String as an array of bytes or undefined if an error happened when performing conversion
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe jsdoc/require-param-type and jsdoc/require-returns-type should be disabled since we're docmenting tpyes using JS.

#else
QStringEncoder fromUtf8 = QStringEncoder(targetCharset.toUtf8().data());
if (!fromUtf8.isValid()) {
return nullptr;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait, that doesn't translate into undefined. Am I really forced to use QJSValue as a return type?

ISO_10646_UCS_2

};
Q_ENUM(WellKnownCharsets)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dammit! I can't find how to expose this enum under engine.WellKnownCharsets. Can someone help me?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need to tell the JavaScript-Interpreter (QJSEngine) about it. Something like this:

  void registerEnums(QJSEngine& engine) {
      qRegisterMetaType<mixxx::preferences::constants::WellKnownCharsets>("WellKnownCharsets");

      const QMetaObject& metaObject = mixxx::preferences::constants::staticMetaObject;
      QMetaEnum metaEnum = metaObject.enumerator(metaObject.indexOfEnumerator("WellKnownCharsets"));

      QJSValue enumObject = engine.newObject();
      for (int enumIdx = 0; enumIdx  < metaEnum.keyCount(); ++enumIdx ) {
          enumObject.setProperty(metaEnum.key(enumIdx ), metaEnum.value(i));
      }

      engine.globalObject().setProperty("WellKnownCharsets", enumObject);
  }

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants