Skip to content

Latest commit

 

History

History
41 lines (35 loc) · 1.96 KB

TODO

File metadata and controls

41 lines (35 loc) · 1.96 KB

Use ExtUtils::CppGuess’s C++11 support, rather than doing it ourselves

(Probably needs some fixes in CppGuess for some platforms I tried though)

Fix UTF-8 support

This turns out to be harder than I was thinking. The first step is to compile two versions of the regexp, one for matching UTF-8 and one for matching Latin1 (maybe on demand).

RE2 won’t accept \x{…} escapes that are greater than the current character set. I was hoping it would be possible to give a string containing these to RE2 then let RE2 realise part of it won’t match (e.g. (?:foo|\x{1234}) will still match foo, even if the input string isn’t UTF-8).

(I’m only talking about \x{…}; this is the only case I have to care about, \p{…} are accepted by RE2 regardless. Due to Perl’s behaviour we can’t have raw UTF-8 in the string if the UTF-8 flag isn’t on.)

The approach for now will probably be to replace \x{nnn} in strings (where nnn>0xFF) with something that won’t match (maybe [^\x00-\xff]), but allows the other branches to match.

Think about supporting perl 5.14’s unicode regexp flags

At least at the top level, implementing within RE2 would be silly.

RE2 doesn’t have all the behaviours perl does (i.e. /a is implied for \d, etc.). Might just be a case of documenting what RE2 does, once UTF-8 is working to some extent. An alternative could be to make things explicit (e.g. you need to say “no feature ‘unicode_strings’” if you happen to have enabled them to use RE2).

Support more options

never_nl could be useful for cpangrep optimisations

Support RE2::Set functionality

i.e. a Regexp::RE2::Set class that can have RE2 regexps added into it then a match method.

Improve tests

Improve performance comparisons

See maybe https://github.com/axiak/pyre2/blob/master/tests/performance.py

Support /x (probably needs RE2 changes to do properly)

Both Perl and RE2 store the stringification of the regexp, can we avoid this?