[RFC007] Bytecode interpreter #2045
Some parts might need refinement, but I think it's in good shape for a first round of reviews.
#### AST

The first one is an AST and would more or less correspond to the current unique
representation, minus runtime-specific constructors. We could have gone closer
I think we can also get rid of some efficiency-oriented duplication in the current representation, like the distinction between `LetPattern`/`Let` and `RecRecord`/`Record`. Getting rid of these would be convenient for both the LSP and the typechecker, I think.
Agreed. In fact I started to draft the first representation and did get rid of the `Let` and `Fun` to keep only the pattern variants.
record and the empty array, for example with `enum Record { Empty,
NonEmpty(RecordData) }`. This should use the same space as `RecordData` in Rust
(if `RecordData` is a pointer, at least) and save an allocation for empty
structures.
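
A minimal Rust sketch of the size claim in this excerpt, assuming `RecordData` is a thin owning pointer (modelled here as a `Box`). `RecordDataInner` and its `fields` member are hypothetical stand-ins, not Nickel's actual types, and the layout relies on rustc's niche optimization, which is observed behaviour for this kind of enum rather than a documented guarantee.

```rust
use std::mem::size_of;

/// Hypothetical payload of a non-empty record (illustration only).
pub struct RecordDataInner {
    pub fields: Vec<(String, i64)>,
}

/// Assumption: `RecordData` is a thin, non-null owning pointer.
pub type RecordData = Box<RecordDataInner>;

/// The empty case is a plain variant, so building it allocates nothing.
pub enum Record {
    Empty,
    NonEmpty(RecordData),
}

/// True when `Empty` is encoded in the pointer's unused null value,
/// i.e. the enum takes no more space than the pointer alone.
pub fn record_is_pointer_sized() -> bool {
    size_of::<Record>() == size_of::<RecordData>()
}

/// Number of fields; the empty variant is answered without a dereference.
pub fn field_count(record: &Record) -> usize {
    match record {
        Record::Empty => 0,
        Record::NonEmpty(data) => data.fields.len(),
    }
}
```

If `RecordData` were a multi-word struct without a spare niche, the enum would likely need an extra discriminant word, which is why the parenthetical about it being a pointer matters.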
I'm a bit confused about what discriminants are present in the "top-level" representation (I'm not sure what's the right term, but I'm talking about the word-sized thing that packs in a pointer with some discriminant). Above, it sounded like we were only going to inline `null` and `boolean` in the discriminant; here, you're also proposing to put empty records and empty arrays? It seems like there isn't enough room in the pointer alignment for all of these.
By the way, x86_64, aarch64, and riscv-64 all max out at 48 bits of address space. So on these architectures we can pack lots more stuff at the most-significant end of the top-level representation.
> By the way, x86_64, aarch64, and riscv-64 all max out at 48 bits of address space. So on these architectures we can pack lots more stuff at the most-significant end of the top-level representation.

Yes, but it's less portable. Although we don't need Nickel to run on embedded, I think we can do with one bit for now, and explore those other possibilities later.
> I'm a bit confused about what discriminants are present in the "top-level" representation (I'm not sure what's the right term, but I'm talking about the word-sized thing that packs in a pointer with some discriminant). Above, it sounded like we were only going to inline null and boolean in the discriminant; here, you're also proposing to put empty records and empty arrays? It seems like there isn't enough room in the pointer alignment for all of these.

Your first understanding is right. At the top level, there is only one discriminant for `bool`, `null` and `pointer`. If we follow the pointer, we find many representations: arrays, numbers, etc. Here I'm talking about the representation of the pointee, which can itself be a pointer to something else (typically I guess your immutable vec representation would be mostly a pointer to the root plus some parameters). Somehow the 1-word representation can hold any data, and I'm talking about this specific data when the pointee represents a record. Does that make sense?
Also, as we need a discriminant at the beginning of the pointee (is it an Array? a Record? etc.) that will need to be aligned (although it might be merged with some other metadata as well), we'll probably have some more space here, and can even special-case `EmptyArray` and `EmptyRecord` as special discriminants, instead of bothering to make `Record` actually an enum.
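
To make the alignment argument concrete, here is a rough Rust sketch of a pointee that starts with a one-byte discriminant. `Kind`, `Block` and `header_slack` are hypothetical names for illustration, and the `#[repr(C)]` layout is an assumption chosen so that the field order is predictable; nothing here is taken from the RFC itself.

```rust
use std::mem::size_of;

/// Hypothetical pointee discriminant. The empty cases are ordinary values
/// of the tag, so `Record` itself does not need to be an enum.
#[repr(u8)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Kind {
    Number,
    Array,
    Record,
    EmptyArray,
    EmptyRecord,
}

/// A pointee: the discriminant first, then the payload at its natural alignment.
#[repr(C)]
pub struct Block<T> {
    pub kind: Kind,
    pub payload: T,
}

/// Padding bytes between the one-byte discriminant and the aligned payload;
/// this is the space that could hold extra metadata.
pub fn header_slack<T>() -> usize {
    size_of::<Block<T>>() - size_of::<Kind>() - size_of::<T>()
}

/// The empty variants carry no payload at all.
pub fn is_empty(kind: Kind) -> bool {
    matches!(kind, Kind::EmptyArray | Kind::EmptyRecord)
}
```

With a word-aligned payload, `header_slack` comes out as the payload alignment minus one byte, so five discriminant values fit with plenty of room to spare.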
Even on 32-bit, we can have lots more values at the top level, right? Assuming our pointers are all 4-byte aligned, anything ending in 01, 10, or 11 is not a pointer. But then for each of those non-pointer values we still have 30 bits left to store actual data. So couldn't we have variants for `EmptyArray` and `EmptyRecord` without even following that single top-level pointer? And we'd still have room for small integers...
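
The encoding described in this comment could look roughly like the following Rust sketch. The tag assignment (`00` pointer, `01` constants, `10` small integers, `11` reserved) and every name in it are assumptions made for illustration, not the RFC's design; addresses are kept as plain `usize` values so the example needs no `unsafe`.

```rust
/// One machine word: an aligned address (low bits 00) or an immediate value.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Word(usize);

const TAG_MASK: usize = 0b11;
const TAG_PTR: usize = 0b00; // 4-byte-aligned addresses end in 00
const TAG_CONST: usize = 0b01; // null, booleans, empty array, empty record
const TAG_SMALL_INT: usize = 0b10; // integers that fit in word size minus 2 bits

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Decoded {
    Pointer(usize),
    Null,
    Bool(bool),
    EmptyArray,
    EmptyRecord,
    SmallInt(isize),
}

impl Word {
    /// Accepts only addresses whose two low bits are free.
    pub fn from_addr(addr: usize) -> Option<Word> {
        if addr & TAG_MASK == TAG_PTR { Some(Word(addr)) } else { None }
    }
    pub fn null() -> Word { Word(TAG_CONST) }
    pub fn boolean(b: bool) -> Word { Word(((1 + b as usize) << 2) | TAG_CONST) }
    pub fn empty_array() -> Word { Word((3 << 2) | TAG_CONST) }
    pub fn empty_record() -> Word { Word((4 << 2) | TAG_CONST) }
    /// The two most significant bits of `n` are dropped, so `n` must fit
    /// in the word size minus two bits.
    pub fn small_int(n: isize) -> Word { Word(((n as usize) << 2) | TAG_SMALL_INT) }

    /// Returns `None` for bit patterns this sketch leaves unassigned.
    pub fn decode(self) -> Option<Decoded> {
        match self.0 & TAG_MASK {
            TAG_PTR => Some(Decoded::Pointer(self.0)),
            TAG_CONST => match self.0 >> 2 {
                0 => Some(Decoded::Null),
                1 => Some(Decoded::Bool(false)),
                2 => Some(Decoded::Bool(true)),
                3 => Some(Decoded::EmptyArray),
                4 => Some(Decoded::EmptyRecord),
                _ => None,
            },
            // The arithmetic shift on `isize` restores the sign.
            TAG_SMALL_INT => Some(Decoded::SmallInt((self.0 as isize) >> 2)),
            _ => None, // tag 11 is reserved
        }
    }
}
```

Under this layout, `EmptyArray` and `EmptyRecord` are recognised from the word alone, with no pointer to follow, and the `10` tag leaves 30 bits (on 32-bit targets) for small integers.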
Sure, we have a lot more room. It's just that you can't encode anything that needs a full machine word or more: typically OCaml has to use 31-bit and 63-bit integers so that it can unbox them, and most other non-trivial data structures need at least a full word too. But indeed, special values like empty stuff could in theory also be put directly at that top level.
Ok, makes sense. I was just confused about what discriminants were where. Our current `Vector` has a `root: Option<Rc<Node>>`, so it's already avoiding allocation for empty arrays.
Although the name is a bit pompous, the goal of this RFC is mostly to be a working document for designing a more compact and efficient run-time representation for Nickel expressions.
While this is something that won't be user-facing (at least in a direct way), and thus can be changed later without breaking backward-compatibility, I think the technical scope of this effort is such that I find it better to discuss it formally here before going for a first implementation.