Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

IoT Platform Design Doc: Double Trouble

DZone's Guide to

IoT Platform Design Doc: Double Trouble

See how you can get your interpreter to encode values in such a way that the value type is accessible at runtime alongside the value itself.

· IoT Zone
Free Resource

Discover why Bluetooth mesh is the next evolution of IoT solutions. Download the mesh overview.

An interpreter for a dynamically typed language needs to encode values in such a way that the value type is accessible at runtime alongside the value itself. There are a few tricks to encode the type compactly. Here, we discuss a quite common and ingenious trick, shared with some well-known JS virtual machines.

A bit of context: V7, our embedded JavaScript virtual machine has been designed from the ground up to be small and simple. Introducing something that sounded advanced and complex such as this beautiful NaN-boxing trick needed some analysis in order to convince our CTO. We share this analysis in hope that you might find it useful.

Follow the conversation of our engineering team along and remember, this is an insight into their thinking rather than product documentation for Mongoose IoT Platform.

Double trouble

Objective

Choose compact and practical encoding for JS values.

The solution should degrade cleanly with the different pointer sizes and be endianness neutral.

Background

Currently, V7 defines a value as tagged union containing a bunch of primitive types, among which there is a pointer to object.

union v7_valholder {
    int boolean;
    v7_num_t number;
    struct v7_str string;
    struct v7_object *object;
    v7_func2_t c_function;
};struct v7_value {
    union v7_valholder value;
    enum v7_type type;
};

Every object property (including variables and arguments in the activation frames) will contain a whole value structure + padding for alignment, which on x86_64, for example, will be 16 bytes.

On smaller architectures, the alignment overhead will be smaller. But, in the best of the cases it will still be at least one byte longer than the JS number type (double).

Furthermore, most compilers will represent the enum as an int unless explicitly instructed to pack it (using compiler specific attributes or pragmas), bringing the realistic size for a value from 10 to 12 bytes.

High-level language implementations often employ a technique called tagged pointers, where pointers and other data coexist in the same word. This works by exploiting the fact that the runtime can ensure that the pointers always point to word-aligned structures, leaving the least significant bits always set to zero. The SPARC instruction set even has special instructions to deal with integer arithmetics that ignores the least significant bit in order to help implementers of dynamic languages.

Overview

Mixing 31 (or 63) bit integers and pointers is a well known and cheap trick, but in JS the numeric type is a 8 bytes double, whose internal format is complex and trimming the mantissa and exponent length might lead to precision issues making us diverge from the standard.

The IEEE 754 double precision floating-point standard allows to build a NaN value which carries from 51 to 53 bits of payload (depending on how you stretch them), which can be used to carry a tag and a pointer.

(Snippet from StackExchange):

When you are implementing a dynamically typed language, you've got to have a single type which can hold any of your objects. There are three different approaches I'm aware of for this:

Firstly, you can pass around pointers. This is what the CPython implementation does. Every object is a PyObject pointer. These pointers get passed around and operations are performed by looking at details in the PyObject struct to figure out the type.

The disadvantage is that small values like numbers get stored as boxed values, So your little 5 get stored as a block of memory somewhere. So this leads us to the union approach, which is used by Lua. Instead of a PyObject*, each value is a struct which one field to specify the type, and then a union of all the different supported types. That way we avoid allocating any memory for small values, instead storing them directly in the union.

The NaN approach stores everything as doubles, and reuses the unused portion of NaN for the extra storage. The advantage over the union method is that we save the type field. If it's a valid double, it's a double otherwise the mantissa is a pointer to the actual object.

Remember, this is every javascript object. Every variable, every value in an object, every expression. If we can reduce all of those from 96 bits to 64 bits that is pretty impressive.

Is it worth the hack? Recall that there is a lot of demand for efficient Javascript. Javascript is the bottleneck in many web applications, and so making it faster is a higher priority. Its reasonable to introduce a certain degree of hackiness for performance reasons. For most cases, it'd be a bad idea, because it’s introducing a degree of complexity for little gain. But in this specific case, it is worthwhile for memory and speed improvements.

This trick is called nunboxing, and it is employed by Spidermonkey, while V8 allocates doubles on the heap and relies on other optimizations to get rid of boxing.

In current 64 bit architectures, only 48 bits are effectively used for virtual addresses. Those 48-bit words are sign extended to 64 bit. This means that the range of canonical 64-bit addresses is:

0xFFFFFFFFFFFFFFFF
0xFFFF800000000000

0x00007FFFFFFFFFFF
0x0000000000000000


Only 48 bits are required to represent these two ranges, as the bits from 48 to 64 are copies of the 47th bit. This trick is called punboxing (for packed nunboxing), and it has been introduced in the 64-bit version of Spidermonkey.

This document describes an adaptation of this technique for V7. The goal of the document is to check whether this technique provides enough space to encode all primitive value types we need, leaving some details to be defined elsewhere.

Apologetics

(p|n)unboxing is very simple to implement and removes the ghost of primitive values boxing by paying a constant and predictable overhead (8 bytes per value). Our main goal is not performance, but if we box every operation on doubles we might end up easily having performance regressions also due to GC-pressure. See the Alternatives section.

Those values will be mainly held in the object properties, especially since initially the activation frames will be implemented as plain JS object. This opens the possibility of merging the property attributes into the free bits left in the value. Thus, producing a similar effective size of the property record between 32-bit tagged pointers (+ 32 bit next pointer + attributes + padding) and 64-bit value with embedded attributes. The price would be higher for 16-bit architectures though. See the Future Work section.

Detailed Design

Layout

Design_Doc_Double_Trouble.png

The two infinites are encoded with all exponent bits set and fraction/mantissa bits cleared:

+Inf: 0xFFF0000000000000

-Inf: 0x7FF0000000000000


A NaN is defined as all exponent bits set and a non-zero fraction part. The sign bit has no meaning.

There are two flavors of NaN: garlic^Wsignaling NaN and quiet NaN.
Signaling NaN should raise a an invalid operation exception if encountered during computation, if the feature is enabled in the FPU control word, while quiet NaNs silently propagate.

Signaling NaNs have the 51st bit cleared and at least one other mantissa bit set.
Quiet NaNs have the 51st bit set and other fraction bits might as well be all zeros for it won’t be confused with an infinite.

We are technically free to use all 53 bits (52 bits of the mantissa + one sign bit) as long as we leave one value which encodes our JS NaN (for example we can choose the negative quiet NaN with all zeros payload), and 0x7FF0000000000000/0xFFF0000000000000 which encode the two infinity values.

Tags

We have 5 bits to use as tags, without counting the LSBs of the 48 bit pointer. 5 bits should be enough for now, so in order to make it simpler for us to adapt to various pointer sizes and platforms, we don’t rely on any specific alignment amount and store the full (up to) 48 bit pointer.

Design_Doc_Double_Trouble_2.png

For simplicity, let’s include the 11 bits of the exponent in the tag constants; they always have to be set to 1. If they are not set, it means that it’s a double. Hence the tag is a 16-bit quantity shifted 48 bits to the left:

  • 0xFFFF << 48: Object pointer
  • 0xFFFE << 48: Foreign pointer (C data, not followed by the GC)
  • 0xFFFD << 48: JS undefined
  • 0xFFFC << 48: Boolean
  • 0xFFFB << 48: Interned string
  • 0xFFFA << 48: Cord
  • 0xFFF9 << 48: String (aka. dependent string)
  • 0xFFF8 << 48: JS NaN (51th bit set means quiet NaN)
  • 0xFFF7 << 48: Foreign string
  • 0xFFF0 << 48: Reserved (with 0 payload it would be -Inf)
  • 0xFFFE--7FF1: Reserved
  • 0x7FF0 << 48: Reserved (with 0 payload it would be +Inf)

Null is encoded as a null foreign pointer. (0xFFFE000000000000), i.e. v7_mk_foreign(NULL) == v7_mk_null

Pointers

Stuffing

In order to convert a canonical 64 bit pointer into a 48 bit value just strip the most significant 16 bits and apply the object (or foreign pointer) tag:

s = x & ((1L << 48) -1) | 0xFFFF << 48

Unstuffing

In order to recover the 64 bit pointer from the 48 bit value, we just need to sign extend it:

struct {uint64_t s:48;} h;

x = (h.s = s);

Or in the extremely rare case (now that even cc65 supports them) that the compiler doesn’t like bit fields:

const uint64_t m = 1L << (48 - 1);

x = (s ^ m) - m;

Foreign pointers

(A.k.a “opaque pointer sized value”)

Sometimes we’ll need to pass some value from C to JS and back to C. We could encode it in a n byte string and pass it like that, or just have native support for foreign pointers. In that case the Garbage Collector will need to know that it’s not a JS object.

Strings

General V7 strings are not necessarily null terminated, hence, we need to express a string with a base + length pair. The rationale is:

  • JS strings can contain null bytes
  • We want to let the user expose raw byte ranges as JS strings

We cannot fit a string base + length

Interned Strings

Interned strings such as string literals are created once and can be compared by pointer comparison. Interned string values contain a pointer to a memory area which contains the string object, the format of which is to be defined in another document.

Inlined Strings

We can encode up to 6 8-bit characters, or 10 5-bit (e.g. only lowercase chars). 6 chars are enough to encode the very common property name “length”.

The inlined string will be null terminated or implicitly terminated by reaching the maximum length that fits (e.g. 6). Captain Obvious suggested to me that short strings containing a null byte would be treated as non short strings.

Just as interned strings, inlined strings can be easily and quickly compared by just comparing the value word.

Spidermonkey doesn’t inline strings in the value, but instead inlines the string data in the string object header and pre allocates a bunch of interned strings, such as “”, all strings up to 3 letters and the first 256 stringified integers. We cannot waste those 32-64k of precious memory on that.

Cords

To be discussed in another document.

Dependent strings

(i.e. pointers to parts of other strings). To be discussed in another document.

Code

The v7_value type could either be a double or a 64 bit integer value. However, the choice affects both the usability from C code (comparing two punboxed values yields always false, since they are technically NaNs) and an object pointer can be extracted without having to move it from a floating point register to a general purpose register. We expect object accesses to dominate the execution of our use cases.

However, when we are testing for is_double(), we convert the number to a double and use the built-in isnan(), which generates smaller assembly code by leveraging the byzantine NaN comparison.

The alternative would be to test for something like: (x & (0x7FF << 48)) - 0x7FF > 0

It’s more compact to let the hardware do it.

typedef uint64_t v7_value;#define V7_TAG_OBJECT    ((uint64_t) 0xFFFF << 48)
#define V7_TAG_FOREIGN   ((uint64_t) 0xFFFE << 48)
#define V7_TAG_UNDEFINED ((uint64_t) 0xFFFD << 48)
#define V7_TAG_BOOLEAN   ((uint64_t) 0xFFFC << 48)
#define V7_TAG_STRING    ((uint64_t) 0xFFF9 << 48)
#define V7_TAG_NAN       ((uint64_t) 0xFFF8 << 48)
#define V7_TAG_MASK      ((uint64_t) 0xFFFF << 48)#define V7_NULL V7_TAG_FOREIGN
#define V7_UNDEFINED V7_TAG_UNDEFINED#define V7_SET_NAN(v)                           \
do {                                          \
    * (uint64_t *) &v = V7_TAG_NAN;             \
  } while(0)                                    \double v7_value_to_double(v7_value v);int v7_is_double(v7_value v) {
    v7_value nan;V7_SET_NAN(nan);
    return !(isnan(v7_value_to_double(v)) && v != nan);
}int v7_is_object(v7_value v) {
    return (v & V7_TAG_MASK) == V7_TAG_OBJECT;
}int v7_is_boolean(v7_value v) {
return (v & V7_TAG_MASK) == V7_TAG_BOOLEAN;
}int v7_is_null(v7_value v) {
    return v == V7_NULL;
}int v7_is_undefined(v7_value v) {
    return v == V7_UNDEFINED;
}static v7_value v7_pointer_to_value(void *p) {
    return ((uint64_t) p & ((1L << 48) -1));
}static void *v7_value_to_pointer(v7_value v) {
    struct {uint64_t s:48;} h;
    return (void *) (h.s = v);
}
v7_value v7_object_to_value(struct v7_object *o) {
    return v7_pointer_to_value(o) | V7_TAG_OBJECT;
}struct v7_object *v7_value_to_object(v7_value v) {
    return (struct v7_object *) v7_value_to_pointer(v);
}v7_value v7_foreign_to_value(struct v7_object *o) {
    return v7_pointer_to_value(o) | V7_TAG_FOREIGN;
}void *v7_value_to_foreign(v7_value v) {
    return v7_value_to_pointer(v);
}v7_value v7_string_to_value(struct v7_str *s) {
    return v7_pointer_to_value(s) | V7_TAG_STRING;
}struct v7_str *v7_value_to_string(v7_value v) {
    return (struct v7_str *) v7_value_to_pointer(v);
}v7_value v7_double_to_value(double v) {
    v7_value res;
/* not every NaN is a JS NaN */
if (isnan(v)) {
    V7_SET_NAN(res);
} else {
    * (double *) &res = v;
    }
    return res;
}double v7_value_to_double(v7_value v) {
    return * (double *) &v;
}v7_value v7_boolean_to_value(int v) {
    return (!!v) | V7_TAG_BOOLEAN;
}int v7_value_to_boolean(v7_value v) {
    return v & 1;
}

Demo exercising the code is at: https://gist.github.com/mmikulicic/6dd7d1e728d436b993b3

Caveats

All values and pointers will be 8 bytes wide, even on 32-bit architectures. If the payload doesn’t involve much floating point math, we’d be better off by boxing double values and have simple bit tagged pointers on smaller architectures.

Alternative

A simple alternative is to have values big enough to hold a native pointer, align all objects to 8 bytes and use the 3 LSBs of the pointers to hold:

  • 0: object pointer
  • 1: foreign pointer
  • 2: boolean
  • 4: small integer
  • 5: pointer to boxed double
  • 6: string
  • 7: ??
  • 8: ??

Bloat Analysis

The code necessary to encode, decode and test for value type is quite small and self contained. By having a lightweight value, we reduce the temptation of over complicating our VM internals just to avoid unnecessary copies of the heavy value struct.

Work Estimates

Two days for basic values glossing over advanced packed value formats such as inlined strings.

Future Work

By exploiting the unused tag bits and unused address LSBs, it’s possible to merge the property attributes into the value. Those bits belong logically to the property not the value. Thus, they’ll have to be stripped and set when the value gets read or stored into a property.

Another technique would be to have two kind of property descriptors: one for all the properties which have the default attributes, and another bigger one for properties which have some attribute set. The kind of property can be embedded in a tag in the “next” pointer. To be described in another document, just dumping a stub here.

References

Easy to read overview of the NaN boxing technique: http://nikic.github.io/2012/02/02/Pointer-magic-for-efficient-dynamic-value-representations.html

More links to Mozilla’s documents citing the nunboxing and punboxing:

http://evilpie.github.io/sayrer-fatval-backup/cache.aspx.htm

https://developer.mozilla.org/en-US/docs/Mozilla/Projects/SpiderMonkey/Internals

Mozilla string internals overview:

http://blog.cdleary.com/2012/01/string-representation-in-spidermonkey/

Notes from the Future

Implemented in V7.

We decided to define val_t as uint64_t instead of double.

    

Take a deep dive into Bluetooth mesh. Read the tech overview and discover new IoT innovations.

Topics:
interpreter ,iot ,javascript ,dynamic languages

Published at DZone with permission of Marko Mikulicic, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}