The Value of JSON Values
A deep dive into JSON.
This article introduces json-values, a purely functional Java library to work with JSON. In this first article, we'll see some expressive recursive data structures that model essential concepts in software development: data validation, data generation, and parsing.
The first and most important virtue of json-values is that JSON objects are immutable and implemented with persistent data structures, better known in FP jargon as values. As Pat Helland said, immutability changes everything.
It's a fact that, when possible, working with values leads to code that has fewer bugs, is more readable, and is easier to maintain. Item 17 of Effective Java states that we must minimize mutability. Still, this sometimes comes at the cost of performance, because the copy-on-write approach is very inefficient for large data structures. Here is where persistent data structures come into play.
Most functional languages, like Haskell, Clojure, and Scala, implement persistent data structures natively. Java doesn't. The best alternative I've found in the JVM ecosystem is the persistent collections provided by the library vavr. It has a well-designed API and good performance.
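To make the structural-sharing idea concrete, here is a minimal persistent list in plain Java. This is a sketch for illustration only, not vavr's or json-values' implementation: prepending creates a new head node and shares the existing list as its tail, so nothing is copied.

```java
public class PersistentListDemo {

    // A minimal persistent (immutable) singly-linked list. prepend is O(1):
    // the new list shares the old one as its tail instead of copying it.
    static final class PList<T> {
        static final PList<?> EMPTY = new PList<>(null, null);
        final T head;        // null only in the empty list
        final PList<T> tail;

        private PList(T head, PList<T> tail) {
            this.head = head;
            this.tail = tail;
        }

        @SuppressWarnings("unchecked")
        static <T> PList<T> empty() {
            return (PList<T>) EMPTY;
        }

        PList<T> prepend(T value) {
            return new PList<>(value, this);
        }

        int size() {
            return this == EMPTY ? 0 : 1 + tail.size();
        }
    }

    public static void main(String[] args) {
        PList<String> a = PList.<String>empty().prepend("Scala").prepend("Java");
        PList<String> b = a.prepend("Clojure");  // a is untouched
        System.out.println(a.size());            // 2
        System.out.println(b.size());            // 3
        System.out.println(b.tail == a);         // true: structural sharing
    }
}
```

Production-grade persistent collections (vavr's, or the tries backing json-values) use wide branching nodes rather than a linked list, but the sharing principle is the same.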
The standard Java programmer finds it strange to work without objects and all the machinery of frameworks and annotations. FP is all about functions and values; that's it. I will try to cast some light on how we can manipulate JSON with json-values following a purely functional approach. First things first, let's create a JSON object:
JsObj.of("name", JsStr.of("Rafael"),
         "surname", JsStr.of("Merino García"),
         "languages", JsArray.of("Java", "Clojure", "Scala"),
         "age", JsInt.of(37),
         "address", JsObj.of("street", JsStr.of("Elm Street"),
                             "number", JsInt.of(12),
                             "city", JsStr.of("Madrid"),
                             "coordinates", JsArray.of(45.9, 18.6)
                            )
        );
As you can see, its definition is like raw JSON. It’s a recursive data structure. You can nest as many JSON objects as you want. Think of any imaginable JSON, and you can write it in no time.
But what about validating JSON? We can define the JSON schema following precisely the same approach:
JsObjSpec.strict("name", str,
                 "surname", str,
                 "languages", arrayOfStr,
                 "age", integer,
                 "address", JsObjSpec.lenient("street", str,
                                              "number", any,
                                              "city", str,
                                              "coordinates", tuple(decimal,
                                                                   decimal
                                                                  )
                                             )
                );
I'd argue that it is very expressive, concise, and straightforward. I call it json-spec, named after the Clojure library spec. Writing specs feels like writing JSON. Strict specs don't allow keys that are not specified, whereas lenient ones do. The real power is that you can create specs from predicates and compose them:
BiFunction<Double, Double, Predicate<BigDecimal>> range =
    (min, max) -> dec -> dec.doubleValue() <= max && dec.doubleValue() >= min;

Predicate<BigDecimal> latitudeRange = range.apply(-90.0, 90.0);
Predicate<BigDecimal> longitudeRange = range.apply(-180.0, 180.0);

JsObjSpec addressSpec =
    JsObjSpec.lenient("street", str.nullable().optional(),
                      "number", any(v -> v.isStr() || v.isInt()),
                      "city", str.optional(),
                      "coordinates", tuple(decimal(latitudeRange),
                                           decimal(longitudeRange)
                                          )
                     );

JsObjSpec personSpec =
    JsObjSpec.strict("name", str(s -> s.length() < 255),
                     "surname", str(s -> s.length() < 255),
                     "languages", arrayOfStr(s -> s.length() < 128),
                     "age", integer(n -> n >= 16),
                     "address", addressSpec
                    );
As you can see, the spec's structure remains the same, and it’s child’s play to define optional and nullable fields.
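Since the range helper above is plain Java, it can be exercised on its own. Here is a self-contained sketch (class and variable names are mine, and I use the conventional latitude/longitude bounds of ±90 and ±180):

```java
import java.math.BigDecimal;
import java.util.function.BiFunction;
import java.util.function.Predicate;

public class RangeDemo {

    // Curried factory: given min and max, builds a closed-interval predicate.
    static final BiFunction<Double, Double, Predicate<BigDecimal>> range =
        (min, max) -> dec -> dec.doubleValue() >= min && dec.doubleValue() <= max;

    public static void main(String[] args) {
        Predicate<BigDecimal> latitudeRange  = range.apply(-90.0, 90.0);
        Predicate<BigDecimal> longitudeRange = range.apply(-180.0, 180.0);

        System.out.println(latitudeRange.test(new BigDecimal("40.4168")));  // true (Madrid)
        System.out.println(latitudeRange.test(new BigDecimal("100")));      // false

        // Predicates compose with and/or/negate before being handed to a spec:
        Predicate<BigDecimal> validLongitudeOnly =
            longitudeRange.and(latitudeRange.negate());
        System.out.println(validLongitudeOnly.test(new BigDecimal("-120"))); // true
    }
}
```

This composability is the point: a spec like `decimal(latitudeRange)` is just a predicate handed to a recursive structure.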
Another exciting thing we can do with specs is parsing strings or bytes. Instead of parsing the whole JSON and then validating it, we can verify the JSON schema while parsing it and stop the process as soon as an error happens. After all, failing fast is important as well!
// json :: String or byte[]
JsObjSpec spec = ...
JsObjParser parser = new JsObjParser(spec);
JsObj obj = parser.parse(json);
Let's benchmark json-values and other alternatives using JMH. The test is as simple as parsing a string into JSON and validating it according to the following spec:
JsObjSpec vegetablesSpec =
    JsObjSpec.strict("veggieName", str(length(1, 255)),
                     "veggieLike", bool
                    );

JsObjSpec personSpec =
    JsObjSpec.strict("firstName", str(length(1, 255)),
                     "lastName", str(length(1, 255)),
                     "age", integer(interval(0, 110)),
                     "latitude", decimal(interval(new BigDecimal(-90),
                                                  new BigDecimal(90)
                                                 )
                                        ),
                     "longitude", decimal(interval(new BigDecimal(-180),
                                                   new BigDecimal(180)
                                                  )
                                         ),
                     "fruits", arrayOfStrSuchThat(greaterThanOne),
                     "numbers", arrayOfIntSuchThat(greaterThanOne),
                     "vegetables", arrayOf(vegetablesSpec).optional()
                    );
I've tried different alternatives:
- json-values
- Jackson + bean validation annotations (hibernate-validator implementation)
- json-schema-validator
- justify
The results are as follows:
| Benchmark               | Mode  | Cnt | Score  | Units | Relative |
|-------------------------|-------|-----|--------|-------|----------|
| json_values             | thrpt | 25  | 212278 | ops/s | 100%     |
| justify                 | thrpt | 25  | 123254 | ops/s | 60%      |
| jackson_bean_validation | thrpt | 25  | 62104  | ops/s | 29%      |
| json_schema_validator   | thrpt | 25  | 30691  | ops/s | 14%      |
There is a big difference in performance, with json-values coming out on top. This is because json-values uses dsl-json to parse strings into JSON objects according to the given specs.
Another critical aspect of software development is data generation. It's essential to property-based testing, a random-testing technique well known in FP. Computers are way better than humans at generating random data: you'll catch more bugs testing your code against many inputs instead of just one. Writing generators, like specs, is as simple as writing JSON:
JsObjGen.of("name", alphabetic,
            "surname", alphabetic,
            "age", choose(16, 100),
            "address", JsObjGen.of("street", alphabetic.nullable().optional(),
                                   "number", oneOf(choose(0, 1000),
                                                   alphanumeric
                                                  ).optional(),
                                   "city", alphabetic.optional(),
                                   "coordinates", tuple(decimal,
                                                        decimal
                                                       )
                                  )
           );
Consider the following method, testProp:
/**
 * @param gen   generator to produce randomized input data
 * @param prop  the property to be tested
 * @param times number of iterations
 */
public void testProp(JsGen<JsObj> gen,
                     Predicate<JsObj> prop,
                     int times
                    ) {
    Supplier<JsObj> supplier = gen.sample();
    Stream<JsObj> stream = Stream.generate(supplier);
    Assertions.assertTrue(stream.limit(times)
                                .allMatch(prop)
                         );
}
It shows the essence of property-based testing, even if it's far from being a real implementation like Quickcheck or ScalaCheck. You pass in a generator and a predicate representing a property that your code has to satisfy. Then, the predicate is tested against randomized input data produced by the generator. You have to indicate the number of iterations; otherwise, it never ends!
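Stripped of the json-values types, the same essence fits in a few lines of plain Java. Here is a sketch with a classic reverse-twice property; the generator, property, and names are mine, not from the library:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;
import java.util.function.Predicate;
import java.util.function.Supplier;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class PropDemo {

    // The essence of testProp: run the property against `times` randomized inputs.
    static <T> boolean testProp(Supplier<T> gen, Predicate<T> prop, int times) {
        return Stream.generate(gen).limit(times).allMatch(prop);
    }

    public static void main(String[] args) {
        Random rnd = new Random();
        // Generator: random lists of up to 10 ints between 0 and 99.
        Supplier<List<Integer>> gen =
            () -> rnd.ints(rnd.nextInt(10), 0, 100).boxed().collect(Collectors.toList());

        // Property: reversing a list twice yields the original list.
        Predicate<List<Integer>> reverseTwiceIsIdentity = xs -> {
            List<Integer> copy = new ArrayList<>(xs);
            Collections.reverse(copy);
            Collections.reverse(copy);
            return copy.equals(xs);
        };

        System.out.println(testProp(gen, reverseTwiceIsIdentity, 1000)); // true
    }
}
```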
Given the previous JSON generator, imagine we need to generate addresses with either all the fields (street, number, and city) or none of them. Brute force saves us a lot of time here: the suchThat combinator returns a new generator that produces only values satisfying the given predicate:
Predicate<JsObj> allFields =
    address -> address.get("street").isNotNothing() &&
               address.get("number").isNotNothing() &&
               address.get("city").isNotNothing();

Predicate<JsObj> noneFields =
    address -> address.get("street").isNothing() &&
               address.get("number").isNothing() &&
               address.get("city").isNothing();

JsObjGen newGen = gen.suchThat(allFields.or(noneFields));
Care is needed to ensure there is a high chance the generator will satisfy the predicate. By default, suchThat tries 100 times to generate a value that satisfies the predicate; if no value passes after that many iterations, a runtime exception is thrown. I love this feature. Most of the time, the predicate you pass in is an existing one that is used for validation purposes.
Data generation and validation are critical in software. Generating and validating your data with such concise and readable data structures has a significant impact on productivity and maintainability.
Let's go over the most important functions to manipulate JSON that you can find in json-values. Consider the following JSON:
{
  "name": "Rafael",
  "age": 37,
  "languages": ["Java", "Scala"],
  "address": {
    "street": "Elm street",
    "coordinates": [12.3, 34.5]
  }
}
It can be modeled as a set of path-value pairs:
[
("/name", "Rafael"),
("/age", 37),
("/languages/0", "Java"),
("/languages/1", "Scala"),
("/address/street", "Elm street"),
("/address/coordinates/0", 12.3),
("/address/coordinates/1", 34.5),
("*", JsNothing)
]
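A flattening like the one listed above can be computed for plain nested Maps and Lists with a short recursive walk. This is a stdlib sketch, not the library's API:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FlattenDemo {

    // Walks a nested Map/List structure and collects ("/a/b/0", value) pairs.
    static void flatten(String path, Object node, Map<String, Object> out) {
        if (node instanceof Map) {
            ((Map<?, ?>) node).forEach((k, v) -> flatten(path + "/" + k, v, out));
        } else if (node instanceof List) {
            List<?> arr = (List<?>) node;
            for (int i = 0; i < arr.size(); i++) flatten(path + "/" + i, arr.get(i), out);
        } else {
            out.put(path, node);  // a leaf value
        }
    }

    public static void main(String[] args) {
        Map<String, Object> json = Map.of(
            "name", "Rafael",
            "languages", List.of("Java", "Scala"),
            "address", Map.of("coordinates", List.of(12.3, 34.5))
        );
        Map<String, Object> pairs = new LinkedHashMap<>();
        flatten("", json, pairs);
        System.out.println(pairs.get("/languages/0"));           // Java
        System.out.println(pairs.get("/address/coordinates/1")); // 34.5
    }
}
```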
As you may notice, * represents all the paths not defined for that JSON, and JsNothing is their associated value. This model is convenient for expressing two of the most critical functions:
public JsValue get(JsPath path);
public Json set(JsPath path, JsValue value);
The get method always returns a value, no matter what path is passed in; if the path doesn't exist, it returns the JsNothing singleton. It's a total function, and functional programmers strive for total functions: their signatures reflect reality, with no exceptions and no nulls. Following the same philosophy, if you set a value at a specific path, the path is always created: immediately after setting a value, it can be found at that path. The following property always holds:
json.set(path,value).get(path) == value
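The total-function idea can be sketched over plain nested Maps: a lookup that never throws and never returns null, answering a NOTHING sentinel for missing paths. All names here are mine, not the library's:

```java
import java.util.Map;

public class TotalGetDemo {

    // A sentinel standing in for JsNothing: the value of every missing path.
    static final Object NOTHING = new Object() {
        @Override public String toString() { return "NOTHING"; }
    };

    // Total lookup: any path yields a value, no exceptions, no nulls.
    static Object get(Object node, String... path) {
        Object current = node;
        for (String key : path) {
            if (!(current instanceof Map)) return NOTHING;
            Object next = ((Map<?, ?>) current).get(key);
            if (next == null) return NOTHING;
            current = next;
        }
        return current;
    }

    public static void main(String[] args) {
        Map<String, Object> json = Map.of("address", Map.of("city", "Madrid"));
        System.out.println(get(json, "address", "city")); // Madrid
        System.out.println(get(json, "address", "zip"));  // NOTHING
        System.out.println(get(json, "a", "b", "c"));     // NOTHING
    }
}
```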
What do you think setting JsNothing at a path does? Well, it has to remove the value, so that get returns JsNothing:
jsObj.set(path, JsNothing).get(path) == JsNothing
FP has to do with honesty. Establishing laws makes it easier to reason about the code we write. By the way, the set method always returns a brand-new JSON; remember, json-values uses persistent data structures.
JsObj.empty().set(path("/a/b"),
                  JsStr.of("foo")
                 );
// { "a": { "b": "foo" } }

JsObj.empty().set(path("/a/b/2"),
                  JsInt.of(1)
                 );
// { "a": { "b": [null, null, 1] } }

// padding arrays with 0
JsObj.empty().set(path("/a/b/2"),
                  JsInt.of(1),
                  JsInt.of(0)
                 );
// { "a": { "b": [0, 0, 1] } }
There are times when it's more convenient to use the following functions to get some data out:
JsObj json = ...
String name = json.getStr("name");
Integer age = json.getInt("age");
JsArray languages = json.getArray(path("/languages"));
Double latitude = json.getDouble(path("/address/coordinates/0"));
Let's introduce the jewels in the crown of FP: filter, map, and reduce.
// maps only the first level
json.mapKeys(key -> key.toLowerCase());
// traverses the whole Json recursively
json.mapAllKeys(key -> key.toLowerCase());
json.filterAllKeys(key -> !key.isEmpty());
json.filterAllValues(JsValue::isNotNull);
Optional<Integer> sum = a.reduceAll(Integer::sum,
                                    js -> js.toJsInt().value,
                                    JsValue::isInt
                                   );
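The reduceAll call above folds over every value in the structure. The same kind of recursive fold over plain nested Maps and Lists looks like this (a stdlib sketch, not the library's implementation):

```java
import java.util.List;
import java.util.Map;

public class ReduceDemo {

    // Recursively sums every Integer found anywhere in a nested Map/List structure,
    // ignoring values of other types.
    static int sumInts(Object node) {
        if (node instanceof Map) {
            int acc = 0;
            for (Object v : ((Map<?, ?>) node).values()) acc += sumInts(v);
            return acc;
        }
        if (node instanceof List) {
            int acc = 0;
            for (Object v : (List<?>) node) acc += sumInts(v);
            return acc;
        }
        return node instanceof Integer ? (Integer) node : 0;
    }

    public static void main(String[] args) {
        Map<String, Object> json = Map.of(
            "age", 37,
            "scores", List.of(1, 2, 3),
            "nested", Map.of("n", 4, "s", "ignored")
        );
        System.out.println(sumInts(json)); // 47
    }
}
```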
Consider the following operation. It maps a JSON recursively, trimming only the elements that are strings:
json.mapAllValues(js -> js.isStr() ?
                        js.toJsStr().map(String::trim) :
                        js
                 );
We need to check the type and make a conversion. It's not very declarative, is it? Prisms come to the rescue. Think of what a prism does to light: the same happens with the sum type JsValue. We have several subtypes to consider (int, long, string, obj, array, instant, bool), and we want to focus on a specific one. You can refactor the previous code into:
json.mapAllValues(JsStr.prism.modify.apply(String::trim));
Every type in json-values has a Prism. Another example with instants:
json.mapAllValues(js -> js.isInstant() ?
                        js.toJsInstant().map(ins -> ins.plusSeconds(3)) :
                        js
                 );
// using a prism
json.mapAllValues(JsInstant.prism.modify.apply(ins -> ins.plusSeconds(3)));
Prisms are a specific type of optic. I'll cover optics in the next article; nevertheless, I'd like to point out some of their critical aspects. Navigating through recursive data structures like JSON objects and arrays to find, insert, and modify data is ubiquitous. It's a cumbersome and error-prone task (NullPointerException is always lurking around), requiring a defensive programming style with a lot of boilerplate code. The more nested the structure, the worse it gets. FP uses optics to cope with these limitations.
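A prism can be sketched in plain Java as a pair of functions: a partial match into one case of a sum type, and a constructor back out of it; modify then applies a function only where the match succeeds. This is a minimal sketch with names of my choosing, not the library's Prism type:

```java
import java.util.Optional;
import java.util.function.Function;

public class PrismDemo {

    // A prism from a sum type S onto one of its cases A: a partial match plus
    // a way back. modify applies f only where the match succeeds.
    record Prism<S, A>(Function<S, Optional<A>> getOptional, Function<A, S> reverseGet) {
        Function<S, S> modify(Function<A, A> f) {
            return s -> getOptional.apply(s).map(f).map(reverseGet).orElse(s);
        }
    }

    // A prism from Object onto its String case (a stand-in for JsValue/JsStr).
    static final Prism<Object, String> str = new Prism<>(
        o -> o instanceof String ? Optional.of((String) o) : Optional.empty(),
        s -> s
    );

    public static void main(String[] args) {
        Function<Object, Object> trim = str.modify(String::trim);
        System.out.println(trim.apply("  hi  ")); // hi
        System.out.println(trim.apply(42));       // 42, left untouched
    }
}
```

Non-matching values pass through untouched, which is exactly why `JsStr.prism.modify.apply(String::trim)` can be mapped over a whole JSON safely.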
Some other handy methods are union and intersection:
JsObj a = ...
JsObj b = ...

// unordered arrays, repeated values not allowed
a.unionAll(b, SET);
// ordered arrays, repeated values allowed
a.unionAll(b, LIST);
// unordered arrays, repeated values allowed
a.intersectionAll(b, MULTISET);
a.intersectionAll(b, SET);
Even if you are using the standard Java libraries to work with JSON and a purely functional library doesn't fit your project, you can still benefit from json-values to generate and validate JSON in your tests.
There are some related projects you may find interesting:
- mongo-values: to work with the MongoDB Java driver and json-values without any kind of conversion to/from BSON.
- vertx-effect: replaces the Vert.x JSON with json-values. Persistent data structures especially shine in architectures based on the actor model.
- vertx-mongodb-effect: built on top of vertx-effect and mongo-values.
- json-scala-values: the Scala version of json-values.
- json-kotlin-values: currently under development.
Wrapping up:
- json-values provides a persistent JSON with a simple, declarative, and efficient API.
- We've seen different recursive data structures to model JSON objects, specs, and generators. You can open a JShell and start writing and testing them right away. It should be easy to interact with the code we develop; a unit test is not a proper way of interacting with the software you write, and otherwise you and your code will end up growing apart.