Agile and Literate Data Entry: Is YAML the Answer?

We take a look at how YAML, when combined with Agile principles, helped one dev cut down on the monotonous activities in her day, freeing her up for more creative work.

I have, more frequently than perhaps I would wish, a requirement to enter data.

The data can be simple enough, but not so simple that I would use a spreadsheet for the data entry task; a scrolling horizontal grid is very unsuited to the task, even with locked column headers.

Fully featured data entry programs exist (indeed, I have written quite a few myself), but the paradigm there is that:

  • One knows exactly what the data looks like beforehand, what the fields are, and what the validation procedures are/should be. That is OK for large repetitive data entry jobs (but anyone who has been involved in a real world data preparation room will know how rarely one’s assumed knowledge translates into practice, and how frequently special cases arise).

  • Importantly, this is not an agile approach. Agile, as in “agile programming,” can be applied to data entry too. It means starting now, getting on with the job, treating special cases at some later date, or being in a position to quickly adjust the framework so that “special needs” become “system features.”

  • A specialist data entry app has to be distributed, or deployed, with the concomitant issues of data definition setup, OS compatibility, version control, and synchronizing. This is a PITA. Typically, I might have a few hundred or a few thousand records to get entered that don’t fit some standard model, I have a few people available to me to do it, and I don’t want to have to write a new app, tweak an existing app, or provide specs or train the people. Agility rules! I want the job done in one to three days, with minimal supervision from me.

Text Rules, But Text Is Not CSV

Well, if I don’t want a specialist data entry app and I don’t want a spreadsheet, and I do want zero-install, what else do I want?

Well, I want the format to be “literate,” that is, I want both the writer (data entry person) and the reader (the person using the data) to be able to read it. Easily. Like a book.

And I want the data to be enterable by ANY text editor. 

Not all data needs to be further processed. Maybe it just needs to sit there as a reference, to be read/reviewed/searched. Maybe it just needs minimal further processing for readability or publishing purposes (some sort of Tidy or XSLT if one must); maybe it needs validity checking.

OK, What Formats Do We Know That Are Just Text?

  • XML. Not an option. The visual clutter of the markup tags makes it too hard for the data entry person.
  • SOX, no (it's just a minor simplification of XML). 
  • JSON. No thanks. A simplification, but again, really, it is designed for data exchange between machines.

And Data Entry is Not Data Exchange.

Maybe YAML?


Well, here it is.

Wikipedia says that YAML is a “human-readable data serialization format that takes concepts from languages such as XML, C, Python, Perl, as well as the format for electronic mail as specified by RFC 2822.”

But YAML is better than that sounds. “Human readable?” Well, so is XML.

YAML is actually Human Writeable.

The Wikipedia article is OK, but have a look at the discussion in Symfony where it is embedded.

Symfony is much more than YAML; it is a web application framework that embraces YAML, and one that could conceivably be used to build a distributed “natural” data entry program. But let’s just stick to YAML, its principles, and how we write it.

Well, look at how we read it first.

This is a typical YAML data item (in plain text).

family:
  name: Doe
  parents:
    - John
    - Jane
  children:
    - Paul
    - Mark
    - Simone
address:
  number: 34
  street: Main Street
  city: Nowheretown
  zipcode: "12345"

In YAML, the structure is shown through indentation, sequence items (as in, items in a collection) are denoted by a dash, and key/value pairs within a map are separated by a colon.

YAML also has a shorthand syntax to describe the same structure with fewer lines, where arrays are explicitly shown with [] and hashes with {}:

family: { name: Doe, parents: [John, Jane], children: [Paul, Mark, Simone] }
address: { number: 34, street: Main Street, city: Nowheretown, zipcode: "12345" }
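To make the shared structure concrete, here is a small sketch of the nested data that the shorthand form spells out, written as native Python (Python is used purely for illustration; the article does not prescribe a language). Maps become dictionaries and sequences become lists. The variable names are invented for the example, and the JSON dump is there only to show how much more punctuation a machine-oriented format needs for the same record.

```python
import json

# The record from the text as native Python data.
# Maps become dicts, sequences become lists; the quoted zipcode stays a string,
# while the unquoted house number is read as an integer.
record = {
    "family": {
        "name": "Doe",
        "parents": ["John", "Jane"],
        "children": ["Paul", "Mark", "Simone"],
    },
    "address": {
        "number": 34,
        "street": "Main Street",
        "city": "Nowheretown",
        "zipcode": "12345",
    },
}

# The same record as JSON, for comparison with the YAML forms above.
as_json = json.dumps(record, indent=2)
print(as_json)
```

Every key and string in the JSON output carries quotes, and every level carries braces or brackets, which is exactly the clutter a data entry person should not have to type.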

This looks very WRITEABLE, particularly through line-oriented (as per the first example) blank “templates” that we copy and paste as many times as we like. It is also (presumably) extensible. In the above example, we could have as many children as we liked (including zero), but we could also add “fields” as we go along (in which case, the final processing app, if there is one, will need to sort things out).
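As a rough idea of what “sorting things out” might involve, the sketch below assumes the entered records have already been read into Python dictionaries, and simply reports which fields turned up and which ones were not in the original template. The function name `field_report` and the sample records are hypothetical, made up for this illustration.

```python
from collections import Counter


def field_report(records, expected):
    """Count how often each field occurs and list fields outside `expected`."""
    counts = Counter(key for rec in records for key in rec)
    unexpected = sorted(set(counts) - set(expected))
    return counts, unexpected


# Hypothetical records, as a data entry person might have produced them.
records = [
    {"name": "Doe", "city": "Nowheretown"},
    {"name": "Roe", "city": "Elsewhere", "nickname": "Ricky"},  # field added on the fly
    {"name": "Poe"},                                            # field left out
]

counts, unexpected = field_report(records, expected=["name", "city"])
print(dict(counts))   # how many records carry each field
print(unexpected)     # fields the template did not anticipate
```

A report like this lets the special cases surface after the data is in, rather than blocking entry until the schema is settled.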

Compare this to a DTD- or XSD-driven (schema-driven) approach, where the data description is assumed to be known and knowable upfront.

There is much more to YAML, such as the ability to work with multiple documents/data items in a stream (separated by ---).

# Ranking of 1998 home runs
---
- Mark McGwire
- Sammy Sosa
- Ken Griffey

# Team ranking
---
- Chicago Cubs
- St Louis Cardinals
There is much more information about it here (more than you will likely want to know), and some more about bindings and the grammar at LibYAML.

In Conclusion:

The YAML concept appears attractive, given that I want to use text-based literate and flexible/agile data entry without a specialist data entry app.

But a full implementation of the YAML spec or the Syck version is not without its issues, and I would probably opt for allowing only a cut-down version of the syntax in any micro-YAML data reader.
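To give a feel for how small such a reader could be, here is a hypothetical sketch (not the Syck or LibYAML approach, and not a conforming YAML parser). It assumes a deliberately cut-down syntax: maps nested by space indentation, `key: value` pairs, lists of plain `- item` scalars indented under their key, and `#` comment lines. All function names are invented for this example.

```python
def parse_scalar(text):
    """Quoted text stays a string, all-digit text becomes an int."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] and text[0] in "\"'":
        return text[1:-1]
    if text.isdigit():
        return int(text)
    return text


def tokenize(source):
    """Return (indent, content) pairs, skipping blank and comment lines."""
    tokens = []
    for raw in source.splitlines():
        content = raw.strip()
        if content and not content.startswith("#"):
            tokens.append((len(raw) - len(raw.lstrip(" ")), content))
    return tokens


def parse_block(tokens, pos, indent):
    """Parse the list or map whose lines sit at `indent`; return (value, next_pos)."""
    if tokens[pos][1].startswith("- "):
        items = []
        while (pos < len(tokens) and tokens[pos][0] == indent
               and tokens[pos][1].startswith("- ")):
            items.append(parse_scalar(tokens[pos][1][2:]))
            pos += 1
        return items, pos
    mapping = {}
    while pos < len(tokens) and tokens[pos][0] == indent:
        key, sep, rest = tokens[pos][1].partition(":")
        if not sep:
            raise ValueError(f"expected 'key: value', got {tokens[pos][1]!r}")
        key = key.strip()
        pos += 1
        if rest.strip():                     # value on the same line
            mapping[key] = parse_scalar(rest)
        elif pos < len(tokens) and tokens[pos][0] > indent:
            value, pos = parse_block(tokens, pos, tokens[pos][0])  # nested block
            mapping[key] = value
        else:
            mapping[key] = None              # key entered with no value
    return mapping, pos


def load_micro_yaml(source):
    """Read one document in the cut-down syntax described above."""
    tokens = tokenize(source)
    if not tokens:
        return None
    value, pos = parse_block(tokens, 0, tokens[0][0])
    if pos != len(tokens):
        raise ValueError(f"unexpected indentation near {tokens[pos][1]!r}")
    return value


SAMPLE = """\
family:
  name: Doe
  parents:
    - John
    - Jane
address:
  number: 34
  zipcode: "12345"
"""

data = load_micro_yaml(SAMPLE)
print(data)
```

Restricting the syntax this way keeps the reader short and its error messages simple, at the cost of rejecting valid YAML such as the `{}` and `[]` shorthand or lists written at the same indentation as their key.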

Article By: Nicki Jenns
