Properly Getting Into Jail: Data Flow
Learn about the data flow between the systems that we've talked about so far in this series and lay out the different aspects of integration between the various pieces.
In our prison system, we have a lot of independent parts by design. Each of them is expected to work independently of the rest of the system but also to cooperate with them. That typically requires a particular mindset when designing the application.
Let’s lay out the different aspects of integration between the various pieces, shall we?
- Each part of the system should be able to run completely independently from the other pieces.
- There are offline methods for communications (literally a guy walking down to the block with a piece of paper) that can be a backup communication plane but can also bypass the system entirely (a warrant being served directly to the block’s officers).
- There are two distinct options for communication: commands (release this inmate, ensure this inmate is ready to go to court at this date) and notifications (inmates count, status, etc.).
- We trust, but verify, each piece of information that we receive.
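The two kinds of communication in the list above can be modeled as distinct message types. What follows is a minimal sketch in Python, not the actual system: the class names, the fields, and the `ALLOWED_COMMANDS` table are all assumptions made for illustration, and the verification rule is deliberately simplistic (notifications are accepted as-is here).

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional, Union


@dataclass(frozen=True)
class Command:
    """A request that the receiver is expected to act on."""
    action: str                 # e.g. "release" or "prepare_for_court"
    inmate_id: str
    issued_by: str
    due: Optional[date] = None


@dataclass(frozen=True)
class Notification:
    """A statement of fact; no action is requested."""
    kind: str                   # e.g. "inmate_count" or "status"
    source: str
    payload: dict


Message = Union[Command, Notification]

# Hypothetical rule: which party may issue which command.
ALLOWED_COMMANDS = {"registration_office": {"release", "prepare_for_court"}}


def verify(message: Message) -> bool:
    """Trust, but verify: check the authority behind every command."""
    if isinstance(message, Command):
        return message.action in ALLOWED_COMMANDS.get(message.issued_by, set())
    return True  # notifications are accepted as-is in this sketch
```

Keeping the two types separate means a receiver can never mistake a status report for an instruction to act.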
The first point seems like a pretty standard requirement but combined with the second point, we get into a particularly interesting issue. We may have the same information entered into the system by multiple parties at different times.
For example, a warrant to release an inmate might be served directly to the block’s officer. The release is processed and then the warrant arrives at the Registration Office, which will also enter it. At some later time, this data is merged and we need to ensure that it makes sense to the people reading it.
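One way to picture that merge is shown below. This is only a sketch under a strong assumption: that every warrant carries a stable identifier (the hypothetical `warrant_id`) that both the block and the Registration Office record. The record shape and the `merge` function are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class WarrantRecord:
    warrant_id: str             # assumed to be printed on the paper warrant
    inmate_id: str
    action: str
    entered_by: set = field(default_factory=set)


def merge(records):
    """Collapse entries for the same warrant, remembering who entered each one."""
    merged = {}
    for rec in records:
        existing = merged.get(rec.warrant_id)
        if existing is None:
            merged[rec.warrant_id] = WarrantRecord(
                rec.warrant_id, rec.inmate_id, rec.action, set(rec.entered_by)
            )
        elif (existing.inmate_id, existing.action) != (rec.inmate_id, rec.action):
            # Same warrant, different content: a human has to look at this.
            raise ValueError(f"conflicting entries for warrant {rec.warrant_id}")
        else:
            existing.entered_by |= rec.entered_by
    return list(merged.values())


# The same release warrant, entered once at the block and once at the office.
block_entry = WarrantRecord("W-17", "inmates/1", "release", {"block_a"})
office_entry = WarrantRecord("W-17", "inmates/1", "release", {"registration_office"})
result = merge([block_entry, office_entry])
```

After the merge there is a single record for the warrant, and it still shows that two parties entered it, which is what makes the history readable later.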
The offline communication plane is also a very important design consideration for a system that reflects the real world. It means that we don’t have to provide an overly complex infrastructure to survive in production. In fact, given that a prison is hardly going to have a top-notch technical operations team (they might have a top-notch operations team, but the term means something quite different there), we don’t want to build something that relies on good communications.
To make sense of such a system, we need to define data ownership and data flow between the various systems. Because this is a really complex system, we’ll take a few examples and analyze them properly.
- The legal status of an inmate.
- The location of an inmate.
What is the meaning of legal status? It means under what warrant the inmate is being held in the prison (held until trial, 48-hour hold, got a final judgment). At its simplest, it is the date on which this person should be released. But in practical terms, this can be much more complex and there may be conditions on where this inmate can be held, what they can do, etc.
Everything about the legal status of an inmate is the responsibility of the Registration Office. Any movements of inmates into or out of the prison must go through the Registration Office. Actually, this isn’t quite true. Any movement of an inmate from the responsibility of the prison must go through the Registration Office.
A good example of this would be an inmate who has been hospitalized. They are not physically inside the prison, but the prison is still responsible for them. The Registration Office doesn’t usually care about such details (but sometimes they do; for example, if the inmate has a court date that they’ll miss, they need to notify the court) because there isn’t a change in who is in charge of the inmate.
This is complex, but this is also the real world, and we need to manage this complexity. So, let’s define the ownership rules and data flow behavior:
- Legal status is owned by the Registration Office and is disseminated from there to all interested parties.
- The location of an inmate and their current physical status are owned by the block they are assigned to and disseminated from there to all interested parties.
- The assignment of an inmate to a particular block is also interesting. This piece of information is owned by the Registration Office, but it is not their (sole) decision. This may take a bit of explaining.
The block an inmate is assigned to is determined by a bunch of stuff, e.g. the legal status of the inmate, the previous/expected behavior of this inmate, what the inmate needs to be isolated from, if the inmate needs to be together with certain people, court decisions, the free space available on each block, the inmate's medical status, and many other details that are not important here.
The Registration Office will usually make the initial placement of where an inmate is going to go, but this is not their decision: there is a workflow involved that has input from way too many parties. The official decision is in the hands of the prison commander, but recording this decision (and the data ownership of it) is in the hands of the Registration Office.
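The ownership rules above boil down to a small lookup: each piece of data has exactly one owner, and only the owner may change it. The sketch below assumes hypothetical field and party names and a plain dictionary as the inmate record. It says nothing about how the decision behind a change was reached, only about who is allowed to record it.

```python
# Hypothetical ownership table derived from the rules above.
OWNERS = {
    "legal_status": "registration_office",
    "block_assignment": "registration_office",
    "location": "assigned_block",
    "physical_status": "assigned_block",
}


def owner_of(field_name, assigned_block):
    """Resolve the owning party; block-owned fields follow the inmate's block."""
    owner = OWNERS[field_name]
    return assigned_block if owner == "assigned_block" else owner


def apply_update(inmate, field_name, value, writer):
    """Return an updated copy of the record, but only if the writer owns the field."""
    if writer != owner_of(field_name, inmate["block_assignment"]):
        raise PermissionError(f"{writer} does not own {field_name}")
    return {**inmate, field_name: value}


inmate = {
    "id": "inmates/1",
    "block_assignment": "block_a",
    "location": "cell 12",
    "legal_status": "held until trial",
}
# The block owns the location, so this update goes through.
updated = apply_update(inmate, "location", "hospital", "block_a")
```

An attempt by `block_a` to change `legal_status` raises `PermissionError` in this sketch; everyone other than the owner only ever receives a copy.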
Okay, enough domain knowledge. Let’s talk about the technical details, shall we? I’m sorry that I have to do such an info dump, and I’m trying to contain it to relevant pieces, but if I don’t include the semantics of what we are doing, it will make very little sense or be extremely artificial.
The legal status of inmates in the Registration Office needs to be sent to other parties in the prison. In particular, all the blocks and the Command and Control Center.
We can deal with this by defining the following RavenDB ETL process from the Registration Office:
What this does is simply define the data that we’ll share with the outside world. If I was building this for real, this would probably be a lot bigger because an inmate is a really complex topic. What is important here is that we define this as an explicit process. In other words, this is part of the service contract that we have with the outside world. The shape of the data and the way we publish it may only be changed if we contact all parties involved. Practically speaking, this usually means that we can only add data, but never remove any fields from the system.
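The "add, but never remove" rule is easy to check mechanically. Here is a minimal sketch; the field names are made up for illustration and are not the actual published shape.

```python
def breaking_changes(published_fields, proposed_fields):
    """Return the fields that consumers rely on but the proposed shape drops."""
    return sorted(set(published_fields) - set(proposed_fields))


# Hypothetical field names for the published inmate document.
published = {"Id", "Name", "Status", "ReleaseDate"}
proposed_ok = published | {"CourtDate"}        # adding a field is safe
proposed_bad = published - {"ReleaseDate"}     # removing one breaks the contract
```

An empty result means the new shape only adds data; anything else is a change that requires contacting every consumer first.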
Note that in this case, we simplify the model as we send it out. The warrants for this inmate aren’t going out, and we just pull the latest status and release dates from the most up to date warrant. This is a good example of how we avoid exposing our internal state to the outside world, giving us the flexibility to change things later.
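To make the simplification concrete, here is a sketch of that projection. It is not the RavenDB ETL script itself (those scripts are written in JavaScript inside RavenDB); it is a Python illustration of the same idea. The document shape, the field names, and the choice of "latest warrant by issue date" are assumptions, and the sample data is invented.

```python
from datetime import date


def to_public_inmate(inmate):
    """Project the internal document to the published shape, without the warrants."""
    # Assumption: the most up-to-date warrant is the one issued last.
    latest = max(inmate["warrants"], key=lambda w: w["issued"])
    return {
        "id": inmate["id"],
        "name": inmate["name"],
        "status": latest["status"],
        "release_date": latest["release_date"],
    }


# Invented sample document, for illustration only.
internal = {
    "id": "inmates/1",
    "name": "John Doe",
    "warrants": [
        {"issued": date(2017, 1, 5), "status": "48-hour hold",
         "release_date": date(2017, 1, 7)},
        {"issued": date(2017, 1, 6), "status": "held until trial",
         "release_date": None},
    ],
}
public = to_public_inmate(internal)
```

The published document carries the current status and release date but no warrant history, so the internal warrant model can change without touching any consumer.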
The question now is: where does this data go? RavenDB ETL will write the data to an external database, and here, we have a few options. First, we can define an ETL target for each of the known parties that want this data (each of the blocks and the Command and Control Center at this time). But while that would work, it isn’t such a great idea. We’ll have to duplicate the ETL definition for each of those.
A better option is to send the (transformed) data to a dedicated database that will be our integration source. Consider the following example:
In this case, we can have this dedicated public database that exposes all the data that the Registration Office shares with the rest of the world. Any party that wants this information can set up external replication from this database to their own. In this manner, when the Intelligence Office decides to make it known that they also need to access the inmate registration data, we can just add them as a replication destination for this database.
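The shape of this topology can be simulated in a few lines. The `Database` class below is a toy in-memory stand-in, and its method names are hypothetical; none of this is the RavenDB client API. It only shows the point being made: one ETL target, any number of replication destinations, and new consumers added without touching the ETL definition.

```python
class Database:
    """Tiny in-memory stand-in for a database with outgoing replication."""

    def __init__(self, name):
        self.name = name
        self.docs = {}
        self.destinations = []

    def add_replication_destination(self, other):
        # A new destination first catches up on everything already stored.
        other.docs.update(self.docs)
        self.destinations.append(other)

    def put(self, doc_id, doc):
        self.docs[doc_id] = doc
        for dest in self.destinations:
            dest.put(doc_id, doc)


# The ETL process writes only to the public database.
public_db = Database("registration_public")
block_a = Database("block_a")
public_db.add_replication_destination(block_a)
public_db.put("inmates/1", {"status": "held until trial"})

# Later, the Intelligence Office asks for the same data.
intelligence = Database("intelligence_office")
public_db.add_replication_destination(intelligence)
```

The Intelligence Office ends up with the same documents as the block, even though it joined after the data was written.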
Another option is not to have each individual party in the prison share its own status but have a single shared database that each of them can write to. This can look like this:
In this case, any party that wants to share data will be writing it to the shared database, and anyone who reads it will have access to it through replication from there. This way, we define a data pipeline of all the shared data in the prison that anyone can hook up to.
This post is getting long enough that I’ll separate the discussion of the actual topology of the data and handling the incoming data to separate posts. Stay tuned.
Published at DZone with permission of Oren Eini, CEO RavenDB , DZone MVB. See the original article here.