More Holes in Our Persistence Models
What is science? The practice of making up a fake world, filling it with only the data that the scientist approves, in pursuit of publishing a fairy tale about what happened when the scientist's hypothesis was tested against his sanitized surrogates. We aren't much better in the programming world. SQL is smart to enforce constraints and make data integrity a top priority. But let's face the facts: 44 years later, most databases don't have constraints, or if they do, they have only the ones that were created for the users/programmers. Nuts like Joe Celko, author of 3 of the top 5 selling books on SQL, put constraints on everything, including check constraints, which would be the equivalent of an invariant in DbC, but I would be willing to bet that is a 1 in 100 proposition in modern projects. Simple example: suppose that there is a status on an account table and an amount due field. You might want to make a check constraint that says that the status can only be overdue if the amount due is greater than 0. Contracts are great; dumbing things down through reductionism is not.
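To make the status example concrete, here is a minimal sketch using SQLite through Python's built-in sqlite3 module. The table and column names (account, status, amount_due), the list of allowed statuses, and the try_insert helper are all my own assumptions for illustration, not anything from a real schema.

```python
import sqlite3


def make_account_table(conn):
    """Create a hypothetical account table whose invariant lives in the schema."""
    conn.execute(
        """
        CREATE TABLE account (
            id         INTEGER PRIMARY KEY,
            status     TEXT NOT NULL
                       CHECK (status IN ('current', 'overdue', 'closed')),
            amount_due NUMERIC NOT NULL CHECK (amount_due >= 0),
            -- The invariant: an account may be overdue only if money is owed.
            CHECK (status <> 'overdue' OR amount_due > 0)
        )
        """
    )


def try_insert(conn, status, amount_due):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO account (status, amount_due) VALUES (?, ?)",
                (status, amount_due),
            )
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
make_account_table(conn)
print(try_insert(conn, "overdue", 25))  # overdue with money owed
print(try_insert(conn, "overdue", 0))   # overdue with nothing owed
```

The point of the sketch is only that the rule sits in the database, so every writer has to honor it, not just the one application that happened to remember the check.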
How do models get better? By working with them and improving them. Here's the rub: every single persistence framework out there basically has a sign up that says 'bring me your stuff already scrubbed exactly as you described it when you first set things up, and maybe I'll save a copy of whatever it is you have in hand.' I saw that a little mini app appeared on the Mac recently that lets you add things to your calendar using natural language. These cats are on to something. We should think about taking some of the same medicine. Acquiring and loading data is still a huge hassle and there are no really good tools out there to make it easier. How on earth can this be? What's worse is that 10 years after the XML revolution was supposed to usher in semantically rich, schematized everything, the vast majority of integration work still involves cleaning up stupidly braindead data. The Rosetta dream is dead; there is no effort to really schematize things (a la web 3.0). Face the facts: we are stuck in this garbage pit/wasteland for the foreseeable future.
Ironically, I left this partially finished, then came across the post in the image by Emmanuel Bernard about NoSQL, calling them idiots for ignoring a problem and thinking it will go away. I am kind of saying that that's what the SQL guys have done. Who's right? Who cares. Also, to tie to my prior post, this is largely a consequence of developers lacking the guts to really apply Lean and figure out where their time goes. There are only two possibilities. Either you are on a greenfield project and can just change things constantly, in which case the schema can evolve unhampered and there will be a rather low cost to getting your changes in, but at some point you will have data, and the auto schema thing will stop working (covered in a prior post). Or you have to deal with real data. In line with my prognostications that the future of development is going to have to get on to things other than drafting the nth iteration of a simulation, all projects will have to deal with data, period.
Furthermore, thinking a model will take a shape anything close to finished without a bunch of data having been sent through it and a lot of usage logged is not naive, it's stupid. Another possible solution to this problem would be to beef up the ability to simulate. There is zero of that anywhere on either side. Do a search; find me articles from either camp talking about creating simulated data.
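For what it's worth, here is roughly what I mean by simulated data, as a small sketch rather than a real tool. Everything in it is an assumption made up for illustration: the account shape, the status values, the dirty_rate knob, and the idea that "dirty" means an overdue account with nothing owed.

```python
import random

STATUSES = ("current", "overdue", "closed")


def simulate_accounts(n, dirty_rate=0.1, seed=0):
    """Generate n fake account rows; roughly dirty_rate of them break the invariant."""
    rng = random.Random(seed)  # seeded so a run can be reproduced
    rows = []
    for i in range(n):
        status = rng.choice(STATUSES)
        if status == "overdue":
            amount_due = round(rng.uniform(0.01, 500.0), 2)  # always > 0
        elif status == "closed":
            amount_due = 0.0
        else:
            amount_due = round(rng.uniform(0.0, 500.0), 2)
        if rng.random() < dirty_rate:
            # Deliberately produce the kind of junk real feeds contain.
            status, amount_due = "overdue", 0.0
        rows.append({"id": i, "status": status, "amount_due": amount_due})
    return rows


def violates_invariant(row):
    """True when a row claims to be overdue while owing nothing."""
    return row["status"] == "overdue" and not row["amount_due"] > 0


rows = simulate_accounts(1000)
bad = [r for r in rows if violates_invariant(r)]
print(f"{len(bad)} of {len(rows)} simulated rows would be rejected")
```

Push a few thousand of these through a model before real data shows up and you at least find out what it does with the junk while changing it is still cheap.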
Most projects adopt a model and develop against it in a more or less empty state, then get sucked into the drama of it filling up, at which point evolving it or refining it further is the last thing anyone is going to want to do.