Modelling Is Everything
I’m often asked, “What is the best way to learn about building
high-performance systems?” There are many perfectly valid answers to
this question but there is one thing that stands out for me above
everything else, and that is modelling. Modelling what you need to
implement is the most important and effective step in the process. I’d
go further and say this principle applies to any development and the
rest is just typing :-)
Domain Driven Design (DDD) advocates modelling the domain and expressing this model in code
as fundamental to the successful delivery and ongoing maintenance of
software. I wholeheartedly agree with this. How often do we see code
that is an approximation of the problem domain? Code that exhibits
behaviour which approximates to what is required via inappropriate
abstractions and mappings which just about cope. Those mappings between
what is in the code and the real domain are only contained in the
developers’ heads and this is just not good enough.
When high performance is required, the code for parts of the system often has
to model what is happening with the CPU, memory, storage sub-systems,
or network sub-systems. When we have imperfect abstractions on top of
these domains, performance can be very adversely affected. The goal of
my “Mechanical Sympathy” blog is to peek at what is under the hood so we can improve our abstractions.
What is a Model?
A model does not need to be the result of a 3-year exercise producing
UML. It can be, and is often best as, people communicating via various
means, including speech, drawings, illustrations, metaphors, and analogies,
to build a mental model for shared understanding. If an accurate
and distilled understanding can be reached then this model can be turned
into code with great results.
Infrastructure Models
If developers writing a concurrent framework do not have a good model of
how a typical cache sub-system works, namely that it uses message passing to
exchange cache lines, then the framework is unlikely to perform well or
be correct. If their code drives the cache sub-system with mechanical
sympathy and understanding, it is less likely to have bugs and more
likely to perform well.
It is much easier to predict performance from a sound model when coming
from an understanding of the infrastructure for the underlying platform
and its published abilities. For example, if you know how many packets
per second a network sub-system can handle, and the size of its transfer
unit, then it is easy to extrapolate expected bandwidth. With this
model-based understanding we can test our code against expectations with
confidence.
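As a rough sketch of that extrapolation, the arithmetic is simply packets per second multiplied by the transfer unit. The function name and the packet rate below are made up for illustration and are not measurements of any real device; 1,500 bytes is the standard Ethernet MTU.

```python
def expected_bandwidth_bits_per_sec(packets_per_sec: int, transfer_unit_bytes: int) -> int:
    """Upper bound on throughput: packets/s x bytes/packet x 8 bits/byte."""
    return packets_per_sec * transfer_unit_bytes * 8


# Hypothetical figures for illustration only.
PACKETS_PER_SEC = 100_000
TRANSFER_UNIT_BYTES = 1_500  # standard Ethernet MTU

bits = expected_bandwidth_bits_per_sec(PACKETS_PER_SEC, TRANSFER_UNIT_BYTES)
print(f"{bits / 1e9:.1f} Gbit/s")  # prints "1.2 Gbit/s"
```

A measured result far below this figure suggests the code, rather than the network, is the limiting factor.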
I’ve fixed many performance issues where a framework treated a storage
sub-system as stream-based when it is really block-based. If
you update part of a file on disk, the block to be updated must be read,
the changes applied, and the results written back. If you know the
system is block-based and where the block boundaries fall, you can write
whole blocks back without incurring the read-modify-write cycle,
replacing these actions with a single write. This applies even when
appending to a file as the last block is likely to have been partially
written previously.
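To make the block model concrete, here is a minimal sketch that reports which blocks a write only partly covers, since those are the ones that would incur a read-modify-write. The 4,096-byte block size and the function name are assumptions for illustration; real devices and file systems vary.

```python
BLOCK_SIZE = 4096  # assumed block size, not a property of any specific system


def partial_blocks(offset: int, length: int, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return the indices of blocks only partly covered by a write.

    Fully covered blocks can be overwritten with a single write; each block
    returned here would first have to be read so the untouched bytes survive.
    """
    if length <= 0:
        return []
    end = offset + length  # exclusive end of the write
    first = offset // block_size
    last = (end - 1) // block_size
    partial = []
    if offset % block_size != 0:
        partial.append(first)  # write starts mid-block
    if end % block_size != 0 and last not in partial:
        partial.append(last)  # write ends mid-block
    return partial


print(partial_blocks(0, 8192))    # [] : two whole blocks, no read needed
print(partial_blocks(4000, 200))  # [0, 1] : straddles a block boundary
print(partial_blocks(4096, 100))  # [1] : an append into a partly filled block
```

Buffering writes until `partial_blocks` returns an empty list is one way to keep the storage sub-system on the single-write path.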
Business Domain Models
The same thinking should be applied to the models we construct for the
business domain. If a business process is modelled accurately, then the
software will not surprise its end users. When we draw up a model it
is important to describe the cardinality of the relationships and the
ways in which they will be traversed. This understanding
will guide the selection of data structures to those best suited for
implementing the relationships. I often see people use a list for a
relationship which is mostly searched by key; in this case a map could
be more appropriate. Are the entities at the other end of a
relationship ordered? A tree or skiplist implementation may then be a
better option.
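A small sketch of both choices, using made-up order data. Python has no built-in tree or skiplist, so a sorted list with binary search stands in for the ordered case here; that substitution is mine and not something the model itself dictates.

```python
import bisect

# Hypothetical data: (order_id, amount) pairs belonging to one customer.
orders = [(1007, 25.0), (1003, 12.5), (1012, 99.0)]

# Relationship mostly searched by key: a map gives direct lookup
# instead of a linear scan over a list.
orders_by_id = dict(orders)
amount = orders_by_id[1003]  # 12.5

# Relationship traversed in order: keep the keys sorted and use
# binary search to find where a range starts.
sorted_ids = sorted(orders_by_id)
start = bisect.bisect_left(sorted_ids, 1005)
ids_from_1005 = sorted_ids[start:]  # [1007, 1012]
```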
Identity
Identity of entities in a model is so important. All models have to be
entered in some way, and this normally starts with an entity from which
to walk. That entity could be “Customer” by customer ID but could
equally be “DiskBlock” by filename and offset in an infrastructure
domain. The identity of each entity in the system needs to be clear so
the model can be accessed efficiently. If for each interaction with a
model we waste precious cycles trying to find our entity as a starting
point, then other optimisations can become almost irrelevant. Make
identity explicit in your model and, if necessary, index entities by
their identity so you can efficiently enter the model for each
interaction.
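One way to sketch this for the “DiskBlock” example is an immutable identity type used as a map key. The class names, fields, and file name below are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BlockId:
    """Explicit identity for a disk block: file name plus offset."""
    filename: str
    offset: int


@dataclass
class DiskBlock:
    block_id: BlockId
    data: bytes


class BlockIndex:
    """Indexes entities by identity so entering the model is one lookup."""

    def __init__(self) -> None:
        self._blocks: dict[BlockId, DiskBlock] = {}

    def add(self, block: DiskBlock) -> None:
        self._blocks[block.block_id] = block

    def find(self, filename: str, offset: int) -> Optional[DiskBlock]:
        # frozen=True makes BlockId hashable, so it can be used as a dict key.
        return self._blocks.get(BlockId(filename, offset))


index = BlockIndex()
index.add(DiskBlock(BlockId("journal.dat", 4096), b"payload"))
block = index.find("journal.dat", 4096)
```

Because the identity is a value type, two lookups for the same file name and offset always land on the same entity without any searching.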
Refine as we learn
It is also important to keep refining a model as we learn. If the model
grows as a series of extensions without refining and distilling, then
we end up with a spaghetti mess that is very difficult to manage when
trying to achieve predictable performance. Never mind how difficult it
is to maintain and support. Every day we learn new things. Reflect this
in the model and keep it up to date.
Implement no more, but also no less, than what is needed!
The fastest code is code that just does what is needed and no more.
Perform the instructions to complete the task and no more. Really fast
code is normally not a weird mess of bit-shifting and compiler tricks.
It is best to start with something clean and elegant. Then measure to
see if you are within performance targets. So often this will be
sufficient. Sometimes performance will be a surprise. You then need to
apply science to test and measure before jumping to conclusions. A
profiler will often tell you where the time is being taken. Once the
basic modelling mistakes and assumptions have been corrected, it usually
takes just a little mechanical sympathy
to reach the performance goal. Unused code is waste. Try not to
create it. If you happen to create some, then remove it from your
codebase as soon as you notice it.
Conclusion
When non-functional requirements, such as performance and availability,
are critical to success, I’ve found the most important thing is to get
the model correct for the domain at all levels. That is, take the
principles of DDD and make sure your code is an appropriate reflection
of each domain. Be that the domain of business applications, or the
domain of interactions with infrastructure, I’ve found modelling is
everything.
From http://mechanical-sympathy.blogspot.com/2011/09/modelling-is-everything.html