Critical Overview of Software Thought Leadership
Critical analysis of software thought leadership from Google, Facebook, Amazon, Twitter, etc.
Software is a formal discipline (it has numerous links to logic and the foundations of mathematics), and yet the industry is saturated with cargo cult practices. Most companies are not Google, Amazon, Twitter, Facebook, Netflix, etc., and they never will be, but a lot of programmers uncritically internalize and advocate for practices developed at those companies. I'd like to critically analyze some of the thought leadership and technology that has come out of those companies and explain how its uncritical adoption has been a net negative for the software industry.
Netflix - Chaos Engineering
Netflix is famous for seemingly bucking the trend when it comes to engineering practices and theories. If you search for "netflix engineering/work culture", you will find slide decks outlining their work culture and how they famously don't put up with "low" performing team members.
I don't know how they decide who is a "low" performer and who is a "high" performer. I suspect a lot of long work hours are involved if you want to be considered a "high" performer. This might appeal to you, or you might find that after a few months you are burned out even though they're technically paying you $500k+ for all your effort.
My main issue with their philosophy is that no one else is paying their engineers ridiculous sums of money to manage thousands of servers. Money is a great incentive for a lot of people, and the type of people it attracts might actually enjoy the "high" performance culture. Netflix probably also needs all the senior engineers it can get for the scale it operates at, and since senior engineers are generally more skilled, they tend to demand higher salaries. So the practices Netflix advocates might not apply to other engineering teams operating at much smaller scales and with fewer senior engineers. Chaos engineering requires a great deal of discipline and experience because fire/chaos drills are not easy to practice consistently. Systems fail on their own, and most engineering teams have a hard time keeping up with regular failures alone.
So most engineers work at companies that don't have the necessary prerequisites for chaos engineering. The underlying business (streaming media) and engineering incentives (any downtime is very bad) are just not there for most companies. Most companies are not streaming terabytes of video, and they can afford some downtime every now and then. Every minute Netflix is down is probably a million dollars of lost revenue, which is not the case for most other companies. I'm exaggerating a little, but they pay their engineers so much because they can't afford not to. Every minute of uptime is real money going into Netflix's coffers, and they're willing to give some percentage of that to "high" performing engineers to make sure Netflix continues to make as much money as possible. Chaos engineering is how they keep themselves honest when building systems that print millions of dollars every hour.
Can people benefit from adopting chaos engineering practices? Certainly. It doesn't hurt to engineer resilient systems. It also doesn't hurt to use practices that don't require so much human capital (senior engineers) and discipline. I've gotten far more benefit out of using programming languages with static or gradual type systems than out of chaos engineering principles. Do I think about failure modes? Yes. Do I go out of my way to build tools that inject chaos into my own software systems? No. I just don't have the time or resources, and I've never worked anywhere that did. So, for most engineering teams, learning about better tools and how to use them to engineer more resilient systems is going to be a much better investment of time than figuring out how to run chaos/fire drills.
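To make "injecting chaos" concrete at the smallest possible scale, here is a minimal Python sketch of a fault-injection decorator paired with a retry wrapper. All names (`inject_faults`, `with_retries`, `fetch_profile`) and the 30% failure rate are hypothetical, chosen for illustration only. Real chaos engineering tooling such as Netflix's Chaos Monkey works at the infrastructure level (terminating production instances), not on individual function calls, so treat this as an illustration of the idea rather than of how any particular company does it.

```python
import random
from functools import wraps


class InjectedFault(Exception):
    """Raised by the (hypothetical) fault injector in place of a real failure."""


def inject_faults(failure_rate: float, rng: random.Random):
    """Wrap a function so that each call fails with probability failure_rate."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            if rng.random() < failure_rate:
                raise InjectedFault(f"injected failure in {fn.__name__}")
            return fn(*args, **kwargs)
        return wrapper
    return decorator


def with_retries(fn, attempts: int):
    """Call fn up to `attempts` times, re-raising the last InjectedFault."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for _ in range(attempts):
        try:
            return fn()
        except InjectedFault as err:
            last_error = err
    raise last_error


# Seeded generator so a drill can be replayed exactly.
rng = random.Random(42)


@inject_faults(failure_rate=0.3, rng=rng)
def fetch_profile():
    # Hypothetical stand-in for a call to some downstream service.
    return {"user": "alice"}


# The "drill": make many calls and count how many still fail after retries.
failures = 0
for _ in range(1000):
    try:
        with_retries(fetch_profile, attempts=3)
    except InjectedFault:
        failures += 1
print(f"{failures} of 1000 calls failed even with 3 retries")
```

With independent 30% failures, all three attempts fail with probability 0.3³ = 2.7%, so the drill should report roughly 27 failures out of 1000. Even this toy version shows the point made above: the injector itself is easy, while deciding what to inject, where, and how to act on the results is where the discipline and experience come in.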
Google - Big Data
I can't imagine the scale Google operates at. Their scale is so large that they can consistently throw away good products and still stay in business. They famously killed their RSS reader, Google Reader, which at the time was consistently getting favorable reviews.
People rightfully got upset, but Google didn't care. They probably crunched some internal numbers and decided it just wasn't worth operating and maintaining an RSS reader at their scale. Despite the outrage, Google continued to print money with their other business ventures.
Google operates at such a large scale that they can just throw away good products that people like. Anything that makes less than a million dollars is just not worth it to them, so everything they do is "BIG" relative to other companies. Most companies can't throw away products that make less than a million dollars a year and continue to survive. It's no wonder, then, that Google champions engineering approaches involving "BIG data": they themselves actually operate at a very "BIG" scale. People call Google a juggernaut, and it's a valid analogy. No other company has such an outsized effect on the internet, and no other company so consistently and inadvertently "stomps" on people by discontinuing products they were using and were happy with.
So it's a safe bet that most people are not operating at Google scale and are not collecting and analyzing Google-scale data. I can say with pretty high certainty that any random company you point me to will not have "BIG data". It will have regular data that fits on a single large VM with a few hundred GB of RAM.
That's it. That's their scale: a single virtual machine with a lot of RAM. So when these smaller-scale businesses go looking for engineering practices built around having so much data that it doesn't fit on a single server, they will invariably run into a lot of issues trying to contort their problems into frameworks that were not built for their scale.
The contortions will involve deploying monstrosities like Hadoop/HDFS, Spark, Flume, Pig, Kafka, Samza, etc., all the while forgetting that their dataset fits on a large virtual machine. What they really needed was a few ETL processes, but what they got instead was a distributed system with all the attendant headaches and costs. Instead of printing money like Google, they are now burning it because they adopted practices designed for unimaginable scale and deployed software systems that need the kind of resources only available when you operate at Google's scale.
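For contrast with a distributed pipeline, here is a minimal sketch of a single-machine ETL job using only Python's standard `csv` and `sqlite3` modules. The CSV contents, column names, and table layout are hypothetical; the point is only the shape of the job: extract rows, transform them, load them into a local database, and query it.

```python
import csv
import io
import sqlite3

# Hypothetical raw export; in practice this would be a file on the VM's disk.
RAW_CSV = """order_id,customer,amount_usd,status
1,acme,120.50,paid
2,globex,80.00,refunded
3,acme,42.25,paid
4,initech,300.00,paid
"""


def extract(text: str) -> list[dict[str, str]]:
    """Read CSV rows into dictionaries keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows: list[dict[str, str]]) -> list[tuple[int, str, int]]:
    """Keep paid orders only and convert dollar amounts to integer cents."""
    return [
        (int(r["order_id"]), r["customer"], round(float(r["amount_usd"]) * 100))
        for r in rows
        if r["status"] == "paid"
    ]


def load(conn: sqlite3.Connection, records: list[tuple[int, str, int]]) -> None:
    """Create the target table and bulk-insert the transformed records."""
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, transform(extract(RAW_CSV)))
totals = conn.execute(
    "SELECT customer, SUM(amount_cents) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
print(totals)  # [('acme', 16275), ('initech', 30000)]
```

Swapping `":memory:"` for a file path gives a persistent database on the same VM. Whether SQLite (or any other single-node database) is fast enough for a particular workload is something to measure rather than assume, but the operational surface is one process and one file, not a cluster.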
Facebook - Moving Fast and Breaking Things
When your site consists of people posting comments and pictures, you can afford to lose a few comments and pictures because people will just repost them. Facebook is not a safety-critical system, and in the early days they prioritized growth at the expense of everything else because they knew people would keep using the site even if a few pictures got lost.
Notice the common pattern here: just like Netflix and Google, Facebook's business incentives determined its early engineering practices. Maybe the engineers wanted to move fast and break things as well, but Facebook's business model also required a lot of experimentation to figure out what a large number of people would like. If a few users had a bad experience because some server somewhere was running buggy code, it wasn't a problem as long as those users enjoyed everything else on the site enough to stick around.
Shipping barely functional code is a viable business strategy only if your business model depends on having lots of things people like, even if some of those things are currently unpleasant for some subset of users. Some software can operate that way (people will use it even if a few things are broken), so it's fine to pile on technical debt to accomplish a business goal, but at some point that stops working. Now that Facebook is an established company, that technical debt is starting to catch up with it. The activity around its security breaches is a good example of what happens when you prioritize growth at the expense of everything else: badly engineered systems that no one understands are easier to hack and game.
So just don't do what Facebook did. Don't prioritize growth at the expense of everything else. I'm pretty sure making software the way Facebook made software is borderline unethical. Well, I think it would be if there were an engineering code of ethics and conduct, and I doubt any of Facebook's engineering practices would make the cut under such a code.
Amazon, Twitter, etc. - Microservice All the Things!!!
People who have managed a large enough software system with a sprawling mess of dependencies have, in effect, already gotten a taste of what it means to operate with microservices. There is an irreducible amount of complexity in any software system. The complexity can be moved around, but it can't be hidden. It's like a physical conservation law: you can't create or destroy energy; all you can do is transfer it from one form to another.
The trade-off you're making with microservices is more dependencies in exchange for potentially more velocity in changing those dependencies. In practice, what tends to happen is that each microservice starts to depend on internal implementation details of the other microservices, and you end up with a distributed monolith. I have yet to see a successful microservice deployment. If you know of one, let me know, because I don't want to be unfair to any software methodology that actually works.
There is no substitute for thinking through business requirements and figuring out which new technologies actually address them. Think through the complications of introducing the latest and greatest technologies buoyed by marketing, and make sure you know what cost you're actually paying to be on the bleeding edge. Do try to simplify as much as possible, because there is no substitute for a simple, well-designed software system. At the end of the day, software is a means to an end, and we should always focus on human outcomes over overhyped software practices.
Published at DZone with permission of David Karapetyan, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.