Big Code: The Ultimate Challenge of Software Engineering (Part 1)
Big Code was born of Big Data, big applications, and big communities. Both Big Data and Big Code were born of complexity, and complexity can't be fully eliminated by any technology.
How do you imagine the future of software engineering? Some dream about a future in which code will be written directly from your mind, where there won't be any programming at all because code will be written by AI, where everyone will be a software engineer and programming will be a routine activity. But full-fledged mind scanning, artificial intelligence, and universal programming literacy (if these are even possible) won't become reality anytime soon. On the other hand, we cannot say everyone is an accountant just because they have money and work to a budget. Accounting is more than that, and such a statement depends on the level of complexity we consider.
Complexity defines everything we do, especially today, when it is rising in all areas of computing. This is both a challenge and an imperative. Data and code became big because of Internet-connected sites, data stores, and applications. Applications became big because they must anticipate every possible use case for any possible user. Communication became big because of the white noise generated every minute worldwide. The development community became big, too, because development teams are constantly changing and are dispersed worldwide.
Code became big because of all of these factors. Some typical questions that arise about Big Code are:
What does this code do?
What do you do if you understand what this code does but do not understand why?
How do you understand which use cases are affected by code and which code is affected by a certain use case?
Can we measure coverage of use cases by code?
How do you find correspondences between requirements and code?
Can you measure coverage of requirements by code?
How does a team reuse code efficiently (since it is not always possible to notify everyone, especially future team members, that reusable code exists)?
How do you gather logic that is scattered across files, layers, and services?
Is it possible to make unit testing even more useful without making it a bothersome activity?
How do you make commenting reasonable and useful — not "nice to have" but with real benefits?
Is self-documenting code effective? What if a name is wrong or misleading?
Do we need commenting at all (with self-documenting or other alternatives)?
How do you link logging and code in the most efficient and simple way?
How do you find bug dependencies on previous issues if someone forgot to link tickets?
Are version control tools enough for tracking changes (e.g., when you need to know when and why a line was added)?
How do you support correspondence between changes in code and comments/documentation?
Is it possible to expose features without risks for usability?
Is it possible to satisfy contradicting demands of all customers in one application?
Is bug-free code possible?
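To make one pair of these questions concrete (which code is affected by a certain use case, and can we measure coverage of use cases by code?), here is a minimal sketch of one possible approach: tag functions with the use cases they implement, so the mapping becomes queryable instead of tribal knowledge. Every name here (`use_case`, `USE_CASE_INDEX`, the `UC-n` ids) is hypothetical, invented for illustration.

```python
from collections import defaultdict

# Hypothetical registry: use-case id -> names of functions implementing it.
USE_CASE_INDEX = defaultdict(list)

def use_case(*ids):
    """Decorator that registers a function under one or more use-case ids."""
    def decorator(func):
        for uc in ids:
            USE_CASE_INDEX[uc].append(func.__name__)
        return func
    return decorator

@use_case("UC-1")
def create_order():
    ...

@use_case("UC-1", "UC-2")
def send_confirmation():
    ...

def code_for(uc_id):
    """Which code is affected by a certain use case?"""
    return USE_CASE_INDEX.get(uc_id, [])

def uncovered(known_use_cases):
    """Which known use cases have no code registered at all?"""
    return [uc for uc in known_use_cases if uc not in USE_CASE_INDEX]
```

With this, `code_for("UC-1")` lists both functions above, and `uncovered(["UC-1", "UC-3"])` flags `UC-3` as having no implementation — a crude but automated answer to the coverage question.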
Can these questions be addressed with extremely efficient programming paradigms? With exciting programming languages? With good architecture? With appropriate code conventions? With a respected development culture? Big Code problems are only partially caused by the way we write code and by imperfect tooling. Moreover, most paradigms, languages, technologies, and techniques focus on code consistency, whereas Big Code problems are about code completeness (the volume and complexity of code logic). It does not matter which paradigm or language you choose: code can always grow beyond intelligible limits and, at some point, you just won't be able to embrace the complete mechanics of your application.
However, Big Code problems partially come from the nature of cognition. They arise from our inability to grasp information beyond a certain volume and complexity, at which point it becomes vast, vague, uncertain, inconsistent, and fallible. Big Data and Big Code were both born of the numerous aspects and use cases that must be considered by any specific algorithm. Both are born of complexity, and complexity cannot be fully eliminated by any technology. We can't just ignore some factors or use cases without a loss of accuracy. We can break code into smaller parts, but those parts must be integrated somewhere. We can push complexity to other layers, but we cannot avoid it.
Can the above-mentioned questions be handled in principle? Evidently, yes, because we address all of them somehow at some point. Do we need an enhanced (automated) solution for them? Yes, because our minds cannot cover information above a certain volume. Why are these questions raised at all? Because code may be quite cryptic (with hard-to-decode abbreviations and "obvious" parts). Because information may be spread across tools and emails, or not be recorded at all. Because even good conventions and development culture are not respected fully (for reasons ranging from lack of time to negligence). Because what should stay synchronized becomes more and more desynchronized. Is this the problem of one "bad guy"? No: it is the problem of the entire team. Blaming a "bad guy" may bring you satisfaction, but it does not resolve the problem. Just compare projects that have automated some aspect of software engineering (such as unit testing or continuous integration): you don't have to preach good coding practices when they are enforced explicitly by the corresponding technology.
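The point about enforcing conventions by technology rather than by preaching can be shown with a toy check (not from the article; `missing_docstrings` and the sample code are invented): a CI step that fails the build when a public function lacks a docstring turns a culture rule into an automated guarantee.

```python
import ast

def missing_docstrings(source: str):
    """Return names of public functions defined without a docstring."""
    tree = ast.parse(source)
    offenders = []
    for node in ast.walk(tree):
        # Public functions only: skip names starting with an underscore.
        if isinstance(node, ast.FunctionDef) and not node.name.startswith("_"):
            if ast.get_docstring(node) is None:
                offenders.append(node.name)
    return offenders

sample = '''
def documented():
    """Has a docstring."""

def undocumented():
    pass
'''
```

Running `missing_docstrings(sample)` reports `undocumented`; wired into a build pipeline, the convention no longer depends on anyone remembering it.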
Is Big Code challenging enough? All mainstream paradigms were born long ago, and there is a reason for that: abstraction, as an activity that imitates reality, is limited. It is quite improbable that we will see many new paradigms in the coming years, whereas old ones continue to merge. The long story of the object-oriented vs. functional opposition came to an end and resulted in a multi-paradigm approach (as both imitate the inseparable space-time dualism of the universe). Could a mythical new paradigm propose a kind of abstraction better than objects? First, remember Occam's razor. Second, remember that objects replaced structs because the former combine both data and code (methods), whereas the latter contain only data. Therefore, a mythical new abstraction would have to propose some totally new principle that was unnoticed, unknown, and insignificant before. In the outer world, we can't find anything like that, as space-time dualism is the current representation. Quantum mechanics, maybe? But it does not replace Newtonian laws (which apply more often in our daily lives); it is used in parallel with them. Therefore, quantum computing will most probably operate in parallel with traditional software and paradigms. In the abstract world, the only thing that comes to mind is aspects, but the aspect-oriented paradigm is not widespread (though perhaps there is simply no good enough implementation yet). Could some technology be more challenging? The most challenging for now is AI. But the problem is that it prefers a black-box approach, which prevents widespread usage in software engineering (which is built on freely distributed paradigms that millions of developers can reproduce in home labs).
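The struct-vs-object contrast above fits in a few lines. This is a minimal illustrative sketch (the `AccountRecord` and `Account` names are invented): a struct-like record holds state only, while an object binds that state to the behavior that operates on it.

```python
from dataclasses import dataclass

@dataclass
class AccountRecord:
    """Struct-like: data only; any behavior lives elsewhere."""
    owner: str
    balance: float

class Account:
    """Object: data plus the code (methods) that manipulates it."""
    def __init__(self, owner: str, balance: float = 0.0):
        self.owner = owner
        self.balance = balance

    def deposit(self, amount: float) -> float:
        self.balance += amount
        return self.balance
```

With the struct, every caller must know how to update a balance correctly; with the object, that knowledge is packaged once, next to the data it governs — which is exactly why objects displaced structs as the dominant abstraction.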
That's it for Part 1. Stay tuned for Part 2, where we'll discuss how Big Code can be addressed and imagined today.
Opinions expressed by DZone contributors are their own.