Big Code: The Ultimate Challenge of Software Engineering (Part 5)

Are the measures that we proposed in the previous posts in this series really enough to solve the Big Code problem?

Can our proposed measures help to solve the Big Code problem?

The questions at the beginning of this series all stem from the fact that the meaning of requirements, code, tests, and other constituents of software engineering is implicit. Why do such questions arise? Because deciphering the meaning of code takes time, and even comments are not enough. Often, we don't know which components participate in the business logic concerned, as a comment relates only to a specific part of the code in a specific file. Why can't we explicitly maintain a semantic model, along with use cases (as particular instances of the model), for both domain and programming logic? Developers could write code against the domain semantic model and its use cases and verify that the code corresponds to them. In turn, QA could verify the same, but treating the application as a black box. For example, the meaning of the getPlanetDiameter() function can be exposed as a "diameter {of} planet" identifier with the "diameter {of} Jupiter {is equal to} 139822 {units of} km" use case. To verify this, we need to check whether an application can answer "What is the diameter of the planet?" That question covers the use case (here, the base meaning and the use case are evident; however, this may not be so in vaguer situations with many conditions).
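As a rough illustration of the idea, the sketch below registers a function under a semantic identifier and then checks a use case as a black box. The `semantic` decorator, the registry, and `check_use_case` are hypothetical helpers invented for this example, not part of any existing tool; only the identifier, the function name, and the Jupiter value come from the text above.

```python
# Hypothetical registry mapping semantic identifiers to functions.
SEMANTIC_REGISTRY = {}


def semantic(identifier):
    """Register a function under a semantic identifier (illustrative helper)."""
    def decorator(func):
        SEMANTIC_REGISTRY[identifier] = func
        return func
    return decorator


# Sample data for illustration; the value is the one quoted in the article.
PLANET_DIAMETERS_KM = {"Jupiter": 139822}


@semantic("diameter {of} planet")
def getPlanetDiameter(planet):
    """Return the diameter of the given planet in km."""
    return PLANET_DIAMETERS_KM[planet]


def check_use_case(identifier, subject, expected):
    """Black-box check: find the code by its meaning, then compare the answer."""
    func = SEMANTIC_REGISTRY[identifier]
    return func(subject) == expected


if __name__ == "__main__":
    # Use case: diameter {of} Jupiter {is equal to} 139822 {units of} km
    print(check_use_case("diameter {of} planet", "Jupiter", 139822))
```

The point of the design is that the check never names getPlanetDiameter() directly: a developer or a tester only needs the meaning, and the registry resolves it to code.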

This implies a kind of semantic interface to applications, one that can be implemented more easily than a UI or CLI. It requires additional steps compared to a mere API, but this will pay off later. APIs often require unnecessary cognitive effort (because they are not documented well). Admittedly, a semantic interface can be complex, too; imagine the meaning of the getPlanet(orbitalPeriod, numOfSatellites, radius, mass, yearOfDiscovery) function. It may sound like "What is the planet with {_ orbital period}, {_ number of satellites}, {_ radius}, {_ mass}, {_ year of discovery}?" This is not far from the API definition and is even more verbose, right? However, it brings more important benefits: each parameter name can be replaced with a similar one ("number of moons" instead of "number of satellites") or with an inferred one ("diameter" instead of "radius"), not to mention that parameter names are now spelled out as grammatically correct natural-language identifiers. An even more important benefit is that a semantic interface takes almost no additional effort to use once you understand the markup syntax and mechanics. The learning curve for any semantic interface would include only the effort for new terms and possibly new relations, but this is an effort that we make for any new domain.
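The substitution of similar and inferred terms described above could be sketched roughly as follows. Everything here is an assumption made for illustration: the synonym table, the inference rule (diameter is twice the radius), the `find_planets` function, and the two made-up planets "A" and "B" with invented values.

```python
# Hypothetical synonym table: alternative term -> canonical parameter name.
SYNONYMS = {
    "number of moons": "number of satellites",
}

# Hypothetical inference rules: term -> (canonical name, value conversion).
INFERRED = {
    "diameter": ("radius", lambda diameter: diameter / 2),
}

# Made-up sample data, for illustration only.
PLANETS = [
    {"name": "A", "number of satellites": 2, "radius": 10},
    {"name": "B", "number of satellites": 0, "radius": 5},
]


def normalize(criteria):
    """Translate similar or inferred terms into canonical parameter names."""
    normalized = {}
    for term, value in criteria.items():
        if term in INFERRED:
            canonical, convert = INFERRED[term]
            normalized[canonical] = convert(value)
        else:
            normalized[SYNONYMS.get(term, term)] = value
    return normalized


def find_planets(criteria):
    """Answer 'What is the planet with ...?' for any subset of known terms."""
    wanted = normalize(criteria)
    return [
        planet["name"]
        for planet in PLANETS
        if all(planet.get(key) == value for key, value in wanted.items())
    ]


if __name__ == "__main__":
    print(find_planets({"number of moons": 2}))
    print(find_planets({"diameter": 10}))
```

Unlike a positional call to getPlanet(...), the caller supplies only the terms they know, in their own vocabulary, and the mapping to parameters happens in one place.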

This implies that we need to build semantic models for everything (data, code, communication, etc.). We should not wait for AI to do this for us but learn to express meaning ourselves. The human mind in particular (and abstraction in general) is quite flexible: gigabytes of data may be described with two or three words. That is only one small step, but even two or three meaningful words are better than nothing. Take communication, for example. Have you never encountered a situation in which some topic is discussed in a multi-page thread (or a long tree of questions and answers) and someone says on page 234, "But we discussed this; see above"? Need we explain how long it may take to go through all these pages and find the answer? How are semantics applicable here? Like any application, any communication represents a semantic model. That is, by discussing, we "develop" this model, adding or changing a part of the meaning. We should learn to build semantic models, because it is precisely these models that may address the Big Code questions stated above (to some degree).

All these measures cannot completely resolve the Big Code problem. It may become sufficiently manageable, but it will always be Big because the universe is always bigger than any human activity. Ideally, resolving Big Code means bug-free code, which is utopian (unless we restrict requirements). Why do bugs happen at all? Mostly because some factor or use case was not considered during design, implementation, or verification. Can we consider all factors and use cases? Theoretically, no, as sometimes we don't even know about them in advance. Practically, we can at least try to do this for the factors and use cases considered by design. This is not an easy task, as an application may include thousands of factors and use cases with thousands of dependencies. Even then, considering all combinations of factors and use cases can be complicated, as the number of combinations may be very high. In other words, can we fully automate tests ("fully" meaning the creation of tests from a semantic model by automatically combining factors and use cases)? The proposed approach of defining semantics implies formalizing the affected factors and use cases, which may be followed by automation. But this is a different story.
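To make the combinatorial concern concrete, here is a minimal sketch of generating test cases by combining formalized factors. The factor names and values are hypothetical, and a real semantic model would also need to express dependencies between factors, which this sketch ignores; it uses only the standard library's `itertools.product`.

```python
from itertools import product

# Hypothetical factors and their possible values, as a semantic model
# might formalize them.
FACTORS = {
    "planet": ["Jupiter", "Mars"],
    "unit": ["km", "mi"],
    "precision": [0, 2],
}


def generate_cases(factors):
    """Build one test case per combination of factor values."""
    names = list(factors)
    return [dict(zip(names, combo)) for combo in product(*factors.values())]


if __name__ == "__main__":
    cases = generate_cases(FACTORS)
    # The count is the product of the value counts: 2 * 2 * 2 = 8.
    print(len(cases))
    print(cases[0])
```

Each added factor multiplies the number of cases, which is why exhaustive combination quickly becomes impractical for applications with thousands of factors.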
