
Big Code: The Ultimate Challenge of Software Engineering (Part 4)

In Part 4 of this series on the concept of big code, let's continue brainstorming by looking at its mechanics and ecosystem.


Brainstorming Big Code: Mechanics

Meaning mechanics, like meaning itself, start from similarity. "is-a" is a relation of similarity, which has several sub-relations like "is an instance of," "is a type of," "abstracts," "specifies," etc. As we cannot have unique identifiers for everything, we need tools to work with parts of the whole. Therefore, similarity is followed by structuring relations such as "has-a"/"of-a" (with the more specific "has property," "has value," etc.) and by a relation of combining (e.g., between an object and an action or a method, or between different aspects of something). Structuring and combining relations are quite close. For example, "orbit of planet" is an inclusion (where an orbit, as a "static" conception, is assumed to "belong" to a planet), whereas "planet orbits" is a combination (of a thing and an action) with quite similar meaning. These relations underlie abstraction; however, we need more relations to specify other aspects of reality and abstraction. Therefore, they are followed by space-time, cause-effect, abstraction references ("someone said that," or when reality is referred to by a book, which is referred to by a movie), and others. In other words, the first goal of mechanics is addressing the What? Who? What does? Where? When? Why? and How? questions.
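
To make these relation kinds concrete, here is a minimal sketch (in Python, with purely illustrative names, not an existing library) that records them as plain subject-relation-object triples; the "orbit of planet" versus "planet orbits" contrast appears as two triples with close but distinct relations.

```python
# A minimal sketch: meaning as subject-relation-object triples.
# All names here are illustrative, not a real library.
triples = [
    ("Jupiter", "is instance of", "planet"),          # similarity ("is-a" family)
    ("planet", "is a type of", "astronomical body"),  # similarity, more general
    ("orbit", "of", "planet"),                        # structuring ("has-a"/"of-a")
    ("planet", "combines with", "orbits"),            # combining a thing with an action
]

def answer(question_relation, term):
    """Return everything linked to `term` through `question_relation`."""
    return [
        (s, r, o)
        for (s, r, o) in triples
        if r == question_relation and term in (s, o)
    ]

# "What is Jupiter?" -> [('Jupiter', 'is instance of', 'planet')]
print(answer("is instance of", "Jupiter"))
```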

"is-a" and "has-a" relations mentioned here are different comparing to ones used in programming. In many mainstream programming languages, the "is-a" relation implies a hierarchy of classes with inheritance, as if the whole world is similar to a hierarchy of biological species. Is it so? No, because even biological hierarchies are not strictly hierarchical (as branches may intertwine). Classifications are based on the similarity of instances within one group, but generally, there can be too much similarity criteria. Therefore, the hierarchy may significantly vary, and as users bear in mind too much different similarity criteria, they may not guess which one is used by a specific hierarchy (which can be critical to reach deep levels). "has-a" is too general in programming. For example, Jupiter diameter, moons, and books about are expressed as "fields" of the "Jupiter" object, though they represent very different kinds of "has-a."

Why does it matter? Because we need to express "is-a" and "has-a" relations in many additional cases, which are not covered by programming paradigms. For example, the "is-a" relation can be used for inheritance between "AstronomicalBody" and "Planet"; however, the "Jupiter" object is also a giant planet. Should we use multiple inheritance? Interfaces? Belonging to a "giant planet" class may not involve any additional fields or methods. Moreover, when using similarity, you may realize you need hundreds of such interfaces. Do we need to produce new entities? Hardly. The "has-a" relation is not always expressed in fields/attributes/properties. For example, the getPlanetDiameter() function implies the "planet {has} diameter" meaning, which is expressed only in the function name and not in the programming construct. Of course, it may be expressed as a "planet" class with a getDiameter() function, but it may be redundant to create a separate class for planets in your application. That is, in each specific case, you will be guided by programming experience, whereas semantics should know about any "is-a" and "has-a" relation in the concerned domain.
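
One possible way to bridge this gap, sketched below with hypothetical names (the decorator and registry are not an existing library), is to attach the "diameter {of} planet" meaning to the function as metadata instead of introducing a separate Planet class; semantics then lives alongside the programming construct rather than inside it.

```python
# A sketch of attaching meaning to a function without creating a Planet class.
MEANINGS = {}  # maps a declared meaning to the callable that provides it

def means(meaning):
    """Register `meaning` (e.g. 'diameter {of} planet') for the decorated function."""
    def decorator(func):
        MEANINGS[meaning] = func
        return func
    return decorator

@means("diameter {of} planet")
def getPlanetDiameter(name):
    # Toy data; a real application would query its own model or storage.
    return {"Jupiter": 139_820, "Mars": 6_779}.get(name)  # kilometres

# Code (or another tool) can now discover the provider by meaning, not by name:
provider = MEANINGS["diameter {of} planet"]
print(provider("Jupiter"))  # 139820
```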

Of course, even though semantic markup is interesting by itself (as a tool for disambiguation), it becomes even more powerful when participating in inference. Unfortunately, modern rule/inference engines are quite heavyweight and sometimes claim to make "universal" inferences based on worldwide databases of facts. Development goals are often not so ambitious, as we need to operate with a domain or even a small part of a domain (say, for a microservice). This requires a simple and lightweight way to operate with "semantic arithmetic" (preferably locally and within predictable time), similar to Boolean logic. Such arithmetic will probably become a part of programming languages in the near future, so let's consider some simple examples (a small code sketch follows the list):

  • If "Jupiter {is instance of} planet" and "planet {is} celestial body", then "What is planet?" should return "Jupiter" and "celestial body" answers.

  • If "Jupiter {is instance of} planet" and "astronomical body {is type of} planet", then "Jupiter {is} planet".

  • If "Jupiter {is} planet" and getPlanetDiameter() function returns "diameter {of} planet", then this function may be queried with "diameter {of} Jupiter".

Brainstorming Big Code: Ecosystem

How will these principles be implemented in a computer environment?

Explicit system-wide semantics implies that semantics should be used and interpreted similarly by different applications. Application-wide mechanisms like tags are efficient only within the boundaries of a specific application. To link different layers of software engineering, we need a cross-boundary mechanism. Covert mechanics (as in search engines) or barely conceivable ones (as with deep learning and AI) are even less efficient outside the corresponding tools.

Semantic links and semantic space imply mechanics similar to search engine queries. However, unlike a search engine, a semantic space can operate with slightly more precise meaning. For example, "astronomical body" may result in just a few links in your environment (instead of millions): to the AstronomicalBody class, to the "Astronomical object" article in Wikipedia, and to a local document with the "Astronomical Body" chapter. Such a semantic space may replace local file systems and the web with a more human-friendly environment.
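
A semantic space of this kind could be as simple as a local index from meanings to typed links; the sketch below (hypothetical names and entries) returns the three kinds of results mentioned above for "astronomical body".

```python
# A sketch of a local "semantic space": a meaning maps to a few typed links
# instead of millions of search hits. Entries here are illustrative only.
semantic_space = {
    "astronomical body": [
        ("class", "AstronomicalBody"),                                 # code
        ("web", "https://en.wikipedia.org/wiki/Astronomical_object"),  # the web
        ("document", "notes.odt#Astronomical Body"),                   # local file chapter
    ],
}

def links_for(meaning):
    """Return the few precise links registered for a meaning."""
    return semantic_space.get(meaning, [])

for kind, target in links_for("astronomical body"):
    print(f"{kind}: {target}")
```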

Content-driven usage implies changes in our modus operandi at the computer. What is our first thought when we need to retrieve some information from a computer? First, it's which application to use, then it's where the resource with the information is located, and only then do we finally reach the content. What if we turn this scheme upside down? Reaching content should imply that, behind the scenes, a corresponding resource will be involved with the help of some application. File- and application-driven usage was the only possible way to work with computers many years ago, when they didn't have enough power for anything different. Now computers have more power, but nobody wants to change the status quo because it is really hard to fight stereotypes. Sooner or later, though, we will need to go this way.

Ubiquitous servicing implies a continuation of the service-oriented architecture and microservices trends, but on a larger scale. What if any element of the computer environment (applications, services, frameworks, files, databases, even a phrase of text) could act as a service? For example, an astronomy application, a database, or a file with planet data could expose the "diameter {of} planet" meaning. This may simplify access to data and code, as you could reach them without developing additional graphical or command-line interfaces.
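
In such an environment, exposing a meaning could look like the hypothetical sketch below: an application and a plain file both register what they can answer, and a caller reaches the data without a dedicated GUI or CLI. Names, registry, and data are assumptions for illustration only.

```python
# A sketch of "ubiquitous servicing": very different resources expose the same
# meaning, and a caller picks any provider.
providers = []  # (meaning, callable) pairs registered by applications, files, databases

def expose(meaning, func):
    providers.append((meaning, func))

# An astronomy application exposing the meaning directly from its code:
expose("diameter {of} planet", lambda name: {"Jupiter": 139_820}.get(name))

# A plain CSV-like file exposing the same meaning through a tiny adapter:
PLANETS_CSV = "Jupiter,139820\nMars,6779"
def diameter_from_file(name):
    for line in PLANETS_CSV.splitlines():
        planet, diameter = line.split(",")
        if planet == name:
            return int(diameter)
expose("diameter {of} planet", diameter_from_file)

def ask(meaning, *args):
    """Use the first provider that exposes the requested meaning."""
    for declared, func in providers:
        if declared == meaning:
            return func(*args)

print(ask("diameter {of} planet", "Jupiter"))  # 139820
```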

Lightweight tooling implies mostly local tools executing tasks within a finite time. From a development point of view, we would prefer something doable to something great but insurmountable. We would prefer small domains inside applications or files that can be used by users and other apps for the foreseeable future. We would prefer lightweight formats based on natural language (as all identifiers and relations consist of natural language words) rather than heavyweight Semantic Web standards. We would prefer simple inference tools implementable with small libraries. We would prefer small portions of information to the huge information heaps that many semantic tools try to provide.

Self-explanatory applications imply that software should make clear why it behaves a certain way. If we request some operation, the software may ask for missing items or point out what isn't configured correctly. For example, if we want to know the diameter of Jupiter, the software may advise installing a missing service for retrieving planet data. Or, if we already have an application with such data but this item is disabled, it may explain that this is because a specific option is set to false.
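
Such self-explanatory behaviour might be as simple as the hypothetical sketch below: before failing, the application checks what it would need to answer the request and reports the missing service or the disabling option. Service and option names are assumptions for illustration.

```python
# A sketch of a self-explanatory request: instead of a bare failure, the
# application says which service is missing or which option disables it.
installed_services = set()             # e.g. {"planet-data"}
options = {"planet_data.enabled": False}

def explain_diameter_request(planet):
    if "planet-data" not in installed_services:
        return f"Cannot answer 'diameter {{of}} {planet}': install a planet-data service first."
    if not options.get("planet_data.enabled", True):
        return f"Cannot answer 'diameter {{of}} {planet}': 'planet_data.enabled' is set to false."
    return "OK: request can be served."

print(explain_diameter_request("Jupiter"))
```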
