The Next User Interface: Why, How, and When?
Natural language recognition can solve some problems by allowing flexible commands to be executed quickly, but are modern apps ready for that?
Will the future of UI be based on natural language or on virtual and augmented reality? Full understanding of natural language is not possible right now (and will barely be possible in the coming years). Virtual reality forces a user to be fully isolated from reality, which is not always acceptable, and augmented reality is essentially a GUI merged with reality rather than a new sort of UI. The user interface has its origins in human nature, namely in our senses and thinking. This dualism manifests today as command-line UI versus graphical UI and could continue in the future as natural-language UI versus virtual/augmented-reality UI. Is something else possible? Do we need something else?
Yes. Existing UIs have certain flaws. Command-line interfaces have cognitive shortcomings: each command and application introduces a mini-language, sometimes with a significant learning curve, and it is hard to estimate a command's or application's abilities because they are hidden behind long descriptions. GUIs tried to overcome this through visual representation, where all features are visible and exposed, but they simultaneously created new flaws of their own, related to the inability to compact everything an application offers into a small visual area.
Therefore, we have to work with long GUI paths (click, open, click, type, and so on) because information must be separated into different windows. Theoretically, the most efficient UI contains a restricted number of elements that a user can keep in mind and easily reach. The more UI elements we have, the less efficient the UI becomes, because the user spends more time finding the required control; the fewer elements we have, the less we can control and expose. We have to balance flexibility, expressiveness, the number of features, and usability.
Natural language recognition theoretically can solve at least some of these problems by allowing flexible commands to be executed as fast as, or even faster than, in a GUI today. But are modern applications ready for that? Today's approaches assume we control the UI with voice by pronouncing GUI paths and file paths to reach controls and information. That is, we get flexible control over an inflexible UI that is not adapted for it.
Why? Possibly because we were assured that semantic technologies and natural language processing would solve all problems through interaction with intelligent agents. Possibly because there is a lack of understanding that the UI itself can and must absorb semantics to become more flexible. In any case, applying semantic-aware technologies to semantic-unaware applications is not very efficient. It is futile to use semantic technologies to convert input into low-level UI commands and then force users to think in terms of the latter. Users should think in terms of their own intentions, as close to natural language as possible, not in terms of GUI and file paths as they do now.
How does natural language map to computer entities (UI controls, files, etc.)? Could it be done automatically by algorithms? Maybe in the future. But why wait? One bridge over this gap could be semantic markup, which, on one hand, uses natural language and, on the other, can be attached to different computer entities. Should we wait until algorithms give reliable results here? Why, if a collaborative effort of developers and advanced users can give faster and more reliable results?
Of course, we need to get users familiar with basic semantics, but that is not a big problem. Semantics is inherent to natural language, and it can be understood and used by ordinary users. Some semantic principles are already present in modern UIs: the relation of similarity ("is") is expressed with file types, the relation of composition ("has") is expressed with directories, and awareness of identification was raised by the usage of search engines, which also forced users to understand the semantics of queries and results. The next step is to make semantics more user-friendly, and that is why markup is needed.
Is there something in common between different computer applications? Imagine we work with information about a planetary system and its planets. Represented with a GUI, the application has windows for the planetary system and for planets. Represented as a command-line application, it has arguments that specify the planetary system and/or planet and define what will be shown. Represented as a database, it consists of two tables linked by a planetary-system id.
Finally, it can be represented as web pages linked with hyperlinks. To see a list of Solar System planets, we use different ways: (a) select a row in the planetary system list and press the "Show planets" button, (b) pass a -planetary-system "Solar System" argument, (c) run a "SELECT * FROM pl_system ps, planets p WHERE ps.id = p.ps_id" query, or (d) open the "Solar System" link. These ways differ from the point of view of computer representation, yet they are all the same from the point of view of the user, as they all correspond to one natural-language question: "What are the planets of the Solar System?"
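The equivalence of these four representations can be sketched in a few lines. This is a minimal illustration, not real application code: the data object, the function name, and the paths in the comments are all hypothetical.

```javascript
// Hypothetical data model: one planetary system and its planets.
const data = {
  "Solar System": [
    "Mercury", "Venus", "Earth", "Mars",
    "Jupiter", "Saturn", "Uranus", "Neptune",
  ],
};

// The single user intention, "what are the planets of the Solar System?",
// expressed once; each UI would translate it differently:
// (a) GUI:  select a row, press "Show planets"
// (b) CLI:  app -planetary-system "Solar System"
// (c) SQL:  SELECT p.* FROM pl_system ps, planets p
//           WHERE ps.id = p.ps_id AND ps.name = 'Solar System'
// (d) Web:  open the "Solar System" hyperlink
function planetsOf(system) {
  return data[system] ?? [];
}

console.log(planetsOf("Solar System").length); // 8
```

The point is that the function body, not the four front-ends, carries the meaning of the question.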
How do we get answers in the cases above? The application and UI should be aware of what relates to a planetary system, what relates to planets, how they are linked, and so on. This is done differently in each case (according to some inner logic and semantics), but for semantic markup we need a slightly different approach: a usage of meaning that is precise enough, simple enough, appropriate, explicit, flexible, gradual, and human-friendly. Additionally, meaning has to affect the UI, and the UI has to affect meaning.
What is the precise enough usage of meaning? Planetary systems should be characterized by these very words, not by a PlanetarySystem class, a PlanetarySystemForm window, a PSystem data structure, a PL_SYSTEM table, or a planetary-system.html file. These identifiers are computer-friendly, but humans need to decode them, and the decoding can be ambiguous (as in the PSystem and PL_SYSTEM cases). Users need human-friendly and sufficiently unique identifiers, which means they could include additional words to discern, say, "planet" (astronomical body) from "Planet" (magazine).
What is the simple-enough usage of meaning? A user may not understand the difference between a class and an instance but does understand the "is" relation from natural language and how things can be similar. Of course, the distinction between class and instance is important, but it can nevertheless be somewhat arbitrary: the Earth can be an instance of a planet class but could also be the base of an Earth-like-planet class, and a list of planets may hold only planet identifiers or full planet instances. That is, "instance" and "class" are roles rather than fixed classification titles. And if a user knows only the "is" relation, the sentence "Earth is a planet" is enough, without the more precise "Earth is an instance of a planet."
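The idea that "is" alone can cover both instance-of and subclass-of can be shown with a tiny fact base. This is a sketch under assumed names; the facts and the function are illustrative, not part of any real library.

```javascript
// Hypothetical "is" facts: the same relation links an instance to a class
// ("Earth is a planet") and a class to a broader class
// ("Earth-like planet is a planet").
const isRelation = [
  ["Earth", "planet"],
  ["Solar System", "planetary system"],
  ["Kepler-452b", "Earth-like planet"],
  ["Earth-like planet", "planet"],
];

// Transitive lookup: does `thing` count as `kind` through chained "is" links?
function isA(thing, kind) {
  if (thing === kind) return true;
  return isRelation
    .filter(([subject]) => subject === thing)
    .some(([, object]) => isA(object, kind));
}

console.log(isA("Kepler-452b", "planet")); // true
```

The user never has to say whether "Kepler-452b" is an instance or a subclass; the chain of "is" links answers the question either way.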
What is the appropriate usage of meaning? At the very least, it implies that identification, abstraction, similarity, and composition matter more than any distinction between noun, verb, subject, predicate, and object. Simply put, it does not matter whether "planet orbits star" or "planet has orbit around star." The distinction between nouns and verbs is required by natural language but, in reality, most nouns and verbs are object-action dualisms or complexes of many objects and actions (moreover, sometimes we use a word as a noun or a verb only to get correct sentence order or a phrase that sounds better). It is more important to identify the participants of a described situation than which parts of speech are used.
What is the explicit usage of meaning? It is the explicit distinction of meaningful identifiers (to discern "buildings of New York" from "new buildings of York" within "new york buildings") and of the relations between them.
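The "new york buildings" ambiguity can be made concrete with explicit grouping. The object shape below is a hypothetical notation invented for this sketch, not an established markup format.

```javascript
// Flat query: which words belong together is unspecified.
const flat = "new york buildings";

// Reading 1: {buildings} of {New York}
const readingA = { head: "buildings", relation: "of", modifier: "New York" };

// Reading 2: {new buildings} of {York}
const readingB = { head: "new buildings", relation: "of", modifier: "York" };

// Once identifiers and relations are explicit, the two readings no longer collide.
console.log(readingA.modifier === readingB.modifier); // false
```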
What is the flexible usage of meaning? The meaning of any application should be ready to be extended with information that was not anticipated by the original design but is required by user intentions.
What is the gradual usage of meaning? We should assume that a user can specify meaning only as fully and as precisely as he or she can for now (describing 1 TB of data with a few words, for example) and extend it later.
What is the human-friendly usage of meaning? Semantic definitions should be as close as possible to natural language to minimize a learning curve.
How can meaning affect UI? Any question is meaning itself, too. For example, "What are planetary systems?" (which could correspond to the short "planetary system" query) consists of an unknown and "planetary system" linked with an "is" relation. This question matches both "A planetary system is a set of gravitationally bound non-stellar objects in orbit around a star or star system" and "The Solar System is a planetary system." To answer the question, an application should correlate the "planetary system" identifier with its own elements. Thus, (a) the GUI application may show a planetary system window, (b) the console application may show a list of planetary systems, (c) the database may output the planetary system table, and (d) a browser may show a planetary system web page.
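The matching described above — a question as a triple with an unknown, correlated against "is" facts — can be sketched directly. All identifiers here are illustrative.

```javascript
// Hypothetical fact base the application exposes about its own elements.
const facts = [
  { subject: "Solar System", relation: "is", object: "planetary system" },
  { subject: "Earth", relation: "is", object: "planet" },
];

// "What are planetary systems?" becomes
// { subject: <unknown>, relation: "is", object: "planetary system" };
// answering means filling the unknown from matching facts.
function answer(question) {
  return facts
    .filter((f) => f.relation === question.relation && f.object === question.object)
    .map((f) => f.subject);
}

console.log(answer({ relation: "is", object: "planetary system" })); // [ 'Solar System' ]
```

Each front-end (window, console listing, table, page) is then just a different rendering of the same answer set.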
In the case of "What are the planets of the Solar System?", a simple inference is required to get "The Solar System is a planetary system." This implies that (a) the GUI application may show the planet window as if "Solar System" were selected in the planetary system one, (b) the console application may work as if the planetary-system argument were filled with the "Solar System" value, (c) the database may transform the question into the "SELECT * FROM pl_system ps, planets p WHERE ps.id = p.ps_id AND ps.name = 'Solar System'" query, and (d) a browser may show the Solar System page.
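The single inference step before dispatch can be sketched for the database case. The fact table, function, and schema names are assumptions made for illustration, and the string concatenation stands in for whatever query builder a real application would use.

```javascript
// Hypothetical "is" facts available to the application.
const isFacts = { "Solar System": "planetary system" };

// Inference first: the argument must be known as a planetary system;
// only then is the question lowered to SQL.
function toSql(question) {
  if (isFacts[question.of] !== "planetary system") {
    throw new Error(`"${question.of}" is not known as a planetary system`);
  }
  return (
    "SELECT * FROM pl_system ps, planets p " +
    `WHERE ps.id = p.ps_id AND ps.name = '${question.of}'`
  );
}

console.log(toSql({ what: "planets", of: "Solar System" }));
```

The same guard would route the GUI and console cases: if the inference fails, the application can explain why instead of silently showing nothing.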
Links between semantics, questions, and answers on one side and parts of GUIs, consoles, databases, and web applications on the other open the possibility of better control of the UI with the help of natural language (or at least with a set of predefined questions and answers, which is a step forward anyway). But it also means the UI itself can change, because there is no longer a need for long sequences of clicks: all UI elements may be reached directly from, say, a search input, and all questions an application can answer may be registered in a system that distributes them further. Of course, this does not concern only whole applications and "what?" questions; separate UI elements may answer questions like "What mass does the Earth have?" On the other hand, an application may be able to explain why it behaves as it does (for example, why a certain window is not enabled or shown), how some function may be activated, or show all options that relate to a certain meaning scope.
How can UI affect meaning? UI implies interaction within a meaning scope (or context). GUI implies some context: a planet window contains information and controls that relate only to a specific planet. File systems do, too: a planet directory theoretically should contain only planet-related information (though the OS has no means to enforce this). But this is not enough; meaning scope and context control should be explicit and should affect the entire working environment. When we open a planet window, a planet table, or a planet-related file, we should move into a planet scope/context. It may include more information on planets or on a specific planet, and UI controls to deal with it. Inside it, terms should be interpreted as relating to a planet (or to astronomy as a more general conception).
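An explicit meaning scope can be sketched as a stack that narrows how bare terms are read. The scope stack and the prefixing rule are assumptions made for illustration; a real environment would resolve terms against its semantic markup rather than by string concatenation.

```javascript
// Hypothetical explicit context: entering a window/directory pushes its meaning.
const scopes = [];

function enterScope(meaning) {
  scopes.push(meaning);
}

// A bare term is interpreted inside the innermost scope,
// e.g. "mass" inside the planet scope reads as "planet mass".
function interpret(term) {
  return scopes.length ? `${scopes[scopes.length - 1]} ${term}` : term;
}

enterScope("planet");
console.log(interpret("mass")); // "planet mass"
```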
Any meaning and its corresponding scope can be represented by an identifier of quite different length (a word, a phrase, a sentence, etc.), which is a kind of semantic link: a link to a set of resources corresponding to a specific meaning (a set that won't be the same for different users, computers, and circumstances). This could change the paradigm of search. Instead of dumping information into a pile and sifting through it afterward, search will know where an answer can be found (at least locally).
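A semantic link, in this sense, is just an identifier that resolves to a user-specific set of resources. The table below is entirely hypothetical — the paths, URL, and scheme are invented for the sketch — but it shows why resolution differs per user: each user's table maps the same meaning to different resources.

```javascript
// Hypothetical per-user resolution table: one meaning, many resources.
const semanticLinks = {
  planet: [
    "file:///home/user/astronomy/planets/",   // local notes
    "https://example.org/wiki/Planet",        // reference page
    "app://astro-catalog/planet-window",      // a UI element
  ],
};

function resolve(meaning) {
  return semanticLinks[meaning] ?? [];
}

console.log(resolve("planet").length); // 3
```

Search over such links starts from meaning and follows it to locations, instead of scanning locations and guessing at meaning.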
How many times have you searched for information you knew about but did not remember its exact location? And how many times was search unable to find it? The secret is simple: we often remember information by its meaning (maybe not very precisely) but not by its location in a file system or GUI. The approach described above may change this, because information will be linked with meaning. It is easier to match a similar meaning than to remember long file or GUI paths, or even to search with similar words.
All of this opens more possibilities for UI migration to natural language. It is not enough just to control the UI with natural language; the UI should "understand" natural language, at least to some degree. Also, users themselves should understand what meaning they deal with: users aren't comfortable if they don't understand how their information is interpreted. Look at modern search: in part, it flourishes because users have become accustomed to the way it retrieves information.
Of course, there is a lot of room for quite complex semantic algorithms, but they may be too big for the small space of a standalone computer. Users need to organize their own information space as they want, according to clear rules that let them retrieve information later in a predictable way. That's why they need to understand semantics itself — maybe in the form of a small semantic markup (as illustrated by the meaningful.js library), maybe in some other form. But they really need it. And the UI needs it.
Opinions expressed by DZone contributors are their own.