So the bot boom is here. Mark Zuckerberg hit the gong at the latest F8 conference, and since then, more than 11,000 bots have been created on Facebook Messenger. Wow.
Well, creating a basic "small talk" bot or "What's the weather?" bot is not very difficult.
Facebook provides a very nice tool, wit.ai, that lets you easily develop dialogs, identify intents, and run functions.
But what about more complex scenarios? How do you build a modular system? How do you split the work? How do you support complex flows? How do you manage state? How do you use advanced cognitive services? It can get messy.
In this post, I'll discuss design considerations for building artificially intelligent bots. I'll do so by breaking a typical system into smaller modules and discussing each component and how the components interact.
AI Bot Core Components
If you think about it, bots are very similar to humans in many aspects (surprise) — they can sense the environment, can understand what they see or hear (or smell?), and can take actions accordingly. Bots have memory and can change their behaviors based on past knowledge.
I like to divide AI systems into five main modules: sensors, control flow, state repository, cognitive services, and business logic.
Sensors are the eyes and ears of the system. They sense the environment and pass the information on for further analysis, changing it as little as possible on the one hand while exposing it through a standard interface on the other. Information sources might include messaging vendors (Facebook Messenger, Skype, Slack), voice recordings, raw images, videos, etc.
Design Consideration #1: Support Multiple Sources
From the very first day, remember that although the number of sources is usually small and limited, there is usually more than one.
You'd probably need to develop a special plugin for each source that takes care of the specific protocol between the information source and your system.
A very nice example of this module is Microsoft's Bot Framework, which was recently launched in preview. It lets you connect to any messaging channel and receive the information in a standard format, regardless of its source.
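To make the plugin idea concrete, here's a minimal Python sketch of a per-channel plugin registry. Everything in it (the `ChannelPlugin` protocol, `register`, `receive`, and the two toy plugins with their payload field names) is hypothetical and for illustration only; it is not the Bot Framework's API or any vendor's SDK.

```python
# Illustrative sketch; all names and payload fields here are hypothetical.
from typing import Dict, Protocol


class ChannelPlugin(Protocol):
    """Interface each source-specific plugin implements."""

    def parse(self, payload: dict) -> dict:
        """Translate the channel's wire format into the system's internal format."""
        ...


# Registry mapping a channel name to the plugin that speaks its protocol.
PLUGINS: Dict[str, ChannelPlugin] = {}


def register(channel: str, plugin: ChannelPlugin) -> None:
    PLUGINS[channel] = plugin


def receive(channel: str, payload: dict) -> dict:
    """Route an incoming payload to the plugin registered for its channel."""
    try:
        plugin = PLUGINS[channel]
    except KeyError:
        raise ValueError(f"no plugin registered for channel {channel!r}") from None
    return plugin.parse(payload)


class SlackPlugin:
    # Toy parser; the payload field name is an assumption.
    def parse(self, payload: dict) -> dict:
        return {"channel": "slack", "text": payload["text"]}


class SmsPlugin:
    # Toy parser; the payload field name is an assumption.
    def parse(self, payload: dict) -> dict:
        return {"channel": "sms", "text": payload["Body"]}


register("slack", SlackPlugin())
register("sms", SmsPlugin())
```

Adding a new source then means writing one more plugin and registering it; nothing downstream of `receive` changes.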
Design Consideration #2: Transforming Data
When data arrives, it wears the format of the channel it came from: email doesn't look like SMS. The main job of the sensor module is to transform the information into a standard format that can be 'digested' by the layers beneath it.
The right way to do that is to 'promote' specific attributes to a common schema (to, from, body, etc.) while keeping the original raw information as well.
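Here's a minimal sketch of that 'promotion', assuming made-up email and SMS payload shapes (the field names read by `from_email` and `from_sms` are hypothetical, not any provider's real webhook format). A few common attributes are promoted to a `Message` record, and the full original payload rides along in `raw`.

```python
# Illustrative sketch; the schema and payload shapes are hypothetical.
from dataclasses import dataclass, field
from typing import Any, Dict


@dataclass
class Message:
    """Common schema: a few promoted attributes plus the untouched original payload."""
    channel: str
    sender: str
    recipient: str
    body: str
    raw: Dict[str, Any] = field(default_factory=dict)  # original payload, kept as-is


def from_email(payload: Dict[str, Any]) -> Message:
    # Hypothetical email payload: {"From": ..., "To": ..., "Subject": ..., "Text": ...}
    return Message(
        channel="email",
        sender=payload["From"],
        recipient=payload["To"],
        body=payload["Text"],
        raw=payload,
    )


def from_sms(payload: Dict[str, Any]) -> Message:
    # Hypothetical SMS payload: {"from_number": ..., "to_number": ..., "text": ...}
    return Message(
        channel="sms",
        sender=payload["from_number"],
        recipient=payload["to_number"],
        body=payload["text"],
        raw=payload,
    )
```

Downstream layers only read `sender`, `recipient`, and `body`, yet anything channel-specific, such as the email subject, is still available in `raw` when you need it.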
When information leaves your system, the opposite procedure is needed: take the standard output and transform it into a format the channel can consume.
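The outbound direction can be sketched the same way. In this hypothetical example, the inner layers produce a channel-neutral `Reply`, and a per-channel renderer turns it into something the channel understands; the payload shapes returned by `to_sms` and `to_chat` are assumptions, not a real vendor format.

```python
# Illustrative sketch; Reply and both payload shapes are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Reply:
    """Standard output produced by the inner layers."""
    recipient: str
    text: str
    options: List[str]  # quick-reply choices, if any


def to_sms(reply: Reply) -> Dict[str, str]:
    # SMS has no buttons, so options are flattened into numbered lines of text.
    lines = [reply.text] + [f"{i}. {opt}" for i, opt in enumerate(reply.options, 1)]
    return {"to": reply.recipient, "body": "\n".join(lines)}


def to_chat(reply: Reply) -> Dict[str, object]:
    # A chat channel that supports buttons gets the options as structured data.
    return {
        "recipient": reply.recipient,
        "message": {"text": reply.text, "quick_replies": list(reply.options)},
    }


RENDERERS: Dict[str, Callable[[Reply], Dict]] = {"sms": to_sms, "chat": to_chat}


def send(channel: str, reply: Reply) -> Dict:
    """Pick the renderer for the target channel; callers never see channel formats."""
    return RENDERERS[channel](reply)
```

The design choice worth noting: the renderer, not the control flow, decides how to degrade rich content (here, quick-reply options become numbered lines on SMS).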
Control flows are the heart of the AI system. Like real hearts, they are not very smart, but they're very effective: under a very specific set of rules, they pump the blood back and forth to the right organs and keep the system alive.
In AI bots, control flows manage the different types of interactions users can have with the system, e.g. which questions are asked and answered, which forms need to be filled in, and how the conversation escalates to a real person. They should let the user feel as if he's navigating the conversation, while the 'story' is usually already written for him.
Control flows are usually built as a set of waterfalls or as state machines.
Waterfalls: a set of events that happen in chronological order. Each step passes additional information on to the next phase, aiming to reach a specific goal.
User: What is the weather in Tel Aviv?
Bot: For when would you like to know?
User: Next Monday.
Bot: It will be approximately 23°C.
A waterfall might sound like an oversimplified model of reality, but in fact it performs very well in most cases, and a smart combination of several waterfalls can be very powerful.
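Here's one way the weather waterfall could be expressed in Python, as a minimal sketch. The step list, `waterfall_step`, and the `fake_forecast` stub are all hypothetical; the stub's 23°C is hard-coded, and I assume the city was already extracted from the user's first message by a cognitive service.

```python
# Illustrative sketch; all names here are hypothetical.
from typing import Callable, Dict, List, Optional, Tuple

# A waterfall is an ordered list of (slot name, question that fills that slot).
Step = Tuple[str, str]

WEATHER_STEPS: List[Step] = [
    ("city", "Which city are you interested in?"),
    ("date", "For when would you like to know?"),
]


def fake_forecast(slots: Dict[str, str]) -> str:
    # Hypothetical stand-in for a real weather service; the number is hard-coded.
    return f"In {slots['city']} on {slots['date']} it will be approximately 23°C."


def waterfall_step(
    steps: List[Step],
    slots: Dict[str, str],
    finish: Callable[[Dict[str, str]], str],
    answer: Optional[str] = None,
) -> str:
    """Advance the waterfall by one turn and return the bot's next message."""
    pending = slots.pop("_pending", None)
    if pending is not None and answer is not None:
        slots[pending] = answer  # the user's answer fills the slot we asked about
    for slot, question in steps:
        if slot not in slots:
            slots["_pending"] = slot  # remember which question is open
            return question
    return finish(slots)  # every slot is filled: the goal is reached
```

Keeping the waterfall's progress in the `slots` dictionary (including the `_pending` marker) rather than in a long-lived object makes it easy to store that progress outside the process.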
State machines: Apparently, life is not just a set of decision trees. One can change his mind and go five steps up and three steps aside.
Waterfalls are chronological by nature and might not suit advanced use cases, which are often better served by an automaton (a state machine) with clear transition rules.
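A state machine can be sketched as a transition table. The states and intents below belong to a hypothetical seat-change flow; the point is that some transitions (`start_over`, `talk_to_human`) are available from any state, and answering `no` sends the user back a step, which a strict waterfall can't express.

```python
# Illustrative sketch; the flow, states, and intents are hypothetical.
from typing import Dict, Tuple

# Transition table: (current state, recognized intent) -> next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("start", "change_seat"): "ask_booking",
    ("ask_booking", "give_booking"): "choose_seat",
    ("choose_seat", "pick_seat"): "confirm",
    ("confirm", "yes"): "done",
    ("confirm", "no"): "choose_seat",  # the user changed their mind: go back a step
}

# Intents honored from any state, letting the user jump around the conversation.
GLOBAL_TRANSITIONS: Dict[str, str] = {
    "start_over": "start",
    "talk_to_human": "escalate",
}


def transition(state: str, intent: str) -> str:
    """Return the next state; an intent that doesn't apply leaves the state unchanged."""
    if intent in GLOBAL_TRANSITIONS:
        return GLOBAL_TRANSITIONS[intent]
    return TRANSITIONS.get((state, intent), state)


# Example: the user rejects the first seat, picks another, then confirms.
state = "start"
for intent in ["change_seat", "give_booking", "pick_seat", "no", "pick_seat", "yes"]:
    state = transition(state, intent)
print(state)  # done
```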
Design Consideration #3: Avoid Mixing Control and Logic
The developers of the control flow layer usually have a different skill set from those who develop the backend facade or the cognitive services. Their job is to help the user navigate the system and get his answer in the fastest and most effective way. They are more focused on user experience than on scale, performance, and other engineering concerns. I call this new profession AI storytellers, a close relative of web designers.
When developing an AI system, be mindful of this difference in skills and focus. Keep the business logic outside of the control flow, and make sure the main responsibility of this layer is to orchestrate services, not to run the logic itself.
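As a minimal sketch of that separation, assume a hypothetical `SeatService` interface owned by the backend team. The control-flow step below only decides what to say next, and every business rule lives behind the service; `FakeSeatService` is an in-memory stand-in for testing.

```python
# Illustrative sketch; SeatService and its methods are hypothetical.
from typing import Protocol


class SeatService(Protocol):
    """Business facade owned by the backend team."""

    def is_available(self, flight: str, seat: str) -> bool: ...

    def assign(self, flight: str, seat: str) -> str: ...


def seat_flow_step(service: SeatService, flight: str, seat: str) -> str:
    """Control-flow step: decides what to say next and delegates every rule to the service."""
    if not service.is_available(flight, seat):
        return f"Seat {seat} is taken. Which other seat would you like?"
    confirmation = service.assign(flight, seat)
    return f"Done! Your new seat is {seat} (confirmation {confirmation})."


class FakeSeatService:
    # In-memory stand-in used to test the flow without a real backend.
    def __init__(self, taken: set):
        self.taken = taken

    def is_available(self, flight: str, seat: str) -> bool:
        return seat not in self.taken

    def assign(self, flight: str, seat: str) -> str:
        self.taken.add(seat)
        return f"{flight}-{seat}"
```

With this split, the storyteller can test the conversation against a fake service, and the backend team can change availability rules without touching the flow.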
The memory. They say that fish have a three-second memory (they're wrong), but your bot should have a much longer memory. Or shorter. It depends.
The state repository is where you aggregate the information you've collected about the user of the system. This information can come from different sources, such as a backend system (it's a VIP customer), a questionnaire you had the user fill out (what is your order number?), or calculations made in real time (there's a cat in this image).
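Here's a minimal sketch of a per-user state record that aggregates facts from different sources and keeps each one for as long as it makes sense. The `UserState` class, the source labels, and the time-to-live values are hypothetical; `now` is passed in explicitly to keep the example deterministic.

```python
# Illustrative sketch; class names, sources, and TTLs are hypothetical.
from dataclasses import dataclass
from typing import Any, Dict, Optional


@dataclass
class Fact:
    value: Any
    source: str                  # e.g. "backend", "questionnaire", "realtime"
    expires_at: Optional[float]  # None means "remember indefinitely"


class UserState:
    """Per-user aggregate of facts collected from different sources."""

    def __init__(self) -> None:
        self.facts: Dict[str, Fact] = {}

    def remember(self, key: str, value: Any, source: str, now: float,
                 ttl: Optional[float] = None) -> None:
        expires = None if ttl is None else now + ttl
        self.facts[key] = Fact(value, source, expires)

    def recall(self, key: str, now: float) -> Any:
        fact = self.facts.get(key)
        if fact is None or (fact.expires_at is not None and now >= fact.expires_at):
            return None  # never known, or already forgotten
        return fact.value
```

A VIP flag from the backend can live indefinitely, while a label computed from a single image might only be worth remembering for a few seconds.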
Design Consideration #4: Shared State and Zero-Memory Control Flows
When your system starts to grow, you'll soon face concurrency bottlenecks. The key to scaling this kind of system (and this rule applies to many similar systems) is to keep everything except the state repository stateless. This way, you can simply throw more machines at the problem without worrying about session stickiness or migrating local objects. You'll want the state repository itself to scale out with the load as well; most distributed key-value and document stores, such as MongoDB, Redis, Couchbase, or Cassandra, can satisfy this need.
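The sketch below illustrates the stateless pattern, assuming a minimal `get`/`set` interface over a shared store. `InMemoryStore` is only a stand-in; in production you'd wrap a client for Redis or a similar store behind the same two methods (that wrapper is an assumption and isn't shown). `handle_message` keeps nothing in the process between calls, so any worker can serve any message.

```python
# Illustrative sketch; the store interface and handler are hypothetical.
import json
from typing import Dict, Optional, Protocol


class KeyValueStore(Protocol):
    def get(self, key: str) -> Optional[str]: ...

    def set(self, key: str, value: str) -> None: ...


class InMemoryStore:
    """Stand-in for a shared store; every worker would talk to the same instance."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def set(self, key: str, value: str) -> None:
        self._data[key] = value


def handle_message(store: KeyValueStore, user_id: str, text: str) -> str:
    """Stateless handler: load the user's state, update it, and save it back."""
    raw = store.get(f"state:{user_id}")
    state = json.loads(raw) if raw is not None else {"turns": 0}
    state["turns"] += 1
    state["last_text"] = text
    store.set(f"state:{user_id}", json.dumps(state))
    return f"Message #{state['turns']} received."
```

One caution: this load-modify-save sequence is not atomic. If messages from the same user can be processed concurrently, you'd want the store's own atomic operations or optimistic locking to avoid lost updates.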
The brain. This layer of services has a rather simple task to accomplish: it takes raw information and structures it. Easy. My one-and-a-half-year-old can look at a cat picture and say "Meow," so what's the big deal?
Well, it ain't that simple.
Companies like Microsoft, Google, IBM, and others are spending a lot of money on deep learning algorithms, aiming to identify cats in images, understand human text/voice, and drive cars.
The good news is that the sophistication of these algorithms, and the amount of data used to train them, do not make them hard to use. The era of cognitive services has begun; I call it BaaS, Brain-as-a-Service. Today, with simple APIs, you can get immediate access to the work of some of the smartest PhDs in the world.
Design Consideration #5: Be Smart About Vendor Lock-In
Like any other X-as-a-service, the same rule applies here: it's sometimes OK to be locked in to a vendor, but only if the service provides a truly critical function that justifies it.
When you are debating between the different cognitive services out there, try to see if you can replace the selected vendor with another service, or at least understand that you are married to the chosen service with your eyes open. There's a lot of similarity between them, so it will soon become a matter of accuracy and cost.
Building your own cognitive service layer with open-source tools is not an easy task yet, although many strong tools, such as TensorFlow and Torch, are available today to start down this path.
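One way to keep the exit door open is to hide every cognitive service behind your own small interface. In this minimal sketch, `IntentRecognizer`, `KeywordRecognizer`, and `route` are hypothetical names; a vendor adapter, or a model you trained yourself, would simply be another class with the same `recognize` method wrapping that vendor's client. The keyword matcher is a deliberately naive local fallback.

```python
# Illustrative sketch; the interface and recognizer are hypothetical.
from typing import Dict, List, Protocol, Tuple


class IntentRecognizer(Protocol):
    """The only interface the rest of the bot sees; vendors hide behind it."""

    def recognize(self, text: str) -> Tuple[str, float]:
        """Return (intent name, confidence between 0 and 1)."""
        ...


class KeywordRecognizer:
    """Naive local fallback: picks the intent whose keywords appear most often."""

    def __init__(self, keywords: Dict[str, List[str]]):
        self.keywords = keywords

    def recognize(self, text: str) -> Tuple[str, float]:
        words = [w.strip("?!.,") for w in text.lower().split()]
        best, best_hits = "none", 0
        for intent, kws in self.keywords.items():
            hits = sum(1 for w in words if w in kws)
            if hits > best_hits:
                best, best_hits = intent, hits
        return best, (1.0 if best_hits else 0.0)


def route(recognizer: IntentRecognizer, text: str, threshold: float = 0.5) -> str:
    """Callers depend only on the interface, so the recognizer can be swapped freely."""
    intent, confidence = recognizer.recognize(text)
    return intent if confidence >= threshold else "fallback"
```

Switching vendors then means writing one adapter class and re-running your accuracy and cost comparisons, rather than touching every flow.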
Muscles. If your system doesn't change anything, you're probably building a new interface for search, which is quite useless from my point of view.
The interesting use cases of bots are those where a real action happens — for example, those that let you buy products, track shipments, change a seat on the plane, etc.
The business facade is the layer that connects your bot with the real world outside of FB Messenger.
Design Consideration #6: Remember to Apply Security Measures
The business facade is no joke. It can deal with real money and do real damage.
Exposing these business capabilities in your AI bot is very interesting, but it doesn't mean you should lower security for it. On the contrary: with self-service bots, you might let your customers do things they would normally do only under the supervision of a real agent, like buying stocks. Put some serious thought into this layer, or things will get out of hand.
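Here's a minimal sketch of a business-facade entry point that re-checks everything instead of trusting whatever the bot passes in. `TradeRequest`, `place_order`, `SecurityError`, and the `MAX_ORDER_VALUE` limit are hypothetical; a real system would also need proper authentication, audit logging, and rate limiting, which are out of scope here.

```python
# Illustrative sketch; all names and limits here are hypothetical.
from dataclasses import dataclass


@dataclass
class TradeRequest:
    user_id: str
    symbol: str
    quantity: int
    price: float
    confirmed: bool = False  # set only after an explicit "yes" from the user


class SecurityError(Exception):
    pass


# Hypothetical self-service limit; larger orders go to a human agent.
MAX_ORDER_VALUE = 10_000.0


def place_order(req: TradeRequest, authenticated_user: str) -> str:
    """Facade entry point that validates the request instead of trusting the bot."""
    if req.user_id != authenticated_user:
        raise SecurityError("users may only trade on their own account")
    if req.quantity <= 0 or req.price <= 0:
        raise ValueError("quantity and price must be positive")
    if req.quantity * req.price > MAX_ORDER_VALUE:
        raise SecurityError("order exceeds the self-service limit; escalate to an agent")
    if not req.confirmed:
        return f"Please confirm: buy {req.quantity} {req.symbol} at {req.price:.2f}?"
    # The real call to the trading backend would go here.
    return f"Order placed: {req.quantity} {req.symbol}."
```

The explicit confirmation step and the escalation threshold mirror what a human agent would normally enforce.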
The AI bot era is going to be fascinating, and I believe it will bring serious changes to the job market. Most companies are still skeptical about the impact we're going to see in the coming years, if not months, and so most bots being built today are still rather simple.
Building a trivial FAQ bot is not that hard, but when the bot has to perform more complicated, real business actions, give the design serious thought before jumping into a system that will quickly break.
In my next article, I will benchmark the different vendors and solutions currently available that support some of the elements described here. Stay tuned.