Revamping Online Commerce Workflows by Deploying Semi-Supervised Machine Learning Modules
Read about the development of so-called semi-supervised algorithms helping to bring enterprise-scale artificial intelligence systems to small business owners.
Automation has quickly become something of a watchword in the field of online marketing, but it's often applied inconsistently. Due to the long lead times that are necessary when developing an automation tool, specialists often only deploy larger commercial packages in their places of business. These are meant to be one-size-fits-all solutions that may not be custom-tailored to a given industry.
Data scientists wouldn't normally pay much attention to this field, since they're generally involved in research that falls outside the purview of these business practices. The large enterprise-scale artificial intelligence systems that they deploy tend to be too complex to scale down to the needs of small business owners. After all, it normally isn't possible to deploy supercomputing-focused modules in a small commercial environment. The development of so-called semi-supervised algorithms is helping to port this technology to the desktop microcomputer platforms that once led a digital revolution, which in turn is helping individual business owners to implement it in their own offices.
Getting a clear definition of what this technology is and what it's supposed to do hasn't proven so easy. So, industry evangelists are hard at work figuring out the best way to boil down these lofty goals into something that fits into real-world use cases.
Defining Semi-Supervised Algorithms
In a traditional machine learning environment, a module either learns by attempting the same task many times over or learns directly under the tutelage of a skillful technician who supplies labeled examples. Since the process can take an extremely long time, it's usually impractical to simply spin up a new type of algorithm without going through a proper planning phase. Traditionally, this is where program evaluation and review technique (PERT) charts would come into play. These tend to make the project planning process much more manageable even if there isn't much in the way of previous schedule data. Managers can use a PERT system to figure out whether or not they're going to have sufficient time and resources to finish off an algorithm's learning stage in time for deployment.
As any astute observer might expect, this doesn't work in the world of internet marketing. Things move much too quickly for there to be any real dedicated planning stage, and tools are often needed immediately, which has limited the application of sophisticated ML-based subroutines. Computer scientists define semi-supervised ML algorithms as those that still learn under the guidance of a human, usually in the form of a small set of labeled examples, but that also take in a large amount of unlabeled data that doesn't have to be annotated before it can be processed.
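One common way to combine a few labeled examples with a larger unlabeled pool is self-training, in which a model fitted on the labeled data assigns pseudo-labels to the unlabeled points it is most sure about and is then refitted. The following is only a minimal sketch of that idea, not a description of any particular product. The nearest-centroid classifier, the two engagement features (think of them as scaled time on page and pages viewed), the class names and the margin value are all assumptions made for illustration.

```python
from math import dist


def centroids(points, labels):
    """Return the mean feature vector for each class label."""
    sums, counts = {}, {}
    for point, label in zip(points, labels):
        acc = sums.setdefault(label, [0.0] * len(point))
        for i, value in enumerate(point):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(model, point):
    """Assign the label of the nearest class centroid."""
    return min(model, key=lambda label: dist(point, model[label]))


def self_train(labeled, unlabeled, margin=1.0, max_rounds=10):
    """Self-training loop; assumes at least two classes in the labeled set.

    A point is pseudo-labeled only when its nearest centroid is closer
    than the runner-up by at least `margin` (a hypothetical confidence rule).
    Returns the final centroids and the points that stayed unlabeled.
    """
    points = [p for p, _ in labeled]
    labels = [y for _, y in labeled]
    pool = list(unlabeled)
    for _ in range(max_rounds):
        model = centroids(points, labels)
        remaining = []
        for p in pool:
            ranked = sorted(model, key=lambda label: dist(p, model[label]))
            best, second = ranked[0], ranked[1]
            if dist(p, model[second]) - dist(p, model[best]) >= margin:
                points.append(p)       # confident: adopt the pseudo-label
                labels.append(best)
            else:
                remaining.append(p)    # ambiguous: leave for a later round
        if len(remaining) == len(pool):
            break                      # nothing new was labeled, so stop
        pool = remaining
    return centroids(points, labels), pool


# Hypothetical data: two hand-labeled visitors and five unlabeled ones.
labeled = [((1.0, 1.0), "ignore"), ((9.0, 8.0), "convert")]
unlabeled = [(1.5, 2.0), (2.0, 1.0), (8.0, 9.0), (9.5, 7.5), (5.0, 5.0)]

model, leftover = self_train(labeled, unlabeled, margin=1.0)
print(predict(model, (2.0, 2.0)))
print(leftover)
```

Points that never clear the margin are returned rather than forced into a class, which leaves a natural place for a person to step in and label the ambiguous cases by hand.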
Those who adhere to the Fred Brooks school of thinking would argue that laying out a clear definition of this technology is of vital importance when it comes time to specify its scope. Some data scientists are finding that it's much better to leave some blanks in order to help tailor algorithms to a specific job later on.
Customizing a Marketing-Based Algorithm
It may help to think of things in terms of real use cases that an online marketer is likely to come across on a daily basis. Consider, for example, a set of exit-intent tools that are designed to give business owners a last-ditch chance to convert a potential lead. These tend to involve a pop-up message, which could be effective if geared toward the right kind of customer. On the other hand, many people would simply see this as spam if they're not within the right audience segment. That's where a marketing algorithm could come into play.
A decision tree or conventional neural network would take some time to derive useful insights from a routine like this, though it might eventually develop a strong understanding of which customers respond best to the pop-up. In general, such models would monitor the overall level of engagement that people had with a site. A semi-supervised module could then learn the best time to display a pop-up to a customer in a relatively short period, or at least in considerably less time than older methods would have taken.
Eventually, though, the quality of data is going to become a serious issue that researchers will have to address if they seriously plan on scaling this kind of technology to the needs of small businesses working in the online space.
Supervising Which Data Points To Use
Though it might be difficult for data scientists to pin down one single definition for semi-supervised ML systems, they do agree that the only real human intervention needed to control one is careful curation of the data inputs. The quality of data feeds can quickly suffer, especially when information is being taken in from more than one feed. In spite of certain privacy considerations, most small businesses are likely collecting far too much information about their clients to actually draw useful insights from it.
Few such firms have much experience with big data workflows, so they might be feeding in irrelevant or repetitive data points that eventually lead a model to draw inappropriate conclusions. While there's currently no way for specialists to design AI-based agents that can consistently make these judgment calls, they can work with existing business owners to help them pick out what's important. Once this decision has been reached, it's relatively easy to author a filter mask that will sort through all the information exposed by an API and then feed the relevant parts over automatically.
In many cases, business owners will also be able to label their data more effectively after only a few short sessions spent working with a data analysis team.
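A filter mask of the kind described in this section can be quite small. The sketch below assumes the API returns a list of dictionaries; the field names, the sample records and the choice of an email address as the de-duplication key are all hypothetical, and a real feed would need its own rules.

```python
def apply_mask(records, keep_fields, key_field):
    """Keep only the chosen fields and drop repeated or keyless records."""
    seen = set()
    cleaned = []
    for record in records:
        key = record.get(key_field)
        if key is None or key in seen:
            continue                   # skip repeats and records with no key
        seen.add(key)
        cleaned.append({f: record[f] for f in keep_fields if f in record})
    return cleaned


# Hypothetical payload with one duplicate and several irrelevant fields.
records = [
    {"email": "a@example.com", "seconds_on_page": 42, "screen_depth": 24},
    {"email": "a@example.com", "seconds_on_page": 42, "screen_depth": 24},
    {"email": "b@example.com", "seconds_on_page": 7, "plugins": ["pdf"]},
]

cleaned = apply_mask(
    records,
    keep_fields=("email", "seconds_on_page"),
    key_field="email",
)
print(cleaned)
```

Keeping the list of fields as a plain argument means the business owner, rather than the developer, can decide what counts as relevant without touching the filtering logic.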
Properly Labeling Data Inputs
Domain expertise is of the greatest importance when it comes to labeling data. Anything that small business owners lack in terms of big data processing skills they more than make up for by knowing what items are of importance to their particular field. Today's heap of unstructured data is tomorrow's labeled training material. Someone simply has to come along and add columnar specifications for each data type in a group. Simple code extensions have made it possible to manipulate larger databases in the cloud without having to resort to any type of specialty hardware.
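As a rough illustration of what adding columnar specifications can look like, the sketch below attaches a name and a type to each position in a set of raw comma-separated lines. The column names, the sample rows and the rule that "yes" means a conversion are assumptions chosen for the example, not part of any specific tool.

```python
import csv


def parse_flag(text):
    """Interpret a yes/no cell as a boolean (assumed convention)."""
    return text.strip().lower() == "yes"


# Hypothetical column specification: a name and a converter per position.
COLUMN_SPEC = [
    ("visitor_id", str),
    ("seconds_on_page", float),
    ("pages_viewed", int),
    ("converted", parse_flag),
]


def label_rows(raw_lines, column_spec):
    """Turn raw delimited lines into typed, named records.

    Rows with the wrong number of cells, or cells that fail conversion,
    are skipped rather than guessed at.
    """
    records = []
    for row in csv.reader(raw_lines):
        if len(row) != len(column_spec):
            continue
        try:
            record = {name: cast(cell) for (name, cast), cell in zip(column_spec, row)}
        except ValueError:
            continue
        records.append(record)
    return records


raw_lines = ["v001,42.5,3,yes", "v002,7.0,1,no", "v003,broken row"]
records = label_rows(raw_lines, COLUMN_SPEC)
print(records)
```

The specification itself is where the owner's domain knowledge lives: deciding which columns matter and what each one means is the labeling work, while the conversion code stays generic.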
After this labeling process is done, an AI-based agent shouldn't have too much difficulty running itself. While further maintenance will of course be necessary, it shouldn't be anything outside of what a modestly sized IT department would be able to do. Though the prospect might seem somewhat alien to those working in the field of commerce today, it shouldn't take long for them to get up and running with a fairly sophisticated tool.