Quality Engineering Design for AI Platform Adoption
AI platform adoption requires testing the algorithm, data, integrations, safety, security, performance, UX, and training models.
We are in the golden age of AI (1). AI adoption makes businesses more creative, competitive, and responsive. The software-as-a-service (SaaS) model, coupled with advances in the cloud, has matured the way software is produced and consumed. Most organizations prefer to “buy” AI capabilities rather than “build” their own. Hence, SaaS providers such as Salesforce, SAP, and Oracle have introduced AI platform capabilities, creating the AI-as-a-Service (AIaaS) model. This evolution has made AI adoption easier for enterprises (2).
Within quality assurance (QA), testing plays a vital role in AI platform adoption. Testing is complex when adopting an AI platform, for the following reasons:
- Testing AI demands intelligent test processes, virtualized cloud resources, specialized skills, and AI-enabled tools.
- AI platform providers release frequently, so testing must keep the same pace.
- AI products generally lack transparency and are not readily explainable (3), which makes them difficult to trust.
- It is not just the AI product: the quality of the training models and of the data is equally important. Therefore, conventional testing methods that only validate cloud resources, algorithms, interfaces, and user configurations are insufficient. Learning, reasoning, perception, manipulation, and similar capabilities must be tested as well.
In a plug-and-play AI solution model, the AI logic is provided by the software vendor. The consumer builds the interfaces, provides data for training the logic, trains the logic in the solution context, and extends the experience to the end user. First, as in traditional testing, we should test the data, algorithm, integration, and user experience. Second, to test the functional fitment of the solution, the training model should be validated, which extends testing to reasoning, planning, learning, etc. Third, an approach to validate the AI algorithm itself should be developed. Finally, tools that the AI logic may use, such as search, optimization, and probability, should also be covered in functional validation. This article presents a practical view of a potential AI testing framework.
Core Necessity in the AI Platform Adoption: Continuous Testing
QA maturity, achieved through a high degree of automation, is critically important for AI platform adoption. As enterprises modernize their infrastructure and engineering methodologies, release cycles can become short and highly automated. Continuous integration (CI) techniques have proven effective (4). Because code is checked in several times a day and then recompiled, CI generates multiple QA feedback loops. To apply CI successfully, therefore, automating the build and deployment process is critical. While build automation is the basis of CI, test automation makes continuous delivery (CD) possible (5); in other words, CD is driven by CI. The evolution of Agile and DevOps models has accelerated the feedback loop between development and testing, institutionalizing continuous testing (CT), continuous development, and continuous delivery.
In a business enterprise, data, applications, infrastructure, etc., change constantly. At the same time, the SaaS vendor continuously upgrades the AI product to improve experiences and efficiencies. In such a dynamic situation, it is crucial to establish a continuous testing ecosystem so that a fully automated test environment not only verifies the ever-changing enterprise IT assets but also validates each new version of the AI product.
Establishing a CT ecosystem will demand the following considerations:
- Shift automation test scripts to an enterprise version control tool. The automation codebase, just like an application codebase, should reside in a version control repository. That makes it easier to align test assets with application and data assets.
- Plan to integrate the automation suite with a code/data build deployment tool to enable centralized execution and reporting. It is important to align code/data builds with their respective automation suite. Tool-based auto-deployment during every build is an absolute necessity to avoid human intervention.
- Classify the automation suite into multiple layers of tests to enable faster feedback at each checkpoint. For example, an AI health check can verify that services are up after changes to the interfaces and data structures are deployed. An AI smoke test can verify that key system features are operational and that no blocking defects occur.
- Cover the training model as well. AI testing should also test the training model, certifying whether the solution has learned the given instructions, both supervised and unsupervised. Recreating the same scenarios more than once to check whether the response matches the given training is critical. Similarly, it is critical to establish a process, as part of testing, that trains the solution on bugs, errors, exceptions, mistakes, etc. Fault/error tolerance can be established if exception handling is well thought through.
- Plan to administer AI training/learning throughout the adoption cycle. CT setup should help continue the learning from testing through production rollout with fewer worries about transfer learning.
- Optimize through intelligent regression. If the execution cycle time for overall regression is significantly high, CT should carve out a subset of regression tests for execution at run time, based on critically impacted areas, to provide feedback within a reasonable time window. Using ML algorithms to build a probabilistic model for selecting regression tests (6) that are aligned to the particular code and data build helps use cloud resources efficiently and speeds up testing.
- Always plan for full regression periodically. This can be shifted to overnight or during the weekend depending on its alignment with recurring build frequencies. This is the final feedback from the CT ecosystem. The goal is to minimize feedback time by running parallel execution threads or machines.
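To make the idea of a probabilistic model for selecting regression tests more concrete, here is a minimal Python sketch. It is not the approach described in (6). It assumes a hypothetical history of test runs, each recording which modules changed and whether the test failed. From that history it estimates each test's failure probability for the current change, using Laplace smoothing, and then greedily fills a time budget with the riskiest tests. The `Run` record, the module names, and the durations are all illustrative assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Run:
    """One historical execution of a test (hypothetical record format)."""
    test: str
    changed_modules: frozenset
    failed: bool


def failure_probabilities(history, changed):
    """Estimate, per test, P(fail) over past runs that touched a currently changed module.

    Uses Laplace smoothing, (fails + 1) / (runs + 2), so a test with no relevant
    history scores 0.5 ("unknown risk") instead of 0.
    """
    runs, fails, tests = defaultdict(int), defaultdict(int), set()
    for r in history:
        tests.add(r.test)
        if r.changed_modules & changed:  # this past run overlaps today's change
            runs[r.test] += 1
            fails[r.test] += int(r.failed)
    return {t: (fails[t] + 1) / (runs[t] + 2) for t in tests}


def select_tests(history, changed, durations, budget_seconds):
    """Greedily pick the highest-risk tests that fit in the time budget."""
    probs = failure_probabilities(history, changed)
    ranked = sorted(probs, key=lambda t: (-probs[t], t))  # riskiest first, ties by name
    selected, used = [], 0.0
    for t in ranked:
        cost = durations.get(t, 0.0)
        if used + cost <= budget_seconds:
            selected.append(t)
            used += cost
    return selected


if __name__ == "__main__":
    history = [
        Run("test_intent_api", frozenset({"nlp"}), True),
        Run("test_intent_api", frozenset({"nlp"}), False),
        Run("test_billing", frozenset({"nlp"}), False),
        Run("test_billing", frozenset({"nlp"}), False),
        Run("test_billing", frozenset({"nlp"}), False),
    ]
    durations = {"test_intent_api": 30, "test_billing": 30}
    print(select_tests(history, frozenset({"nlp"}), durations, budget_seconds=40))
    # → ['test_intent_api']
```

Scoring tests that have no relevant history at 0.5 is a deliberately conservative choice: unknown tests are run ahead of tests with a clean record. A real model would probably learn from richer features, such as file-level diffs, data-build metadata, and flakiness. The score-then-budget structure would stay the same.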
When there is no manual intervention in testing, bugs, errors, mistakes, and algorithmic exceptions all become sources of discovery for the AI solution. Similarly, actual usage and user preferences during testing also become a source of training, which should continue through production.
Assuring the Data Extraction in AIaaS Adoption
Quality of data is the most important success criterion for AI adoption. There is as much useful data outside the enterprise as inside it. The ability to extract useful data and make it available to the AI engine is a requirement. Extract, transform, and load (ETL) is a legacy term for a data pipeline that collects data from various sources, transforms it according to business rules, and loads it into a destination data store. The field of ETL has advanced to enterprise information integration (EII), enterprise application integration (EAI), and enterprise cloud integration platform as a service (iPaaS) (7, 8, 9). Irrespective of these technological advances, the need for data assurance has only grown. Data assurance should address functional testing activities such as map-reduce process validation, transformation logic validation, data validation, and data storage validation. It should also address the non-functional aspects of data performance, failover, and security.
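One common way to automate transformation logic validation and data validation is a source-to-target reconciliation. The Python sketch below rests on two assumptions. First, records are dicts keyed by a unique ID. Second, the business rule can be expressed as a hypothetical `transform` function, here `cents_to_dollars`. The check reports four kinds of problem: keys missing from the target, unexpected extra keys, duplicate keys, and records whose loaded values differ from what the rule predicts.

```python
from collections import Counter


def reconcile(source_rows, target_rows, key, transform):
    """Compare an ETL target table with its source after applying the business rule.

    Returns lists of keys that are missing, unexpected, duplicated, or whose
    target record differs from transform(source record).
    """
    src = {r[key]: r for r in source_rows}
    tgt = {r[key]: r for r in target_rows}
    counts = Counter(r[key] for r in target_rows)
    return {
        "missing": sorted(src.keys() - tgt.keys()),
        "extra": sorted(tgt.keys() - src.keys()),
        "duplicates": sorted(k for k, n in counts.items() if n > 1),
        # Exact comparison for brevity; real float columns usually need a tolerance.
        "mismatched": sorted(k for k in src.keys() & tgt.keys() if transform(src[k]) != tgt[k]),
    }


def cents_to_dollars(row):
    """Hypothetical business rule: the target stores amounts in dollars."""
    return {"id": row["id"], "amount_usd": row["amount_cents"] / 100}


if __name__ == "__main__":
    source = [{"id": 1, "amount_cents": 1250}, {"id": 2, "amount_cents": 99}, {"id": 3, "amount_cents": 500}]
    target = [{"id": 1, "amount_usd": 12.5}, {"id": 2, "amount_usd": 9.9}, {"id": 4, "amount_usd": 1.0}]
    print(reconcile(source, target, "id", cents_to_dollars))
    # → {'missing': [3], 'extra': [4], 'duplicates': [], 'mismatched': [2]}
```

In a CT pipeline, a check like this could run after every data build, and any non-empty list would fail the stage.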
Structured data are easier to administer, whereas unstructured data that originate outside the enterprise IT should be handled with care. Stream processing principles help prepare data in motion, i.e., process data as soon as it is produced or received from websites, external applications, mobile devices, sensors, and other sources through event-driven processing. Checking quality through established quality gates is an absolute necessity. Messaging platforms such as Twitter, Instagram, and WhatsApp are popular sources of data. When used, they connect applications, services, and devices across various technologies via a cloud-based messaging framework. Deep learning technologies are available that let computers learn from these large volumes of data. Some of this data requires neural network solutions to solve complex signal processing and pattern recognition problems, including speech-to-text transcription, handwriting recognition, and facial recognition (10, 11, 12). Quality gates should be established to test the data that flows from these platforms.
The following are some design considerations for AI-driven QA orchestration.
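A per-event quality gate on data in motion can be sketched as a Python generator. Each event is checked as it arrives, clean events flow downstream, and failing events are diverted to a quarantine, together with the names of the rules they broke. The two rules and the event shape (`source`, `text`) are hypothetical placeholders for whatever the real channels deliver.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Rule = Callable[[dict], bool]

# Hypothetical per-event rules for messages arriving from external channels.
RULES: Dict[str, Rule] = {
    "non_empty_text": lambda e: isinstance(e.get("text"), str) and bool(e["text"].strip()),
    "known_source": lambda e: e.get("source") in {"web", "mobile", "sensor"},
}


def quality_gate(events: Iterable[dict], rules: Dict[str, Rule], quarantine: List[dict]) -> Iterator[dict]:
    """Validate each event as it arrives; pass clean events on, divert the rest.

    Being a generator, the gate does not buffer the stream: every event is checked
    and either yielded downstream or appended to `quarantine` with the rules it broke.
    """
    for event in events:
        failed = [name for name, rule in rules.items() if not rule(event)]
        if failed:
            quarantine.append({"event": event, "failed_rules": failed})
        else:
            yield event
```

Because the gate is lazy, the quarantine fills only as the downstream consumer pulls events. This matches event-driven processing, but it also means the gate has to be drained, for example with `list(...)` or a consuming loop, before the quarantine is inspected.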
- Automate quality gates: ML algorithms can be implemented to determine if the data is a “go” or “no go” based on historical and perceived standards.
- Predict root causes: Triaging, or identifying the root cause of, a data defect not only helps avoid bugs in the future but also helps continuously improve data quality. Using patterns and correlations, test teams can implement ML algorithms that trace defects to their root causes (13). This makes it possible to perform remedial tests and fixes automatically before the data progresses to the next stage, leading to self-testing and self-healing.
- Leverage precognitive monitoring: ML algorithms can scan data patterns for symptoms of associated coding errors, such as high memory usage, that could result in an outage, so that teams can implement corrective steps automatically. For example, the AI engine can automatically spin up a parallel process to optimize server consumption.
- Failover: ML algorithms can detect failures and automatically recover to proceed with processing, registering the failure for learning.
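Here is a minimal sketch of an automated “go” or “no go” gate based on historical standards. It substitutes a simple statistical baseline for a trained ML model. Each new batch is summarized by a few metrics, and each metric is z-scored against previously accepted batches. Any metric beyond a threshold blocks the batch and records a reason, which can then feed root-cause triage. The metric set and the threshold of 3.0 are assumptions chosen for illustration.

```python
from statistics import mean, stdev


def batch_metrics(rows, field):
    """Summarize one batch: size, share of missing values, and mean of `field`."""
    present = [r[field] for r in rows if r.get(field) is not None]
    return {
        "row_count": len(rows),
        "null_rate": 1 - len(present) / len(rows) if rows else 1.0,
        "mean": mean(present) if present else 0.0,
    }


def go_no_go(history, current, z_threshold=3.0):
    """Return ("go" | "no go", reasons) by z-scoring each metric against past batches.

    `history` is a list of metric dicts from previously accepted batches (at least two).
    """
    reasons = []
    for metric, value in current.items():
        past = [h[metric] for h in history]
        mu, sigma = mean(past), stdev(past)
        if sigma == 0:
            # No historical variation: any change at all is suspicious.
            if value != mu:
                reasons.append(f"{metric}: {value} differs from constant baseline {mu}")
            continue
        z = (value - mu) / sigma
        if abs(z) > z_threshold:
            reasons.append(f"{metric}: z = {z:.1f}")
    return ("no go" if reasons else "go"), reasons
```

As an example, a batch with typical volume and null rate passes. A batch with far fewer rows and a much higher null rate returns "no go" with two reasons. A learned model would replace the fixed z-score rule, but the gate's contract would stay the same: a decision plus an explanation.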
Assuring the AI Algorithm in AIaaS Adoption
When the internals of a software system are known, developing tests is straightforward. In an AI platform solution, the “interpretability” of the AI and ML is low (3); that is, the input/output mapping is the only known element, and the mechanism behind the underlying AI function (for example, prediction) cannot be inspected or understood. Although traditional black-box testing addresses the input/output mapping, humans will have difficulty trusting the testing model when transparency is lacking. The AI platform solution is indeed a black box, but there are unique AI techniques that help validate AI functionality so that testing can go beyond input/output mapping. Some AI-driven black-box testing techniques to consider in the design include:
- Posterior predictive checks (PPC), which simulate replicated data under the fitted model and then compare them to the observed data. Testing can use the posterior predictive distribution to “look for systematic discrepancies between real and simulated data.”
- Genetic algorithms to optimize test cases (14). The challenge in generating test cases is to search for a set of data that leads to the highest coverage when given as input to the software under test. If this problem is solved, the test cases can be optimized. Adaptive heuristic search algorithms exist that perform the basic acts of natural evolution, such as selection, crossover, and mutation. When generating test cases with heuristic search, feedback about the tested application is used to determine whether the test data meet the testing requirements. The feedback mechanism gradually adjusts the test data until the test requirements are met.
- Neural networks for automatic generation of test cases. Neural networks are physical, cellular systems that can acquire, store, and process experiential knowledge; they mimic the human brain in order to carry out learning tasks. Neural network learning techniques are used in the automatic generation of test cases (15). In this model, a neural network is trained on a set of test cases applied to the original version of the AI platform product, using only the system's inputs and outputs. The trained network can then serve as an artificial oracle for evaluating the correctness of the output produced by new, and possibly faulty, versions of the AI platform product.
- Fuzzy logic for model-based regression test selection. Model-based approaches are useful in projects that already use model-driven development methodologies, but a key obstacle is that the models are generally created at a high level of abstraction. They lack the information needed to build traceability links between the models and the coverage-related execution traces from code-level test cases. Fuzzy logic-based approaches are available that automatically refine abstract models into detailed models that permit the identification of traceability links (16). The refinement introduces a degree of uncertainty, which is addressed by applying fuzzy logic to classify test cases as re-testable according to the probabilistic correctness associated with the refinement used.
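A posterior predictive check can be illustrated with a small, fully conjugate example in Python. Suppose, hypothetically, that the count of failed inference calls per hour is modeled as Poisson with a Gamma prior on the rate. The sketch draws rates from the posterior, simulates replicated datasets of the same size, and reports the share of replicates whose test statistic is at least the observed one. The model, the prior values, and the choice of variance as the statistic are assumptions for illustration.

```python
import math
import random
from statistics import pvariance


def poisson_draw(lam, rng):
    """Knuth's method; adequate for the small rates used here."""
    threshold, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def ppc_pvalue(y, stat, draws=2000, prior_shape=1.0, prior_rate=1.0, seed=0):
    """Posterior predictive p-value for a Poisson model with a Gamma prior.

    1. The conjugate posterior for the rate is Gamma(shape + sum(y), rate + n).
    2. For each draw, sample a rate, then a replicated dataset of the same size.
    3. Return the share of replicates whose statistic is >= the observed one.
    Values near 0 or 1 signal a systematic discrepancy between model and data.
    """
    rng = random.Random(seed)
    shape = prior_shape + sum(y)
    rate = prior_rate + len(y)
    observed = stat(y)
    exceed = 0
    for _ in range(draws):
        lam = rng.gammavariate(shape, 1.0 / rate)  # gammavariate takes a scale, not a rate
        y_rep = [poisson_draw(lam, rng) for _ in y]
        exceed += stat(y_rep) >= observed
    return exceed / draws
```

For bursty data such as `[0] * 15 + [20] * 5` (mean 5, variance 75), almost no replicate reaches the observed variance, so `ppc_pvalue(y, pvariance)` comes out near 0. This flags that the model cannot reproduce the overdispersion, even though the tester never looks inside the model.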
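The genetic-algorithm idea can be sketched in a few dozen lines of Python. In this sketch, an individual is a test suite of integer inputs, and its fitness is the number of distinct branches it exercises in a hypothetical `program_under_test`. Each generation applies tournament selection, one-point crossover, and random-reset mutation, and keeps the best suite unchanged (elitism). This is a generic GA, not the specific adaptive algorithm in (14). The function under test, the input range, and all GA parameters are assumptions.

```python
import random

LOW, HIGH = -50, 50
BRANCHES = 4


def program_under_test(x: int) -> str:
    """Hypothetical function under test; each return value identifies one branch."""
    if x < 0:
        return "negative"
    if x < 10:
        return "small"
    if x == 42:
        return "answer"
    return "medium"


def coverage(suite):
    """Fitness: number of distinct branches the suite exercises."""
    return len({program_under_test(x) for x in suite})


def evolve(suite_size=5, pop_size=30, generations=40, mutation_rate=0.2, seed=7):
    """Evolve test suites toward full branch coverage; returns (best suite, best-fitness history)."""
    rng = random.Random(seed)
    population = [[rng.randint(LOW, HIGH) for _ in range(suite_size)] for _ in range(pop_size)]
    best_history = []
    for _ in range(generations):
        population.sort(key=coverage, reverse=True)
        best_history.append(coverage(population[0]))
        if best_history[-1] == BRANCHES:
            break  # feedback says the coverage requirement is met
        next_gen = [population[0]]  # elitism: keep the best suite unchanged
        while len(next_gen) < pop_size:
            # Tournament selection: the fitter of 3 random suites becomes a parent.
            p1 = max(rng.sample(population, 3), key=coverage)
            p2 = max(rng.sample(population, 3), key=coverage)
            cut = rng.randint(1, suite_size - 1)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Mutation: replace each input with a fresh random value with some probability.
            child = [rng.randint(LOW, HIGH) if rng.random() < mutation_rate else g for g in child]
            next_gen.append(child)
        population = next_gen
    population.sort(key=coverage, reverse=True)
    return population[0], best_history
```

The hard-to-hit branch (`x == 42`) shows why plain random testing struggles and why search-based generation helps. A fuller implementation would also use a branch-distance term in the fitness, so that inputs close to 42 are rewarded before 42 itself is found.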
Assuring the Integration and Interfaces in AIaaS Adoption
All SaaS solutions, AI-as-a-Service included, come with a set of defined web services that enterprise applications and other intelligent sources can interact with to deliver the promised outcome. Web services have evolved to provide platform independence, i.e., interoperability, and this flexibility has enabled most web services to be consumed by diverse systems. The complexity of these interfaces demands an increased level of testing. In a CI/CD environment, it is even more critical to check the compatibility of these interfaces in every build.
The primary challenge is to virtualize the web services and validate the data flow between the AI platform solution and the application or IoT interfaces. The main reasons why interface/web service testing is complex are:
- There is no user interface to test unless it is integrated with another source that may not be ready to test.
- All elements of a service need to be validated no matter which application uses them or how often they are used.
- The underlying security parameters of the service must be validated.
- Connection to services is made through different communication protocols.
- Multiple channels calling a service simultaneously lead to performance and scalability issues.
Testing the interface layer particularly demands:
- Simulating component or application behavior. The complexities of AI application interfaces with humans, machines, and software should be simulated in AI testing for correctness, completeness, consistency, and speed.
- Checking for nonstandard code usage. Open-source libraries and real-world applications could bring nonstandard code and data into the enterprise IT environment, so they should be validated.
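Service virtualization and contract checks can be combined in a small Python sketch. A stub stands in for the vendor's AI endpoint and returns canned JSON bodies, so consumers can be tested before the real service, or the system it integrates with, is ready. A validator then checks every body against the expected response contract. The endpoint, the `intent`/`confidence` contract, and the canned responses are hypothetical.

```python
import json
from typing import Dict, List

# Hypothetical response contract for a vendor AI intent-classification endpoint.
CONTRACT = {"intent": str, "confidence": float}


def validate_response(body: str) -> List[str]:
    """Return the contract violations in one JSON response body (empty = valid)."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(payload, dict):
        return ["payload is not a JSON object"]
    errors = []
    for field, expected in CONTRACT.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            # Strict on purpose: a JSON integer such as 1 is rejected for "confidence".
            errors.append(f"{field}: expected {expected.__name__}")
    confidence = payload.get("confidence")
    if isinstance(confidence, float) and not 0.0 <= confidence <= 1.0:
        errors.append("confidence out of range [0, 1]")
    return errors


class VirtualAIService:
    """Stand-in for the vendor endpoint: returns canned bodies so consumers can be
    tested before the real service (or its integration partner) is available."""

    def __init__(self, canned: Dict[str, str]):
        self.canned = canned

    def classify(self, utterance: str) -> str:
        return self.canned.get(utterance, json.dumps({"error": "unknown utterance"}))
```

The same `validate_response` check could run against both the stub and the live endpoint in every build. In that setup, a vendor release that changes the response shape fails fast in CI instead of surfacing in production.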
Assuring the User Experiences in AIaaS Adoption
In the new realities of remote work and life, customer experience has become imperative for business success. It is an even greater objective in AI adoption. Non-functional testing is a proven practice that delivers meaningful customer experience by validating attributes such as performance, security, and accessibility. Next-gen technologies have added more complexity to experience assurance in general.
Here are some important design considerations for experience assurance in the overall AI testing framework.
- Design for the experience rather than test for it. Enterprise AI strategy should be derived from an end-user perspective. It is important to make sure that the testing team represents the actual customers. Involving customers early in the design not only improves the design but also gives them early access, which helps earn their trust.
- Agility and automation are delivered through a build-test-optimize model. Testing cycles at the scrum level should have considerations for user experience. Early testing for experiences will help implement a build-test-optimize cycle.
- Continuous security with an Agile approach is critical. Make the corporate security team part of the Agile team to 1) own and validate the organization’s threat model at the scrum level, and 2) evaluate the structural vulnerabilities (from a hypothetical hacker’s point of view) of all the multi-channel interfaces that the SaaS AI solution architecture may have.
- Speed is critical. Attributes of the AI data such as volume, velocity, variety, and variability would force pre-processing, parallel/distributed processing, and/or stream processing. Testing for performance will help optimize the design for distributed processing that is required for the speed that the users expect from the system.
- Nuances of text and voice testing are important. Many research surveys suggest that conversational AI remains at the top of the corporate agenda. New technologies such as augmented reality, virtual reality, edge AI, etc., continue to emerge. Hence, testing text, voice, and natural language processing should be addressed.
- Simulation helps test the limits. Checking for user scenarios is fundamental for experience assurance. When it comes to AI, testing for exceptions, errors, and violations would help predict the system behavior, which in turn helps us validate the error/fault tolerance level of AI applications.
- Trust, transparency, and diversity. It is critically important to verify the trust that enterprise users develop in the AI outcome; to validate the transparency requirements of the data sources and algorithms, to reduce risk and grow confidence in AI; and to ensure diversity in the data sources and in user/tester involvement, to check AI ethics and accuracy. To do this, testers need not only deeper domain knowledge but also technical know-how of the data, algorithms, and integration processes within the larger enterprise IT.
Continuous testing is a fundamental requirement for AI platform adoption. We should adopt a modular approach to perfecting the design of data, algorithm, integration, and experience assurance activities. This will let us create a continuous testing ecosystem in which enterprise IT is always ready to accept frequent changes to internal and external AI components.
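Simulating exceptions and errors to validate fault tolerance can be done with fault injection. The Python sketch below wraps a hypothetical AI client so that a configurable fraction of calls raise `TimeoutError`. It then exercises an assumed tolerance policy, which retries and finally degrades to a fallback answer, under different fault rates. The policy, the retry count, and the fallback text are illustrative assumptions, not a prescribed design.

```python
import random


class FaultInjectingClient:
    """Wraps a callable and raises TimeoutError on a configurable fraction of calls."""

    def __init__(self, fn, failure_rate, seed=0):
        self.fn = fn
        self.failure_rate = failure_rate
        self.rng = random.Random(seed)  # seeded so injected faults are reproducible
        self.calls = 0

    def __call__(self, *args):
        self.calls += 1
        if self.rng.random() < self.failure_rate:
            raise TimeoutError("injected fault")
        return self.fn(*args)


def answer_with_fallback(client, question, retries=2, fallback="Sorry, please try again later."):
    """Hypothetical error-tolerance policy under test: retry, then degrade gracefully."""
    for _ in range(retries + 1):
        try:
            return client(question)
        except TimeoutError:
            continue
    return fallback
```

With `failure_rate=1.0`, the policy should make exactly three attempts and then return the fallback. With `failure_rate=0.0`, it should return the real answer on the first call. Intermediate rates give an empirical fallback ratio that can be compared with the tolerance level agreed for the AI application.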
- Kaynak, O. (2021). The golden age of Artificial Intelligence.
- Bommadevara, N., Del Miglio, A., & Jansen, S. (2018). Cloud adoption to accelerate IT modernization.
- Brčić, M., Došilović, F. K., & Hlupić, N. (2018). Explainable artificial intelligence: A survey.
- Babar, M. A., Shahin, M., & Zhu, L. (2017). Continuous Integration, Delivery and Deployment: A Systematic Review on Approaches, Tools, Challenges and Practices.
- Arachchi & Perera, I. (2018). Continuous Integration and Continuous Delivery Pipeline Automation for Agile Software Project Management.
- Lachmann, R. (2018). Machine Learning-Driven Test Case Prioritization Approaches for Black-Box Software Testing.
- CIO Wiki (2021). Enterprise Information Integration (EII).
- MuleSoft. Understanding enterprise application integration - The benefits of ESB for EAI.
- Microsoft Azure. What is PaaS?
- Attili, I., Azzeh, M., Nassif, A., Shaalan, K., & Shahin, I. (2019). Speech Recognition Using Deep Neural Networks: A Systematic Review.
- Doetsch, P., Kozielski, M., & Ney, H. (2014). Fast and Robust Training of Recurrent Neural Networks for Offline Handwriting Recognition.
- Chan, D., Mahoor, M. H., & Mollahosseini, A. (2016). Going deeper in facial expression recognition using deep neural networks.
- Bastida, J. K. (2019). Applying Machine Learning to Root Cause Analysis in Agile CI/CD Software Testing Environments.
- Bao, X., Qian, J., Wu, B., Xiong, Z., Zhang, N., & Zhang, W. (2017). Path-oriented test cases generation based adaptive genetic algorithm.
- Kandel, A., Last, M., & Vanmali, M. (2002). Using a neural network in the software testing process.
- Al-Refai, M., Cazzola, W., & Ghosh, S. (2017). A Fuzzy Logic Based Approach for Model-Based Regression Test Selection.
Opinions expressed by DZone contributors are their own.