Technical Solutions Used for AI
TensorFlow, Python, R, and Spark were the most frequently mentioned out of more than 20 solutions.
To gather insights on the state of artificial intelligence (AI) and all of its sub-segments, including machine learning (ML), natural language processing (NLP), deep learning (DL), robotic process automation (RPA), regression, et al., we talked to 21 executives who are implementing AI in their own organizations and helping others understand how AI can help their business. We began by asking, "What are the technical solutions (languages, tools, frameworks) you or your clients are using for AI initiatives?" Here's what they told us:
- We pride ourselves on flexibility and try not to pin ourselves to any one technology around AI. Depending on the problem, we might use Python (pandas, scikit-learn, Anaconda, TensorFlow, PyTorch), R, WEKA, Go, or even Excel to build and analyze models. We focus on the science, not the frameworks that get us there. If a particular technology can help us do something faster, that’s what we use. By remaining flexible, we can iterate more quickly and remain agile.
- There are no magic bullets other than understanding your world, your business, your data, and getting more of it. The tools people are using are what you’d expect. A data platform is involved to bring data in from across the organization. I don’t know anyone using just one tool. Everyone tries a bunch of them to learn what works for them. TensorFlow, R, H2O, MXNet, and Caffe are all widely used for cross-referencing because they scale differently, and different algorithms do better on parallel hardware or GPUs. A GPU may be fast enough to take the place of a dozen or several dozen CPUs in a cluster, but not all algorithms are amenable to that. Most people have some GPUs, and some algorithms can use GPUs in parallel. Some algorithms are really cheap to train, and they run sequentially on a single machine. So, you have this panoply of approaches. You want your data stored in a single platform and usable by all these tools so you don’t have to move data or copy it to a laptop or a GPU. You want it to be universally accessible. It should be directly usable by all these tools. A successful platform should blend in, like a tennis player and their racket. A good platform should be at the front of the action, but you should use it without thinking about it.
- Our platform is on the operational side. The language we see a lot of is Python, with some R and some Java. The inference side of AI and the training side don’t use the same languages. On the inference side, we see Python. On the training side, we see a mixture of Python, R, Java, and Scala. Last year, R was trending ahead of Python. This year, we see Python taking the lead from R. An online study verifies this. For frameworks, we see Spark and TensorFlow; teams use TensorFlow in the lab with the intent to move it into production. No one is standardizing on a single framework. There are new frameworks for reinforcement learning. Inference has a lot of heterogeneity: Spark, containerized Python, streaming in IoT, and a lot of REST. It is much less standardized than training. Training uses Spark, TensorFlow, PyTorch (Facebook's deep learning framework), scikit-learn, and MATLAB.
- Analytics engines: Spark, TensorFlow, and Flink. Cloud providers: AWS, Azure, and GCP. Stream brokers: Elastic, NiFi, Flume, RabbitMQ, Kafka, and Kinesis. Data lakes: Avro, Cassandra, Hadoop, Apache HBase, JDBC, HDFS, and NFS. We help clients build a composite of models in the context of a reusable business service. The output is containerized and deployed on Spark or TensorFlow. We run and manage tasks and jobs while picking up statistics, which we then slice and dice for business reports and compliance.
- There is a lot of open source deep learning. A TensorFlow ML pipeline lets you spend more time testing and training the model. It's nice to see Microsoft, Google, and Amazon competing for your data. We use C++ for math libraries and memory allocation, and NumPy and Fortran for raw efficiency with data sets. These are easy to write by hand. Depending on the dataset size, you may need C++; you have to understand the pipeline data and how much it costs to train on it. You can use Crystal, then Go, then C++, and eventually GPU programming to do serious computations. Google is now writing ML kits and is sending ML instructions directly to the GPUs.
- Our platform is unique in its ability to drive OLAP workloads and run TensorFlow. Talking to customers, we see Python is the language of choice. For some, C or C++ is used for cluster analysis. There are also specific models for verticals, such as DeepChem for pharma, built atop TensorFlow. TensorFlow is receiving widespread adoption. It can be distributed, and the ability to run models in a distributed way becomes important. You can distribute UDFs or Python code across servers for faster training and inference.
- TensorFlow has greatly democratized the ability to use AI. It makes it easier for small teams to do big things.
- As the platform for sending every type of email, we handle a myriad of types of transactional and promotional messages for senders large and small. To increase the throughput and efficiency of your email program, we have built intelligent queuing into our infrastructure. Our adoption of ensemble methods such as Random Forest has led to improved prediction. Random Forest helps correct for the problem of overfitting: when a statistical model describes random error or noise instead of the underlying relationship. In addition, our adoption of TensorFlow, an open source Python library, has accelerated our progress through its rapid calculation execution and its robust visualization tools. TensorFlow is now the main system we use to build and train our neural networks to detect and decipher patterns and correlations.
- A successful AI solution hinges on tools for two fundamental building blocks: 1) how efficiently data is collected, and 2) how data is analyzed via ML and visualization, with automation based on the insights. Big data and ML technologies and their underlying languages are critical to delivering these fundamental capabilities. API-based integration using standard REST APIs is paramount to quickly connect to data sources across a disparate environment and collect data for processing. Python is the core programming language, and its interactive command-line terminal is heavily used by data scientists and developers to build ML algorithms. The Elastic ELK stack has proved to be a powerful platform for big data processing, along with Spark, a Storm-like framework that augments data processing, correlation, and insights development. Java is the underlying core language for all of the big data platform technologies.
- On the research side of things, we use Python. On the deployment side of things, we are looking at distributed frameworks like Apache Spark, Neo4j, and others that are more problem-specific. We use Jupyter Notebooks to translate from R&D to engineering (E). The math happens at the research layer. At the development level, we have machine learning engineers who take the research and figure out how to operationalize and scale it. Once it is understood there, we can consider it an engineering problem. A critical piece in that puzzle is how to translate from research to development. Jupyter Notebooks let someone show code, text, and images in one document. If the data changes, you can re-run the notebook, which repopulates the whole report. This makes it much easier for R&D and engineering teams to operationalize and productize research. A secondary benefit is that this also encourages rigorous documentation.
- To appeal to developers, open source vendors and cloud vendors are competing with each other. Open source enables a lot of cool research. This is a rapidly evolving field. You need to be able to take the package that works for you and bind it dynamically to a digital twin deployed at a traffic intersection by pushing it into the fabric. Everything in the open source environment, such as Spark ML, is there to embrace, but the time scales differ: the edge environment does not evolve very fast. You need general abilities to build a useful digital twin and attach the latest and greatest algorithm. Data flow pipelines are stream-based, and you need to be able to deploy new algorithms in the streams on the fly.
- We are embracing the development community by embracing open source, including Apache Spark and Hadoop. The ways in which people interact with the infrastructure are based on open source: Python, Scala, R, and Jupyter are all standards of a kind and well known in the data science community. People are less tied to a particular data science technology.
- Our tools are usable by a working engineer: someone trained as a mechanical engineer who understands instrumentation and data but is not a signal processing or ML expert. We package signal processing and ML so that engineers can use them without understanding more than the fundamentals. We are very focused on specific uses and use cases, and we do not require the user to be an ML expert.
- We use a number of libraries when it comes to building the Audioburst technology. Off-the-shelf libraries give us the ability to improve our natural language processing (NLP) and voice recognition technology without having to reinvent the wheel. By reusing existing engines, we can incorporate them into our AI platform to streamline the way we analyze and segment audio content.
- The way you address and access data is where our solution comes in, adding a level of security. Many companies collect data without looking at it; for us, it is an inherent part of our offering that once the user taps into the data, we check access and compliance. SAP believes the platform will bring businesses up to speed with current technology. Depending on what is stored, data can massively impact business outcomes. In turn, businesses must start being more open-minded when it comes to adopting these new approaches.
- Because AI projects are largely greenfield opportunities, organizations start with a clean slate when designing the supporting infrastructure. We see customers deploying GPU-based servers such as the HPE’s 6500 or Nvidia’s DGX-1, NVMe-based SSDs, and 100Gb/second Ethernet or EDR InfiniBand networks to handle the high data ingest rates and complex analytics of ML and AI. To fully realize the capability of all this specialty hardware, a low-latency, high-performance, shared file system is necessary to ensure fast access to large datasets. We provide the world’s fastest file system that provides high performance, low latency, and cloud scale in a single solution deployed on commodity hardware. A single GPU server requires over 4 gigabytes of ingested data per second to be fully utilized. Anything less, and the server is I/O bound waiting for data, wasting valuable time and money. Matrix is optimized for AI workloads and will ensure your GPU servers are fully utilized. It also offers multi-protocol support and seamless access to on-demand resources in the cloud. This ability to burst to the cloud for additional resources means organizations no longer have to buy infrastructure based on peak needs. It also serves as a form of remote backup and disaster recovery without the need for dedicated hardware or software.
- The layer of core technologies consists of: 1) Machine learning is a type of A.I. that provides computers with the ability to learn without being explicitly programmed. 2) Deep Neural Networks are inspired by the human brain and nervous system and can be leveraged to advance speech recognition and natural language understanding as part of the mission to facilitate better communication between people and machines. 3) Deep Learning is a type of ML that applies neural networks with multiple layers of processing and analysis. Based on this, we have developed a range of technologies that help to convert unstructured data into structured data and information: 1) Automatic Speech Recognition (ASR) allows users to speak to a system as they would to another human. 2) Natural Language Generation/Text-To-Speech (TTS) can vocalize data to consumers in a way that is nearly indistinguishable from human speech. 3) Natural Language Understanding (NLU) allows consumers to speak naturally to a system using their own words, be understood, and achieve their desired outcome. For example, an NLU system can distinguish whether the consumer is talking about a crane (construction equipment) or a crane (the bird). 4) Computer vision or image classification and analysis finds relevant information in images, like finding nodules in CT-scans of the lung or anomalies in X-Rays of the chest.
- We are working to solve Wi-Fi assurance issues and ultimately create a self-healing network. There’s so much opportunity in AI for IT operations (AIOps), and we’re excited about the role we’re playing to make IT smarter and faster while ensuring the best experience for wireless users.
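One respondent above credits ensemble methods such as Random Forest with correcting for overfitting. The sketch below is a toy illustration of the averaging intuition behind that claim, not Random Forest itself and not any respondent's implementation: it treats each overfit model as the true value plus independent random noise and compares the error of one model with the error of an average over many. The constants `TRUE_VALUE` and `NOISE_STD` and all function names are hypothetical, chosen only for this example.

```python
import random
import statistics

TRUE_VALUE = 10.0   # hypothetical quantity every model tries to predict
NOISE_STD = 1.0     # hypothetical noise level of a single overfit model


def noisy_prediction(rng):
    """One 'overfit' model: the true value plus independent random noise."""
    return TRUE_VALUE + rng.gauss(0.0, NOISE_STD)


def ensemble_prediction(rng, n_models):
    """Average the predictions of n_models independent noisy models."""
    return statistics.fmean(noisy_prediction(rng) for _ in range(n_models))


def mean_squared_error(predictions):
    """Mean squared distance between the predictions and the true value."""
    return statistics.fmean((p - TRUE_VALUE) ** 2 for p in predictions)


def compare(n_trials=2000, n_models=25, seed=0):
    """Return (single-model MSE, ensemble MSE) over n_trials repetitions."""
    rng = random.Random(seed)  # seeded so repeated runs give the same numbers
    single = [noisy_prediction(rng) for _ in range(n_trials)]
    ensemble = [ensemble_prediction(rng, n_models) for _ in range(n_trials)]
    return mean_squared_error(single), mean_squared_error(ensemble)


if __name__ == "__main__":
    single_mse, ensemble_mse = compare()
    print(f"single model MSE: {single_mse:.3f}")
    print(f"ensemble MSE:     {ensemble_mse:.3f}")
```

Because the noise here is independent across models, averaging 25 of them cuts the squared error by roughly a factor of 25. Trees in a real Random Forest are trained on overlapping data and are therefore correlated, so the reduction in practice is smaller; a library such as scikit-learn would be the usual starting point for real experiments.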
Here's who we spoke to:
- Assaf Gad, Vice President and Strategic Partnerships, Audioburst
- Tyler Foxworthy, Chief Scientist, DemandJump
- Patric Palm, CEO, Favro
- Sameer Padhye, CEO, FixStream
- Matthew Tillman, CEO, Haven
- Dipti Borkar, V.P. Product Marketing, Kinetica
- Ted Dunning, Chief Application Architect, MapR
- Jeff Aaron, VP Marketing and Ebrahim Safavi, Data Scientist, Mist Systems
- Dominic Wellington, Global IT Evangelist, Moogsoft
- Dr. Nils Lenke, Director, Corporate Research, Nuance Communications
- Mark Gamble, Senior Director of Product Marketing, OpenText
- Sri Ramanathan, Group Vice President of Mobile, Oracle
- Sivan Metzger, CEO and Co-founder, ParallelM
- Nisha Talagala, CTO and Co-founder, ParallelM
- Stuart Feffer, Co-founder and CEO, Reality AI
- Sven Denecken, SVP Head of Product Management, SAP S/4 Hana Cloud
- Steve Sloan, Chief Product Officer, SendGrid
- Simon Crosby, CTO, Swim
- Liran Zvibel, CEO and Co-founder, WekaIO
- Daniel DeMillard, A.I. Architect, zvelo
Opinions expressed by DZone contributors are their own.