Helping Provide Insight in an Unstructured World
Learn from this article how the pairing of IBM Watson Explorer with IBM Data Science Experience provides visibility in an unstructured world.
In my previous article, I talked about the coming together of SPSS with the IBM Data Science Experience, a valuable add-on to the Data Science Experience platform. In this article, I look at another addition to the platform — the pairing of Watson Explorer with Data Science Experience.
Data Scientists: Traditionalists and Mavericks
If we strip away everything but the analysis portion, I believe data scientists generally fall into two main camps: the traditionalists and the mavericks.
- Traditionalists are used to and prefer packaged solutions. Analytics are a means to an end, and there are typically multiple people working together on various analytics projects either within a department or across a given company.
- Mavericks are coders by nature and experimenters at heart. They don't look for packages to help with analyses but instead look for specific techniques or a way to play with an unusual data type (such as streaming data or social data). Mavericks may start with an application they want to create in mind and then come up with takes on the available data that can be woven together. The maverick tends to work on the data and an analysis and will then pass the result on to someone else to interpret and operationalize.
Both of these profiles need to be taken into account by an analytic platform.
IBM Watson Explorer (WEX) can help organizations understand the "why" and "how" by recognizing patterns in unstructured data, and can help deliver actionable insights that support decisions based on that data. Let's look at three key areas of WEX:
- Explore: Find information scattered across your enterprise fast, by leveraging ML models for relevancy
- Analyze: Discover trends and anomalies hidden in unstructured data rapidly — using the cognitive content miner
- Advise: Introducing Cognitive Advice — suggest the next potential action with confidence using machine learning models for text analytics and classification
WEX can help extract information and context from unstructured text data so that it can be used like structured data. Statistical scores such as frequency and correlation are computed to further describe the extracted information. Key information within a document is extracted, describing the document as meta-data in the form of keywords or facets. The facets are captured in a text index and used to enhance findability, enable the discovery of unknown patterns using the content miner, and provide the basis for the delivery of cognitive advice.
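To make the idea of statistical scores over facets concrete, here is a minimal Python sketch. It is not the WEX implementation: the sample documents, the facet names, and the lift-style ratio used for "correlation" are all assumptions chosen for illustration. It only shows how frequency and a co-occurrence score can be computed once text has been reduced to facet values.

```python
from collections import Counter

# Hypothetical annotated documents: each one is the set of facet values
# extracted from its text (names are illustrative, not WEX output).
docs = [
    {"store:downtown", "issue:long wait", "sentiment:negative"},
    {"store:downtown", "issue:long wait"},
    {"store:airport", "issue:rude staff", "sentiment:negative"},
    {"store:airport", "sentiment:positive"},
]

def facet_frequency(docs):
    """Count how many documents contain each facet value."""
    counts = Counter()
    for facets in docs:
        counts.update(facets)
    return counts

def pair_correlation(docs, a, b):
    """Lift-style ratio (an assumed formula): observed co-occurrence of a and b
    divided by the co-occurrence expected if they were independent."""
    n = len(docs)
    n_a = sum(1 for d in docs if a in d)
    n_b = sum(1 for d in docs if b in d)
    n_ab = sum(1 for d in docs if a in d and b in d)
    if n_a == 0 or n_b == 0:
        return 0.0
    return (n_ab * n) / (n_a * n_b)

freq = facet_frequency(docs)
corr = pair_correlation(docs, "store:downtown", "issue:long wait")
```

A ratio above 1 suggests the two facet values appear together more often than chance would predict, which is the kind of signal a content mining view could surface for further investigation.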
WEX for Data Science Experience is a new offering positioned as shown in figure #1 that provides the text analytics and content mining capabilities of Watson Explorer as an integrated add-on to the IBM Data Science Experience. With this new offering both novice and expert data scientists can more easily:
- Extend an analysis to unstructured data using visual capabilities
- Directly access and analyze the native data sources of Data Science Experience
- Manage Watson Explorer collections as a set of Data Science Experience assets
- Use the Watson Explorer Content Miner to see unknown patterns in text sources and refine information extraction
WEX for DSX Integration Scenarios: A Closer Look
This first scenario introduces the use of Watson Explorer in tandem with SPSS Modeler for Data Science Experience.
The new WEX for DSX add-on introduces an SPSS Watson Explorer node to DSX, which enables the extraction of keywords or facets from text documents and creates structured output that feeds an SPSS prediction model.
The second scenario introduces Python library access to Watson Explorer APIs. API access gives a Data Science Experience Notebook access to all the information in a Watson Explorer document collection, including statistical scores. This can assist data scientists in creating documents that combine code, equations, visualizations, and narratives with keywords, facets, and statistical scores from Watson Explorer.
A subset of capabilities of Watson Explorer is integrated into this new add-on for Data Science Experience including seamless integration of the Cognitive Content Miner and Administrator user interface.
Integration Scenario #1
WEX for DSX provides a set of standard annotators for extracting basic and advanced information from unstructured data. Annotators represent a model of patterns for understanding the text. An administrator or data scientist defines the sources of information to be accessed, which can include the available DSX platform data sources.
When developing a text analytics model, a data scientist can select and enable out-of-the-box annotators. A typical configuration for information extraction would include the parts-of-speech annotator, which provides basic linguistic analysis including phrase constituents. Sentiment analysis and named entity recognition can also be enabled.
Users can use the dictionary annotator along with a new feature called the "domain curator" to create dictionaries from keywords, taxonomies, and ontologies.
The output of this activity is a Watson Explorer node which represents the text analytics model.
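As a rough illustration of what a dictionary-based annotator does, the Python sketch below maps surface forms in text to canonical facet values. The dictionary contents, the `annotate` function, and the whole-word matching rule are hypothetical; the actual dictionary annotator and domain curator in WEX are configured through its user interface and may behave differently.

```python
import re

# Hypothetical dictionary: facet name -> {canonical value: [surface forms]}
DICTIONARY = {
    "issue": {
        "long wait": ["long wait", "waited", "queue"],
        "rude staff": ["rude", "unhelpful"],
    },
    "department": {
        "checkout": ["checkout", "cashier", "register"],
    },
}

def annotate(text, dictionary=DICTIONARY):
    """Return sorted (facet, value) pairs whose surface forms occur in the text."""
    lowered = text.lower()
    found = set()
    for facet, values in dictionary.items():
        for value, forms in values.items():
            for form in forms:
                # Match whole words only, so "rude" does not match "prudent".
                if re.search(r"\b" + re.escape(form) + r"\b", lowered):
                    found.add((facet, value))
                    break  # one matching form is enough for this value
    return sorted(found)

annotations = annotate("I waited 20 minutes and the cashier was rude.")
```

Each matched pair plays the role of a facet attached to the document, which is the structured form that later steps can index, count, or feed into a model.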
The next step is to create an SPSS Modeler flow that uses a Watson Explorer node, shown in Figure #2.
The Watson Explorer node, representing the Watson Explorer NLP model created in the first step, is inserted into the SPSS Modeler flow.
The user then refines a prediction model to incorporate keywords — for example, those extracted from customer complaints about a retailer's location. The input data source contains a text body that might describe a recent event in a retailer's store. Text information provides context as to why a customer was dissatisfied with an experience in a particular store location.
The Watson Explorer node produces a table representing the keywords extracted from the body text, which are then used to refine and improve a customer churn prediction model.
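The step from a keyword table to model inputs can be sketched in plain Python. Everything here is assumed for illustration: the `(customer_id, keyword)` row layout, the customer records, and the `kw_` column naming are not taken from SPSS Modeler or WEX. The sketch only shows how extracted keywords might become binary indicator columns alongside existing structured fields.

```python
# Hypothetical keyword table, one (customer_id, keyword) row per extracted keyword.
keyword_rows = [
    (101, "long wait"),
    (101, "rude staff"),
    (102, "long wait"),
]

# Hypothetical structured records already available to a churn model.
customers = [
    {"customer_id": 101, "tenure_months": 4},
    {"customer_id": 102, "tenure_months": 30},
    {"customer_id": 103, "tenure_months": 12},
]

def add_keyword_features(customers, keyword_rows):
    """Join keyword rows onto customer records as 0/1 indicator columns."""
    vocab = sorted({kw for _, kw in keyword_rows})
    by_customer = {}
    for cid, kw in keyword_rows:
        by_customer.setdefault(cid, set()).add(kw)
    enriched = []
    for record in customers:
        row = dict(record)  # copy so the input records are left unchanged
        seen = by_customer.get(record["customer_id"], set())
        for kw in vocab:
            row["kw_" + kw.replace(" ", "_")] = int(kw in seen)
        enriched.append(row)
    return enriched

features = add_keyword_features(customers, keyword_rows)
```

Customers with no complaint text simply get zeros in every keyword column, so the enriched table keeps one row per customer and can be passed to whatever prediction model is in use.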
After creating the Watson Explorer model and customizing the SPSS Modeler flow, the SPSS prediction model, which includes the WEX NLP runtime extraction model, can be run in the flow as an SPSS worker job on DSX Local Deployment Manager.
Integration Scenario #2
The second scenario integrates Watson Explorer APIs into a DSX Notebook. Information from annotated documents in a WEX collection, ranging from extracted keywords, entities, and sentiment to analytics such as frequency and correlation, is accessible in a Notebook for analysis and collaboration. The add-on integrates a subset of WEX into DSX, helping organizations extend their collaborative data analytics environment to leverage important data trapped in disparate text-based data sources.
This Notebook integration makes information in a WEX collection accessible through the WEX API. The resulting Notebooks, which include WEX text analytics models, can be deployed to DSX Model Management & Deployment.
The SPSS Modeler for DSX integration uses the new WEX node to augment flows with structured data created from unstructured sources.
Organizations can use WEX with DSX in a range of development and deployment scenarios. They can start by using the Notebook integration available in WEX Deep Analytics Edition and use the WEX for DSX add-on when their needs progress to using the WEX node with SPSS Modeler for DSX.
I'm excited to see this new addition to IBM DSX. WEX can help offer value with its combination of open-standards-based Apache UIMA text analytics, natural language processing, and content mining capabilities, helping to deliver an environment for detecting patterns in unstructured data. Let me close with the news that IBM was named a leader in the report The Forrester Wave™: AI-Based Text Analytics Platforms.
For more information on WEX for DSX click here.
Published at DZone with permission of Steven Astorino, DZone MVB. See the original article here.