Simple Sample of the Watson Document Conversion Service

A sample demo of the Watson Document Conversion service on Bluemix, which converts documents into HTML, plain text, or JSON.

With the Watson Document Conversion service on Bluemix, PDF, Word, and HTML documents can be converted into HTML, plain text, or JSON. The converted documents can be used as input to other Watson services such as Concept Insights and Retrieve and Rank.
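As a rough illustration of how the output format is selected, the service takes a small JSON configuration with a `conversion_target` field. The sketch below builds that configuration with `json.dumps` instead of hand-escaped quotes. The helper name `build_config` and the short format keys are hypothetical. Only `ANSWER_UNITS` appears in this post; `NORMALIZED_HTML` and `NORMALIZED_TEXT` are assumed from memory of the service documentation and should be checked against the API explorer.

```python
import json

# Assumed target names: only ANSWER_UNITS is confirmed by this post;
# NORMALIZED_HTML and NORMALIZED_TEXT are assumptions to verify in the docs.
CONVERSION_TARGETS = {
    "json": "ANSWER_UNITS",
    "html": "NORMALIZED_HTML",
    "text": "NORMALIZED_TEXT",
}


def build_config(output_format):
    """Return the JSON string for the `config` form field (hypothetical helper)."""
    try:
        target = CONVERSION_TARGETS[output_format]
    except KeyError:
        raise ValueError("unsupported output format: %r" % output_format)
    return json.dumps({"conversion_target": target})


print(build_config("json"))
```

Building the string with `json.dumps` avoids the nested backslash escaping that a hand-written configuration needs inside a shell command.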

In my Concept Insights sample, I used the service to convert the downloaded HTML files into JSON. The title and body fields were then extracted from the JSON and uploaded to the Concept Insights service. Check out the Python script convert.py to see how to invoke the service via curl for multiple files. Here is the key part.

import shlex
import subprocess
# VERBOSE, DOCCNV_CREDS, htmlfilename and DOCCNV_CNVURL are set elsewhere in the script.
curl_cmd = 'curl -k -s %s -u %s -F "config={\\"conversion_target\\":\\"ANSWER_UNITS\\"}" -F "file=@%s" "%s"' % (VERBOSE, DOCCNV_CREDS, htmlfilename, DOCCNV_CNVURL)
process = subprocess.Popen(shlex.split(curl_cmd), stdout=subprocess.PIPE)
output = process.communicate()[0]
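The extraction of title and body from the returned JSON might look roughly like the sketch below. It is not the code from convert.py. It assumes the response has a top-level `answer_units` list whose items carry a `title` and a `content` list of `media_type`/`text` entries, that the first unit's title is the document title, and that the body is the joined plain-text content. The function name `extract_title_and_body` and the sample document are hypothetical.

```python
import json


def extract_title_and_body(answer_units_json):
    """Return (title, body) from converted JSON (hypothetical helper).

    Assumes an "answer_units" list where each unit has a "title" and a
    "content" list of {"media_type": ..., "text": ...} entries.
    """
    document = json.loads(answer_units_json)
    units = document.get("answer_units", [])
    # Treat the first unit's title as the document title (an assumption).
    title = units[0].get("title", "") if units else ""
    texts = []
    for unit in units:
        for part in unit.get("content", []):
            if part.get("media_type") == "text/plain":
                texts.append(part.get("text", ""))
    # Join the non-empty plain-text parts into one body string.
    body = "\n".join(text for text in texts if text)
    return title, body


# Made-up sample in the assumed response shape, for illustration only.
sample = json.dumps({
    "answer_units": [
        {"title": "Intro", "content": [{"media_type": "text/plain", "text": "First part."}]},
        {"title": "Details", "content": [{"media_type": "text/plain", "text": "Second part."}]},
    ]
})
title, body = extract_title_and_body(sample)
```

In the flow described above, `output` from the curl call would be passed in place of `sample`, and the resulting title and body would then be uploaded to the Concept Insights service.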

Check out the API explorer for samples of how to invoke the service from the command line and from Java, Node, and Python. There are also various customization options.

Here is a sample of the online demo.


The post Simple Sample of the Watson Document Conversion Service appeared first on Niklas Heidloff.

Published at DZone with permission of
