
Simple Sample of the Watson Document Conversion Service

A simple demo of the Watson Document Conversion service on Bluemix, which converts documents into HTML, plain text, or JSON.


With the Watson Document Conversion service on Bluemix, PDF, Word, and HTML documents can be converted into HTML, plain text, or JSON. The converted documents can be used as input to other Watson services such as Concept Insights and Retrieve and Rank.
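The output format is chosen through a `config` form field whose `conversion_target` value names the target. The Python sketch below builds that field. Only `ANSWER_UNITS` (the JSON output) is confirmed by the author's convert.py script; `NORMALIZED_HTML` and `NORMALIZED_TEXT` are assumed names for the HTML and plain-text targets, and the helper `conversion_config` is hypothetical.

```python
import json

# Assumed mapping from a desired output format to the service's conversion_target.
# Only ANSWER_UNITS appears in convert.py; the other two values are assumptions.
CONVERSION_TARGETS = {
    "json": "ANSWER_UNITS",
    "html": "NORMALIZED_HTML",
    "text": "NORMALIZED_TEXT",
}


def conversion_config(output_format):
    """Return the JSON string to send as the multipart 'config' field."""
    try:
        target = CONVERSION_TARGETS[output_format.lower()]
    except KeyError:
        raise ValueError("unsupported output format: %r" % output_format) from None
    return json.dumps({"conversion_target": target})
```

For example, `conversion_config("json")` returns `{"conversion_target": "ANSWER_UNITS"}`. Generating the string with `json.dumps` avoids hand-escaping the nested quotes inside a shell command.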

In my Concept Insights sample, I used the service to convert the downloaded HTML files into JSON. The title and body fields were then extracted from the JSON and uploaded to the Concept Insights service. Check out the Python script convert.py to see how to invoke the service via curl for multiple files. Here is the key part:


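import shlex
import subprocess
# VERBOSE, DOCCNV_CREDS, htmlfilename and DOCCNV_CNVURL are defined elsewhere in convert.py
curl_cmd = 'curl -k -s %s -u %s -F "config={\\"conversion_target\\":\\"ANSWER_UNITS\\"}" -F "file=@%s" "%s"' % (VERBOSE, DOCCNV_CREDS, htmlfilename, DOCCNV_CNVURL)
process = subprocess.Popen(shlex.split(curl_cmd), stdout=subprocess.PIPE)
output = process.communicate()[0]  # JSON (ANSWER_UNITS) that curl wrote to stdout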
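Once `output` holds the service's JSON response, the title and body can be pulled out before uploading them to Concept Insights. This is a minimal sketch, not the code from convert.py. It assumes the ANSWER_UNITS response contains an `answer_units` list whose entries carry a `title` and a `content` list of parts with `media_type` and `text` keys, and it treats the first unit's title as the document title. The function name `extract_title_and_body` is hypothetical.

```python
import json


def extract_title_and_body(output):
    """Return (title, body) from an ANSWER_UNITS response (bytes or str).

    Assumed shape: {"answer_units": [{"title": ..., "content": [
        {"media_type": "text/plain", "text": ...}, ...]}, ...]}
    """
    if isinstance(output, bytes):
        output = output.decode("utf-8")
    units = json.loads(output).get("answer_units", [])
    if not units:
        return "", ""
    # Assumption: the first answer unit's title stands in for the document title.
    title = units[0].get("title", "")
    # Collect the plain-text content of every unit, skipping the HTML variants.
    parts = []
    for unit in units:
        for item in unit.get("content", []):
            if item.get("media_type") == "text/plain":
                parts.append(item.get("text", ""))
    body = "\n\n".join(p for p in parts if p)
    return title, body
```

For example, calling `title, body = extract_title_and_body(output)` right after `process.communicate()` gives the two fields to upload.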

Check out the API explorer for samples of how to invoke the service from the command line and from Java, Node, and Python. The service also supports various customization options.
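If you would rather call the service from Python without shelling out to curl, the same multipart POST (a `config` field plus a `file` field, sent with basic auth) can be assembled with the standard library. This sketch only builds the request: the URL, username, and password are placeholders for your own service credentials, the helper `build_conversion_request` is hypothetical, and, unlike the `-k` flag in convert.py, it leaves TLS certificate verification on.

```python
import json
import mimetypes
import os
import urllib.request
import uuid
from base64 import b64encode


def build_conversion_request(url, username, password, filename, data,
                             target="ANSWER_UNITS"):
    """Build (but do not send) a multipart POST equivalent to the curl -F call."""
    boundary = uuid.uuid4().hex.encode("ascii")
    crlf = b"\r\n"
    config = json.dumps({"conversion_target": target}).encode("utf-8")
    file_type = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    disposition = 'Content-Disposition: form-data; name="file"; filename="%s"' % (
        os.path.basename(filename))
    body = b"".join([
        # Part 1: the JSON config field, like -F "config={...}"
        b"--" + boundary, crlf,
        b'Content-Disposition: form-data; name="config"', crlf, crlf,
        config, crlf,
        # Part 2: the document itself, like -F "file=@..."
        b"--" + boundary, crlf,
        disposition.encode("utf-8"), crlf,
        ("Content-Type: %s" % file_type).encode("ascii"), crlf, crlf,
        data, crlf,
        # Closing boundary
        b"--" + boundary + b"--", crlf,
    ])
    # Basic auth header, like curl's -u user:password
    token = b64encode(("%s:%s" % (username, password)).encode("utf-8"))
    headers = {
        "Content-Type": "multipart/form-data; boundary=" + boundary.decode("ascii"),
        "Authorization": "Basic " + token.decode("ascii"),
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")
```

To convert several files, build one request per file and send it with `urllib.request.urlopen(req)`; assuming the same endpoint and config, the response body should be the same JSON that curl writes to stdout.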

Here is a sample of the online demo.

[Screenshot of the online demo]

The post Simple Sample of the Watson Document Conversion Service appeared first on Niklas Heidloff.

Published at DZone with permission of Niklas Heidloff, DZone MVB. See the original article here.