
Simple Sample of the Watson Document Conversion Service


A sample demo of the Watson Document Conversion service on Bluemix, which converts documents into HTML, plain text, or JSON.


With the Watson Document Conversion service on Bluemix, PDF, Word, and HTML documents can be converted into HTML, plain text, or JSON. The converted documents can then be used as input to other Watson services such as Concept Insights and Retrieve and Rank.
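The output format is chosen through a small JSON configuration sent along with the document. The following is a minimal sketch of building that configuration in Python. ANSWER_UNITS is the value used in the script excerpt later in this post; the NORMALIZED_HTML and NORMALIZED_TEXT names, and the build_config helper itself, are assumptions for illustration and should be checked against the service documentation.

```python
import json

# Map a desired output format to the service's conversion_target value.
# ANSWER_UNITS is the value used in this post; the other two names are assumptions.
CONVERSION_TARGETS = {
    "json": "ANSWER_UNITS",
    "html": "NORMALIZED_HTML",
    "text": "NORMALIZED_TEXT",
}


def build_config(output_format):
    """Return the JSON string for the "config" form field (hypothetical helper)."""
    try:
        target = CONVERSION_TARGETS[output_format]
    except KeyError:
        raise ValueError("unsupported output format: %r" % output_format)
    return json.dumps({"conversion_target": target})


if __name__ == "__main__":
    print(build_config("json"))
```

The returned string would take the place of the hard-coded config value in a curl call, which keeps the quoting in one spot when switching between output formats.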

In my Concept Insights sample, I used the service to convert the downloaded HTML files into JSON. The title and body fields were then extracted from the JSON and uploaded to the Concept Insights service. See the Python script convert.py for how to invoke the service via curl for multiple files. Here is the key part.

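# Build the curl call: upload one HTML file and request ANSWER_UNITS (JSON) output
curl_cmd = 'curl -k -s %s -u %s -F "config={\\"conversion_target\\":\\"ANSWER_UNITS\\"}" -F "file=@%s" "%s"' % (VERBOSE, DOCCNV_CREDS, htmlfilename, DOCCNV_CNVURL)
# Run curl without a shell and capture the service response from stdout
process = subprocess.Popen(shlex.split(curl_cmd), stdout=subprocess.PIPE)
output = process.communicate()[0]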
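Once the response is captured in output, the title and body can be pulled out of the JSON. The sketch below shows one way this might look; it is not the code from convert.py. It assumes the ANSWER_UNITS response holds a top-level "answer_units" list whose entries carry a "title" and a "content" list of media_type/text pairs. The extract_title_and_body helper and the sample data are hypothetical.

```python
import json


def extract_title_and_body(conversion_output):
    """Collect a title and plain-text body from an ANSWER_UNITS response.

    Assumed layout (not confirmed by this post): a top-level "answer_units"
    list whose items have a "title" and a "content" list of
    {"media_type": ..., "text": ...} entries.
    """
    # subprocess returns bytes, so decode before parsing.
    if isinstance(conversion_output, bytes):
        conversion_output = conversion_output.decode("utf-8")
    document = json.loads(conversion_output)
    units = document.get("answer_units", [])

    # Use the first unit's title as the document title.
    title = units[0].get("title", "") if units else ""

    # Join the plain-text content of every unit into one body string.
    parts = []
    for unit in units:
        for content in unit.get("content", []):
            if content.get("media_type") == "text/plain":
                parts.append(content.get("text", ""))
    return title, "\n".join(parts)


# Made-up sample response in the assumed layout, for illustration only.
sample_output = json.dumps({
    "answer_units": [
        {
            "title": "Getting Started",
            "content": [
                {"media_type": "text/html", "text": "<p>First paragraph.</p>"},
                {"media_type": "text/plain", "text": "First paragraph."},
            ],
        },
        {
            "title": "Next Steps",
            "content": [
                {"media_type": "text/plain", "text": "Second paragraph."},
            ],
        },
    ]
}).encode("utf-8")

title, body = extract_title_and_body(sample_output)
```

Taking the first unit's title and joining all plain-text content is only one possible choice; a document with many sections might be better served by uploading each answer unit separately.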

The API explorer has samples showing how to invoke the service from the command line and from Java, Node, and Python. The service also offers various customization options.

Here is a sample of the online demo.


The post Simple Sample of the Watson Document Conversion Service appeared first on Niklas Heidloff.



Published at DZone with permission of
