Over a million developers have joined DZone.

Mass Conversion From Word To HTML

I got 1000 word files. Each contains 1 main image and some decorations.
(This is actually a big book scanned into 1-file-per-page format)
I need to extract all the images. What do I do?

Python can do some automation using COM. (or something like that)

import pythoncom, win32com.client

app = win32com.client.gencache.EnsureDispatch("Word.Application")

doc = 'C:\\lang\\try\\bdham\\p1'
app.Documents.Open(doc + '.doc')
app.ActiveDocument.SaveAs(doc + '.html', FileFormat=win32com.client.constants.wdFormatHTML)
# now repeat with p2, p3, etc.

Actually, I should put it in a loop. But this non-loop version
is easier to read and remember.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}