Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

Extracting Text From an Image: Experiments With Shape Detection

DZone 's Guide to

Extracting Text From an Image: Experiments With Shape Detection

I just want to be able to copy text that is held inside images in the browser.

· Integration Zone ·
Free Resource

I had a little downtime after Google IO, and I wanted to scratch a long-term itch I've had. I just want to be able to copy text that is held inside images in the browser. That is all. I think it would be a neat feature for everyone.

It's not easy to add functionality directly into Chrome, but I know I can take advantage of the intent system on Android, and I can now do that with the Web (or at least Chrome on Android).

Two new additions to the web platform — Share Target Level 2 (or as I like to call it File Share) and the TextDetector in the Shape Detection API — have allowed me to build a utility that I can Share images to and get the text held inside them.

The basic implementation is relatively straightforward; you create a Share Target and a handler in the Service Worker, and then once you have the image that the user has shared, you run the TextDetector on it.

The Share Target API allows your web application to be part of the native sharing sub-system, and in this case, you can now register to handle all image/* types by declaring it inside your Web App Manifest as follows.

"share_target": {
  "action": "/index.html",
  "method": "POST",
  "enctype": "multipart/form-data",
  "params": {
    "files": [
      {
        "name": "file",
        "accept": ["image/*"]
      }
    ]
  }
}

When your PWA is installed, you will see it in all the places where you share images:

The Share Target API treats sharing files like a form post. When the file is shared to the Web App the service worker is activated the fetch handler is invoked with the file data. The data is now inside the Service Worker but I need it in the current window so that I can process it, the service knows which window invoked the request, so you can easily target the client and send it the data.

self.addEventListener('fetch', event => {
  if (event.request.method === 'POST') {
    event.respondWith(Response.redirect('/index.html'));
    event.waitUntil(async function () {
      const data = await event.request.formData();
      const client = await self.clients.get(event.resultingClientId || event.clientId);
      const file = data.get('file');
      client.postMessage({ file, action: 'load-image' });
    }());

    return;
  }
  ...
  ...
}

Once the image is in the user interface, I then process it with the text detection API.

navigator.serviceWorker.onmessage = (event) => {  
  const file = event.data.file;
  const imgEl = document.getElementById('img');
  const outputEl = document.getElementById('output');
  const objUrl = URL.createObjectURL(file);
  imgEl.src = objUrl;
  imgEl.onload = () => {
    const texts = await textDetector.detect(imgEl);
    texts.forEach(text => {
      const textEl = document.createElement('p');
      textEl.textContent = text.rawValue;
      outputEl.appendChild(textEl);
    });
  };
  ...
};

The biggest issue is that the browser doesn't naturally rotate the image (as you can see below), and the Shape Detection API needs the text to be in the correct reading orientation.

I used the rather easy-to-use EXIF-Js library to detect the rotation and then do some basic canvas manipulation to re-orientate the image.

EXIF.getData(imgEl, async function() {
  // http://sylvana.net/jpegcrop/exif_orientation.html
  const orientation = EXIF.getTag(this, 'Orientation');
  const [width, height] = (orientation > 4) 
                  ? [ imgEl.naturalWidth, imgEl.naturalHeight ]
                  : [ imgEl.naturalHeight, imgEl.naturalWidth ];

  canvas.width = width;
  canvas.height = height;
  const context = canvas.getContext('2d');
  // We have to get the correct orientation for the image
  // See also https://stackoverflow.com/questions/20600800/js-client-side-exif-orientation-rotate-and-mirror-jpeg-images
  switch(orientation) {
    case 2: context.transform(-1, 0, 0, 1, width, 0); break;
    case 3: context.transform(-1, 0, 0, -1, width, height); break;
    case 4: context.transform(1, 0, 0, -1, 0, height); break;
    case 5: context.transform(0, 1, 1, 0, 0, 0); break;
    case 6: context.transform(0, 1, -1, 0, height, 0); break;
    case 7: context.transform(0, -1, -1, 0, height, width); break;
    case 8: context.transform(0, -1, 1, 0, 0, width); break;
  }
  context.drawImage(imgEl, 0, 0);
}

And Voila, if you share an image to the app, it will rotate the image and then analyze it, returning the output of the text that it found.

It was incredibly fun to create this little experiment, and it has been immediately useful for me. It does, however, highlight the inconsistency of the web platform. These API's are not available in all browsers, they are not even available in all version of Chrome — this means that as I write this article on Chrome OS, I can't use the app, but when I can use it...OMG, so cool. Let me know what you think in the comments!

Topics:
integration ,tutorial ,extracting text from an image ,shape detection

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}