Object Detection and Augmentation in Modern Web Development
Often when we think of augmentation we think of AR/VR on handheld devices, but in this article, a dev explores the phenomenon in the context of the web.
I’ve been playing around a lot with the Shape Detection API in Chrome, and I really like its potential. For example, a very simple QRCode detector I wrote a long time ago ships with a JS polyfill but uses the new `BarcodeDetector` API when it is available.
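That feature-detect-then-polyfill pattern can be sketched roughly as below. The `qrPolyfillDetect` stub is a placeholder for whatever JS decoder you bundle, not a real library:

```javascript
// Decide whether the native BarcodeDetector is available,
// or whether we must fall back to a bundled JS implementation.
function chooseDetector(globalObj = globalThis) {
  return typeof globalObj.BarcodeDetector !== 'undefined' ? 'native' : 'polyfill';
}

// Placeholder for a pure-JS QR decoder shipped with the page.
async function qrPolyfillDetect(image) {
  throw new Error('No JS QR polyfill bundled in this sketch');
}

// Detect QR codes in an image, preferring the native API.
async function detectQRCodes(image, globalObj = globalThis) {
  if (chooseDetector(globalObj) === 'native') {
    const detector = new globalObj.BarcodeDetector({ formats: ['qr_code'] });
    return detector.detect(image); // resolves to an array of detected barcodes
  }
  return qrPolyfillDetect(image);
}
```

The nice property of this shape is that users on browsers with the native API get the fast path for free, while everyone else still gets a working experience.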
You can see some of the other demos I’ve built here: https://paul.kinlan.me/face-detection/, https://paul.kinlan.me/barcode-detection/, and https://paul.kinlan.me/detecting-text-in-an-image/
I stumbled across Jeeliz over the weekend and was incredibly impressed by the performance of their toolkit. Granted, I was using a Pixel 3 XL, but face detection seemed significantly quicker than what is possible with Chrome’s FaceDetector API.
It got me thinking a lot. This toolkit (and others like it) only uses APIs that are broadly available on the web (unlike Chrome’s Shape Detection API), so we can reach billions of users with these experiences relatively easily, and we can now do so safely. That means we can build the fun Snapchat-esque face-filter apps without having users install massive apps that harvest a huge amount of data from their device (because the web gives no underlying access to the system).
Outside of the fun demos, it’s possible to solve very advanced use-cases quickly and simply for the user, such as:
- Text Selection directly from the camera or photo input by the user.
- Live translation of languages from the camera.
- Inline QRCode detection so people don’t have to open WeChat all the time.
- Auto-extract website URLs or addresses from an image.
- Credit card detection and number extraction (get users signed up to your site quicker).
- Visual product search in your store’s web app.
- Barcode lookup for more product details in your store's web app.
- Quick cropping of profile photos to people’s faces.
- Simple A11Y features to let a user hear the text found in images.
I spent just five minutes thinking of these use cases (I know there are many more), but it struck me that we don’t see a lot of sites or web apps utilizing the camera. Instead, we see a lot of sites asking their users to download an app, and I don’t think we need to do that any more.
Published at DZone with permission of Paul Kinlan, DZone MVB. See the original article here.