
Microsoft Cognitive Services: Computer Vision


The Computer Vision service recognizes different things in photos, provides a list of tags, describes objects and living things, detects faces, and performs text recognition.


Similar to the Face API, the Computer Vision API deals with image recognition, though on a somewhat wider scale. The Computer Vision service recognizes different things in a photo and tries to describe what's going on: it forms a sentence describing the whole photo, provides a list of tags, and describes objects and living things in it. Like the Face API, it detects faces. It can even do basic text recognition, both printed and handwritten.

Create a Computer Vision Service Resource on Azure

To start experimenting with the Computer Vision API, you first have to add the service on the Azure dashboard.


The steps are almost identical to those I described in my Face API blog post, so I won't repeat them all here; the only thing worth mentioning is pricing. There are currently two tiers: the free tier (F0) allows 20 API calls per minute and 5,000 calls per month, while the standard tier (S1) offers up to 10 calls per second. Check the official pricing page for details.

Hit the Create button and wait for the service to be created and deployed (it should take less than a minute). You get a new pair of keys for accessing the service; the keys are, again, available under Resource Management > Keys.

Trying It Out

To try out the service yourself, you can either use the ready-to-test API testing console on the official documentation page or download the C# SDK from NuGet (the source code includes samples for UWP, Android, and iOS (Swift)).

Also, the source code used in this article is available from my Cognitive Services playground app repository.

For this blog post, I'll be using the aforementioned C# SDK.

When using the SDK, the most versatile Computer Vision call is AnalyzeImageAsync:

var result = await visionClient.AnalyzeImageAsync(stream,
    new[] { VisualFeature.Description, VisualFeature.Categories, VisualFeature.Faces, VisualFeature.Tags });
var detectedFaces = result?.Faces;
var tags = result?.Tags;
var description = result?.Description?.Captions?.FirstOrDefault()?.Text;
var categories = result?.Categories;
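Under the hood, the SDK deserializes the JSON returned by the service's analyze endpoint. To make the shape of that response concrete, here is a Python sketch that pulls out the same pieces of information as the C# snippet above. The JSON values are made up for illustration; only the overall structure (description, tags, faces, categories) mirrors what the API returns:

```python
import json

# Illustrative response, shaped like a Computer Vision v1.0 analyze result;
# the concrete values are invented for this sketch, not real API output.
sample = json.loads("""
{
  "description": {"captions": [{"text": "a dog sitting on grass", "confidence": 0.92}],
                  "tags": ["dog", "grass", "outdoor"]},
  "tags": [{"name": "dog", "confidence": 0.99}, {"name": "grass", "confidence": 0.95}],
  "faces": [{"age": 3, "gender": "Unknown",
             "faceRectangle": {"left": 10, "top": 20, "width": 50, "height": 50}}],
  "categories": [{"name": "animal_dog", "score": 0.98}]
}
""")

def first_caption(result):
    """Mirror the null-safe C# chain result?.Description?.Captions?.FirstOrDefault()?.Text."""
    captions = (result.get("description") or {}).get("captions") or []
    return captions[0]["text"] if captions else None

detected_faces = sample.get("faces", [])
tags = [t["name"] for t in sample.get("tags", [])]
description = first_caption(sample)
categories = [c["name"] for c in sample.get("categories", [])]
```

Note how every access is guarded, just like the `?.` chain in C#: any of the top-level sections can be missing if you didn't request the corresponding visual feature.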

Depending on the visualFeatures parameter, AnalyzeImageAsync can return one or more types of information (some of them also separately available by calling other methods):

  • Description: One or more plain-English sentences describing the content of the image.
  • Faces: A list of detected faces; unlike the Face API, the Vision API returns age and gender for each of the faces.
  • Tags: A list of tags related to image content. 
  • ImageType: Whether the image is a clip art or a line drawing. 
  • Color: The dominant colors and whether it's a black and white image.
  • Adult: Indicates whether the image contains adult content (with confidence scores).
  • Categories: One or more categories from a set of 86 concepts arranged in a two-level taxonomy.

The details parameter lets you specify domain-specific models that you want to test against. Currently, two models are supported: landmarks and celebrities. You can call the ListModelsAsync method to get all models that are supported, along with categories they belong to.
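When a domain-specific model matches, its findings are nested inside the matching category's detail object. As a sketch of what that looks like for the celebrities model, here is an illustrative category entry and the extraction logic (the name and values are invented, not real output):

```python
# Illustrative category entry carrying a domain-specific "celebrities" detail,
# shaped like a Computer Vision v1.0 category; the values are made up.
category = {
    "name": "people_",
    "score": 0.77,
    "detail": {"celebrities": [{"name": "Example Person", "confidence": 0.91,
                                "faceRectangle": {"left": 0, "top": 0,
                                                  "width": 100, "height": 100}}]}
}

def celebrity_names(cat):
    """Pull celebrity names out of a category's domain-specific detail, if present."""
    detail = cat.get("detail") or {}
    return [c["name"] for c in detail.get("celebrities", [])]

names = celebrity_names(category)
```

Categories without a matching domain model simply omit the detail object, which is why the extraction guards against its absence.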


Another fun feature of the Vision API is recognizing text in images, whether it's printed or handwritten.

var result = await visionClient.RecognizeTextAsync(stream);
var region = result?.Regions?.FirstOrDefault();
var words = region?.Lines?.FirstOrDefault()?.Words;

The RecognizeTextAsync method will return a list of regions where printed text was detected, along with the overall text angle and orientation of the image. Each region can contain multiple lines of (presumably related) text, and each line object will contain a list of detected words. Region, Line, and Word will also return coordinates pointing to the area within the image where that piece of information was detected.
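The region > line > word hierarchy is easy to flatten back into plain text. Here's a Python sketch walking an illustrative OCR-style response (the bounding boxes and words are invented; only the nesting mirrors the API's region/line/word structure, with bounding boxes given as "left,top,width,height" strings):

```python
# Illustrative OCR result, shaped like a Computer Vision v1.0 text-recognition
# response; concrete values are made up for this sketch.
ocr = {
    "language": "en",
    "textAngle": 0.0,
    "orientation": "Up",
    "regions": [{
        "boundingBox": "10,10,200,60",
        "lines": [
            {"boundingBox": "10,10,200,25",
             "words": [{"boundingBox": "10,10,60,25", "text": "Hello"},
                       {"boundingBox": "80,10,80,25", "text": "world"}]},
            {"boundingBox": "10,40,150,25",
             "words": [{"boundingBox": "10,40,150,25", "text": "again"}]}
        ]
    }]
}

def region_text(region):
    """Join each line's words with spaces, emitting one output line per detected line."""
    return "\n".join(" ".join(w["text"] for w in line["words"])
                     for line in region.get("lines", []))

text = "\n".join(region_text(r) for r in ocr["regions"])
```

Reassembling text this way preserves the detected line breaks, which is usually what you want when showing recognized text back to the user.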


Topics:
ai, microsoft cognitive services, computer vision, api, tutorial

Published at DZone with permission of Andrej Tozon, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
