Is Computer Vision Difficult To Use?

Computer vision is a branch of artificial intelligence that lets computers see, interpret, and analyze the visual world.

By Amrutha TESR · Mar. 14, 24 · Analysis

We humans see objects, places, and people with our eyes. We have been gifted with a natural tool for analyzing and detecting objects, one that helps us identify things in our vicinity. But have you ever wondered how face lock works on both Android phones and iPhones? Do computers also have an eye that keeps looking at the world the way humans do?

Computer Vision

Computer vision is a branch of artificial intelligence that lets computers see, interpret, and analyze the visual world. It uses machine learning to identify the objects it sees and to group them with similar objects. The machine learning model used for this has already been trained for the job.

But, in the process of identifying and classifying objects, a few difficulties can have a major effect on the final result.

1) Information Loss During Conversion of 3D to 2D

When a camera captures an object, the main trouble comes from the pinhole model that we use. A pinhole camera is a box with a small hole in it, and it is the usual model for perspective projection.

The real trouble with the pinhole model is that the projective transformation sees a small object close to the camera in the same way as a large object far from it. We humans need a 'yardstick' to judge the actual size of an object in such a case, and the computer does not have one.

The actual size of the object is not captured, so a coin, a bat, and a building can all appear the same size when seen as an image in the computer.
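
The size ambiguity can be sketched with the ideal pinhole relation h = f * H / Z, where H is the real height, Z the distance from the pinhole, and f the focal length. This is only an illustration: the `SceneObject` class, the `projected_height` helper, the 50 mm focal length, and all sizes and distances are assumptions chosen so that H / Z is the same for every object.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    height_m: float    # real-world height in metres (hypothetical value)
    distance_m: float  # distance from the pinhole in metres (hypothetical value)


def projected_height(obj: SceneObject, focal_length_m: float = 0.05) -> float:
    """Height of the object's image under an ideal pinhole model: h = f * H / Z."""
    return focal_length_m * obj.height_m / obj.distance_m


# Sizes and distances picked so that H / Z is identical for all three objects.
objects = [
    SceneObject("coin", 0.025, 0.25),
    SceneObject("bat", 1.0, 10.0),
    SceneObject("building", 30.0, 300.0),
]

for obj in objects:
    print(f"{obj.name}: {projected_height(obj) * 1000:.2f} mm on the image plane")
```

All three objects come out at the same projected height, so the image alone cannot tell the coin from the building without some extra cue about distance or scale.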

2) Interpretation

When we humans analyze an image, we draw on knowledge and experience gathered over many years to interpret it and get insights from it. We have spent several years training artificial intelligence models to understand what they observe, but that ability is still limited. Several mathematical tools are used to raise the level of interpretation.

3) Noise

Noise is present in every measurement of an image. We use mathematical tools that deal with this uncertainty. Noise cannot be removed completely, and using such tools can complicate the image analysis.
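
As a rough illustration of that trade-off, the sketch below smooths one noisy image row with a moving average. The moving average is just one simple tool picked for the example, not a method named in this article, and the Gaussian noise level, the row layout, and the `moving_average` helper are all assumptions.

```python
import random
import statistics


def moving_average(signal, radius=2):
    """Smooth a 1D signal with a box filter; borders use the samples that exist."""
    smoothed = []
    for i in range(len(signal)):
        window = signal[max(0, i - radius): i + radius + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed


random.seed(0)  # fixed seed so the run is repeatable

# One image row with a sharp edge: dark (0) on the left, bright (100) on the right.
clean = [0.0] * 50 + [100.0] * 50
noisy = [v + random.gauss(0, 10) for v in clean]  # assumed Gaussian noise, sigma = 10
denoised = moving_average(noisy)

# Noise in a flat region goes down after smoothing...
flat_before = statistics.pstdev(noisy[5:45])
flat_after = statistics.pstdev(denoised[5:45])

# ...but the same filter turns the one-pixel step in the clean row into a ramp.
blurred_edge = moving_average(clean)[48:52]

print(f"noise std in flat region: {flat_before:.1f} -> {flat_after:.1f}")
print("edge after smoothing the clean row:", blurred_edge)
```

The filter reduces the noise in the flat part of the row, but it also spreads the sharp edge over several pixels, which is the kind of complication that later analysis steps then have to cope with.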

4) Large Data

The image and video files that we use take up a lot of memory. An A4 sheet of paper scanned monochromatically at 300 dots per inch, at 8 bits per pixel, comes to roughly 8.5MB. Non-interlaced 24-bit RGB color video at 512 * 768 pixels and 25 frames per second makes a data stream of about 225 megabits (roughly 28MB) per second.

Unless the processing is very simple, it is hard to achieve real-time performance, which means processing 25 to 30 images per second.
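
The figures above can be checked with a few lines of arithmetic. The sketch assumes 8 bits per pixel for the monochrome scan, 25 frames per second for the video, and binary units (1 MiB = 1024 * 1024 bytes); these are assumptions used to make the calculation concrete.

```python
MIB = 1024 * 1024  # bytes per mebibyte

# A4 page: 210 mm x 297 mm, scanned at 300 dpi, assuming 1 byte (8 bits) per pixel.
MM_PER_INCH = 25.4
width_px = round(210 / MM_PER_INCH * 300)
height_px = round(297 / MM_PER_INCH * 300)
a4_bytes = width_px * height_px * 1

# Video: 512 x 768 pixels, 24-bit RGB (3 bytes per pixel), assuming 25 frames per second.
frame_bytes = 512 * 768 * 3
video_bytes_per_s = frame_bytes * 25
video_mbit_per_s = video_bytes_per_s * 8 / MIB

print(f"A4 scan: {width_px} x {height_px} px, {a4_bytes / MIB:.1f} MiB")
print(f"video: {video_bytes_per_s / MIB:.1f} MiB/s = {video_mbit_per_s:.0f} Mbit/s")

# Time available per frame if processing has to keep up with 25 frames per second.
print(f"per-frame budget: {1000 / 25:.0f} ms")
```

Under these assumptions the scan is a little over 8 MiB, and the video stream works out to 225 megabits, or about 28 megabytes, per second, which leaves 40 ms to process each frame.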

5) Local View vs Global View

An image analysis algorithm typically looks at a small unit of memory, such as a single pixel and its local neighborhood, so the computer sees the image as if through a keyhole. Seen through a keyhole, it is much harder to understand what the image depicts. Humans, in contrast, find an image easy to interpret because they see it globally.
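
A small sketch of the keyhole effect: the two toy images below are different shapes, yet the 3 x 3 neighborhood around the same pixel is identical in both. The images and the `neighbourhood` helper are made up for illustration.

```python
def neighbourhood(image, row, col, radius=1):
    """Return the square patch of side 2 * radius + 1 centred on (row, col)."""
    return [line[col - radius: col + radius + 1]
            for line in image[row - radius: row + radius + 1]]


# Toy binary images stored as rows of characters: '1' is foreground, '0' is background.
vertical_line = ["0001000"] * 7

box_outline = [
    "0000000",
    "0001110",
    "0001010",
    "0001010",
    "0001010",
    "0001110",
    "0000000",
]

patch_line = neighbourhood(vertical_line, 3, 3)
patch_box = neighbourhood(box_outline, 3, 3)

print("patch from the line image:", patch_line)
print("patch from the box image: ", patch_box)
print("local patches identical:", patch_line == patch_box)
print("whole images identical: ", vertical_line == box_outline)
```

An algorithm that only inspects this patch cannot tell whether it is looking at a line or at the side of a box; that decision needs context from the rest of the image.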

Conclusion

This blog covered the main difficulties faced when processing images with computer vision. Once we overcome these difficulties, we can make computer vision accessible to all.

I hope you enjoyed reading this blog. Please like it and share your views on today's topic in the comments. Visit my profile for more blogs like this.

Happy learning!!
