Automatic 2D to 3D Conversion for Your Selfies?

DZone 's Guide to

Automatic 2D to 3D Conversion for Your Selfies?

Not your grandfather' s Viewmaster, that's for sure. The algorithm will help feed the VR monster's content appetite.

· IoT Zone ·
Free Resource

3-D movies have really taken off in the last 10 years. The movie Avatar may have provided the largest push in this direction for the industry. But we have been fascinated with 3-D ever since we have been making photographs.

One of my favorite Hitchcock movies "Dial M for Murder" was filmed in in 1954. What you may not know is that it was originally shot in "real" 3-D. There were two separate lenses for left and right eye views, and it was shot in color. It required a special projector with polarizing filters and polarizing viewing glasses much like the kind glasses we use today. It remains one of the most subtle, sophisticated, impactful 3-D movies ever made and I recommend seeing it. (Author's note: As soon as it becomes available for Gear VR or Oculus Rift I promise I will be watching it in that format!) But even as early as the beginning of the 17th century people were thinking about stereo viewers, and in the early 19th century stereoscopes were not uncommon. In fact, the devices and the image cards are quite collectible. We have been intrigued with 3-D technology for quite a while. My guess is we won't be satisfied until we get a holodeck.

But the main impediment to the introduction of any 3-D viewing system is the production of the 3-D media. Back in the day of stereoscopes, one needed either two cameras or one special camera with two lenses. Today's movies, shot with modern small digital cameras (e.g. Red) find it far more practical to get the dual view shots. But even so many of these feature films are still shot with a single monocular image which is later turned into stereoscopic images in a post process that uses human "depth artists" to manually map each frame of the shot onto a depth armature that the "depth artist" sculpts. It is highly labor-intensive and obviously subjective. The 3-D images can have "impact" but usually don't seem quite right. Most reviewers, both on the artistic and the technical side, agree that the effect is much less realistic and it is much more likely to give you a "3-D headache". (Author's note: For the record, I currently only watch 3-D that was shot in 3-D. Find out what movies are real 3-D here.)

But what about all those ordinary, monocular images we already have, not to mention all the images we will continue to make? What if we could add the cachet of believable 3-D to those 2-D images? And what if this method could be consistent and good enough to make realistic 3-D movies (without "depth artists") from 2-D movies?

Well, perhaps some good news! Researchers Junyuan Xie, Ross Girshick, and Ali Farhadi at the University of Washington have found a way to train models using deep convolutional neural networks. You can read the paper here, and I do recommend looking at this paper. Yes, there are a few equations that may be a little dense but it is very well written and you will find much of it enjoyable and understandable.

Image title

Figure 1 from the paper

What is really intriguing about their method is not the convolutional neural nets (which are great, don't get me wrong) but rather how they got the corpus of training data. As I mentioned, the industry has been creating a growing number of "real" 3-D movies for the last decade. The researchers realized that they had training data with two perfect stereoscopic images which they could use to train against the individual monocular images. There were millions of frames of the cleanest and most ideal input data one could imagine. It was all properly lighted and color balanced and artifact free. And, in case you aren't a data scientist, data is usually the biggest problem in any machine learning task. On almost any given machine learning project more human effort is expended on collecting, cleaning, restructuring, etc. the input data than on the algorithms and the analysis. This corpus could only be described as "data bliss" for the deep3D project.

Image title

The system isn't perfect yet, but from the looks of this very early work it looks extremely promising. Judging from the level of investment in virtual reality technology today coupled with the dearth of content for those platforms and environments I expect to see a lot of rapid advancement.

For those of you who are feeling extremely adventurous the deep 3-D code is available on GitHub here! You will need some other stuff, just follow the instructions to get mxnet.

So how long will it be before the algorithm is in your smart phone? One of you ingenious readers out there could make an app. Seriously, write the app and I would be more than happy to review it. Just let me know :)

3d parallax, machine learning, rendering

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}