This article is an excerpt from Oculus Rift in Action by Bradley Austin Davis, Karen Bryla and Alex Benton.
Virtual reality is about constructing an experience that simulates a user’s physical presence in another environment. The Rift accomplishes this by acting both as a specialized input device and a specialized output device.
As an input device, the Rift uses a combination of sensors to let an application query the current orientation and position of the user's head. This is commonly referred to as the head pose. It allows the application to change its output in response to changes in where the user is looking and how their head is positioned.
In VR applications, a head pose is a combination of the orientation and position of the head relative to some fixed coordinate system.
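The two components of a head pose can be sketched as a small data structure. This is a hypothetical, minimal representation for illustration only; the type and field names below are stand-ins, not the Oculus SDK's actual types.

```cpp
// Hypothetical minimal head-pose representation; these are
// illustrative stand-ins, not the Oculus SDK's actual types.
struct Quaternion { float x, y, z, w; };  // orientation of the head
struct Vector3    { float x, y, z; };     // position, in meters

struct HeadPose {
    Quaternion orientation;  // which way the head is facing
    Vector3    position;     // where the head is, relative to a fixed origin
};

// An identity pose: facing straight ahead at the coordinate origin.
inline HeadPose identityPose() {
    return HeadPose{ {0.0f, 0.0f, 0.0f, 1.0f}, {0.0f, 0.0f, 0.0f} };
}
```

A quaternion is a common choice for the orientation because it composes and interpolates cleanly, but any rotation representation would serve the definition above.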
As an output device, the Rift is a display that creates a deep sense of immersion and presence by reproducing, far more closely than a monitor can, the sensation of looking at an environment as if you were actually there. It does this by
- Providing a much wider field of view than conventional displays
- Providing a different image to each eye
- Blocking out the real environment around you, which would otherwise contradict the rendered environment
On the Rift, we can show frames that have been generated to conform to this wide field of view and that offer a distinct image to each eye.
Because developing for the Rift involves rendering multiple images, it’s important to have terminology that makes clear which image we’re talking about at a given moment. When we use the term frame, we’re referring to the final image that ends up on the screen. In a Rift application, each frame is composed of two eye images, one for the left eye and one for the right. Each eye image is distorted specifically to account for the lens under which it will appear, and the two are then composited together during the final rendering step before being displayed on the screen.
These specializations do not happen automatically. You can’t simply replace your monitor with a Rift and expect to continue to use your computer in the same way. Only applications that have been specifically written to read the Rift input and customize the output to conform to the Rift’s display will provide a good experience.
To understand what makes a Rift application different, it helps to compare it with a conventional, non-Rift application.
All applications have input and output, and most graphical applications run a loop that conceptually looks something like figure 1.
Figure 1: The typical loop for conventional applications
The details can be abstracted in many ways, but just about any program can eventually be viewed as an implementation of this loop. For as long as the application is running, it responds to user input, renders a frame, and outputs that frame to the display.
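The loop in figure 1 can be sketched in code. The types and functions below are stand-ins invented for illustration, not any real windowing or graphics API:

```cpp
// Hypothetical sketch of the conventional loop in figure 1; the
// Input, Frame, and Display types are stand-ins, not a real API.
struct Input   { bool quitRequested = false; /* ...user events... */ };
struct Frame   { int id = 0; };
struct Display {
    int framesShown = 0;
    void show(const Frame&) { ++framesShown; }
};

Frame renderFrame(const Input&, int frameId) {
    // Build the output image from application state (stubbed here).
    return Frame{ frameId };
}

// Run the input -> render -> output loop for a fixed number of
// iterations; a real application would loop until the user quits.
int runLoop(Display& display, int iterations) {
    Input input;
    for (int i = 0; i < iterations; ++i) {
        // 1. Handle user input (stubbed).
        // 2. Render a frame.
        Frame frame = renderFrame(input, i);
        // 3. Output the frame to the display.
        display.show(frame);
    }
    return display.framesShown;
}
```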
Rift-specific applications embellish this loop, as seen in figure 2.
Figure 2: A typical loop for a Rift application
In addition to conventional user input, there is a new step that fetches the current head pose from the Rift. The application typically uses this to change how it renders the frame. Specifically, if you’re rendering a 3D virtual environment, you want the view of the scene to change in response to the user’s head movements.
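As a toy illustration of the view responding to head movement, assume the head’s orientation has been reduced to a single yaw angle (a real application would use the full orientation quaternion) and rotate the camera’s forward vector by it:

```cpp
#include <cmath>

// Hypothetical sketch: derive the camera's forward vector from the
// head's yaw angle alone. Real applications apply the full head
// orientation; a single yaw keeps the illustration short.
struct Vec3 { float x, y, z; };

Vec3 viewForward(float yawRadians) {
    // Start looking down -Z; yaw rotates about the vertical (Y) axis.
    return Vec3{ -std::sin(yawRadians), 0.0f, -std::cos(yawRadians) };
}
```

With a yaw of zero the camera looks down the negative Z axis; as the user turns their head, the forward vector sweeps around the vertical axis to match.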
In addition to the rendering step, we also need to distort the rendered frame to account for the effects of the Rift’s lenses.
Practically speaking, the head pose is really a specialized kind of user input, and the Rift-required distortion is part of the overall process of rendering a frame, but we’ve called them out here as separate boxes to emphasize the distinction between Rift and non-Rift applications.
However, as we said, the Rift shows a different image to each eye by showing each eye only one half of the display panel on the device. To generate a single frame of output, we render an individual image for each eye and distort it before moving on to the other eye. Then, after both per-eye images have been rendered and distorted, we send the resulting output frame to the device.
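Putting the pieces together, the per-frame work of the Rift loop in figure 2 might be sketched as follows. Every type and function here is an illustrative stand-in, not the Oculus SDK API:

```cpp
#include <array>
#include <initializer_list>

// Hypothetical sketch of one iteration of the Rift loop in figure 2;
// all names are illustrative stand-ins, not the Oculus SDK API.
struct HeadPose { /* orientation + position, omitted for brevity */ };

enum class Eye { Left = 0, Right = 1 };

struct EyeImage {
    Eye  eye = Eye::Left;
    bool distorted = false;  // has lens distortion been applied?
};

HeadPose fetchHeadPose() {
    // A real application would query the Rift's sensors here.
    return HeadPose{};
}

EyeImage renderEye(Eye eye, const HeadPose&) {
    // Render the scene from this eye's point of view, offset from
    // the head pose by the eye's position relative to the head.
    return EyeImage{ eye };
}

void distort(EyeImage& image) {
    // Warp the eye image to cancel the distortion of the lens it
    // will be viewed through.
    image.distorted = true;
}

// Produce one output frame: both eye images, rendered and
// distorted, composited side by side.
std::array<EyeImage, 2> renderRiftFrame() {
    HeadPose pose = fetchHeadPose();
    std::array<EyeImage, 2> frame{};
    for (Eye eye : { Eye::Left, Eye::Right }) {
        EyeImage image = renderEye(eye, pose);
        distort(image);
        frame[static_cast<int>(eye)] = image;
    }
    return frame;  // a real application now sends this to the device
}
```

Note that the head pose is fetched once per frame and shared by both eyes, while rendering and distortion happen once per eye, which matches the flow described above.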