
3 Augmented Reality Frameworks for Windows Phone



Augmented reality (AR) solutions and projects are seen as that next step for a user interface or experience that melds technology with the real world. With the embedding of AR camera features in devices such as the Nintendo 3DS (and you don’t even need to buy a game to play around with it), GPS recognition systems, and audio detection, AR is slowly becoming mainstream.

While AR is often perceived (by developers at least) as just too difficult or cumbersome to implement, thanks to the developer community this just isn't the case. This article walks through what features are available with AR, explains terminology, and provides a high-level comparison of some of the premiere frameworks for use on Windows Phone.


Types of AR

This section details the three core concepts of AR, where we take input from the real world and merge it into the app experience.

·      3D space

AR frameworks need to understand the 3D space of the world you are viewing and potentially manipulating. The principles are the same as for 3D games.

With 3D space types of AR, you apply a 3D representation of what you are displaying, proportional to the view through a normal camera, based either on the axis on which the device is held or on the position of objects in the real world. This could mean deforming the video image or simply overlaying an object or image on top of the video. Interaction is important, whether manipulating the 3D view or just interacting with it.

3D space is important because you are trying to meld a 3D view through a camera with an artificial 3D view generated by the app or game. In most cases this is achieved with AR markers, or with spatial or feature recognition in advanced cases, as described in the next section. Knowing where you are and what you are looking at, in relation to what you intend to display, is key.

·      Video recognition 

A more advanced AR system recognizes what it reads from the video input by depth, motion, or definition.

The simplest recognition method is to use some kind of fixed image (commonly referred to as an AR card or marker), ranging from a QR code to a strongly colored image like the 3DS AR cards. A distinctive image is key, so the software can easily recognize where in the camera view the image is and what its current orientation is.

More advanced systems can actually recognize the world without the use of such cards. These systems can recognize features such as faces (as commonly used in modern-day cameras) and also depth (usually with the aid of IR depth reading cameras, like the Kinect). However, these systems generally require complex algorithms and, more importantly, power—both CPU and electric, both of which are limited on most mobile form factors.  In many cases, this level of AR isn’t needed and can be achieved with the simpler solution using fixed images.

One of the more fascinating projects I’ve seen using this approach is the display of the final construction of a shopping center on a handheld device / tablet. This display allows you to virtually walk around the site and see the constructed building before it’s even built.

·      Geo position

Another slant on the AR world is an object’s exact location in relation to where the camera or person is in the real world. This uses the GPS coordinate of the object in relation to the GPS coordinate of the user and the direction he or she is facing. Most applications that employ this create a collection of places (like restaurants and gas stations) and display icons or images when the camera is facing a place’s general direction. Implementations also use nearby surroundings to alter the view or game world as the player is using it. For example, a Hangman-style game may search for points of interest near the player and then offer up words relating to them (e.g., food items if near a supermarket, or sports celebrities if near a sports stadium).
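To make the relationship between the two GPS coordinates concrete, the heading from the user to a place can be computed with standard great-circle math. The sketch below is a minimal, framework-free illustration (the class and method names are my own):

```csharp
using System;

public static class GeoMath
{
    // Initial bearing (degrees clockwise from north) from the user's
    // position to a target position, using the forward-azimuth formula.
    public static double BearingTo(double userLat, double userLon,
                                   double targetLat, double targetLon)
    {
        double lat1 = userLat * Math.PI / 180.0;
        double lat2 = targetLat * Math.PI / 180.0;
        double dLon = (targetLon - userLon) * Math.PI / 180.0;

        double y = Math.Sin(dLon) * Math.Cos(lat2);
        double x = Math.Cos(lat1) * Math.Sin(lat2) -
                   Math.Sin(lat1) * Math.Cos(lat2) * Math.Cos(dLon);

        // Normalize to the 0..360 range
        return (Math.Atan2(y, x) * 180.0 / Math.PI + 360.0) % 360.0;
    }
}
```

Comparing this bearing against the compass heading reported by the device tells you whether the place is roughly in front of the camera.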

On platforms like Android and iOS where an application can run continually in the background, the app can alert the user when he or she is near a place and provide related information. Even when the user isn’t viewing a video feed at the time, the app can still recognize that person’s surroundings. Such techniques have been used a lot in GeoTracking projects.

The Toolkits

When deciding to implement AR into your app or game (or when considering the design of your app/game), it’s important to choose a framework that is best geared to assist you to meet your goals.

Looking around, there are several AR toolkits available, and if you browse through the App Hub catalogue you’ll find at least one AR sample there as well. However, the App Hub sample is more an exploration of the Motion API (one of the best) than of AR. Even with its simple use of the camera feed, it just displays two textures relative to the position of the device (the frame of the video and the 3D overlay).

Each of the toolkits described below has unique abilities aimed at solving certain issues or providing unique features. The following overview provides a high-level description of each of the top three AR toolkits. There are no doubt many more out there, but these make up the cream of the crop for simplicity and ease of use.

·      Silverlight AR Toolkit (SLAR)



On its website, http://slartoolkit.codeplex.com/, the SLAR Toolkit is described as a flexible AR library for Silverlight and Windows Phone, with the aim of making real-time AR applications with Silverlight as easy and fast as possible. It can be used with Silverlight's Webcam API (or with any other CaptureSource), WriteableBitmap, or Windows Phone's PhotoCamera.

The SLAR Toolkit is based on the established ARToolkit (http://www.hitl.washington.edu/artoolkit) and NyARToolkit (http://www.artoolworks.com/products/desk-top/nyartoolkit), which use a dual-license model. It can be used for open- or closed-source applications under certain conditions.

It has implementations for Silverlight 4 / 5 (including hardware acceleration in SL 5) and it’s available for Windows Phone.

If you are feeling adventurous, you could also use this in a SilverXNA project (the Silverlight XNA integration in WP 7.1), potentially rendering 3D on top of your Silverlight framework. A great example of this was shown off at Build using Kinect, where QR codes were placed around a room and different interactions were kicked off when the camera detected different codes. You could use recognition instead of the video to kick off actions. Think beyond!

Project Examples

Here are a few images of what others have done with the SLAR toolkit:

Using the Framework

The SLAR Toolkit has one of the easiest implementations I’ve seen. It breaks down like this:

o   Set up a standard video capture source.

o   Initialize the AR recognition engine passing the capture source as a reference.

o   Hook up to the events for the AR recognition engine.

o   Start the camera capture source.


For example:

// Load the AR marker from the generated marker file
var markerSlar = Marker.LoadFromResource(
         "data/Marker_SLAR_16x16segments_80width.pat", 16, 16, 80.0);

// Initialize the detector with the camera capture source
ArDetector = new CaptureSourceMarkerDetector(
         captureSource, 1, 4000, new List<Marker> { markerSlar });

// Hook up to the marker detection event
ArDetector.MarkersDetected += (s, e) =>
{
    var detectedResults = e.DetectionResults;
    // Do something with the detection results here
};


All that is left is to do something when the AR marker is detected. (The site already comes with a prepared AR marker to use, but you can make your own if you wish.) The samples and documentation on the site give some basic implementation examples that can help get you on your way.

There’s a great article here that shows this implementation on Windows Phone: http://kodierer.blogspot.co.uk/2011/05/augmented-mango-slartoolkit-for-windows.html

In my own experience of implementing SLAR in a SilverXNA project, it is important to remember that you are melding two separate worlds. The SLAR Toolkit gives you exactly what you need to position your XNA 3D world based on what the camera is seeing. Remember, the camera can move on its own (thanks to the person holding it) and is no longer under the control of the game engine!
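As a sketch of that melding, each detection result from the SLAR Toolkit carries the detected marker's pose, which you can carry over into your XNA world matrix. The property name and the conversion helper below are assumptions for illustration; check the toolkit samples for the exact types:

```csharp
ArDetector.MarkersDetected += (s, e) =>
{
    var result = e.DetectionResults.FirstOrDefault();
    if (result != null)
    {
        // Assumed: the result exposes the marker's pose relative to the
        // camera. Convert it to an XNA Matrix and use it as the world
        // matrix so the 3D model sits on top of the detected marker.
        xnaWorldMatrix = ToXnaMatrix(result.Transformation); // hypothetical helper
    }
};
```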


Alternatively, you can use the Balder3D engine in Silverlight for a different twist. It is compatible with Windows Phone. The SLARToolkit even has samples written for Balder! (http://balder.codeplex.com/)


If you get stuck, there are many out there willing to help via the discussions on the CodePlex site or from its creator, Rene Schulte (http://kodierer.blogspot.co.uk).

·      Goblin XNA




Goblin is a completely different beast than the SLAR Toolkit. Goblin offers a much wider range of capabilities and features right out of the box, including physics and networking support via additional open-source libraries that are cleverly integrated.

Goblin provides:

o   Full support for 3D scene manipulation and rendering.

o   6DOF (six-degrees-of-freedom) position and orientation tracking.

o   Support of the Vuzix iWear VR920 head-worn display in monoscopic and stereoscopic modes (3D viewing).

o   A 2D GUI system to allow the creation of classical 2D interaction components.


This extra complexity does come at a little extra cost in terms of brain matter. What this really comes down to is that you have to think in 5D: not just the position and orientation of the 3D scene you are drawing, but also the relative position where you are drawing the scene, plus the position and orientation of the camera / person holding the camera.

This gives more detailed and expansive options in what you can implement. You only have to look at some of the research projects done with Goblin, ranging from simple interactive surfaces to a full 3D island maze where 3D objects collide with real-world objects.

On the bright side, the framework does a lot of the grunt work and math; the only limit is your imagination.

Project Examples

Here are a few images of what others have done with the Goblin toolkit:



Using the Framework

At the time of writing, Goblin XNA supports up to XNA 4.0. Even though there isn’t a specific Windows Phone release at the moment, the library will run under Windows Phone and there are plans to release a phone version later.

I’d recommend a solid understanding of XNA before you start down this path.

Similar to SLAR, to use the toolkit to recognize markers you would initialize Goblin as follows:

// Add this video capture device to the scene so that it can be used for
// the marker tracker
scene.AddVideoCaptureDevice(captureDevice);

// Create an optical marker tracker that uses the ARTag library
tracker = new ARTagTracker();

// Set the configuration file to look for the marker specifications
tracker.InitTracker(638.052f, 633.673f, captureDevice.Width,
        captureDevice.Height, false, "ARTag.cf");

// Set the marker tracker to use for our scene
scene.MarkerTracker = tracker;

// Display the camera image in the background. Note that this parameter
// should be set after adding at least one video capture device to the
// Scene class.
scene.ShowCameraImage = true;


After that, the instructions walk you through composing your 3D scene and using the built-in physics; the toolkit takes care of the rest.

As stated, it’s not quite as easy as SLAR, but it is very powerful once you have mastered the framework.

*Note: Goblin won’t work “out of the box” on Windows Phone and will require a little effort to make it work with the CameraSource used on WP. It’s not difficult, as the WP CameraSource is based on the DirectX version. At the time of writing, the Goblin team was working on a source port for WP7; check the discussion on the site for more info.


Channel 9 article explaining Goblin use end-to-end:


Goblin article using ALVAR for advanced tracking:



·      GEO AR Toolkit (GART)




GART was created by Jared Bienz, a Microsoft employee living in Houston, Texas. Jared helps developers build applications for Windows Phone, so if you build something cool with GART he'd love to hear about it.

The GART project describes itself as a framework that was created to help people quickly and easily build AR applications for Windows Phone.

This kit is different from other AR kits in that it enables “Geo AR.” Where other toolkits place virtual things on top of specially printed tags, this toolkit places information on top of real places in the world around you by tracking where you are and the direction you’re facing.

Geo AR apps are easy to write because all you need to provide is a collection of objects that have latitude and longitude points. These can come from anywhere—for instance, a Bing restaurant search, a Flickr photo search, or a Wikipedia article search. The framework then takes care of managing sensors and tracking where the user is in relation to the reference points. It can show where the points are in relation to user from a top-down perspective (on a map) or it can show where the points are as a virtual lens into the real world.

Please note that GART makes heavy use of the Motion APIs shipping with Windows Phone Mango (OS 7.5), so it is recommended that you have a motion-enabled device to use GART (if the device does not have a gyroscope, the Motion API will attempt to compensate with whatever sensors are available). This should include all devices that ship with 7.5 as well as many of the existing 7.0 devices that have been upgraded to 7.5. The emulator, unfortunately, does not currently fully support the Motion API.

You could view GART as an extension to the SLAR Toolkit. However, GART does not use any of the video recognition or tracking features and solely relies on the Geo data it has to work with.

The toolkit simply functions by using the GPS on the device (or one of the other location services available on Windows Phone) in combination with the Motion API to discover where you are and which direction you are facing. It then discovers places near you from the web and plots them out. If a place is in your field of view through the camera, then a tag is displayed.
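Conceptually, the "is it in your field of view" test reduces to: is the bearing from the user to the place within half the camera's horizontal field of view of the direction the user is facing? GART handles this internally; the hypothetical sketch below just illustrates the idea:

```csharp
using System;

public static class ViewTest
{
    // Returns true if a place at the given bearing should be tagged on
    // screen, given the user's heading and the camera's horizontal field
    // of view (all values in degrees).
    public static bool IsInView(double bearingToPlace, double userHeading,
                                double fieldOfViewDegrees)
    {
        // Smallest angular difference between the two directions (0..180)
        double delta = Math.Abs(bearingToPlace - userHeading) % 360.0;
        if (delta > 180.0) delta = 360.0 - delta;

        return delta <= fieldOfViewDegrees / 2.0;
    }
}
```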

The framework is not limited to just the camera and can be easily extended to offer additional information about the places of interest on the device in other ways (arrows pointing to places to the left or right, for example).

The toolkit also offers additional services out of the box to further enhance the user’s experience:

·       HeadingIndicator – Draws a circle with a cone that rotates to show the user’s heading. Good for layering on top of a map.

·       OverheadMap – Displays a Bing map that remains centered on the user’s location. Normally, the map is fixed to ‘North Up’, but it can also rotate to the user’s heading.

·       VideoPreview – Essentially, a rectangle that displays video from the camera. Normally, it’s placed as the first (or lowest) layer and it’s set to fill the screen. But you can have multiple instances of this layer at different sizes and locations. For example, you could have an OverheadMap fill the screen with a small video preview in the corner.

·       WorldView – Displays a virtual world in 3D space and applies matrix math to keep the virtual world aligned with what’s seen through the camera.


Note that there is a difference between facing and travelling. A person can be driving north on the freeway but looking west out the window.

Items to display are simply sourced from a list of items which can be local (good for GeoCaching types of projects) or from Bing (the toolkit includes Bing search functionality).

Project Examples

Here are a few images of what others have done with the GART toolkit:



Pinbucket and the GART sample project


Using the Framework

The basic steps anyone would follow to build an app with the toolkit are:

·       Start a new Windows Phone project

·       Add an ARDisplay control to your page

·       Add the views (or layers) you want as children of the ARDisplay

·       In Page.OnNavigatedTo call ARDisplay.StartServices()

·       In Page.OnNavigatedFrom call ARDisplay.StopServices()

·       Create a collection of ARItem objects (or your own custom type that inherits from ARItem)

·       Set the GeoLocation property of each ARItem to a location in the real world

·       Set ARDisplay.Items equal to your new collection


As GART is Silverlight-based, all the necessary code is built into the UserControl provided by GART, so implementing it purely comes down to adding the control to your page:

   <Grid x:Name="LayoutRoot">
       <ARControls:ARDisplay x:Name="ARDisplay" d:LayoutOverrides="Width">
           <ARControls:VideoPreview x:Name="VideoPreview" />
           <ARControls:OverheadMap x:Name="OverheadMap" CredentialsProvider="{StaticResource BingCredentials}" />
           <ARControls:WorldView x:Name="WorldView" />
           <ARControls:HeadingIndicator x:Name="HeadingIndicator" HorizontalAlignment="Center" VerticalAlignment="Center" />
       </ARControls:ARDisplay>
   </Grid>


And then initializing the control in your page “OnNavigatedTo” and stopping it in your “OnNavigatedFrom” methods of your page:

        protected override void OnNavigatedTo(System.Windows.Navigation.NavigationEventArgs e)
        {
            // Start AR services
            ARDisplay.StartServices();
            base.OnNavigatedTo(e);
        }

        protected override void OnNavigatedFrom(System.Windows.Navigation.NavigationEventArgs e)
        {
            // Stop AR services
            ARDisplay.StopServices();
            base.OnNavigatedFrom(e);
        }

All that remains is to source the labels you want displayed on the viewfinder; the examples provided with the framework let you roll your own or integrate with Bing.
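To make that last step concrete, a locally sourced collection might look like the following sketch. The coordinates are invented, and the Content property name is taken from the GART samples, so verify against the current source:

```csharp
// Build a local collection of points of interest and hand it to GART.
var items = new ObservableCollection<ARItem>();

items.Add(new ARItem
{
    Content = "Coffee Shop",                          // label shown in the viewfinder
    GeoLocation = new GeoCoordinate(51.5033, -0.1195) // invented coordinates
});

// GART tracks the user's position and heading, and displays the tag
// whenever this location enters the camera's field of view.
ARDisplay.Items = items;
```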


The Motion API library reference:


Bing Developer resources:


Pinbucket case study:




Hopefully, this article shows you some of the possibilities with AR, from full-blown AR solutions to a hybrid mix of capabilities that can extend your current and future solutions. Additionally, it is intended to show that AR is not as daunting as it first appears.

Personally, I’d like to see more merged solutions where you blend more of the real world in your applications or extend your application into the real world.

I’ve seen projects with great potential. One such example is Kickstarter, an app for people who run where the app/game tracks where you are and plans a route for you. It sets target packages or activities to do on a run. The runner then collects points or artefacts that they can use at the end of their run to manage a virtual base. Each person’s base competes with other players around the web.

In my opinion, you could take almost any game, point it out of the window or, with a quick web search, drag the real world in and change the experience for the player, making it different every time. You could use QR codes on everyday objects or even set up an Easter egg hunt using an app. Go wild, be fun and creative, and don’t settle—this is an augmented world after all (a quick nod to Deus Ex there).


Opinions expressed by DZone contributors are their own.
