Computer Vision

Welcome to the home of Computer Vision powered by PlayFusion. Computer Vision is one of the many subdomains of Artificial Intelligence, more commonly referred to as ‘AI’. This site focuses specifically on Monocular Visual Odometry on mobile phones and tablet devices, which is another way of saying ‘teaching low powered devices with one eye to see’, as opposed to relying on the high end computing power of personal computers or on specialist stereo or depth sensing cameras.

This is a very exciting area of computer vision research and one PlayFusion has been pioneering for the past three years. We call it the Enhanced Reality Engine™.

Computer Vision basics

The essentials of computer vision are:

  • Acquisition of images - via cameras or other detectors
  • Processing of data - enhancement, feature extraction etc.
  • Analysis of data - object recognition, application of data for specific tasks

Essentially the desire is to reproduce what our eyes and brains do and extend this capability to an automated system, while adding enhanced functionality and extracting or mixing data unavailable through direct vision.

The scope of application for computer vision is vast and varied. In the article below we look at some of these applications and consider the methods and algorithms used, with specific attention to computer vision solutions in Augmented Reality (AR) and Mixed Reality (MR) applications, where PlayFusion is leading the way with its advanced Enhanced Reality Engine. In these applications a lot of attention is paid to optimizing algorithms to run on low end hardware with a single camera, with emphasis on fast, accurate recognition, mapping and tracking.

What is Augmented Reality in computer vision?

The process by which computer generated content is overlaid onto images or video streams.

Why do it?

For many reasons: to provide an enhanced experience for users, and to provide more information to the user than is available from simple observation.

Solutions

So what makes PlayFusion’s Enhanced Reality Engine the solution of choice for many computer vision applications?

Let’s first look at the other big players already in this field.

ARKit - from Apple

  • Platform: Only runs on iOS 11 systems
  • Hardware: Restricted to high end new Apple devices capable of running iOS 11 and even then performance can’t be guaranteed.
  • Tracking: Can accurately track the device position in the real world using visual-inertial odometry, which combines camera tracking with motion sensor data to record the device’s position in real time. No tracking cards are required.
  • Landscape Understanding and Lighting Perception: iPhones and iPads become especially aware of their surroundings using ARKit. It can identify surfaces in the real-world environment, such as floors, tables, walls and ceilings. This is also referred to as ‘Plane Detection’.
  • Rendering: Provides easy integration with SpriteKit, SceneKit and Metal, with added support for Unity and Unreal Engine.

ARCore - from Google

  • Platform: Only runs on Android systems running Android N and above
  • Hardware: Limited to Google Pixel and some high end Samsung phones
  • Motion Tracking: observes IMU sensor data and feature points of the surrounding space to determine both the position and orientation of the device as per its movement
  • Environmental Understanding: detects horizontal surfaces using features similar to motion tracking
  • Light Estimation: detects the lighting ambience of the device, thereby enhancing the appearance and making the visual accurate in real-time
  • User Interaction: with the ‘hit-testing’ feature, a ray cast from the device’s camera can be intersected with detected geometry in the camera view
  • Anchoring Objects: To accurately place a virtual object, defines an anchor that ensures its ability to track the object’s displacement over a period of time.

OpenCV - open source software

Open Source Computer Vision is a library of programming functions mainly aimed at real-time computer vision. The library is cross-platform and free for use under the open-source BSD license.

It is programmed in C++ and this is still its primary interface, which makes it difficult to use for programmers inexperienced in C++. Although it has a wealth of features and algorithms, it requires a great deal of understanding of these in order to implement them effectively.
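OpenCV does, however, also ship official bindings for other languages such as Python, which lower the barrier considerably. As a taste, here is a minimal hedged sketch of edge detection using the Python bindings; the file name ‘scene.jpg’ is a placeholder for any local image.

    # Minimal OpenCV sketch (pip install opencv-python). 'scene.jpg' is a
    # placeholder file name; substitute any image on disk.
    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)    # load as grayscale
    edges = cv2.Canny(img, threshold1=100, threshold2=200)  # Canny edge detector
    cv2.imwrite("edges.jpg", edges)                         # save the binary edge map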

Enhanced Reality Engine - powered by PlayFusion

  • Platform agnostic: the Enhanced Reality Engine runs equally well on Apple iOS and Google Android devices. It can also run on Windows based systems.
  • Hardware: Will run on low end mobile devices with older versions of Android and iOS.
  • Multiplayer AR: Create AR experiences with multiple users
  • AR cloud: for faster multi user performance
  • Motion tracking: Can track objects across surfaces
  • Multiple planes: Can identify horizontal and vertical planes
  • Interface support: Unity, Unreal, CryENGINE
  • Development: The Enhanced Reality Engine is used in PlayFusion’s own products, so development is ongoing. We are always looking to push the state of the art in computer vision for our own products and pass this on to users of our platform.

Why has this innovation come from games industry veterans rather than those really smart people at Microsoft?

Many people regard computer games as purely a distraction rather than a serious industry, or berate them on the grounds that “kids nowadays don’t mix and climb trees anymore”. However, the games industry is responsible for major developments in computer technology that have led to advances in all kinds of other areas of our lives.

Tremendous competition in the games industry means that developers are constantly competing to deliver more realistic graphics, faster processing of data, better multiplayer experiences and so on. Because the games industry is so large it drives hardware development to cope with the demands of producing ever more impressive content.

Consequently we now see graphics cards providing low cost, superfast processing of mathematical and statistical data for all kinds of applications; processors that are orders of magnitude faster than a few years ago; video and graphical image manipulation that is years ahead of where it would otherwise be; computer vision solutions to all kinds of problems; faster disk drives and more - all leading to developments in medical technology, aerospace, manufacturing, retail and a host of other application areas.

And above all, people in the games industry are passionate, imaginative and motivated by a desire to create something new and exciting - we imagine something that we would like to do - and then we make it happen. We believe anything is possible.

Examples

Applications

What kinds of applications can PlayFusion’s Enhanced Reality Engine be used in?

At PlayFusion we have developed advanced computer vision capabilities that can be applied to many areas of industry. Here are a few of them:

  • Automotive - computer vision is used in areas such as autonomous (self driving) vehicles, maintenance, manufacturing and diagnostics
  • Medical - computer vision is being applied to diagnostics for medical scans, surgery and rehabilitation amongst others
  • Sports - many sports use computer vision in a variety of forms, from augmented reality experiences for their audience to providing data for performance analysis
  • Retail - many retail businesses are seeing a need to update their stores to give customers a more interactive experience and a reason to visit, rather than buy online. Computer vision has been instrumental in transforming these.
  • Entertainment - this is a major driving force behind computer vision. Augmented reality and virtual reality are employed in film and television and of course in the computer games industry.
  • Manufacturing - advances in robotic assembly, laser cutting and welding, diagnostics and testing, amongst others, have all been made possible by utilizing computer vision technology
  • Defence and Aerospace - the applications for computer vision in these areas include target identification and tracking, enhanced vision capabilities and training.
  • Leisure and tourism - computer vision techniques have been used to advertise and sell products and to provide immersive experiences.
  • Heritage - museums and other heritage sites are making experiences more interactive and entertaining

In fact almost any market sector you can think of can employ computer vision to enhance and extend its capabilities.

Technology

The Enhanced Reality Engine for computer vision solutions - powered by PlayFusion

At the core of the technology is our proprietary Enhanced Reality Engine. This has been developed from our award winning computer games and is set to form a platform which can be licensed by users wanting to add augmented reality experiences or other advanced computer vision technology to their products or services.

Enhanced Reality Engine development goals

  • Accuracy - the pose estimate must be close to the actual value.
  • Reproducibility - must be consistent in its results
  • Adaptability - must work well over variable distance and time
  • Accessibility - must run in real time on commonly available low cost hardware
  • Robustness - must work well in different environments, lighting conditions etc.

PlayFusion's Enhanced Reality Engine employs a technique known as Monocular Visual Odometry and has some very advanced features, many of which you won’t find elsewhere. The most significant of these are shown below.

  • Multi person AR - multiple users can share an AR experience with individual input
  • Works on any platform - Android, iOS, Windows
  • Works on low end hardware - because of our gaming heritage our engine has been developed to be as inclusive as possible with the hardware users have access to. We have worked hard to optimize our code and this has enabled us to get unparalleled performance, even from low end mobile devices.
  • AR cloud
  • Integration into Unity Unreal and CryENGINE
  • Constant development and improved performance - we use it ourselves in our games, so the technology is evolving all the time.
  • Target tracking
  • Image recognition

So let’s look at some of these in more detail and see how they can be applied…

But first, some explanation of computer vision terminology

What is visual odometry?

This is the process in computer vision where the pose (position and orientation) of a camera is estimated using visual data over time. This can be independent of a map of the surroundings, so it can, for example, estimate pose relative to the camera’s starting position.

Simply put, this technique involves tracking a set of interest points - eg. edge or corner pixels in a video feed - and using advanced geometric algorithms to estimate poses. A flow chart of the method would look something like this:

Detect matching points in multiple images > compute matrices relating pairs of points > perform post processing to refine accuracy eg. Extended Kalman Filter > output of pose data

This looks simple enough; however, it involves a number of other computer vision techniques in order to work. For example, key frames must be used to fix reference points in time and space; an image analysis algorithm must be employed to detect the points; some image enhancement may be necessary to facilitate this; and some prior knowledge about the environment - eg. detected planes and camera height - may be required, or may need to be collected, before the odometry measurements begin.

Finally the data can be used in a number of ways, for example animation of an object overlaid on a live video feed to give an Augmented Reality experience.
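To make the flow chart above concrete, here is a hedged two-frame sketch using OpenCV’s Python bindings. It follows the generic textbook pipeline (detect points, match them, estimate the relative pose), not PlayFusion’s proprietary implementation; ‘video.mp4’ and the camera matrix K are placeholder assumptions.

    # A hedged two-frame sketch of the pipeline above, using OpenCV's Python
    # bindings. 'video.mp4' and the camera matrix K are placeholder assumptions;
    # a real system would calibrate the camera and loop over every frame.
    import cv2
    import numpy as np

    K = np.array([[700.0, 0.0, 320.0],    # assumed intrinsics: focal length
                  [0.0, 700.0, 240.0],    # and principal point for a 640x480
                  [0.0, 0.0, 1.0]])       # camera (calibrate in practice)

    cap = cv2.VideoCapture("video.mp4")
    _, frame1 = cap.read()
    _, frame2 = cap.read()
    g1 = cv2.cvtColor(frame1, cv2.COLOR_BGR2GRAY)
    g2 = cv2.cvtColor(frame2, cv2.COLOR_BGR2GRAY)

    # 1. Detect interest points (Shi-Tomasi corners) in the first frame.
    p1 = cv2.goodFeaturesToTrack(g1, maxCorners=500, qualityLevel=0.01, minDistance=7)

    # 2. Match them into the second frame with pyramidal Lucas-Kanade tracking.
    p2, status, _ = cv2.calcOpticalFlowPyrLK(g1, g2, p1, None)
    p1, p2 = p1[status.ravel() == 1], p2[status.ravel() == 1]

    # 3. Compute the matrix relating the point pairs (RANSAC discards outliers),
    #    then decompose it into a relative rotation R and translation t.
    E, _ = cv2.findEssentialMat(p1, p2, K, method=cv2.RANSAC, prob=0.999, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, p1, p2, K)

    print("relative rotation:\n", R)
    print("translation direction (scale is unobservable with one camera):", t.ravel())

Note that with a single camera only the direction of the translation is recoverable; the absolute scale must come from elsewhere, which is one reason inertial data is so valuable (see below).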

What is monocular visual odometry?

Many visual odometry systems employ dual camera systems so that 3D data can be collected. This makes pose estimation a lot easier. However, as we have mentioned already, PlayFusion’s Enhanced Reality Engine is designed to be a cross platform tool that works on a wide range of low cost mobile devices. These generally have only one camera so the problem of pose estimation is much more complicated.

Essentially it does with one camera what other systems require two cameras to achieve.

Monocular inertial visual odometry

This uses additional data from sensors commonly present in mobile devices - in particular inertial sensors that detect movement - to enhance precision.

SLAM - Simultaneous Localization And Mapping

A method where a map of an environment is built up while simultaneously plotting the pose of a camera travelling in that environment.

SLAM can essentially be broken down into a flow that looks something like this:

Landmark extraction > data association > state estimation > state update > landmark update.

As can be seen, these two computer vision techniques work together. The flow chart for visual odometry is very similar to SLAM’s and essentially forms the localization part of the procedure, while the rest of the algorithm calculates the real world mapping. So we can now produce a more comprehensive flow chart.

Step  Operation                                                Notes
1     Scan area to establish key frames and detect landmarks   Move camera around
2     Initialize odometry
3     Detect matching point pairs and relative movement        Move camera or object
4     Compute relative matrices
5     Post process data to improve accuracy
6     State estimation (pose)
7     State update (pose change)
8     Landmark update
9     Output data                                              Animate graphics etc.
10    Repeat from 3

Although this is a simplified workflow, it shows the basic concept of visual inertial odometry that forms the basis of most computer vision AR methods. The algorithms and physics are no secret and there are many papers describing them in detail - the skill, however, is in implementing them efficiently. PlayFusion’s Enhanced Reality Engine uses a similar, but highly modified, workflow. As can be imagined, the amount of processing required to make this solution effective in real time is considerable. Many developers have managed it on high end devices. What PlayFusion has managed is to bring this functionality to low cost, readily available mobile technology.

SLAM

Image recognition

This is where the optical data recorded by a device is processed and identified by software. It is an incredibly complex area of computer vision, one that has been used and developed over many years, and it forms the basis of many computer vision technologies.

Essentially it tries to replicate what our eyes and brains do in a fraction of a second. Complex algorithms are used to help a computer decide what it is actually seeing. An example is facial recognition, which is used in security systems and can also be applied to augmented reality. A simple application is Facebook’s messaging system, where people can send pictures or video with various overlays on them. Although essentially a fun application, the algorithms behind it have taken years to develop and are now widely used in AR.
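As a hedged illustration of classical image recognition, the sketch below runs one of the Haar-cascade face detectors that ships with OpenCV’s Python bindings. This is the traditional pre-deep-learning approach, shown only because it is compact and self-contained; ‘photo.jpg’ is a placeholder file name.

    # Haar-cascade face detection with the model files bundled in opencv-python.
    # 'photo.jpg' is a placeholder file name.
    import cv2

    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    img = cv2.imread("photo.jpg")
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Scan the image at multiple scales for regions that match the face model.
    faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    for (x, y, w, h) in faces:
        cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)  # mark each face
    cv2.imwrite("faces.jpg", img)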

Target tracking

This is the process by which a moving object is tracked by the computer vision system. The target may be an object or a set of points on an object. From an animated character in a game to a moving tank on a battlefield, target tracking - combined with image recognition - can provide the basis of very powerful augmented reality experiences where enhanced data is presented to users in real time.

AR Cloud

This technology enables the sharing of data across multiple platforms to create a persistent map of the environment. Data can be stored off device, enabling much faster computer vision operations because the amount of data being managed locally is much smaller, so live AR is possible even on relatively low powered devices.

Multi user augmented reality

At PlayFusion we have pioneered multi user technology in our Enhanced Reality Engine. This enables exciting experiences for people. We have unique capabilities in this area and users of our SDK will be able to create complex multi-user experiences.

Imagine looking at a desk through your phone or tablet and seeing a race track superimposed on it. One of the cars on the track is yours; the others belong to other players. The race is on. You all see the same track, from different perspectives, but everyone controls their own vehicle and everyone shares the same precise map data.

Now imagine the same desk with an AR representation of a medical scan of a patient from a CT machine. Two surgeons are able to practice an operation to remove a tumor, each seeing the other’s moves and their effects so they can work in tandem. The image generated for both users is accurate and, although they see it from different perspectives, the precision is absolute.

The possibilities are endless… if you can imagine it, we can probably do it.

Commonly used Algorithms in Computer Vision Technology

So what are some examples of important computer vision techniques and algorithms? In computer vision applications such as the augmented reality featured in PlayFusion’s award winning Lightseekers card game, three things are of paramount importance to users:

  • Image recognition - feature detection and extraction, object recognition, environment mapping etc.
  • Object tracking - to estimate pose and movement
  • Computational optimization - these must work on low end hardware

At PlayFusion we are proud of our achievements to date, especially in terms of optimizing the code to run in real time on low end hardware other solutions can’t access, across platforms. We are continually developing the software, refining and improving it for the best possible computer vision experiences for our own use. Consequently the platform we make available to users outside our own industry is also being constantly supported and updated, driven by the intense demands and competition in the games industry to provide ever more impressive experiences.

Feature Detection

An important part of any computer vision system is the ability to recognize artifacts or features in the image. To this end a number of methods have been developed, and the choice of technique depends very much on the application. There are four basic features that these computer vision algorithms look for:

  • Edges - defined as a line or curve where there is a sharp contrast gradient between adjacent pixels on one side.
  • Corners - sometimes referred to as “interest points” and defined by a pixel or small group of pixels surrounded or partially surrounded by pixels showing sharp contrast gradient.
  • Blobs - defined by a region of interest points. Essentially a larger area version of the corner detection above.
  • Ridges - a line or curve of interest points bounded on two sides by areas of high contrast gradient.

A brief comparison of a few of the commonly used computer vision algorithms is shown below.

Algorithm                 Edge   Corner   Blob
Canny                      X
Sobel                      X
SUSAN                      X       X
Shi and Tomasi                     X
FAST                               X       X
Laplacian of Gaussians             X       X
Difference of Gaussians            X       X
MSER                                       X
PCBR                                       X

More details and rigorous explanations of the mathematics behind these algorithms can be found by following the links below:

Canny - named after its inventor, John F. Canny, who developed it in 1986.

Sobel - named after its inventor, Irwin Sobel; sometimes called the Sobel-Feldman operator.

SUSAN - Smallest Univalue Segment Assimilating Nucleus.

Shi and Tomasi - modification of the Harris algorithm

FAST - Features from Accelerated Segment Test

Laplacian of Gaussians

Difference of Gaussians

MSER - Maximally Stable Extremal Regions

PCBR - Principal Curvature Based Region detector
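To make the detectors above concrete, here is a minimal hedged sketch running FAST, one of the corner detectors from the table, via OpenCV’s Python bindings; ‘scene.jpg’ is a placeholder file name.

    # FAST corner detection via OpenCV. 'scene.jpg' is a placeholder file name.
    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)

    # FAST examines a circle of 16 pixels around each candidate and fires when a
    # long enough contiguous arc is all brighter or all darker than the centre.
    fast = cv2.FastFeatureDetector_create(threshold=25)
    keypoints = fast.detect(img, None)
    print(len(keypoints), "corners detected")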

Feature Extraction

Once features have been detected, a computer vision process known as “Feature Extraction” can be performed. This essentially looks at the area local to a feature, or set of features, to describe it more precisely eg. gradient of contrast, edge orientation etc. A number of methods can be used in this step. The extracted features can be used to identify artifacts in the image and comparison of sets of extracted features can be used to determine movement etc.

Since an extracted feature may be used to identify sets of detected features, it effectively reduces the amount of data requiring manipulation. This speeds up subsequent processing and facilitates machine learning. Optimization of feature extraction is key to fast, efficient computer vision solutions.

Feature extraction is regarded as a method of dimensionality reduction. This is a method that takes data which requires a large number of variables to describe, and reduces it to fewer variables by defining new “principal variables” - in this case, related sets of detected features.

Some of the more commonly used approaches are listed below.

SIFT - Scale Invariant Feature Transform

This is a computer vision technique that detects and describes features in images and groups related points or areas of interest together. It is widely used in areas such as object recognition, remote mapping, gesture recognition and video tracking.

SIFT effectively extracts a number of image points or features and uses these as a set to provide a feature description of an object. Once this feature description is saved, it can be used to identify the object quickly in new images or video streams. Importantly, the identification has to be reliable under a wide range of conditions, such as noise and different light levels, and also has to work across changes of scale.

In computer vision applications requiring robust object recognition and subsequent tracking - eg. animating an augmented reality figure - algorithms like SIFT are extremely important.

For a fuller explanation of the technique click this link: SIFT
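For illustration, here is a minimal hedged sketch of SIFT keypoint detection and description using OpenCV’s Python bindings (SIFT is included in the main module from OpenCV 4.4 onwards); ‘scene.jpg’ is a placeholder file name.

    # SIFT keypoints and descriptors via OpenCV (>= 4.4, where SIFT is in the
    # main module). 'scene.jpg' is a placeholder file name.
    import cv2

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.detectAndCompute(img, None)
    print(len(keypoints), "keypoints; descriptor shape:", descriptors.shape)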

SIFT is often used in conjunction with techniques like…

GLOH

A SIFT-like computer vision image descriptor that considers more spatial regions for its histograms, giving a more accurate description of the distribution of the data.

Gradient location and orientation histogram

…and

PCA

A statistical procedure for converting a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components.

Principal component analysis
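As a hedged toy example of the dimensionality reduction idea, the sketch below uses NumPy to project synthetic 128-dimensional SIFT-style descriptors onto their top 32 principal components; the data is random and stands in for real descriptors.

    # Toy PCA sketch (NumPy only): reduce synthetic 128-dimensional SIFT-style
    # descriptors to 32 dimensions via the singular value decomposition.
    import numpy as np

    rng = np.random.default_rng(0)
    descriptors = rng.normal(size=(1000, 128))       # stand-in for real descriptors

    centred = descriptors - descriptors.mean(axis=0) # centre the data
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    components = vt[:32]                             # top 32 principal directions
    reduced = centred @ components.T                 # project onto them
    print(reduced.shape)                             # -> (1000, 32)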

Hough Transform

This is another computer vision technique that helps with object recognition and defining shapes, especially where the data might be incomplete or contain erroneous artifacts.

Hough Transform
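For illustration, here is a minimal hedged sketch of the probabilistic Hough transform finding line segments in an edge map, via OpenCV’s Python bindings; ‘scene.jpg’ is a placeholder file name.

    # Probabilistic Hough transform for line segments. 'scene.jpg' is a
    # placeholder file name; Hough is usually run on an edge map.
    import cv2
    import numpy as np

    img = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
    edges = cv2.Canny(img, 100, 200)

    # Each edge pixel votes for all lines that could pass through it; peaks in
    # the accumulator correspond to lines, even when partially occluded or noisy.
    lines = cv2.HoughLinesP(edges, rho=1, theta=np.pi / 180, threshold=80,
                            minLineLength=50, maxLineGap=10)
    print(0 if lines is None else len(lines), "line segments found")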

SURF - Speeded up robust features

This is a feature detector that is essentially an evolution of SIFT. It is faster than SIFT; however, it uses integer approximations of the SIFT algorithm, so it may not be suitable for all applications. As always, the precise choice of methods will depend on the application requirements. At PlayFusion our engineers are experts in applying this technology. Consequently we are able to optimize our code to run on low end hardware while still delivering exceptional performance.

Read more about SURF here, and a comparison of SURF and SIFT here.

KLT Tracker

This is a computer vision feature extraction algorithm, specifically developed to address a problem that many other image recognition algorithms suffer from, namely that they require too much processing power to run fast enough in real time. KLT is able to extract data more quickly and use fewer points than many traditional methods.

As more research and development is done in computer vision, and as the requirement grows to run complicated methods on low end hardware such as mobile phones, algorithms such as KLT have become extremely important.

KLT Tracker - Kanade-Lucas-Tomasi feature tracker
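OpenCV exposes the pyramidal Lucas-Kanade step of KLT directly, so a hedged sketch of frame-to-frame point tracking looks like this; ‘video.mp4’ is a placeholder file name.

    # Frame-to-frame KLT tracking with OpenCV's pyramidal Lucas-Kanade optical
    # flow. 'video.mp4' is a placeholder file name.
    import cv2

    cap = cv2.VideoCapture("video.mp4")
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

    # Select strong corners once, then track that sparse set through the video.
    pts = cv2.goodFeaturesToTrack(prev, maxCorners=200, qualityLevel=0.01, minDistance=10)

    while pts is not None and len(pts) > 0:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        pts, status, _ = cv2.calcOpticalFlowPyrLK(prev, gray, pts, None)  # KLT step
        pts = pts[status.ravel() == 1]        # drop points that were lost
        prev = gray

    print("tracking finished")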

RANSAC - Random Sample Consensus

RANSAC is a method of identifying “outliers” - points that fall outside a statistical level of error. It is used in computer vision mapping to ensure that erroneous points do not affect the reliability of the data. For example, when mapping one image onto another - or between frames in a video stream - it is used to discard any random points that may appear due to noise or other artifacts and would otherwise affect the accuracy of the mapping.
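As a hedged illustration, the sketch below uses OpenCV’s RANSAC-based homography fit on synthetic matches: a known transform plus a few injected gross outliers, which RANSAC then identifies and discards.

    # Synthetic demonstration: generate matched points under a known homography,
    # corrupt a few of them, and let RANSAC find the consensus model.
    import cv2
    import numpy as np

    rng = np.random.default_rng(1)
    src = rng.uniform(0, 640, size=(60, 2)).astype(np.float32)
    H_true = np.array([[1.0, 0.02, 5.0],
                       [0.01, 1.0, -3.0],
                       [0.0, 0.0, 1.0]], dtype=np.float32)
    dst = cv2.perspectiveTransform(src.reshape(-1, 1, 2), H_true).reshape(-1, 2)
    dst[:8] += rng.uniform(50, 100, size=(8, 2)).astype(np.float32)  # gross outliers

    # RANSAC repeatedly fits a homography to small random samples and keeps the
    # model with the largest consensus set; 'mask' marks the surviving inliers.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    print("inliers:", int(mask.sum()), "of", len(src))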

RANSAC

Some advanced challenges in computer vision - and how PlayFusion's Enhanced Reality Engine is tackling them

Occlusion

Occlusion in computer vision is the interaction of real and virtual parts of the image. For example if an augmented character runs behind a wall, the wall hides the character from the display. Likewise if a character runs in front of a real object, the object should be hidden by the augmented entity.

To occlude real items with augmented characters is fairly straightforward. However, to occlude augmented content with real content is much harder. It either requires a pre-mapped area to be held on the device or in the cloud, or it uses complex algorithms to map occluding objects as they appear.

One academic discussion of approaches using this technology can be found here: Real Time Occlusion Handling.

Multi User Experiences

Multi user AR has long been a much sought after capability. It has been approached in a number of ways: for example, a pre rendered map can be used, or a cloud based map derived from a number of users’ individual inputs and stitched together.

PlayFusion's Enhanced Reality Engine is pioneering the use of multi user AR on mobile devices.

Plane Tracking

Many Computer Vision systems rely on detecting a set of data points and defining planes from these points. This is a convenient way to map an environment.

PlayFusion's Enhanced Reality Engine rapidly detects a large number of data points and identifies multiple planes in both horizontal and vertical orientations. AR components can be spawned on any of these planes as required.
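PlayFusion’s actual plane detection is proprietary, but the underlying idea can be sketched generically: fit a plane to 3D data points with RANSAC and treat the inliers as the surface. The NumPy sketch below uses synthetic points (a simulated ‘table top’ plus clutter); the function name fit_plane_ransac is hypothetical, not part of any SDK.

    # Generic RANSAC plane fit on synthetic 3D points (NumPy only).
    import numpy as np

    def fit_plane_ransac(points, iters=200, tol=0.01, seed=0):
        """Fit a plane n.p + d = 0 to noisy 3D points; return (n, d) and inlier mask."""
        rng = np.random.default_rng(seed)
        best, best_inliers = None, None
        for _ in range(iters):
            sample = points[rng.choice(len(points), 3, replace=False)]
            n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
            norm = np.linalg.norm(n)
            if norm < 1e-9:                          # degenerate (collinear) sample
                continue
            n = n / norm
            d = -n @ sample[0]
            inliers = np.abs(points @ n + d) < tol   # point-to-plane distances
            if best_inliers is None or inliers.sum() > best_inliers.sum():
                best, best_inliers = (n, d), inliers
        return best, best_inliers

    # Synthetic scene: a flat 'table top' at height 0.7 plus stray clutter points.
    rng = np.random.default_rng(1)
    table = np.column_stack([rng.uniform(-1, 1, 300), rng.uniform(-1, 1, 300),
                             0.7 + rng.normal(0, 0.003, 300)])
    clutter = rng.uniform(-1, 1, (60, 3))
    (n, d), inliers = fit_plane_ransac(np.vstack([table, clutter]))
    print("normal:", n.round(2), "offset:", round(float(d), 2), "inliers:", int(inliers.sum()))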

Real Time on Low End Hardware

At PlayFusion we have worked hard to optimize code to work on devices from low end mobiles through to high specification computers. We are proud of this capability, as we can perform complex enhanced reality tasks that others require high end computing power or complex optics to achieve.

Hardware Developments

PlayFusion is working with companies developing the latest hardware on which to run enhanced reality solutions. The progression from conventional optics to photonics devices is beginning to happen, meaning that head-up displays and headsets will become smaller, lighter, cheaper and more functional.

Photonics technology, using components such as diffractive optics, is bringing rapid advances to enhanced reality experiences and is set to revolutionize computer vision, making applications using techniques such as augmented reality more accessible and immersive.

Further reading

CVOnline - a comprehensive, detailed look at many computer vision related algorithms.

Computer Vision Algorithms - looks at algorithms for specific tasks

Robot Vision - a look at various computer vision types and relationships

Computer Vision Applications - uses of computer vision in every day life
