How Real-Time Face Detection Works

Real-time face detection and tracking are at the core of virtual try-on, AR masks, avatars, and many other features. In this article, we will describe what these technologies are and how they work, as well as how you can efficiently integrate them into your app.

Alex Krasko

Originally posted on January 19, 2024
Last updated on January 19, 2024


Face detection and face tracking

Face detection is a technology that determines whether a picture or a video frame contains any faces and, if so, where they are. When it runs on live video fast enough to keep up with the incoming frames, it is called real-time face detection. All other face-related features, like facial recognition or touch-up, are based on it.

Face tracking follows a detected face from frame to frame as it changes position.
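To make the idea of following a face more concrete, here is a minimal sketch of one common tracking strategy: matching each new detection to the previous frame's faces by how much their bounding boxes overlap (intersection over union, IoU). The function names, the box format, and the 0.3 threshold are assumptions made for illustration, not part of any particular SDK.

```python
from typing import Dict, List, Tuple

# A box is (x_min, y_min, x_max, y_max) in pixels (assumed format).
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def update_tracks(tracks: Dict[int, Box], detections: List[Box],
                  min_iou: float = 0.3) -> Dict[int, Box]:
    """Greedily match new detections to existing tracks by overlap.

    A matched detection keeps the ID of its track; an unmatched one gets
    a new ID. Tracks without a match are dropped, and their IDs may be
    reused later (a production tracker would handle both more carefully).
    """
    updated: Dict[int, Box] = {}
    free_ids = set(tracks)
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        # Pick the still-unmatched track that overlaps this detection most.
        best_id = max(free_ids, key=lambda tid: iou(tracks[tid], det), default=None)
        if best_id is not None and iou(tracks[best_id], det) >= min_iou:
            updated[best_id] = det
            free_ids.remove(best_id)
        else:
            updated[next_id] = det
            next_id += 1
    return updated
```

Calling `update_tracks` once per frame with the detector's output keeps the same ID attached to a face as long as it moves only slightly between frames. This is the property AR effects rely on to stay attached to the right person.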

These terms are sometimes used loosely. For example, people may say "face detection" when they actually mean face recognition (matching a face in the image or frame to the face of a specific person). This article focuses on detection and tracking only.

Working with faces has always been complicated for computer scientists. Firstly, human faces can be very different from each other. Secondly, the technology should work no matter the expression of the person in the picture. Thirdly, different lighting conditions and accessories (e.g. glasses or medical masks) can drastically change how a face looks. Finally, the presence of several people in the image can further complicate the process.

There are three main approaches to solving this task:

The Viola-Jones algorithm is a classic face detection algorithm that scans the image at different scales and positions, checking simple brightness patterns (Haar-like features) with a cascade of classifiers to decide whether a region contains a face. It revolutionized face detection with its combination of speed and accuracy, which led to its wide use in various applications.
The Single-Shot Detector (SSD) algorithm covers the input image with a set of predefined boxes at different scales and aspect ratios. It then uses a set of convolutional layers to predict, for each box, whether an object is present and which class it belongs to. SSD is known for its real-time performance and high accuracy, making it suitable for applications such as autonomous driving and video surveillance.
The You Only Look Once (YOLO) algorithm is another popular object detection algorithm that aims for real-time performance. YOLO divides the input image into a grid and predicts bounding boxes and class probabilities directly from the grid cells. Like SSD, YOLO does not rely on region proposals or sliding windows. Instead, it uses a single neural network to predict multiple bounding boxes and class probabilities simultaneously. YOLO is known for its simplicity and speed, although its early versions struggled with small objects. It has been widely used in various real-time applications, including object tracking and robotics.
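To illustrate the first of these approaches, Viola-Jones compares the brightness of adjacent rectangles (Haar-like features) and relies on an integral image so that the sum of any rectangle costs only four lookups. The sketch below shows just that building block, using plain Python lists. It is not a full detector, and the function names are chosen for this example.

```python
from typing import List

Image = List[List[int]]  # grayscale image as rows of pixel intensities


def integral_image(img: Image) -> List[List[int]]:
    """Build a table with an extra zero row and column, where ii[y][x]
    is the sum of all pixels in rows < y and columns < x."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Sum of the w-by-h rectangle whose top-left pixel is (x, y)."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii: List[List[int]], x: int, y: int, w: int, h: int) -> int:
    """Two-rectangle Haar-like feature: lower half minus upper half.

    A large positive value means the upper half is darker than the lower
    half, as with an eye region sitting above brighter cheeks.
    """
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top
```

For a 4x4 patch whose upper two rows are dark (10) and lower two rows are bright (200), `two_rect_feature(integral_image(patch), 0, 0, 4, 4)` returns 1520. A trained cascade compares many such values against learned thresholds.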

Real-time face detection with neural networks

Neural networks are good at solving problems that require finding similarities between objects, so it is only natural that they are used for finding human faces.

The most common approach is landmark-based. The system locates facial features (such as the eyes, nose, and mouth), determines their positions, and infers the location of the face from the results. This method is relatively simple and holds up in difficult conditions, e.g. dim lighting.
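The last step of the landmark-based approach, inferring the face location from the detected features, can be sketched as follows. This is a minimal illustration that assumes the landmarks are already available as (x, y) pixel coordinates. The function name and the margin parameter are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def face_box_from_landmarks(landmarks: List[Point], margin: float = 0.2) -> Box:
    """Return a box enclosing all landmarks, padded on every side by
    `margin` times the width (horizontally) or height (vertically) of
    the landmark cloud, since features like eyes and mouth cover only
    the inner part of a face."""
    if not landmarks:
        raise ValueError("at least one landmark is required")
    xs = [p[0] for p in landmarks]
    ys = [p[1] for p in landmarks]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    pad_x = (x_max - x_min) * margin
    pad_y = (y_max - y_min) * margin
    return (x_min - pad_x, y_min - pad_y, x_max + pad_x, y_max + pad_y)
```

With five landmarks spanning x from 100 to 160 and y from 120 to 180 and a margin of 0.5, the result is a box from (70, 90) to (190, 210).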

Banuba takes it a step further. The first step (detecting facial landmarks) is the same, but with some refinements. The landmarks are detected as a series of points, which are then converted into hidden vectors that are used to build a detailed 3D face model. The default number of vertices in this model is 3308, but it can be scaled up or down as required. In practice, this provides more accurate detection, which can then be used for applying masks, filters, and other AR effects.
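The article does not describe how the hidden vectors are turned into a mesh, so the following is only a generic illustration of the idea, not Banuba's actual pipeline. It assumes a simple linear model in which every vertex is a mean position plus a weighted sum of offsets. The latent size, the random toy data, and all names are hypothetical; only the vertex count of 3308 comes from the text above.

```python
import random
from typing import List

NUM_VERTICES = 3308  # default vertex count quoted in the article
LATENT_SIZE = 8      # hypothetical length of the hidden vector

Mesh = List[List[float]]  # one [x, y, z] triple per vertex


def make_toy_model(seed: int = 0):
    """Create a random mean mesh and basis. A real model would learn
    both from data; random values only keep the sketch self-contained."""
    rng = random.Random(seed)
    mean = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(NUM_VERTICES)]
    basis = [
        [[rng.gauss(0, 0.05) for _ in range(3)] for _ in range(NUM_VERTICES)]
        for _ in range(LATENT_SIZE)
    ]
    return mean, basis


def decode_mesh(latent: List[float], mean: Mesh, basis: List[Mesh]) -> Mesh:
    """Compute vertices = mean + sum_k latent[k] * basis[k]."""
    if len(latent) != len(basis):
        raise ValueError("latent vector and basis must have the same length")
    mesh = [vertex[:] for vertex in mean]  # copy so the mean stays untouched
    for weight, component in zip(latent, basis):
        for vertex, offset in zip(mesh, component):
            for axis in range(3):
                vertex[axis] += weight * offset[axis]
    return mesh
```

The point of such a representation is that a handful of numbers controls thousands of vertices, so a network only has to predict the short vector for each frame.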

Integrating real-time face detection

It is entirely possible to build your own system for this purpose, and open-source libraries (e.g. OpenCV) can serve as a foundation. The real challenge lies in getting the large and diverse dataset needed to train the neural networks. You will need images of people of all skin tones, taken in different lighting conditions with both low- and high-end cameras. This is necessary to ensure that your application offers robust real-time face detection in any situation. Banuba’s dataset, for example, contains over 200,000 images and videos.
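If you do collect your own data, one way to keep an eye on diversity is to count how many samples fall into each combination of conditions and flag the sparse ones. The sketch below assumes every sample carries metadata under hypothetical keys such as `skin_tone`, `lighting`, and `camera`. The keys, their values, and the threshold are illustrative only.

```python
from collections import Counter
from itertools import product
from typing import Dict, List, Tuple


def coverage_gaps(samples: List[Dict[str, str]],
                  attributes: Dict[str, List[str]],
                  min_count: int) -> Dict[Tuple[str, ...], int]:
    """Return every combination of attribute values that has fewer than
    `min_count` samples, mapped to the number of samples it does have."""
    names = list(attributes)
    counts = Counter(tuple(sample[name] for name in names) for sample in samples)
    return {
        combo: counts[combo]
        for combo in product(*(attributes[name] for name in names))
        if counts[combo] < min_count
    }


# Example with made-up metadata for three images.
samples = [
    {"lighting": "bright", "camera": "high-end"},
    {"lighting": "bright", "camera": "high-end"},
    {"lighting": "dim", "camera": "low-end"},
]
expected = {"lighting": ["bright", "dim"], "camera": ["low-end", "high-end"]}
gaps = coverage_gaps(samples, expected, min_count=2)
```

Here `gaps` lists the three combinations that need more data: bright lighting with a low-end camera (0 samples), dim lighting with a low-end camera (1), and dim lighting with a high-end camera (0).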

The simpler way is to license a commercial SDK that comes with pre-trained neural networks. Such SDKs are designed to be easy to integrate, so their main benefit is the time they save.

This is how you can connect Banuba Face AR SDK to your app. It includes a 14-day free trial, so you can test all the features at your convenience.

Integrating a Face Tracking SDK: Android

The Android version of Banuba SDK is distributed as a Maven package. To integrate it, add the Banuba package repository to the repositories in your project-level Gradle build file:

[code]
allprojects {
    repositories {
        google()
        mavenCentral()
        maven {
            name "GitHubPackages"
            url "https://maven.pkg.github.com/sdk-banuba/banuba-sdk-android"
            credentials {
                username = "sdk-banuba"
                password = "\u0067\u0068\u0070\u005f\u0033\u0057\u006a\u0059\u004a\u0067\u0071\u0054\u0058\u0058\u0068\u0074\u0051\u0033\u0075\u0038\u0051\u0046\u0036\u005a\u0067\u004f\u0041\u0053\u0064\u0046\u0032\u0045\u0046\u006a\u0030\u0036\u006d\u006e\u004a\u004a"
            }
        }
    }
}
[/code]

Integrating a Face Tracking SDK: iOS

To integrate the iOS version, add a custom source to your Podfile:

[code]
source 'https://github.com/sdk-banuba/banuba-sdk-podspecs.git'
[/code]

Then add Banuba SDK dependencies:

[code]
target 'quickstart-ios-swift' do
  use_frameworks!
  pod 'BanubaSdk', '~> 1.4'
end
[/code]

Conclusion

Real-time face detection and tracking are an integral part of modern video conferencing, dating, eCommerce, and other applications. It is possible to develop your own system for it using open-source libraries. However, to save time and get to market faster, it is prudent to use a commercially available SDK.
