Face Detection and Face Tracking with Deep Learning
Face detection & Face Tracking
To start, let’s define what we are talking about.
Face detection is determining whether there is a human face in the input image, such as a picture or a video frame. It is the first step in many other face-related technologies, including face recognition, background replacement, virtual try-on, etc.
Face tracking means following the detected face as it moves. This is common in video editing, live streaming, social media, etc.
These technologies shouldn’t be conflated with face recognition (determining whether a specific person is in the picture). As this is not our area of expertise, face recognition software is outside the scope of this article.
Solving this computer vision problem has historically been harder than most image classification tasks, as it is difficult to specify all the relevant conditions. Not only do faces differ from each other, but they can also appear at different angles, in different lighting conditions, and with various elements that change their shape and look (eyeglasses, makeup, haircuts, hats, etc.). Moreover, there is the matter of detecting several faces in one picture or frame.
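Handling several faces per frame typically means generating many candidate boxes and then discarding overlapping duplicates, a step called non-maximum suppression. Below is a minimal plain-Python sketch; the box format and overlap threshold are illustrative assumptions, not any specific library’s API:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, threshold=0.5):
    """Keep only the highest-scoring box in each cluster of overlapping boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep box i only if it does not heavily overlap an already-kept box.
        if all(iou(boxes[i], boxes[j]) < threshold for j in keep):
            keep.append(i)
    return keep
```

Given two overlapping detections of the same face plus one separate face, this keeps exactly one box per face.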
Traditionally, this has been done using specialized algorithms. Some of the most prominent are:
- Viola-Jones. This algorithm moves a “sliding window” (a square of a predetermined size) across the image to locate Haar-like features, which can then be recognized as elements of a face.
- Single Shot Detector (SSD). This one places a grid and a number of “anchor boxes” (generated during the training stage) on the image. These boxes are used to detect the features of the target objects (e.g. faces) and their positions.
- You Only Look Once (YOLO). Similar in approach to SSD, it boasts better performance because it needs only one “look” at the picture to find all the objects of interest.
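The “sliding window” idea behind Viola-Jones can be sketched as a generator that walks a fixed-size square across the image; a real detector would also rescale the window and evaluate Haar-like features at each position. The window size and stride below are illustrative values, not the algorithm’s canonical parameters:

```python
def sliding_windows(width, height, window=24, stride=8):
    """Yield the (x, y) top-left corner of every window that fits in the image."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y)

# Each yielded position is where a classifier would check "is this a face?".
```

On a 48x32 image this produces a small grid of candidate positions; on full frames the count grows quickly, which is why cascade classifiers reject most windows early.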
Machine Learning, Deep Learning and Neural Networks
Machine learning (ML) and deep learning (DL) have taken the world by storm in recent years. All of these terms relate to artificial intelligence, but they are not the same.
ML means teaching machines to make decisions without being explicitly programmed to do so. Usually, this involves feeding the algorithms a training dataset (a curated collection of data). It is commonly used for pattern recognition: object detection, speech/voice recognition, face tracking, etc.
Neural networks are sets of algorithms that attempt to simulate the human brain. They are built in layers, with each layer refining the input data.
Deep learning is a subset of machine learning that involves neural networks with three or more layers. The more layers there are, the more accurate the results can be. This is why deep learning methods are the de facto industry standard for face detection.
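To make the “layers refining the input” idea concrete, here is a toy three-layer network in plain Python. The weights are arbitrary numbers chosen for illustration, not a trained face-detection model:

```python
def relu(x):
    """Non-linearity applied between layers: negatives become zero."""
    return [max(0.0, v) for v in x]

def dense(x, weights, biases):
    """One fully connected layer: output[j] = sum_i x[i] * weights[i][j] + biases[j]."""
    return [sum(xi * w[j] for xi, w in zip(x, weights)) + b
            for j, b in enumerate(biases)]

# A 3-layer network: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output.
w1 = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]; b1 = [0.0, 0.1, 0.0]
w2 = [[0.2, -0.4], [0.7, 0.1], [0.5, 0.6]]; b2 = [0.0, 0.0]
w3 = [[1.0], [-1.0]]; b3 = [0.2]

def forward(x):
    h1 = relu(dense(x, w1, b1))   # layer 1 refines the raw input
    h2 = relu(dense(h1, w2, b2))  # layer 2 refines layer 1's output
    return dense(h2, w3, b3)      # layer 3 produces the final score
```

Real face detectors use the same layered structure, just with millions of learned weights and convolutional layers instead of these tiny dense ones.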
Face Detection: Deep Learning
Detecting faces and tracking them is a common problem with multiple solutions. All of them, however, rely on deep learning methods.
The most widespread approach is based on landmark detection. The idea is to locate specific features (eyes, ears, nose, mouth, etc.) and determine the position of the face based on them. This is a relatively straightforward method, and it is reasonably robust, able to handle different angles and facial expressions.
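As a toy illustration of how landmarks translate into pose information, the in-plane rotation (roll) of a face can be estimated from just the two eye centers. The coordinates in the test values are made up, and image coordinates (y pointing down) are assumed:

```python
import math

def roll_degrees(left_eye, right_eye):
    """Head roll (in-plane rotation): angle of the line between the eye centers."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.degrees(math.atan2(dy, dx))

def interocular_distance(left_eye, right_eye):
    """Distance between eye centers, often used to normalize face scale."""
    return math.hypot(right_eye[0] - left_eye[0], right_eye[1] - left_eye[1])
```

Production pipelines estimate full 3D pose from dozens of landmarks, but the principle is the same: geometric relations between detected points reveal where and how the face sits in the frame.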
Banuba’s technology goes a step further. While the first stage (detecting facial landmarks) is the same, there are some refinements. The landmarks are detected as a series of points that are converted into hidden vectors, which are then used to build a detailed 3D face model. The default number of vertices obtained this way is 3308, but it can be scaled up or down as required. In practice, this provides more accurate detection, which can then be used for applying masks, filters, and other AR effects.
Face Tracking & Deep Learning: Success Stories
These companies aren’t in the Fortune 500; however, they were able to successfully solve face detection and tracking problems.
1. Clash of Streamers
Clash of Streamers is the top-grossing and most popular NFT-powered mobile game in the world. It uses the players’ faces to personalize characters.
To do that, the developers needed to detect a face in the selfie, separate it from the overall picture and apply a 3D mask.
Today the game has over 4 million installs and more than 100,000 monthly active users.
2. MNFST
MNFST (Manifest) is a micro-influencer app that allows people to earn money with their Instagram accounts even if they have just 50 followers. It connects users with businesses that want to run micro-influencer campaigns. The advertiser provides a branded face filter, and the users’ job is to make posts with these filters and promote them. The better the engagement, the higher the payout.
AR face filters require accurate face detection and face tracking based on a deep neural network. MNFST managed to use them successfully, attracting $2.5M in investment and over a million downloads.
3. Bermuda
Bermuda is a random chat app with over 4 billion matches in total and more than 20 million installs. Users can set their location and the preferred sex of the person they want to talk to, and the app will connect them to a random user fitting these parameters.
In Bermuda’s case, machine learning is used for face detection and tracking, as well as automatic interpreting. Face detection, in turn, powers the AR filters, which are used over 15 million times per month.
4. Eyebou
Eyebou is an established maker of glasses and other eyewear, as well as a major player in the electronic healthcare market. Its software portfolio includes an app for diagnosing vision issues, including cataracts, glaucoma, blepharitis, myopia, etc.
It was contracted by UNICEF to deliver a white-label solution that would help diagnose the same issues in orphans from low-income countries.
In this case, the deep learning models are needed to estimate the distance between the camera and the person’s face. Once enough medical data has been gathered, another neural network will be trained to make the diagnoses automatically.
By mid-2023, the Eyebou/UNICEF app is expected to have helped 10,000 children.
5. Looke
Looke is a niche Indonesian cosmetics brand that focuses on halal, vegan, and cruelty-free products. It was the first company in the country to launch a virtual makeup try-on app.
Looke quickly passed 50,000 downloads thanks to the accuracy and realism of its virtual makeup try-on, achieved largely through state-of-the-art AI.
Integrating a Face Detection/Tracking SDK
Building your own deep neural network trained to perform face detection is doable. You can easily find a hands-on guide, especially if you already know how object detection works, and there are open-source libraries (e.g. OpenCV) that will save you some time. However, gathering the training dataset can be a challenge. To prepare your system properly, you will need thousands of images of people of all skin colors, taken in different lighting conditions and on both high-end and low-end cameras; this ensures that the software will perform reliably in any conditions and with any user. Banuba’s dataset, for example, contains over 200,000 images and videos.
This stage can be skipped entirely if you use a commercially available face detection SDK. Such systems come pre-trained, so you can quickly integrate them into your software.
This is how it can be done with Banuba SDK, for example. You can try it for free for 14 days, so don’t hesitate to test it.
The first step is the same regardless of the platform: request the SDK and the trial token. To do that, simply shoot us a message through the contact form.
Integrating a Face Tracking SDK: Android
The Android version of Banuba SDK is distributed as a Maven package. This is how you can integrate it.
In your Gradle build script, add the Banuba Maven repository with the credentials below (the repository URL should match the one in Banuba’s integration guide; the password is an access token, Unicode-escaped exactly as Banuba provides it):

repositories {
    maven {
        name = "GitHubPackages"
        // Repository URL as given in Banuba's integration guide
        url = uri("https://github.com/sdk-banuba/banuba-sdk-android/raw/main/")
        credentials {
            username = "sdk-banuba"
            password = "\u0067\u0068\u0070\u005f\u0033\u0057\u006a\u0059\u004a\u0067\u0071\u0054\u0058\u0058\u0068\u0074\u0051\u0033\u0075\u0038\u0051\u0046\u0036\u005a\u0067\u004f\u0041\u0053\u0064\u0046\u0032\u0045\u0046\u006a\u0030\u0036\u006d\u006e\u004a\u004a"
        }
    }
}
Integrating a Face Tracking SDK: iOS
To integrate the iOS version, add a custom source to your Podfile (the URL should match the one in Banuba’s integration guide):

source 'https://github.com/sdk-banuba/banuba-sdk-podspecs.git'

Then add Banuba SDK dependencies:

target 'quickstart-ios-swift' do
  pod 'BanubaSdk', '~> 1.4'
end
Deep learning methods are the most common way to solve face detection and tracking problems. They provide a commercially viable level of quality and, when properly pre-trained, are robust enough to work in low-light and high-occlusion conditions. A face detection and tracking SDK can be integrated in a few lines of code and will save a ton of development time.