To apply an AR effect to a person’s face, the application first needs to find that face. While this is easy for a human, teaching software to do it is rather hard. In this article, we will explain how face segmentation works in most commercial applications, how Banuba does it differently, what it can be used for, and how a face recognition SDK could help your app.
What is face segmentation?
The term refers to locating a face in a digital image. It is applicable to both static pictures and frames in videos or live streams. It is a subset of the wider technology of image segmentation, which can work with other objects: people, vehicles, road signs, etc. Strictly speaking, the correct term here is “face tracking,” but in the context of applying augmented reality effects, the two can be used interchangeably.
With the current level of technology, face segmentation can be implemented on almost any device with a camera. For example, Banuba Face AR SDK works on desktop computers, in web browsers, on iPhones starting from the 6S, and on most Android-powered devices.
Besides detecting the face as a whole, segmentation of its elements (eyes, nose, ears, etc.) also has its applications. More on that later.
What is face segmentation used for?
The applications of this technology are many:
- AR masks. Various face filters and stickers are now an expected feature in social networks and many videoconferencing apps. To apply them accurately, you need to precisely track the face.
- Virtual try-on. By tracking the face and facial features as well as various physical parameters (e.g. the distance between the eyes and the distance to the camera) it is possible to create a virtual fitting room. The items (eyeglasses, hats, jewelry, etc.) could both look realistic and fit the person just like they would in real life.
- Virtual makeover. Applying all kinds of makeup: lipstick, foundation, mascara, etc. In the case of hair products, this even supports differently colored strands and various bright, outlandish colors. This has been popular with both the market giants (e.g. Sephora, Ulta, and Estee Lauder) and smaller niche brands like Looke. It lets users create social media-worthy photos and try the products without actually touching them.
- Beautification. Removing blemishes and eye bags, evening out skin tone, adding a bit of glare to the eyes - all in a couple of clicks. This is very useful for social media and video calls when people want to look their best.
- Background replacement. This has more to do with upper body recognition, but face segmentation is an important part of it. The software can locate the person and either blur the rest of the image or replace it with a static or moving picture.
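To make the background replacement step concrete, here is a minimal sketch of the compositing that happens once the person has been located. The segmentation mask, frame, and background below are tiny made-up grayscale grids; a real implementation would operate on full RGB frames, typically on the GPU.

```python
# Hypothetical sketch: compositing a camera frame over a replacement
# background using a per-pixel segmentation mask
# (1.0 = person, 0.0 = background). Grayscale floats for brevity.

def replace_background(frame, background, mask):
    """Blend each pixel of `frame` with `background` according to `mask`."""
    return [
        [m * f + (1.0 - m) * b for f, b, m in zip(frow, brow, mrow)]
        for frow, brow, mrow in zip(frame, background, mask)
    ]

frame      = [[0.9, 0.8], [0.7, 0.6]]   # camera image (person in frame)
background = [[0.1, 0.1], [0.1, 0.1]]   # static replacement picture
mask       = [[1.0, 0.5], [0.0, 1.0]]   # soft edges give smoother blending

composited = replace_background(frame, background, mask)
```

Note that the mask is soft (values between 0 and 1), which is what avoids the hard, jagged outline around hair and shoulders that early background-replacement tools were known for.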
How does face segmentation work?
The process of tracking faces has a lot of complications that aren’t obvious at first glance.
- How many faces can the software detect at once?
- How well does it track faces that are filmed at an angle?
- Can the faces be detected in low-light conditions or when the image is overexposed?
- What if the person in the picture wears glasses or has a large beard?
- What if the person in the picture covers a part of the face with a hand, a scarf, or something else?
- Can the software work with people who have different skin colors? What about reliably tracking both men and women?
- How much processing power does the application need to work? How will it affect battery power consumption and heat emission?
- Can the software work in real-time? Can it work on both still images and videos?
These are just some of the issues that need to be addressed in addition to simply making the software “see” things.
Now, when it comes to face segmentation, there are two major schools of thought.
The first one is based on “landmarks” - facial features represented as collections of points. The software tracks each of them and adjusts the effect according to their positions. The more points there are, the more precise the effects; in some implementations, the count reaches into the thousands. This is a popular and reasonably effective approach that can provide enough accuracy for commercial use.
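As a small illustration of how landmark output gets used, the sketch below scales an AR asset (e.g. a pair of glasses) from the distance between the eyes, as mentioned in the virtual try-on section. The landmark names and the reference distance are invented for the example; a real tracker would output hundreds or thousands of (x, y) points per frame.

```python
import math

# Hypothetical landmark output for one frame, in pixel coordinates.
landmarks = {
    "left_eye_center":  (120.0, 200.0),
    "right_eye_center": (184.0, 198.0),
}

# Assumed: the overlay asset was authored for this eye spacing in pixels.
REFERENCE_EYE_DISTANCE = 64.0

def distance(p, q):
    """Euclidean distance between two 2D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

def overlay_scale(landmarks):
    """Scale factor for an AR asset derived from the interocular distance."""
    d = distance(landmarks["left_eye_center"], landmarks["right_eye_center"])
    return d / REFERENCE_EYE_DISTANCE
```

The same idea extends to rotation and position: with enough points, the software can estimate the full pose of the head and anchor effects to it.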
Banuba has a different take. Like other companies, we use neural networks. However, ours are trained on a much larger dataset: over 200,000 photos of men and women taken in different lighting conditions and with both low- and high-end cameras, teaching the software to “see” a face the way a human would. The software then tracks just 36 parameters covering facial expressions and head position - enough to build a model of 3,308 vertices and provide reliable tracking and precise effects application. As an extra advantage, this isn’t as resource-intensive as the landmark method mentioned above. The difference is especially notable if the device is used continuously for an hour or so.
Another advantage of this approach is its flexibility. If you need a more detailed or a simpler face model, the core technology can quickly be adapted to accommodate it.
Finally, it is a convenient foundation for other features. Detecting emotions from 36 numbers, for example, is much simpler than doing it from thousands of them.
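To illustrate why a compact parameter vector helps here, a feature like emotion detection can start from simple rules over a few named parameters (a production system would use a trained classifier instead). The parameter names and thresholds below are invented for the example.

```python
# Illustrative only: crude smile detection from a small, named
# parameter vector. Names and thresholds are assumptions, not
# Banuba's actual parameters.

def looks_like_smile(params):
    """Heuristic: lifted mouth corners with a mostly closed mouth."""
    return params["mouth_corner_lift"] > 0.4 and params["mouth_open"] < 0.3

frame_params = {"mouth_corner_lift": 0.7, "mouth_open": 0.1}
```

Doing the equivalent over thousands of raw landmark coordinates would require first deriving exactly these kinds of aggregate measures, which is the step a compact representation gives you for free.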
Another thing worth mentioning is the differences between the platforms that support face segmentation.
- Web. This is the most flexible platform, as almost any device with a camera and a browser can support it. It is also easy to maintain and it doesn’t require any downloads. However, it is also the least powerful and offers the lowest quality of effects.
- Mobile. The most popular one. High-end smartphones can deliver stunning visuals for extended sessions, but the platform requires more support (maintaining two apps instead of one, even with hybrid frameworks like Flutter or React Native), which makes mobile face detection more expensive as a result.
- Desktop. It has more power than smartphones but the size is a major disadvantage for certain applications (e.g. virtual try-on or making social media videos on the go). However, in the case of video conferencing applications, it works perfectly fine.
Using a face recognition SDK
The simplest and quickest way to get any functionality that requires face segmentation is to integrate a face recognition SDK. It is a premade product that can be quickly connected to an app and perform the necessary functions.
Augmented reality, computer vision, and machine learning require specialized skills that are quite rare. Finding people with such skills is hard and expensive, not to mention that developing similar technology from scratch takes a lot of time. There have been cases of companies spending six months and upwards of $500,000 only to abandon the project and buy a face recognition SDK instead.
Such products already have expansive feature sets, demo apps where you can see examples of their work, and support staff that can help you marry the SDK to your project. So unless you have a team of developers who know computer vision like the back of their hands and a massive budget, a ready-made face recognition SDK is your best bet for reasonable expenses and short time-to-market.
Face segmentation is a flexible technology that serves as a foundation for many interesting features, including virtual try-on, AR masks, background replacement, and beautification. The most cost-effective way to implement it in your app is by using a face recognition SDK. And to get one with a unique technical approach, click the button below - Banuba Face AR SDK comes with a two-week free trial.