Background subtraction is a computer vision method that detects objects in video by separating the foreground from the background. Often powered by machine learning (ML), this algorithm drives the background removal, erasing, and replacement features used by market-leading brands like Zoom, TikTok, Instagram, Bumble, and others.
Despite the technology’s success across multiple industries, many startups and companies still wonder how background extraction works and whether it’s a viable, investment-worthy way to empower their business apps. Since we have solid experience integrating background subtraction into our AR-powered products, this post will help you figure out whether this computer vision technology is worth your time and funds. The article covers:
- Background subtraction: how it works technically
- Core challenges background extraction helps to resolve
- Key benefits of background subtraction for business
- How brands subtract backgrounds from images: use cases
- How to empower your app with background subtraction.
Background Subtraction: How It Works Technically
The background subtraction method (BSM) is a computer vision algorithm that detects objects in video content by comparing them to the background and foreground parts of an image.
BSM was designed to spot foreground objects by isolating them during a comparison against an object-free frame. Simply put, the algorithm analyzes the input video, detects foreground objects, isolates them, and compares each frame to a reference frame with no objects (the background model). The method then computes the per-pixel differences between the values of the two frames and builds a distance matrix. A threshold value, defined by analyzing the first several fragments of the video, determines which differences count as significant.
Once the value difference between two frames exceeds the threshold, the BSM algorithm marks the corresponding region as a moving object. The illustration below shows the step-by-step background subtraction process:
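The frame-differencing logic described above can be sketched in a few lines of Python using NumPy and synthetic grayscale frames. The function name and the fixed threshold here are illustrative assumptions, not part of any specific product:

```python
import numpy as np

def subtract_background(frame, background, threshold=25):
    """Mark pixels whose absolute difference from the
    background model exceeds the threshold as foreground."""
    diff = np.abs(frame.astype(np.int16) - background.astype(np.int16))
    return (diff > threshold).astype(np.uint8)  # 1 = moving object

# Synthetic example: an empty background plus a bright "object"
background = np.zeros((4, 4), dtype=np.uint8)
frame = background.copy()
frame[1:3, 1:3] = 200  # a moving object enters the scene

mask = subtract_background(frame, background)
print(mask.sum())  # 4 foreground pixels
```

Real systems build the background model and threshold adaptively from the first frames rather than hard-coding them, but the per-pixel comparison stays the same.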
Over the last two decades, the BSM technique has greatly evolved, and now it includes multiple approaches designed to solve challenge-specific problems that background subtraction deals with. Here are the most common methods:
- Running Gaussian average
- Temporal median filter
- Mixture of Gaussians
- Kernel density estimation (KDE)
- Sequential KD approximation
- Co-occurrence of image variations
Let’s discuss the two most widespread approaches: the running Gaussian average and the temporal median filter.
Running Gaussian Average
The technique’s authors proposed modeling the background at each pixel location independently. The method fits a Gaussian probability density function (pdf) to each pixel’s most recent values. The model was patented for processing intensity images and utilizing multi-component color spaces like RGB and YUV.
The core drawback of the running Gaussian average is that the lower the background model’s update rate, the slower the system responds to actual background dynamics.
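A minimal per-pixel sketch of the running Gaussian average is shown below. The learning rate `rho` and the deviation multiplier `k` are illustrative assumptions chosen for the demonstration, not values from the original paper:

```python
import numpy as np

def update_gaussian(frame, mean, var, rho=0.05, k=2.5):
    """Classify pixels against the current Gaussian model,
    then update the running mean and variance per pixel."""
    diff = frame - mean
    foreground = np.abs(diff) > k * np.sqrt(var)
    # Exponentially weighted running updates: a low rho makes the
    # model slow to absorb genuine background changes, which is
    # exactly the drawback described above.
    mean = rho * frame + (1 - rho) * mean
    var = rho * diff**2 + (1 - rho) * var
    return foreground, mean, var

# Static background of intensity 100 with one bright intruding pixel
mean = np.full((3, 3), 100.0)
var = np.full((3, 3), 4.0)      # sigma = 2
frame = mean.copy()
frame[1, 1] = 150.0             # far beyond 2.5 * sigma

fg, mean, var = update_gaussian(frame, mean, var)
print(fg[1, 1], fg[0, 0])  # True False
```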
Temporal median filter
The method’s authors suggested using the median value of the last N frames as the background model. The assumption was that this median would provide an accurate model even if the N frames are subsampled with respect to the original frame rate by a factor of 10. This greatly improves model stability and yields far more accurate subtraction results.
Background subtraction and AR filters
Core Challenges of Background Extraction
The performance and stability of today’s background subtraction algorithms are far beyond what approaches from a decade ago could achieve, but they still struggle with undefined values. The challenge comes from real-life scenarios: treating undefined values as normal depth readings often breaks most background models. For example, many subtraction algorithms still identify object shadows as foreground items, which results in erroneous, low-accuracy output.
Filling the Gaps
Most algorithms still cannot properly identify all frame pixels, which leads to undefined pixel values and creates gaps in the background model. The core challenge is that a defined pixel differs significantly from an undefined background, resulting in either false positives or false negatives. Most computer vision engineers currently argue that an image reconstruction algorithm is a viable way to reduce the number of errors, but it still doesn’t eliminate the pixel-gap challenge.
Modeling depth images still produces noisy results and generates multiple false positives, such as small blobs and thin edges around objects. The preferred foreground objects, like humans, are mostly quite large, while the Kinect sensor (Microsoft’s off-the-shelf solution for detecting moving objects) imposes range constraints. This mismatch leads to inaccurate post-processing outcomes, which decreases the algorithm’s accuracy.
OpenCV vs Commercial SDKs
This is less of an issue with background subtraction as a feature and more about its implementation in a specific project.
OpenCV is a free, open source computer vision library that helps in creating various related software: image recognition, background replacement, access control, etc. There are also paid proprietary software development kits (SDKs) - premade modules that can be integrated into an app and perform certain functions. Some companies prefer building their own solutions based on OpenCV, while others opt for SDKs. When should you choose which?
Firstly, OpenCV is just a foundation, while SDKs are feature-complete products. This means that if you have the capabilities, expertise, time, and budget for a unique custom solution, the library is a good choice. It is also the only option when there is a noticeable lack of SDKs in your niche (e.g. with self-driving vehicles). Otherwise, go for premade modules.
Secondly, OpenCV provides lower image quality by default, as it uses one-size-fits-all code for iOS, Android, and Web. SDKs are usually platform-specific and optimized, which makes them better for scalability.
Thirdly, OpenCV is based on object recognition algorithms. While solid, they take second place to modern neural networks that do the same job better. In the case of background replacement, SDKs are much more accurate as a result.
Finally, SDKs have other useful features. Banuba SDK, for example, can place AR masks, color filters, virtual makeup, and do a lot of other things.
Summing up, OpenCV is better when you need a foundation for a complex custom product that you have the time and money to develop. An SDK is better for quickly launching a feature-rich app.
Read more about it in a dedicated article.
How Brands Subtract Backgrounds from Images: Use Cases
Background subtraction is a thriving computer vision technology adopted by social media, entertainment, video conferencing, live streaming, and dating products. Let’s discuss how world-leading companies leverage background extraction capabilities to skyrocket their business growth with never-seen-before user experiences.
Video chat apps
Market-leading peer-to-peer video chat applications like WhatsApp, Viber, Skype, and FaceTime are actively adopting background subtraction technologies to provide competitive functionality for end-users. This helps them both acquire new users through never-seen-before features and retain existing customers by offering frequent on-demand updates with new features.
For example, video chat products create more immersive video communication experiences by allowing users to replace traditional filters with animated backgrounds.
Video conferencing platforms
Zoom fatigue and long-winded monotonous work-related conferences and meetings make users struggle and may even decrease their productivity. More than that, the trend toward remote work forces professionals to look for modern video conferencing solutions that facilitate privacy.
All these challenges are addressed by background subtraction. The device camera now focuses only on humans and accurately replaces real-life backgrounds with multiple static or moving content like 3D environments, blur effects, tailor-made brand-related thumbnails, or customized uploaded pictures.
Here are the most common ways computer vision capabilities can benefit video conferencing platforms:
- Streamline a business call or remote interview experiences with custom backgrounds
- Edit “boring” backgrounds in video processing software to create immersive content
- Create a virtual background for a webcam to remove noise, add entertainment value and enhance the camera experience
- Add animation effects in the background that can be changed by the user as part of interactivity
- Boost users' privacy in video calls, conferences, meetings, and even live streaming
- Add 360-degree backgrounds in 2D or 3D for educational or marketing purposes, e.g. let consumers virtually enter your shop.
- Remove unwanted objects or people from videos.
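Under the hood, all of the effects above reduce to the same compositing step: blend the camera frame with the chosen background using the foreground mask. A minimal NumPy sketch, where the images and the person mask are synthetic placeholders rather than output from any real segmentation model:

```python
import numpy as np

def replace_background(frame, new_background, mask):
    """Composite: keep masked (person) pixels from the frame,
    take everything else from the virtual background."""
    mask3 = mask[..., None].astype(np.float32)  # HxWx1, values in [0, 1]
    out = frame * mask3 + new_background * (1.0 - mask3)
    return out.astype(np.uint8)

frame = np.full((2, 2, 3), 200, dtype=np.uint8)      # camera frame
virtual = np.full((2, 2, 3), 10, dtype=np.uint8)     # virtual scene
mask = np.array([[1, 0], [0, 0]], dtype=np.float32)  # person at (0, 0)

out = replace_background(frame, virtual, mask)
print(out[0, 0, 0], out[1, 1, 0])  # 200 10
```

A soft (fractional) mask at the person’s edges produces the smooth blending users expect; a hard 0/1 mask gives visible halos.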
Live streaming
Millionaire bloggers and active YouTube, Facebook, Twitch, and Dacast streamers facing tough competition are always seeking new technology-driven ways to engage their existing audience and acquire new followers. Here comes background subtraction, which enables content creators to customize their camera appearance and broadcast live experiences whenever they want.
Tailor-made game-related backgrounds, branded static or motion content, and animation-powered experiences are all made possible by background subtraction. Moreover, background subtraction helps streamers avoid traditional green screens and chroma keys, which are hard to relocate and take up a lot of space to set up and run.
Dating apps
Since the Covid-19 pandemic, the online video dating industry has boomed as people turned to digitizing their first-meeting experiences. Bumble, a Tinder-like video dating application, has reported an over 80% increase in monthly active users activating its video chat feature, while Hinge, a thriving app with over 10M downloads, claimed a 65% increase in users who trigger its video dating feature after the first match.
So, multiple industry-leading dating products have started adopting computer vision technologies, and background subtraction specifically, to boost marketing campaign results, acquire new users, and retain existing customers. As user privacy is among the core concerns of the dating market, ML-based technologies help both brands and consumers address security and privacy issues through top-notch background-changing capabilities.
They keep users safe and secure, thus decreasing frustration and simplifying the dating process while skyrocketing brand retention rates.
Social media apps
Apart from security and user privacy perks, background extraction also facilitates higher user engagement through immersive content creation. TikTok- and Instagram-like brands actively seek new computer vision technologies to empower their digital products with never-seen-before experiences that boost self-expression.
For example, social media influencers can leverage tailor-made backgrounds to improve their brand awareness and engage with their audience way more immersively. In turn, users can create stunning short-form videos and stories using the top-charting backgrounds that help other users engage with them through smart social media platform recommendation algorithms.
Video editing apps
The mobile video editing industry has thrived in recent years, as users no longer need resource-intensive platforms to create studio-like or even Hollywood-level video content. Background subtraction has become a must-have built-in video editing feature that lets users replace chroma keys and boost their videos’ engagement with customizable backgrounds.
How to Empower Your App with Background Subtraction
When it comes to empowering your digital product with background subtraction capabilities, there are two ways you can succeed:
- Custom from-scratch feature development
- Purchasing a ready-made software development kit (SDK).
Let’s briefly discuss each option to determine the best model for your business.
Custom Functionality Development
This option is a great opportunity for medium-sized companies and large-scale enterprises that have an in-house R&D team with experience in creating neural networks and expertise in developing computer vision algorithms. Alternatively, custom development may suit a technology-driven startup launching an AI-powered digital product and having in-house IT resources.
Tailor-made development gives you full control over the project and lets you embed the required level of accuracy. Moreover, in-house IT resources can facilitate your company’s value growth, as you can later patent the final background subtraction technology as an intellectual-property asset.
However, custom feature development is a lengthy and expensive process that requires a fortune in time and funds to deliver a Minimum Viable Product (MVP). It may take up to 12 months of active engagement and hundreds of thousands of dollars, since the average US-based computer vision engineer’s salary is about $105,000.
If you currently don’t have in-house development resources, you can hire an outsourcing technology partner to build either an MVP or a full-fledged solution. This option is still pricey and risky, since you must account for multiple factors: the time to choose a partner, product discovery sessions, validating a vendor’s background and expertise, etc.
If you don’t have a large budget, don’t have in-house R&D resources, and want to accelerate your go-to-market period (GTM), premade software development kits with background subtraction and replacement capabilities are what you need. An SDK is a ready-made product with built-in modules that include multiple features that you can smoothly and quickly integrate into either a market-ready or existing application to set up and run in a matter of days.
Choosing a background extraction SDK is a challenging task, so here are the core factors you should consider beforehand:
- Platform (Android, iOS, Win, Mac, Web) and device compatibility
- General feature set
- Performance and stability
- Real-time background subtraction capabilities
- Supported background types (blurred, static picture, GIF, video, 3D environment)
- Customer service quality
- Trial period conditions (e.g. Banuba offers 14 days of free trial)
- Price and subscription models.
Take the Banuba SDK with background subtraction as an example. Designed by our professional in-house R&D team, it relies on our patented background extraction technology and several commercially available background models that accurately process frame sequences.
The core features of Banuba’s technology include:
- Multiple background modes and effects (blur, static images, 3D environments, etc.)
- Portrait-to-landscape modes
- Image and video content support
- Both real-time and post-processing availability
- iOS, Android, and Unity cross-platform support
- Multi-browser compatibility (Chrome, Safari, Firefox, Edge, Opera)
- Trusted by Gucci, Meta, and RingCentral
- 14-day free trial to test the technology inside out.
Subtracting backgrounds from images or videos is no longer unique since there are multiple brands adopting cutting-edge AI-driven capabilities in their products, and the technology’s performance is very advanced. Social media, video conferencing, dating, live streaming, and video chat apps actively seize the technology-driven landscape to acquire new users and retain existing customers.
When it comes to empowering your own solution with background subtraction, leverage either ready-made SDKs or custom software development. The premade kits offer low initial pricing, facilitate the go-to-market (GTM) process, and provide multiple out-of-the-box features. From-scratch development is a long-run and pricey initiative that may suit your needs well if you have in-house R&D resources or a reliable outsourcing partner.
Banuba’s Face AR SDK is a great kit to test inside and out. Leverage the 14-day free trial to validate our patented, high-accuracy background extraction capabilities by simply integrating it into your market-ready or existing application in a day.
We use background subtraction to prevent embarrassing situations (like people or animals walking in), protect privacy, and add a fun element to video conferencing.
Background removal in image processing is a computer vision technique designed to detect the foreground objects and separate them from the background.