Do I need advanced experience to implement background subtraction?

If you're building from scratch, yes. You need a senior computer vision engineer, a graphics engineer, and a dataset large enough to generalize across skin tones and lighting conditions. If you're using Banuba’s Background Subtraction SDK, a regular mobile or web developer can have a working prototype running in a day. The SDK exposes a high-level API, so you don't need to understand the segmentation model itself.

Which platforms and frameworks are supported?

Banuba's Background Subtraction SDK supports native iOS (Swift, Objective-C), native Android (Kotlin, Java), Web (JavaScript, WebGL), Flutter, React Native, Unity, Windows, and macOS. The web version runs without plugins or downloads in Chrome, Safari, Firefox, Edge, and Opera.

How long does it take to implement background subtraction using an SDK?

With Banuba, a working prototype takes one to three days for most teams. Production-quality polish, including UI for background selection, integration with your existing video pipeline, and QA across your supported device matrix, typically lands in two to six weeks. Compared to the six to twelve months required for a custom build, the SDK route is roughly 10x faster to ship.

How to Implement Background Subtraction Using an SDK

May 25, 2026

Video communication is no longer a workaround. It's the default for work, learning, dating, telehealth, and live commerce. The global video conferencing market sat at USD 37.29 billion in 2025 and is projected to reach USD 65.72 billion by 2034 at a 5.90% CAGR, and cloud-based platforms now hold 73% of total market share. Behind those numbers sits a quieter shift: users expect to control what the camera shows. A messy room, a roommate walking through, a hotel wallpaper that gives away your location. People want it gone, and they want it gone instantly.

That expectation reshapes what apps need to ship. Background subtraction stopped being a nice-to-have somewhere around 2021. Today, in dating, telehealth, live shopping, gaming, education, and creator tools, the absence of a virtual background reads as a missing feature, not a stylistic choice. Background subtraction now powers background removal, blur, and replacement in apps like Zoom, TikTok, Instagram, and Bumble.

The catch: doing it well is hard. Real-time segmentation on mobile chips, with low light, head turns, hair edges, and occlusion, is a genuine computer vision problem. That is why most teams reach for a Background Subtraction SDK rather than a research project, and this guide walks through how to make that choice and what the integration involves.

Written by Tania Rohachuk.
Technically reviewed by Artem Harytonau.

Originally posted on Monday, May 25, 2026

Last updated on Monday, May 25, 2026

Background subtraction separates a person from everything behind them in a video stream, then replaces, blurs, or augments that background in real time. Building this capability from scratch demands a trained segmentation neural network, a GPU-accelerated rendering pipeline, and months of optimization for mobile and web. Implementing this feature with a Background Subtraction SDK like Banuba can reduce the work required to a few weeks by giving you a pretrained model, prebuilt rendering, and cross-platform bindings out of the box.

TL;DR

Background subtraction with deep learning powers virtual backgrounds in Zoom, Microsoft Teams, TikTok, Bumble, and almost every modern video tool.
Custom development requires computer vision research, neural network training on hundreds of thousands of images, and ongoing GPU and battery tuning.
An SDK reduces engineering risk and shortens release cycles from 6–12 months to a few weeks, with on-device inference that keeps user data local.
Pick the SDK route when speed to market, predictable performance across the majority of devices, and feature breadth (blur, image, video, 3D, AR effects) matter more than full control over the segmentation model.
Banuba's Background Subtraction SDK runs at 30 fps on devices as old as iPhone 7, supports web browsers without plugins, and integrates with Face AR for combined effects.
Banuba’s Background Subtraction SDK offers no-code integration, which takes days compared to months of custom development.

Why Apps with Background Subtraction Succeed

To understand what good looks like, check out how popular apps handle the experience.

Zoom and Microsoft Teams are the most prominent examples. Their virtual backgrounds load in one click, run while screen sharing is active, and recover gracefully when a user turns their head sharply. Users do not think about the technology. That is the point.

TikTok and Instagram push the creative side. Real-time green-screen effects, animated overlays, and AR filters layered over segmented backgrounds have made short-form video the dominant content format on social platforms today.

Bumble and Hinge use background subtraction for video dating, where it keeps users' surroundings private while still signaling presence and personality.

VROOM, a professional video-calling app built by True Digital Group in Bangkok, is a useful Banuba success story to examine directly. The team needed virtual backgrounds and face touch-up to lift camera enablement rates and reduce the awkwardness of remote meetings. They licensed Banuba's SDK rather than build segmentation in-house. Since integrating Banuba's technology, the number of new monthly active users has grown 30% faster than before the implementation, and the number of registered users has risen by 54%.

What these apps share:

Real-time preview. Users see the effect before they commit, which builds trust.
Stable edges. No flicker around shoulders, no halo around hair when the user moves.
Low device load. The CPU stays cool, the battery survives the call, and the fan doesn't kick in.
On-device processing. No video frames are sent to a server, which is a baseline for privacy and GDPR compliance.
Cross-platform parity. Web, iOS, Android, and desktop behave the same way.

If you ship background subtraction that misses any one of those, users will notice and uninstall.

Core Capabilities to Build Background Subtraction

Before you decide between custom and SDK, it helps to see the full surface area. Background subtraction is not one feature. It's a stack of capabilities that have to work together at 30 frames per second.

Segmentation engine

Person-vs-background classification at the pixel level
Trained neural network with diverse data (skin tones, lighting, camera quality, hairstyles, partial occlusion)
Edge refinement for hair, glasses, and clothing that blends with the wall
Anti-jitter logic so the mask doesn't shake when hands move
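
To make the anti-jitter item concrete, here is a minimal sketch of one common approach: blending each new mask with the previous one using an exponential moving average. This is an illustration only, not Banuba's method. The function names, the `alpha` value, and the 0.5 threshold are assumptions chosen for the example.

```python
from typing import List, Optional

Mask = List[List[float]]  # per-pixel person probability in [0.0, 1.0]


def smooth_mask(previous: Optional[Mask], current: Mask, alpha: float = 0.6) -> Mask:
    """Blend the new mask with the previous smoothed mask (exponential moving average).

    alpha near 1.0 follows the new frame quickly (less lag, more jitter);
    alpha near 0.0 changes slowly (more stable, more lag).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if previous is None:
        # First frame: nothing to blend with yet.
        return [row[:] for row in current]
    return [
        [alpha * cur + (1.0 - alpha) * prev for cur, prev in zip(cur_row, prev_row)]
        for cur_row, prev_row in zip(current, previous)
    ]


def binarize(mask: Mask, threshold: float = 0.5) -> List[List[int]]:
    """Turn probabilities into a hard person (1) / background (0) decision."""
    return [[1 if p >= threshold else 0 for p in row] for row in mask]


# Hypothetical data: one edge pixel whose raw prediction flickers around 0.5.
raw_frames = [[[0.9]], [[0.4]], [[0.9]], [[0.4]], [[0.9]]]

raw_labels = [binarize(frame)[0][0] for frame in raw_frames]

smoothed = None
smoothed_labels = []
for frame in raw_frames:
    smoothed = smooth_mask(smoothed, frame, alpha=0.3)
    smoothed_labels.append(binarize(smoothed)[0][0])

print("raw:     ", raw_labels)
print("smoothed:", smoothed_labels)
```

The raw labels flip between person and background on every frame, while the smoothed labels stay stable. The cost is lag: a lower `alpha` also delays the mask when the user really does move, which is the tradeoff the list item describes.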

Rendering pipeline

GPU acceleration through Metal, OpenGL, Vulkan, or WebGL
Real-time compositing of the foreground mask against blur, static images, video loops, or 3D scenes
Color and lighting matching so the user looks like they belong in the new scene
Fallback to CPU on low-end devices
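
The compositing step in that list comes down to a per-pixel blend: output = mask × foreground + (1 − mask) × background. The sketch below shows that arithmetic in plain Python on a three-pixel image. A production pipeline would run the same formula in a GPU shader; the data and names here are made up for the example.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]   # 8-bit RGB
Image = List[List[Pixel]]
Mask = List[List[float]]       # 1.0 = person, 0.0 = background


def composite(foreground: Image, background: Image, mask: Mask) -> Image:
    """Per pixel and channel: out = mask * foreground + (1 - mask) * background."""
    out: Image = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        out_row = []
        for fg, bg, m in zip(fg_row, bg_row, m_row):
            out_row.append(tuple(round(m * f + (1.0 - m) * b) for f, b in zip(fg, bg)))
        out.append(out_row)
    return out


# Hypothetical 1x3 frame: a person pixel, a soft hair-edge pixel, a background pixel.
camera = [[(200, 150, 100), (200, 150, 100), (200, 150, 100)]]
beach = [[(0, 100, 250), (0, 100, 250), (0, 100, 250)]]
mask = [[1.0, 0.5, 0.0]]

result = composite(camera, beach, mask)
print(result)  # [[(200, 150, 100), (100, 125, 175), (0, 100, 250)]]
```

The middle pixel is why soft masks matter. A fractional mask value mixes the two sources, which is what hides hard, stair-stepped edges around hair and shoulders.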

Capture and I/O

Camera frame ingestion at consistent frame rate
Resolution scaling per device class
Output to video chat SDKs, recording, or streaming
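
Resolution scaling per device class is easier to reason about as a frame budget. At 30 frames per second, each frame has about 33 ms for capture, segmentation, and rendering. The sketch below picks the largest processing resolution whose measured cost fits that budget. The timings, the resolution tiers, and the 80% headroom factor are hypothetical values for illustration, not measurements of any SDK.

```python
from typing import Dict, Tuple

Resolution = Tuple[int, int]  # (width, height)


def frame_budget_ms(target_fps: float) -> float:
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps


def pick_resolution(
    measured_ms: Dict[Resolution, float],
    target_fps: float = 30.0,
    headroom: float = 0.8,
) -> Resolution:
    """Return the largest resolution whose measured per-frame cost fits the budget.

    headroom keeps part of the budget free for the rest of the app.
    Falls back to the smallest resolution if nothing fits.
    """
    if not measured_ms:
        raise ValueError("measured_ms must contain at least one resolution")
    budget = frame_budget_ms(target_fps) * headroom
    by_pixels = sorted(measured_ms, key=lambda r: r[0] * r[1], reverse=True)
    for resolution in by_pixels:
        if measured_ms[resolution] <= budget:
            return resolution
    return by_pixels[-1]


# Hypothetical per-frame costs (ms) measured at startup on two device classes.
flagship = {(1280, 720): 9.0, (960, 540): 6.0, (640, 360): 3.5}
budget_phone = {(1280, 720): 48.0, (960, 540): 29.0, (640, 360): 14.0}

print(pick_resolution(flagship))      # (1280, 720)
print(pick_resolution(budget_phone))  # (640, 360)
```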

Cross-platform support

iOS (Metal), Android (OpenGL/Vulkan), Web (WebGL/WebAssembly), Windows, macOS, Unity, Flutter, React Native

Privacy and compliance

On-device inference with no server round trips
GDPR, HIPAA, and regional data-handling requirements

Optional but expected

Face tracking and beautification combined with the background effect
Multiple background modes (blur, image, GIF, video, 3D)
Drag-and-drop user positioning ("Weatherman" style)
Multi-person segmentation in group calls

That last group matters for differentiation. A virtual background alone is table stakes. Combining it with face tracking, makeup AR, or interactive 3D rooms is what raises retention.

Ways to Implement Background Subtraction

There are two honest paths to shipping background subtraction. Both work. They serve different teams and different timelines.

Building from Scratch

This is a real R&D project, not a sprint. For a project of this size, you will need the following tech stack:

Computer vision and deep learning expertise (PyTorch or TensorFlow)
Mobile inference runtimes (Core ML, TensorFlow Lite, ONNX Runtime, MediaPipe)
Native graphics frameworks (Metal, OpenGL ES, Vulkan, WebGL)
Cross-platform tooling for Android, iOS, and web parity
A model training pipeline and a dataset large enough to generalize across skin tones, lighting, and camera hardware

What you have to build

A segmentation model trained on a representative dataset. For example, Banuba's networks were trained on a dataset of over 200K photos of men and women of all skin colors, in good and poor lighting conditions, and with both low-end and high-end cameras. With a significantly smaller dataset, you'll see edge artifacts on darker skin tones, in low light, or on older phones.
A GPU rendering pipeline that composites the foreground mask onto the chosen background mode at native frame rates.
Anti-jitter logic that smooths the mask between frames without introducing lag.
Per-device performance tuning. A Snapdragon 695 and an A17 Pro behave very differently.

Realistic timeline and cost

Expect to invest hundreds of thousands of dollars and at least six months in development, and that's just the minimum viable product version. Twelve months to a polished release is more typical once you account for QA across the device spectrum.

Why teams still choose this path

Full control of the segmentation model and the IP
Ability to optimize for one specific use case (e.g., medical imaging, security cameras)
Long-term cost reduction if usage volume is enormous
Strategic value as patentable IP

Why most teams don't

Talent is scarce and expensive
Hardware fragmentation on Android is brutal
The model must be retrained as devices and camera sensors evolve
Time-to-market kills the launch window before you ship v1

Using an SDK

An SDK is a prepackaged module that drops the segmentation engine, rendering pipeline, and platform bindings into your app through standard package managers.

What you trade

Some control over the model architecture
A licensing fee instead of an internal team

What you gain

Weeks instead of months. The integration process can be done within a day for a working prototype, with a few weeks for production polish.
A model trained on a dataset larger than most teams can assemble
Cross-platform parity already solved
Ongoing improvements pushed in version updates
Predictable per-MAU pricing

This is the path Banuba's own customers, including True Digital's VROOM, sMedio, and Chingari, have taken to ship faster.

Comparison Table: Build vs Background Subtraction SDK

SDK-focused Background Subtraction Implementation with Banuba

Banuba's Background Subtraction SDK is built on patented computer vision technology developed in-house. It separates the user from the surroundings using deep learning rather than chroma key, so no green screen is needed. The SDK then composites the foreground against blur, a static image, an animated GIF, a video, or a 3D environment.

What it replaces if you were going to build:

The segmentation neural network and its training data
The mobile and web inference runtime
The GPU rendering pipeline
The capture and frame management layer
The cross-platform bindings

Performance characteristics

Real-time 30 fps on mobile and web. Even on iPhone 7, the system holds 30 fps for at least an hour of non-stop work without lags or overheating, and on the latest devices, this can reach 300 fps.
Maintains 30 fps tracking performance under up to 70% facial occlusion, 360° camera rotation, and low-light conditions per Banuba's 2025 internal benchmarks.
Effective and fast performance on 90% of smartphones, including older constrained devices.
68 facial anchor points used by the companion face tracking module, which lets background effects combine with beautification, makeup AR, and 3D filters in the same pipeline.

Platforms supported

Native iOS and Android, web browsers without additional downloads, Flutter, React Native, Mac, Windows, and Unity. The web version works in Chrome, Safari, Firefox, Edge, and Opera, which is rare among commercial background subtraction SDKs.

Background modes available

Static images, animated GIFs, dynamic videos, and interactive 3D environments, plus blur and solid color. The "Weatherman Mode" lets end users drag and drop themselves anywhere on the screen, which is useful for presentations, education, and creator content.

Privacy posture

Effects are applied on the end-user's device with no data being sent to Banuba servers, which keeps the SDK GDPR-compliant out of the box and avoids the latency of cloud inference.

Recent upgrades

In October 2025, Banuba announced a next-generation AI model that smooths the borders between a user and their digital background, eliminating the jagged edges and pixelation that often plague video calls. CPO and Co-founder Anton Liskevich described the goal directly: "Our latest AI model doesn't just cut a person out; it intelligently blends them into a new environment." The update specifically targets the "ladder effect" along edges that has been the visible weakness of most segmentation models.

In December 2025, Banuba paired the improved virtual background with a new face shape detection module in the Face AR SDK, delivering a cleaner segmentation result by focusing on jitter and pixelization at the edges of a person, as well as more accurate separation in complex cases.

Integration Overview

The integration flow is conceptually simple:

Request a 14-day trial token from Banuba.
Add the SDK to your project through CocoaPods (iOS), Maven (Android), npm (Web), or the Unity package.
Initialize the SDK with the token.
Pass camera frames to the SDK and receive the composited output.
Configure the background mode (blur, image, video, 3D) and any face AR effects you want layered on top.
Render the output to your existing video chat or recording pipeline.
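
The steps above can be sketched as a small pipeline. Everything in this example is hypothetical: `BackgroundEngine`, `FakeEngine`, `run_pipeline`, and the method names are placeholders invented for illustration and are NOT Banuba's API. The real calls are in the official documentation. The fake engine only tags frames so that the flow can run end to end.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Protocol


class BackgroundEngine(Protocol):
    """Hypothetical interface for illustration; not a real SDK API."""

    def initialize(self, token: str) -> None: ...
    def set_background(self, mode: str) -> None: ...
    def process(self, frame: bytes) -> bytes: ...


@dataclass
class FakeEngine:
    """Stand-in so the sketch runs; it labels frames instead of segmenting them."""

    token: str = ""
    mode: str = "none"

    def initialize(self, token: str) -> None:
        if not token:
            raise ValueError("a trial or license token is required")
        self.token = token

    def set_background(self, mode: str) -> None:
        if mode not in {"blur", "image", "video", "3d"}:
            raise ValueError(f"unsupported background mode: {mode}")
        self.mode = mode

    def process(self, frame: bytes) -> bytes:
        if not self.token:
            raise RuntimeError("initialize() must be called before process()")
        return self.mode.encode() + b":" + frame


def run_pipeline(
    engine: BackgroundEngine,
    token: str,
    mode: str,
    camera_frames: Iterable[bytes],
    sink: Callable[[bytes], None],
) -> None:
    engine.initialize(token)            # initialize the SDK with the token
    engine.set_background(mode)         # configure the background mode
    for frame in camera_frames:         # pass camera frames in
        sink(engine.process(frame))     # hand composited output to your pipeline


sent: List[bytes] = []
run_pipeline(FakeEngine(), "TRIAL-TOKEN-PLACEHOLDER", "blur", [b"frame0", b"frame1"], sent.append)
print(sent)  # [b'blur:frame0', b'blur:frame1']
```

Keeping the engine behind a narrow interface like this is a design choice rather than a requirement. It lets the video chat or recording code depend on three calls instead of a vendor-specific surface, which makes the trial easier to evaluate and to swap out.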

Full implementation details, code samples, and configuration references live in the official documentation:

Banuba documentation: https://www.banuba.com/docs
Banuba GitHub: https://github.com/Banuba
Background subtraction product page: https://www.banuba.com/technology/background-subtraction
LLM-focused guides: https://docs.banuba.com/far-sdk/tutorials/development/llms

The GitHub organization includes platform-specific quickstart projects for iOS, Android, Web, Flutter, React Native, and Unity, so most teams can have a working camera-to-background-replacement loop running on day one.

Implementation Decision Framework

Use this short decision tree before committing to a path.

Choose an SDK if:

You need to ship in under 3 months
Your team does not have a computer vision specialist
You want web, iOS, Android, and desktop parity from day one
You want background subtraction combined with face tracking, makeup AR, or beautification in the same pipeline
Predictable per-MAU pricing fits your business model

Choose to build if:

Background subtraction is your core IP and your moat
You have a senior CV team and at least 12 months of runway
Your use case sits outside the SDK's design (e.g., medical pixel-precision segmentation)
You expect volume that makes per-MAU licensing more expensive than internal maintenance
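
The volume criterion can be checked with simple arithmetic: per-MAU licensing overtakes in-house maintenance once MAU × price per MAU exceeds the monthly maintenance cost. The sketch below works through that with placeholder figures. The dollar amounts and the per-MAU price are assumptions for illustration, not Banuba's pricing or a real cost estimate.

```python
from typing import Optional


def break_even_mau(monthly_maintenance_usd: float, price_per_mau_usd: float) -> float:
    """MAU level at which monthly licensing equals monthly in-house maintenance."""
    if price_per_mau_usd <= 0:
        raise ValueError("price_per_mau_usd must be positive")
    return monthly_maintenance_usd / price_per_mau_usd


def payback_months(
    upfront_build_usd: float,
    monthly_maintenance_usd: float,
    price_per_mau_usd: float,
    mau: int,
) -> Optional[float]:
    """Months until the upfront build cost is recovered, or None if it never is."""
    monthly_saving = mau * price_per_mau_usd - monthly_maintenance_usd
    if monthly_saving <= 0:
        return None  # licensing is cheaper than maintaining a custom build
    return upfront_build_usd / monthly_saving


# Placeholder inputs: $300K build, $40K/month maintenance, $0.02 per MAU.
threshold = break_even_mau(40_000, 0.02)
small_app = payback_months(300_000, 40_000, 0.02, mau=500_000)
large_app = payback_months(300_000, 40_000, 0.02, mau=5_000_000)

print(f"break-even at about {threshold:,.0f} MAU")
print("500K MAU payback:", small_app)
print("5M MAU payback (months):", large_app)
```

With these made-up numbers, a custom build never pays back at 500K MAU and only starts to make sense in the millions. Plug in your own quotes and salaries before drawing a conclusion.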

For most product teams in video conferencing, dating, telehealth, live commerce, and education, the SDK route wins on every axis except deep model control. That tradeoff is rarely worth twelve months of engineering.

Conclusion

Background subtraction is one of those features that looks simple on the surface and sits on top of a serious computer vision stack underneath. Doing it well requires a trained segmentation network, a GPU-accelerated rendering pipeline, anti-jitter logic, and consistent performance across the device spectrum. Building all of that takes a year and a specialist team. Most product roadmaps cannot absorb that.

An SDK shortcuts the entire stack. You get a pretrained model, a rendering pipeline, and cross-platform bindings as one drop-in dependency, and you ship in weeks. Banuba's Background Subtraction SDK adds three things that matter beyond the basics: 30 fps performance on 90% of smartphones, on-device processing for privacy, and tight integration with face AR so you can layer beautification, makeup, and 3D filters on the same pipeline.

If you're scoping a virtual background feature for a video chat, dating, telehealth, education, or live commerce app, the question is rarely "build or buy?" It's "how fast do you need to be in front of users?" Trial the SDK first. If it covers your use case, you've saved a year. If it doesn't, you'll have a much sharper requirements document for your custom build.

Get a 14-day trial token now and validate it against your real workload before you commit to either path.

References

Banuba. (2021, June 28). Video background subtraction in a nutshell. https://www.banuba.com/blog/background-subtraction-in-a-nutshell

Banuba. (2022, September 16). We tested background subtraction methods: Here's what we found. https://www.banuba.com/blog/background-subtraction-guide

Banuba. (2023). 30% more MAUs and 54% more users for video conferencing app. https://www.banuba.com/blog/30-more-maus-and-54-more-users-vroom-success-story

Banuba. (2025). Background subtraction with deep learning: Detection, removal. https://www.banuba.com/technology/background-subtraction

Banuba. (2025). Face AR technology. https://www.banuba.com/technology/

Banuba. (2025). What is the best background subtraction SDK with real-time processing for mobile and web apps? https://www.banuba.com/faq/best-background-subtraction-sdk-with-real-time-processing

Banuba. (2026, February 23). Webcam background removal software: Definitive guide. https://www.banuba.com/blog/webcam-background-removal-and-replacement

Business Wire. (2023, July 31). Banuba Face AR SDK boosted MAU growth by 30% for VROOM. https://www.businesswire.com/news/home/20230731308608/en/Banuba-Face-AR-SDK-Boosted-MAU-growth-by-30-for-VROOM

Business Wire. (2025, October 10). Banuba unveils next-generation AI for flawless virtual backgrounds. https://www.businesswire.com/news/home/20251010633225/en/Banuba-Unveils-Next-Generation-AI-for-Flawless-Virtual-Backgrounds

Business Wire. (2025, December 22). Banuba enhances Face AR SDK with superior virtual backgrounds and face shape detection. https://www.businesswire.com/news/home/20251222329858/en/Banuba-Enhances-Face-AR-SDK-with-Superior-Virtual-Backgrounds-and-Face-Shape-Detection

Fortune Business Insights. (2025). Video conferencing market size, share, trends and growth analysis report, 2026–2034. https://www.fortunebusinessinsights.com/industry-reports/video-conferencing-market-100293

Zebracat. (2025, April 3). 150+ video conferencing statistics for 2025. https://www.zebracat.ai/post/video-conferencing-statistics
