Background subtraction separates a person from everything behind them in a video stream, then replaces, blurs, or augments that background in real time. Building this capability from scratch demands a trained segmentation neural network, a GPU-accelerated rendering pipeline, and months of optimization for mobile and web. A Background Subtraction SDK like Banuba's can cut that work to a few weeks by giving you a pretrained model, prebuilt rendering, and cross-platform bindings out of the box.
TL;DR
- Background subtraction with deep learning powers virtual backgrounds in Zoom, Microsoft Teams, TikTok, Bumble, and almost every modern video tool.
- Custom development requires computer vision research, neural network training on hundreds of thousands of images, and ongoing GPU and battery tuning.
- An SDK reduces engineering risk and shortens release cycles from 6–12 months to a few weeks, with on-device inference that keeps user data local.
- Pick the SDK route when speed to market, predictable performance across the majority of devices, and feature breadth (blur, image, video, 3D, AR effects) matter more than full control over the segmentation model.
- Banuba's Background Subtraction SDK runs at 30 fps on devices as old as iPhone 7, supports web browsers without plugins, and integrates with Face AR for combined effects.
- Banuba’s Background Subtraction SDK offers no-code integration, which takes days compared to months of custom development.
Why Apps with Background Subtraction Succeed
To understand what good looks like, check out how popular apps handle the experience.
Zoom and Microsoft Teams are the most prominent examples. Their virtual backgrounds load in one click, run while screen sharing is active, and recover gracefully when a user turns their head sharply. Users do not think about the technology. That is the point.
TikTok and Instagram push the creative side. Real-time green-screen effects, animated overlays, and AR filters layered over segmented backgrounds have made short-form video the dominant content format on social platforms today.
Bumble and Hinge use background subtraction in video dating, where it keeps users' surroundings private while still signaling presence and personality.
VROOM, a professional video-calling app built by True Digital Group in Bangkok, is a useful Banuba success story to examine directly. The team needed virtual backgrounds and face touch-up to lift camera enablement rates and reduce the awkwardness of remote meetings. They licensed Banuba's SDK rather than build segmentation in-house. Since integrating Banuba's technology, the number of new monthly active users has grown 30% faster than before the implementation, and the number of registered users has risen by 54%.
What these apps share:
- Real-time preview. Users see the effect before they commit, which builds trust.
- Stable edges. No flicker around shoulders, no halo around hair when the user moves.
- Low device load. The CPU stays cool, the battery survives the call, and the fan doesn't kick in.
- On-device processing. No video frames are sent to a server, which is a baseline for privacy and GDPR compliance.
- Cross-platform parity. Web, iOS, Android, and desktop behave the same way.
If you ship background subtraction that misses any one of those, users will notice and uninstall.
Core Capabilities to Build Background Subtraction
Before you decide between custom and SDK, it helps to see the full surface area. Background subtraction is not one feature. It's a stack of capabilities that have to work together at 30 frames per second.
Segmentation engine
- Person-vs-background classification at the pixel level
- Trained neural network with diverse data (skin tones, lighting, camera quality, hairstyles, partial occlusion)
- Edge refinement for hair, glasses, and clothing that blends with the wall
- Anti-jitter logic so the mask doesn't shake when hands move
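One common way to approach the anti-jitter item above is to blend each new mask with the previous ones using an exponential moving average. The sketch below is a minimal illustration under that assumption, not a description of how any particular SDK does it. The class name MaskSmoother and the alpha value are hypothetical, and masks are plain nested lists to keep the example dependency-free.

```python
from typing import List, Optional

Mask = List[List[float]]  # per-pixel foreground probability in [0, 1]


class MaskSmoother:
    """Exponential moving average over successive masks (illustrative only)."""

    def __init__(self, alpha: float = 0.6) -> None:
        # alpha close to 1 follows new masks quickly; lower values smooth more
        # but make the mask trail behind fast movement.
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must be in (0, 1]")
        self.alpha = alpha
        self._state: Optional[Mask] = None

    def update(self, mask: Mask) -> Mask:
        if self._state is None:
            # First frame: nothing to blend with yet.
            self._state = [row[:] for row in mask]
        else:
            a = self.alpha
            self._state = [
                [a * new + (1.0 - a) * old for new, old in zip(new_row, old_row)]
                for new_row, old_row in zip(mask, self._state)
            ]
        # Return a copy so callers cannot mutate the internal state.
        return [row[:] for row in self._state]
```

The trade-off sits in alpha: smooth too hard and the mask lags behind a waving hand, smooth too little and the edge shakes.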
Rendering pipeline
- GPU acceleration through Metal, OpenGL, Vulkan, or WebGL
- Real-time compositing of the foreground mask against blur, static images, video loops, or 3D scenes
- Color and lighting matching so the user looks like they belong in the new scene
- Fallback to CPU on low-end devices
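The compositing step in the list above comes down to a per-pixel blend: output = mask × foreground + (1 − mask) × background. Here is a CPU-only sketch of that blend, assuming a soft mask with values between 0 and 1 and RGB pixels stored as tuples. A production pipeline would run the same arithmetic in a GPU shader; the function name composite is illustrative.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[float]]


def composite(foreground: Image, background: Image, mask: Mask) -> Image:
    """Blend the camera frame over a replacement background using a soft mask."""
    if not (len(foreground) == len(background) == len(mask)):
        raise ValueError("foreground, background, and mask must have equal height")
    out: Image = []
    for fg_row, bg_row, m_row in zip(foreground, background, mask):
        row: List[Pixel] = []
        for fg, bg, m in zip(fg_row, bg_row, m_row):
            m = min(1.0, max(0.0, m))  # clamp in case the model overshoots
            row.append(tuple(round(m * f + (1.0 - m) * b) for f, b in zip(fg, bg)))
        out.append(row)
    return out
```

Mask values between 0 and 1 are what make hair and shoulder edges look soft instead of cut out with scissors.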
Capture and I/O
- Camera frame ingestion at consistent frame rate
- Resolution scaling per device class
- Output to video chat SDKs, recording, or streaming
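Resolution scaling per device class can be handled at runtime as well as per model. The sketch below assumes a simple policy: measure recent frame times and step the segmentation input size down when the device misses its frame budget, or back up when there is plenty of headroom. The resolution ladder and the 50% headroom threshold are hypothetical values chosen for illustration.

```python
from typing import Sequence

# Hypothetical model input sizes (pixels on the short side), largest first.
RESOLUTIONS = (256, 192, 128)


def pick_resolution(frame_times_ms: Sequence[float], current: int,
                    target_fps: float = 30.0) -> int:
    """Step the inference resolution down or up based on recent frame times."""
    if not frame_times_ms:
        raise ValueError("need at least one frame time")
    budget_ms = 1000.0 / target_fps
    average_ms = sum(frame_times_ms) / len(frame_times_ms)
    index = RESOLUTIONS.index(current)
    if average_ms > budget_ms and index < len(RESOLUTIONS) - 1:
        return RESOLUTIONS[index + 1]  # over budget: use a smaller input
    if average_ms < 0.5 * budget_ms and index > 0:
        return RESOLUTIONS[index - 1]  # ample headroom: use a larger input
    return current
```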
Cross-platform support
- iOS (Metal), Android (OpenGL/Vulkan), Web (WebGL/WebAssembly), Windows, macOS, Unity, Flutter, React Native
Privacy and compliance
- On-device inference with no server round trips
- GDPR, HIPAA, and regional data-handling requirements
Optional but expected
- Face tracking and beautification combined with the background effect
- Multiple background modes (blur, image, GIF, video, 3D)
- Drag-and-drop user positioning ("Weatherman" style)
- Multi-person segmentation in group calls
That last group matters for differentiation. A virtual background alone is table stakes. Combining it with face tracking, makeup AR, or interactive 3D rooms is what raises retention.

Ways to Implement Background Subtraction
There are two honest paths to shipping background subtraction. Both work. They serve different teams and different timelines.
Building from Scratch
This is a real R&D project, not a sprint. To take it on, you will need the following tech stack:
- Computer vision and deep learning expertise (PyTorch or TensorFlow)
- Mobile inference runtimes (Core ML, TensorFlow Lite, ONNX Runtime, MediaPipe)
- Native graphics frameworks (Metal, OpenGL ES, Vulkan, WebGL)
- Cross-platform tooling for Android, iOS, and web parity
- A model training pipeline and a dataset large enough to generalize across skin tones, lighting, and camera hardware
What you have to build
- A segmentation model trained on a representative dataset. For example, Banuba's networks were trained on a dataset of over 200K photos of men and women of all skin colors, in good and poor lighting conditions, and with both low-end and high-end cameras. With a significantly smaller dataset, expect edge artifacts on darker skin tones, in low light, or on older phones.
- A GPU rendering pipeline that composites the foreground mask onto the chosen background mode at native frame rates.
- Anti-jitter logic that smooths the mask between frames without introducing lag.
- Per-device performance tuning. A Snapdragon 695 and an A17 Pro behave very differently.
Realistic timeline and cost
Expect to invest hundreds of thousands of dollars and at least six months in development, and that's just the minimum viable product version. Twelve months to a polished release is more typical once you account for QA across the device spectrum.
Why teams still choose this path
- Full control of the segmentation model and the IP
- Ability to optimize for one specific use case (e.g., medical imaging, security cameras)
- Long-term cost reduction if usage volume is enormous
- Strategic value as patentable IP
Why most teams don't
- Talent is scarce and expensive
- Hardware fragmentation on Android is brutal
- The model must be retrained as devices and camera sensors evolve
- Time-to-market kills the launch window before you ship v1
Using an SDK
An SDK is a prepackaged module that drops the segmentation engine, rendering pipeline, and platform bindings into your app through standard package managers.
What you trade
- Some control over the model architecture
- A licensing fee instead of an internal team
What you gain
- Weeks instead of months. A working prototype can be integrated within a day, with a few more weeks for production polish.
- A model trained on a dataset larger than most teams can assemble
- Cross-platform parity already solved
- Ongoing improvements pushed in version updates
- Predictable per-MAU pricing
This is the path Banuba's own customers, including True Digital's VROOM, sMedio, and Chingari, have taken to ship faster.
Comparison Table: Build vs Background Subtraction SDK

SDK-focused Background Subtraction Implementation with Banuba
Banuba's Background Subtraction SDK is built on patented computer vision technology developed in-house. It separates the user from the surroundings using deep learning rather than chroma key, so no green screen is needed. The SDK then composites the foreground against blur, a static image, an animated GIF, a video, or a 3D environment.
What it replaces if you were going to build:
- The segmentation neural network and its training data
- The mobile and web inference runtime
- The GPU rendering pipeline
- The capture and frame management layer
- The cross-platform bindings
Performance characteristics
- Real-time 30 fps on mobile and web. Even on iPhone 7, the system holds 30 fps for at least an hour of non-stop work without lags or overheating, and on the latest devices, this can reach 300 fps.
- Maintains 30 fps tracking performance under up to 70% facial occlusion, 360° camera rotation, and low-light conditions per Banuba's 2025 internal benchmarks.
- Effective and fast performance on 90% of smartphones, including older constrained devices.
- 68 facial anchor points used by the companion face tracking module, which lets background effects combine with beautification, makeup AR, and 3D filters in the same pipeline.
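To put the frame-rate figures above in engineering terms, it helps to convert them into a per-frame time budget. The arithmetic below is generic and does not rely on any Banuba-specific detail; the function names are illustrative.

```python
def frame_budget_ms(fps: float) -> float:
    """Time available for capture, inference, and compositing of one frame."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


def frames_processed(fps: float, minutes: float) -> int:
    """Total frames handled over a session of the given length."""
    return round(fps * minutes * 60)


if __name__ == "__main__":
    print(f"30 fps  -> {frame_budget_ms(30):.1f} ms per frame")
    print(f"300 fps -> {frame_budget_ms(300):.1f} ms per frame")
    print(f"One hour at 30 fps -> {frames_processed(30, 60)} frames")
```

At 30 fps the whole pipeline has about 33.3 ms per frame, and at 300 fps about 3.3 ms. An hour-long call at 30 fps means 108,000 frames processed without a visible stall.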
Platforms supported
Native iOS and Android, web browsers without additional downloads, Flutter, React Native, macOS, Windows, and Unity. The web version works in Chrome, Safari, Firefox, Edge, and Opera, which is rare among commercial background subtraction SDKs.
Background modes available
Static images, animated GIFs, dynamic videos, and interactive 3D environments, plus blur and solid color. The "Weatherman Mode" lets end users drag and drop themselves anywhere on the screen, which is useful for presentations, education, and creator content.
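Conceptually, drag-and-drop positioning like this amounts to shifting the segmented person by an offset before blending them into the background. The sketch below shows that idea under simple assumptions (integer pixel offsets, nested-list images, a soft mask in the 0 to 1 range). It is not Banuba's implementation, and the name place_cutout is hypothetical.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[float]]


def place_cutout(foreground: Image, mask: Mask, background: Image,
                 dx: int, dy: int) -> Image:
    """Draw the masked person shifted by (dx, dy) pixels over the background."""
    height, width = len(background), len(background[0])
    out: Image = []
    for y in range(height):
        row: List[Pixel] = []
        for x in range(width):
            sx, sy = x - dx, y - dy  # where this output pixel comes from
            bg = background[y][x]
            if 0 <= sy < len(mask) and 0 <= sx < len(mask[sy]):
                m = mask[sy][sx]
                fg = foreground[sy][sx]
                row.append(tuple(round(m * f + (1.0 - m) * b)
                                 for f, b in zip(fg, bg)))
            else:
                row.append(bg)  # the shifted cutout does not cover this pixel
        out.append(row)
    return out
```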
Privacy posture
Effects are applied on the end-user's device with no data being sent to Banuba servers, which keeps the SDK GDPR-compliant out of the box and avoids the latency of cloud inference.
Recent upgrades
In October 2025, Banuba announced a next-generation AI model that smooths the borders between a user and their digital background, eliminating the jagged edges and pixelation that often plague video calls. CPO and Co-founder Anton Liskevich described the goal directly: "Our latest AI model doesn't just cut a person out; it intelligently blends them into a new environment." The update specifically targets the "ladder effect" along edges that has been the visible weakness of most segmentation models.
In December 2025, Banuba paired the improved virtual background with a new face shape detection module in the Face AR SDK. The release delivers cleaner segmentation by targeting jitter and pixelation at the edges of a person, along with more accurate separation in complex cases.
Integration Overview
The integration flow is conceptually simple:
- Request a 14-day trial token from Banuba.
- Add the SDK to your project through CocoaPods (iOS), Maven (Android), npm (Web), or the Unity package.
- Initialize the SDK with the token.
- Pass camera frames to the SDK and receive the composited output.
- Configure the background mode (blur, image, video, 3D) and any face AR effects you want layered on top.
- Render the output to your existing video chat or recording pipeline.
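The steps above can be sketched as a small frame loop. Everything in this example is a placeholder: the BackgroundSdk interface, its method names, and the FakeSdk stand-in are hypothetical and do not reflect Banuba's actual API, which is defined in the official documentation. The point is only the order of operations: initialize with a token, set the mode, then process and render each frame.

```python
from typing import Callable, Iterable, List, Protocol

Frame = List[List[int]]  # placeholder type for a camera frame


class BackgroundSdk(Protocol):
    """Hypothetical interface; NOT Banuba's actual API."""

    def initialize(self, token: str) -> None: ...
    def set_background_mode(self, mode: str) -> None: ...
    def process(self, frame: Frame) -> Frame: ...


def run_pipeline(sdk: BackgroundSdk, token: str, mode: str,
                 frames: Iterable[Frame],
                 render: Callable[[Frame], None]) -> int:
    """Initialize, configure, then push every camera frame through the SDK."""
    sdk.initialize(token)
    sdk.set_background_mode(mode)
    rendered = 0
    for frame in frames:
        render(sdk.process(frame))  # hand the composited frame to your output
        rendered += 1
    return rendered


class FakeSdk:
    """Stand-in that records calls so the loop can be exercised offline."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def initialize(self, token: str) -> None:
        if not token:
            raise ValueError("token must not be empty")
        self.calls.append("initialize")

    def set_background_mode(self, mode: str) -> None:
        self.calls.append(f"mode:{mode}")

    def process(self, frame: Frame) -> Frame:
        self.calls.append("process")
        return frame  # a real SDK would return the composited frame
```

Keeping the SDK behind a narrow interface like this also makes it easier to swap vendors or test your video pipeline without a camera attached.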
Full implementation details, code samples, and configuration references live in Banuba's official documentation.
The GitHub repository includes platform-specific quickstart projects for iOS, Android, Web, Flutter, React Native, and Unity, so most teams can have a working camera-to-background-replacement loop running on day one.
Implementation Decision Framework
Use this short decision tree before committing to a path.
Choose an SDK if:
- You need to ship in under 3 months
- Your team does not have a computer vision specialist
- You want web, iOS, Android, and desktop parity from day one
- You want background subtraction combined with face tracking, makeup AR, or beautification in the same pipeline
- Predictable per-MAU pricing fits your business model
Choose to build if:
- Background subtraction is your core IP and your moat
- You have a senior CV team and at least 12 months of runway
- Your use case sits outside the SDK's design (e.g., medical pixel-precision segmentation)
- You expect volume that makes per-MAU licensing more expensive than internal maintenance
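If you want to run this checklist across several product ideas, it can be written down as a simple tally. The sketch below counts how many criteria from each list apply and recommends the side with more matches. The equal weighting and the tie-break toward the SDK are assumptions made for illustration, and all field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Project:
    months_to_ship: int
    has_cv_specialist: bool
    has_senior_cv_team: bool
    runway_months: int
    needs_cross_platform_parity: bool
    wants_face_ar_in_same_pipeline: bool
    per_mau_pricing_fits: bool
    segmentation_is_core_ip: bool
    use_case_outside_sdk_design: bool
    licensing_costlier_than_maintenance: bool


def recommend(p: Project) -> str:
    """Return "sdk" or "build" by counting matching criteria on each side."""
    sdk_points = sum([
        p.months_to_ship < 3,
        not p.has_cv_specialist,
        p.needs_cross_platform_parity,
        p.wants_face_ar_in_same_pipeline,
        p.per_mau_pricing_fits,
    ])
    build_points = sum([
        p.segmentation_is_core_ip,
        p.has_senior_cv_team and p.runway_months >= 12,
        p.use_case_outside_sdk_design,
        p.licensing_costlier_than_maintenance,
    ])
    # Assumption: ties go to the SDK, since a trial is the cheaper experiment.
    return "build" if build_points > sdk_points else "sdk"
```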
For most product teams in video conferencing, dating, telehealth, live commerce, and education, the SDK route wins on every axis except deep model control. That tradeoff is rarely worth twelve months of engineering.
Conclusion
Background subtraction is one of those features that looks simple on the surface and sits on top of a serious computer vision stack underneath. Doing it well requires a trained segmentation network, a GPU-accelerated rendering pipeline, anti-jitter logic, and consistent performance across the device spectrum. Building all of that takes a year and a specialist team. Most product roadmaps cannot absorb that.
An SDK shortcuts the entire stack. You get a pretrained model, a rendering pipeline, and cross-platform bindings as one drop-in dependency, and you ship in weeks. Banuba's Background Subtraction SDK adds three things that matter beyond the basics: 30 fps performance on 90% of smartphones, on-device processing for privacy, and tight integration with face AR so you can layer beautification, makeup, and 3D filters on the same pipeline.
If you're scoping a virtual background feature for a video chat, dating, telehealth, education, or live commerce app, the question is rarely "build or buy?" It's "how fast do you need to be in front of users?" Trial the SDK first. If it covers your use case, you've saved a year. If it doesn't, you'll have a much sharper requirements document for your custom build.
Get a 14-day trial token now and validate it against your real workload before you commit to either path.
References
Banuba. (2021, June 28). Video background subtraction in a nutshell. https://www.banuba.com/blog/background-subtraction-in-a-nutshell
Banuba. (2022, September 16). We tested background subtraction methods: Here's what we found. https://www.banuba.com/blog/background-subtraction-guide
Banuba. (2023). 30% more MAUs and 54% more users for video conferencing app. https://www.banuba.com/blog/30-more-maus-and-54-more-users-vroom-success-story
Banuba. (2025). Background subtraction with deep learning: Detection, removal. https://www.banuba.com/technology/background-subtraction
Banuba. (2025). Face AR technology. https://www.banuba.com/technology/
Banuba. (2025). What is the best background subtraction SDK with real-time processing for mobile and web apps? https://www.banuba.com/faq/best-background-subtraction-sdk-with-real-time-processing
Banuba. (2026, February 23). Webcam background removal software: Definitive guide. https://www.banuba.com/blog/webcam-background-removal-and-replacement
Business Wire. (2023, July 31). Banuba Face AR SDK boosted MAU growth by 30% for VROOM. https://www.businesswire.com/news/home/20230731308608/en/Banuba-Face-AR-SDK-Boosted-MAU-growth-by-30-for-VROOM
Business Wire. (2025, October 10). Banuba unveils next-generation AI for flawless virtual backgrounds. https://www.businesswire.com/news/home/20251010633225/en/Banuba-Unveils-Next-Generation-AI-for-Flawless-Virtual-Backgrounds
Business Wire. (2025, December 22). Banuba enhances Face AR SDK with superior virtual backgrounds and face shape detection. https://www.businesswire.com/news/home/20251222329858/en/Banuba-Enhances-Face-AR-SDK-with-Superior-Virtual-Backgrounds-and-Face-Shape-Detection
Fortune Business Insights. (2025). Video conferencing market size, share, trends and growth analysis report, 2026–2034. https://www.fortunebusinessinsights.com/industry-reports/video-conferencing-market-100293
Zebracat. (2025, April 3). 150+ video conferencing statistics for 2025. https://www.zebracat.ai/post/video-conferencing-statistics