Video Editing

How To Implement an Overlay Video Editor (Picture-in-Picture Mode) for iOS with AVFoundation

AVFoundation is Apple's advanced multimedia framework for iOS, macOS, watchOS, and tvOS. With it, you can add video capture, editing, and playback to your app. In this article, we will walk through a basic way of adding picture-in-picture functionality to your app by implementing the playback and export of a smaller video overlaid on a larger one, using only standard iOS SDK frameworks.

Let's start with taking a quick look at key components that we'll need for the job:

  • AVPlayerViewController is primarily used for video playback. It provides the user interface with standard playback controls for playing videos.
  • AVAssetExportSession will be used to transcode the final overlaid video and write it to the device's storage.
  • AVMutableComposition is designed to add and remove composition audio and video tracks, as well as change their time ranges.
  • AVMutableVideoComposition is a representation of the video composition and is used to specify how tracks should be mixed to render a frame of the output video.
  • AVMutableVideoCompositionInstruction is a set of layer instructions that define how tracks are layered and transformed in the final video.
  • AVMutableVideoCompositionLayerInstruction is a special class used to transform, crop, and change the opacity of a specific video track. This is the class we will use to make the second video smaller than the first and position it in a specific corner.

Let's establish the naming convention used in the sample code. The video that will be rendered at its full size in the background will be called the main video asset. The secondary video displayed as an overlay on top of the main video will be called the overlay video asset.

Firstly, we need to load both videos. To keep things simple, we assume that they are part of the resources inside the application bundle.

let mainVideoUrl = Bundle.main.url(forResource: "main", withExtension: "mov")!
let overlayVideoUrl = Bundle.main.url(forResource: "overlay", withExtension: "mov")!
let mainAsset = AVURLAsset(url: mainVideoUrl)
let overlayAsset = AVURLAsset(url: overlayVideoUrl)

Since we are editing two videos, we need to create two empty video tracks in
AVMutableComposition as well as the composition itself:

let mutableComposition = AVMutableComposition()
guard let mainVideoTrack = mutableComposition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid),
      let overlayVideoTrack = mutableComposition.addMutableTrack(withMediaType: .video, preferredTrackID: kCMPersistentTrackID_Invalid) else {
    // Handle error
    return
}

After obtaining two empty composition tracks, we need to load a video track from each AVURLAsset and insert it into the corresponding composition track:

let mainAssetTimeRange = CMTimeRange(start: .zero, duration: mainAsset.duration)
let mainAssetVideoTrack = mainAsset.tracks(withMediaType: .video)[0]
try mainVideoTrack.insertTimeRange(mainAssetTimeRange, of: mainAssetVideoTrack, at: .zero)
let overlayedAssetTimeRange = CMTimeRange(start: .zero, duration: overlayAsset.duration)
let overlayAssetVideoTrack = overlayAsset.tracks(withMediaType: .video)[0]
try overlayVideoTrack.insertTimeRange(overlayedAssetTimeRange, of: overlayAssetVideoTrack, at: .zero)

It is now time to create two instances of AVMutableVideoCompositionLayerInstruction:

let mainAssetLayerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: mainVideoTrack)
let overlayedAssetLayerInstruction = AVMutableVideoCompositionLayerInstruction(assetTrack: overlayVideoTrack)

To adjust the size and position of the overlay video, we will use the setTransform method. To make the overlay approximately half as tall and half as wide as the main video, an identity transform scaled by 50% is used. The exact position of the overlay is set by translating the transform. The following code snippet demonstrates example transforms that place the overlay at the top-left, top-right, bottom-left, or bottom-right position.

let naturalSize = mainAssetVideoTrack.naturalSize
let halfWidth = naturalSize.width / 2
let halfHeight = naturalSize.height / 2
let topLeftTransform: CGAffineTransform = .identity.translatedBy(x: 0, y: 0).scaledBy(x: 0.5, y: 0.5)
let topRightTransform: CGAffineTransform = .identity.translatedBy(x: halfWidth, y: 0).scaledBy(x: 0.5, y: 0.5)
let bottomLeftTransform: CGAffineTransform = .identity.translatedBy(x: 0, y: halfHeight).scaledBy(x: 0.5, y: 0.5)
let bottomRightTransform: CGAffineTransform = .identity.translatedBy(x: halfWidth, y: halfHeight).scaledBy(x: 0.5, y: 0.5)
overlayedAssetLayerInstruction.setTransform(topRightTransform, at: .zero)

We assume that the resolutions of the main and overlay videos are the same; otherwise, the transform calculation would need to change. In this example, both videos have a resolution of 1920 by 1080 pixels.
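If the resolutions differ, one way to handle it is to derive the scale factor from the two sizes instead of hard-coding 0.5. The following sketch uses hypothetical sizes; in a real app you would read them from each track's naturalSize.

```swift
import CoreGraphics

// Hypothetical sizes: in a real app, read these from each track's naturalSize.
let mainSize = CGSize(width: 1920, height: 1080)
let overlaySize = CGSize(width: 1280, height: 720)

// Scale the overlay so it covers half the width of the main video.
let scale = (mainSize.width / 2) / overlaySize.width

// Place the scaled overlay in the top-right corner.
let topRightTransform = CGAffineTransform.identity
    .translatedBy(x: mainSize.width - overlaySize.width * scale, y: 0)
    .scaledBy(x: scale, y: scale)
```

With these numbers, scale works out to 0.75, so the 1280-pixel-wide overlay is drawn 960 pixels wide, exactly half the main video's width.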

Now the only thing left is to create an AVMutableVideoCompositionInstruction and an AVMutableVideoComposition that will be used later during video playback and export.

let instruction = AVMutableVideoCompositionInstruction()
instruction.timeRange = mainAssetTimeRange
// The first layer instruction is rendered on top, so the overlay goes first.
instruction.layerInstructions = [overlayedAssetLayerInstruction, mainAssetLayerInstruction]
mutableComposition.naturalSize = naturalSize
let mutableVideoComposition = AVMutableVideoComposition()
mutableVideoComposition.frameDuration = CMTimeMake(value: 1, timescale: 30) // 30 fps
mutableVideoComposition.renderSize = naturalSize
mutableVideoComposition.instructions = [instruction]

The AVPlayerViewController will be used to display our picture-in-picture video. To use it, create an AVPlayerItem from the AVMutableComposition, assign the AVMutableVideoComposition to it, and pass the item to an AVPlayer instance. Make sure that you've imported the AVKit framework in your source file; otherwise, the compilation error "Cannot find 'AVPlayerViewController' in scope" will appear.

let playerItem = AVPlayerItem(asset: mutableComposition)
playerItem.videoComposition = mutableVideoComposition
let player = AVPlayer(playerItem: playerItem)
let playerViewController = AVPlayerViewController()
playerViewController.player = player

After creating the player, you can either display it fullscreen using standard UIViewController methods like present(), or show it inline with other views using the view controller containment API, i.e. addChild() and didMove(toParent:).
As the last step of this tutorial, we will export the resulting video using AVAssetExportSession.

guard let session = AVAssetExportSession(asset: mutableComposition, presetName: AVAssetExportPresetHighestQuality) else {
    // Handle error
    return
}
let tempUrl = FileManager.default.temporaryDirectory.appendingPathComponent("export.mp4")
session.outputURL = tempUrl
session.outputFileType = AVFileType.mp4
session.videoComposition = mutableVideoComposition
session.exportAsynchronously {
    // Handle export status
}
In the final test video, the main video plays at full size with the overlay rendered at half scale in the top-right corner.

If you would like to adjust the appearance of the overlaid video, for example to add a color border or shadow, or to draw the overlay with rounded corners, then it will be necessary not only to create a custom compositor class that implements the AVVideoCompositing protocol, but also to use a custom instruction conforming to AVVideoCompositionInstructionProtocol instead of AVMutableVideoCompositionInstruction.
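A minimal skeleton of such a compositor might look like the following; the class name is hypothetical and the actual drawing is left as a comment. The compositor is activated by assigning its type to the video composition's customVideoCompositorClass property.

```swift
import AVFoundation

final class RoundedCornerCompositor: NSObject, AVVideoCompositing {
    // Pixel format the compositor can read and write.
    var sourcePixelBufferAttributes: [String: Any]? =
        [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]
    var requiredPixelBufferAttributesForRenderContext: [String: Any] =
        [kCVPixelBufferPixelFormatTypeKey as String: kCVPixelFormatType_32BGRA]

    func renderContextChanged(_ newRenderContext: AVVideoCompositionRenderContext) {
        // React to render size or pixel aspect ratio changes if needed.
    }

    func startRequest(_ request: AVAsynchronousVideoCompositionRequest) {
        guard let outputBuffer = request.renderContext.newPixelBuffer() else {
            request.finish(with: NSError(domain: "Compositor", code: -1))
            return
        }
        // Read source frames with request.sourceFrame(byTrackID:),
        // draw the overlay with rounded corners (e.g. via Core Image),
        // then hand the composed frame back.
        request.finish(withComposedVideoFrame: outputBuffer)
    }
}
```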

A simpler way

The process shown above is a straightforward and reliable way to implement an overlay video editor. However, if you need to ship faster or want additional features, you can integrate Banuba Video Editor SDK. It is distributed as a CocoaPods/Maven/npm package (depending on the platform) and is compatible with Flutter and React Native. Its feature set also includes:

  • Trimming/merging
  • Sound editing
  • Text/picture/GIF overlays
  • Transition effects
  • 3D masks
  • Color filters (LUTs)
  • Slo-mo/rapid
  • Music provider integration
  • etc.

You can try it for free - the trial lasts 14 days. No credit card required.

Get Video Editor SDK for Your App  Get Free Trial