Supporting iOS Safari for a website using MediaPipe ML models, canvas MediaStreams and webm

Julien de Charentenay
5 min read · Dec 29, 2021


This story covers the modifications made to video-mash.com to work around issues encountered when viewing it with the iOS Safari web-browser. The technologies behind video-mash.com have been described in two earlier stories: one on the MediaPipe ML model, and one on canvas, MediaStreams and WebM.

Screenshot of using video-mash.com on iOS Safari — version 12.5

The views/opinions expressed in this story are my own. This story relates my personal experience and choices, and is provided for information in the hope that it will be useful but without any warranty.

I developed the website video-mash.com over the past few months. My development setup is based on Windows laptops and Android phones. I shared the website with a few people and was pleased with the responses until I shared it with friends using iPhones. Unfortunately it all went tumbling down — blank screen, non-usable features, etc. I do not have access to a Mac, but I do have an iPhone 6 lying around gathering dust. So it was time to boot it up. As expected video-mash.com did not work, so here is the story describing the challenges I encountered and how I side-stepped them.

This story focuses on the following three aspects of the modifications required to support iOS Safari version 12.5, the version on the iPhone I have access to:

  • Working around MediaPipe SelfieSegmentation issue;
  • Managing createImageBitmap limitations;
  • Working around the canvas captureStream issue to allow for video export.

MediaPipe SelfieSegmentation and Person Segmentation

GitHub issue 2675 reports that MediaPipe SelfieSegmentation displays a blank canvas when the demo is viewed using Safari version 15.0. This symptom is consistent with the issues I encountered.

I considered two options: (a) investigate why the model does not work on Safari, or (b) find a suitable replacement. I did not consider myself skilled enough to tackle the first option, so I proceeded with the second and settled on the updated BodyPix model.

The model is available as part of TensorFlow.js and using it in video-mash.com is reasonably straightforward. The MediaPipe SelfieSegmentation and BodyPix models differ in the following ways:

  • SelfieSegmentation uses a callback to provide results, whereas BodyPix returns the results from an asynchronous segmentation function;
  • SelfieSegmentation provides results as two bitmap images, the original image and the segmentation mask, whereas BodyPix returns a single integer array with one entry per pixel of the image;
  • BodyPix provides additional functionality, such as segmentation model parameters, body-part segmentation, and output visualisation functions.
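To illustrate the second difference, here is a minimal, hypothetical helper converting BodyPix's per-pixel integer array into an RGBA mask comparable to the segmentation mask image SelfieSegmentation returns. The function name and layout are my own, not video-mash.com's actual code:

```javascript
// Hypothetical helper: convert BodyPix's per-pixel 0/1 array into an
// RGBA pixel buffer usable as a segmentation mask (white and opaque
// where a person is detected, black and transparent elsewhere).
function maskFromBodyPix(data, width, height) {
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < data.length; i++) {
    const v = data[i] ? 255 : 0; // 1 = person, 0 = background
    rgba[i * 4 + 0] = v; // red
    rgba[i * 4 + 1] = v; // green
    rgba[i * 4 + 2] = v; // blue
    rgba[i * 4 + 3] = v; // alpha carries the mask
  }
  return rgba;
}
```

In a browser, the returned buffer can be wrapped in an ImageData and drawn to a canvas to play the role of SelfieSegmentation's mask image.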

Lastly, BodyPix is slower than SelfieSegmentation, as previously reported in a Medium post that also describes the effort to build a faster model.

Due to the speed difference, video-mash.com uses SelfieSegmentation when it is deemed to be supported (see below for details) and reverts to BodyPix otherwise. The implementation uses a wrapper class that presents a unified interface to its caller and handles selecting the segmentation model, calling it, and manipulating the results. The interface of the wrapper class looks like:

class SegModel {
  constructor(...) {...}

  // Returns the name of the model used
  get model() {...}

  // Run the segmentation model and return the result in the form
  // of {image: ..., segmentationMask: ...}
  async send(image) {...}
}
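The main thing such a wrapper has to smooth over is the callback-versus-promise difference noted above. A minimal sketch of that adaptation, where `model` stands in for SelfieSegmentation's callback-style object (the onResults and send names follow MediaPipe's API; the wrapper class itself is illustrative, not video-mash.com's actual code):

```javascript
// Sketch: adapt a callback-style model (like SelfieSegmentation's
// onResults) to the promise-based send() interface above. Assumes one
// frame is in flight at a time.
class CallbackToPromise {
  constructor(model) {
    this.model = model;
    this.pending = null;
    // The model invokes this callback once per processed frame.
    this.model.onResults((results) => {
      if (this.pending) {
        this.pending.resolve(results);
        this.pending = null;
      }
    });
  }

  send(image) {
    return new Promise((resolve, reject) => {
      this.pending = { resolve, reject };
      this.model.send({ image });
    });
  }
}
```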

createImageBitmap

The WebKit implementation of createImageBitmap is known to be incomplete, as documented in bug 182424. This has two consequences.

First, the canvas manipulation in video-mash.com used bitmap images created with createImageBitmap to transfer data. This is straightforward to solve because the canvas drawImage operation accepts canvas elements as well as ImageBitmap objects; the canvas handling was rewritten to avoid createImageBitmap altogether.

Second, the incomplete feature can be used to decide which segmentation model to use. If the createImageBitmap promise is fulfilled, the MediaPipe SelfieSegmentation model is used. If the promise is rejected, the BodyPix model is used instead, on the assumption that the web-browser would not support the MediaPipe SelfieSegmentation model either. The check reads as follows, with the functions makeSelfieSegModel and makeBodypixSegModel generating wrappers around the MediaPipe SelfieSegmentation and BodyPix models respectively:

const id = document.getElementById("check-createimagebitmap-canvas")
  .getContext("2d")
  .createImageData(100, 100);
createImageBitmap(id)
  .then(() => {
    // Able to create image bitmap: use mediapipe selfie
    makeSelfieSegModel()
      .then(resolve)
      .catch(reject);
  })
  .catch(() => {
    // Unable to create image bitmap - Safari: use bodypix
    makeBodypixSegModel()
      .then(resolve)
      .catch(reject);
  });

Canvas capturing and video exporting

WebKit bug 181663 documents an issue with canvas and captureStream that I also experienced: the website freezes when the MediaRecorder stop function is called.

I investigated a number of options to allow video export and settled on using the gif.js.optimised library to export the video as an animated GIF. Whilst reasonably satisfactory, it cannot record sound and I am unsure how it would perform on long videos. The GIF frame delay is also fixed at its default value of 500 ms at present, which may be a little slow.
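As a sketch of how that export path can be wired up, assuming gif.js's documented API (the GIF constructor, addFrame, the 'finished' event and render); the startGifRecording helper and its timer-based frame capture are illustrative, not video-mash.com's actual code:

```javascript
// Sketch: record a canvas to an animated GIF with gif.js. Frames are
// grabbed on a 500 ms timer (matching the default frame delay); calling
// stop() encodes the collected frames and delivers a Blob to onFinished.
function startGifRecording(canvas, onFinished) {
  const gif = new GIF({ workers: 2, quality: 10 });
  gif.on('finished', onFinished); // receives the encoded Blob
  const timer = setInterval(
    () => gif.addFrame(canvas, { delay: 500, copy: true }),
    500
  );
  return {
    stop() {
      clearInterval(timer);
      gif.render(); // encode asynchronously in the workers
    },
  };
}
```

The { copy: true } option asks gif.js to snapshot the canvas pixels at the time of the call, since the canvas keeps changing between frames.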

As with the segmentation model, a unified interface hides the implementation details of using a MediaStream or exporting as a GIF. In this case the interface is provided as a Vue component rather than a library, so that it can also manage the associated changes in the UI.

The implementation decides whether to (a) use captureStream and export to WebM, or (b) use gif.js.optimised, based on the following:

  • Generate a stream using captureStream
  • Check for a valid stream by verifying that it contains a video track and that the track's capabilities include a deviceId field. On iOS Safari, the video track obtained using captureStream does not appear to contain valid information. The check reads as follows:

const videoTrack = (stream.getVideoTracks().length === 1 ? stream.getVideoTracks()[0] : null);
const use_mediarecorder = (videoTrack !== null) && ('deviceId' in videoTrack.getCapabilities());
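Wrapped as a small predicate (the canUseMediaRecorder name is hypothetical, mirroring the check above), the decision reads:

```javascript
// Hypothetical helper wrapping the check above: returns true when the
// captured stream looks healthy enough for MediaRecorder, false when
// the site should fall back to GIF export (e.g. on iOS Safari, where
// the captureStream video track reports no deviceId capability).
function canUseMediaRecorder(stream) {
  const tracks = stream.getVideoTracks();
  const videoTrack = tracks.length === 1 ? tracks[0] : null;
  return videoTrack !== null && 'deviceId' in videoTrack.getCapabilities();
}
```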

These changes allow the website to function on Android and iPhone as well as desktop. At least, I hope they do.
