Supporting iOS Safari for a website using MediaPipe ML models, canvas MediaStreams and webm
This story focuses on the modifications made to video-mash.com to work around technological issues encountered when viewing it in the iOS Safari web browser. The technologies employed in video-mash.com have been described in this story (MediaPipe ML model) and that story (canvas, MediaStreams and WebM).
The views/opinions expressed in this story are my own. This story relates my personal experience and choices, and is provided for information in the hope that it will be useful but without any warranty.
I developed the website video-mash.com over the past few months. My development setup is based on Windows laptops and Android phones. I shared the website with a few people and was pleased with the responses, until I shared it with friends using iPhones. Unfortunately it all went tumbling down: blank screen, non-usable features, etc. I do not have access to a Mac, but I do have an iPhone 6 lying around gathering dust, so it was time to boot it up. As expected, video-mash.com did not work, so here is the story of the challenges I encountered and how I side-stepped them.
This story focuses on the following three aspects of the modifications required to support iOS Safari version 12.5, the version on the iPhone I have access to:
- working around a MediaPipe SelfieSegmentation issue;
- managing createImageBitmap limitations;
- working around a canvas captureStream issue to allow for video export.
MediaPipe SelfieSegmentation and Person Segmentation
GitHub issue 2675 reports that the MediaPipe SelfieSegmentation demo displays a blank canvas when viewed in Safari version 15.0. This symptom is consistent with the issues I encountered.
I considered two options to solve this issue: (a) investigate why the model does not work on Safari or (b) find a suitable replacement. I did not consider myself skilled enough to tackle the first option and proceeded with the second one, finding a replacement. I settled on using the updated BodyPix model.
The model is available as part of TensorFlow.js and using it in video-mash.com is reasonably straightforward. The MediaPipe SelfieSegmentation and BodyPix models differ in the following ways:
- SelfieSegmentation uses a callback to provide results, whereas BodyPix returns the results from an asynchronous segmentation function;
- SelfieSegmentation provides results in the form of two bitmap images corresponding to the original image and the segmentation mask, while BodyPix returns a single integer array whose length corresponds to the number of pixels in the image;
- BodyPix provides additional functionality, such as segmentation model parameters, body-part segmentation, and output visualisation functions.
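Bridging the second difference means turning BodyPix's per-pixel integer array into mask pixels comparable to SelfieSegmentation's segmentation mask. A minimal sketch of that conversion is shown below; the helper name maskFromSegmentation is my own and not part of either library:

```javascript
// Convert a BodyPix-style per-pixel integer array (1 = person, 0 = background)
// into flat RGBA pixel data usable as a segmentation mask.
// `maskFromSegmentation` is an illustrative helper, not a library function.
function maskFromSegmentation(data, width, height) {
  const rgba = new Uint8ClampedArray(width * height * 4);
  for (let i = 0; i < data.length; i++) {
    const v = data[i] === 1 ? 255 : 0; // opaque where a person was detected
    rgba[i * 4] = v;     // R
    rgba[i * 4 + 1] = v; // G
    rgba[i * 4 + 2] = v; // B
    rgba[i * 4 + 3] = v; // A
  }
  return rgba;
}
```

In a browser the returned array could be wrapped in an ImageData and drawn to a canvas to obtain a mask image.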
Lastly, BodyPix is slower than SelfieSegmentation, as previously reported in this Medium post, which also describes the authors' effort to make a faster model.
Due to the speed difference, video-mash.com uses SelfieSegmentation when it is deemed to be supported (see below for details) and reverts to BodyPix otherwise. The implementation uses a wrapper class that provides a unified interface to its users and looks after the details of selecting the segmentation model, calling it and manipulating the results. The interface of the wrapper class looks like:
class SegModel {
  constructor(...) {...}

  // Returns the name of the model used
  get model() {...}

  // Run the segmentation model and return the result in the form
  // of {image:..., segmentationMask:...}
  async send(image) {...}
}
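The idea behind the wrapper can be sketched as follows; the backend object and its fields are illustrative stand-ins for the real model adapters, not the actual video-mash.com implementation:

```javascript
// Minimal sketch of the wrapper's dispatch idea: the constructor stores
// whichever backend was selected, and callers never see which one it is.
class SegModel {
  constructor(backend) {
    this._backend = backend; // e.g. a SelfieSegmentation or BodyPix adapter
  }
  // Name of the underlying model, useful for diagnostics.
  get model() {
    return this._backend.name;
  }
  // Each adapter normalises its native output into the common
  // {image, segmentationMask} shape before returning it.
  async send(image) {
    return this._backend.segment(image);
  }
}

// Stub backend mimicking the unified result shape.
const stub = {
  name: "stub",
  segment: async (image) => ({ image, segmentationMask: null }),
};

const seg = new SegModel(stub);
```

Call sites then only ever deal with `seg.send(image)` regardless of which model ended up being selected.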
createImageBitmap
The WebKit implementation of createImageBitmap is known to be incomplete, as documented in bug 182424. This issue has two consequences.
First, the canvas manipulation done in video-mash.com used bitmap images created with createImageBitmap to transfer data. This aspect is solved simply by virtue of the canvas drawImage operation supporting canvas elements as well as ImageBitmap objects. Hence the solution is to rewrite the canvas handling to avoid using createImageBitmap.
The second consequence is that this incomplete feature can be used to decide which segmentation model to use. If the createImageBitmap promise is successfully fulfilled, the MediaPipe SelfieSegmentation model is used. If the promise is rejected, i.e. it failed, then the BodyPix model is used, as it is anticipated that the web browser would not support the MediaPipe SelfieSegmentation model. This check reads as follows, with the functions makeSelfieSegModel and makeBodypixSegModel used to generate wrappers around the MediaPipe SelfieSegmentation or BodyPix models:
const id = document.getElementById("check-createimagebitmap-canvas")
  .getContext("2d")
  .createImageData(100, 100);
createImageBitmap(id)
  .then(() => {
    // Able to create an ImageBitmap: use MediaPipe SelfieSegmentation
    makeSelfieSegModel()
      .then(resolve)
      .catch(reject);
  })
  .catch(() => {
    // Unable to create an ImageBitmap (Safari): use BodyPix
    makeBodypixSegModel()
      .then(resolve)
      .catch(reject);
  });
Canvas capturing and video exporting
WebKit bug 181663 documents an issue related to canvas and captureStream that I also experienced. The symptom is that the website freezes when MediaRecorder's stop function is called.
I investigated a number of options to allow video export and settled on using the gif.js.optimised library to export the video as an animated GIF. Whilst it is reasonably satisfactory, it cannot record sound and I am unsure how it would perform on long videos. Also, the GIF frame delay is currently fixed at the library's default of 500 ms per frame, which may be a little slow.
Similarly to the segmentation model, a unified interface is used to hide the implementation details of using a MediaStream or exporting as a GIF. In this case, the interface is provided as a Vue component rather than a library, so that it can also manage changes to the UI.
The implementation decides whether to (a) use captureStream and export to WebM or (b) use gif.js.optimised, based on the following:
- generate a stream using captureStream;
- check for a valid stream by verifying that a video track exists and that the video track contains a deviceId field. On iOS Safari, the video track obtained using captureStream does not appear to contain valid information.
This check reads as follows:
const videoTrack = (stream.getVideoTracks().length === 1
  ? stream.getVideoTracks()[0]
  : null);
const use_mediarecorder = (videoTrack !== null)
  && ('deviceId' in videoTrack.getCapabilities());
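The same check can be packaged as a small predicate. In the sketch below the stream argument is duck-typed so the code runs outside a browser; the function name isRecordableStream is my own:

```javascript
// Hypothetical predicate for the captureStream validity check.
// A stream is considered recordable when it has exactly one video track
// and that track reports a deviceId capability; on iOS Safari the track
// obtained from captureStream does not, so this returns false there.
function isRecordableStream(stream) {
  const tracks = stream.getVideoTracks();
  if (tracks.length !== 1) return false;
  return "deviceId" in tracks[0].getCapabilities();
}
```

The component can then branch on the predicate: use MediaRecorder when it returns true, and fall back to the GIF export path otherwise.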
These changes allow the website to function on Android and iPhone as well as desktop, or at least I hope they do.