This PR addresses an issue with the previous implementation of background blurring. It used far too much space, as OpenCV bundles huge native builds for every CPU architecture supported on Android. This leads to duplicate files, large amounts of unnecessary code, and a larger download size. The MediaPipe library and the .tflite file used for foreground-background detection were trivial in size by comparison. This is fixed by moving the entire blurring operation to OpenGL ES, a graphics framework built into Android, cutting over 100 MB from the build size while also increasing performance. This is done by implementing the Gaussian blur algorithm from scratch in GLSL, the OpenGL Shading Language.
This is not a trivial task, as it involves interop between MediaPipe, OpenGL, and the app's Kotlin code to handle format conversion, perform advanced calculations, and prevent memory leaks. This blog will walk through the background knowledge needed to understand this PR. In it, I'll go over:
If you've ever taken a statistics class, you've likely worked with a normal distribution.
This is a naturally occurring pattern of distribution that models many things in life,
from test scores to insurance rates. These kinds of bell-curve functions are known as
Gaussians, named after the German mathematician Carl Friedrich Gauss, who studied them in the 1800s.
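The bell curve is what gives Gaussian blur its weights: pixels near the center contribute the most, and the contribution falls off smoothly with distance. Here's an illustrative Python sketch (not the PR's actual GLSL code) that samples the 1D Gaussian function and normalizes the samples into a 5-tap blur kernel:

```python
import math

def gaussian(x, sigma=1.0):
    """Value of the 1D Gaussian (bell curve) at offset x."""
    return math.exp(-(x * x) / (2 * sigma * sigma)) / math.sqrt(2 * math.pi * sigma * sigma)

# Sample the curve at integer pixel offsets -2..2 to build a 5-tap kernel,
# then normalize so the weights sum to 1 (the blur neither brightens nor darkens).
taps = [gaussian(x, sigma=1.0) for x in range(-2, 3)]
total = sum(taps)
kernel = [t / total for t in taps]
```

The kernel is symmetric and peaks in the middle, exactly the bell-curve shape above; a shader does the same thing, just with the weights baked in or computed per-fragment.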
If you've done grade-school math, you've likely worked with a system of linear equations.
That is, usually two or more equations with two or more variables, where the goal is to find the value of
each variable that satisfies all the equations.
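As a quick refresher, here's an illustrative Python sketch (not code from the PR) that solves a 2x2 system using Cramer's rule, one of the standard elimination techniques:

```python
def solve_2x2(a, b, c, d, e, f):
    """Solve the system  a*x + b*y = e,  c*x + d*y = f  via Cramer's rule."""
    det = a * d - b * c  # must be nonzero for a unique solution
    x = (e * d - b * f) / det
    y = (a * f - e * c) / det
    return x, y

# Example system:  2x + y = 5  and  x - y = 1
x, y = solve_2x2(2, 1, 1, -1, 5, 1)  # → x = 2.0, y = 1.0
```

Matrices are just a compact way of writing these systems down, which is why they show up everywhere in graphics.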
Scales an object by sx and sy:

    S = | sx   0 |
        |  0  sy |

Rotates (counter-clockwise) an object by angle θ:

    R = | cos θ  −sin θ |
        | sin θ   cos θ |
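To make the two matrices concrete, here's an illustrative Python sketch (again, not the PR's code) that applies each transform to a 2D point:

```python
import math

def scale(point, sx, sy):
    """Apply the scaling matrix [[sx, 0], [0, sy]] to a point."""
    x, y = point
    return (sx * x, sy * y)

def rotate(point, theta):
    """Apply the counter-clockwise rotation matrix for angle theta (radians)."""
    x, y = point
    return (x * math.cos(theta) - y * math.sin(theta),
            x * math.sin(theta) + y * math.cos(theta))

p = scale((1.0, 2.0), 2.0, 3.0)      # → (2.0, 6.0)
q = rotate((1.0, 0.0), math.pi / 2)  # 90° CCW: (1, 0) lands at ≈ (0, 1)
```

Vertex shaders perform exactly this kind of matrix-times-point arithmetic, just massively in parallel on the GPU.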
Most computer software runs on the CPU (Central Processing Unit), which performs the extensive calculation and memory
operations that make modern computing possible. But this isn't the '60s anymore; we want information presented on screens
in a convenient, easy-to-access layout. This requires graphics: displaying pixels in a certain order to show information.
Early computers just did everything on the CPU, but later a dedicated chip was created to optimize graphics
operations, and the GPU (Graphics Processing Unit) was born. On desktop computers, the GPU often gets its own dedicated card and memory,
but on mobile phones the GPU sits on the same chip as the CPU, just in a different physical location.
A lot of modern graphics is taken for granted, but the underlying implementation is actually pretty complicated.
On embedded machines, drawing is often just writing to a magic memory address that's hardwired to the screen. In modern computing,
we require several intermediate stages, from processing the raw data, to deciding how to interpret that data, to
performing operations on that data, all of which must run in parallel to take advantage of the multithreaded nature of GPUs.
As you might imagine, this is a huge pain to deal with, so we invented graphics libraries to make our lives (somewhat) easier.
OpenGL (Open Graphics Library) is a widely adopted, cross-platform API for rendering 2D and 3D vector graphics, enabling software to communicate directly with GPUs for high-performance,
hardware-accelerated rendering. Used in CAD, game development, virtual reality, and simulation, it acts as a standard interface, with specialized versions
like OpenGL ES for mobile/embedded devices and WebGL for browsers.
As mentioned before, the GPU and CPU exist in two different physical spaces. They have two different memory spaces
and different ways of accessing them. Because of this, OpenGL has a lot of boilerplate, as it needs to manage two different memory spaces.
It operates a lot like a state machine under the hood, which is why we need to explicitly define
variables, bind those variables to memory on the GPU, activate/deactivate that memory, send/read data to and from
the GPU, and release memory after we're done using it. On top of that, a lot of these functions have terrible
documentation, with vague naming and unclear parameters.
If you've ever written code in C, you already know most of GLSL. It's a C-style language
used to translate high-level code into low-level operations that the GPU can understand.
Understanding GLSL mostly comes down to understanding the shading stages of OpenGL.
Basically, the keywords `in` and `out` refer to the entry and exit variables of the program;
there is no `return` from `main` in GLSL. In addition, each program runs in a multithreaded environment:
you can think of the main function as having multiple instances of itself running at the same time.
Also, depending on the version, certain system-defined global variables like `gl_Position` can serve as
exit variables as well.
```glsl
#version 300 es
in vec4 a_Position;
in vec2 a_TexCoord;
out vec2 v_TexCoord;

void main() {
    gl_Position = a_Position;
    v_TexCoord = a_TexCoord;
}
```
The previous PR already included the foundational work of using MediaPipe to perform
foreground-background detection on an image and return a segmentation mask. This mask is
really a matrix (2D array) of 1s and 0s, representing whether each pixel is marked foreground or background.
The goal now is to leverage what we've learned to apply this mask to an image, blurring the background while
keeping the foreground unblurred. This is done through several classes and helper libraries.
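The core compositing idea is simple: where the mask is 1, keep the original pixel; where it's 0, use the blurred pixel. Here's an illustrative Python sketch of that blend on a toy 1D "image" (the real work happens per-fragment in GLSL, and these values are made up for the example):

```python
# Toy 1-D "image": blend a blurred background with a sharp foreground
# using a binary segmentation mask (1 = foreground, 0 = background).
original = [10, 20, 30, 40]
blurred  = [15, 15, 35, 35]   # pretend this came from the Gaussian blur pass
mask     = [ 1,  0,  0,  1]   # pretend this came from MediaPipe's segmenter

composite = [m * o + (1 - m) * b
             for m, o, b in zip(mask, original, blurred)]
# composite == [10, 15, 35, 40]
```

In a fragment shader this is a single `mix(blurred, original, mask)` style operation per pixel, which is what makes the GPU approach so fast.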
The app build size decreased drastically from the previous implementation, falling from 263 MB to just 128 MB:
a 135 MB reduction in APK size! In addition, frame-processing performance improved as well, with noticeably reduced choppiness and cleaner
blurring. Benchmarks are TODO.
I've also included demo screenshots showing the segmentation mask on the right and its corresponding
use in the final output frame on the left.