Posted by Brandon Azad, Project Zero
In this post I'll describe how I discovered and exploited CVE-2023-6225, a MIG reference counting vulnerability in XNU's task_swap_mach_voucher() function. We'll see how to exploit this bug on iOS 12.1.2 to build a fake kernel task port, giving us the ability to read and write arbitrary kernel memory. (This bug was independently discovered by @S0rryMybad.) In a later post, we'll look at how to use this bug as a starting point to analyze and bypass Apple's implementation of ARMv8.3 Pointer Authentication (PAC) on A12 devices like the iPhone XS.
A curious discovery
MIG is a tool that generates Mach message parsing code, and vulnerabilities resulting from violating MIG semantics are nothing new: for example, Ian Beer's async_wake exploited an issue where IOSurfaceRootUserClient would over-deallocate a Mach port managed by MIG semantics on iOS 11.1.2.
Most prior MIG-related issues have been the result of MIG service routines not obeying semantics around object lifetimes and ownership. Usually, the MIG ownership rules are expressed as follows:
If a MIG service routine returns success, then it took ownership of all resources passed in.
If a MIG service routine returns failure, then it took ownership of none of the resources passed in.
Unfortunately, as we'll see, this description doesn't cover the full complexity of kernel objects managed by MIG, which can lead to unexpected bugs.
The journey started while investigating a reference count overflow in semaphore_destroy(), in which an error path through the function left the semaphore_t object with an additional reference. While looking at the autogenerated MIG function _Xsemaphore_destroy() that wraps semaphore_destroy(), I noticed that this function seems to obey non-conventional semantics.
Here's the relevant code from _Xsemaphore_destroy():
task = convert_port_to_task(In0P->Head.msgh_request_port);
OutP->RetCode = semaphore_destroy(task,
convert_port_to_semaphore(In0P->semaphore.name));
task_deallocate(task);
#if __MigKernelSpecificCode
if (OutP->RetCode != KERN_SUCCESS) {
MIG_RETURN_ERROR(OutP, OutP->RetCode);
}
if (IP_VALID((ipc_port_t)In0P->semaphore.name))
ipc_port_release_send((ipc_port_t)In0P->semaphore.name);
#endif /* __MigKernelSpecificCode */
The function convert_port_to_semaphore() takes a Mach port and produces a reference on the underlying semaphore object without consuming the reference on the port. If we assume that a correct implementation of the above code doesn't leak or consume extra references, then we can conclude the following intended semantics for semaphore_destroy():
On success, semaphore_destroy() should consume the semaphore reference.
On failure, semaphore_destroy() should still consume the semaphore reference.
Thus, semaphore_destroy() doesn't seem to follow the traditional rules of MIG semantics: a correct implementation always takes ownership of the semaphore object, regardless of whether the service routine returns success or failure.
This of course begs the question: what are the full rules governing MIG semantics? And are there any instances of code violating these other MIG rules?
A bad swap
Not long into my investigation into extended MIG semantics, I discovered the function task_swap_mach_voucher(). This is the MIG definition from osfmk/mach/task.defs:
routine task_swap_mach_voucher(
task : task_t;
new_voucher : ipc_voucher_t;
inout old_voucher : ipc_voucher_t);
And here's the relevant code from _Xtask_swap_mach_voucher(), the autogenerated MIG wrapper:
mig_internal novalue _Xtask_swap_mach_voucher
(mach_msg_header_t *InHeadP, mach_msg_header_t *OutHeadP)
{
...
kern_return_t RetCode;
task_t task;
ipc_voucher_t new_voucher;
ipc_voucher_t old_voucher;
...
task = convert_port_to_task(In0P->Head.msgh_request_port);
new_voucher = convert_port_to_voucher(In0P->new_voucher.name);
old_voucher = convert_port_to_voucher(In0P->old_voucher.name);
RetCode = task_swap_mach_voucher(task, new_voucher, &old_voucher);
ipc_voucher_release(new_voucher);
task_deallocate(task);
if (RetCode != KERN_SUCCESS) {
MIG_RETURN_ERROR(OutP, RetCode);
}
...
if (IP_VALID((ipc_port_t)In0P->old_voucher.name))
ipc_port_release_send((ipc_port_t)In0P->old_voucher.name);
if (IP_VALID((ipc_port_t)In0P->new_voucher.name))
ipc_port_release_send((ipc_port_t)In0P->new_voucher.name);
...
OutP->old_voucher.name = (mach_port_t)convert_voucher_to_port(old_voucher);
OutP->Head.msgh_bits |= MACH_MSGH_BITS_COMPLEX;
OutP->Head.msgh_size = (mach_msg_size_t)(sizeof(Reply));
OutP->msgh_body.msgh_descriptor_count = 1;
}
Once again, assuming that a correct implementation doesn't leak or consume extra references, we can infer the following intended semantics for task_swap_mach_voucher():
task_swap_mach_voucher() does not hold a reference on new_voucher; the new_voucher reference is borrowed and should not be consumed.
task_swap_mach_voucher() holds a reference on the input value of old_voucher that it should consume.
On failure, the output value of old_voucher should not hold any references on the pointed-to voucher object.
On success, the output value of old_voucher holds a voucher reference donated from task_swap_mach_voucher() to _Xtask_swap_mach_voucher() that the latter consumes via convert_voucher_to_port().
With these semantics in mind, we can compare against the actual implementation. Here's the code from XNU 4903.221.2's osfmk/kern/task.c, presumably a placeholder implementation:
kern_return_t
task_swap_mach_voucher(
task_t task,
ipc_voucher_t new_voucher,
ipc_voucher_t *in_out_old_voucher)
{
if (TASK_NULL == task)
return KERN_INVALID_TASK;
*in_out_old_voucher = new_voucher;
return KERN_SUCCESS;
}
This implementation does not respect the intended semantics:
The input value of in_out_old_voucher is a voucher reference owned by task_swap_mach_voucher(). By unconditionally overwriting it without first calling ipc_voucher_release(), task_swap_mach_voucher() leaks a voucher reference.
The value new_voucher is not owned by task_swap_mach_voucher(), and yet it is being returned in the output value of in_out_old_voucher. This consumes a voucher reference that task_swap_mach_voucher() does not own.
Thus, task_swap_mach_voucher() actually contains two reference counting issues! We can leak a reference on a voucher by calling task_swap_mach_voucher() with the voucher as the third argument, and we can drop a reference on the voucher by passing the voucher as the second argument. This is a great exploitation primitive, since it offers us nearly complete control over the voucher object's reference count.
(Further investigation revealed that thread_swap_mach_voucher() contained a similar vulnerability, but only the reference leak part, and changes in iOS 12 made the vulnerability unexploitable.)
On vouchers
In order to grasp the impact of this vulnerability, it's helpful to understand a bit more about Mach vouchers, although the full details aren't important for exploitation.
Mach vouchers are represented by the type ipc_voucher_t in the kernel, with the following structure definition:
/*
* IPC Voucher
*
* Vouchers are a reference counted immutable (once-created) set of
* indexes to particular resource manager attribute values
* (which themselves are reference counted).
*/
struct ipc_voucher {
iv_index_t iv_hash; /* checksum hash */
iv_index_t iv_sum; /* checksum of values */
os_refcnt_t iv_refs; /* reference count */
iv_index_t iv_table_size; /* size of the voucher table */
iv_index_t iv_inline_table[IV_ENTRIES_INLINE];
iv_entry_t iv_table; /* table of voucher attr entries */
ipc_port_t iv_port; /* port representing the voucher */
queue_chain_t iv_hash_link; /* link on hash chain */
};
As the comment indicates, an IPC voucher represents a set of arbitrary attributes that can be passed between processes via a send right in a Mach message. The primary client of Mach vouchers appears to be Apple's libdispatch library.
The only fields of ipc_voucher relevant to us are iv_refs and iv_port. The other fields are related to managing the global list of voucher objects and storing the attributes represented by a voucher, neither of which will be used in the exploit.
As of iOS 12, iv_refs is of type os_refcnt_t, which is a 32-bit reference count with allowed values in the range 1-0x0fffffff (that's 7 f's, not 8). Trying to retain or release a voucher with a reference count outside this range will trigger a panic.
iv_port is a pointer to the ipc_port object that represents this voucher to userspace. It gets initialized whenever convert_voucher_to_port() is called on an ipc_voucher with iv_port set to NULL.
In order to create a Mach voucher, you can call the host_create_mach_voucher() trap. This function takes a "recipe" describing the voucher's attributes and returns a voucher port representing the voucher. However, because vouchers are immutable, there is one quirk: if the resulting voucher's attributes are exactly the same as a voucher that already exists, then host_create_mach_voucher() will simply return a reference to the existing voucher rather than creating a new one.
That's out of line!
There are many different ways to exploit this bug, but in this post I'll discuss my favorite: incrementing an out-of-line Mach port pointer so that it points into pipe buffers.
Now that we understand what the vulnerability is, it's time to determine what we can do with it. As you'd expect, an ipc_voucher gets deallocated once its reference count drops to 0. Thus, we can use our vulnerability to cause the voucher to be unexpectedly freed.
But freeing the voucher is only useful if the freed voucher is subsequently reused in an interesting way. There are three components to this: storing a pointer to the freed voucher, reallocating the freed voucher with something useful, and reusing the stored voucher pointer to modify kernel state. If we can't get any one of these steps to work, then the whole bug is pretty much useless.
Let's consider the first step, storing a pointer to the voucher. There are a few places in the kernel that directly or indirectly store voucher pointers, including struct ipc_kmsg's ikm_voucher field and struct thread's ith_voucher field. Of these, the easiest to use is ith_voucher, since we can directly read and write this field's value from userspace by calling thread_get_mach_voucher() and thread_set_mach_voucher(). Thus, we can make ith_voucher point to a freed voucher by first calling thread_set_mach_voucher() to store a reference to the voucher, then using our voucher bug to remove the added reference, and finally deallocating the voucher port in userspace to free the voucher.
Next consider how to reallocate the voucher with something useful. ipc_voucher objects live in their own zalloc zone, ipc.vouchers, so we could easily get our freed voucher reallocated with another voucher object. Reallocating with any other type of object, however, would require us to force the kernel to perform zone garbage collection and move a page containing only freed vouchers over to another zone. Unfortunately, vouchers don't seem to store any significant privilege-relevant attributes, so reallocating our freed voucher with another voucher probably isn't helpful. That means we'll have to perform zone gc and reallocate the voucher with another type of object.
In order to figure out what type of object we should reallocate with, it's helpful to first examine how we will use the dangling voucher pointer in the thread's ith_voucher field. We have a few options, but the easiest is to call thread_get_mach_voucher() to create or return a voucher port for the freed voucher. This will invoke ipc_voucher_reference() and convert_voucher_to_port() on the freed ipc_voucher object, so we'll need to ensure that both iv_refs and iv_port are valid.
But what makes thread_get_mach_voucher() so useful for exploitation is that it returns the voucher's Mach port back to userspace. There are two ways we could leverage this. If the freed ipc_voucher object's iv_port field is non-NULL, then that pointer gets directly interpreted as an ipc_port pointer and thread_get_mach_voucher() returns it to us as a Mach send right. On the other hand, if iv_port is NULL, then convert_voucher_to_port() will return a freshly allocated voucher port that allows us to continue manipulating the freed voucher's reference count from userspace.
This brought me to the idea of reallocating the voucher using out-of-line ports. One way to send a large number of Mach port rights in a message is to list the ports in an out-of-line ports descriptor. When the kernel copies in an out-of-line ports descriptor, it allocates an array to store the list of ipc_port pointers. By sending many Mach messages containing out-of-line ports descriptors, we can reliably reallocate the freed ipc_voucher with an array of out-of-line Mach port pointers.
Since we can control which elements in the array are valid ports and which are MACH_PORT_NULL, we can ensure that we overwrite the voucher's iv_port field with NULL. That way, when we call thread_get_mach_voucher() in userspace, convert_voucher_to_port() will allocate a fresh voucher port that points to the overlapping voucher. Then we can use the reference counting bug again on the returned voucher port to modify the freed voucher's iv_refs field, which will change the value of the out-of-line port pointer that overlaps iv_refs by any amount we want.
Of course, we haven't yet addressed the question of ensuring that the iv_refs field is valid to begin with. As previously mentioned, iv_refs must be in the range 1-0x0fffffff if we want to reuse the freed ipc_voucher without triggering a kernel panic.
The ipc_voucher structure is 0x50 bytes and the iv_refs field is at offset 0x8; since the iPhone is little-endian, this means that if we reallocate the freed voucher with an array of out-of-line ports, iv_refs will always overlap with the lower 32 bits of an ipc_port pointer. Let's call the Mach port that overlaps iv_refs the base port. Using either MACH_PORT_NULL or MACH_PORT_DEAD as the base port would result in iv_refs being either 0 or 0xffffffff, both of which are invalid. Thus, the only remaining option is to use a real Mach port as the base port, so that iv_refs is overwritten with the lower 32 bits of a real ipc_port pointer.
This is dangerous because if the lower 32 bits of the base port's address are 0 or greater than 0x0fffffff, accessing the freed voucher will panic. Fortunately, kernel heap allocation on recent iOS devices is pretty well behaved: zalloc pages will be allocated from the range 0xffffffe0xxxxxxxx starting from low addresses, so as long as the heap hasn't become too unruly since the system booted (e.g. because of a heap groom or lots of activity), we can be reasonably sure that the lower 32 bits of the base port's address will lie within the required range. Hence overlapping iv_refs with an out-of-line Mach port pointer will almost certainly work fine if the exploit is run after a fresh boot.
This gives us our working strategy to exploit this bug:
Allocate a page of Mach vouchers.
Store a pointer to the target voucher in the thread's ith_voucher field and drop the added reference using the vulnerability.
Deallocate the voucher ports, freeing all the vouchers.
Force zone gc and reallocate the page of freed vouchers with an array of out-of-line ports. Overlap the target voucher's iv_refs field with the lower 32 bits of a pointer to the base port and overlap the voucher's iv_port field with NULL.
Call thread_get_mach_voucher() to retrieve a voucher port for the voucher overlapping the out-of-line ports.
Use the vulnerability again to modify the overlapping voucher's iv_refs field, which changes the out-of-line base port pointer so that it points somewhere else instead.
Once we receive the Mach message containing the out-of-line ports, we get a send right to arbitrary memory interpreted as an ipc_port.
Pipe dreams
So what should we get a send right to? Ideally we'd be able to fully control the contents of the fake ipc_port we receive without having to play risky games by deallocating and then reallocating the memory backing the fake port.
Ian actually came up with a great technique for this in his multi_path and empty_list exploits using pipe buffers. Our exploit so far allows us to modify an out-of-line pointer to the base port so that it points somewhere else. So, if the original base port lies directly in front of a bunch of pipe buffers in kernel memory, then we can leak voucher references to increment the base port pointer in the out-of-line ports array so that it points into the pipe buffers instead.
At this point, we can receive the message containing the out-of-line ports back in userspace. This message will contain a send right to an ipc_port that overlaps one of our pipe buffers, so we can directly read and write the contents of the fake ipc_port's memory by reading and writing the overlapping pipe's file descriptors.
tfp0
Once we have a send right to a completely controllable ipc_port object, exploitation is basically deterministic.
We can build a basic kernel memory read primitive using the same old pid_for_task() trick: convert our port into a fake task port such that the fake task's bsd_info field (which is a pointer to a proc struct) points to the memory we want to read, and then call pid_for_task() to read the 4 bytes overlapping bsd_info->p_pid. Unfortunately, there's a small catch: we don't know the address of our pipe buffer in kernel memory, so we don't know where to make our fake task port's ip_kobject field point.
We can get around this by instead placing our fake task struct in a Mach message that we send to the fake port, after which we can read the pipe buffer overlapping the port and get the address of the message containing our fake task from the port's ip_messages.imq_messages field. Once we know the address of the ipc_kmsg containing our fake task, we can overwrite the contents of the fake port to turn it into a task port pointing to the fake task, and then call pid_for_task() on the fake task port as usual to read 4 bytes of arbitrary kernel memory.
An unfortunate consequence of this approach is that it leaks one ipc_kmsg struct for each 4-byte read. Thus, we'll want to build a better read primitive as quickly as possible and then free all the leaked messages.
In order to get the address of the pipe buffer we can leverage the fact that it resides at a known offset from the address of the base port. We can call mach_port_request_notification() on the fake port to add a request that the base port be notified once the fake port becomes a dead name. This causes the fake port's ip_requests field to point to a freshly allocated array containing a pointer to the base port, which means we can use our memory read primitive to read out the address of the base port and compute the address of the pipe buffer.
At this point we can build a fake kernel task inside the pipe buffer, giving us full kernel read/write. Next we allocate kernel memory with mach_vm_allocate(), write a new fake kernel task inside that memory, and then modify the fake port pointer in our process's ipc_entry table to point to the new kernel task instead. Finally, once we have our new kernel task port, we can clean up all the leaked memory.
And that's the complete exploit! You can find exploit code for the iPhone XS, iPhone XR, and iPhone 8 here: voucher_swap. A more in-depth, step-by-step technical analysis of the exploit technique is available in the source code.
Bug collision
I reported this vulnerability to Apple on December 6, 2023, and by December 19th Apple had already released iOS 12.1.3 beta build 16D5032a which fixed the issue. Since this would be an incredibly quick turnaround for Apple, I suspected that this bug was found and reported by some other party first.
I subsequently learned that this bug was independently discovered and exploited by Qixun Zhao (@S0rryMybad) of Qihoo 360 Vulcan Team. Amusingly, we were both led to this bug through semaphore_destroy(); thus, I wouldn't be surprised to learn that this bug was broadly known before being fixed. SorryMybad used this vulnerability as part of a remote jailbreak for the Tianfu Cup; you can read about his strategy for obtaining tfp0.
Conclusion
This post looked at the discovery and exploitation of P0 issue 1731, an IPC voucher reference counting issue rooted in failing to follow MIG semantics for inout objects. When run a few seconds after a fresh boot, the exploit strategy discussed here is quite reliable: on the devices I've tested, the exploit succeeds upwards of 99% of the time. The exploit is also straightforward enough that, when successful, it allows us to clean up all leaked resources and leave the system in a completely stable state.
In a way, it's surprising that such "easy" vulnerabilities still exist: after all, XNU is open source and heavily scrutinized for valuable bugs like this. However, MIG semantics are very unintuitive and don't align well with the natural patterns for writing secure kernel code. While I'd love to believe that this is the last major MIG bug, I wouldn't be surprised to see at least a few more crop up.
This bug is also a good reminder that placeholder code can also introduce security vulnerabilities and should be scrutinized as tightly as functional code, no matter how simple it may seem.
And finally, it's worth noting that the biggest headache for me while exploiting this bug, the limited range of allowed reference count values, wasn't even an issue on iOS versions prior to 12. On earlier platforms, this bug would have always been incredibly reliable, not just directly after a clean boot. Thus, it's good to see that even though os_refcnt_t didn't stop this bug from being exploited, the mitigation at least impacts exploit reliability, and probably decreases the value of bugs like this to attackers.
My next post will show how to use this exploit to analyze Apple's implementation of Pointer Authentication, culminating in a technique that allows us to forge PACs for pointers signed with the A keys. This is sufficient to call arbitrary kernel functions or execute arbitrary code in the kernel via JOP.
Posting Komentar