Modernizing IDA Pro: how to make processor module glitches go away

Positive Research Center - 15 Říjen, 2018 - 12:39

Hi there,

This is my latest article on a topic near and dear to my heart: making IDA Pro more modern and, well, better.

Those familiar with IDA Pro probably know that feeling: there are glitches in the processor modules that you use, you don't have the source code, and they are driving you crazy! Unfortunately, not all of the glitches discussed here qualify as bugs, meaning that the developers are unlikely to ever fix them—unless you fix them yourself.

Localizing the glitchesNote: In this article, I will be looking for bugs and issues in the Motorola M68000 module (which happens to be my favorite, as well as a commonly used one).

Glitch #1: Addressing relative to the PC register. Specifically, the disassembler listing for such instructions is not always correct. Check out this screenshot:

Everything seems fine at first glance. And the glitch does not even interfere with analysis. But the opcode has been disassembled incorrectly. Let's look at this in an online disassembler:

We see that our addressing should be relative to the PC register since the target address of the reference is within the signed short range.

Glitch #2: "Mirrors" for RAM and certain other regions. Since addressing on m68k is 24-bit, all access to high (or low) regions should be re-addressed to the same range as the cross-references.

Glitch #3 (which is more like "missing functionality" than an actual glitch): The so-called lineA (1010) and lineF (1111) emulators. These opcodes did not make it into the main command set, so they have to be handled in a special way by interrupt vectors. The size of the opcodes depends only on the implementation in the handler. I've seen only a two-byte implementation. So we'll be adding this.

Glitch #4: Trap #N instruction" does not give any cref to the trap handlers.

Glitch #5: The movea.w instruction should make a full xref to an address from a word reference, but we get only a word integer.

Fixing the glitches (empty template)
To understand how to fix a particular processor module, you have to know what our abilities are in this regard, and what exactly a "fix" means.

In short: a fix is delivered in the form of a plug-in, which can be written in either Python or C++ (I chose the latter). C++ is less portable but if anyone is willing to take up the task of porting the plug-in to Python, I will be only grateful!

First we create an empty DLL project in Visual Studio: File->New->Project->Windows Desktop Wizard->Dynamic link library (.dll). Select the Empty Project checkbox and clear all the other checkboxes:

We unpack the IDA SDK and indicate it in the Visual Studio macros (here I am using the 2017 version), which makes it easy to refer to in the future. We simultaneously will add a macro for the path to IDA Pro.

Go to View->Other Windows->Property Manager:

Since we are working with SDK version 7.0, compilation will be performed with the x64 compiler. So select Debug | x64->Microsoft.Cpp.x64.user->Properties:

In the User Macros section, click Add Macro. There we will indicate IDA_SDK alongside the path of where we unpacked the SDK:

Now we can do the same with IDA_DIR (the path being for your copy of IDA Pro):

(By default, IDA is installed in %Program Files%, which requires administrator rights.)

Let's also get rid of the Win32 configuration (this article does not cover compilation for x86 systems) and leave just x64.

Create the empty file ida_plugin.cpp, which for the moment contains no code. Now we can select the character set and other settings for C++:

Now add some includes:

And SDK libraries:

Now we drop in the code template:

#include #include #include #include #include #include #define NAME "M68000 proc-fixer plugin"#define VERSION "1.0" static bool plugin_inited; static bool my_dbg; //--------------------------------------------------------------------------static void print_version(){ static const char format[] = NAME " v%s\n"; info(format, VERSION); msg(format, VERSION); } //--------------------------------------------------------------------------static bool init_plugin(void){ if ( != PLFM_68K) return false; return true; } #ifdef _DEBUGstatic const char* const optype_names[] = { "o_void", "o_reg", "o_mem", "o_phrase", "o_displ", "o_imm", "o_far", "o_near", "o_idpspec0", "o_idpspec1", "o_idpspec2", "o_idpspec3", "o_idpspec4", "o_idpspec5", }; static const char* const dtyp_names[] = { "dt_byte", "dt_word", "dt_dword", "dt_float", "dt_double", "dt_tbyte", "dt_packreal", "dt_qword", "dt_byte16", "dt_code", "dt_void", "dt_fword", "dt_bitfild", "dt_string", "dt_unicode", "dt_3byte", "dt_ldbl", "dt_byte32", "dt_byte64", }; static void print_insn(const insn_t *insn){ if (my_dbg) { msg("cs=%x, ", insn->cs); msg("ip=%x, ", insn->ip); msg("ea=%x, ", insn->ea); msg("itype=%x, ", insn->itype); msg("size=%x, ", insn->size); msg("auxpref=%x, ", insn->auxpref); msg("segpref=%x, ", insn->segpref); msg("insnpref=%x, ", insn->insnpref); msg("insnpref=%x, ", insn->insnpref); msg("flags["); if (insn->flags & INSN_MACRO) msg("INSN_MACRO|"); if (insn->flags & INSN_MODMAC) msg("OF_OUTER_DISP"); msg("]\n"); } } static void print_op(ea_t ea, const op_t *op){ if (my_dbg) { msg("type[%s], ", optype_names[op->type]); msg("flags["); if (op->flags & OF_NO_BASE_DISP) msg("OF_NO_BASE_DISP|"); if (op->flags & OF_OUTER_DISP) msg("OF_OUTER_DISP|"); if (op->flags & PACK_FORM_DEF) msg("PACK_FORM_DEF|"); if (op->flags & OF_NUMBER) msg("OF_NUMBER|"); if (op->flags & OF_SHOW) msg("OF_SHOW"); msg("], "); msg("dtyp[%s], ", dtyp_names[op->dtype]); if (op->type == o_reg) msg("reg=%x, ", op->reg); else if (op->type == o_displ || op->type == o_phrase) msg("phrase=%x, ", op->phrase); else msg("reg_phrase=%x, ", op->phrase); msg("addr=%x, ", op->addr); msg("value=%x, ", op->value); msg("specval=%x, ", op->specval); msg("specflag1=%x, ", op->specflag1); msg("specflag2=%x, ", op->specflag2); msg("specflag3=%x, ", op->specflag3); msg("specflag4=%x, ", op->specflag4); msg("refinfo["); opinfo_t buf; if (get_opinfo(&buf, ea, op->n, op->flags)) { msg("target=%x, ",; msg("base=%x, ", buf.ri.base); msg("tdelta=%x, ", buf.ri.tdelta); msg("flags["); if (buf.ri.flags & REFINFO_TYPE) msg("REFINFO_TYPE|"); if (buf.ri.flags & REFINFO_RVAOFF) msg("REFINFO_RVAOFF|"); if (buf.ri.flags & REFINFO_PASTEND) msg("REFINFO_PASTEND|"); if (buf.ri.flags & REFINFO_CUSTOM) msg("REFINFO_CUSTOM|"); if (buf.ri.flags & REFINFO_NOBASE) msg("REFINFO_NOBASE|"); if (buf.ri.flags & REFINFO_SUBTRACT) msg("REFINFO_SUBTRACT|"); if (buf.ri.flags & REFINFO_SIGNEDOP) msg("REFINFO_SIGNEDOP"); msg("]"); } msg("]\n"); } } #endif static bool ana_addr = 0; static ssize_t idaapi hook_idp(void *user_data, int notification_code, va_list va){ switch (notification_code) { case processor_t::ev_ana_insn: { insn_t *out = va_arg(va, insn_t*); if (ana_addr) break; ana_addr = 1; if (ph.ana_insn(out) <= 0) { ana_addr = 0; break; } ana_addr = 0; #ifdef _DEBUG print_insn(out); #endif for (int i = 0; i < UA_MAXOP; ++i) { op_t &op = out->ops[i]; #ifdef _DEBUG print_op(out->ea, &op); #endif } return out->size; } break; case processor_t::ev_emu_insn: { const insn_t *insn = va_arg(va, const insn_t*); } break; case processor_t::ev_out_mnem: { outctx_t *outbuffer = va_arg(va, outctx_t *); //outbuffer->out_custom_mnem(mnem); //return 1; } break; default: { #ifdef _DEBUG if (my_dbg) { msg("msg = %d\n", notification_code); } #endif } break; } return 0; } //--------------------------------------------------------------------------static int idaapi init(void){ if (init_plugin()) { plugin_inited = true; my_dbg = false; hook_to_notification_point(HT_IDP, hook_idp, NULL); print_version(); return PLUGIN_KEEP; } return PLUGIN_SKIP; } //--------------------------------------------------------------------------static void idaapi term(void){ if (plugin_inited) { unhook_from_notification_point(HT_IDP, hook_idp); plugin_inited = false; } } //--------------------------------------------------------------------------static bool idaapi run(size_t /*arg*/){ return false; } //--------------------------------------------------------------------------const char comment[] = NAME; const char help[] = NAME; //--------------------------------------------------------------------------//// PLUGIN DESCRIPTION BLOCK////--------------------------------------------------------------------------plugin_t PLUGIN = { IDP_INTERFACE_VERSION, PLUGIN_PROC | PLUGIN_MOD, // plugin flags init, // initialize term, // terminate. this pointer may be NULL. run, // invoke plugin comment, // long comment about the plugin // it could appear in the status line // or as a hint help, // multiline help about the plugin NAME, // the preferred short name of the plugin "" // the preferred hotkey to run the plugin};


Fixing the bugs (filling out the template)
The print_op() and print_insn() functions are needed so we can see which flags have been set by the current processor module for certain instructions. This is necessary for finding flags for available opcodes so we can use them in our fix.

The body of our "fixer" is the hook_idp() function. For our needs, we will need to implement three callbacks in it:

  1. processor_t::ev_ana_insn: this is needed if the processor module does not implement some opcodes
  2. processor_t::ev_emu_insn: here you can make cross-references to data/code to which new opcodes refer (or old ones do not refer)
  3. processor_t::ev_out_mnem: new opcodes need to get output somehow—so that's what goes on here

The init_plugin() function will stop our fixer from launching on other processor modules.
And most importantly, we hook the whole callback to processor module events:

hook_to_notification_point(HT_IDP, hook_idp, NULL);

The trick with the ana_addr global variable is necessary so that ana_insn does not get stuck in recursion when trying to get information about an instruction that we do not parse manually. This crutch has been around for a long time since early versions, and alas, doesn't seem to be going away anytime soon.
Fix for Glitch #1
Finding a solution required a lot of mucking about with the debugger output that I implemented for the purpose. I knew that in some cases, IDA successfully outputs references relative to PC (for instructions where a jump is made based on an offset table that is near the current instruction plus register index); but proper display of addressing had not been implemented for the lea instruction. Ultimately, I found one such instruction with a jump and figured out which flags are needed to make PC display properly with parentheses:

case processor_t::ev_ana_insn: { insn_t *out = va_arg(va, insn_t*); if (ana_addr) break; ana_addr = 1; if (ph.ana_insn(out) <= 0) { ana_addr = 0; break; } ana_addr = 0; for (int i = 0; i < UA_MAXOP; ++i) { op_t &op = out->ops[i]; switch (op.type) { case o_near: case o_mem: { if (out->itype != 0x76 || op.n != 0 || (op.phrase != 0x09 && op.phrase != 0x0A) || (op.addr == 0 || op.addr >= (1 << 23)) || op.specflag1 != 2) // lea table(pc),Ax break; short diff = op.addr - out->ea; if (diff >= SHRT_MIN && diff <= SHRT_MAX) { out->Op1.type = o_displ; out->Op1.offb = 2; out->Op1.dtype = dt_dword; out->Op1.phrase = 0x5B; out->Op1.specflag1 = 0x10; } } break; } } return out->size; } break;
Fix for Glitch #2
Here things are simple. We simply mask addresses for a specific range: 0xFF0000-0xFFFFFF (for RAM) and 0xC00000–0xC000FF (for VDP video memory). The main thing is to filter by operand type with o_near and o_mem.

case processor_t::ev_ana_insn: { insn_t *out = va_arg(va, insn_t*); if (ana_addr) break; ana_addr = 1; if (ph.ana_insn(out) <= 0) { ana_addr = 0; break; } ana_addr = 0; for (int i = 0; i < UA_MAXOP; ++i) { op_t &op = out->ops[i]; switch (op.type) { case o_near: case o_mem: { op.addr &= 0xFFFFFF; // for any mirrors if ((op.addr & 0xE00000) == 0xE00000) // RAM mirrors op.addr |= 0x1F0000; if ((op.addr >= 0xC00000 && op.addr <= 0xC0001F) || (op.addr >= 0xC00020 && op.addr <= 0xC0003F)) // VDP mirrors op.addr &= 0xC000FF; } break; } } return out->size; } break;

Fix for Glitch #3
To add the opcodes, what we need to do is:

1. Define indexes for the new opcodes. All new indexes should start with: CUSTOM_INSN_ITYPE

enum m68k_insn_type_t{ M68K_linea = CUSTOM_INSN_ITYPE, M68K_linef, };

2. The lineA/lineF opcodes trigger when the following bytes are encountered in the code: 0xA0/0xF0. So we read one byte.

3. Get reference to the handling vector. The interrupt vectors are located in the first 64 dwords of the header, in my case. The lineA/lineF handlers are at positions 0x0A and 0x0B:

value = get_dword(0x0A * sizeof(uint32)); // ...value = get_dword(0x0B * sizeof(uint32));

4. In ev_emu_insn we add cross-references for handlers and the following instruction to avoid interrupting the code flow:

insn->add_cref(insn->Op1.addr, 0, fl_CN); // code ref insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F); // flow ref

5. In ev_out_mnem we output our custom opcode:

const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a"; outbuffer->out_custom_mnem(mnem);

enum m68k_insn_type_t{ M68K_linea = CUSTOM_INSN_ITYPE, M68K_linef, }; /* after includes */ case processor_t::ev_ana_insn: { insn_t *out = va_arg(va, insn_t*); if (ana_addr) break; uint16 itype = 0; ea_t value = out->ea; uchar b = get_byte(out->ea); if (b == 0xA0 || b == 0xF0) { switch (b) { case 0xA0: itype = M68K_linea; value = get_dword(0x0A * sizeof(uint32)); break; case 0xF0: itype = M68K_linef; value = get_dword(0x0B * sizeof(uint32)); break; } out->itype = itype; out->size = 2; out->Op1.type = o_near; out->Op1.offb = 1; out->Op1.dtype = dt_dword; out->Op1.addr = value; out->Op1.phrase = 0x0A; out->Op1.specflag1 = 2; out->Op2.type = o_imm; out->Op2.offb = 1; out->Op2.dtype = dt_byte; out->Op2.value = get_byte(out->ea + 1); } return out->size; } break; case processor_t::ev_emu_insn: { const insn_t *insn = va_arg(va, const insn_t*); if (insn->itype == M68K_linea || insn->itype == M68K_linef) { insn->add_cref(insn->Op1.addr, 0, fl_CN); insn->add_cref(insn->ea + insn->size, insn->Op1.offb, fl_F); return 1; } } break; case processor_t::ev_out_mnem: { outctx_t *outbuffer = va_arg(va, outctx_t *); if (outbuffer->insn.itype != M68K_linea && outbuffer->insn.itype != M68K_linef) break; const char *mnem = (outbuffer->insn.itype == M68K_linef) ? "line_f" : "line_a"; outbuffer->out_custom_mnem(mnem); return 1; } break;
Fix for Glitch #4Find the opcode for the trap instruction, get the index from the instruction, and take the handler vector at that index. The result resembles the following:

case processor_t::ev_emu_insn: { const insn_t *insn = va_arg(va, const insn_t*); if (insn->itype == 0xB6) // trap #X { qstring name; ea_t trap_addr = get_dword((0x20 + (insn->Op1.value & 0xF)) * sizeof(uint32)); get_func_name(&name, trap_addr); set_cmt(insn->ea, name.c_str(), false); insn->add_cref(trap_addr, insn->Op1.offb, fl_CN); return 1; } } break;

Fix for Glitch #5
Things are clear-cut here too: first filter by the movea.w operation. Then, if the operand is of the word type and refers to RAM, we make a snazzy reference relative to the base 0xFF0000. Here's how this looks:

case processor_t::ev_ana_insn: { insn_t *out = va_arg(va, insn_t*); if (ana_addr) break; ana_addr = 1; if (ph.ana_insn(out) <= 0) { ana_addr = 0; break; } ana_addr = 0; for (int i = 0; i < UA_MAXOP; ++i) { op_t &op = out->ops[i]; switch (op.type) { case o_imm: { if (out->itype != 0x7F || op.n != 0) // movea break; if (op.value & 0xFF0000 && op.dtype == dt_word) { op.value &= 0xFFFF; } } break; } } return out->size; } break; case processor_t::ev_emu_insn: { const insn_t *insn = va_arg(va, const insn_t*); for (int i = 0; i < UA_MAXOP; ++i) { const op_t &op = insn->ops[i]; switch (op.type) { case o_imm: { if (insn->itype != 0x7F || op.n != 0 || op.dtype != dt_word) // movea break; op_offset(insn->ea, op.n, REF_OFF32, BADADDR, 0xFF0000); } break; } } } break;

Making fixes to modules can be rather involved, especially when it's something more complicated than simply implementing unknown opcodes.

This can mean hours of debugging the current implementation and trying to figure out how it all works (perhaps with some reversing of the module itself). But the results are well worth it.

Link to code:

Author: Vladimir Kononovich, Positive Technologies

Advanced attacks on Microsoft Active Directory: detection and mitigation

Positive Research Center - 11 Říjen, 2018 - 15:35
Attacks on Microsoft Active Directory have been a recurrent topic of reports on Black Hat and Defcon during the last four years. Speakers tell about new vectors, share their inventions, and give recommendations on detection and avoidance of these vectors. I believe that the IT department is capable of creating a secure infrastructure, which can be monitored by the security department. High-quality monitoring, in its turn, requires good tools. That's like a military base: you have erected guard towers around the perimeter but still keep watch over the area.

Six strategies that would not be overlookedNumerous vendors provide security software that supports monitoring of the following malicious activities:

Pass-the-HashThis attack is conditioned by architecture of NTLM, an authentication protocol created by Microsoft in 1990s. Logging in to a remote host requires password hash stored on the computer that is used for the authentication process. Therefore, the hash can be extracted from that computer.

MimikatzTo achieve that, a French researcher Benjamin Delpy developed in 2014 Mimikatz, the utility that allows dumping cleartext passwords and NTLM hashes from the computer memory.

Brute ForceIf credentials extracted from one host are not enough, the attacker can opt for a rough but effective technique of guessing the password.

net user /domainWhere do we take a username dictionary to conduct this attack? Any domain member is allowed to execute the net user /domain command that returns a full list of AD domain users.

KerberoastingIf a domain uses Kerberos as the authentication protocol, an attacker can try a Kerberoasting attack. Any user authenticated on the domain can request a Kerberos ticket for access to the service (Ticket Granting Service). TGS is encrypted with the password hash of the account used to run the service. The attacker who requested the TGS can now bruteforce it offline without any fear of being blocked. In case of success, the attacker gains the password to the account associated with the service, usually a privileged one.

PsExecAs soon as the attacker obtains the required credentials, the next task is remote command execution. This task can be easily solved using the PsExec utility from the Sysinternals set, which proved remarkably effective and is appreciated by both IT administrators and hackers.

Seven spells of hackersNow we are going to review seven spells of hackers that can help to gain full control over Active Directory.

[The figure shows four steps of the attack. Each step features a set of methods]

Let's start with reconnaissance.

PowerViewPowerView is a part of PowerSploit, a well-known PowerShell framework for penetration testing. PowerView supports Bloodhound, the tool that gives graph representation of object connections in AD.

Graph representation of relationships between Active Directory objects
Bloodhound immediately provides such possibilities as:

  • Finding accounts of all domain administrators
  • Finding hosts, on which domain administrators are logged
  • Finding the shortest path from the attacker's host to the host with the domain admin session

The last possibility replies to the question, what hosts need to be hacked to get to the domain admin account. This approach significantly reduces the time required for gaining full control over the domain.

The difference between PowerView and built-in utilities that allow obtaining data on AD objects (such as net.exe) is that PowerView uses LDAP, not SAMR. To detect this activity, we can use domain controller event 1644. Logging of this event is enabled by adding the relevant value in the register:

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostic\\15 Field Engineering = 5

Enabling logging of LDAP event 1644Event 1644 with properties of an LDAP query

Note that there can be multiple events of this kind, and a good alternative to analysis of events is analysis of traffic, because LDAP is a cleartext protocol and all queries are seen clearly in traffic.

LDAP SearchRequestOne more feature of the framework is that it uses only PowerShell and has no dependencies. Moreover, PowerShell v5 has a new option of advanced audit, which is very useful in detection. Event 4104 shows the script body, which can be searched for function names that are specific for PowerView.

SPN ScanThis utility can be a substitute for Nmap. As soon as a hacker knows what users and groups exist in AD, he or she needs information about services to get the whole picture.

Commonly, scanning ports with Nmap provides this information. But now these data can be retrieved from AD—they are already stored there. The query result looks as follows: the reply returns an SPN (Service Principal Name) that consists of a service class, which is unique for each service type, host name in FQDN format, and port for some services.

Examples of SPN for different servicesFor the full list of SPNs, see

To detect SPN Scan, audit of LDAP events can be used.

SPN Scan has a clear advantage over Nmap: it is less noisy. When you use Nmap, you need to connect to each host and send packets to the range of ports specified by you, whereas the SPN list is a consequence of only one query.

Remote Sessions EnumerationDuring the phase that is called lateral movement, an important task is to match users with computers they are logged on: the attacker either already has the user credentials (hash or a Kerberos ticket) and searches for hosts to log in flawlessly, or searches for the host with a session of the domain administrator.

Both cases trigger the following scenario: hunt –> compromise any host –> upload Mimikatz – > profit.

To detect the use of this scenario, two events can be used: 4624—a successful logon to a remote system (logon type 3), and events of access to the network share IPC$, with one nuance, the named pipe "srvsvc". Why the pipe is named like this can be guessed from the traffic.

Red boxes in the left part show SMB connections, and then connections to the pipe "srvsvc". This pipe allows interacting via the Server Service Remote Protocol. End hosts receive various administrative information from the pipe; for example, one of the requests is NetSessEnum. This request returns a full list of users logged in to the remote system with their IP addresses and names.

MaxPatrol SIEM allows detection based on correlation of these two events with srvsvc taken into account. PT Network Attack Discovery performs similar detection based on traffic analysis.

Overpass-the-HashPass-the-Hash reincarnation. Let's continue with lateral movement. What an attacker can do with NTLM hash on hand? Conduct a Pass-the-Hash attack. But it is already well-known and can be detected. Therefore, a new attack vector—Overpass-the-Hash—has been found.

The Kerberos protocol was developed specifically to prevent sending user passwords over the network in any form. To avoid that, the user encrypts an authentication request using password hash on its own computer. A Key Distribution Center (a special service running on the domain controller) replies with a ticket to get other tickets, the so-called Ticket-Granting Ticket (TGT). Now the client is deemed authenticated and can request tickets to access other services within the next 10 hours. Therefore, if the attacker dumps hash of a user who is a member of a trusted group of the target service (for example, ERP system or database), the attacker can issue a ticket for itself and successfully log in to the target service.

How to detectIf a hacker uses the PowerShell version of Mimikatz for an attack, logging the script body would help, because Invoke-Mimikatz is quite an indicative line.

Another symptom is event 4688—creating a process with extended audit of the command line. Even if a binary file is renamed, the command line would contain the command, which is very peculiar to Mimikatz.

If you want to detect an Overpass-the-Hash attack by analyzing traffic, the following anomaly can be used: Microsoft recommends using AES256 encryption for authentication requests in modern domains, and if Mimikatz sends authentication request data, it encrypts the data with an outdated ARCFOUR.

Another specific feature is the cipher suite sent by Mimikatz, which is different from a legitimate domain's suite, and thus stands out in the traffic.

Golden TicketA well-known method.

What can an attacker get out of password hash of a special account called krbtgt? Previously, we reviewed a case where the user could be unprivileged. Now, we review a case where the user's password hash is used for signing absolutely all tickets for gaining other tickets (TGT). There is no need to address to a Key Distribution Center: an attacker can generate this ticket, because a Golden Ticket is in fact a TGT. Then the attacker can send authentication requests on any service in AD for an unlimited period. The result is unrestricted access to target resources—Golden Ticket has its name for a reason.

How to detect based on events Event 4768 informs that a TGT was granted, and event 4769 informs that a service ticket required for authentication on some service in AD was granted.

In this case, we can speculate in difference: while Golden Ticket does not request a TGT from the domain controller (because it generates the TGT itself), it has to request a TGS. Therefore, if we see that obtained TGT and TGS differ, we can assume that the Golden Ticket attack is underway.
MaxPatrol SIEM uses table lists to log all issued TGTs and TGSs to implement this method of detection.

WMI Remote ExecutionAfter being authenticated and authorized on target hosts, the hacker can start remote execution of tasks. WMI is a built-in mechanism that fits perfectly for this purpose. For the last few years, living off the land is on trend, which means using built-in Windows mechanisms. The main reason is the ability to mimic legitimate activities.

The figure below demonstrates the use of wmic, a built-in utility. The utility is given a host address for connection, credentials, the process call create operator, and a command to be executed on the remote host.

How to detect Check the combination of remote logon events 4624. An important parameter is the Logon ID. Also check event 4688 that informs about creating a process with the command line. Event 4688 shows that the parent of the created process is WmiPrvSE.exe, a special wmi service process used for remote administration. We can see the command net user /add sent by us, and the Logon ID, which is the same as in event 4624. Thus, we can tell absolutely precisely from which host this command was initiated.

Detection based on traffic analysisWe can clearly see words typical of Win32 process create, and the command line, which is going to be executed. The malware on the figure below was distributed on virtual networks in the same way as WannaCry, but instead of encryption, it set up a crypto miner. The malware used Mimikatz and EthernalBlue, dumped accounts, and used them to log in to hosts it could reach on the network. Using WMI, the malware ran PowerShell on these hosts and downloaded PowerShell payload, which also contained Mimikatz, EthernalBlue, and a miner. Thus, a chain reaction was created.


  1. Use complex and long passwords (at least 25 symbols) for service accounts. An attacker would not have any chance to conduct a Kerberoasting attack, because it takes too much time to bruteforce such passwords. 
  2. Enable PowerShell logging. This would help to reveal the use of various modern tools for attacks on AD. 
  3. Upgrade to Windows 10 and Windows Server 2016.  Microsoft created Credential Guard: it prevents dumping of NTLM hashes and Kerberos tickets. 
  4. Implement role-based access control. It is risky to assign permissions of AD, DC, server, and workstation admins to one role. 
  5. Reset the krbtgt (account used for signing TGT) password twice every year and every time the AD administrator changes. It is important to change the password twice, because the current and the previous passwords are stored. Even if a network is compromised and attackers issued a Golden Ticket, changing the password makes this Ticket useless, and they have to bruteforce the password once again. 
  6. Use protection tools with a continuously updating expert database. This helps revealing current ongoing attacks.

On January 24, 2018, Benjamin Delpy and Vincent Le Toux released during the Microsoft BlueHat in Israel a new Mimikatz module that implements the DCShadow attack. The idea of the attack is to create a rogue domain controller to replicate objects in AD. The researchers defined a minimum set of Kerberos SPNs required for successful replication—only two SPNs are required. They also showed a special function that can start forced replication of controllers. The authors assure that this attack can make your SIEM go blind. A rouge domain controller would not send events to SIEM, which means that attackers can do various evil tricks with AD and SIEM, and nobody would know about that.

The attack scheme:

Two SPNs should be added to the system the attack is run from. These SPNs are required for other domain controllers to authenticate using Kerberos for replication. Because the domain controller is represented as the object of class nTDSDSA in the AD base according to the specification, such an object should be created. Finally, replication is triggered by the DRSReplicaAdd function.

How to detect This is what DCShadow looks like in traffic. By analyzing the traffic, we can clearly see that a new object is added to the domain controller configuration scheme, and then replication is triggered.

Although the attack creators say that SIEM would not help in its detection, we found a way to inform the security department about suspicious activity on the network.

Our correlation has a list of legitimate domain controllers, and it would be triggered at every replication from a domain controller, which not included into this whitelist. Therefore, the security department can conduct an investigation to check if that is a legitimate domain controller added by the IT service, or a DCShadow attack.

An example of DCShadow confirms that new enterprise attack vectors appear. It is essential to stay on the crest of the wave in this ocean of information security, look ahead, and act quickly.

Every day, we at PT Expert Security Center research new threats and develop methods and tools to detect them. And we'll continue sharing this information with you

Author: Anton Tyurin, Head of Attack Detection Team, Positive Technologies

How STACKLEAK improves Linux kernel security

Positive Research Center - 11 Říjen, 2018 - 14:32

STACKLEAK is a Linux kernel security feature initially developed by Grsecurity/PaX. I'm working on introducing STACKLEAK into the Linux kernel mainline. This article describes the inner workings of this security feature and why the vanilla kernel needs it.

In short, STACKLEAK is needed because it mitigates several types of Linux kernel vulnerabilities, by:

  •  Reducing the information that can be revealed to an attacker by kernel stack leak bugs,
  •  Blocking some uninitialized stack variable attacks,
  •  Detecting kernel stack overflow during Stack Clash attack against Linux Kernel.

This security feature fits the mission of the Kernel Self Protection Project (KSPP): security is more than just fixing bugs. Fixing absolutely all bugs is impossible, which is why the Linux kernel has to fail safely in case of an error or vulnerability exploitation. More details about KSPP are available on its wiki.

STACKLEAK was initially developed by the PaX Team, going as PAX_MEMORY_STACKLEAK in the Grsecurity/PaX patch. But this patch is no longer freely available to the Linux kernel community. So I took its last public version for the 4.9 kernel (April 2017) and got to work. The plan has been as follows:

  • First extract STACKLEAK from the Grsecurity/PaX patch.
  • Then carefully study the code and create a new patch.
  • Send the result to the Linux kernel mailing list (LKML), get feedback, make improvements, and repeat until the code is accepted into the mainline.

As of October 9, 2018, the 15th version of the STACKLEAK patch series has been submitted. It contains the common code and x86_64/x86_32 support. The arm64 support developed by Laura Abbott from Red Hat has already been merged into mainline kernel v4.19.

Security features
Most importantly, STACKLEAK erases the kernel stack at the end of syscalls. This reduces the information that can be revealed through some kernel stack leak bugs. An example of such an information leak is shown in Figure 1.

Figure 1. Kernel stack leak exploitation, pre-STACKLEAKHowever, these leaks become useless for the attacker if the used part of the kernel stack is filled by some fixed value at the end of a syscall (Figure 2).

Figure 2. Kernel stack leak exploitation, post-STACKLEAKHence, STACKLEAK blocks exploitation of some uninitialized kernel stack variable vulnerabilities, such as CVE-2010-2963 and CVE-2017-17712. For a description of exploitation of vulnerability CVE-2010-2963, refer to the article by Kees Cook.

Figure 3 illustrates an attack on an uninitialized kernel stack variable.

Figure 3. Uninitialized kernel stack variable exploitation, pre-STACKLEAKSTACKLEAK mitigates this type of attack because at the end of a syscall, it fills the kernel stack with a value that points to an unused hole in the virtual memory map (Figure 4).

Figure 4. Uninitialized kernel stack variable exploitation, post-STACKLEAK
There is an important limitation: STACKLEAK does not help against similar attacks performed during a single syscall.

Runtime detection of kernel stack depth overflow
In the mainline kernel, STACKLEAK would be effective against kernel stack depth overflow only in combination with CONFIG_THREAD_INFO_IN_TASK and CONFIG_VMAP_STACK (both introduced by Andy Lutomirski).

The simplest type of stack depth overflow exploit is shown in Figure 5.

Figure 5. Stack depth overflow exploitation: mitigation with CONFIG_THREAD_INFO_IN_TASKOverwriting the thread_info structure at the bottom of the kernel stack allows an attacker to escalate privileges on the system. However, CONFIG_THREAD_INFO_IN_TASK moves thread_info out of the thread stack and therefore mitigates such an attack.

There is a more complex variant of the attack: make the kernel stack grow beyond  the end of the kernel's preallocated stack space and overwrite security-sensitive data in a neighboring memory region (Figure 6). More technical details are available in:

Figure 6. Stack depth overflow exploitation: a more complicated version
CONFIG_VMAP_STACK protects against such attacks by placing a special guard page next to the kernel stack (Figure 7). If accessed, the guard page triggers an exception.

Figure 7. Stack depth overflow exploitation: mitigation with guard pagesFinally, the most interesting version of a stack depth overflow attack is a Stack Clash (Figure 8). Gael Delalleau published this idea in 2005. It was later revisited by the Qualys Research Team in 2017. In essence, it is possible to jump over a guard page and overwrite data from a neighboring memory region using Variable Length Arrays (VLA).

Figure 8. Stack Clash attackSTACKLEAK mitigates Stack Clash attacks against the kernel stack. More information about STACKLEAK and Stack Clash is available on the grsecurity blog.

To prevent a Stack Clash in the kernel stack, a stack depth overflow check is performed before each alloca() call. This is the code from v14 of the patch series:

void __used stackleak_check_alloca(unsigned long size)
       unsigned long sp = (unsigned long)&sp;
       struct stack_info stack_info = {0};
       unsigned long visit_mask = 0;
       unsigned long stack_left;

       BUG_ON(get_stack_info(&sp, current, &stack_info, &visit_mask));

       stack_left = sp - (unsigned long)stack_info.begin;

       if (size >= stack_left) {
                * Kernel stack depth overflow is detected, let's report that.
                * If CONFIG_VMAP_STACK is enabled, we can safely use BUG().
                * If CONFIG_VMAP_STACK is disabled, BUG() handling can corrupt
                * the neighbour memory. CONFIG_SCHED_STACK_END_CHECK calls
                * panic() in a similar situation, so let's do the same if that
                * option is on. Otherwise just use BUG() and hope for the best.
               panic("alloca() over the kernel stack boundary\n");

However, this functionality was excluded from the 15th version of the STACKLEAK patch series. The main reason is that Linus Torvalds has forbidden use of BUG_ON() in kernel hardening patches. Moreover, during discussion of the 9th version, the maintainers decided to remove all VLAs from the mainline kernel. There are 15 kernel developers participating in that work, which will be finished soon.

Performance impact
Cursory performance testing was performed on x86_64 hardware: Intel Core i7-4770, 16 GB RAM.

Test 1, looking good: compiling the Linux kernel on one CPU core.

    # time make
    Result on 4.18:
        real 12m14.124s
        user 11m17.565s
        sys 1m6.943s
    Result on 4 .18+stackleak:
        real 12m20.335s (+0.85%)
        user 11m23.283s
        sys 1m8.221s

Test 2, not so hot:

    # hackbench -s 4096 -l 2000 -g 15 -f 25 –P
    Average on 4.18: 9.08 s
    Average on 4.18+stackleak: 9.47 s (+4.3%)

In summary: the performance penalty varies for different workloads. Test STACKLEAK on your expected workload before deploying it in production.

Inner workings
STACKLEAK consists of:

  • The code that erases the kernel stack at the end of syscalls,
  • The GCC plugin for kernel compile-time instrumentation.

Erasing the kernel stack is performed in the stackleak_erase() function. This function runs before returning from a syscall to userspace and writes STACKLEAK_POISON (-0xBEEF) to the used part of the thread stack (Figure 10). For speed, stackleak_erase() uses the lowest_stack variable as a starting point (Figure 9). This variable is regularly updated in stackleak_track_stack() during system calls.

Figure 9. Erasing the kernel stack with stackleak_erase()Figure 10. Erasing the kernel stack with stackleak_erase(), continuedKernel compile-time instrumentation is handled by the STACKLEAK GCC plugin. GCC plugins are compiler-loadable modules that can be project-specific. They register new compilation passes via the GCC Pass Manager and provide the callbacks for these passes.

So the STACKLEAK GCC plugin inserts the aforementioned stackleak_track_stack() calls for the functions with a large stack frame. It also inserts the stackleak_check_alloca() call before alloca and the stackleak_track_stack() call after it.

As I already mentioned, inserting stackleak_check_alloca() was dropped in the 15th version of the STACKLEAK patch series.

The way to the mainline
The path of STACKLEAK to the Linux kernel mainline is very long and complicated (Figure 11).

Figure 11. The way to the mainlineIn April 2017, the authors of grsecurity made their patches commercial. In May 2017, I decided to work on upstreaming STACKLEAK. It was the beginning of a very long story. My employer Positive Technologies allows me to spend a part of my working time on this task, although I mainly spend my free time on it.

As of October 9, 2018, the 15th version of the STACKLEAK patch series is contained in the linux-next branch. It fits Linus' requirements and is ready for the merge window of the 4.20/5.0 kernel release.

I gave a talk on STACKLEAK at the Linux Security Summit NA 2018. Slides and video is already available.

STACKLEAK is a very useful Linux kernel self-protection feature that mitigates several types of vulnerabilities. Moreover, the PaX Team has made it rather fast and technically beautiful. Considering the substantial work done in this direction, upstreaming STACKLEAK would benefit Linux users with high information security requirements and also focus the attention of the Linux developer community on kernel self-protection.

Author: Alexander Popov, Positive Technologies

Intel ME Manufacturing Mode: obscured dangers and their relationship to Apple MacBook vulnerability CVE-2018-4251

Positive Research Center - 2 Říjen, 2018 - 13:51

The weakness of "security through obscurity" is so well known as to be obvious. Yet major hardware manufacturers, citing the need to protect intellectual property, often require a non-disclosure agreement (NDA) before allowing access to technical documentation. The situation has become even more difficult with the growing intricacy of chip designs and integration of proprietary firmware. Such obstacles make it nearly impossible for independent researchers to analyze the security of these platforms. As a result, both ordinary users and hardware manufacturers lose out.

One example is Intel Management Engine (Intel ME), including its server (Intel SPS) and mobile (Intel TXE) versions (for background on Intel ME, we recommend consulting  [5] and [6]). In this article, we will describe how undocumented commands (although "undocumented" applies to practically everything about Intel ME) enable overwriting SPI flash memory and implementing the doomsday scenario: local exploitation of an ME vulnerability (INTEL-SA-00086). At the root of this problem is an undocumented Intel ME mode, specifically, Manufacturing Mode.

What is Manufacturing Mode?Intel ME Manufacturing Mode is intended for configuration and testing of the end platform during manufacturing, and as such should be disabled (closed) before sale and shipment to users. However, this mode and its potential risks are not described anywhere in Intel's public documentation. Ordinary users do not have the ability to disable this mode, since the relevant utility (part of Intel ME System Tools) is not officially available. As a result, there is no software that can protect, or even notify, the user if this mode is enabled for whatever reason. Even Chipsec [2], a utility specially designed to identify configuration errors in the chipset and CPU at the level of UEFI firmware (such as incorrect configuration of access rights for SPI flash regions), does not know anything about Intel Manufacturing Mode.

This mode allows configuring critical platform settings stored in one-time-programmable memory (FUSEs). These settings include those for BootGuard (the mode, policy, and hash for the digital signing key for the ACM and UEFI modules). Some of them are referred to as FPFs (Field Programmable Fuses). For a list of FPFs that can be written to FUSEs (a list that is incomplete, since a number of FPFs cannot be set directly), you can use the FPT (Flash Programming Tool) utility from Intel ME System Tools.

Figure 1. Output of the -FPFs option in FPTFPFs account for only a part of the FUSE array: instead, most are used by Intel to store platform parameters. Part of this space is called IP FUSEs, used to store the settings of IP (Intelligent Property, hardware logic blocks) units. Thus, the DFx Aggregator special device stores in FUSEs a sign of whether the platform is for testing or mass production.

In addition to FPFs, in Manufacturing Mode the hardware manufacturer can specify settings for Intel ME, which are stored in the Intel ME internal file system (MFS) on SPI flash memory. These parameters can be changed by reprogramming the SPI flash. The parameters are known as CVARs (Configurable NVARs, Named Variables).

Setting CVARs is the responsibility of the Intel ME module named mca_server. MCA is short for "Manufacture-Line Configuration Architecture," which is the general name for the process of configuring the platform during manufacturing. CVARs, just like FPFs, can be set and read via FPT.

Figure 2. List of CVARs output by FPT for the Broxton P platformThe list of CVARs depends on the platform and version of Intel ME. For chipsets supporting Intel AMT, one of the CVARs is the password for entering MEBx (ME BIOS Extension).

Setting FPFs, or almost any CVARs, requires that Intel ME be in Manufacturing Mode. The process of assigning FPFs consists of two steps: setting the values for FPFs (which are saved to temporary memory) and committing the FPF values to the FUSEs. The first step is possible only in Manufacturing Mode, but the actual "burn" occurs automatically after Manufacturing Mode is closed if, while in that mode, the manufacturer set FPF values and the corresponding range in the FUSE array has never been written to before. So, if a system is in Manufacturing Mode, the FPFs have likely never been initialized.

A sign of Manufacturing Mode having been closed is stored in the file /home/mca/eom on MFS. When the SPI flash is overwritten by firmware with the basic file system (just after build by FIT [9]), the platform can once again function in Manufacturing Mode, although overwriting FUSEs is no longer possible.

OEM public keyAccordingly, the procedure for configuring Intel platforms is rather complicated and consists of multiple steps. Any error or deviation from this procedure by hardware manufacturers places the platform at serious risk. Even if Manufacturing Mode has been closed, a manufacturer may not have set FPFs, which allows attackers to do so themselves by writing their own values for example instead of the key for signing the start code of the BootGuard (AСM) and UEFI modules. In this case, the platform would load only with the attacker's malicious code—and persistently so. This would lead to irreversible hardware compromise, since the attacker's key is written to permanent memory, from which it can never be removed (for details of this attack, see "Safeguarding rootkits: Intel BootGuard" by Alexander Ermolov [8]).

On newer systems (Apollo Lake, Gemini Lake, Cannon Point) FPFs store not just the key for BootGuard, but the OEM's public key (strictly speaking, the SHA256 hash for the RSA OEM public key), which underpins several ME security mechanisms. For example, the special section of SPI flash named Signed Master Image Profile (SMIP) stores manufacturer-specified PCH Straps (PCH hardware configuration). This section is signed using a key whose SHA256 hash is stored in a special file (partition) on SPI flash. This file name is oem.key in the FTPR partition ( in OEMP partition for Cannon Point PCH) and contains various OEM-provided public keys for signing all sorts of data. In the following figure, you can see a full list of the sets of data signed by the manufacturer, each with a unique key, for the Cannon Point platform:

Figure 3. List of OEM-signed data for the CNP platformThe oem.key file itself is signed with an OEM root key, whose public key’s hash should be written in the FPFs.

Figure 4. OEM signingTherefore, having compromised the OEM root key, an attacker can compromise all previously mentioned data, which is much worse than the Boot Guard–only takeover possible on older platforms.

Bypassing block on writing to the ME regionUntil recently (prior to Intel Apollo Lake), Intel ME was located in a separate SPI region that had independent access rights for the CPU, GBE, and ME. So as long as access attributes were correctly configured, it was impossible to read or write to ME from the CPU (main system) side. However, current SPI controllers for Intel chipsets have a special mechanism called Master Grant. This mechanism assigns a strictly defined portion of SPI flash to each SPI master. A master controls its particular region, regardless of the access rights indicated in the SPI descriptor. Each master can provide access (read or write) for its region (but only its own region!) to any other master it wishes.

                             Figure 5. Excerpt from Intel documentation describing SPI Master GrantWhat this means is that even if the SPI descriptor forbids host access to an SPI region of ME, it is possible for ME to still provide access. In our view, this change was likely intended to enable updating Intel ME in a way that bypasses the standard process.

Host ME Region Flash Protection OverrideIntel ME implements a special HECI command that allows opening write access to ME SPI region on the CPU side. The command is called HMR FPO (Host ME Region Flash Protection Override). We have detailed this command at length previously [5]. There are some things worth knowing about it.

After receiving the HMR FPO command, Intel ME opens access to the region only after a reset. Intel МЕ itself also includes security measures: the command is accepted only when the UEFI BIOS is owner of the platform boot process, prior to End Of Post (EOP). EOP is a different HECI command that sends the UEFI to ME before handing off control to the operating system (ExitBootServices). Sometimes, BIOS Setup contains an option for sending the HMRFPO command prior to EOP.

Figure 6. Opening the ME region in the BIOSAfter receiving EOP, Intel ME ignores HMR FPO and returns the corresponding error status. But this occurs only after Manufacturing Mode has been closed. Therefore, in Manufacturing Mode, Intel ME accepts HMR FPO at any time, regardless of the presence (or absence) of End Of Post. If the manufacturer has failed to close Manufacturing Mode, an attacker can alter Intel ME at any time (of course, administrator rights are needed, but even the OS kernel initially cannot re-flash Intel ME). At this stage, the attacker can re-flash the ME image, such as to exploit vulnerability INTEL-SA-00086. A reset is then needed to run the modified firmware, but this is no problem on nearly any platform, with the exception of the Apple MacBook. Apple's computers contain an additional check in the UEFI, which runs when the UEFI is launched and blocks startup of the system if the ME region has been opened with HMRFPO. However, as we will show here, this mechanism can be easily bypassed if Intel ME is in Manufacturing Mode.

Resetting ME without resetting the main CPUToday's computers can be restarted in several different ways: the documented versions include a global reset and reset of the main CPU only (without resetting ME). But, if there is a way to reset ME without resetting the main CPU (by running the HMRFPO command in advance as well), access to the region opens up and the main system continues to function.

Figure 7. Reset types
Having investigated the internal ME modules, we discovered that there is a HECI command ("80 06 00 07 00 00 0b 00 00 00 03 00", see more about sending commands in [5]) for a reset of only (!!!) Intel ME. In Manufacturing Mode, this command can be sent at any time, even after EOP:

Figure 8. Disassembler listing for the function responsible for handling HECI ME reset commandsTherefore, an attacker who sends these two HECI commands opens the ME region and can write arbitrary data there, without having to reset the platform as a whole. And it doesn't even matter what the SPI descriptor contains—correctly set protection attributes for SPI regions will not protect ME from modifications if the system is running in Manufacturing Mode.

Exploitation case: vulnerability CVE-2018-4251We analyzed several platforms from a number of manufacturers, including Lenovo and Apple MacBook Prо laptops. The Yoga and ThinkPad computers we examined did NOT have any issues related to Manufacturing Mode. But we found that Apple laptops on Intel chipsets are running in Manufacturing Mode. After this information was reported to Apple, the vulnerability (CVE-2018-4251) was patched in macOS High Sierra update 10.13.5.

Local exploitation of INTEL-SA-00086By exploiting CVE-2018-4251, an attacker could write old versions of Intel ME (such as versions containing vulnerability INTEL-SA-00086) to memory without needing an SPI programmer or access to the HDA_SDO bridge—in other words, without physical access to the computer. Thus, a local vector is possible for exploitation of INTEL-SA-00086, which enables running arbitrary code in ME.
Notably, in the notes for the INTEL-SA-00086 security bulletin, Intel does not mention enabled Manufacturing Mode as a method for local exploitation in the absence of physical access. Instead, the company incorrectly claims that local exploitation is possible only if access settings for SPI regions have been misconfigured. So to keep users safe, we decided to describe how to check the status of Manufacturing Mode and how to disable it.

What can users do?Intel System Tools includes MEInfo (and, for mobile and server platforms respectively, TXEInfo and SPSInfo) in order to allow obtaining thorough diagnostic information about the current state of ME and the platform overall. We demonstrated this utility in previous research about the undocumented HAP (High Assurance Platform) mode and how to disable ME [6]. The utility, when called with the -FWSTS flag, displays a detailed description of status HECI registers and the current status of Manufacturing Mode (when the fourth bit of the FWSTS status register is set, Manufacturing Mode is active).

Figure 9. Example of MEInfo outputWe also created a program [7] for checking the status of Manufacturing Mode if the user for whatever reason does not have access to Intel ME System Tools. Here is what the script shows on affected systems:

Figure 10. mmdetect scriptSo one logical question is, how can users close Manufacturing Mode themselves if the manufacturer has failed to do so? To disable Manufacturing Mode, FPT has a special option (-CLOSEMNF) that in addition to its main purpose also allows setting the recommended access rights for SPI flash regions in the descriptor.

Here is what happens when we enter -CLOSEMNF:

Figure 11. Process of closing Manufacturing Mode with FPT
In this example, we used the NO parameter for -CLOSEMNF to avoid resetting the platform, as would otherwise happen by default immediately after closing Manufacturing Mode.

ConclusionOur research shows that Intel ME has a Manufacturing Mode problem, and that even giant manufacturers such as Apple are not immune to configuration mistakes on Intel platforms. Worse still, there is no public information on the topic, leaving end users in the dark about weaknesses that could result in data theft, persistent irremovable rootkits, and even "bricking" of hardware.
We also suspect that the ability to reset ME without resetting the main CPU may lead to yet additional security issues, due to the states of the BIOS/UEFI and ME falling out of sync.

[1] Intel Management Engine Critical Firmware Update, Intel-SA-00086
[2] GitHub - chipsec/chipsec: Platform Security Assessment Framework
[4] Fast, secure and flexible OpenSource firmware, Coreboot
[5] Mark Ermolov, Maxim Goryachy, How to Become the Sole Owner of Your PC, PHDays VI, 2016
[6] Mark Ermolov, Maxim Goryachy, Disabling Intel ME 11 via undocumented mode, Positive Technologies blog
[7] Intel ME Manufacturing Mode Detection Tools
[8] Safeguarding rootkits: Intel BootGuard, Alexander Ermolov
[9] Dmitry Sklyarov. Intel ME: Flash File System. Explained

Authors: Maxim Goryachy, Mark Ermolov

How we developed the NIOS II processor module for IDA Pro

Positive Research Center - 28 Září, 2018 - 14:46
IDA Pro has a well-earned place in the toolkit of security researchers worldwide. We at Positive Technologies are no exception. In fact, we like it so much that we developed a disassembler processor module for the NIOS II architecture to make analyzing code faster and more convenient.

Here I will give a brief history of the project and share what exactly it is that we created.

It all started in 2016, when we had to develop a processor module in-house to analyze firmware for some work we were doing. Development started from scratch based on the Nios II Classic Processor Reference Guide, which was the most up-to-date reference at the time. This took about two weeks.

The processor module was developed for IDA version 6.9. IDA Python was the logical choice for the sake of speed. The procs subfolder inside the IDA Pro installation folder, where processor modules are stored, contains three Python modules: msp430, ebc, and spu. These modules offered an example as to module structure and how to implement basic functionality:

  • Parsing instructions and operands
  • Simplifying and displaying same
  • Creating offsets, cross-references, and the code and data to which they refer
  • Handling switch constructions
  • Handling manipulations with the stack and stack variables

This is the functionality I was able to implement at that time, more or less. Fortunately, these labors came in handy again during a different project a year later, during which I actively used and improved the module.

I decided to share this experience creating a processor module with the community at PHDays 8. The talk drew interest (a video is available on the PHDays site) and even Ilfak Guilfanov, the creator of IDA Pro, was in attendance. One of his questions was: is IDA Pro version 7 supported? The answer then was "no" but after the talk, I committed to releasing a module version that would. And that's when things got interesting.

Now there was a newer manual from Intel, which helped to make comparisons and check for bugs. I made big changes to the module, added numerous new features, and fixed some problems that had previously eluded solution. And of course, I added support for version 7 of IDA Pro. This is the result.

NIOS II programming model
NIOS II is an embedded processor developed for FPGAs from Altera (now a part of Intel). From a software standpoint, it has the following notable features: Little Endian byte order, 32-bit address space, 32-bit instruction set (meaning a fixed command length of 4 bytes), and 32 general-purpose registers and 32 special-purpose registers.

Disassembly and code references
So we open in IDA Pro a new file with firmware for the NIOS II processor. After installing the module, we see it in the list of IDA Pro processors. The list is shown in the following screenshot:

Let's say that the module does not yet support even basic parsing of commands. Since each command occupies 4 bytes, we place the bytes in groups of four, which resembles the following:

After we implement basic functionality to decode the instructions and operands, display them on screen, and analyze execution transfer instructions, the set of bytes from our above example turns into the following code:

As the example shows, cross-references with execution transfer commands are formed as well (in this particular case, we see a conditional jump and procedure call).

One useful thing we can implement in processor modules is comments for commands. If we disable display of byte values and enable comments instead, the same code will look as follows:

So if you are dealing with assembler code on an architecture that is new to you, comments can help you to get a feel for what is going on. The remaining code examples here will be given in the same way, with comments, so that you can concentrate on what is happening in the code instead of flipping through the NIOS II manual.

Pseudoinstructions and simplifying commands
Some NIOS II commands are pseudoinstructions. These commands do not have separate opcodes, and they themselves are modeled as special cases of other commands. During disassembly, instructions are simplified: in other words, certain combinations are replaced with pseudoinstructions. There are four types of NIOS II pseudoinstructions:

  • When the zero register (r0) is one of the sources and can be disregarded 
  • When a command has a negative value and the command is replaced with the opposite one
  • When a condition is replaced with the opposite one
  • When a 32-bit offset is moved in two commands (high and low halfword) and this is replaced with a single command

The first two types have been implemented, since replacing the condition does not change much. But 32-bit offsets are more diverse than described in the manual.

Let's see an example of the first type.

The zero register is used frequently in calculations, in our example. If we look closely, all commands (other than execution transfer commands) involve simply moving values to particular registers.

After pseudoinstruction handling has been applied, we get more readable code and instead of the OR and ADD commands, we get MOV instead.

Stack variables
NIOS II has stack support, and besides the stack pointer (sp) it also has a stack frame pointer (fp). Here is an example of a short procedure with use of a stack:

Space on the stack is reserved for local variables. Presumably, the ra register is saved in, and then restored from, a stack variable.

After we have added functionality to the module for monitoring stack pointer changes and creating stack variables, this is how the sample will look:

The code is easier to understand now, and we can name stack variables and study their purpose by referring to the cross-references. The __fastcall function in our example, as well as its arguments in the r4 and r5 registers, are moved to the stack to call a subprocedure with the _stdcall type.

32-bit numbers and offsets
During a single operation (=when performing one command), NIOS II can move a value of maximum size 2 bytes (16 bits) to a register. On the other hand, the processor registers and address space are 32-bit, meaning that 4 bytes are necessary for register addressing.

To overcome this, it is necessary to use offsets consisting of two parts. A similar mechanism is used on PowerPC processors: an offset consists of a high and low part and is moved to the register in two commands. This is how it works on PowerPC:

Cross-references are formed from both commands, although effectively it is the second command that sets the address. This can be inconvenient when trying to count the number of cross-references.

The non-standard type HIGHA16 is used in the properties of the offset for the high part, and sometimes the HIGH16 type is used; LOW16 is used for the low part.

Actually calculating 32-bit numbers from the two parts is not at all difficult. What's difficult is generating operands as the offsets for two separate commands. All of this processing is the job of the processor module. There were no existing examples of how to do this in the IDA SDK (and definitely not any written in Python).

In the PHDays talk, I mentioned offsets as an unresolved task. To solve this, we had to be clever: the 32-bit offset is taken only from the low halfword, relative to the base. The base is calculated as the high halfword shifted 16 bits to the left.

With this approach, a cross-reference is generated only for the command responsible for moving the low halfword of the 32-bit offset.

In the offset properties, we can see the base and property for treating the base address as a plain number, to avoid generating a large number of cross-references to the very same address that is serving as the base.

The NIOS II code contains the following mechanism for moving 32-bit numbers to a register: First, the high halfword of the offset is moved with the movhi command. Then it is joined by the low halfword. This can be accomplished in three different ways (commands): adding (addi), subtracting (subi), and with logical OR (ori).

For example, in the following code the registers are set to 32-bit numbers that are then moved to registers (arguments prior to calling a function):

After we have added offset calculations, we get the following representation of the code:

The resulting 32-bit offset is displayed next to the command for moving its low halfword. This example is rather striking and we could even mentally sum up all the 32-bit numbers with ease, simply by combining the high and low parts. Judging by the values, they are unlikely to be offsets.

Now we will look at a case when subtraction is used for moving a low halfword. Here we can no longer calculate the final 32-bit values (offsets) without effort.

After applying calculation of 32-bit numbers, it looks as follows:

Here we see that if an address is contained in the address space, an offset for it is generated, and the value formed by combining the high and low halfwords is no longer displayed next to it. Here we obtained the offset for the string "10/22/08". To make the final offsets point to valid addresses, we will increase the segment size slightly.

After enlarging the segment, we see that now all the calculated 32-bit numbers are offsets and point to valid addresses.

Earlier, I mentioned that we can also use the logical OR command to calculate offsets. The following code uses this approach to calculate two offsets:

The calculations from register r8 are then placed in the stack.

After conversion, we see that the registers are set to the start addresses of procedures: the procedure address is moved to the stack.

Reading and writing relative to base
So far, the 32-bit number being moved in two commands has been either a number or offset. In the next example, a base is moved to the high halfword of the register and then reading or writing is performed relative to it.

In this case, we get offsets for variables from the read and write commands themselves. Depending on the size of the operation, the size of the variable may be set as well.

Switch constructions
The switch constructions found in binary files can simplify analysis. For example, based on the number of options inside a switch construction, we can localize the switch responsible for handling a particular protocol or system of commands. This is why we want to recognize switch and its parameters. Take the following code:

After execution, it stops on the register jump jmp r2. This is followed by code referenced in the data; the end of each block of code contains a jump to the same label. Thus we can see that this is a switch construction, and that these blocks handle particular cases within it. Above we also see verification of the number of cases and a default jump.

After we add switch handling, the code looks as follows:

Now we clearly make out the jump, address of the table with offsets, number of cases, and each case with corresponding number.

The table with offsets is as follows: To save space, only the first five elements have been listed.

In essence, switch handling involves going through the code (starting with the tail end) and finding all of its components. So say that a particular switch organization scheme is being described. Sometimes schemes can contain exceptions. This is one reason why existing processor modules can fail to recognize seemingly obvious switches. In effect, the real-life switch simply doesn't fit the scheme defined inside the processor module. Or perhaps a scheme exists, but it contains other commands that are not part of the scheme, the locations of main commands have been switched, or the scheme is interrupted by jumps.

The NIOS II processor module recognizes a switch despite the presence of unrelated instructions between main commands, as well as a switch whose main commands have been switched places or one containing disruptive jumps. A reverse execution path approach is used that takes into account possible scheme-disrupting jumps, with setting of internal variables that signal various states of the recognizer. In total, there are approximately 10 different ways of organizing switch that are found in various firmware.

The custom instruction
The NIOS II has an interesting instruction by the name of custom. This instruction gives access to the 256 user-settable instructions supported on the NIOS II. In addition to general-purpose registers, the custom instruction can access a special set of 32 custom registers. After implementing logic for parsing the custom command, here is what we see:

Note that the two final instructions have the same instruction number and seem to perform the same actions.

The custom instruction is the subject of a separate manual. According to the manual, one of the most complete and modern custom instruction sets is the NIOS II Floating Point Hardware 2 Component (FPH2) set of instructions for floating-point computations. This is how our example looks after implementing FPH2 command parsing:

Based on the mnemonic of the two last commands, we indeed see that they perform the same action (the fadds command).

Jumping by register value
In firmware, we often see situations when a 32-bit offset (setting the jump location) is moved to a register and a jump is performed based on the register value.

Have a look at the code:

In the last line, there is a jump by register value. Before it, the address of the procedure (the one starting in the first line of the example) is moved to the register. The jump clearly is to the beginning of the procedure.

This is the result after adding functionality for jump recognition:

Next to the jmp r8 command is the address to which the jump is being made, if we were able to determine it. A cross-reference is also generated between the command and the address of the jump destination. The cross-reference is visible in the first line, while the jump itself occurs in the final line.

gp (global pointer) values: saving and loading
It is common to use a global pointer set to an address and then address variables relative to that pointer. In NIOS II, the gp register is used to store the global pointer. At a certain moment (most often during the firmware initialization procedures), an address value is moved to the gp register. The processor module handles this situation. To illustrate this, we have given examples of code and output from IDA Pro with debug messages enabled for the processor module.

In this case, the processor module finds and calculates the value of the gp register in the new base. When the idb base is closed, the value of gp is saved in the base.

When an existing idb base is loaded and the value of gp has already been found, the value is loaded form the base, as shown in the debug message in the following example:

Reading and writing relative to gp
Reading and writing with an offset relative to gp is a common occurrence. The following example includes three reads and one write of this type:

Since we have already obtained the address value, which is stored in the gp register, we can address these reads and writes.

Handling of gp-relative reading and writing makes things more convenient for us.

We can see which variables are being accessed, track their use, and determine their purpose.

Addressing relative to gp
The gp register can also be used for addressing variables.

For example, here we see that registers are set relative to gp to certain variables or data regions:

Once we drop in functionality to handle this situation by converting to offsets and adding cross-references, here is the result:

Now it is clear what is happening and we can identify the regions to which registers are being set relative to gp.

Addressing relative to sp
Similarly, registers in the next examples are set to certain memory regions, but this time relative to the stack pointer (sp).

As is visible, the registers are set to certain local variables. Such situations when arguments are set to local buffers before procedures calls are fairly common.

After adding handling for these situations (by converting the values to offsets), we obtain the following:

Now it is clear that after the procedure call, the values are loaded from the variables whose addresses were passed in parameters prior to the function call.

Cross-references from code to structure fieldsSetting and using structures in IDA Pro can make code analysis more efficient.

We can see from the code that the field_8 field is incremented and perhaps used as a counter for triggering an event. If the read and write fields are far away from each other in the code, cross-references might be useful.

Let's look at the structure:

Structure fields are accessed, but cross-references from the code to structure elements were not formed.

After these situations are handled, this is how it will all look in our case:

Now there are cross-references to structure fields from the specific commands involving those fields. Direct and reverse cross-references are present as well. And based on various procedures, we can see where the field values are read or written.

Where the manual and reality diverge
According to the manual, during decoding of some commands, certain bits are supposed to take only strictly defined values. For example, for the eret command for returning from an exception, bits 22–26 should equal 0x1E.

Here is a command example from firmware:

When we open different firmware in a place with similar context, something different happens:

These bytes were not automatically converted to a command, although all the commands can be handled. Judging by the context (and even similar address), this should be the same command. But take a close look at the bytes. This is the eret command, except that bits 22–26 equal zero instead of equaling 0x1E.

So we have to slightly tweak the parsing results for the command: although it doesn't exactly correspond to the manual anymore, it does match the reality.

IDA 7 support
The API provided by IDA Python for ordinary scripts has changed considerably as of IDA version 7.0. For processor modules, the changes are massive. Nonetheless, we succeeded in reworking the NIOS II processor module for version 7.

There is one strange thing: when a new binary file for NIOS II is loaded in IDA 7, analysis does not start automatically, unlike in IDA 6.9.

The SDK contains examples in which a processor module, besides having basic disassembler functionality, supports numerous features that make it easier to pick apart code. Certainly this all could be done by hand, but say that you have a binary file with megabytes of firmware containing tens of thousands of offsets of various types—why waste so much time if there is a more efficient way? A well-implemented processor module can perform this task instead. And cruising through code with the help of cross-references can be downright fun! With these abilities, IDA remains the convenient and helpful tool beloved by so many.

Author: Anton Dorfman, Positive Technologies

Chystá se konec a stěhování. - 28 Květen, 2017 - 02:00
Takže, tohle je poslední příspěvek na starém A také jeden z prvních , co se objeví i na novém, provozovaném na
Kategorie: Blogy

Aktuální vypsané termíny školení - sociální sítě, bezpečnost, internetový marketing, public relations - 17 Květen, 2017 - 02:00
Více informací o všech školeních i dalších možnostech školení na míru najdete na
Kategorie: Blogy
Syndikovat obsah