k1R4

calc [ICC24]

31 Oct 2024

Preface

I had the honour of authoring this challenge as part of Traboda for the jeopardy CTF in ICC 2024. Huge thanks to everyone involved!

You can find the challenge source here

Challenge Description

An experimental calculator that uses JIT to speed up calculations, or does it? what harm could some math expression really do?

Handout has the binary, source and the Dockerfile, needed to recreate the environment.

Initial Analysis

The binary has all mitigations that are expected:

[k1r4@enderman handout]$ pwn checksec calc
[*] '/home/k1r4/work/projects/calc/handout/calc'
    Arch:     amd64-64-little
    RELRO:    Full RELRO
    Stack:    Canary found
    NX:       NX enabled
    PIE:      PIE enabled

The sources provided are in C++. The program evaluates mathematical expressions interactively.

Implementation

The program consists of a Parser and an Evaluator, which are two separate classes. The Parser parses the input expression and converts it to a binary tree. The Evaluator takes the binary tree as input and evaluates the expression to produce the result.

Parser

class Parser {
    private:
        bool isOperator(char token);
        int  precedence(char op);
        void processOperator(std::stack<char>& operators, std::stack<std::unique_ptr<Node>>& operands);

    public:
        std::unique_ptr<Node> parseExpression(const std::string& expression);
};

The parser uses the parseExpression() method to parse the input expression. It does so iteratively by maintaining two stacks to keep track of operators and operands separately. The processOperator() method is for processing the current operator on top of the operators stack.

Evaluator

class Evaluator {
    private:
        char *jitCode, *jitCodeBase;
        long *jitConsts, *jitConstsBase;
        long jitConstCount, jitOpCount;
        long jitConstsSize, jitCodeSize;
        long jitPrevConst;

        std::unordered_map<long, short> constMap;

        void *createJitRegion(long size);
        void updateJitPerm(void);
        void jitEmitCode(const std::unique_ptr<Node>& root);
        void jitCountNodes(const std::unique_ptr<Node> & root);

    public:
        long evaluate(const std::unique_ptr<Node>& root);
        long jitEvaluate(const std::unique_ptr<Node>& root);
        ~Evaluator(void);
};

The evaluator uses the evaluate() method for evaluating binary expression trees when JIT is disabled. If JIT is enabled, then the jitEvaluate() method is used. jitEvalute() approximates the size of JIT memory required using jitCountNodes(). It then creates the JIT memory map using createJitRegion(). JIT code is then emitted by jitEmitCode(). updateJitPerm() is then used to set the permissions for the JIT memory appropriately. Finally the JIT-ed code is called and the result is returned.

All constants (numbers) used in the expression are stored in a read-only region. The JIT-ed code fetches the constants from the read-only memory through push qword ptr [rdi+rdx*8] where rdx is the index of the constant.

The constants are then popped into rax and rsi and the respective operation is done such as xor rax, rsi. Finally push rax is done to save the result. This is done until the epilogue of the JIT-ed code is reached, which performs pop rax ; ret.

Vulnerability

The mul instruction

There are two intended vulnerabilities in this program. The first involves the mul instruction which is emitted for the * operator. The single operand 64 bit mul instruction specification states that the result will be stored in rdx:rax. This means that the overflow of the multiplication will be stored in the rdx register. However rdx is used to index the memory used to fetch constants.

There is an “optimization” in the jitEmitCode() which doesn’t update rdx (technically edx) when the same constant is used back to back. Consider the example of 1+2, the emitted code would look something like:

mov edx, 0
push qword ptr [rdi+rdx*8]
mov edx, 1
push qword ptr [rdi+rdx*8]
pop rsi
pop rax
add rax, rsi
push rax

If 1*1 is inputted instead, the emitted code would look something like:

mov edx, 0
push qword ptr [rdi+rdx*8]
nop
nop
nop
nop
push qword ptr [rdi+rdx*8]
pop rsi
pop rax
add rax, rsi
push rax

This is because of the following code:

if(jitPrevConst == root->value) {
    *(int *)(jitCode-7) = 0x90909090;
}
else {
    jitPrevConst = root->value;
}

This replaces the mov edx, xxxx with nop instructions, when the previous constant that needed to be fetched, is the same as the current one.

Consider the case when an overflowing mul operation is followed by fetching a constant that is the exact same one as fetched previously. The emitted code following the mul rsi would be, a few nop instructions and finally a push qword ptr [rdi+rdx*8]. Hence controlling the overflow value stored in rdx will allow for fetching arbitrary constants from memory.

The JIT memory size

jitCountConsts() is used to calculate the number of constants and operators in the expression. Based on these numbers, the JIT memory required is approximated and page aligned.

jitCountNodes(root);
jitConstsSize = PAGE_ALIGN((jitConstCount*sizeof(long)));
jitCodeSize = PAGE_ALIGN(((jitConstCount*FETCH_SZ)+(jitOpCount*EXEC_SZ)));

FETCH_SZ and EXEC_SZ is the number of bytes of code required for a constant fetch and operation execution respectively. Although the size calculation is not incorrect by itself, it fails to account for the epilogue that is included at the end of the JIT-ed code.

std::memset(jitCodeBase, 0x90, jitCodeSize);
*(jitCode+jitCodeSize-2) = '\x58';
*(jitCode+jitCodeSize-1) = '\xc3';

jitEmitCode(root);

If the expression causes code to emit such that the pop rax in the epilogue is overwritten by push rax of the last operation, then this would allow for hijacking control flow and code execution.

Typically an expression contains N operands (constants) and N-1 operators, like in the case of 5+4-3*2/1, which has 5 constants and 4 operators. In order to get push rax to overwrite pop rax, the push rax must be present at one byte before the page boundary.

Consider the following equation where x is the number of operations in the expression and y is any natural number:

x * EXEC_SZ + (x+1) * FETCH_SZ = 0xyfff

x * 6 + (x+1) * 7 = 0xyfff

6x + 7x + 7 = 0xyfff

13x = 0xyfff - 7

Solving this equation for values of y from 0, until we get x as a whole number, we get x as 2520 and y as 7. So the JIT-ed code size will be 0x7fff and the number of operations will be 2520 with 2521 constants.

Exploit Strategy

The JIT memory regions are mmapped addresses which would mean that they are at constant offsets from libc, other loaded libraries and other mmapped regions. This allows fetching addresses from nearby regions and adding offset to get address of any library function in memory. Furthermore chaining this with the second bug allows execution of any function. The goal is to execute system("sh") for code execution.

The exploit expression would do the following:

  • Pad a lot of dummy operations (around 2520) at the start, to reach the target size for the second bug
  • Perform a mul operation with an overflow, which corresponds to the index of the out-of-bound fetch
  • Fetch the last operand used in the mul operation so that, the mov edx, xxxx is optimized out
  • The fetch now gets a memory address, preferably in libc
  • Add an offset to get the system function
  • Perform a dummy operation that uses the numerical represnation of the “sh” string
  • Ensure that the total number of operations in the expression are 2520
  • Profit!

Since rdi contains the address of the constants, it will point to the last constant used in the expression, which would be “sh”. This is because the binary expression tree contains the last operations and constants near the root of the tree.

Conclusion

I love authoring challenges and this challenge was no exception. The development process for this challenge was a lot of fun. All the lectures in university on data structures finally paid off!

You can find the exploit here