Orlok

A minimal strace-like process tracing utility for x86_64 Linux systems. I designed it to obtain a clearer understanding of how system calls work and how process tracers like strace gather detailed syscall information from the kernel. The complexity in this project lies primarily in understanding the ptrace API and the basics of the syscalls being traced (their signatures and their input and output parameters).

Tech Stack

Category          Technology Used
Language          C
Compiler          GCC
Build System      GNU Make
Libraries         Standard C Library, Linux kernel headers
Operating System  Linux
License           GNU General Public License v2

Features

Challenges & Solutions

Testing

The implementation of each syscall was tested by comparing Orlok's output with that of strace for the same binary.

Lessons Learned

Initiating Tracing

PTRACE_ATTACH works by sending SIGSTOP to the specified process and requesting that the kernel set the PT_PTRACED flag in the tracee's task_struct. After this, the tracer is treated as the ptrace parent of the tracee, which means that it receives all of the tracee's tracing events (stops, signals, exits) through wait().

PTRACE_TRACEME sets the PT_PTRACED flag in the calling process's task_struct. execve, the syscall underlying execl, sends SIGTRAP to the process if PT_PTRACED is set. This stops the child after its memory has been replaced with the new program's image but before the new program has begun executing, so that the parent can trace it from the beginning of execution.

PTRACE_SYSCALL requests that the kernel resume the process and stop it again at the next syscall boundary (entry or exit). Orlok therefore issues it twice per syscall: once to reach the entry stop and once to reach the exit stop.

If the child process has already exited (via exit() or exit_group()), PTRACE_SYSCALL fails and sets errno to ESRCH (No such process). Orlok ignores this error entirely because the next wait() call returns with WIFEXITED(status) or WIFSIGNALED(status) true, indicating that the process terminated normally or was killed by a signal respectively. The tracing loop then breaks cleanly, with wait() having reaped the zombie.
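Put together, the PTRACE_TRACEME and PTRACE_SYSCALL steps form a loop along these lines. This is a sketch under stated assumptions, not Orlok's source: trace_until_exit and its parameters are hypothetical names, every SIGTRAP stop after the initial exec stop is assumed to be a syscall stop (PTRACE_O_TRACESYSGOOD is not used), and any other signal is simply passed back to the tracee.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv[0] under ptrace until it terminates.
 * Returns the exit status (or 128 + signal number), -1 on tracer error.
 * If syscalls_seen is non-NULL it receives the number of syscall entries. */
int trace_until_exit(char *const argv[], long *syscalls_seen)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(126);
        execv(argv[0], argv);           /* raises SIGTRAP on success */
        _exit(127);                     /* only reached if execv failed */
    }

    int status;
    int at_entry = 1;                   /* next syscall stop is an entry */
    int inject = 0;                     /* signal to deliver on resume */
    long entries = 0;

    /* First stop: the SIGTRAP sent by execve. */
    if (waitpid(child, &status, 0) == -1)
        return -1;

    while (WIFSTOPPED(status)) {
        /* May fail with ESRCH if the tracee is already gone; ignored on
         * purpose because the waitpid below then reports the exit. */
        ptrace(PTRACE_SYSCALL, child, NULL, (void *)(long)inject);

        if (waitpid(child, &status, 0) == -1)
            return -1;
        if (!WIFSTOPPED(status))
            break;

        inject = 0;
        if (WSTOPSIG(status) == SIGTRAP) {
            if (at_entry)
                entries++;
            at_entry = !at_entry;       /* entry and exit stops alternate */
        } else {
            inject = WSTOPSIG(status);  /* pass other signals through */
        }
    }

    if (syscalls_seen)
        *syscalls_seen = entries;
    if (WIFEXITED(status))
        return WEXITSTATUS(status);
    if (WIFSIGNALED(status))
        return 128 + WTERMSIG(status);
    return -1;
}
```

Calling it with an argument vector such as {"/bin/sh", "-c", "exit 3", NULL} should return 3 on a system where ptrace is permitted.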

System Calls & Process Registers

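The x86_64 syscall calling convention is: the syscall number goes in rax, the arguments go in rdi, rsi, rdx, r10, r8 and r9 (in that order), and the return value comes back in rax.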

PTRACE_GETREGS fills a struct of type user_regs_struct with the values of all of the tracee's CPU registers at the stop point. Each process has its own register state that the kernel saves when switching context and restores when switching back. ptrace requests the kernel's saved copy of the tracee's registers; it does not read from the CPU registers directly.
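As an illustration, the snapshot returned by PTRACE_GETREGS can be mapped onto the x86_64 syscall convention (number in rax, arguments in rdi, rsi, rdx, r10, r8, r9, return value in rax). The struct syscall_regs and both function names below are hypothetical, not Orlok's. The sketch reads the number from orig_rax because the kernel overwrites rax itself, first with -ENOSYS at entry and then with the return value at exit.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Hypothetical container for the decoded register state (x86_64 only). */
struct syscall_regs {
    long long nr;                   /* syscall number */
    unsigned long long args[6];     /* arguments in calling-convention order */
    long long ret;                  /* return value; meaningful at exit stops */
};

/* Pure mapping from the register snapshot to the syscall convention. */
void decode_syscall_regs(const struct user_regs_struct *regs,
                         struct syscall_regs *out)
{
    out->nr      = (long long)regs->orig_rax;
    out->args[0] = regs->rdi;
    out->args[1] = regs->rsi;
    out->args[2] = regs->rdx;
    out->args[3] = regs->r10;
    out->args[4] = regs->r8;
    out->args[5] = regs->r9;
    out->ret     = (long long)regs->rax;
}

/* Fetch the stopped tracee's registers and decode them.
 * Returns 0 on success, -1 if PTRACE_GETREGS fails. */
int fetch_syscall_regs(pid_t pid, struct syscall_regs *out)
{
    struct user_regs_struct regs;

    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) == -1)
        return -1;
    decode_syscall_regs(&regs, out);
    return 0;
}
```

Keeping the decoding separate from the ptrace call makes the mapping easy to check without a live tracee.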

The return value of a raw syscall is -errno on error. This is not immediately apparent because the glibc wrappers (which are what the manual pages document) return -1 on error and set errno instead.

I used the Chromium OS syscall tables as a reference for each syscall's register mappings under the System V AMD64 calling conventions.

Reading a Traced Process' Memory

System calls often take pointers as parameters. Reading these from our parent process is not as simple as dereferencing the pointer, because the address refers to the traced process's address space, not the parent's. This is where PTRACE_PEEKDATA comes in: it allows us to read memory from another process's address space.

The PTRACE_PEEKDATA request sets errno on error but cannot signal the error through its return value, because -1 can be a valid word read from memory. To resolve this ambiguity, I set errno to 0 before the call and check whether it has changed afterwards.

Calling PTRACE_PEEKDATA returns a single word (8 bytes on x86_64) read from the specified address in the tracee's memory. When reading strings using PTRACE_PEEKDATA, we advance by a word per call until the fetched word contains a null byte, which marks the end of the string.

Future Plans

Pursuing complete feature parity with strace yields diminishing returns education-wise. If anything, I may implement logging of signals.

Further Reading

View Source Code on Github