Orlok

A minimal strace-like process tracing utility for x86_64 Linux systems. I designed it to get a clearer understanding of how system calls work and how process tracers like strace gather detailed syscall information from the kernel. The complexity in this project lies primarily in understanding the ptrace API and the basics of the syscalls being traced (their signatures, input and output parameters).

Tech Stack

Category	Technology Used
Language	C
Compiler	GCC
Build System	GNU Make
Libraries	Standard C Library, Linux kernel headers
Operating System	Linux
License	GNU General Public License V2

Features

Tracing an existing process via PID
Forking a new process to trace
Tracing a variety of common syscalls:

File Descriptors: dup, dup2
File I/O: access, chdir, close, fstat, getcwd, lseek, lstat, openat, pipe, read, stat, write
Networking: accept, bind, connect, listen, socket
Processes: brk, clone, execve, exit, exit_group, fork, getpid, getppid, mmap, munmap

Challenges & Solutions

Output Parameters: Initially, I read all register values except for rax (return value) on syscall entry. This approach is flawed because many syscalls take one or more buffers as output parameters, and those buffers do not contain useful information until the syscall exits.
Tracing State: A PTRACE_SYSCALL stop looks the same on syscall entry and on syscall exit, so I had to implement an entering_syscall flag to know when to read output parameters and the return value.
Execve: I noticed early on that the entering_syscall flag was out of phase after the first stop, resulting in garbage output for output parameters and return values. Since the first stop in TRACEME mode is caused by execve, we have to toggle the entering_syscall state at that stop without inspecting it as a syscall to avoid the offset.
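The entry/exit bookkeeping described above can be sketched roughly as follows. This is a minimal illustration and not Orlok's actual source: trace_command is a hypothetical name, the initial execve stop is consumed before the loop instead of toggling the flag, and every SIGTRAP stop is assumed to be a syscall stop.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run argv under ptrace and log each syscall to stderr.
 * Returns the number of syscall entries seen, or -1 on failure. */
long trace_command(char *const argv[])
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(126);
        execvp(argv[0], argv);
        _exit(127); /* only reached if exec failed */
    }

    int status;
    /* First stop: the SIGTRAP raised by the successful execve. It is
     * consumed here so that it is never decoded as a syscall. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    int entering = 1; /* the next syscall stop will be an entry */
    int sig = 0;      /* signal to pass on when resuming */
    long entries = 0;

    for (;;) {
        /* A failure here (e.g. ESRCH) is left for waitpid() to report. */
        ptrace(PTRACE_SYSCALL, pid, NULL, (void *)(long)sig);
        if (waitpid(pid, &status, 0) < 0)
            return -1;
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;

        /* Simplification: any SIGTRAP stop is treated as a syscall stop;
         * other signals are forwarded without touching the flag. */
        sig = 0;
        if (WSTOPSIG(status) != SIGTRAP) {
            sig = WSTOPSIG(status);
            continue;
        }

        struct user_regs_struct regs;
        if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
            continue;

        if (entering) {
            entries++;
            fprintf(stderr, "syscall %lld", (long long)regs.orig_rax);
        } else {
            fprintf(stderr, " = %lld\n", (long long)regs.rax);
        }
        entering = !entering;
    }
    if (!entering)
        fputs(" = ?\n", stderr); /* e.g. exit_group never returns */
    return entries;
}
```

Calling trace_command with an argv such as {"/bin/sh", "-c", "exit 0", NULL} prints one line per syscall. A traced program that itself calls execve would produce an extra SIGTRAP stop that this sketch does not account for.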

Testing

The implementation of each syscall was tested by comparing Orlok's output with that of strace for the same binary.

Lessons Learned

Initiating Tracing

PTRACE_ATTACH works by sending SIGSTOP to the specified process to pause it and requesting that the kernel set the PT_PTRACED flag in the tracee's task_struct. After this, our tracer process is considered the ptrace parent of the tracee, which means that it receives all of the tracee's tracing events (stops, signals, exits).
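As a rough sketch of the attach path, assuming a hypothetical helper named attach_and_wait (not taken from Orlok), the tracer attaches and then waits for the SIGSTOP-induced stop before issuing further requests:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: attach to pid and wait until it has stopped.
 * Returns 0 on success, -1 on failure with errno set. */
int attach_and_wait(pid_t pid)
{
    if (pid <= 0) { /* reject values that cannot name a single process */
        errno = ESRCH;
        return -1;
    }
    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) < 0)
        return -1;

    int status;
    /* The tracee reports the stop to its ptrace parent through waitpid(). */
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSTOPPED(status) ? 0 : -1;
}
```

Once this returns 0, the tracee is stopped and can be resumed with PTRACE_SYSCALL, or released again with PTRACE_DETACH.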

PTRACE_TRACEME sets the PT_PTRACED flag in the current process's task_struct. Execve, the syscall underlying execl, causes SIGTRAP to be sent to the process if PT_PTRACED is set. This stops the child after its memory has been replaced with the new program image but before that program has begun executing, so that the parent can trace it from the beginning of execution.

PTRACE_SYSCALL requests that the kernel resume the process and stop it again at the next syscall boundary (entry or exit), so it gets run twice per syscall in Orlok.

If the child process exits by calling exit() or exit_group(), a later PTRACE_SYSCALL request can fail with errno set to ESRCH (No such process). We ignore this entirely in Orlok because the next wait() call returns a status for which WIFEXITED(status) or WIFSIGNALED(status) is true, indicating that the process terminated normally or was killed by a signal, respectively. The tracing loop then breaks properly, with wait() having reaped the zombie.
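The status checks mentioned above can be illustrated with a small classifier. The enum and function names are hypothetical and only serve this sketch; the macros themselves come from sys/wait.h.

```c
#include <sys/wait.h>

enum tracee_state { TRACEE_EXITED, TRACEE_KILLED, TRACEE_STOPPED, TRACEE_OTHER };

/* Hypothetical helper: classify a status word returned by wait()/waitpid().
 * *detail receives the exit code, the terminating signal, or the stop signal. */
enum tracee_state classify_status(int status, int *detail)
{
    if (WIFEXITED(status)) {   /* normal termination via exit()/exit_group() */
        *detail = WEXITSTATUS(status);
        return TRACEE_EXITED;
    }
    if (WIFSIGNALED(status)) { /* killed by a signal */
        *detail = WTERMSIG(status);
        return TRACEE_KILLED;
    }
    if (WIFSTOPPED(status)) {  /* ptrace stop: still alive, keep tracing */
        *detail = WSTOPSIG(status);
        return TRACEE_STOPPED;
    }
    *detail = 0;
    return TRACEE_OTHER;
}
```

A tracing loop would break on TRACEE_EXITED or TRACEE_KILLED and continue on TRACEE_STOPPED.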

System Calls & Process Registers

The x86_64 Linux syscall convention, as seen through the fields of user_regs_struct, is:

Syscall Number: orig_rax
Return Value: rax
Argument 0: rdi
Argument 1: rsi
Argument 2: rdx
Argument 3: r10
Argument 4: r8
Argument 5: r9

PTRACE_GETREGS fills a struct of type user_regs_struct with the values of the tracee's registers at the stop point. Each process has its own register state, which the kernel saves on a context switch and restores when switching back. ptrace returns a copy of that saved state; it doesn't read from the CPU registers directly.
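A minimal sketch of applying the register mapping above to the struct that PTRACE_GETREGS fills. The syscall_view type and both function names are hypothetical and chosen for illustration; the code only compiles on x86_64.

```c
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/user.h>

/* Hypothetical container for one decoded syscall stop. */
struct syscall_view {
    long long nr;               /* syscall number (orig_rax) */
    unsigned long long args[6]; /* rdi, rsi, rdx, r10, r8, r9 */
    long long ret;              /* rax, only meaningful at syscall exit */
};

/* Copy the syscall-related registers out of a user_regs_struct. */
void unpack_regs(const struct user_regs_struct *regs, struct syscall_view *out)
{
    out->nr = (long long)regs->orig_rax;
    out->args[0] = regs->rdi;
    out->args[1] = regs->rsi;
    out->args[2] = regs->rdx;
    out->args[3] = regs->r10;
    out->args[4] = regs->r8;
    out->args[5] = regs->r9;
    out->ret = (long long)regs->rax;
}

/* Fetch the stopped tracee's saved registers and unpack them.
 * Returns 0 on success, -1 if PTRACE_GETREGS fails. */
int read_syscall_view(pid_t pid, struct syscall_view *out)
{
    struct user_regs_struct regs;
    if (ptrace(PTRACE_GETREGS, pid, NULL, &regs) < 0)
        return -1;
    unpack_regs(&regs, out);
    return 0;
}
```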

The return value of a raw syscall is -errno on error. This is not immediately apparent because the glibc wrappers (which are what the manual pages describe) return -1 on error and set errno instead.
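To illustrate the difference, here is a small sketch that converts a raw return value (rax at syscall exit) into the form a glibc wrapper would report. Both function names are hypothetical. It relies on Linux reserving the range -4095 to -1 for error codes.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns the errno value encoded in a raw syscall
 * return value, or 0 if the value does not represent an error. */
int raw_ret_to_errno(long long raw)
{
    /* Linux reserves the range -4095..-1 for error codes. */
    if (raw < 0 && raw >= -4095)
        return (int)-raw;
    return 0;
}

/* Hypothetical helper: formats the result the way a glibc wrapper would
 * report it: -1 plus an error description, or the plain value. */
void format_ret(long long raw, char *buf, size_t cap)
{
    int err = raw_ret_to_errno(raw);
    if (err)
        snprintf(buf, cap, "-1 (%s)", strerror(err));
    else
        snprintf(buf, cap, "%lld", raw);
}
```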

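I used the Chromium OS syscall tables as a reference for each syscall's register mappings under the x86_64 Linux syscall convention, which differs from the System V AMD64 function calling convention in passing the fourth argument in r10 rather than rcx.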

Reading a Traced Process' Memory

System calls often take pointers as parameters. Reading these from our parent process is not as simple as dereferencing the pointer, because the address belongs to the traced process's address space, not the parent's. This is where PTRACE_PEEKDATA comes in: it allows us to read memory from another process's address space.

The PTRACE_PEEKDATA request returns the fetched word itself, so -1 can be either a valid word or the error indicator. To resolve this ambiguity, I set errno to 0 before the call and check whether it is nonzero afterwards.
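The errno check can be wrapped in a small helper. This is a sketch with a hypothetical name (peek_word), not Orlok's actual code:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Hypothetical helper: read one word from the tracee at addr.
 * Returns 0 and stores the word in *out on success, -1 on failure. */
int peek_word(pid_t pid, unsigned long addr, long *out)
{
    errno = 0; /* clear first: -1 may be a legitimate word */
    long word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
    if (word == -1 && errno != 0)
        return -1; /* genuine error, errno says why */
    *out = word;
    return 0;
}
```

Separating the status from the data this way means callers never have to interpret a word value as an error code.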

Calling PTRACE_PEEKDATA returns a single word (8 bytes on x86_64) read from the specified address in the tracee's memory. When reading strings using PTRACE_PEEKDATA, we advance by one word per call until the fetched word contains a null byte, and terminate the string there.
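A sketch of the word-by-word string read, under the assumption that the caller supplies a fixed-size buffer. The names append_word and read_tracee_string are hypothetical, and the truncation behaviour is a choice made for this example.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/ptrace.h>
#include <sys/types.h>

/* Append the bytes of one fetched word to dst. Returns 1 once the string
 * is finished (null byte found or buffer full), 0 if more words are needed. */
int append_word(long word, char *dst, size_t *len, size_t cap)
{
    char bytes[sizeof word];
    memcpy(bytes, &word, sizeof word);
    for (size_t i = 0; i < sizeof word; i++) {
        if (bytes[i] == '\0' || *len + 1 == cap) {
            dst[*len] = '\0'; /* terminate at the null byte or on truncation */
            return 1;
        }
        dst[(*len)++] = bytes[i];
    }
    return 0;
}

/* Read a null-terminated string from the tracee, one word per request.
 * Returns the string length, or -1 if a read fails or cap is 0. */
long read_tracee_string(pid_t pid, unsigned long addr, char *dst, size_t cap)
{
    size_t len = 0;
    if (cap == 0)
        return -1;
    for (;;) {
        errno = 0;
        long word = ptrace(PTRACE_PEEKDATA, pid, (void *)addr, NULL);
        if (word == -1 && errno != 0) {
            dst[len] = '\0';
            return -1;
        }
        if (append_word(word, dst, &len, cap))
            return (long)len;
        addr += sizeof word; /* advance by one word per call */
    }
}
```

One caveat of this approach: a word that starts inside a string near the end of a mapped page may extend into an unmapped page, in which case the read fails even though the null byte itself was readable.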

Future Plans

Pursuing complete feature parity with strace yields diminishing returns education-wise. If anything, I may implement logging of signals.