
I understand that the strace command uses ptrace(PTRACE_PEEKUSER, child, __builtin_offsetof(struct user, regs.orig_eax)) to find the index of the system call at which the tracee child is stopped. Then, to translate that index into the syscall's function name, it uses tables built by grepping the Linux source code headers present in the installation.

This method must be undocumented and prone to failure, because the location and syntax of the source code declarations are not documented: they must be found by grepping and may change in unknown ways. Am I correct to say that?

If that is so, then why doesn't strace use the following method, which seems to me simpler, relies only on documentation, and is thus foolproof?

At the start of its first run after a reboot, strace issues one test syscall for each syscall function, traps it, and observes which syscall index the child uses. That gives a complete and correct custom table, which can be stored in a file known to later invocations of strace.

I am sure this method must have been considered, as it is not particularly ingenious. So there must be something wrong with it. What is it?

user322908
  • How would it make these test system calls except by using the constants from the system/kernel headers? – R.. GitHub STOP HELPING ICE Jul 24 '15 at 02:11
  • Also "first run after reboot" is a concept that doesn't make sense at all. How would it even know such a thing? – R.. GitHub STOP HELPING ICE Jul 24 '15 at 02:11
  • @R.. ?? `open(foobar.txt, O_RDONLY)` no undocumented stuff needed – user322908 Jul 24 '15 at 03:44
  • @R.. as for "the first run": I just did not want to risk the situation where someone changes some headers, recompiles the kernel and reboots with the new kernel (which, by the way, I believe could break the current `strace`). If I store the table, because I don't want to compute it each time `strace` is called, I need to make sure it is wiped out upon reboot. Surely there must be a way in Linux to tell whether I am being called for the first time after a reboot. Surely there must be some file that is touched or created upon reboot, so that I can compare my timestamp with it? – user322908 Jul 24 '15 at 03:47
  • @R.. for example, the `sysinfo` system call will tell me how long it has been since boot; comparing that with the timestamp of the file where I stored the table, I know whether to wipe out the file and start again. – user322908 Jul 24 '15 at 03:56
  • How do you know `open(foobar.txt, O_RDONLY)` results in exactly one syscall and that's `SYS_open`? It might use `SYS_openat`. It might result in other system calls taking place as part of lazy binding in the dynamic linker, if this is the first call to `open`. Or it might involve other system calls internally for some other purpose. There is no trivial one-to-one mapping you can assume between library calls and system calls. – R.. GitHub STOP HELPING ICE Jul 24 '15 at 04:09
  • If you think the definitions of system call numbers could change from rebooting, you have some serious misunderstandings that might explain why you're asking this question to begin with. They are an absolute permanent fixed part of the public kernel API/ABI. For them not to be, binaries built for one kernel could not be used on another one. – R.. GitHub STOP HELPING ICE Jul 24 '15 at 04:10
  • @R.. OK, I see your point that there is no one-to-one correspondence between library calls and syscalls. Excellent. I do not understand the second part, where you say "absolute permanent fixed part of the public kernel API". If that is so, can you point me to where that API is documented, where it says "on platform so-and-so, the open system call is index 2" or something like that? – user322908 Jul 24 '15 at 04:28
  • @R.. of course I have misunderstandings. If I had perfect understanding, I would not be asking this question but answering. Your comment is inappropriate. – user322908 Jul 24 '15 at 04:29
  • @R.. This documentation page man7.org/linux/man-pages/man2/syscalls.2.html clearly says that I can call certain library functions and that each will (in the end) result in a corresponding syscall. – user322908 Jul 24 '15 at 04:59
  • Man pages are not normative. They are meant to be descriptive, but sometimes the descriptions are not entirely correct. If you feel this text is misleading please submit a bug report to the Linux man pages project. – R.. GitHub STOP HELPING ICE Jul 24 '15 at 17:19
  • About the syscall numbers being fixed, this is simply the Linux stability policy. On systems without such a policy, you would not have any backwards compatibility and any binaries making use of syscall numbers (this means any static binaries, and also dynamic binaries that use the `syscall()` function) would be tied to the specific kernel version they were built for. You can find the Linux stability policy somewhere in the kernel documentation, but it shouldn't even need to be stated; this is just the most basic common sense about compatibility. – R.. GitHub STOP HELPING ICE Jul 24 '15 at 17:22
  • @R.. Not only that, but historically, changes to the syscall table have only involved adding new syscalls and removing old ones. There are plenty of gaps in the table where deprecated and finally removed syscalls simply no longer exist (and return `-ENOSYS`). The table has never been "shifted" such that the meaning of a syscall number changes. In fact, when a syscall needs to change, a new version is added with a similar name. – forest May 12 '18 at 10:03

1 Answer


Because there is no concept of a syscall name at that low a level. There is no way strace could say "Hey, let's make an fcntl() call and see what the number is!". It can only make calls by the syscall numbers themselves. This is because a syscall is made by storing the syscall number in the eax register and executing int 0x80 (32-bit x86), or by storing it in the rax register and executing the syscall instruction (x86-64).

Although it could call the wrapper function from the C library, there's no guarantee that the syscall has the same name as the wrapper. For example, if it tried to figure out the syscall number of open() by calling the libc wrapper of the same name and checking which syscall number was used, it would incorrectly conclude (on x86-64 with a recent glibc) that open is syscall 257, when it is in fact syscall 2. This is because that wrapper function actually calls openat(), not open(). A short shell session demonstrates this:

$ cat open.c
#include <fcntl.h>

int main(void)
{
    open("/dev/null", O_RDONLY);
}
$ gcc open.c
$ strace -e trace=%file -P "/dev/null" ./a.out
openat(AT_FDCWD, "/dev/null", O_RDONLY) = 3
+++ exited with 0 +++

Now, you could do this using syscall(SYS_open, "/dev/null", O_RDONLY) instead, but then you're relying on a constant defined in a header again, so why not cut out the middle-man and build strace with the syscall list from the C headers in the first place? That's what strace does.

forest