• D
    syscalls: implement execveat() system call · 51f39a1f
    David Drysdale 提交于
    This patchset adds execveat(2) for x86, and is derived from Meredydd
    Luff's patch from Sept 2012 (https://lkml.org/lkml/2012/9/11/528).
    
    The primary aim of adding an execveat syscall is to allow an
    implementation of fexecve(3) that does not rely on the /proc filesystem,
    at least for executables (rather than scripts).  The current glibc version
    of fexecve(3) is implemented via /proc, which causes problems in sandboxed
    or otherwise restricted environments.
    
    Given the desire for a /proc-free fexecve() implementation, HPA suggested
    (https://lkml.org/lkml/2006/7/11/556) that an execveat(2) syscall would be
    an appropriate generalization.
    
    Also, having a new syscall means that it can take a flags argument without
    back-compatibility concerns.  The current implementation just defines the
    AT_EMPTY_PATH and AT_SYMLINK_NOFOLLOW flags, but other flags could be
    added in future -- for example, flags for new namespaces (as suggested at
    https://lkml.org/lkml/2006/7/11/474).
    
    Related history:
     - https://lkml.org/lkml/2006/12/27/123 is an example of someone
       realizing that fexecve() is likely to fail in a chroot environment.
     - http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=514043 covered
       documenting the /proc requirement of fexecve(3) in its manpage, to
       "prevent other people from wasting their time".
     - https://bugzilla.redhat.com/show_bug.cgi?id=241609 described a
       problem where a process that did setuid() could not fexecve()
       because it no longer had access to /proc/self/fd; this has since
       been fixed.
    
    This patch (of 4):
    
    Add a new execveat(2) system call.  execveat() is to execve() as openat()
    is to open(): it takes a file descriptor that refers to a directory, and
    resolves the filename relative to that.
    
    In addition, if the filename is empty and AT_EMPTY_PATH is specified,
    execveat() executes the file to which the file descriptor refers.  This
    replicates the functionality of fexecve(), which is a system call in other
    UNIXen, but in Linux glibc it depends on opening "/proc/self/fd/<fd>" (and
    so relies on /proc being mounted).
    
    The filename fed to the executed program as argv[0] (or the name of the
    script fed to a script interpreter) will be of the form "/dev/fd/<fd>"
    (for an empty filename) or "/dev/fd/<fd>/<filename>", effectively
    reflecting how the executable was found.  This does however mean that
    execution of a script in a /proc-less environment won't work; also, script
    execution via an O_CLOEXEC file descriptor fails (as the file will not be
    accessible after exec).
    
    Based on patches by Meredydd Luff.
    Signed-off-by: NDavid Drysdale <drysdale@google.com>
    Cc: Meredydd Luff <meredydd@senatehouse.org>
    Cc: Shuah Khan <shuah.kh@samsung.com>
    Cc: "Eric W. Biederman" <ebiederm@xmission.com>
    Cc: Andy Lutomirski <luto@amacapital.net>
    Cc: Alexander Viro <viro@zeniv.linux.org.uk>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: Ingo Molnar <mingo@redhat.com>
    Cc: "H. Peter Anvin" <hpa@zytor.com>
    Cc: Kees Cook <keescook@chromium.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Rich Felker <dalias@aerifal.cx>
    Cc: Christoph Hellwig <hch@infradead.org>
    Cc: Michael Kerrisk <mtk.manpages@gmail.com>
    Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
    51f39a1f
syscalls.h 37.9 KB