1. 29 Mar, 2023 3 commits
  2. 05 Jul, 2022 1 commit
  3. 15 Nov, 2021 1 commit
  4. 30 Oct, 2020 1 commit
  5. 19 Oct, 2020 1 commit
    • binfmt_elf: take the mmap lock around find_extend_vma() · b2767d97
      Jann Horn committed
      create_elf_tables() runs after setup_new_exec(), so other tasks can
      already access our new mm and do things like process_madvise() on it.  (At
      the time I'm writing this commit, process_madvise() is not in mainline
      yet, but has been in akpm's tree for some time.)
      
      While I believe that there are currently no APIs that would actually allow
      another process to mess up our VMA tree (process_madvise() is limited to
      MADV_COLD and MADV_PAGEOUT, and uring and userfaultfd cannot reach an mm
      under which no syscalls have been executed yet), this seems like an
      accident waiting to happen.
      
      Let's make sure that we always take the mmap lock around GUP paths as long
      as another process might be able to see the mm.
      
      (Yes, this diff looks suspicious because we drop the lock before doing
      anything with `vma`, but that's because we actually don't do anything with
      it apart from the NULL check.)
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Michel Lespinasse <walken@google.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Sakari Ailus <sakari.ailus@linux.intel.com>
      Link: https://lkml.kernel.org/r/CAG48ez1-PBCdv3y8pn-Ty-b+FmBSLwDuVKFSt8h7wARLy0dF-Q@mail.gmail.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
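The locking rule in this commit can be illustrated with a small userspace analogy. This is only a sketch under stated assumptions: `struct region`, `struct addr_space` and `addr_is_mapped()` are hypothetical names, a pthread mutex stands in for the mmap lock, and stack expansion (which find_extend_vma() also performs) is not modeled.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a VMA: one [start, end) address range. */
struct region {
    unsigned long start;
    unsigned long end;
    struct region *next;
};

/* Hypothetical stand-in for an mm whose region list is guarded by a lock. */
struct addr_space {
    pthread_mutex_t lock;
    struct region *head;
};

/*
 * Report whether some region covers addr. The list is walked only while
 * the lock is held, and no region pointer escapes the locked section, so
 * dropping the lock before the caller acts on the answer is harmless.
 */
bool addr_is_mapped(struct addr_space *as, unsigned long addr)
{
    bool found = false;

    pthread_mutex_lock(&as->lock);
    for (struct region *r = as->head; r != NULL; r = r->next) {
        if (addr >= r->start && addr < r->end) {
            found = true;
            break;
        }
    }
    pthread_mutex_unlock(&as->lock);

    return found;
}
```

Returning a boolean instead of the region pointer mirrors the remark in the commit message: once the lock is dropped, the only thing still used is the result of the NULL check.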
  6. 17 Oct, 2020 4 commits
    • binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot · a07279c9
      Jann Horn committed
      In both binfmt_elf and binfmt_elf_fdpic, use a new helper
      dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
      VMA, if we have one) while protected by the mmap_lock, and then use that
      snapshot instead of walking the VMA list without locking.
      
      An alternative approach would be to keep the mmap_lock held across the
      entire core dumping operation; however, keeping the mmap_lock locked while
      we may be blocked for an unbounded amount of time (e.g.  because we're
      dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock
      blocks things like the ->release handler of userfaultfd, and we don't
      really want critical system daemons to grind to a halt just because
      someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or
      something like that.
      
      Since both the normal ELF code and the FDPIC ELF code need this
      functionality (and if any other binfmt wants to add coredump support in
      the future, they'd probably need it, too), implement this with a common
      helper in fs/coredump.c.
      
      A downside of this approach is that we now need a bigger amount of kernel
      memory per userspace VMA in the normal ELF case, and that we need O(n)
      kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
      be terribly bad.
      
      There currently is a data race between stack expansion and anything that
      reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to
      mitigate that for core dumping, take the mmap_lock in write mode when
      taking a snapshot of the VMA hierarchy.  (If we only took the mmap_lock in
      read mode, we could end up with a corrupted core dump if someone does
      get_user_pages_remote() concurrently.  Not really a major problem, but
      taking the mmap_lock either way works here, so we might as well avoid the
      issue.) (This doesn't do anything about the existing data races with stack
      expansion in other mm code.)
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
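The snapshot idea described in this commit can be sketched in userspace as follows. This is not the kernel's dump_vma_snapshot(); every name here (`struct region`, `struct addr_space`, `struct region_meta`, `snapshot_regions()`) is hypothetical, and a pthread mutex stands in for taking the mmap_lock in write mode.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a VMA; only the fields a dump would need. */
struct region {
    unsigned long start, end, flags;
    struct region *next;
};

/* Hypothetical stand-in for an mm: a locked list plus its length. */
struct addr_space {
    pthread_mutex_t lock;
    struct region *head;
    size_t count;
};

/* Private per-region copy that stays valid after the lock is dropped. */
struct region_meta {
    unsigned long start, end, flags;
};

/*
 * Copy the region list into a freshly allocated array while holding the
 * lock, then release it. Returns 0 on success, -1 if allocation fails.
 * The caller owns *out and must free() it.
 */
int snapshot_regions(struct addr_space *as, struct region_meta **out,
                     size_t *out_count)
{
    struct region_meta *snap;
    size_t i = 0;

    pthread_mutex_lock(&as->lock);
    snap = calloc(as->count ? as->count : 1, sizeof(*snap));
    if (snap == NULL) {
        pthread_mutex_unlock(&as->lock);
        return -1;
    }
    for (struct region *r = as->head; r != NULL && i < as->count;
         r = r->next, i++) {
        snap[i].start = r->start;
        snap[i].end = r->end;
        snap[i].flags = r->flags;
    }
    pthread_mutex_unlock(&as->lock);

    *out = snap;
    *out_count = i;
    return 0;
}
```

The slow part of a dump (writing to a file that may block indefinitely) can then iterate over the private array without holding the lock, at the cost of O(n) extra memory, which is the trade-off the commit message describes.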
    • coredump: rework elf/elf_fdpic vma_dump_size() into common helper · 429a22e7
      Jann Horn committed
      At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly
      different code to figure out which VMAs should be dumped, and if so,
      whether the dump should contain the entire VMA or just its first page.
      
      Eliminate duplicate code by reworking the binfmt_elf version into a
      generic core dumping helper in coredump.c.
      
      As part of that, change the heuristic for detecting executable/library
      header pages to check whether the inode is executable instead of looking
      at the file mode.
      
      This is less problematic in terms of locking because it lets us avoid
      get_user() under the mmap_sem.  (And arguably it looks nicer and makes
      more sense in generic code.)
      
      Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is
      only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA
      has been written to.
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: refactor page range dumping into common helper · afc63a97
      Jann Horn committed
      Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of
      pages into the coredump file.  Extract that logic into a common helper.
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fs/binfmt_elf: use PT_LOAD p_align values for suitable start address · ce81bb25
      Chris Kennelly committed
      Patch series "Selecting Load Addresses According to p_align", v3.
      
      The current ELF loading mechanism provides page-aligned mappings.  This
      can lead to the program being loaded in a way unsuitable for file-backed,
      transparent huge pages when handling PIE executables.
      
      While specifying -z,max-page-size=0x200000 to the linker will generate
      suitably aligned segments for huge pages on x86_64, the executable needs
      to be loaded at a suitably aligned address as well.  This alignment
      requires the binary's cooperation, as distinct segments need to be
      appropriately padded to be eligible for THP.
      
      For binaries built with increased alignment, this limits the number of
      bits usable for ASLR, but provides some randomization over using fixed
      load addresses/non-PIE binaries.
      
      This patch (of 2):
      
      The current ELF loading mechanism provides page-aligned mappings.  This
      can lead to the program being loaded in a way unsuitable for file-backed,
      transparent huge pages when handling PIE executables.
      
      For binaries built with increased alignment, this limits the number of
      bits usable for ASLR, but provides some randomization over using fixed
      load addresses/non-PIE binaries.
      
      Tested by verifying program with -Wl,-z,max-page-size=0x200000 loading.
      
      [akpm@linux-foundation.org: fix max() warning]
      [ckennelly@google.com: augment comment]
      Link: https://lkml.kernel.org/r/20200821233848.3904680-2-ckennelly@google.com
      Signed-off-by: Chris Kennelly <ckennelly@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Song Liu <songliubraving@fb.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Suren Baghdasaryan <surenb@google.com>
      Cc: Sandeep Patil <sspatil@google.com>
      Cc: Fangrui Song <maskray@google.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Link: https://lkml.kernel.org/r/20200820170541.1132271-1-ckennelly@google.com
      Link: https://lkml.kernel.org/r/20200820170541.1132271-2-ckennelly@google.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
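The alignment selection described in this commit might look roughly like the sketch below. It is an illustration under stated assumptions, not the kernel's code: `struct phdr_sketch`, `max_load_alignment()` and `align_load_address()` are hypothetical names, and ignoring p_align values that are not powers of two is a choice made for this sketch.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PT_LOAD 1u /* value of PT_LOAD in the ELF specification */

/* Hypothetical, trimmed-down program header: only the fields used here. */
struct phdr_sketch {
    uint32_t p_type;
    uint64_t p_align;
};

static int is_power_of_two(uint64_t v)
{
    return v != 0 && (v & (v - 1)) == 0;
}

/* Largest power-of-two p_align among the PT_LOAD entries, or 0 if none. */
uint64_t max_load_alignment(const struct phdr_sketch *phdrs, size_t n)
{
    uint64_t best = 0;

    for (size_t i = 0; i < n; i++) {
        if (phdrs[i].p_type != SKETCH_PT_LOAD)
            continue;
        if (!is_power_of_two(phdrs[i].p_align))
            continue; /* assumption: treat odd alignments as invalid */
        if (phdrs[i].p_align > best)
            best = phdrs[i].p_align;
    }
    return best;
}

/* Round a candidate load address down to the requested alignment. */
uint64_t align_load_address(uint64_t addr, uint64_t alignment)
{
    if (alignment == 0)
        return addr;
    return addr & ~(alignment - 1);
}
```

For a binary linked with max-page-size=0x200000, rounding down clears the low 21 address bits, which is 9 bits more than 4 KiB page alignment would clear. That is the loss of ASLR entropy the commit message mentions.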
  7. 28 Jul, 2020 2 commits
    • kill elf_fpxregs_t · 7a896028
      Al Viro committed
      all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not
      been defined on any architecture since 2010
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • introduction of regset ->get() wrappers, switching ELF coredumps to those · b4e9c954
      Al Viro committed
      Two new helpers: given a process and regset, dump into a buffer.
      regset_get() takes a buffer and size, regset_get_alloc() takes size
      and allocates a buffer.
      
      Return value in both cases is the amount of data actually dumped in
      case of success or -E...  on error.
      
      In both cases the size is capped by regset->n * regset->size, so
      ->get() is called with offset 0 and size no more than what regset
      expects.
      
      binfmt_elf.c callers of ->get() are switched to using those; the other
      caller (copy_regset_to_user()) will need some preparations to switch.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
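The contract described in this commit (size capped at n * size, return value is the number of bytes dumped or a negative errno) can be modeled in userspace as below. This is a loose sketch and not the kernel API: `struct regset_sketch`, the `get` callback signature, `fill_demo_regs()` and both `*_sketch` helpers are hypothetical.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical model of a register set: n registers of size bytes each. */
struct regset_sketch {
    unsigned int n;
    unsigned int size;
    /* Assumed contract: writes len bytes, returns len or a negative errno. */
    int (*get)(const struct regset_sketch *rs, void *buf, unsigned int len);
};

/* Example getter for demonstration: fills the buffer with a fixed byte. */
int fill_demo_regs(const struct regset_sketch *rs, void *buf, unsigned int len)
{
    (void)rs;
    memset(buf, 0xAB, len);
    return (int)len;
}

/* Dump into a caller-supplied buffer; the size is capped at n * size. */
int regset_get_sketch(const struct regset_sketch *rs, unsigned int size,
                      void *buf)
{
    unsigned int cap = rs->n * rs->size;

    if (rs->get == NULL)
        return -EOPNOTSUPP;
    if (size > cap)
        size = cap;
    return rs->get(rs, buf, size);
}

/* Same, but allocate the buffer; on success the caller free()s *out. */
int regset_get_alloc_sketch(const struct regset_sketch *rs, unsigned int size,
                            void **out)
{
    unsigned int cap = rs->n * rs->size;
    void *buf;
    int ret;

    if (size > cap)
        size = cap;
    buf = calloc(1, size ? size : 1);
    if (buf == NULL)
        return -ENOMEM;
    ret = regset_get_sketch(rs, size, buf);
    if (ret < 0) {
        free(buf);
        return ret;
    }
    *out = buf;
    return ret;
}
```

Because the cap is applied in the wrapper, a getter written against this contract never sees a request larger than the register set itself, whatever size the caller passes.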
  8. 05 Jun, 2020 1 commit
  9. 04 Jun, 2020 1 commit
  10. 29 May, 2020 1 commit
  11. 21 May, 2020 1 commit
    • exec: Generic execfd support · b8a61c9e
      Eric W. Biederman committed
      Most of the support for passing the file descriptor of an executable
      to an interpreter already lives in the generic code and in binfmt_elf.
      Rework the fields in binfmt_elf that deal with executable file
      descriptor passing to make executable file descriptor passing a first
      class concept.
      
      Move the fd_install from binfmt_misc into begin_new_exec after the new
      creds have been installed.  This means that accessing the file through
      /proc/<pid>/fd/N is able to see the creds for the new executable
      before allowing access to the new executable's files.
      
      Performing the install of the executable's file descriptor after
      the point of no return also means that nothing special needs to
      be done on error.  The exiting of the process will close all
      of its open files.
      
      Move the would_dump from binfmt_misc into begin_new_exec right
      after would_dump is called on the bprm->file.  This makes it
      obvious this case exists and that no nesting of bprm->file is
      currently supported.
      
      In binfmt_misc the movement of fd_install into generic code means
      that its special error exit path is no longer needed.
      
      Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  12. 08 May, 2020 2 commits
  13. 06 May, 2020 2 commits
  14. 08 Apr, 2020 4 commits
  15. 17 Mar, 2020 2 commits
  16. 01 Feb, 2020 8 commits
  17. 05 Dec, 2019 2 commits
  18. 15 Nov, 2019 1 commit
    • y2038: elfcore: Use __kernel_old_timeval for process times · e2bb80d5
      Arnd Bergmann committed
      We store elapsed time for a crashed process in struct elf_prstatus using
      'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
      incompatible with the kernel's idea of timeval since the structure layout
      no longer matches on 32-bit architectures.
      
      This changes the definition of the elf_prstatus structure to use
      __kernel_old_timeval instead, which is hardcoded to the currently used
      binary layout. There is no risk of overflow in y2038 though, because
      the time values are all relative times, and can store up to 68 years
      of process elapsed time.
      
      There is a risk of applications breaking at build time when they
      use the new kernel headers and expect the type to be exactly 'timeval'
      rather than a structure that has the same fields as before. Those
      applications have to be modified to deal with 64-bit time_t anyway.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
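The "68 years" figure in this commit can be checked with a few lines of arithmetic. The sketch below assumes a 32-bit architecture where both fields of the legacy layout are signed 32-bit integers; `struct old_timeval32` and `max_elapsed_years()` are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Hypothetical model of the legacy timeval layout on a 32-bit target. */
struct old_timeval32 {
    int32_t tv_sec;  /* whole seconds of elapsed process time */
    int32_t tv_usec; /* microseconds */
};

/* Seconds in a 365-day year; leap days do not change the whole-year result. */
#define SECONDS_PER_YEAR (365LL * 24 * 60 * 60)

/* Whole years of elapsed time that fit in a signed 32-bit tv_sec. */
long long max_elapsed_years(void)
{
    return (long long)INT32_MAX / SECONDS_PER_YEAR;
}
```

INT32_MAX is 2147483647 seconds and a year is 31536000 seconds, so the quotient is 68. Since these fields hold relative durations rather than absolute dates, the 2038 cutoff itself never comes into play.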
  19. 07 Oct, 2019 1 commit
    • elf: don't use MAP_FIXED_NOREPLACE for elf executable mappings · b212921b
      Linus Torvalds committed
      In commit 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map") we
      changed elf to use MAP_FIXED_NOREPLACE instead of MAP_FIXED for the
      executable mappings.
      
      Then, people reported that it broke some binaries that had overlapping
      segments from the same file, and commit ad55eac7 ("elf: enforce
      MAP_FIXED on overlaying elf segments") re-instated MAP_FIXED for some
      overlaying elf segment cases.  But only some - despite the summary line
      of that commit, it only did it when it also does a temporary brk vma for
      one obvious overlapping case.
      
      Now Russell King reports another overlapping case with old 32-bit x86
      binaries, which doesn't trigger that limited case.  End result: we had
      better just drop MAP_FIXED_NOREPLACE entirely, and go back to MAP_FIXED.
      
      Yes, it's a sign of old binaries generated with old tool-chains, but we
      do pride ourselves on not breaking existing setups.
      
      This still leaves MAP_FIXED_NOREPLACE in place for the load_elf_interp()
      and the old load_elf_library() use-cases, because nobody has reported
      breakage for those. Yet.
      
      Note that in all the cases seen so far, the overlapping elf sections
      seem to be just re-mapping of the same executable with different section
      attributes.  We could possibly introduce a new MAP_FIXED_NOFILECHANGE
      flag or similar, which acts like NOREPLACE, but allows just remapping
      the same executable file using different protection flags.
      
      It's not clear that would make a huge difference to anything, but if
      people really hate that "elf remaps over previous maps" behavior, maybe
      at least a more limited form of remapping would alleviate some concerns.
      
      Alternatively, we should take a look at our elf_map() logic to see if we
      end up not mapping things properly the first time.
      
      In the meantime, this is the minimal "don't do that then" patch while
      people hopefully think about it more.
      Reported-by: Russell King <linux@armlinux.org.uk>
      Fixes: 4ed28639 ("fs, elf: drop MAP_FIXED usage from elf_map")
      Fixes: ad55eac7 ("elf: enforce MAP_FIXED on overlaying elf segments")
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
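The behavioral difference between the two flags is visible from userspace with mmap(2). The sketch below assumes Linux 4.17 or later, where MAP_FIXED_NOREPLACE is honored. `try_overlapping_map()` is a hypothetical helper, and the fallback value for MAP_FIXED_NOREPLACE is the one most architectures use, so treat it as an assumption on older headers.

```c
#define _DEFAULT_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_FIXED_NOREPLACE
#define MAP_FIXED_NOREPLACE 0x100000 /* assumed value for older headers */
#endif

/*
 * Map two pages, then try to map one page on top of the first of them
 * using extra_flags (MAP_FIXED or MAP_FIXED_NOREPLACE).
 * Returns 0 if the second mmap() succeeded, the errno value if it failed,
 * or -1 if the initial mapping could not be set up.
 */
int try_overlapping_map(int extra_flags)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    void *first, *second;
    int result;

    first = mmap(NULL, 2 * page, PROT_READ,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (first == MAP_FAILED)
        return -1;

    second = mmap(first, page, PROT_READ | PROT_WRITE,
                  MAP_PRIVATE | MAP_ANONYMOUS | extra_flags, -1, 0);
    result = (second == MAP_FAILED) ? errno : 0;

    /* Kernels that ignore the flag may place the mapping elsewhere. */
    if (second != MAP_FAILED && second != first)
        munmap(second, page);
    munmap(first, 2 * page);
    return result;
}
```

With MAP_FIXED the second call silently replaces the existing page, which is what old binaries with overlapping segments rely on. With MAP_FIXED_NOREPLACE it fails with EEXIST, which is the breakage this commit reverts for executable mappings.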
  20. 27 Sep, 2019 1 commit