1. 02 8月, 2017 1 次提交
  2. 08 7月, 2017 1 次提交
  3. 24 6月, 2017 1 次提交
    • K
      fs/exec.c: account for argv/envp pointers · 98da7d08
      Kees Cook 提交于
      When limiting the argv/envp strings during exec to 1/4 of the stack limit,
      the storage of the pointers to the strings was not included.  This means
      that an exec with huge numbers of tiny strings could eat 1/4 of the stack
      limit in strings and then additional space would be later used by the
      pointers to the strings.
      
      For example, on 32-bit with a 8MB stack rlimit, an exec with 1677721
      single-byte strings would consume less than 2MB of stack, the max (8MB /
      4) amount allowed, but the pointers to the strings would consume the
      remaining additional stack space (1677721 * 4 == 6710884).
      
      The result (1677721 + 6710884 == 8388605) would exhaust stack space
      entirely.  Controlling this stack exhaustion could result in
      pathological behavior in setuid binaries (CVE-2017-1000365).
      
      [akpm@linux-foundation.org: additional commenting from Kees]
      Fixes: b6a2fea3 ("mm: variable length argument support")
      Link: http://lkml.kernel.org/r/20170622001720.GA32173@beastSigned-off-by: NKees Cook <keescook@chromium.org>
      Acked-by: NRik van Riel <riel@redhat.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Qualys Security Advisory <qsa@qualys.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      98da7d08
  4. 20 3月, 2017 1 次提交
    • K
      x86/arch_prctl: Add ARCH_[GET|SET]_CPUID · e9ea1e7f
      Kyle Huey 提交于
      Intel supports faulting on the CPUID instruction beginning with Ivy Bridge.
      When enabled, the processor will fault on attempts to execute the CPUID
      instruction with CPL>0. Exposing this feature to userspace will allow a
      ptracer to trap and emulate the CPUID instruction.
      
      When supported, this feature is controlled by toggling bit 0 of
      MSR_MISC_FEATURES_ENABLES. It is documented in detail in Section 2.3.2 of
      https://bugzilla.kernel.org/attachment.cgi?id=243991
      
      Implement a new pair of arch_prctls, available on both x86-32 and x86-64.
      
      ARCH_GET_CPUID: Returns the current CPUID state, either 0 if CPUID faulting
          is enabled (and thus the CPUID instruction is not available) or 1 if
          CPUID faulting is not enabled.
      
      ARCH_SET_CPUID: Set the CPUID state to the second argument. If
          cpuid_enabled is 0 CPUID faulting will be activated, otherwise it will
          be deactivated. Returns ENODEV if CPUID faulting is not supported on
          this system.
      
      The state of the CPUID faulting flag is propagated across forks, but reset
      upon exec.
      Signed-off-by: NKyle Huey <khuey@kylehuey.com>
      Cc: Grzegorz Andrejczuk <grzegorz.andrejczuk@intel.com>
      Cc: kvm@vger.kernel.org
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: linux-kselftest@vger.kernel.org
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: Robert O'Callahan <robert@ocallahan.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: user-mode-linux-devel@lists.sourceforge.net
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: user-mode-linux-user@lists.sourceforge.net
      Cc: David Matlack <dmatlack@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Dmitry Safonov <dsafonov@virtuozzo.com>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Link: http://lkml.kernel.org/r/20170320081628.18952-9-khuey@kylehuey.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
      e9ea1e7f
  5. 02 3月, 2017 6 次提交
  6. 14 2月, 2017 1 次提交
    • V
      vfs: Use upper filesystem inode in bprm_fill_uid() · fea6d2a6
      Vivek Goyal 提交于
      Right now bprm_fill_uid() uses inode fetched from file_inode(bprm->file).
      This in turn returns inode of lower filesystem (in a stacked filesystem
      setup).
      
      I was playing with modified patches of shiftfs posted by james bottomley
      and realized that through shiftfs setuid bit does not take effect. And
      reason being that we fetch uid/gid from inode of lower fs (and not from
      shiftfs inode). And that results in following checks failing.
      
      /* We ignore suid/sgid if there are no mappings for them in the ns */
      if (!kuid_has_mapping(bprm->cred->user_ns, uid) ||
          !kgid_has_mapping(bprm->cred->user_ns, gid))
      	return;
      
      uid/gid fetched from lower fs inode might not be mapped inside the user
      namespace of container. So we need to look at uid/gid fetched from
      upper filesystem (shiftfs in this particular case) and these should be
      mapped and setuid bit can take affect.
      Signed-off-by: NVivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      fea6d2a6
  7. 24 1月, 2017 1 次提交
  8. 25 12月, 2016 1 次提交
  9. 23 12月, 2016 1 次提交
    • A
      fs: exec: apply CLOEXEC before changing dumpable task flags · 613cc2b6
      Aleksa Sarai 提交于
      If you have a process that has set itself to be non-dumpable, and it
      then undergoes exec(2), any CLOEXEC file descriptors it has open are
      "exposed" during a race window between the dumpable flags of the process
      being reset for exec(2) and CLOEXEC being applied to the file
      descriptors. This can be exploited by a process by attempting to access
      /proc/<pid>/fd/... during this window, without requiring CAP_SYS_PTRACE.
      
      The race in question is after set_dumpable has been (for get_link,
      though the trace is basically the same for readlink):
      
      [vfs]
      -> proc_pid_link_inode_operations.get_link
         -> proc_pid_get_link
            -> proc_fd_access_allowed
               -> ptrace_may_access(task, PTRACE_MODE_READ_FSCREDS);
      
      Which will return 0, during the race window and CLOEXEC file descriptors
      will still be open during this window because do_close_on_exec has not
      been called yet. As a result, the ordering of these calls should be
      reversed to avoid this race window.
      
      This is of particular concern to container runtimes, where joining a
      PID namespace with file descriptors referring to the host filesystem
      can result in security issues (since PRCTL_SET_DUMPABLE doesn't protect
      against access of CLOEXEC file descriptors -- file descriptors which may
      reference filesystem objects the container shouldn't have access to).
      
      Cc: dev@opencontainers.org
      Cc: <stable@vger.kernel.org> # v3.2+
      Reported-by: NMichael Crosby <crosbymichael@gmail.com>
      Signed-off-by: NAleksa Sarai <asarai@suse.de>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      613cc2b6
  10. 15 12月, 2016 1 次提交
    • L
      mm: add locked parameter to get_user_pages_remote() · 5b56d49f
      Lorenzo Stoakes 提交于
      Patch series "mm: unexport __get_user_pages_unlocked()".
      
      This patch series continues the cleanup of get_user_pages*() functions
      taking advantage of the fact we can now pass gup_flags as we please.
      
      It firstly adds an additional 'locked' parameter to
      get_user_pages_remote() to allow for its callers to utilise
      VM_FAULT_RETRY functionality.  This is necessary as the invocation of
      __get_user_pages_unlocked() in process_vm_rw_single_vec() makes use of
      this and no other existing higher level function would allow it to do
      so.
      
      Secondly existing callers of __get_user_pages_unlocked() are replaced
      with the appropriate higher-level replacement -
      get_user_pages_unlocked() if the current task and memory descriptor are
      referenced, or get_user_pages_remote() if other task/memory descriptors
      are referenced (having acquiring mmap_sem.)
      
      This patch (of 2):
      
      Add a int *locked parameter to get_user_pages_remote() to allow
      VM_FAULT_RETRY faulting behaviour similar to get_user_pages_[un]locked().
      
      Taking into account the previous adjustments to get_user_pages*()
      functions allowing for the passing of gup_flags, we are now in a
      position where __get_user_pages_unlocked() need only be exported for his
      ability to allow VM_FAULT_RETRY behaviour, this adjustment allows us to
      subsequently unexport __get_user_pages_unlocked() as well as allowing
      for future flexibility in the use of get_user_pages_remote().
      
      [sfr@canb.auug.org.au: merge fix for get_user_pages_remote API change]
        Link: http://lkml.kernel.org/r/20161122210511.024ec341@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20161027095141.2569-2-lstoakes@gmail.comSigned-off-by: NLorenzo Stoakes <lstoakes@gmail.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Radim Krcmar <rkrcmar@redhat.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b56d49f
  11. 23 11月, 2016 2 次提交
    • E
      exec: Ensure mm->user_ns contains the execed files · f84df2a6
      Eric W. Biederman 提交于
      When the user namespace support was merged the need to prevent
      ptrace from revealing the contents of an unreadable executable
      was overlooked.
      
      Correct this oversight by ensuring that the executed file
      or files are in mm->user_ns, by adjusting mm->user_ns.
      
      Use the new function privileged_wrt_inode_uidgid to see if
      the executable is a member of the user namespace, and as such
      if having CAP_SYS_PTRACE in the user namespace should allow
      tracing the executable.  If not update mm->user_ns to
      the parent user namespace until an appropriate parent is found.
      
      Cc: stable@vger.kernel.org
      Reported-by: NJann Horn <jann@thejh.net>
      Fixes: 9e4a36ec ("userns: Fail exec for suid and sgid binaries with ids outside our user namespace.")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      f84df2a6
    • E
      ptrace: Capture the ptracer's creds not PT_PTRACE_CAP · 64b875f7
      Eric W. Biederman 提交于
      When the flag PT_PTRACE_CAP was added the PTRACE_TRACEME path was
      overlooked.  This can result in incorrect behavior when an application
      like strace traces an exec of a setuid executable.
      
      Further PT_PTRACE_CAP does not have enough information for making good
      security decisions as it does not report which user namespace the
      capability is in.  This has already allowed one mistake through
      insufficient granulariy.
      
      I found this issue when I was testing another corner case of exec and
      discovered that I could not get strace to set PT_PTRACE_CAP even when
      running strace as root with a full set of caps.
      
      This change fixes the above issue with strace allowing stracing as
      root a setuid executable without disabling setuid.  More fundamentaly
      this change allows what is allowable at all times, by using the correct
      information in it's decision.
      
      Cc: stable@vger.kernel.org
      Fixes: 4214e42f96d4 ("v2.4.9.11 -> v2.4.9.12")
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      64b875f7
  12. 16 11月, 2016 1 次提交
  13. 19 10月, 2016 1 次提交
  14. 03 8月, 2016 1 次提交
    • S
      firmware: support loading into a pre-allocated buffer · a098ecd2
      Stephen Boyd 提交于
      Some systems are memory constrained but they need to load very large
      firmwares.  The firmware subsystem allows drivers to request this
      firmware be loaded from the filesystem, but this requires that the
      entire firmware be loaded into kernel memory first before it's provided
      to the driver.  This can lead to a situation where we map the firmware
      twice, once to load the firmware into kernel memory and once to copy the
      firmware into the final resting place.
      
      This creates needless memory pressure and delays loading because we have
      to copy from kernel memory to somewhere else.  Let's add a
      request_firmware_into_buf() API that allows drivers to request firmware
      be loaded directly into a pre-allocated buffer.  This skips the
      intermediate step of allocating a buffer in kernel memory to hold the
      firmware image while it's read from the filesystem.  It also requires
      that drivers know how much memory they'll require before requesting the
      firmware and negates any benefits of firmware caching because the
      firmware layer doesn't manage the buffer lifetime.
      
      For a 16MB buffer, about half the time is spent performing a memcpy from
      the buffer to the final resting place.  I see loading times go from
      0.081171 seconds to 0.047696 seconds after applying this patch.  Plus
      the vmalloc pressure is reduced.
      
      This is based on a patch from Vikram Mulukutla on codeaurora.org:
        https://www.codeaurora.org/cgit/quic/la/kernel/msm-3.18/commit/drivers/base/firmware_class.c?h=rel/msm-3.18&id=0a328c5f6cd999f5c591f172216835636f39bcb5
      
      Link: http://lkml.kernel.org/r/20160607164741.31849-4-stephen.boyd@linaro.orgSigned-off-by: NStephen Boyd <stephen.boyd@linaro.org>
      Cc: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Vikram Mulukutla <markivx@codeaurora.org>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Ming Lei <ming.lei@canonical.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a098ecd2
  15. 25 7月, 2016 1 次提交
  16. 24 6月, 2016 1 次提交
    • A
      fs: Treat foreign mounts as nosuid · 380cf5ba
      Andy Lutomirski 提交于
      If a process gets access to a mount from a different user
      namespace, that process should not be able to take advantage of
      setuid files or selinux entrypoints from that filesystem.  Prevent
      this by treating mounts from other mount namespaces and those not
      owned by current_user_ns() or an ancestor as nosuid.
      
      This will make it safer to allow more complex filesystems to be
      mounted in non-root user namespaces.
      
      This does not remove the need for MNT_LOCK_NOSUID.  The setuid,
      setgid, and file capability bits can no longer be abused if code in
      a user namespace were to clear nosuid on an untrusted filesystem,
      but this patch, by itself, is insufficient to protect the system
      from abuse of files that, when execed, would increase MAC privilege.
      
      As a more concrete explanation, any task that can manipulate a
      vfsmount associated with a given user namespace already has
      capabilities in that namespace and all of its descendents.  If they
      can cause a malicious setuid, setgid, or file-caps executable to
      appear in that mount, then that executable will only allow them to
      elevate privileges in exactly the set of namespaces in which they
      are already privileges.
      
      On the other hand, if they can cause a malicious executable to
      appear with a dangerous MAC label, running it could change the
      caller's security context in a way that should not have been
      possible, even inside the namespace in which the task is confined.
      
      As a hardening measure, this would have made CVE-2014-5207 much
      more difficult to exploit.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Signed-off-by: NSeth Forshee <seth.forshee@canonical.com>
      Acked-by: NJames Morris <james.l.morris@oracle.com>
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      380cf5ba
  17. 24 5月, 2016 2 次提交
    • M
      exec: make exec path waiting for mmap_sem killable · f268dfe9
      Michal Hocko 提交于
      setup_arg_pages requires mmap_sem for write.  If the waiting task gets
      killed by the oom killer it would block oom_reaper from asynchronous
      address space reclaim and reduce the chances of timely OOM resolving.
      Wait for the lock in the killable mode and return with EINTR if the task
      got killed while waiting.  All the callers are already handling error
      path and the fatal signal doesn't need any additional treatment.
      
      The same applies to __bprm_mm_init.
      Signed-off-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f268dfe9
    • O
      exec: remove the no longer needed remove_arg_zero()->free_arg_page() · 9eb8a659
      Oleg Nesterov 提交于
      remove_arg_zero() does free_arg_page() for no reason.  This was needed
      before and only if CONFIG_MMU=y: see commit 4fc75ff4 ("exec: fix
      remove_arg_zero"), install_arg_page() was called for every page != NULL
      in bprm->page[] array.  Today install_arg_page() has already gone and
      free_arg_page() is nop after another commit b6a2fea3 ("mm: variable
      length argument support").
      
      CONFIG_MMU=n does free_arg_pages() in free_bprm() and thus it doesn't
      need remove_arg_zero()->free_arg_page() too; apart from get_arg_page()
      it never checks if the page in bprm->page[] was allocated or not, so the
      "extra" non-freed page is fine.  OTOH, this free_arg_page() can add the
      minor pessimization, the caller is going to do copy_strings_kernel()
      right after remove_arg_zero() which will likely need to re-allocate the
      same page again.
      
      And as Hujunjie pointed out, the "offset == PAGE_SIZE" check is wrong
      because we are going to increment bprm->p once again before return, so
      CONFIG_MMU=n "leaks" the page anyway if '0' is the final byte in this
      page.
      
      NOTE: remove_arg_zero() assumes that argv[0] is null-terminated but this
      is not necessarily true.  copy_strings() does "len = strnlen_user(...)",
      then copy_from_user(len) but another thread or debuger can overwrite the
      trailing '0' in between.  Afaics nothing really bad can happen because
      we must always have the null-terminated bprm->filename copied by the 1st
      copy_strings_kernel(), but perhaps we should change this code to check
      "bprm->p < bprm->exec" anyway, and/or change copy_strings() to ensure
      that the last byte in string is always zero.
      
      Link: http://lkml.kernel.org/r/20160517155335.GA31435@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com>
      Reported by: hujunjie <jj.net@163.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9eb8a659
  18. 18 5月, 2016 1 次提交
  19. 01 5月, 2016 1 次提交
  20. 21 2月, 2016 3 次提交
  21. 19 2月, 2016 2 次提交
  22. 16 2月, 2016 1 次提交
    • D
      mm/gup: Introduce get_user_pages_remote() · 1e987790
      Dave Hansen 提交于
      For protection keys, we need to understand whether protections
      should be enforced in software or not.  In general, we enforce
      protections when working on our own task, but not when on others.
      We call these "current" and "remote" operations.
      
      This patch introduces a new get_user_pages() variant:
      
              get_user_pages_remote()
      
      Which is a replacement for when get_user_pages() is called on
      non-current tsk/mm.
      
      We also introduce a new gup flag: FOLL_REMOTE which can be used
      for the "__" gup variants to get this new behavior.
      
      The uprobes is_trap_at_addr() location holds mmap_sem and
      calls get_user_pages(current->mm) on an instruction address.  This
      makes it a pretty unique gup caller.  Being an instruction access
      and also really originating from the kernel (vs. the app), I opted
      to consider this a 'remote' access where protection keys will not
      be enforced.
      
      Without protection keys, this patch should not change any behavior.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dave Hansen <dave@sr71.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: jack@suse.cz
      Cc: linux-mm@kvack.org
      Link: http://lkml.kernel.org/r/20160212210154.3F0E51EA@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      1e987790
  23. 23 1月, 2016 1 次提交
    • A
      wrappers for ->i_mutex access · 5955102c
      Al Viro 提交于
      parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested},
      inode_foo(inode) being mutex_foo(&inode->i_mutex).
      
      Please, use those for access to ->i_mutex; over the coming cycle
      ->i_mutex will become rwsem, with ->lookup() done with it held
      only shared.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      5955102c
  24. 04 1月, 2016 1 次提交
  25. 10 7月, 2015 1 次提交
    • E
      vfs: Commit to never having exectuables on proc and sysfs. · 90f8572b
      Eric W. Biederman 提交于
      Today proc and sysfs do not contain any executable files.  Several
      applications today mount proc or sysfs without noexec and nosuid and
      then depend on there being no exectuables files on proc or sysfs.
      Having any executable files show on proc or sysfs would cause
      a user space visible regression, and most likely security problems.
      
      Therefore commit to never allowing executables on proc and sysfs by
      adding a new flag to mark them as filesystems without executables and
      enforce that flag.
      
      Test the flag where MNT_NOEXEC is tested today, so that the only user
      visible effect will be that exectuables will be treated as if the
      execute bit is cleared.
      
      The filesystems proc and sysfs do not currently incoporate any
      executable files so this does not result in any user visible effects.
      
      This makes it unnecessary to vet changes to proc and sysfs tightly for
      adding exectuable files or changes to chattr that would modify
      existing files, as no matter what the individual file say they will
      not be treated as exectuable files by the vfs.
      
      Not having to vet changes to closely is important as without this we
      are only one proc_create call (or another goof up in the
      implementation of notify_change) from having problematic executables
      on proc.  Those mistakes are all too easy to make and would create
      a situation where there are security issues or the assumptions of
      some program having to be broken (and cause userspace regressions).
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      90f8572b
  26. 13 5月, 2015 1 次提交
    • H
      parisc,metag: Fix crashes due to stack randomization on stack-grows-upwards architectures · d045c77c
      Helge Deller 提交于
      On architectures where the stack grows upwards (CONFIG_STACK_GROWSUP=y,
      currently parisc and metag only) stack randomization sometimes leads to crashes
      when the stack ulimit is set to lower values than STACK_RND_MASK (which is 8 MB
      by default if not defined in arch-specific headers).
      
      The problem is, that when the stack vm_area_struct is set up in fs/exec.c, the
      additional space needed for the stack randomization (as defined by the value of
      STACK_RND_MASK) was not taken into account yet and as such, when the stack
      randomization code added a random offset to the stack start, the stack
      effectively got smaller than what the user defined via rlimit_max(RLIMIT_STACK)
      which then sometimes leads to out-of-stack situations and crashes.
      
      This patch fixes it by adding the maximum possible amount of memory (based on
      STACK_RND_MASK) which theoretically could be added by the stack randomization
      code to the initial stack size. That way, the user-defined stack size is always
      guaranteed to be at minimum what is defined via rlimit_max(RLIMIT_STACK).
      
      This bug is currently not visible on the metag architecture, because on metag
      STACK_RND_MASK is defined to 0 which effectively disables stack randomization.
      
      The changes to fs/exec.c are inside an "#ifdef CONFIG_STACK_GROWSUP"
      section, so it does not affect other platformws beside those where the
      stack grows upwards (parisc and metag).
      Signed-off-by: NHelge Deller <deller@gmx.de>
      Cc: linux-parisc@vger.kernel.org
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: linux-metag@vger.kernel.org
      Cc: stable@vger.kernel.org # v3.16+
      d045c77c
  27. 20 4月, 2015 1 次提交
  28. 17 4月, 2015 2 次提交
  29. 23 1月, 2015 1 次提交
    • P
      fs: create proper filename objects using getname_kernel() · 51689104
      Paul Moore 提交于
      There are several areas in the kernel that create temporary filename
      objects using the following pattern:
      
      	int func(const char *name)
      	{
      		struct filename *file = { .name = name };
      		...
      		return 0;
      	}
      
      ... which for the most part works okay, but it causes havoc within the
      audit subsystem as the filename object does not persist beyond the
      lifetime of the function.  This patch converts all of these temporary
      filename objects into proper filename objects using getname_kernel()
      and putname() which ensure that the filename object persists until the
      audit subsystem is finished with it.
      
      Also, a special thanks to Al Viro, Guenter Roeck, and Sabrina Dubroca
      for helping resolve a difficult kernel panic on boot related to a
      use-after-free problem in kern_path_create(); the thread can be seen
      at the link below:
      
       * https://lkml.org/lkml/2015/1/20/710
      
      This patch includes code that was either based on, or directly written
      by Al in the above thread.
      
      CC: viro@zeniv.linux.org.uk
      CC: linux@roeck-us.net
      CC: sd@queasysnail.net
      CC: linux-fsdevel@vger.kernel.org
      Signed-off-by: NPaul Moore <pmoore@redhat.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      51689104