1. 14 9月, 2017 1 次提交
    • M
      mm: treewide: remove GFP_TEMPORARY allocation flag · 0ee931c4
      Michal Hocko 提交于
      GFP_TEMPORARY was introduced by commit e12ba74d ("Group short-lived
      and reclaimable kernel allocations") along with __GFP_RECLAIMABLE.  It's
      primary motivation was to allow users to tell that an allocation is
      short lived and so the allocator can try to place such allocations close
      together and prevent long term fragmentation.  As much as this sounds
      like a reasonable semantic it becomes much less clear when to use the
      highlevel GFP_TEMPORARY allocation flag.  How long is temporary? Can the
      context holding that memory sleep? Can it take locks? It seems there is
      no good answer for those questions.
      
      The current implementation of GFP_TEMPORARY is basically GFP_KERNEL |
      __GFP_RECLAIMABLE which in itself is tricky because basically none of
      the existing caller provide a way to reclaim the allocated memory.  So
      this is rather misleading and hard to evaluate for any benefits.
      
      I have checked some random users and none of them has added the flag
      with a specific justification.  I suspect most of them just copied from
      other existing users and others just thought it might be a good idea to
      use without any measuring.  This suggests that GFP_TEMPORARY just
      motivates for cargo cult usage without any reasoning.
      
      I believe that our gfp flags are quite complex already and especially
      those with highlevel semantic should be clearly defined to prevent from
      confusion and abuse.  Therefore I propose dropping GFP_TEMPORARY and
      replace all existing users to simply use GFP_KERNEL.  Please note that
      SLAB users with shrinkers will still get __GFP_RECLAIMABLE heuristic and
      so they will be placed properly for memory fragmentation prevention.
      
      I can see reasons we might want some gfp flag to reflect shorterm
      allocations but I propose starting from a clear semantic definition and
      only then add users with proper justification.
      
      This was been brought up before LSF this year by Matthew [1] and it
      turned out that GFP_TEMPORARY really doesn't have a clear semantic.  It
      seems to be a heuristic without any measured advantage for most (if not
      all) its current users.  The follow up discussion has revealed that
      opinions on what might be temporary allocation differ a lot between
      developers.  So rather than trying to tweak existing users into a
      semantic which they haven't expected I propose to simply remove the flag
      and start from scratch if we really need a semantic for short term
      allocations.
      
      [1] http://lkml.kernel.org/r/20170118054945.GD18349@bombadil.infradead.org
      
      [akpm@linux-foundation.org: fix typo]
      [akpm@linux-foundation.org: coding-style fixes]
      [sfr@canb.auug.org.au: drm/i915: fix up]
        Link: http://lkml.kernel.org/r/20170816144703.378d4f4d@canb.auug.org.au
      Link: http://lkml.kernel.org/r/20170728091904.14627-1-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NVlastimil Babka <vbabka@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Neil Brown <neilb@suse.de>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0ee931c4
  2. 02 3月, 2017 3 次提交
  3. 15 1月, 2017 1 次提交
    • D
      coredump: Ensure proper size of sparse core files · 4d22c75d
      Dave Kleikamp 提交于
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated and can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      Signed-off-by: NDave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      4d22c75d
  4. 25 12月, 2016 1 次提交
  5. 12 11月, 2016 1 次提交
  6. 08 6月, 2016 1 次提交
  7. 24 5月, 2016 1 次提交
  8. 13 5月, 2016 2 次提交
  9. 23 3月, 2016 1 次提交
    • J
      fs/coredump: prevent fsuid=0 dumps into user-controlled directories · 378c6520
      Jann Horn 提交于
      This commit fixes the following security hole affecting systems where
      all of the following conditions are fulfilled:
      
       - The fs.suid_dumpable sysctl is set to 2.
       - The kernel.core_pattern sysctl's value starts with "/". (Systems
         where kernel.core_pattern starts with "|/" are not affected.)
       - Unprivileged user namespace creation is permitted. (This is
         true on Linux >=3.8, but some distributions disallow it by
         default using a distro patch.)
      
      Under these conditions, if a program executes under secure exec rules,
      causing it to run with the SUID_DUMP_ROOT flag, then unshares its user
      namespace, changes its root directory and crashes, the coredump will be
      written using fsuid=0 and a path derived from kernel.core_pattern - but
      this path is interpreted relative to the root directory of the process,
      allowing the attacker to control where a coredump will be written with
      root privileges.
      
      To fix the security issue, always interpret core_pattern for dumps that
      are written under SUID_DUMP_ROOT relative to the root directory of init.
      Signed-off-by: NJann Horn <jann@thejh.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      378c6520
  10. 21 1月, 2016 1 次提交
    • J
      fs/coredump: prevent "" / "." / ".." core path components · ac94b6e3
      Jann Horn 提交于
      Let %h and %e print empty values as "!", "." as "!" and
      ".." as "!.".
      
      This prevents hostnames and comm values that are empty or consist of one
      or two dots from changing the directory level at which the corefile will
      be stored.
      
      Consider the case where someone decides to sort coredumps by hostname
      with a core pattern like "/cores/%h/core.%e.%p.%t" or so.  In this
      case, hostnames "" and "." would cause the coredump to land directly in
      /cores, which is not what the intent behind the core pattern is, and
      ".." would cause the coredump to land in /.
      
      Yeah, there probably aren't many people who do that, but I still don't
      want this edgecase to be kind of broken.
      
      It seems very unlikely that this caused security issues anywhere, so I'm
      not requesting a stable backport.
      
      [akpm@linux-foundation.org: tweak code comment]
      Signed-off-by: NJann Horn <jann@thejh.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac94b6e3
  11. 07 12月, 2015 1 次提交
  12. 07 11月, 2015 2 次提交
  13. 11 9月, 2015 2 次提交
    • J
      fs: Don't dump core if the corefile would become world-readable. · 40f705a7
      Jann Horn 提交于
      On a filesystem like vfat, all files are created with the same owner
      and mode independent of who created the file. When a vfat filesystem
      is mounted with root as owner of all files and read access for everyone,
      root's processes left world-readable coredumps on it (but other
      users' processes only left empty corefiles when given write access
      because of the uid mismatch).
      
      Given that the old behavior was inconsistent and insecure, I don't see
      a problem with changing it. Now, all processes refuse to dump core unless
      the resulting corefile will only be readable by their owner.
      Signed-off-by: NJann Horn <jann@thejh.net>
      Acked-by: NKees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      40f705a7
    • J
      fs: if a coredump already exists, unlink and recreate with O_EXCL · fbb18169
      Jann Horn 提交于
      It was possible for an attacking user to trick root (or another user) into
      writing his coredumps into an attacker-readable, pre-existing file using
      rename() or link(), causing the disclosure of secret data from the victim
      process' virtual memory.  Depending on the configuration, it was also
      possible to trick root into overwriting system files with coredumps.  Fix
      that issue by never writing coredumps into existing files.
      
      Requirements for the attack:
       - The attack only applies if the victim's process has a nonzero
         RLIMIT_CORE and is dumpable.
       - The attacker can trick the victim into coredumping into an
         attacker-writable directory D, either because the core_pattern is
         relative and the victim's cwd is attacker-writable or because an
         absolute core_pattern pointing to a world-writable directory is used.
       - The attacker has one of these:
        A: on a system with protected_hardlinks=0:
           execute access to a folder containing a victim-owned,
           attacker-readable file on the same partition as D, and the
           victim-owned file will be deleted before the main part of the attack
           takes place. (In practice, there are lots of files that fulfill
           this condition, e.g. entries in Debian's /var/lib/dpkg/info/.)
           This does not apply to most Linux systems because most distros set
           protected_hardlinks=1.
        B: on a system with protected_hardlinks=1:
           execute access to a folder containing a victim-owned,
           attacker-readable and attacker-writable file on the same partition
           as D, and the victim-owned file will be deleted before the main part
           of the attack takes place.
           (This seems to be uncommon.)
        C: on any system, independent of protected_hardlinks:
           write access to a non-sticky folder containing a victim-owned,
           attacker-readable file on the same partition as D
           (This seems to be uncommon.)
      
      The basic idea is that the attacker moves the victim-owned file to where
      he expects the victim process to dump its core.  The victim process dumps
      its core into the existing file, and the attacker reads the coredump from
      it.
      
      If the attacker can't move the file because he does not have write access
      to the containing directory, he can instead link the file to a directory
      he controls, then wait for the original link to the file to be deleted
      (because the kernel checks that the link count of the corefile is 1).
      
      A less reliable variant that requires D to be non-sticky works with link()
      and does not require deletion of the original link: link() the file into
      D, but then unlink() it directly before the kernel performs the link count
      check.
      
      On systems with protected_hardlinks=0, this variant allows an attacker to
      not only gain information from coredumps, but also clobber existing,
      victim-writable files with coredumps.  (This could theoretically lead to a
      privilege escalation.)
      Signed-off-by: NJann Horn <jann@thejh.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fbb18169
  14. 26 6月, 2015 2 次提交
  15. 24 6月, 2015 1 次提交
  16. 12 4月, 2015 1 次提交
  17. 07 3月, 2015 1 次提交
  18. 20 2月, 2015 1 次提交
  19. 14 10月, 2014 1 次提交
    • O
      coredump: add %i/%I in core_pattern to report the tid of the crashed thread · b03023ec
      Oleg Nesterov 提交于
      format_corename() can only pass the leader's pid to the core handler,
      but there is no simple way to figure out which thread originated the
      coredump.
      
      As Jan explains, this also means that there is no simple way to create
      the backtrace of the crashed process:
      
      As programs are mostly compiled with implicit gcc -fomit-frame-pointer
      one needs program's .eh_frame section (equivalently PT_GNU_EH_FRAME
      segment) or .debug_frame section.  .debug_frame usually is present only
      in separate debug info files usually not even installed on the system.
      While .eh_frame is a part of the executable/library (and it is even
      always mapped for C++ exceptions unwinding) it no longer has to be
      present anywhere on the disk as the program could be upgraded in the
      meantime and the running instance has its executable file already
      unlinked from disk.
      
      One possibility is to echo 0x3f >/proc/*/coredump_filter and dump all
      the file-backed memory including the executable's .eh_frame section.
      But that can create huge core files, for example even due to mmapped
      data files.
      
      Other possibility would be to read .eh_frame from /proc/PID/mem at the
      core_pattern handler time of the core dump.  For the backtrace one needs
      to read the register state first which can be done from core_pattern
      handler:
      
          ptrace(PTRACE_SEIZE, tid, 0, PTRACE_O_TRACEEXIT)
          close(0);    // close pipe fd to resume the sleeping dumper
          waitpid();   // should report EXIT
          PTRACE_GETREGS or other requests
      
      The remaining problem is how to get the 'tid' value of the crashed
      thread.  It could be read from the first NT_PRSTATUS note of the core
      file but that makes the core_pattern handler complicated.
      
      Unfortunately %t is already used so this patch uses %i/%I.
      
      Automatic Bug Reporting Tool (https://github.com/abrt/abrt/wiki/overview)
      is experimenting with this.  It is using the elfutils
      (https://fedorahosted.org/elfutils/) unwinder for generating the
      backtraces.  Apart from not needing matching executables as mentioned
      above, another advantage is that we can get the backtrace without saving
      the core (which might be quite large) to disk.
      
      [mmilata@redhat.com: final paragraph of changelog]
      Signed-off-by: NJan Kratochvil <jan.kratochvil@redhat.com>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jan Kratochvil <jan.kratochvil@redhat.com>
      Cc: Mark Wielaard <mjw@redhat.com>
      Cc: Martin Milata <mmilata@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b03023ec
  20. 24 7月, 2014 1 次提交
  21. 20 4月, 2014 1 次提交
    • E
      coredump: fix va_list corruption · 404ca80e
      Eric Dumazet 提交于
      A va_list needs to be copied in case it needs to be used twice.
      
      Thanks to Hugh for debugging this issue, leading to various panics.
      
      Tested:
      
        lpq84:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
      
      'produce_core' is simply : main() { *(int *)0 = 1;}
      
        lpq84:~# ./produce_core
        Segmentation fault (core dumped)
        lpq84:~# dmesg | tail -1
        [  614.352947] Core dump to |/foobar12345 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 lpq84 (null) pipe failed
      
      Notice the last argument was replaced by a NULL (we were lucky enough to
      not crash, but do not try this on your production machine !)
      
      After fix :
      
        lpq83:~# echo "|/foobar12345 %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h %h" >/proc/sys/kernel/core_pattern
        lpq83:~# ./produce_core
        Segmentation fault
        lpq83:~# dmesg | tail -1
        [  740.800441] Core dump to |/foobar12345 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 lpq83 pipe failed
      
      Fixes: 5fe9d8ca ("coredump: cn_vprintf() has no reason to call vsnprintf() twice")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Diagnosed-by: NHugh Dickins <hughd@google.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org # 3.11+
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      404ca80e
  22. 24 1月, 2014 1 次提交
  23. 16 11月, 2013 2 次提交
  24. 09 11月, 2013 5 次提交
  25. 25 10月, 2013 1 次提交
  26. 12 9月, 2013 1 次提交
  27. 04 7月, 2013 3 次提交