1. 13 December 2016, 2 commits
  2. 10 December 2016, 1 commit
  3. 08 December 2016, 1 commit
  4. 01 December 2016, 2 commits
  5. 25 November 2016, 2 commits
    • locking/selftest: Fix output since KERN_CONT changes · 25139409
      Michael Ellerman authored
      Since the KERN_CONT changes, the locking-selftest output is messed up, e.g.:
      
        ----------------------------------------------------------------------------
                                         | spin |wlock |rlock |mutex | wsem | rsem |
          --------------------------------------------------------------------------
                             A-A deadlock:
          ok  |
          ok  |
          ok  |
          ok  |
          ok  |
          ok  |
      
      Use pr_cont() to get it looking normal again:
      
        ----------------------------------------------------------------------------
                                         | spin |wlock |rlock |mutex | wsem | rsem |
          --------------------------------------------------------------------------
                             A-A deadlock:  ok  |  ok  |  ok  |  ok  |  ok  |  ok  |
      Reported-by: Christian Kujau <lists@nerdbynature.de>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linuxppc-dev@ozlabs.org
      Link: http://lkml.kernel.org/r/1480027528-934-1-git-send-email-mpe@ellerman.id.au
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      25139409
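      A minimal sketch of the pr_cont() pattern described in the entry above (illustrative only, not the selftest code itself): after the KERN_CONT changes, continuation fragments must be emitted with pr_cont(), otherwise each fragment starts its own line.

      #include <linux/printk.h>
      #include <linux/types.h>

      /* Illustrative only: print one row of test results on a single line. */
      static void print_result_row(const char *name, bool ok)
      {
      	pr_info("%26s:", name);				/* starts the line        */
      	pr_cont(" %s  |", ok ? "ok" : "FAILED");	/* appends to the same line */
      	pr_cont("\n");					/* terminates it when done  */
      }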
    • mpi: Fix NULL ptr dereference in mpi_powm() [ver #3] · f5527fff
      Andrey Ryabinin authored
      This fixes CVE-2016-8650.
      
      If mpi_powm() is given a zero exponent, it wants to immediately return
      either 1 or 0, depending on the modulus.  However, if the result was
      initialised with zero limb space, no limb space is allocated and a
      NULL-pointer dereference ensues.
      
      Fix this by allocating a minimal amount of limb space for the result in
      the zero-exponent case when the result is 1, and by not touching the limb
      space when the result is 0.  (A sketch of the fix follows this entry.)
      
      This affects the use of RSA keys and X.509 certificates that carry them.
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      PGD 0
      Oops: 0002 [#1] SMP
      Modules linked in:
      CPU: 3 PID: 3014 Comm: keyctl Not tainted 4.9.0-rc6-fscache+ #278
      Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
      task: ffff8804011944c0 task.stack: ffff880401294000
      RIP: 0010:[<ffffffff8138ce5d>]  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
      RSP: 0018:ffff880401297ad8  EFLAGS: 00010212
      RAX: 0000000000000000 RBX: ffff88040868bec0 RCX: ffff88040868bba0
      RDX: ffff88040868b260 RSI: ffff88040868bec0 RDI: ffff88040868bee0
      RBP: ffff880401297ba8 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000047 R11: ffffffff8183b210 R12: 0000000000000000
      R13: ffff8804087c7600 R14: 000000000000001f R15: ffff880401297c50
      FS:  00007f7a7918c700(0000) GS:ffff88041fb80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000401250000 CR4: 00000000001406e0
      Stack:
       ffff88040868bec0 0000000000000020 ffff880401297b00 ffffffff81376cd4
       0000000000000100 ffff880401297b10 ffffffff81376d12 ffff880401297b30
       ffffffff81376f37 0000000000000100 0000000000000000 ffff880401297ba8
      Call Trace:
       [<ffffffff81376cd4>] ? __sg_page_iter_next+0x43/0x66
       [<ffffffff81376d12>] ? sg_miter_get_next_page+0x1b/0x5d
       [<ffffffff81376f37>] ? sg_miter_next+0x17/0xbd
       [<ffffffff8138ba3a>] ? mpi_read_raw_from_sgl+0xf2/0x146
       [<ffffffff8132a95c>] rsa_verify+0x9d/0xee
       [<ffffffff8132acca>] ? pkcs1pad_sg_set_buf+0x2e/0xbb
       [<ffffffff8132af40>] pkcs1pad_verify+0xc0/0xe1
       [<ffffffff8133cb5e>] public_key_verify_signature+0x1b0/0x228
       [<ffffffff8133d974>] x509_check_for_self_signed+0xa1/0xc4
       [<ffffffff8133cdde>] x509_cert_parse+0x167/0x1a1
       [<ffffffff8133d609>] x509_key_preparse+0x21/0x1a1
       [<ffffffff8133c3d7>] asymmetric_key_preparse+0x34/0x61
       [<ffffffff812fc9f3>] key_create_or_update+0x145/0x399
       [<ffffffff812fe227>] SyS_add_key+0x154/0x19e
       [<ffffffff81001c2b>] do_syscall_64+0x80/0x191
       [<ffffffff816825e4>] entry_SYSCALL64_slow_path+0x25/0x25
      Code: 56 41 55 41 54 53 48 81 ec a8 00 00 00 44 8b 71 04 8b 42 04 4c 8b 67 18 45 85 f6 89 45 80 0f 84 b4 06 00 00 85 c0 75 2f 41 ff ce <49> c7 04 24 01 00 00 00 b0 01 75 0b 48 8b 41 18 48 83 38 01 0f
      RIP  [<ffffffff8138ce5d>] mpi_powm+0x32/0x7e6
       RSP <ffff880401297ad8>
      CR2: 0000000000000000
      ---[ end trace d82015255d4a5d8d ]---
      
      Basically, this is a backport of a libgcrypt patch:
      
      	http://git.gnupg.org/cgi-bin/gitweb.cgi?p=libgcrypt.git;a=patch;h=6e1adb05d290aeeb1c230c763970695f4a538526
      
      Fixes: cdec9cb5 ("crypto: GnuPG based MPI lib - source files (part 1)")
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      cc: Dmitry Kasatkin <dmitry.kasatkin@gmail.com>
      cc: linux-ima-devel@lists.sourceforge.net
      cc: stable@vger.kernel.org
      Signed-off-by: James Morris <james.l.morris@oracle.com>
      f5527fff
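      For reference, a rough sketch of the shape of the fix (variable and helper names as used in lib/mpi, abbreviated; not guaranteed to match the patch exactly): the result's limb array is written only once at least one limb is known to exist.

      	/* Inside mpi_powm(), early-out for a zero exponent (sketch): */
      	if (!esize) {
      		/* exponent == 0: result is 1, or 0 if the modulus is 1 */
      		res->nlimbs = (msize == 1 && mod->d[0] == 1) ? 0 : 1;
      		if (res->nlimbs) {
      			if (mpi_resize(res, 1) < 0)	/* ensure one limb exists */
      				goto enomem;
      			rp = res->d;
      			rp[0] = 1;
      		}
      		res->sign = 0;
      		goto leave;
      	}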
  6. 20 November 2016, 1 commit
    • netlink: smaller nla_attr_minlen table · 32d84cdc
      Alexey Dobriyan authored
      The length of a netlink attribute may be a u16, but the lengths of basic
      attributes are much smaller; small enough that we can save 16 bytes of
      .rodata and some pocket change inside .text.

      16-bit is also worse than 8-bit on x86-64 because of the operand-size
      override prefix.
      
      	add/remove: 0/0 grow/shrink: 0/3 up/down: 0/-19 (-19)
      	function                                     old     new   delta
      	validate_nla                                 418     417      -1
      	nla_policy_len                                66      64      -2
      	nla_attr_minlen                               32      16     -16
      	Total: Before=154865051, After=154865032, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      32d84cdc
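      Roughly, the change boils down to narrowing the per-type minimum-length table to u8 entries; a sketch of the idea (entries abbreviated, not the full table):

      	/* The widest basic attribute minimum (u64/s64) is 8 bytes, so a u8 is
      	 * plenty; halving the entry size saves 16 bytes of .rodata here. */
      	static const u8 nla_attr_minlen[NLA_TYPE_MAX + 1] = {
      		[NLA_U8]	= sizeof(u8),
      		[NLA_U16]	= sizeof(u16),
      		[NLA_U32]	= sizeof(u32),
      		[NLA_U64]	= sizeof(u64),
      		[NLA_S8]	= sizeof(s8),
      		[NLA_S16]	= sizeof(s16),
      		[NLA_S32]	= sizeof(s32),
      		[NLA_S64]	= sizeof(s64),
      		/* ... remaining types unchanged ... */
      	};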
  7. 19 November 2016, 1 commit
  8. 17 November 2016, 1 commit
    • fix iov_iter_advance() for ITER_PIPE · 680bb946
      Abhi Das authored
      iov_iter_advance() needs to decrement iter->count by the number of
      bytes it has advanced over.  The normal flavours do that, but ITER_PIPE
      doesn't; as a result, generic_file_read_iter() on an ITER_PIPE iterator
      for O_DIRECT files ends up with a bogus fallback to a page cache read,
      resulting in incorrect values for the file offset and bytes read.
      Signed-off-by: Abhi Das <adas@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      680bb946
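      A minimal sketch of the invariant being restored (simplified; not the actual pipe advance implementation): advancing the iterator must also shrink the remaining byte count, exactly as the iovec/kvec/bvec flavours already do.

      	static void pipe_iter_advance_sketch(struct iov_iter *i, size_t size)
      	{
      		if (unlikely(size > i->count))
      			size = i->count;
      		/* the missing piece for ITER_PIPE: keep count in sync */
      		i->count -= size;
      		/* ... then move the position within the pipe's buffers ... */
      	}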
  9. 16 November 2016, 1 commit
  10. 15 November 2016, 1 commit
  11. 12 November 2016, 1 commit
  12. 01 November 2016, 6 commits
  13. 28 October 2016, 3 commits
    • lib/genalloc.c: start search from start of chunk · 62e931fa
      Daniel Mentz authored
      gen_pool_alloc_algo() iterates over the chunks of a pool trying to find
      a contiguous block of memory that satisfies the allocation request.
      
      The shortcut
      
      	if (size > atomic_read(&chunk->avail))
      		continue;
      
      makes the loop skip over chunks that do not have enough bytes left to
      fulfill the request.  There are two situations, though, where an
      allocation might still fail:
      
      (1) The available memory is not contiguous, i.e.  the request cannot
          be fulfilled due to external fragmentation.
      
      (2) A race condition.  Another thread runs the same code concurrently
          and is quicker to grab the available memory.
      
      In those situations, the loop calls pool->algo() to search the entire
      chunk, and pool->algo() returns some value that is >= end_bit to
      indicate that the search failed.  This return value is then assigned to
      start_bit.  The variables start_bit and end_bit describe the range that
      should be searched, and this range should be reset for every chunk that
      is searched.  Today, the code fails to reset start_bit to 0.  As a
      result, prefixes of subsequent chunks are ignored, and memory allocations
      might fail even though there is plenty of room left in the prefixes of
      those other chunks.  (A sketch of the fixed loop follows this entry.)
      
      Fixes: 7f184275 ("lib, Make gen_pool memory allocator lockless")
      Link: http://lkml.kernel.org/r/1477420604-28918-1-git-send-email-danielmentz@google.com
      Signed-off-by: Daniel Mentz <danielmentz@google.com>
      Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Acked-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      62e931fa
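      A sketch of the allocation loop after the fix (field and helper names approximate, argument lists abbreviated); the point is simply that the search window is re-initialised for every chunk:

      	list_for_each_entry_rcu(chunk, &pool->chunks, next_chunk) {
      		if (size > atomic_read(&chunk->avail))
      			continue;

      		start_bit = 0;			/* the fix: reset per chunk   */
      		end_bit = chunk_size(chunk) >> order;
      		start_bit = pool->algo(chunk->bits, end_bit, start_bit,
      				       nbits, data /* , ... */);
      		if (start_bit >= end_bit)
      			continue;		/* no room here, try the next */

      		/* ... claim the bits, compute and return the address ... */
      	}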
    • lib/stackdepot.c: bump stackdepot capacity from 16MB to 128MB · 02754e0a
      Dmitry Vyukov authored
      KASAN uses stackdepot to memorize stacks for all kmalloc/kfree calls.
      The current stackdepot capacity is 16MB (1024 top-level entries x 4 pages
      on the second level).  The size of each stack is
      (num_frames + 3) * sizeof(long), which gives us ~84K stacks.  This
      capacity was chosen empirically and is enough to run the kernel normally.
      
      However, when lots of configs are enabled and a fuzzer tries to maximize
      code coverage, it easily hits the limit within tens of minutes.  I've
      tested for a long time with the number of top-level entries bumped 4x
      (to 4096), and I think I've seen an overflow only once.  But I don't have
      all configs enabled and code coverage has not reached its maximum yet.  So
      bump it 8x, to 8192.
      
      Since we have a two-level table, the memory cost of this is very moderate:
      currently the top-level table is 8KB; with this patch it is 64KB, which
      is negligible under KASAN.
      
      Here is some approximate math.
      
      128MB allows us to memorize ~670K stacks (assuming stack is ~200b).
      I've grepped the kernel for kmalloc|kfree|kmem_cache_alloc|kmem_cache_free|
      kzalloc|kstrdup|kstrndup|kmemdup and it gives ~60K matches.  Most
      alloc/free call sites are reachable with only one stack, but some
      utility functions can have a large fanout.  Assuming an average fanout of
      5x, the total number of alloc/free stacks is ~300K.
      
      Link: http://lkml.kernel.org/r/1476458416-122131-1-git-send-email-dvyukov@google.com
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Baozeng Ding <sploving1@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      02754e0a
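      The arithmetic above, spelled out (assuming 4 KB pages and 8-byte pointers):

      	/*
      	 * old capacity: 1024 top-level entries * 4 pages * 4 KB =  16 MB
      	 * new capacity: 8192 top-level entries * 4 pages * 4 KB = 128 MB
      	 *
      	 * top-level table: 1024 * 8 B = 8 KB  ->  8192 * 8 B = 64 KB
      	 * stacks that fit: 128 MB / ~200 B per stack ~= 670K
      	 */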
    • latent_entropy: raise CONFIG_FRAME_WARN by default · 0e07f663
      Kees Cook authored
      When building with the latent_entropy plugin, set the default
      CONFIG_FRAME_WARN to 2048, since some __init functions have many basic
      blocks that, when instrumented by the latent_entropy plugin, grow beyond
      the 1024-byte stack frame size on 32-bit builds.
      
      Link: http://lkml.kernel.org/r/20161018211216.GA39687@beast
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Cc: Emese Revfy <re.emese@gmail.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Michal Marek <mmarek@suse.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0e07f663
  14. 21 October 2016, 1 commit
    • bpf, test: fix ld_abs + vlan push/pop stress test · 0d906b1e
      Daniel Borkmann authored
      After commit 636c2628 ("net: skbuff: Remove errornous length
      validation in skb_vlan_pop()"), the mentioned test case stopped working,
      throwing a -12 (ENOMEM) return code.  The issue, however, is not due to
      636c2628 itself, but rather to a buggy test case that was uncovered by
      the change in behaviour in 636c2628.
      
      The data_size of that test case for the skb was set to 1.  In the
      bpf_fill_ld_abs_vlan_push_pop() handler, BPF insns are generated that
      loop with: reading skb data, pushing 68 tags, reading skb data,
      popping 68 tags, reading skb data, etc., in order to force an skb
      expansion and thus make JITs recache skb->data.  The problem is
      that the initial data_size is too small.
      
      Before 636c2628, the test silently bailed out, returning 0 because of
      the skb->len < VLAN_ETH_HLEN check; now it throws an error from a
      failing skb_ensure_writable().  Set at least a minimum of ETH_HLEN as
      the initial length so that after the first push of data, the equivalent
      pop will succeed (see the sizing sketch after this entry).
      
      Fixes: 4d9c5c53 ("test_bpf: add bpf_skb_vlan_push/pop() tests")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      0d906b1e
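      A sketch of what the sizing constraint means (the struct and field names below are hypothetical, simplified from the real test spec in lib/test_bpf.c): the generated program pushes a VLAN tag before anything else, so the input skb must already carry a complete Ethernet header.

      	#include <linux/if_ether.h>

      	struct vlan_test_sketch {
      		unsigned int data_size;	/* bytes of skb data handed to the test */
      	};

      	static const struct vlan_test_sketch ld_abs_vlan_push_pop_sketch = {
      		.data_size = ETH_HLEN,	/* was 1: too short for a push/pop cycle */
      	};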
  15. 15 October 2016, 1 commit
  16. 12 October 2016, 5 commits
  17. 11 October 2016, 1 commit
    • latent_entropy: Mark functions with __latent_entropy · 0766f788
      Emese Revfy authored
      The __latent_entropy gcc attribute can be used only on functions and
      variables.  If it is on a function then the plugin will instrument it for
      gathering control-flow entropy. If the attribute is on a variable then
      the plugin will initialize it with random contents.  The variable must
      be an integer, an integer array type or a structure with integer fields.
      
      These specific functions have been selected because they are init
      functions (to help gather boot-time entropy), are called at unpredictable
      times, or they have variable loops, each of which provides some level of
      latent entropy.
      Signed-off-by: Emese Revfy <re.emese@gmail.com>
      [kees: expanded commit message]
      Signed-off-by: Kees Cook <keescook@chromium.org>
      0766f788
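      A minimal usage sketch of the attribute described above (the function and array names below are made up for illustration):

      	#include <linux/init.h>
      	#include <linux/types.h>

      	/* On a variable: the plugin seeds it with random contents at build time. */
      	static u32 example_seed_pool[8] __latent_entropy;

      	/* On a function: the plugin instruments its basic blocks so that
      	 * executing them mixes some entropy into the latent-entropy pool. */
      	static int __init __latent_entropy example_module_init(void)
      	{
      		return 0;
      	}
      	late_initcall(example_module_init);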
  18. 10 October 2016, 1 commit
  19. 08 October 2016, 4 commits
    • nmi_backtrace: generate one-line reports for idle cpus · 6727ad9e
      Chris Metcalf authored
      When doing an nmi backtrace of many cores, most of which are idle, the
      output is a little overwhelming and very uninformative.  Suppress
      messages for cpus that are idling when they are interrupted and just
      emit one line, "NMI backtrace for N skipped: idling at pc 0xNNN".
      
      We do this by grouping all the cpuidle code together into a new
      .cpuidle.text section, and then checking the address of the interrupted
      PC to see if it lies within that section.
      
      This commit suitably tags x86 and tile idle routines, and only adds in
      the minimal framework for other architectures.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-5-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Tested-by: Petr Mladek <pmladek@suse.com>
      Cc: Aaron Tomlin <atomlin@redhat.com>
      Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6727ad9e
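      The core of the check is an address-range test against the new section; a sketch (linker-provided section bounds for .cpuidle.text, usage simplified):

      	extern char __cpuidle_text_start[], __cpuidle_text_end[];

      	static bool pc_is_idling(unsigned long pc)
      	{
      		return pc >= (unsigned long)__cpuidle_text_start &&
      		       pc <  (unsigned long)__cpuidle_text_end;
      	}

      	/* In the backtrace handler (sketch):
      	 *   if (pc_is_idling(instruction_pointer(regs)))
      	 *           pr_warn("NMI backtrace for cpu %d skipped: idling at pc %#lx\n",
      	 *                   cpu, instruction_pointer(regs));
      	 */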
    • nmi_backtrace: do a local dump_stack() instead of a self-NMI · 67766489
      Chris Metcalf authored
      Currently on arm there is code that checks whether it should call
      dump_stack() explicitly, to avoid trying to raise an NMI when the
      current context is not preemptible by the backtrace IPI.  Similarly, the
      forthcoming arch/tile support uses an IPI mechanism that does not
      support generating an NMI to self.
      
      Accordingly, move the code that guards this case into the generic
      mechanism, and invoke it unconditionally whenever we want a backtrace of
      the current cpu.  It seems plausible that in all cases, dump_stack()
      will generate better information than generating a stack from the NMI
      handler.  The register state will be missing, but that state is likely
      not particularly helpful in any case.
      
      Or, if we think it is helpful, we should be capturing and emitting the
      current register state in all cases when regs == NULL is passed to
      nmi_cpu_backtrace().
      
      Link: http://lkml.kernel.org/r/1472487169-14923-3-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Acked-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      67766489
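      A simplified sketch of the generic guard (not the exact code in lib/nmi_backtrace.c): if the current CPU is part of the requested mask, dump its stack directly and drop it from the mask before any NMIs/IPIs are sent.

      	static void dump_local_if_requested(struct cpumask *mask)
      	{
      		int this_cpu = get_cpu();

      		if (cpumask_test_cpu(this_cpu, mask)) {
      			/* We can always backtrace ourselves without an NMI. */
      			dump_stack();
      			cpumask_clear_cpu(this_cpu, mask);
      		}
      		put_cpu();
      	}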
    • nmi_backtrace: add more trigger_*_cpu_backtrace() methods · 9a01c3ed
      Chris Metcalf authored
      Patch series "improvements to the nmi_backtrace code" v9.
      
      This patch series modifies the trigger_xxx_backtrace() NMI-based remote
      backtracing code to make it more flexible, and makes a few small
      improvements along the way.
      
      The motivation comes from the task isolation code, where there are
      scenarios where we want to be able to diagnose a case where some cpu is
      about to interrupt a task-isolated cpu.  It can be helpful to see both
      where the interrupting cpu is, and also an approximation of where the
      cpu that is being interrupted is.  The nmi_backtrace framework allows us
      to discover the stack of the interrupted cpu.
      
      I've tested that the change works as desired on tile, and build-tested
      x86, arm, mips, and sparc64.  For x86 I confirmed that the generic
      cpuidle stuff as well as the architecture-specific routines are in the
      new cpuidle section.  For arm, mips, and sparc I just build-tested it
      and made sure the generic cpuidle routines were in the new cpuidle
      section, but I didn't attempt to figure out what the platform-specific
      idle routines might be.  That might be more usefully done by someone
      with platform experience in follow-up patches.
      
      This patch (of 4):
      
      Currently you can only request a backtrace of either all cpus, or all
      cpus but yourself.  It can also be helpful to request a remote backtrace
      of a single cpu, and since we want that, the logical extension is to
      support a cpumask as the underlying primitive.
      
      This change modifies the existing lib/nmi_backtrace.c code to take a
      cpumask as its basic primitive, and modifies the linux/nmi.h code to use
      the new "cpumask" method instead.
      
      The existing clients of nmi_backtrace (arm and x86) are converted to
      using the new cpumask approach in this change.
      
      The other users of the backtracing API (sparc64 and mips) are converted
      to use the cpumask approach rather than the all/allbutself approach.
      The mips code ignored the "include_self" boolean but with this change it
      will now also dump a local backtrace if requested.
      
      Link: http://lkml.kernel.org/r/1472487169-14923-2-git-send-email-cmetcalf@mellanox.com
      Signed-off-by: Chris Metcalf <cmetcalf@mellanox.com>
      Tested-by: Daniel Thompson <daniel.thompson@linaro.org> [arm]
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Reviewed-by: Petr Mladek <pmladek@suse.com>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: David Miller <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9a01c3ed
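      The resulting API shape, sketched from the description above (signatures approximate): the cpumask variant is the primitive and the older helpers become thin wrappers around it.

      	void arch_trigger_cpumask_backtrace(const struct cpumask *mask,
      					    bool exclude_self);

      	static inline void trigger_all_cpu_backtrace(void)
      	{
      		arch_trigger_cpumask_backtrace(cpu_online_mask, false);
      	}

      	static inline void trigger_allbutself_cpu_backtrace(void)
      	{
      		arch_trigger_cpumask_backtrace(cpu_online_mask, true);
      	}

      	static inline void trigger_single_cpu_backtrace(int cpu)
      	{
      		arch_trigger_cpumask_backtrace(cpumask_of(cpu), false);
      	}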
    • atomic64: no need for CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE · 51a02124
      Vineet Gupta authored
      This came to light when implementing native 64-bit atomics for ARCv2.
      
      The atomic64 self-test code uses CONFIG_ARCH_HAS_ATOMIC64_DEC_IF_POSITIVE
      to check whether atomic64_dec_if_positive() is available.  It seems it
      was needed when not every arch defined it.  However, as of the current
      code, the Kconfig option seems needless:
      
       - for CONFIG_GENERIC_ATOMIC64 it is auto-enabled in lib/Kconfig and a
         generic definition of API is present lib/atomic64.c
       - arches with native 64-bit atomics select it in arch/*/Kconfig and
         define the API in their headers
      
      So I see no point in keeping the Kconfig option.
      
      Compile tested for:
       - blackfin (CONFIG_GENERIC_ATOMIC64)
       - x86 (!CONFIG_GENERIC_ATOMIC64)
       - ia64
      
      Link: http://lkml.kernel.org/r/1473703083-8625-3-git-send-email-vgupta@synopsys.com
      Signed-off-by: Vineet Gupta <vgupta@synopsys.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@mellanox.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Zhaoxiu Zeng <zhaoxiu.zeng@gmail.com>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Herbert Xu <herbert@gondor.apana.org.au>
      Cc: Ming Lin <ming.l@ssi.samsung.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Boqun Feng <boqun.feng@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      51a02124
  20. 06 October 2016, 3 commits
    • pipe: add pipe_buf_release() helper · a779638c
      Miklos Szeredi authored
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      a779638c
    • new iov_iter flavour: pipe-backed · 241699cd
      Al Viro authored
      An iov_iter variant for passing data into a pipe.  copy_to_iter()
      copies data into page(s) it has allocated and stuffs them into
      the pipe; copy_page_to_iter() stuffs a reference to the page it
      is given into the pipe.  Both will try to coalesce if possible.
      iov_iter_zero() is similar to copy_to_iter(); iov_iter_get_pages()
      and friends will do as copy_to_iter() would have and return the
      pages where the data would've been copied.  iov_iter_advance()
      will truncate everything past the spot it has advanced to.
      
      New primitive: iov_iter_pipe(), used for initializing those.
      pipe should be locked all along.
      
      Running out of space acts as a fault would for iovec-backed ones;
      in other words, giving it to ->read_iter() may result in a short
      read if the pipe overflows, or in -EFAULT if that happens with
      nothing copied there.
      
      In other words, ->read_iter() on those acts pretty much like
      ->splice_read().  Moreover, all generic_file_splice_read() users,
      as well as many other ->splice_read() instances can be switched
      to that scheme - that'll happen in the next commit.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
      241699cd
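      A sketch of how a splice path might set one of these up (simplified, error handling omitted; roughly the shape generic_file_splice_read() takes in the follow-up commit, with pipe, in, ppos and len coming from the caller):

      	struct iov_iter to;
      	struct kiocb kiocb;
      	ssize_t ret;

      	iov_iter_pipe(&to, ITER_PIPE | READ, pipe, len);
      	init_sync_kiocb(&kiocb, in);
      	kiocb.ki_pos = *ppos;
      	ret = in->f_op->read_iter(&kiocb, &to);	/* may be short if the pipe fills */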
    • mm: filemap: don't plant shadow entries without radix tree node · d3798ae8
      Johannes Weiner authored
      When the underflow checks were added to workingset_node_shadow_dec(),
      they triggered immediately:
      
        kernel BUG at ./include/linux/swap.h:276!
        invalid opcode: 0000 [#1] SMP
        Modules linked in: isofs usb_storage fuse xt_CHECKSUM ipt_MASQUERADE nf_nat_masquerade_ipv4 tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_REJECT nf_reject_ipv6
         soundcore wmi acpi_als pinctrl_sunrisepoint kfifo_buf tpm_tis industrialio acpi_pad pinctrl_intel tpm_tis_core tpm nfsd auth_rpcgss nfs_acl lockd grace sunrpc dm_crypt
        CPU: 0 PID: 20929 Comm: blkid Not tainted 4.8.0-rc8-00087-gbe67d60b #1
        Hardware name: System manufacturer System Product Name/Z170-K, BIOS 1803 05/06/2016
        task: ffff8faa93ecd940 task.stack: ffff8faa7f478000
        RIP: page_cache_tree_insert+0xf1/0x100
        Call Trace:
          __add_to_page_cache_locked+0x12e/0x270
          add_to_page_cache_lru+0x4e/0xe0
          mpage_readpages+0x112/0x1d0
          blkdev_readpages+0x1d/0x20
          __do_page_cache_readahead+0x1ad/0x290
          force_page_cache_readahead+0xaa/0x100
          page_cache_sync_readahead+0x3f/0x50
          generic_file_read_iter+0x5af/0x740
          blkdev_read_iter+0x35/0x40
          __vfs_read+0xe1/0x130
          vfs_read+0x96/0x130
          SyS_read+0x55/0xc0
          entry_SYSCALL_64_fastpath+0x13/0x8f
        Code: 03 00 48 8b 5d d8 65 48 33 1c 25 28 00 00 00 44 89 e8 75 19 48 83 c4 18 5b 41 5c 41 5d 41 5e 5d c3 0f 0b 41 bd ef ff ff ff eb d7 <0f> 0b e8 88 68 ef ff 0f 1f 84 00
        RIP  page_cache_tree_insert+0xf1/0x100
      
      This is a long-standing bug in the way shadow entries are accounted in
      the radix tree nodes. The shrinker needs to know when radix tree nodes
      contain only shadow entries, no pages, so node->count is split in half
      to count shadows in the upper bits and pages in the lower bits.
      
      Unfortunately, the radix tree implementation doesn't know of this and
      assumes all entries are in node->count. When there is a shadow entry
      directly in root->rnode and the tree is later extended, the radix tree
      implementation will copy that entry into the new node and bump its
      node->count, i.e. increase the page count bits.  Once the shadow gets
      removed and we subtract from the upper counter, node->count underflows
      and triggers the warning. Afterwards, without node->count reaching 0
      again, the radix tree node is leaked.
      
      Limit shadow entries to when we have actual radix tree nodes and can
      count them properly. That means we lose the ability to detect refaults
      from files that had only the first page faulted in at eviction time.
      
      Fixes: 449dd698 ("mm: keep page cache radix tree nodes in check")
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-and-tested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: stable@vger.kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d3798ae8
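      A sketch of the counting scheme the message describes (the mask and shift names below are placeholders, not the kernel's exact macros): node->count keeps the page count in its low bits and the shadow-entry count in its high bits, so both halves must be kept consistent.

      	#define NODE_COUNT_SHIFT	16		/* placeholder split point */
      	#define NODE_COUNT_MASK		((1U << NODE_COUNT_SHIFT) - 1)

      	static unsigned int node_pages(unsigned int count)
      	{
      		return count & NODE_COUNT_MASK;		/* pages: low bits    */
      	}

      	static unsigned int node_shadows(unsigned int count)
      	{
      		return count >> NODE_COUNT_SHIFT;	/* shadows: high bits */
      	}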
  21. 01 October 2016, 1 commit