1. 06 Jun, 2018 4 commits
    • test_overflow: Add memory allocation overflow tests · ca90800a
      Kees Cook committed
      Make sure that the memory allocators are behaving as expected in the face
      of overflows of multiplied arguments or when using the array_size()-family
      helpers.
      
      Example output of new tests (with the expected __alloc_pages_slowpath
      and vmalloc warnings about refusing giant allocations removed):
      
      [   93.062076] test_overflow: kmalloc detected saturation
      [   93.062988] test_overflow: kmalloc_node detected saturation
      [   93.063818] test_overflow: kzalloc detected saturation
      [   93.064539] test_overflow: kzalloc_node detected saturation
      [   93.120386] test_overflow: kvmalloc detected saturation
      [   93.143458] test_overflow: kvmalloc_node detected saturation
      [   93.166861] test_overflow: kvzalloc detected saturation
      [   93.189924] test_overflow: kvzalloc_node detected saturation
      [   93.221671] test_overflow: vmalloc detected saturation
      [   93.246326] test_overflow: vmalloc_node detected saturation
      [   93.270260] test_overflow: vzalloc detected saturation
      [   93.293824] test_overflow: vzalloc_node detected saturation
      [   93.294597] test_overflow: devm_kmalloc detected saturation
      [   93.295383] test_overflow: devm_kzalloc detected saturation
      [   93.296217] test_overflow: all tests passed
      Signed-off-by: Kees Cook <keescook@chromium.org>
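      The saturation behaviour that these tests exercise can be illustrated
      in user space. The following is a minimal sketch, not the kernel code:
      sat_array_size() and alloc_array() are hypothetical names, malloc()
      stands in for the kernel allocators, and the wrapper refuses a
      saturated size explicitly, whereas the commit relies on the allocators
      themselves rejecting such a request.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical user-space analogue of array_size(): n * size, or SIZE_MAX
 * when the product does not fit in size_t. */
static size_t sat_array_size(size_t n, size_t size)
{
	size_t bytes;

	if (__builtin_mul_overflow(n, size, &bytes))
		return SIZE_MAX;	/* saturate instead of wrapping */
	return bytes;
}

/* Hypothetical allocator wrapper: a saturated size is refused up front. */
static void *alloc_array(size_t n, size_t size)
{
	size_t bytes = sat_array_size(n, size);

	if (bytes == SIZE_MAX)
		return NULL;
	return malloc(bytes);
}

/* Small self-check in the spirit of the tests described in the commit. */
static void alloc_overflow_demo(void)
{
	void *p = alloc_array(16, sizeof(int));

	assert(p != NULL);
	free(p);
	/* (SIZE_MAX / 2 + 1) * 2 would wrap to 0 without the overflow check. */
	assert(alloc_array(SIZE_MAX / 2 + 1, 2) == NULL);
}
```

      Without the saturation step, the second request in the demo would
      wrap to a zero-byte allocation and appear to succeed.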
    • test_overflow: Report test failures · 8fee81aa
      Kees Cook committed
      This adjusts the overflow test to report failures, and prepares to
      add allocation tests.
      Signed-off-by: Kees Cook <keescook@chromium.org>
    • test_overflow: macrofy some more, do more tests for free · 6d334432
      Rasmus Villemoes committed
      Obviously a+b==b+a and a*b==b*a, but the implementations of the fallback
      checks are not entirely symmetric in how they treat a and b. So we might
      as well check the (b,a,r,of) tuple as well as the (a,b,r,of) one for +
      and *. Rather than more copy-paste, factor out the common part to
      check_one_op.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Signed-off-by: Kees Cook <keescook@chromium.org>
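      The idea of checking both operand orders can be sketched as follows.
      This is an illustration only: check_add_u8() and check_add_u8_both()
      are hypothetical helpers for one type and one operator, built on the
      GCC/Clang __builtin_add_overflow() builtin, and are not the
      check_one_op macro from the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: does a + b produce the wrapped result r and the
 * overflow flag of? */
static bool check_add_u8(uint8_t a, uint8_t b, uint8_t r, bool of)
{
	uint8_t got;
	bool got_of = __builtin_add_overflow(a, b, &got);

	return got == r && got_of == of;
}

/* Check the (a, b, r, of) tuple and the swapped (b, a, r, of) tuple. */
static bool check_add_u8_both(uint8_t a, uint8_t b, uint8_t r, bool of)
{
	return check_add_u8(a, b, r, of) && check_add_u8(b, a, r, of);
}

static void symmetric_check_demo(void)
{
	/* 200 + 100 = 300, which wraps to 44 in 8 bits and overflows. */
	assert(check_add_u8_both(200, 100, 44, true));
	assert(check_add_u8_both(1, 2, 3, false));
}
```

      One table entry then covers both argument orders, which is what
      makes the extra tests "free".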
    • lib: add runtime test of check_*_overflow functions · 455a35a6
      Rasmus Villemoes committed
      This adds a small module for testing that the check_*_overflow
      functions work as expected, whether implemented in C or using gcc
      builtins.
      
      Example output:
      
      test_overflow: u8 : 18 tests
      test_overflow: s8 : 19 tests
      test_overflow: u16: 17 tests
      test_overflow: s16: 17 tests
      test_overflow: u32: 17 tests
      test_overflow: s32: 17 tests
      test_overflow: u64: 17 tests
      test_overflow: s64: 21 tests
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      [kees: add output to commit log, drop u64 tests on 32-bit]
      Signed-off-by: Kees Cook <keescook@chromium.org>
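      To make "implemented in C or using gcc builtins" concrete, here is a
      minimal sketch comparing the two styles for unsigned 32-bit addition.
      The function names are hypothetical and only this one case is shown;
      the kernel helpers are type-generic macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Plain C: unsigned addition wraps modulo 2^32, so the sum is smaller
 * than an operand exactly when the addition overflowed. */
static bool add_overflow_fallback_u32(uint32_t a, uint32_t b, uint32_t *d)
{
	*d = a + b;
	return *d < a;
}

/* Same contract, implemented with the GCC/Clang builtin. */
static bool add_overflow_builtin_u32(uint32_t a, uint32_t b, uint32_t *d)
{
	return __builtin_add_overflow(a, b, d);
}

/* Both implementations must agree on the result and the overflow flag. */
static bool implementations_agree(uint32_t a, uint32_t b)
{
	uint32_t d1, d2;
	bool of1 = add_overflow_fallback_u32(a, b, &d1);
	bool of2 = add_overflow_builtin_u32(a, b, &d2);

	return of1 == of2 && d1 == d2;
}

static void add_overflow_demo(void)
{
	assert(implementations_agree(UINT32_MAX, 1));
	assert(implementations_agree(0, 0));
	assert(implementations_agree(123456789u, 987654321u));
}
```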
  2. 05 Jun, 2018 1 commit
    • lib/vsprintf: Remove atomic-unsafe support for %pCr · 666902e4
      Geert Uytterhoeven committed
      "%pCr" formats the current rate of a clock, and calls clk_get_rate().
      The latter obtains a mutex, hence it must not be called from atomic
      context.
      
      Remove support for this rarely-used format, as vsprintf() (and e.g.
      printk()) must be callable from any context.
      
      Any remaining out-of-tree users will start seeing the clock's name
      printed instead of its rate.
      Reported-by: Jia-Ju Bai <baijiaju1990@gmail.com>
      Fixes: 900cca29 ("lib/vsprintf: add %pC{,n,r} format specifiers for clocks")
      Link: http://lkml.kernel.org/r/1527845302-12159-5-git-send-email-geert+renesas@glider.be
      To: Jia-Ju Bai <baijiaju1990@gmail.com>
      To: Jonathan Corbet <corbet@lwn.net>
      To: Michael Turquette <mturquette@baylibre.com>
      To: Stephen Boyd <sboyd@kernel.org>
      To: Zhang Rui <rui.zhang@intel.com>
      To: Eduardo Valentin <edubezval@gmail.com>
      To: Eric Anholt <eric@anholt.net>
      To: Stefan Wahren <stefan.wahren@i2se.com>
      To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: linux-doc@vger.kernel.org
      Cc: linux-clk@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: linux-serial@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Cc: linux-renesas-soc@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: Geert Uytterhoeven <geert+renesas@glider.be>
      Cc: stable@vger.kernel.org # 4.1+
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  3. 03 Jun, 2018 1 commit
  4. 01 Jun, 2018 1 commit
  5. 28 May, 2018 1 commit
  6. 26 May, 2018 1 commit
    • idr: fix invalid ptr dereference on item delete · 7a4deea1
      Matthew Wilcox committed
      If the radix tree underlying the IDR happens to be full and we attempt
      to remove an id which is larger than any id in the IDR, we will call
      __radix_tree_delete() with an uninitialised 'slot' pointer, at which
      point anything could happen.  This was easiest to hit with a single
      entry at id 0 and attempting to remove a non-0 id, but it could have
      happened with 64 entries and attempting to remove an id >= 64.
      
      Roman said:
      
        The syzkaller test boils down to opening /dev/kvm, creating an
        eventfd, and calling a couple of KVM ioctls. None of this requires
        superuser. And the result is dereferencing an uninitialized pointer
        which is likely a crash. The specific path caught by syzbot is via
        KVM_HYPERV_EVENTFD ioctl which is new in 4.17. But I guess there are
        other user-triggerable paths, so cc:stable is probably justified.
      
      Matthew added:
      
        We have around 250 calls to idr_remove() in the kernel today. Many of
        them pass an ID which is embedded in the object they're removing, so
        they're safe. Picking a few likely candidates:
      
        drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
        drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
        drivers/atm/nicstar.c could be taken down by a handcrafted packet
      
      Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
      Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 25 May, 2018 1 commit
    • blk-mq: avoid starving tag allocation after allocating process migrates · e6fc4649
      Ming Lei committed
      When the allocating process is scheduled back in and its mapped hw queue
      has changed, fake one extra wakeup on the previous queue to compensate
      for the missed wakeup, so that other allocations on the previous queue
      are not starved.
      
      This patch fixes a request allocation hang, which can be triggered
      easily with a very low nr_requests.
      
      The race is as follows:
      
      1) There are 2 hw queues, nr_requests is 2, and wake_batch is 1.
      
      2) There are 3 waiters on hw queue 0.
      
      3) The two in-flight requests in hw queue 0 complete, and only two of
         the 3 waiters are woken up because of wake_batch, but both of them
         can be scheduled to another CPU and so switch to hw queue 1.
      
      4) The 3rd waiter then waits forever, since hw queue 0 no longer has
         any in-flight request.
      
      5) This patch fixes it with a fake wakeup when a waiter is scheduled to
         another hw queue.
      
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>
      
      Modified commit message to make it clearer, and make it apply on
      top of the 4.18 branch.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 24 May, 2018 1 commit
    • dma-debug: check scatterlist segments · 78c47830
      Robin Murphy committed
      Drivers/subsystems creating scatterlists for DMA should be taking care
      to respect the scatter-gather limitations of the appropriate device, as
      described by dma_parms. A DMA API implementation cannot feasibly split
      a scatterlist into *more* entries than originally passed, so it is not
      well defined what they should do when given a segment larger than the
      limit they are also required to respect.
      
      Conversely, devices which are less limited than the rather conservative
      defaults, or indeed have no limitations at all (e.g. GPUs with their own
      internal MMU), should be encouraged to set appropriate dma_parms, as
      they may get more efficient DMA mapping performance out of it.
      Signed-off-by: Robin Murphy <robin.murphy@arm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
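      The kind of check described here can be sketched in a few lines. This
      is a simplified user-space illustration, not the dma-debug code:
      struct seg and struct seg_limits are hypothetical stand-ins for a
      scatterlist entry and the device's dma_parms, and the boundary test
      is an assumption about what "scatter-gather limitations" covers
      beyond the maximum segment size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for one scatterlist entry and for dma_parms. */
struct seg {
	uint64_t addr;
	uint32_t len;
};

struct seg_limits {
	uint32_t max_seg_size;		/* largest segment the device accepts */
	uint64_t seg_boundary_mask;	/* a segment must not cross this boundary */
};

/* Returns the index of the first offending segment, or -1 if all fit. */
static int first_bad_segment(const struct seg *sg, size_t n,
			     const struct seg_limits *lim)
{
	for (size_t i = 0; i < n; i++) {
		uint64_t start = sg[i].addr;
		uint64_t end;

		if (sg[i].len == 0)
			continue;	/* nothing to check for an empty segment */
		if (sg[i].len > lim->max_seg_size)
			return (int)i;
		/* First and last byte must share the same boundary window. */
		end = start + sg[i].len - 1;
		if ((start ^ end) & ~lim->seg_boundary_mask)
			return (int)i;
	}
	return -1;
}

static void segment_check_demo(void)
{
	const struct seg_limits lim = { 0x10000, 0xffffffffu };
	const struct seg sg[] = { { 0x1000, 0x1000 }, { 0x3000, 0x20000 } };

	/* The second segment (128 KiB) exceeds the 64 KiB limit. */
	assert(first_bad_segment(sg, 2, &lim) == 1);
}
```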
  9. 19 May, 2018 4 commits
    • dma-mapping: provide a generic dma-noncoherent implementation · 782e6769
      Christoph Hellwig committed
      Add a new dma_map_ops implementation that uses dma-direct for the
      address mapping of streaming mappings, and which requires arch-specific
      implementations of coherent allocate/free.
      
      Architectures have to provide flushing helpers for ownership transfers
      to the device and/or CPU, and can provide optional implementations of
      the coherent mmap functionality, and the cache_flush routines for
      non-coherent long term allocations.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
    • dma-mapping: simplify Kconfig dependencies · 35ddb69c
      Christoph Hellwig committed
      ARCH_DMA_ADDR_T_64BIT is always true for 64-bit architectures now, so we
      can skip the clause requiring it.  'n' is the default default, so no need
      to explicitly state it.
      Tested-by: Alexey Brodkin <abrodkin@synopsys.com>
      Acked-by: Vineet Gupta <vgupta@synopsys.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
    • radix tree: fix multi-order iteration race · 9f418224
      Ross Zwisler committed
      Fix a race in the multi-order iteration code which causes the kernel to
      hit a GP fault.  This was first seen with a production v4.15 based
      kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
      order 9 PMD DAX entries.
      
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember for example that
      an order 2 entry looks like this:
      
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      
      When we delete 'entry' from the tree, we call:
      
        radix_tree_delete()
          radix_tree_delete_item()
            __radix_tree_delete()
              replace_slot()
      
      replace_slot() first removes the siblings in order from the first to the
      last, then replaces 'entry' with NULL.  This means that for a
      brief period of time we end up with one or more of the siblings removed,
      so:
      
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      mm/filemap.c.
      
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
      
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      
      This lets you find the first sibling, and you skip them all in order.
      
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
      
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      'entry'.
      
      We fix this race by fixing the way that skip_siblings() detects sibling
      nodes.  Instead of testing against the preceding slot we instead look
      for siblings via is_sibling_entry() which compares against the position
      of the struct radix_tree_node.slots[] array.  This ensures that sibling
      entries are properly identified, even if they are no longer contiguous
      with the 'entry' they point to.
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
      Fixes: 148deab2 ("radix-tree: improve multiorder iterators")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
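      The difference between the two detection strategies can be shown with
      a small user-space model. It is a sketch under simplifying
      assumptions: struct node_sketch has only 8 slots, sibling entries are
      stored as untagged pointers to the canonical slot (the kernel tags
      internal entries), and both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_SIZE 8	/* hypothetical, much smaller than the kernel's */

struct node_sketch {
	void *slots[MAP_SIZE];
};

/* Old strategy: a slot is a sibling only if it points at the slot
 * directly preceding it. */
static bool matches_preceding_slot(const struct node_sketch *n, size_t i)
{
	return i > 0 && n->slots[i] == (const void *)&n->slots[i - 1];
}

/* New strategy: any entry that points into this node's own slots[]
 * array is a sibling, wherever the canonical entry sits. */
static bool points_into_slots(const struct node_sketch *n, const void *entry)
{
	uintptr_t p = (uintptr_t)entry;
	uintptr_t lo = (uintptr_t)&n->slots[0];
	uintptr_t hi = (uintptr_t)&n->slots[MAP_SIZE];

	return lo <= p && p < hi;
}

static void sibling_demo(void)
{
	struct node_sketch n = { { 0 } };
	int item = 42;

	/* Order 2 entry: [entry][sibling][sibling][sibling] */
	n.slots[0] = &item;
	n.slots[1] = n.slots[2] = n.slots[3] = &n.slots[0];

	/* Mid-teardown state: [entry][NULL][sibling][sibling] */
	n.slots[1] = NULL;
	assert(!matches_preceding_slot(&n, 2));		/* sibling missed */
	assert(points_into_slots(&n, n.slots[2]));	/* still recognised */
}
```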
    • lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly · 1e3054b9
      Matthew Wilcox committed
      I had neglected to increment the error counter when the tests failed,
      which made the tests noisy when they failed but did not actually return
      an error code.
      
      Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au
      Fixes: 3cc78125 ("lib/test_bitmap.c: add optimisation tests")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
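      The pattern behind this fix, a check that both prints a message and
      feeds a failure counter, might look like the sketch below. All names
      (expect_eq(), run_tests(), the two sample functions) are hypothetical
      and unrelated to the helpers in lib/test_bitmap.c.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical check helper: reports a mismatch and returns 1 so that
 * the caller can count it, instead of only printing. */
static int expect_eq(const char *what, long got, long want)
{
	if (got == want)
		return 0;
	fprintf(stderr, "test failed: %s: got %ld, want %ld\n", what, got, want);
	return 1;
}

/* Hypothetical runner: a nonzero return value signals failure. */
static int run_tests(long (*fn)(long))
{
	int failures = 0;

	failures += expect_eq("fn(2)", fn(2), 4);
	failures += expect_eq("fn(3)", fn(3), 9);
	return failures ? -1 : 0;
}

static long square(long x) { return x * x; }
static long broken_square(long x) { return x + x; }

static void error_count_demo(void)
{
	assert(run_tests(square) == 0);
	/* broken_square(3) is 6, so one check fails and the run reports it. */
	assert(run_tests(broken_square) == -1);
}
```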
  10. 16 May, 2018 1 commit
    • vsprintf: Replace memory barrier with static_key for random_ptr_key update · 85f4f12d
      Steven Rostedt (VMware) committed
      Reviewing Tobin's patches for getting pointers out early before
      entropy has been established, I noticed that there's a lone smp_mb() in
      the code. As with most lone memory barriers, this one appears to be
      incorrectly used.
      
      We currently basically have this:
      
      	get_random_bytes(&ptr_key, sizeof(ptr_key));
      	/*
      	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
      	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
      	 * after get_random_bytes() returns.
      	 */
      	smp_mb();
      	WRITE_ONCE(have_filled_random_ptr_key, true);
      
      And later we have:
      
      	if (unlikely(!have_filled_random_ptr_key))
      		return string(buf, end, "(ptrval)", spec);
      
      /* Missing memory barrier here. */
      
      	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
      
      As the CPU can perform speculative loads, we could have a situation
      with the following:
      
      	CPU0				CPU1
      	----				----
      				   load ptr_key = 0
         store ptr_key = random
         smp_mb()
         store have_filled_random_ptr_key
      
      				   load have_filled_random_ptr_key = true
      
      				    BAD BAD BAD! (you're so bad!)
      
      Because nothing prevents CPU1 from loading ptr_key before loading
      have_filled_random_ptr_key.
      
      This race is very unlikely, but we can't keep an incorrect smp_mb() in
      place. Instead, replace have_filled_random_ptr_key with a static_branch,
      not_filled_random_ptr_key, that is initialized to true and changed to
      false when we get enough entropy. If the update happens in early boot,
      the static_key is updated immediately. Otherwise it has to wait until
      entropy is filled, and this happens in an interrupt handler, which can't
      enable a static_key, as that requires a preemptible context. In that
      case a work_queue is used to enable it: entropy already took too long to
      establish in the first place, so waiting a little more shouldn't hurt
      anything.
      
      The benefit of using the static key is that the unlikely branch in
      vsprintf() now becomes a nop.
      
      Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
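      For readers unfamiliar with why a lone barrier on the writer side is
      not enough, the sketch below shows a correctly paired publish/consume
      in portable C11. Note that this is a different technique from the one
      the commit adopts (a static key): it only illustrates the pairing
      that was missing on the reader side. All names are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

static uint64_t ptr_key_sketch;	/* plain data, published through the flag */
static atomic_bool key_ready;	/* static storage: starts out false */

/* Writer: fill in the key, then publish it with release semantics. */
static void publish_key_sketch(uint64_t key)
{
	ptr_key_sketch = key;
	atomic_store_explicit(&key_ready, true, memory_order_release);
}

/* Reader: the acquire load pairs with the release store above, so a
 * reader that sees the flag set also sees the key written before it. */
static bool read_key_sketch(uint64_t *out)
{
	if (!atomic_load_explicit(&key_ready, memory_order_acquire))
		return false;
	*out = ptr_key_sketch;
	return true;
}

static void publish_demo(void)
{
	uint64_t key = 0;

	publish_key_sketch(0x1234);
	assert(read_key_sketch(&key) && key == 0x1234);
}
```

      The static key used by the commit avoids this ordering question on
      the fast path altogether, since the branch is patched out once the
      key has been filled.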
  11. 15 May, 2018 2 commits
    • x86/asm/memcpy_mcsafe: Define copy_to_iter_mcsafe() · 8780356e
      Dan Williams committed
      Use the updated memcpy_mcsafe() implementation to define
      copy_user_mcsafe() and copy_to_iter_mcsafe(). The most significant
      difference from typical copy_to_iter() is that the ITER_KVEC and
      ITER_BVEC iterator types can fail to complete a full transfer.
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: hch@lst.de
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/152539239150.31796.9189779163576449784.stgit@dwillia2-desk3.amr.corp.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sbitmap: fix race in wait batch accounting · c854ab57
      Jens Axboe committed
      If we have multiple callers of sbq_wake_up(), we can end up in a
      situation where the wait_cnt will continually go more and more
      negative. Consider the case where our wake batch is 1, hence
      wait_cnt will start out as 1.
      
      wait_cnt == 1
      
      CPU0				CPU1
      atomic_dec_return(), cnt == 0
      				atomic_dec_return(), cnt == -1
      				cmpxchg(-1, 0) (succeeds)
      				[wait_cnt now 0]
      cmpxchg(0, 1) (fails)
      
      This ends up with wait_cnt being 0, we'll wakeup immediately
      next time. Going through the same loop as above again, and
      we'll have wait_cnt -1.
      
      For the case where we have a larger wake batch, the only
      difference is that the starting point will be higher. We'll
      still end up with continually smaller batch wakeups, which
      defeats the purpose of the rolling wakeups.
      
      Always reset the wait_cnt to the batch value. Then it doesn't
      matter who wins the race. But ensure that whoever does win
      the race is the one that increments the ws index and wakes up
      our batch count, loser gets to call __sbq_wake_up() again to
      account his wakeups towards the next active wait state index.
      
      Fixes: 6c0ca7ae ("sbitmap: fix wakeup hang after sbq resize")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
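      The accounting rule described above (always reset to the batch value,
      and let only the race winner perform the wakeups) can be sketched
      with C11 atomics. This is a simplified model: struct
      wait_state_sketch and wake_up_sketch() are hypothetical, the wakeups
      are merely counted, and the wait state index handling is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical, simplified wait state: a countdown and a wakeup tally. */
struct wait_state_sketch {
	atomic_int wait_cnt;
	int wakeups;
};

/* Returns true if the caller lost the race and should call again. */
static bool wake_up_sketch(struct wait_state_sketch *ws, int wake_batch)
{
	/* Equivalent of atomic_dec_return(): the value after decrementing. */
	int cnt = atomic_fetch_sub(&ws->wait_cnt, 1) - 1;

	if (cnt > 0)
		return false;	/* batch not yet used up, nothing to do */

	/* Always reset to the full batch value. Only the caller whose
	 * compare-and-exchange succeeds issues the wakeups. */
	if (atomic_compare_exchange_strong(&ws->wait_cnt, &cnt, wake_batch)) {
		ws->wakeups += wake_batch;
		return false;
	}
	return true;	/* another caller reset the counter first */
}

static void wake_batch_demo(void)
{
	struct wait_state_sketch ws;

	atomic_init(&ws.wait_cnt, 1);
	ws.wakeups = 0;

	/* With a batch of 1, every call wakes one waiter and resets to 1. */
	assert(!wake_up_sketch(&ws, 1));
	assert(ws.wakeups == 1 && atomic_load(&ws.wait_cnt) == 1);
}
```

      A caller that gets true back is expected to call again, mirroring
      the "loser gets to call __sbq_wake_up() again" rule in the commit
      message.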
  12. 12 May, 2018 2 commits
  13. 11 May, 2018 2 commits
  14. 09 May, 2018 10 commits
  15. 08 May, 2018 4 commits
  16. 07 May, 2018 3 commits
  17. 04 May, 2018 1 commit
    • bpf: migrate ebpf ld_abs/ld_ind tests to test_verifier · 93731ef0
      Daniel Borkmann committed
      Remove all eBPF tests involving LD_ABS/LD_IND from test_bpf.ko. Reason
      is that the eBPF tests from test_bpf module do not go via BPF verifier
      and therefore any instruction rewrites from verifier cannot take place.
      
      Therefore, move them into test_verifier which runs out of user space,
      so that the verifier can rewrite LD_ABS/LD_IND internally in upcoming patches.
      It will have the same effect since runtime tests are also performed from
      there. This also allows us to finally unexport bpf_skb_vlan_{push,pop}_proto
      and keep it internal to core kernel.
      Additionally, also add further cBPF LD_ABS/LD_IND test coverage into
      test_bpf.ko suite.
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>