1. 26 May 2018 (1 commit)
    • idr: fix invalid ptr dereference on item delete · 7a4deea1
      Committed by Matthew Wilcox
      If the radix tree underlying the IDR happens to be full and we attempt
      to remove an id which is larger than any id in the IDR, we will call
      __radix_tree_delete() with an uninitialised 'slot' pointer, at which
      point anything could happen.  This was easiest to hit with a single
      entry at id 0 and attempting to remove a non-0 id, but it could have
      happened with 64 entries and attempting to remove an id >= 64.
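      
      A minimal sketch of the triggering pattern, assuming the in-kernel IDR
      API (the function and variable names below are illustrative, not taken
      from the patch):
      
        #include <linux/gfp.h>
        #include <linux/idr.h>
      
        static DEFINE_IDR(demo_idr);
        static int demo_payload;
      
        static void idr_remove_demo(void)
        {
                /* A single entry at id 0: the easiest way to hit the bug. */
                if (idr_alloc(&demo_idr, &demo_payload, 0, 1, GFP_KERNEL) < 0)
                        return;
      
                /*
                 * Before the fix, removing an id larger than anything present
                 * could reach __radix_tree_delete() with an uninitialised
                 * 'slot' pointer; with the fix this is a harmless no-op.
                 */
                idr_remove(&demo_idr, 1);
        }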
      
      Roman said:
      
        The syzkaller test boils down to opening /dev/kvm, creating an
        eventfd, and calling a couple of KVM ioctls. None of this requires
        superuser. And the result is dereferencing an uninitialized pointer
        which is likely a crash. The specific path caught by syzbot is via
        the KVM_HYPERV_EVENTFD ioctl which is new in 4.17. But I guess there
        are other user-triggerable paths, so cc:stable is probably justified.
      
      Matthew added:
      
        We have around 250 calls to idr_remove() in the kernel today. Many of
        them pass an ID which is embedded in the object they're removing, so
        they're safe. Picking a few likely candidates:
      
        drivers/firewire/core-cdev.c looks unsafe; the ID comes from an ioctl.
        drivers/gpu/drm/amd/amdgpu/amdgpu_ctx.c is similar
        drivers/atm/nicstar.c could be taken down by a handcrafted packet
      
      Link: http://lkml.kernel.org/r/20180518175025.GD6361@bombadil.infradead.org
      Fixes: 0a835c4f ("Reimplement IDR and IDA using the radix tree")
      Reported-by: <syzbot+35666cba7f0a337e2e79@syzkaller.appspotmail.com>
      Debugged-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 19 May 2018 (2 commits)
    • radix tree: fix multi-order iteration race · 9f418224
      Committed by Ross Zwisler
      Fix a race in the multi-order iteration code which causes the kernel to
      hit a GP fault.  This was first seen with a production v4.15 based
      kernel (4.15.6-300.fc27.x86_64) utilizing a DAX workload which used
      order 9 PMD DAX entries.
      
      The race has to do with how we tear down multi-order sibling entries
      when we are removing an item from the tree.  Remember for example that
      an order 2 entry looks like this:
      
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
      
      where 'entry' is in some slot in the struct radix_tree_node, and the
      three slots following 'entry' contain sibling pointers which point back
      to 'entry.'
      
      When we delete 'entry' from the tree, we call:
      
        radix_tree_delete()
          radix_tree_delete_item()
            __radix_tree_delete()
              replace_slot()
      
      replace_slot() first removes the siblings in order from the first to the
      last, then at the end replaces 'entry' with NULL.  This means that for a
      brief period of time we end up with one or more of the siblings removed,
      so:
      
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
      
      This causes an issue if you have a reader iterating over the slots in
      the tree via radix_tree_for_each_slot() while only under
      rcu_read_lock()/rcu_read_unlock() protection.  This is a common case in
      mm/filemap.c.
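      
      For reference, a sketch of that lockless reader pattern, of the kind
      used in mm/filemap.c (the function name here is made up; assume the
      usual radix-tree and RCU headers):
      
        #include <linux/radix-tree.h>
        #include <linux/rcupdate.h>
      
        static void iterate_all_entries(struct radix_tree_root *root)
        {
                struct radix_tree_iter iter;
                void **slot;
      
                rcu_read_lock();
                radix_tree_for_each_slot(slot, root, &iter, 0) {
                        void *entry = radix_tree_deref_slot(slot);
      
                        if (unlikely(!entry))
                                continue;
                        if (radix_tree_deref_retry(entry)) {
                                /* Raced with a tree modification; retry. */
                                slot = radix_tree_iter_retry(&iter);
                                continue;
                        }
                        /* ... use 'entry' here ... */
                }
                rcu_read_unlock();
        }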
      
      The issue is that when __radix_tree_next_slot() => skip_siblings() tries
      to skip over the sibling entries in the slots, it currently does so with
      an exact match on the slot directly preceding our current slot.
      Normally this works:
      
                                            V preceding slot
        struct radix_tree_node.slots[] = [entry][sibling][sibling][sibling]
                                                    ^ current slot
      
      This lets you find the first sibling, and you skip them all in order.
      
      But in the case where one of the siblings is NULL, that slot is skipped
      and then our sibling detection is interrupted:
      
                                                   V preceding slot
        struct radix_tree_node.slots[] = [entry][NULL][sibling][sibling]
                                                          ^ current slot
      
      This means that the sibling pointers aren't recognized since they point
      all the way back to 'entry', so we think that they are normal internal
      radix tree pointers.  This causes us to think we need to walk down to a
      struct radix_tree_node starting at the address of 'entry'.
      
      In a real running kernel this will crash the thread with a GP fault when
      you try and dereference the slots in your broken node starting at
      'entry'.
      
      We fix this race by fixing the way that skip_siblings() detects sibling
      nodes.  Instead of testing against the preceding slot we instead look
      for siblings via is_sibling_entry() which compares against the position
      of the struct radix_tree_node.slots[] array.  This ensures that sibling
      entries are properly identified, even if they are no longer contiguous
      with the 'entry' they point to.
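      
      Conceptually, the position-based check boils down to something like the
      following simplified helper (an illustration of the idea behind
      is_sibling_entry(); the function name is made up and the real helper
      also accounts for the internal-pointer tag bits):
      
        static bool slot_points_into_node(const struct radix_tree_node *parent,
                                          void *entry)
        {
                uintptr_t p = (uintptr_t)entry;
      
                /*
                 * A sibling slot stores a pointer back into the same node's
                 * slots[] array, so an address-range test identifies it even
                 * when earlier siblings have already been set to NULL.
                 */
                return p >= (uintptr_t)&parent->slots[0] &&
                       p <  (uintptr_t)&parent->slots[RADIX_TREE_MAP_SIZE];
        }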
      
      Link: http://lkml.kernel.org/r/20180503192430.7582-6-ross.zwisler@linux.intel.com
      Fixes: 148deab2 ("radix-tree: improve multiorder iterators")
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: CR, Sapthagirish <sapthagirish.cr@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • lib/test_bitmap.c: fix bitmap optimisation tests to report errors correctly · 1e3054b9
      Committed by Matthew Wilcox
      I had neglected to increment the error counter when the tests failed,
      which made the failing tests noisy but did not actually cause an error
      code to be returned.
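      
      The missing bookkeeping has roughly the following shape (a sketch with
      illustrative macro and counter names, not the literal diff):
      
        static unsigned int failed_tests __initdata;
      
        #define expect_eq_uint(x, y)                                          \
                do {                                                          \
                        if ((x) != (y)) {                                     \
                                pr_err("[%s:%d] expected %u, got %u\n",       \
                                       __func__, __LINE__,                    \
                                       (unsigned int)(x), (unsigned int)(y)); \
                                /* the increment that had been left out: */   \
                                failed_tests++;                               \
                        }                                                     \
                } while (0)
      
        /* The module's exit path then returns an error when failed_tests != 0. */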
      
      Link: http://lkml.kernel.org/r/20180509114328.9887-1-mpe@ellerman.id.au
      Fixes: 3cc78125 ("lib/test_bitmap.c: add optimisation tests")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
      Reported-by: Michael Ellerman <mpe@ellerman.id.au>
      Tested-by: Michael Ellerman <mpe@ellerman.id.au>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Yury Norov <ynorov@caviumnetworks.com>
      Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <stable@vger.kernel.org>	[4.13+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 16 May 2018 (1 commit)
    • vsprintf: Replace memory barrier with static_key for random_ptr_key update · 85f4f12d
      Committed by Steven Rostedt (VMware)
      Reviewing Tobin's patches for getting pointers out early before
      entropy has been established, I noticed that there's a lone smp_mb() in
      the code. As with most lone memory barriers, this one appears to be
      incorrectly used.
      
      We currently basically have this:
      
      	get_random_bytes(&ptr_key, sizeof(ptr_key));
      	/*
      	 * have_filled_random_ptr_key==true is dependent on get_random_bytes().
      	 * ptr_to_id() needs to see have_filled_random_ptr_key==true
      	 * after get_random_bytes() returns.
      	 */
      	smp_mb();
      	WRITE_ONCE(have_filled_random_ptr_key, true);
      
      And later we have:
      
      	if (unlikely(!have_filled_random_ptr_key))
      		return string(buf, end, "(ptrval)", spec);
      
      /* Missing memory barrier here. */
      
      	hashval = (unsigned long)siphash_1u64((u64)ptr, &ptr_key);
      
      As the CPU can perform speculative loads, we could have a situation
      with the following:
      
      	CPU0				CPU1
      	----				----
      				   load ptr_key = 0
         store ptr_key = random
         smp_mb()
         store have_filled_random_ptr_key
      
      				   load have_filled_random_ptr_key = true
      
      				    BAD BAD BAD! (you're so bad!)
      
      Because nothing prevents CPU1 from loading ptr_key before loading
      have_filled_random_ptr_key.
      
      This race is very unlikely, but we can't keep an incorrect smp_mb() in
      place. Instead, replace have_filled_random_ptr_key with a static_branch,
      not_filled_random_ptr_key, which is initialized to true and changed to
      false once we get enough entropy. If the update happens in early boot, the
      static_key is updated immediately; otherwise it has to wait until entropy
      is filled, and that happens in an interrupt handler, which can't enable a
      static_key because that requires a preemptible context. In that case a
      work_queue is used to enable it; since entropy already took too long to
      establish in the first place, waiting a little longer shouldn't hurt anything.
      
      The benefit of using the static key is that the unlikely branch in
      vsprintf() now becomes a nop.
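      
      A sketch of the resulting scheme (the key and function names follow the
      description above; treat the surrounding details as illustrative rather
      than the exact diff):
      
      	#include <linux/jump_label.h>
      	#include <linux/random.h>
      	#include <linux/siphash.h>
      	#include <linux/workqueue.h>
      
      	static DEFINE_STATIC_KEY_TRUE(not_filled_random_ptr_key);
      	static siphash_key_t ptr_key __read_mostly;
      
      	static void enable_ptr_key_workfn(struct work_struct *work)
      	{
      		get_random_bytes(&ptr_key, sizeof(ptr_key));
      		/* Flipping a static key requires a preemptible context. */
      		static_branch_disable(&not_filled_random_ptr_key);
      	}
      	static DECLARE_WORK(enable_ptr_key_work, enable_ptr_key_workfn);
      
      	/* In ptr_to_id(): once the key is disabled, this test is a nop. */
      	if (static_branch_unlikely(&not_filled_random_ptr_key))
      		return string(buf, end, "(ptrval)", spec);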
      
      Link: http://lkml.kernel.org/r/20180515100558.21df515e@gandalf.local.home
      
      Cc: stable@vger.kernel.org
      Fixes: ad67b74d ("printk: hash addresses printed with %p")
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
  4. 12 May 2018 (2 commits)
  5. 03 May 2018 (2 commits)
  6. 02 May 2018 (1 commit)
  7. 27 April 2018 (1 commit)
    • errseq: Always report a writeback error once · b4678df1
      Committed by Matthew Wilcox
      The errseq_t infrastructure assumes that errors which occurred before
      the file descriptor was opened are of no interest to the application.
      This turns out to be a regression for some applications, notably Postgres.
      
      Before errseq_t, a writeback error would be reported exactly once (as
      long as the inode remained in memory), so Postgres could open a file,
      call fsync() and find out whether there had been a writeback error on
      that file from another process.
      
      This patch changes the errseq infrastructure to report errors to all
      file descriptors which are opened after the error occurred, but before
      it was reported to any file descriptor.  This restores the user-visible
      behaviour.
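      
      From userspace, the restored behaviour looks roughly like this (a hedged
      sketch; the file path is made up):
      
        #include <err.h>
        #include <fcntl.h>
        #include <unistd.h>
      
        int main(void)
        {
                /*
                 * Assume another process wrote this file, its writeback failed
                 * before we opened it, and nobody has consumed the error yet.
                 */
                int fd = open("/data/segment", O_RDWR);
                if (fd < 0)
                        err(1, "open");
      
                /*
                 * With this fix, the first fsync() on the new descriptor still
                 * reports the unseen writeback error (typically EIO) instead
                 * of silently succeeding.
                 */
                if (fsync(fd) < 0)
                        warn("fsync");
      
                close(fd);
                return 0;
        }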
      
      Cc: stable@vger.kernel.org
      Fixes: 5660e13d ("fs: new infrastructure for writeback error handling and reporting")
      Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
  8. 23 April 2018 (2 commits)
  9. 17 April 2018 (1 commit)
  10. 14 April 2018 (1 commit)
  11. 13 April 2018 (1 commit)
  12. 12 April 2018 (9 commits)
  13. 11 April 2018 (1 commit)
  14. 10 April 2018 (1 commit)
  15. 06 April 2018 (3 commits)
  16. 01 April 2018 (1 commit)
  17. 31 March 2018 (4 commits)
  18. 30 March 2018 (1 commit)
  19. 28 March 2018 (2 commits)
  20. 27 March 2018 (1 commit)
  21. 26 March 2018 (2 commits)