1. 07 Sep, 2017 30 commits
  2. 02 Sep, 2017 1 commit
    • epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() · 138e4ad6
      Committed by Oleg Nesterov
      The race was introduced by me in commit 971316f0 ("epoll:
      ep_unregister_pollwait() can use the freed pwq->whead").  I did not
      realize that nothing can protect eventpoll after ep_poll_callback() sets
      ->whead = NULL, only whead->lock can save us from the race with
      ep_free() or ep_remove().
      
      Move ->whead = NULL to the end of ep_poll_callback() and add the
      necessary barriers.
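
The ordering idea can be illustrated outside the kernel. The following is a minimal user-space sketch using C11 atomics as an analogy; the struct layout and the names `poll_entry`, `callback_pollfree` and `teardown_needs_unregister` are hypothetical and are not taken from the patch. It assumes the callback side clears the pointer with a release store as its very last step, and the teardown side reads it with an acquire load.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures; not the real layout. */
struct wait_head {
	int placeholder;
};

struct poll_entry {
	_Atomic(struct wait_head *) whead;	/* NULL once the callback has detached */
	int events_seen;
};

/*
 * Callback side: finish every access to the entry first, then clear
 * whead with release semantics as the LAST step.
 */
static void callback_pollfree(struct poll_entry *pwq)
{
	pwq->events_seen++;
	atomic_store_explicit(&pwq->whead, NULL, memory_order_release);
	/* pwq must not be touched past this point: teardown may free it. */
}

/*
 * Teardown side: the acquire load pairs with the release store above.
 * Returns 1 if the caller still has to unregister from the wait head
 * (under that head's lock), 0 if the callback already detached.
 */
static int teardown_needs_unregister(struct poll_entry *pwq)
{
	struct wait_head *whead =
		atomic_load_explicit(&pwq->whead, memory_order_acquire);

	return whead != NULL;
}
```

If the store were issued earlier in the callback, teardown could observe NULL and free the entry while the callback was still using it, which is the shape of the race described above.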
      
      TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
      before this patch.
      
      Hopefully this explains the use-after-free reported by syzkaller:
      
      	BUG: KASAN: use-after-free in debug_spin_lock_before
      	...
      	 _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
      	 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148
      
      this is spin_lock(eventpoll->lock),
      
      	...
      	Freed by task 17774:
      	...
      	 kfree+0xe8/0x2c0 mm/slub.c:3883
      	 ep_free+0x22c/0x2a0 fs/eventpoll.c:865
      
      Fixes: 971316f0 ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
      Reported-by: 范龙飞 <long7573@126.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Sep, 2017 4 commits
  4. 31 Aug, 2017 2 commits
  5. 29 Aug, 2017 1 commit
    • fs/select: Fix memory corruption in compat_get_fd_set() · 79de3cbe
      Committed by Helge Deller
      Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to
      compat_{get,put}_bitmap()") changed the calculation of how many bytes
      need to be zeroed when userspace passes a NULL pointer for an fdset
      array in the select syscall.
      
      The calculation in compat_get_fd_set() was wrongly changed from
      	memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
      to
      	memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));
      
      ALIGN(nr, BITS_PER_LONG) yields the number of _bits_ that need to be
      zeroed in the target fdset array (rounded up to a whole number of
      unsigned longs).
      
      But the memset() call expects the number of _bytes_ to be zeroed.
      
      This clears more memory than intended (on the stack or even in
      kmalloc()ed memory areas) and leads to random kernel crashes, as seen
      on the parisc platform.
      
      The correct change should have been
      
      	memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);
      
      which is the same as can be achieved with a call to
      
      	zero_fd_set(nr, fdset).
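
The bits-versus-bytes mix-up can be checked with a small user-space sketch. The macros below are stand-ins defined locally for illustration (`ALIGN_UP` is a hypothetical name, and `BITS_PER_LONG` / `BYTES_PER_LONG` are derived from `sizeof(unsigned long)` rather than taken from kernel headers); the two helper functions are likewise hypothetical.

```c
#include <limits.h>
#include <stddef.h>

/* Local stand-ins, not the kernel definitions. */
#define BITS_PER_LONG	(sizeof(unsigned long) * CHAR_BIT)
#define BYTES_PER_LONG	(sizeof(unsigned long))
#define ALIGN_UP(x, a)	((((x) + (a) - 1) / (a)) * (a))

/* Buggy size: a rounded-up BIT count handed to memset() as if it were bytes. */
static size_t fdset_size_buggy(size_t nr)
{
	return ALIGN_UP(nr, BITS_PER_LONG);
}

/* Intended size: number of longs needed for nr bits, expressed in bytes. */
static size_t fdset_size_bytes(size_t nr)
{
	return ALIGN_UP(nr, BITS_PER_LONG) / BITS_PER_LONG * BYTES_PER_LONG;
}
```

For any nr, the buggy value is CHAR_BIT times the intended one, so the memset() would clear eight times as much memory as the fdset occupies on typical platforms.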
      
      Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()")
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 26 Aug, 2017 1 commit
    • dax: fix deadlock due to misaligned PMD faults · fffa281b
      Committed by Ross Zwisler
      In DAX there are two separate places where the 2MiB range of a PMD is
      defined.
      
      The first is in the page tables, where a PMD mapping inserted for a
      given address spans from (vmf->address & PMD_MASK) to ((vmf->address &
      PMD_MASK) + PMD_SIZE - 1).  That is, from the 2MiB boundary below the
      address to the 2MiB boundary above the address.
      
      So, for example, a fault at address 3MiB (0x30 0000) falls within the
      PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000).
      
      The second PMD range is in the mapping->page_tree, where a given file
      offset is covered by a radix tree entry that spans from one 2MiB aligned
      file offset to another 2MiB aligned file offset.
      
      So, for example, the file offset for 3MiB (pgoff 768) falls within the
      PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff
      512) to 4MiB (pgoff 1024).
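
The two range calculations above can be reproduced with a short sketch. It assumes 4KiB pages and 2MiB PMDs (as on x86-64) and defines the constants locally; the helper function names are hypothetical and only mirror the arithmetic in the text.

```c
#include <stdint.h>

/* Assumed configuration: 4KiB pages, 2MiB PMDs. Defined locally for illustration. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PMD_SIZE	(UINT64_C(1) << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))
#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)	/* 511 */

/* First byte covered by the page-table PMD containing 'address'. */
static uint64_t pmd_start_addr(uint64_t address)
{
	return address & PMD_MASK;
}

/* Last byte covered by the page-table PMD containing 'address'. */
static uint64_t pmd_end_addr(uint64_t address)
{
	return (address & PMD_MASK) + PMD_SIZE - 1;
}

/* First page offset covered by the order 9 radix tree entry containing 'pgoff'. */
static uint64_t pmd_start_pgoff(uint64_t pgoff)
{
	return pgoff & ~PG_PMD_COLOUR;
}
```

With these definitions, an address of 3MiB maps to the PMD starting at 2MiB, and pgoff 768 maps to the radix tree entry starting at pgoff 512, matching the examples above.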
      
      This system works so long as the addresses and file offsets for a given
      mapping both have the same offsets relative to the start of each PMD.
      
      Consider the case where the starting address for a given file isn't 2MiB
      aligned - say our faulting address is 3 MiB (0x30 0000), but that
      corresponds to the beginning of our file (pgoff 0).  Now all the PMDs in
      the mapping are misaligned so that the 2MiB range defined in the page
      tables never matches up with the 2MiB range defined in the radix tree.
      
      The current code notices this case for DAX faults to storage with the
      following test in dax_pmd_insert_mapping():
      
      	if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
      		goto unlock_fallback;
      
      This test makes sure that the pfn we get from the driver is 2MiB
      aligned, and relies on the assumption that the 2MiB alignment of the pfn
      we get back from the driver matches the 2MiB alignment of the faulting
      address.
      
      However, faults to holes were not checked and we could hit the problem
      described above.
      
      This was reported in response to the NVML nvml/src/test/pmempool_sync
      TEST5:
      
      	$ cd nvml/src/test/pmempool_sync
      	$ make TEST5
      
      You can grab NVML here:
      
      	https://github.com/pmem/nvml/
      
      The dmesg warning you see when you hit this error is:
      
        WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310
      
      Where we notice in dax_insert_mapping_entry() that the radix tree entry
      we are about to replace doesn't match the locked entry that we had
      previously inserted into the tree.  This happens because the initial
      insertion was done in grab_mapping_entry() using a pgoff calculated from
      the faulting address (vmf->address), and the replacement in
      dax_pmd_load_hole() => dax_insert_mapping_entry() is done using
      vmf->pgoff.
      
      In our failure case those two page offsets (one calculated from
      vmf->address, one using vmf->pgoff) point to different order 9 radix
      tree entries.
      
      This failure case can result in a deadlock because the radix tree unlock
      also happens on the pgoff calculated from vmf->address.  This means that
      the locked radix tree entry that we swapped in to the tree in
      dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all
      future faults to that 2MiB range will block forever.
      
      Fix this by validating that the faulting address's PMD offset matches
      the PMD offset from the start of the file.  This check is done at the
      very beginning of the fault and covers faults that would have mapped to
      storage as well as faults to holes.  I left the COLOUR check in
      dax_pmd_insert_mapping() in place in case we ever hit the insanity
      condition where the alignment of the pfn we get from the driver doesn't
      match the alignment of the userspace address.
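
The validation described above can be sketched as a comparison of the two "colours", that is, the page offset within the 2MiB range as seen from the faulting address and as seen from the file offset. This is an illustrative user-space version assuming 4KiB pages; `pmd_colour_matches` is a hypothetical helper name, not a function from the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed configuration: 4KiB pages, 2MiB PMDs. Defined locally for illustration. */
#define PAGE_SHIFT	12
#define PMD_SHIFT	21
#define PG_PMD_COLOUR	((UINT64_C(1) << (PMD_SHIFT - PAGE_SHIFT)) - 1)	/* 511 */

/*
 * True when the faulting address and the file page offset sit at the
 * same page position inside their respective 2MiB ranges. When this is
 * false, a PMD fault would have to fall back to smaller mappings.
 */
static bool pmd_colour_matches(uint64_t address, uint64_t pgoff)
{
	return (pgoff & PG_PMD_COLOUR) ==
	       ((address >> PAGE_SHIFT) & PG_PMD_COLOUR);
}
```

For the misaligned case from the text (faulting address 3MiB, pgoff 0) the colours are 256 and 0, so the check fails; for address 3MiB and pgoff 768 both colours are 256 and the check passes.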
      
      Link: http://lkml.kernel.org/r/20170822222436.18926-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: "Slusarz, Marcin" <marcin.slusarz@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 25 Aug, 2017 1 commit
    • nfsd: Limit end of page list when decoding NFSv4 WRITE · fc788f64
      Committed by Chuck Lever
      When processing an NFSv4 WRITE operation, argp->end should never
      point past the end of the data in the final page of the page list.
      Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
      
      More critically, nfsd4_decode_write is failing to increment argp->pagelen
      when it increments argp->pagelist.  This can cause later xdr decoders
      to assume more data is available than really is, which can cause server
      crashes on malformed requests.
      
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
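
The invariant described in the nfsd commit above, that the end pointer stays within the data actually present and that the length bookkeeping moves together with the page pointer, can be sketched in isolation. Everything below is hypothetical: `struct decode_args` is a simplified stand-in whose field names merely echo `argp->pagelist`, `argp->pagelen` and `argp->end`, `pagelen` is assumed to count the payload bytes still remaining in the page list, and `next_page` is not an nfsd function.

```c
#include <stddef.h>

#define PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Hypothetical, simplified decoder state; not the nfsd structure. */
struct decode_args {
	unsigned char **pagelist;	/* next page to decode from */
	size_t pagelen;			/* payload bytes left in the page list (assumption) */
	unsigned char *p;		/* current decode position */
	unsigned char *end;		/* one past the last valid byte in this page */
};

/*
 * Advance to the next page. The remaining length is updated in the same
 * step as the page pointer, and 'end' is clamped to the data actually
 * present, so a short final page never exposes bytes beyond its payload.
 */
static void next_page(struct decode_args *argp)
{
	size_t avail = argp->pagelen < PAGE_SIZE ? argp->pagelen : PAGE_SIZE;

	argp->p = *argp->pagelist++;
	argp->end = argp->p + avail;
	argp->pagelen -= avail;
}
```

With a payload of one full page plus 100 bytes, the first call leaves a 4096-byte window and the second leaves a 100-byte window, after which no payload remains.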