1. 07 Sep 2017, 24 commits
  2. 02 Sep 2017, 1 commit
    • epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() · 138e4ad6
      Committed by Oleg Nesterov
      The race was introduced by me in commit 971316f0 ("epoll:
      ep_unregister_pollwait() can use the freed pwq->whead").  I did not
      realize that nothing can protect eventpoll after ep_poll_callback() sets
      ->whead = NULL; only whead->lock can save us from the race with
      ep_free() or ep_remove().
      
      Move ->whead = NULL to the end of ep_poll_callback() and add the
      necessary barriers.
      
      TODO: clean up the ewake/EPOLLEXCLUSIVE logic; it was confusing even
      before this patch.
      
      Hopefully this explains the use-after-free reported by syzkaller:
      
      	BUG: KASAN: use-after-free in debug_spin_lock_before
      	...
      	 _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
      	 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148
      
      this is spin_lock(eventpoll->lock),
      
      	...
      	Freed by task 17774:
      	...
      	 kfree+0xe8/0x2c0 mm/slub.c:3883
      	 ep_free+0x22c/0x2a0 fs/eventpoll.c:865
      
      Fixes: 971316f0 ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
      Reported-by: 范龙飞 <long7573@126.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      138e4ad6
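
      A minimal user-space reduction of the race described above (the names and
      the pthread framing are invented for the sketch; the real code lives in
      fs/eventpoll.c): the callback "unregisters" itself before it is done
      touching the shared object, so the freeing side can win the race and free
      the object under its feet.  As the commit says, the cure is to make that
      store the last thing the callback does, with the necessary barriers.

        #include <pthread.h>
        #include <stdlib.h>

        struct obj { pthread_mutex_t lock; int data; };
        struct waiter { struct obj *published; struct obj *obj; };

        /* plays the ep_poll_callback() role */
        static void *callback(void *arg)
        {
                struct waiter *w = arg;

                w->published = NULL;                 /* "unregister" too early...          */
                pthread_mutex_lock(&w->obj->lock);   /* ...so this may be a use-after-free */
                w->obj->data++;
                pthread_mutex_unlock(&w->obj->lock);
                return NULL;
        }

        /* plays the ep_free()/ep_remove() role */
        static void *freer(void *arg)
        {
                struct waiter *w = arg;

                if (w->published == NULL)            /* believes the callback is finished */
                        free(w->obj);
                return NULL;
        }

        int main(void)
        {
                struct obj *o = calloc(1, sizeof(*o));
                struct waiter w = { .published = o, .obj = o };
                pthread_t a, b;

                pthread_mutex_init(&o->lock, NULL);
                pthread_create(&a, NULL, callback, &w);
                pthread_create(&b, NULL, freer, &w);
                pthread_join(a, NULL);
                pthread_join(b, NULL);
                return 0;
        }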
  3. 01 9月, 2017 5 次提交
  4. 31 8月, 2017 2 次提交
  5. 29 8月, 2017 1 次提交
    • fs/select: Fix memory corruption in compat_get_fd_set() · 79de3cbe
      Committed by Helge Deller
      Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to
      compat_{get,put}_bitmap()") changed the calculation of how many bytes
      need to be zeroed when userspace handed over a NULL pointer for an fdset
      array in the select syscall.
      
      The calculation in compat_get_fd_set() was wrongly changed from
      	memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
      to
      	memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));
      
      The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need
      to be zeroed in the target fdset array (rounded up to the next full bits
      for an unsigned long).
      
      But the memset() call expects the number of _bytes_ to be zeroed.
      
      This leads to clearing more memory than wanted (on the stack area or
      even at kmalloc()ed memory areas) and to random kernel crashes as we
      have seen them on the parisc platform.
      
      The correct change should have been
      
      	memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);
      
      which is the same as can be achieved with a call to
      
      	zero_fd_set(nr, fdset).
      
      Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()")
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      79de3cbe
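
      A small user-space arithmetic sketch of the mix-up above (ALIGN and
      BITS_PER_LONG are redefined locally just for the illustration): on a
      64-bit machine, ALIGN(nr, BITS_PER_LONG) is a count of bits, so handing
      it straight to memset() clears eight times as many bytes as intended.

        #include <stdio.h>

        #define BITS_PER_LONG   (8 * sizeof(unsigned long))
        #define ALIGN(x, a)     (((x) + (a) - 1) & ~((a) - 1))

        int main(void)
        {
                unsigned long nr = 100;    /* example: an fd set covering 100 bits */
                unsigned long bits = ALIGN(nr, BITS_PER_LONG);

                /* buggy length handed to memset(): 128, but interpreted as bytes */
                printf("ALIGN(nr, BITS_PER_LONG)  = %lu bits\n", bits);
                /* intended length (what zero_fd_set() would clear): 16 bytes */
                printf("bytes that need zeroing   = %lu\n", bits / 8);
                return 0;
        }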
  6. 28 Aug 2017, 1 commit
  7. 26 Aug 2017, 1 commit
    • dax: fix deadlock due to misaligned PMD faults · fffa281b
      Committed by Ross Zwisler
      In DAX there are two separate places where the 2MiB range of a PMD is
      defined.
      
      The first is in the page tables, where a PMD mapping inserted for a
      given address spans from (vmf->address & PMD_MASK) to ((vmf->address &
      PMD_MASK) + PMD_SIZE - 1).  That is, from the 2MiB boundary below the
      address to the 2MiB boundary above the address.
      
      So, for example, a fault at address 3MiB (0x30 0000) falls within the
      PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000).
      
      The second PMD range is in the mapping->page_tree, where a given file
      offset is covered by a radix tree entry that spans from one 2MiB aligned
      file offset to another 2MiB aligned file offset.
      
      So, for example, the file offset for 3MiB (pgoff 768) falls within the
      PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff
      512) to 4MiB (pgoff 1024).
      
      This system works so long as the addresses and file offsets for a given
      mapping both have the same offsets relative to the start of each PMD.
      
      Consider the case where the starting address for a given file isn't 2MiB
      aligned - say our faulting address is 3 MiB (0x30 0000), but that
      corresponds to the beginning of our file (pgoff 0).  Now all the PMDs in
      the mapping are misaligned so that the 2MiB range defined in the page
      tables never matches up with the 2MiB range defined in the radix tree.
      
      The current code notices this case for DAX faults to storage with the
      following test in dax_pmd_insert_mapping():
      
      	if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
      		goto unlock_fallback;
      
      This test makes sure that the pfn we get from the driver is 2MiB
      aligned, and relies on the assumption that the 2MiB alignment of the pfn
      we get back from the driver matches the 2MiB alignment of the faulting
      address.
      
      However, faults to holes were not checked and we could hit the problem
      described above.
      
      This was reported in response to the NVML nvml/src/test/pmempool_sync
      TEST5:
      
      	$ cd nvml/src/test/pmempool_sync
      	$ make TEST5
      
      You can grab NVML here:
      
      	https://github.com/pmem/nvml/
      
      The dmesg warning you see when you hit this error is:
      
        WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310
      
      Where we notice in dax_insert_mapping_entry() that the radix tree entry
      we are about to replace doesn't match the locked entry that we had
      previously inserted into the tree.  This happens because the initial
      insertion was done in grab_mapping_entry() using a pgoff calculated from
      the faulting address (vmf->address), and the replacement in
      dax_pmd_load_hole() => dax_insert_mapping_entry() is done using
      vmf->pgoff.
      
      In our failure case those two page offsets (one calculated from
      vmf->address, one using vmf->pgoff) point to different order 9 radix
      tree entries.
      
      This failure case can result in a deadlock because the radix tree unlock
      also happens on the pgoff calculated from vmf->address.  This means that
      the locked radix tree entry that we swapped in to the tree in
      dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all
      future faults to that 2MiB range will block forever.
      
      Fix this by validating that the faulting address's PMD offset matches
      the PMD offset from the start of the file.  This check is done at the
      very beginning of the fault and covers faults that would have mapped to
      storage as well as faults to holes.  I left the COLOUR check in
      dax_pmd_insert_mapping() in place in case we ever hit the insanity
      condition where the alignment of the pfn we get from the driver doesn't
      match the alignment of the userspace address.
      
      Link: http://lkml.kernel.org/r/20170822222436.18926-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: "Slusarz, Marcin" <marcin.slusarz@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      fffa281b
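
      A tiny user-space computation of the commit's own example (a fault at
      3 MiB that maps file offset 0), assuming x86-64's 4 KiB pages and 2 MiB
      PMDs; the colour mask is re-derived locally in the spirit of
      PG_PMD_COLOUR.  When the address-based colour and the pgoff-based colour
      disagree, the fault cannot be serviced with a PMD and must fall back.

        #include <stdio.h>

        #define PAGE_SHIFT      12      /* 4 KiB pages (x86-64) */
        #define PMD_SHIFT       21      /* 2 MiB PMDs (x86-64)  */
        #define PMD_COLOUR      ((1UL << (PMD_SHIFT - PAGE_SHIFT)) - 1)  /* 511 */

        int main(void)
        {
                unsigned long address = 0x300000;  /* faulting address: 3 MiB   */
                unsigned long pgoff   = 0;         /* ...which is file offset 0 */

                unsigned long addr_colour  = (address >> PAGE_SHIFT) & PMD_COLOUR;
                unsigned long pgoff_colour = pgoff & PMD_COLOUR;

                /* 256 vs 0: the page-table PMD and the radix-tree PMD cover
                 * different ranges, so the fault must not be handled as a PMD
                 * fault for this mapping. */
                printf("addr colour %lu, pgoff colour %lu -> %s\n",
                       addr_colour, pgoff_colour,
                       addr_colour == pgoff_colour ? "aligned" : "misaligned, fall back");
                return 0;
        }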
  8. 25 Aug 2017, 2 commits
    • nfsd: Limit end of page list when decoding NFSv4 WRITE · fc788f64
      Committed by Chuck Lever
      When processing an NFSv4 WRITE operation, argp->end should never
      point past the end of the data in the final page of the page list.
      Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
      
      More critically, nfsd4_decode_write is failing to increment argp->pagelen
      when it increments argp->pagelist.  This can cause later xdr decoders
      to assume more data is available than really is, which can cause server
      crashes on malformed requests.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      fc788f64
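
      A hedged user-space sketch of the invariant stated above (the helper name
      is made up; the real decoder is in fs/nfsd/nfs4xdr.c): the decode limit
      for the page just entered must be clamped to the data actually present in
      it, so argp->end never points past the end of the final page.

        #include <stdio.h>

        #define PAGE_SIZE 4096

        /* bytes that may be decoded from the page just entered */
        static unsigned int page_decode_limit(unsigned int bytes_left_in_list)
        {
                return bytes_left_in_list < PAGE_SIZE ? bytes_left_in_list : PAGE_SIZE;
        }

        int main(void)
        {
                /* a WRITE whose data ends 100 bytes into its final page */
                printf("full page:  decode up to %u bytes\n", page_decode_limit(PAGE_SIZE));
                printf("final page: decode up to %u bytes\n", page_decode_limit(100));
                return 0;
        }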
    • pty: Repair TIOCGPTPEER · 311fc65c
      Committed by Eric W. Biederman
      The implementation of TIOCGPTPEER has two issues.
      
      When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened, the wrong
      vfsmount is passed to dentry_open, which results in the kernel
      displaying the wrong pathname for the peer.
      
      The second is that, simply by caching the vfsmount and dentry of the peer,
      it leaves them open in a way they were not previously, which, because of
      the increased reference counts, can cause unnecessary behaviour
      differences resulting in regressions.
      
      To fix these, move the ioctl into tty_io.c at a generic level, allowing
      the ioctl to have access to the struct file on which the ioctl is
      being called.  This allows the path of the slave to be derived when
      opening the slave through TIOCGPTPEER instead of requiring the path to
      the slave to be cached, removing the need to cache the path.
      
      A new function devpts_ptmx_path is factored out of devpts_acquire and
      used to implement a function devpts_mntget.  The new function
      devpts_mntget takes a filp on which to perform the lookup, and an fsi so
      that it can confirm that the superblock found by devpts_ptmx_path is the
      proper superblock.
      
      v2: Lots of fixes to make the code actually work
      v3: Suggestions by Linus
          - Removed the unnecessary initialization of filp in ptm_open_peer
          - Simplified devpts_ptmx_path as gotos are no longer required
      
      [ This is the fix for the issue that was reverted in commit
        143c97cc, but this time without breaking 'pbuilder' due to
        increased reference counts   - Linus ]
      
      Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl")
      Reported-by: Christian Brauner <christian.brauner@canonical.com>
      Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      311fc65c
  9. 24 Aug 2017, 3 commits
    • Btrfs: fix blk_status_t/errno confusion · 58efbc9f
      Committed by Omar Sandoval
      This fixes several instances of blk_status_t and bare errno ints being
      mixed up, some of which are real bugs.
      
      In the normal case, 0 matches BLK_STS_OK, so we don't observe any
      effects of the missing conversion, but in case of errors or passes
      through the repair/retry paths, the errors get mixed up.
      
      The changes were identified using 'sparse'; we don't have reports of
      the buggy behaviour.
      
      Fixes: 4e4cbee9 ("block: switch bios to blk_status_t")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
      58efbc9f
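
      A user-space sketch of why the mix-up stays invisible on success but not
      on errors (the type and constants are re-declared locally only for the
      illustration; in the kernel BLK_STS_OK is 0, and the
      errno_to_blk_status()/blk_status_to_errno() helpers do the conversions):

        #include <stdio.h>
        #include <errno.h>

        typedef unsigned int blk_status_t;          /* stand-in for the kernel type */
        #define BLK_STS_OK      ((blk_status_t)0)
        #define BLK_STS_IOERR   ((blk_status_t)10)  /* illustrative value           */

        int main(void)
        {
                int err = -EIO;                           /* a bare errno...           */
                blk_status_t status = (blk_status_t)err;  /* ...stored as blk_status_t */

                /* success: 0 happens to equal BLK_STS_OK, so nothing is noticed */
                printf("success looks fine: %d\n", (blk_status_t)0 == BLK_STS_OK);

                /* error: the stored value is not BLK_STS_IOERR (or any valid status) */
                printf("error is garbled:   status=%u, is IOERR? %d\n",
                       status, status == BLK_STS_IOERR);
                return 0;
        }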
    • Revert "pty: fix the cached path of the pty slave file descriptor in the master" · 143c97cc
      Committed by Linus Torvalds
      This reverts commit c8c03f18.
      
      It turns out that while fixing the ptmx file descriptor to have the
      correct 'struct path' to the associated slave pty is a really good
      thing, it breaks some user space tools for a very annoying reason.
      
      The problem is that /dev/ptmx and its associated slave pty (/dev/pts/X)
      are on different mounts.  That was what caused us to have the wrong path
      in the first place (we would mix up the vfsmount of the 'ptmx' node with
      the dentry of the pty slave node), but it also means that now, while we
      use the right vfsmount, having the pty master open also keeps the pts
      mount busy.
      
      And it turns out that that makes 'pbuilder' very unhappy, as noted by
      Stefan Lippers-Hollmann:
      
       "This patch introduces a regression for me when using pbuilder
        0.228.7[2] (a helper to build Debian packages in a chroot and to
        create and update its chroots) when trying to umount /dev/ptmx (inside
        the chroot) on Debian/ unstable (full log and pbuilder configuration
        file[3] attached).
      
        [...]
        Setting up build-essential (12.3) ...
        Processing triggers for libc-bin (2.24-15) ...
        I: unmounting dev/ptmx filesystem
        W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
                (In some cases useful info about processes that
                 use the device is found by lsof(8) or fuser(1).)"
      
      apparently pbuilder tries to unmount the /dev/pts filesystem while still
      holding at least one master node open, which is arguably not very nice,
      but we don't break user space even when fixing other bugs.
      
      So this commit has to be reverted.
      
      I'll try to figure out a way to avoid caching the path to the slave pty
      in the master pty.  The only thing that actually wants that slave pty
      path is the "TIOCGPTPEER" ioctl, and I think we could just recreate the
      path at that time.
      Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: Eric W Biederman <ebiederm@xmission.com>
      Cc: Christian Brauner <christian.brauner@canonical.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      143c97cc
    • cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() · d3edede2
      Committed by Ronnie Sahlberg
      Add checking for the path component length and verify it is <= the maximum
      that the server advertises via FileFsAttributeInformation.
      
      With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
      when users try to access an overlong path.
      
      To test this, try to cd into a (non-existing) directory on a CIFS share
      that has a too long name:
      cd /mnt/aaaaaaaaaaaaaaa...
      
      and it now should show a good error message from the shell:
      bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long
      
      rh bz 1153996
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Cc: <stable@vger.kernel.org>
      d3edede2
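
      A simplified, hedged sketch of the kind of check the patch adds (the
      helper name and parameter are made up here; the real limit comes from the
      server's FileFsAttributeInformation response): reject an overlong
      component with -ENAMETOOLONG up front instead of sending it to the server
      and surfacing ENOENT.

        #include <errno.h>
        #include <string.h>

        /* max_component_len == 0 means the server advertised no limit */
        static int check_component(const char *name, size_t max_component_len)
        {
                if (max_component_len && strlen(name) > max_component_len)
                        return -ENAMETOOLONG;   /* clearer than a later ENOENT */
                return 0;
        }

        int main(void)
        {
                /* e.g. a server advertising a 255-byte component limit */
                return check_component("aaaaaaaaaaaaaaaa", 255) ? 1 : 0;
        }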