1. 07 Sep 2017, 8 commits
    • ocfs2: make ocfs2_set_acl() static · 01ffb56b
      Jan Kara authored
      The function is never called outside of fs/ocfs2/acl.c.
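
      The change itself follows the usual pattern for file-local helpers; a
      sketch with the parameter list abridged (illustrative, not the exact
      signature): drop the prototype from fs/ocfs2/acl.h and mark the
      definition in fs/ocfs2/acl.c static:

      	-int ocfs2_set_acl(handle_t *handle, struct inode *inode, /* ... */);
      	+static int ocfs2_set_acl(handle_t *handle, struct inode *inode, /* ... */)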
      
      Link: http://lkml.kernel.org/r/20170801141252.19675-2-jack@suse.cz
      Signed-off-by: Jan Kara <jack@suse.cz>
      Cc: Mark Fasheh <mfasheh@versity.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joseph Qi <jiangqi903@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: initialize variable pfn before using it · 2f52074d
      Nicolas Iooss authored
      dax_pmd_insert_mapping() contains the following code:
      
              pfn_t pfn;
              if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0)
                  goto fallback;
              /* ... */
          fallback:
            trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
      
      When bdev_dax_pgoff() fails (i.e. the if condition is true), the
      function calls trace_dax_pmd_insert_mapping_fallback() with an
      uninitialized pfn value.
      
      This issue has been found while building the kernel with clang.  The
      compiler reported:
      
          fs/dax.c:1280:6: error: variable 'pfn' is used uninitialized
          whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
              if (bdev_dax_pgoff(bdev, sector, size, &pgoff) != 0)
                  ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
          fs/dax.c:1310:60: note: uninitialized use occurs here
            trace_dax_pmd_insert_mapping_fallback(inode, vmf, length, pfn, ret);
                                                                           ^~~
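
      A minimal sketch of the corresponding fix, assuming the remedy is to
      zero-initialize at the declaration so the fallback trace sees a
      defined (if meaningless) value:

      	pfn_t pfn = {};	/* defined value for trace_dax_pmd_insert_mapping_fallback() */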
      
      Link: http://lkml.kernel.org/r/20170903083000.587-1-nicolas.iooss_linux@m4x.org
      Signed-off-by: Nicolas Iooss <nicolas.iooss_linux@m4x.org>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: use PG_PMD_COLOUR instead of open coding · 917f3452
      Ross Zwisler authored
      Use ~PG_PMD_COLOUR in dax_entry_waitqueue() instead of open coding an
      equivalent page offset mask.
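
      For reference, the mask is defined in fs/dax.c roughly as below, so
      the cleanup swaps a hand-written expression for the named constant (a
      sketch, not the exact diff):

      	/* The 'colour' (low bits) within a PMD of a page offset: */
      	#define PG_PMD_COLOUR	((PMD_SIZE >> PAGE_SHIFT) - 1)

      	index &= ~PG_PMD_COLOUR;	/* mask off the sub-PMD bits */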
      
      Link: http://lkml.kernel.org/r/20170822222436.18926-2-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Slusarz, Marcin" <marcin.slusarz@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: explain how read(2)/write(2) addresses are validated · a2e050f5
      Ross Zwisler authored
      Add a comment explaining how the user addresses provided to read(2) and
      write(2) are validated in the DAX I/O path.
      
      We call dax_copy_from_iter() or copy_to_iter() on these without calling
      access_ok() first in the DAX code, and there was a concern that the user
      might be able to read/write to arbitrary kernel addresses with this
      path.
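
      The validation the new comment documents happens inside the iov_iter
      helpers themselves; a simplified sketch of that check as it appeared
      in lib/iov_iter.c around this era (details approximate):

      	static int copyout(void __user *to, const void *from, size_t n)
      	{
      		if (access_ok(VERIFY_WRITE, to, n))	/* user range checked here */
      			n = raw_copy_to_user(to, from, n);
      		return n;
      	}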
      
      Link: http://lkml.kernel.org/r/20170816173615.10098-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: move all DAX radix tree defs to fs/dax.c · 527b19d0
      Ross Zwisler authored
      Now that we no longer insert struct page pointers in DAX radix trees the
      page cache code no longer needs to know anything about DAX exceptional
      entries.  Move all the DAX exceptional entry definitions from dax.h to
      fs/dax.c.
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-6-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: remove DAX code from page_cache_tree_insert() · d01ad197
      Ross Zwisler authored
      Now that we no longer insert struct page pointers in DAX radix trees we
      can remove the special casing for DAX in page_cache_tree_insert().
      
      This also allows us to make dax_wake_mapping_entry_waiter() local to
      fs/dax.c, removing it from dax.h.
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-5-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Suggested-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: use common 4k zero page for dax mmap reads · 91d25ba8
      Ross Zwisler authored
      When servicing mmap() reads from file holes the current DAX code
      allocates a page cache page of all zeroes and places the struct page
      pointer in the mapping->page_tree radix tree.
      
      This has three major drawbacks:
      
      1) It consumes memory unnecessarily. For every 4k page that is read via
         a DAX mmap() over a hole, we allocate a new page cache page. This
         means that if you read 1GiB worth of pages, you end up using 1GiB of
         zeroed memory. This is easily visible by looking at the overall
         memory consumption of the system or by looking at /proc/[pid]/smaps:
      
      	7f62e72b3000-7f63272b3000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:             1048576 kB
      	Pss:             1048576 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:   1048576 kB
      	Private_Dirty:         0 kB
      	Referenced:      1048576 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      2) It is slower than using a common zero page because each page fault
         has more work to do. Instead of just inserting a common zero page we
         have to allocate a page cache page, zero it, and then insert it. Here
         are the average latencies of dax_load_hole() as measured by ftrace on
         a random test box:
      
          Old method, using zeroed page cache pages:	3.4 us
          New method, using the common 4k zero page:	0.8 us
      
         This was the average latency over 1 GiB of sequential reads done by
         this simple fio script:
      
           [global]
           size=1G
           filename=/root/dax/data
           fallocate=none
           [io]
           rw=read
           ioengine=mmap
      
      3) The fact that we had to check for both DAX exceptional entries and
         for page cache pages in the radix tree made the DAX code more
         complex.
      
      Solve these issues by following the lead of the DAX PMD code and using a
      common 4k zero page instead.  As with the PMD code we will now insert a
      DAX exceptional entry into the radix tree instead of a struct page
      pointer which allows us to remove all the special casing in the DAX
      code.
      
      Note that we do still pretty aggressively check for regular pages in the
      DAX radix tree, especially where we take action based on the bits set in
      the page.  If we ever find a regular page in our radix tree now that
      most likely means that someone besides DAX is inserting pages (which has
      happened lots of times in the past), and we want to find that out early
      and fail loudly.
      
      This solution also removes the extra memory consumption.  Here is that
      same /proc/[pid]/smaps after 1GiB of reading from a hole with the new
      code:
      
      	7f2054a74000-7f2094a74000 rw-s 00000000 103:00 12   /root/dax/data
      	Size:            1048576 kB
      	Rss:                   0 kB
      	Pss:                   0 kB
      	Shared_Clean:          0 kB
      	Shared_Dirty:          0 kB
      	Private_Clean:         0 kB
      	Private_Dirty:         0 kB
      	Referenced:            0 kB
      	Anonymous:             0 kB
      	LazyFree:              0 kB
      	AnonHugePages:         0 kB
      	ShmemPmdMapped:        0 kB
      	Shared_Hugetlb:        0 kB
      	Private_Hugetlb:       0 kB
      	Swap:                  0 kB
      	SwapPss:               0 kB
      	KernelPageSize:        4 kB
      	MMUPageSize:           4 kB
      	Locked:                0 kB
      
      Overall system memory consumption is similarly improved.
      
      Another major change is that we remove dax_pfn_mkwrite() from our fault
      flow, and instead rely on the page fault itself to make the PTE dirty
      and writeable.  The following description from the patch adding the
      vm_insert_mixed_mkwrite() call explains this a little more:
      
         "To be able to use the common 4k zero page in DAX we need to have our
          PTE fault path look more like our PMD fault path where a PTE entry
          can be marked as dirty and writeable as it is first inserted rather
          than waiting for a follow-up dax_pfn_mkwrite() =>
          finish_mkwrite_fault() call.
      
          Right now we can rely on having a dax_pfn_mkwrite() call because we
          can distinguish between these two cases in do_wp_page():
      
                  case 1: 4k zero page => writable DAX storage
                  case 2: read-only DAX storage => writeable DAX storage
      
    This distinction is made via vm_normal_page(). vm_normal_page()
          returns false for the common 4k zero page, though, just as it does
          for DAX ptes. Instead of special casing the DAX + 4k zero page case
          we will simplify our DAX PTE page fault sequence so that it matches
          our DAX PMD sequence, and get rid of the dax_pfn_mkwrite() helper.
          We will instead use dax_iomap_fault() to handle write-protection
          faults.
      
          This means that insert_pfn() needs to follow the lead of
          insert_pfn_pmd() and allow us to pass in a 'mkwrite' flag. If
          'mkwrite' is set insert_pfn() will do the work that was previously
          done by wp_page_reuse() as part of the dax_pfn_mkwrite() call path"
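
      A sketch of the 'mkwrite' behaviour described in that quote, as it
      might look inside insert_pfn() (abridged and illustrative rather than
      the verbatim diff):

      	if (mkwrite) {
      		/* upgrade the PTE in place at insertion time instead of
      		 * waiting for a follow-up write-protection fault */
      		entry = pte_mkyoung(entry);
      		entry = maybe_mkwrite(pte_mkdirty(entry), vma);
      	}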
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-4-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dax: relocate some dax functions · e30331ff
      Ross Zwisler authored
      dax_load_hole() will soon need to call dax_insert_mapping_entry(), so it
      needs to be moved lower in dax.c so the definition exists.
      
      dax_wake_mapping_entry_waiter() will soon be removed from dax.h and be
      made static to dax.c, so we need to move its definition above all its
      callers.
      
      Link: http://lkml.kernel.org/r/20170724170616.25810-3-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: "Darrick J. Wong" <darrick.wong@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Andreas Dilger <adilger.kernel@dilger.ca>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 02 Sep 2017, 1 commit
    • epoll: fix race between ep_poll_callback(POLLFREE) and ep_free()/ep_remove() · 138e4ad6
      Oleg Nesterov authored
      The race was introduced by me in commit 971316f0 ("epoll:
      ep_unregister_pollwait() can use the freed pwq->whead").  I did not
      realize that nothing can protect eventpoll after ep_poll_callback()
      sets ->whead = NULL; only whead->lock can save us from the race with
      ep_free() or ep_remove().
      
      Move ->whead = NULL to the end of ep_poll_callback() and add the
      necessary barriers.
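
      A sketch of the ordering this implies in ep_poll_callback(), paired
      with an acquire load of ->whead on the ep_free()/ep_remove() side
      (illustrative, not the verbatim diff):

      	list_del_init(&wait->entry);
      	/*
      	 * ->whead != NULL is what protects us from ep_free()/ep_remove();
      	 * publish NULL only once we are done with the queue, pairing with
      	 * an smp_load_acquire() in ep_remove_wait_queue().
      	 */
      	smp_store_release(&ep_pwq_from_wait(wait)->whead, NULL);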
      
      TODO: cleanup the ewake/EPOLLEXCLUSIVE logic, it was confusing even
      before this patch.
      
      Hopefully this explains the use-after-free reported by syzkaller:
      
      	BUG: KASAN: use-after-free in debug_spin_lock_before
      	...
      	 _raw_spin_lock_irqsave+0x4a/0x60 kernel/locking/spinlock.c:159
      	 ep_poll_callback+0x29f/0xff0 fs/eventpoll.c:1148
      
      this is spin_lock(eventpoll->lock),
      
      	...
      	Freed by task 17774:
      	...
      	 kfree+0xe8/0x2c0 mm/slub.c:3883
      	 ep_free+0x22c/0x2a0 fs/eventpoll.c:865
      
      Fixes: 971316f0 ("epoll: ep_unregister_pollwait() can use the freed pwq->whead")
      Reported-by: 范龙飞 <long7573@126.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 01 Sep 2017, 5 commits
  4. 31 Aug 2017, 2 commits
  5. 29 Aug 2017, 1 commit
    • fs/select: Fix memory corruption in compat_get_fd_set() · 79de3cbe
      Helge Deller authored
      Commit 464d6242 ("select: switch compat_{get,put}_fd_set() to
      compat_{get,put}_bitmap()") changed the calculation of how many bytes
      need to be zeroed when userspace handed over a NULL pointer for a fdset
      array in the select syscall.
      
      The calculation was changed in compat_get_fd_set() wrongly from
      	memset(fdset, 0, ((nr + 1) & ~1)*sizeof(compat_ulong_t));
      to
      	memset(fdset, 0, ALIGN(nr, BITS_PER_LONG));
      
      The ALIGN(nr, BITS_PER_LONG) calculates the number of _bits_ which need
      to be zeroed in the target fdset array (rounded up to the next full bits
      for an unsigned long).
      
      But the memset() call expects the number of _bytes_ to be zeroed.
      
      This leads to clearing more memory than wanted (on the stack area or
      even at kmalloc()ed memory areas) and to random kernel crashes as we
      have seen on the parisc platform.
      
      The correct change should have been
      
      	memset(fdset, 0, (ALIGN(nr, BITS_PER_LONG) / BITS_PER_LONG) * BYTES_PER_LONG);
      
      which is the same as can be achieved with a call to
      
      	zero_fd_set(nr, fdset).
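
      The bits-versus-bytes mix-up is easy to demonstrate with userspace
      arithmetic; a small standalone demo (BITS_PER_LONG hardcoded to 64
      purely for illustration):

      	#include <stdio.h>

      	#define BITS_PER_LONG	64UL
      	#define ALIGN(x, a)	(((x) + (a) - 1UL) & ~((a) - 1UL))

      	int main(void)
      	{
      		unsigned long nr = 1024;	/* fd_set size in bits */

      		/* what the buggy memset() cleared: a count of BITS */
      		printf("bytes cleared by the bug: %lu\n",
      		       ALIGN(nr, BITS_PER_LONG));
      		/* what actually needed clearing: those bits, as bytes */
      		printf("bytes actually needed:    %lu\n",
      		       ALIGN(nr, BITS_PER_LONG) / 8);
      		return 0;	/* prints 1024 vs 128: an 8x overrun */
      	}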
      
      Fixes: 464d6242 ("select: switch compat_{get,put}_fd_set() to compat_{get,put}_bitmap()")
      Acked-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 28 Aug 2017, 1 commit
  7. 26 Aug 2017, 1 commit
    • dax: fix deadlock due to misaligned PMD faults · fffa281b
      Ross Zwisler authored
      In DAX there are two separate places where the 2MiB range of a PMD is
      defined.
      
      The first is in the page tables, where a PMD mapping inserted for a
      given address spans from (vmf->address & PMD_MASK) to ((vmf->address &
      PMD_MASK) + PMD_SIZE - 1).  That is, from the 2MiB boundary below the
      address to the 2MiB boundary above the address.
      
      So, for example, a fault at address 3MiB (0x30 0000) falls within the
      PMD that ranges from 2MiB (0x20 0000) to 4MiB (0x40 0000).
      
      The second PMD range is in the mapping->page_tree, where a given file
      offset is covered by a radix tree entry that spans from one 2MiB aligned
      file offset to another 2MiB aligned file offset.
      
      So, for example, the file offset for 3MiB (pgoff 768) falls within the
      PMD range for the order 9 radix tree entry that ranges from 2MiB (pgoff
      512) to 4MiB (pgoff 1024).
      
      This system works so long as the addresses and file offsets for a given
      mapping both have the same offsets relative to the start of each PMD.
      
      Consider the case where the starting address for a given file isn't 2MiB
      aligned - say our faulting address is 3 MiB (0x30 0000), but that
      corresponds to the beginning of our file (pgoff 0).  Now all the PMDs in
      the mapping are misaligned so that the 2MiB range defined in the page
      tables never matches up with the 2MiB range defined in the radix tree.
      
      The current code notices this case for DAX faults to storage with the
      following test in dax_pmd_insert_mapping():
      
      	if (pfn_t_to_pfn(pfn) & PG_PMD_COLOUR)
      		goto unlock_fallback;
      
      This test makes sure that the pfn we get from the driver is 2MiB
      aligned, and relies on the assumption that the 2MiB alignment of the pfn
      we get back from the driver matches the 2MiB alignment of the faulting
      address.
      
      However, faults to holes were not checked and we could hit the problem
      described above.
      
      This was reported in response to the NVML nvml/src/test/pmempool_sync
      TEST5:
      
      	$ cd nvml/src/test/pmempool_sync
      	$ make TEST5
      
      You can grab NVML here:
      
      	https://github.com/pmem/nvml/
      
      The dmesg warning you see when you hit this error is:
      
        WARNING: CPU: 13 PID: 2900 at fs/dax.c:641 dax_insert_mapping_entry+0x2df/0x310
      
      Where we notice in dax_insert_mapping_entry() that the radix tree entry
      we are about to replace doesn't match the locked entry that we had
      previously inserted into the tree.  This happens because the initial
      insertion was done in grab_mapping_entry() using a pgoff calculated from
      the faulting address (vmf->address), and the replacement in
      dax_pmd_load_hole() => dax_insert_mapping_entry() is done using
      vmf->pgoff.
      
      In our failure case those two page offsets (one calculated from
      vmf->address, one using vmf->pgoff) point to different order 9 radix
      tree entries.
      
      This failure case can result in a deadlock because the radix tree unlock
      also happens on the pgoff calculated from vmf->address.  This means that
      the locked radix tree entry that we swapped in to the tree in
      dax_insert_mapping_entry() using vmf->pgoff is never unlocked, so all
      future faults to that 2MiB range will block forever.
      
      Fix this by validating that the faulting address's PMD offset matches
      the PMD offset from the start of the file.  This check is done at the
      very beginning of the fault and covers faults that would have mapped to
      storage as well as faults to holes.  I left the COLOUR check in
      dax_pmd_insert_mapping() in place in case we ever hit the insanity
      condition where the alignment of the pfn we get from the driver doesn't
      match the alignment of the userspace address.
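
      A sketch of the described validation at the top of the PMD fault
      handler (illustrative; the real patch may differ in detail):

      	/* The sub-PMD offset ("colour") of the faulting address must
      	 * match the sub-PMD offset of the file position, or the page
      	 * table range and the radix tree range can never line up. */
      	if ((vmf->pgoff & PG_PMD_COLOUR) !=
      	    ((vmf->address >> PAGE_SHIFT) & PG_PMD_COLOUR))
      		goto fallback;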
      
      Link: http://lkml.kernel.org/r/20170822222436.18926-1-ross.zwisler@linux.intel.com
      Signed-off-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: "Slusarz, Marcin" <marcin.slusarz@intel.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <mawilcox@microsoft.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 25 Aug 2017, 2 commits
    • nfsd: Limit end of page list when decoding NFSv4 WRITE · fc788f64
      Chuck Lever authored
      When processing an NFSv4 WRITE operation, argp->end should never
      point past the end of the data in the final page of the page list.
      Otherwise, nfsd4_decode_compound can walk into uninitialized memory.
      
      More critically, nfsd4_decode_write is failing to increment argp->pagelen
      when it increments argp->pagelist.  This can cause later xdr decoders
      to assume more data is available than really is, which can cause server
      crashes on malformed requests.
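
      A sketch of both rules together when the decoder advances to the next
      page of the argument list (approximate shape, not the verbatim patch):

      	argp->p = page_address(argp->pagelist[0]);
      	argp->pagelist++;			/* consume a page... */
      	if (argp->pagelen < PAGE_SIZE) {
      		/* ...clamping argp->end to the data actually present */
      		argp->end = argp->p + XDR_QUADLEN(argp->pagelen);
      		argp->pagelen = 0;
      	} else {
      		argp->end = argp->p + (PAGE_SIZE >> 2);
      		argp->pagelen -= PAGE_SIZE;	/* ...keeping the count in step */
      	}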
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • pty: Repair TIOCGPTPEER · 311fc65c
      Eric W. Biederman authored
      The implementation of TIOCGPTPEER has two issues.
      
      When /dev/ptmx (as opposed to /dev/pts/ptmx) is opened, the wrong
      vfsmount is passed to dentry_open, which results in the kernel
      displaying the wrong pathname for the peer.
      
      The second is that simply by caching the vfsmount and dentry of the
      peer it leaves them open, in a way they were not previously, which,
      because of the increased reference counts, can cause unnecessary
      behaviour differences resulting in regressions.
      
      To fix these, move the ioctl into tty_io.c at a generic level, allowing
      the ioctl to have access to the struct file on which the ioctl is
      being called.  This allows the path of the slave to be derived when
      opening the slave through TIOCGPTPEER instead of requiring the path to
      the slave to be cached, removing the need for caching it.
      
      A new function, devpts_ptmx_path, is factored out of devpts_acquire
      and used to implement devpts_mntget.  The new function devpts_mntget
      takes a filp to perform the lookup on, plus fsi, so that it can confirm
      that the superblock found by devpts_ptmx_path is the proper superblock.
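
      A sketch of the resulting helper's shape, using the names from the
      description above (illustrative; the exact signature may differ):

      	/* Resolve and pin the pts mount from the file used to open the
      	 * master, verifying it against the expected superblock info. */
      	struct vfsmount *devpts_mntget(struct file *filp, struct pts_fs_info *fsi);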
      
      v2: Lots of fixes to make the code actually work
      v3: Suggestions by Linus
          - Removed the unnecessary initialization of filp in ptm_open_peer
          - Simplified devpts_ptmx_path as gotos are no longer required
      
      [ This is the fix for the issue that was reverted in commit
        143c97cc, but this time without breaking 'pbuilder' due to
        increased reference counts   - Linus ]
      
      Fixes: 54ebbfb1 ("tty: add TIOCGPTPEER ioctl")
      Reported-by: Christian Brauner <christian.brauner@canonical.com>
      Reported-and-tested-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 24 Aug 2017, 4 commits
    • Btrfs: fix blk_status_t/errno confusion · 58efbc9f
      Omar Sandoval authored
      This fixes several instances of blk_status_t and bare errno ints being
      mixed up, some of which are real bugs.
      
      In the normal case, 0 matches BLK_STS_OK, so we don't observe any
      effects of the missing conversion, but in case of errors or passes
      through the repair/retry paths, the errors get mixed up.
      
      The changes were identified using 'sparse'; we don't have reports of
      the buggy behaviour.
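
      The boundary between the two error domains is crossed with the block
      layer's dedicated conversion helpers, so the typical fix is mechanical
      (sketch):

      	blk_status_t status = errno_to_blk_status(ret);	/* int -> blk_status_t */
      	int err = blk_status_to_errno(bio->bi_status);	/* blk_status_t -> int */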
      
      Fixes: 4e4cbee9 ("block: switch bios to blk_status_t")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Reviewed-by: David Sterba <dsterba@suse.com>
      Signed-off-by: David Sterba <dsterba@suse.com>
    • Revert "pty: fix the cached path of the pty slave file descriptor in the master" · 143c97cc
      Linus Torvalds authored
      This reverts commit c8c03f18.
      
      It turns out that while fixing the ptmx file descriptor to have the
      correct 'struct path' to the associated slave pty is a really good
      thing, it breaks some user space tools for a very annoying reason.
      
      The problem is that /dev/ptmx and its associated slave pty (/dev/pts/X)
      are on different mounts.  That was what caused us to have the wrong path
      in the first place (we would mix up the vfsmount of the 'ptmx' node,
      with the dentry of the pty slave node), but it also means that now while
      we use the right vfsmount, having the pty master open also keeps the pts
      mount busy.
      
      And it turns out that that makes 'pbuilder' very unhappy, as noted by
      Stefan Lippers-Hollmann:
      
       "This patch introduces a regression for me when using pbuilder
        0.228.7[2] (a helper to build Debian packages in a chroot and to
        create and update its chroots) when trying to umount /dev/ptmx (inside
        the chroot) on Debian/ unstable (full log and pbuilder configuration
        file[3] attached).
      
        [...]
        Setting up build-essential (12.3) ...
        Processing triggers for libc-bin (2.24-15) ...
        I: unmounting dev/ptmx filesystem
        W: Could not unmount dev/ptmx: umount: /var/cache/pbuilder/build/1340/dev/ptmx: target is busy
                (In some cases useful info about processes that
                 use the device is found by lsof(8) or fuser(1).)"
      
      Apparently pbuilder tries to unmount the /dev/pts filesystem while still
      holding at least one master node open, which is arguably not very nice,
      but we don't break user space even when fixing other bugs.
      
      So this commit has to be reverted.
      
      I'll try to figure out a way to avoid caching the path to the slave pty
      in the master pty.  The only thing that actually wants that slave pty
      path is the "TIOCGPTPEER" ioctl, and I think we could just recreate the
      path at that time.
      Reported-by: Stefan Lippers-Hollmann <s.l-h@gmx.de>
      Cc: Eric W Biederman <ebiederm@xmission.com>
      Cc: Christian Brauner <christian.brauner@canonical.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cifs: return ENAMETOOLONG for overlong names in cifs_open()/cifs_lookup() · d3edede2
      Ronnie Sahlberg authored
      Add checking for the path component length and verify it is <= the
      maximum that the server advertises via FileFsAttributeInformation.
      
      With this patch cifs.ko will now return ENAMETOOLONG instead of ENOENT
      when users try to access an overlong path.
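
      A sketch of the described check (field names as remembered from the
      era's cifs headers, so treat them as approximate):

      	if (unlikely(direntry->d_name.len >
      		     le32_to_cpu(tcon->fsAttrInfo.MaxPathNameComponentLength)))
      		return -ENAMETOOLONG;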
      
      To test this, try to cd into a (non-existing) directory on a CIFS share
      that has a too long name:
      cd /mnt/aaaaaaaaaaaaaaa...
      
      and it now should show a good error message from the shell:
      bash: cd: /mnt/aaaaaaaaaaaaaaaa...aaaaaa: File name too long
      
      rh bz 1153996
      Signed-off-by: Ronnie Sahlberg <lsahlber@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Cc: <stable@vger.kernel.org>
    • cifs: Fix df output for users with quota limits · 42bec214
      Sachin Prabhu authored
      The df for a SMB2 share triggers a GetInfo call for
      FS_FULL_SIZE_INFORMATION. The values returned are used to populate
      struct statfs.
      
      The problem is that none of the information returned by the call
      contains the total blocks available on the filesystem. Instead we use
      the blocks available to the user, i.e. the quota limitation, when
      filling out statfs.f_blocks. The information returned does contain
      Actual free units on the filesystem and is used to populate
      statfs.f_bfree. For users with quota enabled, this can lead to
      situations where the total free space reported is more than the total
      blocks on the system, ending up with df reports like the following:
      
       # df -h /mnt/a
      Filesystem         Size  Used Avail Use% Mounted on
      //192.168.22.10/a  2.5G -2.3G  2.5G    - /mnt/a
      
      To fix this problem, we instead populate statfs.f_bfree with the same
      value as statfs.f_bavail, i.e. CallerAvailableAllocationUnits. This is
      similar to what is already done in the code for cifs, and df now
      reports the quota information for the user used to mount the share.
      
       # df --si /mnt/a
      Filesystem         Size  Used Avail Use% Mounted on
      //192.168.22.10/a  2.7G  101M  2.6G   4% /mnt/a
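
      A sketch of the change where the SMB2 FS_FULL_SIZE_INFORMATION reply
      is copied into struct kstatfs (illustrative, not the verbatim patch):

      	buf->f_bavail = le64_to_cpu(pfs_inf->CallerAvailableAllocationUnits);
      	buf->f_bfree  = buf->f_bavail;	/* quota-limited view for both */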
      Signed-off-by: Sachin Prabhu <sprabhu@redhat.com>
      Signed-off-by: Pierguido Lambri <plambri@redhat.com>
      Signed-off-by: Steve French <smfrench@gmail.com>
      Cc: <stable@vger.kernel.org>
  10. 18 Aug 2017, 3 commits
    • xfs: don't leak quotacheck dquots when cow recovery · 77aff8c7
      Darrick J. Wong authored
      If we fail a mount on account of cow recovery errors, it's possible that
      a previous quotacheck left some dquots in memory.  The bailout clause of
      xfs_mountfs forgets to purge these, and so we leak them.  Fix that.
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • xfs: clear MS_ACTIVE after finishing log recovery · 8204f8dd
      Darrick J. Wong authored
      Way back when we established inode block-map redo log items, it was
      discovered that we needed to prevent the VFS from evicting inodes during
      log recovery because any given inode might be have bmap redo items to
      replay even if the inode has no link count and is ultimately deleted,
      and any eviction of an unlinked inode causes the inode to be truncated
      and freed too early.
      
      To make this possible, we set MS_ACTIVE so that inodes would not be torn
      down immediately upon release.  Unfortunately, this also results in the
      quota inodes not being released at all if a later part of the mount
      process should fail, because we never reclaim the inodes.  So, set
      MS_ACTIVE right before we do the last part of log recovery and clear it
      immediately after we finish the log recovery so that everything
      will be torn down properly if we abort the mount.
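
      A sketch of the scoping described above (the recovery call name here
      is illustrative, not the verbatim patch):

      	mp->m_super->s_flags |= MS_ACTIVE;	/* keep inodes off eviction */
      	error = xlog_recover_finish(mp->m_log);	/* replay bmap redo items */
      	mp->m_super->s_flags &= ~MS_ACTIVE;	/* normal teardown from here on */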
      
      Fixes: 17c12bcd ("xfs: when replaying bmap operations, don't let unlinked inodes get reaped")
      Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
      Reviewed-by: Brian Foster <bfoster@redhat.com>
    • pty: fix the cached path of the pty slave file descriptor in the master · c8c03f18
      Linus Torvalds authored
      Christian Brauner reported that if you use the TIOCGPTPEER ioctl() to
      get a slave pty file descriptor, the resulting file descriptor doesn't
      look right in /proc/<pid>/fd/<fd>.  In particular, he wanted to use
      readlink() on /proc/self/fd/<fd> to get the pathname of the slave pty
      (basically implementing "ptsname{_r}()").
      
      The reason for that was that we had generated the wrong 'struct path'
      when we create the pty in ptmx_open().
      
      In particular, the dentry was correct, but the vfsmount pointed to the
      mount of the ptmx node. That _can_ be correct - in case you use
      "/dev/pts/ptmx" to open the master - but usually is not.  The normal
      case is to use /dev/ptmx, which then looks up the pts/ directory, and
      then the vfsmount of the ptmx node is obviously the /dev directory, not
      the /dev/pts/ directory.
      
      We actually did have the right vfsmount available, but in the wrong
      place (it gets looked up in 'devpts_acquire()' when we get a reference
      to the pts filesystem), and so ptmx_open() used the wrong mnt pointer.
      
      The end result of this confusion was that the pty worked fine, but if
      you did TIOCGPTPEER to get the slave side of the pty, the end result
      would also work, but have that dodgy 'struct path'.
      
      And then when doing d_path() on it to get the pathname, the vfsmount
      would not match the root of the pts directory, and d_path() would return
      an empty pathname thinking that the entry had escaped a bind mount into
      another mount.
      
      This fixes the problem by making devpts_acquire() return the vfsmount
      for the pts filesystem, allowing ptmx_open() to trivially just use the
      right mount for the pts dentry, and create the proper 'struct path'.
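
      A sketch of the corrected construction in ptmx_open() (pts_mnt and
      slave_dentry are hypothetical names here for the mount that
      devpts_acquire() now returns and the slave's dentry; treat the whole
      fragment as illustrative):

      	/* Pair the slave dentry with the mount of the pts filesystem,
      	 * not with the mount the ptmx node happened to live on. */
      	path.mnt = mntget(pts_mnt);		/* hypothetical variable name */
      	path.dentry = dget(slave_dentry);	/* hypothetical variable name */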
      Reported-by: Christian Brauner <christian.brauner@ubuntu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Acked-by: Eric Biederman <ebiederm@xmission.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. 17 Aug 2017, 1 commit
  12. 14 Aug 2017, 2 commits
  13. 12 Aug 2017, 3 commits
  14. 11 Aug 2017, 4 commits
  15. 10 Aug 2017, 2 commits