1. 08 10月, 2015 4 次提交
  2. 07 10月, 2015 1 次提交
  3. 04 10月, 2015 1 次提交
  4. 03 10月, 2015 6 次提交
  5. 02 10月, 2015 2 次提交
    • S
      [SMB3] Do not fall back to SMBWriteX in set_file_size error cases · 646200a0
      Steve French 提交于
      The error paths in set_file_size for cifs and smb3 are incorrect.
      
      In the unlikely event that a server did not support set file info
      of the file size, the code incorrectly falls back to trying SMBWriteX
      (note that only the original core SMB Write, used for example by DOS,
      can set the file size this way - this actually  does not work for the more
      recent SMBWriteX).  The idea was since the old DOS SMB Write could set
      the file size if you write zero bytes at that offset then use that if
      server rejects the normal set file info call.
      
      Fortunately the SMBWriteX will never be sent on the wire (except when
      file size is zero) since the length and offset fields were reversed
      in the two places in this function that call SMBWriteX causing
      the fall back path to return an error. It is also important to never call
      an SMB request from an SMB2/sMB3 session (which theoretically would
      be possible, and can cause a brief session drop, although the client
      recovers) so this should be fixed.  In practice this path does not happen
      with modern servers but the error fall back to SMBWriteX is clearly wrong.
      
      Removing the calls to SMBWriteX in the error paths in cifs_set_file_size
      
      Pointed out by PaX/grsecurity team
      Signed-off-by: NSteve French <steve.french@primarydata.com>
      Reported-by: NPaX Team <pageexec@freemail.hu>
      CC: Emese Revfy <re.emese@gmail.com>
      CC: Brad Spengler <spender@grsecurity.net>
      CC: Stable <stable@vger.kernel.org>
      646200a0
    • R
      dax: fix NULL pointer in __dax_pmd_fault() · 8346c416
      Ross Zwisler 提交于
      Commit 46c043ed ("mm: take i_mmap_lock in unmap_mapping_range() for
      DAX") moved some code in __dax_pmd_fault() that was responsible for
      zeroing newly allocated PMD pages.  The new location didn't properly set
      up 'kaddr', so when run this code resulted in a NULL pointer BUG.
      
      Fix this by getting the correct 'kaddr' via bdev_direct_access().
      Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com>
      Reported-by: NDan Williams <dan.j.williams@intel.com>
      Reviewed-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8346c416
  6. 29 9月, 2015 1 次提交
    • R
      UBIFS: Kill unneeded locking in ubifs_init_security · cf6f54e3
      Richard Weinberger 提交于
      Fixes the following lockdep splat:
      [    1.244527] =============================================
      [    1.245193] [ INFO: possible recursive locking detected ]
      [    1.245193] 4.2.0-rc1+ #37 Not tainted
      [    1.245193] ---------------------------------------------
      [    1.245193] cp/742 is trying to acquire lock:
      [    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
      [    1.245193]
      [    1.245193] but task is already holding lock:
      [    1.245193]  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
      [    1.245193]
      [    1.245193] other info that might help us debug this:
      [    1.245193]  Possible unsafe locking scenario:
      [    1.245193]
      [    1.245193]        CPU0
      [    1.245193]        ----
      [    1.245193]   lock(&sb->s_type->i_mutex_key#9);
      [    1.245193]   lock(&sb->s_type->i_mutex_key#9);
      [    1.245193]
      [    1.245193]  *** DEADLOCK ***
      [    1.245193]
      [    1.245193]  May be due to missing lock nesting notation
      [    1.245193]
      [    1.245193] 2 locks held by cp/742:
      [    1.245193]  #0:  (sb_writers#5){.+.+.+}, at: [<ffffffff811ad37f>] mnt_want_write+0x1f/0x50
      [    1.245193]  #1:  (&sb->s_type->i_mutex_key#9){+.+.+.}, at: [<ffffffff81198e7f>] path_openat+0x3af/0x1280
      [    1.245193]
      [    1.245193] stack backtrace:
      [    1.245193] CPU: 2 PID: 742 Comm: cp Not tainted 4.2.0-rc1+ #37
      [    1.245193] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140816_022509-build35 04/01/2014
      [    1.245193]  ffffffff8252d530 ffff88007b023a38 ffffffff814f6f49 ffffffff810b56c5
      [    1.245193]  ffff88007c30cc80 ffff88007b023af8 ffffffff810a150d ffff88007b023a68
      [    1.245193]  000000008101302a ffff880000000000 00000008f447e23f ffffffff8252d500
      [    1.245193] Call Trace:
      [    1.245193]  [<ffffffff814f6f49>] dump_stack+0x4c/0x65
      [    1.245193]  [<ffffffff810b56c5>] ? console_unlock+0x1c5/0x510
      [    1.245193]  [<ffffffff810a150d>] __lock_acquire+0x1a6d/0x1ea0
      [    1.245193]  [<ffffffff8109fa78>] ? __lock_is_held+0x58/0x80
      [    1.245193]  [<ffffffff810a1a93>] lock_acquire+0xd3/0x270
      [    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff814fc83b>] mutex_lock_nested+0x6b/0x3a0
      [    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff812b3f69>] ? ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff812b3f69>] ubifs_init_security+0x29/0xb0
      [    1.245193]  [<ffffffff8128e286>] ubifs_create+0xa6/0x1f0
      [    1.245193]  [<ffffffff81198e7f>] ? path_openat+0x3af/0x1280
      [    1.245193]  [<ffffffff81195d15>] vfs_create+0x95/0xc0
      [    1.245193]  [<ffffffff8119929c>] path_openat+0x7cc/0x1280
      [    1.245193]  [<ffffffff8109ffe3>] ? __lock_acquire+0x543/0x1ea0
      [    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
      [    1.245193]  [<ffffffff81088c00>] ? calc_global_load_tick+0x60/0x90
      [    1.245193]  [<ffffffff81088f20>] ? sched_clock_cpu+0x90/0xc0
      [    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
      [    1.245193]  [<ffffffff8119ac55>] do_filp_open+0x75/0xd0
      [    1.245193]  [<ffffffff814ffd86>] ? _raw_spin_unlock+0x26/0x40
      [    1.245193]  [<ffffffff811a9cef>] ? __alloc_fd+0xaf/0x180
      [    1.245193]  [<ffffffff81189bd9>] do_sys_open+0x129/0x200
      [    1.245193]  [<ffffffff81189cc9>] SyS_open+0x19/0x20
      [    1.245193]  [<ffffffff81500717>] entry_SYSCALL_64_fastpath+0x12/0x6f
      
      While the lockdep splat is a false positive, becuase path_openat holds i_mutex
      of the parent directory and ubifs_init_security() tries to acquire i_mutex
      of a new inode, it reveals that taking i_mutex in ubifs_init_security() is
      in vain because it is only being called in the inode allocation path
      and therefore nobody else can see the inode yet.
      
      Cc: stable@vger.kernel.org # 3.20-
      Reported-and-tested-by: NBoris Brezillon <boris.brezillon@free-electrons.com>
      Reviewed-and-tested-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com>
      Signed-off-by: NRichard Weinberger <richard@nod.at>
      Signed-off-by: dedekind1@gmail.com
      cf6f54e3
  7. 26 9月, 2015 1 次提交
  8. 24 9月, 2015 2 次提交
  9. 23 9月, 2015 7 次提交
    • P
      NFS41: make close wait for layoutreturn · 500d701f
      Peng Tao 提交于
      If we send a layoutreturn asynchronously before close, the close
      might reach server first and layoutreturn would fail with BADSTATEID
      because there is nothing keeping the layout stateid alive.
      
      Also do not pretend sending layoutreturn if we are not.
      Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      500d701f
    • J
      ocfs2/dlm: fix deadlock when dispatch assert master · 012572d4
      Joseph Qi 提交于
      The order of the following three spinlocks should be:
      dlm_domain_lock < dlm_ctxt->spinlock < dlm_lock_resource->spinlock
      
      But dlm_dispatch_assert_master() is called while holding
      dlm_ctxt->spinlock and dlm_lock_resource->spinlock, and then it calls
      dlm_grab() which will take dlm_domain_lock.
      
      Once another thread (for example, dlm_query_join_handler) has already
      taken dlm_domain_lock, and tries to take dlm_ctxt->spinlock deadlock
      happens.
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: "Junxiao Bi" <junxiao.bi@oracle.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      012572d4
    • A
      userfaultfd: revert "userfaultfd: waitqueue: add nr wake parameter to __wake_up_locked_key" · ac5be6b4
      Andrea Arcangeli 提交于
      This reverts commit 51360155 and adapts
      fs/userfaultfd.c to use the old version of that function.
      
      It didn't look robust to call __wake_up_common with "nr == 1" when we
      absolutely require wakeall semantics, but we've full control of what we
      insert in the two waitqueue heads of the blocked userfaults.  No
      exclusive waitqueue risks to be inserted into those two waitqueue heads
      so we can as well stick to "nr == 1" of the old code and we can rely
      purely on the fact no waitqueue inserted in one of the two waitqueue
      heads we must enforce as wakeall, has wait->flags WQ_FLAG_EXCLUSIVE set.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Thierry Reding <treding@nvidia.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac5be6b4
    • K
      NFS: Skip checking ds_cinfo.buckets when lseg's commit_through_mds is set · 834e465b
      Kinglong Mee 提交于
      When lseg's commit_through_mds is set, pnfs client always WARN once
      in nfs_direct_select_verf after checking ds_cinfo.nbuckets.
      
      nfs should use the DS verf except commit_through_mds is set for
      layout segment where nbuckets is zero.
      
      [17844.666094] ------------[ cut here ]------------
      [17844.667071] WARNING: CPU: 0 PID: 21758 at /root/source/linux-pnfs/fs/nfs/direct.c:174 nfs_direct_select_verf+0x5a/0x70 [nfs]()
      [17844.668650] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) nfsd(OE) xfs libcrc32c btrfs ppdev coretemp crct10dif_pclmul auth_rpcgss crc32_pclmul crc32c_intel nfs_acl ghash_clmulni_intel lockd vmw_balloon xor vmw_vmci grace raid6_pq shpchp sunrpc parport_pc i2c_piix4 parport vmwgfx drm_kms_helper ttm drm serio_raw mptspi e1000 scsi_transport_spi mptscsih mptbase ata_generic pata_acpi [last unloaded: fscache]
      [17844.686676] CPU: 0 PID: 21758 Comm: kworker/0:1 Tainted: G        W  OE   4.3.0-rc1-pnfs+ #245
      [17844.687352] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [17844.698502] Workqueue: nfsiod rpc_async_release [sunrpc]
      [17844.699212]  0000000000000009 0000000043e58010 ffff8800454fbc10 ffffffff813680c4
      [17844.699990]  ffff8800454fbc48 ffffffff8108b49d ffff88004eb20000 ffff88004eb20000
      [17844.700844]  ffff880062e26000 0000000000000000 0000000000000001 ffff8800454fbc58
      [17844.701637] Call Trace:
      [17844.725252]  [<ffffffff813680c4>] dump_stack+0x19/0x25
      [17844.732693]  [<ffffffff8108b49d>] warn_slowpath_common+0x7d/0xb0
      [17844.733855]  [<ffffffff8108b5da>] warn_slowpath_null+0x1a/0x20
      [17844.735015]  [<ffffffffa04a27ca>] nfs_direct_select_verf+0x5a/0x70 [nfs]
      [17844.735999]  [<ffffffffa04a2b83>] nfs_direct_set_hdr_verf+0x23/0x90 [nfs]
      [17844.736846]  [<ffffffffa04a2e17>] nfs_direct_write_completion+0x227/0x260 [nfs]
      [17844.737782]  [<ffffffffa04a433c>] nfs_pgio_release+0x1c/0x20 [nfs]
      [17844.738597]  [<ffffffffa0502df3>] pnfs_generic_rw_release+0x23/0x30 [nfsv4]
      [17844.739486]  [<ffffffffa01cbbea>] rpc_free_task+0x2a/0x70 [sunrpc]
      [17844.740326]  [<ffffffffa01cbcd5>] rpc_async_release+0x15/0x20 [sunrpc]
      [17844.741173]  [<ffffffff810a387c>] process_one_work+0x21c/0x4c0
      [17844.741984]  [<ffffffff810a37cd>] ? process_one_work+0x16d/0x4c0
      [17844.742837]  [<ffffffff810a3b6a>] worker_thread+0x4a/0x440
      [17844.743639]  [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
      [17844.744399]  [<ffffffff810a3b20>] ? process_one_work+0x4c0/0x4c0
      [17844.745176]  [<ffffffff810a8d75>] kthread+0xf5/0x110
      [17844.745927]  [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
      [17844.747105]  [<ffffffff8172ce1f>] ret_from_fork+0x3f/0x70
      [17844.747856]  [<ffffffff810a8c80>] ? kthread_create_on_node+0x240/0x240
      [17844.748642] ---[ end trace 336a2845d42b83f0 ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      834e465b
    • P
      cifs: use server timestamp for ntlmv2 authentication · 98ce94c8
      Peter Seiderer 提交于
      Linux cifs mount with ntlmssp against an Mac OS X (Yosemite
      10.10.5) share fails in case the clocks differ more than +/-2h:
      
      digest-service: digest-request: od failed with 2 proto=ntlmv2
      digest-service: digest-request: kdc failed with -1561745592 proto=ntlmv2
      
      Fix this by (re-)using the given server timestamp for the
      ntlmv2 authentication (as Windows 7 does).
      
      A related problem was also reported earlier by Namjae Jaen (see below):
      
      Windows machine has extended security feature which refuse to allow
      authentication when there is time difference between server time and
      client time when ntlmv2 negotiation is used. This problem is prevalent
      in embedded enviornment where system time is set to default 1970.
      
      Modern servers send the server timestamp in the TargetInfo Av_Pair
      structure in the challenge message [see MS-NLMP 2.2.2.1]
      In [MS-NLMP 3.1.5.1.2] it is explicitly mentioned that the client must
      use the server provided timestamp if present OR current time if it is
      not
      Reported-by: NNamjae Jeon <namjae.jeon@samsung.com>
      Signed-off-by: NPeter Seiderer <ps.report@gmx.net>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      CC: Stable <stable@vger.kernel.org>
      98ce94c8
    • S
      disabling oplocks/leases via module parm enable_oplocks broken for SMB3 · e0ddde9d
      Steve French 提交于
      leases (oplocks) were always requested for SMB2/SMB3 even when oplocks
      disabled in the cifs.ko module.
      Signed-off-by: NSteve French <steve.french@primarydata.com>
      Reviewed-by: NChandrika Srinivasan <chandrika.srinivasan@citrix.com>
      CC: Stable <stable@vger.kernel.org>
      e0ddde9d
    • J
      Btrfs: keep dropped roots in cache until transaction commit · 2b9dbef2
      Josef Bacik 提交于
      When dropping a snapshot we need to account for the qgroup changes.  If we drop
      the snapshot in all one go then the backref code will fail to find blocks from
      the snapshot we dropped since it won't be able to find the root in the fs root
      cache.  This can lead to us failing to find refs from other roots that pointed
      at blocks in the now deleted root.  To handle this we need to not remove the fs
      roots from the cache until after we process the qgroup operations.  Do this by
      adding dropped roots to a list on the transaction, and letting the transaction
      remove the roots at the same time it drops the commit roots.  This will keep all
      of the backref searching code in sync properly, and fixes a problem Mark was
      seeing with snapshot delete and qgroups.  Thanks,
      Signed-off-by: NJosef Bacik <jbacik@fb.com>
      Tested-by: NHolger Hoffstätte <holger.hoffstaette@googlemail.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      2b9dbef2
  10. 22 9月, 2015 1 次提交
    • C
      Btrfs: Direct I/O: Fix space accounting · 50745b0a
      chandan 提交于
      The following call trace is seen when generic/095 test is executed,
      
      WARNING: CPU: 3 PID: 2769 at /home/chandan/code/repos/linux/fs/btrfs/inode.c:8967 btrfs_destroy_inode+0x284/0x2a0()
      Modules linked in:
      CPU: 3 PID: 2769 Comm: umount Not tainted 4.2.0-rc5+ #31
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20150306_163512-brownie 04/01/2014
       ffffffff81c08150 ffff8802ec9cbce8 ffffffff81984058 ffff8802ffd8feb0
       0000000000000000 ffff8802ec9cbd28 ffffffff81050385 ffff8802ec9cbd38
       ffff8802d12f8588 ffff8802d12f8588 ffff8802f15ab000 ffff8800bb96c0b0
      Call Trace:
       [<ffffffff81984058>] dump_stack+0x45/0x57
       [<ffffffff81050385>] warn_slowpath_common+0x85/0xc0
       [<ffffffff81050465>] warn_slowpath_null+0x15/0x20
       [<ffffffff81340294>] btrfs_destroy_inode+0x284/0x2a0
       [<ffffffff8117ce07>] destroy_inode+0x37/0x60
       [<ffffffff8117cf39>] evict+0x109/0x170
       [<ffffffff8117cfd5>] dispose_list+0x35/0x50
       [<ffffffff8117dd3a>] evict_inodes+0xaa/0x100
       [<ffffffff81165667>] generic_shutdown_super+0x47/0xf0
       [<ffffffff81165951>] kill_anon_super+0x11/0x20
       [<ffffffff81302093>] btrfs_kill_super+0x13/0x110
       [<ffffffff81165c99>] deactivate_locked_super+0x39/0x70
       [<ffffffff811660cf>] deactivate_super+0x5f/0x70
       [<ffffffff81180e1e>] cleanup_mnt+0x3e/0x90
       [<ffffffff81180ebd>] __cleanup_mnt+0xd/0x10
       [<ffffffff81069c06>] task_work_run+0x96/0xb0
       [<ffffffff81003a3d>] do_notify_resume+0x3d/0x50
       [<ffffffff8198cbc2>] int_signal+0x12/0x17
      
      This means that the inode had non-zero "outstanding extents" during
      eviction. This occurs because, during direct I/O a task which successfully
      used up its reserved data space would set BTRFS_INODE_DIO_READY bit and does
      not clear the bit after finishing the DIO write. A future DIO write could
      actually fail and the unused reserve space won't be freed because of the
      previously set BTRFS_INODE_DIO_READY bit.
      
      Clearing the BTRFS_INODE_DIO_READY bit in btrfs_direct_IO() caused the
      following issue,
      |-----------------------------------+-------------------------------------|
      | Task A                            | Task B                              |
      |-----------------------------------+-------------------------------------|
      | Start direct i/o write on inode X.|                                     |
      | reserve space                     |                                     |
      | Allocate ordered extent           |                                     |
      | release reserved space            |                                     |
      | Set BTRFS_INODE_DIO_READY bit.    |                                     |
      |                                   | splice()                            |
      |                                   | Transfer data from pipe buffer to   |
      |                                   | destination file.                   |
      |                                   | - kmap(pipe buffer page)            |
      |                                   | - Start direct i/o write on         |
      |                                   |   inode X.                          |
      |                                   |   - reserve space                   |
      |                                   |   - dio_refill_pages()              |
      |                                   |     - sdio->blocks_available == 0   |
      |                                   |     - Since a kernel address is     |
      |                                   |       being passed instead of a     |
      |                                   |       user space address,           |
      |                                   |       iov_iter_get_pages() returns  |
      |                                   |       -EFAULT.                      |
      |                                   |   - Since BTRFS_INODE_DIO_READY is  |
      |                                   |     set, we don't release reserved  |
      |                                   |     space.                          |
      |                                   |   - Clear BTRFS_INODE_DIO_READY bit.|
      | -EIOCBQUEUED is returned.         |                                     |
      |-----------------------------------+-------------------------------------|
      
      Hence this commit introduces "struct btrfs_dio_data" to track the usage of
      reserved data space. The remaining unused "reserve space" can now be freed
      reliably.
      Signed-off-by: NChandan Rajendra <chandan@linux.vnet.ibm.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      Signed-off-by: NChris Mason <clm@fb.com>
      50745b0a
  11. 21 9月, 2015 4 次提交
  12. 20 9月, 2015 1 次提交
    • C
      fs-writeback: unplug before cond_resched in writeback_sb_inodes · 590dca3a
      Chris Mason 提交于
      Commit 505a666e ("writeback: plug writeback in wb_writeback() and
      writeback_inodes_wb()") has us holding a plug during writeback_sb_inodes,
      which increases the merge rate when relatively contiguous small files
      are written by the filesystem.  It helps both on flash and spindles.
      
      For an fs_mark workload creating 4K files in parallel across 8 drives,
      this commit improves performance ~9% more by unplugging before calling
      cond_resched().  cond_resched() doesn't trigger an implicit unplug, so
      explicitly getting the IO down to the device before scheduling reduces
      latencies for anyone waiting on clean pages.
      
      It also cuts down on how often we use kblockd to unplug, which means
      less work bouncing from one workqueue to another.
      
      Many more details about how we got here:
      
        https://lkml.org/lkml/2015/9/11/570Signed-off-by: NChris Mason <clm@fb.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      590dca3a
  13. 18 9月, 2015 5 次提交
    • E
      userfaultfd: add missing mmput() in error path · c03e946f
      Eric Biggers 提交于
      This fixes a memleak if anon_inode_getfile() fails in userfaultfd().
      Signed-off-by: NEric Biggers <ebiggers3@gmail.com>
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c03e946f
    • K
      nfs/filelayout: Fix NULL reference caused by double freeing of fh_array · 3ec0c979
      Kinglong Mee 提交于
      If filelayout_decode_layout fail, _filelayout_free_lseg will causes
      a double freeing of fh_array.
      
      [ 1179.279800] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [ 1179.280198] IP: [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
      [ 1179.281010] PGD 0
      [ 1179.281443] Oops: 0000 [#1]
      [ 1179.281831] Modules linked in: nfs_layout_nfsv41_files(OE) nfsv4(OE) nfs(OE) fscache(E) xfs libcrc32c coretemp nfsd crct10dif_pclmul ppdev crc32_pclmul crc32c_intel auth_rpcgss ghash_clmulni_intel nfs_acl lockd vmw_balloon grace sunrpc parport_pc vmw_vmci parport shpchp i2c_piix4 vmwgfx drm_kms_helper ttm drm serio_raw mptspi scsi_transport_spi mptscsih e1000 mptbase ata_generic pata_acpi [last unloaded: fscache]
      [ 1179.283891] CPU: 0 PID: 13336 Comm: cat Tainted: G           OE   4.3.0-rc1-pnfs+ #244
      [ 1179.284323] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 05/20/2014
      [ 1179.285206] task: ffff8800501d48c0 ti: ffff88003e3c4000 task.ti: ffff88003e3c4000
      [ 1179.285668] RIP: 0010:[<ffffffffa027222d>]  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
      [ 1179.286612] RSP: 0018:ffff88003e3c77f8  EFLAGS: 00010202
      [ 1179.287092] RAX: 0000000000000000 RBX: ffff88001fe78900 RCX: 0000000000000000
      [ 1179.287731] RDX: ffffea0000f40760 RSI: ffff88001fe789c8 RDI: ffff88001fe789c0
      [ 1179.288383] RBP: ffff88003e3c7810 R08: ffffea0000f40760 R09: 0000000000000000
      [ 1179.289170] R10: 0000000000000000 R11: 0000000000000001 R12: ffff88001fe789c8
      [ 1179.289959] R13: ffff88001fe789c0 R14: ffff88004ec05a80 R15: ffff88004f935b88
      [ 1179.290791] FS:  00007f4e66bb5700(0000) GS:ffffffff81c29000(0000) knlGS:0000000000000000
      [ 1179.291580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 1179.292209] CR2: 0000000000000000 CR3: 00000000203f8000 CR4: 00000000001406f0
      [ 1179.292731] Stack:
      [ 1179.293195]  ffff88001fe78900 00000000000000d0 ffff88001fe78178 ffff88003e3c7868
      [ 1179.293676]  ffffffffa0272737 0000000000000001 0000000000000001 ffff88001fe78800
      [ 1179.294151]  00000000614fffce ffffffff81727671 ffff88001fe78100 ffff88001fe78100
      [ 1179.294623] Call Trace:
      [ 1179.295092]  [<ffffffffa0272737>] filelayout_alloc_lseg+0xa7/0x2d0 [nfs_layout_nfsv41_files]
      [ 1179.295625]  [<ffffffff81727671>] ? out_of_line_wait_on_bit+0x81/0xb0
      [ 1179.296133]  [<ffffffffa040407e>] pnfs_layout_process+0xae/0x320 [nfsv4]
      [ 1179.296632]  [<ffffffffa03e0a01>] nfs4_proc_layoutget+0x2b1/0x360 [nfsv4]
      [ 1179.297134]  [<ffffffffa0402983>] pnfs_update_layout+0x853/0xb30 [nfsv4]
      [ 1179.297632]  [<ffffffffa039db24>] ? nfs_get_lock_context+0x74/0x170 [nfs]
      [ 1179.298158]  [<ffffffffa0271807>] filelayout_pg_init_read+0x37/0x50 [nfs_layout_nfsv41_files]
      [ 1179.298834]  [<ffffffffa03a72d9>] __nfs_pageio_add_request+0x119/0x460 [nfs]
      [ 1179.299385]  [<ffffffffa03a6bd7>] ? nfs_create_request.part.9+0x37/0x2e0 [nfs]
      [ 1179.299872]  [<ffffffffa03a7cc3>] nfs_pageio_add_request+0xa3/0x1b0 [nfs]
      [ 1179.300362]  [<ffffffffa03a8635>] readpage_async_filler+0x85/0x260 [nfs]
      [ 1179.300907]  [<ffffffff81180cb1>] read_cache_pages+0x91/0xd0
      [ 1179.301391]  [<ffffffffa03a85b0>] ? nfs_read_completion+0x220/0x220 [nfs]
      [ 1179.301867]  [<ffffffffa03a8dc8>] nfs_readpages+0x128/0x200 [nfs]
      [ 1179.302330]  [<ffffffff81180ef3>] __do_page_cache_readahead+0x203/0x280
      [ 1179.302784]  [<ffffffff81180dc8>] ? __do_page_cache_readahead+0xd8/0x280
      [ 1179.303413]  [<ffffffff81181116>] ondemand_readahead+0x1a6/0x2f0
      [ 1179.303855]  [<ffffffff81181371>] page_cache_sync_readahead+0x31/0x50
      [ 1179.304286]  [<ffffffff811750a6>] generic_file_read_iter+0x4a6/0x5c0
      [ 1179.304711]  [<ffffffffa03a0316>] ? __nfs_revalidate_mapping+0x1f6/0x240 [nfs]
      [ 1179.305132]  [<ffffffffa039ccf2>] nfs_file_read+0x52/0xa0 [nfs]
      [ 1179.305540]  [<ffffffff811e343c>] __vfs_read+0xcc/0x100
      [ 1179.305936]  [<ffffffff811e3d15>] vfs_read+0x85/0x130
      [ 1179.306326]  [<ffffffff811e4a98>] SyS_read+0x58/0xd0
      [ 1179.306708]  [<ffffffff8172caaf>] entry_SYSCALL_64_fastpath+0x12/0x76
      [ 1179.307094] Code: c4 66 66 66 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 48 89 e5 41 55 41 54 53 8b 07 49 89 f4 85 c0 74 47 48 8b 06 49 89 fd <48> 8b 38 48 85 ff 74 22 31 db eb 0c 48 63 d3 48 8b 3c d0 48 85
      [ 1179.308357] RIP  [<ffffffffa027222d>] filelayout_free_fh_array.isra.11+0x1d/0x70 [nfs_layout_nfsv41_files]
      [ 1179.309177]  RSP <ffff88003e3c77f8>
      [ 1179.309582] CR2: 0000000000000000
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3ec0c979
    • J
      nfs: fix v4.2 SEEK on files over 2 gigs · 306a5549
      J. Bruce Fields 提交于
      We're incorrectly assigning a loff_t return to an int.  If SEEK_HOLE or
      SEEK_DATA returns an offset over 2^31 then the application will see a
      weird lseek() result (usually -EIO).
      
      Cc: stable@vger.kernel.org
      Fixes: bdcc2cd1 "NFSv4.2: handle NFS-specific llseek errors"
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      Reviewed-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      306a5549
    • P
      nfs: fix pg_test page count calculation · 048883e0
      Peng Tao 提交于
      We really want sizeof(struct page *) instead. Otherwise we limit
      maximum IO size to 64 pages rather than 512 pages on a 64bit system.
      
      Fixes 2e11f829(nfs: cap request size to fit a kmalloced page array).
      
      Cc: Christoph Hellwig <hch@lst.de>
      Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
      Fixes: 2e11f829 ("nfs: cap request size to fit a kmalloced page array")
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      048883e0
    • O
      Failing to send a CLOSE if file is opened WRONLY and server reboots on a 4.x mount · a41cbe86
      Olga Kornievskaia 提交于
      A test case is as the description says:
      open(foobar, O_WRONLY);
      sleep()  --> reboot the server
      close(foobar)
      
      The bug is because in nfs4state.c in nfs4_reclaim_open_state() a few
      line before going to restart, there is
      clear_bit(NFS4CLNT_RECLAIM_NOGRACE, &state->flags).
      
      NFS4CLNT_RECLAIM_NOGRACE is a flag for the client states not open
      owner states. Value of NFS4CLNT_RECLAIM_NOGRACE is 4 which is the
      value of NFS_O_WRONLY_STATE in nfs4_state->flags. So clearing it wipes
      out state and when we go to close it, “call_close” doesn’t get set as
      state flag is not set and CLOSE doesn’t go on the wire.
      Signed-off-by: NOlga Kornievskaia <aglo@umich.edu>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      a41cbe86
  14. 16 9月, 2015 2 次提交
  15. 15 9月, 2015 2 次提交
    • J
      btrfs: skip waiting on ordered range for special files · a30e577c
      Jeff Mahoney 提交于
      In btrfs_evict_inode, we properly truncate the page cache for evicted
      inodes but then we call btrfs_wait_ordered_range for every inode as well.
      It's the right thing to do for regular files but results in incorrect
      behavior for device inodes for block devices.
      
      filemap_fdatawrite_range gets called with inode->i_mapping which gets
      resolved to the block device inode before getting passed to
      wbc_attach_fdatawrite_inode and ultimately to inode_to_bdi.  What happens
      next depends on whether there's an open file handle associated with the
      inode.  If there is, we write to the block device, which is unexpected
      behavior.  If there isn't, we through normally and inode->i_data is used.
      We can also end up racing against open/close which can result in crashes
      when i_mapping points to a block device inode that has been closed.
      
      Since there can't be any page cache associated with special file inodes,
      it's safe to skip the btrfs_wait_ordered_range call entirely and avoid
      the problem.
      
      Cc: <stable@vger.kernel.org>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=100911Tested-by: NChristoph Biedl <linux-kernel.bfrz@manchmal.in-ulm.de>
      Signed-off-by: NJeff Mahoney <jeffm@suse.com>
      Reviewed-by: NFilipe Manana <fdmanana@suse.com>
      a30e577c
    • F
      Btrfs: fix read corruption of compressed and shared extents · 005efedf
      Filipe Manana 提交于
      If a file has a range pointing to a compressed extent, followed by
      another range that points to the same compressed extent and a read
      operation attempts to read both ranges (either completely or part of
      them), the pages that correspond to the second range are incorrectly
      filled with zeroes.
      
      Consider the following example:
      
        File layout
        [0 - 8K]                      [8K - 24K]
            |                             |
            |                             |
         points to extent X,         points to extent X,
         offset 4K, length of 8K     offset 0, length 16K
      
        [extent X, compressed length = 4K uncompressed length = 16K]
      
      If a readpages() call spans the 2 ranges, a single bio to read the extent
      is submitted - extent_io.c:submit_extent_page() would only create a new
      bio to cover the second range pointing to the extent if the extent it
      points to had a different logical address than the extent associated with
      the first range. This has a consequence of the compressed read end io
      handler (compression.c:end_compressed_bio_read()) finish once the extent
      is decompressed into the pages covering the first range, leaving the
      remaining pages (belonging to the second range) filled with zeroes (done
      by compression.c:btrfs_clear_biovec_end()).
      
      So fix this by submitting the current bio whenever we find a range
      pointing to a compressed extent that was preceded by a range with a
      different extent map. This is the simplest solution for this corner
      case. Making the end io callback populate both ranges (or more, if we
      have multiple pointing to the same extent) is a much more complex
      solution since each bio is tightly coupled with a single extent map and
      the extent maps associated to the ranges pointing to the shared extent
      can have different offsets and lengths.
      
      The following test case for fstests triggers the issue:
      
        seq=`basename $0`
        seqres=$RESULT_DIR/$seq
        echo "QA output created by $seq"
        tmp=/tmp/$$
        status=1	# failure is the default!
        trap "_cleanup; exit \$status" 0 1 2 3 15
      
        _cleanup()
        {
            rm -f $tmp.*
        }
      
        # get standard environment, filters and checks
        . ./common/rc
        . ./common/filter
      
        # real QA test starts here
        _need_to_be_root
        _supported_fs btrfs
        _supported_os Linux
        _require_scratch
        _require_cloner
      
        rm -f $seqres.full
      
        test_clone_and_read_compressed_extent()
        {
            local mount_opts=$1
      
            _scratch_mkfs >>$seqres.full 2>&1
            _scratch_mount $mount_opts
      
            # Create a test file with a single extent that is compressed (the
            # data we write into it is highly compressible no matter which
            # compression algorithm is used, zlib or lzo).
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 0K 4K"        \
                            -c "pwrite -S 0xbb 4K 8K"        \
                            -c "pwrite -S 0xcc 12K 4K"       \
                            $SCRATCH_MNT/foo | _filter_xfs_io
      
            # Now clone our extent into an adjacent offset.
            $CLONER_PROG -s $((4 * 1024)) -d $((16 * 1024)) -l $((8 * 1024)) \
                $SCRATCH_MNT/foo $SCRATCH_MNT/foo
      
            # Same as before but for this file we clone the extent into a lower
            # file offset.
            $XFS_IO_PROG -f -c "pwrite -S 0xaa 8K 4K"         \
                            -c "pwrite -S 0xbb 12K 8K"        \
                            -c "pwrite -S 0xcc 20K 4K"        \
                            $SCRATCH_MNT/bar | _filter_xfs_io
      
            $CLONER_PROG -s $((12 * 1024)) -d 0 -l $((8 * 1024)) \
                $SCRATCH_MNT/bar $SCRATCH_MNT/bar
      
            echo "File digests before unmounting filesystem:"
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
      
            # Evicting the inode or clearing the page cache before reading
            # again the file would also trigger the bug - reads were returning
            # all bytes in the range corresponding to the second reference to
            # the extent with a value of 0, but the correct data was persisted
            # (it was a bug exclusively in the read path). The issue happened
            # only if the same readpages() call targeted pages belonging to the
            # first and second ranges that point to the same compressed extent.
            _scratch_remount
      
            echo "File digests after mounting filesystem again:"
            # Must match the same digests we got before.
            md5sum $SCRATCH_MNT/foo | _filter_scratch
            md5sum $SCRATCH_MNT/bar | _filter_scratch
        }
      
        echo -e "\nTesting with zlib compression..."
        test_clone_and_read_compressed_extent "-o compress=zlib"
      
        _scratch_unmount
      
        echo -e "\nTesting with lzo compression..."
        test_clone_and_read_compressed_extent "-o compress=lzo"
      
        status=0
        exit
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NFilipe Manana <fdmanana@suse.com>
      Reviewed-by: Qu Wenruo<quwenruo@cn.fujitsu.com>
      Reviewed-by: NLiu Bo <bo.li.liu@oracle.com>
      005efedf