1. 18 8月, 2015 18 次提交
  2. 13 8月, 2015 12 次提交
  3. 11 8月, 2015 1 次提交
  4. 06 8月, 2015 1 次提交
    • C
      xprtrdma: Fix large NFS SYMLINK calls · 2fcc213a
      Chuck Lever 提交于
      Repair how rpcrdma_marshal_req() chooses which RDMA message type
      to use for large non-WRITE operations so that it picks RDMA_NOMSG
      in the correct situations, and sets up the marshaling logic to
      SEND only the RPC/RDMA header.
      
      Large NFSv2 SYMLINK requests now use RDMA_NOMSG calls. The Linux NFS
      server XDR decoder for NFSv2 SYMLINK does not handle having the
      pathname argument arrive in a separate buffer. The decoder could be
      fixed, but this is simpler and RDMA_NOMSG can be used in a variety
      of other situations.
      
      Ensure that the Linux client continues to use "RDMA_MSG + read
      list" when sending large NFSv3 SYMLINK requests, which is more
      efficient than using RDMA_NOMSG.
      
      Large NFSv4 CREATE(NF4LNK) requests are changed to use "RDMA_MSG +
      read list" just like NFSv3 (see Section 5 of RFC 5667). Before,
      these did not work at all.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NDevesh Sharma <devesh.sharma@avagotech.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      2fcc213a
  5. 02 8月, 2015 1 次提交
    • A
      link_path_walk(): be careful when failing with ENOTDIR · 97242f99
      Al Viro 提交于
      In RCU mode we might end up with dentry evicted just we check
      that it's a directory.  In such case we should return ECHILD
      rather than ENOTDIR, so that pathwalk would be retries in non-RCU
      mode.
      
      Breakage had been introduced in commit b18825a7 - prior to that
      we were looking at nd->inode, which had been fetched before
      verifying that ->d_seq was still valid.  That form of check
      would only be satisfied if at some point the pathname prefix
      would indeed have resolved to a non-directory.  The fix consists
      of checking ->d_seq after we'd run into a non-directory dentry,
      and failing with ECHILD in case of mismatch.
      
      Note that all branches since 3.12 have that problem...
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      97242f99
  6. 29 7月, 2015 3 次提交
    • D
      xfs: remote attributes need to be considered data · df150ed1
      Dave Chinner 提交于
      We don't log remote attribute contents, and instead write them
      synchronously before we commit the block allocation and attribute
      tree update transaction. As a result we are writing to the allocated
      space before the allcoation has been made permanent.
      
      As a result, we cannot consider this allocation to be a metadata
      allocation. Metadata allocation can take blocks from the free list
      and so reuse them before the transaction that freed the block is
      committed to disk. This behaviour is perfectly fine for journalled
      metadata changes as log recovery will ensure the free operation is
      replayed before the overwrite, but for remote attribute writes this
      is not the case.
      
      Hence we have to consider the remote attribute blocks to contain
      data and allocate accordingly. We do this by dropping the
      XFS_BMAPI_METADATA flag from the block allocation. This means the
      allocation will not use blocks that are on the busy list without
      first ensuring that the freeing transaction has been committed to
      disk and the blocks removed from the busy list. This ensures we will
      never overwrite a freed block without first ensuring that it is
      really free.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      
      df150ed1
    • D
      xfs: remote attribute headers contain an invalid LSN · e3c32ee9
      Dave Chinner 提交于
      In recent testing, a system that crashed failed log recovery on
      restart with a bad symlink buffer magic number:
      
      XFS (vda): Starting recovery (logdev: internal)
      XFS (vda): Bad symlink block magic!
      XFS: Assertion failed: 0, file: fs/xfs/xfs_log_recover.c, line: 2060
      
      On examination of the log via xfs_logprint, none of the symlink
      buffers in the log had a bad magic number, nor were any other types
      of buffer log format headers mis-identified as symlink buffers.
      Tracing was used to find the buffer the kernel was tripping over,
      and xfs_db identified it's contents as:
      
      000: 5841524d 00000000 00000346 64d82b48 8983e692 d71e4680 a5f49e2c b317576e
      020: 00000000 00602038 00000000 006034ce d0020000 00000000 4d4d4d4d 4d4d4d4d
      040: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      060: 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d 4d4d4d4d
      .....
      
      This is a remote attribute buffer, which are notable in that they
      are not logged but are instead written synchronously by the remote
      attribute code so that they exist on disk before the attribute
      transactions are committed to the journal.
      
      The above remote attribute block has an invalid LSN in it - cycle
      0xd002000, block 0 - which means when log recovery comes along to
      determine if the transaction that writes to the underlying block
      should be replayed, it sees a block that has a future LSN and so
      does not replay the buffer data in the transaction. Instead, it
      validates the buffer magic number and attaches the buffer verifier
      to it.  It is this buffer magic number check that is failing in the
      above assert, indicating that we skipped replay due to the LSN of
      the underlying buffer.
      
      The problem here is that the remote attribute buffers cannot have a
      valid LSN placed into them, because the transaction that contains 
      the attribute tree pointer changes and the block allocation that the
      attribute data is being written to hasn't yet been committed. Hence
      the LSN field in the attribute block is completely unwritten,
      thereby leaving the underlying contents of the block in the LSN
      field. It could have any value, and hence a future overwrite of the
      block by log recovery may or may not work correctly.
      
      Fix this by always writing an invalid LSN to the remote attribute
      block, as any buffer in log recovery that needs to write over the
      remote attribute should occur. We are protected from having old data
      written over the attribute by the fact that freeing the block before
      the remote attribute is written will result in the buffer being
      marked stale in the log and so all changes prior to the buffer stale
      transaction will be cancelled by log recovery.
      
      Hence it is safe to ignore the LSN in the case or synchronously
      written, unlogged metadata such as remote attribute blocks, and to
      ensure we do that correctly, we need to write an invalid LSN to all
      remote attribute blocks to trigger immediate recovery of metadata
      that is written over the top.
      
      As a further protection for filesystems that may already have remote
      attribute blocks with bad LSNs on disk, change the log recovery code
      to always trigger immediate recovery of metadata over remote
      attribute blocks.
      
      cc: <stable@vger.kernel.org>
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      e3c32ee9
    • D
      xfs: call dax_fault on read page faults for DAX · b2442c5a
      Dave Chinner 提交于
      When modifying the patch series to handle the XFS MMAP_LOCK nesting
      of page faults, I botched the conversion of the read page fault
      path, and so it is only every calling through the page cache. Re-add
      the necessary __dax_fault() call for such files.
      
      Because the get_blocks callback on read faults may not set up the
      mapping buffer correctly to allow unwritten extent completion to be
      run, we need to allow callers of __dax_fault() to pass a null
      complete_unwritten() callback. The DAX code always zeros the
      unwritten page when it is read faulted so there are no stale data
      exposure issues with not doing the conversion. The only downside
      will be the potential for increased CPU overhead on repeated read
      faults of the same page. If this proves to be a problem, then the
      filesystem needs to fix it's get_block callback and provide a
      convert_unwritten() callback to the read fault path.
      Signed-off-by: NDave Chinner <dchinner@redhat.com>
      Reviewed-by: NMatthew Wilcox <willy@linux.intel.com>
      Reviewed-by: NBrian Foster <bfoster@redhat.com>
      Signed-off-by: NDave Chinner <david@fromorbit.com>
      b2442c5a
  7. 28 7月, 2015 2 次提交
    • K
      nfs: Fix an oops caused by using other thread's stack space in ASYNC mode · a49c2691
      Kinglong Mee 提交于
      An oops caused by using other thread's stack space in sunrpc ASYNC sending thread.
      
      [ 9839.007187] ------------[ cut here ]------------
      [ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910!
      [ 9839.008069] invalid opcode: 0000 [#1] SMP
      [ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
      [ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1
      [ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc]
      [ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000
      [ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>]  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
      [ 9839.008069] RSP: 0018:ffff88003667ba58  EFLAGS: 00010246
      [ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800
      [ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0
      [ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003
      [ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000
      [ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98
      [ 9839.008069] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
      [ 9839.008069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0
      [ 9839.008069] Stack:
      [ 9839.008069]  ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003
      [ 9839.008069]  0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9
      [ 9839.008069]  0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3
      [ 9839.008069] Call Trace:
      [ 9839.008069]  [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4]
      [ 9839.008069]  [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0
      [ 9839.008069]  [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30
      [ 9839.008069]  [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0
      [ 9839.008069]  [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70
      [ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
      [ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
      [ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
      [ 9839.008069]  [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4]
      [ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
      [ 9839.008069]  [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc]
      [ 9839.008069]  [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4]
      [ 9839.008069]  [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4]
      [ 9839.008069]  [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc]
      [ 9839.008069]  [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc]
      [ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
      [ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
      [ 9839.008069]  [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc]
      [ 9839.008069]  [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc]
      [ 9839.008069]  [<ffffffff810b452b>] process_one_work+0x1bb/0x410
      [ 9839.008069]  [<ffffffff810b47d3>] worker_thread+0x53/0x470
      [ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
      [ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
      [ 9839.008069]  [<ffffffff810ba7b8>] kthread+0xd8/0xf0
      [ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
      [ 9839.008069]  [<ffffffff81786418>] ret_from_fork+0x58/0x90
      [ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
      [ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3
      [ 9839.008069] RIP  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
      [ 9839.008069]  RSP <ffff88003667ba58>
      [ 9839.071114] ---[ end trace cc14c03adb522e94 ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      a49c2691
    • J
      nfs: plug memory leak when ->prepare_layoutcommit fails · 3471648a
      Jeff Layton 提交于
      "data" is currently leaked when the prepare_layoutcommit operation
      returns an error. Put the cred before taking the spinlock in that
      case, take the lock and then goto out_unlock which will drop the
      lock and then free "data".
      Signed-off-by: NJeff Layton <jeff.layton@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3471648a
  8. 27 7月, 2015 2 次提交