1. 16 8月, 2016 1 次提交
  2. 29 7月, 2016 1 次提交
  3. 20 7月, 2016 3 次提交
    • T
    • T
      nfs4: clnt: respect noresvport when establishing connections to DSes · 3fc75f12
      Tigran Mkrtchyan 提交于
      result:
      
      $ mount -o vers=4.1 dcache-lab007:/ /pnfs
      $ cp /etc/profile /pnfs
      tcp        0      0 131.169.185.68:1005     131.169.191.141:32049   ESTABLISHED
      tcp        0      0 131.169.185.68:751      131.169.191.144:2049    ESTABLISHED
      $
      
      $ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs
      $ cp /etc/profile /pnfs
      tcp        0      0 131.169.185.68:34894    131.169.191.141:32049   ESTABLISHED
      tcp        0      0 131.169.185.68:35722    131.169.191.144:2049    ESTABLISHED
      $
      Signed-off-by: NTigran Mkrtchyan <tigran.mkrtchyan@desy.de>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      3fc75f12
    • S
      sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e
      Scott Mayhew 提交于
      A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
      not really safe to use the the generic_cred->acred->ac_flags to store
      the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
      KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
      KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
      with the auth_cred to be in a state where they're perpetually doing 4K
      NFS_FILE_SYNC writes.
      
      This can be reproduced as follows:
      
      1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
      They do not need to be the same export, nor do they even need to be from
      the same NFS server.  Also, v3 is fine.
      $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
      $ sudo mount -o v3,sec=sys server2:/export /mnt/sys
      
      2. As the normal user, before accessing the kerberized mount, kinit with
      a short lifetime (but not so short that renewing the ticket would leave
      you within the 4-minute window again by the time the original ticket
      expires), e.g.
      $ kinit -l 10m -r 60m
      
      3. Do some I/O to the kerberized mount and verify that the writes are
      wsize, UNSTABLE:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      4. Wait until you're within 4 minutes of key expiry, then do some more
      I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
      set.  Verify that the writes are 4K, FILE_SYNC:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      5. Now do some I/O to the sec=sys mount.  This will cause
      RPC_CRED_NO_CRKEY_TIMEOUT to be set:
      $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
      
      6. Writes for that user will now be permanently 4K, FILE_SYNC for that
      user, regardless of which mount is being written to, until you reboot
      the client.  Renewing the kerberos ticket (assuming it hasn't already
      expired) will have no effect.  Grabbing a new kerberos ticket at this
      point will have no effect either.
      
      Move the flag to the auth->au_flags field (which is currently unused)
      and rename it slightly to reflect that it's no longer associated with
      the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
      rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
      add the inode to the arg list of nfs_ctx_key_to_expire so we can
      determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      ce52914e
  4. 06 7月, 2016 5 次提交
  5. 01 7月, 2016 1 次提交
    • T
      NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS · 5c6e5b60
      Trond Myklebust 提交于
      Chris Worley reports:
       RIP: 0010:[<ffffffffa0245f80>]  [<ffffffffa0245f80>] rpc_new_client+0x2a0/0x2e0 [sunrpc]
       RSP: 0018:ffff880158f6f548  EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff880234f8bc00 RCX: 000000000000ea60
       RDX: 0000000000074cc0 RSI: 000000000000ea60 RDI: ffff880234f8bcf0
       RBP: ffff880158f6f588 R08: 000000000001ac80 R09: ffff880237003300
       R10: ffff880201171000 R11: ffffea0000d75200 R12: ffffffffa03afc60
       R13: ffff880230c18800 R14: 0000000000000000 R15: ffff880158f6f680
       FS:  00007f0e32673740(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000008 CR3: 0000000234886000 CR4: 00000000001406e0
       Stack:
        ffffffffa047a680 0000000000000000 ffff880158f6f598 ffff880158f6f680
        ffff880158f6f680 ffff880234d11d00 ffff88023357f800 ffff880158f6f7d0
        ffff880158f6f5b8 ffffffffa024660a ffff880158f6f5b8 ffffffffa02492ec
       Call Trace:
        [<ffffffffa024660a>] rpc_create_xprt+0x1a/0xb0 [sunrpc]
        [<ffffffffa02492ec>] ? xprt_create_transport+0x13c/0x240 [sunrpc]
        [<ffffffffa0246766>] rpc_create+0xc6/0x1a0 [sunrpc]
        [<ffffffffa038e695>] nfs_create_rpc_client+0xf5/0x140 [nfs]
        [<ffffffffa038f31a>] nfs_init_client+0x3a/0xd0 [nfs]
        [<ffffffffa038f22f>] nfs_get_client+0x25f/0x310 [nfs]
        [<ffffffffa025cef8>] ? rpc_ntop+0xe8/0x100 [sunrpc]
        [<ffffffffa047512c>] nfs3_set_ds_client+0xcc/0x100 [nfsv3]
        [<ffffffffa041fa10>] nfs4_pnfs_ds_connect+0x120/0x400 [nfsv4]
        [<ffffffffa03d41c7>] nfs4_ff_layout_prepare_ds+0xe7/0x330 [nfs_layout_flexfiles]
        [<ffffffffa03d1b1b>] ff_layout_pg_init_write+0xcb/0x280 [nfs_layout_flexfiles]
        [<ffffffffa03a14dc>] __nfs_pageio_add_request+0x12c/0x490 [nfs]
        [<ffffffffa03a1fa2>] nfs_pageio_add_request+0xc2/0x2a0 [nfs]
        [<ffffffffa03a0365>] ? nfs_pageio_init+0x75/0x120 [nfs]
        [<ffffffffa03a5b50>] nfs_do_writepage+0x120/0x270 [nfs]
        [<ffffffffa03a5d31>] nfs_writepage_locked+0x61/0xc0 [nfs]
        [<ffffffff813d4115>] ? __percpu_counter_add+0x55/0x70
        [<ffffffffa03a6a9f>] nfs_wb_single_page+0xef/0x1c0 [nfs]
        [<ffffffff811ca4a3>] ? __dec_zone_page_state+0x33/0x40
        [<ffffffffa0395b21>] nfs_launder_page+0x41/0x90 [nfs]
        [<ffffffff811baba0>] invalidate_inode_pages2_range+0x340/0x3a0
        [<ffffffff811bac17>] invalidate_inode_pages2+0x17/0x20
        [<ffffffffa039960e>] nfs_release+0x9e/0xb0 [nfs]
        [<ffffffffa0399570>] ? nfs_open+0x60/0x60 [nfs]
        [<ffffffffa0394dad>] nfs_file_release+0x3d/0x60 [nfs]
        [<ffffffff81226e6c>] __fput+0xdc/0x1e0
        [<ffffffff81226fbe>] ____fput+0xe/0x10
        [<ffffffff810bf2e4>] task_work_run+0xc4/0xe0
        [<ffffffff810a4188>] do_exit+0x2e8/0xb30
        [<ffffffff8102471c>] ? do_audit_syscall_entry+0x6c/0x70
        [<ffffffff811464e6>] ? __audit_syscall_exit+0x1e6/0x280
        [<ffffffff810a4a5f>] do_group_exit+0x3f/0xa0
        [<ffffffff810a4ad4>] SyS_exit_group+0x14/0x20
        [<ffffffff8179b76e>] system_call_fastpath+0x12/0x71
      
      Which seems to be due to a call to utsname() when in a task exit context
      in order to determine the hostname to set in rpc_new_client().
      
      In reality, what we want here is not the hostname of the current task, but
      the hostname that was used to set up the metadata server.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      5c6e5b60
  6. 18 5月, 2016 1 次提交
  7. 05 4月, 2016 1 次提交
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  8. 17 3月, 2016 2 次提交
  9. 22 1月, 2016 1 次提交
  10. 08 1月, 2016 1 次提交
  11. 01 1月, 2016 2 次提交
  12. 29 12月, 2015 1 次提交
    • P
      nfs: handle request add failure properly · 0bcbf039
      Peng Tao 提交于
      When we fail to queue a read page to IO descriptor,
      we need to clean it up otherwise it is hanging around
      preventing nfs module from being removed.
      
      When we fail to queue a write page to IO descriptor,
      we need to clean it up and also save the failure status
      to open context. Then at file close, we can try to write
      pages back again and drop the page if it fails to writeback
      in .launder_page, which will be done in the next patch.
      Signed-off-by: NPeng Tao <tao.peng@primarydata.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      0bcbf039
  13. 28 12月, 2015 1 次提交
    • O
      Adding stateid information to tracepoints · 48c9579a
      Olga Kornievskaia 提交于
      Operations to which stateid information is added:
      close, delegreturn, open, read, setattr, layoutget, layoutcommit, test_stateid,
      write, lock, locku, lockt
      
      Format is "stateid=<seqid>:<crc32 hash stateid.other>", also "openstateid=",
      "layoutstateid=", and "lockstateid=" for open_file, layoutget, set_lock
      tracepoints.
      
      New function is added to internal.h, nfs_stateid_hash(), to compute the hash
      
      trace_nfs4_setattr() is moved from nfs4_do_setattr() to _nfs4_do_setattr()
      to get access to stateid.
      
      trace_nfs4_setattr and trace_nfs4_delegreturn are changed from INODE_EVENT
      to new event type, INODE_STATEID_EVENT which is same as INODE_EVENT but adds
      stateid information
      
      for locking tracepoints, moved trace_nfs4_set_lock() into _nfs4_do_setlk()
      to get access to stateid information, and removed trace_nfs4_lock_reclaim(),
      trace_nfs4_lock_expired() as they call into _nfs4_do_setlk() and both were
      previously same LOCK_EVENT type.
      Signed-off-by: NOlga Kornievskaia <kolga@netapp.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      48c9579a
  14. 14 12月, 2015 1 次提交
  15. 08 9月, 2015 1 次提交
  16. 18 8月, 2015 1 次提交
  17. 11 8月, 2015 1 次提交
  18. 28 7月, 2015 1 次提交
    • K
      nfs: Fix an oops caused by using other thread's stack space in ASYNC mode · a49c2691
      Kinglong Mee 提交于
      An oops caused by using other thread's stack space in sunrpc ASYNC sending thread.
      
      [ 9839.007187] ------------[ cut here ]------------
      [ 9839.007923] kernel BUG at fs/nfs/nfs4xdr.c:910!
      [ 9839.008069] invalid opcode: 0000 [#1] SMP
      [ 9839.008069] Modules linked in: blocklayoutdriver rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache snd_hda_codec_generic snd_hda_intel snd_hda_controller snd_hda_codec snd_hwdep snd_seq snd_seq_device snd_pcm joydev iosf_mbi crct10dif_pclmul snd_timer crc32_pclmul crc32c_intel ghash_clmulni_intel snd soundcore ppdev pvpanic parport_pc i2c_piix4 serio_raw virtio_balloon parport acpi_cpufreq nfsd nfs_acl lockd grace auth_rpcgss sunrpc qxl drm_kms_helper virtio_net virtio_console virtio_blk ttm drm virtio_pci virtio_ring virtio ata_generic pata_acpi
      [ 9839.008069] CPU: 0 PID: 308 Comm: kworker/0:1H Not tainted 4.0.0-0.rc4.git1.3.fc23.x86_64 #1
      [ 9839.008069] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
      [ 9839.008069] Workqueue: rpciod rpc_async_schedule [sunrpc]
      [ 9839.008069] task: ffff8800d8b4d8e0 ti: ffff880036678000 task.ti: ffff880036678000
      [ 9839.008069] RIP: 0010:[<ffffffffa0339cc9>]  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
      [ 9839.008069] RSP: 0018:ffff88003667ba58  EFLAGS: 00010246
      [ 9839.008069] RAX: 0000000000000000 RBX: 000000001fc15e18 RCX: ffff8800c0193800
      [ 9839.008069] RDX: ffff8800e4ae3f24 RSI: 000000001fc15e2c RDI: ffff88003667bcd0
      [ 9839.008069] RBP: ffff88003667ba58 R08: ffff8800d9173008 R09: 0000000000000003
      [ 9839.008069] R10: ffff88003667bcd0 R11: 000000000000000c R12: 0000000000010000
      [ 9839.008069] R13: ffff8800d9173350 R14: 0000000000000000 R15: ffff8800c0067b98
      [ 9839.008069] FS:  0000000000000000(0000) GS:ffff88011fc00000(0000) knlGS:0000000000000000
      [ 9839.008069] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [ 9839.008069] CR2: 00007f988c9c8bb0 CR3: 00000000d99b6000 CR4: 00000000000407f0
      [ 9839.008069] Stack:
      [ 9839.008069]  ffff88003667bbc8 ffffffffa03412c5 00000000c6c55680 ffff880000000003
      [ 9839.008069]  0000000000000088 00000010c6c55680 0001000000000002 ffffffff816e87e9
      [ 9839.008069]  0000000000000000 00000000477290e2 ffff88003667bab8 ffffffff81327ba3
      [ 9839.008069] Call Trace:
      [ 9839.008069]  [<ffffffffa03412c5>] encode_attrs+0x435/0x530 [nfsv4]
      [ 9839.008069]  [<ffffffff816e87e9>] ? inet_sendmsg+0x69/0xb0
      [ 9839.008069]  [<ffffffff81327ba3>] ? selinux_socket_sendmsg+0x23/0x30
      [ 9839.008069]  [<ffffffff8164c1df>] ? do_sock_sendmsg+0x9f/0xc0
      [ 9839.008069]  [<ffffffff8164c278>] ? kernel_sendmsg+0x58/0x70
      [ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
      [ 9839.008069]  [<ffffffffa011acc0>] ? xdr_reserve_space+0x20/0x170 [sunrpc]
      [ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
      [ 9839.008069]  [<ffffffffa03419a5>] encode_open+0x2d5/0x340 [nfsv4]
      [ 9839.008069]  [<ffffffffa0341b40>] ? nfs4_xdr_enc_open_noattr+0x130/0x130 [nfsv4]
      [ 9839.008069]  [<ffffffffa011ab89>] ? xdr_encode_opaque+0x19/0x20 [sunrpc]
      [ 9839.008069]  [<ffffffffa0339cfb>] ? encode_string+0x2b/0x40 [nfsv4]
      [ 9839.008069]  [<ffffffffa0341bf3>] nfs4_xdr_enc_open+0xb3/0x140 [nfsv4]
      [ 9839.008069]  [<ffffffffa0110a4c>] rpcauth_wrap_req+0xac/0xf0 [sunrpc]
      [ 9839.008069]  [<ffffffffa01017db>] call_transmit+0x18b/0x2d0 [sunrpc]
      [ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
      [ 9839.008069]  [<ffffffffa0101650>] ? call_decode+0x860/0x860 [sunrpc]
      [ 9839.008069]  [<ffffffffa010caa0>] __rpc_execute+0x90/0x460 [sunrpc]
      [ 9839.008069]  [<ffffffffa010ce85>] rpc_async_schedule+0x15/0x20 [sunrpc]
      [ 9839.008069]  [<ffffffff810b452b>] process_one_work+0x1bb/0x410
      [ 9839.008069]  [<ffffffff810b47d3>] worker_thread+0x53/0x470
      [ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
      [ 9839.008069]  [<ffffffff810b4780>] ? process_one_work+0x410/0x410
      [ 9839.008069]  [<ffffffff810ba7b8>] kthread+0xd8/0xf0
      [ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
      [ 9839.008069]  [<ffffffff81786418>] ret_from_fork+0x58/0x90
      [ 9839.008069]  [<ffffffff810ba6e0>] ? kthread_worker_fn+0x180/0x180
      [ 9839.008069] Code: 00 00 48 c7 c7 21 fa 37 a0 e8 94 1c d6 e0 c6 05 d2 17 05 00 01 8b 03 eb d7 66 0f 1f 84 00 00 00 00 00 66 66 66 66 90 55 48 89 e5 <0f> 0b 0f 1f 44 00 00 66 66 66 66 90 55 48 89 e5 41 54 53 89 f3
      [ 9839.008069] RIP  [<ffffffffa0339cc9>] reserve_space.part.73+0x9/0x10 [nfsv4]
      [ 9839.008069]  RSP <ffff88003667ba58>
      [ 9839.071114] ---[ end trace cc14c03adb522e94 ]---
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      a49c2691
  19. 02 6月, 2015 1 次提交
    • T
      writeback: move backing_dev_info->bdi_stat[] into bdi_writeback · 93f78d88
      Tejun Heo 提交于
      Currently, a bdi (backing_dev_info) embeds single wb (bdi_writeback)
      and the role of the separation is unclear.  For cgroup support for
      writeback IOs, a bdi will be updated to host multiple wb's where each
      wb serves writeback IOs of a different cgroup on the bdi.  To achieve
      that, a wb should carry all states necessary for servicing writeback
      IOs for a cgroup independently.
      
      This patch moves bdi->bdi_stat[] into wb.
      
      * enum bdi_stat_item is renamed to wb_stat_item and the prefix of all
        enums is changed from BDI_ to WB_.
      
      * BDI_STAT_BATCH() -> WB_STAT_BATCH()
      
      * [__]{add|inc|dec|sum}_wb_stat(bdi, ...) -> [__]{add|inc}_wb_stat(wb, ...)
      
      * bdi_stat[_error]() -> wb_stat[_error]()
      
      * bdi_writeout_inc() -> wb_writeout_inc()
      
      * stat init is moved to bdi_wb_init() and bdi_wb_exit() is added and
        frees stat.
      
      * As there's still only one bdi_writeback per backing_dev_info, all
        uses of bdi->stat[] are mechanically replaced with bdi->wb.stat[]
        introducing no behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      93f78d88
  20. 02 3月, 2015 1 次提交
  21. 14 2月, 2015 1 次提交
  22. 06 2月, 2015 1 次提交
  23. 04 2月, 2015 10 次提交