1. 20 Jun, 2017 1 commit
    • I
      sched/wait: Split out the wait_bit*() APIs from <linux/wait.h> into <linux/wait_bit.h> · 5dd43ce2
      Ingo Molnar authored
      The wait_bit*() types and APIs are mixed into wait.h, but they
      are a pretty orthogonal extension of wait-queues.
      
      Furthermore, only about 50 kernel files use these APIs, while
      over 1000 use the regular wait-queue functionality.
      
      So clean up the main wait.h by moving the wait-bit functionality
      out of it, into a separate .h and .c file:
      
        include/linux/wait_bit.h  for types and APIs
        kernel/sched/wait_bit.c   for the implementation
      
      Update all header dependencies.
      
      This reduces the size of wait.h rather significantly, by about 30%.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  2. 04 Jun, 2017 1 commit
  3. 09 May, 2017 1 commit
  4. 04 May, 2017 1 commit
    • J
      nfs: Fix bdi handling for cloned superblocks · 9052c7cf
      Jan Kara authored
      In commit 0d3b12584972 "nfs: Convert to separately allocated bdi" I
      wrongly cloned the bdi reference in nfs_clone_super(). Further inspection
      has shown that the code originally allocated a new bdi (in the
      ->clone_server callback), which was later registered in
      nfs_fs_mount_common() and used for sb->s_bdi in nfs_initialise_sb().
      Cloning the reference could result in the bdi for the original superblock
      not being unregistered when that superblock was shut down (as the cloned
      sb still held a bdi reference). Later, when a new superblock was created
      under the same anonymous device number, a clash in sysfs happened on bdi
      registration:
      
      ------------[ cut here ]------------
      WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 sysfs_warn_dup+0x64/0x74
      sysfs: cannot create duplicate filename '/devices/virtual/bdi/0:32'
      Modules linked in: axp20x_usb_power gpio_axp209 nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma
      CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14
      Hardware name: Allwinner sun7i (A20) Family
      [<c010f19c>] (unwind_backtrace) from [<c010bc74>] (show_stack+0x10/0x14)
      [<c010bc74>] (show_stack) from [<c03c6e24>] (dump_stack+0x78/0x8c)
      [<c03c6e24>] (dump_stack) from [<c0122200>] (__warn+0xe8/0x100)
      [<c0122200>] (__warn) from [<c0122250>] (warn_slowpath_fmt+0x38/0x48)
      [<c0122250>] (warn_slowpath_fmt) from [<c02ac178>] (sysfs_warn_dup+0x64/0x74)
      [<c02ac178>] (sysfs_warn_dup) from [<c02ac254>] (sysfs_create_dir_ns+0x84/0x94)
      [<c02ac254>] (sysfs_create_dir_ns) from [<c03c8b8c>] (kobject_add_internal+0x9c/0x2ec)
      [<c03c8b8c>] (kobject_add_internal) from [<c03c8e24>] (kobject_add+0x48/0x98)
      [<c03c8e24>] (kobject_add) from [<c048d75c>] (device_add+0xe4/0x5a0)
      [<c048d75c>] (device_add) from [<c048ddb4>] (device_create_groups_vargs+0xac/0xbc)
      [<c048ddb4>] (device_create_groups_vargs) from [<c048dde4>] (device_create_vargs+0x20/0x28)
      [<c048dde4>] (device_create_vargs) from [<c02075c8>] (bdi_register_va+0x44/0xfc)
      [<c02075c8>] (bdi_register_va) from [<c023d378>] (super_setup_bdi_name+0x48/0xa4)
      [<c023d378>] (super_setup_bdi_name) from [<c0312ef4>] (nfs_fill_super+0x1a4/0x204)
      [<c0312ef4>] (nfs_fill_super) from [<c03133f0>] (nfs_fs_mount_common+0x140/0x1e8)
      [<c03133f0>] (nfs_fs_mount_common) from [<c03335cc>] (nfs4_remote_mount+0x50/0x58)
      [<c03335cc>] (nfs4_remote_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c033352c>] (nfs_do_root_mount+0x80/0xa0)
      [<c033352c>] (nfs_do_root_mount) from [<c0333818>] (nfs4_try_mount+0x28/0x3c)
      [<c0333818>] (nfs4_try_mount) from [<c0313874>] (nfs_fs_mount+0x2cc/0x8c4)
      [<c0313874>] (nfs_fs_mount) from [<c023ef98>] (mount_fs+0x14/0xa4)
      [<c023ef98>] (mount_fs) from [<c025cba0>] (vfs_kern_mount+0x54/0x128)
      [<c025cba0>] (vfs_kern_mount) from [<c02600f0>] (do_mount+0x158/0xc7c)
      [<c02600f0>] (do_mount) from [<c0260f98>] (SyS_mount+0x8c/0xb4)
      [<c0260f98>] (SyS_mount) from [<c0107840>] (ret_fast_syscall+0x0/0x3c)
      
      Fix the problem by always creating a new bdi for a superblock, as we
      used to do.
      Reported-and-tested-by: Corentin Labbe <clabbe.montjoie@gmail.com>
      Fixes: 0d3b12584972ce5781179ad3f15cca3cdb5cae05
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 27 Apr, 2017 1 commit
  6. 21 Apr, 2017 1 commit
  7. 18 Mar, 2017 1 commit
  8. 20 Dec, 2016 1 commit
  9. 03 Dec, 2016 1 commit
  10. 02 Dec, 2016 1 commit
  11. 08 Oct, 2016 1 commit
  12. 06 Oct, 2016 1 commit
  13. 28 Sep, 2016 1 commit
  14. 27 Sep, 2016 1 commit
    • M
      fs: make remaining filesystems use .rename2 · 1cd66c93
      Miklos Szeredi authored
      This is trivial to do:
      
       - add flags argument to foo_rename()
       - check if flags is zero
       - assign foo_rename() to .rename2 instead of .rename
      
      This doesn't mean it's impossible to support RENAME_NOREPLACE for these
      filesystems, but it is not trivial, like for local filesystems.
      RENAME_NOREPLACE must guarantee atomicity (i.e. it shouldn't be possible
      for a file to be created on one host while it is overwritten by rename on
      another host).
      
      Filesystems converted:
      
      9p, afs, ceph, coda, ecryptfs, kernfs, lustre, ncpfs, nfs, ocfs2, orangefs.
      
      After this, we can get rid of the duplicate interfaces for rename.
      Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: David Howells <dhowells@redhat.com> [AFS]
      Acked-by: Mike Marshall <hubcap@omnibond.com>
      Cc: Eric Van Hensbergen <ericvh@gmail.com>
      Cc: Ilya Dryomov <idryomov@gmail.com>
      Cc: Jan Harkes <jaharkes@cs.cmu.edu>
      Cc: Tyler Hicks <tyhicks@canonical.com>
      Cc: Oleg Drokin <oleg.drokin@intel.com>
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
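The three conversion steps listed in the commit above can be sketched in plain userspace C. This is only an illustration under stated assumptions: `struct inode`, `struct dentry`, and `struct inode_operations_sketch` are hypothetical stand-ins rather than the kernel's VFS definitions, and `SKETCH_RENAME_NOREPLACE` is a made-up name for a non-zero rename flag.

```c
#include <errno.h>

/* Hypothetical stand-ins for kernel types; NOT the real VFS definitions. */
struct inode  { int unused; };
struct dentry { const char *name; };

/* Made-up name for some non-zero rename flag (assumption for this sketch). */
#define SKETCH_RENAME_NOREPLACE 1u

/* Hypothetical operations table that only carries the .rename2 slot. */
struct inode_operations_sketch {
	int (*rename2)(struct inode *old_dir, struct dentry *old_dentry,
		       struct inode *new_dir, struct dentry *new_dentry,
		       unsigned int flags);
};

/* Step 1: foo_rename() gains a flags argument. */
static int foo_rename(struct inode *old_dir, struct dentry *old_dentry,
		      struct inode *new_dir, struct dentry *new_dentry,
		      unsigned int flags)
{
	/* Step 2: any flag is rejected, since none is supported. */
	if (flags)
		return -EINVAL;

	/* The filesystem's existing rename logic would run here. */
	(void)old_dir;
	(void)old_dentry;
	(void)new_dir;
	(void)new_dentry;
	return 0;
}

/* Step 3: the function is assigned to .rename2 instead of .rename. */
static const struct inode_operations_sketch foo_dir_iops = {
	.rename2 = foo_rename,
};
```

Calling `foo_dir_iops.rename2()` with `flags == 0` behaves like the old `.rename`, while any non-zero flag returns `-EINVAL` in this sketch, so callers can tell that the flag is unsupported.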
  15. 20 Sep, 2016 2 commits
  16. 16 Aug, 2016 1 commit
  17. 29 Jul, 2016 1 commit
  18. 20 Jul, 2016 3 commits
    • T
    • T
      nfs4: clnt: respect noresvport when establishing connections to DSes · 3fc75f12
      Tigran Mkrtchyan authored
      result:
      
      $ mount -o vers=4.1 dcache-lab007:/ /pnfs
      $ cp /etc/profile /pnfs
      tcp        0      0 131.169.185.68:1005     131.169.191.141:32049   ESTABLISHED
      tcp        0      0 131.169.185.68:751      131.169.191.144:2049    ESTABLISHED
      $
      
      $ mount -o vers=4.1,noresvport dcache-lab007:/ /pnfs
      $ cp /etc/profile /pnfs
      tcp        0      0 131.169.185.68:34894    131.169.191.141:32049   ESTABLISHED
      tcp        0      0 131.169.185.68:35722    131.169.191.144:2049    ESTABLISHED
      $
      Signed-off-by: Tigran Mkrtchyan <tigran.mkrtchyan@desy.de>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
    • S
      sunrpc: move NO_CRKEY_TIMEOUT to the auth->au_flags · ce52914e
      Scott Mayhew authored
      A generic_cred can be used to look up a unx_cred or a gss_cred, so it's
      not really safe to use the generic_cred->acred->ac_flags to store
      the NO_CRKEY_TIMEOUT flag.  A lookup for a unx_cred triggered while the
      KEY_EXPIRE_SOON flag is already set will cause both NO_CRKEY_TIMEOUT and
      KEY_EXPIRE_SOON to be set in the ac_flags, leaving the user associated
      with the auth_cred to be in a state where they're perpetually doing 4K
      NFS_FILE_SYNC writes.
      
      This can be reproduced as follows:
      
      1. Mount two NFS filesystems, one with sec=krb5 and one with sec=sys.
      They do not need to be the same export, nor do they even need to be from
      the same NFS server.  Also, v3 is fine.
      $ sudo mount -o v3,sec=krb5 server1:/export /mnt/krb5
      $ sudo mount -o v3,sec=sys server2:/export /mnt/sys
      
      2. As the normal user, before accessing the kerberized mount, kinit with
      a short lifetime (but not so short that renewing the ticket would leave
      you within the 4-minute window again by the time the original ticket
      expires), e.g.
      $ kinit -l 10m -r 60m
      
      3. Do some I/O to the kerberized mount and verify that the writes are
      wsize, UNSTABLE:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      4. Wait until you're within 4 minutes of key expiry, then do some more
      I/O to the kerberized mount to ensure that RPC_CRED_KEY_EXPIRE_SOON gets
      set.  Verify that the writes are 4K, FILE_SYNC:
      $ dd if=/dev/zero of=/mnt/krb5/file bs=1M count=1
      
      5. Now do some I/O to the sec=sys mount.  This will cause
      RPC_CRED_NO_CRKEY_TIMEOUT to be set:
      $ dd if=/dev/zero of=/mnt/sys/file bs=1M count=1
      
      6. Writes for that user will now be permanently 4K, FILE_SYNC,
      regardless of which mount is being written to, until you reboot
      the client.  Renewing the kerberos ticket (assuming it hasn't already
      expired) will have no effect.  Grabbing a new kerberos ticket at this
      point will have no effect either.
      
      Move the flag to the auth->au_flags field (which is currently unused)
      and rename it slightly to reflect that it's no longer associated with
      the auth_cred->ac_flags.  Add the rpc_auth to the arg list of
      rpcauth_cred_key_to_expire and check the au_flags there too.  Finally,
      add the inode to the arg list of nfs_ctx_key_to_expire so we can
      determine the rpc_auth to pass to rpcauth_cred_key_to_expire.
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
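The idea behind moving the flag can be modelled in a few lines of userspace C. This is a deliberately simplified sketch, not the sunrpc code: the struct names, the `SKETCH_*` flag values, and `cred_key_to_expire()` are all hypothetical. It only illustrates the point made in the commit message: the "expire soon" bit stays on the per-user credential, while the "no key timeout" bit lives on the per-flavor auth object and is checked there.

```c
#include <stdbool.h>

/* Hypothetical flag values; the real kernel bit numbers are not used here. */
#define SKETCH_KEY_EXPIRE_SOON       (1UL << 0) /* kept in ac_flags */
#define SKETCH_AUTH_NO_CRKEY_TIMEOUT (1UL << 0) /* kept in au_flags */

/* Per-user credential, shared by lookups for every auth flavor. */
struct auth_cred_sketch {
	unsigned long ac_flags;
};

/* Per-flavor auth object (for example one for sec=sys, one for sec=krb5). */
struct rpc_auth_sketch {
	unsigned long au_flags;
};

/*
 * Simplified model of the check after the change: the auth object is an
 * extra argument, and its au_flags are consulted before the credential.
 */
static bool cred_key_to_expire(const struct rpc_auth_sketch *auth,
			       const struct auth_cred_sketch *acred)
{
	/* Flavors without key timeouts never report an expiring key. */
	if (auth->au_flags & SKETCH_AUTH_NO_CRKEY_TIMEOUT)
		return false;

	return (acred->ac_flags & SKETCH_KEY_EXPIRE_SOON) != 0;
}
```

Because the two bits now live in different structures, a lookup through the sec=sys auth object can no longer leave a stray flag on the credential that the sec=krb5 path also reads.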
  19. 06 Jul, 2016 5 commits
  20. 01 Jul, 2016 1 commit
    • T
      NFS: Fix an Oops in the pNFS files and flexfiles connection setup to the DS · 5c6e5b60
      Trond Myklebust authored
      Chris Worley reports:
       RIP: 0010:[<ffffffffa0245f80>]  [<ffffffffa0245f80>] rpc_new_client+0x2a0/0x2e0 [sunrpc]
       RSP: 0018:ffff880158f6f548  EFLAGS: 00010246
       RAX: 0000000000000000 RBX: ffff880234f8bc00 RCX: 000000000000ea60
       RDX: 0000000000074cc0 RSI: 000000000000ea60 RDI: ffff880234f8bcf0
       RBP: ffff880158f6f588 R08: 000000000001ac80 R09: ffff880237003300
       R10: ffff880201171000 R11: ffffea0000d75200 R12: ffffffffa03afc60
       R13: ffff880230c18800 R14: 0000000000000000 R15: ffff880158f6f680
       FS:  00007f0e32673740(0000) GS:ffff88023fc40000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000008 CR3: 0000000234886000 CR4: 00000000001406e0
       Stack:
        ffffffffa047a680 0000000000000000 ffff880158f6f598 ffff880158f6f680
        ffff880158f6f680 ffff880234d11d00 ffff88023357f800 ffff880158f6f7d0
        ffff880158f6f5b8 ffffffffa024660a ffff880158f6f5b8 ffffffffa02492ec
       Call Trace:
        [<ffffffffa024660a>] rpc_create_xprt+0x1a/0xb0 [sunrpc]
        [<ffffffffa02492ec>] ? xprt_create_transport+0x13c/0x240 [sunrpc]
        [<ffffffffa0246766>] rpc_create+0xc6/0x1a0 [sunrpc]
        [<ffffffffa038e695>] nfs_create_rpc_client+0xf5/0x140 [nfs]
        [<ffffffffa038f31a>] nfs_init_client+0x3a/0xd0 [nfs]
        [<ffffffffa038f22f>] nfs_get_client+0x25f/0x310 [nfs]
        [<ffffffffa025cef8>] ? rpc_ntop+0xe8/0x100 [sunrpc]
        [<ffffffffa047512c>] nfs3_set_ds_client+0xcc/0x100 [nfsv3]
        [<ffffffffa041fa10>] nfs4_pnfs_ds_connect+0x120/0x400 [nfsv4]
        [<ffffffffa03d41c7>] nfs4_ff_layout_prepare_ds+0xe7/0x330 [nfs_layout_flexfiles]
        [<ffffffffa03d1b1b>] ff_layout_pg_init_write+0xcb/0x280 [nfs_layout_flexfiles]
        [<ffffffffa03a14dc>] __nfs_pageio_add_request+0x12c/0x490 [nfs]
        [<ffffffffa03a1fa2>] nfs_pageio_add_request+0xc2/0x2a0 [nfs]
        [<ffffffffa03a0365>] ? nfs_pageio_init+0x75/0x120 [nfs]
        [<ffffffffa03a5b50>] nfs_do_writepage+0x120/0x270 [nfs]
        [<ffffffffa03a5d31>] nfs_writepage_locked+0x61/0xc0 [nfs]
        [<ffffffff813d4115>] ? __percpu_counter_add+0x55/0x70
        [<ffffffffa03a6a9f>] nfs_wb_single_page+0xef/0x1c0 [nfs]
        [<ffffffff811ca4a3>] ? __dec_zone_page_state+0x33/0x40
        [<ffffffffa0395b21>] nfs_launder_page+0x41/0x90 [nfs]
        [<ffffffff811baba0>] invalidate_inode_pages2_range+0x340/0x3a0
        [<ffffffff811bac17>] invalidate_inode_pages2+0x17/0x20
        [<ffffffffa039960e>] nfs_release+0x9e/0xb0 [nfs]
        [<ffffffffa0399570>] ? nfs_open+0x60/0x60 [nfs]
        [<ffffffffa0394dad>] nfs_file_release+0x3d/0x60 [nfs]
        [<ffffffff81226e6c>] __fput+0xdc/0x1e0
        [<ffffffff81226fbe>] ____fput+0xe/0x10
        [<ffffffff810bf2e4>] task_work_run+0xc4/0xe0
        [<ffffffff810a4188>] do_exit+0x2e8/0xb30
        [<ffffffff8102471c>] ? do_audit_syscall_entry+0x6c/0x70
        [<ffffffff811464e6>] ? __audit_syscall_exit+0x1e6/0x280
        [<ffffffff810a4a5f>] do_group_exit+0x3f/0xa0
        [<ffffffff810a4ad4>] SyS_exit_group+0x14/0x20
        [<ffffffff8179b76e>] system_call_fastpath+0x12/0x71
      
      Which seems to be due to a call to utsname() when in a task exit context
      in order to determine the hostname to set in rpc_new_client().
      
      In reality, what we want here is not the hostname of the current task, but
      the hostname that was used to set up the metadata server.
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  21. 18 May, 2016 1 commit
  22. 05 Apr, 2016 1 commit
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov authored
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time
      ago with the promise that one day it would be possible to implement the
      page cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized, and it is unlikely that it ever will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal to
      PAGE_SIZE.  And it's a constant source of confusion whether a
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      the script below.  For some reason, coccinelle doesn't patch header
      files.  I've called spatch for them manually.
      
      The only adjustment after coccinelle is a revert of the changes to the
      PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation will
      also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
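The reason the shift expressions in the commit above can simply be dropped is easy to check in userspace C. The sketch below assumes 4 KiB pages and assumes the old PAGE_CACHE_* names were plain aliases of the PAGE_* ones; the `DEMO_` prefix and the helper names are hypothetical and are used only to avoid clashing with system headers.

```c
/* Assumption for this sketch: 4 KiB pages. */
#define DEMO_PAGE_SHIFT       12
#define DEMO_PAGE_SIZE        (1UL << DEMO_PAGE_SHIFT)

/* Assumption: the old page-cache names were aliases of the page ones. */
#define DEMO_PAGE_CACHE_SHIFT DEMO_PAGE_SHIFT
#define DEMO_PAGE_CACHE_SIZE  (1UL << DEMO_PAGE_CACHE_SHIFT)

/* Old style: convert a page-cache index into a page index. */
static unsigned long index_before(unsigned long index)
{
	/* The shift count is PAGE_CACHE_SHIFT - PAGE_SHIFT, which is 0 here. */
	return index << (DEMO_PAGE_CACHE_SHIFT - DEMO_PAGE_SHIFT);
}

/* New style: the conversion disappears because the two units are equal. */
static unsigned long index_after(unsigned long index)
{
	return index;
}

/* Old style: byte offset of a page-cache index. */
static unsigned long offset_before(unsigned long index)
{
	return index * DEMO_PAGE_CACHE_SIZE;
}

/* New style: the same offset, expressed with the page size. */
static unsigned long offset_after(unsigned long index)
{
	return index << DEMO_PAGE_SHIFT;
}
```

Under these assumptions each "before" helper returns exactly what its "after" counterpart returns for any index, which is the property the mechanical rewrite relies on.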
  23. 17 Mar, 2016 2 commits
  24. 22 Jan, 2016 1 commit
  25. 08 Jan, 2016 1 commit
  26. 01 Jan, 2016 2 commits
  27. 29 Dec, 2015 1 commit
    • P
      nfs: handle request add failure properly · 0bcbf039
      Peng Tao authored
      When we fail to queue a read page to the I/O descriptor,
      we need to clean it up; otherwise it hangs around and
      prevents the nfs module from being removed.
      
      When we fail to queue a write page to the I/O descriptor,
      we need to clean it up and also save the failure status
      in the open context. Then, at file close, we can try to write
      the pages back again and drop a page if it fails writeback
      in .launder_page, which will be done in the next patch.
      Signed-off-by: Peng Tao <tao.peng@primarydata.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  28. 28 Dec, 2015 1 commit
    • O
      Adding stateid information to tracepoints · 48c9579a
      Olga Kornievskaia authored
      Operations to which stateid information is added:
      close, delegreturn, open, read, setattr, layoutget, layoutcommit, test_stateid,
      write, lock, locku, lockt
      
      Format is "stateid=<seqid>:<crc32 hash stateid.other>", also "openstateid=",
      "layoutstateid=", and "lockstateid=" for open_file, layoutget, set_lock
      tracepoints.
      
      A new function, nfs_stateid_hash(), is added to internal.h to compute
      the hash.
      
      trace_nfs4_setattr() is moved from nfs4_do_setattr() to _nfs4_do_setattr()
      to get access to the stateid.
      
      trace_nfs4_setattr and trace_nfs4_delegreturn are changed from INODE_EVENT
      to a new event type, INODE_STATEID_EVENT, which is the same as INODE_EVENT
      but adds stateid information.
      
      For the locking tracepoints, trace_nfs4_set_lock() is moved into
      _nfs4_do_setlk() to get access to stateid information, and
      trace_nfs4_lock_reclaim() and trace_nfs4_lock_expired() are removed, as
      they call into _nfs4_do_setlk() and both were previously the same
      LOCK_EVENT type.
      Signed-off-by: Olga Kornievskaia <kolga@netapp.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
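The "stateid=<seqid>:<crc32 hash stateid.other>" format described in the commit above can be illustrated with a small userspace sketch. Everything named here is hypothetical: `struct stateid_sketch`, `crc32_sketch()`, and `format_stateid()` are not kernel functions, the bitwise routine is the common reflected CRC-32 and may not match the kernel helper bit for bit, and printing the hash as eight hex digits is an assumption about the output style. The 12-byte size of the `other` field follows the NFSv4 stateid layout.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define STATEID_OTHER_SIZE 12 /* size of the opaque "other" field */

/* Hypothetical stand-in for an NFSv4 stateid. */
struct stateid_sketch {
	uint32_t seqid;
	unsigned char other[STATEID_OTHER_SIZE];
};

/* Bitwise reflected CRC-32 (polynomial 0xEDB88320), for illustration. */
static uint32_t crc32_sketch(const unsigned char *data, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;

	for (size_t i = 0; i < len; i++) {
		crc ^= data[i];
		for (int bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/*
 * Write "stateid=<seqid>:0x<hash>" into buf and return snprintf()'s result.
 * Only the hash of the opaque part is printed, never the raw bytes.
 */
static int format_stateid(char *buf, size_t len,
			  const struct stateid_sketch *sid)
{
	uint32_t hash = crc32_sketch(sid->other, STATEID_OTHER_SIZE);

	return snprintf(buf, len, "stateid=%u:0x%08x",
			(unsigned int)sid->seqid, (unsigned int)hash);
}
```

Hashing the opaque part keeps each trace line short while still letting two events be matched to the same stateid, with the seqid shown in clear so that state transitions remain visible.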
  29. 14 Dec, 2015 1 commit
  30. 08 Sep, 2015 1 commit
  31. 18 Aug, 2015 1 commit