1. 27 Sep, 2016 1 commit
    • nfsd: randomize SETCLIENTID reply to help distinguish servers · ebd7c72c
      J. Bruce Fields committed
      NFSv4.1 has built-in trunking support that allows a client to determine
      whether two connections to two different IP addresses are actually to
      the same server.  NFSv4.0 does not, but RFC 7931 attempts to provide
      clients a means to do this, basically by performing a SETCLIENTID to one
      address and confirming it with a SETCLIENTID_CONFIRM to the other.
      
      Linux clients since 05f4c350 "NFS: Discover NFSv4 server trunking
      when mounting" implement a variation on this suggestion.  It is possible
      that other clients do too.
      
      This depends on the clientid and verifier not being accepted by an
      unrelated server.  Since both are 64-bit values, that would be very
      unlikely if they were random numbers.  But they aren't:
      
      knfsd generates the 64-bit clientid by concatenating the 32-bit boot
      time (in seconds) and a counter.  This makes collisions between
      clientids generated by the same server extremely unlikely.  But
      collisions are very likely between clientids generated by servers that
      boot at the same time, and it's quite common for multiple servers to
      boot at the same time.  The verifier is a concatenation of the
      SETCLIENTID time (in seconds) and a counter, so again collisions between
      different servers are likely if multiple SETCLIENTIDs are done at the
      same time, which is a common case.
      
      Therefore recent NFSv4.0 clients may decide two different servers are
      really the same, and mount a filesystem from the wrong server.
      
      Fortunately the Linux client, since 55b9df93 "nfsv4/v4.1: Verify the
      client owner id during trunking detection", only does this when given
      the non-default "migration" mount option.
      
      The fault is really with RFC 7931, and needs a client fix, but in the
      meantime we can mitigate the chance of these collisions by randomizing
      the starting value of the counters used to generate clientids and
      verifiers.
      Reported-by: Frank Sorenson <fsorenso@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
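The collision argument above can be illustrated with a minimal sketch. It models only the layout described in the message (a 32-bit boot time concatenated with a 32-bit counter); the names `make_clientid`, `Server`, `random_start` and `shared_ids` are hypothetical and are not taken from the kernel source.

```python
import secrets

MASK32 = (1 << 32) - 1


def make_clientid(boot_time: int, counter: int) -> int:
    """Concatenate a 32-bit boot time (seconds) and a 32-bit counter."""
    return ((boot_time & MASK32) << 32) | (counter & MASK32)


def random_start() -> int:
    """Random 32-bit starting value for the counter (the mitigation)."""
    return secrets.randbits(32)


class Server:
    """Hypothetical model of one server's clientid generator."""

    def __init__(self, boot_time: int, start_counter: int = 0):
        self.boot_time = boot_time
        self.counter = start_counter & MASK32

    def next_clientid(self) -> int:
        cid = make_clientid(self.boot_time, self.counter)
        self.counter = (self.counter + 1) & MASK32
        return cid


def shared_ids(a: Server, b: Server, n: int = 100) -> int:
    """Count clientids that both servers hand out among their first n."""
    ids_a = {a.next_clientid() for _ in range(n)}
    ids_b = {b.next_clientid() for _ in range(n)}
    return len(ids_a & ids_b)
```

Two servers that boot in the same second and both start counting from zero hand out identical clientids. With starting values drawn from `random_start()`, an overlap requires the two 32-bit starting points to land within `n` of each other, which is unlikely but not impossible, so this reduces the chance of a collision rather than eliminating it.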
  2. 24 Jun, 2016 1 commit
    • vfs: Pass data, ns, and ns->userns to mount_ns · d91ee87d
      Eric W. Biederman committed
      Today what is normally called data (the mount options) is not passed
      to fill_super through mount_ns.
      
      Pass the mount options and the namespace separately to mount_ns so
      that filesystems such as proc that have mount options, can use
      mount_ns.
      
      Pass the user namespace to mount_ns so that the standard permission
      check that verifies the mounter has permissions over the namespace can
      be performed in mount_ns instead of in each filesystem's .mount method,
      thus removing the duplication between mqueuefs and proc in terms of
      permission checks.  The extra permission check does not currently
      affect the rpc_pipefs filesystem and the nfsd filesystem as those
      filesystems do not currently allow unprivileged mounts.  Without
      unprivileged mounts it is guaranteed that the caller has already passed
      capable(CAP_SYS_ADMIN), which guarantees the extra permission check will
      pass.
      
      Update rpc_pipefs and the nfsd filesystem to ensure that the network
      namespace reference is always taken in fill_super and always put in kill_sb
      so that the logic is simpler and so that errors originating inside of
      fill_super do not cause a network namespace leak.
      Acked-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  3. 30 May, 2016 1 commit
  4. 22 Apr, 2015 1 commit
    • nfsd: fix nfsd startup race triggering BUG_ON · bb7ffbf2
      Giuseppe Cantavenera committed
      nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...)
      in fs/nfsd/nfs4recover.c was called before assigning nfsd_net_id.
      The following was observed on a MIPS 32-core processor:
      kernel: Call Trace:
      kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd]
      kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8
      kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70
      kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0
      kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0
      kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8
      kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0
      kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28
      kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8
      kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68
      kernel:
      kernel:
              Code: 0040f809  00000000  2e020001 <00020336> 3c12c00d
                      3c02801a  de100000 6442eb98  0040f809
      kernel: ---[ end trace 7471374335809536 ]---
      
      Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before
      registering rpc_pipefs_event(...) with the notifier chain.
      Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
      Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com>
      Reviewed-by: Kinlong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  5. 03 Feb, 2015 1 commit
    • nfsd: implement pNFS operations · 9cf514cc
      Christoph Hellwig committed
      Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
      LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
      outstanding layouts and devices.
      
      Layout management is very straightforward, with an nfs4_layout_stateid
      structure that extends nfs4_stid to manage layout stateids as the
      top-level structure.  It is linked into the nfs4_file and nfs4_client
      structures like the other stateids, and contains a linked list of
      layouts that hang off the stateid.  The actual layout operations are
      implemented in layout drivers that are not part of this commit, but
      will be added later.
      
      The worst part of this commit is the management of the pNFS device IDs,
      which suffers from a specification that is not sanely implementable due
      to the fact that the device-IDs are global and not bound to an export,
      and have a small enough size so that we can't store the fsid portion of
      a file handle, and must never be reused.  As we still do need to perform all
      export authentication and validation checks on a device ID passed to
      GETDEVICEINFO we are caught between a rock and a hard place.  To work
      around this issue we add a new hash that maps from a 64-bit integer to a
      fsid so that we can look up the export to authenticate against it,
      a 32-bit integer as a generation that we can bump when changing the device,
      and a currently unused 32-bit integer that could be used in the future
      to handle more than a single device per export.  Entries in this hash
      table are never deleted as we can't reuse the ids anyway, and would have
      a severe lifetime problem anyway as Linux export structures are temporary
      structures that can go away under load.
      
      Parts of the XDR data, structures and marshaling/unmarshaling code, as
      well as many concepts are derived from the old pNFS server implementation
      from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
      Mike Sager, Ricardo Labiaga and many others.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
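The device ID workaround described above can be sketched as follows. This is an illustration only: the field order, the byte order, and the names `DeviceIdMap`, `encode` and `decode` are assumptions made for the example, not the layout used by the commit. It only relies on the sizes given in the message (a 64-bit integer, a 32-bit generation and an unused 32-bit field), which together fill a 16-byte pNFS device ID.

```python
import struct

# Assumed layout: 64-bit map index, 32-bit generation, 32-bit reserved field.
DEVICEID_FMT = ">QII"


class DeviceIdMap:
    """Hypothetical map from a 64-bit integer to an fsid; entries never go away."""

    def __init__(self):
        self._idx_by_fsid = {}
        self._fsid_by_idx = {}
        self._next_idx = 1

    def _idx_for(self, fsid):
        # Reuse the existing index for a known fsid, never recycle old ones.
        if fsid not in self._idx_by_fsid:
            self._idx_by_fsid[fsid] = self._next_idx
            self._fsid_by_idx[self._next_idx] = fsid
            self._next_idx += 1
        return self._idx_by_fsid[fsid]

    def encode(self, fsid, generation: int) -> bytes:
        """Build a 16-byte device ID for the export identified by fsid."""
        return struct.pack(DEVICEID_FMT, self._idx_for(fsid), generation, 0)

    def decode(self, raw: bytes):
        """Return (fsid or None, generation) for a device ID from a client."""
        idx, generation, _reserved = struct.unpack(DEVICEID_FMT, raw)
        return self._fsid_by_idx.get(idx), generation
```

Looking up the fsid first is what lets the server find the export and run its authentication checks before answering GETDEVICEINFO; bumping the generation yields a new device ID for the same export.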
  6. 02 Dec, 2014 1 commit
  7. 20 Nov, 2014 1 commit
  8. 18 Sep, 2014 1 commit
    • nfsd: add a v4_end_grace file to /proc/fs/nfsd · 7f5ef2e9
      Jeff Layton committed
      Allow a privileged userland process to end the v4 grace period early.
      Writing "Y", "y", or "1" to the file will cause the v4 grace period to
      be lifted.  The basic idea with this will be to allow the userland
      client tracking program to lift the grace period once it knows that no
      more clients will be reclaiming state.
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
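A userland client tracking program could lift the grace period roughly as sketched below. The file path and the accepted values come from the commit message; the helper name `end_grace` and its `path` parameter are hypothetical, and the write needs a privileged process on a kernel that provides this file.

```python
from pathlib import Path

END_GRACE_PATH = Path("/proc/fs/nfsd/v4_end_grace")
ACCEPTED_VALUES = ("Y", "y", "1")


def end_grace(path=END_GRACE_PATH, value: str = "Y") -> None:
    """Ask nfsd to end the v4 grace period early by writing to the file."""
    if value not in ACCEPTED_VALUES:
        # Reject anything the commit message does not list, before writing.
        raise ValueError(f"unsupported value: {value!r}")
    Path(path).write_text(value)
```

The `path` parameter exists only so the helper can be exercised against an ordinary file; in real use the default path would be written once the tracking program knows that no more clients will reclaim state.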
  9. 09 Jul, 2014 1 commit
    • nfsd: add a new /proc/fs/nfsd/max_connections file · 5b8db00b
      Jeff Layton committed
      Currently, the maximum number of connections that nfsd will allow
      is based on the number of threads spawned. While this is fine for a
      default, there really isn't a clear relationship between the two.
      
      The number of threads corresponds to the number of concurrent requests
      that we want to allow the server to process at any given time. The
      connection limit corresponds to the maximum number of clients that we
      want to allow the server to handle. These are two entirely different
      quantities.
      
      Break the dependency on increasing threads in order to allow for more
      connections, by adding a new per-net parameter that can be set to a
      non-zero value. The default is still to base it on the number of threads,
      so there should be no behavior change for anyone who doesn't use it.
      
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
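The selection rule described above (use the per-net setting when it is non-zero, otherwise derive the limit from the thread count) can be sketched as follows. The function names are hypothetical, and the thread-based formula `(nthreads + 3) * 20` is an assumption used only to make the example concrete; the commit message does not state it.

```python
def default_limit(nthreads: int) -> int:
    """Assumed thread-based default; the exact formula is illustrative."""
    return (nthreads + 3) * 20


def connection_limit(nthreads: int, max_connections: int = 0) -> int:
    """Return the connection limit: explicit setting if non-zero, else default."""
    if max_connections != 0:
        return max_connections
    return default_limit(nthreads)
```

With `max_connections` left at zero, the behavior is unchanged for anyone who does not set it, which matches the compatibility claim in the message.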
  10. 23 Jun, 2014 1 commit
  11. 09 May, 2014 1 commit
  12. 01 Apr, 2014 1 commit
    • nfsd: check passed socket's net matches NFSd superblock's one · 30646394
      Stanislav Kinsbursky committed
      The NFSd file system can be mounted in a network namespace different from
      the socket's one, as in the following case:
      
      "ip netns exec" creates a new network and mount namespace, which duplicates
      the NFSd mount point created in the init_net context.  Thus stopping the NFS
      server in the nested network context destroys the RPCBIND client in init_net.
      Then, on NFSd start in the nested network context, the rpc.nfsd process
      creates a socket in the nested net and passes it into "write_ports", which
      creates RPCBIND sockets in the init_net context for the same reason (the NFSd
      mount point was created in the init_net context).  An attempt to register the
      passed socket in the nested net leads to a panic, because no RPCBIND client is
      present in the nested network namespace.
      
      This patch adds a check that the passed socket's net matches the NFSd
      superblock's one, and returns an -EINVAL error to user space otherwise.
      
      v2: Put socket on exit.
      Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
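The check this patch adds can be modeled in a few lines. This is a sketch under stated assumptions: `Sock` and `write_ports_addfd` are hypothetical names, namespaces are represented as plain strings, and the reference count stands in for the socket reference that the "v2" note says must be put on exit.

```python
import errno
from dataclasses import dataclass


@dataclass
class Sock:
    """Hypothetical socket: its network namespace and a reference count."""
    net: str
    refs: int = 1  # reference taken when the fd was looked up


def write_ports_addfd(sb_net: str, sock: Sock) -> int:
    """Accept the socket only if its net matches the NFSd superblock's net."""
    if sock.net != sb_net:
        sock.refs -= 1  # put the socket on the error path
        return -errno.EINVAL
    return 0
```

A socket created in a nested namespace and passed through a mount point that belongs to init_net is rejected up front instead of reaching the registration step that panicked.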
  13. 10 Apr, 2013 1 commit
  14. 04 Apr, 2013 1 commit
  15. 03 Apr, 2013 1 commit
  16. 04 Mar, 2013 1 commit
    • fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Eric W. Biederman committed
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems, trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives, allowing simple, safe,
      well-understood workarounds for known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases, the most significant being autofs, which lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep,
      which means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
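The effect of the prefix and the aliases can be sketched as a small lookup model. Only the "fs-" prefix and the autofs/autofs4 pairing come from the message; the function names, the alias table and the "somedriver" name are hypothetical, and real alias resolution is done by modprobe rather than by code like this.

```python
def request_name(fstype: str) -> str:
    """Name that mount asks the module loader for: "fs-" plus the type."""
    return "fs-" + fstype


def module_for_mount(fstype: str, aliases: dict, blacklist=frozenset()):
    """Resolve a filesystem type to a module via the alias table, or None."""
    module = aliases.get(request_name(fstype))
    if module is None or module in blacklist:
        return None
    return module


# Hypothetical alias table: only filesystem modules publish an "fs-" alias.
ALIASES = {"fs-autofs": "autofs4"}
```

A string that names a non-filesystem module (the hypothetical "somedriver") finds no "fs-" alias and loads nothing, while autofs still resolves to autofs4 even though the two names differ.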
  17. 23 Feb, 2013 1 commit
  18. 16 Feb, 2013 2 commits
  19. 05 Feb, 2013 1 commit
  20. 24 Jan, 2013 1 commit
  21. 11 Dec, 2012 5 commits
  22. 29 Nov, 2012 1 commit
  23. 28 Nov, 2012 3 commits
  24. 10 Sep, 2012 1 commit
    • nfsd: remove unused listener-removal interfaces · eccf50c1
      J. Bruce Fields committed
      You can use nfsd/portlist to give nfsd additional sockets to listen on.
      In theory you can also remove listening sockets this way.  But nobody's
      ever done that as far as I can tell.
      
      Also this was partially broken in 2.6.25, by a217813f "knfsd: Support
      adding transports by writing portlist file".
      
      (Note that we decide whether to take the "delfd" case by checking for a
      digit--but what's actually expected in that case is something made by
      svc_one_sock_name(), which won't begin with a digit.)
      
      So, let's just rip out this stuff.
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
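The parenthetical note about the digit check can be made concrete with a sketch. Two things here are assumptions rather than facts from the message: that a removal request is a "-" followed by the socket name, and that a name made by svc_one_sock_name() looks like "ipv4 tcp 192.0.2.1 2049". The function `classify_portlist_write` is hypothetical.

```python
DIGITS = "0123456789"


def _isdigit(ch: str) -> bool:
    """ASCII-only digit test for a single character."""
    return len(ch) == 1 and ch in DIGITS


def classify_portlist_write(buf: str) -> str:
    """Model of the dispatch: which branch would a write to portlist take?"""
    if _isdigit(buf[:1]):
        return "addfd"
    if buf[:1] == "-" and _isdigit(buf[1:2]):
        return "delfd"
    return "other"
```

Under these assumptions, a removal request built from a real socket name never reaches the "delfd" branch because the name starts with a letter, which is consistent with the observation that the interface was already partially broken.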
  25. 22 Aug, 2012 2 commits
  26. 21 Aug, 2012 1 commit
  27. 25 Jul, 2012 1 commit
  28. 01 Jun, 2012 1 commit
  29. 12 Apr, 2012 4 commits