1. 28 Feb, 2017 2 commits
  2. 01 Feb, 2017 1 commit
    • NFSD: correctly range-check v4.x minor version when setting versions. · e35659f1
      Committed by NeilBrown
      Writing to /proc/fs/nfsd/versions allows individual major versions
      and NFSv4 minor versions to be enabled or disabled.
      
      However NFSv4.0 cannot currently be disabled, though there is no good reason for this.
      Also the minor number is parsed as a 'long' but used as an 'int'
      so '4294967297' will be incorrectly treated as '1'.
      
      This patch removes the test on 'minor == 0' and switches to kstrtouint()
      to get correct range checking.
      
      When reading from /proc/fs/nfsd/versions, 4.0 is currently not reported.
      To allow the disabling of v4.0 to be visible, while maintaining
      backward compatibility, change code to report "-4.0" if appropriate, but
      not "+4.0".
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
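The truncation described in this commit can be reproduced in user space. The following is a minimal sketch, not the kernel code: `parse_minor_unchecked` and `parse_minor_checked` are hypothetical names, and the checked variant uses `strtoull` with an explicit `UINT_MAX` test as a stand-in for the range checking that `kstrtouint()` provides in the kernel. It assumes a 32-bit `int`.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Old pattern: parse into a wide type, then narrow without checking.
 * The detour through unsigned int makes the wrap-around well defined.
 */
int parse_minor_unchecked(const char *s)
{
	long long v;

	assert(s != NULL);
	v = strtoll(s, NULL, 10);
	return (int)(unsigned int)v;    /* 4294967297 wraps to 1 with 32-bit int */
}

/*
 * Checked pattern: returns the minor version, or a negative errno value
 * if the text is not a plain decimal number that fits in unsigned int.
 */
long long parse_minor_checked(const char *s)
{
	char *end;
	unsigned long long v;

	assert(s != NULL);
	if (*s < '0' || *s > '9')       /* reject empty input, signs, whitespace */
		return -EINVAL;
	errno = 0;
	v = strtoull(s, &end, 10);
	if (errno == ERANGE || v > UINT_MAX)
		return -ERANGE;
	if (*end != '\0')               /* trailing garbage */
		return -EINVAL;
	return (long long)v;
}
```

With the unchecked helper, the string "4294967297" is accepted as minor version 1; the checked helper rejects it instead of silently wrapping.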
  3. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Committed by Alexey Dobriyan
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      is thus an unsigned entity. Using a negative value is an
      out-of-bounds access by definition.
      
      2)
      On x86_64, unsigned 32-bit data that are mixed with pointers
      via array indexing, or as offsets added to or subtracted from
      pointers, are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      The patch trims ~1730 bytes from an allyesconfig kernel (without all
      the junk that interferes with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the
      register allocator being used differently. gcc decides that some
      variable needs to live in the new r8+ registers, so every access now
      requires a REX prefix. Or it is shifted into r12, so the [r12+0]
      addressing mode has to be used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
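The signed-versus-unsigned indexing argument in this commit can be tried outside the kernel by compiling two otherwise identical accessors and comparing the generated assembly (for example with `gcc -O2 -S`). This is only an illustrative sketch: `load_signed`, `load_unsigned` and `sample` are hypothetical names, and the exact instructions emitted depend on the compiler and flags.

```c
#include <assert.h>

/* Signed index: the compiler must sign-extend i to 64 bits before use. */
long load_signed(const long *p, int i)
{
	return p[i];
}

/* Unsigned index: on x86_64 a 32-bit value is already zero-extended. */
long load_unsigned(const long *p, unsigned int i)
{
	return p[i];
}

/* Sample data for trying the two functions out. */
const long sample[3] = { 10, 20, 30 };
```

Both functions return the same values for non-negative indices; only the signed variant can express a negative index, which for a zero-based table such as the one indexed by `pernet_operations::id` is never valid.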
  4. 15 Nov, 2016 1 commit
  5. 27 Sep, 2016 1 commit
    • nfsd: randomize SETCLIENTID reply to help distinguish servers · ebd7c72c
      Committed by J. Bruce Fields
      NFSv4.1 has built-in trunking support that allows a client to determine
      whether two connections to two different IP addresses are actually to
      the same server.  NFSv4.0 does not, but RFC 7931 attempts to provide
      clients a means to do this, basically by performing a SETCLIENTID to one
      address and confirming it with a SETCLIENTID_CONFIRM to the other.
      
      Linux clients since 05f4c350 "NFS: Discover NFSv4 server trunking
      when mounting" implement a variation on this suggestion.  It is possible
      that other clients do too.
      
      This depends on the clientid and verifier not being accepted by an
      unrelated server.  Since both are 64-bit values, that would be very
      unlikely if they were random numbers.  But they aren't:
      
      knfsd generates the 64-bit clientid by concatenating the 32-bit boot
      time (in seconds) and a counter.  This makes collisions between
      clientids generated by the same server extremely unlikely.  But
      collisions are very likely between clientids generated by servers that
      boot at the same time, and it's quite common for multiple servers to
      boot at the same time.  The verifier is a concatenation of the
      SETCLIENTID time (in seconds) and a counter, so again collisions between
      different servers are likely if multiple SETCLIENTIDs are done at the
      same time, which is a common case.
      
      Therefore recent NFSv4.0 clients may decide two different servers are
      really the same, and mount a filesystem from the wrong server.
      
      Fortunately the Linux client, since 55b9df93 "nfsv4/v4.1: Verify the
      client owner id during trunking detection", only does this when given
      the non-default "migration" mount option.
      
      The fault is really with RFC 7931, and needs a client fix, but in the
      meantime we can mitigate the chance of these collisions by randomizing
      the starting value of the counters used to generate clientids and
      verifiers.
      Reported-by: Frank Sorenson <fsorenso@redhat.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
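The collision argument in this commit can be illustrated with a small sketch. The helpers `make_clientid` and `nth_clientid` are hypothetical, and putting the boot time in the high 32 bits is an assumption made for illustration; the commit only says the two 32-bit values are concatenated. The starting value of the counter is passed in as a parameter here, where a server would draw it from a random source.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical layout: boot time (seconds) in the high 32 bits,
 * counter in the low 32 bits.
 */
uint64_t make_clientid(uint32_t boot_time, uint32_t counter)
{
	return ((uint64_t)boot_time << 32) | counter;
}

/*
 * The n-th id handed out by a server whose counter started at 'start'.
 * Unsigned arithmetic wraps around, which is fine for a counter.
 */
uint64_t nth_clientid(uint32_t boot_time, uint32_t start, uint32_t n)
{
	return make_clientid(boot_time, start + n);
}
```

Two servers that boot in the same second and both start counting from zero hand out identical ids; giving each a different starting value makes the ids differ even though the boot time is shared.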
  6. 24 Jun, 2016 1 commit
    • vfs: Pass data, ns, and ns->userns to mount_ns · d91ee87d
      Committed by Eric W. Biederman
      Today what is normally called data (the mount options) is not passed
      to fill_super through mount_ns.
      
      Pass the mount options and the namespace separately to mount_ns so
      that filesystems such as proc that have mount options, can use
      mount_ns.
      
      Pass the user namespace to mount_ns so that the standard permission
      check that verifies the mounter has permissions over the namespace can
      be performed in mount_ns instead of in each filesystem's .mount method.
      This removes the duplication between mqueuefs and proc in terms of
      permission checks.  The extra permission check does not currently
      affect the rpc_pipefs filesystem and the nfsd filesystem, as those
      filesystems do not currently allow unprivileged mounts.  Without
      unprivileged mounts it is guaranteed that the caller has already passed
      capable(CAP_SYS_ADMIN), which guarantees that the extra permission
      check will pass.
      
      Update rpc_pipefs and the nfsd filesystem to ensure that the network
      namespace reference is always taken in fill_super and always put in kill_sb
      so that the logic is simpler and so that errors originating inside of
      fill_super do not cause a network namespace leak.
      Acked-by: Seth Forshee <seth.forshee@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  7. 30 May, 2016 1 commit
  8. 22 Apr, 2015 1 commit
    • nfsd: fix nsfd startup race triggering BUG_ON · bb7ffbf2
      Committed by Giuseppe Cantavenera
      nfsd triggered a BUG_ON in net_generic(...) when rpc_pipefs_event(...)
      in fs/nfsd/nfs4recover.c was called before nfsd_net_id was assigned.
      The following was observed on a MIPS 32-core processor:
      kernel: Call Trace:
      kernel: [<ffffffffc00bc5e4>] rpc_pipefs_event+0x7c/0x158 [nfsd]
      kernel: [<ffffffff8017a2a0>] notifier_call_chain+0x70/0xb8
      kernel: [<ffffffff8017a4e4>] __blocking_notifier_call_chain+0x4c/0x70
      kernel: [<ffffffff8053aff8>] rpc_fill_super+0xf8/0x1a0
      kernel: [<ffffffff8022204c>] mount_ns+0xb4/0xf0
      kernel: [<ffffffff80222b48>] mount_fs+0x50/0x1f8
      kernel: [<ffffffff8023dc00>] vfs_kern_mount+0x58/0xf0
      kernel: [<ffffffff802404ac>] do_mount+0x27c/0xa28
      kernel: [<ffffffff80240cf0>] SyS_mount+0x98/0xe8
      kernel: [<ffffffff80135d24>] handle_sys64+0x44/0x68
      kernel:
      kernel:
              Code: 0040f809  00000000  2e020001 <00020336> 3c12c00d
                      3c02801a  de100000 6442eb98  0040f809
      kernel: ---[ end trace 7471374335809536 ]---
      
      Fixed this behaviour by calling register_pernet_subsys(&nfsd_net_ops) before
      registering rpc_pipefs_event(...) with the notifier chain.
      Signed-off-by: Giuseppe Cantavenera <giuseppe.cantavenera.ext@nokia.com>
      Signed-off-by: Lorenzo Restelli <lorenzo.restelli.ext@nokia.com>
      Reviewed-by: Kinlong Mee <kinglongmee@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  9. 03 Feb, 2015 1 commit
    • nfsd: implement pNFS operations · 9cf514cc
      Committed by Christoph Hellwig
      Add support for the GETDEVICEINFO, LAYOUTGET, LAYOUTCOMMIT and
      LAYOUTRETURN NFSv4.1 operations, as well as backing code to manage
      outstanding layouts and devices.
      
      Layout management is very straightforward, with a nfs4_layout_stateid
      structure that extends nfs4_stid to manage layout stateids as the
      top-level structure.  It is linked into the nfs4_file and nfs4_client
      structures like the other stateids, and contains a linked list of
      layouts that hang off the stateid.  The actual layout operations are
      implemented in layout drivers that are not part of this commit, but
      will be added later.
      
      The worst part of this commit is the management of the pNFS device IDs,
      which suffers from a specification that is not sanely implementable due
      to the fact that the device-IDs are global and not bound to an export,
      and have a small enough size so that we can't store the fsid portion of
      a file handle, and must never be reused.  As we still do need to perform all
      export authentication and validation checks on a device ID passed to
      GETDEVICEINFO we are caught between a rock and a hard place.  To work
      around this issue we add a new hash that maps from a 64-bit integer to a
      fsid so that we can look up the export to authenticate against it,
      a 32-bit integer as a generation that we can bump when changing the device,
      and a currently unused 32-bit integer that could be used in the future
      to handle more than a single device per export.  Entries in this hash
      table are never deleted as we can't reuse the ids anyway, and would have
      a severe lifetime problem anyway as Linux export structures are temporary
      structures that can go away under load.
      
      Parts of the XDR data, structures and marshaling/unmarshaling code, as
      well as many concepts are derived from the old pNFS server implementation
      from Andy Adamson, Benny Halevy, Dean Hildebrand, Marc Eshel, Fred Isaman,
      Mike Sager, Ricardo Labiaga and many others.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
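The device ID workaround described in this commit packs three fields into the ID: a 64-bit value that is looked up in a hash to find the fsid, a 32-bit generation, and a currently unused 32-bit value. A minimal sketch of such a layout follows; the type and field names (`deviceid_sketch`, `fsid_idx`, `generation`, `reserved`) and both helpers are hypothetical and are not taken from the kernel source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 16-byte device ID layout following the commit's description. */
struct deviceid_sketch {
	uint64_t fsid_idx;      /* key into the hash that maps to an fsid */
	uint32_t generation;    /* bumped whenever the device changes */
	uint32_t reserved;      /* currently unused */
};

/* Build an ID for a given hash key and generation. */
struct deviceid_sketch make_deviceid(uint64_t fsid_idx, uint32_t generation)
{
	struct deviceid_sketch id = { fsid_idx, generation, 0 };

	return id;
}

/* Return a copy of the ID with the generation advanced by one. */
struct deviceid_sketch bump_generation(struct deviceid_sketch id)
{
	id.generation++;
	return id;
}
```

Because the hash key stays the same when the generation is bumped, the export can still be looked up for authentication after a device change.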
  10. 02 Dec, 2014 1 commit
  11. 20 Nov, 2014 1 commit
  12. 18 Sep, 2014 1 commit
    • nfsd: add a v4_end_grace file to /proc/fs/nfsd · 7f5ef2e9
      Committed by Jeff Layton
      Allow a privileged userland process to end the v4 grace period early.
      Writing "Y", "y", or "1" to the file will cause the v4 grace period to
      be lifted.  The basic idea with this will be to allow the userland
      client tracking program to lift the grace period once it knows that no
      more clients will be reclaiming state.
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
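The accepted inputs for v4_end_grace can be sketched as a small predicate. `requests_end_grace` is a hypothetical helper, and deciding on the first byte only (so that a trailing newline from `echo` is tolerated) is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Returns true if the written buffer asks for the grace period to end.
 * Only the first byte is examined; anything after it is ignored.
 */
bool requests_end_grace(const char *buf, size_t len)
{
	assert(buf != NULL);
	if (len == 0)
		return false;
	return buf[0] == 'Y' || buf[0] == 'y' || buf[0] == '1';
}
```

A write of "Y\n", "y" or "1" is treated as a request to lift the grace period; an empty write or any other leading character is not.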
  13. 09 Jul, 2014 1 commit
    • nfsd: add a new /proc/fs/nfsd/max_connections file · 5b8db00b
      Committed by Jeff Layton
      Currently, the maximum number of connections that nfsd will allow
      is based on the number of threads spawned. While this is fine for a
      default, there really isn't a clear relationship between the two.
      
      The number of threads corresponds to the number of concurrent requests
      that we want to allow the server to process at any given time. The
      connection limit corresponds to the maximum number of clients that we
      want to allow the server to handle. These are two entirely different
      quantities.
      
      Break the dependency on increasing threads in order to allow for more
      connections, by adding a new per-net parameter that can be set to a
      non-zero value. The default is still to base it on the number of threads,
      so there should be no behavior change for anyone who doesn't use it.
      
      Cc: Trond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: Jeff Layton <jlayton@primarydata.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
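The fallback rule described in this commit (use the explicit limit when it is non-zero, otherwise derive one from the thread count) can be sketched as below. `conn_limit` is a hypothetical helper, and the `(nrthreads + 3) * 20` formula is an assumption about the thread-based default; the commit message does not state it.

```c
#include <assert.h>

/*
 * Effective connection limit: an explicit non-zero setting wins,
 * otherwise fall back to a value derived from the number of threads.
 * The fallback formula is an assumption made for this sketch.
 */
unsigned int conn_limit(unsigned int max_connections, unsigned int nrthreads)
{
	if (max_connections != 0)
		return max_connections;
	return (nrthreads + 3) * 20;
}
```

Leaving the parameter at zero keeps the thread-based behaviour, so nothing changes for setups that never write to the new file.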
  14. 23 Jun, 2014 1 commit
  15. 09 May, 2014 1 commit
  16. 01 Apr, 2014 1 commit
    • nfsd: check passed socket's net matches NFSd superblock's one · 30646394
      Committed by Stanislav Kinsbursky
      The NFSd file system may be mounted in a network namespace different
      from the socket's one, as in the following case:
      
      "ip netns exec" creates a new network and mount namespace, which duplicates
      the NFSd mount point created in the init_net context. Thus, stopping the NFS
      server in the nested network context leads to destruction of the RPCBIND
      client in init_net.
      Then, on NFSd start in the nested network context, the rpc.nfsd process
      creates a socket in the nested net and passes it to "write_ports", which
      leads to RPCBIND sockets being created in the init_net context for the same
      reason (the NFSd mount point was created in the init_net context). An
      attempt to register the passed socket in the nested net leads to a panic,
      because no RPCBIND client is present in the nested network namespace.
      
      This patch adds a check that the passed socket's net matches the NFSd
      superblock's one, and returns an -EINVAL error to user space otherwise.
      
      v2: Put socket on exit.
      Reported-by: Weng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: Stanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  17. 10 Apr, 2013 1 commit
  18. 04 Apr, 2013 1 commit
  19. 03 Apr, 2013 1 commit
  20. 04 Mar, 2013 1 commit
    • fs: Limit sys_mount to only request filesystem modules. · 7f78e035
      Committed by Eric W. Biederman
      Modify the request_module to prefix the file system type with "fs-"
      and add aliases to all of the filesystems that can be built as modules
      to match.
      
      A common practice is to build all of the kernel code and leave code
      that is not commonly needed as modules, with the result that many
      users are exposed to any bug anywhere in the kernel.
      
      Looking for filesystems with a fs- prefix limits the pool of possible
      modules that can be loaded by mount to just filesystems trivially
      making things safer with no real cost.
      
      Using aliases means user space can control the policy of which
      filesystem modules are auto-loaded by editing /etc/modprobe.d/*.conf
      with blacklist and alias directives.  Allowing simple, safe,
      well understood work-arounds to known problematic software.
      
      This also addresses a rare but unfortunate problem where the filesystem
      name is not the same as its module name and module auto-loading
      would not work.  While writing this patch I saw a handful of such
      cases.  The most significant being autofs that lives in the module
      autofs4.
      
      This is relevant to user namespaces because we can reach the request
      module in get_fs_type() without having any special permissions, and
      people get uncomfortable when a user specified string (in this case
      the filesystem type) goes all of the way to request_module.
      
      After having looked at this issue I don't think there is any
      particular reason to perform any filtering or permission checks beyond
      making it clear in the module request that we want a filesystem
      module.  The common pattern in the kernel is to call request_module()
      without regard to the user's permissions.  In general all a filesystem
      module does once loaded is call register_filesystem() and go to sleep.
      Which means there is not much attack surface exposed by loading a
      filesystem module unless the filesystem is mounted.  In a user
      namespace filesystems are not mounted unless .fs_flags = FS_USERNS_MOUNT,
      which most filesystems do not set today.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Acked-by: Kees Cook <keescook@chromium.org>
      Reported-by: Kees Cook <keescook@google.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
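The "fs-" prefixing described in this commit amounts to building the requested module name from the filesystem type. The following user-space sketch shows only that string construction; `fs_module_alias` is a hypothetical helper, the 64-byte buffer is an arbitrary choice, and the static buffer makes it unsuitable for concurrent use.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the module alias that would be requested for a filesystem type,
 * e.g. "autofs" -> "fs-autofs". Returns NULL for an empty type name or
 * one that does not fit the buffer.
 */
const char *fs_module_alias(const char *fstype)
{
	static char buf[64];
	int n;

	assert(fstype != NULL);
	if (strlen(fstype) == 0)
		return NULL;
	n = snprintf(buf, sizeof(buf), "fs-%s", fstype);
	if (n < 0 || (size_t)n >= sizeof(buf))
		return NULL;
	return buf;
}
```

Since the lookup goes through an alias rather than the module's file name, a filesystem whose module is named differently (the commit cites autofs living in the autofs4 module) can still be found.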
  21. 23 Feb, 2013 1 commit
  22. 16 Feb, 2013 2 commits
  23. 05 Feb, 2013 1 commit
  24. 24 Jan, 2013 1 commit
  25. 11 Dec, 2012 5 commits
  26. 29 Nov, 2012 1 commit
  27. 28 Nov, 2012 3 commits
  28. 10 Sep, 2012 1 commit
    • nfsd: remove unused listener-removal interfaces · eccf50c1
      Committed by J. Bruce Fields
      You can use nfsd/portlist to give nfsd additional sockets to listen on.
      In theory you can also remove listening sockets this way.  But nobody's
      ever done that as far as I can tell.
      
      Also this was partially broken in 2.6.25, by
      a217813f "knfsd: Support adding
      transports by writing portlist file".
      
      (Note that we decide whether to take the "delfd" case by checking for a
      digit--but what's actually expected in that case is something made by
      svc_one_sock_name(), which won't begin with a digit.)
      
      So, let's just rip out this stuff.
      Acked-by: NeilBrown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  29. 22 Aug, 2012 2 commits
  30. 21 Aug, 2012 1 commit
  31. 25 Jul, 2012 1 commit