1. 01 4月, 2014 1 次提交
    • S
      nfsd: check passed socket's net matches NFSd superblock's one · 30646394
      Stanislav Kinsbursky 提交于
      There could be a case, when NFSd file system is mounted in network, different
      to socket's one, like below:
      
      "ip netns exec" creates new network and mount namespace, which duplicates NFSd
      mount point, created in init_net context. And thus NFS server stop in nested
      network context leads to RPCBIND client destruction in init_net.
      Then, on NFSd start in nested network context, rpc.nfsd process creates socket
      in nested net and passes it into "write_ports", which leads to RPCBIND sockets
      creation in init_net context because of the same reason (NFSd monut point was
      created in init_net context). An attempt to register passed socket in nested
      net leads to panic, because no RPCBIND client present in nexted network
      namespace.
      
      This patch add check that passed socket's net matches NFSd superblock's one.
      And returns -EINVAL error to user psace otherwise.
      
      v2: Put socket on exit.
      Reported-by: NWeng Meiling <wengmeiling.weng@huawei.com>
      Signed-off-by: NStanislav Kinsbursky <skinsbursky@parallels.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      30646394
  2. 10 10月, 2013 1 次提交
  3. 09 10月, 2013 1 次提交
    • E
      ipv6: make lookups simpler and faster · efe4208f
      Eric Dumazet 提交于
      TCP listener refactoring, part 4 :
      
      To speed up inet lookups, we moved IPv4 addresses from inet to struct
      sock_common
      
      Now is time to do the same for IPv6, because it permits us to have fast
      lookups for all kind of sockets, including upcoming SYN_RECV.
      
      Getting IPv6 addresses in TCP lookups currently requires two extra cache
      lines, plus a dereference (and memory stall).
      
      inet6_sk(sk) does the dereference of inet_sk(__sk)->pinet6
      
      This patch is way bigger than its IPv4 counter part, because for IPv4,
      we could add aliases (inet_daddr, inet_rcv_saddr), while on IPv6,
      it's not doable easily.
      
      inet6_sk(sk)->daddr becomes sk->sk_v6_daddr
      inet6_sk(sk)->rcv_saddr becomes sk->sk_v6_rcv_saddr
      
      And timewait socket also have tw->tw_v6_daddr & tw->tw_v6_rcv_saddr
      at the same offset.
      
      We get rid of INET6_TW_MATCH() as INET6_MATCH() is now the generic
      macro.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      efe4208f
  4. 01 8月, 2013 1 次提交
    • N
      NFSD/sunrpc: avoid deadlock on TCP connection due to memory pressure. · 447383d2
      NeilBrown 提交于
      Since we enabled auto-tuning for sunrpc TCP connections we do not
      guarantee that there is enough write-space on each connection to
      queue a reply.
      
      If memory pressure causes the window to shrink too small, the request
      throttling in sunrpc/svc will not accept any requests so no more requests
      will be handled.  Even when pressure decreases the window will not
      grow again until data is sent on the connection.
      This means we get a deadlock:  no requests will be handled until there
      is more space, and no space will be allocated until a request is
      handled.
      
      This can be simulated by modifying svc_tcp_has_wspace to inflate the
      number of byte required and removing the 'svc_sock_setbufsize' calls
      in svc_setup_socket.
      
      I found that multiplying by 16 was enough to make the requirement
      exceed the default allocation.  With this modification in place:
         mount -o vers=3,proto=tcp 127.0.0.1:/home /mnt
      would block and eventually time out because the nfs server could not
      accept any requests.
      
      This patch relaxes the request throttling to always allow at least one
      request through per connection.  It does this by checking both
        sk_stream_min_wspace() and xprt->xpt_reserved
      are zero.
      The first is zero when the TCP transmit queue is empty.
      The second is zero when there are no RPC requests being processed.
      When both of these are zero the socket is idle and so one more
      request can safely be allowed through.
      
      Applying this patch allows the above mount command to succeed cleanly.
      Tracing shows that the allocated write buffer space quickly grows and
      after a few requests are handled, the extra tests are no longer needed
      to permit further requests to be processed.
      
      The main purpose of request throttling is to handle the case when one
      client is slow at collecting replies and the send queue gets full of
      replies that the client hasn't acknowledged (at the TCP level) yet.
      As we only change behaviour when the send queue is empty this main
      purpose is still preserved.
      Reported-by: NBen Myers <bpm@sgi.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      447383d2
  5. 25 7月, 2013 1 次提交
  6. 02 7月, 2013 2 次提交
    • J
      svcrpc: don't error out on small tcp fragment · 1f691b07
      J. Bruce Fields 提交于
      Though clients we care about mostly don't do this, it is possible for
      rpc requests to be sent in multiple fragments.  Here we have a sanity
      check to ensure that the final received rpc isn't too small--except that
      the number we're actually checking is the length of just the final
      fragment, not of the whole rpc.  So a perfectly legal rpc that's
      unluckily fragmented could cause the server to close the connection
      here.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1f691b07
    • J
      svcrpc: fix handling of too-short rpc's · cf3aa02c
      J. Bruce Fields 提交于
      If we detect that an rpc is too short, we abort and close the
      connection.  Except, there's a bug here: we're leaving sk_datalen
      nonzero without leaving any pages in the sk_pages array.  The most
      likely result of the inconsistency is a subsequent crash in
      svc_tcp_clear_pages.
      
      Also demote the BUG_ON in svc_tcp_clear_pages to a WARN.
      
      Cc: stable@kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      cf3aa02c
  7. 01 2月, 2013 1 次提交
  8. 18 12月, 2012 2 次提交
  9. 04 12月, 2012 5 次提交
  10. 05 11月, 2012 1 次提交
  11. 10 9月, 2012 1 次提交
    • J
      nfsd: remove unused listener-removal interfaces · eccf50c1
      J. Bruce Fields 提交于
      You can use nfsd/portlist to give nfsd additional sockets to listen on.
      In theory you can also remove listening sockets this way.  But nobody's
      ever done that as far as I can tell.
      
      Also this was partially broken in 2.6.25, by
      a217813f "knfsd: Support adding
      transports by writing portlist file".
      
      (Note that we decide whether to take the "delfd" case by checking for a
      digit--but what's actually expected in that case is something made by
      svc_one_sock_name(), which won't begin with a digit.)
      
      So, let's just rip out this stuff.
      Acked-by: NNeilBrown <neilb@suse.de>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      eccf50c1
  12. 22 8月, 2012 6 次提交
  13. 21 8月, 2012 1 次提交
    • J
      svcrpc: fix BUG() in svc_tcp_clear_pages · be1e4444
      J. Bruce Fields 提交于
      Examination of svc_tcp_clear_pages shows that it assumes sk_tcplen is
      consistent with sk_pages[] (in particular, sk_pages[n] can't be NULL if
      sk_tcplen would lead us to expect n pages of data).
      
      svc_tcp_restore_pages zeroes out sk_pages[] while leaving sk_tcplen.
      This is OK, since both functions are serialized by XPT_BUSY.  However,
      that means the inconsistency must be repaired before dropping XPT_BUSY.
      
      Therefore we should be ensuring that svc_tcp_save_pages repairs the
      problem before exiting svc_tcp_recv_record on error.
      
      Symptoms were a BUG() in svc_tcp_clear_pages.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      be1e4444
  14. 28 6月, 2012 1 次提交
  15. 16 5月, 2012 1 次提交
  16. 22 4月, 2012 1 次提交
  17. 12 3月, 2012 1 次提交
    • T
      SUNRPC: Fix a few sparse warnings · 09acfea5
      Trond Myklebust 提交于
      net/sunrpc/svcsock.c:412:22: warning: incorrect type in assignment
      (different address spaces)
       - svc_partial_recvfrom now takes a struct kvec, so the variable
         save_iovbase needs to be an ordinary (void *)
      
      Make a bunch of variables in net/sunrpc/xprtsock.c static
      
      Fix a couple of "warning: symbol 'foo' was not declared. Should it be
      static?" reports.
      
      Fix a couple of conflicting function declarations.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      09acfea5
  18. 04 2月, 2012 1 次提交
  19. 01 2月, 2012 2 次提交
  20. 07 12月, 2011 1 次提交
  21. 23 11月, 2011 1 次提交
  22. 01 11月, 2011 1 次提交
  23. 14 9月, 2011 1 次提交
  24. 16 7月, 2011 1 次提交
  25. 15 7月, 2011 1 次提交
  26. 10 4月, 2011 1 次提交
    • J
      svcrpc: complete svsk processing on cb receive failure · 8985ef0b
      J. Bruce Fields 提交于
      Currently when there's some failure to receive a callback (because we
      couldn't find a matching xid, for example), we exit svc_recv with
      sk_tcplen still set but without any pages saved with the socket.  This
      will cause a crash later in svc_tcp_restore_pages.
      
      Instead, make sure we reset that tcp information whether the callback
      received failed or succeeded.
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      8985ef0b
  27. 08 4月, 2011 2 次提交
    • O
      svcrpc: take advantage of tcp autotuning · 96604398
      Olga Kornievskaia 提交于
      Allow the NFSv4 server to make use of TCP autotuning behaviour, which
      was previously disabled by setting the sk_userlocks variable.
      
      Set the receive buffers to be big enough to receive the whole RPC
      request, and set this for the listening socket, not the accept socket.
      
      Remove the code that readjusts the receive/send buffer sizes for the
      accepted socket. Previously this code was used to influence the TCP
      window management behaviour, which is no longer needed when autotuning
      is enabled.
      
      This can improve IO bandwidth on networks with high bandwidth-delay
      products, where a large tcp window is required.  It also simplifies
      performance tuning, since getting adequate tcp buffers previously
      required increasing the number of nfsd threads.
      Signed-off-by: NOlga Kornievskaia <aglo@citi.umich.edu>
      Cc: Jim Rees <rees@umich.edu>
      Signed-off-by: NJ. Bruce Fields <bfields@citi.umich.edu>
      96604398
    • J
      SUNRPC: Don't wait for full record to receive tcp data · 31d68ef6
      J. Bruce Fields 提交于
      Ensure that we immediately read and buffer data from the incoming TCP
      stream so that we grow the receive window quickly, and don't deadlock on
      large READ or WRITE requests.
      
      Also do some minor exit cleanup.
      Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      31d68ef6