1. 02 5月, 2019 1 次提交
  2. 02 8月, 2018 1 次提交
  3. 24 7月, 2018 2 次提交
    • K
      rds: Enable RDS IPv6 support · 1e2b44e7
      Ka-Cheong Poon 提交于
      This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
      listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
      connection requests.  RDS/RDMA/IB uses a private data (struct
      rds_ib_connect_private) exchange between endpoints at RDS connection
      establishment time to support RDMA. This private data exchange uses a
      32 bit integer to represent an IP address. This needs to be changed in
      order to support IPv6. A new private data struct
      rds6_ib_connect_private is introduced to handle this. To ensure
      backward compatibility, an IPv6 capable RDS stack uses another RDMA
      listener port (RDS_CM_PORT) to accept IPv6 connection. And it
      continues to use the original RDS_PORT for IPv4 RDS connections. When
      it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
      send the connection set up request.
      
      v5: Fixed syntax problem (David Miller).
      
      v4: Changed port history comments in rds.h (Sowmini Varadhan).
      
      v3: Added support to set up IPv4 connection using mapped address
          (David Miller).
          Added support to set up connection between link local and non-link
          addresses.
          Various review comments from Santosh Shilimkar and Sowmini Varadhan.
      
      v2: Fixed bound and peer address scope mismatched issue.
          Added back rds_connect() IPv6 changes.
      Signed-off-by: NKa-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e2b44e7
    • K
      rds: Changing IP address internal representation to struct in6_addr · eee2fa6a
      Ka-Cheong Poon 提交于
      This patch changes the internal representation of an IP address to use
      struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
      All the functions which take an IP address as argument are also
      changed to use struct in6_addr.  But RDS socket layer is not modified
      such that it still does not accept IPv6 address from an application.
      And RDS layer does not accept nor initiate IPv6 connections.
      
      v2: Fixed sparse warnings.
      Signed-off-by: NKa-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eee2fa6a
  4. 08 3月, 2018 1 次提交
  5. 28 2月, 2018 1 次提交
  6. 17 2月, 2018 1 次提交
    • S
      rds: support for zcopy completion notification · 01883eda
      Sowmini Varadhan 提交于
      RDS removes a datagram (rds_message) from the retransmit queue when
      an ACK is received. The ACK indicates that the receiver has queued
      the RDS datagram, so that the sender can safely forget the datagram.
      When all references to the rds_message are quiesced, rds_message_purge
      is called to release resources used by the rds_message
      
      If the datagram to be removed had pinned pages set up, add
      an entry to the rs->rs_znotify_queue so that the notifcation
      will be sent up via rds_rm_zerocopy_callback() when the
      rds_message is eventually freed by rds_message_purge.
      
      rds_rm_zerocopy_callback() attempts to batch the number of cookies
      sent with each notification  to a max of SO_EE_ORIGIN_MAX_ZCOOKIES.
      This is achieved by checking the tail skb in the sk_error_queue:
      if this has room for one more cookie, the cookie from the
      current notification is added; else a new skb is added to the
      sk_error_queue. Every invocation of rds_rm_zerocopy_callback() will
      trigger a ->sk_error_report to notify the application.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Acked-by: NWillem de Bruijn <willemb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      01883eda
  7. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  8. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  9. 28 11月, 2017 1 次提交
  10. 07 1月, 2017 1 次提交
  11. 03 1月, 2017 1 次提交
  12. 18 11月, 2016 1 次提交
    • S
      RDS: TCP: Track peer's connection generation number · 905dd418
      Sowmini Varadhan 提交于
      The RDS transport has to be able to distinguish between
      two types of failure events:
      (a) when the transport fails (e.g., TCP connection reset)
          but the RDS socket/connection layer on both sides stays
          the same
      (b) when the peer's RDS layer itself resets (e.g., due to module
          reload or machine reboot at the peer)
      In case (a) both sides must reconnect and continue the RDS messaging
      without any message loss or disruption to the message sequence numbers,
      and this is achieved by rds_send_path_reset().
      
      In case (b) we should reset all rds_connection state to the
      new incarnation of the peer. Examples of state that needs to
      be reset are next expected rx sequence number from, or messages to be
      retransmitted to, the new incarnation of the peer.
      
      To achieve this, the RDS handshake probe added as part of
      commit 5916e2c1 ("RDS: TCP: Enable multipath RDS for TCP")
      is enhanced so that sender and receiver of the RDS ping-probe
      will add a generation number as part of the RDS_EXTHDR_GEN_NUM
      extension header. Each peer stores local and remote generation
      numbers as part of each rds_connection. Changes in generation
      number will be detected via incoming handshake probe ping
      request or response and will allow the receiver to reset rds_connection
      state.
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      905dd418
  13. 03 3月, 2016 1 次提交
  14. 03 11月, 2015 1 次提交
  15. 01 10月, 2015 2 次提交
    • S
      RDS: Use per-bucket rw lock for bind hash-table · 9b9acde7
      Santosh Shilimkar 提交于
      One global lock protecting hash-tables with 1024 buckets isn't
      efficient and it shows up in a massive systems with truck
      loads of RDS sockets serving multiple databases. The
      perf data clearly highlights the contention on the rw
      lock in these massive workloads.
      
      When the contention gets worse, the code gets into a state where
      it decides to back off on the lock. So while it has disabled interrupts,
      it sits and backs off on this lock get. This causes the system to
      become sluggish and eventually all sorts of bad things happen.
      
      The simple fix is to move the lock into the hash bucket and
      use per-bucket lock to improve the scalability.
      Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      9b9acde7
    • S
      RDS: make socket bind/release locking scheme simple and more efficient · 8b0a6b46
      Santosh Shilimkar 提交于
      RDS bind and release locking scheme is very inefficient. It
      uses RCU for maintaining the bind hash-table which is great but
      it also needs to hold spinlock for [add/remove]_bound(). So
      overall usecase, the hash-table concurrent speedup doesn't pay off.
      In fact blocking nature of synchronize_rcu() makes the RDS
      socket shutdown too slow which hurts RDS performance since
      connection shutdown and re-connect happens quite often to
      maintain the RC part of the protocol.
      
      So we make the locking scheme simpler and more efficient by
      replacing spin_locks with reader/writer locks and getting rid
      off rcu for bind hash-table.
      
      In subsequent patch, we also covert the global lock with per-bucket
      lock to reduce the global lock contention.
      Signed-off-by: NSantosh Shilimkar <ssantosh@kernel.org>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      8b0a6b46
  16. 26 8月, 2015 1 次提交
  17. 01 6月, 2015 2 次提交
  18. 19 5月, 2015 1 次提交
  19. 11 5月, 2015 1 次提交
  20. 28 8月, 2014 1 次提交
  21. 25 1月, 2012 1 次提交
  22. 09 9月, 2010 6 次提交
  23. 21 4月, 2010 1 次提交
  24. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  25. 25 3月, 2010 1 次提交
  26. 17 3月, 2010 1 次提交
  27. 30 11月, 2009 1 次提交
  28. 06 11月, 2009 1 次提交
  29. 31 10月, 2009 1 次提交
    • A
      RDS: Add GET_MR_FOR_DEST sockopt · 244546f0
      Andy Grover 提交于
      RDS currently supports a GET_MR sockopt to establish a
      memory region (MR) for a chunk of memory. However, the fastreg
      method ties a MR to a particular destination. The GET_MR_FOR_DEST
      sockopt allows the remote machine to be specified, and thus
      support for fastreg (aka FRWRs).
      
      Note that this patch does *not* do all of this - it simply
      implements the new sockopt in terms of the old one, so applications
      can begin to use the new sockopt in preparation for cutover to
      FRWRs.
      Signed-off-by: NAndy Grover <andy.grover@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      244546f0
  30. 07 10月, 2009 1 次提交
  31. 01 10月, 2009 1 次提交
  32. 15 9月, 2009 1 次提交