1. 10 7月, 2019 1 次提交
    • G
      Revert "RDS: IB: split the mr registration and invalidation path" · a5520788
      Gerd Rausch 提交于
      This reverts commit 56012459.
      
      RDS kept spinning inside function "rds_ib_post_reg_frmr", waiting for
      "i_fastreg_wrs" to become incremented:
               while (atomic_dec_return(&ibmr->ic->i_fastreg_wrs) <= 0) {
                       atomic_inc(&ibmr->ic->i_fastreg_wrs);
                       cpu_relax();
               }
      
      Looking at the original commit:
      
      commit 56012459 ("RDS: IB: split the mr registration and
      invalidation path")
      
      In there, the "rds_ib_mr_cqe_handler" was changed in the following
      way:
      
       void rds_ib_mr_cqe_handler(struct
       rds_ib_connection *ic,
       struct ib_wc *wc)
              if (frmr->fr_inv) {
                        frmr->fr_state = FRMR_IS_FREE;
                        frmr->fr_inv = false;
                      atomic_inc(&ic->i_fastreg_wrs);
              } else {
                      atomic_inc(&ic->i_fastunreg_wrs);
              }
      
      It looks like it's got it exactly backwards:
      
      Function "rds_ib_post_reg_frmr" keeps track of the outstanding
      requests via "i_fastreg_wrs".
      
      Function "rds_ib_post_inv" keeps track of the outstanding requests
      via "i_fastunreg_wrs" (post original commit). It also sets:
               frmr->fr_inv = true;
      
      However the completion handler "rds_ib_mr_cqe_handler" adjusts
      "i_fastreg_wrs" when "fr_inv" had been true, and adjusts
      "i_fastunreg_wrs" otherwise.
      
      The original commit was done in the name of performance:
      to remove the performance bottleneck
      
      No performance benefit could be observed with a fixed-up version
      of the original commit measured between two Oracle X7 servers,
      both equipped with Mellanox Connect-X5 HCAs.
      
      The prudent course of action is to revert this commit.
      Signed-off-by: NGerd Rausch <gerd.rausch@oracle.com>
      Signed-off-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      a5520788
  2. 05 2月, 2019 4 次提交
  3. 02 8月, 2018 2 次提交
  4. 24 7月, 2018 2 次提交
    • K
      rds: Enable RDS IPv6 support · 1e2b44e7
      Ka-Cheong Poon 提交于
      This patch enables RDS to use IPv6 addresses. For RDS/TCP, the
      listener is now an IPv6 endpoint which accepts both IPv4 and IPv6
      connection requests.  RDS/RDMA/IB uses a private data (struct
      rds_ib_connect_private) exchange between endpoints at RDS connection
      establishment time to support RDMA. This private data exchange uses a
      32 bit integer to represent an IP address. This needs to be changed in
      order to support IPv6. A new private data struct
      rds6_ib_connect_private is introduced to handle this. To ensure
      backward compatibility, an IPv6 capable RDS stack uses another RDMA
      listener port (RDS_CM_PORT) to accept IPv6 connection. And it
      continues to use the original RDS_PORT for IPv4 RDS connections. When
      it needs to communicate with an IPv6 peer, it uses the RDS_CM_PORT to
      send the connection set up request.
      
      v5: Fixed syntax problem (David Miller).
      
      v4: Changed port history comments in rds.h (Sowmini Varadhan).
      
      v3: Added support to set up IPv4 connection using mapped address
          (David Miller).
          Added support to set up connection between link local and non-link
          addresses.
          Various review comments from Santosh Shilimkar and Sowmini Varadhan.
      
      v2: Fixed bound and peer address scope mismatched issue.
          Added back rds_connect() IPv6 changes.
      Signed-off-by: NKa-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1e2b44e7
    • K
      rds: Changing IP address internal representation to struct in6_addr · eee2fa6a
      Ka-Cheong Poon 提交于
      This patch changes the internal representation of an IP address to use
      struct in6_addr.  IPv4 address is stored as an IPv4 mapped address.
      All the functions which take an IP address as argument are also
      changed to use struct in6_addr.  But RDS socket layer is not modified
      such that it still does not accept IPv6 address from an application.
      And RDS layer does not accept nor initiate IPv6 connections.
      
      v2: Fixed sparse warnings.
      Signed-off-by: NKa-Cheong Poon <ka-cheong.poon@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eee2fa6a
  5. 13 6月, 2018 1 次提交
    • K
      treewide: Use array_size() in vzalloc_node() · fd7beced
      Kees Cook 提交于
      The vzalloc_node() function has no 2-factor argument form, so
      multiplication factors need to be wrapped in array_size(). This patch
      replaces cases of:
      
              vzalloc_node(a * b, node)
      
      with:
              vzalloc_node(array_size(a, b), node)
      
      as well as handling cases of:
      
              vzalloc_node(a * b * c, node)
      
      with:
      
              vzalloc_node(array3_size(a, b, c), node)
      
      This does, however, attempt to ignore constant size factors like:
      
              vzalloc_node(4 * 1024, node)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        vzalloc_node(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        vzalloc_node(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        vzalloc_node(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        vzalloc_node(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT_ID)
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT_ID
      +	array_size(COUNT_ID, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT_CONST)
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT_CONST
      +	array_size(COUNT_CONST, sizeof(THING))
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
        vzalloc_node(
      -	SIZE * COUNT
      +	array_size(COUNT, SIZE)
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        vzalloc_node(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        vzalloc_node(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        vzalloc_node(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        vzalloc_node(C1 * C2 * C3, ...)
      |
        vzalloc_node(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants.
      @@
      expression E1, E2;
      constant C1, C2;
      @@
      
      (
        vzalloc_node(C1 * C2, ...)
      |
        vzalloc_node(
      -	E1 * E2
      +	array_size(E1, E2)
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      fd7beced
  6. 26 4月, 2018 1 次提交
  7. 09 2月, 2018 1 次提交
    • S
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and... · ebeeb1ad
      Sowmini Varadhan 提交于
      rds: tcp: use rds_destroy_pending() to synchronize netns/module teardown and rds connection/workq management
      
      An rds_connection can get added during netns deletion between lines 528
      and 529 of
      
        506 static void rds_tcp_kill_sock(struct net *net)
        :
        /* code to pull out all the rds_connections that should be destroyed */
        :
        528         spin_unlock_irq(&rds_tcp_conn_lock);
        529         list_for_each_entry_safe(tc, _tc, &tmp_list, t_tcp_node)
        530                 rds_conn_destroy(tc->t_cpath->cp_conn);
      
      Such an rds_connection would miss out the rds_conn_destroy()
      loop (that cancels all pending work) and (if it was scheduled
      after netns deletion) could trigger the use-after-free.
      
      A similar race-window exists for the module unload path
      in rds_tcp_exit -> rds_tcp_destroy_conns
      
      Concurrency with netns deletion (rds_tcp_kill_sock()) must be handled
      by checking check_net() before enqueuing new work or adding new
      connections.
      
      Concurrency with module-unload is handled by maintaining a module
      specific flag that is set at the start of the module exit function,
      and must be checked before enqueuing new work or adding new connections.
      
      This commit refactors existing RDS_DESTROY_PENDING checks added by
      commit 3db6e0d1 ("rds: use RCU to synchronize work-enqueue with
      connection teardown") and consolidates all the concurrency checks
      listed above into the function rds_destroy_pending().
      Signed-off-by: NSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: NSantosh Shilimkar <santosh.shilimkar@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ebeeb1ad
  8. 14 3月, 2017 1 次提交
  9. 10 3月, 2017 1 次提交
  10. 03 1月, 2017 5 次提交
  11. 02 7月, 2016 2 次提交
  12. 19 6月, 2016 1 次提交
  13. 15 6月, 2016 2 次提交
  14. 17 4月, 2016 1 次提交
  15. 03 3月, 2016 3 次提交
  16. 29 10月, 2015 1 次提交
  17. 06 10月, 2015 3 次提交
  18. 31 8月, 2015 1 次提交
  19. 26 8月, 2015 3 次提交
  20. 08 8月, 2015 1 次提交
  21. 13 6月, 2015 1 次提交
  22. 19 5月, 2015 1 次提交
  23. 05 5月, 2015 1 次提交
    • D
      net/rds: Fix new sparse warning · e2783717
      David Ahern 提交于
      c0adf54a introduced new sparse warnings:
        CHECK   /home/dahern/kernels/linux.git/net/rds/ib_cm.c
      net/rds/ib_cm.c:191:34: warning: incorrect type in initializer (different base types)
      net/rds/ib_cm.c:191:34:    expected unsigned long long [unsigned] [usertype] dp_ack_seq
      net/rds/ib_cm.c:191:34:    got restricted __be64 <noident>
      net/rds/ib_cm.c:194:51: warning: cast to restricted __be64
      
      The temporary variable for sequence number should have been declared as __be64
      rather than u64. Make it so.
      Signed-off-by: NDavid Ahern <david.ahern@oracle.com>
      Cc: shamir rabinovitch <shamir.rabinovitch@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e2783717