1. 01 8月, 2018 3 次提交
    • B
      NFSv4 client live hangs after live data migration recovery · 0f90be13
      Bill Baker 提交于
      After a live data migration event at the NFS server, the client may send
      I/O requests to the wrong server, causing a live hang due to repeated
      recovery events.  On the wire, this will appear as an I/O request failing
      with NFS4ERR_BADSESSION, followed by successful CREATE_SESSION, repeatedly.
      NFS4ERR_BADSSESSION is returned because the session ID being used was
      issued by the other server and is not valid at the old server.
      
      The failure is caused by async worker threads having cached the transport
      (xprt) in the rpc_task structure.  After the migration recovery completes,
      the task is redispatched and the task resends the request to the wrong
      server based on the old value still present in tk_xprt.
      
      The solution is to recompute the tk_xprt field of the rpc_task structure
      so that the request goes to the correct server.
      Signed-off-by: NBill Baker <bill.baker@oracle.com>
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NHelen Chao <helen.chao@oracle.com>
      Fixes: fb43d172 ("SUNRPC: Use the multipath iterator to assign a ...")
      Cc: stable@vger.kernel.org # v4.9+
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      0f90be13
    • D
      sunrpc: kstrtoul() can also return -ERANGE · 1a54c0cf
      Dan Carpenter 提交于
      Smatch complains that "num" can be uninitialized when kstrtoul() returns
      -ERANGE.  It's true enough, but basically harmless in this case.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      1a54c0cf
    • D
      sunrpc: Change rpc_print_iostats to rpc_clnt_show_stats and handle rpc_clnt clones · 016583d7
      Dave Wysochanski 提交于
      The existing rpc_print_iostats has a few shortcomings.  First, the naming
      is not consistent with other functions in the kernel that display stats.
      Second, it is really displaying stats for an rpc_clnt structure as it
      displays both xprt stats and per-op stats.  Third, it does not handle
      rpc_clnt clones, which is important for the one in-kernel tree caller
      of this function, the NFS client's nfs_show_stats function.
      
      Fix all of the above by renaming the rpc_print_iostats to
      rpc_clnt_show_stats and looping through any rpc_clnt clones via
      cl_parent.
      
      Once this interface is fixed, this addresses a problem with NFSv4.
      Before this patch, the /proc/self/mountstats always showed incorrect
      counts for NFSv4 lease and session related opcodes such as SEQUENCE,
      RENEW, SETCLIENTID, CREATE_SESSION, etc.  These counts were always 0
      even though many ops would go over the wire.  The reason for this is
      there are multiple rpc_clnt structures allocated for any given NFSv4
      mount, and inside nfs_show_stats() we callled into rpc_print_iostats()
      which only handled one of them, nfs_server->client.  Fix these counts
      by calling sunrpc's new rpc_clnt_show_stats() function, which handles
      cloned rpc_clnt structs and prints the stats together.
      
      Note that one side-effect of the above is that multiple mounts from
      the same NFS server will show identical counts in the above ops due
      to the fact the one rpc_clnt (representing the NFSv4 client state)
      is shared across mounts.
      Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      016583d7
  2. 31 7月, 2018 3 次提交
  3. 19 6月, 2018 1 次提交
    • C
      sunrpc: Prevent duplicate XID allocation · 0dae72d5
      Chuck Lever 提交于
      Krzysztof Kozlowski <krzk@kernel.org> reports that a heavy NFSv4
      WRITE workload against a slow NFS server causes his Raspberry Pi
      clients to stall. Krzysztof bisected it to commit 37ac86c3
      ("SUNRPC: Initialize rpc_rqst outside of xprt->reserve_lock") .
      
      I was able to reproduce similar behavior and it appears that rarely
      the RPC client layer is re-allocating an XID for an RPC that it has
      already partially sent. This results in the client ignoring the
      subsequent reply, which carries the original XID.
      
      For various reasons, checking !req->rq_xmit_bytes_sent in
      xprt_prepare_transmit is not a 100% reliable mechanism for
      determining when a fresh XID is needed.
      
      Trond's preference is to allocate the XID at the time each rpc_rqst
      slot is initialized.
      
      This patch should also address a gcc 4.1.2 complaint reported by
      Geert Uytterhoeven <geert@linux-m68k.org>.
      Reported-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Fixes: 37ac86c3 ("SUNRPC: Initialize rpc_rqst outside of ... ")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NKrzysztof Kozlowski <krzk@kernel.org>
      Signed-off-by: NTrond Myklebust <trond.myklebust@hammerspace.com>
      0dae72d5
  4. 13 6月, 2018 2 次提交
    • K
      treewide: kzalloc() -> kcalloc() · 6396bb22
      Kees Cook 提交于
      The kzalloc() function has a 2-factor argument form, kcalloc(). This
      patch replaces cases of:
      
              kzalloc(a * b, gfp)
      
      with:
              kcalloc(a * b, gfp)
      
      as well as handling cases of:
      
              kzalloc(a * b * c, gfp)
      
      with:
      
              kzalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kzalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kzalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kzalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kzalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kzalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kzalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kzalloc
      + kcalloc
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kzalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kzalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kzalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kzalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kzalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kzalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kzalloc(sizeof(THING) * C2, ...)
      |
        kzalloc(sizeof(TYPE) * C2, ...)
      |
        kzalloc(C1 * C2 * C3, ...)
      |
        kzalloc(C1 * C2, ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kzalloc
      + kcalloc
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6396bb22
    • K
      treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook 提交于
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
              kmalloc_array(a * b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2 factors products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      Signed-off-by: NKees Cook <keescook@chromium.org>
      6da2ec56
  5. 09 6月, 2018 2 次提交
    • D
      Fix 16-byte memory leak in gssp_accept_sec_context_upcall · 0070ed3d
      Dave Wysochanski 提交于
      There is a 16-byte memory leak inside sunrpc/auth_gss on an nfs server when
      a client mounts with 'sec=krb5' in a simple mount / umount loop.  The leak
      is seen by either monitoring the kmalloc-16 slab or with kmemleak enabled
      
      unreferenced object 0xffff92e6a045f030 (size 16):
        comm "nfsd", pid 1096, jiffies 4294936658 (age 761.110s)
        hex dump (first 16 bytes):
          2a 86 48 86 f7 12 01 02 02 00 00 00 00 00 00 00  *.H.............
        backtrace:
          [<000000004b2b79a7>] gssx_dec_buffer+0x79/0x90 [auth_rpcgss]
          [<000000002610ac1a>] gssx_dec_accept_sec_context+0x215/0x6dd [auth_rpcgss]
          [<000000004fd0e81d>] rpcauth_unwrap_resp+0xa9/0xe0 [sunrpc]
          [<000000002b099233>] call_decode+0x1e9/0x840 [sunrpc]
          [<00000000954fc846>] __rpc_execute+0x80/0x3f0 [sunrpc]
          [<00000000c83a961c>] rpc_run_task+0x10d/0x150 [sunrpc]
          [<000000002c2cdcd2>] rpc_call_sync+0x4d/0xa0 [sunrpc]
          [<000000000b74eea2>] gssp_accept_sec_context_upcall+0x196/0x470 [auth_rpcgss]
          [<000000003271273f>] svcauth_gss_proxy_init+0x188/0x520 [auth_rpcgss]
          [<000000001cf69f01>] svcauth_gss_accept+0x3a6/0xb50 [auth_rpcgss]
      
      If you map the above to code you'll see the following call chain
        gssx_dec_accept_sec_context
          gssx_dec_ctx  (missing from kmemleak output)
            gssx_dec_buffer(xdr, &ctx->mech)
      
      Inside gssx_dec_buffer there is 'kmemdup' where we allocate memory for
      any gssx_buffer (buf) and store into buf->data.  In the above instance,
      'buf == &ctx->mech).
      
      Further up in the chain in gssp_accept_sec_context_upcall we see ctx->mech
      is part of a stack variable 'struct gssx_ctx rctxh'.  Now later inside
      gssp_accept_sec_context_upcall after gssp_call, there is a number of
      memcpy and kfree statements, but there is no kfree(rctxh.mech.data)
      after the memcpy into data->mech_oid.data.
      
      With this patch applied and the same mount / unmount loop, the kmalloc-16
      slab is stable and kmemleak enabled no longer shows the above backtrace.
      Signed-off-by: NDave Wysochanski <dwysocha@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      0070ed3d
    • C
      svcrdma: Fix incorrect return value/type in svc_rdma_post_recvs · af7fd74e
      Chuck Lever 提交于
      This crept in during the development process and wasn't caught
      before I posted the "final" version.
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Fixes: 0b2613c5883f ('svcrdma: Allocate recv_ctxt's on CPU ... ')
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      af7fd74e
  6. 02 6月, 2018 5 次提交
  7. 29 5月, 2018 1 次提交
    • A
      IB: Revert "remove redundant INFINIBAND kconfig dependencies" · 533d1dae
      Arnd Bergmann 提交于
      Several subsystems depend on INFINIBAND_ADDR_TRANS, which in turn depends
      on INFINIBAND. However, when with CONFIG_INIFIBAND=m, this leads to a
      link error when another driver using it is built-in. The
      INFINIBAND_ADDR_TRANS dependency is insufficient here as this is
      a 'bool' symbol that does not force anything to be a module in turn.
      
      fs/cifs/smbdirect.o: In function `smbd_disconnect_rdma_work':
      smbdirect.c:(.text+0x1e4): undefined reference to `rdma_disconnect'
      net/9p/trans_rdma.o: In function `rdma_request':
      trans_rdma.c:(.text+0x7bc): undefined reference to `rdma_disconnect'
      net/9p/trans_rdma.o: In function `rdma_destroy_trans':
      trans_rdma.c:(.text+0x830): undefined reference to `ib_destroy_qp'
      trans_rdma.c:(.text+0x858): undefined reference to `ib_dealloc_pd'
      
      Fixes: 9533b292 ("IB: remove redundant INFINIBAND kconfig dependencies")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Acked-by: NGreg Thelen <gthelen@google.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      533d1dae
  8. 12 5月, 2018 18 次提交
    • C
      svcrdma: Persistently allocate and DMA-map Send buffers · 99722fe4
      Chuck Lever 提交于
      While sending each RPC Reply, svc_rdma_sendto allocates and DMA-
      maps a separate buffer where the RPC/RDMA transport header is
      constructed. The buffer is unmapped and released in the Send
      completion handler. This is significant per-RPC overhead,
      especially for small RPCs.
      
      Instead, allocate and DMA-map a buffer, and cache it in each
      svc_rdma_send_ctxt. This buffer and its mapping can be re-used
      for each RPC, saving the cost of memory allocation and DMA
      mapping.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      99722fe4
    • C
      svcrdma: Simplify svc_rdma_send() · 3abb03fa
      Chuck Lever 提交于
      Clean up: No current caller of svc_rdma_send's passes in a chained
      WR. The logic that counts the chain length can be replaced with a
      constant (1).
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      3abb03fa
    • C
      svcrdma: Remove post_send_wr · 986b7889
      Chuck Lever 提交于
      Clean up: Now that the send_wr is part of the svc_rdma_send_ctxt,
      svc_rdma_post_send_wr is nearly empty.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      986b7889
    • C
      svcrdma: Don't overrun the SGE array in svc_rdma_send_ctxt · 25fd86ec
      Chuck Lever 提交于
      Receive buffers are always the same size, but each Send WR has a
      variable number of SGEs, based on the contents of the xdr_buf being
      sent.
      
      While assembling a Send WR, keep track of the number of SGEs so that
      we don't exceed the device's maximum, or walk off the end of the
      Send SGE array.
      
      For now the Send path just fails if it exceeds the maximum.
      
      The current logic in svc_rdma_accept bases the maximum number of
      Send SGEs on the largest NFS request that can be sent or received.
      In the transport layer, the limit is actually based on the
      capabilities of the underlying device, not on properties of the
      Upper Layer Protocol.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      25fd86ec
    • C
      svcrdma: Introduce svc_rdma_send_ctxt · 4201c746
      Chuck Lever 提交于
      svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      Introduce a replacement to svc_rdma_op_ctxt's that is built
      especially for the svcrdma Send path.
      
      Subsequent patches will take advantage of this new structure by
      allocating real resources which are then cached in these objects.
      The allocations are freed when the transport is torn down.
      
      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and send_ctxt are not confused. As an
      additional clean up, structure fields are renamed to conform with
      kernel coding conventions.
      
      Additional clean ups:
      - Handle svc_rdma_send_ctxt_get allocation failure at each call
        site, rather than pre-allocating and hoping we guessed correctly
      - All send_ctxt_put call-sites request page freeing, so remove
        the @free_pages argument
      - All send_ctxt_put call-sites unmap SGEs, so fold that into
        svc_rdma_send_ctxt_put
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      4201c746
    • C
      svcrdma: Clean up Send SGE accounting · 23262790
      Chuck Lever 提交于
      Clean up: Since there's already a svc_rdma_op_ctxt being passed
      around with the running count of mapped SGEs, drop unneeded
      parameters to svc_rdma_post_send_wr().
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      23262790
    • C
      svcrdma: Refactor svc_rdma_dma_map_buf · f016f305
      Chuck Lever 提交于
      Clean up: svc_rdma_dma_map_buf does mostly the same thing as
      svc_rdma_dma_map_page, so let's fold these together.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      f016f305
    • C
      svcrdma: Allocate recv_ctxt's on CPU handling Receives · eb5d7a62
      Chuck Lever 提交于
      There is a significant latency penalty when processing an ingress
      Receive if the Receive buffer resides in memory that is not on the
      same NUMA node as the the CPU handling completions for a CQ.
      
      The system administrator and the device driver determine which CPU
      handles completions. This CPU does not change during life of the CQ.
      Further the Upper Layer does not have any visibility of which CPU it
      is.
      
      Allocating Receive buffers in the Receive completion handler
      guarantees that Receive buffers are allocated on the preferred NUMA
      node for that CQ.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      eb5d7a62
    • C
      svcrdma: Persistently allocate and DMA-map Receive buffers · 3316f063
      Chuck Lever 提交于
      The current Receive path uses an array of pages which are allocated
      and DMA mapped when each Receive WR is posted, and then handed off
      to the upper layer in rqstp::rq_arg. The page flip releases unused
      pages in the rq_pages pagelist. This mechanism introduces a
      significant amount of overhead.
      
      So instead, kmalloc the Receive buffer, and leave it DMA-mapped
      while the transport remains connected. This confers a number of
      benefits:
      
      * Each Receive WR requires only one receive SGE, no matter how large
        the inline threshold is. This helps the server-side NFS/RDMA
        transport operate on less capable RDMA devices.
      
      * The Receive buffer is left allocated and mapped all the time. This
        relieves svc_rdma_post_recv from the overhead of allocating and
        DMA-mapping a fresh buffer.
      
      * svc_rdma_wc_receive no longer has to DMA unmap the Receive buffer.
        It has to DMA sync only the number of bytes that were received.
      
      * svc_rdma_build_arg_xdr no longer has to free a page in rq_pages
        for each page in the Receive buffer, making it a constant-time
        function.
      
      * The Receive buffer is now plugged directly into the rq_arg's
        head[0].iov_vec, and can be larger than a page without spilling
        over into rq_arg's page list. This enables simplification of
        the RDMA Read path in subsequent patches.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      3316f063
    • C
      svcrdma: Preserve Receive buffer until svc_rdma_sendto · 3a88092e
      Chuck Lever 提交于
      Rather than releasing the incoming svc_rdma_recv_ctxt at the end of
      svc_rdma_recvfrom, hold onto it until svc_rdma_sendto.
      
      This permits the contents of the Receive buffer to be preserved
      through svc_process and then referenced directly in sendto as it
      constructs Write and Reply chunks to return to the client.
      
      The real changes will come in subsequent patches.
      
      Note: I cannot use ->xpo_release_rqst for this purpose because that
      is called _before_ ->xpo_sendto. svc_rdma_sendto uses information in
      the received Call transport header to construct the Reply transport
      header, which is preserved in the RPC's Receive buffer.
      
      The historical comment in svc_send() isn't helpful: it is already
      obvious that ->xpo_release_rqst is being called before ->xpo_sendto,
      but there is no explanation for this ordering going back to the
      beginning of the git era.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      3a88092e
    • C
      svcrdma: Simplify svc_rdma_recv_ctxt_put · 1e5f4160
      Chuck Lever 提交于
      Currently svc_rdma_recv_ctxt_put's callers have to know whether they
      want to free the ctxt's pages or not. This means the human
      developers have to know when and why to set that free_pages
      argument.
      
      Instead, the ctxt should carry that information with it so that
      svc_rdma_recv_ctxt_put does the right thing no matter who is
      calling.
      
      We want to keep track of the number of pages in the Receive buffer
      separately from the number of pages pulled over by RDMA Read. This
      is so that the correct number of pages can be freed properly and
      that number is well-documented.
      
      So now, rc_hdr_count is the number of pages consumed by head[0]
      (ie., the page index where the Read chunk should start); and
      rc_page_count is always the number of pages that need to be released
      when the ctxt is put.
      
      The @free_pages argument is no longer needed.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1e5f4160
    • C
      svcrdma: Remove sc_rq_depth · 2c577bfe
      Chuck Lever 提交于
      Clean up: No need to retain rq_depth in struct svcrdma_xprt, it is
      used only in svc_rdma_accept().
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2c577bfe
    • C
      svcrdma: Introduce svc_rdma_recv_ctxt · ecf85b23
      Chuck Lever 提交于
      svc_rdma_op_ctxt's are pre-allocated and maintained on a per-xprt
      free list. This eliminates the overhead of calling kmalloc / kfree,
      both of which grab a globally shared lock that disables interrupts.
      To reduce contention further, separate the use of these objects in
      the Receive and Send paths in svcrdma.
      
      Subsequent patches will take advantage of this separation by
      allocating real resources which are then cached in these objects.
      The allocations are freed when the transport is torn down.
      
      I've renamed the structure so that static type checking can be used
      to ensure that uses of op_ctxt and recv_ctxt are not confused. As an
      additional clean up, structure fields are renamed to conform with
      kernel coding conventions.
      
      As a final clean up, helpers related to recv_ctxt are moved closer
      to the functions that use them.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ecf85b23
    • C
      svcrdma: Trace key RDMA API events · bd2abef3
      Chuck Lever 提交于
      This includes:
        * Posting on the Send and Receive queues
        * Send, Receive, Read, and Write completion
        * Connect upcalls
        * QP errors
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      bd2abef3
    • C
      svcrdma: Trace key RPC/RDMA protocol events · 98895edb
      Chuck Lever 提交于
      This includes:
        * Transport accept and tear-down
        * Decisions about using Write and Reply chunks
        * Each RDMA segment that is handled
        * Whenever an RDMA_ERR is sent
      
      As a clean-up, I've standardized the order of the includes, and
      removed some now redundant dprintk call sites.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      98895edb
    • C
      xprtrdma: Prepare RPC/RDMA includes for server-side trace points · b6e717cb
      Chuck Lever 提交于
      Clean up: Move #include <trace/events/rpcrdma.h> into source files,
      similar to how it is done with trace/events/sunrpc.h.
      
      Server-side trace points will be part of the rpcrdma subsystem,
      just like the client-side trace points.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b6e717cb
    • C
      svcrdma: Use passed-in net namespace when creating RDMA listener · 8dafcbee
      Chuck Lever 提交于
      Ensure each RDMA listener and its children transports are created in
      the same net namespace as the user that started the NFS service.
      This is similar to how listener sockets are created in
      svc_create_socket, required for enabling support for containers.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      8dafcbee
    • C
  9. 09 5月, 2018 1 次提交
  10. 07 5月, 2018 4 次提交