1. 18 May 2016 (4 commits)
    • xprtrdma: Prevent inline overflow · 302d3deb
      Committed by Chuck Lever
      When deciding whether to send a Call inline, rpcrdma_marshal_req
      doesn't take into account header bytes consumed by chunk lists.
      This results in Call messages on the wire that are sometimes larger
      than the inline threshold.
      
      Likewise, when a Write list or Reply chunk is in play, the server's
      reply has to emit an RDMA Send that includes a larger-than-minimal
      RPC-over-RDMA header.
      
      The actual size of a Call message cannot be estimated until after
      the chunk lists have been registered. Thus the size of each
      RPC-over-RDMA header can be estimated only after chunks are
      registered; but the decision to register chunks is based on the size
      of that header. Chicken, meet egg.
      
      The best a client can do is estimate header size based on the
      largest header that might occur, and then ensure that inline content
      is always smaller than that.
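      
      The approach, in a compilable userspace sketch (the constants and
      function names here are illustrative assumptions, not the exact
      rpcrdma code):
      
        /* Worst case: every allowed segment appears as its own read
         * chunk in the RPC-over-RDMA header. */
        enum {
            RPCRDMA_HDRLEN_MIN = 28, /* fixed header + empty chunk lists */
            RPCRDMA_MAX_SEGS   = 8,  /* segment cap (see the next commit) */
            RDMA_SEGMENT_SIZE  = 16, /* handle + length + 64-bit offset */
            READ_CHUNK_SIZE    = 4 + 4 + RDMA_SEGMENT_SIZE,
        };
        
        static unsigned int max_call_header_size(void)
        {
            return RPCRDMA_HDRLEN_MIN + RPCRDMA_MAX_SEGS * READ_CHUNK_SIZE;
        }
        
        /* Inline content must fit under the threshold together with the
         * worst-case header (callers keep the threshold above the
         * header estimate, so this cannot underflow). */
        static unsigned int max_inline_send(unsigned int inline_threshold)
        {
            return inline_threshold - max_call_header_size();
        }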
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Limit number of RDMA segments in RPC-over-RDMA headers · 94931746
      Committed by Chuck Lever
      Send buffer space is shared between the RPC-over-RDMA header and
      an RPC message. A large RPC-over-RDMA header means less space is
      available for the associated RPC message, which then has to be
      moved via an RDMA Read or Write.
      
      As more segments are added to the chunk lists, the header increases
      in size.  Typical modern hardware needs only a few segments to
      convey the maximum payload size, but some devices and registration
      modes may need a lot of segments to convey data payload. Sometimes
      so many are needed that the remaining space in the Send buffer is
      not enough for the RPC message. Sending such a message usually
      fails.
      
      To ensure a transport can always make forward progress, cap the
      number of RDMA segments allowed in chunk lists. This keeps
      less-capable devices and memory registration modes from consuming
      a large portion of the Send buffer; the trade-off is a smaller
      maximum data payload on such devices.
      
      For now I choose an arbitrary maximum of 8 RDMA segments. This
      allows a maximum size RPC-over-RDMA header to fit nicely in the
      current 1024 byte inline threshold with over 700 bytes remaining
      for an inline RPC message.
      
      The current maximum data payload of NFS READ or WRITE requests is
      one megabyte. To convey that payload on a client with 4KB pages,
      each chunk segment would need to handle 32 or more data pages. This
      is well within the capabilities of FMR. For physical registration,
      the maximum payload size on platforms with 4KB pages is reduced to
      32KB.
      
      For FRWR, a device's maximum page list depth would need to be at
      least 34 to support the maximum 1MB payload. A device with a smaller
      maximum page list depth means the maximum data payload is reduced
      when using that device.
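      
      A compilable sketch of that arithmetic (the page size and the
      segment cap come from the text above; everything else is
      illustrative):
      
        #include <stdio.h>
        
        #define PAGE_SIZE        4096UL
        #define RPCRDMA_MAX_SEGS 8        /* cap chosen in this patch */
        
        int main(void)
        {
            unsigned long max_payload = 1024 * 1024;       /* 1MB NFS I/O */
            unsigned long pages = max_payload / PAGE_SIZE;           /* 256 */
            unsigned long pages_per_seg = pages / RPCRDMA_MAX_SEGS;  /* 32 */
        
            printf("each of %d segments must map %lu pages (%lu KB)\n",
                   RPCRDMA_MAX_SEGS, pages_per_seg,
                   pages_per_seg * PAGE_SIZE / 1024);
            /* Physical registration maps one page per segment, so the
             * cap limits payload to 8 * 4KB = 32KB, as noted above. */
            return 0;
        }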
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • xprtrdma: Bound the inline threshold values · 29c55422
      Committed by Chuck Lever
      Currently the sysctls that allow setting the inline threshold allow
      any value to be set.
      
      Small values only make the transport run slower. The default 1KB
      setting is as low as is reasonable. And the logic that decides how
      to divide a Send buffer between RPC-over-RDMA header and RPC message
      assumes (but does not check) that the lower bound is not crazy (say,
      57 bytes).
      
      Send and receive buffers share a page with some control
      information, so values larger than about 3KB cannot currently be
      supported.
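      
      A minimal sketch of the resulting clamp (the 1KB floor and ~3KB
      ceiling come from the text above; the function name is an
      assumption):
      
        static unsigned int xprt_rdma_clamp_inline(unsigned int requested)
        {
            const unsigned int floor   = 1024; /* the default; anything
                                                * lower only runs slower */
            const unsigned int ceiling = 3072; /* buffers share a page
                                                * with control info */
            if (requested < floor)
                return floor;
            if (requested > ceiling)
                return ceiling;
            return requested;
        }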
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
    • sunrpc: Advertise maximum backchannel payload size · 6b26cc8c
      Committed by Chuck Lever
      RPC-over-RDMA transports have a limit on how large a backward
      direction (backchannel) RPC message can be. Ensure that the NFSv4.x
      CREATE_SESSION operation advertises this limit to servers.
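      
      A userspace model of the idea (the struct, field, and function
      names here are assumptions based on this description, not verbatim
      upstream code):
      
        #include <stdio.h>
        
        struct transport {
            size_t bc_maxpayload;   /* largest backchannel message */
        };
        
        /* Advertise the smaller of the client's default and the
         * transport's backchannel cap in CREATE_SESSION. */
        static size_t create_session_back_chan_sz(const struct transport *x)
        {
            const size_t nfs_default = 8192;  /* illustrative default */
            return x->bc_maxpayload < nfs_default ?
                   x->bc_maxpayload : nfs_default;
        }
        
        int main(void)
        {
            struct transport rdma = { .bc_maxpayload = 1024 };
            printf("advertised: %zu bytes\n",
                   create_session_back_chan_sz(&rdma));
            return 0;
        }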
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Tested-by: Steve Wise <swise@opengridcomputing.com>
      Signed-off-by: Anna Schumaker <Anna.Schumaker@Netapp.com>
  2. 09 May 2016 (3 commits)
  3. 05 April 2016 (2 commits)
    • mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Committed by Kirill A. Shutemov
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Committed by Kirill A. Shutemov
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long*
      time ago with the promise that one day it would be possible to
      implement the page cache with bigger chunks than PAGE_SIZE.
      
      That promise never materialized, and likely never will.
      
      We have many places where PAGE_CACHE_SIZE is assumed to be equal
      to PAGE_SIZE, and it's a constant source of confusion whether the
      PAGE_CACHE_* or PAGE_* constant should be used in a particular
      case, especially on the border between fs and mm.
      
      Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too
      much breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are straightforward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle
      using the script below.  For some reason, coccinelle doesn't patch
      header files, so I've called spatch on them manually.
      
      The only adjustment after coccinelle is reverting the changes to
      the PAGE_CACHE_ALIGN definition: we are going to drop it later.
      
      There are a few places in the code that coccinelle didn't reach;
      I'll fix them manually in a separate patch.  Comments and
      documentation will also be addressed in a separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
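      
      For illustration, a typical hunk in a hypothetical caller ends up
      looking like this (fragments only, not a complete function):
      
        /* before: page-cache macro spellings */
        index  = pos >> PAGE_CACHE_SHIFT;
        offset = pos & ~PAGE_CACHE_MASK;
        page_cache_get(page);
        /* ... use the page ... */
        page_cache_release(page);
        
        /* after: identical behavior, generic names */
        index  = pos >> PAGE_SHIFT;
        offset = pos & ~PAGE_MASK;
        get_page(page);
        /* ... use the page ... */
        put_page(page);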
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 04 April 2016 (1 commit)
  5. 18 March 2016 (1 commit)
  6. 15 March 2016 (9 commits)
  7. 02 March 2016 (12 commits)
  8. 24 February 2016 (1 commit)
    • sunrpc/cache: fix off-by-one in qword_get() · b7052cd7
      Committed by Stefan Hajnoczi
      The qword_get() function NUL-terminates its output buffer.  If the input
      string is in hex format \xXXXX... and the same length as the output
      buffer, there is an off-by-one:
      
        int qword_get(char **bpp, char *dest, int bufsize)
        {
            ...
            while (len < bufsize) {
                ...
                *dest++ = (h << 4) | l;
                len++;
            }
            ...
            *dest = '\0';
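            /* ^ when len == bufsize, this NUL write lands at
             * dest[bufsize], one byte past the end of the buffer */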
            return len;
        }
      
      This patch ensures the NUL terminator doesn't fall outside the output
      buffer.
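      
      A compilable userspace sketch of the corrected bound (this mirrors
      the description above, not the exact kernel function):
      
        #include <stdio.h>
        
        static int hex_val(char c)
        {
            if (c >= '0' && c <= '9') return c - '0';
            if (c >= 'a' && c <= 'f') return c - 'a' + 10;
            return -1;
        }
        
        static int qword_get_hex(const char *src, char *dest, int bufsize)
        {
            int len = 0;
        
            /* bufsize - 1, not bufsize: reserve room for the NUL */
            while (src[0] && src[1] && len < bufsize - 1) {
                int h = hex_val(src[0]), l = hex_val(src[1]);
                if (h < 0 || l < 0)
                    break;
                dest[len++] = (char)((h << 4) | l);
                src += 2;
            }
            dest[len] = '\0';   /* now always within the buffer */
            return len;
        }
        
        int main(void)
        {
            char out[4];
            int n = qword_get_hex("41424344", out, sizeof(out));
            printf("%d bytes: %s\n", n, out);   /* 3 bytes: ABC */
            return 0;
        }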
      Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  9. 18 February 2016 (1 commit)
    • auth_gss: fix panic in gss_pipe_downcall() in fips mode · 437b300c
      Committed by Scott Mayhew
      On Mon, 15 Feb 2016, Trond Myklebust wrote:
      
      > Hi Scott,
      >
      > On Mon, Feb 15, 2016 at 2:28 PM, Scott Mayhew <smayhew@redhat.com> wrote:
      > > md5 is disabled in fips mode, and attempting to import a gss context
      > > using md5 while in fips mode will result in crypto_alg_mod_lookup()
      > > returning -ENOENT, which will make its way back up to
      > > gss_pipe_downcall(), where the BUG() is triggered.  Handling the -ENOENT
      > > allows for a more graceful failure.
      > >
      > > Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      > > ---
      > >  net/sunrpc/auth_gss/auth_gss.c | 3 +++
      > >  1 file changed, 3 insertions(+)
      > >
      > > diff --git a/net/sunrpc/auth_gss/auth_gss.c b/net/sunrpc/auth_gss/auth_gss.c
      > > index 799e65b..c30fc3b 100644
      > > --- a/net/sunrpc/auth_gss/auth_gss.c
      > > +++ b/net/sunrpc/auth_gss/auth_gss.c
      > > @@ -737,6 +737,9 @@ gss_pipe_downcall(struct file *filp, const char __user *src, size_t mlen)
      > >                 case -ENOSYS:
      > >                         gss_msg->msg.errno = -EAGAIN;
      > >                         break;
      > > +               case -ENOENT:
      > > +                       gss_msg->msg.errno = -EPROTONOSUPPORT;
      > > +                       break;
      > >                 default:
      > >                         printk(KERN_CRIT "%s: bad return from "
      > >                                 "gss_fill_context: %zd\n", __func__, err);
      > > --
      > > 2.4.3
      > >
      >
      > Well debugged, but I unfortunately do have to ask if this patch is
      > sufficient? In addition to -ENOENT, and -ENOMEM, it looks to me as if
      > crypto_alg_mod_lookup() can also fail with -EINTR, -ETIMEDOUT, and
      > -EAGAIN. Don't we also want to handle those?
      
      You're right, I was focusing on the panic that I could easily reproduce.
      I'm still not sure how I could trigger those other conditions.
      
      >
      > In fact, peering into the rat's nest that is
      > gss_import_sec_context_kerberos(), it looks as if that is just a tiny
      > subset of all the errors that we might run into. Perhaps the right
      > thing to do here is to get rid of the BUG() (but keep the above
      > printk) and just return a generic error?
      
      That sounds fine to me -- updated patch attached.
      
      -Scott
      
      From d54c6b64a107a90a38cab97577de05f9a4625052 Mon Sep 17 00:00:00 2001
      From: Scott Mayhew <smayhew@redhat.com>
      Date: Mon, 15 Feb 2016 15:12:19 -0500
      Subject: [PATCH] auth_gss: remove the BUG() from gss_pipe_downcall()
      
      Instead return a generic error via gss_msg->msg.errno.  None of the
      errors returned by gss_fill_context() should necessarily trigger a
      kernel panic.
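      
      A hedged sketch of the resulting behavior in plain C (the generic
      errno chosen here is an assumption, not necessarily the value used
      upstream):
      
        #include <errno.h>
        
        static int map_gss_fill_context_err(long err)
        {
            switch (err) {
            case -ENOSYS:
                return -EAGAIN;   /* existing arm shown in the diff above */
            default:
                /* was: printk(...); BUG();  -- keep the log message but
                 * fail the RPC with a generic error instead of panicking */
                return -EIO;
            }
        }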
      Signed-off-by: Scott Mayhew <smayhew@redhat.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@primarydata.com>
  10. 17 February 2016 (1 commit)
  11. 06 February 2016 (5 commits)