1. 26 Apr 2017: 8 commits
    • svcrdma: Clean up RDMA_ERROR path · 6b19cc5c
      Committed by Chuck Lever
      Now that svc_rdma_sendto has been renovated, svc_rdma_send_error can
      be refactored to reduce code duplication and remove C structure-
      based XDR encoding. It is also relocated to the source file that
      contains its only caller.
      
      This is a refactoring change only.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Use rdma_rw API in RPC reply path · 9a6a180b
      Committed by Chuck Lever
      The current svcrdma sendto code path posts one RDMA Write WR at a
      time. Each of these Writes typically carries a small number of pages
      (for instance, up to 30 pages for mlx4 devices). That means a 1MB
      NFS READ reply requires 9 ib_post_send() calls for the Write WRs,
      and one for the Send WR carrying the actual RPC Reply message.
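A quick check of the figure above, assuming 4KB pages so that a 1MB payload spans 256 pages. The helper names are hypothetical and not part of svcrdma; they only reproduce the ceiling-division arithmetic: 256 pages at up to 30 pages per WR rounds up to 9 Write WRs, plus one Send WR.

```c
#include <stddef.h>

/* Hypothetical helper: Write WRs needed when each WR can carry at most
 * max_pages_per_wr pages (both divisions round up). */
size_t write_wrs_needed(size_t payload_bytes, size_t page_size,
                        size_t max_pages_per_wr)
{
	size_t pages = (payload_bytes + page_size - 1) / page_size;

	return (pages + max_pages_per_wr - 1) / max_pages_per_wr;
}

/* ib_post_send() calls in the old one-WR-at-a-time path: one per Write WR,
 * plus one for the Send WR that carries the RPC Reply. */
size_t old_path_post_sends(size_t payload_bytes, size_t page_size,
                           size_t max_pages_per_wr)
{
	return write_wrs_needed(payload_bytes, page_size, max_pages_per_wr) + 1;
}
```

With these assumptions, write_wrs_needed(1024 * 1024, 4096, 30) gives 9 and old_path_post_sends() gives 10.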
      
      Instead, use the new rdma_rw API. The details of Write WR chain
      construction and memory registration are taken care of in the RDMA
      core. svcrdma can focus on the details of the RPC-over-RDMA
      protocol. This gives three main benefits:
      
      1. All Write WRs for one RDMA segment are posted in a single chain,
      so as few as one ib_post_send() call is needed per Write chunk.
      
      2. The Write path can now use FRWR to register the Write buffers.
      If the device's maximum page list depth is large enough, only a
      single Write WR is needed for each RPC's Write chunk data.
      
      3. The new code introduces support for RPCs that carry both a Write
      list and a Reply chunk. This combination can be used for an NFSv4
      READ where the data payload is large, and thus is removed from the
      Payload Stream, but the Payload Stream is still larger than the
      inline threshold.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Introduce local rdma_rw API helpers · f13193f5
      Committed by Chuck Lever
      The plan is to replace the local bespoke code that constructs and
      posts RDMA Read and Write Work Requests with calls to the rdma_rw
      API, which is shared with other RDMA-enabled ULPs and manages the
      gory details of buffer registration and posting Work Requests.
      
      Some design notes:
      
       o The structure of RPC-over-RDMA transport headers is flexible,
         allowing multiple segments per Reply with arbitrary alignment,
         each with a unique R_key. Write and Send WRs continue to be
         built and posted in separate code paths. However, one whole
         chunk (with one or more RDMA segments apiece) gets exactly
         one ib_post_send and one work completion.
      
       o svc_xprt reference counting is modified, since a chain of
         rdma_rw_ctx structs generates one completion, no matter how
         many Write WRs are posted.
      
       o The current code builds the transport header as it is
         constructing Write WRs. I've replaced that with marshaling of transport
         header data items in a separate step. This is because the exact
         structure of client-provided segments may not align with the
         components of the server's reply xdr_buf, or the pages in the
         page list. Thus parts of each client-provided segment may be
         written at different points in the send path.
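The first two design notes can be modelled with a toy sketch. Everything here is hypothetical (these are not the real svc_xprt or rdma_rw_ctx types); it only illustrates the accounting described above, assuming the whole chain is posted once and produces a single completion, so the transport reference is taken once per chunk rather than once per Write WR.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not the real svc_xprt / rdma_rw_ctx types. */
struct example_xprt {
	int refcount;
};

struct example_write_chunk {
	size_t nsegments;     /* RDMA segments in this chunk */
	size_t wrs_posted;    /* Write WRs linked into the chain */
	int post_calls;       /* ib_post_send()-style calls made */
	bool completed;
};

/* Post all segments of one chunk as a single chain: one transport
 * reference and one post call per chunk, however many WRs it holds. */
void post_write_chunk(struct example_xprt *x, struct example_write_chunk *c,
                      size_t wrs_per_segment)
{
	c->wrs_posted = c->nsegments * wrs_per_segment;
	c->post_calls = 1;
	c->completed = false;
	x->refcount++;
}

/* The chain yields exactly one completion, which drops the single
 * reference taken when the chain was posted. */
void write_chunk_done(struct example_xprt *x, struct example_write_chunk *c)
{
	c->completed = true;
	x->refcount--;
}
```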
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Eliminate RPCRDMA_SQ_DEPTH_MULT · b623589d
      Committed by Chuck Lever
      The Send Queue depth is temporarily reduced to 1 SQE per credit. The
      new rdma_rw API does an internal computation, during QP creation, to
      increase the depth of the Send Queue to handle RDMA Read and Write
      operations.
      
      This change has to come before the NFSD code paths are updated to
      use the rdma_rw API. Without this patch, rdma_rw_init_qp() increases
      the size of the SQ too much, resulting in memory allocation failures
      during QP creation.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Add svc_rdma_map_reply_hdr() · 6e6092ca
      Committed by Chuck Lever
      Introduce a helper to DMA-map a reply's transport header before
      sending it. This will in part replace the map vector cache.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrdma: Move send_wr to svc_rdma_op_ctxt · 17f5f7f5
      Committed by Chuck Lever
      Clean up: Move the ib_send_wr off the stack, and move common code
      to post a Send Work Request into a helper.
      
      This is a refactoring change only.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • uapi: fix linux/nfsd/cld.h userspace compilation errors · 16719199
      Committed by Dmitry V. Levin
      Include <linux/types.h> and consistently use types it provides
      to fix the following linux/nfsd/cld.h userspace compilation errors:
      
      /usr/include/linux/nfsd/cld.h:40:2: error: unknown type name 'uint16_t'
        uint16_t cn_len;    /* length of cm_id */
      /usr/include/linux/nfsd/cld.h:46:2: error: unknown type name 'uint8_t'
        uint8_t  cm_vers;  /* upcall version */
      /usr/include/linux/nfsd/cld.h:47:2: error: unknown type name 'uint8_t'
        uint8_t  cm_cmd;   /* upcall command */
      /usr/include/linux/nfsd/cld.h:48:2: error: unknown type name 'int16_t'
        int16_t  cm_status;  /* return code */
      /usr/include/linux/nfsd/cld.h:49:2: error: unknown type name 'uint32_t'
        uint32_t cm_xid;   /* transaction id */
      /usr/include/linux/nfsd/cld.h:51:3: error: unknown type name 'int64_t'
         int64_t  cm_gracetime; /* grace period start time */
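A minimal sketch of the approach the fix describes: include <linux/types.h> and declare fields with the __u8/__s16/__u32/__s64 types it provides, which userspace can rely on, instead of uint*_t names that are undeclared unless <stdint.h> happens to be included first. The struct name is hypothetical and simplified (the real message also carries a union with a name payload); the field names follow the error messages above.

```c
#include <linux/types.h>

/* Hypothetical, simplified upcall message using only types that
 * <linux/types.h> itself provides. Packed so the layout has no padding. */
struct example_cld_msg {
	__u8  cm_vers;       /* upcall version */
	__u8  cm_cmd;        /* upcall command */
	__s16 cm_status;     /* return code */
	__u32 cm_xid;        /* transaction id */
	__s64 cm_gracetime;  /* grace period start time */
} __attribute__((packed));
```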
      Signed-off-by: Dmitry V. Levin <ldv@altlinux.org>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: check for oversized NFSv2/v3 arguments · 51f56777
      Committed by J. Bruce Fields
      A client can append random data to the end of an NFSv2 or NFSv3 RPC call
      without our complaining; we'll just stop parsing at the end of the
      expected data and ignore the rest.
      
      Encoded arguments and replies are stored together in an array of pages,
      and if a call is too large it could leave inadequate space for the
      reply.  This is normally OK because NFS RPCs typically have either
      short arguments and long replies (like READ) or long arguments and short
      replies (like WRITE).  But a client that sends an incorrectly long argument
      can violate those assumptions.  This was observed to cause crashes.
      
      So, insist that the argument not be any longer than we expect.
      
      Also, several operations increment rq_next_page in the decode routine
      before checking the argument size, which can leave rq_next_page pointing
      well past the end of the page array, causing trouble later in
      svc_free_pages.
      
      As a follow-up, we may also want to rewrite the encoding routines to check
      more carefully that they aren't running off the end of the page array.
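The two checks described above can be sketched as follows. The structure and function names are hypothetical stand-ins (nfsd actually works on struct svc_rqst and its rq_arg buffer); the point is the ordering: reject trailing data first, and only then advance the page cursor, with a bounds check.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical decode state, not nfsd's real structures.
 * Invariant: next_page <= npages. */
struct arg_state {
	size_t received;   /* bytes the client actually sent */
	size_t consumed;   /* bytes the decoder parsed */
	size_t next_page;  /* index into the request's page array */
	size_t npages;     /* size of the page array */
};

/* The argument may be no longer than what the decoder expected:
 * leftover bytes mean the call is rejected, not silently ignored. */
bool args_size_ok(const struct arg_state *s)
{
	return s->consumed == s->received;
}

/* Check the argument size first, then advance the page cursor only if
 * it stays within the page array. */
bool advance_page(struct arg_state *s, size_t pages_needed)
{
	if (!args_size_ok(s))
		return false;
	if (pages_needed > s->npages - s->next_page)
		return false;
	s->next_page += pages_needed;
	return true;
}
```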
      Reported-by: Tuomas Haanpää <thaan@synopsys.com>
      Reported-by: Ari Kauppi <ari@synopsys.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  2. 22 Apr 2017: 1 commit
    • net: ipv6: RTF_PCPU should not be settable from userspace · 557c44be
      Committed by David Ahern
      Andrey reported a fault in the IPv6 route code:
      
      kasan: GPF could be caused by NULL-ptr deref or user memory access
      general protection fault: 0000 [#1] SMP KASAN
      Modules linked in:
      CPU: 1 PID: 4035 Comm: a.out Not tainted 4.11.0-rc7+ #250
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      task: ffff880069809600 task.stack: ffff880062dc8000
      RIP: 0010:ip6_rt_cache_alloc+0xa6/0x560 net/ipv6/route.c:975
      RSP: 0018:ffff880062dced30 EFLAGS: 00010206
      RAX: dffffc0000000000 RBX: ffff8800670561c0 RCX: 0000000000000006
      RDX: 0000000000000003 RSI: ffff880062dcfb28 RDI: 0000000000000018
      RBP: ffff880062dced68 R08: 0000000000000001 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
      R13: ffff880062dcfb28 R14: dffffc0000000000 R15: 0000000000000000
      FS:  00007feebe37e7c0(0000) GS:ffff88006cb00000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00000000205a0fe4 CR3: 000000006b5c9000 CR4: 00000000000006e0
      Call Trace:
       ip6_pol_route+0x1512/0x1f20 net/ipv6/route.c:1128
       ip6_pol_route_output+0x4c/0x60 net/ipv6/route.c:1212
      ...
      
      Andrey's syzkaller program passes rtmsg.rtmsg_flags with the RTF_PCPU bit
      set. Flags passed to the kernel are blindly copied to the allocated
      rt6_info by ip6_route_info_create, making a newly inserted route appear
      as though it is a per-cpu route. ip6_rt_cache_alloc sees the flag set
      and expects rt->dst.from to be set, which it is not, since the route is
      not really a per-cpu copy. The subsequent call to __ip6_dst_alloc then
      generates the fault.
      
      Fix by checking for the flag and failing with EINVAL.
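A sketch of the shape of the fix, assuming RTF_PCPU has the value 0x40000000 as in include/uapi/linux/ipv6_route.h; it is defined locally under a hypothetical name so the example builds without kernel headers. The function name is illustrative, not the kernel's.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed value of RTF_PCPU; hypothetical local name. */
#define EXAMPLE_RTF_PCPU 0x40000000u

/* Flags supplied by userspace must not include the kernel-internal
 * per-cpu marker; reject them instead of copying them blindly. */
int validate_route_flags(uint32_t rtmsg_flags)
{
	if (rtmsg_flags & EXAMPLE_RTF_PCPU)
		return -EINVAL;
	return 0;
}
```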
      
      Fixes: d52d3997 ("ipv6: Create percpu rt6_info")
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Acked-by: Martin KaFai Lau <kafai@fb.com>
      Tested-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 19 Apr 2017: 1 commit
  4. 15 Apr 2017: 1 commit
    • block: fix bio_will_gap() for first bvec with offset · 5a8d75a1
      Committed by Ming Lei
      Commit 729204ef ("block: relax check on sg gap") allows us to merge
      bios if both are physically contiguous.  This can merge a huge number
      of small bios; with mkfs, for example, mkfs.ntfs running time can be
      decreased to ~1/10.

      But if one request starts with a non-aligned buffer (the 1st bvec's
      bv_offset is non-zero) and we allow the merge, it is quite difficult
      to respect the sg gap limit, especially the max segment size, or we
      risk having an unaligned virtual boundary.  This patch avoids the
      issue by disallowing a merge if the request starts with an unaligned
      buffer.

      Also add comments to explain why the merged segment can't end in an
      unaligned virtual boundary.
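A simplified model of the gap logic, using a hypothetical two-field stand-in for struct bio_vec. The gap test mirrors the usual virtual-boundary check (a gap exists if the next buffer starts at a non-zero offset or the previous one does not end on the boundary); the extra early return models this patch by refusing the merge when the request being merged into starts with a non-zero bv_offset.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct bio_vec: only the fields used here. */
struct example_bvec {
	unsigned int bv_offset;
	unsigned int bv_len;
};

/* Gap if the next buffer does not start at offset 0, or the previous one
 * does not end on the virtual boundary. virt_mask is boundary - 1. */
bool bvec_gap_to_prev(const struct example_bvec *prev,
                      unsigned int next_offset, unsigned long virt_mask)
{
	return next_offset ||
	       ((prev->bv_offset + prev->bv_len) & virt_mask);
}

/* prev_first/prev_last: first and last bvec of the existing request. */
bool will_gap(const struct example_bvec *prev_first,
              const struct example_bvec *prev_last,
              unsigned int next_offset, unsigned long virt_mask)
{
	if (!virt_mask)
		return false;  /* queue has no virtual boundary constraint */
	if (prev_first->bv_offset)
		return true;   /* request starts unaligned: never merge */
	return bvec_gap_to_prev(prev_last, next_offset, virt_mask);
}
```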
      
      Fixes: 729204ef ("block: relax check on sg gap")
      Tested-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Ming Lei <ming.lei@redhat.com>

      Rewrote parts of the commit message and comments.
      Signed-off-by: Jens Axboe <axboe@fb.com>
  5. 14 Apr 2017: 1 commit
  6. 11 Apr 2017: 1 commit
  7. 10 Apr 2017: 1 commit
    • crypto: ahash - Fix EINPROGRESS notification callback · ef0579b6
      Committed by Herbert Xu
      The ahash API modifies the request's callback function in order
      to clean up after itself in some corner cases (unaligned final
      and missing finup).
      
      When the request is complete, ahash will restore the original
      callback and everything is fine.  However, when the request gets
      an EBUSY on a full queue, an EINPROGRESS callback is made while
      the request is still ongoing.
      
      In this case the ahash API will incorrectly call its own callback.
      
      This patch fixes the problem by creating a temporary request
      object on the stack which is used to relay EINPROGRESS back to
      the original completion function.
      
      This patch also adds code to preserve the original flags value.
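The relay can be modelled in plain C with a hypothetical, heavily simplified request type (the real code uses struct crypto_async_request and struct ahash_request). In this sketch the wrapper keeps its own callback installed while the request is in flight; on EINPROGRESS it builds a temporary request on the stack carrying the caller's data (and, here, flags) and hands that to the caller's callback. Only the final completion restores the original callback, data and flags.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical, heavily simplified request type. */
struct example_req;
typedef void (*example_complete_t)(struct example_req *req, int err);

struct example_req {
	example_complete_t complete;  /* callback currently installed */
	void *data;
	unsigned int flags;
	/* Caller's values, saved while the wrapper owns the callback. */
	example_complete_t orig_complete;
	void *orig_data;
	unsigned int orig_flags;
};

/* The wrapper's own completion callback, as it behaves after the fix. */
void wrapper_complete(struct example_req *req, int err)
{
	if (err == -EINPROGRESS) {
		/* Still in flight: relay through a temporary request on the
		 * stack, leaving the wrapper's callback installed on req. */
		struct example_req tmp = {
			.complete = req->orig_complete,
			.data = req->orig_data,
			.flags = req->orig_flags,
		};

		tmp.complete(&tmp, err);
		return;
	}

	/* Final completion: restore the caller's callback, data and flags,
	 * then invoke the callback. */
	req->complete = req->orig_complete;
	req->data = req->orig_data;
	req->flags = req->orig_flags;
	req->complete(req, err);
}

/* Example caller callback that records what it was told. */
int seen_err;
void *seen_data;
int seen_calls;

void user_complete(struct example_req *req, int err)
{
	seen_err = err;
	seen_data = req->data;
	seen_calls++;
}
```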
      
      Fixes: ab6bf4e5 ("crypto: hash - Fix the pointer voodoo in...")
      Cc: <stable@vger.kernel.org>
      Reported-by: Sabrina Dubroca <sd@queasysnail.net>
      Tested-by: Sabrina Dubroca <sd@queasysnail.net>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  8. 08 Apr 2017: 3 commits
  9. 07 Apr 2017: 3 commits
  10. 05 Apr 2017: 2 commits
  11. 04 Apr 2017: 4 commits
  12. 03 Apr 2017: 3 commits
  13. 02 Apr 2017: 1 commit
  14. 01 Apr 2017: 4 commits
  15. 31 Mar 2017: 3 commits
    • target: Fix ALUA transition state race between multiple initiators · d19c4643
      Committed by Mike Christie
      Multiple threads could be writing to alua_access_state at
      the same time, or there could be multiple STPGs in flight
      (different initiators sending them or one initiator sending
      them to different ports), or a combination of both, and the
      core_alua_do_transition_tg_pt calls will race with each other.

      Because the previous patches mean we no longer delay running
      core_alua_do_transition_tg_pt_work, there does not seem to be
      any point in running it in a workqueue. We also always wait
      for it to complete one way or another, so we can sleep in
      this code path. So this patch, made against target-pending, adds a
      mutex and does the work core_alua_do_transition_tg_pt_work was doing
      directly in core_alua_do_transition_tg_pt.

      There is also no need to use an atomic for
      tg_pt_gp_alua_access_state. In core_alua_do_transition_tg_pt we
      test and set it under the transition mutex, and because it is a
      32-bit int, the other places that read it will never see it
      partially updated.
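A userspace sketch of the locking scheme, using a pthread mutex in place of the kernel mutex and a hypothetical, reduced port-group struct. The state is tested and set under one lock, so concurrent STPGs or configfs writes are serialized; readers elsewhere are assumed to load the plain int directly, relying, as the message argues, on 32-bit loads not tearing.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical, reduced port group (the real one is
 * struct t10_alua_tg_pt_gp and uses a kernel mutex). */
struct example_tg_pt_gp {
	pthread_mutex_t transition_mutex;
	int alua_access_state;  /* plain int, no atomic_t */
};

/* Serialized transition: test and set the state under one mutex, doing
 * the work inline rather than in a workqueue. Returns true if the state
 * actually changed. */
bool do_transition(struct example_tg_pt_gp *gp, int new_state)
{
	bool changed = false;

	pthread_mutex_lock(&gp->transition_mutex);
	if (gp->alua_access_state != new_state) {
		/* ... transition work would be done here ... */
		gp->alua_access_state = new_state;
		changed = true;
	}
	pthread_mutex_unlock(&gp->transition_mutex);
	return changed;
}
```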
      Signed-off-by: Mike Christie <mchristi@redhat.com>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • target: Fix unknown fabric callback queue-full errors · fa7e25cf
      Committed by Nicholas Bellinger
      This patch fixes a set of queue-full response handling
      bugs, where outgoing responses are leaked when a fabric
      driver is propagating non -EAGAIN or -ENOMEM errors
      to target-core.
      
      It introduces a TRANSPORT_COMPLETE_QF_ERR state, used to
      signal that CHECK_CONDITION status should be generated
      when fabric driver ->write_pending(), ->queue_data_in(),
      or ->queue_status() callbacks fail with errors other than
      -EAGAIN or -ENOMEM, and the data transfer should not be retried.

      Note that all fabric driver -EAGAIN and -ENOMEM errors are
      still retried indefinitely with the associated data-transfer
      callbacks, following the existing queue-full logic.

      Also fix two missing ->queue_status() queue-full cases
      related to CMD_T_ABORTED with TAS status handling.
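The retry-or-fail decision described above reduces to a classification of the fabric callback's return value. The enum and function names are hypothetical; per the message, the failing case corresponds to the new TRANSPORT_COMPLETE_QF_ERR state in target-core.

```c
#include <errno.h>

/* Hypothetical outcome codes for this sketch. */
enum example_qf_action {
	EXAMPLE_QF_OK,          /* callback succeeded */
	EXAMPLE_QF_RETRY,       /* queue-full: retry the same callback later */
	EXAMPLE_QF_CHECK_COND   /* stop retrying, send CHECK_CONDITION */
};

/* Classify the return value of ->write_pending(), ->queue_data_in() or
 * ->queue_status(). */
enum example_qf_action classify_fabric_ret(int ret)
{
	if (ret == 0)
		return EXAMPLE_QF_OK;
	if (ret == -EAGAIN || ret == -ENOMEM)
		return EXAMPLE_QF_RETRY;
	return EXAMPLE_QF_CHECK_COND;
}
```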
      Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
      Reviewed-by: Potnuri Bharat Teja <bharat@chelsio.com>
      Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
      Cc: Potnuri Bharat Teja <bharat@chelsio.com>
      Reported-by: Steve Wise <swise@opengridcomputing.com>
      Cc: Steve Wise <swise@opengridcomputing.com>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
    • sctp: alloc stream info when initializing asoc · 3dbcc105
      Committed by Xin Long
      When sending a msg before the asoc is established, sctp sends an INIT
      packet first and then enqueues chunks.

      Before INIT_ACK is received, stream info is not yet allocated, but
      enqueuing chunks needs to access stream info, such as the out-stream
      state and out-stream count.

      This patch fixes it by allocating the out-stream info when initializing
      an asoc, and by allocating the in-streams and re-allocating the
      out-streams when processing INIT.
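A sketch of the allocation order, with hypothetical, simplified types rather than the kernel's SCTP structures: out-streams are allocated when the association is initialized, so chunks queued before INIT_ACK have state to consult, and INIT processing then allocates the in-streams and resizes the out-streams to the negotiated count.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stream bookkeeping. */
struct example_stream_out { int state; };
struct example_stream_in  { unsigned short ssn; };

struct example_asoc {
	struct example_stream_out *out;
	unsigned short outcnt;
	struct example_stream_in *in;
	unsigned short incnt;
};

/* Association init: allocate out-streams up front. */
int asoc_init_streams(struct example_asoc *a, unsigned short outcnt)
{
	a->out = calloc(outcnt, sizeof(*a->out));
	if (!a->out)
		return -1;
	a->outcnt = outcnt;
	a->in = NULL;
	a->incnt = 0;
	return 0;
}

/* INIT processing: resize out-streams, then allocate in-streams. */
int asoc_process_init(struct example_asoc *a, unsigned short outcnt,
                      unsigned short incnt)
{
	struct example_stream_out *out;

	if (outcnt != a->outcnt) {
		out = realloc(a->out, outcnt * sizeof(*out));
		if (!out)
			return -1;
		if (outcnt > a->outcnt)   /* zero any newly added streams */
			memset(out + a->outcnt, 0,
			       (outcnt - a->outcnt) * sizeof(*out));
		a->out = out;
		a->outcnt = outcnt;
	}
	a->in = calloc(incnt, sizeof(*a->in));
	if (!a->in)
		return -1;
	a->incnt = incnt;
	return 0;
}
```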
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 30 Mar 2017: 2 commits
    • drm/ttm, drm/vmwgfx: Relax permission checking when opening surfaces · fe25deb7
      Committed by Thomas Hellstrom
      Previously, when a surface was opened using a legacy (non-prime) handle,
      it was verified to have been created by a client in the same master realm.
      Relax this so that opening is also allowed recursively if the client
      already has the surface open.
      
      This works around a regression in svga mesa where opening of a shared
      surface is used recursively to obtain surface information.
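The relaxed rule amounts to one extra condition in the permission check. This is a sketch with hypothetical field and function names, not the ttm/vmwgfx code.

```c
#include <stdbool.h>

/* Hypothetical inputs for opening a surface via a legacy handle. */
struct example_open_req {
	bool same_master;     /* creator and opener share a master realm */
	bool already_opened;  /* this client already has the surface open */
};

/* Before the change only same_master was accepted; afterwards a client
 * that already has the surface open may open it again recursively. */
bool may_open_surface(const struct example_open_req *r)
{
	return r->same_master || r->already_opened;
}
```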
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Thomas Hellstrom <thellstrom@vmware.com>
      Reviewed-by: Sinclair Yeh <syeh@vmware.com>
    • target: Avoid mappedlun symlink creation during lun shutdown · 49cb77e2
      Committed by Nicholas Bellinger
      This patch closes a race between se_lun deletion during configfs
      unlink in target_fabric_port_unlink() -> core_dev_del_lun()
      -> core_tpg_remove_lun(), when transport_clear_lun_ref() blocks
      waiting for percpu_ref RCU grace period to finish, but a new
      NodeACL mappedlun is added before the RCU grace period has
      completed.
      
      This can happen in target_fabric_mappedlun_link() because it
      only checks for se_lun->lun_se_dev, which is not cleared until
      after transport_clear_lun_ref() percpu_ref RCU grace period
      finishes.
      
      This bug originally manifested as NULL pointer dereference
      OOPsen in target_stat_scsi_att_intr_port_show_attr_dev() on
      v4.1.y code, because it dereferences lun->lun_se_dev without
      an explicit NULL pointer check.
      
      In post v4.1 code with target-core RCU conversion, the code
      in target_stat_scsi_att_intr_port_show_attr_dev() no longer
      uses se_lun->lun_se_dev, but the same race still exists.
      
      To address the bug, set se_lun->lun_shutdown as
      early as possible in core_tpg_remove_lun(), and ensure new
      NodeACL mappedlun creation in target_fabric_mappedlun_link()
      fails during se_lun shutdown.
      Reported-by: James Shen <jcs@datera.io>
      Cc: James Shen <jcs@datera.io>
      Tested-by: James Shen <jcs@datera.io>
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
  17. 29 Mar 2017: 1 commit
    • sctp: change to save MSG_MORE flag into assoc · f9ba3501
      Committed by Xin Long
      David Laight noticed that MSG_MORE support via datamsg->force_delay
      didn't really work as expected, as the first msg with MSG_MORE set
      would always block the dequeuing of the following chunks.

      This patch reworks it by saving the MSG_MORE flag in the assoc, as
      David Laight suggested.

      asoc->force_delay is used to save the MSG_MORE flag before a msg is
      sent. None of the queued chunks are sent out while asoc->force_delay
      is set by a msg with MSG_MORE; a later msg without MSG_MORE clears
      asoc->force_delay and lets them go.

      Note that this change does not affect flushes generated by other
      triggers, like asoc->state != ESTABLISHED, queue size > pmtu, etc.

      v1->v2:
        Do not clear asoc->force_delay after sending the msg with MSG_MORE.
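The new behaviour can be modelled with a hypothetical association struct: each sendmsg() overwrites force_delay with its own MSG_MORE setting, and queued chunks are flushed only once a message without MSG_MORE arrives. Other flush triggers (state changes, queue size exceeding the PMTU) are deliberately left out of this sketch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical association state: only what the sketch needs. */
struct example_assoc {
	bool force_delay;   /* last sendmsg() carried MSG_MORE */
	size_t queued;      /* chunks waiting in the out queue */
	size_t sent;        /* chunks flushed so far */
};

/* Record MSG_MORE on the association, queue the chunks, then flush
 * everything queued unless the caller asked us to hold off. */
void example_sendmsg(struct example_assoc *a, size_t nchunks, bool msg_more)
{
	a->force_delay = msg_more;
	a->queued += nchunks;
	if (!a->force_delay) {
		a->sent += a->queued;
		a->queued = 0;
	}
}
```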
      
      Fixes: 4ea0c32f ("sctp: add support for MSG_MORE")
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Acked-by: David Laight <david.laight@aculab.com>
      Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>