1. 25 1月, 2017 1 次提交
    • K
      SUNRPC: cleanup ida information when removing sunrpc module · c929ea0b
      Kinglong Mee 提交于
      After removing sunrpc module, I get many kmemleak information as,
      unreferenced object 0xffff88003316b1e0 (size 544):
        comm "gssproxy", pid 2148, jiffies 4294794465 (age 4200.081s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<ffffffffb0cfb58a>] kmemleak_alloc+0x4a/0xa0
          [<ffffffffb03507fe>] kmem_cache_alloc+0x15e/0x1f0
          [<ffffffffb0639baa>] ida_pre_get+0xaa/0x150
          [<ffffffffb0639cfd>] ida_simple_get+0xad/0x180
          [<ffffffffc06054fb>] nlmsvc_lookup_host+0x4ab/0x7f0 [lockd]
          [<ffffffffc0605e1d>] lockd+0x4d/0x270 [lockd]
          [<ffffffffc06061e5>] param_set_timeout+0x55/0x100 [lockd]
          [<ffffffffc06cba24>] svc_defer+0x114/0x3f0 [sunrpc]
          [<ffffffffc06cbbe7>] svc_defer+0x2d7/0x3f0 [sunrpc]
          [<ffffffffc06c71da>] rpc_show_info+0x8a/0x110 [sunrpc]
          [<ffffffffb044a33f>] proc_reg_write+0x7f/0xc0
          [<ffffffffb038e41f>] __vfs_write+0xdf/0x3c0
          [<ffffffffb0390f1f>] vfs_write+0xef/0x240
          [<ffffffffb0392fbd>] SyS_write+0xad/0x130
          [<ffffffffb0d06c37>] entry_SYSCALL_64_fastpath+0x1a/0xa9
          [<ffffffffffffffff>] 0xffffffffffffffff
      
      I found, the ida information (dynamic memory) isn't cleanup.
      Signed-off-by: NKinglong Mee <kinglongmee@gmail.com>
      Fixes: 2f048db4 ("SUNRPC: Add an identifier for struct rpc_clnt")
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      c929ea0b
  2. 13 1月, 2017 3 次提交
  3. 26 12月, 2016 1 次提交
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner 提交于
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  4. 25 12月, 2016 1 次提交
  5. 10 12月, 2016 1 次提交
    • N
      SUNRPC: fix refcounting problems with auth_gss messages. · 1cded9d2
      NeilBrown 提交于
      There are two problems with refcounting of auth_gss messages.
      
      First, the reference on the pipe->pipe list (taken by a call
      to rpc_queue_upcall()) is not counted.  It seems to be
      assumed that a message in pipe->pipe will always also be in
      pipe->in_downcall, where it is correctly reference counted.
      
      However there is no guaranty of this.  I have a report of a
      NULL dereferences in rpc_pipe_read() which suggests a msg
      that has been freed is still on the pipe->pipe list.
      
      One way I imagine this might happen is:
      - message is queued for uid=U and auth->service=S1
      - rpc.gssd reads this message and starts processing.
        This removes the message from pipe->pipe
      - message is queued for uid=U and auth->service=S2
      - rpc.gssd replies to the first message. gss_pipe_downcall()
        calls __gss_find_upcall(pipe, U, NULL) and it finds the
        *second* message, as new messages are placed at the head
        of ->in_downcall, and the service type is not checked.
      - This second message is removed from ->in_downcall and freed
        by gss_release_msg() (even though it is still on pipe->pipe)
      - rpc.gssd tries to read another message, and dereferences a pointer
        to this message that has just been freed.
      
      I fix this by incrementing the reference count before calling
      rpc_queue_upcall(), and decrementing it if that fails, or normally in
      gss_pipe_destroy_msg().
      
      It seems strange that the reply doesn't target the message more
      precisely, but I don't know all the details.  In any case, I think the
      reference counting irregularity became a measureable bug when the
      extra arg was added to __gss_find_upcall(), hence the Fixes: line
      below.
      
      The second problem is that if rpc_queue_upcall() fails, the new
      message is not freed. gss_alloc_msg() set the ->count to 1,
      gss_add_msg() increments this to 2, gss_unhash_msg() decrements to 1,
      then the pointer is discarded so the memory never gets freed.
      
      Fixes: 9130b8db ("SUNRPC: allow for upcalls for same uid but different gss service")
      Cc: stable@vger.kernel.org
      Link: https://bugzilla.opensuse.org/show_bug.cgi?id=1011250Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      1cded9d2
  6. 07 12月, 2016 1 次提交
  7. 02 12月, 2016 1 次提交
    • N
      sunrpc: Don't engage exponential backoff when connection attempt is rejected. · 2c2ee6d2
      NeilBrown 提交于
      xs_connect() contains an exponential backoff mechanism so the repeated
      connection attempts are delayed by longer and longer amounts.
      
      This is appropriate when the connection failed due to a timeout, but
      it not appropriate when a definitive "no" answer is received.  In such
      cases, call_connect_status() imposes a minimum 3-second back-off, so
      not having the exponetial back-off will never result in immediate
      retries.
      
      The current situation is a problem when the NFS server tries to
      register with rpcbind but rpcbind isn't running.  All connection
      attempts are made on the same "xprt" and as the connection is never
      "closed", the exponential back delays successive attempts to register,
      or de-register, different protocols.  This results in a multi-minute
      delay with no benefit.
      
      So, when call_connect_status() receives a definitive "no", use
      xprt_conditional_disconnect() to cancel the previous connection attempt.
      This will set XPRT_CLOSE_WAIT so that xprt->ops->close() calls xs_close()
      which resets the reestablish_timeout.
      
      To ensure xprt_conditional_disconnect() does the right thing, we
      ensure that rq_connect_cookie is set before a connection attempt, and
      allow xprt_conditional_disconnect() to complete even when the
      transport is not fully connected.
      Signed-off-by: NNeilBrown <neilb@suse.com>
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      2c2ee6d2
  8. 01 12月, 2016 10 次提交
    • C
      svcrdma: Further clean-up of svc_rdma_get_inv_rkey() · fafedf81
      Chuck Lever 提交于
      No longer any need for the dprintk().
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      fafedf81
    • C
      svcrdma: Break up dprintk format in svc_rdma_accept() · 07257450
      Chuck Lever 提交于
      The current code results in:
      
      Nov  7 14:50:19 klimt kernel: svcrdma: newxprt->sc_cm_id=ffff88085590c800,
       newxprt->sc_pd=ffff880852a7ce00#012    cm_id->device=ffff88084dd20000,
       sc_pd->device=ffff88084dd20000#012    cap.max_send_wr = 272#012
       cap.max_recv_wr = 34#012    cap.max_send_sge = 32#012
       cap.max_recv_sge = 32
      Nov  7 14:50:19 klimt kernel: svcrdma: new connection ffff880855908000
       accepted with the following attributes:#012    local_ip        :
       10.0.0.5#012    local_port#011     : 20049#012    remote_ip       :
       10.0.0.2#012    remote_port     : 59909#012    max_sge         : 32#012
       max_sge_rd      : 30#012    sq_depth        : 272#012    max_requests    :
       32#012    ord             : 16
      
      Split up the output over multiple dprintks and take the opportunity
      to fix the display of IPv6 addresses.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      07257450
    • C
      svcrdma: Remove unused variable in rdma_copy_tail() · f5426d37
      Chuck Lever 提交于
      Clean up.
      
      linux-2.6/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c: In function
       ‘rdma_copy_tail’:
      linux-2.6/net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:376:6: warning:
       variable ‘ret’ set but not used [-Wunused-but-set-variable]
        int ret;
            ^
      
      Fixes: a97c331f ("svcrdma: Handle additional inline content")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      f5426d37
    • C
      svcrdma: Remove unused variables in xprt_rdma_bc_allocate() · 9a16a34d
      Chuck Lever 提交于
      Clean up.
      
      /linux-2.6/net/sunrpc/xprtrdma/svc_rdma_backchannel.c: In function
       ‘xprt_rdma_bc_allocate’:
      linux-2.6/net/sunrpc/xprtrdma/svc_rdma_backchannel.c:169:23: warning:
       variable ‘rdma’ set but not used [-Wunused-but-set-variable]
        struct svcxprt_rdma *rdma;
                             ^
      
      Fixes: 5d252f90 ("svcrdma: Add class for RDMA backwards ...")
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      9a16a34d
    • C
      svcrdma: Remove svc_rdma_op_ctxt::wc_status · 96a58f9c
      Chuck Lever 提交于
      Clean up: Completion status is already reported in the individual
      completion handlers. Save a few bytes in struct svc_rdma_op_ctxt.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      96a58f9c
    • C
      svcrdma: Remove DMA map accounting · dd6fd213
      Chuck Lever 提交于
      Clean up: sc_dma_used is not required for correct operation. It is
      simply a debugging tool to report when svcrdma has leaked DMA maps.
      
      However, manipulating an atomic has a measurable CPU cost, and DMA
      map accounting specific to svcrdma will be meaningless once svcrdma
      is converted to use the new generic r/w API.
      
      A similar kind of debug accounting can be done simply by enabling
      the IOMMU or by using CONFIG_DMA_API_DEBUG, CONFIG_IOMMU_DEBUG, and
      CONFIG_IOMMU_LEAK.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      dd6fd213
    • C
      svcrdma: Remove BH-disabled spin locking in svc_rdma_send() · e4eb42ce
      Chuck Lever 提交于
      svcrdma's current SQ accounting algorithm takes sc_lock and disables
      bottom-halves while posting all RDMA Read, Write, and Send WRs.
      
      This is relatively heavyweight serialization. And note that Write and
      Send are already fully serialized by the xpt_mutex.
      
      Using a single atomic_t should be all that is necessary to guarantee
      that ib_post_send() is called only when there is enough space on the
      send queue. This is what the other RDMA-enabled storage targets do.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      e4eb42ce
    • C
      svcrdma: Renovate sendto chunk list parsing · 5fdca653
      Chuck Lever 提交于
      The current sendto code appears to support clients that provide only
      one of a Read list, a Write list, or a Reply chunk. My reading of
      that code is that it doesn't support the following cases:
      
       - Read list + Write list
       - Read list + Reply chunk
       - Write list + Reply chunk
       - Read list + Write list + Reply chunk
      
      The protocol allows more than one Read or Write chunk in those
      lists. Some clients do send a Read list and Reply chunk
      simultaneously. NFSv4 WRITE uses a Read list for the data payload,
      and a Reply chunk because the GETATTR result in the reply can
      contain a large object like an ACL.
      
      Generalize one of the sendto code paths needed to support all of
      the above cases, and attempt to ensure that only one pass is done
      through the RPC Call's transport header to gather chunk list
      information for building the reply.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      5fdca653
    • C
      svcauth_gss: Close connection when dropping an incoming message · 4d712ef1
      Chuck Lever 提交于
      S5.3.3.1 of RFC 2203 requires that an incoming GSS-wrapped message
      whose sequence number lies outside the current window is dropped.
      The rationale is:
      
        The reason for discarding requests silently is that the server
        is unable to determine if the duplicate or out of range request
        was due to a sequencing problem in the client, network, or the
        operating system, or due to some quirk in routing, or a replay
        attack by an intruder.  Discarding the request allows the client
        to recover after timing out, if indeed the duplication was
        unintentional or well intended.
      
      However, clients may rely on the server dropping the connection to
      indicate that a retransmit is needed. Without a connection reset, a
      client can wait forever without retransmitting, and the workload
      just stops dead. I've reproduced this behavior by running xfstests
      generic/323 on an NFSv4.0 mount with proto=rdma and sec=krb5i.
      
      To address this issue, have the server close the connection when it
      silently discards an incoming message due to a GSS sequence number
      problem.
      
      There are a few other places where the server will never reply.
      Change those spots in a similar fashion.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      4d712ef1
    • C
      svcrdma: Clear xpt_bc_xps in xprt_setup_rdma_bc() error exit arm · 1b9f700b
      Chuck Lever 提交于
      Logic copied from xs_setup_bc_tcp().
      
      Fixes: 39a9beab ('rpc: share one xps between all backchannels')
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      1b9f700b
  9. 30 11月, 2016 12 次提交
  10. 18 11月, 2016 1 次提交
    • A
      netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan 提交于
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into an zero based array and
      thus is unsigned entity. Using negative value is out-of-bound
      access by definition.
      
      2)
      On x86_64 unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added or subtracted to pointers
      are preffered to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is 3 byte instruction which isn't necessary if the variable is
      unsigned because x86_64 is zero extending by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a semmingly random artefact of code generation with register
      allocator being used differently. gcc decides that some variable
      needs to live in new r8+ registers and every access now requires REX
      prefix. Or it is shifted into r12, so [r12+0] addressing mode has to be
      used which is longer than [r8]
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c7d03a00
  11. 14 11月, 2016 1 次提交
    • S
      sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports · ea08e392
      Scott Mayhew 提交于
      This fixes the following panic that can occur with NFSoRDMA.
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
      scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
      scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
      mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
      ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
      irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
      lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
      ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
      auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
      crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
      syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
      tg3 crct10dif_pclmul drm crct10dif_common
      ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
      dm_log dm_mod
      CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
      Workqueue: events check_lifetime
      task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
      RIP: 0010:[<ffffffff8168d847>]  [<ffffffff8168d847>]
      _raw_spin_lock_bh+0x17/0x50
      RSP: 0018:ffff88031f587ba8  EFLAGS: 00010206
      RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
      RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
      R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
      R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff880322a40000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
      20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
      ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
      0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
      Call Trace:
      [<ffffffff81557830>] lock_sock_nested+0x20/0x50
      [<ffffffff8155ae08>] sock_setsockopt+0x78/0x940
      [<ffffffff81096acb>] ? lock_timer_base.isra.33+0x2b/0x50
      [<ffffffff8155397d>] kernel_setsockopt+0x4d/0x50
      [<ffffffffa0386284>] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
      [<ffffffffa03b681d>] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
      [<ffffffff81691ebc>] notifier_call_chain+0x4c/0x70
      [<ffffffff810b687d>] __blocking_notifier_call_chain+0x4d/0x70
      [<ffffffff810b68b6>] blocking_notifier_call_chain+0x16/0x20
      [<ffffffff815e8538>] __inet_del_ifa+0x168/0x2d0
      [<ffffffff815e8cef>] check_lifetime+0x25f/0x270
      [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
      [<ffffffff810a8d76>] worker_thread+0x126/0x410
      [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
      [<ffffffff810b052f>] kthread+0xcf/0xe0
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      [<ffffffff81696418>] ret_from_fork+0x58/0x90
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
      44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 <f0> 0f
      c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
      RIP  [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50
      RSP <ffff88031f587ba8>
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ea08e392
  12. 11 11月, 2016 1 次提交
  13. 08 11月, 2016 2 次提交
    • A
      SUNRPC: Fix suspicious RCU usage · bb29dd84
      Anna Schumaker 提交于
      We need to hold the rcu_read_lock() when calling rcu_dereference(),
      otherwise we can't guarantee that the object being dereferenced still
      exists.
      
      Fixes: 39e5d2df ("SUNRPC search xprt switch for sockaddr")
      Signed-off-by: NAnna Schumaker <Anna.Schumaker@Netapp.com>
      bb29dd84
    • P
      udp: do fwd memory scheduling on dequeue · 7c13f97f
      Paolo Abeni 提交于
      A new argument is added to __skb_recv_datagram to provide
      an explicit skb destructor, invoked under the receive queue
      lock.
      The UDP protocol uses such argument to perform memory
      reclaiming on dequeue, so that the UDP protocol does not
      set anymore skb->desctructor.
      Instead explicit memory reclaiming is performed at close() time and
      when skbs are removed from the receive queue.
      The in kernel UDP protocol users now need to call a
      skb_recv_udp() variant instead of skb_recv_datagram() to
      properly perform memory accounting on dequeue.
      
      Overall, this allows acquiring only once the receive queue
      lock on dequeue.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr sinks	vanilla		patched
      1		440		560
      3		2150		2300
      6		3650		3800
      9		4450		4600
      12		6250		6450
      
      v1 -> v2:
       - do rmem and allocated memory scheduling under the receive lock
       - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
       - avoid the typdef for the dequeue callback
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c13f97f
  14. 02 11月, 2016 2 次提交
  15. 29 10月, 2016 1 次提交
    • J
      sunrpc: fix some missing rq_rbuffer assignments · 18e601d6
      Jeff Layton 提交于
      We've been seeing some crashes in testing that look like this:
      
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
      PGD 212ca2067 PUD 212ca3067 PMD 0
      Oops: 0002 [#1] SMP
      Modules linked in: rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache ppdev parport_pc i2c_piix4 sg parport i2c_core virtio_balloon pcspkr acpi_cpufreq nfsd auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod ata_generic pata_acpi virtio_scsi 8139too ata_piix libata 8139cp mii virtio_pci floppy virtio_ring serio_raw virtio
      CPU: 1 PID: 1540 Comm: nfsd Not tainted 4.9.0-rc1 #39
      Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2007
      task: ffff88020d7ed200 task.stack: ffff880211838000
      RIP: 0010:[<ffffffff8135ce99>]  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
      RSP: 0018:ffff88021183bdd0  EFLAGS: 00010206
      RAX: 0000000000000000 RBX: ffff88020d7fa000 RCX: 000000f400000000
      RDX: 0000000000000014 RSI: ffff880212927020 RDI: 0000000000000000
      RBP: ffff88021183be30 R08: 01000000ef896996 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000000 R12: ffff880211704ca8
      R13: ffff88021473f000 R14: 00000000ef896996 R15: ffff880211704800
      FS:  0000000000000000(0000) GS:ffff88021fc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000212ca1000 CR4: 00000000000006e0
      Stack:
       ffffffffa01ea087 ffffffff63400001 ffff880215145e00 ffff880211bacd00
       ffff88021473f2b8 0000000000000004 00000000d0679d67 ffff880211bacd00
       ffff88020d7fa000 ffff88021473f000 0000000000000000 ffff88020d7faa30
      Call Trace:
       [<ffffffffa01ea087>] ? svc_tcp_recvfrom+0x5a7/0x790 [sunrpc]
       [<ffffffffa01f84d8>] svc_recv+0xad8/0xbd0 [sunrpc]
       [<ffffffffa0262d5e>] nfsd+0xde/0x160 [nfsd]
       [<ffffffffa0262c80>] ? nfsd_destroy+0x60/0x60 [nfsd]
       [<ffffffff810a9418>] kthread+0xd8/0xf0
       [<ffffffff816dbdbf>] ret_from_fork+0x1f/0x40
       [<ffffffff810a9340>] ? kthread_park+0x60/0x60
      Code: 00 00 48 89 f8 48 83 fa 20 72 7e 40 38 fe 7c 35 48 83 ea 20 48 83 ea 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 8b 5e 18 48 8d 76 20 <4c> 89 07 4c 89 4f 08 4c 89 57 10 4c 89 5f 18 48 8d 7f 20 73 d4
      RIP  [<ffffffff8135ce99>] memcpy_orig+0x29/0x110
       RSP <ffff88021183bdd0>
      CR2: 0000000000000000
      
      Both Bruce and Eryu ran a bisect here and found that the problematic
      patch was 68778945 (SUNRPC: Separate buffer pointers for RPC Call and
      Reply messages).
      
      That patch changed rpc_xdr_encode to use a new rq_rbuffer pointer to
      set up the receive buffer, but didn't change all of the necessary
      codepaths to set it properly. In particular the backchannel setup was
      missing.
      
      We need to set rq_rbuffer whenever rq_buffer is set. Ensure that it is.
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Tested-by: NChuck Lever <chuck.lever@oracle.com>
      Reported-by: NEryu Guan <guaneryu@gmail.com>
      Tested-by: NEryu Guan <guaneryu@gmail.com>
      Fixes: 68778945 "SUNRPC: Separate buffer pointers..."
      Reported-by: NJ. Bruce Fields <bfields@fieldses.org>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      18e601d6
  16. 27 10月, 2016 1 次提交
    • J
      sunrpc: don't pass on-stack memory to sg_set_buf · 2876a344
      J. Bruce Fields 提交于
      As of ac4e97ab "scatterlist: sg_set_buf() argument must be in linear
      mapping", sg_set_buf hits a BUG when make_checksum_v2->xdr_process_buf,
      among other callers, passes it memory on the stack.
      
      We only need a scatterlist to pass this to the crypto code, and it seems
      like overkill to require kmalloc'd memory just to encrypt a few bytes,
      but for now this seems the best fix.
      
      Many of these callers are in the NFS write paths, so we allocate with
      GFP_NOFS.  It might be possible to do without allocations here entirely,
      but that would probably be a bigger project.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2876a344