1. 17 1月, 2019 1 次提交
    • V
      sunrpc: use-after-free in svc_process_common() · 44e7bab3
      Vasily Averin 提交于
      commit d4b09acf924b84bae77cad090a9d108e70b43643 upstream.
      
      if node have NFSv41+ mounts inside several net namespaces
      it can lead to use-after-free in svc_process_common()
      
      svc_process_common()
              /* Setup reply header */
              rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr(rqstp); <<< HERE
      
      svc_process_common() can use incorrect rqstp->rq_xprt,
      its caller function bc_svc_process() takes it from serv->sv_bc_xprt.
      The problem is that serv is global structure but sv_bc_xprt
      is assigned per-netnamespace.
      
      According to Trond, the whole "let's set up rqstp->rq_xprt
      for the back channel" is nothing but a giant hack in order
      to work around the fact that svc_process_common() uses it
      to find the xpt_ops, and perform a couple of (meaningless
      for the back channel) tests of xpt_flags.
      
      All we really need in svc_process_common() is to be able to run
      rqstp->rq_xprt->xpt_ops->xpo_prep_reply_hdr()
      
      Bruce J Fields points that this xpo_prep_reply_hdr() call
      is an awfully roundabout way just to do "svc_putnl(resv, 0);"
      in the tcp case.
      
      This patch does not initialiuze rqstp->rq_xprt in bc_svc_process(),
      now it calls svc_process_common() with rqstp->rq_xprt = NULL.
      
      To adjust reply header svc_process_common() just check
      rqstp->rq_prot and calls svc_tcp_prep_reply_hdr() for tcp case.
      
      To handle rqstp->rq_xprt = NULL case in functions called from
      svc_process_common() patch intruduces net namespace pointer
      svc_rqst->rq_bc_net and adjust SVC_NET() definition.
      Some other function was also adopted to properly handle described case.
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Cc: stable@vger.kernel.org
      Fixes: 23c20ecd ("NFS: callback up - users counting cleanup")
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      v2: added lost extern svc_tcp_prep_reply_hdr()
      Signed-off-by: NVasily Averin <vvs@virtuozzo.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      44e7bab3
  2. 10 1月, 2019 1 次提交
    • D
      sock: Make sock->sk_stamp thread-safe · 60f05ddd
      Deepa Dinamani 提交于
      [ Upstream commit 3a0ed3e9619738067214871e9cb826fa23b2ddb9 ]
      
      Al Viro mentioned (Message-ID
      <20170626041334.GZ10672@ZenIV.linux.org.uk>)
      that there is probably a race condition
      lurking in accesses of sk_stamp on 32-bit machines.
      
      sock->sk_stamp is of type ktime_t which is always an s64.
      On a 32 bit architecture, we might run into situations of
      unsafe access as the access to the field becomes non atomic.
      
      Use seqlocks for synchronization.
      This allows us to avoid using spinlocks for readers as
      readers do not need mutual exclusion.
      
      Another approach to solve this is to require sk_lock for all
      modifications of the timestamps. The current approach allows
      for timestamps to have their own lock: sk_stamp_lock.
      This allows for the patch to not compete with already
      existing critical sections, and side effects are limited
      to the paths in the patch.
      
      The addition of the new field maintains the data locality
      optimizations from
      commit 9115e8cd ("net: reorganize struct sock for better data
      locality")
      
      Note that all the instances of the sk_stamp accesses
      are either through the ioctl or the syscall recvmsg.
      Signed-off-by: NDeepa Dinamani <deepa.kernel@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      60f05ddd
  3. 04 4月, 2018 2 次提交
  4. 13 2月, 2018 1 次提交
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko 提交于
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9b2c45d4
  5. 06 2月, 2018 1 次提交
  6. 03 12月, 2017 1 次提交
  7. 25 8月, 2017 2 次提交
    • C
      sunrpc: Const-ify instances of struct svc_xprt_ops · 2412e927
      Chuck Lever 提交于
      Close an attack vector by moving the arrays of server-side transport
      methods to read-only memory.
      Signed-off-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      2412e927
    • V
      net: sunrpc: svcsock: fix NULL-pointer exception · eebe53e8
      Vadim Lomovtsev 提交于
      While running nfs/connectathon tests kernel NULL-pointer exception
      has been observed due to races in svcsock.c.
      
      Race is appear when kernel accepts connection by kernel_accept
      (which creates new socket) and start queuing ingress packets
      to new socket. This happens in ksoftirq context which could run
      concurrently on a different core while new socket setup is not done yet.
      
      The fix is to re-order socket user data init sequence and add
      write/read barrier calls to be sure that we got proper values
      for callback pointers before actually calling them.
      
      Test results: nfs/connectathon reports '0' failed tests for about 200+ iterations.
      
      Crash log:
      ---<-snip->---
      [ 6708.638984] Unable to handle kernel NULL pointer dereference at virtual address 00000000
      [ 6708.647093] pgd = ffff0000094e0000
      [ 6708.650497] [00000000] *pgd=0000010ffff90003, *pud=0000010ffff90003, *pmd=0000010ffff80003, *pte=0000000000000000
      [ 6708.660761] Internal error: Oops: 86000005 [#1] SMP
      [ 6708.665630] Modules linked in: nfsv3 nfnetlink_queue nfnetlink_log nfnetlink rpcsec_gss_krb5 nfsv4 dns_resolver nfs fscache overlay xt_CONNSECMARK xt_SECMARK xt_conntrack iptable_security ip_tables ah4 xfrm4_mode_transport sctp tun binfmt_misc ext4 jbd2 mbcache loop tcp_diag udp_diag inet_diag rpcrdma ib_isert iscsi_target_mod ib_iser rdma_cm iw_cm libiscsi scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp ib_ipoib ib_ucm ib_uverbs ib_umad ib_cm ib_core nls_koi8_u nls_cp932 ts_kmp nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack vfat fat ghash_ce sha2_ce sha1_ce cavium_rng_vf i2c_thunderx sg thunderx_edac i2c_smbus edac_core cavium_rng nfsd auth_rpcgss nfs_acl lockd grace sunrpc xfs libcrc32c nicvf nicpf ast i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops
      [ 6708.736446]  ttm drm i2c_core thunder_bgx thunder_xcv mdio_thunder mdio_cavium dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_3c300909c5b3f46dcacd49aab3334af_87021]
      [ 6708.752275] CPU: 84 PID: 0 Comm: swapper/84 Tainted: G        W  OE   4.11.0-4.el7.aarch64 #1
      [ 6708.760787] Hardware name: www.cavium.com CRB-2S/CRB-2S, BIOS 0.3 Mar 13 2017
      [ 6708.767910] task: ffff810006842e80 task.stack: ffff81000689c000
      [ 6708.773822] PC is at 0x0
      [ 6708.776739] LR is at svc_data_ready+0x38/0x88 [sunrpc]
      [ 6708.781866] pc : [<0000000000000000>] lr : [<ffff0000029d7378>] pstate: 60000145
      [ 6708.789248] sp : ffff810ffbad3900
      [ 6708.792551] x29: ffff810ffbad3900 x28: ffff000008c73d58
      [ 6708.797853] x27: 0000000000000000 x26: ffff81000bbe1e00
      [ 6708.803156] x25: 0000000000000020 x24: ffff800f7410bf28
      [ 6708.808458] x23: ffff000008c63000 x22: ffff000008c63000
      [ 6708.813760] x21: ffff800f7410bf28 x20: ffff81000bbe1e00
      [ 6708.819063] x19: ffff810012412400 x18: 00000000d82a9df2
      [ 6708.824365] x17: 0000000000000000 x16: 0000000000000000
      [ 6708.829667] x15: 0000000000000000 x14: 0000000000000001
      [ 6708.834969] x13: 0000000000000000 x12: 722e736f622e676e
      [ 6708.840271] x11: 00000000f814dd99 x10: 0000000000000000
      [ 6708.845573] x9 : 7374687225000000 x8 : 0000000000000000
      [ 6708.850875] x7 : 0000000000000000 x6 : 0000000000000000
      [ 6708.856177] x5 : 0000000000000028 x4 : 0000000000000000
      [ 6708.861479] x3 : 0000000000000000 x2 : 00000000e5000000
      [ 6708.866781] x1 : 0000000000000000 x0 : ffff81000bbe1e00
      [ 6708.872084]
      [ 6708.873565] Process swapper/84 (pid: 0, stack limit = 0xffff81000689c000)
      [ 6708.880341] Stack: (0xffff810ffbad3900 to 0xffff8100068a0000)
      [ 6708.886075] Call trace:
      [ 6708.888513] Exception stack(0xffff810ffbad3710 to 0xffff810ffbad3840)
      [ 6708.894942] 3700:                                   ffff810012412400 0001000000000000
      [ 6708.902759] 3720: ffff810ffbad3900 0000000000000000 0000000060000145 ffff800f79300000
      [ 6708.910577] 3740: ffff000009274d00 00000000000003ea 0000000000000015 ffff000008c63000
      [ 6708.918395] 3760: ffff810ffbad3830 ffff800f79300000 000000000000004d 0000000000000000
      [ 6708.926212] 3780: ffff810ffbad3890 ffff0000080f88dc ffff800f79300000 000000000000004d
      [ 6708.934030] 37a0: ffff800f7930093c ffff000008c63000 0000000000000000 0000000000000140
      [ 6708.941848] 37c0: ffff000008c2c000 0000000000040b00 ffff81000bbe1e00 0000000000000000
      [ 6708.949665] 37e0: 00000000e5000000 0000000000000000 0000000000000000 0000000000000028
      [ 6708.957483] 3800: 0000000000000000 0000000000000000 0000000000000000 7374687225000000
      [ 6708.965300] 3820: 0000000000000000 00000000f814dd99 722e736f622e676e 0000000000000000
      [ 6708.973117] [<          (null)>]           (null)
      [ 6708.977824] [<ffff0000086f9fa4>] tcp_data_queue+0x754/0xc5c
      [ 6708.983386] [<ffff0000086fa64c>] tcp_rcv_established+0x1a0/0x67c
      [ 6708.989384] [<ffff000008704120>] tcp_v4_do_rcv+0x15c/0x22c
      [ 6708.994858] [<ffff000008707418>] tcp_v4_rcv+0xaf0/0xb58
      [ 6709.000077] [<ffff0000086df784>] ip_local_deliver_finish+0x10c/0x254
      [ 6709.006419] [<ffff0000086dfea4>] ip_local_deliver+0xf0/0xfc
      [ 6709.011980] [<ffff0000086dfad4>] ip_rcv_finish+0x208/0x3a4
      [ 6709.017454] [<ffff0000086e018c>] ip_rcv+0x2dc/0x3c8
      [ 6709.022328] [<ffff000008692fc8>] __netif_receive_skb_core+0x2f8/0xa0c
      [ 6709.028758] [<ffff000008696068>] __netif_receive_skb+0x38/0x84
      [ 6709.034580] [<ffff00000869611c>] netif_receive_skb_internal+0x68/0xdc
      [ 6709.041010] [<ffff000008696bc0>] napi_gro_receive+0xcc/0x1a8
      [ 6709.046690] [<ffff0000014b0fc4>] nicvf_cq_intr_handler+0x59c/0x730 [nicvf]
      [ 6709.053559] [<ffff0000014b1380>] nicvf_poll+0x38/0xb8 [nicvf]
      [ 6709.059295] [<ffff000008697a6c>] net_rx_action+0x2f8/0x464
      [ 6709.064771] [<ffff000008081824>] __do_softirq+0x11c/0x308
      [ 6709.070164] [<ffff0000080d14e4>] irq_exit+0x12c/0x174
      [ 6709.075206] [<ffff00000813101c>] __handle_domain_irq+0x78/0xc4
      [ 6709.081027] [<ffff000008081608>] gic_handle_irq+0x94/0x190
      [ 6709.086501] Exception stack(0xffff81000689fdf0 to 0xffff81000689ff20)
      [ 6709.092929] fde0:                                   0000810ff2ec0000 ffff000008c10000
      [ 6709.100747] fe00: ffff000008c70ef4 0000000000000001 0000000000000000 ffff810ffbad9b18
      [ 6709.108565] fe20: ffff810ffbad9c70 ffff8100169d3800 ffff810006843ab0 ffff81000689fe80
      [ 6709.116382] fe40: 0000000000000bd0 0000ffffdf979cd0 183f5913da192500 0000ffff8a254ce4
      [ 6709.124200] fe60: 0000ffff8a254b78 0000aaab10339808 0000000000000000 0000ffff8a0c2a50
      [ 6709.132018] fe80: 0000ffffdf979b10 ffff000008d6d450 ffff000008c10000 ffff000008d6d000
      [ 6709.139836] fea0: 0000000000000054 ffff000008cd3dbc 0000000000000000 0000000000000000
      [ 6709.147653] fec0: 0000000000000000 0000000000000000 0000000000000000 ffff81000689ff20
      [ 6709.155471] fee0: ffff000008085240 ffff81000689ff20 ffff000008085244 0000000060000145
      [ 6709.163289] ff00: ffff81000689ff10 ffff00000813f1e4 ffffffffffffffff ffff00000813f238
      [ 6709.171107] [<ffff000008082eb4>] el1_irq+0xb4/0x140
      [ 6709.175976] [<ffff000008085244>] arch_cpu_idle+0x44/0x11c
      [ 6709.181368] [<ffff0000087bf3b8>] default_idle_call+0x20/0x30
      [ 6709.187020] [<ffff000008116d50>] do_idle+0x158/0x1e4
      [ 6709.191973] [<ffff000008116ff4>] cpu_startup_entry+0x2c/0x30
      [ 6709.197624] [<ffff00000808e7cc>] secondary_start_kernel+0x13c/0x160
      [ 6709.203878] [<0000000001bc71c4>] 0x1bc71c4
      [ 6709.207967] Code: bad PC value
      [ 6709.211061] SMP: stopping secondary CPUs
      [ 6709.218830] Starting crashdump kernel...
      [ 6709.222749] Bye!
      ---<-snip>---
      Signed-off-by: NVadim Lomovtsev <vlomovts@redhat.com>
      Reviewed-by: NJeff Layton <jlayton@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      eebe53e8
  8. 19 8月, 2017 1 次提交
  9. 10 3月, 2017 1 次提交
  10. 28 2月, 2017 1 次提交
  11. 25 2月, 2017 1 次提交
  12. 26 12月, 2016 1 次提交
    • T
      ktime: Get rid of the union · 2456e855
      Thomas Gleixner 提交于
      ktime is a union because the initial implementation stored the time in
      scalar nanoseconds on 64 bit machine and in a endianess optimized timespec
      variant for 32bit machines. The Y2038 cleanup removed the timespec variant
      and switched everything to scalar nanoseconds. The union remained, but
      become completely pointless.
      
      Get rid of the union and just keep ktime_t as simple typedef of type s64.
      
      The conversion was done with coccinelle and some manual mopping up.
      Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      2456e855
  13. 25 12月, 2016 1 次提交
  14. 14 11月, 2016 1 次提交
    • S
      sunrpc: svc_age_temp_xprts_now should not call setsockopt non-tcp transports · ea08e392
      Scott Mayhew 提交于
      This fixes the following panic that can occur with NFSoRDMA.
      
      general protection fault: 0000 [#1] SMP
      Modules linked in: rpcrdma ib_isert iscsi_target_mod ib_iser libiscsi
      scsi_transport_iscsi ib_srpt target_core_mod ib_srp scsi_transport_srp
      scsi_tgt ib_ipoib rdma_ucm ib_ucm ib_uverbs ib_umad rdma_cm ib_cm iw_cm
      mlx5_ib ib_core intel_powerclamp coretemp kvm_intel kvm sg ioatdma
      ipmi_devintf ipmi_ssif dcdbas iTCO_wdt iTCO_vendor_support pcspkr
      irqbypass sb_edac shpchp dca crc32_pclmul ghash_clmulni_intel edac_core
      lpc_ich aesni_intel lrw gf128mul glue_helper ablk_helper mei_me mei
      ipmi_si cryptd wmi ipmi_msghandler acpi_pad acpi_power_meter nfsd
      auth_rpcgss nfs_acl lockd grace sunrpc ip_tables xfs libcrc32c sd_mod
      crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper
      syscopyarea sysfillrect sysimgblt ahci fb_sys_fops ttm libahci mlx5_core
      tg3 crct10dif_pclmul drm crct10dif_common
      ptp i2c_core libata crc32c_intel pps_core fjes dm_mirror dm_region_hash
      dm_log dm_mod
      CPU: 1 PID: 120 Comm: kworker/1:1 Not tainted 3.10.0-514.el7.x86_64 #1
      Hardware name: Dell Inc. PowerEdge R320/0KM5PX, BIOS 2.4.2 01/29/2015
      Workqueue: events check_lifetime
      task: ffff88031f506dd0 ti: ffff88031f584000 task.ti: ffff88031f584000
      RIP: 0010:[<ffffffff8168d847>]  [<ffffffff8168d847>]
      _raw_spin_lock_bh+0x17/0x50
      RSP: 0018:ffff88031f587ba8  EFLAGS: 00010206
      RAX: 0000000000020000 RBX: 20041fac02080072 RCX: ffff88031f587fd8
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: 20041fac02080072
      RBP: ffff88031f587bb0 R08: 0000000000000008 R09: ffffffff8155be77
      R10: ffff880322a59b00 R11: ffffea000bf39f00 R12: 20041fac02080072
      R13: 000000000000000d R14: ffff8800c4fbd800 R15: 0000000000000001
      FS:  0000000000000000(0000) GS:ffff880322a40000(0000)
      knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f3c52d4547e CR3: 00000000019ba000 CR4: 00000000001407e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Stack:
      20041fac02080002 ffff88031f587bd0 ffffffff81557830 20041fac02080002
      ffff88031f587c78 ffff88031f587c40 ffffffff8155ae08 000000010157df32
      0000000800000001 ffff88031f587c20 ffffffff81096acb ffffffff81aa37d0
      Call Trace:
      [<ffffffff81557830>] lock_sock_nested+0x20/0x50
      [<ffffffff8155ae08>] sock_setsockopt+0x78/0x940
      [<ffffffff81096acb>] ? lock_timer_base.isra.33+0x2b/0x50
      [<ffffffff8155397d>] kernel_setsockopt+0x4d/0x50
      [<ffffffffa0386284>] svc_age_temp_xprts_now+0x174/0x1e0 [sunrpc]
      [<ffffffffa03b681d>] nfsd_inetaddr_event+0x9d/0xd0 [nfsd]
      [<ffffffff81691ebc>] notifier_call_chain+0x4c/0x70
      [<ffffffff810b687d>] __blocking_notifier_call_chain+0x4d/0x70
      [<ffffffff810b68b6>] blocking_notifier_call_chain+0x16/0x20
      [<ffffffff815e8538>] __inet_del_ifa+0x168/0x2d0
      [<ffffffff815e8cef>] check_lifetime+0x25f/0x270
      [<ffffffff810a7f3b>] process_one_work+0x17b/0x470
      [<ffffffff810a8d76>] worker_thread+0x126/0x410
      [<ffffffff810a8c50>] ? rescuer_thread+0x460/0x460
      [<ffffffff810b052f>] kthread+0xcf/0xe0
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      [<ffffffff81696418>] ret_from_fork+0x58/0x90
      [<ffffffff810b0460>] ? kthread_create_on_node+0x140/0x140
      Code: ca 75 f1 5d c3 0f 1f 80 00 00 00 00 eb d9 66 0f 1f 44 00 00 0f 1f
      44 00 00 55 48 89 e5 53 48 89 fb e8 7e 04 a0 ff b8 00 00 02 00 <f0> 0f
      c1 03 89 c2 c1 ea 10 66 39 c2 75 03 5b 5d c3 83 e2 fe 0f
      RIP  [<ffffffff8168d847>] _raw_spin_lock_bh+0x17/0x50
      RSP <ffff88031f587ba8>
      Signed-off-by: NScott Mayhew <smayhew@redhat.com>
      Fixes: c3d4879e ("sunrpc: Add a function to close temporary transports immediately")
      Reviewed-by: NChuck Lever <chuck.lever@oracle.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      ea08e392
  15. 08 11月, 2016 1 次提交
    • P
      udp: do fwd memory scheduling on dequeue · 7c13f97f
      Paolo Abeni 提交于
      A new argument is added to __skb_recv_datagram to provide
      an explicit skb destructor, invoked under the receive queue
      lock.
      The UDP protocol uses such argument to perform memory
      reclaiming on dequeue, so that the UDP protocol does not
      set anymore skb->desctructor.
      Instead explicit memory reclaiming is performed at close() time and
      when skbs are removed from the receive queue.
      The in kernel UDP protocol users now need to call a
      skb_recv_udp() variant instead of skb_recv_datagram() to
      properly perform memory accounting on dequeue.
      
      Overall, this allows acquiring only once the receive queue
      lock on dequeue.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr sinks	vanilla		patched
      1		440		560
      3		2150		2300
      6		3650		3800
      9		4450		4600
      12		6250		6450
      
      v1 -> v2:
       - do rmem and allocated memory scheduling under the receive lock
       - do bulk scheduling in first_packet_length() and in udp_destruct_sock()
       - avoid the typdef for the dequeue callback
      Suggested-by: NEric Dumazet <edumazet@google.com>
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7c13f97f
  16. 23 10月, 2016 1 次提交
    • P
      udp: use it's own memory accounting schema · 850cbadd
      Paolo Abeni 提交于
      Completely avoid default sock memory accounting and replace it
      with udp-specific accounting.
      
      Since the new memory accounting model encapsulates completely
      the required locking, remove the socket lock on both enqueue and
      dequeue, and avoid using the backlog on enqueue.
      
      Be sure to clean-up rx queue memory on socket destruction, using
      udp its own sk_destruct.
      
      Tested using pktgen with random src port, 64 bytes packet,
      wire-speed on a 10G link as sender and udp_sink as the receiver,
      using an l4 tuple rxhash to stress the contention, and one or more
      udp_sink instances with reuseport.
      
      nr readers      Kpps (vanilla)  Kpps (patched)
      1               170             440
      3               1250            2150
      6               3000            3650
      9               4200            4450
      12              5700            6250
      
      v4 -> v5:
        - avoid unneeded test in first_packet_length
      
      v3 -> v4:
        - remove useless sk_rcvqueues_full() call
      
      v2 -> v3:
        - do not set the now unsed backlog_rcv callback
      
      v1 -> v2:
        - add memory pressure support
        - fixed dropwatch accounting for ipv6
      Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      850cbadd
  17. 02 8月, 2016 2 次提交
    • T
      SUNRPC: Detect immediate closure of accepted sockets · c7995f8a
      Trond Myklebust 提交于
      This modification is useful for debugging issues that happen while
      the socket is being initialised.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      c7995f8a
    • T
      SUNRPC: accept() may return sockets that are still in SYN_RECV · b2f21f7d
      Trond Myklebust 提交于
      We're seeing traces of the following form:
      
       [10952.396347] svc: transport ffff88042ba4a 000 dequeued, inuse=2
       [10952.396351] svc: tcp_accept ffff88042ba4 a000 sock ffff88042a6e4c80
       [10952.396362] nfsd: connect from 10.2.6.1, port=187
       [10952.396364] svc: svc_setup_socket ffff8800b99bcf00
       [10952.396368] setting up TCP socket for reading
       [10952.396370] svc: svc_setup_socket created ffff8803eb10a000 (inet ffff88042b75b800)
       [10952.396373] svc: transport ffff8803eb10a000 put into queue
       [10952.396375] svc: transport ffff88042ba4a000 put into queue
       [10952.396377] svc: server ffff8800bb0ec000 waiting for data (to = 3600000)
       [10952.396380] svc: transport ffff8803eb10a000 dequeued, inuse=2
       [10952.396381] svc_recv: found XPT_CLOSE
       [10952.396397] svc: svc_delete_xprt(ffff8803eb10a000)
       [10952.396398] svc: svc_tcp_sock_detach(ffff8803eb10a000)
       [10952.396399] svc: svc_sock_detach(ffff8803eb10a000)
       [10952.396412] svc: svc_sock_free(ffff8803eb10a000)
      
      i.e. an immediate close of the socket after initialisation.
      
      The culprit appears to be the test at the end of svc_tcp_init, which
      checks if the newly created socket is in the TCP_ESTABLISHED state,
      and immediately closes it if not. The evidence appears to suggest that
      the socket might still be in the SYN_RECV state at this time.
      
      The fix is to check for both states, and then to add a check in
      svc_tcp_state_change() to ensure we don't close the socket when
      it transitions into TCP_ESTABLISHED.
      Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      b2f21f7d
  18. 14 7月, 2016 4 次提交
  19. 14 4月, 2016 1 次提交
  20. 12 4月, 2016 1 次提交
  21. 11 11月, 2015 1 次提交
  22. 10 11月, 2015 1 次提交
  23. 24 10月, 2015 1 次提交
  24. 12 4月, 2015 1 次提交
  25. 10 12月, 2014 1 次提交
  26. 20 11月, 2014 1 次提交
  27. 29 8月, 2014 1 次提交
  28. 18 8月, 2014 1 次提交
  29. 30 7月, 2014 2 次提交
  30. 18 7月, 2014 1 次提交
  31. 31 5月, 2014 1 次提交
  32. 23 5月, 2014 2 次提交