1. 01 3月, 2018 5 次提交
  2. 28 2月, 2018 4 次提交
    • B
      nvme-multipath: fix sysfs dangerously created links · 9bd82b1a
      Baegjae Sung 提交于
      If multipathing is enabled, each NVMe subsystem creates a head
      namespace (e.g., nvme0n1) and multiple private namespaces
      (e.g., nvme0c0n1 and nvme0c1n1) in sysfs. When creating links for
      private namespaces, links of head namespace are used, so the
      namespace creation order must be followed (e.g., nvme0n1 ->
      nvme0c1n1). If the order is not followed, links of sysfs will be
      incomplete or kernel panic will occur.
      
      The kernel panic was:
        kernel BUG at fs/sysfs/symlink.c:27!
        Call Trace:
          nvme_mpath_add_disk_links+0x5d/0x80 [nvme_core]
          nvme_validate_ns+0x5c2/0x850 [nvme_core]
          nvme_scan_work+0x1af/0x2d0 [nvme_core]
      
      Correct order
      Context A     Context B
      nvme0n1
      nvme0c0n1     nvme0c1n1
      
      Incorrect order
      Context A     Context B
                    nvme0c1n1
      nvme0n1
      nvme0c0n1
      
      The nvme_mpath_add_disk (for creating head namespace) is called
      just before the nvme_mpath_add_disk_links (for creating private
      namespaces). In nvme_mpath_add_disk, the first context acquires
      the lock of subsystem and creates a head namespace, and other
      contexts do nothing by checking GENHD_FL_UP of a head namespace
      after waiting to acquire the lock. We verified the code with or
      without multipathing using three vendors of dual-port NVMe SSDs.
      Signed-off-by: NBaegjae Sung <baegjae@gmail.com>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      9bd82b1a
    • G
      nbd: fix return value in error handling path · 0979962f
      Gustavo A. R. Silva 提交于
      It seems that the proper value to return in this particular case is the
      one contained into variable new_index instead of ret.
      
      Addresses-Coverity-ID: 1465148 ("Copy-paste error")
      Fixes: e46c7287 ("nbd: add a basic netlink interface")
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NGustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      0979962f
    • T
      bcache: fix kcrashes with fio in RAID5 backend dev · 60eb34ec
      Tang Junhui 提交于
      Kernel crashed when run fio in a RAID5 backend bcache device, the call
      trace is bellow:
      [  440.012034] kernel BUG at block/blk-ioc.c:146!
      [  440.012696] invalid opcode: 0000 [#1] SMP NOPTI
      [  440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
      [  440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16
      /2015
      [  440.028615] RIP: 0010:put_io_context+0x8b/0x90
      [  440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
      [  440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 0000000000
      0f4240
      [  440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c882
      94fca0
      [  440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb8
      0c1700
      [  440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 00000000000
      01000
      [  440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11
      bd1e0
      [  440.035239] FS:  0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:
      0000000000000000
      [  440.060190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001
      606e0
      [  440.110498] Call Trace:
      [  440.135443]  bio_disassociate_task+0x1b/0x60
      [  440.160355]  bio_free+0x1b/0x60
      [  440.184666]  bio_put+0x23/0x30
      [  440.208272]  search_free+0x23/0x40 [bcache]
      [  440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
      [  440.254468]  closure_put+0xb6/0xd0 [bcache]
      [  440.277087]  request_endio+0x30/0x40 [bcache]
      [  440.298703]  bio_endio+0xa1/0x120
      [  440.319644]  handle_stripe+0x418/0x2270 [raid456]
      [  440.340614]  ? load_balance+0x17b/0x9c0
      [  440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
      [  440.380675]  ? __release_stripe+0x15/0x20 [raid456]
      [  440.400132]  raid5d+0x3ed/0x5d0 [raid456]
      [  440.419193]  ? schedule+0x36/0x80
      [  440.437932]  ? schedule_timeout+0x1d2/0x2f0
      [  440.456136]  md_thread+0x122/0x150
      [  440.473687]  ? wait_woken+0x80/0x80
      [  440.491411]  kthread+0x102/0x140
      [  440.508636]  ? find_pers+0x70/0x70
      [  440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
      [  440.541791]  ret_from_fork+0x35/0x40
      [  440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2
      48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b
      0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
      [  440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
      [  440.628575] ---[ end trace a1fd79d85643a73e ]--
      
      All the crash issue happened when a bypass IO coming, in such scenario
      s->iop.bio is pointed to the s->orig_bio. In search_free(), it finishes the
      s->orig_bio by calling bio_complete(), and after that, s->iop.bio became
      invalid, then kernel would crash when calling bio_put(). Maybe its upper
      layer's faulty, since bio should not be freed before we calling bio_put(),
      but we'd better calling bio_put() first before calling bio_complete() to
      notify upper layer ending this bio.
      
      This patch moves bio_complete() under bio_put() to avoid kernel crash.
      
      [mlyle: fixed commit subject for character limits]
      Reported-by: NMatthias Ferdinand <bcache@mfedv.net>
      Tested-by: NMatthias Ferdinand <bcache@mfedv.net>
      Signed-off-by: NTang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: NMichael Lyle <mlyle@lyle.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      60eb34ec
    • C
      bcache: correct flash only vols (check all uuids) · 02aa8a8b
      Coly Li 提交于
      Commit 2831231d ("bcache: reduce cache_set devices iteration by
      devices_max_used") adds c->devices_max_used to reduce iteration of
      c->uuids elements, this value is updated in bcache_device_attach().
      
      But for flash only volume, when calling flash_devs_run(), the function
      bcache_device_attach() is not called yet and c->devices_max_used is not
      updated. The unexpected result is, the flash only volume won't be run
      by flash_devs_run().
      
      This patch fixes the issue by iterate all c->uuids elements in
      flash_devs_run(). c->devices_max_used will be updated properly when
      bcache_device_attach() gets called.
      
      [mlyle: commit subject edited for character limit]
      
      Fixes: 2831231d ("bcache: reduce cache_set devices iteration by devices_max_used")
      Reported-by: NTang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: NColy Li <colyli@suse.de>
      Reviewed-by: NMichael Lyle <mlyle@lyle.org>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      02aa8a8b
  3. 27 2月, 2018 1 次提交
  4. 26 2月, 2018 1 次提交
  5. 22 2月, 2018 5 次提交
    • C
      nvmet-loop: use blk_rq_payload_bytes for sgl selection · 796b0b8d
      Christoph Hellwig 提交于
      blk_rq_bytes does the wrong thing for special payloads like discards and
      might cause the driver to not set up a SGL.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      796b0b8d
    • C
      nvme-rdma: use blk_rq_payload_bytes instead of blk_rq_bytes · 0d309923
      Christoph Hellwig 提交于
      blk_rq_bytes does the wrong thing for special payloads like discards and
      might cause the driver to not set up a SGL.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      0d309923
    • C
      nvme-fabrics: don't check for non-NULL module in nvmf_register_transport · 5a1e5953
      Christoph Hellwig 提交于
      THIS_MODULE evaluates to NULL when used from code built into the kernel,
      thus breaking built-in transport modules.  Remove the bogus check.
      
      Fixes: 0de5cd36 ("nvme-fabrics: protect against module unload during create_ctrl")
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Reviewed-by: NSagi Grimberg <sagi@grimberg.me>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NKeith Busch <keith.busch@intel.com>
      5a1e5953
    • H
      mm, swap, frontswap: fix THP swap if frontswap enabled · 7ba71669
      Huang Ying 提交于
      It was reported by Sergey Senozhatsky that if THP (Transparent Huge
      Page) and frontswap (via zswap) are both enabled, when memory goes low
      so that swap is triggered, segfault and memory corruption will occur in
      random user space applications as follow,
      
      kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
       #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
       #1  0x00007fc08889c2f3 malloc (libc.so.6)
       #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
       #3  0x0000560e6005e75c n/a (urxvt)
       #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
       #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
       #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
       #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
       #8  0x0000560e6005cb55 ev_run (urxvt)
       #9  0x0000560e6003b9b9 main (urxvt)
       #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
       #11 0x0000560e6003f9da _start (urxvt)
      
      After bisection, it was found the first bad commit is bd4c82c2 ("mm,
      THP, swap: delay splitting THP after swapped out").
      
      The root cause is as follows:
      
      When the pages are written to swap device during swapping out in
      swap_writepage(), zswap (fontswap) is tried to compress the pages to
      improve performance.  But zswap (frontswap) will treat THP as a normal
      page, so only the head page is saved.  After swapping in, tail pages
      will not be restored to their original contents, causing memory
      corruption in the applications.
      
      This is fixed by refusing to save page in the frontswap store functions
      if the page is a THP.  So that the THP will be swapped out to swap
      device.
      
      Another choice is to split THP if frontswap is enabled.  But it is found
      that the frontswap enabling isn't flexible.  For example, if
      CONFIG_ZSWAP=y (cannot be module), frontswap will be enabled even if
      zswap itself isn't enabled.
      
      Frontswap has multiple backends, to make it easy for one backend to
      enable THP support, the THP checking is put in backend frontswap store
      functions instead of the general interfaces.
      
      Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
      Fixes: bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped out")
      Signed-off-by: N"Huang, Ying" <ying.huang@intel.com>
      Reported-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: <stable@vger.kernel.org>	[4.14]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7ba71669
    • L
      RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type · f4576587
      Leon Romanovsky 提交于
      Attempt to modify XRC_TGT QP type from the user space (ibv_xsrq_pingpong
      invocation) will trigger the following kernel panic. It is caused by the
      fact that such QPs missed uobject initialization.
      
      [   17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [   17.412645] IP: rdma_lookup_put_uobject+0x9/0x50
      [   17.416567] PGD 0 P4D 0
      [   17.419262] Oops: 0000 [#1] SMP PTI
      [   17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86
      [   17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50
      [   17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246
      [   17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000
      [   17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
      [   17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac
      [   17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800
      [   17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000
      [   17.443971] FS:  00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000
      [   17.446623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0
      [   17.449677] Call Trace:
      [   17.450247]  modify_qp.isra.20+0x219/0x2f0
      [   17.451151]  ib_uverbs_modify_qp+0x90/0xe0
      [   17.452126]  ib_uverbs_write+0x1d2/0x3c0
      [   17.453897]  ? __handle_mm_fault+0x93c/0xe40
      [   17.454938]  __vfs_write+0x36/0x180
      [   17.455875]  vfs_write+0xad/0x1e0
      [   17.456766]  SyS_write+0x52/0xc0
      [   17.457632]  do_syscall_64+0x75/0x180
      [   17.458631]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      [   17.460004] RIP: 0033:0x7fc29198f5a0
      [   17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [   17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0
      [   17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003
      [   17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050
      [   17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300
      [   17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000
      [   17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a
      [   17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90
      [   17.476841] CR2: 0000000000000048
      [   17.477764] ---[ end trace 1dbcc5354071a712 ]---
      [   17.478880] Kernel panic - not syncing: Fatal exception
      [   17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      
      Fixes: 2f08ee36 ("RDMA/restrack: don't use uaccess_kernel()")
      Signed-off-by: NLeon Romanovsky <leonro@mellanox.com>
      Signed-off-by: NDoug Ledford <dledford@redhat.com>
      f4576587
  6. 21 2月, 2018 5 次提交
  7. 20 2月, 2018 9 次提交
  8. 18 2月, 2018 1 次提交
  9. 17 2月, 2018 9 次提交
    • L
      iio: adis_lib: Initialize trigger before requesting interrupt · f027e0b3
      Lars-Peter Clausen 提交于
      The adis_probe_trigger() creates a new IIO trigger and requests an
      interrupt associated with the trigger. The interrupt uses the generic
      iio_trigger_generic_data_rdy_poll() function as its interrupt handler.
      
      Currently the driver initializes some fields of the trigger structure after
      the interrupt has been requested. But an interrupt can fire as soon as it
      has been requested. This opens up a race condition.
      
      iio_trigger_generic_data_rdy_poll() will access the trigger data structure
      and dereference the ops field. If the ops field is not yet initialized this
      will result in a NULL pointer deref.
      
      It is not expected that the device generates an interrupt at this point, so
      typically this issue did not surface unless e.g. due to a hardware
      misconfiguration (wrong interrupt number, wrong polarity, etc.).
      
      But some newer devices from the ADIS family start to generate periodic
      interrupts in their power-on reset configuration and unfortunately the
      interrupt can not be masked in the device.  This makes the race condition
      much more visible and the following crash has been observed occasionally
      when booting a system using the ADIS16460.
      
      	Unable to handle kernel NULL pointer dereference at virtual address 00000008
      	pgd = c0004000
      	[00000008] *pgd=00000000
      	Internal error: Oops: 5 [#1] PREEMPT SMP ARM
      	Modules linked in:
      	CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.9.0-04126-gf9739f0-dirty #257
      	Hardware name: Xilinx Zynq Platform
      	task: ef04f640 task.stack: ef050000
      	PC is at iio_trigger_notify_done+0x30/0x68
      	LR is at iio_trigger_generic_data_rdy_poll+0x18/0x20
      	pc : [<c042d868>]    lr : [<c042d924>]    psr: 60000193
      	sp : ef051bb8  ip : 00000000  fp : ef106400
      	r10: c081d80a  r9 : ef3bfa00  r8 : 00000087
      	r7 : ef051bec  r6 : 00000000  r5 : ef3bfa00  r4 : ee92ab00
      	r3 : 00000000  r2 : 00000000  r1 : 00000000  r0 : ee97e400
      	Flags: nZCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment none
      	Control: 18c5387d  Table: 0000404a  DAC: 00000051
      	Process swapper/0 (pid: 1, stack limit = 0xef050210)
      	[<c042d868>] (iio_trigger_notify_done) from [<c0065b10>] (__handle_irq_event_percpu+0x88/0x118)
      	[<c0065b10>] (__handle_irq_event_percpu) from [<c0065bbc>] (handle_irq_event_percpu+0x1c/0x58)
      	[<c0065bbc>] (handle_irq_event_percpu) from [<c0065c30>] (handle_irq_event+0x38/0x5c)
      	[<c0065c30>] (handle_irq_event) from [<c0068e28>] (handle_level_irq+0xa4/0x130)
      	[<c0068e28>] (handle_level_irq) from [<c0064e74>] (generic_handle_irq+0x24/0x34)
      	[<c0064e74>] (generic_handle_irq) from [<c021ab7c>] (zynq_gpio_irqhandler+0xb8/0x13c)
      	[<c021ab7c>] (zynq_gpio_irqhandler) from [<c0064e74>] (generic_handle_irq+0x24/0x34)
      	[<c0064e74>] (generic_handle_irq) from [<c0065370>] (__handle_domain_irq+0x5c/0xb4)
      	[<c0065370>] (__handle_domain_irq) from [<c000940c>] (gic_handle_irq+0x48/0x8c)
      	[<c000940c>] (gic_handle_irq) from [<c0013e8c>] (__irq_svc+0x6c/0xa8)
      
      To fix this make sure that the trigger is fully initialized before
      requesting the interrupt.
      
      Fixes: ccd2b52f ("staging:iio: Add common ADIS library")
      Reported-by: NRobin Getz <Robin.Getz@analog.com>
      Signed-off-by: NLars-Peter Clausen <lars@metafoo.de>
      Cc: <Stable@vger.kernel.org>
      Signed-off-by: NJonathan Cameron <Jonathan.Cameron@huawei.com>
      f027e0b3
    • S
      pvcalls-front: wait for other operations to return when release passive sockets · d1a75e08
      Stefano Stabellini 提交于
      Passive sockets can have ongoing operations on them, specifically, we
      have two wait_event_interruptable calls in pvcalls_front_accept.
      
      Add two wake_up calls in pvcalls_front_release, then wait for the
      potential waiters to return and release the sock_mapping refcount.
      Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      d1a75e08
    • S
      pvcalls-front: introduce a per sock_mapping refcount · 64d68718
      Stefano Stabellini 提交于
      Introduce a per sock_mapping refcount, in addition to the existing
      global refcount. Thanks to the sock_mapping refcount, we can safely wait
      for it to be 1 in pvcalls_front_release before freeing an active socket,
      instead of waiting for the global refcount to be 1.
      Signed-off-by: NStefano Stabellini <stefano@aporeto.com>
      Acked-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      64d68718
    • J
      xenbus: track caller request id · 29fee6ee
      Joao Martins 提交于
      Commit fd8aa909 ("xen: optimize xenbus driver for multiple concurrent
      xenstore accesses") optimized xenbus concurrent accesses but in doing so
      broke UABI of /dev/xen/xenbus. Through /dev/xen/xenbus applications are in
      charge of xenbus message exchange with the correct header and body. Now,
      after the mentioned commit the replies received by application will no
      longer have the header req_id echoed back as it was on request (see
      specification below for reference), because that particular field is being
      overwritten by kernel.
      
      struct xsd_sockmsg
      {
        uint32_t type;  /* XS_??? */
        uint32_t req_id;/* Request identifier, echoed in daemon's response.  */
        uint32_t tx_id; /* Transaction id (0 if not related to a transaction). */
        uint32_t len;   /* Length of data following this. */
      
        /* Generally followed by nul-terminated string(s). */
      };
      
      Before there was only one request at a time so req_id could simply be
      forwarded back and forth. To allow simultaneous requests we need a
      different req_id for each message thus kernel keeps a monotonic increasing
      counter for this field and is written on every request irrespective of
      userspace value.
      
      Forwarding again the req_id on userspace requests is not a solution because
      we would open the possibility of userspace-generated req_id colliding with
      kernel ones. So this patch instead takes another route which is to
      artificially keep user req_id while keeping the xenbus logic as is. We do
      that by saving the original req_id before xs_send(), use the private kernel
      counter as req_id and then once reply comes and was validated, we restore
      back the original req_id.
      
      Cc: <stable@vger.kernel.org> # 4.11
      Fixes: fd8aa909 ("xen: optimize xenbus driver for multiple concurrent xenstore accesses")
      Reported-by: NBhavesh Davda <bhavesh.davda@oracle.com>
      Signed-off-by: NJoao Martins <joao.m.martins@oracle.com>
      Reviewed-by: NJuergen Gross <jgross@suse.com>
      Signed-off-by: NJuergen Gross <jgross@suse.com>
      29fee6ee
    • E
      tun: fix tun_napi_alloc_frags() frag allocator · 43a08e0f
      Eric Dumazet 提交于
      <Mark Rutland reported>
          While fuzzing arm64 v4.16-rc1 with Syzkaller, I've been hitting a
          misaligned atomic in __skb_clone:
      
              atomic_inc(&(skb_shinfo(skb)->dataref));
      
         where dataref doesn't have the required natural alignment, and the
         atomic operation faults. e.g. i often see it aligned to a single
         byte boundary rather than a four byte boundary.
      
         AFAICT, the skb_shared_info is misaligned at the instant it's
         allocated in __napi_alloc_skb()  __napi_alloc_skb()
      </end of report>
      
      Problem is caused by tun_napi_alloc_frags() using
      napi_alloc_frag() with user provided seg sizes,
      leading to other users of this API getting unaligned
      page fragments.
      
      Since we would like to not necessarily add paddings or alignments to
      the frags that tun_napi_alloc_frags() attaches to the skb, switch to
      another page frag allocator.
      
      As a bonus skb_page_frag_refill() can use GFP_KERNEL allocations,
      meaning that we can not deplete memory reserves as easily.
      
      Fixes: 90e33d45 ("tun: enable napi_gro_frags() for TUN/TAP driver")
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Reported-by: NMark Rutland <mark.rutland@arm.com>
      Tested-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      43a08e0f
    • C
      PCI/cxgb4: Extend T3 PCI quirk to T4+ devices · 7dcf688d
      Casey Leedom 提交于
      We've run into a problem where our device is attached
      to a Virtual Machine and the use of the new pci_set_vpd_size()
      API doesn't help.  The VM kernel has been informed that
      the accesses are okay, but all of the actual VPD Capability
      Accesses are trapped down into the KVM Hypervisor where it
      goes ahead and imposes the silent denials.
      
      The right idea is to follow the kernel.org
      commit 1c7de2b4 ("PCI: Enable access to non-standard VPD for
      Chelsio devices (cxgb3)") which Alexey Kardashevskiy authored
      to establish a PCI Quirk for our T3-based adapters. This commit
      extends that PCI Quirk to cover Chelsio T4 devices and later.
      
      The advantage of this approach is that the VPD Size gets set early
      in the Base OS/Hypervisor Boot and doesn't require that the cxgb4
      driver even be available in the Base OS/Hypervisor.  Thus PF4 can
      be exported to a Virtual Machine and everything should work.
      
      Fixes: 67e65879 ("cxgb4: Set VPD size so we can read both VPD structures")
      Cc: <stable@vger.kernel.org>  # v4.9+
      Signed-off-by: NCasey Leedom <leedom@chelsio.com>
      Signed-off-by: NArjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7dcf688d
    • R
      cxgb4: fix trailing zero in CIM LA dump · e6f02a4d
      Rahul Lakkireddy 提交于
      Set correct size of the CIM LA dump for T6.
      
      Fixes: 27887bc7 ("cxgb4: collect hardware LA dumps")
      Signed-off-by: NRahul Lakkireddy <rahul.lakkireddy@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e6f02a4d
    • G
      cxgb4: free up resources of pf 0-3 · c4e43e14
      Ganesh Goudar 提交于
      free pf 0-3 resources, commit baf50868 ("cxgb4:
      restructure VF mgmt code") erroneously removed the
      code which frees the pf 0-3 resources, causing the
      probe of pf 0-3 to fail in case of driver reload.
      
      Fixes: baf50868 ("cxgb4: restructure VF mgmt code")
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c4e43e14
    • S
      RDMA/restrack: don't use uaccess_kernel() · 2f08ee36
      Steve Wise 提交于
      uaccess_kernel() isn't sufficient to determine if an rdma resource is
      user-mode or not.  For example, resources allocated in the add_one()
      function of an ib_client get falsely labeled as user mode, when they
      are kernel mode allocations.  EG: mad qps.
      
      The result is that these qps are skipped over during a nldev query
      because of an erroneous namespace mismatch.
      
      So now we determine if the resource is user-mode by looking at the object
      struct's uobject or similar pointer to know if it was allocated for user
      mode applications.
      
      Fixes: 02d8883f ("RDMA/restrack: Add general infrastructure to track RDMA resources")
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NJason Gunthorpe <jgg@mellanox.com>
      2f08ee36