1. 09 Mar, 2018 4 commits
  2. 07 Mar, 2018 1 commit
  3. 01 Mar, 2018 9 commits
  4. 28 Feb, 2018 4 commits
    • B
      nvme-multipath: fix sysfs dangerously created links · 9bd82b1a
      Baegjae Sung authored
      If multipathing is enabled, each NVMe subsystem creates a head
      namespace (e.g., nvme0n1) and multiple private namespaces
      (e.g., nvme0c0n1 and nvme0c1n1) in sysfs. Creating the links for a
      private namespace uses the links of the head namespace, so the
      namespace creation order must be followed (e.g., nvme0n1 ->
      nvme0c1n1). If the order is not followed, the sysfs links will be
      incomplete or a kernel panic will occur.
      
      The kernel panic was:
        kernel BUG at fs/sysfs/symlink.c:27!
        Call Trace:
          nvme_mpath_add_disk_links+0x5d/0x80 [nvme_core]
          nvme_validate_ns+0x5c2/0x850 [nvme_core]
          nvme_scan_work+0x1af/0x2d0 [nvme_core]
      
      Correct order
      Context A     Context B
      nvme0n1
      nvme0c0n1     nvme0c1n1
      
      Incorrect order
      Context A     Context B
                    nvme0c1n1
      nvme0n1
      nvme0c0n1
      
      nvme_mpath_add_disk (for creating the head namespace) is called
      just before nvme_mpath_add_disk_links (for creating private
      namespaces). In nvme_mpath_add_disk, the first context acquires
      the subsystem lock and creates the head namespace; other contexts
      wait for the lock, then check GENHD_FL_UP on the head namespace
      and do nothing. We verified the code with and without multipathing
      using dual-port NVMe SSDs from three vendors.
      Signed-off-by: Baegjae Sung <baegjae@gmail.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      9bd82b1a
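The ordering rule described in the nvme-multipath commit above can be sketched in plain userspace C. This is only an illustrative model, not the driver code: `struct head_ns`, `add_head_disk`, `add_private_links` and `scan_path` are hypothetical names, the `up` flag stands in for GENHD_FL_UP, and the subsystem lock is indicated by comments rather than taken for real.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the shared head namespace (e.g. nvme0n1). */
struct head_ns {
    bool up;            /* models GENHD_FL_UP on the head disk */
    int  times_created; /* how often the head was actually registered */
    int  links;         /* links created for private namespaces */
};

/* Models nvme_mpath_add_disk: the real code holds the subsystem lock here. */
static void add_head_disk(struct head_ns *h)
{
    /* lock(subsystem) would be taken here */
    if (!h->up) {
        h->times_created++;
        h->up = true;
        /* invariant: only the first context ever creates the head */
        assert(h->times_created == 1);
    }
    /* unlock(subsystem) would be released here */
}

/* Models nvme_mpath_add_disk_links: requires the head to exist already. */
static bool add_private_links(struct head_ns *h)
{
    if (!h->up)
        return false;   /* head missing: links would be incomplete */
    h->links++;
    return true;
}

/* Each scanning context runs both steps back to back. */
static bool scan_path(struct head_ns *h)
{
    add_head_disk(h);
    return add_private_links(h);
}
```

Because every context calls the head-creation step first, a private namespace never tries to link against a head that does not exist yet, whichever context happens to run first.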
    • G
      nbd: fix return value in error handling path · 0979962f
      Gustavo A. R. Silva authored
      It seems that the proper value to return in this particular case is the
      one contained in the variable new_index instead of ret.
      
      Addresses-Coverity-ID: 1465148 ("Copy-paste error")
      Fixes: e46c7287 ("nbd: add a basic netlink interface")
      Reviewed-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0979962f
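The kind of copy-paste error the nbd commit above describes can be illustrated with a small userspace sketch. The names `dev_add`, `connect_buggy` and `connect_fixed` and the `max_devices` parameter are hypothetical and are not taken from the nbd driver; only the variables `ret` and `new_index` come from the commit message.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical allocator: returns the new index, or a negative errno. */
static int dev_add(int index, int max_devices)
{
    return index < max_devices ? index : -ENOSPC;
}

/* Buggy shape: the error path returns the wrong variable. */
static int connect_buggy(int index, int max_devices)
{
    int ret = 0;   /* still holds the result of an earlier, successful step */
    int new_index = dev_add(index, max_devices);

    if (new_index < 0)
        return ret;         /* copy-paste error: reports success */
    return 0;
}

/* Fixed shape: the error stored in new_index is propagated. */
static int connect_fixed(int index, int max_devices)
{
    int new_index = dev_add(index, max_devices);

    if (new_index < 0)
        return new_index;   /* the caller sees the real error code */
    return 0;
}
```

With four device slots, asking for index 5 makes `connect_buggy` return 0 while `connect_fixed` returns `-ENOSPC`.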
    • T
      bcache: fix kcrashes with fio in RAID5 backend dev · 60eb34ec
      Tang Junhui authored
      The kernel crashed when running fio on a bcache device with a RAID5
      backend. The call trace is below:
      [  440.012034] kernel BUG at block/blk-ioc.c:146!
      [  440.012696] invalid opcode: 0000 [#1] SMP NOPTI
      [  440.026537] CPU: 2 PID: 2205 Comm: md127_raid5 Not tainted 4.15.0 #8
      [  440.027441] Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 07/16/2015
      [  440.028615] RIP: 0010:put_io_context+0x8b/0x90
      [  440.029246] RSP: 0018:ffffa8c882b43af8 EFLAGS: 00010246
      [  440.029990] RAX: 0000000000000000 RBX: ffffa8c88294fca0 RCX: 00000000000f4240
      [  440.031006] RDX: 0000000000000004 RSI: 0000000000000286 RDI: ffffa8c88294fca0
      [  440.032030] RBP: ffffa8c882b43b10 R08: 0000000000000003 R09: ffff949cb80c1700
      [  440.033206] R10: 0000000000000104 R11: 000000000000b71c R12: 0000000000001000
      [  440.034222] R13: 0000000000000000 R14: ffff949cad84db70 R15: ffff949cb11bd1e0
      [  440.035239] FS:  0000000000000000(0000) GS:ffff949cba280000(0000) knlGS:0000000000000000
      [  440.060190] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  440.084967] CR2: 00007ff0493ef000 CR3: 00000002f1e0a002 CR4: 00000000001606e0
      [  440.110498] Call Trace:
      [  440.135443]  bio_disassociate_task+0x1b/0x60
      [  440.160355]  bio_free+0x1b/0x60
      [  440.184666]  bio_put+0x23/0x30
      [  440.208272]  search_free+0x23/0x40 [bcache]
      [  440.231448]  cached_dev_write_complete+0x31/0x70 [bcache]
      [  440.254468]  closure_put+0xb6/0xd0 [bcache]
      [  440.277087]  request_endio+0x30/0x40 [bcache]
      [  440.298703]  bio_endio+0xa1/0x120
      [  440.319644]  handle_stripe+0x418/0x2270 [raid456]
      [  440.340614]  ? load_balance+0x17b/0x9c0
      [  440.360506]  handle_active_stripes.isra.58+0x387/0x5a0 [raid456]
      [  440.380675]  ? __release_stripe+0x15/0x20 [raid456]
      [  440.400132]  raid5d+0x3ed/0x5d0 [raid456]
      [  440.419193]  ? schedule+0x36/0x80
      [  440.437932]  ? schedule_timeout+0x1d2/0x2f0
      [  440.456136]  md_thread+0x122/0x150
      [  440.473687]  ? wait_woken+0x80/0x80
      [  440.491411]  kthread+0x102/0x140
      [  440.508636]  ? find_pers+0x70/0x70
      [  440.524927]  ? kthread_associate_blkcg+0xa0/0xa0
      [  440.541791]  ret_from_fork+0x35/0x40
      [  440.558020] Code: c2 48 00 5b 41 5c 41 5d 5d c3 48 89 c6 4c 89 e7 e8 bb c2 48 00 48 8b 3d bc 36 4b 01 48 89 de e8 7c f7 e0 ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 00 0f 1f 44 00 00 55 48 8d 47 b8 48 89 e5 41 57 41
      [  440.610020] RIP: put_io_context+0x8b/0x90 RSP: ffffa8c882b43af8
      [  440.628575] ---[ end trace a1fd79d85643a73e ]---
      
      All of the crashes happened when a bypass I/O came in. In that scenario
      s->iop.bio points to s->orig_bio. search_free() finishes s->orig_bio by
      calling bio_complete(), after which s->iop.bio is invalid, so the kernel
      crashes when bio_put() is called. The upper layer may be at fault, since
      a bio should not be freed before we call bio_put(), but it is safer to
      call bio_put() first and only then call bio_complete() to notify the
      upper layer that the bio has ended.
      
      This patch moves bio_complete() below bio_put() to avoid the kernel crash.
      
      [mlyle: fixed commit subject for character limits]
      Reported-by: Matthias Ferdinand <bcache@mfedv.net>
      Tested-by: Matthias Ferdinand <bcache@mfedv.net>
      Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      60eb34ec
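The reordering in the bcache commit above (drop our own reference first, notify the upper layer afterwards) can be modelled in userspace as follows. This is a sketch under loud assumptions: `struct bio_sketch`, `struct search_sketch` and every function here are hypothetical stand-ins, and a `freed` flag replaces real memory release so that the example stays well defined.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a bio: a reference count plus a "freed" marker. */
struct bio_sketch {
    int  refs;
    bool freed;
};

struct search_sketch {
    struct bio_sketch *iop_bio;   /* equals orig_bio for bypass I/O */
    struct bio_sketch *orig_bio;
    bool completed;
};

/* Dropping a reference must never touch a bio that is already gone. */
static void bio_put_sketch(struct bio_sketch *b)
{
    assert(!b->freed);
    if (--b->refs == 0)
        b->freed = true;
}

/* Completion: the upper layer may tear the bio down once it is notified. */
static void bio_complete_sketch(struct search_sketch *s)
{
    s->orig_bio->freed = true;
    s->completed = true;
}

/* Fixed order: put first, complete second. */
static void search_free_fixed(struct search_sketch *s)
{
    if (s->iop_bio != NULL)
        bio_put_sketch(s->iop_bio);
    bio_complete_sketch(s);
}
```

Swapping the two calls in `search_free_fixed` would trip the assertion in `bio_put_sketch` for a bypass I/O, which is the userspace analogue of the crash in the trace.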
    • C
      bcache: correct flash only vols (check all uuids) · 02aa8a8b
      Coly Li authored
      Commit 2831231d ("bcache: reduce cache_set devices iteration by
      devices_max_used") adds c->devices_max_used to reduce iteration over
      c->uuids elements. This value is updated in bcache_device_attach().
      
      But for a flash only volume, when flash_devs_run() is called,
      bcache_device_attach() has not been called yet and c->devices_max_used
      has not been updated. As a result, the flash only volume is not run
      by flash_devs_run().
      
      This patch fixes the issue by iterating over all c->uuids elements in
      flash_devs_run(). c->devices_max_used will be updated properly when
      bcache_device_attach() gets called.
      
      [mlyle: commit subject edited for character limit]
      
      Fixes: 2831231d ("bcache: reduce cache_set devices iteration by devices_max_used")
      Reported-by: Tang Junhui <tang.junhui@zte.com.cn>
      Signed-off-by: Coly Li <colyli@suse.de>
      Reviewed-by: Michael Lyle <mlyle@lyle.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      02aa8a8b
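Why bounding the scan by devices_max_used misses flash only volumes, as the bcache commit above explains, can be shown with a reduced model. The structures below are hypothetical simplifications (a fixed table of `NR_UUIDS` entries, boolean flags), and `flash_devs_found` only counts matches instead of starting devices.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_UUIDS 8   /* hypothetical table size */

struct uuid_entry {
    bool in_use;
    bool flash_only;
};

struct cache_set_sketch {
    struct uuid_entry uuids[NR_UUIDS];
    size_t nr_uuids;
    size_t devices_max_used;   /* only advanced when a device is attached */
};

/* Counts the flash only volumes seen when scanning the first `limit` slots. */
static size_t flash_devs_found(const struct cache_set_sketch *c, size_t limit)
{
    size_t found = 0;

    for (size_t i = 0; i < limit && i < c->nr_uuids; i++)
        if (c->uuids[i].in_use && c->uuids[i].flash_only)
            found++;
    return found;
}
```

Before any attach has happened `devices_max_used` is still 0, so a scan limited by it finds nothing, while a scan over `nr_uuids` finds the volume.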
  5. 27 Feb, 2018 1 commit
  6. 26 Feb, 2018 1 commit
  7. 22 Feb, 2018 5 commits
    • C
      nvmet-loop: use blk_rq_payload_bytes for sgl selection · 796b0b8d
      Christoph Hellwig authored
      blk_rq_bytes does the wrong thing for special payloads like discards and
      might cause the driver to not set up an SGL.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      796b0b8d
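The difference between the two length helpers named in the commit above can be sketched with a userspace model. `struct request_sketch` and its fields are hypothetical simplifications of the block layer's request; the only assumption carried over is that a request with a special payload should be sized by that payload rather than by its nominal data length.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced view of a block request. */
struct request_sketch {
    unsigned int data_len;      /* nominal length, e.g. the discarded range */
    bool special_payload;       /* models a driver-supplied payload */
    unsigned int special_len;   /* bytes actually handed to the device */
};

/* Models blk_rq_bytes: always the nominal length. */
static unsigned int rq_bytes(const struct request_sketch *rq)
{
    return rq->data_len;
}

/* Models blk_rq_payload_bytes: the bytes that really need mapping. */
static unsigned int rq_payload_bytes(const struct request_sketch *rq)
{
    return rq->special_payload ? rq->special_len : rq_bytes(rq);
}
```

For an ordinary read or write the two helpers agree. For a discard they differ, so a driver that sizes its scatter-gather setup from the nominal length is working from the wrong number.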
    • C
      nvme-rdma: use blk_rq_payload_bytes instead of blk_rq_bytes · 0d309923
      Christoph Hellwig authored
      blk_rq_bytes does the wrong thing for special payloads like discards and
      might cause the driver to not set up an SGL.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      0d309923
    • C
      nvme-fabrics: don't check for non-NULL module in nvmf_register_transport · 5a1e5953
      Christoph Hellwig authored
      THIS_MODULE evaluates to NULL when used from code built into the kernel,
      thus breaking built-in transport modules.  Remove the bogus check.
      
      Fixes: 0de5cd36 ("nvme-fabrics: protect against module unload during create_ctrl")
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: Keith Busch <keith.busch@intel.com>
      5a1e5953
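The effect of the check removed by the nvme-fabrics commit above can be illustrated with a userspace sketch. All types and functions here are hypothetical; a NULL `module` pointer plays the role that THIS_MODULE has for code built into the kernel.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct module_sketch { int unused; };   /* hypothetical module handle */

struct transport_ops_sketch {
    const char *name;
    struct module_sketch *module;        /* NULL models a built-in transport */
    int (*create_ctrl)(void);
};

static int dummy_create_ctrl(void)
{
    return 0;
}

/* With the extra check, a built-in transport can never register. */
static int register_buggy(const struct transport_ops_sketch *ops)
{
    if (ops->create_ctrl == NULL || ops->module == NULL)
        return -EINVAL;
    return 0;
}

/* Without it, only a missing create_ctrl callback is rejected. */
static int register_fixed(const struct transport_ops_sketch *ops)
{
    if (ops->create_ctrl == NULL)
        return -EINVAL;
    return 0;
}
```

A transport with a valid `create_ctrl` and a NULL `module` is rejected by `register_buggy` and accepted by `register_fixed`.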
    • H
      mm, swap, frontswap: fix THP swap if frontswap enabled · 7ba71669
      Huang Ying authored
      Sergey Senozhatsky reported that if THP (Transparent Huge Page)
      and frontswap (via zswap) are both enabled, then when memory runs
      low and swap is triggered, segfaults and memory corruption occur in
      random user space applications, as follows:
      
      kernel: urxvt[338]: segfault at 20 ip 00007fc08889ae0d sp 00007ffc73a7fc40 error 6 in libc-2.26.so[7fc08881a000+1ae000]
       #0  0x00007fc08889ae0d _int_malloc (libc.so.6)
       #1  0x00007fc08889c2f3 malloc (libc.so.6)
       #2  0x0000560e6004bff7 _Z14rxvt_wcstoutf8PKwi (urxvt)
       #3  0x0000560e6005e75c n/a (urxvt)
       #4  0x0000560e6007d9f1 _ZN16rxvt_perl_interp6invokeEP9rxvt_term9hook_typez (urxvt)
       #5  0x0000560e6003d988 _ZN9rxvt_term9cmd_parseEv (urxvt)
       #6  0x0000560e60042804 _ZN9rxvt_term6pty_cbERN2ev2ioEi (urxvt)
       #7  0x0000560e6005c10f _Z17ev_invoke_pendingv (urxvt)
       #8  0x0000560e6005cb55 ev_run (urxvt)
       #9  0x0000560e6003b9b9 main (urxvt)
       #10 0x00007fc08883af4a __libc_start_main (libc.so.6)
       #11 0x0000560e6003f9da _start (urxvt)
      
      After bisection, it was found the first bad commit is bd4c82c2 ("mm,
      THP, swap: delay splitting THP after swapped out").
      
      The root cause is as follows:
      
      When pages are written to the swap device during swap out in
      swap_writepage(), zswap (frontswap) first tries to compress the pages
      to improve performance.  But zswap (frontswap) treats a THP as a normal
      page, so only the head page is saved.  After swapping in, the tail
      pages are not restored to their original contents, causing memory
      corruption in the applications.
      
      This is fixed by refusing to save the page in the frontswap store
      functions if the page is a THP, so that the THP is swapped out to the
      swap device instead.
      
      Another option is to split the THP if frontswap is enabled.  But
      enabling frontswap turns out not to be flexible.  For example, if
      CONFIG_ZSWAP=y (it cannot be a module), frontswap will be enabled even
      if zswap itself isn't enabled.
      
      Frontswap has multiple backends.  To make it easy for a single backend
      to enable THP support, the THP check is put in the backend frontswap
      store functions instead of the general interfaces.
      
      Link: http://lkml.kernel.org/r/20180209084947.22749-1-ying.huang@intel.com
      Fixes: bd4c82c2 ("mm, THP, swap: delay splitting THP after swapped out")
      Signed-off-by: "Huang, Ying" <ying.huang@intel.com>
      Reported-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Tested-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Suggested-by: Minchan Kim <minchan@kernel.org>	[put THP checking in backend]
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Dan Streetman <ddstreet@ieee.org>
      Cc: Seth Jennings <sjenning@redhat.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Shaohua Li <shli@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: <stable@vger.kernel.org>	[4.14]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ba71669
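The approach taken in the frontswap commit above (the backend store function refuses a THP, so the page goes to the swap device) can be sketched in userspace. `struct page_sketch`, `backend_store` and `swap_writepage_sketch` are hypothetical, and the `-EINVAL` return value is an assumption chosen for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical page descriptor: only the property that matters here. */
struct page_sketch {
    bool trans_huge;   /* true for a transparent huge page */
};

/* Backend store function: refuses a THP instead of saving only its head. */
static int backend_store(const struct page_sketch *page)
{
    if (page->trans_huge)
        return -EINVAL;
    return 0;   /* a normal page would be compressed and kept here */
}

enum swap_target { SWAP_TO_FRONTSWAP, SWAP_TO_DEVICE };

/* Swap-out path: fall back to the swap device when the backend refuses. */
static enum swap_target swap_writepage_sketch(const struct page_sketch *page)
{
    if (backend_store(page) == 0)
        return SWAP_TO_FRONTSWAP;
    return SWAP_TO_DEVICE;
}
```

Keeping the check inside the backend rather than in the common path matches the reasoning in the commit message: a backend that later learns to store a whole THP can drop its own check without touching the general interface.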
    • L
      RDMA/uverbs: Fix kernel panic while using XRC_TGT QP type · f4576587
      Leon Romanovsky authored
      An attempt to modify an XRC_TGT QP from user space (by invoking
      ibv_xsrq_pingpong) triggers the following kernel panic. It is caused
      by the fact that such QPs missed uobject initialization.
      
      [   17.408845] BUG: unable to handle kernel NULL pointer dereference at 0000000000000048
      [   17.412645] IP: rdma_lookup_put_uobject+0x9/0x50
      [   17.416567] PGD 0 P4D 0
      [   17.419262] Oops: 0000 [#1] SMP PTI
      [   17.422915] CPU: 0 PID: 455 Comm: ibv_xsrq_pingpo Not tainted 4.16.0-rc1+ #86
      [   17.424765] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5-0-ge51488c-20140602_164612-nilsson.home.kraxel.org 04/01/2014
      [   17.427399] RIP: 0010:rdma_lookup_put_uobject+0x9/0x50
      [   17.428445] RSP: 0018:ffffb8c7401e7c90 EFLAGS: 00010246
      [   17.429543] RAX: 0000000000000000 RBX: ffffb8c7401e7cf8 RCX: 0000000000000000
      [   17.432426] RDX: 0000000000000001 RSI: 0000000000000000 RDI: 0000000000000000
      [   17.437448] RBP: 0000000000000000 R08: 00000000000218f0 R09: ffffffff8ebc4cac
      [   17.440223] R10: fffff6038052cd80 R11: ffff967694b36400 R12: ffff96769391f800
      [   17.442184] R13: ffffb8c7401e7cd8 R14: 0000000000000000 R15: ffff967699f60000
      [   17.443971] FS:  00007fc29207d700(0000) GS:ffff96769fc00000(0000) knlGS:0000000000000000
      [   17.446623] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   17.448059] CR2: 0000000000000048 CR3: 000000001397a000 CR4: 00000000000006b0
      [   17.449677] Call Trace:
      [   17.450247]  modify_qp.isra.20+0x219/0x2f0
      [   17.451151]  ib_uverbs_modify_qp+0x90/0xe0
      [   17.452126]  ib_uverbs_write+0x1d2/0x3c0
      [   17.453897]  ? __handle_mm_fault+0x93c/0xe40
      [   17.454938]  __vfs_write+0x36/0x180
      [   17.455875]  vfs_write+0xad/0x1e0
      [   17.456766]  SyS_write+0x52/0xc0
      [   17.457632]  do_syscall_64+0x75/0x180
      [   17.458631]  entry_SYSCALL_64_after_hwframe+0x21/0x86
      [   17.460004] RIP: 0033:0x7fc29198f5a0
      [   17.460982] RSP: 002b:00007ffccc71f018 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      [   17.463043] RAX: ffffffffffffffda RBX: 0000000000000078 RCX: 00007fc29198f5a0
      [   17.464581] RDX: 0000000000000078 RSI: 00007ffccc71f050 RDI: 0000000000000003
      [   17.466148] RBP: 0000000000000000 R08: 0000000000000078 R09: 00007ffccc71f050
      [   17.467750] R10: 000055b6cf87c248 R11: 0000000000000246 R12: 00007ffccc71f300
      [   17.469541] R13: 000055b6cf8733a0 R14: 0000000000000000 R15: 0000000000000000
      [   17.471151] Code: 00 00 0f 1f 44 00 00 48 8b 47 48 48 8b 00 48 8b 40 10 e9 0b 8b 68 00 90 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 55 53 89 f5 <48> 8b 47 48 48 89 fb 40 0f b6 f6 48 8b 00 48 8b 40 20 e8 e0 8a
      [   17.475185] RIP: rdma_lookup_put_uobject+0x9/0x50 RSP: ffffb8c7401e7c90
      [   17.476841] CR2: 0000000000000048
      [   17.477764] ---[ end trace 1dbcc5354071a712 ]---
      [   17.478880] Kernel panic - not syncing: Fatal exception
      [   17.480277] Kernel Offset: 0xd000000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
      
      Fixes: 2f08ee36 ("RDMA/restrack: don't use uaccess_kernel()")
      Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
      Signed-off-by: Doug Ledford <dledford@redhat.com>
      f4576587
  8. 21 Feb, 2018 5 commits
  9. 20 Feb, 2018 9 commits
  10. 18 Feb, 2018 1 commit