1. 22 3月, 2019 1 次提交
  2. 13 2月, 2019 6 次提交
    • F
      block/swim3: Fix -EBUSY error when re-opening device after unmount · 295b3e2a
      Finn Thain 提交于
      [ Upstream commit 296dcc40f2f2e402facf7cd26cf3f2c8f4b17d47 ]
      
      When the block device is opened with FMODE_EXCL, ref_count is set to -1.
      This value doesn't get reset when the device is closed which means the
      device cannot be opened again. Fix this by checking for refcount <= 0
      in the release method.
      Reported-and-tested-by: NStan Johnson <userm57@yahoo.com>
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Cc: linuxppc-dev@lists.ozlabs.org
      Signed-off-by: NFinn Thain <fthain@telegraphics.com.au>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      295b3e2a
    • M
      zram: fix lockdep warning of free block handling · 3b3ee499
      Minchan Kim 提交于
      [ Upstream commit 3c9959e025472122a61faebb208525cf26b305d1 ]
      
      Patch series "zram idle page writeback", v3.
      
      Inherently, swap device has many idle pages which are rare touched since
      it was allocated.  It is never problem if we use storage device as swap.
      However, it's just waste for zram-swap.
      
      This patchset supports zram idle page writeback feature.
      
      * Admin can define what is idle page "no access since X time ago"
      * Admin can define when zram should writeback them
      * Admin can define when zram should stop writeback to prevent wearout
      
      Details are in each patch's description.
      
      This patch (of 7):
      
        ================================
        WARNING: inconsistent lock state
        4.19.0+ #390 Not tainted
        --------------------------------
        inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
        zram_verify/2095 [HC0[0]:SC1[1]:HE1:SE0] takes:
        00000000b1828693 (&(&zram->bitmap_lock)->rlock){+.?.}, at: put_entry_bdev+0x1e/0x50
        {SOFTIRQ-ON-W} state was registered at:
          _raw_spin_lock+0x2c/0x40
          zram_make_request+0x755/0xdc9
          generic_make_request+0x373/0x6a0
          submit_bio+0x6c/0x140
          __swap_writepage+0x3a8/0x480
          shrink_page_list+0x1102/0x1a60
          shrink_inactive_list+0x21b/0x3f0
          shrink_node_memcg.constprop.99+0x4f8/0x7e0
          shrink_node+0x7d/0x2f0
          do_try_to_free_pages+0xe0/0x300
          try_to_free_pages+0x116/0x2b0
          __alloc_pages_slowpath+0x3f4/0xf80
          __alloc_pages_nodemask+0x2a2/0x2f0
          __handle_mm_fault+0x42e/0xb50
          handle_mm_fault+0x55/0xb0
          __do_page_fault+0x235/0x4b0
          page_fault+0x1e/0x30
        irq event stamp: 228412
        hardirqs last  enabled at (228412): [<ffffffff98245846>] __slab_free+0x3e6/0x600
        hardirqs last disabled at (228411): [<ffffffff98245625>] __slab_free+0x1c5/0x600
        softirqs last  enabled at (228396): [<ffffffff98e0031e>] __do_softirq+0x31e/0x427
        softirqs last disabled at (228403): [<ffffffff98072051>] irq_exit+0xd1/0xe0
      
        other info that might help us debug this:
         Possible unsafe locking scenario:
      
               CPU0
               ----
          lock(&(&zram->bitmap_lock)->rlock);
          <Interrupt>
            lock(&(&zram->bitmap_lock)->rlock);
      
         *** DEADLOCK ***
      
        no locks held by zram_verify/2095.
      
        stack backtrace:
        CPU: 5 PID: 2095 Comm: zram_verify Not tainted 4.19.0+ #390
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014
        Call Trace:
         <IRQ>
         dump_stack+0x67/0x9b
         print_usage_bug+0x1bd/0x1d3
         mark_lock+0x4aa/0x540
         __lock_acquire+0x51d/0x1300
         lock_acquire+0x90/0x180
         _raw_spin_lock+0x2c/0x40
         put_entry_bdev+0x1e/0x50
         zram_free_page+0xf6/0x110
         zram_slot_free_notify+0x42/0xa0
         end_swap_bio_read+0x5b/0x170
         blk_update_request+0x8f/0x340
         scsi_end_request+0x2c/0x1e0
         scsi_io_completion+0x98/0x650
         blk_done_softirq+0x9e/0xd0
         __do_softirq+0xcc/0x427
         irq_exit+0xd1/0xe0
         do_IRQ+0x93/0x120
         common_interrupt+0xf/0xf
         </IRQ>
      
      With writeback feature, zram_slot_free_notify could be called in softirq
      context by end_swap_bio_read.  However, bitmap_lock is not aware of that
      so lockdep yell out:
      
        get_entry_bdev
        spin_lock(bitmap->lock);
        irq
        softirq
        end_swap_bio_read
        zram_slot_free_notify
        zram_slot_lock <-- deadlock prone
        zram_free_page
        put_entry_bdev
        spin_lock(bitmap->lock); <-- deadlock prone
      
      With akpm's suggestion (i.e.  bitmap operation is already atomic), we
      could remove bitmap lock.  It might fail to find a empty slot if serious
      contention happens.  However, it's not severe problem because huge page
      writeback has already possiblity to fail if there is severe memory
      pressure.  Worst case is just keeping the incompressible in memory, not
      storage.
      
      The other problem is zram_slot_lock in zram_slot_slot_free_notify.  To
      make it safe is this patch introduces zram_slot_trylock where
      zram_slot_free_notify uses it.  Although it's rare to be contented, this
      patch adds new debug stat "miss_free" to keep monitoring how often it
      happens.
      
      Link: http://lkml.kernel.org/r/20181127055429.251614-2-minchan@kernel.orgSigned-off-by: NMinchan Kim <minchan@kernel.org>
      Reviewed-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Reviewed-by: NJoey Pabalinas <joeypabalinas@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3b3ee499
    • L
      drbd: skip spurious timeout (ping-timeo) when failing promote · 66345d53
      Lars Ellenberg 提交于
      [ Upstream commit 9848b6ddd8c92305252f94592c5e278574e7a6ac ]
      
      If you try to promote a Secondary while connected to a Primary
      and allow-two-primaries is NOT set, we will wait for "ping-timeout"
      to give this node a chance to detect a dead primary,
      in case the cluster manager noticed faster than we did.
      
      But if we then are *still* connected to a Primary,
      we fail (after an additional timeout of ping-timout).
      
      This change skips the spurious second timeout.
      
      Most people won't notice really,
      since "ping-timeout" by default is half a second.
      
      But in some installations, ping-timeout may be 10 or 20 seconds or more,
      and spuriously delaying the error return becomes annoying.
      Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      66345d53
    • L
      drbd: disconnect, if the wrong UUIDs are attached on a connected peer · af70af5b
      Lars Ellenberg 提交于
      [ Upstream commit b17b59602b6dcf8f97a7dc7bc489a48388d7063a ]
      
      With "on-no-data-accessible suspend-io", DRBD requires the next attach
      or connect to be to the very same data generation uuid tag it lost last.
      
      If we first lost connection to the peer,
      then later lost connection to our own disk,
      we would usually refuse to re-connect to the peer,
      because it presents the wrong data set.
      
      However, if the peer first connects without a disk,
      and then attached its disk, we accepted that same wrong data set,
      which would be "unexpected" by any user of that DRBD
      and cause "undefined results" (read: very likely data corruption).
      
      The fix is to forcefully disconnect as soon as we notice that the peer
      attached to the "wrong" dataset.
      Signed-off-by: NLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      af70af5b
    • R
      drbd: narrow rcu_read_lock in drbd_sync_handshake · 3d67b428
      Roland Kammerer 提交于
      [ Upstream commit d29e89e34952a9ad02c77109c71a80043544296e ]
      
      So far there was the possibility that we called
      genlmsg_new(GFP_NOIO)/mutex_lock() while holding an rcu_read_lock().
      
      This included cases like:
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            drbd_bcast_event
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              genlmsg_new(GFP_NOIO) --> may sleep
      
      drbd_sync_handshake (acquire the RCU lock)
        drbd_asb_recover_1p
          drbd_khelper
            notify_helper
              mutex_lock --> may sleep
      
      While using GFP_ATOMIC whould have been possible in the first two cases,
      the real fix is to narrow the rcu_read_lock.
      Reported-by: NJia-Ju Bai <baijiaju1990@163.com>
      Reviewed-by: NLars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NRoland Kammerer <roland.kammerer@linbit.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3d67b428
    • Y
      sunvdc: Do not spin in an infinite loop when vio_ldc_send() returns EAGAIN · 3fbba4e5
      Young Xiao 提交于
      [ Upstream commit a11f6ca9aef989b56cd31ff4ee2af4fb31a172ec ]
      
      __vdc_tx_trigger should only loop on EAGAIN a finite
      number of times.
      
      See commit adddc32d ("sunvnet: Do not spin in an
      infinite loop when vio_ldc_send() returns EAGAIN") for detail.
      Signed-off-by: NYoung Xiao <YangX92@hotmail.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      3fbba4e5
  3. 23 1月, 2019 19 次提交
  4. 17 1月, 2019 1 次提交
  5. 13 1月, 2019 1 次提交
  6. 01 12月, 2018 1 次提交
    • J
      floppy: fix race condition in __floppy_read_block_0() · 73fd491d
      Jens Axboe 提交于
      [ Upstream commit de7b75d8 ]
      
      LKP recently reported a hang at bootup in the floppy code:
      
      [  245.678853] INFO: task mount:580 blocked for more than 120 seconds.
      [  245.679906]       Tainted: G                T 4.19.0-rc6-00172-ga9f38e1d #1
      [  245.680959] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
      [  245.682181] mount           D 6372   580      1 0x00000004
      [  245.683023] Call Trace:
      [  245.683425]  __schedule+0x2df/0x570
      [  245.683975]  schedule+0x2d/0x80
      [  245.684476]  schedule_timeout+0x19d/0x330
      [  245.685090]  ? wait_for_common+0xa5/0x170
      [  245.685735]  wait_for_common+0xac/0x170
      [  245.686339]  ? do_sched_yield+0x90/0x90
      [  245.686935]  wait_for_completion+0x12/0x20
      [  245.687571]  __floppy_read_block_0+0xfb/0x150
      [  245.688244]  ? floppy_resume+0x40/0x40
      [  245.688844]  floppy_revalidate+0x20f/0x240
      [  245.689486]  check_disk_change+0x43/0x60
      [  245.690087]  floppy_open+0x1ea/0x360
      [  245.690653]  __blkdev_get+0xb4/0x4d0
      [  245.691212]  ? blkdev_get+0x1db/0x370
      [  245.691777]  blkdev_get+0x1f3/0x370
      [  245.692351]  ? path_put+0x15/0x20
      [  245.692871]  ? lookup_bdev+0x4b/0x90
      [  245.693539]  blkdev_get_by_path+0x3d/0x80
      [  245.694165]  mount_bdev+0x2a/0x190
      [  245.694695]  squashfs_mount+0x10/0x20
      [  245.695271]  ? squashfs_alloc_inode+0x30/0x30
      [  245.695960]  mount_fs+0xf/0x90
      [  245.696451]  vfs_kern_mount+0x43/0x130
      [  245.697036]  do_mount+0x187/0xc40
      [  245.697563]  ? memdup_user+0x28/0x50
      [  245.698124]  ksys_mount+0x60/0xc0
      [  245.698639]  sys_mount+0x19/0x20
      [  245.699167]  do_int80_syscall_32+0x61/0x130
      [  245.699813]  entry_INT80_32+0xc7/0xc7
      
      showing that we never complete that read request. The reason is that
      the completion setup is racy - it initializes the completion event
      AFTER submitting the IO, which means that the IO could complete
      before/during the init. If it does, we are passing garbage to
      complete() and we may sleep forever waiting for the event to
      occur.
      
      Fixes: 7b7b68bb ("floppy: bail out in open() if drive is not responding to block0 read")
      Reviewed-by: NOmar Sandoval <osandov@fb.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      73fd491d
  7. 27 11月, 2018 1 次提交
    • M
      block: brd: associate with queue until adding disk · 0987d5a6
      Ming Lei 提交于
      [ Upstream commit 153fcd5f6d93b8e1e4040b1337f564a10f8d93af ]
      
      brd_free() may be called in failure path on one brd instance which
      disk isn't added yet, so release handler of gendisk may free the
      associated request_queue early and causes the following use-after-free[1].
      
      This patch fixes this issue by associating gendisk with request_queue
      just before adding disk.
      
      [1] KASAN: use-after-free Read in del_timer_syncNon-volatile memory driver v1.3
      Linux agpgart interface v0.103
      [drm] Initialized vgem 1.0.0 20120112 for virtual device on minor 0
      usbcore: registered new interface driver udl
      ==================================================================
      BUG: KASAN: use-after-free in __lock_acquire+0x36d9/0x4c20
      kernel/locking/lockdep.c:3218
      Read of size 8 at addr ffff8801d1b6b540 by task swapper/0/1
      
      CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.19.0+ #88
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
      Google 01/01/2011
      Call Trace:
        __dump_stack lib/dump_stack.c:77 [inline]
        dump_stack+0x244/0x39d lib/dump_stack.c:113
        print_address_description.cold.7+0x9/0x1ff mm/kasan/report.c:256
        kasan_report_error mm/kasan/report.c:354 [inline]
        kasan_report.cold.8+0x242/0x309 mm/kasan/report.c:412
        __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
        __lock_acquire+0x36d9/0x4c20 kernel/locking/lockdep.c:3218
        lock_acquire+0x1ed/0x520 kernel/locking/lockdep.c:3844
        del_timer_sync+0xb7/0x270 kernel/time/timer.c:1283
        blk_cleanup_queue+0x413/0x710 block/blk-core.c:809
        brd_free+0x5d/0x71 drivers/block/brd.c:422
        brd_init+0x2eb/0x393 drivers/block/brd.c:518
        do_one_initcall+0x145/0x957 init/main.c:890
        do_initcall_level init/main.c:958 [inline]
        do_initcalls init/main.c:966 [inline]
        do_basic_setup init/main.c:984 [inline]
        kernel_init_freeable+0x5c6/0x6b9 init/main.c:1148
        kernel_init+0x11/0x1ae init/main.c:1068
        ret_from_fork+0x3a/0x50 arch/x86/entry/entry_64.S:350
      
      Reported-by: syzbot+3701447012fe951dabb2@syzkaller.appspotmail.com
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      0987d5a6
  8. 21 11月, 2018 1 次提交
  9. 14 11月, 2018 4 次提交
  10. 09 10月, 2018 1 次提交
  11. 28 9月, 2018 2 次提交
  12. 27 9月, 2018 1 次提交
  13. 20 9月, 2018 1 次提交
    • A
      floppy: Do not copy a kernel pointer to user memory in FDGETPRM ioctl · 65eea8ed
      Andy Whitcroft 提交于
      The final field of a floppy_struct is the field "name", which is a pointer
      to a string in kernel memory.  The kernel pointer should not be copied to
      user memory.  The FDGETPRM ioctl copies a floppy_struct to user memory,
      including this "name" field.  This pointer cannot be used by the user
      and it will leak a kernel address to user-space, which will reveal the
      location of kernel code and data and undermine KASLR protection.
      
      Model this code after the compat ioctl which copies the returned data
      to a previously cleared temporary structure on the stack (excluding the
      name pointer) and copy out to userspace from there.  As we already have
      an inparam union with an appropriate member and that memory is already
      cleared even for read only calls make use of that as a temporary store.
      
      Based on an initial patch by Brian Belleville.
      
      CVE-2018-7755
      Signed-off-by: NAndy Whitcroft <apw@canonical.com>
      
      Broke up long line.
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      65eea8ed