1. 09 3月, 2018 3 次提交
    • M
      cdrom: do not call check_disk_change() inside cdrom_open() · 2bbea6e1
      Maurizio Lombardi 提交于
      when mounting an ISO filesystem sometimes (very rarely)
      the system hangs because of a race condition between two tasks.
      
      PID: 6766   TASK: ffff88007b2a6dd0  CPU: 0   COMMAND: "mount"
       #0 [ffff880078447ae0] __schedule at ffffffff8168d605
       #1 [ffff880078447b48] schedule_preempt_disabled at ffffffff8168ed49
       #2 [ffff880078447b58] __mutex_lock_slowpath at ffffffff8168c995
       #3 [ffff880078447bb8] mutex_lock at ffffffff8168bdef
       #4 [ffff880078447bd0] sr_block_ioctl at ffffffffa00b6818 [sr_mod]
       #5 [ffff880078447c10] blkdev_ioctl at ffffffff812fea50
       #6 [ffff880078447c70] ioctl_by_bdev at ffffffff8123a8b3
       #7 [ffff880078447c90] isofs_fill_super at ffffffffa04fb1e1 [isofs]
       #8 [ffff880078447da8] mount_bdev at ffffffff81202570
       #9 [ffff880078447e18] isofs_mount at ffffffffa04f9828 [isofs]
      #10 [ffff880078447e28] mount_fs at ffffffff81202d09
      #11 [ffff880078447e70] vfs_kern_mount at ffffffff8121ea8f
      #12 [ffff880078447ea8] do_mount at ffffffff81220fee
      #13 [ffff880078447f28] sys_mount at ffffffff812218d6
      #14 [ffff880078447f80] system_call_fastpath at ffffffff81698c49
          RIP: 00007fd9ea914e9a  RSP: 00007ffd5d9bf648  RFLAGS: 00010246
          RAX: 00000000000000a5  RBX: ffffffff81698c49  RCX: 0000000000000010
          RDX: 00007fd9ec2bc210  RSI: 00007fd9ec2bc290  RDI: 00007fd9ec2bcf30
          RBP: 0000000000000000   R8: 0000000000000000   R9: 0000000000000010
          R10: 00000000c0ed0001  R11: 0000000000000206  R12: 00007fd9ec2bc040
          R13: 00007fd9eb6b2380  R14: 00007fd9ec2bc210  R15: 00007fd9ec2bcf30
          ORIG_RAX: 00000000000000a5  CS: 0033  SS: 002b
      
      This task was trying to mount the cdrom.  It allocated and configured a
      super_block struct and owned the write-lock for the super_block->s_umount
      rwsem. While exclusively owning the s_umount lock, it called
      sr_block_ioctl and waited to acquire the global sr_mutex lock.
      
      PID: 6785   TASK: ffff880078720fb0  CPU: 0   COMMAND: "systemd-udevd"
       #0 [ffff880078417898] __schedule at ffffffff8168d605
       #1 [ffff880078417900] schedule at ffffffff8168dc59
       #2 [ffff880078417910] rwsem_down_read_failed at ffffffff8168f605
       #3 [ffff880078417980] call_rwsem_down_read_failed at ffffffff81328838
       #4 [ffff8800784179d0] down_read at ffffffff8168cde0
       #5 [ffff8800784179e8] get_super at ffffffff81201cc7
       #6 [ffff880078417a10] __invalidate_device at ffffffff8123a8de
       #7 [ffff880078417a40] flush_disk at ffffffff8123a94b
       #8 [ffff880078417a88] check_disk_change at ffffffff8123ab50
       #9 [ffff880078417ab0] cdrom_open at ffffffffa00a29e1 [cdrom]
      #10 [ffff880078417b68] sr_block_open at ffffffffa00b6f9b [sr_mod]
      #11 [ffff880078417b98] __blkdev_get at ffffffff8123ba86
      #12 [ffff880078417bf0] blkdev_get at ffffffff8123bd65
      #13 [ffff880078417c78] blkdev_open at ffffffff8123bf9b
      #14 [ffff880078417c90] do_dentry_open at ffffffff811fc7f7
      #15 [ffff880078417cd8] vfs_open at ffffffff811fc9cf
      #16 [ffff880078417d00] do_last at ffffffff8120d53d
      #17 [ffff880078417db0] path_openat at ffffffff8120e6b2
      #18 [ffff880078417e48] do_filp_open at ffffffff8121082b
      #19 [ffff880078417f18] do_sys_open at ffffffff811fdd33
      #20 [ffff880078417f70] sys_open at ffffffff811fde4e
      #21 [ffff880078417f80] system_call_fastpath at ffffffff81698c49
          RIP: 00007f29438b0c20  RSP: 00007ffc76624b78  RFLAGS: 00010246
          RAX: 0000000000000002  RBX: ffffffff81698c49  RCX: 0000000000000000
          RDX: 00007f2944a5fa70  RSI: 00000000000a0800  RDI: 00007f2944a5fa70
          RBP: 00007f2944a5f540   R8: 0000000000000000   R9: 0000000000000020
          R10: 00007f2943614c40  R11: 0000000000000246  R12: ffffffff811fde4e
          R13: ffff880078417f78  R14: 000000000000000c  R15: 00007f2944a4b010
          ORIG_RAX: 0000000000000002  CS: 0033  SS: 002b
      
      This task tried to open the cdrom device, the sr_block_open function
      acquired the global sr_mutex lock. The call to check_disk_change()
      then saw an event flag indicating a possible media change and tried
      to flush any cached data for the device.
      As part of the flush, it tried to acquire the super_block->s_umount
      lock associated with the cdrom device.
      This was the same super_block as created and locked by the previous task.
      
      The first task acquires the s_umount lock and then the sr_mutex_lock;
      the second task acquires the sr_mutex_lock and then the s_umount lock.
      
      This patch fixes the issue by moving check_disk_change() out of
      cdrom_open() and let the caller take care of it.
      Signed-off-by: NMaurizio Lombardi <mlombard@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      2bbea6e1
    • B
      block: Use blk_queue_flag_*() in drivers instead of queue_flag_*() · 8b904b5b
      Bart Van Assche 提交于
      This patch has been generated as follows:
      
      for verb in set_unlocked clear_unlocked set clear; do
        replace-in-files queue_flag_${verb} blk_queue_flag_${verb%_unlocked} \
          $(git grep -lw queue_flag_${verb} drivers block/bsg*)
      done
      
      Except for protecting all queue flag changes with the queue lock
      this patch does not change any functionality.
      
      Cc: Mike Snitzer <snitzer@redhat.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Reviewed-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      8b904b5b
    • B
      iscsi: Use blk_queue_flag_set() · bc74c33e
      Bart Van Assche 提交于
      Use blk_queue_flag_set() instead of open-coding this function.
      
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Ming Lei <ming.lei@redhat.com>
      Acked-by: NMartin K. Petersen <martin.petersen@oracle.com>
      Reviewed-by: NJohannes Thumshirn <jthumshirn@suse.de>
      Signed-off-by: NBart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      bc74c33e
  2. 01 3月, 2018 1 次提交
  3. 12 2月, 2018 1 次提交
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds 提交于
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But they keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a9a08845
  4. 31 1月, 2018 1 次提交
    • M
      blk-mq: introduce BLK_STS_DEV_RESOURCE · 86ff7c2a
      Ming Lei 提交于
      This status is returned from driver to block layer if device related
      resource is unavailable, but driver can guarantee that IO dispatch
      will be triggered in future when the resource is available.
      
      Convert some drivers to return BLK_STS_DEV_RESOURCE.  Also, if driver
      returns BLK_STS_RESOURCE and SCHED_RESTART is set, rerun queue after
      a delay (BLK_MQ_DELAY_QUEUE) to avoid IO stalls.  BLK_MQ_DELAY_QUEUE is
      3 ms because both scsi-mq and nvmefc are using that magic value.
      
      If a driver can make sure there is in-flight IO, it is safe to return
      BLK_STS_DEV_RESOURCE because:
      
      1) If all in-flight IOs complete before examining SCHED_RESTART in
      blk_mq_dispatch_rq_list(), SCHED_RESTART must be cleared, so queue
      is run immediately in this case by blk_mq_dispatch_rq_list();
      
      2) if there is any in-flight IO after/when examining SCHED_RESTART
      in blk_mq_dispatch_rq_list():
      - if SCHED_RESTART isn't set, queue is run immediately as handled in 1)
      - otherwise, this request will be dispatched after any in-flight IO is
        completed via blk_mq_sched_restart()
      
      3) if SCHED_RESTART is set concurently in context because of
      BLK_STS_RESOURCE, blk_mq_delay_run_hw_queue() will cover the above two
      cases and make sure IO hang can be avoided.
      
      One invariant is that queue will be rerun if SCHED_RESTART is set.
      Suggested-by: NJens Axboe <axboe@kernel.dk>
      Tested-by: NLaurence Oberman <loberman@redhat.com>
      Signed-off-by: NMing Lei <ming.lei@redhat.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@kernel.dk>
      86ff7c2a
  5. 23 1月, 2018 15 次提交
  6. 17 1月, 2018 10 次提交
  7. 16 1月, 2018 1 次提交
    • D
      scsi: Define usercopy region in scsi_sense_cache slab cache · 0afe76e8
      David Windsor 提交于
      SCSI sense buffers, stored in struct scsi_cmnd.sense and therefore
      contained in the scsi_sense_cache slab cache, need to be copied to/from
      userspace.
      
      cache object allocation:
          drivers/scsi/scsi_lib.c:
              scsi_select_sense_cache(...):
                  return ... ? scsi_sense_isadma_cache : scsi_sense_cache
      
              scsi_alloc_sense_buffer(...):
                  return kmem_cache_alloc_node(scsi_select_sense_cache(), ...);
      
              scsi_init_request(...):
                  ...
                  cmd->sense_buffer = scsi_alloc_sense_buffer(...);
                  ...
                  cmd->req.sense = cmd->sense_buffer
      
      example usage trace:
      
          block/scsi_ioctl.c:
              (inline from sg_io)
              blk_complete_sghdr_rq(...):
                  struct scsi_request *req = scsi_req(rq);
                  ...
                  copy_to_user(..., req->sense, len)
      
              scsi_cmd_ioctl(...):
                  sg_io(...);
      
      In support of usercopy hardening, this patch defines a region in
      the scsi_sense_cache slab cache in which userspace copy operations
      are allowed.
      
      This region is known as the slab cache's usercopy region. Slab caches
      can now check that each dynamically sized copy operation involving
      cache-managed memory falls entirely within the slab's usercopy region.
      Signed-off-by: NDavid Windsor <dave@nullcore.net>
      [kees: adjust commit log, provide usage trace]
      Cc: "James E.J. Bottomley" <jejb@linux.vnet.ibm.com>
      Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
      Cc: linux-scsi@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      0afe76e8
  8. 11 1月, 2018 8 次提交