1. 11 Oct, 2017 3 commits
  2. 04 Oct, 2017 3 commits
    • bsg-lib: fix use-after-free under memory-pressure · eab40cf3
      Benjamin Block committed
      When under memory-pressure it is possible that the mempool which backs
      the 'struct request_queue' will make use of up to BLKDEV_MIN_RQ count
      emergency buffers - in case it can't get a regular allocation. These
      buffers are preallocated and once they are also used, they are
      re-supplied with old finished requests from the same request_queue (see
      mempool_free()).
      
      The bug is that, when the emergency pool is re-supplied, the old
      requests are not run through the callback mempool_t->alloc() again, and
      thus also not through the callback bsg_init_rq(). Initialization is
      therefore skipped, and while the sense-buffer should still be good,
      scsi_request->cmd might have become an invalid pointer in the meantime.
      When the request is initialized in bsg.c and the user's CDB is larger
      than BLK_MAX_CDB, bsg replaces it with a custom allocated buffer, which
      is freed when the user's command is finished, so it dangles afterwards.
      When the user next sends a command whose CDB is no larger than
      BLK_MAX_CDB, bsg assumes that scsi_request->cmd is backed by
      scsi_request->__cmd, does not make a custom allocation, and writes into
      undefined memory.
      
      Fix this by splitting bsg_init_rq() into two functions:
       - bsg_init_rq() is changed to only do the allocation of the
         sense-buffer, which is used to back the bsg job's reply buffer. This
         pointer should never change during the lifetime of a scsi_request, so
         it doesn't need re-initialization.
       - bsg_initialize_rq() is a new function that makes use of
         'struct request_queue's initialize_rq_fn callback (which was
         introduced in v4.12). This is always called before the request is
         given out via blk_get_request(). This function does the remaining
         initialization that was previously done in bsg_init_rq(), and will
         also do it when the request is taken from the emergency-pool of the
         backing mempool.
      
      Fixes: 50b4d485 ("bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer")
      Cc: <stable@vger.kernel.org> # 4.11+
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
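The split described in this commit can be illustrated with a minimal userspace sketch. It is not the kernel code: `struct demo_req`, `demo_init_rq()`, `demo_initialize_rq()` and `demo_send()` are hypothetical stand-ins, and `DEMO_MAX_CDB` plays the role of BLK_MAX_CDB. The point is only that the one-time allocation and the per-hand-out reset live in different functions.

```c
#include <stdlib.h>
#include <string.h>

#define DEMO_MAX_CDB 16 /* stand-in for BLK_MAX_CDB (assumed value) */

/* Hypothetical, simplified stand-in for scsi_request. */
struct demo_req {
	unsigned char *cmd;                      /* inline_cmd or a custom buffer */
	unsigned char inline_cmd[DEMO_MAX_CDB];  /* storage for short CDBs */
	void *sense;                             /* reply buffer, allocated once */
};

/* Runs once per pooled object: only allocates the long-lived buffer. */
static int demo_init_rq(struct demo_req *rq)
{
	rq->sense = calloc(1, 96);
	return rq->sense ? 0 : -1;
}

/* Runs before every hand-out: resets everything except the sense buffer. */
static void demo_initialize_rq(struct demo_req *rq)
{
	void *sense = rq->sense;

	memset(rq, 0, sizeof(*rq));
	rq->sense = sense;
	rq->cmd = rq->inline_cmd;
}

/* Send one command: large CDBs get a custom buffer that is freed at the end. */
static int demo_send(struct demo_req *rq, size_t cdb_len)
{
	if (cdb_len > DEMO_MAX_CDB) {
		rq->cmd = malloc(cdb_len);
		if (!rq->cmd)
			return -1;
	}
	memset(rq->cmd, 0xAB, cdb_len);
	if (rq->cmd != rq->inline_cmd)
		free(rq->cmd); /* rq->cmd dangles until the next re-initialization */
	return 0;
}
```

Calling `demo_initialize_rq()` before each `demo_send()` guarantees that a short CDB is written into `inline_cmd` even when the previous command used a custom buffer; skipping that call is the pattern this commit removes.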
    • blk-mq-debugfs: fix device sched directory for default scheduler · 70e62f4b
      Omar Sandoval committed
      In blk_mq_debugfs_register(), I remembered to set up the per-hctx sched
      directories if a default scheduler was already configured by
      blk_mq_sched_init() from blk_mq_init_allocated_queue(), but I didn't do
      the same for the device-wide sched directory. Fix it.
      
      Fixes: d332ce09 ("blk-mq-debugfs: allow schedulers to register debugfs attributes")
      Signed-off-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-throttle: fix possible io stall when upgrade to max · 4f02fb76
      Joseph Qi committed
      There is a case that leads to an io stall. It is described as
      follows.
      /test1
        |-subtest1
      /test2
        |-subtest2
      subtest1 and subtest2 each already have 32 queued bios.
      
      Now upgrade to max. In throtl_upgrade_state, it will try to dispatch
      bios as follows:
      1) tg=subtest1, do nothing;
      2) tg=test1, transfer 32 queued bios from subtest1 to test1; no pending
      left, no need to schedule next dispatch;
      3) tg=subtest2, do nothing;
      4) tg=test2, transfer 32 queued bios from subtest2 to test2; no pending
      left, no need to schedule next dispatch;
      5) tg=/, transfer 8 queued bios from test1 to /, 8 queued bios from
      test2 to /, 8 queued bios from test1 to /, and 8 queued bios from test2
      to /; note that test1 and test2 each still has 16 queued bios left;
      6) tg=/, try to schedule next dispatch, but since disptime is now
      (updated in tg_update_disptime, wait=0), the pending timer is not
      actually scheduled;
      7) In total, throtl_upgrade_state dispatches 32 queued bios and leaves
      32 queued: test1 and test2 each have 16 queued bios;
      8) throtl_pending_timer_fn sees the leftover bios, but can do
      nothing, because throtl_select_dispatch returns 0, and test1/test2 have
      no pending tg.
      
      The blktrace shows the following:
      8,32   0        0     2.539007641     0  m   N throtl upgrade to max
      8,32   0        0     2.539072267     0  m   N throtl /test2 dispatch nr_queued=16 read=0 write=16
      8,32   7        0     2.539077142     0  m   N throtl /test1 dispatch nr_queued=16 read=0 write=16
      
      So force schedule dispatch if there are pending children.
      Reviewed-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Joseph Qi <qijiang.qj@alibaba-inc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
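The arithmetic in steps 5) to 8) can be replayed with a small model. This is a simplified sketch, not blk-throttle itself: the names are hypothetical, and the two quanta (8 bios per child per turn, 32 bios per round) are taken from the numbers in the message above.

```c
#define DEMO_GRP_QUANTUM 8    /* bios taken from one child per turn */
#define DEMO_ROUND_QUANTUM 32 /* bios moved to the root per round */

/* Hypothetical stand-in for a throttle group with queued bios. */
struct demo_tg {
	int nr_queued;
};

/* One dispatch round at the root: returns the number of bios moved up. */
static int demo_dispatch_round(struct demo_tg *kids, int n)
{
	int moved = 0;
	int progress = 1;

	while (moved < DEMO_ROUND_QUANTUM && progress) {
		progress = 0;
		for (int i = 0; i < n && moved < DEMO_ROUND_QUANTUM; i++) {
			int take = kids[i].nr_queued;

			if (take > DEMO_GRP_QUANTUM)
				take = DEMO_GRP_QUANTUM;
			if (take > DEMO_ROUND_QUANTUM - moved)
				take = DEMO_ROUND_QUANTUM - moved;
			kids[i].nr_queued -= take;
			moved += take;
			progress += take;
		}
	}
	return moved;
}

/*
 * Decide whether the pending timer is armed after a round. Without
 * 'force' the timer is armed only if disptime lies in the future,
 * which is not the case in the scenario above (disptime == now).
 */
static int demo_timer_armed(const struct demo_tg *kids, int n,
			    long disptime, long now, int force)
{
	int pending = 0;

	for (int i = 0; i < n; i++)
		pending += kids[i].nr_queued;
	if (!pending)
		return 0;
	return force || disptime > now;
}
```

With two children holding 32 bios each, one round moves 32 bios and leaves 16 in each child; with `disptime == now` the timer is armed only when `force` is set, which corresponds to the forced dispatch this commit introduces.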
  3. 25 Sep, 2017 3 commits
    • block: fix a crash caused by wrong API · f5c156c4
      Shaohua Li committed
      part_stat_show takes a part device, not a disk, so we should use
      part_to_disk.
      
      Fixes: d62e26b3 ("block: pass in queue to inflight accounting")
      Cc: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blktrace: Fix potential deadlock between delete & sysfs ops · 5acb3cc2
      Waiman Long committed
      The lockdep code had reported the following unsafe locking scenario:
      
             CPU0                    CPU1
             ----                    ----
        lock(s_active#228);
                                     lock(&bdev->bd_mutex/1);
                                     lock(s_active#228);
        lock(&bdev->bd_mutex);
      
       *** DEADLOCK ***
      
      The deadlock may happen when one task (CPU1) is trying to delete a
      partition in a block device and another task (CPU0) is accessing
      tracing sysfs file (e.g. /sys/block/dm-1/trace/act_mask) in that
      partition.
      
      The s_active isn't an actual lock. It is a reference count (kn->count)
      on the sysfs (kernfs) file. Removal of a sysfs file, however, requires
      a wait until all the references are gone. The reference count is
      treated like a rwsem using lockdep instrumentation code.
      
      The fact that a thread is in the sysfs callback method or in the
      ioctl call means there is a reference to the opened sysfs or device
      file. That should prevent the underlying block structure from being
      removed.
      
      Instead of using bd_mutex in the block_device structure, a new
      blk_trace_mutex is now added to the request_queue structure to protect
      access to the blk_trace structure.
      Suggested-by: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Waiman Long <longman@redhat.com>
      Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      
      Fix typo in patch subject line, and prune a comment detailing how
      the code used to work.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bsg-lib: don't free job in bsg_prepare_job · f507b54d
      Christoph Hellwig committed
      The job structure is allocated as part of the request, so we should not
      free it in the error path of bsg_prepare_job.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Cc: stable@vger.kernel.org
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
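Why the job must not be freed on its own can be shown with a small userspace sketch, assuming a layout in which the job sits directly behind the request header in one allocation. All names here are hypothetical; `fail` merely simulates an error in the prepare step.

```c
#include <stdlib.h>

/* Hypothetical request header; the job is placed directly behind it. */
struct demo_request {
	void *queue;
	int tag;
};

/* Hypothetical per-request job data. */
struct demo_job {
	void *request_payload;
};

/* One allocation covers both the request header and the job. */
static struct demo_request *demo_alloc_request(void)
{
	return calloc(1, sizeof(struct demo_request) + sizeof(struct demo_job));
}

/* The job is not a separate object, just an offset into the request. */
static struct demo_job *demo_rq_to_job(struct demo_request *rq)
{
	return (struct demo_job *)(rq + 1);
}

/* Prepare step: the error path releases only what this step allocated. */
static int demo_prepare_job(struct demo_request *rq, size_t len, int fail)
{
	struct demo_job *job = demo_rq_to_job(rq);

	job->request_payload = malloc(len);
	if (!job->request_payload)
		return -1;
	if (fail) {
		free(job->request_payload);
		job->request_payload = NULL;
		/* No free(job): it lives inside the request allocation. */
		return -1;
	}
	return 0;
}
```

Passing `demo_rq_to_job(rq)` to `free()` would hand the allocator a pointer it never returned, which is the class of error the commit removes from the error path.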
  4. 12 Sep, 2017 1 commit
    • block: directly insert blk-mq request from blk_insert_cloned_request() · 157f377b
      Jens Axboe committed
      A NULL pointer crash was reported for the case of having the BFQ IO
      scheduler attached to the underlying blk-mq paths of a DM multipath
      device.  The crash occurred in blk_mq_sched_insert_request()'s call to
      e->type->ops.mq.insert_requests().
      
      Paolo Valente correctly summarized why the crash occurred with:
      "the call chain (dm_mq_queue_rq -> map_request -> setup_clone ->
      blk_rq_prep_clone) creates a cloned request without invoking
      e->type->ops.mq.prepare_request for the target elevator e.  The cloned
      request is therefore not initialized for the scheduler, but it is
      however inserted into the scheduler by blk_mq_sched_insert_request."
      
      All said, a request-based DM multipath device's IO scheduler should be
      the only one used -- when the original requests are issued to the
      underlying paths as cloned requests they are inserted directly in the
      underlying dispatch queue(s) rather than through an additional elevator.
      
      But commit bd166ef1 ("blk-mq-sched: add framework for MQ capable IO
      schedulers") switched blk_insert_cloned_request() from using
      blk_mq_insert_request() to blk_mq_sched_insert_request(), which
      incorrectly added elevator machinery into a call chain that isn't
      supposed to have any.
      
      To fix this, introduce a blk-mq private blk_mq_request_bypass_insert()
      that blk_insert_cloned_request() calls to insert the request without
      involving any elevator that may be attached to the cloned request's
      request_queue.
      
      Fixes: bd166ef1 ("blk-mq-sched: add framework for MQ capable IO schedulers")
      Cc: stable@vger.kernel.org
      Reported-by: Bart Van Assche <Bart.VanAssche@wdc.com>
      Tested-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
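The difference between the scheduler path and the bypass path can be sketched in a few lines. This is a toy model with hypothetical names, not the blk-mq code; in particular, the model returns an error where the kernel dereferenced uninitialized scheduler data and crashed.

```c
#include <stddef.h>

/* Hypothetical request: 'prepared_for_sched' models elevator-private setup. */
struct demo_rq {
	int prepared_for_sched;
	struct demo_rq *next;
};

struct demo_elevator {
	int nr_inserted;
};

struct demo_queue {
	struct demo_elevator *elevator; /* may be NULL */
	struct demo_rq *dispatch;       /* simple LIFO dispatch list */
};

/* Bypass path: put the request on the dispatch list, no elevator involved. */
static void demo_bypass_insert(struct demo_queue *q, struct demo_rq *rq)
{
	rq->next = q->dispatch;
	q->dispatch = rq;
}

/* Scheduler path: only valid for requests the elevator has prepared. */
static int demo_sched_insert(struct demo_queue *q, struct demo_rq *rq)
{
	if (!q->elevator) {
		demo_bypass_insert(q, rq);
		return 0;
	}
	if (!rq->prepared_for_sched)
		return -1; /* the real code crashed at this point */
	q->elevator->nr_inserted++;
	return 0;
}

/* Cloned requests always take the bypass path. */
static void demo_insert_cloned_request(struct demo_queue *q, struct demo_rq *rq)
{
	demo_bypass_insert(q, rq);
}
```

A clone created without the elevator's prepare step is rejected by `demo_sched_insert()` but goes straight to the dispatch list through `demo_insert_cloned_request()`, leaving the elevator's counters untouched.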
  5. 11 Sep, 2017 2 commits
  6. 09 Sep, 2017 2 commits
  7. 02 Sep, 2017 5 commits
  8. 01 Sep, 2017 1 commit
    • compat_hdio_ioctl: Fix a declaration · 8363dae2
      Bart Van Assche committed
      This patch prevents sparse from reporting the following warning messages:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  9. 31 Aug, 2017 3 commits
    • block, bfq: guarantee update_next_in_service always returns an eligible entity · 24d90bb2
      Paolo Valente committed
      If the function bfq_update_next_in_service is invoked as a consequence
      of the activation or requeueing of an entity, say E, then it doesn't
      invoke bfq_lookup_next_entity to get the next-in-service entity. In
      contrast, it follows a shorter path: if E happens to be eligible (see
      commit "bfq-sq-mq: make lookup_next_entity push up vtime on
      expirations" for details on eligibility) and to have a lower virtual
      finish time than the current candidate as next-in-service entity, then
      E directly becomes the next-in-service entity. Unfortunately, there is
      a corner case for which this shorter path makes
      bfq_update_next_in_service choose a non eligible entity: it occurs if
      both E and the current next-in-service entity happen to be non
      eligible when bfq_update_next_in_service is invoked. In this case, E
      is not set as next-in-service, and, since bfq_lookup_next_entity is
      not invoked, the state of the parent entity is not updated so as to
      end up with an eligible entity as the proper next-in-service entity.
      
      In this respect, next-in-service is actually allowed to be non
      eligible while some queue is in service: since no system-virtual-time
      push-up can be performed in that case (see again commit "bfq-sq-mq:
      make lookup_next_entity push up vtime on expirations" for details),
      next-in-service is chosen, speculatively, as a function of the
      possible value that the system virtual time may get after a push
      up. But the correctness of the schedule breaks if next-in-service is
      still a non eligible entity when it is time to set in service the next
      entity. Unfortunately, this may happen in the above corner case.
      
      This commit fixes this problem by making bfq_update_next_in_service
      invoke bfq_lookup_next_entity not only if the above shorter path
      cannot be taken, but also if the shorter path is taken but fails to
      yield an eligible next-in-service entity.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
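The shortcut and its fallback can be modelled as follows. This is a heavily simplified sketch with hypothetical names: entities carry only a virtual start and finish time, "eligible" means `start <= vtime`, plain integer comparison replaces the kernel's wraparound-safe helpers, and the full lookup is assumed to push the virtual time up when nothing is eligible.

```c
#include <stddef.h>

/* Hypothetical entity with virtual start and finish times. */
struct nis_entity {
	unsigned long start, finish;
};

static int nis_eligible(const struct nis_entity *e, unsigned long vtime)
{
	return e->start <= vtime;
}

/*
 * Full lookup: push vtime up to the smallest start time if needed, then
 * return the eligible entity with the smallest finish time.
 */
static const struct nis_entity *nis_lookup(const struct nis_entity *active,
					   size_t n, unsigned long *vtime)
{
	const struct nis_entity *best = NULL;
	unsigned long min_start;
	size_t i;

	if (n == 0)
		return NULL;
	min_start = active[0].start;
	for (i = 1; i < n; i++)
		if (active[i].start < min_start)
			min_start = active[i].start;
	if (*vtime < min_start)
		*vtime = min_start;
	for (i = 0; i < n; i++)
		if (nis_eligible(&active[i], *vtime) &&
		    (!best || active[i].finish < best->finish))
			best = &active[i];
	return best;
}

/*
 * Update after entity 'e' was activated: take the shortcut only when it
 * yields an eligible entity, otherwise fall back to the full lookup.
 */
static const struct nis_entity *nis_update(const struct nis_entity *cur,
					   const struct nis_entity *e,
					   const struct nis_entity *active,
					   size_t n, unsigned long *vtime)
{
	if (cur && nis_eligible(e, *vtime) && e->finish < cur->finish)
		return e;
	return nis_lookup(active, n, vtime);
}
```

In the corner case from the message, where both the current candidate and the new entity are non-eligible, the fallback still ends with an eligible next-in-service entity.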
    • block, bfq: remove direct switch to an entity in higher class · a02195ce
      Paolo Valente committed
      If the function bfq_update_next_in_service is invoked as a consequence
      of the activation or requeueing of an entity, say E, and finds out
      that E belongs to a higher-priority class than that of the current
      next-in-service entity, then it sets next_in_service directly to
      E. But this may lead to anomalous schedules, because E may not be
      eligible for service if its virtual start time is higher than
      the system virtual time for its service tree.
      
      This commit addresses this issue by simply removing this direct
      switch.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block, bfq: make lookup_next_entity push up vtime on expirations · 80294c3b
      Paolo Valente committed
      To provide a very smooth service, bfq starts to serve a bfq_queue
      only if the queue is 'eligible', i.e., if the same queue would
      have started to be served in the ideal, perfectly fair system that
      bfq simulates internally. This is obtained by associating each
      queue with a virtual start time, and by computing a special system
      virtual time quantity: a queue is eligible only if the system
      virtual time has reached the virtual start time of the
      queue. Finally, bfq guarantees that, when a new queue must be set
      in service, there is always at least one eligible entity for each
      active parent entity in the scheduler. To provide this guarantee,
      the function __bfq_lookup_next_entity pushes up, for each parent
      entity on which it is invoked, the system virtual time to the
      minimum among the virtual start times of the entities in the
      active tree for the parent entity (more precisely, the push up
      occurs if the system virtual time happens to be lower than all
      such virtual start times).
      
      There is however a circumstance in which __bfq_lookup_next_entity
      cannot push up the system virtual time for a parent entity, even
      if the system virtual time is lower than the virtual start times
      of all the child entities in the active tree. It happens if one of
      the child entities is in service. In fact, in such a case, there
      is already an eligible entity, the in-service one, even if it may
      not be present in the active tree (because in-service entities
      may be removed from the active tree).
      
      Unfortunately, in the last re-design of the
      hierarchical-scheduling engine, the reset of the pointer to the
      in-service entity for a given parent entity--reset to be done as a
      consequence of the expiration of the in-service entity--always
      happens after the function __bfq_lookup_next_entity has been
      invoked. This causes the function to think that there is still an
      entity in service for the parent entity, and then that the system
      virtual time cannot be pushed up, even if actually such a
      no-more-in-service entity has already been properly reinserted
      into the active tree (or into some other tree if no longer
      active). Yet, the system virtual time *had* to be pushed up, to be
      ready to correctly choose the next queue to serve. Because of the
      lack of this push up, bfq may wrongly set in service a queue that
      had been speculatively pre-computed as the possible
      next-in-service queue, but that would no longer be the one to serve
      after the expiration and the reinsertion into the active trees of
      the previously in-service entities.
      
      This commit addresses this issue by making
      __bfq_lookup_next_entity properly push up the system virtual time
      if an expiration is occurring.
      Signed-off-by: Paolo Valente <paolo.valente@linaro.org>
      Tested-by: Lee Tibbert <lee.tibbert@gmail.com>
      Tested-by: Oleksandr Natalenko <oleksandr@natalenko.name>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
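The eligibility rule and the push-up condition can be captured in a short model. The sketch below uses hypothetical names and plain integer comparisons instead of the kernel's wraparound-safe ones; `in_service` and `expiration` are flags the caller supplies, standing in for the state this commit teaches the lookup to consider.

```c
#include <stddef.h>

/* Hypothetical entity with virtual start and finish times. */
struct vt_entity {
	unsigned long start, finish;
};

/* An entity is eligible once the system virtual time reached its start. */
static int vt_eligible(const struct vt_entity *e, unsigned long vtime)
{
	return e->start <= vtime;
}

/*
 * Pick the eligible entity with the smallest finish time. The virtual
 * time may be pushed up to the smallest start time only if no entity is
 * in service, or if the in-service entity is expiring right now.
 */
static const struct vt_entity *vt_lookup_next(const struct vt_entity *active,
					      size_t n, unsigned long *vtime,
					      int in_service, int expiration)
{
	const struct vt_entity *best = NULL;
	unsigned long min_start;
	size_t i;

	if (n == 0)
		return NULL;
	min_start = active[0].start;
	for (i = 1; i < n; i++)
		if (active[i].start < min_start)
			min_start = active[i].start;
	if ((!in_service || expiration) && *vtime < min_start)
		*vtime = min_start;
	for (i = 0; i < n; i++)
		if (vt_eligible(&active[i], *vtime) &&
		    (!best || active[i].finish < best->finish))
			best = &active[i];
	return best;
}
```

With `in_service` set and `expiration` clear, the lookup finds nothing eligible and leaves the virtual time alone; with `expiration` set, the same call pushes the virtual time up and returns an eligible entity.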
  10. 30 Aug, 2017 4 commits
  11. 29 Aug, 2017 4 commits
    • block: Make blk_dequeue_request() static · 5034435c
      Damien Le Moal committed
      The only caller of this function is blk_start_request() in the same
      file. Fix blk_start_request() description accordingly.
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • smp: Avoid using two cache lines for struct call_single_data · 966a9671
      Ying Huang committed
      struct call_single_data is used in IPIs to transfer information between
      CPUs.  Its size is bigger than sizeof(unsigned long) and less than
      cache line size.  Currently it is not allocated with any explicit alignment
      requirements.  This makes it possible for allocated call_single_data to
      cross two cache lines, which results in double the number of the cache lines
      that need to be transferred among CPUs.
      
      This can be fixed by requiring call_single_data to be aligned to the
      size of call_single_data. Currently the size of call_single_data is a
      power of 2.  If we add new fields to call_single_data, we may need to
      add padding to make sure the size of the new definition is a power of 2
      as well.
      
      Fortunately, this is enforced by GCC, which will report bad sizes.
      
      To set the alignment requirement of call_single_data to the size of
      call_single_data, a struct definition and a typedef are used.
      
      To test the effect of the patch, I used the vm-scalability multiple
      thread swap test case (swap-w-seq-mt).  The test will create multiple
      threads and each thread will eat memory until all RAM and part of swap
      is used, so that huge number of IPIs are triggered when unmapping
      memory.  In the test, the throughput of memory writing improves ~5%
      compared with misaligned call_single_data, because of faster IPIs.
      Suggested-by: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Huang, Ying <ying.huang@intel.com>
      [ Add call_single_data_t and align with size of call_single_data. ]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/87bmnqd6lz.fsf@yhuang-mobile.sh.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
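The struct-plus-typedef technique can be tried in userspace with GCC or Clang. The sketch below uses hypothetical `demo_` names and an assumed field layout that merely resembles the description above; the 64-byte cache line in the usage note is also an assumption.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_llist_node {
	struct demo_llist_node *next;
};

/* Plain definition: size is a power of 2 on common 32/64-bit ABIs. */
struct demo_csd {
	struct demo_llist_node llist;
	void (*func)(void *info);
	void *info;
	unsigned int flags;
};

/*
 * Typedef that aligns every object to the struct's own size. The
 * compiler rejects an alignment that is not a power of 2, so a bad
 * struct size is caught at build time.
 */
typedef struct demo_csd demo_csd_t
	__attribute__((aligned(sizeof(struct demo_csd))));

/* Returns non-zero if [p, p + size) spans two lines of 'line' bytes. */
static int demo_crosses_line(const void *p, size_t size, size_t line)
{
	uintptr_t a = (uintptr_t)p;

	return a / line != (a + size - 1) / line;
}
```

An object of type `demo_csd_t` starts at a multiple of its size, so as long as that size divides the cache line size (64 bytes is assumed here), `demo_crosses_line()` reports that it stays within one line.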
    • block: fix warning when I/O elevator is changed as request_queue is being removed · e9a823fb
      David Jeffery committed
      There is a race between changing I/O elevator and request_queue removal
      which can trigger the warning in kobject_add_internal.  A program can
      use sysfs to request a change of elevator at the same time another task
      is unregistering the request_queue the elevator would be attached to.
      The elevator's kobject will then attempt to be connected to the
      request_queue in the object tree when the request_queue has just been
      removed from sysfs.  This triggers the warning in kobject_add_internal
      as the request_queue no longer has a sysfs directory:
      
      kobject_add_internal failed for iosched (error: -2 parent: queue)
      ------------[ cut here ]------------
      WARNING: CPU: 3 PID: 14075 at lib/kobject.c:244 kobject_add_internal+0x103/0x2d0
      
      To fix this warning, we can check the QUEUE_FLAG_REGISTERED flag when
      changing the elevator and use the request_queue's sysfs_lock to
      serialize between clearing the flag and the elevator testing the flag.
      Signed-off-by: David Jeffery <djeffery@redhat.com>
      Tested-by: Ming Lei <ming.lei@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
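The flag check can be sketched as below. This is a single-threaded model with hypothetical names: the serialization through sysfs_lock is deliberately left out, and the error code is an assumption chosen to match the "error: -2" in the warning above.

```c
#include <errno.h>

/* Hypothetical queue state: 'registered' models QUEUE_FLAG_REGISTERED. */
struct elv_queue {
	int registered;
	int iosched_dir; /* non-zero once the elevator kobject is attached */
};

/* Unregistering clears the flag; the kernel does this under sysfs_lock. */
static void elv_unregister_queue(struct elv_queue *q)
{
	q->registered = 0;
	q->iosched_dir = 0;
}

/* Changing the elevator is refused once the queue left sysfs. */
static int elv_change(struct elv_queue *q)
{
	if (!q->registered)
		return -ENOENT;
	q->iosched_dir = 1;
	return 0;
}
```

Once `elv_unregister_queue()` has run, `elv_change()` returns an error instead of trying to attach a directory below a parent that no longer exists.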
    • block, scheduler: convert xxx_var_store to void · 235f8da1
      weiping zhang committed
      The last parameter "count" is never used in xxx_var_store, so
      convert these functions to void.
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
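The shape of the change is roughly the following, shown as a userspace sketch with hypothetical names and `strtol()` standing in for the kernel's parsing helper. The helper no longer takes or returns `count`; the attribute's store function returns it directly.

```c
#include <stdlib.h>

/* Helper after the conversion: parses the value, returns nothing. */
static void demo_var_store(int *var, const char *page)
{
	*var = (int)strtol(page, NULL, 10);
}

/* Hypothetical attribute store callback: reports 'count' bytes consumed. */
static long demo_attr_store(int *var, const char *page, size_t count)
{
	demo_var_store(var, page);
	return (long)count;
}
```

For the input "42\n" with a count of 3, the callback stores 42 and reports 3 bytes consumed.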
  12. 26 Aug, 2017 3 commits
  13. 25 Aug, 2017 1 commit
  14. 24 Aug, 2017 5 commits
    • compat_hdio_ioctl: Fix a declaration · 6a934bb8
      Bart Van Assche committed
      This patch prevents sparse from reporting the following warning messages:
      
      block/compat_ioctl.c:85:11: warning: incorrect type in assignment (different address spaces)
      block/compat_ioctl.c:85:11:    expected unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:85:11:    got void [noderef] <asn:1>*
      block/compat_ioctl.c:91:21: warning: incorrect type in argument 1 (different address spaces)
      block/compat_ioctl.c:91:21:    expected void const volatile [noderef] <asn:1>*<noident>
      block/compat_ioctl.c:91:21:    got unsigned long *[noderef] <asn:1>p
      block/compat_ioctl.c:87:53: warning: dereference of noderef expression
      block/compat_ioctl.c:91:21: warning: dereference of noderef expression
      
      Fixes: commit d597580d ("generic ...copy_..._user primitives")
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: remove blk_free_devt in add_partition · 47570848
      weiping zhang committed
      put_device(pdev) will eventually call pdev->type->release, and
      blk_free_devt is already called in part_release(), so remove the
      extra call.
      Signed-off-by: weiping zhang <zhangweiping@didichuxing.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
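The reasoning can be checked with a toy reference-count model. Everything here is hypothetical: `devt_freed` simply counts how often the release callback gave the device number back.

```c
/* Hypothetical device with a reference count and a release side effect. */
struct demo_dev {
	int refcount;
	int devt_freed; /* how many times the device number was released */
};

/* Release callback: the only place that frees the device number. */
static void demo_release(struct demo_dev *d)
{
	d->devt_freed++;
}

/* Dropping the last reference triggers the release callback. */
static void demo_put_device(struct demo_dev *d)
{
	if (--d->refcount == 0)
		demo_release(d);
}

/* Error path after the fix: drop the reference, no extra devt free. */
static void demo_add_partition_error(struct demo_dev *d)
{
	demo_put_device(d);
}
```

With one reference held, the error path frees the device number exactly once; an additional explicit free in the error path would make that two.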
    • bsg-lib: fix kernel panic resulting from missing allocation of reply-buffer · 50b4d485
      Benjamin Block committed
      Since we split the scsi_request out of struct request, bsg fails to
      provide a reply-buffer for the drivers. This was done via the pointer
      for sense-data, which is not preallocated anymore.
      
      Failing to allocate/assign it results in illegal dereferences because
      LLDs use this pointer unquestioned.
      
      An example panic on s390x, using the zFCP driver, looks like this (I had
      debugging on, otherwise NULL-pointer dereferences wouldn't even panic on
      s390x):
      
      Unable to handle kernel pointer dereference in virtual kernel address space
      Failing address: 6b6b6b6b6b6b6000 TEID: 6b6b6b6b6b6b6403
      Fault in home space mode while using kernel ASCE.
      AS:0000000001590007 R3:0000000000000024
      Oops: 0038 ilc:2 [#1] PREEMPT SMP DEBUG_PAGEALLOC
      Modules linked in: <Long List>
      CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.12.0-bsg-regression+ #3
      Hardware name: IBM 2964 N96 702 (z/VM 6.4.0)
      task: 0000000065cb0100 task.stack: 0000000065cb4000
      Krnl PSW : 0704e00180000000 000003ff801e4156 (zfcp_fc_ct_els_job_handler+0x16/0x58 [zfcp])
                 R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:2 PM:0 RI:0 EA:3
      Krnl GPRS: 0000000000000001 000000005fa9d0d0 000000005fa9d078 0000000000e16866
                 000003ff00000290 6b6b6b6b6b6b6b6b 0000000059f78f00 000000000000000f
                 00000000593a0958 00000000593a0958 0000000060d88800 000000005ddd4c38
                 0000000058b50100 07000000659cba08 000003ff801e8556 00000000659cb9a8
      Krnl Code: 000003ff801e4146: e31020500004        lg      %r1,80(%r2)
                 000003ff801e414c: 58402040           l       %r4,64(%r2)
                #000003ff801e4150: e35020200004       lg      %r5,32(%r2)
                >000003ff801e4156: 50405004           st      %r4,4(%r5)
                 000003ff801e415a: e54c50080000       mvhi    8(%r5),0
                 000003ff801e4160: e33010280012       lt      %r3,40(%r1)
                 000003ff801e4166: a718fffb           lhi     %r1,-5
                 000003ff801e416a: 1803               lr      %r0,%r3
      Call Trace:
      ([<000003ff801e8556>] zfcp_fsf_req_complete+0x726/0x768 [zfcp])
       [<000003ff801ea82a>] zfcp_fsf_reqid_check+0x102/0x180 [zfcp]
       [<000003ff801eb980>] zfcp_qdio_int_resp+0x230/0x278 [zfcp]
       [<00000000009b91b6>] qdio_kick_handler+0x2ae/0x2c8
       [<00000000009b9e3e>] __tiqdio_inbound_processing+0x406/0xc10
       [<00000000001684c2>] tasklet_action+0x15a/0x1d8
       [<0000000000bd28ec>] __do_softirq+0x3ec/0x848
       [<00000000001675a4>] irq_exit+0x74/0xf8
       [<000000000010dd6a>] do_IRQ+0xba/0xf0
       [<0000000000bd19e8>] io_int_handler+0x104/0x2d4
       [<00000000001033b6>] enabled_wait+0xb6/0x188
      ([<000000000010339e>] enabled_wait+0x9e/0x188)
       [<000000000010396a>] arch_cpu_idle+0x32/0x50
       [<0000000000bd0112>] default_idle_call+0x52/0x68
       [<00000000001cd0fa>] do_idle+0x102/0x188
       [<00000000001cd41e>] cpu_startup_entry+0x3e/0x48
       [<0000000000118c64>] smp_start_secondary+0x11c/0x130
       [<0000000000bd2016>] restart_int_handler+0x62/0x78
       [<0000000000000000>]           (null)
      INFO: lockdep is turned off.
      Last Breaking-Event-Address:
       [<000003ff801e41d6>] zfcp_fc_ct_job_handler+0x3e/0x48 [zfcp]
      
      Kernel panic - not syncing: Fatal exception in interrupt
      
      This patch moves bsg-lib to allocate and set up struct bsg_job ahead of
      time, including the allocation of a buffer for the reply-data.
      
      This means, struct bsg_job is not allocated separately anymore, but as part
      of struct request allocation - similar to struct scsi_cmd. Reflect this in
      the function names that used to handle creation/destruction of struct
      bsg_job.
      Reported-by: Steffen Maier <maier@linux.vnet.ibm.com>
      Suggested-by: Christoph Hellwig <hch@lst.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com>
      Fixes: 82ed4db4 ("block: split scsi_request out of struct request")
      Cc: <stable@vger.kernel.org> #4.11+
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • bio-integrity: Fix regression if profile verify_fn is NULL · 97e05463
      Milan Broz committed
      In the dm-integrity target we register an integrity profile that has
      both generate_fn and verify_fn callbacks set to NULL.
      
      This is used if dm-integrity is stacked under a dm-crypt device
      for authenticated encryption (integrity payload contains authentication
      tag and IV seed).
      
      In this case the verification is done through its own crypto API
      processing inside dm-crypt; the integrity profile is only the holder
      of this data. (And memory is owned by dm-crypt as well.)
      
      After the commit (and previous changes)
        Commit 7c20f116
        Author: Christoph Hellwig <hch@lst.de>
        Date:   Mon Jul 3 16:58:43 2017 -0600
      
          bio-integrity: stop abusing bi_end_io
      
      we get this crash:
      
      : BUG: unable to handle kernel NULL pointer dereference at   (null)
      : IP:   (null)
      : *pde = 00000000
      ...
      :
      : Workqueue: kintegrityd bio_integrity_verify_fn
      : task: f48ae180 task.stack: f4b5c000
      : EIP:   (null)
      : EFLAGS: 00210286 CPU: 0
      : EAX: f4b5debc EBX: 00001000 ECX: 00000001 EDX: 00000000
      : ESI: 00001000 EDI: ed25f000 EBP: f4b5dee8 ESP: f4b5dea4
      :  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
      : CR0: 80050033 CR2: 00000000 CR3: 32823000 CR4: 001406d0
      : Call Trace:
      :  ? bio_integrity_process+0xe3/0x1e0
      :  bio_integrity_verify_fn+0xea/0x150
      :  process_one_work+0x1c7/0x5c0
      :  worker_thread+0x39/0x380
      :  kthread+0xd6/0x110
      :  ? process_one_work+0x5c0/0x5c0
      :  ? kthread_worker_fn+0x100/0x100
      :  ? kthread_worker_fn+0x100/0x100
      :  ret_from_fork+0x19/0x24
      : Code:  Bad EIP value.
      : EIP:   (null) SS:ESP: 0068:f4b5dea4
      : CR2: 0000000000000000
      
      The patch just skips the whole verify workqueue if verify_fn is set to NULL.
      
      Fixes: 7c20f116 ("bio-integrity: stop abusing bi_end_io")
      Signed-off-by: Milan Broz <gmazyland@gmail.com>
      [hch: trivial whitespace fix]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
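The guard amounts to checking the callback before deferring any verification work. A minimal sketch, with hypothetical names and a direct call standing in for the workqueue:

```c
#include <stddef.h>

/* Hypothetical verify callback type: returns 0 when the data checks out. */
typedef int (*demo_verify_fn)(const void *data, size_t len);

struct demo_profile {
	demo_verify_fn verify_fn; /* may be NULL, as in the case above */
};

/*
 * Read completion: run verification only when the profile provides a
 * callback. '*verify_ran' reports whether the callback was invoked.
 */
static int demo_read_endio(const struct demo_profile *p, const void *data,
			   size_t len, int *verify_ran)
{
	*verify_ran = 0;
	if (!p->verify_fn)
		return 0; /* nothing to verify, complete immediately */
	*verify_ran = 1;
	return p->verify_fn(data, len);
}

/* Trivial callback for demonstration purposes. */
static int demo_always_ok(const void *data, size_t len)
{
	(void)data;
	(void)len;
	return 0;
}
```

A profile with a NULL `verify_fn` completes the read without calling through the pointer, which is the call that produced the NULL dereference in the trace above.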
    • blk-throttle: cap discard request size · ea0ea2bc
      Shaohua Li committed
      Discard requests are usually very big and can easily use up the whole
      bandwidth budget of a cgroup. The size of a discard request doesn't
      really reflect the amount of data written, so it doesn't make sense to
      account it in the bandwidth budget. Jens pointed out that treating the
      size as 0 doesn't make sense either, because a discard request does
      have a cost. But it's not easy to find the actual cost, so this patch
      simply treats the size as one sector.
      Signed-off-by: Shaohua Li <shli@fb.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
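The accounting rule can be written down in a few lines. This is a sketch with hypothetical types, assuming a sector size of 512 bytes:

```c
/* Hypothetical operation types and bio descriptor. */
enum demo_op { DEMO_OP_READ, DEMO_OP_WRITE, DEMO_OP_DISCARD };

struct demo_bio {
	enum demo_op op;
	unsigned int size; /* request size in bytes */
};

/* Size charged against the bandwidth budget of the cgroup. */
static unsigned int demo_throtl_data_size(const struct demo_bio *bio)
{
	/* A discard is charged as one sector, whatever range it covers. */
	if (bio->op == DEMO_OP_DISCARD)
		return 512;
	return bio->size;
}
```

A 1 GiB discard is then charged 512 bytes, while reads and writes are still charged their full size.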