  1. 20 January 2018 (2 commits)
    • block: Remove kblockd_schedule_delayed_work{,_on}() · f5ced52a
      Authored by Bart Van Assche
      The previous patch removed all users of these two functions. Hence
      also remove the functions themselves.
      Reviewed-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • lib/scatterlist: Fix chaining support in sgl_alloc_order() · 8c7a8d1c
      Authored by Bart Van Assche
      This patch prevents workloads with large block sizes (megabytes)
      from triggering the following call trace with the ib_srpt driver
      (the only driver that chains scatterlists allocated by
      sgl_alloc_order()):
      
      BUG: Bad page state in process kworker/0:1H  pfn:2423a78
      page:fffffb03d08e9e00 count:-3 mapcount:0 mapping:          (null) index:0x0
      flags: 0x57ffffc0000000()
      raw: 0057ffffc0000000 0000000000000000 0000000000000000 fffffffdffffffff
      raw: dead000000000100 dead000000000200 0000000000000000 0000000000000000
      page dumped because: nonzero _count
      CPU: 0 PID: 733 Comm: kworker/0:1H Tainted: G          I      4.15.0-rc7.bart+ #1
      Hardware name: HP ProLiant DL380 G7, BIOS P67 08/16/2015
      Workqueue: ib-comp-wq ib_cq_poll_work [ib_core]
      Call Trace:
       dump_stack+0x5c/0x83
       bad_page+0xf5/0x10f
       get_page_from_freelist+0xa46/0x11b0
       __alloc_pages_nodemask+0x103/0x290
       sgl_alloc_order+0x101/0x180
       target_alloc_sgl+0x2c/0x40 [target_core_mod]
       srpt_alloc_rw_ctxs+0x173/0x2d0 [ib_srpt]
       srpt_handle_new_iu+0x61e/0x7f0 [ib_srpt]
       __ib_process_cq+0x55/0xa0 [ib_core]
       ib_cq_poll_work+0x1b/0x60 [ib_core]
       process_one_work+0x141/0x340
       worker_thread+0x47/0x3e0
       kthread+0xf5/0x130
       ret_from_fork+0x1f/0x30
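
      As a hedged illustration of the affected path (the helpers below are
      hypothetical; the sgl_alloc_order()/sgl_free_order() signatures follow
      lib/scatterlist.h of this period), a chainable scatterlist is obtained
      and released like this:

        #include <linux/gfp.h>
        #include <linux/scatterlist.h>

        /* Allocate a chainable scatterlist for a large buffer.  With
         * chainable == true, sgl_alloc_order() may link several sg arrays
         * together -- the chained case whose freeing this patch fixes.
         */
        static struct scatterlist *alloc_large_sgl(u64 length,
                                                   unsigned int order,
                                                   unsigned int *nents)
        {
                return sgl_alloc_order(length, order, true, GFP_KERNEL,
                                       nents);
        }

        static void free_large_sgl(struct scatterlist *sgl, unsigned int order)
        {
                /* Must walk chained sg arrays correctly when freeing. */
                sgl_free_order(sgl, order);
        }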
      
      Fixes: e80a0af4 ("lib/scatterlist: Introduce sgl_alloc() and sgl_free()")
      Reported-by: Laurence Oberman <loberman@redhat.com>
      Tested-by: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
      Cc: Nicholas A. Bellinger <nab@linux-iscsi.org>
      Cc: Laurence Oberman <loberman@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 19 January 2018 (1 commit)
  3. 18 January 2018 (1 commit)
  4. 16 January 2018 (1 commit)
    • blkcg: simplify statistic accumulation code · ddc21231
      Authored by Arnd Bergmann
      Some older compilers (gcc-4.4 through 4.6 in particular) struggle
      with the way that blkg_rwstat_read() returns a structure, leading
      to excessive stack usage and rather inefficient code:
      
      block/blk-cgroup.c: In function 'blkg_destroy':
      block/blk-cgroup.c:354:1: error: the frame size of 1296 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      block/cfq-iosched.c: In function 'cfqg_stats_add_aux':
      block/cfq-iosched.c:753:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      block/bfq-cgroup.c: In function 'bfqg_stats_add_aux':
      block/bfq-cgroup.c:299:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      I also noticed that there is no point in using atomic accesses
      for the local variables, so storing the temporaries in plain 'u64'
      variables not only avoids the stack usage on older compilers but
      also improves the object code on modern versions.
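
      A hedged sketch of that pattern (not the literal upstream diff; field
      names follow the blkg_rwstat of this era, the function itself is
      illustrative):

        #include <linux/blk-cgroup.h>

        /* Sum each per-cpu counter into a plain u64 first, then publish
         * the totals with one atomic64_add() per counter, instead of
         * passing a struct full of atomics around by value.
         */
        static void rwstat_add_aux_sketch(struct blkg_rwstat *to,
                                          struct blkg_rwstat *from)
        {
                u64 sum[BLKG_RWSTAT_NR];
                int i;

                for (i = 0; i < BLKG_RWSTAT_NR; i++)
                        sum[i] = atomic64_read(&from->aux_cnt[i]) +
                                percpu_counter_sum_positive(&from->cpu_cnt[i]);

                for (i = 0; i < BLKG_RWSTAT_NR; i++)
                        atomic64_add(sum[i], &to->aux_cnt[i]);
        }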
      
      Fixes: e6269c44 ("blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it")
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  5. 15 January 2018 (1 commit)
    • block: allow gendisk's request_queue registration to be deferred · fa70d2e2
      Authored by Mike Snitzer
      For as long as I can remember, DM has forced the block layer to allow
      the allocation and initialization of the request_queue to be distinct
      operations.  The reason is that block/genhd.c:add_disk() has required
      that the request_queue (and associated bdi) be tied to the gendisk
      before add_disk() is called -- because add_disk() also deals with
      exposing the request_queue via blk_register_queue().
      
      DM's dynamic creation of arbitrary device types (and associated
      request_queue types) requires the DM device's gendisk be available so
      that DM table loads can establish a master/slave relationship with
      subordinate devices that are referenced by loaded DM tables -- using
      bd_link_disk_holder().  But until these DM tables, and their associated
      subordinate devices, are known DM cannot know what type of request_queue
      it needs -- nor what its queue_limits should be.
      
      This chicken and egg scenario has created all manner of problems for DM
      and, at times, the block layer.
      
      Summary of changes:
      
      - Add device_add_disk_no_queue_reg() and its add_disk_no_queue_reg()
        variant that drivers may use to add a disk without also calling
        blk_register_queue().  The driver must call blk_register_queue() once
        its request_queue is fully initialized.

      - Return early from blk_unregister_queue() if QUEUE_FLAG_REGISTERED
        is not set.  It won't be set if the driver used add_disk_no_queue_reg()
        but encountered an error and must call del_gendisk() before ever
        calling blk_register_queue().
      
      - Export blk_register_queue().
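
      A hedged sketch of the intended driver flow using the functions named
      above (error handling omitted; the surrounding function is illustrative):

        #include <linux/blkdev.h>
        #include <linux/genhd.h>

        static int deferred_queue_reg_sketch(struct gendisk *disk)
        {
                /* Expose the disk without registering its request_queue. */
                add_disk_no_queue_reg(disk);

                /*
                 * ... load tables, discover subordinate devices, and finish
                 * initializing disk->queue and its queue_limits ...
                 */

                /* Now advertise the fully initialized queue via sysfs. */
                return blk_register_queue(disk);
        }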
      
      These changes allow DM to use add_disk_no_queue_reg() to anchor its
      gendisk as the "master" for the master/slave relationships DM must
      establish with subordinate devices referenced in DM tables that get
      loaded.  Once all "slave" devices for a DM device are known, its
      request_queue can be properly initialized and then advertised via
      sysfs -- the important improvement being that no request_queue resource
      initialization performed by blk_register_queue() is missed for DM
      devices anymore.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Reviewed-by: Ming Lei <ming.lei@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  6. 11 January 2018 (5 commits)
  7. 10 January 2018 (3 commits)
    • blk-mq: rename blk_mq_hw_ctx->queue_rq_srcu to ->srcu · 05707b64
      Authored by Tejun Heo
      The RCU protection has been expanded to cover both queueing and
      completion paths making ->queue_rq_srcu a misnomer.  Rename it to
      ->srcu as suggested by Bart.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: remove REQ_ATOM_COMPLETE usages from blk-mq · 634f9e46
      Authored by Tejun Heo
      After the recent updates to use generation number and state based
      synchronization, blk-mq no longer depends on REQ_ATOM_COMPLETE except
      to avoid firing the same timeout multiple times.
      
      Remove all REQ_ATOM_COMPLETE usages and use a new rq_flags flag
      RQF_MQ_TIMEOUT_EXPIRED to avoid firing the same timeout multiple
      times.  This removes atomic bitops from hot paths too.
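
      A hedged, simplified sketch of the resulting pattern
      (RQF_MQ_TIMEOUT_EXPIRED is the flag this patch adds; the surrounding
      function is illustrative):

        #include <linux/blkdev.h>

        /* Only the timeout path touches this flag while it owns the
         * request, so a plain rq_flags bit suffices where an atomic
         * REQ_ATOM_* bitop was needed before.
         */
        static void fire_timeout_once_sketch(struct request *rq)
        {
                if (rq->rq_flags & RQF_MQ_TIMEOUT_EXPIRED)
                        return; /* this instance already fired */
                rq->rq_flags |= RQF_MQ_TIMEOUT_EXPIRED;
                /* ... invoke the driver's ->timeout() handler ... */
        }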
      
      v2: Removed blk_clear_rq_complete() from blk_mq_rq_timed_out().
      
      v3: Added RQF_MQ_TIMEOUT_EXPIRED flag.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blk-mq: replace timeout synchronization with a RCU and generation based scheme · 1d9bd516
      Authored by Tejun Heo
      Currently, blk-mq timeout path synchronizes against the usual
      issue/completion path using a complex scheme involving atomic
      bitflags, REQ_ATOM_*, memory barriers and subtle memory coherence
      rules.  Unfortunately, it contains quite a few holes.
      
      There's a complex dance around REQ_ATOM_STARTED and REQ_ATOM_COMPLETE
      between the issue/completion and timeout paths; however, they don't
      have a synchronization point across request recycle instances, and it
      isn't clear what the barriers add.  blk_mq_check_expired() can easily
      read STARTED from the N-2'th instance, the deadline from the N-1'th,
      and run blk_mark_rq_complete() against the Nth instance.
      
      In fact, it's pretty easy to make blk_mq_check_expired() terminate a
      later instance of a request.  If we induce a 5 second delay before the
      time_after_eq() test in blk_mq_check_expired(), shorten the timeout to
      2s, and issue back-to-back large IOs, blk-mq starts timing out
      requests spuriously pretty quickly.  Nothing actually timed out; the
      code just made the call on a recycled instance of a request and then
      terminated a later instance long after the original instance finished.
      The scenario isn't theoretical either.
      
      This patch replaces the broken synchronization mechanism with a RCU
      and generation number based one.
      
      1. Each request has a u64 generation + state value, which can be
         updated only by the request owner.  Whenever a request becomes
         in-flight, the generation number gets bumped up too.  This provides
         the basis for the timeout path to distinguish different recycle
         instances of the request.
      
         Also, marking a request in-flight and setting its deadline are
         protected with a seqcount so that the timeout path can fetch both
         values coherently.
      
      2. The timeout path fetches the generation, state and deadline.  If
         the verdict is timeout, it records the generation into a dedicated
         request abortion field and does RCU wait.
      
      3. The completion path is also protected by RCU (from the previous
         patch) and checks whether the current generation number and state
         match the abortion field.  If so, it skips completion.
      
      4. The timeout path, after RCU wait, scans requests again and
         terminates the ones whose generation and state still match the ones
         requested for abortion.
      
          By this point, the timeout path knows that it either lost the race
          (the generation number and state changed) or that completion will
          yield to it, so it can safely time out the request.
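
      A hedged, simplified sketch of the owner-side update and the
      timeout-side snapshot (names are illustrative, not the exact upstream
      identifiers):

        #include <linux/seqlock.h>

        struct rq_gstate_sketch {
                seqcount_t      seq;
                u64             generation;     /* bumped per recycle instance */
                unsigned long   deadline;
        };

        /* Request owner: mark in-flight and set the deadline coherently. */
        static void rq_mark_in_flight(struct rq_gstate_sketch *g,
                                      unsigned long deadline)
        {
                write_seqcount_begin(&g->seq);
                g->generation++;
                g->deadline = deadline;
                write_seqcount_end(&g->seq);
        }

        /* Timeout path: fetch generation and deadline as one snapshot. */
        static void rq_timeout_snapshot(struct rq_gstate_sketch *g,
                                        u64 *gen, unsigned long *deadline)
        {
                unsigned int start;

                do {
                        start = read_seqcount_begin(&g->seq);
                        *gen = g->generation;
                        *deadline = g->deadline;
                } while (read_seqcount_retry(&g->seq, start));
        }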
      
      While it's more lines of code, it's conceptually simpler, doesn't
      depend on direct use of subtle memory ordering or coherence, and
      hopefully doesn't terminate the wrong instance.
      
      While this change makes REQ_ATOM_COMPLETE synchronization unnecessary
      between issue/complete and timeout paths, REQ_ATOM_COMPLETE isn't
      removed yet as it's still used in other places.  Future patches will
      move all state tracking to the new mechanism and remove all bitops in
      the hot paths.
      
      Note that this patch adds a comment explaining a race condition in
      BLK_EH_RESET_TIMER path.  The race has always been there and this
      patch doesn't change it.  It's just documenting the existing race.
      
      v2: - Fixed BLK_EH_RESET_TIMER handling as pointed out by Jianchao.
          - s/request->gstate_seqc/request->gstate_seq/ as suggested by Peter.
          - READ_ONCE() added in blk_mq_rq_update_state() as suggested by Peter.
      
      v3: - Fixed possible extended seqcount / u64_stats_sync read looping
            spotted by Peter.
          - MQ_RQ_IDLE was incorrectly being set in complete_request instead
            of free_request.  Fixed.
      
      v4: - Rebased on top of hctx_lock() refactoring patch.
          - Added comment explaining the use of hctx_lock() in completion path.
      
      v5: - Added comments requested by Bart.
          - Note the addition of BLK_EH_RESET_TIMER race condition in the
            commit message.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: "jianchao.wang" <jianchao.w.wang@oracle.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Bart Van Assche <Bart.VanAssche@wdc.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  8. 7 January 2018 (4 commits)
  9. 6 January 2018 (1 commit)
    • block: introduce zoned block devices zone write locking · 6cc77e9c
      Authored by Christoph Hellwig
      Components relying only on the request_queue structure for accessing
      block devices (e.g. I/O schedulers) have limited knowledge of the
      device characteristics. In particular, the device capacity cannot be
      easily discovered, which for a zoned block device also results in the
      inability to easily know the number of zones of the device (the zone
      size is indicated by the chunk_sectors field of the queue limits).
      
      Introduce the nr_zones field to the request_queue structure to simplify
      access to this information. Also, add the bitmap seq_zones_bitmap, which
      indicates which zones of the device are sequential zones (write
      preferred or write required), and the bitmap seq_zones_wlock, which
      indicates if a zone is write locked, that is, if a write request
      targeting a zone was dispatched to the device. These fields are
      initialized by the low-level block device driver (sd.c for ZBC/ZAC
      disks). They are not initialized by stacking drivers (device mappers)
      handling zoned block devices (e.g. dm-linear).
      
      Using this, I/O schedulers can introduce zone write locking to control
      request dispatching to a zoned block device and avoid write request
      reordering by limiting to at most a single write request per zone
      outside of the scheduler at any time.
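
      A hedged sketch of how a scheduler can use these fields to allow at
      most one in-flight write per sequential zone (field names per this
      series; a real implementation also serializes against the queue lock):

        #include <linux/bitops.h>
        #include <linux/blkdev.h>

        static bool can_dispatch_write_sketch(struct request_queue *q,
                                              unsigned int zone_no)
        {
                /* Conventional zones need no write ordering. */
                if (!test_bit(zone_no, q->seq_zones_bitmap))
                        return true;
                /* Write-lock the zone; fail if a write is already out. */
                return !test_and_set_bit(zone_no, q->seq_zones_wlock);
        }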
      
      Based on previous patches from Damien Le Moal.
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      [Damien]
      * Fixed comments and indentation in blkdev.h
      * Changed helper functions
      * Fixed this commit message
      Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com>
      Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  10. 5 January 2018 (6 commits)
  11. 17 December 2017 (3 commits)
  12. 16 December 2017 (2 commits)
  13. 15 December 2017 (6 commits)
  14. 14 December 2017 (2 commits)
    • drm: rework delayed connector cleanup in connector_iter · ea497bb9
      Authored by Daniel Vetter
      PROBE_DEFER also uses system_wq to reprobe drivers, which means that
      when probing again fails and we try to flush the overall system_wq
      (to get all the delayed connector cleanup work_structs completed),
      we deadlock.
      
      Fix this by using just a single cleanup work, so that we can only
      flush that one and don't block on anything else. That means a free
      list plus locking, a standard pattern.
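
      A hedged sketch of that standard pattern (names are illustrative):
      the last unref pushes the object onto a lock-free list and queues a
      single work item that drains it:

        #include <linux/llist.h>
        #include <linux/slab.h>
        #include <linux/workqueue.h>

        struct deferred_obj {
                struct llist_node free_node;
        };

        static LLIST_HEAD(free_list);

        static void free_work_fn(struct work_struct *work)
        {
                struct llist_node *list = llist_del_all(&free_list);
                struct deferred_obj *obj, *tmp;

                llist_for_each_entry_safe(obj, tmp, list, free_node)
                        kfree(obj);
        }
        static DECLARE_WORK(free_work, free_work_fn);

        static void defer_free(struct deferred_obj *obj)
        {
                llist_add(&obj->free_node, &free_list);
                schedule_work(&free_work); /* only this one work to flush */
        }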
      
      v2:
      - Correctly free connectors only on last ref. Oops (Chris).
      - use llist_head/node (Chris).
      
      v3:
      - Add init_llist_head (Chris).
      
      Fixes: a703c550 ("drm: safely free connectors from connector_iter")
      Fixes: 613051da ("drm: locking&new iterators for connector_list")
      Cc: Ben Widawsky <ben@bwidawsk.net>
      Cc: Dave Airlie <airlied@gmail.com>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Sean Paul <seanpaul@chromium.org>
      Cc: <stable@vger.kernel.org> # v4.11+: 613051da ("drm: locking&new iterators for connector_list")
      Cc: <stable@vger.kernel.org> # v4.11+
      Cc: Daniel Vetter <daniel.vetter@intel.com>
      Cc: Jani Nikula <jani.nikula@linux.intel.com>
      Cc: Gustavo Padovan <gustavo@padovan.org>
      Cc: David Airlie <airlied@linux.ie>
      Cc: Javier Martinez Canillas <javier@dowhile0.org>
      Cc: Shuah Khan <shuahkh@osg.samsung.com>
      Cc: Guillaume Tucker <guillaume.tucker@collabora.com>
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Kevin Hilman <khilman@baylibre.com>
      Cc: Matt Hart <matthew.hart@linaro.org>
      Cc: Thierry Escande <thierry.escande@collabora.co.uk>
      Cc: Tomeu Vizoso <tomeu.vizoso@collabora.com>
      Cc: Enric Balletbo i Serra <enric.balletbo@collabora.com>
      Tested-by: Marek Szyprowski <m.szyprowski@samsung.com>
      Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
      Signed-off-by: Daniel Vetter <daniel.vetter@intel.com>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171213124936.17914-1-daniel.vetter@ffwll.ch
    • ipv4: igmp: guard against silly MTU values · b5476022
      Authored by Eric Dumazet
      The IPv4 stack reacts to changes to a small MTU by disabling itself
      under RTNL.

      But there is a window where threads not holding RTNL can see a wrong
      device MTU. This can lead to surprises in the igmp code, where it is
      assumed the MTU is suitable.

      Fix this by reading the device MTU once and checking it against the
      IPv4 minimal MTU.

      This patch adds the missing IPV4_MIN_MTU define, so as to not abuse
      ETH_MIN_MTU anymore.
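
      A hedged sketch of the guard pattern (IPV4_MIN_MTU is the define this
      patch adds; the helper itself is illustrative):

        #include <linux/netdevice.h>
        #include <net/ip.h>

        static unsigned int igmp_usable_mtu_sketch(const struct net_device *dev)
        {
                /* Snapshot the MTU once; it can change under us without RTNL. */
                unsigned int mtu = READ_ONCE(dev->mtu);

                if (mtu < IPV4_MIN_MTU)
                        return 0;       /* silly MTU: caller should bail out */
                return mtu;
        }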
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 13 December 2017 (1 commit)
    • drm: Update edid-derived drm_display_info fields at edid property set [v2] · 4b4df570
      Authored by Keith Packard
      There are a set of values in the drm_display_info structure for each
      connector which hold information derived from EDID. These are computed
      in drm_add_display_info. Before this patch, that was only called in
      drm_add_edid_modes. This meant that they were only set when EDID was
      present and never reset when EDID was not, as happened when the
      display was disconnected.
      
      One of these fields, non_desktop, is used from
      drm_mode_connector_update_edid_property, the function responsible for
      assigning the new edid value to the application-visible property.
      
      Various drivers call these two functions (drm_add_edid_modes and
      drm_mode_connector_update_edid_property) in different orders. This
      means that even when EDID is present, the drm_display_info fields may
      not have been computed at the time that
      drm_mode_connector_update_edid_property used the non_desktop value to
      set the non_desktop property.
      
      I've added a public function (drm_reset_display_info) that resets the
      drm_display_info field values to default values and then made the
      drm_add_display_info function public. These two functions are now
      called directly from drm_mode_connector_update_edid_property so that
      the drm_display_info fields are always computed from the current EDID
      information before being used in that function.
      
      This means that the drm_display_info values are often computed twice,
      once when the EDID property is set and a second time when EDID is used
      to compute modes for the device. The alternative would be to uniformly
      ensure that the values were computed once before being used, which
      would require that all drivers reliably invoke the two paths in the
      same order. The computation is inexpensive enough that it seems more
      maintainable in the long term to simply compute them in both paths.
      
      The API to drm_add_display_info has been changed so that it no longer
      takes the set of edid-based quirks as a parameter. Rather, it now
      computes those quirks itself and returns them for further use by
      drm_add_edid_modes.
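
      A hedged sketch of the resulting call order in the property-update
      path (signatures approximate; the surrounding function is
      illustrative):

        #include <drm/drm_connector.h>
        #include <drm/drm_edid.h>

        static void update_edid_property_sketch(struct drm_connector *connector,
                                                const struct edid *edid)
        {
                drm_reset_display_info(connector);      /* back to defaults */
                if (edid)
                        drm_add_display_info(connector, edid); /* returns quirks */
                /*
                 * ... update the EDID blob property; display_info.non_desktop
                 * is now valid regardless of driver call order ...
                 */
        }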
      
      This patch also includes a number of 'const' additions caused by
      drm_mode_connector_update_edid_property taking a 'const struct edid *'
      parameter and wanting to pass that along to drm_add_display_info.
      
      v2: after review by Daniel Vetter <daniel.vetter@ffwll.ch>
      
      	Removed EXPORT_SYMBOL_GPL for drm_reset_display_info and
      	drm_add_display_info.
      
      	Added FIXME in drm_mode_connector_update_edid_property about
      	potentially merging that with drm_add_edid_modes to avoid
      	the need for two driver calls.
      Signed-off-by: Keith Packard <keithp@keithp.com>
      Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Link: https://patchwork.freedesktop.org/patch/msgid/20171213084427.31199-1-keithp@keithp.com
      (danvet: cherry picked from commit 12a889bf4bca ("drm: rework delayed
      connector cleanup in connector_iter") from drm-misc-next since
      functional conflict with changes in -next and we need to make sure
      both have the right version and nothing gets lost.)
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
  16. 12 December 2017 (1 commit)