1. 10 Mar, 2017: 3 commits
    • Merge branch 'for-linus' of git://git.kernel.dk/linux-block · 34bbce9e
      Authored by Linus Torvalds
      Pull block fixes from Jens Axboe:
       "Sending this a bit sooner than I otherwise would have, as a fix in the
        merge window had some unfortunate issues and side effects for some
        folks.
      
        This contains:
      
         - Fixes from Jan for the bdi registration/unregistration. These have
           been tested by the various parties reporting issues, and should be
           solid at this point.
      
         - Also from Jan, fix for axonram gendisk registration.
      
         - A stable fix for zram from Johannes.
      
         - A small series from Ming, fixing up some long-standing issues with
           blk-mq hardware queue kobject initialization and registration.
      
         - A fix for sed opal from Jon, fixing a nonsensical range check and
           some set-but-not-used variables.
      
         - A fix from Neil for a long-standing deadlock issue for stacking
           device drivers. With this in place, dm/md don't have to work around
           the issue anymore, and can be properly fixed up"
      
      * 'for-linus' of git://git.kernel.dk/linux-block:
        axonram: Fix gendisk handling
        blk: improve order of bio handling in generic_make_request()
        Revert "scsi, block: fix duplicate bdi name registration crashes"
        block: Make del_gendisk() safer for disks without queues
        bdi: Fix use-after-free in wb_congested_put()
        block: Allow bdi re-registration
        block/sed: Fix opal user range check and unused variables
        zram: set physical queue limits to avoid array out of bounds accesses
        blk-mq: free hctx->cpumask in release handler of hctx's kobject
        blk-mq: make lifetime consistent between hctx and its kobject
        blk-mq: make lifetime consistent between q/ctx and its kobject
        blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue()
    • Merge tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media · bb61ce54
      Authored by Linus Torvalds
      Pull media fixes from Mauro Carvalho Chehab:
       "Media regression fixes:
      
         - serial_ir: fix a kernel crash during boot on kernel 4.11-rc1, caused
           by IRQ code being called too early

         - other IR regression fixes in lirc and in the raw IR decoding

         - a deadlock fix in the RC nuvoton driver

         - fix another issue with DMA on stack in the dw2102 driver

        There's an extra patch there that changes a driver interface for the
        SoC VSP1 driver, which is shared between the DRM and V4L2 drivers. The
        patch itself is trivial, and was acked by David Airlie"
      
      * tag 'media/v4.11-2' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
        [media] v4l: vsp1: Adapt vsp1_du_setup_lif() interface to use a structure
        [media] dw2102: don't do DMA on stack
        [media] rc: protocol is not set on register for raw IR devices
        [media] rc: raw decoder for keymap protocol is not loaded on register
        [media] rc: nuvoton: fix deadlock in nvt_write_wakeup_codes
        [media] lirc: fix dead lock between open and wakeup_filter
        [media] serial_ir: ensure we're ready to receive interrupts
    • Merge tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip · cb2113cb
      Authored by Linus Torvalds
      Pull xen fix and cleanup from Juergen Gross:
       "This contains one fix for MSIX handling under Xen and a trivial
        cleanup patch"
      
      * tag 'for-linus-4.11-rc1-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
        xenbus: Remove duplicate inclusion of linux/init.h
        xen: do not re-use pirq number cached in pci device msi msg data
  2. 09 Mar, 2017: 17 commits
    • Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · ea6200e8
      Authored by Linus Torvalds
      Pull sched.h split-up fixes for MIPS from Ingo Molnar:
       "These are the fixes for MIPS build failures due to the sched.h
        split-up, from Arnd Bergmann"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        MIPS: Add missing include files
    • mm, page_alloc: Add missing check for memory holes · b4fb8f66
      Authored by Tony Luck
      Commit 13ad59df ("mm, page_alloc: avoid page_to_pfn() when merging
      buddies") moved the check for memory holes out of page_is_buddy() and
      had the callers do the check.
      
      But this wasn't done correctly in one place, which caused ia64 to crash
      very early in boot.
      
      Update to fix that and make ia64 boot again.
      
      [ v2: Vlastimil pointed out we don't need to call page_to_pfn()
            since we already have the result of that in "buddy_pfn" ]
      
      Fixes: 13ad59df ("avoid page_to_pfn() when merging buddies")
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest · 8557b8e4
      Authored by Linus Torvalds
      Pull ktest fixes from Steven Rostedt:
       "Greg Kroah-Hartman reported to me that the ktest of v4.11-rc1 locked
        up in an infinite loop while doing the make mrproper.
      
        Looking into the cause I noticed that a recent update to the function
        run_command (used for running all shell commands, including "make
        mrproper") changed the internal loop to use the function
        wait_for_input.
      
        The wait_for_input function uses select to look at two file
        descriptors. One is the file descriptor of the command it is running,
        the other is STDIN. The STDIN check was not checking the return status
        of the sysread call, and was also just writing a lot of data into
        syswrite without regard to the size of the data read.
      
        Changing the code to check the return status of sysread, and also to
        still process the passed-in descriptor data without looping back to
        the select, fixed Greg's problem.

        While looking at this code I also realized that the loop did not honor
        the timeout if STDIN always had input (or for some reason returned an
        error). This could prevent wait_for_input from timing out on the file
        descriptor it is supposed to be waiting for. That is fixed too"
      
      * tag 'ktest-v4.11-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-ktest:
        ktest: Make sure wait_for_input does honor the timeout
        ktest: Fix while loop in wait_for_input
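ktest itself is written in Perl. What follows is a hedged C sketch of the select()-based loop that the message describes, with both fixes applied. First, the read from the second descriptor is checked, and only the bytes actually read are forwarded. Second, the remaining time is recomputed on every pass, so a constantly readable input cannot defeat the timeout. The function name mirrors ktest's, but the signature, the separate out_fd, and the -1 return for a timeout are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/select.h>
#include <time.h>
#include <unistd.h>

/* Monotonic milliseconds, used to track an absolute deadline. */
static long now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ts.tv_sec * 1000L + ts.tv_nsec / 1000000L;
}

/*
 * Hypothetical C analogue of ktest's Perl wait_for_input(): wait up to
 * timeout_sec for data on fd while forwarding anything readable on in_fd
 * (ktest uses STDIN) to out_fd. Pass in_fd = -1 to watch fd only.
 * Returns bytes read from fd, 0 on EOF, or -1 on timeout or error.
 */
static ssize_t wait_for_input(int fd, int in_fd, int out_fd,
			      char *buf, size_t len, int timeout_sec)
{
	long deadline = now_ms() + timeout_sec * 1000L;

	for (;;) {
		/* Recompute what is left, so a busy in_fd cannot defeat the timeout. */
		long left = deadline - now_ms();
		if (left <= 0)
			return -1;

		fd_set rfds;
		FD_ZERO(&rfds);
		FD_SET(fd, &rfds);
		if (in_fd >= 0)
			FD_SET(in_fd, &rfds);

		struct timeval tv = { .tv_sec = left / 1000,
				      .tv_usec = (left % 1000) * 1000 };
		int maxfd = fd > in_fd ? fd : in_fd;
		int n = select(maxfd + 1, &rfds, NULL, NULL, &tv);
		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)
			return -1;	/* select() itself timed out */

		if (in_fd >= 0 && FD_ISSET(in_fd, &rfds)) {
			char tmp[256];
			ssize_t r = read(in_fd, tmp, sizeof(tmp));

			if (r > 0) {
				/* Forward exactly the r bytes read, not the whole buffer. */
				ssize_t w = write(out_fd, tmp, (size_t)r);
				(void)w;
			} else {
				in_fd = -1;	/* EOF or error: stop watching it */
			}
		}

		/* Don't loop back to select() if fd was also ready this round. */
		if (FD_ISSET(fd, &rfds))
			return read(fd, buf, len);
	}
}
```

After the input is forwarded, control falls through to the readiness check. If the command's descriptor became ready in the same select() round, it is serviced at once instead of going back to select(). This matches the change described for Greg's hang.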
    • overlayfs: remove now unnecessary header file include · 04bb94b1
      Authored by Linus Torvalds
      This removes the extra include header file that was added in commit
      e58bc927 "Pull overlayfs updates from Miklos Szeredi" now that it
      is no longer needed.
      
      There are probably other such includes that got added during the
      scheduler header splitup series, but this is the one that annoyed me
      personally and I know about.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • sched/headers: fix up header file dependency on <linux/sched/signal.h> · bd0f9b35
      Authored by Linus Torvalds
      The scheduler header file split and cleanups ended up exposing a few
      nasty header file dependencies, and in particular it showed how we in
      <linux/wait.h> ended up depending on "signal_pending()", which now comes
      from <linux/sched/signal.h>.
      
      That's a very subtle and annoying dependency, which already caused a
      semantic merge conflict (see commit e58bc927 "Pull overlayfs updates
      from Miklos Szeredi", which added that fixup in the merge commit).
      
      It turns out that we can avoid this dependency _and_ improve code
      generation by moving the guts of the fairly nasty helper #define
      __wait_event_interruptible_locked() to out-of-line code.  The code that
      includes the signal_pending() check is all in the slow-path where we
      actually go to sleep waiting for the event anyway, so using a helper
      function is the right thing to do.
      
      Using a helper function is also what we already did for the non-locked
      versions, see the "__wait_event*()" macros and the "prepare_to_wait*()"
      set of helper functions.
      
      We might want to try to unify all these macro games, we have a _lot_ of
      subtly different wait-event loops.  But this is the minimal patch to fix
      the annoying header dependency.
      Acked-by: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • axonram: Fix gendisk handling · 672a2c87
      Authored by Jan Kara
      It is invalid to call del_gendisk() when disk->queue is NULL. Fix error
      handling in axon_ram_probe() to avoid doing that.
      
      Also, del_gendisk() does not drop the reference to the gendisk allocated
      by alloc_disk(). That has to be done by put_disk(). Add that call where
      needed.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@fb.com>
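This fix sets two rules. Never call del_gendisk() on a disk that has no queue, and always drop the alloc_disk() reference with put_disk(). Both map onto the usual goto-based error unwinding in C. The sketch below is a userspace model built on hypothetical mock_* stand-ins. It is not the axon_ram_probe() code or the real block-layer API. Each error label undoes only what was set up before the failure.

```c
#include <stdlib.h>

/* Hypothetical stand-in for struct gendisk. */
struct mock_disk {
	int has_queue;	/* models disk->queue != NULL */
	int refs;	/* reference taken by mock_alloc_disk() */
};

static int live_disks;		/* disks not yet freed: should return to 0 */
static int bad_del_calls;	/* mock_del_gendisk() calls with no queue */

static struct mock_disk *mock_alloc_disk(void)
{
	struct mock_disk *d = calloc(1, sizeof(*d));

	if (d) {
		d->refs = 1;
		live_disks++;
	}
	return d;
}

static void mock_put_disk(struct mock_disk *d)
{
	if (--d->refs == 0) {
		free(d);
		live_disks--;
	}
}

/* Like del_gendisk(): tears down registration but does NOT drop the ref. */
static void mock_del_gendisk(struct mock_disk *d)
{
	if (!d->has_queue)
		bad_del_calls++;
}

/* fail_at: 0 = succeed, 1 = queue setup fails, 2 = a later step fails. */
static int mock_probe(int fail_at, struct mock_disk **out)
{
	struct mock_disk *disk = mock_alloc_disk();

	if (!disk)
		return -1;
	if (fail_at == 1)
		goto out_put;	/* no queue attached: skip del_gendisk() */
	disk->has_queue = 1;	/* queue allocated and disk added */
	if (fail_at == 2)
		goto out_del;
	*out = disk;
	return 0;

out_del:
	mock_del_gendisk(disk);
out_put:
	mock_put_disk(disk);	/* drop the alloc reference on every error path */
	return -1;
}
```

On success, the caller owns the disk. On removal, it calls mock_del_gendisk() and then mock_put_disk(), in the same order the commit message requires.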
    • blk: improve order of bio handling in generic_make_request() · 79bd9959
      Authored by NeilBrown
      To avoid recursion on the kernel stack when stacked block devices
      are in use, generic_make_request() will, when called recursively,
      queue new requests for later handling.  They will be handled when the
      make_request_fn for the current bio completes.
      
      If any bios are submitted by a make_request_fn, these will ultimately
      be handled sequentially.  If the handling of one of those generates
      further requests, they will be added to the end of the queue.
      
      This strict first-in-first-out behaviour can lead to deadlocks in
      various ways, normally because a request might need to wait for a
      previous request to the same device to complete.  This can happen when
      they share a mempool, and can happen due to interdependencies
      particular to the device.  Both md and dm have examples where this happens.
      
      These deadlocks can be eradicated by more selective ordering of bios.
      Specifically by handling them in depth-first order.  That is: when the
      handling of one bio generates one or more further bios, they are
      handled immediately after the parent, before any siblings of the
      parent.  That way, when generic_make_request() calls make_request_fn
      for some particular device, we can be certain that all previously
      submitted requests for that device have been completely handled and are
      not waiting for anything in the queue of requests maintained in
      generic_make_request().
      
      An easy way to achieve this would be to use a last-in-first-out stack
      instead of a queue.  However this will change the order of consecutive
      bios submitted by a make_request_fn, which could have unexpected consequences.
      Instead we take a slightly more complex approach.
      A fresh queue is created for each call to a make_request_fn.  After it completes,
      any bios for a different device are placed on the front of the main queue, followed
      by any bios for the same device, followed by all bios that were already on
      the queue before the make_request_fn was called.
      This provides the depth-first approach without reordering bios on the same level.
      
      This, by itself, is not enough to remove all deadlocks.  It just makes
      it possible for drivers to take the extra step required themselves.
      
      To avoid deadlocks, drivers must never risk waiting for a request
      after submitting one to generic_make_request.  This includes never
      allocating from a mempool twice in one call to a make_request_fn.
      
      A common pattern in drivers is to call bio_split() in a loop, handling
      the first part and then looping around to possibly split the next part.
      Instead, a driver that finds it needs to split a bio should queue
      (with generic_make_request) the second part, handle the first part,
      and then return.  The new code in generic_make_request will ensure the
      requests to underlying bios are processed first, then the second bio
      that was split off.  If it splits again, the same process happens.  In
      each case one bio will be completely handled before the next one is attempted.
      
      With this in place, it should be possible to disable the
      punt_bios_to_recover() recovery thread for many block devices, and
      eventually it may be possible to remove it completely.
      
      Ref: http://www.spinics.net/lists/raid/msg54680.html
      Tested-by: Jinpu Wang <jinpu.wang@profitbricks.com>
      Inspired-by: Lars Ellenberg <lars.ellenberg@linbit.com>
      Signed-off-by: NeilBrown <neilb@suse.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
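The ordering rule can be checked with a small userspace model. Everything in it is hypothetical:

- struct bio carries only a device number and a label.
- The bl_* list helpers are written for the sketch.
- A three-level stack is assumed, where dev 0 fans out to two dev-1 bios and dev 1 remaps to dev 2.

It models the description above and does not reproduce the generic_make_request() source. Each make_request call gets a fresh list. Afterwards, bios for a lower device go first, then bios for the same device, then whatever was already queued. Setting depth_first to 0 restores the old tail-append behavior for comparison.

```c
#include <stdlib.h>

struct bio {			/* toy bio: target device plus a trace label */
	int dev, id;
	struct bio *next;
};
struct bio_list { struct bio *head, *tail; };

static void bl_init(struct bio_list *l) { l->head = l->tail = NULL; }

/* Append b to the tail of l. */
static void bl_add(struct bio_list *l, struct bio *b)
{
	b->next = NULL;
	*(l->tail ? &l->tail->next : &l->head) = b;
	l->tail = b;
}

static struct bio *bl_pop(struct bio_list *l)
{
	struct bio *b = l->head;
	if (b) {
		l->head = b->next;
		if (!l->head)
			l->tail = NULL;
		b->next = NULL;
	}
	return b;
}

/* Move every bio in src onto the tail of dst, leaving src empty. */
static void bl_merge(struct bio_list *dst, struct bio_list *src)
{
	if (!src->head)
		return;
	*(dst->tail ? &dst->tail->next : &dst->head) = src->head;
	dst->tail = src->tail;
	bl_init(src);
}

static struct bio_list *current_list;	/* role of current->bio_list */
static int depth_first = 1;		/* 0 = old strict FIFO order */
static int order_log[16], order_len;	/* bio ids in make_request order */

static void submit(struct bio *bio);

static struct bio *new_bio(int dev, int id)
{
	struct bio *b = calloc(1, sizeof(*b));
	if (!b)
		abort();
	b->dev = dev;
	b->id = id;
	return b;
}

/* Toy make_request_fn: dev 0 fans out to two dev-1 bios, dev 1 remaps to dev 2. */
static void make_request(struct bio *bio)
{
	if (order_len < 16)
		order_log[order_len++] = bio->id;
	if (bio->dev == 0) {
		submit(new_bio(1, bio->id * 10 + 1));
		submit(new_bio(1, bio->id * 10 + 2));
	} else if (bio->dev == 1) {
		submit(new_bio(2, bio->id * 10 + 1));
	}
	free(bio);
}

static void submit(struct bio *bio)
{
	struct bio_list queue, fresh, lower, same;
	struct bio *b;
	if (current_list) {		/* nested call: defer, don't recurse */
		bl_add(current_list, bio);
		return;
	}
	bl_init(&queue);
	current_list = &fresh;
	while (bio) {
		int dev = bio->dev;	/* read before make_request() frees bio */
		bl_init(&fresh);	/* fresh list per make_request call */
		make_request(bio);
		if (depth_first) {
			bl_init(&lower);
			bl_init(&same);
			while ((b = bl_pop(&fresh)) != NULL)
				bl_add(b->dev == dev ? &same : &lower, b);
			bl_merge(&lower, &same);	/* lower level first, then same level, */
			bl_merge(&lower, &queue);	/* then whatever was queued before */
			queue = lower;
		} else {
			bl_merge(&queue, &fresh);	/* old: append to the tail */
		}
		bio = bl_pop(&queue);
	}
	current_list = NULL;
}
```

Tracing submit(new_bio(0, 1)) with depth_first = 1 gives the make_request order 1, 11, 111, 12, 121. Bio 11's descendant therefore finishes before its sibling 12 starts. With depth_first = 0, the order is 1, 11, 12, 111, 121.

Consider a driver that follows the split advice and submits the remainder to its own device from make_request. In this model, the remainder lands on the same-level list, so it is handled after the first part's lower-level bios.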
    • Revert "scsi, block: fix duplicate bdi name registration crashes" · c01228db
      Authored by Jan Kara
      This reverts commit 0dba1314. It causes device numbers to leak for
      SCSI when SCSI registers multiple gendisks for one request_queue in
      succession. It can be easily reproduced using Omar's script [1] on a
      kernel with CONFIG_DEBUG_TEST_DRIVER_REMOVE. Furthermore, the protection
      provided by this commit is no longer needed, as the problem it was
      fixing was also fixed by commit 165a5e22
      "block: Move bdi_unregister() to del_gendisk()".
      
      [1]: http://marc.info/?l=linux-block&m=148554717109098&w=2
      Signed-off-by: Jan Kara <jack@suse.cz>
      Acked-by: Dan Williams <dan.j.williams@intel.com>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Make del_gendisk() safer for disks without queues · 90f16fdd
      Authored by Jan Kara
      Commit 165a5e22 "block: Move bdi_unregister() to del_gendisk()"
      added a disk->queue dereference to del_gendisk(). Although del_gendisk()
      is not supposed to be called without a valid disk->queue, and
      blk_unregister_queue() warns in that case, this change made it oops
      instead. Return to the old, more robust behavior of just warning when
      del_gendisk() gets called for a gendisk whose disk->queue is NULL.
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • bdi: Fix use-after-free in wb_congested_put() · df23de55
      Authored by Jan Kara
      bdi_writeback_congested structures get created for each blkcg and bdi
      regardless of whether the bdi is registered or not. When they are
      created for an unregistered bdi and the request queue (and thus the bdi)
      is then destroyed while a blkg still holds a reference to the
      bdi_writeback_congested structure, this structure will be referencing
      the freed bdi, and the last wb_congested_put() will try to remove the
      structure from the already freed bdi.
      
      With commit 165a5e22 "block: Move bdi_unregister() to
      del_gendisk()", SCSI started to destroy bdis without calling
      bdi_unregister() first (previously it was calling bdi_unregister() even
      for unregistered bdis) and thus the code detaching
      bdi_writeback_congested in cgwb_bdi_destroy() was not triggered and we
      started hitting this use-after-free bug. It is enough to boot a KVM
      instance with a virtio-scsi device to trigger this behavior.
      
      Fix the problem by detaching bdi_writeback_congested structures in
      bdi_exit() instead of bdi_unregister(). This is also more logical, as
      they can get attached to a bdi regardless of whether it ever got
      registered or not.
      
      Fixes: 165a5e22
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • block: Allow bdi re-registration · b6f8fec4
      Authored by Jan Kara
      SCSI can call device_add_disk() several times for one request queue when
      a device is unbound and bound, creating a new gendisk each time. This
      leads to the bdi being repeatedly registered and unregistered. This was
      not a big problem until commit 165a5e22 "block: Move bdi_unregister() to
      del_gendisk()", since the bdi was only registered repeatedly
      (bdi_register() handles repeated calls fine; we only ended up leaking a
      reference to the gendisk due to overwriting bdi->owner) but unregistered
      only in blk_cleanup_queue(), which didn't get called repeatedly. After
      165a5e22 we were doing correct bdi_register() - bdi_unregister()
      cycles; however, bdi_unregister() is not prepared for them. So make sure
      bdi_unregister() cleans up the bdi in such a way that it is prepared for
      a possible following bdi_register() call.
      
      An easy way to provoke this behavior is to enable
      CONFIG_DEBUG_TEST_DRIVER_REMOVE and use the scsi_debug driver to create a
      SCSI disk, which immediately hangs without this fix.
      
      Fixes: 165a5e22
      Signed-off-by: Jan Kara <jack@suse.cz>
      Tested-by: Omar Sandoval <osandov@fb.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
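This fix establishes an invariant: unregistering leaves the bdi in a state where registering again is safe. A hypothetical userspace model makes that invariant checkable. mock_bdi and its two functions are illustrative only; they are not the kernel's struct backing_dev_info or bdi_register(). In the model, unregister undoes every field that register set, including the reference on the owner. Repeated unbind/bind cycles therefore neither fail nor leak.

```c
#include <stddef.h>

/* Hypothetical model of the bdi state touched by register/unregister. */
struct mock_bdi {
	int registered;		/* models the "registered" state */
	const char *dev_name;	/* models bdi->dev; NULL when not registered */
	int owner_refs;		/* references held on the owning gendisk */
};

static int mock_bdi_register(struct mock_bdi *bdi, const char *name)
{
	if (bdi->registered)
		return -1;	/* refuse to register over live state */
	bdi->dev_name = name;
	bdi->owner_refs++;	/* pin the owner, like bdi->owner */
	bdi->registered = 1;
	return 0;
}

/* Undo everything register did, so a following register starts clean. */
static void mock_bdi_unregister(struct mock_bdi *bdi)
{
	if (!bdi->registered)
		return;
	bdi->registered = 0;
	bdi->dev_name = NULL;
	bdi->owner_refs--;	/* drop the owner ref instead of leaking it */
}
```

If mock_bdi_unregister() left `registered` set, the next bind cycle's register call would fail. If it skipped the decrement, owner_refs would grow by one per cycle.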
    • block/sed: Fix opal user range check and unused variables · b0bfdfc2
      Authored by Jon Derrick
      Fixes the check that the opal user is within range, and cleans up unused
      method variables.
      Signed-off-by: Jon Derrick <jonathan.derrick@intel.com>
      Reviewed-by: Scott Bauer <scott.bauer@intel.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • zram: set physical queue limits to avoid array out of bounds accesses · 0bc31538
      Authored by Johannes Thumshirn
      zram can handle at most SECTORS_PER_PAGE sectors in a bio's bvec. When
      using the NVMe over Fabrics loopback target, which can potentially send
      a huge number of pages attached to the bio's bvec, this results in a
      kernel panic due to out-of-bounds array accesses in
      zram_decompress_page().
      Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
      Reviewed-by: Hannes Reinecke <hare@suse.com>
      Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: free hctx->cpumask in release handler of hctx's kobject · 01388df3
      Authored by Ming Lei
      hctx->cpumask is obviously per hctx, and both share the same
      lifetime, so this patch moves freeing of hctx->cpumask into the
      release handler of hctx's kobject.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
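The lifetime rule behind this series follows the usual kobject pattern. Data that lives exactly as long as an object is freed from that object's release handler, which runs when the last reference is dropped. It is not freed at unregister time. The model below is a minimal userspace sketch; mock_kobj, mock_hctx and their helpers are hypothetical stand-ins for struct kobject and struct blk_mq_hw_ctx, not the kernel API.

```c
#include <stddef.h>
#include <stdlib.h>

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical minimal kobject: a refcount plus a release callback. */
struct mock_kobj {
	int refcount;
	void (*release)(struct mock_kobj *kobj);
};

static void mock_kobj_init(struct mock_kobj *k, void (*release)(struct mock_kobj *))
{
	k->refcount = 1;
	k->release = release;
}

static void mock_kobj_get(struct mock_kobj *k) { k->refcount++; }

static void mock_kobj_put(struct mock_kobj *k)
{
	if (--k->refcount == 0)
		k->release(k);	/* last reference gone: free the owner */
}

struct mock_hctx {
	unsigned long *cpumask;	/* per-hctx data with the hctx's lifetime */
	struct mock_kobj kobj;
};

static int hctx_freed;

/* Release handler: everything owned by the hctx is freed here, together. */
static void mock_hctx_release(struct mock_kobj *k)
{
	struct mock_hctx *hctx = container_of(k, struct mock_hctx, kobj);

	free(hctx->cpumask);
	free(hctx);
	hctx_freed++;
}

static struct mock_hctx *mock_hctx_alloc(void)
{
	struct mock_hctx *hctx = calloc(1, sizeof(*hctx));

	if (!hctx)
		return NULL;
	hctx->cpumask = calloc(1, sizeof(*hctx->cpumask));
	if (!hctx->cpumask) {
		free(hctx);
		return NULL;
	}
	mock_kobj_init(&hctx->kobj, mock_hctx_release);
	return hctx;
}
```

In this model, a put that stands in for unregistration leaves the cpumask usable while another reference is still held. Only the final put frees the hctx and its cpumask together.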
    • blk-mq: make lifetime consistent between hctx and its kobject · 6c8b232e
      Authored by Ming Lei
      This patch removes kobject_put() over hctx in __blk_mq_unregister_dev(),
      and tries to keep the lifetime consistent between hctx and hctx's kobject.

      Now blk_mq_sysfs_register() and blk_mq_sysfs_unregister() become
      totally symmetrical, and the kobject's refcount drops to zero just
      when the hctx is freed.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: make lifetime consistent between q/ctx and its kobject · 7ea5fe31
      Authored by Ming Lei
      Currently, from the kobject point of view, both q->mq_kobj and ctx->kobj
      can be released during one cycle of blk_mq_register_dev() and
      blk_mq_unregister_dev(). Actually, the sw queue's lifetime is the
      same as its request queue's, which is covered by request_queue->kobj.

      So we don't need to call kobject_put() for the two kinds of
      kobject in __blk_mq_unregister_dev(); instead we do that
      in the release handler of the request queue.
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
    • blk-mq: initialize mq kobjects in blk_mq_init_allocated_queue() · 737f98cf
      Authored by Ming Lei
      Both q->mq_kobj and the sw queues' kobjects should be initialized only
      once, instead of in each add_disk context.

      Also, this patch removes clearing of ctx in blk_mq_init_cpu_queues()
      because the percpu allocator zero-fills allocated variables.

      This patch fixes one issue[1] reported by Omar.

      [1] kernel warning when doing unbind/bind on one scsi-mq device
      
      [   19.347924] kobject (ffff8800791ea0b8): tried to init an initialized object, something is seriously wrong.
      [   19.349781] CPU: 1 PID: 84 Comm: kworker/u8:1 Not tainted 4.10.0-rc7-00210-g53f39eeaa263 #34
      [   19.350686] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.1-20161122_114906-anatol 04/01/2014
      [   19.350920] Workqueue: events_unbound async_run_entry_fn
      [   19.350920] Call Trace:
      [   19.350920]  dump_stack+0x63/0x83
      [   19.350920]  kobject_init+0x77/0x90
      [   19.350920]  blk_mq_register_dev+0x40/0x130
      [   19.350920]  blk_register_queue+0xb6/0x190
      [   19.350920]  device_add_disk+0x1ec/0x4b0
      [   19.350920]  sd_probe_async+0x10d/0x1c0 [sd_mod]
      [   19.350920]  async_run_entry_fn+0x48/0x150
      [   19.350920]  process_one_work+0x1d0/0x480
      [   19.350920]  worker_thread+0x48/0x4e0
      [   19.350920]  kthread+0x101/0x140
      [   19.350920]  ? process_one_work+0x480/0x480
      [   19.350920]  ? kthread_create_on_node+0x60/0x60
      [   19.350920]  ret_from_fork+0x2c/0x40
      
      Cc: Omar Sandoval <osandov@osandov.com>
      Signed-off-by: Ming Lei <tom.leiming@gmail.com>
      Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: Jens Axboe <axboe@fb.com>
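The warning in the trace comes from initializing an already initialized kobject on every bind. The sketch below is a hypothetical model of the fix's shape. mock_kobject and the mock_* functions are illustrative; only the warning text is taken from the log. Initialization moves to queue setup, and register/unregister merely add and remove the object, so any number of unbind/bind cycles stay quiet.

```c
/* Hypothetical kobject model that, like kobject_init(), complains on re-init. */
struct mock_kobject {
	int initialized;
	int in_sysfs;
};

static int reinit_warnings;

static void mock_kobject_init(struct mock_kobject *k)
{
	if (k->initialized) {
		/* kobject_init() logs "tried to init an initialized object" here */
		reinit_warnings++;
		return;
	}
	k->initialized = 1;
}

struct mock_queue {
	struct mock_kobject mq_kobj;
};

/* Old shape: every register (every add_disk) re-initializes the kobject. */
static void mock_register_old(struct mock_queue *q)
{
	mock_kobject_init(&q->mq_kobj);
	q->mq_kobj.in_sysfs = 1;
}

/* New shape: initialize once when the queue is set up... */
static void mock_init_allocated_queue(struct mock_queue *q)
{
	mock_kobject_init(&q->mq_kobj);
}

/* ...and let register/unregister only add and remove it. */
static void mock_register(struct mock_queue *q) { q->mq_kobj.in_sysfs = 1; }
static void mock_unregister(struct mock_queue *q) { q->mq_kobj.in_sysfs = 0; }
```

With the old shape, the second bind on the same queue triggers the warning. With the new shape, only the one-time setup path ever initializes the object.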
  3. 08 Mar, 2017: 20 commits