1. 25 Jun, 2012 8 commits
    • block: add q->nr_rqs[] and move q->rq.elvpriv to q->nr_rqs_elvpriv · 8a5ecdd4
      Tejun Heo committed
      Add q->nr_rqs[] which currently behaves the same as q->rq.count[] and
      move q->rq.elvpriv to q->nr_rqs_elvpriv.  blk_drain_queue() is updated
      to use q->nr_rqs[] instead of q->rq.count[].
      
      These counters separate queue-wide request statistics from the
      request list and allow implementation of per-queue request allocation.
      
      While at it, properly indent fields of struct request_list.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: inline bio_blkcg() and friends · b1208b56
      Tejun Heo committed
      Make bio_blkcg() and friends inline.  They all are very simple and
      used only in a few places.
      
      This patch is to prepare for further updates to request allocation
      path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: allocate io_context upfront · 7f4b35d1
      Tejun Heo committed
      The block layer allocates ioc very lazily.  It waits until the moment
      ioc is absolutely necessary; unfortunately, that time could be inside
      queue lock and __get_request() performs unlock - try alloc - retry
      dancing.
      
      Just allocate it up-front on entry to block layer.  We're not saving
      the rain forest by deferring it to the last possible moment and
      complicating things unnecessarily.
      
      This patch is to prepare for further updates to request allocation
      path.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: refactor get_request[_wait]() · a06e05e6
      Tejun Heo committed
      Currently, there are two request allocation functions - get_request()
      and get_request_wait().  The former tries to allocate a request once;
      the latter wraps the former and keeps retrying until allocation
      succeeds.

      The combination of the two functions delivers fallible non-wait allocation,
      fallible wait allocation and unfailing wait allocation.  However,
      given that forward progress is guaranteed, fallible wait allocation
      isn't all that useful and in fact nobody uses it.
      
      This patch simplifies the interface as follows.
      
      * get_request() is renamed to __get_request() and is only used by the
        wrapper function.
      
      * get_request_wait() is renamed to get_request().  It now takes
        @gfp_mask and retries iff it contains %__GFP_WAIT.
      
      This patch doesn't introduce any functional change and is to prepare
      for further updates to request allocation path.
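
      The interface described above can be pictured with a small user-space
      sketch.  It is only an illustration, not the kernel code: the
      sketch_* names, the SKETCH_GFP_WAIT bit and the toy pool are all
      hypothetical stand-ins for __get_request(), get_request() and
      %__GFP_WAIT.

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __GFP_WAIT bit. */
#define SKETCH_GFP_WAIT 0x10u

/* Hypothetical pool: allocation fails while `busy` is non-zero. */
struct sketch_pool {
    int busy;      /* failed attempts remaining before a request frees up */
    int attempts;  /* number of single-shot attempts made so far */
    int slot;      /* the one "request" this pool can hand out */
};

/* Single attempt, in the role of __get_request(): may return NULL. */
static int *sketch_try_get(struct sketch_pool *p)
{
    p->attempts++;
    if (p->busy > 0) {
        p->busy--;  /* models another request completing in the meantime */
        return NULL;
    }
    return &p->slot;
}

/* Wrapper, in the role of the renamed get_request(): retries iff the
 * mask contains the wait bit, otherwise fails after one attempt. */
static int *sketch_get(struct sketch_pool *p, unsigned int gfp_mask)
{
    for (;;) {
        int *rq = sketch_try_get(p);

        if (rq != NULL)
            return rq;
        if (!(gfp_mask & SKETCH_GFP_WAIT))
            return NULL;  /* non-waiting caller: report failure */
        /* a real implementation would sleep here until a request is freed */
    }
}
```

      A caller passing a mask without the wait bit gets at most one attempt,
      while a caller passing it keeps retrying until an attempt succeeds.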
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: drop custom queue draining used by scsi_transport_{iscsi|fc} · 86072d81
      Tejun Heo committed
      iscsi_remove_host() uses bsg_remove_queue() which implements custom
      queue draining.  fc_bsg_remove() open-codes mostly identical logic.
      
      The draining logic isn't correct in that blk_stop_queue() doesn't
      prevent new requests from being queued - it just stops processing, so
      nothing prevents new requests from being queued after the logic determines
      that the queue is drained.
      
      blk_cleanup_queue() now implements proper queue draining and these
      custom draining logics aren't necessary.  Drop them and use
      bsg_unregister_queue() + blk_cleanup_queue() instead.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
      Cc: James Smart <james.smart@emulex.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mempool: add @gfp_mask to mempool_create_node() · a91a5ac6
      Tejun Heo committed
      mempool_create_node() currently assumes %GFP_KERNEL.  Its only user,
      blk_init_free_list(), is about to be updated to use other allocation
      flags - add @gfp_mask argument to the function.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: make root blkcg allocation use %GFP_KERNEL · 15974993
      Tejun Heo committed
      Currently, blkcg_activate_policy() depends on %GFP_ATOMIC allocation
      from __blkg_lookup_create() for root blkcg creation.  This could make
      policy activation fail unnecessarily.
      
      Make blkg_alloc() take @gfp_mask, __blkg_lookup_create() take an
      optional @new_blkg for preallocated blkg, and blkcg_activate_policy()
      preload radix tree and preallocate blkg with %GFP_KERNEL before trying
      to create the root blkg.
      
      v2: __blkg_lookup_create() was returning %NULL on blkg alloc failure
         instead of ERR_PTR() value.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • blkcg: __blkg_lookup_create() doesn't need radix preload · 13589864
      Tejun Heo committed
      There's no point in calling radix_tree_preload() if preloading doesn't
      use a more permissive GFP mask.  Drop preloading from
      __blkg_lookup_create().
      
      While at it, drop sparse locking annotation which no longer applies.
      
      v2: Vivek pointed out the odd preload usage.  Instead of updating,
          just drop it.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  2. 15 Jun, 2012 4 commits
    • scsi: Silence unnecessary warnings about ioctl to partition · 6d935928
      Jan Kara committed
      Sometimes, warnings about ioctls to partition happen often enough that they
      form the majority of the warnings in the kernel log and users complain. In some
      cases warnings are about ioctls such as SG_IO so it's not good to get rid of
      the warnings completely as they can ease debugging of userspace problems
      when ioctl is refused.
      
      Since I have seen warnings from lots of commands, including some proprietary
      userspace applications, I don't think disallowing the ioctls for processes
      with CAP_SYS_RAWIO will happen in the near future if ever. So let's just
      stop warning for processes with CAP_SYS_RAWIO for which ioctl is allowed.
      
      CC: Paolo Bonzini <pbonzini@redhat.com>
      CC: James Bottomley <JBottomley@parallels.com>
      CC: linux-scsi@vger.kernel.org
      Acked-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Drop dead function blk_abort_queue() · 76aaa510
      Asias He committed
      This function was only used by btrfs code in btrfs_abort_devices()
      (seemingly in a wrong way).

      btrfs_abort_devices() was removed in commit d07eb911, so let's
      remove the dead code to avoid any confusion.
      
      Changes in v2: update commit log, btrfs_abort_devices() was removed
      already.
      
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: linux-kernel@vger.kernel.org
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: linux-btrfs@vger.kernel.org
      Cc: David Sterba <dave@jikos.cz>
      Signed-off-by: Asias He <asias@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Mitigate lock unbalance caused by lock switching · 5e5cfac0
      Asias He committed
      Commit 777eb1bf disconnects externally
      supplied queue_lock before blk_drain_queue(). Switching the lock would
      introduce lock unbalance because threads which have taken the external
      lock might unlock the internal lock during the queue drain. This
      patch mitigates this by disconnecting the lock after the queue draining
      since queue draining makes a lot of request_queue users go away.
      
      However, please note, this patch only makes the problem less likely to
      happen. Anyone who still holds a ref might try to issue a new request on
      a dead queue after blk_cleanup_queue() finishes draining; the lock
      unbalance might still happen in this case.
      
       =====================================
       [ BUG: bad unlock balance detected! ]
       3.4.0+ #288 Not tainted
       -------------------------------------
       fio/17706 is trying to release lock (&(&q->__queue_lock)->rlock) at:
       [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
       but there are no more locks to release!
      
       other info that might help us debug this:
       1 lock held by fio/17706:
        #0:  (&(&vblk->lock)->rlock){......}, at: [<ffffffff81327f1a>]
       get_request_wait+0x19a/0x250
      
       stack backtrace:
       Pid: 17706, comm: fio Not tainted 3.4.0+ #288
       Call Trace:
        [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
        [<ffffffff810dea49>] print_unlock_inbalance_bug+0xf9/0x100
        [<ffffffff810dfe4f>] lock_release_non_nested+0x1df/0x330
        [<ffffffff811dae24>] ? dio_bio_end_aio+0x34/0xc0
        [<ffffffff811d6935>] ? bio_check_pages_dirty+0x85/0xe0
        [<ffffffff811daea1>] ? dio_bio_end_aio+0xb1/0xc0
        [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
        [<ffffffff81329372>] ? blk_queue_bio+0x2a2/0x380
        [<ffffffff810e0079>] lock_release+0xd9/0x250
        [<ffffffff81a74553>] _raw_spin_unlock_irq+0x23/0x40
        [<ffffffff81329372>] blk_queue_bio+0x2a2/0x380
        [<ffffffff81328faa>] generic_make_request+0xca/0x100
        [<ffffffff81329056>] submit_bio+0x76/0xf0
        [<ffffffff8115470c>] ? set_page_dirty_lock+0x3c/0x60
        [<ffffffff811d69e1>] ? bio_set_pages_dirty+0x51/0x70
        [<ffffffff811dd1a8>] do_blockdev_direct_IO+0xbf8/0xee0
        [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
        [<ffffffff811dd4e5>] __blockdev_direct_IO+0x55/0x60
        [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
        [<ffffffff811d92e7>] blkdev_direct_IO+0x57/0x60
        [<ffffffff811d8620>] ? blkdev_get_block+0x80/0x80
        [<ffffffff8114c6ae>] generic_file_aio_read+0x70e/0x760
        [<ffffffff810df7c5>] ? __lock_acquire+0x215/0x5a0
        [<ffffffff811e9924>] ? aio_run_iocb+0x54/0x1a0
        [<ffffffff8114bfa0>] ? grab_cache_page_nowait+0xc0/0xc0
        [<ffffffff811e82cc>] aio_rw_vect_retry+0x7c/0x1e0
        [<ffffffff811e8250>] ? aio_fsync+0x30/0x30
        [<ffffffff811e9936>] aio_run_iocb+0x66/0x1a0
        [<ffffffff811ea9b0>] do_io_submit+0x6f0/0xb80
        [<ffffffff8134de2e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
        [<ffffffff811eae50>] sys_io_submit+0x10/0x20
        [<ffffffff81a7c9e9>] system_call_fastpath+0x16/0x1b
      
      Changes since v2: Update commit log to explain how the code is still
                        broken even if we delay the lock switching after the drain.
      Changes since v1: Update commit log as Tejun suggested.
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Asias He <asias@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: Avoid missed wakeup in request waitqueue · 458f27a9
      Asias He committed
      After hot-unplugging a stressed disk, I found that rl->wait[] is not empty
      while rl->count[] is empty and there are threads still sleeping on
      get_request after the queue cleanup. With simple debug code, I found
      there are exactly nr_sleep - nr_wakeup threads in D state. So
      wakeups are being missed.
      
        $ dmesg | grep nr_sleep
        [   52.917115] ---> nr_sleep=1046, nr_wakeup=873, delta=173
        $ vmstat 1
        1 173  0 712640  24292  96172 0 0  0  0  419  757  0  0  0 100  0
      
      To quote Tejun:
      
        Ah, okay, freed_request() wakes up single waiter with the assumption
        that after the wakeup there will at least be one successful allocation
        which in turn will continue the wakeup chain until the wait list is
        empty - ie. waiter wakeup is dependent on successful request
        allocation happening after each wakeup.  With queue marked dead, any
        woken up waiter fails the allocation path, so the wakeup chaining is
        lost and we're left with hung waiters. What we need is wake_up_all()
        after drain completion.
      
      This patch fixes the missed wakeup by waking up all the threads which
      are sleeping on the wait queue after queue drain.
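
      The chain described in the quote can be modelled with a toy
      simulation.  This is a hypothetical sketch, not kernel code: it only
      counts waiters, assumes every woken waiter on a live queue allocates
      successfully and wakes the next one, and assumes every woken waiter on
      a dead queue fails and wakes nobody.

```c
#include <stdbool.h>

/*
 * Hypothetical model of the wakeup chain: each successful allocation by a
 * woken waiter wakes the next one; a failed allocation wakes nobody.
 * Returns how many waiters are still asleep at the end.
 */
static int sketch_sleepers_left(int waiters, bool queue_dead,
                                bool wake_all_after_drain)
{
    int asleep = waiters;

    /* freed_request() analogue: wake a single waiter. */
    if (asleep > 0) {
        asleep--;
        /* The chain continues only while woken waiters allocate successfully. */
        while (!queue_dead && asleep > 0)
            asleep--;
    }

    /* The fix: wake everyone once the drain has completed. */
    if (wake_all_after_drain)
        asleep = 0;

    return asleep;
}
```

      In this model a dead queue without the final wake-all leaves all but
      the first waiter asleep, which is the kind of hang the report shows.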
      
      Changes in v2: Drop waitqueue_active() optimization
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Asias He <asias@redhat.com>
      
      Fixed a bug by me, where stacked devices would oops on calling
      blk_drain_queue() since ->rq.wait[] does not get initialized unless
      it's a full queue setup.
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
  3. 14 Jun, 2012 4 commits
  4. 12 Jun, 2012 4 commits
    • drbd: fix null pointer dereference with on-congestion policy when diskless · 0d5934e3
      Lars Ellenberg committed
      We must not look at mdev->actlog, unless we have a get_ldev() reference.
      It also does not make much sense to try to disconnect or pull-ahead of
      the peer, if we don't have good local data.
      
      Only even consider congestion policies, if our local disk is D_UP_TO_DATE.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    • drbd: fix list corruption by failing but already aborted reads · 1ed25b26
      Lars Ellenberg committed
      If a read is aborted due to force-detach of a supposedly unresponsive
      local backing device, and retried on the peer, it can happen that the
      local request later still completes (hopefully with an error).
      As it may already have been completed to upper layers meanwhile,
      it must not be retried again now.
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    • drbd: fix access of unallocated pages and kernel panic · 4eccc579
      Lars Ellenberg committed
      BUG: unable to handle kernel NULL pointer dereference at (null)
      ...
       [<d1e17561>] ? _drbd_bm_set_bits+0x151/0x240 [drbd]
       [<d1e236f8>] ? receive_bitmap+0x4f8/0xbc0 [drbd]
      
      This fixes an off-by-one error in the receive_bitmap() path,
      if run-length encoded bitmap transfer is enabled.
      
      If the bitmap is an exact multiple of PAGE_SIZE, which means the visible
      capacity of the drbd device is an exact multiple of 128 MiB (for 4k page
      size), and bitmap compression (use-rle) is enabled (which became default
      with 8.4), and the very last bit is dirty and reported in an rle
      compressed bitmap packet, we ended up trying to kmap_atomic a page pointer
      that does not exist (bitmap->bm_pages[last index + 1]).
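
      The arithmetic behind the corner case can be checked with a short
      sketch.  It is not the drbd code: the sketch_* names are hypothetical,
      4k pages are taken from the text, and the assumption that one bitmap
      bit tracks 4 KiB of storage is inferred from the 128 MiB figure above.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u                       /* assumed 4k pages */
#define SKETCH_BITS_PER_PAGE (SKETCH_PAGE_SIZE * 8u) /* 32768 bits per page */
#define SKETCH_BYTES_PER_BIT 4096u                   /* assumed: one bit tracks 4 KiB */

/* Number of bitmap pages needed for `bits` bits (rounded up). */
static size_t sketch_bm_pages(size_t bits)
{
    return (bits + SKETCH_BITS_PER_PAGE - 1) / SKETCH_BITS_PER_PAGE;
}

/* Index of the page that holds bit number `bitnr`. */
static size_t sketch_page_of_bit(size_t bitnr)
{
    return bitnr / SKETCH_BITS_PER_PAGE;
}
```

      When the bit count is an exact multiple of SKETCH_BITS_PER_PAGE, the
      last valid bit sits in the last page, but a position one past it maps
      to an index equal to the page count, i.e. one past the end of the page
      array.  For other sizes the same overshoot still lands inside the last,
      partially used page, which fits the observation that "odd" device sizes
      are unaffected.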
      
      bug introduced by:
          Date:   Fri Jul 24 15:33:24 2009 +0200
          set bits: optimize for complete last word, fix off-by-one-word corner case
      
      made effective by:
          Date:   Thu Dec 16 00:32:38 2010 +0100
          drbd: get rid of unused debug code
      
          Long time ago, we had paranoia code in the bitmap that allocated one
          extra word, assigned a magic value, and checked on every occasion that
          the magic value was still unchanged.
      
          That debug code is unused, the extra long word complicates code a bit.
          Get rid of it.
      
      No-one triggered this bug in the last few years, because a large subset
      of our userbase is unaffected:
       * typically the last few blocks of a device are not modified
         frequently, and remain unset
       * use-rle was disabled by default in drbd < 8.4
       * those with slightly "odd" device sizes, or
       * drbd internal meta data (which will skew the device size slightly,
         thus makes it harder to have a bug relevant device size)
      Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
      Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
    • xen/blkfront: Add WARN to deal with misbehaving backends. · 6878c32e
      Konrad Rzeszutek Wilk committed
      Part of the ring structure is the 'id' field which is under
      control of the frontend. The frontend stamps it with "some"
      value (in this implementation, a value less than
      BLK_RING_SIZE), and when it gets a response expects
      said value to be in the response structure. We have a check
      for the id field when spooling new requests but not when
      de-spooling responses.
      
      We also add an extra check in add_id_to_freelist to make
      sure that the 'struct request' was not NULL - as we cannot
      pass a NULL to __blk_end_request_all, otherwise that crashes
      (and all the operations that the response is dealing with
      end up with __blk_end_request_all).
      
      Lastly we also print the name of the operation that failed.
      
      [v1: s/BUG/WARN/ suggested by Stefano]
      [v2: Add extra check in add_id_to_freelist]
      [v3: Redid op_name per Jan's suggestion]
      [v4: add const * and add WARN on failure returns]
      Acked-by: Jan Beulich <jbeulich@suse.com>
      Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  5. 06 Jun, 2012 1 commit
  6. 05 Jun, 2012 2 commits
  7. 04 Jun, 2012 9 commits
    • blkcg: fix blkg_alloc() failure path · 9b2ea86b
      Tejun Heo committed
      When policy data allocation fails in the middle, blkg_alloc() invokes
      blkg_free() to destroy the half constructed blkg.  This ends up
      calling pd_exit_fn() on policy datas which didn't go through
      pd_init_fn().  Fix it by making blkg_alloc() call pd_init_fn()
      immediately after each policy data allocation.
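
      The ordering this fix relies on can be sketched in user space.  The
      code below is only an illustration with hypothetical sketch_* names
      and a forced-failure parameter; it is not the blkcg implementation.
      Each policy data is initialized right after it is allocated, so the
      free path only ever runs the exit hook on initialized data.

```c
#include <stdlib.h>

#define SKETCH_NPOL 4   /* hypothetical number of policies */

struct sketch_pd { int inited; };
struct sketch_blkg { struct sketch_pd *pd[SKETCH_NPOL]; };

/* Counters used to observe which hooks ran. */
static int sketch_init_calls, sketch_exit_calls, sketch_exit_uninit;

static void sketch_pd_init(struct sketch_pd *pd)
{
    pd->inited = 1;
    sketch_init_calls++;
}

static void sketch_pd_exit(struct sketch_pd *pd)
{
    sketch_exit_calls++;
    if (!pd->inited)
        sketch_exit_uninit++;   /* exit on data that never saw init */
}

/* Free path: runs the exit hook on every policy data that is present. */
static void sketch_blkg_free(struct sketch_blkg *blkg)
{
    for (int i = 0; i < SKETCH_NPOL; i++) {
        if (blkg->pd[i]) {
            sketch_pd_exit(blkg->pd[i]);
            free(blkg->pd[i]);
        }
    }
    free(blkg);
}

/* fail_at: index whose allocation is forced to fail, or -1 for none. */
static struct sketch_blkg *sketch_blkg_alloc(int fail_at)
{
    struct sketch_blkg *blkg = calloc(1, sizeof(*blkg));

    if (!blkg)
        return NULL;
    for (int i = 0; i < SKETCH_NPOL; i++) {
        struct sketch_pd *pd = (i == fail_at) ? NULL : calloc(1, sizeof(*pd));

        if (!pd) {
            sketch_blkg_free(blkg);   /* only initialized pds are present */
            return NULL;
        }
        blkg->pd[i] = pd;
        sketch_pd_init(pd);           /* init right after each allocation */
    }
    return blkg;
}
```

      Deferring all init calls to a second loop after the allocations would
      let a mid-way failure reach the exit hook with uninitialized data,
      which is the situation the commit message describes.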
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: blkcg_policy_cfq shouldn't be used if !CONFIG_CFQ_GROUP_IOSCHED · ffea73fc
      Tejun Heo committed
      cfq may be built w/ or w/o blkcg support depending on
      CONFIG_CFQ_GROUP_IOSCHED.  If blkcg support is disabled, most of
      the related code is ifdef'd out but some part is left dangling -
      blkcg_policy_cfq is left zero-filled and blkcg_policy_[un]register()
      calls are made on it.
      
      Feeding zero filled policy to blkcg_policy_register() is incorrect and
      triggers the following WARN_ON() if CONFIG_BLK_CGROUP &&
      !CONFIG_CFQ_GROUP_IOSCHED.
      
       ------------[ cut here ]------------
       WARNING: at block/blk-cgroup.c:867
       Modules linked in:
       Modules linked in:
       CPU: 3 Not tainted 3.4.0-09547-gfb21affa #1
       Process swapper/0 (pid: 1, task: 000000003ff80000, ksp: 000000003ff7f8b8)
       Krnl PSW : 0704100180000000 00000000003d76ca (blkcg_policy_register+0xca/0xe0)
      	    R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:0 CC:1 PM:0 EA:3
       Krnl GPRS: 0000000000000000 00000000014b85ec 00000000014b85b0 0000000000000000
      	    000000000096fb60 0000000000000000 00000000009a8e78 0000000000000048
      	    000000000099c070 0000000000b6f000 0000000000000000 000000000099c0b8
      	    00000000014b85b0 0000000000667580 000000003ff7fd98 000000003ff7fd70
       Krnl Code: 00000000003d76be: a7280001           lhi     %r2,1
      	    00000000003d76c2: a7f4ffdf           brc     15,3d7680
      	   #00000000003d76c6: a7f40001           brc     15,3d76c8
      	   >00000000003d76ca: a7c8ffea           lhi     %r12,-22
      	    00000000003d76ce: a7f4ffce           brc     15,3d766a
      	    00000000003d76d2: a7f40001           brc     15,3d76d4
      	    00000000003d76d6: a7c80000           lhi     %r12,0
      	    00000000003d76da: a7f4ffc2           brc     15,3d765e
       Call Trace:
       ([<0000000000b6f000>] initcall_debug+0x0/0x4)
        [<0000000000989e8a>] cfq_init+0x62/0xd4
        [<00000000001000ba>] do_one_initcall+0x3a/0x170
        [<000000000096fb60>] kernel_init+0x214/0x2bc
        [<0000000000623202>] kernel_thread_starter+0x6/0xc
        [<00000000006231fc>] kernel_thread_starter+0x0/0xc
       no locks held by swapper/0/1.
       Last Breaking-Event-Address:
        [<00000000003d76c6>] blkcg_policy_register+0xc6/0xe0
       ---[ end trace b8ef4903fcbf9dd3 ]---
      
      This patch fixes the problem by ensuring all blkcg support code is
      inside CONFIG_CFQ_GROUP_IOSCHED.
      
      * blkcg_policy_cfq declaration and blkg_to_cfqg() definition are moved
        inside the first CONFIG_CFQ_GROUP_IOSCHED block.  __maybe_unused is
        dropped from blkcg_policy_cfq decl.
      
      * blkcg_deactivate_policy() invocation is moved inside ifdef.  This
        also makes the activation logic match cfq_init_queue().
      
      * All blkcg_policy_[un]register() invocations are moved inside ifdef.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      LKML-Reference: <20120601112954.GC3535@osiris.boeblingen.de.ibm.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • block: fix return value on cfq_init() failure · fd794956
      Tejun Heo committed
      cfq_init() would return zero after kmem cache creation failure.  Fix
      so that it returns -ENOMEM.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • mtip32xx: Remove version.h header file inclusion · 87c9ea76
      Sachin Kamat committed
      version.h header file inclusion is no longer required.
      Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
    • gpio/samsung: fix the typo 'exynos5_xxx' instead of 'exonys5_xxx' · 5041caa4
      Kukjin Kim committed
      Should be 'exynos5_xxx' instead of 'exonys5_xxx'.
      
      It happened at the commit 30b84288 ("Merge tag 'soc2' of
      git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc")
      during v3.5 merge window.
      Signed-off-by: Kukjin Kim <kgene.kim@samsung.com>
      [ My bad  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 4d578573
      Linus Torvalds committed
      Pull some left-over PM patches from Rafael J. Wysocki.
      
      * 'pm-acpi' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI / PM: Make acpi_pm_device_sleep_state() follow the specification
        ACPI / PM: Make __acpi_bus_get_power() cover D3cold correctly
        ACPI / PM: Fix error messages in drivers/acpi/bus.c
        rtc-cmos / PM: report wakeup event on ACPI RTC alarm
        ACPI / PM: Generate wakeup events on fixed power button
    • Revert "mm: compaction: handle incorrect MIGRATE_UNMOVABLE type pageblocks" · 68e3e926
      Linus Torvalds committed
      This reverts commit 5ceb9ce6.
      
      That commit seems to be the cause of the mm compaction list corruption
      issues that Dave Jones reported.  The locking (or rather, absence
      thereof) is dubious, as is the use of the 'page' variable once it has
      been found to be outside the pageblock range.
      
      So revert it for now, we can re-visit this for 3.6.  If we even need to:
      as Minchan Kim says, "The patch wasn't a bug fix and even test workload
      was very theoretical".
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Acked-by: Hugh Dickins <hughd@google.com>
      Acked-by: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
      Cc: Kyungmin Park <kyungmin.park@samsung.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix warning in __set_page_dirty_nobuffers · 752dc185
      Hugh Dickins committed
      New tmpfs use of !PageUptodate pages for fallocate() is triggering the
      WARNING: at mm/page-writeback.c:1990 when __set_page_dirty_nobuffers()
      is called from migrate_page_copy() for compaction.
      
      It is anomalous that migration should use __set_page_dirty_nobuffers()
      on an address_space that does not participate in dirty and writeback
      accounting; and this has also been observed to insert surprising dirty
      tags into a tmpfs radix_tree, despite tmpfs not using tags at all.
      
      We should probably give migrate_page_copy() a better way to preserve the
      tag and migrate accounting info, when mapping_cap_account_dirty().  But
      that needs some more work: so in the interim, avoid the warning by using
      a simple SetPageDirty on PageSwapBacked pages.
      Reported-and-tested-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vfs: move inode stat information closer together · 2f9d3df8
      Linus Torvalds committed
      The comment above it says "Stat data, not accessed from path walking",
      but in fact some of the inode fields we use for the common stat data were
      way down at the end of the inode, causing unnecessary cache misses for the
      common stat operations.
      
      The inode structure is pretty big, and this can change padding depending
      on field width, but at least on the common 64-bit configurations this
      doesn't change the size.  Some of our inode layout has historically been
      to try to avoid unnecessary padding fields, but cache locality is at
      least as important for layout, if not more.
      
      Noticed by looking at kernel profiles, and noticing that the "i_blkbits"
      access stood out like a sore thumb.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 03 Jun, 2012 8 commits
    • Linux 3.5-rc1 · f8f5701b
      Linus Torvalds committed
    • Merge tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm · 912afc36
      Linus Torvalds committed
      Pull device-mapper updates from Alasdair G Kergon:
       "Improve multipath's retrying mechanism in some defined circumstances
        and provide a simple reserve/release mechanism for userspace tools to
        access thin provisioning metadata while the pool is in use."
      
      * tag 'dm-3.5-changes-1' of git://git.kernel.org/pub/scm/linux/kernel/git/agk/linux-dm:
        dm thin: provide userspace access to pool metadata
        dm thin: use slab mempools
        dm mpath: allow ioctls to trigger pg init
        dm mpath: delay retry of bypassed pg
        dm mpath: reduce size of struct multipath
    • dm thin: provide userspace access to pool metadata · cc8394d8
      Joe Thornber committed
      This patch implements two new messages that can be sent to the thin
      pool target allowing it to take a snapshot of the _metadata_.  This
      read-only snapshot can be accessed by userland, concurrently with the
      live target.
      
      Only one metadata snapshot can be held at a time.  The pool's status
      line will give the block location for the current msnap.
      
      Since version 0.1.5 of the userland thin provisioning tools, the
      thin_dump program displays the msnap as follows:
      
          thin_dump -m <msnap root> <metadata dev>
      
      Available here: https://github.com/jthornber/thin-provisioning-tools
      
      Now that userland can access the metadata we can do various things
      that have traditionally been kernel side tasks:
      
           i) Incremental backups.
      
           By using metadata snapshots we can work out what blocks have
           changed over time.  Combined with data snapshots we can ensure
           the data doesn't change while we back it up.
      
           A short proof of concept script can be found here:
      
           https://github.com/jthornber/thinp-test-suite/blob/master/incremental_backup_example.rb
      
           ii) Migration of thin devices from one pool to another.
      
           iii) Merging snapshots back into an external origin.
      
           iv) Asynchronous replication.
      Signed-off-by: Joe Thornber <ejt@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm thin: use slab mempools · a24c2569
      Mike Snitzer committed
      Use dedicated caches prefixed with a "dm_" name rather than relying on
      kmalloc mempools backed by generic slab caches so the memory usage of
      thin provisioning (and any leaks) can be accounted for independently.
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: allow ioctls to trigger pg init · 35991652
      Mikulas Patocka committed
      After the failure of a group of paths, any alternative paths that
      need initialising do not become available until further I/O is sent to
      the device.  Until this has happened, ioctls return -EAGAIN.
      
      With this patch, new paths are made available in response to an ioctl
      too.  The processing of the ioctl gets delayed until this has happened.
      
      Instead of returning an error, we submit a work item to kmultipathd
      (that will potentially activate the new path) and retry in ten
      milliseconds.
      
      Note that the patch doesn't retry an ioctl if the ioctl itself fails due
      to a path failure.  Such retries should be handled intelligently by the
      code that generated the ioctl in the first place, noting that some SCSI
      commands should not be retried because they are not idempotent (XOR write
      commands).  For commands that could be retried, there is a danger that
      if the device rejected the SCSI command, the path could be erroneously
      marked as failed, and the request would be retried on another path which
      might fail too.  It can be determined if the failure happens on the
      device or on the SCSI controller, but there is no guarantee that all
      SCSI drivers set these flags correctly.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: delay retry of bypassed pg · f220fd4e
      Mike Christie committed
      If I/O needs retrying and only bypassed priority groups are available,
      set the pg_init_delay_retry flag to wait before retrying.
      
      If, for example, the reason for the bypass is that the controller is
      getting reset or there is a firmware upgrade happening, retrying right
      away would cause a flood of log messages and retries for what could be a
      few seconds or even several minutes.
      Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
      Acked-by: Mike Snitzer <snitzer@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • dm mpath: reduce size of struct multipath · 1fbdd2b3
      Mike Snitzer committed
      Move multipath structure's 'lock' and 'queue_size' members to eliminate
      two 4-byte holes.  Also use a bit within a single unsigned int for each
      existing flag (saves 8 bytes).  This allows future flags to be added
      without each consuming an unsigned int.
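
      The flag packing can be illustrated with two small structs.  The
      names and the choice of three flags are hypothetical and are not taken
      from struct multipath; the sketch only shows the general technique of
      replacing one unsigned int per flag with one bit per flag.

```c
#include <stddef.h>

/* Hypothetical flag names; one unsigned int per flag. */
struct sketch_flags_wide {
    unsigned int flag_a;
    unsigned int flag_b;
    unsigned int flag_c;
};

/* The same flags as single bits sharing one unsigned int. */
struct sketch_flags_packed {
    unsigned int flag_a:1;
    unsigned int flag_b:1;
    unsigned int flag_c:1;
};

/* Bytes saved by the packed layout on the current platform. */
static size_t sketch_bytes_saved(void)
{
    return sizeof(struct sketch_flags_wide) - sizeof(struct sketch_flags_packed);
}
```

      On common ABIs with a 4-byte unsigned int, three such flags would
      shrink from 12 bytes to 4, and further 1-bit flags can be added to the
      packed struct without growing it until the unsigned int is full.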
      Signed-off-by: Mike Snitzer <snitzer@redhat.com>
      Acked-by: Hannes Reinecke <hare@suse.de>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 4fc3acf2
      Linus Torvalds committed
      Pull networking updates from David Miller:
      
       1) Make syn floods consume significantly less resources by
      
          a) Not pre-COW'ing routing metrics for SYN/ACKs
          b) Mirroring the device queue mapping of the SYN for the SYN/ACK
             reply.
      
          Both from Eric Dumazet.
      
       2) Fix calculation errors in Byte Queue Limiting, from Hiroaki SHIMODA.
      
       3) Validate the length requested when building a paged SKB for a
          socket, so we don't overrun the page vector accidentally.  From Jason
          Wang.
      
       4) When netlabel is disabled, we abort all IP option processing when we
          see a CIPSO option.  This isn't the right thing to do, we should
          simply skip over it and continue processing the remaining options
          (if any).  Fix from Paul Moore.
      
       5) SRIOV fixes for the Mellanox driver from Jack Morgenstein and Marcel
          Apfelbaum.
      
       6) 8139cp enables the receiver before the ring address is properly
          programmed, which potentially lets the device crap over random
          memory.  Fix from Jason Wang.
      
       7) e1000/e1000e fixes for i217 RST handling, and an improper buffer
          address reference in jumbo RX frame processing from Bruce Allan and
          Sebastian Andrzej Siewior, respectively.
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
        fec_mpc52xx: fix timestamp filtering
        mcs7830: Implement link state detection
        e1000e: fix Rapid Start Technology support for i217
        e1000: look into the page instead of skb->data for e1000_tbi_adjust_stats()
        r8169: call netif_napi_del at errpaths and at driver unload
        tcp: reflect SYN queue_mapping into SYNACK packets
        tcp: do not create inetpeer on SYNACK message
        8139cp/8139too: terminate the eeprom access with the right opmode
        8139cp: set ring address before enabling receiver
        cipso: handle CIPSO options correctly when NetLabel is disabled
        net: sock: validate data_len before allocating skb in sock_alloc_send_pskb()
        bql: Avoid possible inconsistent calculation.
        bql: Avoid unneeded limit decrement.
        bql: Fix POSDIFF() to integer overflow aware.
        net/mlx4_core: Fix obscure mlx4_cmd_box parameter in QUERY_DEV_CAP
        net/mlx4_core: Check port out-of-range before using in mlx4_slave_cap
        net/mlx4_core: Fixes for VF / Guest startup flow
        net/mlx4_en: Fix improper use of "port" parameter in mlx4_en_event
        net/mlx4_core: Fix number of EQs used in ICM initialisation
        net/mlx4_core: Fix the slave_id out-of-range test in mlx4_eq_int