1. 07 Mar, 2012 8 commits
    • T
      block: add io_context->active_ref · f6e8d01b
      Tejun Heo committed
      Currently ioc->nr_tasks is used to decide two things - whether an ioc
      is done issuing IOs and whether it's shared by multiple tasks.  This
      patch separates out the first into ioc->active_ref, which is acquired
      and released using {get|put}_io_context_active() respectively.
      
      This will be used to associate bio's with a given task.  This patch
      doesn't introduce any visible behavior change.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      f6e8d01b
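A minimal userspace sketch of the split described in this commit, assuming a plain (non-atomic, single-threaded) counter model. The `ioc_model` type and its helper names are hypothetical; only `active_ref`, `nr_tasks` and the get/put pairing come from the commit message. That an active reference also pins a regular reference is an assumption of the sketch.

```c
#include <stdbool.h>

/* Hypothetical model; not the kernel's struct io_context. */
struct ioc_model {
	long refcount;   /* keeps the structure itself alive */
	int active_ref;  /* holders that may still issue IO */
	int nr_tasks;    /* tasks sharing this context */
	bool exited;     /* set once the last active reference is gone */
};

/* Take an active reference; assumed to pin a regular reference too. */
static void ioc_model_get_active(struct ioc_model *ioc)
{
	ioc->refcount++;
	ioc->active_ref++;
}

/* Drop an active reference; the last drop marks the ioc as done with IO. */
static void ioc_model_put_active(struct ioc_model *ioc)
{
	if (--ioc->active_ref == 0)
		ioc->exited = true;
	ioc->refcount--;
}
```

With this split, "is the ioc done issuing IO" is answered by `active_ref` alone, while `nr_tasks` only says whether the ioc is shared.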
    • T
      block: ioc_task_link() can't fail · 3d48749d
      Tejun Heo committed
      ioc_task_link() is used to share %current's ioc on clone.  If
      %current->io_context is set, %current is guaranteed to have refcount
      on the ioc and, thus, ioc_task_link() can't fail.
      
      Replace error checking in ioc_task_link() with WARN_ON_ONCE() and make
      it just increment refcount and nr_tasks.
      
      -v2: Description typo fix (Vivek).
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      3d48749d
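The change can be pictured with a small userspace model, assuming plain integer counters. `ioc_link_model` and `ioc_task_link_model()` are hypothetical names, and the one-time `fprintf()` merely stands in for `WARN_ON_ONCE()`.

```c
#include <stdio.h>

/* Hypothetical model of the two counters touched on clone. */
struct ioc_link_model {
	long refcount;
	int nr_tasks;
};

/* No return value: the caller already holds a reference, so linking cannot fail. */
static void ioc_task_link_model(struct ioc_link_model *ioc)
{
	static int warned;

	/* Stand-in for WARN_ON_ONCE(): complain once if the invariant is broken. */
	if (ioc->refcount <= 0 && !warned) {
		warned = 1;
		fprintf(stderr, "ioc_task_link_model: refcount unexpectedly <= 0\n");
	}
	ioc->refcount++;
	ioc->nr_tasks++;
}
```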
    • T
      blkcg: drop unnecessary RCU locking · c875f4d0
      Tejun Heo committed
      Now that blkg additions / removals are always done under both q and
      blkcg locks, the only places RCU locking is necessary are
      blkg_lookup[_create]() for lookup w/o blkcg lock.  This patch drops
      unnecessary RCU locking replacing it with plain blkcg locking as
      necessary.
      
      * blkiocg_pre_destroy() already performs proper locking and doesn't need
        RCU.  Dropped.
      
      * blkio_read_blkg_stats() now uses blkcg->lock instead of RCU read
        lock.  This isn't a hot path.
      
      * Now unnecessary synchronize_rcu() from queue exit paths removed.
        This makes q->nr_blkgs unnecessary.  Dropped.
      
      * RCU annotation on blkg->q removed.
      
      -v2: Vivek pointed out that blkg_lookup_create() still needs to be
           called under rcu_read_lock().  Updated.
      
      -v3: After the update, stats_lock locking in blkio_read_blkg_stats()
           shouldn't be using _irq variant as it otherwise ends up enabling
           irq while blkcg->lock is locked.  Fixed.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      c875f4d0
    • T
      blkcg: let blkcg core manage per-queue blkg list and counter · 03aa264a
      Tejun Heo committed
      With the previous patch to move blkg list heads and counters to
      request_queue and blkg, the logic to manage them in both policies is
      almost identical and can be moved to blkcg core.
      
      This patch moves blkg link logic into blkg_lookup_create(), implements
      common blkg unlink code in blkg_destroy(), and updates
      blkg_destroy_all() so that it's policy specific and can skip root
      group.  The updated blkg_destroy_all() is now used to both clear queue
      for bypassing and elv switching, and release all blkgs on q exit.
      
      This patch introduces a race window where policy [de]registration may
      race against queue blkg clearing.  This can only be a problem on cfq
      unload and shouldn't be a real problem in practice (and we have many
      other places where this race already exists).  Future patches will
      remove these unlikely races.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      03aa264a
    • T
      blkcg: move per-queue blkg list heads and counters to queue and blkg · 4eef3049
      Tejun Heo committed
      Currently, specific policy implementations are responsible for
      maintaining list and number of blkgs.  This duplicates code
      unnecessarily, and hinders factoring common code and providing blkcg
      API with better defined semantics.
      
      After this patch, request_queue hosts list heads and counters and blkg
      has list nodes for both policies.  This patch only relocates the
      necessary fields and the next patch will actually move management code
      into blkcg core.
      
      Note that request_queue->blkg_list[] and ->nr_blkgs[] are hardcoded to
      have 2 elements.  This is to avoid include dependency and will be
      removed by the next patch.
      
      This patch doesn't introduce any behavior change.
      
      -v2: Now unnecessary conditional on CONFIG_BLK_CGROUP_MODULE removed
           as pointed out by Vivek.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      4eef3049
    • T
      blkcg: clear all request_queues on blkcg policy [un]registrations · 923adde1
      Tejun Heo committed
      Keep track of all request_queues which have blkcg initialized and turn
      on bypass and invoke blkcg_clear_queue() on all before making changes
      to blkcg policies.
      
      This is to prepare for moving blkg management into blkcg core.  Note
      that this uses more brute force than necessary.  Finer grained shoot
      down will be implemented later and given that policy [un]registration
      almost never happens on running systems (blk-throtl can't be built as
      a module and cfq usually is the builtin default iosched), this
      shouldn't be a problem for the time being.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      923adde1
    • T
      block: implement blk_queue_bypass_start/end() · d732580b
      Tejun Heo committed
      Rename and extend elv_quiesce_start/end() to
      blk_queue_bypass_start/end() which are exported and support nesting
      via @q->bypass_depth.  Also add blk_queue_bypass() to test bypass
      state.
      
      This will be further extended and used for blkio_group management.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d732580b
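Nesting through a depth counter, as this commit describes for `@q->bypass_depth`, could look roughly like the userspace sketch below. The `queue_model` type and function names are hypothetical, locking is omitted, and the draining a real queue needs on the first start is not modelled.

```c
#include <stdbool.h>

/* Hypothetical model of a queue with a nestable bypass state. */
struct queue_model {
	int bypass_depth;  /* number of outstanding bypass_start calls */
	bool bypass;       /* true while any caller wants bypass */
};

/* Only the outermost start flips the queue into bypass mode. */
static void queue_bypass_start_model(struct queue_model *q)
{
	if (q->bypass_depth++ == 0)
		q->bypass = true;
}

/* Only the matching outermost end leaves bypass mode. */
static void queue_bypass_end_model(struct queue_model *q)
{
	if (--q->bypass_depth == 0)
		q->bypass = false;
}

/* Test helper in the spirit of blk_queue_bypass(). */
static bool queue_bypassed_model(const struct queue_model *q)
{
	return q->bypass;
}
```

Two independent users can then bracket their critical sections without knowing about each other; the queue stays bypassed until both have called the end function.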
    • T
      elevator: make elevator_init_fn() return 0/-errno · b2fab5ac
      Tejun Heo committed
      elevator_ops->elevator_init_fn() has a weird return value.  It returns
      a void * which the caller should assign to q->elevator->elevator_data
      and %NULL return denotes init failure.
      
      Update such that it returns integer 0/-errno and sets elevator_data
      directly as necessary.
      
      This makes the interface more conventional and eases further cleanup.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b2fab5ac
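The new calling convention can be sketched in userspace as follows. `elevator_queue_model`, `sched_init_model()` and `elevator_setup_model()` are hypothetical stand-ins, and `calloc()` replaces whatever allocation a real scheduler performs.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the object that carries elevator_data. */
struct elevator_queue_model {
	void *elevator_data;
};

/* New style: return 0 or -errno and store the private data directly. */
static int sched_init_model(struct elevator_queue_model *eq, size_t data_size)
{
	void *data = calloc(1, data_size);

	if (!data)
		return -ENOMEM;
	eq->elevator_data = data;
	return 0;
}

/* The caller no longer assigns a returned pointer; it just propagates the error code. */
static int elevator_setup_model(struct elevator_queue_model *eq)
{
	return sched_init_model(eq, 64);
}
```

Returning an error code instead of a pointer lets the init function report why it failed, which a bare NULL could not express.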
  2. 02 Mar, 2012 2 commits
  3. 15 Feb, 2012 2 commits
    • T
      block: exit_io_context() should call elevator_exit_icq_fn() · 621032ad
      Tejun Heo committed
      While updating locking, b2efa052 "block, cfq: unlink
      cfq_io_context's immediately" moved elevator_exit_icq_fn() invocation
      from exit_io_context() to the final ioc put.  While this doesn't cause
      catastrophic failure, it effectively removes task exit notification to
      the elevator and causes noticeable IO performance degradation with CFQ.
      
      On task exit, CFQ used to immediately expire the slice if it was being
      used by the exiting task as no more IO would be issued by the task;
      however, after b2efa052, the notification is lost and disk could sit
      idle needlessly, leading to noticeable IO performance degradation for
      certain workloads.
      
      This patch renames ioc_exit_icq() to ioc_destroy_icq(), separates
      elevator_exit_icq_fn() invocation into ioc_exit_icq() and invokes it
      from exit_io_context().  ICQ_EXITED flag is added to avoid invoking
      the callback more than once for the same icq.
      
      Walking icq_list from ioc side and invoking elevator callback requires
      reverse double locking.  This may be better implemented using RCU;
      unfortunately, using RCU isn't trivial.  e.g. RCU protection would
      need to cover request_queue and queue_lock switch on cleanup makes
      grabbing queue_lock from RCU unsafe.  Reverse double locking should
      do, at least for now.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-bisected-by: Shaohua Li <shli@kernel.org>
      LKML-Reference: <CANejiEVzs=pUhQSTvUppkDcc2TNZyfohBRLygW5zFmXyk5A-xQ@mail.gmail.com>
      Tested-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      621032ad
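The role of the ICQ_EXITED flag can be illustrated with a small userspace model. All names ending in `_model` are hypothetical, the flag's bit position is an arbitrary choice for the sketch, and the locking discussed in the commit message is left out.

```c
/* Hypothetical flag bit; the value is chosen only for this sketch. */
#define ICQ_MODEL_EXITED (1u << 2)

struct icq_model {
	unsigned int flags;
	int exit_calls;  /* counts how often the callback ran */
};

/* Stand-in for the elevator's exit callback. */
static void elevator_exit_icq_model(struct icq_model *icq)
{
	icq->exit_calls++;
}

/* May be reached from more than one path; the flag makes it idempotent. */
static void ioc_exit_icq_model(struct icq_model *icq)
{
	if (icq->flags & ICQ_MODEL_EXITED)
		return;
	elevator_exit_icq_model(icq);
	icq->flags |= ICQ_MODEL_EXITED;
}
```

Calling `ioc_exit_icq_model()` at task exit and again during later teardown runs the callback only once.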
    • T
      block: replace icq->changed with icq->flags · d705ae6b
      Tejun Heo committed
      icq->changed was used for ICQ_*_CHANGED bits.  Rename it to flags and
      access it under ioc->lock instead of using atomic bitops.
      ioc_get_changed() is added so that the changed part can be fetched and
      cleared as before.
      
      icq->flags will be used to carry other flags.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Tested-by: Shaohua Li <shaohua.li@intel.com>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      d705ae6b
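A fetch-and-clear of the changed bits, in the spirit of ioc_get_changed(), might look like the userspace sketch below. The `_model` names and bit values are hypothetical, and the sketch assumes the caller already holds the lock that the commit message says protects the flags.

```c
/* Hypothetical flag layout for the sketch. */
#define ICQ_MODEL_IOPRIO_CHANGED (1u << 0)
#define ICQ_MODEL_CGROUP_CHANGED (1u << 1)
#define ICQ_MODEL_OTHER_FLAG     (1u << 2)
#define ICQ_MODEL_CHANGED_MASK \
	(ICQ_MODEL_IOPRIO_CHANGED | ICQ_MODEL_CGROUP_CHANGED)

struct icq_flags_model {
	unsigned int flags;
};

/*
 * Return the pending "changed" bits and clear them, leaving unrelated
 * flags untouched. The caller is assumed to hold the protecting lock,
 * so plain (non-atomic) operations are enough.
 */
static unsigned int icq_get_changed_model(struct icq_flags_model *icq)
{
	unsigned int changed = icq->flags & ICQ_MODEL_CHANGED_MASK;

	icq->flags &= ~ICQ_MODEL_CHANGED_MASK;
	return changed;
}
```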
  4. 14 Feb, 2012 4 commits
    • S
      mmc: dw_mmc: Fix PIO mode with support of highmem · f9c2a0dc
      Seungwon Jeon committed
      The current PIO mode crashes the kernel when CONFIG_HIGHMEM is
      enabled, because sg_virt(sg) returns NULL for highmem pages.
      This patch fixes the following problem.
      
      Unable to handle kernel NULL pointer dereference at virtual address 00000000
      pgd = c0004000
      [00000000] *pgd=00000000
      Internal error: Oops: 817 [#1] PREEMPT SMP
      Modules linked in:
      CPU: 0    Not tainted  (3.0.15-01423-gdbf465f #589)
      PC is at dw_mci_pull_data32+0x4c/0x9c
      LR is at dw_mci_read_data_pio+0x54/0x1f0
      pc : [<c0358824>]    lr : [<c035988c>]    psr: 20000193
      sp : c0619d48  ip : c0619d70  fp : c0619d6c
      r10: 00000000  r9 : 00000002  r8 : 00001000
      r7 : 00000200  r6 : 00000000  r5 : e1dd3100  r4 : 00000000
      r3 : 65622023  r2 : 0000007f  r1 : eeb96000  r0 : e1dd3100
      Flags: nzCv  IRQs off  FIQs on  Mode SVC_32  ISA ARM  Segment kernel
      Control: 10c5387d  Table: 61e2004a  DAC: 00000015
      Process swapper (pid: 0, stack limit = 0xc06182f0)
      Stack: (0xc0619d48 to 0xc061a000)
      9d40:                   e1dd3100 e1a4f000 00000000 e1dd3100 e1a4f000 00000200
      9d60: c0619da4 c0619d70 c035988c c03587e4 c0619d9c e18158f4 e1dd3100 e1dd3100
      9d80: 00000020 00000000 00000000 00000020 c06e8a84 00000000 c0619e04 c0619da8
      9da0: c0359b24 c0359844 e18158f4 e1dd3164 e1dd3168 e1dd3150 3d02fc79 e1dd3154
      9dc0: e1dd3178 00000000 00000020 00000000 e1dd3150 00000000 c10dd7e8 e1a84900
      9de0: c061e7cc 00000000 00000000 0000008d c06e8a84 c061e780 c0619e4c c0619e08
      9e00: c00c4738 c0359a34 3d02fc79 00000000 c0619e4c c05a1698 c05a1670 c05a165c
      9e20: c04de8b0 c061e780 c061e7cc e1a84900 ffffed68 0000008d c0618000 00000000
      9e40: c0619e6c c0619e50 c00c48b4 c00c46c8 c061e780 c00423ac c061e7cc ffffed68
      9e60: c0619e8c c0619e70 c00c7358 c00c487c 0000008d ffffee38 c0618000 ffffed68
      9e80: c0619ea4 c0619e90 c00c4258 c00c72b0 c00423ac ffffee38 c0619ecc c0619ea8
      9ea0: c004241c c00c4234 ffffffff f8810000 0000006d 00000002 00000001 7fffffff
      9ec0: c0619f44 c0619ed0 c0048bc0 c00423c4 220ae7a9 00000000 386f0d30 0005d3a4
      9ee0: c00423ac c10dd0b8 c06f2cd8 c0618000 c0594778 c003a674 7fffffff c0619f44
      9f00: 386f0d30 c0619f18 c00a6f94 c005be3c 80000013 ffffffff 386f0d30 0005d3a4
      9f20: 386f0d30 0005d2d1 c10dd0a8 c10dd0b8 c06f2cd8 c0618000 c0619f74 c0619f48
      9f40: c0345858 c005be00 c00a2440 c0618000 c0618000 c00410d8 c06c1944 c00410fc
      9f60: c0594778 c003a674 c0619f9c c0619f78 c004a7e8 c03457b4 c0618000 c06c18f8
      9f80: 00000000 c0039c70 c06c18d4 c003a674 c0619fb4 c0619fa0 c04ceafc c004a714
      9fa0: c06287b4 c06c18f8 c0619ff4 c0619fb8 c0008b68 c04cea68 c0008578 00000000
      9fc0: 00000000 c003a674 00000000 10c5387d c0628658 c003aa78 c062f1c4 4000406a
      9fe0: 413fc090 00000000 00000000 c0619ff8 40008044 c0008858 00000000 00000000
      Backtrace:
      [<c03587d8>] (dw_mci_pull_data32+0x0/0x9c) from [<c035988c>] (dw_mci_read_data_pio+0x54/0x1f0)
       r6:00000200 r5:e1a4f000 r4:e1dd3100
       [<c0359838>] (dw_mci_read_data_pio+0x0/0x1f0) from [<c0359b24>] (dw_mci_interrupt+0xfc/0x4a4)
      [<c0359a28>] (dw_mci_interrupt+0x0/0x4a4) from [<c00c4738>] (handle_irq_event_percpu+0x7c/0x1b4)
      [<c00c46bc>] (handle_irq_event_percpu+0x0/0x1b4) from [<c00c48b4>] (handle_irq_event+0x44/0x64)
      [<c00c4870>] (handle_irq_event+0x0/0x64) from [<c00c7358>] (handle_fasteoi_irq+0xb4/0x124)
       r7:ffffed68 r6:c061e7cc r5:c00423ac r4:c061e780
       [<c00c72a4>] (handle_fasteoi_irq+0x0/0x124) from [<c00c4258>] (generic_handle_irq+0x30/0x38)
       r7:ffffed68 r6:c0618000 r5:ffffee38 r4:0000008d
       [<c00c4228>] (generic_handle_irq+0x0/0x38) from [<c004241c>] (asm_do_IRQ+0x64/0xe0)
       r5:ffffee38 r4:c00423ac
       [<c00423b8>] (asm_do_IRQ+0x0/0xe0) from [<c0048bc0>] (__irq_svc+0x80/0x14c)
      Exception stack(0xc0619ed0 to 0xc0619f18)
      Signed-off-by: Seungwon Jeon <tgih.jun@samsung.com>
      Acked-by: Will Newton <will.newton@imgtec.com>
      Cc: stable <stable@vger.kernel.org>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      f9c2a0dc
    • G
      mmc: core: Fix PowerOff Notify suspend/resume · 3e73c36b
      Girish K S committed
      Modified the mmc_poweroff to resume before sending the poweroff
      notification command. In sleep mode only AWAKE and RESET commands are
      allowed, so before sending the poweroff notification command resume from
      sleep mode and then send the notification command.
      
      PowerOff Notify is tested on a Synopsys DesignWare Host Controller
      (eMMC 4.5). Suspend to RAM and resume work fine.
      Signed-off-by: Girish K S <girish.shivananjappa@linaro.org>
      Tested-by: Girish K S <girish.shivananjappa@linaro.org>
      Reviewed-by: Saugata Das <saugata.das@linaro.org>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      3e73c36b
    • J
      mmc: core: add the capability for broken voltage · 6e8201f5
      Jaehoon Chung committed
      There is an understood mismatch between the voltage the host controller is
      set to and the voltage supplied to the card by a fixed voltage regulator.
      Teaching the driver to accept the mismatch is overly complicated.  Instead
      just accept the regulator's voltage.
      
      This patch adds MMC_CAP2_BROKEN_VOLTAGE.
      
      If the current voltage is not between min_uV and max_uV, core.c
      tries to change it, which may go through
      regulator_set_voltage().

      regulator_set_voltage() performs the check below.
      
      	/* sanity check */
      	if (!rdev->desc->ops->set_voltage &&
      	    !rdev->desc->ops->set_voltage_sel) {
      		ret = -EINVAL;
      		goto out;
      	}
      
      On a board that uses a fixed regulator, this check always returns
      -EINVAL, so the eMMC never initializes.

      So when a fixed regulator is used, MMC_CAP2_BROKEN_VOLTAGE needs to be set.
      Signed-off-by: Jaehoon Chung <jh80.chung@samsung.com>
      Signed-off-by: Kyungmin Park <kyungmin.park@samsung.com>
      Acked-by: Adrian Hunter <adrian.hunter@intel.com>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      6e8201f5
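The effect of the capability can be modelled in userspace as below. This is only a sketch: `regulator_model`, `regulator_model_set()` and `set_card_voltage_model()` are hypothetical, and treating the flag as "accept whatever the regulator currently supplies" is an interpretation of the commit message, not a copy of the driver code.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical regulator: a fixed one cannot change its output. */
struct regulator_model {
	int uV;
	bool can_set;
};

/* Mirrors the sanity check quoted above: no set operation means -EINVAL. */
static int regulator_model_set(struct regulator_model *r, int min_uV, int max_uV)
{
	(void)max_uV;
	if (!r->can_set)
		return -EINVAL;
	r->uV = min_uV;
	return 0;
}

/* With broken_voltage set, the requested window collapses to the current voltage. */
static int set_card_voltage_model(struct regulator_model *r, bool broken_voltage,
				  int min_uV, int max_uV)
{
	int cur = r->uV;

	if (broken_voltage)
		min_uV = max_uV = cur;
	if (cur < min_uV || cur > max_uV)
		return regulator_model_set(r, min_uV, max_uV);
	return 0;
}
```

For a fixed 1.8 V regulator and a requested 2.7 V to 3.6 V window, the call fails with -EINVAL unless the flag is set.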
    • S
      mmc: core: Ensure clocks are always enabled before host interaction · 2c4967f7
      Sujit Reddy Thumma committed
      Ensure clocks are always enabled before any interaction with the
      host controller driver. This makes sure that there is no race
      between host execution and the core layer turning off clocks
      in different context with clock gating framework.
      Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
      Acked-by: Linus Walleij <linus.walleij@linaro.org>
      Acked-by: Per Forlin <per.forlin@stericsson.com>
      Signed-off-by: Chris Ball <cjb@laptop.org>
      2c4967f7
  5. 09 Feb, 2012 1 commit
  6. 08 Feb, 2012 2 commits
    • T
      block: don't call elevator callbacks for plug merges · 07c2bd37
      Tejun Heo committed
      Plug merge calls two elevator callbacks outside queue lock -
      elevator_allow_merge_fn() and elevator_bio_merged_fn().  Although
      attempt_plug_merge() suggests that elevator is guaranteed to be there
      through the existing request on the plug list, nothing prevents plug
      merge from calling into dying or initializing elevator.
      
      For regular merges, bypass ensures elvpriv count to reach zero, which
      in turn prevents merges as all !ELVPRIV requests get REQ_SOFTBARRIER
      from forced back insertion.  Plug merge doesn't check ELVPRIV, and, as
      the requests haven't gone through elevator insertion yet, it doesn't
      have SOFTBARRIER set allowing merges on a bypassed queue.
      
      This, for example, leads to the following crash during elevator
      switch.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
       IP: [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
       PGD 112cbc067 PUD 115d5c067 PMD 0
       Oops: 0000 [#1] PREEMPT SMP
       CPU 1
       Modules linked in: deadline_iosched
      
       Pid: 819, comm: dd Not tainted 3.3.0-rc2-work+ #76 Bochs Bochs
       RIP: 0010:[<ffffffff813b34e9>]  [<ffffffff813b34e9>] cfq_allow_merge+0x49/0xa0
       RSP: 0018:ffff8801143a38f8  EFLAGS: 00010297
       RAX: 0000000000000000 RBX: ffff88011817ce28 RCX: ffff880116eb6cc0
       RDX: 0000000000000000 RSI: ffff880118056e20 RDI: ffff8801199512f8
       RBP: ffff8801143a3908 R08: 0000000000000000 R09: 0000000000000000
       R10: 0000000000000001 R11: 0000000000000000 R12: ffff880118195708
       R13: ffff880118052aa0 R14: ffff8801143a3d50 R15: ffff880118195708
       FS:  00007f19f82cb700(0000) GS:ffff88011fc80000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
       CR2: 0000000000000008 CR3: 0000000112c6a000 CR4: 00000000000006e0
       DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
       DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
       Process dd (pid: 819, threadinfo ffff8801143a2000, task ffff880116eb6cc0)
       Stack:
        ffff88011817ce28 ffff880118195708 ffff8801143a3928 ffffffff81391bba
        ffff88011817ce28 ffff880118195708 ffff8801143a3948 ffffffff81391bf1
        ffff88011817ce28 0000000000000000 ffff8801143a39a8 ffffffff81398e3e
       Call Trace:
        [<ffffffff81391bba>] elv_rq_merge_ok+0x4a/0x60
        [<ffffffff81391bf1>] elv_try_merge+0x21/0x40
        [<ffffffff81398e3e>] blk_queue_bio+0x8e/0x390
        [<ffffffff81396a5a>] generic_make_request+0xca/0x100
        [<ffffffff81396b04>] submit_bio+0x74/0x100
        [<ffffffff811d45c2>] __blockdev_direct_IO+0x1ce2/0x3450
        [<ffffffff811d0dc7>] blkdev_direct_IO+0x57/0x60
        [<ffffffff811460b5>] generic_file_aio_read+0x6d5/0x760
        [<ffffffff811986b2>] do_sync_read+0xe2/0x120
        [<ffffffff81199345>] vfs_read+0xc5/0x180
        [<ffffffff81199501>] sys_read+0x51/0x90
        [<ffffffff81aeac12>] system_call_fastpath+0x16/0x1b
      
      There are multiple ways to fix this including making plug merge check
      ELVPRIV; however,
      
      * Calling into elevator outside queue lock is confusing and
        error-prone.
      
      * Requests on plug list aren't known to the elevator.  They aren't on
        the elevator yet, so there's no elevator specific state to update.
      
      * Given the nature of plug merges - collecting bio's for the same
        purpose from the same issuer - elevator specific restrictions aren't
        applicable.
      
      So, simply don't call into elevator methods from plug merge by moving
      elv_bio_merged() from bio_attempt_*_merge() to blk_queue_bio(), and
      using blk_try_merge() in attempt_plug_merge().
      
      This is based on Jens' patch to skip elevator_allow_merge_fn() from
      plug merge.
      
      Note that this makes per-cgroup merged stats skip plug merging.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <4F16F3CA.90904@kernel.dk>
      Original-patch-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      07c2bd37
    • T
      block: separate out blk_rq_merge_ok() and blk_try_merge() from elevator functions · 050c8ea8
      Tejun Heo committed
      blk_rq_merge_ok() is the elevator-neutral part of merge eligibility
      test.  blk_try_merge() determines merge direction and expects the
      caller to have tested elv_rq_merge_ok() previously.
      
      elv_rq_merge_ok() now wraps blk_rq_merge_ok() and then calls
      elv_iosched_allow_merge().  elv_try_merge() is removed and the two
      callers are updated to call elv_rq_merge_ok() explicitly followed by
      blk_try_merge().  While at it, make rq_merge_ok() functions return
      bool.
      
      This is to prepare for plug merge update and doesn't introduce any
      behavior change.
      
      This is based on Jens' patch to skip elevator_allow_merge_fn() from
      plug merge.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <4F16F3CA.90904@kernel.dk>
      Original-patch-by: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      050c8ea8
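The two-step split (eligibility first, direction second) can be sketched in userspace like this. The `_model` types and the specific eligibility criteria (same device, same data direction) are simplifying assumptions; the real eligibility test checks more than that.

```c
#include <stdbool.h>

enum merge_dir_model { MODEL_NO_MERGE, MODEL_FRONT_MERGE, MODEL_BACK_MERGE };

/* Hypothetical, heavily reduced request and bio descriptions. */
struct rq_model {
	unsigned long long pos;   /* first sector of the request */
	unsigned int sectors;     /* length in sectors */
	int dev;
	bool is_write;
};

struct bio_model {
	unsigned long long sector;
	unsigned int sectors;
	int dev;
	bool is_write;
};

/* Elevator-neutral eligibility: same device and same data direction. */
static bool rq_merge_ok_model(const struct rq_model *rq, const struct bio_model *bio)
{
	return rq->dev == bio->dev && rq->is_write == bio->is_write;
}

/* Direction only; the caller is expected to have checked eligibility first. */
static enum merge_dir_model try_merge_model(const struct rq_model *rq,
					    const struct bio_model *bio)
{
	if (rq->pos + rq->sectors == bio->sector)
		return MODEL_BACK_MERGE;   /* bio starts where the request ends */
	if (bio->sector + bio->sectors == rq->pos)
		return MODEL_FRONT_MERGE;  /* bio ends where the request starts */
	return MODEL_NO_MERGE;
}
```

A caller that wants scheduler-specific veto power would insert its own check between the two steps; a caller that does not can simply call them back to back.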
  7. 07 Feb, 2012 2 commits
    • T
      block: strip out locking optimization in put_io_context() · 11a3122f
      Tejun Heo committed
      put_io_context() performed a complex trylock dancing to avoid
      deferring ioc release to workqueue.  It was also broken on UP because
      trylock was always assumed to succeed which resulted in unbalanced
      preemption count.
      
      While there are ways to fix the UP breakage, even the most
      pathological microbench (forced ioc allocation and tight fork/exit
      loop) fails to show any appreciable performance benefit of the
      optimization.  Strip it out.  If there turn out to be workloads which
      are affected by this change, a simpler optimization from the discussion
      thread can be applied later.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      LKML-Reference: <1328514611.21268.66.camel@sli10-conroe>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      11a3122f
    • H
      exec: fix use-after-free bug in setup_new_exec() · 96e02d15
      Heiko Carstens committed
      Setting the task name is done within setup_new_exec() by accessing
      bprm->filename. However this happens after flush_old_exec().
      This may result in a use after free bug, flush_old_exec() may
      "complete" vfork_done, which will wake up the parent which in turn
      may free the passed in filename.
      To fix this add a new tcomm field in struct linux_binprm which
      contains the now early generated task name until it is used.
      
      Fixes this bug on s390:
      
        Unable to handle kernel pointer dereference at virtual kernel address 0000000039768000
        Process kworker/u:3 (pid: 245, task: 000000003a3dc840, ksp: 0000000039453818)
        Krnl PSW : 0704000180000000 0000000000282e94 (setup_new_exec+0xa0/0x374)
        Call Trace:
        ([<0000000000282e2c>] setup_new_exec+0x38/0x374)
         [<00000000002dd12e>] load_elf_binary+0x402/0x1bf4
         [<0000000000280a42>] search_binary_handler+0x38e/0x5bc
         [<0000000000282b6c>] do_execve_common+0x410/0x514
         [<0000000000282cb6>] do_execve+0x46/0x58
         [<00000000005bce58>] kernel_execve+0x28/0x70
         [<000000000014ba2e>] ____call_usermodehelper+0x102/0x140
         [<00000000005bc8da>] kernel_thread_starter+0x6/0xc
         [<00000000005bc8d4>] kernel_thread_starter+0x0/0xc
        Last Breaking-Event-Address:
         [<00000000002830f0>] setup_new_exec+0x2fc/0x374
      
        Kernel panic - not syncing: Fatal exception: panic_on_oops
      Reported-by: Sebastian Ott <sebott@linux.vnet.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96e02d15
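The idea of the fix, copying the task name out of the filename before the filename can be freed, can be shown with a userspace sketch. `binprm_model` and `binprm_save_tcomm_model()` are hypothetical; only the `tcomm` field name comes from the commit message, and the 16-byte buffer mirrors the kernel's TASK_COMM_LEN.

```c
#include <string.h>

#define TCOMM_LEN_MODEL 16

/* Hypothetical reduction of the exec parameter block. */
struct binprm_model {
	const char *filename;         /* owned by someone else, may go away */
	char tcomm[TCOMM_LEN_MODEL];  /* private copy of the task name */
};

/* Copy the basename early, while filename is still guaranteed valid. */
static void binprm_save_tcomm_model(struct binprm_model *bprm)
{
	const char *base = strrchr(bprm->filename, '/');
	size_t i;

	base = base ? base + 1 : bprm->filename;
	for (i = 0; i + 1 < sizeof(bprm->tcomm) && base[i]; i++)
		bprm->tcomm[i] = base[i];
	bprm->tcomm[i] = '\0';
}
```

Later code reads `tcomm` only, so it no longer matters whether the buffer behind `filename` has been released in the meantime.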
  8. 05 Feb, 2012 1 commit
    • V
      PM / QoS: CPU C-state breakage with PM Qos change · d020283d
      Venkatesh Pallipadi committed
      Looks like change "PM QoS: Move and rename the implementation files"
      merged during the 3.2 development cycle made PM QoS depend on
      CONFIG_PM which depends on (PM_SLEEP || PM_RUNTIME).
      
      That breaks CPU C-states with kernels not having these CONFIGs, causing CPUs
      to spend time in Polling loop idle instead of going into deep C-states,
      consuming way way more power. This is with either acpi idle or intel idle
      enabled.
      
      Either CONFIG_PM should be enabled with any pm_qos users or
      the !CONFIG_PM pm_qos_request() should return sane defaults not to break
      the existing users. Here is the patch for the latter option.
      
      [rjw: Modified the changelog slightly.]
      Signed-off-by: Venkatesh Pallipadi <venki@google.com>
      Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
      Cc: stable@vger.kernel.org
      d020283d
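Why the stub's return value matters can be seen in a small userspace model. Everything here is hypothetical: the class constant, the "no constraint" value, and the idle-state picker are illustrative, and the assumption that a latency limit of 0 is what pins CPUs in the polling state is an interpretation of the commit message.

```c
#include <stddef.h>

#define QOS_MODEL_CPU_DMA_LATENCY 1
#define QOS_MODEL_NO_LIMIT_USEC   2000000000  /* illustrative "no constraint" value */

/* Stub for builds without the real implementation: return sane defaults. */
static int qos_request_stub_model(int qos_class)
{
	switch (qos_class) {
	case QOS_MODEL_CPU_DMA_LATENCY:
		return QOS_MODEL_NO_LIMIT_USEC;
	default:
		return -1;  /* illustrative "no value" default */
	}
}

/*
 * Pick the deepest idle state whose exit latency fits the limit.
 * States are ordered shallow to deep; state 0 is the polling loop.
 */
static size_t pick_idle_state_model(const int *exit_latency_us, size_t n, int limit_us)
{
	size_t best = 0;

	for (size_t i = 1; i < n; i++)
		if (exit_latency_us[i] <= limit_us)
			best = i;
	return best;
}
```

With a limit of 0 the picker never leaves state 0, while the stub's default allows the deepest state.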
  9. 04 Feb, 2012 1 commit
  10. 03 Feb, 2012 4 commits
  11. 02 Feb, 2012 5 commits
  12. 01 Feb, 2012 2 commits
  13. 30 Jan, 2012 2 commits
  14. 27 Jan, 2012 1 commit
    • S
      perf: Fix broken interrupt rate throttling · e050e3f0
      Stephane Eranian committed
      This patch fixes the sampling interrupt throttling mechanism.
      
      It was broken in v3.2. Events were not being unthrottled. The
      unthrottling mechanism required that events be checked at each
      timer tick.
      
      This patch solves this problem and also separates:
      
        - unthrottling
        - multiplexing
        - frequency-mode period adjustments
      
      Not all of them need to be executed at each timer tick.
      
      This third version of the patch is based on my original patch +
      PeterZ proposal (https://lkml.org/lkml/2012/1/7/87).
      
      At each timer tick, for each context:
      
        - if the current CPU has throttled events, we unthrottle events
      
        - if context has frequency-based events, we adjust sampling periods
      
        - if we have reached the jiffies interval, we multiplex (rotate)
      
      We decoupled rotation (multiplexing) from frequency-mode sampling
      period adjustments.  They should not necessarily happen at the same
      rate. Multiplexing is subject to jiffies_interval (currently at 1
      but could be higher once the tunable is exposed via sysfs).
      
      We have grouped frequency-mode adjustment and unthrottling into the
      same routine to minimize code duplication. When throttled while in
      frequency mode, we scan the events only once.
      
      We have fixed the threshold enforcement code in __perf_event_overflow().
      There was a bug whereby it would allow more than the authorized rate
      because an increment of hwc->interrupts was not executed at the right
      place.
      
      The patch was tested with low sampling limit (2000) and fixed periods,
      frequency mode, overcommitted PMU.
      
      On a 2.1GHz AMD CPU:
      
       $ cat /proc/sys/kernel/perf_event_max_sample_rate
       2000
      
      We set a rate of 3000 samples/sec (2.1GHz/3000 = 700000):
      
       $ perf record -e cycles,cycles -c 700000  noploop 10
       $ perf report -D | tail -21
      
       Aggregated stats:
                 TOTAL events:      80086
                  MMAP events:         88
                  COMM events:          2
                  EXIT events:          4
              THROTTLE events:      19996
            UNTHROTTLE events:      19996
                SAMPLE events:      40000
      
       cycles stats:
                 TOTAL events:      40006
                  MMAP events:          5
                  COMM events:          1
                  EXIT events:          4
              THROTTLE events:       9998
            UNTHROTTLE events:       9998
                SAMPLE events:      20000
      
       cycles stats:
                 TOTAL events:      39996
              THROTTLE events:       9998
            UNTHROTTLE events:       9998
                SAMPLE events:      20000
      
      For 10s, the cap is 2x2000x10 = 40000 samples.
      We get exactly that: 20000 samples/event.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: <stable@kernel.org> # v3.2+
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Link: http://lkml.kernel.org/r/20120126160319.GA5655@quad
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      e050e3f0
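The numbers in this commit message can be reproduced with a short userspace sketch. The arithmetic helpers follow the message directly; the per-tick simulation is an idealized model that assumes 1000 timer ticks per second, evenly spaced overflows, and unthrottling at every tick. All function and type names are hypothetical.

```c
/* Sampling period (in cycles) that yields the requested rate on a given clock. */
static unsigned long long period_for_rate_model(unsigned long long cpu_hz,
						unsigned long long samples_per_sec)
{
	return cpu_hz / samples_per_sec;
}

/* Upper bound on samples: events x allowed rate x duration. */
static unsigned long long sample_cap_model(unsigned int events,
					   unsigned long long max_rate,
					   unsigned int seconds)
{
	return (unsigned long long)events * max_rate * seconds;
}

struct throttle_result_model {
	unsigned long samples;
	unsigned long throttles;
};

/* One event: count accepted samples when each tick allows max_per_tick of them. */
static struct throttle_result_model
simulate_throttle_model(unsigned int ticks, unsigned int overflows_per_tick,
			unsigned int max_per_tick)
{
	struct throttle_result_model r = { 0, 0 };

	for (unsigned int t = 0; t < ticks; t++) {
		unsigned int interrupts = 0;  /* reset models the per-tick unthrottle */

		for (unsigned int i = 0; i < overflows_per_tick; i++) {
			if (interrupts >= max_per_tick) {
				r.throttles++;  /* event stays off until the next tick */
				break;
			}
			interrupts++;
			r.samples++;
		}
	}
	return r;
}
```

For the run above, `period_for_rate_model(2100000000ULL, 3000)` gives 700000 and `sample_cap_model(2, 2000, 10)` gives 40000. Simulating 10000 ticks with 3 overflows per tick and a limit of 2 per tick accepts 20000 samples per event, in line with the reported totals; the idealized model throttles on every tick, slightly more often than the 9998 observed.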
  15. 25 Jan, 2012 3 commits
    • J
      team: send only changed options/ports via netlink · b82b9183
      Jiri Pirko committed
      This patch changes event message behaviour to send only updated records
      instead of the whole list. This fixes a bug in which userspace receives
      stale data when multiple events occur in a row.
      Signed-off-by: Jiri Pirko <jpirko@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      b82b9183
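Sending only updated records can be modelled with a per-record "changed" flag, as in the userspace sketch below. The `team_option_model` type and both helpers are hypothetical, and building the actual netlink message is left out; the sketch only shows which records would be selected.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical option record with a dirty flag. */
struct team_option_model {
	const char *name;
	int value;
	bool changed;
};

/* Every update marks the record so that the next event picks it up. */
static void option_set_model(struct team_option_model *opt, int value)
{
	opt->value = value;
	opt->changed = true;
}

/*
 * Gather the records to put into the next event message and clear
 * their flags. Returns the number of entries written to out[].
 */
static size_t collect_changed_model(struct team_option_model *opts, size_t n,
				    const struct team_option_model **out)
{
	size_t count = 0;

	for (size_t i = 0; i < n; i++) {
		if (!opts[i].changed)
			continue;
		out[count++] = &opts[i];
		opts[i].changed = false;
	}
	return count;
}
```

Because each event carries only the records modified since the previous one, a receiver never has to guess which entries of a full dump are current.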
    • R
      kernel-doc: fix new warning in usb.h · 0fcd9778
      Randy Dunlap committed
      Fix new kernel-doc warning:
      
      Warning(include/linux/usb.h:1251): No description found for parameter 'num_mapped_sgs'
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      0fcd9778
    • R
      kernel-doc: fix new warnings in device.h · 1a5e29fc
      Randy Dunlap committed
      Fix new kernel-doc warnings:
      
      Warning(include/linux/device.h:299): No description found for parameter 'name'
      Warning(include/linux/device.h:299): No description found for parameter 'subsys'
      Warning(include/linux/device.h:299): No description found for parameter 'node'
      Warning(include/linux/device.h:299): No description found for parameter 'add_dev'
      Warning(include/linux/device.h:299): No description found for parameter 'remove_dev'
      Warning(include/linux/device.h:685): No description found for parameter 'id'
      Warning(include/linux/device.h:1009): No description found for parameter '__driver'
      Warning(include/linux/device.h:1009): No description found for parameter '__register'
      Warning(include/linux/device.h:1009): No description found for parameter '__unregister'
      Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Lars-Peter Clausen <lars@metafoo.de>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
      1a5e29fc