1. 10 6月, 2014 1 次提交
  2. 09 6月, 2014 1 次提交
    • V
      cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' · c8ae481b
      Viresh Kumar 提交于
      'copy_prev_load' was recently added by commit: 18b46abd (cpufreq: governor: Be
      friendly towards latency-sensitive bursty workloads).
      
      It actually is a bit redundant as we also have 'prev_load' which can store any
      integer value and can be used instead of 'copy_prev_load' by setting it zero.
      
      True load can also turn out to be zero during long idle intervals (and hence the
      actual value of 'prev_load' and the overloaded value can clash). However this is
      not a problem because, if the true load was really zero in the previous
      interval, it makes sense to evaluate the load afresh for the current interval
      rather than copying the previous load.
      
      So, drop 'copy_prev_load' and use 'prev_load' instead.
      
      Update comments as well to make it more clear.
      
      There is another change here which was probably missed by Srivatsa during the
      last version of updates he made. The unlikely in the 'if' statement was covering
      only half of the condition and the whole line should actually come under it.
      
      Also checkpatch is made more silent as it was reporting this (--strict option):
      
      CHECK: Alignment should match open parenthesis
      +		if (unlikely(wall_time > (2 * sampling_rate) &&
      +						j_cdbs->prev_load)) {
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: NPavel Machek <pavel@ucw.cz>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      c8ae481b
  3. 08 6月, 2014 1 次提交
    • S
      cpufreq: governor: Be friendly towards latency-sensitive bursty workloads · 18b46abd
      Srivatsa S. Bhat 提交于
      Cpufreq governors like the ondemand governor calculate the load on the CPU
      periodically by employing deferrable timers. A deferrable timer won't fire
      if the CPU is completely idle (and there are no other timers to be run), in
      order to avoid unnecessary wakeups and thus save CPU power.
      
      However, the load calculation logic is agnostic to all this, and this can
      lead to the problem described below.
      
      Time (ms)               CPU 1
      
      100                Task-A running
      
      110                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      110.5              Task-A running
      
      120		   Governor's timer fires, finds load as 100% in the last
      		   10ms interval and increases the CPU frequency.
      
      125		   Task-A went to sleep. With nothing else to do, CPU 1
      		   went completely idle.
      
      200		   Task-A woke up and started running again.
      
      200.5		   Governor's deferred timer (which was originally programmed
      		   to fire at time 130) fires now. It calculates load for the
      		   time period 120 to 200.5, and finds the load is almost zero.
      		   Hence it decreases the CPU frequency to the minimum.
      
      210		   Governor's timer fires, finds load as 100% in the last
      		   10ms interval and increases the CPU frequency.
      
      So, after the workload woke up and started running, the frequency was suddenly
      dropped to absolute minimum, and after that, there was an unnecessary delay of
      10ms (sampling period) to increase the CPU frequency back to a reasonable value.
      And this pattern repeats for every wake-up-from-cpu-idle for that workload.
      This can be quite undesirable for latency- or response-time sensitive bursty
      workloads. So we need to fix the governor's logic to detect such wake-up-from-
      cpu-idle scenarios and start the workload at a reasonably high CPU frequency.
      
      One extreme solution would be to fake a load of 100% in such scenarios. But
      that might lead to undesirable side-effects such as frequency spikes (which
      might also need voltage changes) especially if the previous frequency happened
      to be very low.
      
      We just want to avoid the stupidity of dropping down the frequency to a minimum
      and then enduring a needless (and long) delay before ramping it up back again.
      So, let us simply carry forward the previous load - that is, let us just pretend
      that the 'load' for the current time-window is the same as the load for the
      previous window. That way, the frequency and voltage will continue to be set
      to whatever values they were set at previously. This means that bursty workloads
      will get a chance to influence the CPU frequency at which they wake up from
      cpu-idle, based on their past execution history. Thus, they might be able to
      avoid suffering from slow wakeups and long response-times.
      
      However, we should take care not to over-do this. For example, such a "copy
      previous load" logic will benefit cases like this: (where # represents busy
      and . represents idle)
      
      ##########.........#########.........###########...........##########........
      
      but it will be detrimental in cases like the one shown below, because it will
      retain the high frequency (copied from the previous interval) even in a mostly
      idle system:
      
      ##########.........#.................#.....................#...............
      
      (i.e., the workload finished and the remaining tasks are such that their busy
      periods are smaller than the sampling interval, which causes the timer to
      always get deferred. So, this will make the copy-previous-load logic copy
      the initial high load to subsequent idle periods over and over again, thus
      keeping the frequency high unnecessarily).
      
      So, we modify this copy-previous-load logic such that it is used only once
      upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
      previous load won't get blindly copied over; cpufreq will freshly evaluate the
      load in the second idle interval, thus ensuring that the system comes back to
      its normal state.
      
      [ The right way to solve this whole problem is to teach the CPU frequency
      governors to also track load on a per-task basis, not just a per-CPU basis,
      and then use both the data sources intelligently to set the appropriate
      frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
      so this patch should make the situation bearable until then. ]
      
      Experimental results:
      +-------------------+
      
      I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
      between its execution such that its total utilization can be a user-defined
      value, say 10% or 20% (higher the utilization specified, lesser the amount of
      sleeps injected). This ebizzy was run with a single-thread, tied to CPU 8.
      
      Behavior observed with tracing (sample taken from 40% utilization runs):
      ------------------------------------------------------------------------
      
      Without patch:
      ~~~~~~~~~~~~~~
      kworker/8:2-12137  416.335742: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.345744: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
            <...>-40753  416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40753  416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.505739: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.515742: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      
      Observation: Ebizzy went idle at 416.402202, and started running again at
      416.502130. But cpufreq noticed the long idle period, and dropped the frequency
      at 416.505739, only to increase it back again at 416.515742, realizing that the
      workload is in-fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency
      for almost 13 milliseconds (almost 1 full sample period), and this pattern
      repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite
      a lot.
      
      With patch:
      ~~~~~~~~~~~
      
      kworker/8:2-29802  464.832535: cpu_frequency: state=2061000 cpu_id=8
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  464.972536: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-29802  464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40738  465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      
      Observation: Ebizzy went idle at 465.035797, and started running again at
      465.240178. Since ebizzy was the only real workload running on this CPU,
      cpufreq retained the frequency at 4.1Ghz throughout the run of ebizzy, no
      matter how many times ebizzy slept and woke-up in-between. Thus, ebizzy
      got the 10ms worth of 4.1 Ghz benefit during every sleep-wakeup (as compared
      to the run without the patch) and this boost gave a modest improvement in total
      throughput, as shown below.
      
      Sleeping-ebizzy records-per-second:
      -----------------------------------
      
      Utilization  Without patch  With patch  Difference (Absolute and % values)
          10%         274767        277046        +  2279 (+0.829%)
          20%         543429        553484        + 10055 (+1.850%)
          40%        1090744       1107959        + 17215 (+1.578%)
          60%        1634908       1662018        + 27110 (+1.658%)
      
      A rudimentary and somewhat approximately latency-sensitive workload such as
      sleeping-ebizzy itself showed a consistent, noticeable performance improvement
      with this patch. Hence, workloads that are truly latency-sensitive will benefit
      quite a bit from this change. Moreover, this is an overall win-win since this
      patch does not hurt power-savings at all (because, this patch does not reduce
      the idle time or idle residency; and the high frequency of the CPU when it goes
      to cpu-idle does not affect/hurt the power-savings of deep idle states).
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: NGautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      18b46abd
  4. 07 6月, 2014 2 次提交
  5. 06 6月, 2014 2 次提交
    • V
      cpufreq: Tegra: implement intermediate frequency callbacks · 00917ddc
      Viresh Kumar 提交于
      Tegra has been switching to intermediate frequency (pll_p_clk) forever.
      CPUFreq core has better support for handling notifications for these
      frequencies and so we can adapt Tegra's driver to it.
      
      Also do a WARN() if clk_set_parent() fails while moving back to pll_x
      as we should have atleast restored to earlier frequency on error.
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NDoug Anderson <dianders@chromium.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      00917ddc
    • V
      cpufreq: add support for intermediate (stable) frequencies · 1c03a2d0
      Viresh Kumar 提交于
      Douglas Anderson, recently pointed out an interesting problem due to which
      udelay() was expiring earlier than it should.
      
      While transitioning between frequencies few platforms may temporarily switch to
      a stable frequency, waiting for the main PLL to stabilize.
      
      For example: When we transition between very low frequencies on exynos, like
      between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
      No CPUFREQ notification is sent for that. That means there's a period of time
      when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
      and 300MHz. And so udelay behaves badly.
      
      To get this fixed in a generic way, introduce another set of callbacks
      get_intermediate() and target_intermediate(), only for drivers with
      target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
      
      get_intermediate() should return a stable intermediate frequency platform wants
      to switch to, and target_intermediate() should set CPU to that frequency,
      before jumping to the frequency corresponding to 'index'. Core will take care of
      sending notifications and driver doesn't have to handle them in
      target_intermediate() or target_index().
      
      NOTE: ->target_index() should restore to policy->restore_freq in case of
      failures as core would send notifications for that.
      Tested-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NDoug Anderson <dianders@chromium.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      1c03a2d0
  6. 03 6月, 2014 1 次提交
  7. 02 6月, 2014 9 次提交
  8. 01 6月, 2014 3 次提交
    • L
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · a4bf79eb
      Linus Torvalds 提交于
      Pull core futex/rtmutex fixes from Thomas Gleixner:
       "Three fixlets for long standing issues in the futex/rtmutex code
        unearthed by Dave Jones syscall fuzzer:
      
         - Add missing early deadlock detection checks in the futex code
         - Prevent user space from attaching a futex to kernel threads
         - Make the deadlock detector of rtmutex work again
      
        Looks large, but is more comments than code change"
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
        rtmutex: Fix deadlock detector for real
        futex: Prevent attaching to kernel threads
        futex: Add another early deadlock detection check
      a4bf79eb
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 80e06794
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "Mostly quiet now:
      
        i915:
          fixing userspace visiblie issues, all stable marked
      
        radeon:
          one more pll fix, two crashers, one suspend/resume regression"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/radeon: Resume fbcon last
        drm/radeon: only allocate necessary size for vm bo list
        drm/radeon: don't allow RADEON_GEM_DOMAIN_CPU for command submission
        drm/radeon: avoid crash if VM command submission isn't available
        drm/radeon: lower the ref * post PLL maximum once more
        drm/i915: Prevent negative relocation deltas from wrapping
        drm/i915: Only copy back the modified fields to userspace from execbuffer
        drm/i915: Fix dynamic allocation of physical handles
      80e06794
    • L
      dcache: add missing lockdep annotation · 9f12600f
      Linus Torvalds 提交于
      lock_parent() very much on purpose does nested locking of dentries, and
      is careful to maintain the right order (lock parent first).  But because
      it didn't annotate the nested locking order, lockdep thought it might be
      a deadlock on d_lock, and complained.
      
      Add the proper annotation for the inner locking of the child dentry to
      make lockdep happy.
      
      Introduced by commit 046b961b ("shrink_dentry_list(): take parent's
      ->d_lock earlier").
      Reported-and-tested-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9f12600f
  9. 31 5月, 2014 7 次提交
    • D
      drm/radeon: Resume fbcon last · 18ee37a4
      Daniel Vetter 提交于
      So a few people complained that
      
      commit 177cf92d
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Tue Apr 1 22:14:59 2014 +0200
      
          drm/crtc-helpers: fix dpms on logic
      
      which was merged into 3.15-rc1, broke resume on radeons. Strangely git
      bisect lead everyone to
      
      commit 25f397a4
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Fri Jul 19 18:57:11 2013 +0200
      
          drm/crtc-helper: explicit DPMS on after modeset
      
      which was merged long ago and actually part of 3.14.
      
      Digging deeper I've noticed (again) that the call to
      drm_helper_resume_force_mode in the radeon resume handlers was a no-op
      previously because everything gets shut down on suspend. radeon does
      this with explicit calls to drm_helper_connector_dpms with DPMS_OFF.
      But with 177c we now force the dpms state to ON, so suddenly
      resume_force_mode actually forced the crtcs back on.
      
      This is the intention of the change after all, the problem is that
      radeon resumes the fbdev console layer _before_ restoring the display,
      through calling fb_set_suspend. And fbcon does an immediate ->set_par,
      which in turn causes the same forced mode restore to happen.
      
      Two concurrent modeset operations didn't lead to happiness. Fix this
      by delaying the fbcon resume until the end of the readeon resum
      functions.
      
      v2: Fix up a bit of the spelling fail.
      
      References: https://lkml.org/lkml/2014/5/29/1043
      References: https://lkml.org/lkml/2014/5/2/388
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=74751Tested-by: NKen Moffat <zarniwhoop@ntlworld.com>
      Cc: Alex Deucher <alexdeucher@gmail.com>
      Cc: Ken Moffat <zarniwhoop@ntlworld.com>
      Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: NDave Airlie <airlied@gmail.com>
      18ee37a4
    • D
      Merge branch 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux into drm-fixes · 1446e04c
      Dave Airlie 提交于
      this is the next pull request for stashed up radeon fixes for 3.15. This is finally calming down with only four patches in this pull request.
      
      * 'drm-fixes-3.15' of git://people.freedesktop.org/~deathsimple/linux:
        drm/radeon: only allocate necessary size for vm bo list
        drm/radeon: don't allow RADEON_GEM_DOMAIN_CPU for command submission
        drm/radeon: avoid crash if VM command submission isn't available
        drm/radeon: lower the ref * post PLL maximum once more
      1446e04c
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 1487385e
      Linus Torvalds 提交于
      Pull input subsystem fixes from Dmitry Torokhov:
       "A couple of driver/build fixups and also redone quirk for Synaptics
        touchpads on Lenovo boxes (now using PNP IDs instead of DMI data to
        limit number of quirks)"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: synaptics - change min/max quirk table to pnp-id matching
        Input: synaptics - add a matches_pnp_id helper function
        Input: synaptics - T540p - unify with other LEN0034 models
        Input: synaptics - add min/max quirk for the ThinkPad W540
        Input: ambakmi - request a shared interrupt for AMBA KMI devices
        Input: pxa27x-keypad - fix generating scancode
        Input: atmel-wm97xx - only build for AVR32
        Input: fix ps2/serio module dependency
      1487385e
    • L
      Merge tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394 · 1326af24
      Linus Torvalds 提交于
      Pull firewire fix from Stefan Richter:
       "A regression fix for the IEEE 1394 subsystem: re-enable IRQ-based
        asynchronous request reception at addresses below 128 TB"
      
      * tag 'firewire-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394:
        firewire: revert to 4 GB RDMA, fix protocols using Memory Space
      1326af24
    • L
      Merge tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm · 24e19d27
      Linus Torvalds 提交于
      Pull device-mapper fixes from Mike Snitzer:
       "A dm-cache stable fix to split discards on cache block boundaries
        because dm-cache cannot yet handle discards that span cache blocks.
      
        Really fix a dm-mpath LOCKDEP warning that was introduced in -rc1.
      
        Add a 'no_space_timeout' control to dm-thinp to restore the ability to
        queue IO indefinitely when no data space is available.  This fixes a
        change in behavior that was introduced in -rc6 where the timeout
        couldn't be disabled"
      
      * tag 'dm-3.15-fixes-3' of git://git.kernel.org/pub/scm/linux/kernel/git/device-mapper/linux-dm:
        dm mpath: really fix lockdep warning
        dm cache: always split discards on cache block boundaries
        dm thin: add 'no_space_timeout' dm-thin-pool module param
      24e19d27
    • M
      x86_64: expand kernel stack to 16K · 6538b8ea
      Minchan Kim 提交于
      While I play inhouse patches with much memory pressure on qemu-kvm,
      3.14 kernel was randomly crashed. The reason was kernel stack overflow.
      
      When I investigated the problem, the callstack was a little bit deeper
      by involve with reclaim functions but not direct reclaim path.
      
      I tried to diet stack size of some functions related with alloc/reclaim
      so did a hundred of byte but overflow was't disappeard so that I encounter
      overflow by another deeper callstack on reclaim/allocator path.
      
      Of course, we might sweep every sites we have found for reducing
      stack usage but I'm not sure how long it saves the world(surely,
      lots of developer start to add nice features which will use stack
      agains) and if we consider another more complex feature in I/O layer
      and/or reclaim path, it might be better to increase stack size(
      meanwhile, stack usage on 64bit machine was doubled compared to 32bit
      while it have sticked to 8K. Hmm, it's not a fair to me and arm64
      already expaned to 16K. )
      
      So, my stupid idea is just let's expand stack size and keep an eye
      toward stack consumption on each kernel functions via stacktrace of ftrace.
      For example, we can have a bar like that each funcion shouldn't exceed 200K
      and emit the warning when some function consumes more in runtime.
      Of course, it could make false positive but at least, it could make a
      chance to think over it.
      
      I guess this topic was discussed several time so there might be
      strong reason not to increase kernel stack size on x86_64, for me not
      knowing so Ccing x86_64 maintainers, other MM guys and virtio
      maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6538b8ea
    • L
      Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · 6f6111e4
      Linus Torvalds 提交于
      Pull vfs dcache livelock fix from Al Viro:
       "Fixes for livelocks in shrink_dentry_list() introduced by fixes to
        shrink list corruption; the root cause was that trylock of parent's
        ->d_lock could be disrupted by d_walk() happening on other CPUs,
        resulting in shrink_dentry_list() making no progress *and* the same
        d_walk() being called again and again for as long as
        shrink_dentry_list() doesn't get past that mess.
      
        The solution is to have shrink_dentry_list() treat that trylock
        failure not as 'try to do the same thing again', but 'lock them in the
        right order'"
      
      * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        dentry_kill() doesn't need the second argument now
        dealing with the rest of shrink_dentry_list() livelock
        shrink_dentry_list(): take parent's ->d_lock earlier
        expand dentry_kill(dentry, 0) in shrink_dentry_list()
        split dentry_kill()
        lift the "already marked killed" case into shrink_dentry_list()
      6f6111e4
  10. 30 5月, 2014 11 次提交
  11. 29 5月, 2014 2 次提交