1. 09 Jun 2014, 1 commit
    • cpufreq: governor: remove copy_prev_load from 'struct cpu_dbs_common_info' · c8ae481b
      Authored by Viresh Kumar
      'copy_prev_load' was recently added by commit: 18b46abd (cpufreq: governor: Be
      friendly towards latency-sensitive bursty workloads).
      
      It is actually a bit redundant, as we also have 'prev_load', which can store
      any integer value and can be used instead of 'copy_prev_load' by setting it
      to zero.
      
      True load can also turn out to be zero during long idle intervals (and hence the
      actual value of 'prev_load' and the overloaded value can clash). However this is
      not a problem because, if the true load was really zero in the previous
      interval, it makes sense to evaluate the load afresh for the current interval
      rather than copying the previous load.
      
      So, drop 'copy_prev_load' and use 'prev_load' instead.
      
      Update comments as well to make it more clear.
      
      There is another change here, probably missed by Srivatsa during the last
      round of updates: the unlikely() in the 'if' statement covered only half of
      the condition, whereas the whole condition should come under it.
      
      Also, this silences checkpatch, which was reporting (with the --strict option):
      
      CHECK: Alignment should match open parenthesis
      +		if (unlikely(wall_time > (2 * sampling_rate) &&
      +						j_cdbs->prev_load)) {
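      With the whole condition under unlikely() and 'prev_load' doing double duty,
      the resulting logic looks roughly like this. This is a self-contained sketch,
      not the patched kernel source: the struct is trimmed to one field, and the
      helper function name is hypothetical.

```c
#include <assert.h>

#define unlikely(x) __builtin_expect(!!(x), 0)

/* Trimmed-down stand-in for the kernel's per-CPU governor state. */
struct cpu_dbs_common_info {
	unsigned int prev_load;
};

/* Returns the load (in %) to use for the current sample window. */
static unsigned int dbs_window_load(struct cpu_dbs_common_info *j_cdbs,
				    unsigned int wall_time,
				    unsigned int idle_time,
				    unsigned int sampling_rate)
{
	unsigned int load;

	/* unlikely() now covers the *whole* condition. */
	if (unlikely(wall_time > (2 * sampling_rate) &&
		     j_cdbs->prev_load)) {
		load = j_cdbs->prev_load;
		/* Zeroing prev_load replaces the old copy_prev_load flag:
		 * the carried-over load is consumed exactly once. */
		j_cdbs->prev_load = 0;
	} else {
		load = 100 * (wall_time - idle_time) / wall_time;
		j_cdbs->prev_load = load;
	}
	return load;
}
```

      A busy window records its load in prev_load; one long-idle window reuses it;
      a second consecutive long-idle window gets a fresh (near-zero) evaluation.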
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  2. 08 Jun 2014, 1 commit
    • cpufreq: governor: Be friendly towards latency-sensitive bursty workloads · 18b46abd
      Authored by Srivatsa S. Bhat
      Cpufreq governors like the ondemand governor calculate the load on the CPU
      periodically by employing deferrable timers. A deferrable timer won't fire
      if the CPU is completely idle (and there are no other timers to be run), in
      order to avoid unnecessary wakeups and thus save CPU power.
      
      However, the load calculation logic is agnostic to all this, and this can
      lead to the problem described below.
      
      Time (ms)               CPU 1
      
      100                Task-A running
      
      110                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      110.5              Task-A running
      
      120                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      125                Task-A went to sleep. With nothing else to do, CPU 1
                         went completely idle.
      
      200                Task-A woke up and started running again.
      
      200.5              Governor's deferred timer (which was originally programmed
                         to fire at time 130) fires now. It calculates load for the
                         time period 120 to 200.5, and finds the load is almost zero.
                         Hence it decreases the CPU frequency to the minimum.
      
      210                Governor's timer fires, finds load as 100% in the last
                         10ms interval and increases the CPU frequency.
      
      So, after the workload woke up and started running, the frequency was suddenly
      dropped to absolute minimum, and after that, there was an unnecessary delay of
      10ms (sampling period) to increase the CPU frequency back to a reasonable value.
      And this pattern repeats for every wake-up-from-cpu-idle for that workload.
      This can be quite undesirable for latency- or response-time-sensitive bursty
      workloads. So we need to fix the governor's logic to detect such
      wake-up-from-cpu-idle scenarios and start the workload at a reasonably high
      CPU frequency.
      
      One extreme solution would be to fake a load of 100% in such scenarios. But
      that might lead to undesirable side-effects such as frequency spikes (which
      might also need voltage changes) especially if the previous frequency happened
      to be very low.
      
      We just want to avoid the stupidity of dropping down the frequency to a minimum
      and then enduring a needless (and long) delay before ramping it up back again.
      So, let us simply carry forward the previous load - that is, let us just pretend
      that the 'load' for the current time-window is the same as the load for the
      previous window. That way, the frequency and voltage will continue to be set
      to whatever values they were set at previously. This means that bursty workloads
      will get a chance to influence the CPU frequency at which they wake up from
      cpu-idle, based on their past execution history. Thus, they might be able to
      avoid suffering from slow wakeups and long response-times.
      
      However, we should take care not to over-do this. For example, such a "copy
      previous load" logic will benefit cases like this: (where # represents busy
      and . represents idle)
      
      ##########.........#########.........###########...........##########........
      
      but it will be detrimental in cases like the one shown below, because it will
      retain the high frequency (copied from the previous interval) even in a mostly
      idle system:
      
      ##########.........#.................#.....................#...............
      
      (i.e., the workload finished and the remaining tasks are such that their busy
      periods are smaller than the sampling interval, which causes the timer to
      always get deferred. So, this will make the copy-previous-load logic copy
      the initial high load to subsequent idle periods over and over again, thus
      keeping the frequency high unnecessarily).
      
      So, we modify this copy-previous-load logic such that it is used only once
      upon every wakeup-from-idle. Thus if we have 2 consecutive idle periods, the
      previous load won't get blindly copied over; cpufreq will freshly evaluate the
      load in the second idle interval, thus ensuring that the system comes back to
      its normal state.
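      In miniature, the copy-once rule described above behaves like this. This is
      an illustrative standalone model, not the patched governor code; it uses a
      boolean flag in the spirit of this commit's 'copy_prev_load', and the struct
      and function names are hypothetical.

```c
#include <assert.h>

struct cpu_info {
	unsigned int prev_load;
	int copy_prev_load;	/* true until the carried load is consumed */
};

static unsigned int sample_load(struct cpu_info *c,
				unsigned int wall_time,
				unsigned int idle_time,
				unsigned int sampling_rate)
{
	unsigned int load;

	if (wall_time > 2 * sampling_rate && c->copy_prev_load) {
		/* The deferrable timer fired late, so a long idle period
		 * just ended: pretend the load is what it was before... */
		load = c->prev_load;
		c->copy_prev_load = 0;	/* ...but only once per wakeup */
	} else {
		load = 100 * (wall_time - idle_time) / wall_time;
		c->prev_load = load;
		c->copy_prev_load = 1;
	}
	return load;
}
```

      This reproduces both patterns above: a bursty workload keeps its high load
      (and frequency) across each wakeup, while two consecutive long-idle windows
      fall through to a fresh evaluation and let the system settle back down.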
      
      [ The right way to solve this whole problem is to teach the CPU frequency
      governors to also track load on a per-task basis, not just a per-CPU basis,
      and then use both the data sources intelligently to set the appropriate
      frequency on the CPUs. But that involves redesigning the cpufreq subsystem,
      so this patch should make the situation bearable until then. ]
      
      Experimental results:
      +-------------------+
      
      I ran a modified version of ebizzy (called 'sleeping-ebizzy') that sleeps in
      between its execution such that its total utilization is a user-defined
      value, say 10% or 20% (the higher the specified utilization, the fewer
      sleeps injected). This ebizzy was run single-threaded, pinned to CPU 8.
      
      Behavior observed with tracing (sample taken from 40% utilization runs):
      ------------------------------------------------------------------------
      
      Without patch:
      ~~~~~~~~~~~~~~
      kworker/8:2-12137  416.335742: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.335744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.345741: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.345744: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.345746: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.355738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
            <...>-40753  416.402202: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      416.502130: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40753  416.505738: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.505739: cpu_frequency: state=2061000 cpu_id=8
      kworker/8:2-12137  416.505741: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40753  416.515739: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-12137  416.515742: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-12137  416.515744: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
      
      Observation: Ebizzy went idle at 416.402202, and started running again at
      416.502130. But cpufreq noticed the long idle period, and dropped the frequency
      at 416.505739, only to increase it back again at 416.515742, realizing that the
      workload is in fact CPU bound. Thus ebizzy needlessly ran at the lowest frequency
      for almost 13 milliseconds (almost 1 full sample period), and this pattern
      repeats on every sleep-wakeup. This could hurt latency-sensitive workloads quite
      a lot.
      
      With patch:
      ~~~~~~~~~~~
      
      kworker/8:2-29802  464.832535: cpu_frequency: state=2061000 cpu_id=8
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  464.962538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.972533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  464.972536: cpu_frequency: state=4123000 cpu_id=8
      kworker/8:2-29802  464.972538: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  464.982531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      <snip>  ---------------------------------------------------------------------  <snip>
      kworker/8:2-29802  465.022533: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.032531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.032532: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.035797: sched_switch: prev_comm=ebizzy ==> next_comm=swapper/8
           <idle>-0      465.240178: sched_switch: prev_comm=swapper/8 ==> next_comm=ebizzy
            <...>-40738  465.242533: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      kworker/8:2-29802  465.242535: sched_switch: prev_comm=kworker/8:2 ==> next_comm=ebizzy
            <...>-40738  465.252531: sched_switch: prev_comm=ebizzy ==> next_comm=kworker/8:2
      
      Observation: Ebizzy went idle at 465.035797, and started running again at
      465.240178. Since ebizzy was the only real workload running on this CPU,
      cpufreq retained the frequency at 4.1 GHz throughout the run of ebizzy, no
      matter how many times ebizzy slept and woke up in between. Thus, ebizzy
      got the 10ms worth of 4.1 GHz benefit during every sleep-wakeup (as compared
      to the run without the patch) and this boost gave a modest improvement in total
      throughput, as shown below.
      
      Sleeping-ebizzy records-per-second:
      -----------------------------------
      
      Utilization  Without patch  With patch  Difference (Absolute and % values)
          10%         274767        277046        +  2279 (+0.829%)
          20%         543429        553484        + 10055 (+1.850%)
          40%        1090744       1107959        + 17215 (+1.578%)
          60%        1634908       1662018        + 27110 (+1.658%)
      
      Even a rudimentary and only approximately latency-sensitive workload such as
      sleeping-ebizzy showed a consistent, noticeable performance improvement with
      this patch. Hence, workloads that are truly latency-sensitive will benefit
      quite a bit from this change. Moreover, this is an overall win-win, since the
      patch does not hurt power savings at all: it does not reduce idle time or
      idle residency, and the CPU's high frequency when entering cpu-idle does not
      affect/hurt the power savings of deep idle states.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
      Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  3. 07 Jun 2014, 2 commits
  4. 06 Jun 2014, 2 commits
    • cpufreq: Tegra: implement intermediate frequency callbacks · 00917ddc
      Authored by Viresh Kumar
      Tegra has been switching to an intermediate frequency (pll_p_clk) forever.
      The cpufreq core now has better support for handling notifications for these
      frequencies, so we can adapt Tegra's driver to it.
      
      Also do a WARN() if clk_set_parent() fails while moving back to pll_x,
      as we should have at least restored the earlier frequency on error.
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Doug Anderson <dianders@chromium.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
    • cpufreq: add support for intermediate (stable) frequencies · 1c03a2d0
      Authored by Viresh Kumar
      Douglas Anderson recently pointed out an interesting problem that caused
      udelay() to expire earlier than it should.
      
      While transitioning between frequencies, a few platforms may temporarily
      switch to a stable frequency while waiting for the main PLL to stabilize.
      
      For example: when we transition between very low frequencies on exynos, like
      between 200 MHz and 300 MHz, we may temporarily switch to a PLL running at
      800 MHz. No cpufreq notification is sent for that. That means there's a
      period of time when we're running at 800 MHz but loops_per_jiffy is
      calibrated for somewhere between 200 MHz and 300 MHz, and so udelay()
      behaves badly.
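      The damage is easy to quantify (illustrative numbers, not from the commit):
      loops_per_jiffy scales linearly with CPU frequency, so a busy-wait sized for
      one clock rate finishes proportionally early at a higher rate. A delay
      calibrated at 200 MHz but silently run at 800 MHz lasts only a quarter of
      the requested time.

```c
#include <assert.h>

/* Wall-clock length of a calibrated busy-wait shrinks by calib/actual
 * when the CPU silently runs faster than the calibration frequency. */
static unsigned int actual_delay_us(unsigned int requested_us,
				    unsigned int calib_khz,
				    unsigned int actual_khz)
{
	return requested_us * calib_khz / actual_khz;
}
```

      So a udelay(100) issued while the CPU is secretly on the 800 MHz PLL only
      delays for about 25 microseconds.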
      
      To get this fixed in a generic way, introduce another set of callbacks
      get_intermediate() and target_intermediate(), only for drivers with
      target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
      
      get_intermediate() should return a stable intermediate frequency the platform
      wants to switch to, and target_intermediate() should set the CPU to that
      frequency, before jumping to the frequency corresponding to 'index'. The core
      will take care of sending notifications, so the driver doesn't have to handle
      them in target_intermediate() or target_index().
      
      NOTE: ->target_index() should restore to policy->restore_freq in case of
      failures as core would send notifications for that.
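      The sequencing the core performs can be sketched as follows. This is a
      simplified standalone model: the callback names and signatures are
      illustrative, and the real core also wraps each step in pre/post-change
      notifications.

```c
#include <assert.h>

struct policy {
	unsigned int cur;		/* current frequency, kHz */
	unsigned int restore_freq;	/* frequency to restore on failure */
};

/* Hypothetical driver callbacks for illustration. */
static unsigned int drv_get_intermediate(struct policy *p, unsigned int target)
{
	(void)p; (void)target;
	return 800000;	/* stable PLL frequency; 0 would mean "not needed" */
}

static int drv_target_intermediate(struct policy *p, unsigned int freq)
{
	p->cur = freq;	/* the core sends the notifications, not the driver */
	return 0;
}

static int drv_target_index(struct policy *p, unsigned int freq)
{
	p->cur = freq;
	return 0;
}

/* Simplified core flow: remember where to restore to, hop to the stable
 * intermediate frequency if the driver wants one, then go to the target. */
static int core_set_target(struct policy *p, unsigned int target)
{
	unsigned int inter = drv_get_intermediate(p, target);

	p->restore_freq = p->cur;
	if (inter) {
		int ret = drv_target_intermediate(p, inter);
		if (ret)
			return ret;
	}
	/* Per the NOTE above, on failure ->target_index() must fall back
	 * to restore_freq. */
	return drv_target_index(p, target);
}
```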
      Tested-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: Doug Anderson <dianders@chromium.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  5. 02 Jun 2014, 4 commits
  6. 31 May 2014, 1 commit
    • drm/radeon: Resume fbcon last · 18ee37a4
      Authored by Daniel Vetter
      So a few people complained that
      
      commit 177cf92d
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Tue Apr 1 22:14:59 2014 +0200
      
          drm/crtc-helpers: fix dpms on logic
      
      which was merged into 3.15-rc1, broke resume on radeons. Strangely, git
      bisect led everyone to
      
      commit 25f397a4
      Author: Daniel Vetter <daniel.vetter@ffwll.ch>
      Date:   Fri Jul 19 18:57:11 2013 +0200
      
          drm/crtc-helper: explicit DPMS on after modeset
      
      which was merged long ago and actually part of 3.14.
      
      Digging deeper, I noticed (again) that the call to
      drm_helper_resume_force_mode in the radeon resume handlers was previously a
      no-op, because everything gets shut down on suspend; radeon does this with
      explicit calls to drm_helper_connector_dpms with DPMS_OFF.
      But with 177cf92d we now force the dpms state to ON, so suddenly
      resume_force_mode actually forced the crtcs back on.
      
      This is the intention of the change after all, the problem is that
      radeon resumes the fbdev console layer _before_ restoring the display,
      through calling fb_set_suspend. And fbcon does an immediate ->set_par,
      which in turn causes the same forced mode restore to happen.
      
      Two concurrent modeset operations didn't lead to happiness. Fix this
      by delaying the fbcon resume until the end of the radeon resume
      functions.
      
      v2: Fix up a bit of the spelling fail.
      
      References: https://lkml.org/lkml/2014/5/29/1043
      References: https://lkml.org/lkml/2014/5/2/388
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=74751
      Tested-by: Ken Moffat <zarniwhoop@ntlworld.com>
      Cc: Alex Deucher <alexdeucher@gmail.com>
      Cc: Ken Moffat <zarniwhoop@ntlworld.com>
      Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch>
      Signed-off-by: Dave Airlie <airlied@gmail.com>
  7. 30 May 2014, 4 commits
  8. 29 May 2014, 2 commits
  9. 27 May 2014, 11 commits
  10. 26 May 2014, 1 commit
  11. 25 May 2014, 2 commits
    • hwmon: (ntc_thermistor) Fix OF device ID mapping · ead82d67
      Authored by Jean Delvare
      The mapping from OF device IDs to platform device IDs is wrong.
      TYPE_NCPXXWB473 is 0, TYPE_NCPXXWL333 is 1, so
      ntc_thermistor_id[TYPE_NCPXXWB473] is { "ncp15wb473", TYPE_NCPXXWB473 }
      while
      ntc_thermistor_id[TYPE_NCPXXWL333] is { "ncp18wb473", TYPE_NCPXXWB473 }.
      
      So the name is wrong for all but the "ntc,ncp15wb473" entry, and the
      type is wrong for the "ntc,ncp15wl333" entry.
      
      So map the entries by index; it is neither elegant nor robust, but at
      least it is correct.
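      The idea of mapping by index can be pictured with a standalone sketch. The
      tables here are trimmed to three entries and the lookup helper is
      hypothetical; the real driver uses of_device_id/platform_device_id tables.

```c
#include <assert.h>
#include <string.h>

enum { TYPE_NCPXXWB473, TYPE_NCPXXWL333 };

struct id_entry {
	const char *name;
	int type;
};

/* Platform IDs, deliberately kept in the SAME order as the OF table. */
static const struct id_entry platform_ids[] = {
	{ "ncp15wb473", TYPE_NCPXXWB473 },
	{ "ncp18wb473", TYPE_NCPXXWB473 },
	{ "ncp15wl333", TYPE_NCPXXWL333 },
};

static const char *const of_compat[] = {
	"ntc,ncp15wb473",
	"ntc,ncp18wb473",
	"ntc,ncp15wl333",
};

/* Map an OF compatible string to its platform ID by position.  Indexing
 * platform_ids[] by the type value instead (the old bug) would return
 * "ncp15wb473" for every wb473 part and the wrong type for wl333. */
static const struct id_entry *of_to_pdev_id(const char *compatible)
{
	size_t i;

	for (i = 0; i < sizeof(of_compat) / sizeof(of_compat[0]); i++)
		if (!strcmp(compatible, of_compat[i]))
			return &platform_ids[i];
	return NULL;
}
```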
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: 9e8269de hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
    • hwmon: (ntc_thermistor) Fix dependencies · 59cf4243
      Authored by Jean Delvare
      In commit 9e8269de, support was added for ntc_thermistor devices being
      declared in the device tree and implemented on top of IIO. With that
      change, a dependency was added to the ntc_thermistor driver:
      
      	depends on (!OF && !IIO) || (OF && IIO)
      
      This construct has the drawback that the driver can no longer be
      selected when OF is set and IIO isn't, nor when IIO is set and OF is
      not. This is a regression for the original users of the driver.
      
      As the new code depends on IIO and is useless without OF, include it
      only if both are enabled, and set the dependencies accordingly. This
      is clearer, simpler, and more correct.
      Signed-off-by: Jean Delvare <jdelvare@suse.de>
      Fixes: 9e8269de hwmon: (ntc_thermistor) Add DT with IIO support to NTC thermistor driver
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Cc: Naveen Krishna Chatradhi <ch.naveen@samsung.com>
      Cc: Doug Anderson <dianders@chromium.org>
  12. 24 May 2014, 2 commits
  13. 23 May 2014, 7 commits