提交 · 62b94a08da1bae9d187d49dfcd6665af393750f8 · openeuler / Kernel

13 1月, 2014 24 次提交

sched/preempt: Take away preempt_enable_no_resched() from modules · 62b94a08

由 Peter Zijlstra 提交于 11月 20, 2013

Discourage drivers/modules to be creative with preemption.

Sadly all is implemented in macros and inline so if they want to do
evil they still can, but at least try and discourage some.
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Cc: rui.zhang@intel.com
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-fn7h6vu8wtgxk0ih402qcijx@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

62b94a08

locking: Optimize lock_bh functions · 9ea4c380

由 Peter Zijlstra 提交于 11月 19, 2013

Currently all _bh_ lock functions do two preempt_count operations:

  local_bh_disable();
  preempt_disable();

and for the unlock:

  preempt_enable_no_resched();
  local_bh_enable();

Since its a waste of perfectly good cycles to modify the same variable
twice when you can do it in one go; use the new
__local_bh_{dis,en}able_ip() functions that allow us to provide a
preempt_count value to add/sub.

So define SOFTIRQ_LOCK_OFFSET as the offset a _bh_ lock needs to
add/sub to be done in one go.

As a bonus it gets rid of the preempt_enable_no_resched() usage.

This reduces a 1000 loops of:

  spin_lock_bh(&bh_lock);
  spin_unlock_bh(&bh_lock);

from 53596 cycles to 51995 cycles. I didn't do enough measurements to
say for absolute sure that the result is significant but the the few
runs I did for each suggest it is so.
Reviewed-by: NThomas Gleixner <tglx@linutronix.de>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: jacob.jun.pan@linux.intel.com
Cc: Mike Galbraith <bitbucket@online.de>
Cc: hpa@zytor.com
Cc: Arjan van de Ven <arjan@linux.intel.com>
Cc: lenb@kernel.org
Cc: rjw@rjwysocki.net
Cc: rui.zhang@intel.com
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20131119151338.GF3694@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

9ea4c380

sched: Factor out the on_null_domain() checks in trigger_load_balance() · c726099e

由 Daniel Lezcano 提交于 1月 06, 2014

The test on_null_domain is done twice in the trigger_load_balance function.

Move the test at the begin of the function, so there is only one check.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-9-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

c726099e

sched: Pass 'struct rq' to nohz_idle_balance() · 208cb16b

由 Daniel Lezcano 提交于 1月 06, 2014

The cpu information is stored in the struct rq. Pass the struct rq to
nohz_idle_balance, so all the functions called in run_rebalance_domains have
the same parameters and the 'this_cpu' variable becomes pointless.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
[ Added !SMP build fix. ]
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-8-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

208cb16b

sched: Pass 'struct rq' to rebalance_domains() · f7ed0a89

由 Daniel Lezcano 提交于 1月 06, 2014

The cpu information is stored in the struct rq and the caller of the
rebalance_domains function pass the cpu to retrieve the struct rq but
it already has the struct rq info. Replace the cpu parameter with the
struct rq.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-7-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

f7ed0a89

sched: Remove unused parameter from nohz_balancer_kick() · 0aeeeeba

由 Daniel Lezcano 提交于 1月 06, 2014

The cpu parameter is no longer needed in nohz_balancer_kick, let's remove
the parameter.
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-6-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

0aeeeeba

sched: Remove unused parameter from find_new_ilb() · 3dd0337d

由 Daniel Lezcano 提交于 1月 06, 2014

The 'call_cpu' is never used in the function. Remove it.
Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-5-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

3dd0337d

sched: Pass 'struct rq' to on_null_domain() · 63f609b1

由 Daniel Lezcano 提交于 1月 06, 2014

The on_null_domain() function is getting the cpu to retrieve the struct rq
associated with it.

Pass 'struct rq' directly to the function as the caller already has the info.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-4-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

63f609b1

sched: Reduce nohz_kick_needed() parameters · 4a725627

由 Daniel Lezcano 提交于 1月 06, 2014

The cpu information is already stored in the struct rq, so no need to pass it
as parameter to the nohz_kick_needed function.

The caller of this function just called idle_cpu() before to fill the
rq->idle_balance field.

Use rq->cpu and rq->idle_balance.
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-3-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

4a725627

sched: Reduce trigger_load_balance() parameters · 7caff66f

由 Daniel Lezcano 提交于 1月 06, 2014

The cpu information is already stored in the struct rq, so no need to pass it
as parameter to the trigger_load_balance function.

Cc: linaro-kernel@lists.linaro.org
Cc: preeti.lkml@gmail.com
Cc: mingo@redhat.com
Cc: peterz@infradead.org
Signed-off-by: NDaniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1389008085-9069-2-git-send-email-daniel.lezcano@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

7caff66f

sched/deadline: Fix hotplug admission control · de212f18

由 Peter Zijlstra 提交于 12月 19, 2013

The current hotplug admission control is broken because:

CPU_DYING -> migration_call() -> migrate_tasks() -> __migrate_task()

cannot fail and hard assumes it _will_ move all tasks off of the dying
cpu, failing this will break hotplug.

The much simpler solution is a DOWN_PREPARE handler that fails when
removing one CPU gets us below the total allocated bandwidth.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131220171343.GL2480@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>

de212f18

sched/deadline: Remove the sysctl_sched_dl knobs · 1724813d

由 Peter Zijlstra 提交于 12月 17, 2013

Remove the deadline specific sysctls for now. The problem with them is
that the interaction with the exisiting rt knobs is nearly impossible
to get right.

The current (as per before this patch) situation is that the rt and dl
bandwidth is completely separate and we enforce rt+dl < 100%. This is
undesirable because this means that the rt default of 95% leaves us
hardly any room, even though dl tasks are saver than rt tasks.

Another proposed solution was (a discarted patch) to have the dl
bandwidth be a fraction of the rt bandwidth. This is highly
confusing imo.

Furthermore neither proposal is consistent with the situation we
actually want; which is rt tasks ran from a dl server. In which case
the rt bandwidth is a direct subset of dl.

So whichever way we go, the introduction of dl controls at this point
is painful. Therefore remove them and instead share the rt budget.

This means that for now the rt knobs are used for dl admission control
and the dl runtime is accounted against the rt runtime. I realise that
this isn't entirely desirable either; but whatever we do we appear to
need to change the interface later, so better have a small interface
for now.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-zpyqbqds1r0vyxtxza1e7rdc@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

1724813d

sched/deadline: Fix up the smp-affinity mask tests · e4099a5e

由 Peter Zijlstra 提交于 12月 17, 2013

For now deadline tasks are not allowed to set smp affinity; however
the current tests are wrong, cure this.

The test in __sched_setscheduler() also uses an on-stack cpumask_t
which is a no-no.

Change both tests to use cpumask_subset() such that we test the root
domain span to be a subset of the cpus_allowed mask. This way we're
sure the tasks can always run on all CPUs they can be balanced over,
and have no effective affinity constraints.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/n/tip-fyqtb1lapxca3lhsxv9cumdc@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

e4099a5e

sched/deadline: speed up SCHED_DEADLINE pushes with a push-heap · 6bfd6d72

由 Juri Lelli 提交于 11月 07, 2013

Data from tests confirmed that the original active load balancing
logic didn't scale neither in the number of CPU nor in the number of
tasks (as sched_rt does).

Here we provide a global data structure to keep track of deadlines
of the running tasks in the system. The structure is composed by
a bitmask showing the free CPUs and a max-heap, needed when the system
is heavily loaded.

The implementation and concurrent access scheme are kept simple by
design. However, our measurements show that we can compete with sched_rt
on large multi-CPUs machines [1].

Only the push path is addressed, the extension to use this structure
also for pull decisions is straightforward. However, we are currently
evaluating different (in order to decrease/avoid contention) data
structures to solve possibly both problems. We are also going to re-run
tests considering recent changes inside cpupri [2].

 [1] http://retis.sssup.it/~jlelli/papers/Ospert11Lelli.pdf
 [2] http://www.spinics.net/lists/linux-rt-users/msg06778.htmlSigned-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-14-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6bfd6d72

sched/deadline: Add bandwidth management for SCHED_DEADLINE tasks · 332ac17e

由 Dario Faggioli 提交于 11月 07, 2013

In order of deadline scheduling to be effective and useful, it is
important that some method of having the allocation of the available
CPU bandwidth to tasks and task groups under control.
This is usually called "admission control" and if it is not performed
at all, no guarantee can be given on the actual scheduling of the
-deadline tasks.

Since when RT-throttling has been introduced each task group have a
bandwidth associated to itself, calculated as a certain amount of
runtime over a period. Moreover, to make it possible to manipulate
such bandwidth, readable/writable controls have been added to both
procfs (for system wide settings) and cgroupfs (for per-group
settings).

Therefore, the same interface is being used for controlling the
bandwidth distrubution to -deadline tasks and task groups, i.e.,
new controls but with similar names, equivalent meaning and with
the same usage paradigm are added.

However, more discussion is needed in order to figure out how
we want to manage SCHED_DEADLINE bandwidth at the task group level.
Therefore, this patch adds a less sophisticated, but actually
very sensible, mechanism to ensure that a certain utilization
cap is not overcome per each root_domain (the single rq for !SMP
configurations).

Another main difference between deadline bandwidth management and
RT-throttling is that -deadline tasks have bandwidth on their own
(while -rt ones doesn't!), and thus we don't need an higher level
throttling mechanism to enforce the desired bandwidth.

This patch, therefore:

 - adds system wide deadline bandwidth management by means of:
    * /proc/sys/kernel/sched_dl_runtime_us,
    * /proc/sys/kernel/sched_dl_period_us,
   that determine (i.e., runtime / period) the total bandwidth
   available on each CPU of each root_domain for -deadline tasks;

 - couples the RT and deadline bandwidth management, i.e., enforces
   that the sum of how much bandwidth is being devoted to -rt
   -deadline tasks to stay below 100%.

This means that, for a root_domain comprising M CPUs, -deadline tasks
can be created until the sum of their bandwidths stay below:

    M * (sched_dl_runtime_us / sched_dl_period_us)

It is also possible to disable this bandwidth management logic, and
be thus free of oversubscribing the system up to any arbitrary level.
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-12-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

332ac17e

sched/deadline: Add SCHED_DEADLINE inheritance logic · 2d3d891d

由 Dario Faggioli 提交于 11月 07, 2013

Some method to deal with rt-mutexes and make sched_dl interact with
the current PI-coded is needed, raising all but trivial issues, that
needs (according to us) to be solved with some restructuring of
the pi-code (i.e., going toward a proxy execution-ish implementation).

This is under development, in the meanwhile, as a temporary solution,
what this commits does is:

 - ensure a pi-lock owner with waiters is never throttled down. Instead,
   when it runs out of runtime, it immediately gets replenished and it's
   deadline is postponed;

 - the scheduling parameters (relative deadline and default runtime)
   used for that replenishments --during the whole period it holds the
   pi-lock-- are the ones of the waiting task with earliest deadline.

Acting this way, we provide some kind of boosting to the lock-owner,
still by using the existing (actually, slightly modified by the previous
commit) pi-architecture.

We would stress the fact that this is only a surely needed, all but
clean solution to the problem. In the end it's only a way to re-start
discussion within the community. So, as always, comments, ideas, rants,
etc.. are welcome! :-)
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
[ Added !RT_MUTEXES build fix. ]
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-11-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

2d3d891d

rtmutex: Turn the plist into an rb-tree · fb00aca4

由 Peter Zijlstra 提交于 11月 07, 2013

Turn the pi-chains from plist to rb-tree, in the rt_mutex code,
and provide a proper comparison function for -deadline and
-priority tasks.

This is done mainly because:
 - classical prio field of the plist is just an int, which might
   not be enough for representing a deadline;
 - manipulating such a list would become O(nr_deadline_tasks),
   which might be to much, as the number of -deadline task increases.

Therefore, an rb-tree is used, and tasks are queued in it according
to the following logic:
 - among two -priority (i.e., SCHED_BATCH/OTHER/RR/FIFO) tasks, the
   one with the higher (lower, actually!) prio wins;
 - among a -priority and a -deadline task, the latter always wins;
 - among two -deadline tasks, the one with the earliest deadline
   wins.

Queueing and dequeueing functions are changed accordingly, for both
the list of a task's pi-waiters and the list of tasks blocked on
a pi-lock.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-again-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-10-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

fb00aca4

sched/deadline: Add latency tracing for SCHED_DEADLINE tasks · af6ace76

由 Dario Faggioli 提交于 11月 07, 2013

It is very likely that systems that wants/needs to use the new
SCHED_DEADLINE policy also want to have the scheduling latency of
the -deadline tasks under control.

For this reason a new version of the scheduling wakeup latency,
called "wakeup_dl", is introduced.

As a consequence of applying this patch there will be three wakeup
latency tracer:

 * "wakeup", that deals with all tasks in the system;
 * "wakeup_rt", that deals with -rt and -deadline tasks only;
 * "wakeup_dl", that deals with -deadline tasks only.
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-9-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

af6ace76

sched/deadline: Add period support for SCHED_DEADLINE tasks · 755378a4

由 Harald Gustafsson 提交于 11月 07, 2013

Make it possible to specify a period (different or equal than
deadline) for -deadline tasks. Relative deadlines (D_i) are used on
task arrivals to generate new scheduling (absolute) deadlines as "d =
t + D_i", and periods (P_i) to postpone the scheduling deadlines as "d
= d + P_i" when the budget is zero.

This is in general useful to model (and schedule) tasks that have slow
activation rates (long periods), but have to be scheduled soon once
activated (short deadlines).
Signed-off-by: NHarald Gustafsson <harald.gustafsson@ericsson.com>
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-7-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

755378a4

sched/deadline: Add SCHED_DEADLINE avg_update accounting · 239be4a9

由 Dario Faggioli 提交于 11月 07, 2013

Make the core scheduler and load balancer aware of the load
produced by -deadline tasks, by updating the moving average
like for sched_rt.
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-6-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

239be4a9

sched/deadline: Add SCHED_DEADLINE SMP-related data structures & logic · 1baca4ce

由 Juri Lelli 提交于 11月 07, 2013

Introduces data structures relevant for implementing dynamic
migration of -deadline tasks and the logic for checking if
runqueues are overloaded with -deadline tasks and for choosing
where a task should migrate, when it is the case.

Adds also dynamic migrations to SCHED_DEADLINE, so that tasks can
be moved among CPUs when necessary. It is also possible to bind a
task to a (set of) CPU(s), thus restricting its capability of
migrating, or forbidding migrations at all.

The very same approach used in sched_rt is utilised:
- -deadline tasks are kept into CPU-specific runqueues,
- -deadline tasks are migrated among runqueues to achieve the
following:
* on an M-CPU system the M earliest deadline ready tasks
are always running;
* affinity/cpusets settings of all the -deadline tasks is
always respected.

Therefore, this very special form of "load balancing" is done with
an active method, i.e., the scheduler pushes or pulls tasks between
runqueues when they are woken up and/or (de)scheduled.
IOW, every time a preemption occurs, the descheduled task might be sent
to some other CPU (depending on its deadline) to continue executing
(push). On the other hand, every time a CPU becomes idle, it might pull
the second earliest deadline ready task from some other CPU.

To enforce this, a pull operation is always attempted before taking any
scheduling decision (pre_schedule()), as well as a push one after each
scheduling decision (post_schedule()). In addition, when a task arrives
or wakes up, the best CPU where to resume it is selected taking into
account its affinity mask, the system topology, but also its deadline.
E.g., from the scheduling point of view, the best CPU where to wake
up (and also where to push) a task is the one which is running the task
with the latest deadline among the M executing ones.

In order to facilitate these decisions, per-runqueue "caching" of the
deadlines of the currently running and of the first ready task is used.
Queued but not running tasks are also parked in another rb-tree to
speed-up pushes.
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-5-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

1baca4ce

sched/deadline: Add SCHED_DEADLINE structures & implementation · aab03e05

由 Dario Faggioli 提交于 11月 28, 2013

Introduces the data structures, constants and symbols needed for
SCHED_DEADLINE implementation.

Core data structure of SCHED_DEADLINE are defined, along with their
initializers. Hooks for checking if a task belong to the new policy
are also added where they are needed.

Adds a scheduling class, in sched/dl.c and a new policy called
SCHED_DEADLINE. It is an implementation of the Earliest Deadline
First (EDF) scheduling algorithm, augmented with a mechanism (called
Constant Bandwidth Server, CBS) that makes it possible to isolate
the behaviour of tasks between each other.

The typical -deadline task will be made up of a computation phase
(instance) which is activated on a periodic or sporadic fashion. The
expected (maximum) duration of such computation is called the task's
runtime; the time interval by which each instance need to be completed
is called the task's relative deadline. The task's absolute deadline
is dynamically calculated as the time instant a task (better, an
instance) activates plus the relative deadline.

The EDF algorithms selects the task with the smallest absolute
deadline as the one to be executed first, while the CBS ensures each
task to run for at most its runtime every (relative) deadline
length time interval, avoiding any interference between different
tasks (bandwidth isolation).
Thanks to this feature, also tasks that do not strictly comply with
the computational model sketched above can effectively use the new
policy.

To summarize, this patch:
 - introduces the data structures, constants and symbols needed;
 - implements the core logic of the scheduling algorithm in the new
   scheduling class file;
 - provides all the glue code between the new scheduling class and
   the core scheduler and refines the interactions between sched/dl
   and the other existing scheduling classes.
Signed-off-by: NDario Faggioli <raistlin@linux.it>
Signed-off-by: NMichael Trimarchi <michael@amarulasolutions.com>
Signed-off-by: NFabio Checconi <fchecconi@gmail.com>
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-4-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

aab03e05

sched: Add new scheduler syscalls to support an extended scheduling parameters ABI · d50dde5a

由 Dario Faggioli 提交于 11月 07, 2013

Add the syscalls needed for supporting scheduling algorithms
with extended scheduling parameters (e.g., SCHED_DEADLINE).

In general, it makes possible to specify a periodic/sporadic task,
that executes for a given amount of runtime at each instance, and is
scheduled according to the urgency of their own timing constraints,
i.e.:

- a (maximum/typical) instance execution time,
- a minimum interval between consecutive instances,
- a time constraint by which each instance must be completed.

Thus, both the data structure that holds the scheduling parameters of
the tasks and the system calls dealing with it must be extended.
Unfortunately, modifying the existing struct sched_param would break
the ABI and result in potentially serious compatibility issues with
legacy binaries.

For these reasons, this patch:

- defines the new struct sched_attr, containing all the fields
that are necessary for specifying a task in the computational
model described above;

- defines and implements the new scheduling related syscalls that
manipulate it, i.e., sched_setattr() and sched_getattr().

Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a
proof of concept and for developing and testing purposes. Making them
available on other architectures is straightforward.

Since no "user" for these new parameters is introduced in this patch,
the implementation of the new system calls is just identical to their
already existing counterpart. Future patches that implement scheduling
policies able to exploit the new data structure must also take care of
modifying the sched_*attr() calls accordingly with their own purposes.
Signed-off-by: NDario Faggioli <raistlin@linux.it>
[ Rewrote to use sched_attr. ]
Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
[ Removed sched_setscheduler2() for now. ]
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

d50dde5a

Merge branch 'sched/urgent' into sched/core · 56b48110

由 Ingo Molnar 提交于 1月 13, 2014

Pick up the latest fixes before applying new changes.
Signed-off-by: NIngo Molnar <mingo@kernel.org>

56b48110

12 1月, 2014 1 次提交

sched: Calculate effective load even if local weight is 0 · 9722c2da

由 Rik van Riel 提交于 1月 06, 2014

Thomas Hellstrom bisected a regression where erratic 3D performance is
experienced on virtual machines as measured by glxgears. It identified
commit 58d081b5 ("sched/numa: Avoid overloading CPUs on a preferred NUMA
node") as the problem which had modified the behaviour of effective_load.

Effective load calculates the difference to the system-wide load if a
scheduling entity was moved to another CPU. The task group is not heavier
as a result of the move but overall system load can increase/decrease as a
result of the change. Commit 58d081b5 ("sched/numa: Avoid overloading CPUs
on a preferred NUMA node") changed effective_load to make it suitable for
calculating if a particular NUMA node was compute overloaded. To reduce
the cost of the function, it assumed that a current sched entity weight
of 0 was uninteresting but that is not the case.

wake_affine() uses a weight of 0 for sync wakeups on the grounds that it
is assuming the waking task will sleep and not contribute to load in the
near future. In this case, we still want to calculate the effective load
of the sched entity hierarchy. As effective_load is no longer used by
task_numa_compare since commit fb13c7ee (sched/numa: Use a system-wide
search to find swap/migration candidates), this patch simply restores the
historical behaviour.
Reported-and-tested-by: NThomas Hellstrom <thellstrom@vmware.com>
Signed-off-by: NRik van Riel <riel@redhat.com>
[ Wrote changelog]
Signed-off-by: NMel Gorman <mgorman@suse.de>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20140106113912.GC6178@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

9722c2da

11 1月, 2014 14 次提交

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · 228fdc08

由 Linus Torvalds 提交于 1月 11, 2014

Pull networking fixes from David Miller:
 "Famouse last words: "final pull request" :-)

  I'm sending this because Jason Wang's fixes are pretty important

   1) Add missing per-cpu stats initialization to ip6_vti.  Otherwise
      lockdep spits out a call trace.  From Li RongQing.

   2) Fix NULL oops in wireless hwsim, from Javier Lopez

   3) TIPC deferred packet queue unlink must NULL out skb->next to avoid
      crashes.  From Erik Hugne

   4) Fix access to uninitialized buffer in nf_nat netfilter code, from
      Daniel Borkmann

   5) Fix lifetime of ipv6 loopback and SIT tunnel addresses, otherwise
      they basically timeout immediately.  From Hannes Frederic Sowa

   6) Fix DMA unmapping of TSO packets in bnx2x driver, from Michal
      Schmidt

   7) Do not allow L2 forwarding offload via macvtap device, the way
      things are now it will not end up being forwaded at all.  From
      Jason Wang

   8) Fix transmit queue selection via ndo_dfwd_start_xmit(), fixing
      things like applying NETIF_F_LLTX to the wrong device (!!) and
      eliding the proper transmit watchdog handling

   9) qlcnic driver was not updating tx statistics at all, from Manish
      Chopra"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  qlcnic: Fix ethtool statistics length calculation
  qlcnic: Fix bug in TX statistics
  net: core: explicitly select a txq before doing l2 forwarding
  macvlan: forbid L2 fowarding offload for macvtap
  bnx2x: fix DMA unmapping of TSO split BDs
  ipv6: add link-local, sit and loopback address with INFINITY_LIFE_TIME
  bnx2x: prevent WARN during driver unload
  tipc: correctly unlink packets from deferred packet queue
  ipv6: pcpu_tstats.syncp should be initialised in ip6_vti.c
  netfilter: only warn once on wrong seqadj usage
  netfilter: nf_nat: fix access to uninitialized buffer in IRC NAT helper
  NFC: Fix target mode p2p link establishment
  iwlwifi: add new devices for 7265 series
  mac80211: move "bufferable MMPDU" check to fix AP mode scan
  mac80211_hwsim: Fix NULL pointer dereference

228fdc08

Merge tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs · e2bc4470

由 Linus Torvalds 提交于 1月 11, 2014

Pull xfs bugfixes from Ben Myers:
 "Here we have a bugfix for an off-by-one in the remote attribute
  verifier that results in a forced shutdown which you can hit with v5
  superblock by creating a 64k xattr, and a fix for a missing
  destroy_work_on_stack() in the allocation worker.

  It's a bit late, but they are both fairly straightforward"

* tag 'xfs-for-linus-v3.13-rc8' of git://oss.sgi.com/xfs/xfs:
  xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK()
  xfs: fix off-by-one error in xfs_attr3_rmt_verify

e2bc4470

Merge branch 'leds-fixes-for-3.13' of... · 324c66ff

由 Linus Torvalds 提交于 1月 11, 2014

Merge branch 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds

Pull LED fix from Bryan Wu:
 "Pali Rohár and Pavel Machek reported the LED of Nokia N900 doesn't
  work with our latest 3.13-rc6 kernel.  Milo fixed the regression here"

* 'leds-fixes-for-3.13' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
  leds: lp5521/5523: Remove duplicate mutex

324c66ff

Merge tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · cff539b1

由 Linus Torvalds 提交于 1月 11, 2014

Pull ACPI and power management fixes from Rafael Wysocki:

 - Recent commits modifying the lists of C-states in the intel_idle
   driver introduced bugs leading to crashes on some systems.  Two fixes
   from Jiang Liu.

 - The ACPI AC driver should receive all types of notifications, but
   recent change made it ignore some of them.  Fix from Alexander Mezin.

 - intel_pstate's validity checks for MSRs it depends on are not
   sufficient to catch the lack of support in nested KVM setups, so they
   are extended to cover that case.  From Dirk Brandewie.

 - NEC LZ750/LS has a botched up _BIX method in its ACPI tables, so our
   ACPI battery driver needs a quirk for it.  From Lan Tianyu.

 - The tpm_ppi driver sometimes leaks memory allocated by
   acpi_get_name().  Fix from Jiang Liu.

* tag 'pm+acpi-3.13-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  intel_idle: close avn_cstates array with correct marker
  Revert "intel_idle: mark states tables with __initdata tag"
  ACPI / Battery: Add a _BIX quirk for NEC LZ750/LS
  intel_pstate: Add X86_FEATURE_APERFMPERF to cpu match parameters.
  ACPI / TPM: fix memory leak when walking ACPI namespace
  ACPI / AC: change notification handler type to ACPI_ALL_NOTIFY

cff539b1

Merge tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes · c43a5eb2

由 Linus Torvalds 提交于 1月 11, 2014

Pull MFD fix from Samuel Ortiz:
 "This is the 2nd MFD pull request for 3.13

  It only contains one fix for the rtsx_pcr driver.  Without it we see a
  kernel panic on some machines, when resuming from suspend to RAM"

* tag 'mfd-fixes-3.13-2' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-fixes:
  mfd: rtsx_pcr: Disable interrupts before cancelling delayed works

c43a5eb2

leds: lp5521/5523: Remove duplicate mutex · e70988d1

由 Milo Kim 提交于 12月 02, 2013

It can be a problem when a pattern is loaded via the firmware interface.
LP55xx common driver has already locked the mutex in 'lp55xx_firmware_loaded()'.
So it should be deleted.

On the other hand, locks are required in store_engine_load()
on updating program memory.
Reported-by: NPali Rohár <pali.rohar@gmail.com>
Reported-by: NPavel Machek <pavel@ucw.cz>
Signed-off-by: NMilo Kim <milo.kim@ti.com>
Signed-off-by: NBryan Wu <cooloney@gmail.com>
Cc: <stable@vger.kernel.org>

e70988d1

xfs: Calling destroy_work_on_stack() to pair with INIT_WORK_ONSTACK() · 1f4a63bf

由 Chuansheng Liu 提交于 1月 07, 2014

In case CONFIG_DEBUG_OBJECTS_WORK is defined, it is needed to
call destroy_work_on_stack() which frees the debug object to pair
with INIT_WORK_ONSTACK().
Signed-off-by: NLiu, Chuansheng <chuansheng.liu@intel.com>
Reviewed-by: NBen Myers <bpm@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit 6f96b306)

1f4a63bf

xfs: fix off-by-one error in xfs_attr3_rmt_verify · bba719b5

由 Jie Liu 提交于 1月 01, 2014

With CRC check is enabled, if trying to set an attributes value just
equal to the maximum size of XATTR_SIZE_MAX would cause the v3 remote
attr write verification procedure failure, which would yield the back
trace like below:

<snip>
XFS (sda7): Internal error xfs_attr3_rmt_write_verify at line 191 of file fs/xfs/xfs_attr_remote.c
<snip>
Call Trace:
[<ffffffff816f0042>] dump_stack+0x45/0x56
[<ffffffffa0d99c8b>] xfs_error_report+0x3b/0x40 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d99ce5>] xfs_corruption_error+0x55/0x80 [xfs]
[<ffffffffa0dbef6b>] xfs_attr3_rmt_write_verify+0x14b/0x1a0 [xfs]
[<ffffffffa0d96edd>] ? _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d96edd>] _xfs_buf_ioapply+0x6d/0x390 [xfs]
[<ffffffff81184cda>] ? vm_map_ram+0x31a/0x460
[<ffffffff81097230>] ? wake_up_state+0x20/0x20
[<ffffffffa0d97315>] ? xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d9726b>] xfs_buf_iorequest+0x6b/0xc0 [xfs]
[<ffffffffa0d97315>] xfs_bdstrat_cb+0x55/0xb0 [xfs]
[<ffffffffa0d97906>] xfs_bwrite+0x46/0x80 [xfs]
[<ffffffffa0dbfa94>] xfs_attr_rmtval_set+0x334/0x490 [xfs]
[<ffffffffa0db84aa>] xfs_attr_leaf_addname+0x24a/0x410 [xfs]
[<ffffffffa0db8893>] xfs_attr_set_int+0x223/0x470 [xfs]
[<ffffffffa0db8b76>] xfs_attr_set+0x96/0xb0 [xfs]
[<ffffffffa0db13b2>] xfs_xattr_set+0x42/0x70 [xfs]
[<ffffffff811df9b2>] generic_setxattr+0x62/0x80
[<ffffffff811e0213>] __vfs_setxattr_noperm+0x63/0x1b0
[<ffffffff81307afe>] ? evm_inode_setxattr+0xe/0x10
[<ffffffff811e0415>] vfs_setxattr+0xb5/0xc0
[<ffffffff811e054e>] setxattr+0x12e/0x1c0
[<ffffffff811c6e82>] ? final_putname+0x22/0x50
[<ffffffff811c708b>] ? putname+0x2b/0x40
[<ffffffff811cc4bf>] ? user_path_at_empty+0x5f/0x90
[<ffffffff811bdfd9>] ? __sb_start_write+0x49/0xe0
[<ffffffff81168589>] ? vm_mmap_pgoff+0x99/0xc0
[<ffffffff811e07df>] SyS_setxattr+0x8f/0xe0
[<ffffffff81700c2d>] system_call_fastpath+0x1a/0x1f

Tests:
    setfattr -n user.longxattr -v `perl -e 'print "A"x65536'` testfile

This patch fix it to check the remote EA size is greater than the
XATTR_SIZE_MAX rather than more than or equal to it, because it's
valid if the specified EA value size is equal to the limitation as
per VFS setxattr interface.
Signed-off-by: NJie Liu <jeff.liu@oracle.com>
Reviewed-by: NMark Tinguely <tinguely@sgi.com>
Signed-off-by: NBen Myers <bpm@sgi.com>

(cherry picked from commit 85dd0707)

bba719b5

qlcnic: Fix ethtool statistics length calculation · d6e9c89a

由 Shahed Shaikh 提交于 1月 09, 2014

o Consider number of Tx queues while calculating the length of
  Tx statistics as part of ethtool stats.
o Calculate statistics lenght properly for 82xx and 83xx adapter
Signed-off-by: NShahed Shaikh <shahed.shaikh@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d6e9c89a

qlcnic: Fix bug in TX statistics · 1ac6762a

由 Manish Chopra 提交于 1月 09, 2014

o Driver was not updating TX stats so it was not populating
  statistics in `ifconfig` command output.
Signed-off-by: NManish Chopra <manish.chopra@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1ac6762a

net: core: explicitly select a txq before doing l2 forwarding · f663dd9a

由 Jason Wang 提交于 1月 10, 2014

Currently, the tx queue were selected implicitly in ndo_dfwd_start_xmit(). The
will cause several issues:

- NETIF_F_LLTX were removed for macvlan, so txq lock were done for macvlan
  instead of lower device which misses the necessary txq synchronization for
  lower device such as txq stopping or frozen required by dev watchdog or
  control path.
- dev_hard_start_xmit() was called with NULL txq which bypasses the net device
  watchdog.
- dev_hard_start_xmit() does not check txq everywhere which will lead a crash
  when tso is disabled for lower device.

Fix this by explicitly introducing a new param for .ndo_select_queue() for just
selecting queues in the case of l2 forwarding offload. netdev_pick_tx() was also
extended to accept this parameter and dev_queue_xmit_accel() was used to do l2
forwarding transmission.

With this fixes, NETIF_F_LLTX could be preserved for macvlan and there's no need
to check txq against NULL in dev_hard_start_xmit(). Also there's no need to keep
a dedicated ndo_dfwd_start_xmit() and we can just reuse the code of
dev_queue_xmit() to do the transmission.

In the future, it was also required for macvtap l2 forwarding support since it
provides a necessary synchronization method.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: e1000-devel@lists.sourceforge.net
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f663dd9a

macvlan: forbid L2 fowarding offload for macvtap · b13ba1b8

由 Jason Wang 提交于 1月 10, 2014

L2 fowarding offload will bypass the rx handler of real device. This will make
the packet could not be forwarded to macvtap device. Another problem is the
dev_hard_start_xmit() called for macvtap does not have any synchronization.

Fix this by forbidding L2 forwarding for macvtap.

Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NJohn Fastabend <john.r.fastabend@intel.com.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b13ba1b8

Merge branch 'for-davem' of git://git.kernel.org/pub/scm/linux/kernel/git/linville/wireless · c4d70998

由 David S. Miller 提交于 1月 10, 2014

John W. Linville says:

====================
For the mac80211 bits, Johannes says:

"I have a fix from Javier for mac80211_hwsim when used with wmediumd
userspace, and a fix from Felix for buffering in AP mode."

For the NFC bits, Samuel says:

"This pull request only contains one fix for a regression introduced with
commit e29a9e2a. Without this fix, we can not establish a p2p link
in target mode. Only initiator mode works."

For the iwlwifi bits, Emmanuel says:

"It only includes new device IDs so it's not vital. If you have a pull
request to net.git anyway, I'd happy to have this in."
====================
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4d70998

bnx2x: fix DMA unmapping of TSO split BDs · 95e92fd4

由 Michal Schmidt 提交于 1月 09, 2014

bnx2x triggers warnings with CONFIG_DMA_API_DEBUG=y:

  WARNING: CPU: 0 PID: 2253 at lib/dma-debug.c:887 check_unmap+0xf8/0x920()
  bnx2x 0000:28:00.0: DMA-API: device driver frees DMA memory with
  different size [device address=0x00000000da2b389e] [map size=1490 bytes]
  [unmap size=66 bytes]

The reason is that bnx2x splits a TSO BD into two BDs (headers + data)
using one DMA mapping for both, but it uses only the length of the first
BD when unmapping.

This patch fixes the bug by unmapping the whole length of the two BDs.
Signed-off-by: NMichal Schmidt <mschmidt@redhat.com>
Reviewed-by: NEric Dumazet <edumazet@google.com>
Acked-by: NDmitry Kravkov <dmitry@broadcom.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

95e92fd4

10 1月, 2014 1 次提交

Merge tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux · 21e20e22

由 Linus Torvalds 提交于 1月 10, 2014

Pull clock fixes from Mike Turquette:
 "Late fixes for clock drivers.  All of these fixes are for user-visible
  regressions, typically boot failures or other unsafe system
  configuration that causes badness"

* tag 'clk-fixes-for-linus' of git://git.linaro.org/people/mike.turquette/linux:
  clk: clk-divider: fix divisor > 255 bug
  clk: exynos: File scope reg_save array should depend on PM_SLEEP
  clk: samsung: exynos5250: Add CLK_IGNORE_UNUSED flag for the sysreg clock
  ARM: dts: exynos5250: Fix MDMA0 clock number
  clk: samsung: exynos5250: Add MDMA0 clocks
  clk: samsung: exynos5250: Fix ACP gate register offset
  clk: exynos5250: fix sysmmu_mfc{l,r} gate clocks
  clk: samsung: exynos4: Correct SRC_MFC register

21e20e22

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功