- 07 5月, 2014 11 次提交
-
-
由 Vincent Guittot 提交于
We replace the old way to configure the scheduler topology with a new method which enables a platform to declare additionnal level (if needed). We still have a default topology table definition that can be used by platform that don't want more level than the SMT, MC, CPU and NUMA ones. This table can be overwritten by an arch which either wants to add new level where a load balance make sense like BOOK or powergating level or wants to change the flags configuration of some levels. For each level, we need a function pointer that returns cpumask for each cpu, a function pointer that returns the flags for the level and a name. Only flags that describe topology, can be set by an architecture. The current topology flags are: SD_SHARE_CPUPOWER SD_SHARE_PKG_RESOURCES SD_NUMA SD_ASYM_PACKING Then, each level must be a subset on the next one. The build sequence of the sched_domain will take care of removing useless levels like those with 1 CPU and those with the same CPU span and no more relevant information for load balancing than its children. Signed-off-by: NVincent Guittot <vincent.guittot@linaro.org> Tested-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Reviewed-by: NDietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Chris Metcalf <cmetcalf@tilera.com> Cc: Christoph Lameter <cl@linux.com> Cc: David S. Miller <davem@davemloft.net> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hanjun Guo <hanjun.guo@linaro.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Jason Low <jason.low2@hp.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Tony Luck <tony.luck@intel.com> Cc: linux390@de.ibm.com Cc: linux-ia64@vger.kernel.org Cc: linux-s390@vger.kernel.org Link: http://lkml.kernel.org/r/1397209481-28542-2-git-send-email-vincent.guittot@linaro.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
Setting the numa_preferred_node for a task in task_numa_migrate does nothing on a 2-node system. Either we migrate to the node that already was our preferred node, or we stay where we were. On a 4-node system, it can slightly decrease overhead, by not calling the NUMA code as much. Since every node tends to be directly connected to every other node, running on the wrong node for a while does not do much damage. However, on an 8 node system, there are far more bad nodes than there are good ones, and pretending that a second choice is actually the preferred node can greatly delay, or even prevent, a workload from converging. The only time we can safely pretend that a second choice node is the preferred node is when the task is part of a workload that spans multiple NUMA nodes. Signed-off-by: NRik van Riel <riel@redhat.com> Tested-by: NVinod Chegu <chegu_vinod@hp.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1397235629-16328-4-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
When tasks have not converged on their preferred nodes yet, we want to retry fairly often, to make sure we do not migrate a task's memory to an undesirable location, only to have to move it again later. This patch reduces the interval at which migration is retried, when the task's numa_scan_period is small. Signed-off-by: NRik van Riel <riel@redhat.com> Tested-by: NVinod Chegu <chegu_vinod@hp.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1397235629-16328-3-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Rik van Riel 提交于
The NUMA code is smart enough to distribute the memory of workloads that span multiple NUMA nodes across those NUMA nodes. However, it still has a pretty high scan rate for such workloads, because any memory that is left on a node other than the node of the CPU that faulted on the memory is counted as non-local, which causes the scan rate to go up. Counting the memory on any node where the task's numa group is actively running as local, allows the scan rate to slow down once the application is settled in. This should reduce the overhead of the automatic NUMA placement code, when a workload spans multiple NUMA nodes. Signed-off-by: NRik van Riel <riel@redhat.com> Tested-by: NVinod Chegu <chegu_vinod@hp.com> Acked-by: NMel Gorman <mgorman@suse.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: http://lkml.kernel.org/r/1397235629-16328-2-git-send-email-riel@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jason Low 提交于
Also initialize the per-sd variables for newidle load balancing in sd_numa_init(). Signed-off-by: NJason Low <jason.low2@hp.com> Acked-by: morten.rasmussen@arm.com Cc: daniel.lezcano@linaro.org Cc: alex.shi@linaro.org Cc: preeti@linux.vnet.ibm.com Cc: efault@gmx.de Cc: vincent.guittot@linaro.org Cc: aswin@hp.com Cc: chegu_vinod@hp.com Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1398303035-18255-3-git-send-email-jason.low2@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Jason Low 提交于
The following commit: e5fc6611 ("sched: Fix race in idle_balance()") can potentially cause rq->max_idle_balance_cost to not be updated, even when load_balance(NEWLY_IDLE) is attempted and the per-sd max cost value is updated. Preeti noticed a similar issue with updating rq->next_balance. In this patch, we fix this by making sure we still check/update those values even if a task gets enqueued while browsing the domains. Signed-off-by: NJason Low <jason.low2@hp.com> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: morten.rasmussen@arm.com Cc: aswin@hp.com Cc: daniel.lezcano@linaro.org Cc: alex.shi@linaro.org Cc: efault@gmx.de Cc: vincent.guittot@linaro.org Link: http://lkml.kernel.org/r/1398725155-7591-2-git-send-email-jason.low2@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Tim wrote: "The current code will call pick_next_task_fair a second time in the slow path if we did not pull any task in our first try. This is really unnecessary as we already know no task can be pulled and it doubles the delay for the cpu to enter idle. We instrumented some network workloads and that saw that pick_next_task_fair is frequently called twice before a cpu enters idle. The call to pick_next_task_fair can add non trivial latency as it calls load_balance which runs find_busiest_group on an hierarchy of sched domains spanning the cpus for a large system. For some 4 socket systems, we saw almost 0.25 msec spent per call of pick_next_task_fair before a cpu can be idled." Optimize the second call away for the common case and document the dependency. Reported-by: NTim Chen <tim.c.chen@linux.intel.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Len Brown <len.brown@intel.com> Link: http://lkml.kernel.org/r/20140424100047.GP11096@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Steven Rostedt (Red Hat) 提交于
The check at the beginning of cpupri_find() makes sure that the task_pri variable does not exceed the cp->pri_to_cpu array length. But that length is CPUPRI_NR_PRIORITIES not MAX_RT_PRIO, where it will miss the last two priorities in that array. As task_pri is computed from convert_prio() which should never be bigger than CPUPRI_NR_PRIORITIES, if the check should cause a panic if it is hit. Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/1397015410.5212.13.camel@marge.simpson.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Li Zefan 提交于
Free cpudl->free_cpus allocated in cpudl_init(). Signed-off-by: NLi Zefan <lizefan@huawei.com> Acked-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: <stable@vger.kernel.org> # 3.14+ Link: http://lkml.kernel.org/r/534F36CE.2000409@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Juri Lelli 提交于
yield_task_dl() is broken: o it forces current to be throttled setting its runtime to zero; o it sets current's dl_se->dl_new to one, expecting that dl_task_timer() will queue it back with proper parameters at replenish time. Unfortunately, dl_task_timer() has this check at the very beginning: if (!dl_task(p) || dl_se->dl_new) goto unlock; So, it just bails out and the task is never replenished. It actually yielded forever. To fix this, introduce a new flag indicating that the task properly yielded the CPU before its current runtime expired. While this is a little overdoing at the moment, the flag would be useful in the future to discriminate between "good" jobs (of which remaining runtime could be reclaimed, i.e. recycled) and "bad" jobs (for which dl_throttled task has been set) that needed to be stopped. Reported-by: Nyjay.kim <yjay.kim@lge.com> Signed-off-by: NJuri Lelli <juri.lelli@gmail.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140429103953.e68eba1b2ac3309214e3dc5a@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
Russell reported, that irqtime_account_idle_ticks() takes ages due to: for (i = 0; i < ticks; i++) irqtime_account_process_tick(current, 0, rq); It's sad, that this code was written way _AFTER_ the NOHZ idle functionality was available. I charge myself guitly for not paying attention when that crap got merged with commit abb74cef ("sched: Export ns irqtimes through /proc/stat") So instead of looping nr_ticks times just apply the whole thing at once. As a side note: The whole cputime_t vs. u64 business in that context wants to be cleaned up as well. There is no point in having all these back and forth conversions. Lets standardise on u64 nsec for all kernel internal accounting and be done with it. Everything else does not make sense at all for fine grained accounting. Frederic, can you please take care of that? Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Reviewed-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: Venkatesh Pallipadi <venki@google.com> Cc: Shaun Ruffell <sruffell@digium.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.DEB.2.02.1405022307000.6261@ionos.tec.linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 4月, 2014 1 次提交
-
-
由 Masanari Iida 提交于
When 'flags' argument to sched_{set,get}attr() syscalls were added in: 6d35ab48 ("sched: Add 'flags' argument to sched_{set,get}attr() syscalls") no description for 'flags' was added. It causes the following warnings on "make htmldocs": Warning(/kernel/sched/core.c:3645): No description found for parameter 'flags' Warning(/kernel/sched/core.c:3789): No description found for parameter 'flags' Signed-off-by: NMasanari Iida <standby24x7@gmail.com> Cc: peterz@infradead.org Link: http://lkml.kernel.org/r/1397753955-2914-1-git-send-email-standby24x7@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 19 4月, 2014 1 次提交
-
-
由 Andrew Morton 提交于
Fix: BUG: using __this_cpu_write() in preemptible [00000000] code: systemd-udevd/497 caller is __this_cpu_preempt_check+0x13/0x20 CPU: 3 PID: 497 Comm: systemd-udevd Tainted: G W 3.15.0-rc1 #9 Hardware name: Hewlett-Packard HP EliteBook 8470p/179B, BIOS 68ICF Ver. F.02 04/27/2012 Call Trace: check_preemption_disabled+0xe1/0xf0 __this_cpu_preempt_check+0x13/0x20 touch_nmi_watchdog+0x28/0x40 Reported-by: NLuis Henriques <luis.henriques@canonical.com> Tested-by: NLuis Henriques <luis.henriques@canonical.com> Cc: Eric Piel <eric.piel@tremplin-utc.net> Cc: Robert Moore <robert.moore@intel.com> Cc: Lv Zheng <lv.zheng@intel.com> Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com> Cc: Len Brown <lenb@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 4月, 2014 7 次提交
-
-
由 Kirill Tkhai 提交于
This reverts commit 4c6c4e38 ("sched/core: Fix endless loop in pick_next_task()"), which is not necessary after ("sched/rt: Substract number of tasks of throttled queues from rq->nr_running"). Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> [conflict resolution with stop task checking patch] Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394835307.18748.34.camel@HP-250-G1-Notebook-PC Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
Now rq->rt becomes to be able to be in dequeued or enqueued state. We add new member rt_rq->rt_queued, which is used to indicate this. The member is used only for top queue rq->rt_rq. The goal is to fit generic scheme which is used in deadline and fair classes, i.e. throttled rt_rq's rt_nr_running is beeing substracted from rq->nr_running. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394835300.18748.33.camel@HP-250-G1-Notebook-PC Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
Two accessors for RT_GROUP_SCHED and !RT_GROUP_SCHED cases. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394835295.18748.32.camel@HP-250-G1-Notebook-PC Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill Tkhai 提交于
{inc,dec}_rt_tasks() used to count entities which are directly queued on the rt_rq. If an entity was not a task (i.e., it is some queue), its children were not counted. There is no problem here, but now we want to count number of all tasks which are actually queued under the rt_rq in all the hierarchy (except throttled rt queues). Empty queues are not able to be queued and all of the places, which use ->rt_nr_running, just compare it with zero, so we do not break anything here. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/1394835289.18748.31.camel@HP-250-G1-Notebook-PC Cc: linux-kernel@vger.kernel.org [ Twiddled the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dongsheng Yang 提交于
Replace various -20/+19 hardcoded nice values with MIN_NICE/MAX_NICE. Signed-off-by: NDongsheng Yang <yangds.fnst@cn.fujitsu.com> Acked-by: NTejun Heo <tj@kernel.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/ff13819fd09b7a5dba5ab5ae797f2e7019bdfa17.1394532288.git.yangds.fnst@cn.fujitsu.com Cc: devel@driverdev.osuosl.org Cc: devicetree@vger.kernel.org Cc: fcoe-devel@open-fcoe.org Cc: linux390@de.ibm.com Cc: linux-kernel@vger.kernel.org Cc: linux-mm@kvack.org Cc: linux-s390@vger.kernel.org Cc: linux-scsi@vger.kernel.org Cc: nbd-general@lists.sourceforge.net Cc: ocfs2-devel@oss.oracle.com Cc: openipmi-developer@lists.sourceforge.net Cc: qla2xxx-upstream@qlogic.com Cc: linux-arch@vger.kernel.org [ Consolidated the patches, twiddled the changelog. ] Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Kirill V Tkhai 提交于
Just switched pinned task is not able to be pushed. If the rq had had several RT tasks before they have already been considered as candidates to be pushed (or pulled). Signed-off-by: NKirill V Tkhai <tkhai@yandex.ru> Acked-by: NSteven Rostedt <rostedt@goodmis.org> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140312061833.3a43aa64@gandalf.local.home Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Mike reported that, while unlikely, its entirely possible for scale_rt_power() to see the time go backwards. This yields rather 'interesting' results. So like all other sites that deal with clocks; make this one ignore backward clock movement too. Reported-by: NMike Galbraith <bitbucket@online.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140227094035.GZ9987@twins.programming.kicks-ass.net Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 17 4月, 2014 4 次提交
-
-
由 zhangwei(Jovi) 提交于
Forgot to free uprobe_cpu_buffer percpu page in uprobe_buffer_disable(). Link: http://lkml.kernel.org/p/534F8B3F.1090407@huawei.com Cc: stable@vger.kernel.org # v3.14+ Acked-by: NNamhyung Kim <namhyung@kernel.org> Signed-off-by: Nzhangwei(Jovi) <jovi.zhangwei@huawei.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Kirill Tkhai 提交于
We need to do it like we do for the other higher priority classes.. Signed-off-by: NKirill Tkhai <tkhai@yandex.ru> Cc: Michael wang <wangyun@linux.vnet.ibm.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/336561397137116@web27h.yandex.ruSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Steven Rostedt (Red Hat) 提交于
With the restructing of the function tracer working with instances, the "top level" buffer is a bit special, as the function tracing is mapped to the same set of filters. This is done by using a "global_ops" descriptor and having the "set_ftrace_filter" and "set_ftrace_notrace" map to it. When an instance is created, it creates the same files but its for the local instance and not the global_ops. The issues is that the local instance creation shares some code with the global instance one and we end up trying to create th top level "set_ftrace_*" files twice, and on boot up, we get an error like this: Could not create debugfs 'set_ftrace_filter' entry Could not create debugfs 'set_ftrace_notrace' entry The reason they failed to be created was because they were created twice, and the second time gives this error as you can not create the same file twice. Reported-by: NBorislav Petkov <bp@alien8.de> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Kees Cook 提交于
This sets the correct error code when final filter memory is unavailable, and frees the raw filter no matter what. unreferenced object 0xffff8800d6ea4000 (size 512): comm "sshd", pid 278, jiffies 4294898315 (age 46.653s) hex dump (first 32 bytes): 21 00 00 00 04 00 00 00 15 00 01 00 3e 00 00 c0 !...........>... 06 00 00 00 00 00 00 00 21 00 00 00 00 00 00 00 ........!....... backtrace: [<ffffffff8151414e>] kmemleak_alloc+0x4e/0xb0 [<ffffffff811a3a40>] __kmalloc+0x280/0x320 [<ffffffff8110842e>] prctl_set_seccomp+0x11e/0x3b0 [<ffffffff8107bb6b>] SyS_prctl+0x3bb/0x4a0 [<ffffffff8152ef2d>] system_call_fastpath+0x1a/0x1f [<ffffffffffffffff>] 0xffffffffffffffff Reported-by: NMasami Ichikawa <masami256@gmail.com> Signed-off-by: NKees Cook <keescook@chromium.org> Tested-by: NMasami Ichikawa <masami256@gmail.com> Acked-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 16 4月, 2014 3 次提交
-
-
由 Viresh Kumar 提交于
Since commit d689fe22 (NOHZ: Check for nohz active instead of nohz enabled) the tick_nohz_switch_to_nohz() function returns because it checks for the tick_nohz_active flag. This can't be set, because the function itself sets it. Undo the change in tick_nohz_switch_to_nohz(). Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Cc: <stable@vger.kernel.org> # 3.13+ Link: http://lkml.kernel.org/r/40939c05f2d65d781b92b20302b02243d0654224.1397537987.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Viresh Kumar 提交于
In tick_do_update_jiffies64() we are processing ticks only if delta is greater than tick_period. This is what we are supposed to do here and it broke a bit with this patch: commit 47a1b796 (tick/timekeeping: Call update_wall_time outside the jiffies lock) With above patch, we might end up calling update_wall_time() even if delta is found to be smaller that tick_period. Fix this by returning when the delta is less than tick period. [ tglx: Made it a 3 liner and massaged changelog ] Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Cc: John Stultz <john.stultz@linaro.org> Cc: <stable@vger.kernel.org> # v3.14+ Link: http://lkml.kernel.org/r/80afb18a494b0bd9710975bcc4de134ae323c74f.1397537987.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Viresh Kumar 提交于
tick_check_replacement() returns if a replacement of clock_event_device is possible or not. It does this as the first check: if (tick_check_percpu(curdev, newdev, smp_processor_id())) return false; Thats wrong. tick_check_percpu() returns true when the device is useable. Check for false instead. [ tglx: Massaged changelog ] Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org> Cc: <stable@vger.kernel.org> # v3.11+ Cc: linaro-kernel@lists.linaro.org Cc: fweisbec@gmail.com Cc: Arvind.Chauhan@arm.com Cc: linaro-networking@linaro.org Link: http://lkml.kernel.org/r/486a02efe0246635aaba786e24b42d316438bf3b.1397537987.git.viresh.kumar@linaro.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 15 4月, 2014 2 次提交
-
-
由 Mikulas Patocka 提交于
smp_read_barrier_depends() can be used if there is data dependency between the readers - i.e. if the read operation after the barrier uses address that was obtained from the read operation before the barrier. In this file, there is only control dependency, no data dependecy, so the use of smp_read_barrier_depends() is incorrect. The code could fail in the following way: * the cpu predicts that idx < entries is true and starts executing the body of the for loop * the cpu fetches map->extent[0].first and map->extent[0].count * the cpu fetches map->nr_extents * the cpu verifies that idx < extents is true, so it commits the instructions in the body of the for loop The problem is that in this scenario, the cpu read map->extent[0].first and map->nr_extents in the wrong order. We need a full read memory barrier to prevent it. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Daniel Borkmann 提交于
Linus reports that on 32-bit x86 Chromium throws the following seccomp resp. audit log messages: audit: type=1326 audit(1397359304.356:28108): auid=500 uid=500 gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=172 compat=0 ip=0xb2dd9852 code=0x30000 audit: type=1326 audit(1397359304.356:28109): auid=500 uid=500 gid=500 ses=2 subj=unconfined_u:unconfined_r:chrome_sandbox_t:s0-s0:c0.c1023 pid=3677 comm="chrome" exe="/opt/google/chrome/chrome" sig=0 syscall=5 compat=0 ip=0xb2dd9852 code=0x50000 These audit messages are being triggered via audit_seccomp() through __secure_computing() in seccomp mode (BPF) filter with seccomp return codes 0x30000 (== SECCOMP_RET_TRAP) and 0x50000 (== SECCOMP_RET_ERRNO) during filter runtime. Moreover, Linus reports that x86_64 Chromium seems fine. The underlying issue that explains this is that the implementation of populate_seccomp_data() is wrong. Our seccomp data structure sd that is being shared with user ABI is: struct seccomp_data { int nr; __u32 arch; __u64 instruction_pointer; __u64 args[6]; }; Therefore, a simple cast to 'unsigned long *' for storing the value of the syscall argument via syscall_get_arguments() is just wrong as on 32-bit x86 (or any other 32bit arch), it would result in storing a0-a5 at wrong offsets in args[] member, and thus i) could leak stack memory to user space and ii) tampers with the logic of seccomp BPF programs that read out and check for syscall arguments: syscall_get_arguments(task, regs, 0, 1, (unsigned long *) &sd->args[0]); Tested on 32-bit x86 with Google Chrome, unfortunately only via remote test machine through slow ssh X forwarding, but it fixes the issue on my side. So fix it up by storing args in type correct variables, gcc is clever and optimizes the copy away in other cases, e.g. x86_64. Fixes: bd4cf0ed ("net: filter: rework/optimize internal BPF interpreter's instruction set") Reported-and-bisected-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NDaniel Borkmann <dborkman@redhat.com> Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Eric Paris <eparis@redhat.com> Cc: James Morris <james.l.morris@oracle.com> Cc: Kees Cook <keescook@chromium.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 13 4月, 2014 1 次提交
-
-
由 Davidlohr Bueso 提交于
Commits 11d4616b ("futex: revert back to the explicit waiter counting code") and 69cd9eba ("futex: avoid race between requeue and wake") changed some of the finer details of how we think about futexes. One was a late fix and the other a consequence of overlooking the whole requeuing logic. The first change caused our documentation to be incorrect, and the second made us aware that we need to explicitly add more details to it. Signed-off-by: NDavidlohr Bueso <davidlohr@hp.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 12 4月, 2014 1 次提交
-
-
由 Al Viro 提交于
that commit has fixed only the parts of that mess in fs/splice.c itself; there had been more in several other ->splice_read() instances... Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 11 4月, 2014 3 次提交
-
-
由 Peter Zijlstra 提交于
debug_mutex_unlock() would bail when !debug_locks and forgets to actually unlock. Reported-by: N"Michael L. Semon" <mlsemon35@gmail.com> Reported-by: N"Kirill A. Shutemov" <kirill@shutemov.name> Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu> Fixes: 6f008e72 ("locking/mutex: Fix debug checks") Tested-by: NDave Jones <davej@redhat.com> Cc: Jason Low <jason.low2@hp.com> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140410141559.GE13658@twins.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Mike Galbraith 提交于
Sasha reported that lockdep claims that the following commit: made numa_group.lock interrupt unsafe: 156654f4 ("sched/numa: Move task_numa_free() to __put_task_struct()") While I don't see how that could be, given the commit in question moved task_numa_free() from one irq enabled region to another, the below does make both gripes and lockups upon gripe with numa=fake=4 go away. Reported-by: NSasha Levin <sasha.levin@oracle.com> Fixes: 156654f4 ("sched/numa: Move task_numa_free() to __put_task_struct()") Signed-off-by: NMike Galbraith <bitbucket@online.de> Signed-off-by: NPeter Zijlstra <peterz@infradead.org> Cc: torvalds@linux-foundation.org Cc: mgorman@suse.com Cc: akpm@linux-foundation.org Cc: Dave Jones <davej@redhat.com> Link: http://lkml.kernel.org/r/1396860915.5170.5.camel@marge.simpson.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Steven Rostedt (Red Hat) 提交于
The debugfs tracing README file lists all the function triggers except for dump and cpudump. These should be added too. Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 10 4月, 2014 1 次提交
-
-
由 Mathieu Desnoyers 提交于
gcc <= 4.5.x has significant limitations with respect to initialization of anonymous unions within structures. They need to be surrounded by brackets, _and_ they need to be initialized in the same order in which they appear in the structure declaration. Link: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=10676 Link: http://lkml.kernel.org/r/1397077568-3156-1-git-send-email-mathieu.desnoyers@efficios.comSigned-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 09 4月, 2014 4 次提交
-
-
由 Linus Torvalds 提交于
Jan Stancek reported: "pthread_cond_broadcast/4-1.c testcase from openposix testsuite (LTP) occasionally fails, because some threads fail to wake up. Testcase creates 5 threads, which are all waiting on same condition. Main thread then calls pthread_cond_broadcast() without holding mutex, which calls: futex(uaddr1, FUTEX_CMP_REQUEUE_PRIVATE, 1, 2147483647, uaddr2, ..) This immediately wakes up single thread A, which unlocks mutex and tries to wake up another thread: futex(uaddr2, FUTEX_WAKE_PRIVATE, 1) If thread A manages to call futex_wake() before any waiters are requeued for uaddr2, no other thread is woken up" The ordering constraints for the hash bucket waiter counting are that the waiter counts have to be incremented _before_ getting the spinlock (because the spinlock acts as part of the memory barrier), but the "requeue" operation didn't honor those rules, and nobody had even thought about that case. This fairly simple patch just increments the waiter count for the target hash bucket (hb2) when requeing a futex before taking the locks. It then decrements them again after releasing the lock - the code that actually moves the futex(es) between hash buckets will do the additional required waiter count housekeeping. Reported-and-tested-by: NJan Stancek <jstancek@redhat.com> Acked-by: NDavidlohr Bueso <davidlohr@hp.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org # 3.14 Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mathieu Desnoyers 提交于
Fix the following sparse warnings: CHECK kernel/tracepoint.c kernel/tracepoint.c:184:18: warning: incorrect type in assignment (different address spaces) kernel/tracepoint.c:184:18: expected struct tracepoint_func *tp_funcs kernel/tracepoint.c:184:18: got struct tracepoint_func [noderef] <asn:4>*funcs kernel/tracepoint.c:216:18: warning: incorrect type in assignment (different address spaces) kernel/tracepoint.c:216:18: expected struct tracepoint_func *tp_funcs kernel/tracepoint.c:216:18: got struct tracepoint_func [noderef] <asn:4>*funcs kernel/tracepoint.c:392:24: error: return expression in void function CC kernel/tracepoint.o kernel/tracepoint.c: In function tracepoint_module_going: kernel/tracepoint.c:491:6: warning: symbol 'syscall_regfunc' was not declared. Should it be static? kernel/tracepoint.c:508:6: warning: symbol 'syscall_unregfunc' was not declared. Should it be static? Link: http://lkml.kernel.org/r/1397049883-28692-1-git-send-email-mathieu.desnoyers@efficios.comSigned-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Steven Rostedt (Red Hat) 提交于
Instead of copying the num_tracepoints and tracepoints_ptrs from the module structure to the tp_mod structure, which only uses it to find the module associated to tracepoints of modules that are coming and going, simply copy the pointer to the module struct to the tracepoint tp_module structure. Also removed un-needed brackets around an if statement. Link: http://lkml.kernel.org/r/20140408201705.4dad2c4a@gandalf.local.homeAcked-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Mathieu Desnoyers 提交于
Register/unregister tracepoint probes with struct tracepoint pointer rather than tracepoint name. This change, which vastly simplifies tracepoint.c, has been proposed by Steven Rostedt. It also removes 8.8kB (mostly of text) to the vmlinux size. From this point on, the tracers need to pass a struct tracepoint pointer to probe register/unregister. A probe can now only be connected to a tracepoint that exists. Moreover, tracers are responsible for unregistering the probe before the module containing its associated tracepoint is unloaded. text data bss dec hex filename 10443444 4282528 10391552 25117524 17f4354 vmlinux.orig 10434930 4282848 10391552 25109330 17f2352 vmlinux Link: http://lkml.kernel.org/r/1396992381-23785-2-git-send-email-mathieu.desnoyers@efficios.com CC: Ingo Molnar <mingo@kernel.org> CC: Frederic Weisbecker <fweisbec@gmail.com> CC: Andrew Morton <akpm@linux-foundation.org> CC: Frank Ch. Eigler <fche@redhat.com> CC: Johannes Berg <johannes.berg@intel.com> Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> [ SDR - fixed return val in void func in tracepoint_module_going() ] Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 08 4月, 2014 1 次提交
-
-
由 Josh Triplett 提交于
When the system has only one CPU, lglock is effectively a spinlock; map it directly to spinlock to eliminate the indirection and duplicate code. In addition to removing overhead, this drops 1.6k of code with a defconfig modified to have !CONFIG_SMP, and 1.1k with a minimal config. Signed-off-by: NJosh Triplett <josh@joshtriplett.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Michal Marek <mmarek@suse.cz> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: David Howells <dhowells@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Nick Piggin <npiggin@kernel.dk> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-