- 09 3月, 2016 1 次提交
-
-
由 Chen Fan 提交于
Per the x86-specific footnote to PCI spec r3.0, sec 6.2.4, the value 255 in the Interrupt Line register means "unknown" or "no connection." Previously, when we couldn't derive an IRQ from the _PRT, we fell back to using the value from Interrupt Line as an IRQ. It's questionable whether we should do that at all, but the spec clearly suggests we shouldn't do it for the value 255 on x86. Calling request_irq() with IRQ 255 may succeed, but the driver won't receive any interrupts. Or, if IRQ 255 is shared with another device, it may succeed, and the driver's ISR will be called at random times when the *other* device interrupts. Or it may fail if another device is using IRQ 255 with incompatible flags. What we *want* is for request_irq() to fail predictably so the driver can fall back to polling. On x86, assume 255 in the Interrupt Line means the INTx line is not connected. In that case, set dev->irq to IRQ_NOTCONNECTED so request_irq() will fail gracefully with -ENOTCONN. We found this problem on a system where Secure Boot firmware assigned Interrupt Line 255 to an i801_smbus device and another device was already using MSI-X IRQ 255. This was in v3.10, where i801_probe() fails if request_irq() fails: i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143) i801_smbus 0000:00:1f.3: can't derive routing for PCI INT C i801_smbus 0000:00:1f.3: PCI INT C: no GSI genirq: Flags mismatch irq 255. 00000080 (i801_smbus) vs. 00000000 (megasa) CPU: 0 PID: 2487 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.x86_64 #1 Hardware name: FUJITSU PRIMEQUEST 2800E2/D3736, BIOS PRIMEQUEST 2000 Serie5 Call Trace: dump_stack+0x19/0x1b __setup_irq+0x54a/0x570 request_threaded_irq+0xcc/0x170 i801_probe+0x32f/0x508 [i2c_i801] local_pci_probe+0x45/0xa0 i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16 i801_smbus: probe of 0000:00:1f.3 failed with error -16 After aeb8a3d1 ("i2c: i801: Check if interrupts are disabled"), i801_probe() will fall back to polling if request_irq() fails. But we still need this patch because request_irq() may succeed or fail depending on other devices in the system. If request_irq() fails, i801_smbus will work by falling back to polling, but if it succeeds, i801_smbus won't work because it expects interrupts that it may not receive. Signed-off-by: NChen Fan <chen.fan.fnst@cn.fujitsu.com> Acked-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 04 3月, 2016 1 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and process names" added a 'comm' filter that will filter events based on the current tasks struct 'comm'. But this now hides the ability to filter events that have a 'comm' field too. For example, sched_migrate_task trace event. That has a 'comm' field of the task to be migrated. echo 'comm == "bash"' > events/sched_migrate_task/filter will now filter all sched_migrate_task events for tasks named "bash" that migrates other tasks (in interrupt context), instead of seeing when "bash" itself gets migrated. This fix requires a couple of changes. 1) Change the look up order for filter predicates to look at the events fields before looking at the generic filters. 2) Instead of basing the filter function off of the "comm" name, have the generic "comm" filter have its own filter_type (FILTER_COMM). Test against the type instead of the name to assign the filter function. 3) Add a new "COMM" filter that works just like "comm" but will filter based on the current task, even if the trace event contains a "comm" field. Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU". Cc: stable@vger.kernel.org # v4.3+ Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names" Reported-by: NMatt Fleming <matt@codeblueprint.co.uk> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 25 2月, 2016 12 次提交
-
-
由 Peter Zijlstra 提交于
Since there is no serialization between task_function_call() doing task_curr() and the other CPU doing context switches, we could end up not sending an IPI even if we had to. And I'm not sure I still buy my own argument we're OK. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174948.340031200@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Completely reworks perf_install_in_context() (again!) in order to ensure that there will be no ctx time hole between add_event_to_ctx() and any potential ctx_sched_in(). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174948.279399438@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Similar to the perf_enable_on_exec(), ensure that event timings are consistent across perf_event_enable(). Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174948.218288698@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
The recent commit 3e349507 ("perf: Fix perf_enable_on_exec() event scheduling") caused this by moving task_ctx_sched_out() from before __perf_event_mask_enable() to after it. The overlooked consequence of that change is that task_ctx_sched_out() would update the ctx time fields, and now __perf_event_mask_enable() uses stale time. In order to fix this, explicitly stop our context's time before enabling the event(s). Reported-by: NOleg Nesterov <oleg@redhat.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Fixes: 3e349507 ("perf: Fix perf_enable_on_exec() event scheduling") Link: http://lkml.kernel.org/r/20160224174948.159242158@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Currently any ctx_sched_in() call will re-start the ctx time tracking, this means that calls like: ctx_sched_in(.event_type = EVENT_PINNED); ctx_sched_in(.event_type = EVENT_FLEXIBLE); will have a hole in their ctx time tracking. This is likely harmless but can confuse things a little. By adding EVENT_TIME, we can have the first ctx_sched_in() (is_active: 0 -> !0) start the time and any further ctx_sched_in() will leave the timestamps alone. Secondly, this allows for an early disable like: ctx_sched_out(.event_type = EVENT_TIME); which would update the ctx time (if the ctx is active) and any further calls to ctx_sched_out() would not further modify the ctx time. For ctx_sched_in() any 0 -> !0 transition will automatically include EVENT_TIME. For ctx_sched_out(), any transition that clears EVENT_ALL will automatically clear EVENT_TIME. These two rules ensure that under normal circumstances we need not bother with EVENT_TIME and get natural ctx time behaviour. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174948.100446561@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Because event_sched_out() checks event->pending_disable _before_ actually disabling the event, it can happen that the event fires after it checks but before it gets disabled. This would leave event->pending_disable set and the queued irq_work will try and process it. However, if the event trigger was during schedule(), the event might have been de-scheduled by the time the irq_work runs, and perf_event_disable_local() will fail. Fix this by checking event->pending_disable _after_ we call event->pmu->del(). This depends on the latter being a compiler barrier, such that the compiler does not lift the load and re-creates the problem. Tested-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174948.040469884@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
perf_install_in_context() relies upon the context switch hooks to have scheduled in events when the IPI misses its target -- after all, if the task has moved from the CPU (or wasn't running at all), it will have to context switch to run elsewhere. This however doesn't appear to be happening. It is possible for the IPI to not happen (task wasn't running) only to later observe the task running with an inactive context. The only possible explanation is that the context switch hooks are not called. Therefore put in a sync_sched() after toggling the jump_label to guarantee all CPUs will have them enabled before we install an event. A simple if (0->1) sync_sched() will not in fact work, because any further increment can race and complete before the sync_sched(). Therefore we must jump through some hoops. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Alexander reported that when the 'original' context gets destroyed, no new clones happen. This can happen irrespective of the ctx switch optimization, any task can die, even the parent, and we want to continue monitoring the task hierarchy until we either close the event or no tasks are left in the hierarchy. perf_event_init_context() will attempt to pin the 'parent' context during clone(). At that point current is the parent, and since current cannot have exited while executing clone(), its context cannot have passed through perf_event_exit_task_context(). Therefore perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE. However, since inherit_event() does: if (parent_event->parent) parent_event = parent_event->parent; it looks at the 'original' event when it does: is_orphaned_event(). This can return true if the context that contains the this event has passed through perf_event_exit_task_context(). And thus we'll fail to clone the perf context. Fix this by adding a new state: STATE_DEAD, which is set by perf_release() to indicate that the filedesc (or kernel reference) is dead and there are no observers for our data left. Only for STATE_DEAD will is_orphaned_event() be true and inhibit cloning. STATE_EXIT is otherwise preserved such that is_event_hup() remains functional and will report when the observed task hierarchy becomes empty. Reported-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Tested-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Fixes: c6e5b732 ("perf: Synchronously clean up child events") Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.860690919@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In the err_file: fput(event_file) case, the event will not yet have been attached to a context. However perf_release() does assume it has been. Cure this. Tested-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.793996260@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
In case of: err_file: fput(event_file), we'll end up calling perf_release() which in turn will free the event. Do not then free the event _again_. Tested-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Consider the following scenario: CPU0 CPU1 ctx = find_get_ctx(); perf_event_exit_task_context() mutex_lock(&ctx->mutex); perf_install_in_context(ctx, ...); /* NO-OP */ mutex_unlock(&ctx->mutex); ... perf_release() WARN_ON_ONCE(event->state != STATE_EXIT); Since the event doesn't pass through perf_remove_from_context() because perf_install_in_context() NO-OPs because the ctx is dead, and perf_event_exit_task_context() will not observe the event because its not attached yet, the event->state will not be set. Solve this by revalidating ctx->task after we acquire ctx->mutex and failing the event creation as a whole. Tested-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NAlexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: dvyukov@google.com Cc: eranian@google.com Cc: oleg@redhat.com Cc: panand@redhat.com Cc: sasha.levin@oracle.com Cc: vince@deater.net Link: http://lkml.kernel.org/r/20160224174947.626853419@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 24 2月, 2016 2 次提交
-
-
由 Steven Rostedt (Red Hat) 提交于
The ftrace:function event is only displayed for parsing the function tracer data. It is not used to enable function tracing, and does not include an "enable" file in its event directory. Originally, this event was kept separate from other events because it did not have a ->reg parameter. But perf added a "reg" parameter for its use which caused issues, because it made the event available to functions where it was not compatible for. Commit 9b63776f "tracing: Do not enable function event with enable" added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function event from being enabled by normal trace events. But this commit missed keeping the function event from being displayed by the "available_events" directory, which is used to show what events can be enabled by set_event. One documented way to enable all events is to: cat available_events > set_event But because the function event is displayed in the available_events, this now causes an INVALID error: cat: write error: Invalid argument Reported-by: NChunyu Hu <chuhu@redhat.com> Fixes: 9b63776f "tracing: Do not enable function event with enable" Cc: stable@vger.kernel.org # 3.4+ Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
由 Toshi Kani 提交于
devm_memremap() returns an ERR_PTR() value in case of error. However, it returns NULL when memremap() failed. This causes the caller, such as the pmem driver, to proceed and oops later. Change devm_memremap() to return ERR_PTR(-ENXIO) when memremap() failed. Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: <stable@vger.kernel.org> Reviewed-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 21 2月, 2016 1 次提交
-
-
由 Simon Guinot 提交于
In __request_region, if a conflict with a BUSY and MUXED resource is detected, then the caller goes to sleep and waits for the resource to be released. A pointer on the conflicting resource is kept. At wake-up this pointer is used as a parent to retry to request the region. A first problem is that this pointer might well be invalid (if for example the conflicting resource have already been freed). Another problem is that the next call to __request_region() fails to detect a remaining conflict. The previously conflicting resource is passed as a parameter and __request_region() will look for a conflict among the children of this resource and not at the resource itself. It is likely to succeed anyway, even if there is still a conflict. Instead, the parent of the conflicting resource should be passed to __request_region(). As a fix, this patch doesn't update the parent resource pointer in the case we have to wait for a muxed region right after. Reported-and-tested-by: NVincent Pelletier <plr.vincent@gmail.com> Signed-off-by: NSimon Guinot <simon.guinot@sequanux.org> Tested-by: NVincent Donnefort <vdonnefort@gmail.com> Cc: stable@kernel.org Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 2月, 2016 1 次提交
-
-
由 Yang Shi 提交于
When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled", the below KASAN warning is triggered: BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8 Read of size 8 by task ksoftirqd/4/29 page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping: (null) index:0x0 flags: 0x0() page dumped because: kasan: bad access detected CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129 Hardware name: Freescale Layerscape 2085a RDB Board (DT) Call trace: [<ffffffc000091300>] dump_backtrace+0x0/0x3a0 [<ffffffc0000916c4>] show_stack+0x24/0x30 [<ffffffc0009bbd78>] dump_stack+0xd8/0x168 [<ffffffc000420bb0>] kasan_report_error+0x6a0/0x920 [<ffffffc000421688>] kasan_report+0x70/0xb8 [<ffffffc00041f7f0>] __asan_load8+0x60/0x78 [<ffffffc0002e05c4>] check_stack+0x344/0x848 [<ffffffc0002e0c8c>] stack_trace_call+0x1c4/0x370 [<ffffffc0002af558>] ftrace_ops_no_ops+0x2c0/0x590 [<ffffffc00009f25c>] ftrace_graph_call+0x0/0x14 [<ffffffc0000881bc>] fpsimd_thread_switch+0x24/0x1e8 [<ffffffc000089864>] __switch_to+0x34/0x218 [<ffffffc0011e089c>] __schedule+0x3ac/0x15b8 [<ffffffc0011e1f6c>] schedule+0x5c/0x178 [<ffffffc0001632a8>] smpboot_thread_fn+0x350/0x960 [<ffffffc00015b518>] kthread+0x1d8/0x2b0 [<ffffffc0000874d0>] ret_from_fork+0x10/0x40 Memory state around the buggy address: ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4 ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00 >ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00 ^ ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 The stacker tracer traverses the whole kernel stack when saving the max stack trace. It may touch the stack red zones to cause the warning. So, just disable the instrumentation to silence the warning. Link: http://lkml.kernel.org/r/1455309960-18930-1-git-send-email-yang.shi@linaro.orgSigned-off-by: NYang Shi <yang.shi@linaro.org> Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
-
- 19 2月, 2016 1 次提交
-
-
由 Toshi Kani 提交于
The pmem driver calls devm_memremap() to map a persistent memory range. When the pmem driver is unloaded, this memremap'd range is not released so the kernel will leak a vma. Fix devm_memremap_release() to handle a given memremap'd address properly. Signed-off-by: NToshi Kani <toshi.kani@hpe.com> Acked-by: NDan Williams <dan.j.williams@intel.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 18 2月, 2016 1 次提交
-
-
由 Jessica Yu 提交于
Remove the ftrace module notifier in favor of directly calling ftrace_module_enable() and ftrace_release_mod() in the module loader. Hard-coding the function calls directly in the module loader removes dependence on the module notifier call chain and provides better visibility and control over what gets called when, which is important to kernel utilities such as livepatch. This fixes a notifier ordering issue in which the ftrace module notifier (and hence ftrace_module_enable()) for coming modules was being called after klp_module_notify(), which caused livepatch modules to initialize incorrectly. This patch removes dependence on the module notifier call chain in favor of hard coding the corresponding function calls in the module loader. This ensures that ftrace and livepatch code get called in the correct order on patch module load and unload. Fixes: 5156dca3 ("ftrace: Fix the race between ftrace and insmod") Signed-off-by: NJessica Yu <jeyu@redhat.com> Reviewed-by: NSteven Rostedt <rostedt@goodmis.org> Reviewed-by: NPetr Mladek <pmladek@suse.cz> Acked-by: NRusty Russell <rusty@rustcorp.com.au> Reviewed-by: NJosh Poimboeuf <jpoimboe@redhat.com> Reviewed-by: NMiroslav Benes <mbenes@suse.cz> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 17 2月, 2016 4 次提交
-
-
由 Thomas Gleixner 提交于
If CPU_UP_PREPARE is called it is not guaranteed, that a previously allocated and assigned hash has been freed already, but perf_event_init_cpu() unconditionally allocates and assignes a new hash if the swhash is referenced. By overwriting the pointer the existing hash is not longer accessible. Verify that there is no hash assigned on this cpu before allocating and assigning a new one. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160209201007.843269966@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
If CPU_DOWN_PREPARE fails the perf hotplug notifier is called for CPU_DOWN_FAILED and calls perf_event_init_cpu(), which checks whether the swhash is referenced. If yes it allocates a new hash and stores the pointer in the per cpu data structure. But at this point the cpu is still online, so there must be a valid hash already. By overwriting the pointer the existing hash is not longer accessible. Remove the CPU_DOWN_FAILED state, as there is nothing to (re)allocate. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160209201007.763417379@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Thomas Gleixner 提交于
If CPU_UP_PREPARE fails the perf hotplug code calls perf_event_exit_cpu(), which is a pointless exercise. The cpu is not online, so the smp function calls return -ENXIO. So the result is a list walk to call noops. Remove it. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Vince Weaver <vincent.weaver@maine.edu> Link: http://lkml.kernel.org/r/20160209201007.682184765@linutronix.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Steven Rostedt 提交于
It's "too much" not "to much". Signed-off-by: NSteven Rostedt <rostedt@goodmis.org> Acked-by: NJuri Lelli <juri.lelli@arm.com> Cc: Jiri Kosina <trivial@kernel.org> Cc: Juri Lelli <juri.lelli@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/20160210120422.4ca77e68@gandalf.local.homeSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 12 2月, 2016 2 次提交
-
-
由 Dan Williams 提交于
The pfn_t type uses an unsigned long to store a pfn + flags value. On a 64-bit platform the upper 12 bits of an unsigned long are never used for storing the value of a pfn. However, this is not true on highmem platforms, all 32-bits of a pfn value are used to address a 44-bit physical address space. A pfn_t needs to store a 64-bit value. Link: https://bugzilla.kernel.org/show_bug.cgi?id=112211 Fixes: 01c8f1c4 ("mm, dax, gpu: convert vm_insert_mixed to pfn_t") Signed-off-by: NDan Williams <dan.j.williams@intel.com> Reported-by: NStuart Foster <smf.linux@ntlworld.com> Reported-by: NJulian Margetson <runaway@candw.ms> Tested-by: NJulian Margetson <runaway@candw.ms> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Andrew Morton 提交于
Mike said: : CONFIG_UBSAN_ALIGNMENT breaks x86-64 kernel with lockdep enabled, i. e : kernel with CONFIG_UBSAN_ALIGNMENT fails to load without even any error : message. : : The problem is that ubsan callbacks use spinlocks and might be called : before lockdep is initialized. Particularly this line in the : reserve_ebda_region function causes problem: : : lowmem = *(unsigned short *)__va(BIOS_LOWMEM_KILOBYTES); : : If i put lockdep_init() before reserve_ebda_region call in : x86_64_start_reservations kernel loads well. Fix this ordering issue permanently: change lockdep so that it uses hlists for the hash tables. Unlike a list_head, an hlist_head is in its initialized state when it is all-zeroes, so lockdep is ready for operation immediately upon boot - lockdep_init() need not have run. The patch will also save some memory. lockdep_init() and lockdep_initialized can be done away with now - a 4.6 patch has been prepared to do this. Reported-by: NMike Krinkin <krinkin.m.u@gmail.com> Suggested-by: NMike Krinkin <krinkin.m.u@gmail.com> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 11 2月, 2016 2 次提交
-
-
由 Daniel Borkmann 提交于
When ctx access is used, the kernel often needs to expand/rewrite instructions, so after that patching, branch offsets have to be adjusted for both forward and backward jumps in the new eBPF program, but for backward jumps it fails to account the delta. Meaning, for example, if the expansion happens exactly on the insn that sits at the jump target, it doesn't fix up the back jump offset. Analysis on what the check in adjust_branches() is currently doing: /* adjust offset of jmps if necessary */ if (i < pos && i + insn->off + 1 > pos) insn->off += delta; else if (i > pos && i + insn->off + 1 < pos) insn->off -= delta; First condition (forward jumps): Before: After: insns[0] insns[0] insns[1] <--- i/insn insns[1] <--- i/insn insns[2] <--- pos insns[P] <--- pos insns[3] insns[P] `------| delta insns[4] <--- target_X insns[P] `-----| insns[5] insns[3] insns[4] <--- target_X insns[5] First case is if we cross pos-boundary and the jump instruction was before pos. This is handeled correctly. I.e. if i == pos, then this would mean our jump that we currently check was the patchlet itself that we just injected. Since such patchlets are self-contained and have no awareness of any insns before or after the patched one, the delta is correctly not adjusted. Also, for the second condition in case of i + insn->off + 1 == pos, means we jump to that newly patched instruction, so no offset adjustment are needed. That part is correct. Second condition (backward jumps): Before: After: insns[0] insns[0] insns[1] <--- target_X insns[1] <--- target_X insns[2] <--- pos <-- target_Y insns[P] <--- pos <-- target_Y insns[3] insns[P] `------| delta insns[4] <--- i/insn insns[P] `-----| insns[5] insns[3] insns[4] <--- i/insn insns[5] Second interesting case is where we cross pos-boundary and the jump instruction was after pos. Backward jump with i == pos would be impossible and pose a bug somewhere in the patchlet, so the first condition checking i > pos is okay only by itself. However, i + insn->off + 1 < pos does not always work as intended to trigger the adjustment. It works when jump targets would be far off where the delta wouldn't matter. But, for example, where the fixed insn->off before pointed to pos (target_Y), it now points to pos + delta, so that additional room needs to be taken into account for the check. This means that i) both tests here need to be adjusted into pos + delta, and ii) for the second condition, the test needs to be <= as pos itself can be a target in the backjump, too. Fixes: 9bac3d6d ("bpf: allow extended BPF programs access skb fields") Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Tejun Heo 提交于
When looking up the pool_workqueue to use for an unbound workqueue, workqueue assumes that the target CPU is always bound to a valid NUMA node. However, currently, when a CPU goes offline, the mapping is destroyed and cpu_to_node() returns NUMA_NO_NODE. This has always been broken but hasn't triggered often enough before 874bbfe6 ("workqueue: make sure delayed work run in local cpu"). After the commit, workqueue forcifully assigns the local CPU for delayed work items without explicit target CPU to fix a different issue. This widens the window where CPU can go offline while a delayed work item is pending causing delayed work items dispatched with target CPU set to an already offlined CPU. The resulting NUMA_NO_NODE mapping makes workqueue try to queue the work item on a NULL pool_workqueue and thus crash. While 874bbfe6 has been reverted for a different reason making the bug less visible again, it can still happen. Fix it by mapping NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node(). This is a temporary workaround. The long term solution is keeping CPU -> NODE mapping stable across CPU off/online cycles which is being worked on. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NMike Galbraith <umgwanakikbuti@gmail.com> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Len Brown <len.brown@intel.com> Cc: stable@vger.kernel.org Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
-
- 10 2月, 2016 3 次提交
-
-
由 Tejun Heo 提交于
Workqueue used to guarantee local execution for work items queued without explicit target CPU. The guarantee is gone now which can break some usages in subtle ways. To flush out those cases, this patch implements a debug feature which forces round-robin CPU selection for all such work items. The debug feature defaults to off and can be enabled with a kernel parameter. The default can be flipped with a debug config option. If you hit this commit during bisection, please refer to 041bd12e ("Revert "workqueue: make sure delayed work run in local cpu"") for more information and ping me. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Mike Galbraith 提交于
WORK_CPU_UNBOUND work items queued to a bound workqueue always run locally. This is a good thing normally, but not when the user has asked us to keep unbound work away from certain CPUs. Round robin these to wq_unbound_cpumask CPUs instead, as perturbation avoidance trumps performance. tj: Cosmetic and comment changes. WARN_ON_ONCE() dropped from empty (wq_unbound_cpumask AND cpu_online_mask). If we want that, it should be done when config changes. Signed-off-by: NMike Galbraith <umgwanakikbuti@gmail.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
This reverts commit 874bbfe6. Workqueue used to implicity guarantee that work items queued without explicit CPU specified are put on the local CPU. Recent changes in timer broke the guarantee and led to vmstat breakage which was fixed by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU we need it to run on"). vmstat is the most likely to expose the issue and it's quite possible that there are other similar problems which are a lot more difficult to trigger. As a preventive measure, 874bbfe6 ("workqueue: make sure delayed work run in local cpu") was applied to restore the local CPU guarnatee. Unfortunately, the change exposed a bug in timer code which got fixed by 22b886dd ("timers: Use proper base migration in add_timer_on()"). Due to code restructuring, the commit couldn't be backported beyond certain point and stable kernels which only had 874bbfe6 started crashing. The local CPU guarantee was accidental more than anything else and we want to get rid of it anyway. As, with the vmstat case fixed, 874bbfe6 is causing more problems than it's fixing, it has been decided to take the chance and officially break the guarantee by reverting the commit. A debug feature will be added to force foreign CPU assignment to expose cases relying on the guarantee and fixes for the individual cases will be backported to stable as necessary. Signed-off-by: NTejun Heo <tj@kernel.org> Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu") Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz Cc: stable@vger.kernel.org Cc: Mike Galbraith <umgwanakikbuti@gmail.com> Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jan Kara <jack@suse.cz> Cc: Shaohua Li <shli@fb.com> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Ben Hutchings <ben@decadent.org.uk> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Daniel Bilik <daniel.bilik@neosystem.cz> Cc: Jiri Slaby <jslaby@suse.cz> Cc: Michal Hocko <mhocko@kernel.org>
-
- 09 2月, 2016 1 次提交
-
-
由 Dmitry Vyukov 提交于
check_prev_add() caches saved stack trace in static trace variable to avoid duplicate save_trace() calls in dependencies involving trylocks. But that caching logic contains a bug. We may not save trace on first iteration due to early return from check_prev_add(). Then on the second iteration when we actually need the trace we don't save it because we think that we've already saved it. Let check_prev_add() itself control when stack is saved. There is another bug. Trace variable is protected by graph lock. But we can temporary release graph lock during printing. Fix this by invalidating cached stack trace when we release graph lock. Signed-off-by: NDmitry Vyukov <dvyukov@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: glider@google.com Cc: kcc@google.com Cc: peter@hurleysoftware.com Cc: sasha.levin@oracle.com Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
- 06 2月, 2016 1 次提交
-
-
由 Sasha Levin 提交于
A random wakeup can get us out of sigsuspend() without TIF_SIGPENDING being set. Avoid that by making sure we were signaled, like sys_pause() does. Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Acked-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 03 2月, 2016 3 次提交
-
-
由 Rusty Russell 提交于
For CONFIG_KALLSYMS, we keep two symbol tables and two string tables. There's one full copy, marked SHF_ALLOC and laid out at the end of the module's init section. There's also a cut-down version that only contains core symbols and strings, and lives in the module's core section. After module init (and before we free the module memory), we switch the mod->symtab, mod->num_symtab and mod->strtab to point to the core versions. We do this under the module_mutex. However, kallsyms doesn't take the module_mutex: it uses preempt_disable() and rcu tricks to walk through the modules, because it's used in the oops path. It's also used in /proc/kallsyms. There's nothing atomic about the change of these variables, so we can get the old (larger!) num_symtab and the new symtab pointer; in fact this is what I saw when trying to reproduce. By grouping these variables together, we can use a carefully-dereferenced pointer to ensure we always get one or the other (the free of the module init section is already done in an RCU callback, so that's safe). We allocate the init one at the end of the module init section, and keep the core one inside the struct module itself (it could also have been allocated at the end of the module core, but that's probably overkill). Reported-by: NWeilong Chen <chenweilong@huawei.com> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541 Cc: stable@kernel.org Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Rusty Russell 提交于
This trivial wrapper adds clarity and makes the following patch smaller. Cc: stable@kernel.org Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
由 Luis R. Rodriguez 提交于
Commit f2411da7 ("driver-core: add driver module asynchronous probe support") added async probe support, in two forms: * in-kernel driver specification annotation * generic async_probe module parameter (modprobe foo async_probe) To support the generic kernel parameter parse_args() was extended via commit ecc86170 ("module: add extra argument for parse_params() callback") however commit failed to f2411da7 failed to add the required argument. This causes a crash then whenever async_probe generic module parameter is used. This was overlooked when the form in which in-kernel async probe support was reworked a bit... Fix this as originally intended. Cc: Hannes Reinecke <hare@suse.de> Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com> Cc: stable@vger.kernel.org (4.2+) Signed-off-by: NLuis R. Rodriguez <mcgrof@suse.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> [minimized]
-
- 01 2月, 2016 1 次提交
-
-
由 Dan Williams 提交于
A dma_addr_t is potentially smaller than a phys_addr_t on some archs. Don't truncate the address when doing the pfn conversion. Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Reported-by: NMatthew Wilcox <willy@linux.intel.com> [willy: fix pfn_t_to_phys as well] Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 30 1月, 2016 3 次提交
-
-
由 Zhen Lei 提交于
Accidentally discovered this typo when I studied this module. Signed-off-by: NZhen Lei <thunder.leizhen@huawei.com> Cc: Hanjun Guo <guohanjun@huawei.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tianhong Ding <dingtianhong@huawei.com> Cc: Xinwei Hu <huxinwei@huawei.com> Cc: Zefan Li <lizefan@huawei.com> Link: http://lkml.kernel.org/r/1454119457-11272-1-git-send-email-thunder.leizhen@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Dan Williams 提交于
to_vmem_altmap() needs to return valid results until arch_remove_memory() completes. It also needs to be valid for any pfn in a section regardless of whether that pfn maps to data. This escape was a result of a bug in the unit test. The signature of this bug is that free_pagetable() fails to retrieve a vmem_altmap and goes off into the weeds: BUG: unable to handle kernel NULL pointer dereference at (null) IP: [<ffffffff811d2629>] get_pfnblock_flags_mask+0x49/0x60 [..] Call Trace: [<ffffffff811d3477>] free_hot_cold_page+0x97/0x1d0 [<ffffffff811d367a>] __free_pages+0x2a/0x40 [<ffffffff8191e669>] free_pagetable+0x8c/0xd4 [<ffffffff8191ef4e>] remove_pagetable+0x37a/0x808 [<ffffffff8191b210>] vmemmap_free+0x10/0x20 Fixes: 4b94ffdc ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()") Cc: Andrew Morton <akpm@linux-foundation.org> Reported-by: NJeff Moyer <jmoyer@redhat.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
由 Tejun Heo 提交于
fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to flush a !WQ_MEM_RECLAIM workquee. This assumes that workqueues marked with WQ_MEM_RECLAIM sit in memory reclaim path and making it depend on something which may need more memory to make forward progress can lead to deadlocks. Unfortunately, workqueues created with the legacy create*_workqueue() interface always have WQ_MEM_RECLAIM regardless of whether they are depended upon memory reclaim or not. These spurious WQ_MEM_RECLAIM markings cause spurious triggering of the flush dependency checks. WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144() workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu ... Workqueue: deferwq deferred_probe_work_func [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14) [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4) [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0) [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40) [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144) [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c) [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180) [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10) [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338) [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac) [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8) [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278) [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c) [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec) [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc) [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8) [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148) [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114) [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8) [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8) [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac) [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0) [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94) [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114) [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c) [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98) [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8) [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c) [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4) [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c) Fix it by marking workqueues created via create*_workqueue() with __WQ_LEGACY and disabling flush dependency checks on them. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-and-tested-by: NThierry Reding <thierry.reding@gmail.com> Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com Fixes: fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
-