1. 09 March 2016, 1 commit
    • x86/ACPI/PCI: Recognize that Interrupt Line 255 means "not connected" · e237a551
      Committed by Chen Fan
      Per the x86-specific footnote to PCI spec r3.0, sec 6.2.4, the value 255 in
      the Interrupt Line register means "unknown" or "no connection."
      Previously, when we couldn't derive an IRQ from the _PRT, we fell back to
      using the value from Interrupt Line as an IRQ.  It's questionable whether
      we should do that at all, but the spec clearly suggests we shouldn't do it
      for the value 255 on x86.
      
      Calling request_irq() with IRQ 255 may succeed, but the driver won't
      receive any interrupts.  Or, if IRQ 255 is shared with another device, it
      may succeed, and the driver's ISR will be called at random times when the
      *other* device interrupts.  Or it may fail if another device is using IRQ
      255 with incompatible flags.  What we *want* is for request_irq() to fail
      predictably so the driver can fall back to polling.
      
      On x86, assume 255 in the Interrupt Line means the INTx line is not
      connected.  In that case, set dev->irq to IRQ_NOTCONNECTED so request_irq()
      will fail gracefully with -ENOTCONN.
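
      A minimal sketch of that x86 rule in C (placement in the ACPI PCI IRQ
      code and the surrounding function are assumptions, not patch text):

        #ifdef CONFIG_X86
        	/* Interrupt Line 255: the INTx line is not connected */
        	if (dev->irq == 0xff) {
        		dev->irq = IRQ_NOTCONNECTED;	/* request_irq() -> -ENOTCONN */
        		return 0;
        	}
        #endif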
      
      We found this problem on a system where Secure Boot firmware assigned
      Interrupt Line 255 to an i801_smbus device and another device was already
      using MSI-X IRQ 255.  This was in v3.10, where i801_probe() fails if
      request_irq() fails:
      
        i801_smbus 0000:00:1f.3: enabling device (0140 -> 0143)
        i801_smbus 0000:00:1f.3: can't derive routing for PCI INT C
        i801_smbus 0000:00:1f.3: PCI INT C: no GSI
        genirq: Flags mismatch irq 255. 00000080 (i801_smbus) vs. 00000000 (megasa)
        CPU: 0 PID: 2487 Comm: kworker/0:1 Not tainted 3.10.0-229.el7.x86_64 #1
        Hardware name: FUJITSU PRIMEQUEST 2800E2/D3736, BIOS PRIMEQUEST 2000 Serie5
        Call Trace:
          dump_stack+0x19/0x1b
          __setup_irq+0x54a/0x570
          request_threaded_irq+0xcc/0x170
          i801_probe+0x32f/0x508 [i2c_i801]
          local_pci_probe+0x45/0xa0
        i801_smbus 0000:00:1f.3: Failed to allocate irq 255: -16
        i801_smbus: probe of 0000:00:1f.3 failed with error -16
      
      After aeb8a3d1 ("i2c: i801: Check if interrupts are disabled"),
      i801_probe() will fall back to polling if request_irq() fails.  But we
      still need this patch because request_irq() may succeed or fail depending
      on other devices in the system.  If request_irq() fails, i801_smbus will
      work by falling back to polling, but if it succeeds, i801_smbus won't work
      because it expects interrupts that it may not receive.
      Signed-off-by: Chen Fan <chen.fan.fnst@cn.fujitsu.com>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      e237a551
  2. 04 March 2016, 1 commit
    • tracing: Do not have 'comm' filter override event 'comm' field · e57cbaf0
      Committed by Steven Rostedt (Red Hat)
      Commit 9f616680 "tracing: Allow triggers to filter for CPU ids and
      process names" added a 'comm' filter that filters events based on the
      current task's struct 'comm'. But this now hides the ability to filter
      events that have a 'comm' field of their own. For example, the
      sched_migrate_task trace event has a 'comm' field holding the name of
      the task to be migrated.
      
       echo 'comm == "bash"' > events/sched_migrate_task/filter
      
      will now filter all sched_migrate_task events for tasks named "bash" that
      migrate other tasks (in interrupt context), instead of showing when "bash"
      itself gets migrated.
      
      This fix requires a couple of changes.
      
      1) Change the look up order for filter predicates to look at the events
         fields before looking at the generic filters.
      
      2) Instead of basing the filter function off of the "comm" name, have the
         generic "comm" filter have its own filter_type (FILTER_COMM). Test
         against the type instead of the name to assign the filter function.
      
      3) Add a new "COMM" filter that works just like "comm" but will filter based
         on the current task, even if the trace event contains a "comm" field.
      
      Do the same for "cpu" field, adding a FILTER_CPU and a filter "CPU".
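
      With the new generic filters, filtering on the current task keeps
      working via the upper-case names; usage mirrors the echo above (a
      sketch, not text from the patch):

       echo 'COMM == "bash"' > events/sched_migrate_task/filter
       echo 'CPU == 0' > events/sched_migrate_task/filter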
      
      Cc: stable@vger.kernel.org # v4.3+
      Fixes: 9f616680 "tracing: Allow triggers to filter for CPU ids and process names"
      Reported-by: Matt Fleming <matt@codeblueprint.co.uk>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      e57cbaf0
  3. 25 February 2016, 12 commits
    • perf: Robustify task_function_call() · 0da4cf3e
      Committed by Peter Zijlstra
      Since there is no serialization between task_function_call() doing
      task_curr() and the other CPU doing context switches, we could end
      up not sending an IPI even if we had to.
      
      And I'm not sure I still buy my own argument we're OK.
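
      A hedged sketch of the robust shape (retry on -EAGAIN reported by the
      remote function instead of trusting a racy task_curr() check up front):

      	do {
      		ret = smp_call_function_single(task_cpu(p), remote_function,
      					       &data, 1);
      		if (!ret)
      			ret = data.ret;	/* -EAGAIN if p was not current */
      	} while (ret == -EAGAIN);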
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174948.340031200@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0da4cf3e
    • perf: Fix scaling vs. perf_install_in_context() · a096309b
      Committed by Peter Zijlstra
      Completely reworks perf_install_in_context() (again!) in order to
      ensure that there will be no ctx time hole between add_event_to_ctx()
      and any potential ctx_sched_in().
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174948.279399438@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a096309b
    • perf: Fix scaling vs. perf_event_enable() · bd2afa49
      Committed by Peter Zijlstra
      Similar to the perf_enable_on_exec(), ensure that event timings are
      consistent across perf_event_enable().
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174948.218288698@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      bd2afa49
    • perf: Fix scaling vs. perf_event_enable_on_exec() · 7fce2509
      Committed by Peter Zijlstra
      The recent commit 3e349507 ("perf: Fix perf_enable_on_exec() event
      scheduling") introduced a problem by moving task_ctx_sched_out() from
      before __perf_event_mask_enable() to after it.
      
      The overlooked consequence of that change is that task_ctx_sched_out()
      would update the ctx time fields, and now __perf_event_mask_enable()
      uses stale time.
      
      In order to fix this, explicitly stop our context's time before
      enabling the event(s).
      Reported-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Fixes: 3e349507 ("perf: Fix perf_enable_on_exec() event scheduling")
      Link: http://lkml.kernel.org/r/20160224174948.159242158@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7fce2509
    • perf: Fix ctx time tracking by introducing EVENT_TIME · 3cbaa590
      Committed by Peter Zijlstra
      Currently any ctx_sched_in() call re-starts the ctx time tracking,
      which means that calls like:
      
      	ctx_sched_in(.event_type = EVENT_PINNED);
      	ctx_sched_in(.event_type = EVENT_FLEXIBLE);
      
      will have a hole in their ctx time tracking. This is likely harmless
      but can confuse things a little. By adding EVENT_TIME, we can have the
      first ctx_sched_in() (is_active: 0 -> !0) start the time and any
      further ctx_sched_in() will leave the timestamps alone.
      
      Secondly, this allows for an early disable like:
      
      	ctx_sched_out(.event_type = EVENT_TIME);
      
      which would update the ctx time (if the ctx is active) and any further
      calls to ctx_sched_out() would not further modify the ctx time.
      
      For ctx_sched_in() any 0 -> !0 transition will automatically include
      EVENT_TIME.
      
      For ctx_sched_out(), any transition that clears EVENT_ALL will
      automatically clear EVENT_TIME.
      
      These two rules ensure that under normal circumstances we need not
      bother with EVENT_TIME and get natural ctx time behaviour.
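
      A condensed sketch of those two rules in C (shape only, not the
      literal patch):

      	/* ctx_sched_in(): a 0 -> !0 is_active transition starts the clock */
      	if (!ctx->is_active)
      		event_type |= EVENT_TIME;

      	/* ctx_sched_out(): clearing the last of EVENT_ALL also stops it */
      	ctx->is_active &= ~event_type;
      	if (!(ctx->is_active & EVENT_ALL))
      		ctx->is_active &= ~EVENT_TIME;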
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174948.100446561@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      3cbaa590
    • perf: Cure event->pending_disable race · 28a967c3
      Committed by Peter Zijlstra
      Because event_sched_out() checks event->pending_disable _before_
      actually disabling the event, it can happen that the event fires after
      it checks but before it gets disabled.
      
      This would leave event->pending_disable set and the queued irq_work
      will try and process it.
      
      However, if the event trigger was during schedule(), the event might
      have been de-scheduled by the time the irq_work runs, and
      perf_event_disable_local() will fail.
      
      Fix this by checking event->pending_disable _after_ we call
      event->pmu->del(). This depends on the latter being a compiler
      barrier, such that the compiler does not lift the load and re-create
      the problem.
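
      A hedged sketch of the reordering in event_sched_out() (shape only):

      	event->pmu->del(event, 0);	/* the call is a compiler barrier */
      	event->oncpu = -1;

      	if (event->pending_disable) {	/* checked _after_ ->del() */
      		event->pending_disable = 0;
      		state = PERF_EVENT_STATE_OFF;
      	}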
      Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174948.040469884@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      28a967c3
    • perf: Fix race between event install and jump_labels · 9107c89e
      Committed by Peter Zijlstra
      perf_install_in_context() relies upon the context switch hooks to have
      scheduled in events when the IPI misses its target -- after all, if
      the task has moved from the CPU (or wasn't running at all), it will
      have to context switch to run elsewhere.
      
      This however doesn't appear to be happening.
      
      It is possible for the IPI to not happen (task wasn't running) only to
      later observe the task running with an inactive context.
      
      The only possible explanation is that the context switch hooks are not
      called. Therefore put in a sync_sched() after toggling the jump_label
      to guarantee all CPUs will have them enabled before we install an
      event.
      
      A simple if (0->1) sync_sched() will not in fact work, because any
      further increment can race and complete before the sync_sched().
      Therefore we must jump through some hoops.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174947.980211985@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9107c89e
    • perf: Fix cloning · a69b0ca4
      Committed by Peter Zijlstra
      Alexander reported that when the 'original' context gets destroyed, no
      new clones happen.
      
      This can happen irrespective of the ctx switch optimization; any task
      can die, even the parent, and we want to continue monitoring the task
      hierarchy until we either close the event or no tasks are left in the
      hierarchy.
      
      perf_event_init_context() will attempt to pin the 'parent' context
      during clone(). At that point current is the parent, and since current
      cannot have exited while executing clone(), its context cannot have
      passed through perf_event_exit_task_context(). Therefore
      perf_pin_task_context() cannot observe ctx->task == TASK_TOMBSTONE.
      
      However, since inherit_event() does:
      
      	if (parent_event->parent)
      		parent_event = parent_event->parent;
      
      it looks at the 'original' event when it calls is_orphaned_event().
      This can return true if the context that contains this event has
      passed through perf_event_exit_task_context(), and thus we'll fail to
      clone the perf context.
      
      Fix this by adding a new state: STATE_DEAD, which is set by
      perf_release() to indicate that the filedesc (or kernel reference) is
      dead and there are no observers for our data left.
      
      Only for STATE_DEAD will is_orphaned_event() be true and inhibit
      cloning.
      
      STATE_EXIT is otherwise preserved such that is_event_hup() remains
      functional and will report when the observed task hierarchy becomes
      empty.
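
      A hedged sketch of the resulting check (assumed shape):

      	/* only a dead event (no filedesc/kernel reference) blocks cloning */
      	static bool is_orphaned_event(struct perf_event *event)
      	{
      		return event->state == PERF_EVENT_STATE_DEAD;
      	}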
      Reported-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Fixes: c6e5b732 ("perf: Synchronously clean up child events")
      Link: http://lkml.kernel.org/r/20160224174947.919845295@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a69b0ca4
    • perf: Only update context time when active · 6f932e5b
      Committed by Peter Zijlstra
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174947.860690919@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6f932e5b
    • perf: Allow perf_release() with !event->ctx · a4f4bb6d
      Committed by Peter Zijlstra
      In the err_file: fput(event_file) case, the event will not yet have
      been attached to a context. However perf_release() does assume it has
      been. Cure this.
      Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174947.793996260@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a4f4bb6d
    • perf: Do not double free · 13005627
      Committed by Peter Zijlstra
      In case of: err_file: fput(event_file), we'll end up calling
      perf_release() which in turn will free the event.
      
      Do not then free the event _again_.
      Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174947.697350349@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      13005627
    • perf: Close install vs. exit race · 84c4e620
      Committed by Peter Zijlstra
      Consider the following scenario:
      
        CPU0					CPU1
      
        ctx = find_get_ctx();
      					perf_event_exit_task_context()
        mutex_lock(&ctx->mutex);
        perf_install_in_context(ctx, ...);
          /* NO-OP */
        mutex_unlock(&ctx->mutex);
      
        ...
      
        perf_release()
          WARN_ON_ONCE(event->state != STATE_EXIT);
      
      Since the event doesn't pass through perf_remove_from_context()
      (perf_install_in_context() NO-OPs because the ctx is dead), and
      perf_event_exit_task_context() will not observe the event because it's
      not attached yet, the event->state will not be set.
      
      Solve this by revalidating ctx->task after we acquire ctx->mutex and
      failing the event creation as a whole.
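
      A hedged sketch of the revalidation (the error label is an assumption):

      	mutex_lock(&ctx->mutex);
      	if (ctx->task == TASK_TOMBSTONE) {	/* task exited under us */
      		err = -ESRCH;
      		goto err_locked;
      	}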
      Tested-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: dvyukov@google.com
      Cc: eranian@google.com
      Cc: oleg@redhat.com
      Cc: panand@redhat.com
      Cc: sasha.levin@oracle.com
      Cc: vince@deater.net
      Link: http://lkml.kernel.org/r/20160224174947.626853419@infradead.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      84c4e620
  4. 24 February 2016, 2 commits
    • tracing: Fix showing function event in available_events · d045437a
      Committed by Steven Rostedt (Red Hat)
      The ftrace:function event is only displayed for parsing the function tracer
      data. It is not used to enable function tracing, and does not include an
      "enable" file in its event directory.
      
      Originally, this event was kept separate from other events because it
      did not have a ->reg parameter. But perf added a "reg" parameter for
      its use, which caused issues because it made the event available in
      places it was not compatible with.
      
      Commit 9b63776f "tracing: Do not enable function event with enable"
      added a TRACE_EVENT_FL_IGNORE_ENABLE flag that prevented the function
      event from being enabled by normal trace events. But this commit missed
      keeping the function event from being displayed in the "available_events"
      file, which is used to show what events can be enabled by set_event.
      
      One documented way to enable all events is to:
      
       cat available_events > set_event
      
      But because the function event is displayed in available_events, this
      now causes an "Invalid argument" error:
      
       cat: write error: Invalid argument
      Reported-by: Chunyu Hu <chuhu@redhat.com>
      Fixes: 9b63776f "tracing: Do not enable function event with enable"
      Cc: stable@vger.kernel.org # 3.4+
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      d045437a
    • devm_memremap: Fix error value when memremap failed · 93f834df
      Committed by Toshi Kani
      devm_memremap() returns an ERR_PTR() value in case of error.
      However, it returns NULL when memremap() fails.  This causes
      callers, such as the pmem driver, to proceed and oops later.
      
      Change devm_memremap() to return ERR_PTR(-ENXIO) when memremap()
      failed.
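
      A sketch of the corrected error path in devm_memremap():

      	addr = memremap(offset, size, flags);
      	if (!addr) {
      		devres_free(ptr);
      		return ERR_PTR(-ENXIO);	/* was: return NULL */
      	}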
      Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: <stable@vger.kernel.org>
      Reviewed-by: Ross Zwisler <ross.zwisler@linux.intel.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      93f834df
  5. 21 February 2016, 1 commit
    • kernel/resource.c: fix muxed resource handling in __request_region() · 59ceeaaf
      Committed by Simon Guinot
      In __request_region(), if a conflict with a BUSY and MUXED resource is
      detected, the caller goes to sleep and waits for the resource to be
      released.  A pointer to the conflicting resource is kept.  At wake-up
      this pointer is used as a parent to retry requesting the region.

      A first problem is that this pointer might well be invalid (if, for
      example, the conflicting resource has already been freed).  Another
      problem is that the next call to __request_region() fails to detect a
      remaining conflict.  The previously conflicting resource is passed as a
      parameter and __request_region() will look for a conflict among the
      children of this resource and not at the resource itself.  It is likely
      to succeed anyway, even if there is still a conflict.
      
      Instead, the parent of the conflicting resource should be passed to
      __request_region().
      
      As a fix, this patch doesn't update the parent resource pointer in the
      case where we have to wait for a muxed region right afterwards.
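
      A hedged sketch of the corrected loop shape in __request_region():

      	if (conflict != parent) {
      		/* descend only into non-busy conflicts */
      		if (!(conflict->flags & IORESOURCE_BUSY)) {
      			parent = conflict;
      			continue;
      		}
      	}
      	if (conflict->flags & flags & IORESOURCE_MUXED) {
      		/* sleep until released, then retry with unchanged parent */
      	}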
      Reported-and-tested-by: Vincent Pelletier <plr.vincent@gmail.com>
      Signed-off-by: Simon Guinot <simon.guinot@sequanux.org>
      Tested-by: Vincent Donnefort <vdonnefort@gmail.com>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      59ceeaaf
  6. 20 February 2016, 1 commit
    • tracing, kasan: Silence Kasan warning in check_stack of stack_tracer · 6e22c836
      Committed by Yang Shi
      When enabling stack trace via "echo 1 > /proc/sys/kernel/stack_tracer_enabled",
      the below KASAN warning is triggered:
      
      BUG: KASAN: stack-out-of-bounds in check_stack+0x344/0x848 at addr ffffffc0689ebab8
      Read of size 8 by task ksoftirqd/4/29
      page:ffffffbdc3a27ac0 count:0 mapcount:0 mapping:          (null) index:0x0
      flags: 0x0()
      page dumped because: kasan: bad access detected
      CPU: 4 PID: 29 Comm: ksoftirqd/4 Not tainted 4.5.0-rc1 #129
      Hardware name: Freescale Layerscape 2085a RDB Board (DT)
      Call trace:
      [<ffffffc000091300>] dump_backtrace+0x0/0x3a0
      [<ffffffc0000916c4>] show_stack+0x24/0x30
      [<ffffffc0009bbd78>] dump_stack+0xd8/0x168
      [<ffffffc000420bb0>] kasan_report_error+0x6a0/0x920
      [<ffffffc000421688>] kasan_report+0x70/0xb8
      [<ffffffc00041f7f0>] __asan_load8+0x60/0x78
      [<ffffffc0002e05c4>] check_stack+0x344/0x848
      [<ffffffc0002e0c8c>] stack_trace_call+0x1c4/0x370
      [<ffffffc0002af558>] ftrace_ops_no_ops+0x2c0/0x590
      [<ffffffc00009f25c>] ftrace_graph_call+0x0/0x14
      [<ffffffc0000881bc>] fpsimd_thread_switch+0x24/0x1e8
      [<ffffffc000089864>] __switch_to+0x34/0x218
      [<ffffffc0011e089c>] __schedule+0x3ac/0x15b8
      [<ffffffc0011e1f6c>] schedule+0x5c/0x178
      [<ffffffc0001632a8>] smpboot_thread_fn+0x350/0x960
      [<ffffffc00015b518>] kthread+0x1d8/0x2b0
      [<ffffffc0000874d0>] ret_from_fork+0x10/0x40
      Memory state around the buggy address:
       ffffffc0689eb980: 00 00 00 00 00 00 00 00 f1 f1 f1 f1 00 f4 f4 f4
       ffffffc0689eba00: f3 f3 f3 f3 00 00 00 00 00 00 00 00 00 00 00 00
      >ffffffc0689eba80: 00 00 f1 f1 f1 f1 00 f4 f4 f4 f3 f3 f3 f3 00 00
                                              ^
       ffffffc0689ebb00: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
       ffffffc0689ebb80: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
      
      The stack tracer traverses the whole kernel stack when saving the max
      stack trace. It may touch the stack red zones, which triggers the
      warning. So, just disable the instrumentation to silence the warning.
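
      One way to do a single uninstrumented read is READ_ONCE_NOCHECK(); a
      hedged sketch of the idea inside check_stack():

      	/* compare stack words without tripping KASAN on red zones */
      	if (READ_ONCE_NOCHECK(*p) == stack_dump_trace[i]) {
      		...
      	}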
      
      Link: http://lkml.kernel.org/r/1455309960-18930-1-git-send-email-yang.shi@linaro.org
      Signed-off-by: Yang Shi <yang.shi@linaro.org>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6e22c836
  7. 19 February 2016, 1 commit
  8. 18 February 2016, 1 commit
  9. 17 February 2016, 4 commits
  10. 12 February 2016, 2 commits
  11. 11 February 2016, 2 commits
    • bpf: fix branch offset adjustment on backjumps after patching ctx expansion · a1b14d27
      Committed by Daniel Borkmann
      When ctx access is used, the kernel often needs to expand/rewrite
      instructions, so after that patching, branch offsets have to be
      adjusted for both forward and backward jumps in the new eBPF program,
      but for backward jumps it fails to account for the delta. Meaning, for
      example, if the expansion happens exactly on the insn that sits at
      the jump target, it doesn't fix up the back jump offset.
      
      Analysis on what the check in adjust_branches() is currently doing:
      
        /* adjust offset of jmps if necessary */
        if (i < pos && i + insn->off + 1 > pos)
          insn->off += delta;
        else if (i > pos && i + insn->off + 1 < pos)
          insn->off -= delta;
      
      First condition (forward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- i/insn            insns[1] <--- i/insn
        insns[2] <--- pos               insns[P] <--- pos
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- target_X          insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- target_X
                                        insns[5]
      
      First case is if we cross the pos-boundary and the jump instruction was
      before pos. This is handled correctly. I.e. if i == pos, then this
      would mean the jump we currently check was the patchlet itself that we
      just injected. Since such patchlets are self-contained and have no
      awareness of any insns before or after the patched one, the delta is
      correctly not adjusted. Also, i + insn->off + 1 == pos would mean we
      jump to that newly patched instruction, so no offset adjustment is
      needed. That part is correct.
      
      Second condition (backward jumps):
      
        Before:                         After:
      
        insns[0]                        insns[0]
        insns[1] <--- target_X          insns[1] <--- target_X
        insns[2] <--- pos <-- target_Y  insns[P] <--- pos <-- target_Y
        insns[3]                        insns[P]  `------| delta
        insns[4] <--- i/insn            insns[P]   `-----|
        insns[5]                        insns[3]
                                        insns[4] <--- i/insn
                                        insns[5]
      
      Second interesting case is where we cross the pos-boundary and the jump
      instruction was after pos. A backward jump with i == pos would be
      impossible and would indicate a bug somewhere in the patchlet, so the
      first test checking i > pos is okay by itself. However, i + insn->off +
      1 < pos does not always work as intended to trigger the adjustment. It
      works when jump targets are far off, where the delta wouldn't matter.
      But, for example, where the fixed insn->off before pointed to pos
      (target_Y), it now points to pos + delta, so that additional room needs
      to be taken into account for the check. This means that i) both tests
      here need to be adjusted to pos + delta, and ii) the second test needs
      to be <=, as pos itself can be a target in the backjump, too.
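
      Per that analysis, the corrected check would read (a sketch following
      the commit text):

        /* adjust offset of jmps if necessary */
        if (i < pos && i + insn->off + 1 > pos)
          insn->off += delta;
        else if (i > pos + delta && i + insn->off + 1 <= pos + delta)
          insn->off -= delta;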
      
      Fixes: 9bac3d6d ("bpf: allow extended BPF programs access skb fields")
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      a1b14d27
    • workqueue: handle NUMA_NO_NODE for unbound pool_workqueue lookup · d6e022f1
      Committed by Tejun Heo
      When looking up the pool_workqueue to use for an unbound workqueue,
      workqueue assumes that the target CPU is always bound to a valid NUMA
      node.  However, currently, when a CPU goes offline, the mapping is
      destroyed and cpu_to_node() returns NUMA_NO_NODE.
      
      This has always been broken but hadn't triggered often enough before
      874bbfe6 ("workqueue: make sure delayed work run in local cpu").
      After that commit, workqueue forcefully assigns the local CPU for
      delayed work items without an explicit target CPU to fix a different
      issue.  This widens the window in which a CPU can go offline while a
      delayed work item is pending, causing delayed work items to be
      dispatched with their target CPU set to an already offlined CPU.  The
      resulting NUMA_NO_NODE mapping makes workqueue try to queue the work
      item on a NULL pool_workqueue and thus crash.
      
      While 874bbfe6 has been reverted for a different reason making the
      bug less visible again, it can still happen.  Fix it by mapping
      NUMA_NO_NODE to the default pool_workqueue from unbound_pwq_by_node().
      This is a temporary workaround.  The long term solution is keeping CPU
      -> NODE mapping stable across CPU off/online cycles which is being
      worked on.
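
      A hedged sketch of the workaround in unbound_pwq_by_node():

      	if (unlikely(node == NUMA_NO_NODE))
      		return wq->dfl_pwq;	/* fall back to the default pwq */

      	return rcu_dereference_raw(wq->numa_pwq_tbl[node]);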
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Rafael J. Wysocki <rafael@kernel.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: stable@vger.kernel.org
      Link: http://lkml.kernel.org/g/1454424264.11183.46.camel@gmail.com
      Link: http://lkml.kernel.org/g/1453702100-2597-1-git-send-email-tangchen@cn.fujitsu.com
      d6e022f1
  12. 10 February 2016, 3 commits
    • workqueue: implement "workqueue.debug_force_rr_cpu" debug feature · f303fccb
      Committed by Tejun Heo
      Workqueue used to guarantee local execution for work items queued
      without explicit target CPU.  The guarantee is gone now which can
      break some usages in subtle ways.  To flush out those cases, this
      patch implements a debug feature which forces round-robin CPU
      selection for all such work items.
      
      The debug feature defaults to off and can be enabled with a kernel
      parameter.  The default can be flipped with a debug config option.
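
      For example, the boot-time form of the parameter named in the title
      would be:

       workqueue.debug_force_rr_cpu=1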
      
      If you hit this commit during bisection, please refer to 041bd12e
      ("Revert "workqueue: make sure delayed work run in local cpu"") for
      more information and ping me.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      f303fccb
    • workqueue: schedule WORK_CPU_UNBOUND work on wq_unbound_cpumask CPUs · ef557180
      Committed by Mike Galbraith
      WORK_CPU_UNBOUND work items queued to a bound workqueue always run
      locally.  This is a good thing normally, but not when the user has
      asked us to keep unbound work away from certain CPUs.  Round robin
      these to wq_unbound_cpumask CPUs instead, as perturbation avoidance
      trumps performance.
      
      tj: Cosmetic and comment changes.  WARN_ON_ONCE() dropped from empty
          (wq_unbound_cpumask AND cpu_online_mask).  If we want that, it
          should be done when config changes.
      Signed-off-by: Mike Galbraith <umgwanakikbuti@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      ef557180
    • Revert "workqueue: make sure delayed work run in local cpu" · 041bd12e
      Committed by Tejun Heo
      This reverts commit 874bbfe6.
      
      Workqueue used to implicitly guarantee that work items queued without
      explicit CPU specified are put on the local CPU.  Recent changes in
      timer broke the guarantee and led to vmstat breakage which was fixed
      by 176bed1d ("vmstat: explicitly schedule per-cpu work on the CPU
      we need it to run on").
      
      vmstat is the most likely to expose the issue and it's quite possible
      that there are other similar problems which are a lot more difficult
      to trigger.  As a preventive measure, 874bbfe6 ("workqueue: make
      sure delayed work run in local cpu") was applied to restore the local
      CPU guarantee.  Unfortunately, the change exposed a bug in timer code
      which got fixed by 22b886dd ("timers: Use proper base migration in
      add_timer_on()").  Due to code restructuring, the commit couldn't be
      backported beyond certain point and stable kernels which only had
      874bbfe6 started crashing.
      
      The local CPU guarantee was accidental more than anything else and we
      want to get rid of it anyway.  As, with the vmstat case fixed,
      874bbfe6 is causing more problems than it's fixing, it has been
      decided to take the chance and officially break the guarantee by
      reverting the commit.  A debug feature will be added to force foreign
      CPU assignment to expose cases relying on the guarantee and fixes for
      the individual cases will be backported to stable as necessary.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 874bbfe6 ("workqueue: make sure delayed work run in local cpu")
      Link: http://lkml.kernel.org/g/20160120211926.GJ10810@quack.suse.cz
      Cc: stable@vger.kernel.org
      Cc: Mike Galbraith <umgwanakikbuti@gmail.com>
      Cc: Henrique de Moraes Holschuh <hmh@hmh.eng.br>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Daniel Bilik <daniel.bilik@neosystem.cz>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Michal Hocko <mhocko@kernel.org>
      041bd12e
  13. 09 February 2016, 1 commit
    • locking/lockdep: Fix stack trace caching logic · 8a5fd564
      Committed by Dmitry Vyukov
      check_prev_add() caches the saved stack trace in a static trace variable
      to avoid duplicate save_trace() calls in dependencies involving trylocks.
      But that caching logic contains a bug. We may not save the trace on the
      first iteration due to an early return from check_prev_add(). Then on the
      second iteration, when we actually need the trace, we don't save it
      because we think that we've already saved it.

      Let check_prev_add() itself control when the stack is saved.

      There is another bug. The trace variable is protected by the graph lock.
      But we can temporarily release the graph lock during printing.

      Fix this by invalidating the cached stack trace when we release the
      graph lock.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: glider@google.com
      Cc: kcc@google.com
      Cc: peter@hurleysoftware.com
      Cc: sasha.levin@oracle.com
      Link: http://lkml.kernel.org/r/1454593240-121647-1-git-send-email-dvyukov@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      8a5fd564
  14. 06 February 2016, 1 commit
  15. 03 February 2016, 3 commits
    • modules: fix longstanding /proc/kallsyms vs module insertion race. · 8244062e
      Committed by Rusty Russell
      For CONFIG_KALLSYMS, we keep two symbol tables and two string tables.
      There's one full copy, marked SHF_ALLOC and laid out at the end of the
      module's init section.  There's also a cut-down version that only
      contains core symbols and strings, and lives in the module's core
      section.
      
      After module init (and before we free the module memory), we switch
      the mod->symtab, mod->num_symtab and mod->strtab to point to the core
      versions.  We do this under the module_mutex.
      
      However, kallsyms doesn't take the module_mutex: it uses
      preempt_disable() and rcu tricks to walk through the modules, because
      it's used in the oops path.  It's also used in /proc/kallsyms.
      There's nothing atomic about the change of these variables, so we can
      get the old (larger!) num_symtab and the new symtab pointer; in fact
      this is what I saw when trying to reproduce.
      
      By grouping these variables together, we can use a
      carefully-dereferenced pointer to ensure we always get one or the
      other (the free of the module init section is already done in an RCU
      callback, so that's safe).  We allocate the init one at the end of the
      module init section, and keep the core one inside the struct module
      itself (it could also have been allocated at the end of the module
      core, but that's probably overkill).
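
      A hedged sketch of the grouping (field names per the text above):

      	struct mod_kallsyms {
      		Elf_Sym *symtab;
      		unsigned int num_symtab;
      		char *strtab;
      	};

      	/* readers dereference once and see a consistent triple */
      	struct mod_kallsyms *kallsyms = rcu_dereference_sched(mod->kallsyms);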
      Reported-by: Weilong Chen <chenweilong@huawei.com>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=111541
      Cc: stable@kernel.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      8244062e
    • module: wrapper for symbol name. · 2e7bac53
      Committed by Rusty Russell
      This trivial wrapper adds clarity and makes the following patch
      smaller.
      
      Cc: stable@kernel.org
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
      2e7bac53
    • modules: fix modparam async_probe request · 4355efbd
      Committed by Luis R. Rodriguez
      Commit f2411da7 ("driver-core: add driver module
      asynchronous probe support") added async probe support,
      in two forms:
      
        * in-kernel driver specification annotation
        * generic async_probe module parameter (modprobe foo async_probe)
      
      To support the generic kernel parameter, parse_args() was
      extended via commit ecc86170 ("module: add extra
      argument for parse_params() callback"); however, commit
      f2411da7 failed to add the required argument.

      This causes a crash whenever the generic async_probe
      module parameter is used. This was overlooked when the
      in-kernel async probe support was reworked
      a bit... Fix this as originally intended.
      
      Cc: Hannes Reinecke <hare@suse.de>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: stable@vger.kernel.org (4.2+)
      Signed-off-by: Luis R. Rodriguez <mcgrof@suse.com>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> [minimized]
      4355efbd
  16. 01 February 2016, 1 commit
  17. 30 January 2016, 3 commits
    • pid: Fix spelling in comments · 840d6fe7
      Committed by Zhen Lei
      Accidentally discovered this typo when I studied this module.
      Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com>
      Cc: Hanjun Guo <guohanjun@huawei.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tianhong Ding <dingtianhong@huawei.com>
      Cc: Xinwei Hu <huxinwei@huawei.com>
      Cc: Zefan Li <lizefan@huawei.com>
      Link: http://lkml.kernel.org/r/1454119457-11272-1-git-send-email-thunder.leizhen@huawei.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      840d6fe7
    • devm_memremap_pages: fix vmem_altmap lifetime + alignment handling · eb7d78c9
      Committed by Dan Williams
      to_vmem_altmap() needs to return valid results until
      arch_remove_memory() completes.  It also needs to be valid for any pfn
      in a section regardless of whether that pfn maps to data.  This escape
      was a result of a bug in the unit test.
      
      The signature of this bug is that free_pagetable() fails to retrieve a
      vmem_altmap and goes off into the weeds:
      
       BUG: unable to handle kernel NULL pointer dereference at           (null)
       IP: [<ffffffff811d2629>] get_pfnblock_flags_mask+0x49/0x60
       [..]
       Call Trace:
        [<ffffffff811d3477>] free_hot_cold_page+0x97/0x1d0
        [<ffffffff811d367a>] __free_pages+0x2a/0x40
        [<ffffffff8191e669>] free_pagetable+0x8c/0xd4
        [<ffffffff8191ef4e>] remove_pagetable+0x37a/0x808
        [<ffffffff8191b210>] vmemmap_free+0x10/0x20
      
      Fixes: 4b94ffdc ("x86, mm: introduce vmem_altmap to augment vmemmap_populate()")
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reported-by: Jeff Moyer <jmoyer@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      eb7d78c9
    • workqueue: skip flush dependency checks for legacy workqueues · 23d11a58
      Committed by Tejun Heo
      fca839c0 ("workqueue: warn if memory reclaim tries to flush
      !WQ_MEM_RECLAIM workqueue") implemented flush dependency warning which
      triggers if a PF_MEMALLOC task or WQ_MEM_RECLAIM workqueue tries to
      flush a !WQ_MEM_RECLAIM workquee.
      
      This assumes that workqueues marked with WQ_MEM_RECLAIM sit in the
      memory reclaim path, and making them depend on something which may need
      more memory to make forward progress can lead to deadlocks.
      Unfortunately, workqueues created with the legacy create*_workqueue()
      interface always have WQ_MEM_RECLAIM regardless of whether memory
      reclaim actually depends on them.  These spurious WQ_MEM_RECLAIM
      markings cause spurious triggering of the flush dependency checks.
      
        WARNING: CPU: 0 PID: 6 at kernel/workqueue.c:2361 check_flush_dependency+0x138/0x144()
        workqueue: WQ_MEM_RECLAIM deferwq:deferred_probe_work_func is flushing !WQ_MEM_RECLAIM events:lru_add_drain_per_cpu
        ...
        Workqueue: deferwq deferred_probe_work_func
        [<c0017acc>] (unwind_backtrace) from [<c0013134>] (show_stack+0x10/0x14)
        [<c0013134>] (show_stack) from [<c0245f18>] (dump_stack+0x94/0xd4)
        [<c0245f18>] (dump_stack) from [<c0026f9c>] (warn_slowpath_common+0x80/0xb0)
        [<c0026f9c>] (warn_slowpath_common) from [<c0026ffc>] (warn_slowpath_fmt+0x30/0x40)
        [<c0026ffc>] (warn_slowpath_fmt) from [<c00390b8>] (check_flush_dependency+0x138/0x144)
        [<c00390b8>] (check_flush_dependency) from [<c0039ca0>] (flush_work+0x50/0x15c)
        [<c0039ca0>] (flush_work) from [<c00c51b0>] (lru_add_drain_all+0x130/0x180)
        [<c00c51b0>] (lru_add_drain_all) from [<c00f728c>] (migrate_prep+0x8/0x10)
        [<c00f728c>] (migrate_prep) from [<c00bfbc4>] (alloc_contig_range+0xd8/0x338)
        [<c00bfbc4>] (alloc_contig_range) from [<c00f8f18>] (cma_alloc+0xe0/0x1ac)
        [<c00f8f18>] (cma_alloc) from [<c001cac4>] (__alloc_from_contiguous+0x38/0xd8)
        [<c001cac4>] (__alloc_from_contiguous) from [<c001ceb4>] (__dma_alloc+0x240/0x278)
        [<c001ceb4>] (__dma_alloc) from [<c001cf78>] (arm_dma_alloc+0x54/0x5c)
        [<c001cf78>] (arm_dma_alloc) from [<c0355ea4>] (dmam_alloc_coherent+0xc0/0xec)
        [<c0355ea4>] (dmam_alloc_coherent) from [<c039cc4c>] (ahci_port_start+0x150/0x1dc)
        [<c039cc4c>] (ahci_port_start) from [<c0384734>] (ata_host_start.part.3+0xc8/0x1c8)
        [<c0384734>] (ata_host_start.part.3) from [<c03898dc>] (ata_host_activate+0x50/0x148)
        [<c03898dc>] (ata_host_activate) from [<c039d558>] (ahci_host_activate+0x44/0x114)
        [<c039d558>] (ahci_host_activate) from [<c039f05c>] (ahci_platform_init_host+0x1d8/0x3c8)
        [<c039f05c>] (ahci_platform_init_host) from [<c039e6bc>] (tegra_ahci_probe+0x448/0x4e8)
        [<c039e6bc>] (tegra_ahci_probe) from [<c0347058>] (platform_drv_probe+0x50/0xac)
        [<c0347058>] (platform_drv_probe) from [<c03458cc>] (driver_probe_device+0x214/0x2c0)
        [<c03458cc>] (driver_probe_device) from [<c0343cc0>] (bus_for_each_drv+0x60/0x94)
        [<c0343cc0>] (bus_for_each_drv) from [<c03455d8>] (__device_attach+0xb0/0x114)
        [<c03455d8>] (__device_attach) from [<c0344ab8>] (bus_probe_device+0x84/0x8c)
        [<c0344ab8>] (bus_probe_device) from [<c0344f48>] (deferred_probe_work_func+0x68/0x98)
        [<c0344f48>] (deferred_probe_work_func) from [<c003b738>] (process_one_work+0x120/0x3f8)
        [<c003b738>] (process_one_work) from [<c003ba48>] (worker_thread+0x38/0x55c)
        [<c003ba48>] (worker_thread) from [<c0040f14>] (kthread+0xdc/0xf4)
        [<c0040f14>] (kthread) from [<c000f778>] (ret_from_fork+0x14/0x3c)
      
      Fix it by marking workqueues created via create*_workqueue() with
      __WQ_LEGACY and disabling flush dependency checks on them.
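
      A hedged sketch of the exemption in check_flush_dependency():

      	if (target_wq->flags & __WQ_LEGACY)
      		return;	/* legacy workqueues carry WQ_MEM_RECLAIM spuriously */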
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-and-tested-by: Thierry Reding <thierry.reding@gmail.com>
      Link: http://lkml.kernel.org/g/20160126173843.GA11115@ulmo.nvidia.com
      Fixes: fca839c0 ("workqueue: warn if memory reclaim tries to flush !WQ_MEM_RECLAIM workqueue")
      23d11a58