1. 13 Jan 2016, 1 commit
  2. 11 Jan 2016, 1 commit
  3. 09 Jan 2016, 1 commit
    • restrict /dev/mem to idle io memory ranges · 90a545e9
      Committed by Dan Williams
      This effectively promotes IORESOURCE_BUSY to IORESOURCE_EXCLUSIVE
      semantics by default.  If userspace really believes it is safe to access
      the memory region it can also perform the extra step of disabling an
      active driver.  This protects device address ranges with read side
      effects and otherwise directs userspace to use the driver.
      
      Persistent memory presents a large "mistake surface" to /dev/mem as now
      accidental writes can corrupt a filesystem.
      
      In general if a device driver is busily using a memory region it already
      informs other parts of the kernel to not touch it via
      request_mem_region().  /dev/mem should honor the same safety restriction
      by default.  Debugging a device driver from userspace becomes more
      difficult with this enabled.  Any application using /dev/mem or mmap of
      sysfs pci resources will now need to perform the extra step of either:
      
      1/ Disabling the driver, for example:
      
         echo <device id> > /sys/bus/<parent bus>/drivers/<driver name>/unbind
      
      2/ Rebooting with "iomem=relaxed" on the command line
      
      3/ Recompiling with CONFIG_IO_STRICT_DEVMEM=n
      
      Traditional users of /dev/mem like dosemu are unaffected because the
      first 1MB of memory is not subject to the IO_STRICT_DEVMEM restriction.
      Legacy X configurations use /dev/mem to talk to graphics hardware, but
      that functionality has since moved to kernel graphics drivers.
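
      Roughly, the policy boils down to a check like the sketch below. This is
      an illustration only: range_is_claimed() and iomem_relaxed stand in for
      the real iomem resource-tree walk and the "iomem=relaxed" handling.

      /* Sketch of a strict /dev/mem range check, not the mainline code. */
      static bool devmem_range_is_allowed(unsigned long pfn, unsigned long nr_pages)
      {
              phys_addr_t addr = (phys_addr_t)pfn << PAGE_SHIFT;
              size_t size = (size_t)nr_pages << PAGE_SHIFT;

              /* Legacy users (dosemu, old X setups) only touch the first 1MB. */
              if (addr + size <= SZ_1M)
                      return true;

              /* "iomem=relaxed" or CONFIG_IO_STRICT_DEVMEM=n disables the check. */
              if (!IS_ENABLED(CONFIG_IO_STRICT_DEVMEM) || iomem_relaxed)
                      return true;

              /* Regions claimed via request_mem_region() are off limits. */
              return !range_is_claimed(addr, size);
      }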
      
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: Kees Cook <keescook@chromium.org>
      Acked-by: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      90a545e9
  4. 08 Jan 2016, 4 commits
    • ftrace: Fix the race between ftrace and insmod · 5156dca3
      Committed by Qiu Peiyang
      We hit an ftrace_bug report when booting Android on a 64-bit Atom SoC.
      Basically, there is a race between insmod and ftrace_run_update_code.
      
      After load_module=>ftrace_module_init, another thread jumps in and calls
      ftrace_run_update_code=>ftrace_arch_code_modify_prepare
                              =>set_all_modules_text_rw to make all module text
      writable. Since the new module is still at MODULE_STATE_UNFORMED, its text
      attributes are left unchanged. The second thread then goes ahead and starts
      modifying code, but load_module continues on to
      complete_formation=>set_section_ro_nx, so the second thread fails when it
      probes the new module's text.
      
      The patch fixes it by using a module notifier to delay enabling the ftrace
      records until the module reaches MODULE_STATE_COMING.
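
      In sketch form, the approach is a module notifier that calls the enable
      path once the module state is right; ftrace_module_enable() names the
      routine that flips the records on, and the real wiring in
      kernel/trace/ftrace.c differs in detail.

      static int ftrace_module_notify(struct notifier_block *nb,
                                      unsigned long state, void *data)
      {
              struct module *mod = data;

              /* complete_formation() has applied the final protections here */
              if (state == MODULE_STATE_COMING)
                      ftrace_module_enable(mod);

              return NOTIFY_OK;
      }

      static struct notifier_block ftrace_module_nb = {
              .notifier_call = ftrace_module_notify,
      };

      /* registered once with register_module_notifier(&ftrace_module_nb) */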
      
      Link: http://lkml.kernel.org/r/567CE628.3000609@intel.com
      Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: Zhang Yanmin <yanmin.zhang@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      5156dca3
    • ftrace: Add infrastructure for delayed enabling of module functions · b7ffffbb
      Committed by Steven Rostedt (Red Hat)
      Qiu Peiyang pointed out that there's a race when enabling function tracing
      and loading a module. In order to convert the nops in function prologues
      into callbacks, the text needs to be changed from read-only to read-write.
      When enabling function tracing, the text permission is updated, the
      functions are modified, and then the permission is put back.
      
      When loading a module, the updates that convert function calls to mcount are
      done before the module text is set to read-only. But once that is done, the
      module text is visible to the function tracer. Thus we have the following
      race:
      
      	CPU 0			CPU 1
      	-----			-----
         start function tracing
         set text to read-write
      			     load_module
      			     add functions to ftrace
      			     set module text read-only
      
         update all functions to callbacks
         modify module functions too
         < Can't, it's read-only >
      
      When this happens, ftrace detects the issue and disables itself till the
      next reboot.
      
      To fix this, a new DISABLED flag is added for ftrace records, which all
      module functions get when they are added. Then later, after the module code
      is all set, the records will have the DISABLED flag cleared, and they will
      be enabled if any callback wants all functions to be traced.
      
      Note, this patch does not itself delay the enabling. It simply changes
      ftrace_module_init() to both mark the records DISABLED and then immediately
      call the enable code, which keeps the behavior the same as before and helps
      with testing the new code. A follow-up change will call
      ftrace_module_enable() only after the module text has been set to
      read-only.
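
      The two phases look roughly like the sketch below; for_each_module_rec()
      is an illustrative placeholder for walking the module's dyn_ftrace
      records, and the real flag handling is more involved.

      static void ftrace_module_init_sketch(struct module *mod)
      {
              struct dyn_ftrace *rec;

              for_each_module_rec(mod, rec)          /* placeholder iterator */
                      rec->flags |= FTRACE_FL_DISABLED;
      }

      static void ftrace_module_enable_sketch(struct module *mod)
      {
              struct dyn_ftrace *rec;

              for_each_module_rec(mod, rec) {        /* placeholder iterator */
                      rec->flags &= ~FTRACE_FL_DISABLED;
                      /* convert the nop to a callback here if any registered
                         ftrace_ops traces all functions */
              }
      }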
      
      Cc: Qiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b7ffffbb
    • ftrace/module: Call clean up function when module init fails early · 049fb9bd
      Committed by Steven Rostedt (Red Hat)
      If the module init code fails after calling ftrace_module_init() and before
      calling do_init_module(), we can suffer from a memory leak. This is because
      ftrace_module_init() allocates pages to store the locations that ftrace
      hooks are placed in the module text. If do_init_module() fails, it still
      calls the MODULE_GOING notifiers which will tell ftrace to do a clean up of
      the pages it allocated for the module. But if load_module() fails before
      then, the pages allocated by ftrace_module_init() will never be freed.
      
      Call ftrace_release_mod() on the module if load_module() fails before
      getting to do_init_module().
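
      In outline, the fix is an extra error-path cleanup in load_module();
      the intermediate step and label below are simplified placeholders.

      static int load_module_sketch(struct module *mod)
      {
              int err;

              ftrace_module_init(mod);   /* allocates pages for the records */

              err = complete_formation_and_friends(mod);   /* placeholder */
              if (err)
                      goto ftrace_cleanup;

              /* failures inside do_init_module() run the MODULE_GOING
                 notifiers, which already tell ftrace to clean up */
              return do_init_module(mod);

       ftrace_cleanup:
              ftrace_release_mod(mod);   /* the cleanup call this patch adds */
              return err;
      }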
      
      Link: http://lkml.kernel.org/r/567CEA31.1070507@intel.com
      Reported-by: "Qiu, PeiyangX" <peiyangx.qiu@intel.com>
      Fixes: a949ae56 "ftrace/module: Hardcode ftrace_module_init() call into load_module()"
      Cc: stable@vger.kernel.org # v2.6.38+
      Acked-by: Rusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      049fb9bd
    • workqueue: simplify the apply_workqueue_attrs_locked() · 6201171e
      Committed by wanghaibin
      If apply_wqattrs_prepare() returns NULL, it has already cleaned up
      the related resources, so the caller can return directly and avoid
      calling the cleanup function again.
      
      This doesn't introduce any functional changes.
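
      A sketch of the simplified caller (the ordered-workqueue check and other
      details of the real function are elided):

      static int apply_workqueue_attrs_locked(struct workqueue_struct *wq,
                                              const struct workqueue_attrs *attrs)
      {
              struct apply_wqattrs_ctx *ctx;

              ctx = apply_wqattrs_prepare(wq, attrs);
              if (!ctx)
                      return -ENOMEM;   /* prepare already undid its own work */

              apply_wqattrs_commit(ctx);
              apply_wqattrs_cleanup(ctx);
              return 0;
      }
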
      Signed-off-by: wanghaibin <wanghaibin.wang@huawei.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      6201171e
  5. 06 Jan 2016, 7 commits
    • perf/core: Collapse more IPI loops · 7b648018
      Committed by Peter Zijlstra
      This patch collapses the two 'hard' cases, which are
      perf_event_{dis,en}able().
      
      I cannot seem to convince myself the current code is correct.
      
      So starting with perf_event_disable(): we don't strictly need to test
      for event->state == ACTIVE, ctx->is_active is enough. If the event is
      not scheduled while the ctx is, __perf_event_disable() still does the
      right thing.  It's a little less efficient to IPI in that case, but
      over-all it is simpler.
      
      For perf_event_enable() the same goes, but I think that's actually
      broken in its current form. The current condition is: ctx->is_active
      && event->state == OFF, which means it doesn't do anything when
      !ctx->is_active && event->state == OFF. This is wrong, it should still
      mark the event INACTIVE in that case, otherwise we'll still not try
      to schedule the event once the context becomes active again.
      
      This patch implements the two functions using the new
      event_function_call() and does away with the tricky event->state
      tests.
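
      The disable side ends up shaped roughly like this sketch (the
      __perf_event_disable() body and the enable side are elided):

      void perf_event_disable(struct perf_event *event)
      {
              struct perf_event_context *ctx = event->ctx;

              raw_spin_lock_irq(&ctx->lock);
              if (event->state <= PERF_EVENT_STATE_OFF) {
                      raw_spin_unlock_irq(&ctx->lock);
                      return;            /* already off, nothing to do */
              }
              raw_spin_unlock_irq(&ctx->lock);

              /* IPI the owning CPU if the context is active, otherwise
                 update the event in place under ctx->lock */
              event_function_call(event, __perf_event_disable, NULL);
      }
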
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Alexander Shishkin <alexander.shishkin@intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7b648018
    • sched/fair: Fix new task's load avg removed from source CPU in wake_up_new_task() · 0905f04e
      Committed by Yuyang Du
      If a newly created task is selected to go to a different CPU in fork
      balance when it wakes up for the first time, its load averages should
      not be removed from the source CPU, since they were never added to it
      in the first place. The same also applies to a never-used group entity.
      
      Fix it in remove_entity_load_avg(): when entity's last_update_time
      is 0, simply return. This should precisely identify the case in
      question, because in other migrations, the last_update_time is set
      to 0 after remove_entity_load_avg().
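
      In sketch form the fix is a single early return (the rest of the
      function is unchanged and elided here):

      void remove_entity_load_avg(struct sched_entity *se)
      {
              struct cfs_rq *cfs_rq = cfs_rq_of(se);

              /*
               * Newly created task or never-used group entity: its load
               * average was never attached to any cfs_rq, so there is
               * nothing to remove.
               */
              if (!se->avg.last_update_time)
                      return;

              /* ... existing path: age se->avg to the cfs_rq's last update
                 time and account it in removed_load_avg / removed_util_avg */
      }
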
      Reported-by: Steve Muckle <steve.muckle@linaro.org>
      Signed-off-by: Yuyang Du <yuyang.du@intel.com>
      [peterz: cfs_rq_last_update_time]
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Juri Lelli <Juri.Lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Morten Rasmussen <morten.rasmussen@arm.com>
      Cc: Patrick Bellasi <patrick.bellasi@arm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Link: http://lkml.kernel.org/r/20151216233427.GJ28098@intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      0905f04e
    • sched/deadline: Fix the earliest_dl.next logic · 7d92de3a
      Committed by Wanpeng Li
      earliest_dl.next should cache deadline of the earliest ready task that
      is also enqueued in the pushable rbtree, as pull algorithm uses this
      information to find candidates for migration: if the earliest_dl.next
      deadline of source rq is earlier than the earliest_dl.curr deadline of
      destination rq, the task from the source rq can be pulled.
      
      However, the current implementation only guarantees that earliest_dl.next is
      the deadline of the next ready task rather than the next pushable task,
      which can result in holding both rqs' locks and finding nothing to migrate
      because of affinity constraints. In addition, the current logic doesn't
      update the next candidate for pushing in pick_next_task_dl(), even if the
      running task is never eligible.
      
      This patch fixes both problems by updating earliest_dl.next when
      pushable dl task is enqueued/dequeued, similar to what we already do for
      RT.
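
      The idea, as a sketch: keep earliest_dl.next pointing at the leftmost
      task of the pushable rbtree whenever a pushable task is enqueued or
      dequeued. update_earliest_dl_next() is an illustrative helper; the
      actual patch folds this into the enqueue/dequeue paths in
      kernel/sched/deadline.c.

      static void update_earliest_dl_next(struct rq *rq)
      {
              struct rb_node *leftmost = rb_first(&rq->dl.pushable_dl_tasks_root);
              struct task_struct *p;

              if (!leftmost)
                      return;            /* no pushable tasks left */

              p = rb_entry(leftmost, struct task_struct, pushable_dl_tasks);
              rq->dl.earliest_dl.next = p->dl.deadline;
      }
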
      Tested-by: Luca Abeni <luca.abeni@unitn.it>
      Signed-off-by: Wanpeng Li <wanpeng.li@hotmail.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Juri Lelli <juri.lelli@arm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1449135730-27202-1-git-send-email-wanpeng.li@hotmail.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7d92de3a
    • sched/core: Reset task's lockless wake-queues on fork() · 093e5840
      Committed by Sebastian Andrzej Siewior
      In the following commit:
      
        76751049 ("sched: Implement lockless wake-queues")
      
      we gained lockless wake-queues.
      
      The -RT kernel managed to lock itself up with those. There could be multiple
      attempts to enqueue task X for a wakeup _even_ if task X is already
      running.
      
      The reason is that task X could be runnable but not yet on a CPU. As long as
      the task performing the wakeup did not leave the CPU, it could perform
      multiple wakeups.
      
      With the proper timing, task X could be running while still being enqueued
      for a wakeup. If this happens while X is performing a fork(), then its
      child will have a !NULL `wake_q` member copied.
      
      This is not a problem as long as the child task does not participate in
      lockless wakeups :)
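
      The fix itself amounts to one line in the fork path, sketched below
      (whether it lives in copy_process() or a sched fork helper is a
      detail):

      /* Make sure the child never inherits a queued wake_q node. */
      static inline void fork_reset_wake_q(struct task_struct *p)
      {
              p->wake_q.next = NULL;
      }
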
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Davidlohr Bueso <dbueso@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 76751049 ("sched: Implement lockless wake-queues")
      Link: http://lkml.kernel.org/r/20151221171710.GA5499@linutronix.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      093e5840
    • sched/fair: Fix multiplication overflow on 32-bit systems · 9e0e83a1
      Committed by Andrey Ryabinin
      Make 'r' 64-bit type to avoid overflow in 'r * LOAD_AVG_MAX'
      on 32-bit systems:
      
      	UBSAN: Undefined behaviour in kernel/sched/fair.c:2785:18
      	signed integer overflow:
      	87950 * 47742 cannot be represented in type 'int'
      
      The most likely effect of this bug is bad load average numbers
      resulting in weird scheduling. It's also likely that this can
      persist for a long time - until the system goes idle for long
      enough that all load avg numbers get reset.
      
      [ This is the CFS load average metric, not the procfs output, which
        is separate. ]
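
      The change boils down to widening the intermediate value; a sketch of
      the affected lines in update_cfs_rq_load_avg() (surrounding code
      elided, and the util_avg side is handled the same way):

      if (atomic_long_read(&cfs_rq->removed_load_avg)) {
              /* was 'long r', which is 32 bits on 32-bit systems */
              s64 r = atomic_long_xchg(&cfs_rq->removed_load_avg, 0);

              sa->load_avg = max_t(long, sa->load_avg - r, 0);
              /* r is 64-bit now, so r * LOAD_AVG_MAX cannot overflow */
              sa->load_sum = max_t(s64, sa->load_sum - r * LOAD_AVG_MAX, 0);
      }
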
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 9d89c257 ("sched/fair: Rewrite runnable load and utilization average tracking")
      Link: http://lkml.kernel.org/r/1450097243-30137-1-git-send-email-aryabinin@virtuozzo.com
      [ Improved the changelog. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9e0e83a1
    • perf: Fix race in swevent hash · 12ca6ad2
      Committed by Peter Zijlstra
      There's a race on CPU unplug where we free the swevent hash array
      while it can still have events on. This will result in a
      use-after-free which is BAD.
      
      Simply do not free the hash array on unplug. This leaves the thing
      around and no use-after-free takes place.
      
      When the last swevent dies, we do a for_each_possible_cpu() iteration
      anyway to clean these up, at which time we'll free it, so no leakage
      will occur.
      Reported-by: Sasha Levin <sasha.levin@oracle.com>
      Tested-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      12ca6ad2
    • perf: Fix race in perf_event_exec() · c1274499
      Committed by Peter Zijlstra
      I managed to tickle this warning:
      
        [ 2338.884942] ------------[ cut here ]------------
        [ 2338.890112] WARNING: CPU: 13 PID: 35162 at ../kernel/events/core.c:2702 task_ctx_sched_out+0x6b/0x80()
        [ 2338.900504] Modules linked in:
        [ 2338.903933] CPU: 13 PID: 35162 Comm: bash Not tainted 4.4.0-rc4-dirty #244
        [ 2338.911610] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
        [ 2338.923071]  ffffffff81f1468e ffff8807c6457cb8 ffffffff815c680c 0000000000000000
        [ 2338.931382]  ffff8807c6457cf0 ffffffff810c8a56 ffffe8ffff8c1bd0 ffff8808132ed400
        [ 2338.939678]  0000000000000286 ffff880813170380 ffff8808132ed400 ffff8807c6457d00
        [ 2338.947987] Call Trace:
        [ 2338.950726]  [<ffffffff815c680c>] dump_stack+0x4e/0x82
        [ 2338.956474]  [<ffffffff810c8a56>] warn_slowpath_common+0x86/0xc0
        [ 2338.963195]  [<ffffffff810c8b4a>] warn_slowpath_null+0x1a/0x20
        [ 2338.969720]  [<ffffffff811a49cb>] task_ctx_sched_out+0x6b/0x80
        [ 2338.976244]  [<ffffffff811a62d2>] perf_event_exec+0xe2/0x180
        [ 2338.982575]  [<ffffffff8121fb6f>] setup_new_exec+0x6f/0x1b0
        [ 2338.988810]  [<ffffffff8126de83>] load_elf_binary+0x393/0x1660
        [ 2338.995339]  [<ffffffff811dc772>] ? get_user_pages+0x52/0x60
        [ 2339.001669]  [<ffffffff8121e297>] search_binary_handler+0x97/0x200
        [ 2339.008581]  [<ffffffff8121f8b3>] do_execveat_common.isra.33+0x543/0x6e0
        [ 2339.016072]  [<ffffffff8121fcea>] SyS_execve+0x3a/0x50
        [ 2339.021819]  [<ffffffff819fc165>] stub_execve+0x5/0x5
        [ 2339.027469]  [<ffffffff819fbeb2>] ? entry_SYSCALL_64_fastpath+0x12/0x71
        [ 2339.034860] ---[ end trace ee1337c59a0ddeac ]---
      
      Which is a WARN_ON_ONCE() indicating that cpuctx->task_ctx is not
      what we expected it to be.
      
      This is because context switches can swap the task_struct::perf_event_ctxp[]
      pointer around. Therefore you have to either disable preemption when looking
      at current, or hold ctx->lock.
      
      Fix perf_event_enable_on_exec(): it loads current->perf_event_ctxp[]
      before disabling interrupts, so a preemption in the right place can
      swap contexts around and leave us operating on the wrong one.
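
      The rule the fix enforces, in sketch form (the _sketch suffix marks
      this as an illustration; the event walk is elided):

      static void perf_event_enable_on_exec_sketch(int ctxn)
      {
              struct perf_event_context *ctx;
              unsigned long flags;

              local_irq_save(flags);                /* pin this task/CPU first */
              ctx = current->perf_event_ctxp[ctxn]; /* ...then load the pointer */
              if (!ctx || !ctx->nr_events)
                      goto out;

              /* ... re-enable the enable_on_exec events under ctx->lock ... */
      out:
              local_irq_restore(flags);
      }
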
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Link: http://lkml.kernel.org/r/20151210195740.GG6357@twins.programming.kicks-ass.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      c1274499
  6. 05 Jan 2016, 2 commits
    • PM / sleep: Add support for read-only sysfs attributes · a1e9ca69
      Committed by Rafael J. Wysocki
      Some sysfs attributes in /sys/power/ should really be read-only,
      so add support for that, convert those attributes to read-only,
      and drop the stub .store() routines from them.
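
      The support is essentially a read-only sibling of the existing
      power_attr() helper, roughly as below (the real macro lives in
      kernel/power/power.h and may differ in detail):

      #define power_attr_ro(_name)                      \
      static struct kobj_attribute _name##_attr = {     \
              .attr = {                                 \
                      .name = __stringify(_name),       \
                      .mode = S_IRUGO,                  \
              },                                        \
              .show = _name##_show,                     \
      }

      /* usage: power_attr_ro(pm_wakeup_irq) wires up only a show() handler,
         so no stub store() is needed at all */
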
      Original-by: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a1e9ca69
    • tracing: Fix setting of start_index in find_next() · f36d1be2
      Committed by Qiu Peiyang
      When we do cat /sys/kernel/debug/tracing/printk_formats, we hit a kernel
      panic in t_show().
      
      general protection fault: 0000 [#1] PREEMPT SMP
      CPU: 0 PID: 2957 Comm: sh Tainted: G W  O 3.14.55-x86_64-01062-gd4acdc7 #2
      RIP: 0010:[<ffffffff811375b2>]
       [<ffffffff811375b2>] t_show+0x22/0xe0
      RSP: 0000:ffff88002b4ebe80  EFLAGS: 00010246
      RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000004
      RDX: 0000000000000004 RSI: ffffffff81fd26a6 RDI: ffff880032f9f7b1
      RBP: ffff88002b4ebe98 R08: 0000000000001000 R09: 000000000000ffec
      R10: 0000000000000000 R11: 000000000000000f R12: ffff880004d9b6c0
      R13: 7365725f6d706400 R14: ffff880004d9b6c0 R15: ffffffff82020570
      FS:  0000000000000000(0000) GS:ffff88003aa00000(0063) knlGS:00000000f776bc40
      CS:  0010 DS: 002b ES: 002b CR0: 0000000080050033
      CR2: 00000000f6c02ff0 CR3: 000000002c2b3000 CR4: 00000000001007f0
      Call Trace:
       [<ffffffff811dc076>] seq_read+0x2f6/0x3e0
       [<ffffffff811b749b>] vfs_read+0x9b/0x160
       [<ffffffff811b7f69>] SyS_read+0x49/0xb0
       [<ffffffff81a3a4b9>] ia32_do_call+0x13/0x13
       ---[ end trace 5bd9eb630614861e ]---
      Kernel panic - not syncing: Fatal exception
      
      The first time find_next() calls find_next_mod_format(), it should
      iterate trace_bprintk_fmt_list to find the first print format of the
      module. However, in the current code start_index is smaller than *pos
      at that point, so the list is never iterated. The subsequent
      container_of() then computes a bogus address from the stale v, making
      mod_fmt a meaningless object and the returned mod_fmt->fmt garbage.
      
      This patch fixes it by correcting start_index. With the fix, the first
      time find_next_mod_format() is called start_index is equal to *pos, so
      the code iterates trace_bprintk_fmt_list and returns the right module
      printk format in mod_fmt->fmt.
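
      The corrected bookkeeping, roughly (following kernel/trace/trace_printk.c,
      with the comment about the __tracepoint_str section trimmed):

      static const char **find_next(void *v, loff_t *pos)
      {
              const char **fmt = v;
              int start_index;
              int last_index;

              start_index = __stop___trace_bprintk_fmt - __start___trace_bprintk_fmt;

              if (*pos < start_index)
                      return __start___trace_bprintk_fmt + *pos;

              /* __tracepoint_str entries follow the builtin formats */
              last_index = start_index;
              start_index = __stop___tracepoint_str - __start___tracepoint_str;

              if (*pos < last_index + start_index)
                      return __start___tracepoint_str + (*pos - last_index);

              /* the fix: count everything already consumed, so start_index
                 equals *pos on the first module-format lookup */
              start_index += last_index;
              return find_next_mod_format(start_index, v, fmt, pos);
      }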
      
      Link: http://lkml.kernel.org/r/5684B900.9000309@intel.com
      
      Cc: stable@vger.kernel.org # 3.12+
      Fixes: 102c9323 "tracing: Add __tracepoint_string() to export string pointers"
      Signed-off-by: Qiu Peiyang <peiyangx.qiu@intel.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      f36d1be2
  7. 04 Jan 2016, 2 commits
  8. 02 Jan 2016, 1 commit
  9. 30 Dec 2015, 3 commits
  10. 29 Dec 2015, 1 commit
  11. 24 Dec 2015, 8 commits
  12. 21 Dec 2015, 2 commits
  13. 20 Dec 2015, 7 commits