1. 28 6月, 2015 1 次提交
  2. 23 6月, 2015 3 次提交
    • D
      module: add per-module param_lock · b51d23e4
      Dan Streetman 提交于
      Add a "param_lock" mutex to each module, and update params.c to use
      the correct built-in or module mutex while locking kernel params.
      Remove the kparam_block_sysfs_r/w() macros, replace them with direct
      calls to kernel_param_[un]lock(module).
      
      The kernel param code currently uses a single mutex to protect
      modification of any and all kernel params.  While this generally works,
      there is one specific problem with it; a module callback function
      cannot safely load another module, i.e. with request_module() or even
      with indirect calls such as crypto_has_alg().  If the module to be
      loaded has any of its params configured (e.g. with a /etc/modprobe.d/*
      config file), then the attempt will result in a deadlock between the
      first module param callback waiting for modprobe, and modprobe trying to
      lock the single kernel param mutex to set the new module's param.
      
      This fixes that by using per-module mutexes, so that each individual module
      is protected against concurrent changes in its own kernel params, but is
      not blocked by changes to other module params.  All built-in modules
      continue to use the built-in mutex, since they will always be loaded at
      runtime and references (e.g. request_module(), crypto_has_alg()) to them
      will never cause load-time param changing.
      
      This also simplifies the interface used by modules to block sysfs access
      to their params; while there are currently functions to block and unblock
      sysfs param access which are split up by read and write and expect a single
      kernel param to be passed, their actual operation is identical and applies
      to all params, not just the one passed to them; they simply lock and unlock
      the global param mutex.  They are replaced with direct calls to
      kernel_param_[un]lock(THIS_MODULE), which locks THIS_MODULE's param_lock, or
      if the module is built-in, it locks the built-in mutex.
      Suggested-by: NRusty Russell <rusty@rustcorp.com.au>
      Signed-off-by: NDan Streetman <ddstreet@ieee.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      b51d23e4
    • D
      module: make perm const · 5104b7d7
      Dan Streetman 提交于
      Change the struct kernel_param.perm field to a const, as it should never
      be changed.
      Signed-off-by: NDan Streetman <ddstreet@ieee.org>
      Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (cut from larger patch)
      5104b7d7
    • R
      params: suppress unused variable error, warn once just in case code changes. · 74c3dea3
      Rusty Russell 提交于
      It shouldn't fail due to OOM (it's boot time), and already warns if we
      get two identical names.  But you never know what the future holds, and
      WARN_ON_ONCE() keeps gcc happy with minimal code.
      Reported-by: NLouis Langholtz <lou_langholtz@me.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      74c3dea3
  3. 28 5月, 2015 13 次提交
  4. 27 5月, 2015 2 次提交
    • P
      module, jump_label: Fix module locking · bed831f9
      Peter Zijlstra 提交于
      As per the module core lockdep annotations in the coming patch:
      
      [   18.034047] ---[ end trace 9294429076a9c673 ]---
      [   18.047760] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
      [   18.059228]  ffffffff817d8676 ffff880036683c38 ffffffff8157e98b 0000000000000001
      [   18.067541]  0000000000000000 ffff880036683c78 ffffffff8105fbc7 ffff880036683c68
      [   18.075851]  ffffffffa0046b08 0000000000000000 ffffffffa0046d00 ffffffffa0046cc8
      [   18.084173] Call Trace:
      [   18.086906]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
      [   18.092649]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
      [   18.099361]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
      [   18.105880]  [<ffffffff810ee502>] __module_address+0x1d2/0x1e0
      [   18.112400]  [<ffffffff81161153>] jump_label_module_notify+0x143/0x1e0
      [   18.119710]  [<ffffffff810814bf>] notifier_call_chain+0x4f/0x70
      [   18.126326]  [<ffffffff8108160e>] __blocking_notifier_call_chain+0x5e/0x90
      [   18.134009]  [<ffffffff81081656>] blocking_notifier_call_chain+0x16/0x20
      [   18.141490]  [<ffffffff810f0f00>] load_module+0x1b50/0x2660
      [   18.147720]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
      [   18.154045]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
      [   18.160748] ---[ end trace 9294429076a9c674 ]---
      
      Jump labels is not doing it right; fix this.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jason Baron <jbaron@akamai.com>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      bed831f9
    • P
      module: Annotate module version magic · 926a59b1
      Peter Zijlstra 提交于
      Due to the new lockdep checks in the coming patch, we go:
      
      [    9.759380] ------------[ cut here ]------------
      [    9.759389] WARNING: CPU: 31 PID: 597 at ../kernel/module.c:216 each_symbol_section+0x121/0x130()
      [    9.759391] Modules linked in:
      [    9.759393] CPU: 31 PID: 597 Comm: modprobe Not tainted 4.0.0-rc1+ #65
      [    9.759393] Hardware name: Intel Corporation S2600GZ/S2600GZ, BIOS SE5C600.86B.02.02.0002.122320131210 12/23/2013
      [    9.759396]  ffffffff817d8676 ffff880424567ca8 ffffffff8157e98b 0000000000000001
      [    9.759398]  0000000000000000 ffff880424567ce8 ffffffff8105fbc7 ffff880424567cd8
      [    9.759400]  0000000000000000 ffffffff810ec160 ffff880424567d40 0000000000000000
      [    9.759400] Call Trace:
      [    9.759407]  [<ffffffff8157e98b>] dump_stack+0x4f/0x7b
      [    9.759410]  [<ffffffff8105fbc7>] warn_slowpath_common+0x97/0xe0
      [    9.759412]  [<ffffffff810ec160>] ? section_objs+0x60/0x60
      [    9.759414]  [<ffffffff8105fc2a>] warn_slowpath_null+0x1a/0x20
      [    9.759415]  [<ffffffff810ed9c1>] each_symbol_section+0x121/0x130
      [    9.759417]  [<ffffffff810eda01>] find_symbol+0x31/0x70
      [    9.759420]  [<ffffffff810ef5bf>] load_module+0x20f/0x2660
      [    9.759422]  [<ffffffff8104ef10>] ? __do_page_fault+0x190/0x4e0
      [    9.759426]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
      [    9.759427]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
      [    9.759433]  [<ffffffff810ae73d>] ? trace_hardirqs_on_caller+0x11d/0x1e0
      [    9.759437]  [<ffffffff812fcc0e>] ? trace_hardirqs_on_thunk+0x3a/0x3f
      [    9.759439]  [<ffffffff815880ec>] ? retint_restore_args+0x13/0x13
      [    9.759441]  [<ffffffff810f1ade>] SyS_init_module+0xce/0x100
      [    9.759443]  [<ffffffff81587429>] system_call_fastpath+0x12/0x17
      [    9.759445] ---[ end trace 9294429076a9c644 ]---
      
      As per the comment this site should be fine, but lets wrap it in
      preempt_disable() anyhow to placate lockdep.
      
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
      926a59b1
  5. 20 5月, 2015 1 次提交
  6. 19 5月, 2015 2 次提交
    • S
      sched: always use blk_schedule_flush_plug in io_schedule_out · 10d784ea
      Shaohua Li 提交于
      block plug callback could sleep, so we introduce a parameter
      'from_schedule' and corresponding drivers can use it to destinguish a
      schedule plug flush or a plug finish. Unfortunately io_schedule_out
      still uses blk_flush_plug(). This causes below output (Note, I added a
      might_sleep() in raid1_unplug to make it trigger faster, but the whole
      thing doesn't matter if I add might_sleep). In raid1/10, this can cause
      deadlock.
      
      This patch makes io_schedule_out always uses blk_schedule_flush_plug.
      This should only impact drivers (as far as I know, raid 1/10) which are
      sensitive to the 'from_schedule' parameter.
      
      [  370.817949] ------------[ cut here ]------------
      [  370.817960] WARNING: CPU: 7 PID: 145 at ../kernel/sched/core.c:7306 __might_sleep+0x7f/0x90()
      [  370.817969] do not call blocking ops when !TASK_RUNNING; state=2 set at [<ffffffff81092fcf>] prepare_to_wait+0x2f/0x90
      [  370.817971] Modules linked in: raid1
      [  370.817976] CPU: 7 PID: 145 Comm: kworker/u16:9 Tainted: G        W       4.0.0+ #361
      [  370.817977] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140709_153802- 04/01/2014
      [  370.817983] Workqueue: writeback bdi_writeback_workfn (flush-9:1)
      [  370.817985]  ffffffff81cd83be ffff8800ba8cb298 ffffffff819dd7af 0000000000000001
      [  370.817988]  ffff8800ba8cb2e8 ffff8800ba8cb2d8 ffffffff81051afc ffff8800ba8cb2c8
      [  370.817990]  ffffffffa00061a8 000000000000041e 0000000000000000 ffff8800ba8cba28
      [  370.817993] Call Trace:
      [  370.817999]  [<ffffffff819dd7af>] dump_stack+0x4f/0x7b
      [  370.818002]  [<ffffffff81051afc>] warn_slowpath_common+0x8c/0xd0
      [  370.818004]  [<ffffffff81051b86>] warn_slowpath_fmt+0x46/0x50
      [  370.818006]  [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90
      [  370.818008]  [<ffffffff81092fcf>] ? prepare_to_wait+0x2f/0x90
      [  370.818010]  [<ffffffff810776ef>] __might_sleep+0x7f/0x90
      [  370.818014]  [<ffffffffa0000c03>] raid1_unplug+0xd3/0x170 [raid1]
      [  370.818024]  [<ffffffff81421d9a>] blk_flush_plug_list+0x8a/0x1e0
      [  370.818028]  [<ffffffff819e3550>] ? bit_wait+0x50/0x50
      [  370.818031]  [<ffffffff819e21b0>] io_schedule_timeout+0x130/0x140
      [  370.818033]  [<ffffffff819e3586>] bit_wait_io+0x36/0x50
      [  370.818034]  [<ffffffff819e31b5>] __wait_on_bit+0x65/0x90
      [  370.818041]  [<ffffffff8125b67c>] ? ext4_read_block_bitmap_nowait+0xbc/0x630
      [  370.818043]  [<ffffffff819e3550>] ? bit_wait+0x50/0x50
      [  370.818045]  [<ffffffff819e3302>] out_of_line_wait_on_bit+0x72/0x80
      [  370.818047]  [<ffffffff810935e0>] ? autoremove_wake_function+0x40/0x40
      [  370.818050]  [<ffffffff811de744>] __wait_on_buffer+0x44/0x50
      [  370.818053]  [<ffffffff8125ae80>] ext4_wait_block_bitmap+0xe0/0xf0
      [  370.818058]  [<ffffffff812975d6>] ext4_mb_init_cache+0x206/0x790
      [  370.818062]  [<ffffffff8114bc6c>] ? lru_cache_add+0x1c/0x50
      [  370.818064]  [<ffffffff81297c7e>] ext4_mb_init_group+0x11e/0x200
      [  370.818066]  [<ffffffff81298231>] ext4_mb_load_buddy+0x341/0x360
      [  370.818068]  [<ffffffff8129a1a3>] ext4_mb_find_by_goal+0x93/0x2f0
      [  370.818070]  [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0
      [  370.818072]  [<ffffffff8129ab67>] ext4_mb_regular_allocator+0x67/0x460
      [  370.818074]  [<ffffffff81295b54>] ? ext4_mb_normalize_request+0x1e4/0x5b0
      [  370.818076]  [<ffffffff8129ca4b>] ext4_mb_new_blocks+0x4cb/0x620
      [  370.818079]  [<ffffffff81290956>] ext4_ext_map_blocks+0x4c6/0x14d0
      [  370.818081]  [<ffffffff812a4d4e>] ? ext4_es_lookup_extent+0x4e/0x290
      [  370.818085]  [<ffffffff8126399d>] ext4_map_blocks+0x14d/0x4f0
      [  370.818088]  [<ffffffff81266fbd>] ext4_writepages+0x76d/0xe50
      [  370.818094]  [<ffffffff81149691>] do_writepages+0x21/0x50
      [  370.818097]  [<ffffffff811d5c00>] __writeback_single_inode+0x60/0x490
      [  370.818099]  [<ffffffff811d630a>] writeback_sb_inodes+0x2da/0x590
      [  370.818103]  [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50
      [  370.818105]  [<ffffffff811abf4b>] ? trylock_super+0x1b/0x50
      [  370.818107]  [<ffffffff811d665f>] __writeback_inodes_wb+0x9f/0xd0
      [  370.818109]  [<ffffffff811d69db>] wb_writeback+0x34b/0x3c0
      [  370.818111]  [<ffffffff811d70df>] bdi_writeback_workfn+0x23f/0x550
      [  370.818116]  [<ffffffff8106bbd8>] process_one_work+0x1c8/0x570
      [  370.818117]  [<ffffffff8106bb5b>] ? process_one_work+0x14b/0x570
      [  370.818119]  [<ffffffff8106c09b>] worker_thread+0x11b/0x470
      [  370.818121]  [<ffffffff8106bf80>] ? process_one_work+0x570/0x570
      [  370.818124]  [<ffffffff81071868>] kthread+0xf8/0x110
      [  370.818126]  [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210
      [  370.818129]  [<ffffffff819e9322>] ret_from_fork+0x42/0x70
      [  370.818131]  [<ffffffff81071770>] ? kthread_create_on_node+0x210/0x210
      [  370.818132] ---[ end trace 7b4deb71e68b6605 ]---
      
      V2: don't change ->in_iowait
      
      Cc: NeilBrown <neilb@suse.de>
      Signed-off-by: NShaohua Li <shli@fb.com>
      Reviewed-by: NJeff Moyer <jmoyer@redhat.com>
      Signed-off-by: NJens Axboe <axboe@fb.com>
      10d784ea
    • P
      watchdog: Fix merge 'conflict' · ab992dc3
      Peter Zijlstra 提交于
      Two watchdog changes that came through different trees had a non
      conflicting conflict, that is, one changed the semantics of a variable
      but no actual code conflict happened. So the merge appeared fine, but
      the resulting code did not behave as expected.
      
      Commit 195daf66 ("watchdog: enable the new user interface of the
      watchdog mechanism") changes the semantics of watchdog_user_enabled,
      which thereafter is only used by the functions introduced by
      b3738d29 ("watchdog: Add watchdog enable/disable all functions").
      
      There further appears to be a distinct lack of serialization between
      setting and using watchdog_enabled, so perhaps we should wrap the
      {en,dis}able_all() things in watchdog_proc_mutex.
      
      This patch fixes a s2r failure reported by Michal; which I cannot
      readily explain. But this does make the code internally consistent
      again.
      Reported-and-tested-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ab992dc3
  7. 13 5月, 2015 1 次提交
  8. 08 5月, 2015 3 次提交
  9. 07 5月, 2015 1 次提交
  10. 01 5月, 2015 1 次提交
  11. 29 4月, 2015 1 次提交
  12. 28 4月, 2015 1 次提交
  13. 27 4月, 2015 1 次提交
  14. 25 4月, 2015 2 次提交
  15. 23 4月, 2015 1 次提交
    • M
      kexec: allocate the kexec control page with KEXEC_CONTROL_MEMORY_GFP · 7e01b5ac
      Martin Schwidefsky 提交于
      Introduce KEXEC_CONTROL_MEMORY_GFP to allow the architecture code
      to override the gfp flags of the allocation for the kexec control
      page. The loop in kimage_alloc_normal_control_pages allocates pages
      with GFP_KERNEL until a page is found that happens to have an
      address smaller than the KEXEC_CONTROL_MEMORY_LIMIT. On systems
      with a large memory size but a small KEXEC_CONTROL_MEMORY_LIMIT
      the loop will keep allocating memory until the oom killer steps in.
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      7e01b5ac
  16. 20 4月, 2015 1 次提交
  17. 17 4月, 2015 5 次提交
    • S
      tracing: Fix possible out of bounds memory access when parsing enums · 3193899d
      Steven Rostedt (Red Hat) 提交于
      The code that replaces the enum names with the enum values in the
      tracepoints' format files could possible miss the end of string nul
      character. This was caused by processing things like backslashes, quotes
      and other tokens. After processing the tokens, a check for the nul
      character needed to be done before continuing the loop, because the loop
      incremented the pointer before doing the check, which could bypass the nul
      character.
      
      Link: http://lkml.kernel.org/r/552E661D.5060502@oracle.com
      
      Reported-by: Sasha Levin <sasha.levin@oracle.com> # via KASan
      Tested-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Fixes: 0c564a53 "tracing: Add TRACE_DEFINE_ENUM() macro to map enums to their values"
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      3193899d
    • D
      oprofile: reduce mmap_sem hold for mm->exe_file · 11163348
      Davidlohr Bueso 提交于
      sync_buffer() needs the mmap_sem for two distinct operations, both only
      occurring upon user context switch handling:
      
       1) Dealing with the exe_file.
      
       2) Adding the dcookie data as we need to lookup the vma that
         backs it. This is done via add_sample() and add_data().
      
      This patch isolates 1), for it will no longer need the mmap_sem for
      serialization.  However, for now, make of the more standard
      get_mm_exe_file(), requiring only holding the mmap_sem to read the value,
      and relying on reference counting to make sure that the exe file won't
      dissappear underneath us while doing the get dcookie.
      
      As a consequence, for 2) we move the mmap_sem locking into where we really
      need it, in lookup_dcookie().  The benefits are twofold: reduce mmap_sem
      hold times, and cleaner code.
      
      [akpm@linux-foundation.org: export get_mm_exe_file for arch/x86/oprofile/oprofile.ko]
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Cc: Robert Richter <rric@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      11163348
    • A
      gcov: fix softlockups · 9d796e66
      Andrey Ryabinin 提交于
      gcov profiling if enabled with other heavy compile-time instrumentation
      like KASan could trigger following softlockups:
      
        NMI watchdog: BUG: soft lockup - CPU#0 stuck for 22s! [swapper/0:1]
        Modules linked in:
        irq event stamp: 22823276
        hardirqs last  enabled at (22823275): [<ffffffff86e8d10d>] mutex_lock_nested+0x7d9/0x930
        hardirqs last disabled at (22823276): [<ffffffff86e9521d>] apic_timer_interrupt+0x6d/0x80
        softirqs last  enabled at (22823172): [<ffffffff811ed969>] __do_softirq+0x4db/0x729
        softirqs last disabled at (22823167): [<ffffffff811edfcf>] irq_exit+0x7d/0x15b
        CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       3.19.0-05245-gbb33326-dirty #3
        Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.7.5.1-0-g8936dbb-20141113_115728-nilsson.home.kraxel.org 04/01/2014
        task: ffff88006cba8000 ti: ffff88006cbb0000 task.ti: ffff88006cbb0000
        RIP: kasan_mem_to_shadow+0x1e/0x1f
        Call Trace:
          strcmp+0x28/0x70
          get_node_by_name+0x66/0x99
          gcov_event+0x4f/0x69e
          gcov_enable_events+0x54/0x7b
          gcov_fs_init+0xf8/0x134
          do_one_initcall+0x1b2/0x288
          kernel_init_freeable+0x467/0x580
          kernel_init+0x15/0x18b
          ret_from_fork+0x7c/0xb0
        Kernel panic - not syncing: softlockup: hung tasks
      
      Fix this by sticking cond_resched() in gcov_enable_events().
      Signed-off-by: NAndrey Ryabinin <a.ryabinin@samsung.com>
      Reported-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: Peter Oberparleiter <oberpar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9d796e66
    • H
      kernel/sysctl.c: detect overflows when converting to int · 230633d1
      Heinrich Schuchardt 提交于
      When converting unsigned long to int overflows may occur.  These currently
      are not detected when writing to the sysctl file system.
      
      E.g. on a system where int has 32 bits and long has 64 bits
        echo 0x800001234 > /proc/sys/kernel/threads-max
      has the same effect as
        echo 0x1234 > /proc/sys/kernel/threads-max
      
      The patch adds the missing check in do_proc_dointvec_conv.
      
      With the patch an overflow will result in an error EINVAL when writing to
      the the sysctl file system.
      Signed-off-by: NHeinrich Schuchardt <xypron.glpk@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      230633d1
    • D
      prctl: avoid using mmap_sem for exe_file serialization · 6e399cd1
      Davidlohr Bueso 提交于
      Oleg cleverly suggested using xchg() to set the new mm->exe_file instead
      of calling set_mm_exe_file() which requires some form of serialization --
      mmap_sem in this case.  For archs that do not have atomic rmw instructions
      we still fallback to a spinlock alternative, so this should always be
      safe.  As such, we only need the mmap_sem for looking up the backing
      vm_file, which can be done sharing the lock.  Naturally, this means we
      need to manually deal with both the new and old file reference counting,
      and we need not worry about the MMF_EXE_FILE_CHANGED bits, which can
      probably be deleted in the future anyway.
      Signed-off-by: NDavidlohr Bueso <dbueso@suse.de>
      Suggested-by: NOleg Nesterov <oleg@redhat.com>
      Acked-by: NOleg Nesterov <oleg@redhat.com>
      Reviewed-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6e399cd1