1. 02 Jun, 2014 · 1 commit
  2. 31 May, 2014 · 1 commit
    • x86_64: expand kernel stack to 16K · 6538b8ea
      Committed by Minchan Kim
      While I was testing in-house patches under heavy memory pressure on
      qemu-kvm, the 3.14 kernel crashed randomly. The cause was a kernel
      stack overflow.

      When I investigated the problem, the call stack was a little deeper
      because reclaim functions were involved, though not through the direct
      reclaim path.

      I tried to trim the stack usage of some functions related to
      alloc/reclaim and saved about a hundred bytes, but the overflow didn't
      disappear: I hit it again through another, deeper call stack on the
      reclaim/allocator path.

      Of course, we could sweep every site we have found to reduce
      stack usage, but I'm not sure how long that would save the world (surely,
      lots of developers will start adding nice features that use the stack
      again), and if we consider other, more complex features in the I/O layer
      and/or reclaim path, it might be better to increase the stack size
      (meanwhile, stack usage on 64-bit machines has doubled compared to 32-bit
      while the stack has stuck at 8K. Hmm, that doesn't seem fair to me, and
      arm64 has already expanded to 16K).

      So, my stupid idea is simply to expand the stack size and keep an eye
      on the stack consumption of each kernel function via ftrace's stack tracer.
      For example, we could set a bar such that each function shouldn't exceed
      200K and emit a warning when some function consumes more at runtime.
      Of course, it could produce false positives, but at least it would give
      us a chance to think it over.

      I guess this topic has been discussed several times, so there might be a
      strong reason not to increase the kernel stack size on x86_64 that I
      don't know about, so I'm Cc'ing the x86_64 maintainers, other MM folks,
      and the virtio maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
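The stack-tracer table quoted in the x86_64 commit above lends itself to the kind of per-function check Minchan proposes. What follows is a minimal sketch, assuming the table has been saved as text in the `Depth / Size / Location` layout shown. The helper names `parse_stack_trace` and `frames_over` and the 300-byte threshold are illustrative choices, not part of the patch, and only the first six entries of the trace are reproduced.

```python
import re

# First six entries of the call trace quoted in the commit message.
SAMPLE = """\
        Depth    Size   Location    (51 entries)
        -----    ----   --------
  0)     7696      16   lookup_address
  1)     7680      16   _lookup_address_cpa.isra.3
  2)     7664      24   __change_page_attr_set_clr
  3)     7640     392   kernel_map_pages
  4)     7248     256   get_page_from_freelist
  5)     6992     352   __alloc_pages_nodemask
"""

# One data row looks like: "<index>)  <depth>  <size>  <function>"
ROW = re.compile(r"^\s*(\d+)\)\s+(\d+)\s+(\d+)\s+(\S+)\s*$")


def parse_stack_trace(text):
    """Return a list of (depth, size, function) tuples, deepest frame first."""
    frames = []
    for line in text.splitlines():
        match = ROW.match(line)
        if match:
            _, depth, size, func = match.groups()
            frames.append((int(depth), int(size), func))
    return frames


def frames_over(frames, limit):
    """Return (function, size) pairs whose frame size exceeds `limit` bytes."""
    return [(func, size) for _, size, func in frames if size > limit]


frames = parse_stack_trace(SAMPLE)
print("deepest stack usage:", frames[0][0], "bytes")
for func, size in frames_over(frames, 300):
    print(f"{func} uses {size} bytes")
```

Each row's depth minus its size equals the depth of the next row, so the depth of entry 0 is the total stack in use at the deepest point of the trace.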
  3. 29 May, 2014 · 1 commit
    • arm64: mm: fix pmd_write CoW brokenness · ceb21835
      Committed by Will Deacon
      Commit 9c7e535f ("arm64: mm: Route pmd thp functions through pte
      equivalents") changed the pmd manipulator and accessor functions to
      convert the target pmd to a pte, process it with the pte functions, then
      convert it back. Along the way, we gained support for PTE_WRITE, however
      this is completely ignored by set_pmd_at, and so we fail to set the
      PMD_SECT_RDONLY for PMDs, resulting in all sorts of lovely failures (like
      CoW not working).
      
      Partially reverting the offending commit (by making use of
      PMD_SECT_RDONLY explicitly for pmd_{write,wrprotect,mkwrite} functions)
      leads to further issues because pmd_write can then return potentially
      incorrect values for page table entries marked as RDONLY, leading to
      BUG_ON(pmd_write(entry)) tripping under some THP workloads.
      
      This patch fixes the issue by routing set_pmd_at through set_pte_at,
      which correctly takes the PTE_WRITE flag into account. Given that
      THP mappings are always anonymous, the additional cache-flushing code
      in __sync_icache_dcache won't impose any significant overhead as the
      flush will be skipped.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Steve Capper <steve.capper@arm.com>
      Tested-by: Marc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
  4. 28 May, 2014 · 15 commits
    • ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs · 3f8517e7
      Committed by Nicolas Pitre
      The content of /sys/devices/system/cpu/cpu*/online is still 1 for those
      CPUs that the switcher has removed even though the global state in
      /sys/devices/system/cpu/online is updated correctly.
      
      It turns out that commit 0902a904 ("Driver core: Use generic
      offline/online for CPU offline/online") has changed the way those files
      retrieve their content by relying on the generic attribute handling
      code.  The switcher, by calling cpu_down() directly, bypasses this
      handling and the attribute value doesn't get updated.
      
      Fix this by calling device_offline()/device_online() instead.
      Signed-off-by: Nicolas Pitre <nico@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • MIPS: R46000: Fix Micro-assembler field overflow for R4600 V2 · f3f0d951
      Committed by Thomas Bogendoerfer
      Fix uasm warning, which triggered because of workaround for R4600 V2 CPUs.
      Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/6716/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: ptrace: Avoid smp_processor_id() in preemptible code · 57c7ea51
      Committed by Alex Smith
      ptrace_{get,set}_watch_regs access current_cpu_data to get the watch
      register count/masks, which calls smp_processor_id(). However they are
      run in preemptible context and therefore trigger warnings like so:
      
      [ 6340.092000] BUG: using smp_processor_id() in preemptible [00000000] code: gdb/367
      [ 6340.092000] caller is ptrace_get_watch_regs+0x44/0x220
      
      Since the watch register count/masks should be the same across all
      CPUs, use boot_cpu_data instead. Note that this may need to change in
      future should a heterogeneous system be supported where the count/masks
      are not the same across all CPUs (the current code is also incorrect
      for this scenario - current_cpu_data here would not necessarily be
      correct for the CPU that the target task will execute on).
      Signed-off-by: Alex Smith <alex.smith@imgtec.com>
      Reviewed-by: Paul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/6879/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • MIPS: Lemote 2F: cs5536: mfgpt: use raw locks · f02ffb19
      Committed by Sebastian Andrzej Siewior
      The lock is taken in the raw irq path and therefore a rawlock should be
      used instead of a normal spinlock.
      While here I drop the export symbol on that variable since there are no
      other users.
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: Hua Yan <yanh@lemote.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Hongliang Tao <taohl@lemote.com>
      Cc: Wu Zhangjin <wuzhangjin@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/6936/
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • m68k/hp300: Convert printk to pr_foo() · e8d6dc5a
      Committed by Fabian Frederick
      This patch also fixes some checkpatch warnings
      
      This is untested
      
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k/apollo: Convert printk to pr_foo() · ce00aa0a
      Committed by Fabian Frederick
      no level printk converted to pr_info
      
      This is untested
      
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k/amiga: Convert printk(foo to pr_foo() · f296401b
      Committed by Fabian Frederick
      - no level printk converted to pr_warn/pr_info
      - fixed a small indentation problem
      
      This is untested
      
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k: Increase initial mapping to 8 or 16 MiB if possible · 486df8bc
      Committed by Andreas Schwab
      If the first memory chunk is at least 8 or 16 MiB in size, increase the
      initial mapping to 8 or 16 MiB, respectively, instead of 4 MiB.
      This makes it possible to
        1. Map more memory in the first node without running out of space for the
           page tables,
        2. Boot kernels that don't fit in 4 MiB (e.g. multi_defconfig).
      Signed-off-by: Andreas Schwab <schwab@linux-m68k.org>
      
        - Add support for 8 MiB,
        - Store initial mapping size in head.S for later reuse,
        - Add comment about large kernels.
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
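The size selection described in the m68k initial-mapping commit above can be sketched in a few lines. This is a user-space illustration only: the real logic lives in m68k assembly (head.S), and the function name `initial_mapping_size` is hypothetical.

```python
MIB = 1024 * 1024


def initial_mapping_size(first_chunk_bytes):
    """Pick the initial mapping size from the size of the first memory chunk."""
    # Prefer the largest mapping that the first chunk can fully back.
    for candidate in (16 * MIB, 8 * MIB):
        if first_chunk_bytes >= candidate:
            return candidate
    return 4 * MIB  # the previous fixed size remains the fallback


for chunk_mib in (6, 12, 64):
    size = initial_mapping_size(chunk_mib * MIB)
    print(f"{chunk_mib} MiB first chunk -> {size // MIB} MiB initial mapping")
```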
    • G
      44074e89
    • m68k/atari: fix SCC initialization for debug console · 83adc181
      Committed by Finn Thain
      Fix SCC initialization for Atari as was previously fixed for Mac. It's
      probably not practical to share more code but some attempt is made to
      align the Mac and Atari variants.
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k/mvme16x: Adopt common boot console · c46f46d0
      Committed by Finn Thain
      In a multi-platform kernel binary we only need one early console instance.
      
      The difficulty here is that the common early console is started by
      early_param(), whereas the MVME16x instance is started later by
      config_mvme16x(). That means some interrupt setup must be done earlier.
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Tested-by: Stephen N Chivers <schivers@csc.com.au>
      [Geert] Tag debug_cons_write() with __ref to kill section mismatch warning
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k: Multi-platform EARLY_PRINTK · 7913ad1a
      Committed by Finn Thain
      Make the boot console available to more m68k platforms by leveraging
      the head.S debug console.
      
      The boot console is enabled by the "earlyprintk" command line argument
      which is how most other architectures do this.
      
      This is a change of behaviour for the Mac but does not negatively impact
      the common use-case which is not debugging.
      
      This is also a change of behaviour for other platforms because it means
      the serial port stays quiet when CONFIG_EARLY_PRINTK is not enabled. This
      is also an improvement for the common use-case.
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Tested-by: Stephen N Chivers <schivers@csc.com.au>
      [Geert: CONSOLE_DEBUG should depend on CONFIG_FONT_SUPPORT]
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • MIPS: SB1: Fix excessive kernel warnings. · bb6c0bd3
      Committed by Ralf Baechle
      A kernel build with binutils 2.24 is going to emit warnings like
      
        CC      kernel/sys.o
      {standard input}: Assembler messages:
      {standard input}:701: Warning: the 32-bit MIPS architecture does not support the `mdmx' extension
      {standard input}:701: Warning: the `mdmx' extension requires 64-bit FPRs
      {standard input}:701: Warning: the `mips3d' extension requires MIPS32 revision 2 or greater
      {standard input}:701: Warning: the `mips3d' extension requires 64-bit FPRs
      
      for almost every file.  This is caused by changes to gas' interpretation
      of .set semantics.  Fixed by explicitly disabling MIPS3D and MDMX for
      Sibyte builds.
      Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
    • powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Committed by Srivatsa S. Bhat
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot recipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, at least on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
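The ordering problem in the powerpc kexec commit above is easy to lose in the log output, so here is a toy user-space model of it. It is not kernel code: the class `ToyKernel` and the `reenable_hotplug` switch are hypothetical, and the method names only mirror the functions named in the commit message. The single behaviour modelled is the one the message describes, namely that `cpu_up()` returns -EBUSY while CPU hotplug is disabled.

```python
import errno


class ToyKernel:
    """User-space toy model; names mirror the commit message, nothing more."""

    def __init__(self, online, offline):
        self.online = set(online)
        self.offline = set(offline)
        self.cpu_hotplug_disabled = False

    def migrate_to_reboot_cpu(self):
        # Per the commit message, this step disables CPU hotplug.
        self.cpu_hotplug_disabled = True

    def cpu_hotplug_enable(self):
        self.cpu_hotplug_disabled = False

    def cpu_up(self, cpu):
        # Onlining is refused while hotplug is disabled.
        if self.cpu_hotplug_disabled:
            return -errno.EBUSY
        self.offline.discard(cpu)
        self.online.add(cpu)
        return 0

    def wake_offline_cpus(self):
        """Try to online every offline CPU; return the CPUs that failed."""
        return [cpu for cpu in sorted(self.offline) if self.cpu_up(cpu) != 0]


def kexec(kernel, reenable_hotplug):
    """Model the kexec path with or without re-enabling hotplug first."""
    kernel.migrate_to_reboot_cpu()
    if reenable_hotplug:
        kernel.cpu_hotplug_enable()
    return kernel.wake_offline_cpus()


before = kexec(ToyKernel(online=[0, 8], offline=[1, 2, 3]), reenable_hotplug=False)
after = kexec(ToyKernel(online=[0, 8], offline=[1, 2, 3]), reenable_hotplug=True)
print("stuck without the fix:", before)
print("stuck with the fix:", after)
```

In the model, every sibling thread fails to come online unless hotplug is re-enabled after `migrate_to_reboot_cpu()`, which matches the sequence of "Waking offline cpu" and "Processor N is stuck" messages quoted in the commit.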
    • powerpc: Fix 64 bit builds with binutils 2.24 · 7998eb3d
      Committed by Guenter Roeck
      With binutils 2.24, various 64 bit builds fail with relocation errors
      such as
      
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_base_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_end_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      
      The assembler maintainer says:
      
       I changed the ABI, something that had to be done but unfortunately
       happens to break the booke kernel code.  When building up a 64-bit
       value with lis, ori, shl, oris, ori or similar sequences, you now
       should use @high and @higha in place of @h and @ha.  @h and @ha
       (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
       now report overflow if the value is out of 32-bit signed range.
       ie. @h and @ha assume you're building a 32-bit value. This is needed
       to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
       and @toc@ha expressions, and for consistency I did the same for all
       other @h and @ha relocs.
      
      Replacing @h with @high in one strategic location fixes the relocation
      errors. This has to be done conditionally since the assembler either
      supports @h or @high but not both.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
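The difference between `@h` and `@high` that the assembler maintainer describes in the binutils 2.24 commit above can be illustrated numerically. The sketch below models only what the quoted text states: both operators select bits 16 to 31 of a value, and `@h` now reports overflow when the value lies outside the 32-bit signed range. The function names `h` and `high` and the sample values are illustrative, not taken from the patch.

```python
MASK64 = (1 << 64) - 1


def to_signed64(value):
    """Interpret the low 64 bits of `value` as a two's-complement integer."""
    value &= MASK64
    return value - (1 << 64) if value & (1 << 63) else value


def high(value):
    """@high: bits 16..31 of the value, with no range check."""
    return (value >> 16) & 0xFFFF


def h(value):
    """@h under the new rules: same bits, but only for 32-bit signed values."""
    if not -(1 << 31) <= to_signed64(value) < (1 << 31):
        raise OverflowError(f"{value:#x} does not fit in 32-bit signed range")
    return high(value)


small = 0x12345678
wide = 0xC000000000012345  # illustrative 64-bit address, not from the patch

print(hex(h(small)), hex(high(small)))
print(hex(high(wide)))
try:
    h(wide)
except OverflowError as exc:
    print("overflow:", exc)
```

This is why code that builds a full 64-bit value piece by piece has to switch to `@high`: the bits extracted are the same, but `@h` now rejects operands that are not 32-bit values.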
  5. 27 May, 2014 · 2 commits
    • m68k: Toward platform agnostic framebuffer debug logging · 97f3f68c
      Committed by Finn Thain
      Code subject to #ifdef CONSOLE is made more generic, as was apparently
      intended by the original author.
      
      Remove console_put_stats() routine. If it should be somehow useful, it
      should also be useful on platforms without framebuffer debug logging. The
      present implementation is only built #if defined CONFIG_MAC && defined
      CONSOLE even though puts() works everywhere.
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Tested-by: Stephen N Chivers <schivers@csc.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
    • m68k/atari - stram: alloc ST-RAM pool even if kernel not in ST-RAM · fded332b
      Committed by Michael Schmitz
      With the kernel loaded to FastRAM (TT-RAM), none of the ST-RAM
      address range is mapped by init_mem, and ST-RAM is not accessible
      through the normal allocation pathways as a result.
      
      Implement ST-RAM pool allocation to be based on physical addresses
      always (it already was when the kernel was loaded in ST-RAM).
      Return kernel virtual addresses as per normal.
      
      The current test for the kernel residing in ST-RAM always returns
      true. Use the bootinfo memory chunk order instead - with the kernel
      in FastRAM, ST-RAM (phys. 0x0) is not the first chunk.
      
      In case the kernel is running from FastRAM, delay mapping of ST-RAM
      pool until after mem_init.
      
      Provide helper functions for those users of ST-RAM that need
      to be aware of the backing physical addresses.
      
      Kudos to Geert for his hints on getting this started.
      Signed-off-by: Michael Schmitz <schmitz@debian.org>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
  6. 26 May, 2014 · 5 commits
  7. 25 May, 2014 · 3 commits
  8. 24 May, 2014 · 2 commits
  9. 23 May, 2014 · 2 commits
  10. 22 May, 2014 · 3 commits
  11. 21 May, 2014 · 1 commit
  12. 20 May, 2014 · 4 commits