1. 04 6月, 2014 2 次提交
  2. 02 6月, 2014 1 次提交
  3. 31 5月, 2014 1 次提交
    • M
      x86_64: expand kernel stack to 16K · 6538b8ea
      Minchan Kim 提交于
      While I play inhouse patches with much memory pressure on qemu-kvm,
      3.14 kernel was randomly crashed. The reason was kernel stack overflow.
      
      When I investigated the problem, the callstack was a little bit deeper
      by involve with reclaim functions but not direct reclaim path.
      
      I tried to diet stack size of some functions related with alloc/reclaim
      so did a hundred of byte but overflow was't disappeard so that I encounter
      overflow by another deeper callstack on reclaim/allocator path.
      
      Of course, we might sweep every sites we have found for reducing
      stack usage but I'm not sure how long it saves the world(surely,
      lots of developer start to add nice features which will use stack
      agains) and if we consider another more complex feature in I/O layer
      and/or reclaim path, it might be better to increase stack size(
      meanwhile, stack usage on 64bit machine was doubled compared to 32bit
      while it have sticked to 8K. Hmm, it's not a fair to me and arm64
      already expaned to 16K. )
      
      So, my stupid idea is just let's expand stack size and keep an eye
      toward stack consumption on each kernel functions via stacktrace of ftrace.
      For example, we can have a bar like that each funcion shouldn't exceed 200K
      and emit the warning when some function consumes more in runtime.
      Of course, it could make false positive but at least, it could make a
      chance to think over it.
      
      I guess this topic was discussed several time so there might be
      strong reason not to increase kernel stack size on x86_64, for me not
      knowing so Ccing x86_64 maintainers, other MM guys and virtio
      maintainers.
      
      Here's an example call trace using up the kernel stack:
      
               Depth    Size   Location    (51 entries)
               -----    ----   --------
         0)     7696      16   lookup_address
         1)     7680      16   _lookup_address_cpa.isra.3
         2)     7664      24   __change_page_attr_set_clr
         3)     7640     392   kernel_map_pages
         4)     7248     256   get_page_from_freelist
         5)     6992     352   __alloc_pages_nodemask
         6)     6640       8   alloc_pages_current
         7)     6632     168   new_slab
         8)     6464       8   __slab_alloc
         9)     6456      80   __kmalloc
        10)     6376     376   vring_add_indirect
        11)     6000     144   virtqueue_add_sgs
        12)     5856     288   __virtblk_add_req
        13)     5568      96   virtio_queue_rq
        14)     5472     128   __blk_mq_run_hw_queue
        15)     5344      16   blk_mq_run_hw_queue
        16)     5328      96   blk_mq_insert_requests
        17)     5232     112   blk_mq_flush_plug_list
        18)     5120     112   blk_flush_plug_list
        19)     5008      64   io_schedule_timeout
        20)     4944     128   mempool_alloc
        21)     4816      96   bio_alloc_bioset
        22)     4720      48   get_swap_bio
        23)     4672     160   __swap_writepage
        24)     4512      32   swap_writepage
        25)     4480     320   shrink_page_list
        26)     4160     208   shrink_inactive_list
        27)     3952     304   shrink_lruvec
        28)     3648      80   shrink_zone
        29)     3568     128   do_try_to_free_pages
        30)     3440     208   try_to_free_pages
        31)     3232     352   __alloc_pages_nodemask
        32)     2880       8   alloc_pages_current
        33)     2872     200   __page_cache_alloc
        34)     2672      80   find_or_create_page
        35)     2592      80   ext4_mb_load_buddy
        36)     2512     176   ext4_mb_regular_allocator
        37)     2336     128   ext4_mb_new_blocks
        38)     2208     256   ext4_ext_map_blocks
        39)     1952     160   ext4_map_blocks
        40)     1792     384   ext4_writepages
        41)     1408      16   do_writepages
        42)     1392      96   __writeback_single_inode
        43)     1296     176   writeback_sb_inodes
        44)     1120      80   __writeback_inodes_wb
        45)     1040     160   wb_writeback
        46)      880     208   bdi_writeback_workfn
        47)      672     144   process_one_work
        48)      528     112   worker_thread
        49)      416     240   kthread
        50)      176     176   ret_from_fork
      
      [ Note: the problem is exacerbated by certain gcc versions that seem to
        generate much bigger stack frames due to apparently bad coalescing of
        temporaries and generating too many spills.  Rusty saw gcc-4.6.4 using
        35% more stack on the virtio path than 4.8.2 does, for example.
      
        Minchan not only uses such a bad gcc version (4.6.3 in his case), but
        some of the stack use is due to debugging (CONFIG_DEBUG_PAGEALLOC is
        what causes that kernel_map_pages() frame, for example). But we're
        clearly getting too close.
      
        The VM code also seems to have excessive stack frames partly for the
        same compiler reason, triggered by excessive inlining and lots of
        function arguments.
      
        We need to improve on our stack use, but in the meantime let's do this
        simple stack increase too.  Unlike most earlier reports, there is
        nothing simple that stands out as being really horribly wrong here,
        apart from the fact that the stack frames are just bigger than they
        should need to be.        - Linus ]
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Michael S Tsirkin <mst@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: PJ Waskiewicz <pjwaskiewicz@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6538b8ea
  4. 29 5月, 2014 1 次提交
    • W
      arm64: mm: fix pmd_write CoW brokenness · ceb21835
      Will Deacon 提交于
      Commit 9c7e535f ("arm64: mm: Route pmd thp functions through pte
      equivalents") changed the pmd manipulator and accessor functions to
      convert the target pmd to a pte, process it with the pte functions, then
      convert it back. Along the way, we gained support for PTE_WRITE, however
      this is completely ignored by set_pmd_at, and so we fail to set the
      PMD_SECT_RDONLY for PMDs, resulting in all sorts of lovely failures (like
      CoW not working).
      
      Partially reverting the offending commit (by making use of
      PMD_SECT_RDONLY explicitly for pmd_{write,wrprotect,mkwrite} functions)
      leads to further issues because pmd_write can then return potentially
      incorrect values for page table entries marked as RDONLY, leading to
      BUG_ON(pmd_write(entry)) tripping under some THP workloads.
      
      This patch fixes the issue by routing set_pmd_at through set_pte_at,
      which correctly takes the PTE_WRITE flag into account. Given that
      THP mappings are always anonymous, the additional cache-flushing code
      in __sync_icache_dcache won't impose any significant overhead as the
      flush will be skipped.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: NSteve Capper <steve.capper@arm.com>
      Tested-by: NMarc Zyngier <marc.zyngier@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      ceb21835
  5. 28 5月, 2014 7 次提交
    • N
      ARM: 8063/1: bL_switcher: fix individual online status reporting of removed CPUs · 3f8517e7
      Nicolas Pitre 提交于
      The content of /sys/devices/system/cpu/cpu*/online  is still 1 for those
      CPUs that the switcher has removed even though the global state in
      /sys/devices/system/cpu/online is updated correctly.
      
      It turns out that commit 0902a904 ("Driver core: Use generic
      offline/online for CPU offline/online") has changed the way those files
      retrieve their content by relying on on the generic attribute handling
      code.  The switcher, by calling cpu_down() directly, bypasses this
      handling and the attribute value doesn't get updated.
      
      Fix this by calling device_offline()/device_online() instead.
      Signed-off-by: NNicolas Pitre <nico@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      3f8517e7
    • T
      MIPS: R46000: Fix Micro-assembler field overflow for R4600 V2 · f3f0d951
      Thomas Bogendoerfer 提交于
      Fix uasm warning, which triggered because of workaround for R4600 V2 CPUs.
      Signed-off-by: NThomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/6716/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      f3f0d951
    • A
      MIPS: ptrace: Avoid smp_processor_id() in preemptible code · 57c7ea51
      Alex Smith 提交于
      ptrace_{get,set}_watch_regs access current_cpu_data to get the watch
      register count/masks, which calls smp_processor_id(). However they are
      run in preemptible context and therefore trigger warnings like so:
      
      [ 6340.092000] BUG: using smp_processor_id() in preemptible [00000000] code: gdb/367
      [ 6340.092000] caller is ptrace_get_watch_regs+0x44/0x220
      
      Since the watch register count/masks should be the same across all
      CPUs, use boot_cpu_data instead. Note that this may need to change in
      future should a heterogenous system be supported where the count/masks
      are not the same across all CPUs (the current code is also incorrect
      for this scenario - current_cpu_data here would not necessarily be
      correct for the CPU that the target task will execute on).
      Signed-off-by: NAlex Smith <alex.smith@imgtec.com>
      Reviewed-by: NPaul Burton <paul.burton@imgtec.com>
      Cc: linux-mips@linux-mips.org
      Patchwork: https://patchwork.linux-mips.org/patch/6879/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      57c7ea51
    • S
      MIPS: Lemote 2F: cs5536: mfgpt: use raw locks · f02ffb19
      Sebastian Andrzej Siewior 提交于
      The lock is taken in the raw irq path and therefore a rawlock should be
      used instead of a normal spinlock.
      While here I drop the export symbol on that variable since there are no
      other users.
      Signed-off-by: NSebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: linux-mips@linux-mips.org
      Cc: Hua Yan <yanh@lemote.com>
      Cc: Huacai Chen <chenhc@lemote.com>
      Cc: Alex Smith <alex.smith@imgtec.com>
      Cc: Hongliang Tao <taohl@lemote.com>
      Cc: Wu Zhangjin <wuzhangjin@gmail.com>
      Patchwork: https://patchwork.linux-mips.org/patch/6936/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      f02ffb19
    • R
      MIPS: SB1: Fix excessive kernel warnings. · bb6c0bd3
      Ralf Baechle 提交于
      A kernel build with binutils 2.24 is going to emit warnings like
      
        CC      kernel/sys.o
      {standard input}: Assembler messages:
      {standard input}:701: Warning: the 32-bit MIPS architecture does not support the `mdmx' extension
      {standard input}:701: Warning: the `mdmx' extension requires 64-bit FPRs
      {standard input}:701: Warning: the `mips3d' extension requires MIPS32 revision 2 or greater
      {standard input}:701: Warning: the `mips3d' extension requires 64-bit FPRs
      
      for almost every file.  This is caused by changes to gas' interpretation
      of .set semantics.  Fixed by explicitly disabling MIPS3D and MDMX for
      Sibyte builds.
      Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      bb6c0bd3
    • S
      powerpc, kexec: Fix "Processor X is stuck" issue during kexec from ST mode · 011e4b02
      Srivatsa S. Bhat 提交于
      If we try to perform a kexec when the machine is in ST (Single-Threaded) mode
      (ppc64_cpu --smt=off), the kexec operation doesn't succeed properly, and we
      get the following messages during boot:
      
      [    0.089866] POWER8 performance monitor hardware support registered
      [    0.089985] power8-pmu: PMAO restore workaround active.
      [    5.095419] Processor 1 is stuck.
      [   10.097933] Processor 2 is stuck.
      [   15.100480] Processor 3 is stuck.
      [   20.102982] Processor 4 is stuck.
      [   25.105489] Processor 5 is stuck.
      [   30.108005] Processor 6 is stuck.
      [   35.110518] Processor 7 is stuck.
      [   40.113369] Processor 9 is stuck.
      [   45.115879] Processor 10 is stuck.
      [   50.118389] Processor 11 is stuck.
      [   55.120904] Processor 12 is stuck.
      [   60.123425] Processor 13 is stuck.
      [   65.125970] Processor 14 is stuck.
      [   70.128495] Processor 15 is stuck.
      [   75.131316] Processor 17 is stuck.
      
      Note that only the sibling threads are stuck, while the primary threads (0, 8,
      16 etc) boot just fine. Looking closer at the previous step of kexec, we observe
      that kexec tries to wakeup (bring online) the sibling threads of all the cores,
      before performing kexec:
      
      [ 9464.131231] Starting new kernel
      [ 9464.148507] kexec: Waking offline cpu 1.
      [ 9464.148552] kexec: Waking offline cpu 2.
      [ 9464.148600] kexec: Waking offline cpu 3.
      [ 9464.148636] kexec: Waking offline cpu 4.
      [ 9464.148671] kexec: Waking offline cpu 5.
      [ 9464.148708] kexec: Waking offline cpu 6.
      [ 9464.148743] kexec: Waking offline cpu 7.
      [ 9464.148779] kexec: Waking offline cpu 9.
      [ 9464.148815] kexec: Waking offline cpu 10.
      [ 9464.148851] kexec: Waking offline cpu 11.
      [ 9464.148887] kexec: Waking offline cpu 12.
      [ 9464.148922] kexec: Waking offline cpu 13.
      [ 9464.148958] kexec: Waking offline cpu 14.
      [ 9464.148994] kexec: Waking offline cpu 15.
      [ 9464.149030] kexec: Waking offline cpu 17.
      
      Instrumenting this piece of code revealed that the cpu_up() operation actually
      fails with -EBUSY. Thus, only the primary threads of all the cores are online
      during kexec, and hence this is a sure-shot receipe for disaster, as explained
      in commit e8e5c215 (powerpc/kexec: Fix orphaned offline CPUs across kexec),
      as well as in the comment above wake_offline_cpus().
      
      It turns out that cpu_up() was returning -EBUSY because the variable
      'cpu_hotplug_disabled' was set to 1; and this disabling of CPU hotplug was done
      by migrate_to_reboot_cpu() inside kernel_kexec().
      
      Now, migrate_to_reboot_cpu() was originally written with the assumption that
      any further code will not need to perform CPU hotplug, since we are anyway in
      the reboot path. However, kexec is clearly not such a case, since we depend on
      onlining CPUs, atleast on powerpc.
      
      So re-enable cpu-hotplug after returning from migrate_to_reboot_cpu() in the
      kexec path, to fix this regression in kexec on powerpc.
      
      Also, wrap the cpu_up() in powerpc kexec code within a WARN_ON(), so that we
      can catch such issues more easily in the future.
      
      Fixes: c97102ba (kexec: migrate to reboot cpu)
      Cc: stable@vger.kernel.org
      Signed-off-by: NSrivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      011e4b02
    • G
      powerpc: Fix 64 bit builds with binutils 2.24 · 7998eb3d
      Guenter Roeck 提交于
      With binutils 2.24, various 64 bit builds fail with relocation errors
      such as
      
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x165ee): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_base_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      arch/powerpc/kernel/built-in.o: In function `exc_debug_crit_book3e':
      	(.text+0x16602): relocation truncated to fit: R_PPC64_ADDR16_HI
      	against symbol `interrupt_end_book3e' defined in .text section
      	in arch/powerpc/kernel/built-in.o
      
      The assembler maintainer says:
      
       I changed the ABI, something that had to be done but unfortunately
       happens to break the booke kernel code.  When building up a 64-bit
       value with lis, ori, shl, oris, ori or similar sequences, you now
       should use @high and @higha in place of @h and @ha.  @h and @ha
       (and their associated relocs R_PPC64_ADDR16_HI and R_PPC64_ADDR16_HA)
       now report overflow if the value is out of 32-bit signed range.
       ie. @h and @ha assume you're building a 32-bit value. This is needed
       to report out-of-range -mcmodel=medium toc pointer offsets in @toc@h
       and @toc@ha expressions, and for consistency I did the same for all
       other @h and @ha relocs.
      
      Replacing @h with @high in one strategic location fixes the relocation
      errors. This has to be done conditionally since the assembler either
      supports @h or @high but not both.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      7998eb3d
  6. 26 5月, 2014 5 次提交
  7. 25 5月, 2014 3 次提交
  8. 24 5月, 2014 2 次提交
  9. 23 5月, 2014 2 次提交
  10. 22 5月, 2014 3 次提交
  11. 21 5月, 2014 1 次提交
  12. 20 5月, 2014 7 次提交
  13. 18 5月, 2014 1 次提交
  14. 17 5月, 2014 3 次提交
    • T
      ARM: OMAP2+: Fix DMA hang after off-idle · 9ce2482f
      Tony Lindgren 提交于
      Commit 6ddeb6d8 (dmaengine: omap-dma: move IRQ handling to omap-dma)
      added support for handling interrupts in the omap dmaengine driver
      instead of the legacy driver. Because of different handling for
      interrupts this however caused omap3 to hang eventually after hitting
      off-idle.
      
      Any of the virtual 32 DMA channels can be assigned to any of the
      four DMA interrupts. So commit 6ddeb6d8 made the omap dmaengine
      driver to use the second DMA interrupt while keeping the legacy code
      still using the first DMA interrupt.
      
      This means we need to save and restore both IRQENABLE_L1 in addition
      to IRQENABLE_L0. As there is a chance that the DSP might be using
      IRQENABLE_L2 or IRQENABLE_L3 lines, let's not touch those until
      this has been confirmed. Let's just add a comment to the code for
      now.
      
      Fixes: 6ddeb6d8 (dmaengine: omap-dma: move IRQ handling to omap-dma)
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      9ce2482f
    • R
      ARM: OMAP2+: nand: Fix NAND on OMAP2 and OMAP3 boards · 5005e0b7
      Roger Quadros 提交于
      Commit c66d0391 broke NAND for non-DT boot on all OMAP2 and OMAP3
      boards using board_nand_init(). Following error is seen at boot
      
      [    0.154998]  (null): Unsupported NAND ECC scheme selected
      
      For OMAP2 and OMAP3 platforms, the ecc_opt parameter in platform data
      must be set to OMAP_ECC_HAM1_CODE_HW to work properly.
      
      Tested on omap3-beagle c4.
      
      Fixes: c66d0391 (mtd: nand: omap: combine different flavours of 1-bit hamming ecc schemes)
      Cc: stable@vger.kernel.org # v3.12+
      Signed-off-by: NRoger Quadros <rogerq@ti.com>
      Signed-off-by: NTony Lindgren <tony@atomide.com>
      5005e0b7
    • M
      arm64: fix pud_huge() for 2-level pagetables · 4797ec2d
      Mark Salter 提交于
      The following happens when trying to run a kvm guest on a kernel
      configured for 64k pages. This doesn't happen with 4k pages:
      
        BUG: failure at include/linux/mm.h:297/put_page_testzero()!
        Kernel panic - not syncing: BUG!
        CPU: 2 PID: 4228 Comm: qemu-system-aar Tainted: GF            3.13.0-0.rc7.31.sa2.k32v1.aarch64.debug #1
        Call trace:
        [<fffffe0000096034>] dump_backtrace+0x0/0x16c
        [<fffffe00000961b4>] show_stack+0x14/0x1c
        [<fffffe000066e648>] dump_stack+0x84/0xb0
        [<fffffe0000668678>] panic+0xf4/0x220
        [<fffffe000018ec78>] free_reserved_area+0x0/0x110
        [<fffffe000018edd8>] free_pages+0x50/0x88
        [<fffffe00000a759c>] kvm_free_stage2_pgd+0x30/0x40
        [<fffffe00000a5354>] kvm_arch_destroy_vm+0x18/0x44
        [<fffffe00000a1854>] kvm_put_kvm+0xf0/0x184
        [<fffffe00000a1938>] kvm_vm_release+0x10/0x1c
        [<fffffe00001edc1c>] __fput+0xb0/0x288
        [<fffffe00001ede4c>] ____fput+0xc/0x14
        [<fffffe00000d5a2c>] task_work_run+0xa8/0x11c
        [<fffffe0000095c14>] do_notify_resume+0x54/0x58
      
      In arch/arm/kvm/mmu.c:unmap_range(), we end up doing an extra put_page()
      on the stage2 pgd which leads to the BUG in put_page_testzero(). This
      happens because a pud_huge() test in unmap_range() returns true when it
      should always be false with 2-level pages tables used by 64k pages.
      This patch removes support for huge puds if 2-level pagetables are
      being used.
      Signed-off-by: NMark Salter <msalter@redhat.com>
      [catalin.marinas@arm.com: removed #ifndef around PUD_SIZE check]
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: <stable@vger.kernel.org> # v3.11+
      4797ec2d
  15. 16 5月, 2014 1 次提交