1. 16 4月, 2016 4 次提交
  2. 15 4月, 2016 7 次提交
    • J
      arm64: mm: Add trace_irqflags annotations to do_debug_exception() · 6afedcd2
      James Morse 提交于
      With CONFIG_PROVE_LOCKING, CONFIG_DEBUG_LOCKDEP and CONFIG_TRACE_IRQFLAGS
      enabled, lockdep will compare current->hardirqs_enabled with the flags from
      local_irq_save().
      
      When a debug exception occurs, interrupts are disabled in entry.S, but
      lockdep isn't told, resulting in:
      DEBUG_LOCKS_WARN_ON(current->hardirqs_enabled)
      ------------[ cut here ]------------
      WARNING: at ../kernel/locking/lockdep.c:3523
      Modules linked in:
      CPU: 3 PID: 1752 Comm: perf Not tainted 4.5.0-rc4+ #2204
      Hardware name: ARM Juno development board (r1) (DT)
      task: ffffffc974868000 ti: ffffffc975f40000 task.ti: ffffffc975f40000
      PC is at check_flags.part.35+0x17c/0x184
      LR is at check_flags.part.35+0x17c/0x184
      pc : [<ffffff80080fc93c>] lr : [<ffffff80080fc93c>] pstate: 600003c5
      [...]
      ---[ end trace 74631f9305ef5020 ]---
      Call trace:
      [<ffffff80080fc93c>] check_flags.part.35+0x17c/0x184
      [<ffffff80080ffe30>] lock_acquire+0xa8/0xc4
      [<ffffff8008093038>] breakpoint_handler+0x118/0x288
      [<ffffff8008082434>] do_debug_exception+0x3c/0xa8
      [<ffffff80080854b4>] el1_dbg+0x18/0x6c
      [<ffffff80081e82f4>] do_filp_open+0x64/0xdc
      [<ffffff80081d6e60>] do_sys_open+0x140/0x204
      [<ffffff80081d6f58>] SyS_openat+0x10/0x18
      [<ffffff8008085d30>] el0_svc_naked+0x24/0x28
      possible reason: unannotated irqs-off.
      irq event stamp: 65857
      hardirqs last  enabled at (65857): [<ffffff80081fb1c0>] lookup_mnt+0xf4/0x1b4
      hardirqs last disabled at (65856): [<ffffff80081fb188>] lookup_mnt+0xbc/0x1b4
      softirqs last  enabled at (65790): [<ffffff80080bdca4>] __do_softirq+0x1f8/0x290
      softirqs last disabled at (65757): [<ffffff80080be038>] irq_exit+0x9c/0xd0
      
      This patch adds the annotations to do_debug_exception(), while trying not
      to call trace_hardirqs_off() if el1_dbg() interrupted a task that already
      had irqs disabled.
      Signed-off-by: NJames Morse <james.morse@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      6afedcd2
    • A
      arm64: hw-breakpoint: Remove superfluous SMP function call · 4bc49274
      Anna-Maria Gleixner 提交于
      Since commit 1cf4f629 ("cpu/hotplug: Move online calls to
      hotplugged cpu") it is ensured that callbacks of CPU_ONLINE and
      CPU_DOWN_PREPARE are processed on the hotplugged CPU. Due to this SMP
      function calls are no longer required.
      
      Replace smp_call_function_single() with a direct call of
      hw_breakpoint_reset(). To keep the calling convention, interrupts are
      explicitly disabled around the call.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      4bc49274
    • A
      arm64/debug: Remove superfluous SMP function call · 499c8150
      Anna-Maria Gleixner 提交于
      Since commit 1cf4f629 ("cpu/hotplug: Move online calls to
      hotplugged cpu") it is ensured that callbacks of CPU_ONLINE and
      CPU_DOWN_PREPARE are processed on the hotplugged CPU. Due to this SMP
      function calls are no longer required.
      
      Replace smp_call_function_single() with a direct call to
      clear_os_lock(). The function writes the OSLAR register to clear OS
      locking. This does not require to be called with interrupts disabled,
      therefore the smp_call_function_single() calling convention is not
      preserved.
      
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: NAnna-Maria Gleixner <anna-maria@linutronix.de>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      499c8150
    • A
      arm64: simplify kernel segment mapping granularity · 97740051
      Ard Biesheuvel 提交于
      The mapping of the kernel consist of four segments, each of which is mapped
      with different permission attributes and/or lifetimes. To optimize the TLB
      and translation table footprint, we define various opaque constants in the
      linker script that resolve to different aligment values depending on the
      page size and whether CONFIG_DEBUG_ALIGN_RODATA is set.
      
      Considering that
      - a 4 KB granule kernel benefits from a 64 KB segment alignment (due to
        the fact that it allows the use of the contiguous bit),
      - the minimum alignment of the .data segment is THREAD_SIZE already, not
        PAGE_SIZE (i.e., we already have padding between _data and the start of
        the .data payload in many cases),
      - 2 MB is a suitable alignment value on all granule sizes, either for
        mapping directly (level 2 on 4 KB), or via the contiguous bit (level 3 on
        16 KB and 64 KB),
      - anything beyond 2 MB exceeds the minimum alignment mandated by the boot
        protocol, and can only be mapped efficiently if the physical alignment
        happens to be the same,
      
      we can simplify this by standardizing on 64 KB (or 2 MB) explicitly, i.e.,
      regardless of granule size, all segments are aligned either to 64 KB, or to
      2 MB if CONFIG_DEBUG_ALIGN_RODATA=y. This also means we can drop the Kconfig
      dependency of CONFIG_DEBUG_ALIGN_RODATA on CONFIG_ARM64_4K_PAGES.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      97740051
    • A
      arm64: cover the .head.text section in the .text segment mapping · 7eb90f2f
      Ard Biesheuvel 提交于
      Keeping .head.text out of the .text mapping buys us very little: its actual
      payload is only 4 KB, most of which is padding, but the page alignment may
      add up to 2 MB (in case of CONFIG_DEBUG_ALIGN_RODATA=y) of additional
      padding to the uncompressed kernel Image.
      
      Also, on 4 KB granule kernels, the 4 KB misalignment of .text forces us to
      map the adjacent 56 KB of code without the PTE_CONT attribute, and since
      this region contains things like the vector table and the GIC interrupt
      handling entry point, this region is likely to benefit from the reduced TLB
      pressure that results from PTE_CONT mappings.
      
      So remove the alignment between the .head.text and .text sections, and use
      the [_text, _etext) rather than the [_stext, _etext) interval for mapping
      the .text segment.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7eb90f2f
    • A
      arm64: move early boot code to the .init segment · 546c8c44
      Ard Biesheuvel 提交于
      Apart from the arm64/linux and EFI header data structures, there is nothing
      in the .head.text section that must reside at the beginning of the Image.
      So let's move it to the .init section where it belongs.
      
      Note that this involves some minor tweaking of the EFI header, primarily
      because the address of 'stext' no longer coincides with the start of the
      .text section. It also requires a couple of relocated symbol references
      to be slightly rewritten or their definition moved to the linker script.
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      546c8c44
    • A
      arm64: use 'segment' rather than 'chunk' to describe mapped kernel regions · 2c09ec06
      Ard Biesheuvel 提交于
      Replace the poorly defined term chunk with segment, which is a term that is
      already used by the ELF spec to describe contiguous mappings with the same
      permission attributes of statically allocated ranges of an executable.
      Acked-by: NMark Rutland <mark.rutland@arm.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      2c09ec06
  3. 14 4月, 2016 11 次提交
  4. 13 4月, 2016 2 次提交
    • J
      arm64: cpuidle: make arm_cpuidle_suspend() a bit more efficient · b5fda7ed
      Jisheng Zhang 提交于
      Currently, we check two pointers: cpu_ops and cpu_suspend on every idle
      state entry. These pointers check can be avoided:
      
      If cpu_ops has not been registered, arm_cpuidle_init() will return
      -EOPNOTSUPP, so arm_cpuidle_suspend() will never have chance to
      run. In other word, the cpu_ops check can be avoid.
      
      Similarly, the cpu_suspend check could be avoided in this hot path by
      moving it into arm_cpuidle_init().
      
      I measured the 4096 * time from arm_cpuidle_suspend entry point to the
      cpu_psci_cpu_suspend entry point. HW platform is Marvell BG4CT STB
      board.
      
      1. only one shell, no other process, hot-unplug secondary cpus, execute
      the following cmd
      
      while true
      do
      	sleep 0.2
      done
      
      before the patch: 1581220ns
      
      after the patch: 1579630ns
      
      reduced by 0.1%
      
      2. only one shell, no other process, hot-unplug secondary cpus, execute
      the following cmd
      
      while true
      do
      	md5sum /tmp/testfile
      	sleep 0.2
      done
      
      NOTE: the testfile size should be larger than L1+L2 cache size
      
      before the patch: 1961960ns
      after the patch: 1912500ns
      
      reduced by 2.5%
      
      So the more complex the system load, the bigger the improvement.
      Signed-off-by: NJisheng Zhang <jszhang@marvell.com>
      Acked-by: NLorenzo Pieralisi <lorenzo.pieralisi@arm.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      b5fda7ed
    • K
      arm64: cpufeature: append additional id_aa64mmfr2 fields to cpufeature · 7d7b4ae4
      Kefeng Wang 提交于
      There are some new cpu features which can be identified by id_aa64mmfr2,
      this patch appends all fields of it.
      Signed-off-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      7d7b4ae4
  5. 09 4月, 2016 5 次提交
  6. 08 4月, 2016 3 次提交
  7. 07 4月, 2016 1 次提交
  8. 05 4月, 2016 4 次提交
    • L
      kvm: x86: make lapic hrtimer pinned · 61abdbe0
      Luiz Capitulino 提交于
      When a vCPU runs on a nohz_full core, the hrtimer used by
      the lapic emulation code can be migrated to another core.
      When this happens, it's possible to observe milisecond
      latency when delivering timer IRQs to KVM guests.
      
      The huge latency is mainly due to the fact that
      apic_timer_fn() expects to run during a kvm exit. It
      sets KVM_REQ_PENDING_TIMER and let it be handled on kvm
      entry. However, if the timer fires on a different core,
      we have to wait until the next kvm exit for the guest
      to see KVM_REQ_PENDING_TIMER set.
      
      This problem became visible after commit 9642d18e. This
      commit changed the timer migration code to always attempt
      to migrate timers away from nohz_full cores. While it's
      discussable if this is correct/desirable (I don't think
      it is), it's clear that the lapic emulation code has
      a requirement on firing the hrtimer in the same core
      where it was started. This is achieved by making the
      hrtimer pinned.
      
      Lastly, note that KVM has code to migrate timers when a
      vCPU is scheduled to run in different core. However, this
      forced migration may fail. When this happens, we can have
      the same problem. If we want 100% correctness, we'll have
      to modify apic_timer_fn() to cause a kvm exit when it runs
      on a different core than the vCPU. Not sure if this is
      possible.
      
      Here's a reproducer for the issue being fixed:
      
       1. Set all cores but core0 to be nohz_full cores
       2. Start a guest with a single vCPU
       3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()
      
      You'll see that apic_timer_fn() will run in core0 while
      kvm_inject_apic_timer_irqs() runs in a different core. If
      you get both on core0, try running a program that takes 100%
      of the CPU and pin it to core0 to force the vCPU out.
      Signed-off-by: NLuiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      61abdbe0
    • C
      s390/mm/kvm: fix mis-merge in gmap handling · 9c650d09
      Christian Borntraeger 提交于
      commit 1e133ab2 ("s390/mm: split arch/s390/mm/pgtable.c") dropped
      some changes from commit a3a92c31 ("KVM: s390: fix mismatch
      between user and in-kernel guest limit") - this breaks KVM for some
      memory sizes (kvm-s390: failed to commit memory region) like
      exactly 2GB.
      
      Cc: Dominik Dingel <dingel@linux.vnet.ibm.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Acked-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      9c650d09
    • K
      mm, fs: remove remaining PAGE_CACHE_* and page_cache_{get,release} usage · ea1754a0
      Kirill A. Shutemov 提交于
      Mostly direct substitution with occasional adjustment or removing
      outdated comments.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ea1754a0
    • K
      mm, fs: get rid of PAGE_CACHE_* and page_cache_{get,release} macros · 09cbfeaf
      Kirill A. Shutemov 提交于
      PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced *long* time
      ago with promise that one day it will be possible to implement page
      cache with bigger chunks than PAGE_SIZE.
      
      This promise never materialized.  And unlikely will.
      
      We have many places where PAGE_CACHE_SIZE assumed to be equal to
      PAGE_SIZE.  And it's constant source of confusion on whether
      PAGE_CACHE_* or PAGE_* constant should be used in a particular case,
      especially on the border between fs and mm.
      
      Global switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause to much
      breakage to be doable.
      
      Let's stop pretending that pages in page cache are special.  They are
      not.
      
      The changes are pretty straight-forward:
      
       - <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
      
       - PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
      
       - page_cache_get() -> get_page();
      
       - page_cache_release() -> put_page();
      
      This patch contains automated changes generated with coccinelle using
      script below.  For some reason, coccinelle doesn't patch header files.
      I've called spatch for them manually.
      
      The only adjustment after coccinelle is revert of changes to
      PAGE_CAHCE_ALIGN definition: we are going to drop it later.
      
      There are few places in the code where coccinelle didn't reach.  I'll
      fix them manually in a separate patch.  Comments and documentation also
      will be addressed with the separate patch.
      
      virtual patch
      
      @@
      expression E;
      @@
      - E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      expression E;
      @@
      - E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
      + E
      
      @@
      @@
      - PAGE_CACHE_SHIFT
      + PAGE_SHIFT
      
      @@
      @@
      - PAGE_CACHE_SIZE
      + PAGE_SIZE
      
      @@
      @@
      - PAGE_CACHE_MASK
      + PAGE_MASK
      
      @@
      expression E;
      @@
      - PAGE_CACHE_ALIGN(E)
      + PAGE_ALIGN(E)
      
      @@
      expression E;
      @@
      - page_cache_get(E)
      + get_page(E)
      
      @@
      expression E;
      @@
      - page_cache_release(E)
      + put_page(E)
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      09cbfeaf
  9. 04 4月, 2016 2 次提交
  10. 03 4月, 2016 1 次提交
    • P
      MIPS: Bail on unsupported module relocs · 04211a57
      Paul Burton 提交于
      When an unsupported reloc is encountered in a module, we currently
      blindly branch to whatever would be at its entry in the reloc handler
      function pointer arrays. This may be NULL, or if the unsupported reloc
      has a type greater than that of the supported reloc with the highest
      type then we'll dereference some value after the function pointer array
      & branch to that. The result is at best a kernel oops.
      
      Fix this by checking that the reloc type has an entry in the function
      pointer array (ie. is less than the number of items in the array) and
      that the handler is non-NULL, returning an error code to fail the module
      load if no handler is found.
      Signed-off-by: NPaul Burton <paul.burton@imgtec.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Steven J. Hill <Steven.Hill@imgtec.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: linux-mips@linux-mips.org
      Cc: linux-kernel@vger.kernel.org
      Patchwork: https://patchwork.linux-mips.org/patch/12432/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
      04211a57