1. 13 1月, 2014 4 次提交
    • P
      sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs · 20d1c86a
      Peter Zijlstra 提交于
      Use a ring-buffer like multi-version object structure which allows
      always having a coherent object; we use this to avoid having to
      disable IRQs while reading sched_clock() and avoids a problem when
      getting an NMI while changing the cyc2ns data.
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     331312     257223
          (cold) local_clock: 301773     310296     309889
          (warm) sched_clock: 38375      38247      25280
          (warm) local_clock: 100371     102713     85268
          (warm) rdtsc:       27340      27289      24247
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     372706     301224
          (cold) local_clock: 396890     399275     399870
          (warm) sched_clock: 38194      38124      25630
          (warm) local_clock: 143452     148698     129629
          (warm) rdtsc:       27345      27365      24307
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      20d1c86a
    • P
      sched/clock, x86: Move some cyc2ns() code around · 57c67da2
      Peter Zijlstra 提交于
      There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
      so move it there.
      
      There are no cycles_2_ns() users.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      57c67da2
    • P
      sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock() · 5dd12c21
      Peter Zijlstra 提交于
      Use mul_u64_u32_shr() so that x86_64 can use a single 64x64->128 mul.
      
      Before:
      
      0000000000000560 <native_sched_clock>:
       560:   44 8b 1d 00 00 00 00    mov    0x0(%rip),%r11d        # 567 <native_sched_clock+0x7>
       567:   55                      push   %rbp
       568:   48 89 e5                mov    %rsp,%rbp
       56b:   45 85 db                test   %r11d,%r11d
       56e:   75 4f                   jne    5bf <native_sched_clock+0x5f>
       570:   0f 31                   rdtsc
       572:   89 c0                   mov    %eax,%eax
       574:   48 c1 e2 20             shl    $0x20,%rdx
       578:   48 c7 c1 00 00 00 00    mov    $0x0,%rcx
       57f:   48 09 c2                or     %rax,%rdx
       582:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
       589:   65 8b 04 25 00 00 00    mov    %gs:0x0,%eax
       590:   00
       591:   48 98                   cltq
       593:   48 8b 34 c5 00 00 00    mov    0x0(,%rax,8),%rsi
       59a:   00
       59b:   48 89 d0                mov    %rdx,%rax
       59e:   81 e2 ff 03 00 00       and    $0x3ff,%edx
       5a4:   48 c1 e8 0a             shr    $0xa,%rax
       5a8:   48 0f af 14 0e          imul   (%rsi,%rcx,1),%rdx
       5ad:   48 0f af 04 0e          imul   (%rsi,%rcx,1),%rax
       5b2:   5d                      pop    %rbp
       5b3:   48 03 04 3e             add    (%rsi,%rdi,1),%rax
       5b7:   48 c1 ea 0a             shr    $0xa,%rdx
       5bb:   48 01 d0                add    %rdx,%rax
       5be:   c3                      retq
      
      After:
      
      0000000000000550 <native_sched_clock>:
       550:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 556 <native_sched_clock+0x6>
       556:   55                      push   %rbp
       557:   48 89 e5                mov    %rsp,%rbp
       55a:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
       55e:   85 ff                   test   %edi,%edi
       560:   75 2c                   jne    58e <native_sched_clock+0x3e>
       562:   0f 31                   rdtsc
       564:   89 c0                   mov    %eax,%eax
       566:   48 c1 e2 20             shl    $0x20,%rdx
       56a:   48 09 c2                or     %rax,%rdx
       56d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
       574:   00 00
       576:   89 c0                   mov    %eax,%eax
       578:   48 f7 e2                mul    %rdx
       57b:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
       582:   00 00
       584:   c9                      leaveq
       585:   48 0f ac d0 0a          shrd   $0xa,%rdx,%rax
       58a:   48 01 c8                add    %rcx,%rax
       58d:   c3                      retq
      
                              MAINLINE   POST
      
          sched_clock_stable: 1	   1
          (cold) sched_clock: 329841     331312
          (cold) local_clock: 301773     310296
          (warm) sched_clock: 38375      38247
          (warm) local_clock: 100371     102713
          (warm) rdtsc:       27340      27289
          sched_clock_stable: 0          0
          (cold) sched_clock: 382634     372706
          (cold) local_clock: 396890     399275
          (warm) sched_clock: 38194      38124
          (warm) local_clock: 143452     148698
          (warm) rdtsc:       27345      27365
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-piu203ses5y1g36bnyw2n16x@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5dd12c21
    • D
      sched: Add new scheduler syscalls to support an extended scheduling parameters ABI · d50dde5a
      Dario Faggioli 提交于
      Add the syscalls needed for supporting scheduling algorithms
      with extended scheduling parameters (e.g., SCHED_DEADLINE).
      
      In general, it makes possible to specify a periodic/sporadic task,
      that executes for a given amount of runtime at each instance, and is
      scheduled according to the urgency of their own timing constraints,
      i.e.:
      
       - a (maximum/typical) instance execution time,
       - a minimum interval between consecutive instances,
       - a time constraint by which each instance must be completed.
      
      Thus, both the data structure that holds the scheduling parameters of
      the tasks and the system calls dealing with it must be extended.
      Unfortunately, modifying the existing struct sched_param would break
      the ABI and result in potentially serious compatibility issues with
      legacy binaries.
      
      For these reasons, this patch:
      
       - defines the new struct sched_attr, containing all the fields
         that are necessary for specifying a task in the computational
         model described above;
      
       - defines and implements the new scheduling related syscalls that
         manipulate it, i.e., sched_setattr() and sched_getattr().
      
      Syscalls are introduced for x86 (32 and 64 bits) and ARM only, as a
      proof of concept and for developing and testing purposes. Making them
      available on other architectures is straightforward.
      
      Since no "user" for these new parameters is introduced in this patch,
      the implementation of the new system calls is just identical to their
      already existing counterpart. Future patches that implement scheduling
      policies able to exploit the new data structure must also take care of
      modifying the sched_*attr() calls accordingly with their own purposes.
      Signed-off-by: NDario Faggioli <raistlin@linux.it>
      [ Rewrote to use sched_attr. ]
      Signed-off-by: NJuri Lelli <juri.lelli@gmail.com>
      [ Removed sched_setscheduler2() for now. ]
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1383831828-15501-3-git-send-email-juri.lelli@gmail.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d50dde5a
  2. 09 1月, 2014 1 次提交
    • J
      parisc: Ensure full cache coherency for kmap/kunmap · f8dae006
      John David Anglin 提交于
      Helge Deller noted a few weeks ago problems with the AIO support on
      parisc. This change is the result of numerous iterations on how best to
      deal with this problem.
      
      The solution adopted here is to provide full cache coherency in a
      uniform manner on all parisc systems. This involves calling
      flush_dcache_page() on kmap operations and flush_kernel_dcache_page() on
      kunmap operations. As a result, the copy_user_page() and
      clear_user_page() functions can be removed and the overall code is
      simpler.
      
      The change ensures that both userspace and kernel aliases to a mapped
      page are invalidated and flushed. This is necessary for the correct
      operation of PA8800 and PA8900 based systems which do not support
      inequivalent aliases.
      
      With this change, I have observed no cache related issues on c8000 and
      rp3440. It is now possible for example to do kernel builds with "-j64"
      on four way systems.
      
      On systems using XFS file systems, the patch recently posted by Mikulas
      Patocka to "fix crash using XFS on loopback" is needed to avoid a hang
      caused by an uninitialized lock passed to flush_dcache_page() in the
      page struct.
      Signed-off-by: NJohn David Anglin <dave.anglin@bell.net>
      Cc: stable@vger.kernel.org # v3.9+
      Signed-off-by: NHelge Deller <deller@gmx.de>
      f8dae006
  3. 05 1月, 2014 6 次提交
  4. 03 1月, 2014 1 次提交
  5. 02 1月, 2014 1 次提交
    • J
      KVM: nVMX: Unconditionally uninit the MMU on nested vmexit · 29bf08f1
      Jan Kiszka 提交于
      Three reasons for doing this: 1. arch.walk_mmu points to arch.mmu anyway
      in case nested EPT wasn't in use. 2. this aligns VMX with SVM. But 3. is
      most important: nested_cpu_has_ept(vmcs12) queries the VMCS page, and if
      one guest VCPU manipulates the page of another VCPU in L2, we may be
      fooled to skip over the nested_ept_uninit_mmu_context, leaving mmu in
      nested state. That can crash the host later on if nested_ept_get_cr3 is
      invoked while L1 already left vmxon and nested.current_vmcs12 became
      NULL therefore.
      
      Cc: stable@kernel.org
      Signed-off-by: NJan Kiszka <jan.kiszka@siemens.com>
      Signed-off-by: NMarcelo Tosatti <mtosatti@redhat.com>
      29bf08f1
  6. 31 12月, 2013 2 次提交
  7. 30 12月, 2013 8 次提交
  8. 29 12月, 2013 4 次提交
    • L
      ARM: 7931/1: Correct virt_addr_valid · efea3403
      Laura Abbott 提交于
      The definition of virt_addr_valid is that virt_addr_valid should
      return true if and only if virt_to_page returns a valid pointer.
      The current definition of virt_addr_valid only checks against the
      virtual address range. There's no guarantee that just because a
      virtual address falls bewteen PAGE_OFFSET and high_memory the
      associated physical memory has a valid backing struct page. Follow
      the example of other architectures and convert to pfn_valid to
      verify that the virtual address is actually valid. The check for
      an address between PAGE_OFFSET and high_memory is still necessary
      as vmalloc/highmem addresses are not valid with virt_to_page.
      
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Nicolas Pitre <nico@linaro.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NLaura Abbott <lauraa@codeaurora.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      efea3403
    • S
      ARM: 7923/1: mm: fix dcache flush logic for compound high pages · 2a7cfcbc
      Steven Capper 提交于
      When given a compound high page, __flush_dcache_page will only flush
      the first page of the compound page repeatedly rather than the entire
      set of constituent pages.
      
      This error was introduced by:
         0b19f933 ARM: mm: Add support for flushing HugeTLB pages.
      
      This patch corrects the logic such that all constituent pages are now
      flushed.
      
      Cc: stable@vger.kernel.org # 3.10+
      Signed-off-by: NSteve Capper <steve.capper@linaro.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      2a7cfcbc
    • R
      ARM: fix footbridge clockevent device · 4ff859fe
      Russell King 提交于
      The clockevents code was being told that the footbridge clock event
      device ticks at 16x the rate which it actually does.  This leads to
      timekeeping problems since it allows the clocksource to wrap before
      the kernel notices.  Fix this by using the correct clock.
      
      Fixes: 4e8d7637 ("ARM: footbridge: convert to clockevents/clocksource")
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      4ff859fe
    • L
      ARM: pxa: fix USB gadget driver compilation regression · 9928422f
      Linus Walleij 提交于
      After commit 88f718e3
      "ARM: pxa: delete the custom GPIO header" a compilation
      error was introduced in the PXA25x gadget driver.
      An attempt to fix the problem was made in
      commit b144e4ab
      "usb: gadget: fix pxa25x compilation problems"
      by explictly stating the driver needs the <mach/hardware.h>
      header, which solved the compilation for a few boards,
      such as the pxa255-idp and its defconfig.
      
      However the Lubbock board has this special clause in
      drivers/usb/gadget/pxa25x_udc.c:
      
      This include file has an implicit dependency on
      <mach/irqs.h> having been included before <mach/lubbock.h>
      was included.
      
      Before commit 88f718e3
      "ARM: pxa: delete the custom GPIO header" this implicit
      dependency for the pxa25x_udc compile on the Lubbock was
      satisfied by <linux/gpio.h> implicitly including
      <mach/gpio.h> which was in turn including <mach/irqs.h>,
      apart from the earlier added <mach/hardware.h>.
      
      Fix this by having the PXA25x <mach/lubbock.h> explicitly
      include <mach/irqs.h>.
      Reported-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Cc: Greg Kroah-Hartmann <gregkh@linuxfoundation.org>
      Cc: Felipe Balbi <balbi@ti.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NHaojian Zhuang <haojian.zhuang@gmail.com>
      Signed-off-by: NOlof Johansson <olof@lixom.net>
      9928422f
  9. 28 12月, 2013 1 次提交
  10. 26 12月, 2013 5 次提交
  11. 21 12月, 2013 1 次提交
  12. 20 12月, 2013 2 次提交
    • L
      x86 idle: Repair large-server 50-watt idle-power regression · 40e2d7f9
      Len Brown 提交于
      Linux 3.10 changed the timing of how thread_info->flags is touched:
      
      	x86: Use generic idle loop
      	(7d1a9417)
      
      This caused Intel NHM-EX and WSM-EX servers to experience a large number
      of immediate MONITOR/MWAIT break wakeups, which caused cpuidle to demote
      from deep C-states to shallow C-states, which caused these platforms
      to experience a significant increase in idle power.
      
      Note that this issue was already present before the commit above,
      however, it wasn't seen often enough to be noticed in power measurements.
      
      Here we extend an errata workaround from the Core2 EX "Dunnington"
      to extend to NHM-EX and WSM-EX, to prevent these immediate
      returns from MWAIT, reducing idle power on these platforms.
      
      While only acpi_idle ran on Dunnington, intel_idle
      may also run on these two newer systems.
      As of today, there are no other models that are known
      to need this tweak.
      
      Link: http://lkml.kernel.org/r/CAJvTdK=%2BaNN66mYpCGgbHGCHhYQAKx-vB0kJSWjVpsNb_hOAtQ@mail.gmail.comSigned-off-by: NLen Brown <len.brown@intel.com>
      Link: http://lkml.kernel.org/r/baff264285f6e585df757d58b17788feabc68918.1387403066.git.len.brown@intel.com
      Cc: <stable@vger.kernel.org> # 3.12.x, 3.11.x, 3.10.x
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      40e2d7f9
    • W
      arm64: ptrace: avoid using HW_BREAKPOINT_EMPTY for disabled events · cdc27c27
      Will Deacon 提交于
      Commit 8f34a1da ("arm64: ptrace: use HW_BREAKPOINT_EMPTY type for
      disabled breakpoints") fixed an issue with GDB trying to zero breakpoint
      control registers. The problem there is that the arch hw_breakpoint code
      will attempt to create a (disabled), execute breakpoint of length 0.
      
      This will fail validation and report unexpected failure to GDB. To avoid
      this, we treated disabled breakpoints as HW_BREAKPOINT_EMPTY, but that
      seems to have broken with recent kernels, causing watchpoints to be
      treated as TYPE_INST in the core code and returning ENOSPC for any
      further breakpoints.
      
      This patch fixes the problem by prioritising the `enable' field of the
      breakpoint: if it is cleared, we simply update the perf_event_attr to
      indicate that the thing is disabled and don't bother changing either the
      type or the length. This reinforces the behaviour that the breakpoint
      control register is essentially read-only apart from the enable bit
      when disabling a breakpoint.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAaron Liu <liucy214@gmail.com>
      Signed-off-by: NWill Deacon <will.deacon@arm.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      cdc27c27
  13. 19 12月, 2013 4 次提交