1. 16 1月, 2014 4 次提交
    • B
      x86, tsc: Add static (MSR) TSC calibration on Intel Atom SoCs · 7da7c156
      Bin Gao 提交于
      On SoCs that have the calibration MSRs available, either there is no
      PIT, HPET or PMTIMER to calibrate against, or the PIT/HPET/PMTIMER is
      driven from the same clock as the TSC, so calibration is redundant and
      just slows down the boot.
      
      TSC rate is caculated by this formula:
      <maximum core-clock to bus-clock ratio> * <maximum resolved frequency>
      The ratio and the resolved frequency ID can be obtained from MSR.
      See Intel 64 and IA-32 System Programming Guid section 16.12 and 30.11.5
      for details.
      Signed-off-by: NBin Gao <bin.gao@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Link: http://lkml.kernel.org/n/tip-rgm7xmg7k6qnjlw3ynkcjsmh@git.kernel.org
      7da7c156
    • P
      x86: Add check for number of available vectors before CPU down · da6139e4
      Prarit Bhargava 提交于
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791
      
      When a cpu is downed on a system, the irqs on the cpu are assigned to
      other cpus.  It is possible, however, that when a cpu is downed there
      aren't enough free vectors on the remaining cpus to account for the
      vectors from the cpu that is being downed.
      
      This results in an interesting "overflow" condition where irqs are
      "assigned" to a CPU but are not handled.
      
      For example, when downing cpus on a 1-64 logical processor system:
      
      <snip>
      [  232.021745] smpboot: CPU 61 is now offline
      [  238.480275] smpboot: CPU 62 is now offline
      [  245.991080] ------------[ cut here ]------------
      [  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
      [  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
      [  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
      [  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
      [  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
      [  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
      [  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
      [  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
      [  246.082430] Call Trace:
      [  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
      [  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
      [  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
      [  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
      [  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
      [  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
      [  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
      [  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
      [  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
      [  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
      [  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
      [  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
      [  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
      [  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
      [  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
      [  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
      [  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
      [  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
      [  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
      [  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
      [  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
      [  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
      [  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
      [  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
      [  246.249610] ---[ end trace fb74fdef54d79039 ]---
      [  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
      [  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
      Last login: Mon Nov 11 08:35:14 from 10.18.17.119
      [root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
      [  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
      
      (last lines keep repeating.  ixgbe driver is dead until module reload.)
      
      If the downed cpu has more vectors than are free on the remaining cpus on the
      system, it is possible that some vectors are "orphaned" even though they are
      assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
      watchdog fired and notified that something was wrong.
      
      This patch adds a function, check_vectors(), to compare the number of vectors
      on the CPU going down and compares it to the number of vectors available on
      the system.  If there aren't enough vectors for the CPU to go down, an
      error is returned and propogated back to userspace.
      
      v2: Do not need to look at percpu irqs
      v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
          Priority Mode
      v4: Additional changes suggested by Gong Chen.
      v5/v6/v7/v8: Updated comment text
      Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.comReviewed-by: NGong Chen <gong.chen@linux.intel.com>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Janet Morgan <janet.morgan@intel.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Ruiv Wang <ruiv.wang@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: <stable@vger.kernel.org>
      da6139e4
    • D
      x86, intel-mid: Add Merrifield platform support · bc20aa48
      David Cohen 提交于
      This code was partially based on Mark Brown's previous work.
      Signed-off-by: NDavid Cohen <david.a.cohen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1387224459-25746-4-git-send-email-david.a.cohen@linux.intel.comSigned-off-by: NFei Yang <fei.yang@intel.com>
      Cc: Mark F. Brown <mark.f.brown@intel.com>
      Cc: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      bc20aa48
    • K
      x86, intel-mid: Add Clovertrail platform support · 85611e3f
      Kuppuswamy Sathyanarayanan 提交于
      This patch adds Clovertrail support on intel-mid and makes it more
      flexible to support other SoCs.
      Signed-off-by: NDavid Cohen <david.a.cohen@linux.intel.com>
      Link: http://lkml.kernel.org/r/1387224459-25746-3-git-send-email-david.a.cohen@linux.intel.comSigned-off-by: NKuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
      Signed-off-by: NFei Yang <fei.yang@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      85611e3f
  2. 14 1月, 2014 4 次提交
  3. 13 1月, 2014 3 次提交
    • P
      sched/clock, x86: Rewrite cyc2ns() to avoid the need to disable IRQs · 20d1c86a
      Peter Zijlstra 提交于
      Use a ring-buffer like multi-version object structure which allows
      always having a coherent object; we use this to avoid having to
      disable IRQs while reading sched_clock() and avoids a problem when
      getting an NMI while changing the cyc2ns data.
      
                              MAINLINE   PRE        POST
      
          sched_clock_stable: 1          1          1
          (cold) sched_clock: 329841     331312     257223
          (cold) local_clock: 301773     310296     309889
          (warm) sched_clock: 38375      38247      25280
          (warm) local_clock: 100371     102713     85268
          (warm) rdtsc:       27340      27289      24247
          sched_clock_stable: 0          0          0
          (cold) sched_clock: 382634     372706     301224
          (cold) local_clock: 396890     399275     399870
          (warm) sched_clock: 38194      38124      25630
          (warm) local_clock: 143452     148698     129629
          (warm) rdtsc:       27345      27365      24307
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-s567in1e5ekq2nlyhn8f987r@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      20d1c86a
    • P
      sched/clock, x86: Move some cyc2ns() code around · 57c67da2
      Peter Zijlstra 提交于
      There are no __cycles_2_ns() users outside of arch/x86/kernel/tsc.c,
      so move it there.
      
      There are no cycles_2_ns() users.
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/n/tip-01lslnavfgo3kmbo4532zlcj@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      57c67da2
    • P
      sched/clock, x86: Use mul_u64_u32_shr() for native_sched_clock() · 5dd12c21
      Peter Zijlstra 提交于
      Use mul_u64_u32_shr() so that x86_64 can use a single 64x64->128 mul.
      
      Before:
      
      0000000000000560 <native_sched_clock>:
       560:   44 8b 1d 00 00 00 00    mov    0x0(%rip),%r11d        # 567 <native_sched_clock+0x7>
       567:   55                      push   %rbp
       568:   48 89 e5                mov    %rsp,%rbp
       56b:   45 85 db                test   %r11d,%r11d
       56e:   75 4f                   jne    5bf <native_sched_clock+0x5f>
       570:   0f 31                   rdtsc
       572:   89 c0                   mov    %eax,%eax
       574:   48 c1 e2 20             shl    $0x20,%rdx
       578:   48 c7 c1 00 00 00 00    mov    $0x0,%rcx
       57f:   48 09 c2                or     %rax,%rdx
       582:   48 c7 c7 00 00 00 00    mov    $0x0,%rdi
       589:   65 8b 04 25 00 00 00    mov    %gs:0x0,%eax
       590:   00
       591:   48 98                   cltq
       593:   48 8b 34 c5 00 00 00    mov    0x0(,%rax,8),%rsi
       59a:   00
       59b:   48 89 d0                mov    %rdx,%rax
       59e:   81 e2 ff 03 00 00       and    $0x3ff,%edx
       5a4:   48 c1 e8 0a             shr    $0xa,%rax
       5a8:   48 0f af 14 0e          imul   (%rsi,%rcx,1),%rdx
       5ad:   48 0f af 04 0e          imul   (%rsi,%rcx,1),%rax
       5b2:   5d                      pop    %rbp
       5b3:   48 03 04 3e             add    (%rsi,%rdi,1),%rax
       5b7:   48 c1 ea 0a             shr    $0xa,%rdx
       5bb:   48 01 d0                add    %rdx,%rax
       5be:   c3                      retq
      
      After:
      
      0000000000000550 <native_sched_clock>:
       550:   8b 3d 00 00 00 00       mov    0x0(%rip),%edi        # 556 <native_sched_clock+0x6>
       556:   55                      push   %rbp
       557:   48 89 e5                mov    %rsp,%rbp
       55a:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
       55e:   85 ff                   test   %edi,%edi
       560:   75 2c                   jne    58e <native_sched_clock+0x3e>
       562:   0f 31                   rdtsc
       564:   89 c0                   mov    %eax,%eax
       566:   48 c1 e2 20             shl    $0x20,%rdx
       56a:   48 09 c2                or     %rax,%rdx
       56d:   65 48 8b 04 25 00 00    mov    %gs:0x0,%rax
       574:   00 00
       576:   89 c0                   mov    %eax,%eax
       578:   48 f7 e2                mul    %rdx
       57b:   65 48 8b 0c 25 00 00    mov    %gs:0x0,%rcx
       582:   00 00
       584:   c9                      leaveq
       585:   48 0f ac d0 0a          shrd   $0xa,%rdx,%rax
       58a:   48 01 c8                add    %rcx,%rax
       58d:   c3                      retq
      
                              MAINLINE   POST
      
          sched_clock_stable: 1	   1
          (cold) sched_clock: 329841     331312
          (cold) local_clock: 301773     310296
          (warm) sched_clock: 38375      38247
          (warm) local_clock: 100371     102713
          (warm) rdtsc:       27340      27289
          sched_clock_stable: 0          0
          (cold) sched_clock: 382634     372706
          (cold) local_clock: 396890     399275
          (warm) sched_clock: 38194      38124
          (warm) local_clock: 143452     148698
          (warm) rdtsc:       27345      27365
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Link: http://lkml.kernel.org/n/tip-piu203ses5y1g36bnyw2n16x@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5dd12c21
  4. 12 1月, 2014 3 次提交
    • P
      x86/irq: Fix do_IRQ() interrupt warning for cpu hotplug retriggered irqs · 9345005f
      Prarit Bhargava 提交于
      During heavy CPU-hotplug operations the following spurious kernel warnings
      can trigger:
      
        do_IRQ: No ... irq handler for vector (irq -1)
      
        [ See: https://bugzilla.kernel.org/show_bug.cgi?id=64831 ]
      
      When downing a cpu it is possible that there are unhandled irqs
      left in the APIC IRR register.  The following code path shows
      how the problem can occur:
      
       1. CPU 5 is to go down.
      
       2. cpu_disable() on CPU 5 executes with interrupt flag cleared
          by local_irq_save() via stop_machine().
      
       3. IRQ 12 asserts on CPU 5, setting IRR but not ISR because
          interrupt flag is cleared (CPU unabled to handle the irq)
      
       4. IRQs are migrated off of CPU 5, and the vectors' irqs are set
          to -1. 5. stop_machine() finishes cpu_disable()
      
       6. cpu_die() for CPU 5 executes in normal context.
      
       7. CPU 5 attempts to handle IRQ 12 because the IRR is set for
          IRQ 12.  The code attempts to find the vector's IRQ and cannot
          because it has been set to -1. 8. do_IRQ() warning displays
          warning about CPU 5 IRQ 12.
      
      I added a debug printk to output which CPU & vector was
      retriggered and discovered that that we are getting bogus
      events.  I see a 100% correlation between this debug printk in
      fixup_irqs() and the do_IRQ() warning.
      
      This patchset resolves this by adding definitions for
      VECTOR_UNDEFINED(-1) and VECTOR_RETRIGGERED(-2) and modifying
      the code to use them.
      
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=64831Signed-off-by: NPrarit Bhargava <prarit@redhat.com>
      Reviewed-by: NRui Wang <rui.y.wang@intel.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Seiji Aguchi <seiji.aguchi@hds.com>
      Cc: Yang Zhang <yang.z.zhang@Intel.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: janet.morgan@Intel.com
      Cc: tony.luck@Intel.com
      Cc: ruiv.wang@gmail.com
      Link: http://lkml.kernel.org/r/1388938252-16627-1-git-send-email-prarit@redhat.com
      [ Cleaned up the code a bit. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      9345005f
    • P
      arch: Introduce smp_load_acquire(), smp_store_release() · 47933ad4
      Peter Zijlstra 提交于
      A number of situations currently require the heavyweight smp_mb(),
      even though there is no need to order prior stores against later
      loads.  Many architectures have much cheaper ways to handle these
      situations, but the Linux kernel currently has no portable way
      to make use of them.
      
      This commit therefore supplies smp_load_acquire() and
      smp_store_release() to remedy this situation.  The new
      smp_load_acquire() primitive orders the specified load against
      any subsequent reads or writes, while the new smp_store_release()
      primitive orders the specifed store against any prior reads or
      writes.  These primitives allow array-based circular FIFOs to be
      implemented without an smp_mb(), and also allow a theoretical
      hole in rcu_assign_pointer() to be closed at no additional
      expense on most architectures.
      
      In addition, the RCU experience transitioning from explicit
      smp_read_barrier_depends() and smp_wmb() to rcu_dereference()
      and rcu_assign_pointer(), respectively resulted in substantial
      improvements in readability.  It therefore seems likely that
      replacing other explicit barriers with smp_load_acquire() and
      smp_store_release() will provide similar benefits.  It appears
      that roughly half of the explicit barriers in core kernel code
      might be so replaced.
      
      [Changelog by PaulMck]
      Reviewed-by: N"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Acked-by: NWill Deacon <will.deacon@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Cc: Michael Ellerman <michael@ellerman.id.au>
      Cc: Michael Neuling <mikey@neuling.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Victor Kaplansky <VICTORK@il.ibm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Link: http://lkml.kernel.org/r/20131213150640.908486364@infradead.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      47933ad4
    • L
      x86, fpu, amd: Clear exceptions in AMD FXSAVE workaround · 26bef131
      Linus Torvalds 提交于
      Before we do an EMMS in the AMD FXSAVE information leak workaround we
      need to clear any pending exceptions, otherwise we trap with a
      floating-point exception inside this code.
      Reported-by: Nhalfdog <me@halfdog.net>
      Tested-by: NBorislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/CA%2B55aFxQnY_PCG_n4=0w-VG=YLXL-yr7oMxyy0WU2gCBAf3ydg@mail.gmail.comSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      26bef131
  5. 09 1月, 2014 1 次提交
    • D
      arch: x86: New MailBox support driver for Intel SOC's · 46184415
      David E. Box 提交于
      Current Intel SOC cores use a MailBox Interface (MBI) to provide access to
      configuration registers on devices (called units) connected to the system
      fabric. This is a support driver that implements access to this interface on
      those platforms that can enumerate the device using PCI. Initial support is for
      BayTrail, for which port definitons are provided. This is a requirement for
      implementing platform specific features (e.g. RAPL driver requires this to
      perform platform specific power management using the registers in PUNIT).
      Dependant modules should select IOSF_MBI in their respective Kconfig
      configuraiton. Serialized access is handled by all exported routines with
      spinlocks.
      
      The API includes 3 functions for access to unit registers:
      
      int iosf_mbi_read(u8 port, u8 opcode, u32 offset, u32 *mdr)
      int iosf_mbi_write(u8 port, u8 opcode, u32 offset, u32 mdr)
      int iosf_mbi_modify(u8 port, u8 opcode, u32 offset, u32 mdr, u32 mask)
      
      port:	indicating the unit being accessed
      opcode:	the read or write port specific opcode
      offset:	the register offset within the port
      mdr:	the register data to be read, written, or modified
      mask:	bit locations in mdr to change
      
      Returns nonzero on error
      
      Note: GPU code handles access to the GFX unit. Therefore access to that unit
      with this driver is disallowed to avoid conflicts.
      Signed-off-by: NDavid E. Box <david.e.box@linux.intel.com>
      Link: http://lkml.kernel.org/r/1389216471-734-1-git-send-email-david.e.box@linux.intel.comSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      46184415
  6. 07 1月, 2014 1 次提交
  7. 05 1月, 2014 1 次提交
    • S
      x86, sparse: Do not force removal of __user when calling copy_to/from_user_nocheck() · df90ca96
      Steven Rostedt 提交于
      Commit ff47ab4f "x86: Add 1/2/4/8 byte optimization to 64bit
      __copy_{from,to}_user_inatomic" added a "_nocheck" call in between
      the copy_to/from_user() and copy_user_generic(). As both the
      normal and nocheck versions of theses calls use the proper __user
      annotation, a typecast to remove it should not be added.
      This causes sparse to spin out the following warnings:
      
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      arch/x86/include/asm/uaccess_64.h:207:47: warning: incorrect type in argument 2 (different address spaces)
      arch/x86/include/asm/uaccess_64.h:207:47:    expected void const [noderef] <asn:1>*src
      arch/x86/include/asm/uaccess_64.h:207:47:    got void const *<noident>
      
      Cc: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      Link: http://lkml.kernel.org/r/20140103164500.5f6478f5@gandalf.local.homeSigned-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      df90ca96
  8. 04 1月, 2014 1 次提交
  9. 03 1月, 2014 1 次提交
  10. 29 12月, 2013 1 次提交
    • D
      x86/efi: Pass necessary EFI data for kexec via setup_data · 1fec0533
      Dave Young 提交于
      Add a new setup_data type SETUP_EFI for kexec use.  Passing the saved
      fw_vendor, runtime, config tables and EFI runtime mappings.
      
      When entering virtual mode, directly mapping the EFI runtime regions
      which we passed in previously. And skip the step to call
      SetVirtualAddressMap().
      
      Specially for HP z420 workstation we need save the smbios physical
      address.  The kernel boot sequence proceeds in the following order.
      Step 2 requires efi.smbios to be the physical address.  However, I found
      that on HP z420 EFI system table has a virtual address of SMBIOS in step
      1.  Hence, we need set it back to the physical address with the smbios
      in efi_setup_data.  (When it is still the physical address, it simply
      sets the same value.)
      
      1. efi_init() - Set efi.smbios from EFI system table
      2. dmi_scan_machine() - Temporary map efi.smbios to access SMBIOS table
      3. efi_enter_virtual_mode() - Map EFI ranges
      
      Tested on ovmf+qemu, lenovo thinkpad, a dell laptop and an
      HP z420 workstation.
      Signed-off-by: NDave Young <dyoung@redhat.com>
      Tested-by: NToshi Kani <toshi.kani@hp.com>
      Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      1fec0533
  11. 28 12月, 2013 2 次提交
  12. 21 12月, 2013 1 次提交
  13. 20 12月, 2013 2 次提交
  14. 19 12月, 2013 1 次提交
    • R
      mm: fix TLB flush race between migration, and change_protection_range · 20841405
      Rik van Riel 提交于
      There are a few subtle races, between change_protection_range (used by
      mprotect and change_prot_numa) on one side, and NUMA page migration and
      compaction on the other side.
      
      The basic race is that there is a time window between when the PTE gets
      made non-present (PROT_NONE or NUMA), and the TLB is flushed.
      
      During that time, a CPU may continue writing to the page.
      
      This is fine most of the time, however compaction or the NUMA migration
      code may come in, and migrate the page away.
      
      When that happens, the CPU may continue writing, through the cached
      translation, to what is no longer the current memory location of the
      process.
      
      This only affects x86, which has a somewhat optimistic pte_accessible.
      All other architectures appear to be safe, and will either always flush,
      or flush whenever there is a valid mapping, even with no permissions
      (SPARC).
      
      The basic race looks like this:
      
      CPU A			CPU B			CPU C
      
      						load TLB entry
      make entry PTE/PMD_NUMA
      			fault on entry
      						read/write old page
      			start migrating page
      			change PTE/PMD to new page
      						read/write old page [*]
      flush TLB
      						reload TLB from new entry
      						read/write new page
      						lose data
      
      [*] the old page may belong to a new user at this point!
      
      The obvious fix is to flush remote TLB entries, by making sure that
      pte_accessible aware of the fact that PROT_NONE and PROT_NUMA memory may
      still be accessible if there is a TLB flush pending for the mm.
      
      This should fix both NUMA migration and compaction.
      
      [mgorman@suse.de: fix build]
      Signed-off-by: NRik van Riel <riel@redhat.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      20841405
  15. 11 12月, 2013 1 次提交
  16. 05 12月, 2013 1 次提交
  17. 19 11月, 2013 3 次提交
    • P
      ftrace, perf: Avoid infinite event generation loop · d5b5f391
      Peter Zijlstra 提交于
      Vince's perf-trinity fuzzer found yet another 'interesting' problem.
      
      When we sample the irq_work_exit tracepoint with period==1 (or
      PERF_SAMPLE_PERIOD) and we add an fasync SIGNAL handler we create an
      infinite event generation loop:
      
        ,-> <IPI>
        |     irq_work_exit() ->
        |       trace_irq_work_exit() ->
        |         ...
        |           __perf_event_overflow() -> (due to fasync)
        |             irq_work_queue() -> (irq_work_list must be empty)
        '---------      arch_irq_work_raise()
      
      Similar things can happen due to regular poll() wakeups if we exceed
      the ring-buffer wakeup watermark, or have an event_limit.
      
      To avoid this, dis-allow sampling this particular tracepoint.
      
      In order to achieve this, create a special perf_perm function pointer
      for each event and call this (when set) on trying to create a
      tracepoint perf event.
      
      [ roasted: use expr... to allow for ',' in your expression ]
      Reported-by: NVince Weaver <vincent.weaver@maine.edu>
      Tested-by: NVince Weaver <vincent.weaver@maine.edu>
      Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Link: http://lkml.kernel.org/r/20131114152304.GC5364@laptop.programming.kicks-ass.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      d5b5f391
    • K
      x86/mm: Implement ASLR for hugetlb mappings · fd8526ad
      Kirill A. Shutemov 提交于
      Matthew noticed that hugetlb mappings don't participate in ASLR on x86-64:
      
        %  for i in `seq 3`; do
        > tools/testing/selftests/vm/map_hugetlb | grep address
        > done
        Returned address is 0x2aaaaac00000
        Returned address is 0x2aaaaac00000
        Returned address is 0x2aaaaac00000
      
      /proc/PID/maps entries for the mapping are always the same
      (except inode number):
      
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 8200              /anon_hugepage (deleted)
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 256               /anon_hugepage (deleted)
        2aaaaac00000-2aaabac00000 rw-p 00000000 00:0c 7180              /anon_hugepage (deleted)
      
      The reason is the generic hugetlb_get_unmapped_area() function
      which is used on x86-64.  It doesn't support randomization and
      use bottom-up unmapped area lookup, instead of usual top-down
      on x86-64.
      
      x86 has arch-specific hugetlb_get_unmapped_area(), but it's used
      only on x86-32.
      
      Let's use arch-specific hugetlb_get_unmapped_area() on x86-64
      too. That adds ASLR and switches hugetlb mappings to use top-down
      unmapped area lookup:
      
        %  for i in `seq 3`; do
        > tools/testing/selftests/vm/map_hugetlb | grep address
        > done
        Returned address is 0x7f4f08a00000
        Returned address is 0x7fdda4200000
        Returned address is 0x7febe0000000
      
      /proc/PID/maps entries:
      
        7f4f08a00000-7f4f18a00000 rw-p 00000000 00:0c 1168              /anon_hugepage (deleted)
        7fdda4200000-7fddb4200000 rw-p 00000000 00:0c 7092              /anon_hugepage (deleted)
        7febe0000000-7febf0000000 rw-p 00000000 00:0c 7183              /anon_hugepage (deleted)
      
      Unmapped area lookup policy for hugetlb mappings is consistent
      with normal mappings now -- the only difference is alignment
      requirements for huge pages.
      
      libhugetlbfs test-suite didn't detect any regressions with the
      patch applied (although it shows few failures on my machine
      regardless the patch).
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Matthew Wilcox <willy@linux.intel.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Mel Gorman <mgorman@suse.de>
      Link: http://lkml.kernel.org/r/20131119131750.EA45CE0090@blue.fi.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      fd8526ad
    • C
      x86/mm: Unify pte_to_pgoff() and pgoff_to_pte() helpers · 5305ca10
      Cyrill Gorcunov 提交于
      Use unified pte_bitop() helper to manipulate bits in pte/pgoff
      bitfields and convert pte_to_pgoff()/pgoff_to_pte() to inlines.
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      5305ca10
  18. 15 11月, 2013 2 次提交
    • K
      x86, mm: enable split page table lock for PMD level · 9491846f
      Kirill A. Shutemov 提交于
      Enable PMD split page table lock for X86_64 and PAE.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Tested-by: NAlex Thorlton <athorlton@sgi.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: "Paul E . McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Andi Kleen <ak@linux.intel.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Robin Holt <robinmholt@gmail.com>
      Cc: Sedat Dilek <sedat.dilek@gmail.com>
      Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Hugh Dickins <hughd@google.com>
      Reviewed-by: NSteven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9491846f
    • R
      ACPI / driver core: Store an ACPI device pointer in struct acpi_dev_node · 7b199811
      Rafael J. Wysocki 提交于
      Modify struct acpi_dev_node to contain a pointer to struct acpi_device
      associated with the given device object (that is, its ACPI companion
      device) instead of an ACPI handle corresponding to it.  Introduce two
      new macros for manipulating that pointer in a CONFIG_ACPI-safe way,
      ACPI_COMPANION() and ACPI_COMPANION_SET(), and rework the
      ACPI_HANDLE() macro to take the above changes into account.
      Drop the ACPI_HANDLE_SET() macro entirely and rework its users to
      use ACPI_COMPANION_SET() instead.  For some of them who used to
      pass the result of acpi_get_child() directly to ACPI_HANDLE_SET()
      introduce a helper routine acpi_preset_companion() doing an
      equivalent thing.
      
      The main motivation for doing this is that there are things
      represented by struct acpi_device objects that don't have valid
      ACPI handles (so called fixed ACPI hardware features, such as
      power and sleep buttons) and we would like to create platform
      device objects for them and "glue" them to their ACPI companions
      in the usual way (which currently is impossible due to the
      lack of valid ACPI handles).  However, there are more reasons
      why it may be useful.
      
      First, struct acpi_device pointers allow of much better type checking
      than void pointers which are ACPI handles, so it should be more
      difficult to write buggy code using modified struct acpi_dev_node
      and the new macros.  Second, the change should help to reduce (over
      time) the number of places in which the result of ACPI_HANDLE() is
      passed to acpi_bus_get_device() in order to obtain a pointer to the
      struct acpi_device associated with the given "physical" device,
      because now that pointer is returned by ACPI_COMPANION() directly.
      Finally, the change should make it easier to write generic code that
      will build both for CONFIG_ACPI set and unset without adding explicit
      compiler directives to it.
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Mika Westerberg <mika.westerberg@linux.intel.com> # on Haswell
      Reviewed-by: NMika Westerberg <mika.westerberg@linux.intel.com>
      Reviewed-by: Aaron Lu <aaron.lu@intel.com> # for ATA and SDIO part
      7b199811
  19. 14 11月, 2013 1 次提交
  20. 13 11月, 2013 2 次提交
    • V
      x86: move fpu_counter into ARCH specific thread_struct · c375f15a
      Vineet Gupta 提交于
      Only a couple of arches (sh/x86) use fpu_counter in task_struct so it can
      be moved out into ARCH specific thread_struct, reducing the size of
      task_struct for other arches.
      
      Compile tested i386_defconfig + gcc 4.7.3
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Acked-by: NIngo Molnar <mingo@kernel.org>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c375f15a
    • J
      x86/dumpstack: Fix printk_address for direct addresses · 5f01c988
      Jiri Slaby 提交于
      Consider a kernel crash in a module, simulated the following way:
      
       static int my_init(void)
       {
               char *map = (void *)0x5;
               *map = 3;
               return 0;
       }
       module_init(my_init);
      
      When we turn off FRAME_POINTERs, the very first instruction in
      that function causes a BUG. The problem is that we print IP in
      the BUG report using %pB (from printk_address). And %pB
      decrements the pointer by one to fix printing addresses of
      functions with tail calls.
      
      This was added in commit 71f9e598 ("x86, dumpstack: Use
      %pB format specifier for stack trace") to fix the call stack
      printouts.
      
      So instead of correct output:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa01ac000>] my_init+0x0/0x10 [pb173]
      
      We get:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000005
        IP: [<ffffffffa0152000>] 0xffffffffa0151fff
      
      To fix that, we use %pS only for stack addresses printouts (via
      newly added printk_stack_address) and %pB for regs->ip (via
      printk_address). I.e. we revert to the old behaviour for all
      except call stacks. And since from all those reliable is 1, we
      remove that parameter from printk_address.
      Signed-off-by: NJiri Slaby <jslaby@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: joe@perches.com
      Cc: jirislaby@gmail.com
      Link: http://lkml.kernel.org/r/1382706418-8435-1-git-send-email-jslaby@suse.czSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5f01c988
  21. 12 11月, 2013 2 次提交
  22. 09 11月, 2013 2 次提交