1. 15 December 2012 (1 commit)
  2. 12 December 2012 (3 commits)
    • mm: use vm_unmapped_area() on x86_64 architecture · f9902472
      Authored by Michel Lespinasse
      Update the x86_64 arch_get_unmapped_area[_topdown] functions to make use
      of vm_unmapped_area() instead of implementing a brute force search.
      Signed-off-by: Michel Lespinasse <walken@google.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      f9902472
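      A minimal sketch (not the actual patch) of how an arch_get_unmapped_area()
      implementation can hand the search to vm_unmapped_area() through
      struct vm_unmapped_area_info; the limits and alignment values below are
      illustrative assumptions, not the real x86_64 ones.

       /* Hedged sketch: delegate the free-area search to vm_unmapped_area()
        * instead of walking the VMA list by hand. */
       unsigned long
       example_get_unmapped_area(struct file *filp, unsigned long addr,
                                 unsigned long len, unsigned long pgoff,
                                 unsigned long flags)
       {
               struct vm_unmapped_area_info info;

               info.flags = 0;              /* bottom-up; VM_UNMAPPED_AREA_TOPDOWN for top-down */
               info.length = len;
               info.low_limit = TASK_UNMAPPED_BASE;
               info.high_limit = TASK_SIZE;
               info.align_mask = 0;         /* e.g. a huge-page alignment mask if required */
               info.align_offset = pgoff << PAGE_SHIFT;
               return vm_unmapped_area(&info);
       }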
    • mm: support more pagesizes for MAP_HUGETLB/SHM_HUGETLB · 42d7395f
      Authored by Andi Kleen
      There was some desire in large applications using MAP_HUGETLB or
      SHM_HUGETLB to use 1GB huge pages on some mappings, and stay with 2MB on
      others.  This is useful together with NUMA policy: use 2MB interleaving
      on some mappings, but 1GB on local mappings.
      
      This patch extends the IPC/SHM syscall interfaces slightly to allow
      specifying the page size.
      
      It borrows some upper bits in the existing flag arguments and allows
      encoding the log of the desired page size in addition to the *_HUGETLB
      flag.  When 0 is specified, the default size is used, which keeps the
      change fully compatible.
      
      Extending the internal hugetlb code to handle this is straightforward.
      Instead of a single mount it just keeps an array of them and selects the
      right mount based on the specified page size.  When no page size is
      specified it uses the mount of the default page size.
      
      The change is not visible in /proc/mounts because internal mounts don't
      appear there.  It also has very little overhead: the additional mounts
      just consume a super block, but not more memory when not used.
      
      I also exported the new flags to the user headers (they were previously
      under __KERNEL__).  Right now only the 1GB and 2MB symbols for x86 and a
      few other architectures are defined.  The interface should already work
      for all other architectures, though.  Only architectures that define
      multiple hugetlb sizes actually need it (currently x86, tile and
      powerpc).  However, tile and powerpc have user-configurable hugetlb
      sizes, so it is not easy to add defines.  A program on those
      architectures would need to query sysfs and use the appropriate log2.
      
      [akpm@linux-foundation.org: cleanups]
      [rientjes@google.com: fix build]
      [akpm@linux-foundation.org: checkpatch fixes]
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Acked-by: Rik van Riel <riel@redhat.com>
      Acked-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      42d7395f
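      A hedged userspace example of the interface described above. It assumes
      the MAP_HUGE_* encoding exported by this patch (log2 of the page size
      shifted by MAP_HUGE_SHIFT); the fallback defines are used in case the
      installed headers do not provide the constants yet.

       #define _GNU_SOURCE
       #include <stdio.h>
       #include <sys/mman.h>

       #ifndef MAP_HUGE_SHIFT            /* exported by this patch; fall back if absent */
       #define MAP_HUGE_SHIFT 26
       #endif
       #ifndef MAP_HUGE_1GB
       #define MAP_HUGE_1GB (30 << MAP_HUGE_SHIFT)   /* log2(1GB) = 30 */
       #endif

       int main(void)
       {
               size_t len = 2UL << 30;   /* 2GB backed by 1GB huge pages */
               void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_HUGE_1GB,
                              -1, 0);

               if (p == MAP_FAILED) {
                       perror("mmap(MAP_HUGETLB | MAP_HUGE_1GB)");  /* needs 1GB page support */
                       return 1;
               }
               /* 2MB pages would use (21 << MAP_HUGE_SHIFT), i.e. MAP_HUGE_2MB. */
               munmap(p, len);
               return 0;
       }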
    • x86/kexec: crash_vmclear_local_vmcss needs __rcu · 0ca0d818
      Authored by Zhang Yanfei
      This removes the sparse warning:
      arch/x86/kernel/crash.c:49:32: sparse: incompatible types in comparison expression (different address spaces)
      Reported-by: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      0ca0d818
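      For context, a hedged illustration of the sparse annotation involved:
      an RCU-managed pointer has to be declared __rcu so that comparisons and
      dereferences type-check against the right address space. The names below
      are made up for illustration, not the actual kexec/KVM symbols.

       #include <linux/rcupdate.h>

       struct handler {
               void (*fn)(void);
       };

       /* __rcu places the pointer in the RCU address space for sparse. */
       static struct handler __rcu *active_handler;

       static void call_active_handler(void)
       {
               struct handler *h;

               rcu_read_lock();
               h = rcu_dereference(active_handler);   /* checked __rcu -> plain conversion */
               if (h)
                       h->fn();
               rcu_read_unlock();
       }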
  3. 08 December 2012 (1 commit)
  4. 07 December 2012 (1 commit)
  5. 06 December 2012 (1 commit)
  6. 05 December 2012 (1 commit)
  7. 01 December 2012 (4 commits)
    • x86, fpu: Avoid FPU lazy restore after suspend · 644c1541
      Authored by Vincent Palatin
      When a cpu enters S3 state, the FPU state is lost.
      After resuming from S3, if we try to lazily restore the FPU for a process
      running on the same CPU, this will result in a corrupted FPU context.
      
      Ensure that "fpu_owner_task" is properly invalidated when (re-)initializing a CPU,
      so nobody will try to lazy restore a state which doesn't exist in the hardware.
      
      Tested with a 64-bit kernel on a 4-core Ivybridge CPU with eagerfpu=off,
      by doing thousands of suspend/resume cycles with 4 processes doing FPU
      operations running.  Without the patch, a process is killed by a SIGFPE
      after a few hundred cycles.
      
      Cc: Duncan Laurie <dlaurie@chromium.org>
      Cc: Olof Johansson <olofj@chromium.org>
      Cc: <stable@kernel.org> v3.4+ # for 3.4 need to replace this_cpu_write by percpu_write
      Signed-off-by: Vincent Palatin <vpalatin@chromium.org>
      Link: http://lkml.kernel.org/r/1354306532-1014-1-git-send-email-vpalatin@chromium.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      644c1541
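      A minimal sketch of the invalidation this commit describes; the exact hook
      is an assumption here (the stable note above hints that this_cpu_write()
      is used on newer kernels and percpu_write() on 3.4).

       /* Sketch: when a CPU is (re-)initialized after S3, forget the lazy-FPU
        * owner so no task tries to "restore" state the hardware has lost. */
       static void example_fpu_reinit_cpu(void)
       {
               this_cpu_write(fpu_owner_task, NULL);   /* no task owns this CPU's FPU */
       }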
    • KVM: x86: Emulate IA32_TSC_ADJUST MSR · ba904635
      Authored by Will Auld
      CPUID.7.0.EBX[1]=1 indicates IA32_TSC_ADJUST MSR 0x3b is supported
      
      Basic design is to emulate the MSR by allowing reads and writes to a guest
      vcpu specific location to store the value of the emulated MSR while adding
      the value to the vmcs tsc_offset. In this way the IA32_TSC_ADJUST value will
      be included in all reads to the TSC MSR whether through rdmsr or rdtsc. This
      is of course as long as the "use TSC counter offsetting" VM-execution control
      is enabled as well as the IA32_TSC_ADJUST control.
      
      However, because hardware will only return the TSC + IA32_TSC_ADJUST +
      vmcs tsc_offset for a guest process when it does an rdtsc (with the correct
      settings), the value of our virtualized IA32_TSC_ADJUST must be stored in one
      of these three locations. The argument against storing it in the actual MSR
      is performance. This is likely to be seldom used while the save/restore is
      required on every transition. IA32_TSC_ADJUST was created as a way to solve
      some issues with writing TSC itself so that is not an option either.
      
      The remaining option, described above as our solution, has the problem of
      returning incorrect vmcs tsc_offset values (unless we intercept and fix, not
      done here) as mentioned above. However, more problematic is that storing the
      data in vmcs tsc_offset will have a different semantic effect on the system
      than does using the actual MSR. This is illustrated in the following example:
      
      The hypervisor sets IA32_TSC_ADJUST, then the guest sets it and a guest
      process performs an rdtsc.  In this case the guest process will get
      TSC + IA32_TSC_ADJUST_hypervisor + vmcs tsc_offset, which includes
      IA32_TSC_ADJUST_guest.  While the total system semantics change, the
      semantics as seen by the guest do not, and hence this will not cause a
      problem.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      ba904635
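      A hedged sketch of the emulation approach described above: keep the
      guest's IA32_TSC_ADJUST value per vcpu and fold writes into the hardware
      TSC offset. Field and helper names are illustrative, not the exact KVM
      ones.

       #define MSR_IA32_TSC_ADJUST 0x0000003b

       struct example_vcpu {
               u64 ia32_tsc_adjust_msr;    /* per-vcpu backing store for the MSR */
       };

       static u64 example_get_tsc_adjust(struct example_vcpu *vcpu)
       {
               return vcpu->ia32_tsc_adjust_msr;
       }

       static void example_set_tsc_adjust(struct example_vcpu *vcpu, u64 data)
       {
               /* Apply the delta to the vmcs tsc_offset so guest rdtsc/rdmsr
                * observe TSC + IA32_TSC_ADJUST, then remember the new value. */
               s64 delta = data - vcpu->ia32_tsc_adjust_msr;

               example_adjust_vmcs_tsc_offset(vcpu, delta);   /* hypothetical helper */
               vcpu->ia32_tsc_adjust_msr = data;
       }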
    • KVM: x86: Add code to track call origin for msr assignment · 8fe8ab46
      Authored by Will Auld
      In order to track who initiated the call (host or guest) to modify an msr
      value I have changed function call parameters along the call path. The
      specific change is to add a struct pointer parameter that points to (index,
      data, caller) information rather than having this information passed as
      individual parameters.
      
      The initial use for this capability is for updating the IA32_TSC_ADJUST msr
      while setting the tsc value. It is anticipated that this capability is
      useful for other tasks.
      Signed-off-by: Will Auld <will.auld@intel.com>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      8fe8ab46
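      The (index, data, caller) information the message describes maps naturally
      onto a small struct passed down the call path; a sketch along those lines
      (the exact field names in KVM may differ):

       struct msr_data {
               bool host_initiated;   /* true for host ioctls, false for guest wrmsr */
               u32  index;            /* MSR number */
               u64  data;             /* value being written */
       };

       /* MSR-write paths then take one struct instead of separate parameters: */
       int example_set_msr(struct kvm_vcpu *vcpu, struct msr_data *msr);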
    • context_tracking: New context tracking subsystem · 91d1aa43
      Authored by Frederic Weisbecker
      Create a new subsystem that probes kernel boundaries to keep track of
      the transitions between context levels, with two basic initial contexts:
      user or kernel.
      
      This is an abstraction of some RCU code that uses such tracking
      to implement its userspace extended quiescent state.
      
      We need to pull this up from RCU into this new level of indirection
      because this tracking is also going to be used to implement an "on
      demand" generic virtual cputime accounting, a necessary step to
      shut down the tick while still accounting the cputime.
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Li Zhong <zhong@linux.vnet.ibm.com>
      Cc: Gilad Ben-Yossef <gilad@benyossef.com>
      Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
      [ paulmck: fix whitespace error and email address. ]
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      91d1aa43
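      A hedged sketch of how the boundary probes are meant to be used: entry
      and exit paths bracket user/kernel transitions so the subsystem (and
      RCU's extended quiescent state) always knows which context a CPU is in.
      The placement below is illustrative, not the actual entry code.

       #include <linux/context_tracking.h>

       static void example_syscall_entry(void)
       {
               user_exit();    /* leaving userspace: tell context tracking / RCU */
               /* ... handle the syscall ... */
       }

       static void example_return_to_user(void)
       {
               /* ... kernel work finished ... */
               user_enter();   /* about to resume userspace: enter user context */
       }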
  8. 30 November 2012 (7 commits)
  9. 29 November 2012 (6 commits)
  10. 28 November 2012 (9 commits)
  11. 21 November 2012 (1 commit)
    • x86-32: Fix invalid stack address while in softirq · 10226238
      Authored by Robert Richter
      On 32-bit, the stack address provided by kernel_stack_pointer() may
      point to an invalid range causing NULL pointer access or page faults
      while in NMI (see trace below). This happens if called in softirq
      context and if the stack is empty. The address at &regs->sp is then
      out of range.
      
      Fix this by checking whether regs and &regs->sp are in the same stack
      context.  Otherwise, return the previous stack pointer stored in struct
      thread_info.  If that address is invalid too, return the address of regs.
      
       BUG: unable to handle kernel NULL pointer dereference at 0000000a
       IP: [<c1004237>] print_context_stack+0x6e/0x8d
       *pde = 00000000
       Oops: 0000 [#1] SMP
       Modules linked in:
       Pid: 4434, comm: perl Not tainted 3.6.0-rc3-oprofile-i386-standard-g4411a05 #4 Hewlett-Packard HP xw9400 Workstation/0A1Ch
       EIP: 0060:[<c1004237>] EFLAGS: 00010093 CPU: 0
       EIP is at print_context_stack+0x6e/0x8d
       EAX: ffffe000 EBX: 0000000a ECX: f4435f94 EDX: 0000000a
       ESI: f4435f94 EDI: f4435f94 EBP: f5409ec0 ESP: f5409ea0
        DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068
       CR0: 8005003b CR2: 0000000a CR3: 34ac9000 CR4: 000007d0
       DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
       DR6: ffff0ff0 DR7: 00000400
       Process perl (pid: 4434, ti=f5408000 task=f5637850 task.ti=f4434000)
       Stack:
        000003e8 ffffe000 00001ffc f4e39b00 00000000 0000000a f4435f94 c155198c
        f5409ef0 c1003723 c155198c f5409f04 00000000 f5409edc 00000000 00000000
        f5409ee8 f4435f94 f5409fc4 00000001 f5409f1c c12dce1c 00000000 c155198c
       Call Trace:
        [<c1003723>] dump_trace+0x7b/0xa1
        [<c12dce1c>] x86_backtrace+0x40/0x88
        [<c12db712>] ? oprofile_add_sample+0x56/0x84
        [<c12db731>] oprofile_add_sample+0x75/0x84
        [<c12ddb5b>] op_amd_check_ctrs+0x46/0x260
        [<c12dd40d>] profile_exceptions_notify+0x23/0x4c
        [<c1395034>] nmi_handle+0x31/0x4a
        [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
        [<c13950ed>] do_nmi+0xa0/0x2ff
        [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
        [<c13949e5>] nmi_stack_correct+0x28/0x2d
        [<c1029dc5>] ? ftrace_define_fields_irq_handler_entry+0x45/0x45
        [<c1003603>] ? do_softirq+0x4b/0x7f
        <IRQ>
        [<c102a06f>] irq_exit+0x35/0x5b
        [<c1018f56>] smp_apic_timer_interrupt+0x6c/0x7a
        [<c1394746>] apic_timer_interrupt+0x2a/0x30
       Code: 89 fe eb 08 31 c9 8b 45 0c ff 55 ec 83 c3 04 83 7d 10 00 74 0c 3b 5d 10 73 26 3b 5d e4 73 0c eb 1f 3b 5d f0 76 1a 3b 5d e8 73 15 <8b> 13 89 d0 89 55 e0 e8 ad 42 03 00 85 c0 8b 55 e0 75 a6 eb cc
       EIP: [<c1004237>] print_context_stack+0x6e/0x8d SS:ESP 0068:f5409ea0
       CR2: 000000000000000a
       ---[ end trace 62afee3481b00012 ]---
       Kernel panic - not syncing: Fatal exception in interrupt
      
      V2:
      * add comments to kernel_stack_pointer()
      * always return a valid stack address by falling back to the address
        of regs
      Reported-by: Yang Wei <wei.yang@windriver.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Robert Richter <robert.richter@amd.com>
      Link: http://lkml.kernel.org/r/20120912135059.GZ8285@erda.amd.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      Cc: Jun Zhang <jun.zhang@intel.com>
      10226238
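      A hedged sketch of the check described in the message: &regs->sp is only
      trusted if it lies on the same THREAD_SIZE-aligned stack as regs itself,
      otherwise fall back to a previously saved stack pointer and finally to
      regs. The location of the saved pointer is an assumption here, not the
      exact implementation.

       static unsigned long example_kernel_stack_pointer(struct pt_regs *regs)
       {
               unsigned long context = (unsigned long)regs & ~(THREAD_SIZE - 1);
               unsigned long sp = (unsigned long)&regs->sp;
               unsigned long *prev_esp;

               if ((sp & ~(THREAD_SIZE - 1)) == context)
                       return sp;                     /* same stack: address is valid */

               prev_esp = (unsigned long *)context;   /* assumed save slot at stack base */
               if (*prev_esp)
                       return *prev_esp;              /* previously recorded stack pointer */

               return (unsigned long)regs;            /* always-valid fallback */
       }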
  12. 19 November 2012 (1 commit)
    • x86_32: Return actual stack when requesting sp from regs · 6c8d8b3c
      Authored by Steven Rostedt
      As x86_32 traps do not save sp when taken in kernel mode, we need to
      accommodate this when the sp register is requested.
      
      This affects kprobes.
      
      Before:
      
       # echo 'p:ftrace sys_read+4 s=%sp' > /debug/tracing/kprobe_events
       # echo 1 > /debug/tracing/events/kprobes/enable
       # cat trace
                  sshd-1345  [000] d...   489.117168: ftrace: (sys_read+0x4/0x70) s=b7e96768
                  sshd-1345  [000] d...   489.117191: ftrace: (sys_read+0x4/0x70) s=b7e96768
                   cat-1447  [000] d...   489.117392: ftrace: (sys_read+0x4/0x70) s=5a7
                   cat-1447  [001] d...   489.118023: ftrace: (sys_read+0x4/0x70) s=b77ad05f
                  less-1448  [000] d...   489.118079: ftrace: (sys_read+0x4/0x70) s=b7762e06
                  less-1448  [000] d...   489.118117: ftrace: (sys_read+0x4/0x70) s=b7764970
      
      After:
                  sshd-1352  [000] d...   362.348016: ftrace: (sys_read+0x4/0x70) s=f3febfa8
                  sshd-1352  [000] d...   362.348048: ftrace: (sys_read+0x4/0x70) s=f3febfa8
                  bash-1355  [001] d...   362.348081: ftrace: (sys_read+0x4/0x70) s=f5075fa8
                  sshd-1352  [000] d...   362.348082: ftrace: (sys_read+0x4/0x70) s=f3febfa8
                  sshd-1352  [000] d...   362.690950: ftrace: (sys_read+0x4/0x70) s=f3febfa8
                  bash-1355  [001] d...   362.691033: ftrace: (sys_read+0x4/0x70) s=f5075fa8
      
      Link: http://lkml.kernel.org/r/1342208654.30075.22.camel@gandalf.stny.rr.com
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      6c8d8b3c
  13. 15 November 2012 (4 commits)
    • x86, topology: Debug CPU0 hotplug · a71c8bc5
      Authored by Fenghua Yu
      CONFIG_DEBUG_HOTPLUG_CPU0 is for debugging the CPU0 hotplug feature. The switch
      offlines CPU0 as soon as possible and boots userspace up with CPU0 offlined.
      The user can bring CPU0 back online after boot.  The default value of the switch is
      off.
      
      To debug CPU0 hotplug, you need to enable the CPU0 offline/online feature
      by either turning on CONFIG_BOOTPARAM_HOTPLUG_CPU0 during compilation or
      passing the cpu0_hotplug kernel parameter at boot.
      
      This is a safe and early place to take down CPU0, after all hotplug
      notifiers are installed and SMP is booted.
      
      Please note that some applications or drivers, e.g. some versions of udevd,
      during boot time may put CPU0 online again in this CPU0 hotplug debug mode.
      
      In this debug mode, setup_local_APIC() may report a warning on max_loops<=0
      when CPU0 is onlined back after boot time.  This is because a pending
      interrupt in the IRR cannot move to the ISR.  The warning is not CPU0
      specific and it can happen on other CPUs as well.  It is harmless except
      that the first CPU0 online takes a bit longer.  So this debug mode is
      useful to expose the issue; I'll send a separate patch to fix this generic
      warning.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Link: http://lkml.kernel.org/r/1352835171-3958-15-git-send-email-fenghua.yu@intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      a71c8bc5
    • x86, hotplug: Wake up CPU0 via NMI instead of INIT, SIPI, SIPI · e1c467e6
      Authored by Fenghua Yu
      Instead of waiting for STARTUP after INIT, the BSP will execute the BIOS
      boot-strap code, which is not the desired behavior for waking up the BSP.
      To avoid the boot-strap code, wake up CPU0 by NMI instead.
      
      This works to wake up a soft-offlined CPU0 only.  If CPU0 is hard offlined
      (i.e. physically hot removed and then hot added), NMI won't wake it up.
      We'll change this code in the future to wake up a hard-offlined CPU0 if a
      real platform and request are available.
      
      APs are still woken up as before by the INIT, SIPI, SIPI sequence.
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Link: http://lkml.kernel.org/r/1352896613-25957-1-git-send-email-fenghua.yu@intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      e1c467e6
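      A hedged sketch of the wakeup path described above; NMI_VECTOR is
      special-cased by the x86 APIC code to use NMI delivery mode, but the
      helper name and the surrounding synchronization are illustrative only.

       static void example_wakeup_cpu0_via_nmi(void)
       {
               /* Send an NMI to CPU0 instead of the INIT/SIPI/SIPI sequence,
                * which would run the BSP through the BIOS boot-strap code. */
               apic->send_IPI_mask(cpumask_of(0), NMI_VECTOR);
       }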
    • driver core / ACPI: Move ACPI support to core device and driver types · 06f64c8f
      Authored by Mika Westerberg
      With ACPI 5 we are starting to see devices that don't natively support
      discovery but can be enumerated with the help of the ACPI namespace.
      Typically, these devices can be represented in the Linux device driver
      model as platform devices or some serial bus devices, like SPI or I2C
      devices.
      
      Since we want to re-use existing drivers for those devices, we need a
      way for drivers to specify the ACPI IDs of supported devices, so that
      they can be matched against device nodes in the ACPI namespace.  To
      this end, it is sufficient to add a pointer to an array of supported
      ACPI device IDs, that can be provided by the driver, to struct device.
      
      Moreover, things like ACPI power management need to have access to
      the ACPI handle of each supported device, because that handle is used
      to invoke AML methods associated with the corresponding ACPI device
      node.  The ACPI handles of devices are now stored in the archdata
      member structure of struct device whose definition depends on the
      architecture and includes the ACPI handle only on x86 and ia64. Since
      the pointer to an array of supported ACPI IDs is added to struct
      device_driver in an architecture-independent way, it is logical to
      move the ACPI handle from archdata to struct device itself at the same
      time.  This also makes code more straightforward in some places and
      follows the example of Device Trees that have a pointer to struct
      device_node in there too.
      
      This changeset is based on Mika Westerberg's work.
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      06f64c8f
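      A hedged example of how a driver can declare the ACPI IDs it supports
      once the acpi_match_table pointer lives in struct device_driver; the
      device ID string and driver name below are made up for illustration.

       #include <linux/module.h>
       #include <linux/platform_device.h>
       #include <linux/acpi.h>

       static const struct acpi_device_id example_acpi_ids[] = {
               { "ABCD0001", 0 },      /* hypothetical ACPI _HID */
               { }
       };
       MODULE_DEVICE_TABLE(acpi, example_acpi_ids);

       static struct platform_driver example_driver = {
               .driver = {
                       .name             = "example",
                       .acpi_match_table = ACPI_PTR(example_acpi_ids),
               },
               /* .probe / .remove omitted in this sketch */
       };
       module_platform_driver(example_driver);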
    • x86, hotplug: Support functions for CPU0 online/offline · 30106c17
      Authored by Fenghua Yu
      Add smp_store_boot_cpu_info() to store cpu info for BSP during boot time.
      
      Now smp_store_cpu_info() stores cpu info for bringing up the BSP or an AP
      after it has been offlined.
      
      Continue to online CPU0 in native_cpu_up().
      
      Continue to offline CPU0 in native_cpu_disable().
      Signed-off-by: Fenghua Yu <fenghua.yu@intel.com>
      Link: http://lkml.kernel.org/r/1352835171-3958-5-git-send-email-fenghua.yu@intel.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      30106c17