1. 03 May 2013, 1 commit
    • x86, gdt, hibernate: Store/load GDT for hibernate path. · cc456c4e
      Authored by Konrad Rzeszutek Wilk
      The git commit e7a5cd06 ("x86-64, gdt: Store/load GDT for ACPI S3
      or hibernate/resume path is not needed.") assumes that for the
      hibernate path the booting
      kernel and the resuming kernel MUST be the same. That is certainly
      the case for a 32-bit kernel (see check_image_kernel and
      CONFIG_ARCH_HIBERNATION_HEADER config option).
      
      However, for 64-bit kernels it is OK for the booting and resuming
      kernels to differ in version (and in image size).
      Hence the above-mentioned git commit introduces a regression.
      
      This patch fixes it by reintroducing a 'struct desc_ptr gdt_desc'
      in 'struct saved_context'. However, instead of pairing store/load_gdt
      calls in 'save_processor_state' and 'restore_processor_state', we now
      only save the GDT in save_processor_state.
      
      For the restore path the lgdt operation is done in
      hibernate_asm_[32|64].S in the 'restore_registers' path.
      
      The attentive reader of this description will recognize that only
      64-bit kernels need this treatment, not 32-bit ones. This patch adds
      the logic to the 32-bit path to make it more similar to the 64-bit
      one, so that the future unification process can take advantage of it.
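
      A minimal sketch of the save-side shape this describes - assuming
      the kernel's store_gdt() SGDT wrapper and eliding the rest of the
      struct (illustrative, not the exact diff):

          #include <asm/desc.h>              /* struct desc_ptr, store_gdt() */

          struct saved_context {
                  /* ... other saved CPU state ... */
                  struct desc_ptr gdt_desc;  /* reintroduced: GDT base+limit */
                  /* ... */
          };

          static void __save_processor_state(struct saved_context *ctxt)
          {
                  /* ... */
                  store_gdt(&ctxt->gdt_desc); /* SGDT: record the current GDT */
                  /*
                   * Deliberately no matching load_gdt() in
                   * restore_processor_state(): the lgdt happens earlier,
                   * in restore_registers in hibernate_asm_[32|64].S.
                   */
          }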
      
      [ hpa: this also reverts an inadvertent on-disk format change ]
      Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  2. 12 Apr 2013, 3 commits
    • x86, wakeup, sleep: Use pvops functions for changing GDT entries · 4d681be3
      Authored by konrad@kernel.org
      We check the TSS descriptor before we try to dereference it.
      We also document what the value '9' actually means, citing the
      AMD64 Architecture Programmer's Manual Volume 2, pg 90:
      "Hex value 9: Available 64-bit TSS" and pg 91:
      "The available 32-bit TSS (09h), which is redefined as the
      available 64-bit TSS."
      
      Without this, on Xen, where the GDT is available as R/O (to
      protect the hypervisor from the guest modifying it), we end up
      with a pagetable fault.
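
      A hedged sketch of the pattern, modeled on the fix_processor_context()
      flow (names approximate, treat as illustrative): the TSS descriptor
      type is reset to 0x9 and written back through the pvops accessor, so
      on Xen the update becomes a hypercall rather than a direct - and
      faulting - store into the read-only GDT:

          #ifdef CONFIG_X86_64
                  struct desc_struct *desc = get_cpu_gdt_table(cpu);
                  tss_desc tss;

                  /* LTR marks the TSS busy (type 0xB); reset it to 0x9,
                   * the "available 64-bit TSS" (AMD64 APM Vol 2). */
                  memcpy(&tss, &desc[GDT_ENTRY_TSS], sizeof(tss_desc));
                  tss.type = 0x9;         /* available 64-bit TSS */

                  write_gdt_entry(desc, GDT_ENTRY_TSS, &tss, DESC_TSS);
          #endif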
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1365194544-14648-5-git-send-email-konrad.wilk@oracle.com
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed · 84e70971
      Authored by Konrad Rzeszutek Wilk
      During the ACPI S3 suspend, we store the GDT in the wakeup_header
      (see wakeup_asm.S) field called 'pmode_gdt', which is then used
      during the resume path and holds exactly the same value as what
      store/load_gdt do with the saved_context (which is saved/restored
      via save/restore_processor_state()).
      
      The flow during resume from ACPI S3 is simpler than its 64-bit
      counterpart. We use the early bootstrap GDT ('wakeup_gdt') only once
      and do various checks in real mode.
      
      After the checks are completed, we load the saved GDT ('pmode_gdt') and
      continue on with the resume (by heading to startup_32 in trampoline_32.S) -
      which quickly jumps to what was saved in 'pmode_entry'
      aka 'wakeup_pmode_return'.
      
      The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which
      was saved in do_suspend_lowlevel initially). After that it ends up
      calling 'ret_point', which calls 'restore_processor_state()'.
      
      We have two opportunities to remove code where we restore the same GDT
      twice.
      
      Here is the call chain:
       wakeup_start
             |- lgdtl wakeup_gdt [the workaround for broken BIOSes]
             |
             | - lgdtl pmode_gdt [the real one]
             |
             \-- startup_32 (in trampoline_32.S)
                    \-- wakeup_pmode_return (in wakeup_32.S)
                             |- lgdtl saved_gdt [the real one]
                             \-- ret_point
                                   |..
                                   |- call restore_processor_state
      
      The hibernate path is much simpler. During the saving of the hibernation
      image we call save_processor_state() and save the contents of that
      along with the rest of the kernel in the hibernation image destination.
      We save the EIP of 'restore_registers' (restore_jump_address) and
      cr3 (restore_cr3).
      
      During hibernate resume, the 'restore_registers' (via the
      'restore_jump_address') in hibernate_asm_32.S is invoked, which
      restores the contents of most registers. Naturally the resume path
      benefits
      from already being in 32-bit mode, so it does not have to reload the GDT.
      
      It only reloads the cr3 (from restore_cr3) and continues on. Note
      that the restoration of the restore image page-tables is done prior to
      this.
      
      After 'restore_registers' returns, we end up calling
      restore_processor_state() - where we reload the GDT. The reload of
      the GDT is not needed, as the boot kernel has already loaded a GDT
      at the same physical location as the restored kernel's.
      
      Note that the hibernation path assumes the GDT is correct during its
      'restore_registers'. The assumption in the code is that the restored
      image is the same as the one saved - meaning we are not trying to
      restore a different kernel in the virtual address space of a new
      kernel.
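
      Sketched in code, the cleanup this enables (hedged: field names and
      the surrounding restore logic are approximate):

          static void __restore_processor_state(struct saved_context *ctxt)
          {
                  /* ... control and segment registers ... */

                  /* load_gdt(&ctxt->gdt); -- removable: by this point the
                   * correct GDT is already loaded (pmode_gdt/saved_gdt on
                   * the S3 path, the boot kernel's own GDT on the
                   * hibernate path) */

                  load_idt(&ctxt->idt);
                  /* ... */
          }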
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1365194544-14648-3-git-send-email-konrad.wilk@oracle.com
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. · e7a5cd06
      Authored by Konrad Rzeszutek Wilk
      During the ACPI S3 resume path, the trampoline code already handles
      the GDT loading.
      
      During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:
      early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());
      
      which is then used during the resume path and holds exactly the same
      value as what store/load_gdt do with the saved_context
      (which is saved/restored via save/restore_processor_state()).
      
      The flow during resume is complex; for 64-bit kernels we use three
      GDTs - one early bootstrap GDT ('wakeup_gdt') that we load to work
      around broken BIOSes, an early Protected Mode to Long Mode transition
      one (tr_gdt), and the final one - early_gdt_descr (which points to
      the real GDT).
      
      The early one ('wakeup_gdt') is loaded in 'trampoline_start' to work
      around broken BIOSes; then, when we end up in Protected Mode in
      startup_32 (in trampoline_64.S, not head_32.S), we use the 'tr_gdt'
      (still in trampoline_64.S). This 'tr_gdt' has a 32-bit code segment,
      a 64-bit code segment with L=1, and a 32-bit data segment.
      
      Once we have transitioned from Protected Mode to Long Mode, we set
      the GDT to 'early_gdt_descr' and then, via an lretq, emerge in
      wakeup_long64 (set via the 'initial_code' variable in
      acpi_suspend_lowlevel).
      
      In wakeup_long64 we end up restoring %rip (which is set to
      'resume_point') and jump there.
      
      In 'resume_point' we call 'restore_processor_state', which does
      the load_gdt on the saved context. This load_gdt is redundant, as
      the GDT loaded via early_gdt_descr is the same.
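
      The redundancy in one place, sketched (field names follow the
      commit-era code; both sides resolve to the same per-cpu GDT, so the
      second lgdt is a no-op):

          /* suspend side, acpi_suspend_lowlevel(): */
          early_gdt_descr.address =
                  (unsigned long)get_cpu_gdt_table(smp_processor_id());

          /* resume side, __restore_processor_state() -- now removable: */
          load_gdt(&ctxt->gdt);  /* same base secondary_startup_64 loaded */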
      
      Here is the call-chain:
       wakeup_start
         |- lgdtl wakeup_gdt [the workaround for broken BIOSes]
         |
         \-- trampoline_start (trampoline_64.S)
               |- lgdtl tr_gdt
               |
               \-- startup_32 (trampoline_64.S)
                     |
                     \-- startup_64 (trampoline_64.S)
                            |
                            \-- secondary_startup_64
                                     |- lgdtl early_gdt_desc
                                     | ...
                                     |- movq initial_code(%rip), %rax
                                     |-.. lretq
                                     \-- wakeup_64
                                           |-- other registers are reloaded
                                           |-- call restore_processor_state
      
      The hibernate path is much simpler. During the saving of the hibernation
      image we call save_processor_state() and save the contents of that along
      with the rest of the kernel in the hibernation image destination.
      We save the EIP of 'restore_registers' (restore_jump_address) and cr3
      (restore_cr3).
      
      During hibernate resume, the 'restore_registers' (via the
      'restore_jump_address') in hibernate_asm_64.S is invoked, which
      restores the contents of most registers. Naturally the resume path
      benefits from
      already being in 64-bit mode, so it does not have to load the GDT.
      
      It only reloads the cr3 (from restore_cr3) and continues on. Note that
      the restoration of the restore image page-tables is done prior to this.
      
      After 'restore_registers' returns, we end up calling
      restore_processor_state() - where we reload the GDT. The reload of
      the GDT is not needed, as the boot kernel has already loaded a GDT
      at the same physical location as the restored kernel's.
      
      Note that the hibernation path assumes the GDT is correct during its
      'restore_registers'. The assumption in the code is that the restored
      image is the same as the one saved - meaning we are not trying to
      restore a different kernel in the virtual address space of a new
      kernel.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1365194544-14648-2-git-send-email-konrad.wilk@oracle.com
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  3. 16 Mar 2013, 1 commit
  4. 15 Nov 2012, 2 commits
  5. 02 Apr 2012, 1 commit
  6. 20 Mar 2012, 1 commit
    • x86: kvmclock: abstract save/restore sched_clock_state · b74f05d6
      Authored by Marcelo Tosatti
      Upon resume from hibernation, CPU 0's hvclock area contains the old
      values for system_time and tsc_timestamp. It is necessary for the
      hypervisor to update these values with uptodate ones before the CPU uses
      them.
      
      Abstract the TSC's save/restore sched_clock_state functions and use
      restore_state to write to the KVM_SYSTEM_TIME MSR, forcing an update.
      
      Also move restore_sched_clock_state before __restore_processor_state,
      since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
      Thanks to Igor Mammedov for tracking it down.
      
      Fixes suspend-to-disk with kvmclock.
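
      A sketch of the abstraction, following the hook and function names
      from the commit (x86_platform gains save/restore_sched_clock_state
      callbacks; the MSR write inside kvm_register_clock() is what forces
      the host to refresh system_time/tsc_timestamp):

          static void kvm_save_sched_clock_state(void)
          {
                  /* nothing to save: the hvclock area is rebuilt on restore */
          }

          static void kvm_restore_sched_clock_state(void)
          {
                  /* rewrite the KVM_SYSTEM_TIME MSR; the hypervisor then
                   * repopulates this CPU's hvclock area with fresh values */
                  kvm_register_clock("primary cpu clock, resume");
          }

          void __init kvmclock_init(void)
          {
                  /* ... */
                  x86_platform.save_sched_clock_state = kvm_save_sched_clock_state;
                  x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state;
          }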
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
  7. 22 Feb 2012, 1 commit
    • i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a
      Authored by Linus Torvalds
      While various modules include <asm/i387.h> to get access to things we
      actually *intend* for them to use, most of that header file was really
      pretty low-level internal stuff that we really don't want to expose to
      others.
      
      So split the header file into two: the small exported interfaces remain
      in <asm/i387.h>, while the internal definitions that are only used by
      core architecture code are now in <asm/fpu-internal.h>.
      
      The guiding principle for this was to expose functions that we export to
      modules, and leave them in <asm/i387.h>, while stuff that is used by
      task switching or was marked GPL-only is in <asm/fpu-internal.h>.
      
      The fpu-internal.h file could be further split up too, especially since
      arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
      kvm usage should probably be abstracted out a bit, and at least now the
      internal FPU accessor functions are much more contained.  Even if it
      isn't perhaps as contained as it _could_ be.
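
      For illustration, a hypothetical module-side consumer after the split
      needs only the small exported surface left in <asm/i387.h> (a sketch;
      do_simd_work() is an invented example):

          #include <asm/i387.h>   /* exported: kernel_fpu_begin/end, ... */
          /* <asm/fpu-internal.h> is not needed here: task-switch internals
           * such as fpu_save_init() now live there, out of module reach. */

          static void do_simd_work(void)
          {
                  if (!irq_fpu_usable())  /* safe to touch the FPU here? */
                          return;
                  kernel_fpu_begin();     /* hands us the FPU, state saved */
                  /* ... SSE/AVX computation ... */
                  kernel_fpu_end();
          }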
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
  8. 01 Nov 2011, 1 commit
    • x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE · 69c60c88
      Authored by Paul Gortmaker
      These files were implicitly getting EXPORT_SYMBOL via device.h
      which was including module.h, but that will be fixed up shortly.
      
      By fixing these now, we can avoid seeing things like:
      
      arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’
      
      [ with input from Randy Dunlap <rdunlap@xenotime.net> and also
        from Stephen Rothwell <sfr@canb.auug.org.au> ]
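
      The fix pattern, sketched on a hypothetical file: include export.h
      explicitly instead of inheriting it through device.h -> module.h:

          #include <linux/export.h>        /* EXPORT_SYMBOL, THIS_MODULE */

          int some_exported_thing;         /* invented symbol, for illustration */
          EXPORT_SYMBOL(some_exported_thing); /* no longer relies on the
                                                 implicit module.h include */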
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
  9. 20 Aug 2010, 1 commit
    • x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states · cd7240c0
      Authored by Suresh Siddha
      TSCs get reset after suspend/resume (even on CPUs with an invariant
      TSC, which runs at a constant rate across ACPI P-, C- and T-states).
      And on some systems the BIOS seems to reinit the TSC to an arbitrary
      large value (still sync'd across CPUs) during resume.
      
      This leads to a scenario where the scheduler rq->clock
      (sched_clock_cpu()) is less than rq->age_stamp (introduced in 2.6.32).
      That makes scale_rt_power() return a big value, and the resulting big
      group power set by update_group_power() causes improper load
      balancing between busy and idle CPUs after suspend/resume.
      
      This resulted in multi-threaded workloads (like kernel compilation)
      going slower after a suspend/resume cycle on Core i5 laptops.
      
      Fix this by recomputing cyc2ns_offset's during resume, so that
      sched_clock() continues from the point where it left off during
      suspend.
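
      Simplified from the save/restore pair this adds to tsc.c (a sketch:
      locking elided, per-cpu details condensed):

          static unsigned long long cyc2ns_suspend;

          void save_sched_clock_state(void)
          {
                  if (!sched_clock_stable)
                          return;
                  cyc2ns_suspend = sched_clock();  /* ns at suspend time */
          }

          void restore_sched_clock_state(void)
          {
                  unsigned long long offset;
                  int cpu;

                  if (!sched_clock_stable)
                          return;
                  /* drop the stale offset, then pick new per-cpu offsets so
                   * sched_clock() resumes where it left off even though the
                   * TSC itself was reset */
                  __get_cpu_var(cyc2ns_offset) = 0;
                  offset = cyc2ns_suspend - sched_clock();
                  for_each_possible_cpu(cpu)
                          per_cpu(cyc2ns_offset, cpu) = offset;
          }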
      Reported-by: Florian Pritz <flo@xssn.at>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: <stable@kernel.org> # [v2.6.32+]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  10. 19 Jul 2010, 1 commit
  11. 08 Jun 2010, 1 commit
  12. 08 Nov 2009, 1 commit
    • hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Authored by Frederic Weisbecker
      This patch rebases the implementation of the breakpoints API on top
      of perf event instances.
      
      Each breakpoint is now a perf event that handles the
      register scheduling, thread/cpu attachment, etc.
      
      The new layering is now as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons for this rewrite:
      
      - Use the centralized/optimized pmu registers scheduling,
        implying an easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoints counters
      - Ptrace breakpoint setting remains tricky and still needs some
        per-thread breakpoint references.
      
      Todo (in order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event" rename
      - The ptrace regression has been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle the off case: when the breakpoints API is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM, we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD no longer covers every case of running
        breakpoints and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if the host has active
        breakpoints; otherwise we don't care about messed-up address
        registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix a wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST)
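
      For a feel of the resulting API, a hedged usage sketch modeled on
      that era's samples/hw_breakpoint/data_breakpoint.c (handler signature
      and helpers follow the period's linux/hw_breakpoint.h; the watched
      symbol is arbitrary):

          #include <linux/hw_breakpoint.h>
          #include <linux/perf_event.h>
          #include <linux/kallsyms.h>

          static struct perf_event **wide_bp;

          static void bp_handler(struct perf_event *bp, int nmi,
                                 struct perf_sample_data *data,
                                 struct pt_regs *regs)
          {
                  printk(KERN_INFO "write hit at 0x%llx\n",
                         (unsigned long long)bp->attr.bp_addr);
          }

          static int __init bp_example_init(void)
          {
                  struct perf_event_attr attr;

                  hw_breakpoint_init(&attr);       /* sane defaults */
                  attr.bp_addr = kallsyms_lookup_name("pid_max");
                  attr.bp_len  = HW_BREAKPOINT_LEN_4;
                  attr.bp_type = HW_BREAKPOINT_W;  /* break on writes */

                  /* one breakpoint perf event per CPU; constraint handling
                   * and register scheduling are done by the perf layer */
                  wide_bp = register_wide_hw_breakpoint(&attr, bp_handler);
                  return IS_ERR(wide_bp) ? PTR_ERR(wide_bp) : 0;
          }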
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
  13. 18 Sep 2009, 1 commit
  14. 22 Aug 2009, 1 commit
    • x86, pat/mtrr: Rendezvous all the cpus for MTRR/PAT init · d0af9eed
      Authored by Suresh Siddha
      The SDM Vol 3a section titled "MTRR considerations in MP systems"
      specifies the need for synchronizing the logical CPUs while
      initializing/updating MTRRs.
      
      Currently the Linux kernel synchronizes all CPUs only when
      a single MTRR register is programmed/updated. During an AP online
      (during boot/cpu-online/resume), where we initialize all the MTRR/PAT
      registers, we don't follow this synchronization algorithm.
      
      This can lead to scenarios where, during a dynamic cpu online, that
      logical cpu is initializing MTRR/PAT with cache disabled (cr0.cd=1)
      etc. while the other logical HT sibling continues to run (also with
      cache disabled because of cr0.cd=1 on its sibling).
      
      Starting from Westmere, VMX transitions with cr0.cd=1 don't work properly
      (because of some VMX performance optimizations) and the above scenario
      (with one logical cpu doing VMX activity and another logical cpu coming online)
      can result in a system crash.
      
      Fix the MTRR initialization by doing a rendezvous of all the CPUs.
      During boot and resume, we delay the MTRR/PAT init for APs till all
      the logical CPUs come online; the rendezvous process at the end of
      the APs' bringup then initializes the MTRR/PAT for all of them.
      
      For dynamic single cpu online, we synchronize all the logical cpus and
      do the MTRR/PAT init on the AP that is coming online.
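
      Sketching the rendezvous shape (hedged: the commit implements its own
      IPI-based handshake in mtrr/main.c; stop_machine() stands in here
      only to illustrate the "all CPUs stop and reprogram together" idea):

          static int mtrr_pat_rendezvous(void *unused)
          {
                  /* runs on every online CPU with interrupts off, all CPUs
                   * held here together -- no HT sibling keeps executing
                   * while its twin sits with cr0.cd=1 */
                  mtrr_if->set_all();     /* reprogram MTRR/PAT on this CPU */
                  return 0;
          }

          void mtrr_aps_init(void)
          {
                  /* after all APs are online (boot or resume): */
                  stop_machine(mtrr_pat_rendezvous, NULL, cpu_online_mask);
          }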
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>
  15. 24 Jun 2009, 1 commit
  16. 13 Jun 2009, 6 commits
  17. 03 Jun 2009, 2 commits
  18. 01 Apr 2009, 1 commit
  19. 28 Aug 2008, 1 commit
  20. 10 Feb 2008, 2 commits
  21. 02 Feb 2008, 1 commit
  22. 30 Jan 2008, 3 commits
  23. 18 Dec 2007, 1 commit
  24. 24 Oct 2007, 1 commit
  25. 20 Oct 2007, 2 commits
  26. 19 Oct 2007, 2 commits