1. 29 Aug 2019 (1 commit)
    • x86/CPU/AMD: Clear RDRAND CPUID bit on AMD family 15h/16h · e063b03b
      Authored by Tom Lendacky
      commit c49a0a80137c7ca7d6ced4c812c9e07a949f6f24 upstream.
      
      There have been reports of RDRAND issues after resuming from suspend on
      some AMD family 15h and family 16h systems. This issue stems from a BIOS
      not performing the proper steps during resume to ensure RDRAND continues
      to function properly.
      
      RDRAND support is indicated by CPUID Fn00000001_ECX[30]. This bit can be
      reset by clearing MSR C001_1004[62]. Any software that checks for RDRAND
      support using CPUID, including the kernel, will believe that RDRAND is
      not supported.
      
      Update the CPU initialization to clear the RDRAND CPUID bit for any family
      15h and 16h processor that supports RDRAND. If it is known that the family
      15h or family 16h system does not have an RDRAND resume issue or that the
      system will not be placed in suspend, the "rdrand=force" kernel parameter
      can be used to stop the clearing of the RDRAND CPUID bit.
      
      Additionally, update the suspend and resume path to save and restore the
      MSR C001_1004 value to ensure that the RDRAND CPUID setting remains in
      place after resuming from suspend.
      
      Note that clearing the RDRAND CPUID bit does not prevent a processor
      that normally supports the RDRAND instruction from executing it, so any
      code that determined RDRAND support based on family and model will not #UD.
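
      For illustration, a minimal user-space sketch (not part of the patch)
      that performs the same CPUID check the kernel relies on, using the
      compiler-provided cpuid.h helper:

      // rdrand_check.c - query CPUID Fn00000001_ECX[30], the bit this
      // patch clears, to see whether RDRAND is advertised.
      #include <cpuid.h>
      #include <stdio.h>

      int main(void)
      {
      	unsigned int eax, ebx, ecx, edx;

      	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
      		return 1;

      	printf("RDRAND %s\n",
      	       (ecx & (1u << 30)) ? "advertised" : "not advertised");
      	return 0;
      }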
      Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Chen Yu <yu.c.chen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: "linux-doc@vger.kernel.org" <linux-doc@vger.kernel.org>
      Cc: "linux-pm@vger.kernel.org" <linux-pm@vger.kernel.org>
      Cc: Nathan Chancellor <natechancellor@gmail.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Cc: <stable@vger.kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "x86@kernel.org" <x86@kernel.org>
      Link: https://lkml.kernel.org/r/7543af91666f491547bd86cebb1e17c66824ab9f.1566229943.git.thomas.lendacky@amd.com
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e063b03b
  2. 11 Jun 2019 (1 commit)
    • x86/power: Fix 'nosmt' vs hibernation triple fault during resume · 4d166206
      Authored by Jiri Kosina
      commit ec527c318036a65a083ef68d8ba95789d2212246 upstream.
      
      As explained in
      
      	0cc3cd21 ("cpu/hotplug: Boot HT siblings at least once")
      
      we always, no matter what, have to bring up x86 HT siblings during boot at
      least once in order to avoid the first MCE bringing the system to its knees.
      
      That means that whenever 'nosmt' is supplied on the kernel command line,
      all the HT siblings are as a result sitting in mwait or cpuidle after
      going through the online-offline cycle at least once.
      
      This causes a serious issue though when a kernel, which saw 'nosmt' on its
      commandline, is going to perform resume from hibernation: if the resume
      from the hibernated image is successful, cr3 is flipped in order to point
      to the address space of the kernel that is being resumed, which in turn
      means that all the HT siblings are all of a sudden mwaiting on an address
      which is no longer valid.
      
      That results in a triple fault shortly after cr3 is switched, and the
      machine reboots.
      
      Fix this by always waking up all the SMT siblings before initiating the
      'restore from hibernation' process; this guarantees that all the HT
      siblings will be properly carried over to the resumed kernel waiting in
      resume_play_dead(), and acted upon accordingly afterwards, based on the
      target kernel configuration.
      
      Symmetrically, the resumed kernel has to push the SMT siblings to mwait
      again in case it has SMT disabled; this means it has to online all
      the siblings when resuming (so that they come out of hlt) and offline
      them again to let them reach mwait.
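
      As an aside, the SMT state that this dance reconciles can be inspected
      from user space; a minimal sketch, assuming the
      /sys/devices/system/cpu/smt/control interface that exists as of v4.19:

      #include <stdio.h>

      int main(void)
      {
      	char buf[32];
      	FILE *f = fopen("/sys/devices/system/cpu/smt/control", "r");

      	if (!f) {
      		perror("smt/control");
      		return 1;
      	}
      	if (fgets(buf, sizeof(buf), f))
      		printf("SMT control: %s", buf); /* "on", "off", "forceoff", ... */
      	fclose(f);
      	return 0;
      }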
      
      Cc: 4.19+ <stable@vger.kernel.org> # v4.19+
      Debugged-by: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 0cc3cd21 ("cpu/hotplug: Boot HT siblings at least once")
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d166206
  3. 17 Dec 2017 (2 commits)
  4. 15 Dec 2017 (3 commits)
  5. 06 Dec 2017 (1 commit)
    • x86/power: Fix some ordering bugs in __restore_processor_context() · 5b06bbcf
      Authored by Andy Lutomirski
      __restore_processor_context() had a couple of ordering bugs.  It
      restored GSBASE after calling load_gs_index(), and the latter can
      call into tracing code.  It also tried to restore segment registers
      before restoring the LDT, which is straight-up wrong.
      
      Reorder the code so that we restore GSBASE, then the descriptor
      tables, then the segments.
      
      This fixes two bugs.  First, it fixes a regression that broke resume
      under certain configurations due to irqflag tracing in
      native_load_gs_index().  Second, it fixes resume when the userspace
      process that initiated suspend had funny segments.  The latter can be
      reproduced by compiling this:
      
      // SPDX-License-Identifier: GPL-2.0
      /*
       * ldt_echo.c - Echo argv[1] while using an LDT segment
       */

      #define _GNU_SOURCE		/* for asprintf() */
      #include <err.h>
      #include <stdio.h>
      #include <unistd.h>
      #include <sys/syscall.h>
      #include <asm/ldt.h>

      int main(int argc, char **argv)
      {
      	int ret, len;
      	char *buf;

      	const struct user_desc desc = {
      		.entry_number    = 0,
      		.base_addr       = 0,
      		.limit           = 0xfffff,
      		.seg_32bit       = 1,
      		.contents        = 0, /* Data, grow-up */
      		.read_exec_only  = 0,
      		.limit_in_pages  = 1,
      		.seg_not_present = 0,
      		.useable         = 0
      	};

      	if (argc != 2)
      		errx(1, "Usage: %s STRING", argv[0]);

      	len = asprintf(&buf, "%s\n", argv[1]);
      	if (len < 0)
      		errx(1, "Out of memory");

      	/* Install the descriptor into LDT slot 0 */
      	ret = syscall(SYS_modify_ldt, 1, &desc, sizeof(desc));
      	if (ret)
      		err(1, "modify_ldt");

      	/* Load %es with an LDT selector: index 0, TI=1 (LDT), RPL=3 => 7 */
      	asm volatile ("movw %0, %%es" :: "rm" ((unsigned short)7));
      	write(1, buf, len);
      	return 0;
      }
      
      and running ldt_echo >/sys/power/mem
      
      Without the fix, the latter causes a triple fault on resume.
      
      Fixes: ca37e57b ("x86/entry/64: Add missing irqflags tracing to native_load_gs_index()")
      Reported-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Tested-by: Jarkko Nikula <jarkko.nikula@linux.intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: https://lkml.kernel.org/r/6b31721ea92f51ea839e79bd97ade4a75b1eeea2.1512057304.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5b06bbcf
  6. 14 Sep 2017 (1 commit)
  7. 07 Sep 2017 (1 commit)
    • x86/mm: Reinitialize TLB state on hotplug and resume · 72c0098d
      Authored by Andy Lutomirski
      When Linux brings a CPU down and back up, it switches to init_mm and then
      loads swapper_pg_dir into CR3.  With PCID enabled, this has the side effect
      of masking off the ASID bits in CR3.
      
      This can result in some confusion in the TLB handling code.  If we
      bring a CPU down and back up with any ASID other than 0, we end up
      with the wrong ASID active on the CPU after resume.  This could
      cause our internal state to become corrupt, although major
      corruption is unlikely because init_mm doesn't have any user pages.
      More obviously, if CONFIG_DEBUG_VM=y, we'll trip over an assertion
      in the next context switch.  The result of *that* is a failure to
      resume from suspend with probability 1 - 1/6^(cpus-1).
      
      Fix it by reinitializing cpu_tlbstate on resume and CPU bringup.
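
      To make the failure mode concrete, an illustrative sketch (not kernel
      code) of how CR3 carries the ASID when PCID is enabled, and why loading
      a bare page-table address implicitly selects ASID 0:

      #include <stdint.h>

      #define CR3_PCID_MASK 0xFFFull /* bits 11:0 of CR3 hold the PCID/ASID */

      static inline uint64_t build_cr3(uint64_t pgd_pa, uint16_t asid)
      {
      	/* Loading pgd_pa alone is equivalent to asid == 0, which is how
      	 * a CPU comes back online with the ASID bits masked off. */
      	return (pgd_pa & ~CR3_PCID_MASK) | (asid & CR3_PCID_MASK);
      }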
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reported-by: Jiri Kosina <jikos@kernel.org>
      Fixes: 10af6235 ("x86/mm: Implement PCID based optimization: try to preserve old TLB entries using PCID")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      72c0098d
  8. 13 Jun 2017 (1 commit)
  9. 16 Mar 2017 (1 commit)
    • x86: Remap GDT tables in the fixmap section · 69218e47
      Authored by Thomas Garnier
      Each processor holds a GDT in its per-cpu structure. The sgdt
      instruction gives the base address of the current GDT. This address can
      be used to bypass KASLR memory randomization. With another bug, an
      attacker could target other per-cpu structures or deduce the base of
      the main memory section (PAGE_OFFSET).
      
      This patch relocates the GDT table for each processor inside the
      fixmap section. The space is reserved based on number of supported
      processors.
      
      For consistency, the remapping is done by default on both 32-bit and 64-bit kernels.
      
      Each processor switches to its remapped GDT at the end of
      initialization. For hibernation, the main processor returns with the
      original GDT and switches back to the remapping at completion.
      
      This patch was tested on both architectures. Hibernation and KVM were
      both tested specially for their usage of the GDT.
      
      Thanks to Boris Ostrovsky <boris.ostrovsky@oracle.com> for testing and
      recommending changes for Xen support.
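
      A user-space sketch of the leak being closed (hypothetical demo, not
      from the patch): without UMIP, sgdt is not privileged, so any process
      can read the current GDT base:

      #include <stdint.h>
      #include <stdio.h>

      struct __attribute__((packed)) gdt_ptr {
      	uint16_t limit;
      	uint64_t base;	/* linear base address of the GDT */
      };

      int main(void)
      {
      	struct gdt_ptr gdtr;

      	asm volatile("sgdt %0" : "=m"(gdtr));
      	printf("GDT base: 0x%lx, limit: 0x%x\n",
      	       (unsigned long)gdtr.base, gdtr.limit);
      	return 0;
      }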
      Signed-off-by: Thomas Garnier <thgarnie@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Christian Borntraeger <borntraeger@de.ibm.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Lorenzo Stoakes <lstoakes@gmail.com>
      Cc: Luis R . Rodriguez <mcgrof@kernel.org>
      Cc: Matt Fleming <matt@codeblueprint.co.uk>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Radim Krčmář <rkrcmar@redhat.com>
      Cc: Rafael J . Wysocki <rjw@rjwysocki.net>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Stanislaw Gruszka <sgruszka@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
      Cc: kasan-dev@googlegroups.com
      Cc: kernel-hardening@lists.openwall.com
      Cc: kvm@vger.kernel.org
      Cc: lguest@lists.ozlabs.org
      Cc: linux-doc@vger.kernel.org
      Cc: linux-efi@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-pm@vger.kernel.org
      Cc: xen-devel@lists.xenproject.org
      Cc: zijun_hu <zijun_hu@htc.com>
      Link: http://lkml.kernel.org/r/20170314170508.100882-2-thgarnie@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      69218e47
  10. 15 Dec 2016 (1 commit)
  11. 30 Sep 2016 (1 commit)
  12. 16 Jul 2016 (1 commit)
    • x86 / hibernate: Use hlt_play_dead() when resuming from hibernation · 406f992e
      Authored by Rafael J. Wysocki
      On Intel hardware, native_play_dead() uses mwait_play_dead() by
      default and only falls back to the other methods if that fails.
      That also happens during resume from hibernation, when the restore
      (boot) kernel runs disable_nonboot_cpus() to take all of the CPUs
      except for the boot one offline.
      
      However, that is problematic, because the address passed to
      __monitor() in mwait_play_dead() is likely to be written to in the
      last phase of hibernate image restoration and that causes the "dead"
      CPU to start executing instructions again.  Unfortunately, the page
      containing the address in that CPU's instruction pointer may not be
      valid any more at that point.
      
      First, that page may have been overwritten with image kernel memory
      contents already, so the instructions the CPU attempts to execute may
      simply be invalid.  Second, the page tables previously used by that
      CPU may have been overwritten by image kernel memory contents, so the
      address in its instruction pointer is impossible to resolve then.
      
      A report from Varun Koyyalagunta and investigation carried out by
      Chen Yu show that the latter sometimes happens in practice.
      
      To prevent it from happening, temporarily change the smp_ops.play_dead
      pointer during resume from hibernation so that it points to a special
      "play dead" routine which uses hlt_play_dead() and avoids the
      inadvertent "revivals" of "dead" CPUs this way.
      
      A slightly unpleasant consequence of this change is that if the
      system is hibernated with one or more CPUs offline, it will generally
      draw more power after resume than it did before hibernation, because
      the physical state entered by CPUs via hlt_play_dead() is higher-power
      than the mwait_play_dead() one in the majority of cases.  It is
      possible to work around this, but it is unclear how much of a problem
      that's going to be in practice, so the workaround will be implemented
      later if it turns out to be necessary.
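
      A paraphrased sketch of the approach (simplified from the actual patch
      to arch/x86/power/cpu.c; kernel-internal, not standalone):

      static void resume_play_dead(void)
      {
      	play_dead_common();
      	hlt_play_dead();	/* hlt never touches a monitored address */
      }

      int hibernate_resume_nonboot_cpu_disable(void)
      {
      	void (*play_dead)(void) = smp_ops.play_dead;
      	int ret;

      	/* Swap in the hlt-based routine just for this offline cycle. */
      	smp_ops.play_dead = resume_play_dead;
      	ret = disable_nonboot_cpus();
      	smp_ops.play_dead = play_dead;
      	return ret;
      }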
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=106371
      Reported-by: Varun Koyyalagunta <cpudebug@centtech.com>
      Original-by: Chen Yu <yu.c.chen@intel.com>
      Tested-by: Chen Yu <yu.c.chen@intel.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      406f992e
  13. 26 Nov 2015 (1 commit)
    • x86/pm: Introduce quirk framework to save/restore extra MSR registers around suspend/resume · 7a9c2dd0
      Authored by Chen Yu
      A bug was reported that on certain Broadwell platforms, after
      resuming from S3, the CPU is running at an anomalously low
      speed.
      
      It turns out that the BIOS has modified the value of the
      THERM_CONTROL register during S3, changing it from 0 to 0x10:
      this enables clock modulation (bit 4) but leaves the CPU duty
      cycle (bits 1:3) undefined - which causes the problem.
      
      Here is a simple scenario to reproduce the issue:
      
       1. Boot up the system
       2. Get MSR 0x19a, it should be 0
       3. Put the system into sleep, then wake it up
       4. Get MSR 0x19a, it shows 0x10, while it should be 0
      
      Although some BIOSen want to change the CPU Duty Cycle during
      S3, in our case we don't want the BIOS to do any modification.
      
      Fix this issue by introducing a more generic x86 framework to
      save/restore specified MSR registers (THERM_CONTROL in this case)
      across suspend/resume. This allows us to fix similar bugs in a much
      simpler way in the future.
      
      When the kernel wants to protect certain MSRs across suspend/resume,
      we simply add a quirk entry in msr_save_dmi_table and customize
      the MSR registers inside the quirk callback, for example:
      
        u32 msr_id_need_to_save[] = {MSR_ID0, MSR_ID1, MSR_ID2...};
      
      and the quirk mechanism ensures that, once resumed from suspend,
      the MSRs indicated by these IDs will be restored to their
      original, pre-suspend values.
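
      As a concrete illustration, a quirk entry might look like the sketch
      below; the DMI strings and the callback and helper names are
      illustrative placeholders, not copied from the patch:

      static u32 quirk_msr_ids[] = { MSR_IA32_THERM_CONTROL };

      static int msr_save_thermal_quirk(const struct dmi_system_id *d)
      {
      	/* Register the MSRs to save before suspend / restore on resume. */
      	msr_build_context(quirk_msr_ids, ARRAY_SIZE(quirk_msr_ids));
      	return 1;
      }

      static const struct dmi_system_id msr_save_dmi_table[] = {
      	{
      		.callback = msr_save_thermal_quirk,
      		.ident    = "Example Broadwell platform",
      		.matches  = {
      			DMI_MATCH(DMI_PRODUCT_NAME, "EXAMPLE-BOARD"),
      		},
      	},
      	{}
      };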
      
      Since both 64-bit and 32-bit kernels are affected, this patch
      covers the common 64/32-bit suspend/resume code path. And
      because the MSRs specified by the user might not be available or
      readable in every situation, we use rdmsrl_safe() to safely save
      these MSRs.
      Reported-and-tested-by: Marcin Kaszewski <marcin.kaszewski@intel.com>
      Signed-off-by: Chen Yu <yu.c.chen@intel.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: bp@suse.de
      Cc: len.brown@intel.com
      Cc: linux@horizon.com
      Cc: luto@kernel.org
      Cc: rjw@rjwysocki.net
      Link: http://lkml.kernel.org/r/c9abdcbc173dd2f57e8990e304376f19287e92ba.1448382971.git.yu.c.chen@intel.com
      [ More edits to the naming of data structures. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      7a9c2dd0
  14. 31 Jul 2015 (1 commit)
    • x86/ldt: Make modify_ldt synchronous · 37868fe1
      Authored by Andy Lutomirski
      modify_ldt() has questionable locking and does not synchronize
      threads.  Improve it: redesign the locking and synchronize all
      threads' LDTs using an IPI on all modifications.
      
      This will dramatically slow down modify_ldt in multithreaded
      programs, but there shouldn't be any multithreaded programs that
      care about modify_ldt's performance in the first place.
      
      This fixes some fallout from the CVE-2015-5157 fixes.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Reviewed-by: Borislav Petkov <bp@suse.de>
      Cc: Andrew Cooper <andrew.cooper3@citrix.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <jbeulich@suse.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: security@kernel.org <security@kernel.org>
      Cc: <stable@vger.kernel.org>
      Cc: xen-devel <xen-devel@lists.xen.org>
      Link: http://lkml.kernel.org/r/4c6978476782160600471bd865b318db34c7b628.1438291540.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      37868fe1
  15. 19 May 2015 (4 commits)
    • x86/fpu: Move various internal function prototypes to fpu/internal.h · 952f07ec
      Authored by Ingo Molnar
      There are a number of FPU internal function prototypes and an inline function
      in fpu/api.h, mostly placed there historically as the code grew over the years.
      
      Move them over into fpu/internal.h where they belong. (Add sched.h include
      to stackprotector.h which incorrectly relied on getting it from fpu/api.h.)
      
      fpu/api.h is now a pure header that contains only the FPU APIs intended
      for driver use.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      952f07ec
    • x86/fpu: Move XCR0 manipulation to the FPU code proper · 9254aaa0
      Authored by Ingo Molnar
      The suspend code accesses FPU state internals; add a helper for
      it and isolate it.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      9254aaa0
    • x86/fpu: Rename 'pcntxt_mask' to 'xfeatures_mask' · 614df7fb
      Authored by Ingo Molnar
      The name 'pcntxt_mask' is a misnomer; it's essentially meaningless to anyone
      who doesn't already know what it does.
      
      Name it more descriptively as 'xfeatures_mask'.
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      614df7fb
    • x86/fpu: Rename fpu-internal.h to fpu/internal.h · 78f7f1e5
      Authored by Ingo Molnar
      This brings all the FPU-related header files under a unified, hierarchical
      naming scheme:
      
       - asm/fpu/types.h:      FPU related data types, needed for 'struct task_struct',
                               widely included in almost all kernel code, and hence kept
                               as small as possible.
      
       - asm/fpu/api.h:        FPU related 'public' methods exported to other subsystems.
      
       - asm/fpu/internal.h:   FPU subsystem internal methods
      
       - asm/fpu/xsave.h:      XSAVE support internal methods
      
      (Also standardize the header guard in asm/fpu/internal.h.)
      Reviewed-by: Borislav Petkov <bp@alien8.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      78f7f1e5
  16. 06 Mar 2015 (1 commit)
  17. 04 Feb 2015 (1 commit)
  18. 17 Jul 2014 (1 commit)
    • x86, power, suspend: Annotate restore_processor_state() with notrace · b8f99b3e
      Authored by Steven Rostedt (Red Hat)
      ftrace_stop() is used to stop function tracing during suspend and resume
      which removes a lot of possible debugging opportunities with tracing.
      The reason was that some function in the resume path was causing a triple
      fault if it were to be traced. The issue I found was that doing something
      as simple as calling smp_processor_id() would reboot the box!
      
      When function tracing was first created I didn't have a good way to figure
      out what function was having issues, or it looked to be multiple ones. To
      fix it, we just created a big hammer approach to the problem which was to
      add a flag in the mcount trampoline that could be checked and not call
      the traced functions.
      
      Lately I developed better ways to find problem functions and I can bisect
      down to see what function is causing the issue. I removed the flag that
      stopped tracing and proceeded to find the problem function and it ended
      up being restore_processor_state(). This function makes sense as when the
      CPU comes back online from a suspend it calls this function to set up
      registers, amongst them the GS register, which stores things such as
      what CPU the processor is (if you call smp_processor_id() without this
      set up properly, it would fault).
      
      By making restore_processor_state() notrace, the system can suspend and
      resume without the need of the big hammer tracing to stop.
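
      The fix itself is a one-word annotation; a sketch of what it amounts
      to (the notrace macro comes from include/linux/compiler.h):

      /* notrace keeps mcount/ftrace instrumentation out of a function;
       * essential here because tracing runs before GS is restored. */
      #define notrace __attribute__((no_instrument_function))

      void notrace restore_processor_state(void)
      {
      	/* ... restore control registers, segments, MSRs ... */
      }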
      
      Link: http://lkml.kernel.org/r/3577662.BSnUZfboWb@vostro.rjw.lan
      Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
      Reviewed-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
      b8f99b3e
  19. 07 Aug 2013 (1 commit)
  20. 03 May 2013 (1 commit)
    • x86, gdt, hibernate: Store/load GDT for hibernate path. · cc456c4e
      Authored by Konrad Rzeszutek Wilk
      The git commit e7a5cd06
      ("x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path
      is not needed.") assumes that for the hibernate path the booting
      kernel and the resuming kernel MUST be the same. That is certainly
      the case for a 32-bit kernel (see check_image_kernel and
      CONFIG_ARCH_HIBERNATION_HEADER config option).
      
      However, for 64-bit kernels it is OK to have a different kernel
      version (and size of the image) between the booting and resuming kernels.
      Hence the above-mentioned git commit introduces a regression.
      
      This patch fixes it by reintroducing a 'struct desc_ptr gdt_desc'
      in the 'struct saved_context'. However, instead of placing store/load_gdt
      calls in both 'save_processor_state' and 'restore_processor_state', we
      only save the GDT in save_processor_state.
      
      For the restore path the lgdt operation is done in
      hibernate_asm_[32|64].S in the 'restore_registers' path.
      
      The apt reader of this description will recognize that only 64-bit
      kernels need this treatment, not 32-bit. This patch adds the logic
      in the 32-bit path to be more similar to 64-bit so that in the future
      the unification process can take advantage of this.
      
      [ hpa: this also reverts an inadvertent on-disk format change ]
      Suggested-by: "H. Peter Anvin" <hpa@zytor.com>
      Acked-by: "Rafael J. Wysocki" <rjw@sisk.pl>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1367459610-9656-2-git-send-email-konrad.wilk@oracle.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      cc456c4e
  21. 12 Apr 2013 (3 commits)
    • x86, wakeup, sleep: Use pvops functions for changing GDT entries · 4d681be3
      Authored by konrad@kernel.org
      We check the TSS descriptor before we try to dereference it.
      Also we document what the value '9' actually means using the
      AMD64 Architecture Programmer's Manual Volume 2, pg 90:
      "Hex value 9: Available 64-bit TSS" and pg 91:
      "The available 32-bit TSS (09h), which is redefined as the
      available 64-bit TSS."
      
      Without this, on Xen, where the GDT is available as R/O (to
      protect the hypervisor from the guest modifying it), we end up
      with a pagetable fault.
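
      For reference, an illustrative sketch of the check being described
      (not the patch's exact code): the descriptor type field occupies bits
      43:40, and 0x9 marks an available TSS:

      #include <stdbool.h>
      #include <stdint.h>

      static bool is_available_tss(uint64_t desc)
      {
      	/* Type field of a system segment descriptor: bits 43:40.
      	 * 0x9 = available (64-bit) TSS, per the AMD64 APM Vol. 2. */
      	return ((desc >> 40) & 0xf) == 0x9;
      }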
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1365194544-14648-5-git-send-email-konrad.wilk@oracle.com
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      4d681be3
    • x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed · 84e70971
      Authored by Konrad Rzeszutek Wilk
      During the ACPI S3 suspend, we store the GDT in the wakeup_header (see
      wakeup_asm.S) field called 'pmode_gdt'.
      
      This is then used during the resume path and has the exact same
      value as what store/load_gdt do with the saved_context
      (which is saved/restored via save/restore_processor_state()).
      
      The flow during resume from ACPI S3 is simpler than the 64-bit
      counterpart. We only use the early bootstrap once (wakeup_gdt) and
      do various checks in real mode.
      
      After the checks are completed, we load the saved GDT ('pmode_gdt') and
      continue on with the resume (by heading to startup_32 in trampoline_32.S) -
      which quickly jumps to what was saved in 'pmode_entry'
      aka 'wakeup_pmode_return'.
      
      The 'wakeup_pmode_return' restores the GDT (saved_gdt) again (which was
      saved in do_suspend_lowlevel initially). After that it ends up calling
      the 'ret_point' which calls 'restore_processor_state()'.
      
      We have two opportunities to remove code where we restore the same GDT
      twice.
      
      Here is the call chain:
       wakeup_start
             |- lgdtl wakeup_gdt [the work-around broken BIOSes]
             |
             | - lgdtl pmode_gdt [the real one]
             |
             \-- startup_32 (in trampoline_32.S)
                    \-- wakeup_pmode_return (in wakeup_32.S)
                             |- lgdtl saved_gdt [the real one]
                             \-- ret_point
                                   |..
                                   |- call restore_processor_state
      
      The hibernate path is much simpler. During the saving of the hibernation
      image we call save_processor_state() and save the contents of that
      along with the rest of the kernel in the hibernation image destination.
      We save the EIP of 'restore_registers' (restore_jump_address) and
      cr3 (restore_cr3).
      
      During hibernate resume, the 'restore_registers' (via the
      'restore_jump_address') in hibernate_asm_32.S is invoked, which
      restores the contents of most registers. Naturally the resume path benefits
      from already being in 32-bit mode, so it does not have to reload the GDT.
      
      It only reloads the cr3 (from restore_cr3) and continues on. Note
      that the restoration of the restore image page-tables is done prior to
      this.
      
      After 'restore_registers' returns, we end up calling
      restore_processor_state() - where we reload the GDT. The reload of
      the GDT is not needed, as the bootup kernel has already loaded a GDT
      which is at the same physical location as the restored kernel's.
      
      Note that the hibernation path assumes the GDT is correct during its
      'restore_registers'. The assumption in the code is that the restored
      image is the same as the one saved - meaning we are not trying to restore
      a different kernel in the virtual address space of a new kernel.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1365194544-14648-3-git-send-email-konrad.wilk@oracle.com
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      84e70971
    • x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed. · e7a5cd06
      Authored by Konrad Rzeszutek Wilk
      During the ACPI S3 resume path the trampoline code handles it already.
      
      During the ACPI S3 suspend phase (acpi_suspend_lowlevel) we set:
      early_gdt_descr.address = (..)get_cpu_gdt_table(smp_processor_id());
      
      which is then used during the resume path and has the same exact
      value as what the store/load_gdt do with the saved_context
      (which is saved/restored via save/restore_processor_state()).
      
      The flow during resume is complex and for 64-bit kernels we use three GDTs
      - one early bootstrap GDT (wakeup_gdt) that we load to work around
      broken BIOSes, an early Protected Mode to Long Mode transition one
      (tr_gdt), and the final one - early_gdt_descr (which points to the real GDT).
      
      The early ('wakeup_gdt') is loaded in 'trampoline_start' for working
      around broken BIOSes, and then when we end up in Protected Mode in the
      startup_32 (in trampoline_64.S, not head_32.S) we use the 'tr_gdt'
      (still in trampoline_64.S). This 'tr_gdt' has a 32-bit code segment,
      64-bit code segment with L=1, and a 32-bit data segment.
      
      Once we have transitioned from Protected Mode to Long Mode we then
      set the GDT to 'early_gdt_descr' and then via an iretq emerge in
      wakeup_long64 (set via the 'initial_code' variable in acpi_suspend_lowlevel).
      
      In wakeup_long64 we end up restoring %rip (which is set to
      'resume_point') and jumping there.
      
      In 'resume_point' we call 'restore_processor_state', which does
      the load_gdt on the saved context. This load_gdt is redundant, as the
      GDT loaded via early_gdt_descr is the same.
      
      Here is the call-chain:
       wakeup_start
         |- lgdtl wakeup_gdt [the work-around broken BIOSes]
         |
         \-- trampoline_start (trampoline_64.S)
               |- lgdtl tr_gdt
               |
               \-- startup_32 (trampoline_64.S)
                     |
                     \-- startup_64 (trampoline_64.S)
                            |
                            \-- secondary_startup_64
                                      |- lgdtl early_gdt_descr
                                      | ...
                                      |- movq initial_code(%rip), %rax
                                      |-.. lretq
                                     \-- wakeup_64
                                           |-- other registers are reloaded
                                           |-- call restore_processor_state
      
      The hibernate path is much simpler. During the saving of the hibernation
      image we call save_processor_state() and save the contents of that along
      with the rest of the kernel in the hibernation image destination.
      We save the EIP of 'restore_registers' (restore_jump_address) and cr3
      (restore_cr3).
      
      During hibernate resume, the 'restore_registers' (via the
      'restore_jump_address') in hibernate_asm_64.S is invoked, which restores
      the contents of most registers. Naturally the resume path benefits from
      already being in 64-bit mode, so it does not have to load the GDT.
      
      It only reloads the cr3 (from restore_cr3) and continues on. Note that
      the restoration of the restore image page-tables is done prior to this.
      
      After 'restore_registers' returns, we end up calling
      restore_processor_state() - where we reload the GDT. The reload of
      the GDT is not needed, as the bootup kernel has already loaded a GDT which
      is at the same physical location as the restored kernel's.
      
      Note that the hibernation path assumes the GDT is correct during its
      'restore_registers'. The assumption in the code is that the restored
      image is the same as the one saved - meaning we are not trying to restore
      a different kernel in the virtual address space of a new kernel.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Link: http://lkml.kernel.org/r/1365194544-14648-2-git-send-email-konrad.wilk@oracle.com
      Cc: Rafael J. Wysocki <rjw@sisk.pl>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      e7a5cd06
  22. 16 Mar 2013 (1 commit)
  23. 15 Nov 2012 (2 commits)
  24. 02 Apr 2012 (1 commit)
  25. 20 Mar 2012 (1 commit)
    • x86: kvmclock: abstract save/restore sched_clock_state · b74f05d6
      Authored by Marcelo Tosatti
      Upon resume from hibernation, CPU 0's hvclock area contains the old
      values for system_time and tsc_timestamp. It is necessary for the
      hypervisor to update these values with up-to-date ones before the CPU uses
      them.
      
      Abstract TSC's save/restore sched_clock_state functions and use
      restore_state to write to KVM_SYSTEM_TIME MSR, forcing an update.
      
      Also move restore_sched_clock_state before __restore_processor_state,
      since the latter calls CONFIG_LOCK_STAT's lockstat_clock (also for TSC).
      Thanks to Igor Mammedov for tracking it down.
      
      Fixes suspend-to-disk with kvmclock.
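
      A paraphrased sketch of the shape of the abstraction (field names
      approximate; kernel-internal, not standalone):

      struct x86_platform_ops {
      	/* ... other ops elided ... */
      	void (*save_sched_clock_state)(void);
      	void (*restore_sched_clock_state)(void);
      };

      /* kvmclock's override: rewrite the KVM_SYSTEM_TIME MSR on restore so
       * the hypervisor refreshes the hvclock area before it is used. */
      static void kvm_restore_sched_clock_state(void)
      {
      	kvm_register_clock("primary cpu clock, resume");
      }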
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Avi Kivity <avi@redhat.com>
      b74f05d6
  26. 22 Feb 2012 (1 commit)
    • i387: Split up <asm/i387.h> into exported and internal interfaces · 1361b83a
      Authored by Linus Torvalds
      While various modules include <asm/i387.h> to get access to things we
      actually *intend* for them to use, most of that header file was really
      pretty low-level internal stuff that we really don't want to expose to
      others.
      
      So split the header file into two: the small exported interfaces remain
      in <asm/i387.h>, while the internal definitions that are only used by
      core architecture code are now in <asm/fpu-internal.h>.
      
      The guiding principle for this was to expose functions that we export to
      modules, and leave them in <asm/i387.h>, while stuff that is used by
      task switching or was marked GPL-only is in <asm/fpu-internal.h>.
      
      The fpu-internal.h file could be further split up too, especially since
      arch/x86/kvm/ uses some of the remaining stuff for its module.  But that
      kvm usage should probably be abstracted out a bit, and at least now the
      internal FPU accessor functions are much more contained.  Even if it
      isn't perhaps as contained as it _could_ be.
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1202211340330.5354@i5.linux-foundation.org
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
      1361b83a
  27. 01 Nov 2011 (1 commit)
    • x86: Fix files explicitly requiring export.h for EXPORT_SYMBOL/THIS_MODULE · 69c60c88
      Authored by Paul Gortmaker
      These files were implicitly getting EXPORT_SYMBOL via device.h
      which was including module.h, but that will be fixed up shortly.
      
      By fixing these now, we can avoid seeing things like:
      
      arch/x86/kernel/rtc.c:29: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/pci-dma.c:20: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL’
      arch/x86/kernel/e820.c:69: warning: type defaults to ‘int’ in declaration of ‘EXPORT_SYMBOL_GPL’
      
      [ with input from Randy Dunlap <rdunlap@xenotime.net> and also
        from Stephen Rothwell <sfr@canb.auug.org.au> ]
      Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
      69c60c88
  28. 20 Aug 2010 (1 commit)
    • x86, tsc, sched: Recompute cyc2ns_offset's during resume from sleep states · cd7240c0
      Authored by Suresh Siddha
      TSCs get reset after suspend/resume (even on CPUs with invariant TSC,
      which runs at a constant rate across ACPI P-, C- and T-states). And on
      some systems the BIOS seems to reinit the TSC to an arbitrarily large
      value (still sync'd across CPUs) during resume.
      
      This leads to a scenario where the scheduler's rq->clock (sched_clock_cpu())
      is less than rq->age_stamp (introduced in 2.6.32). This leads to a big value
      returned by scale_rt_power(), and the resulting big group power set by
      update_group_power() causes improper load balancing between busy and
      idle CPUs after suspend/resume.
      
      This resulted in multi-threaded workloads (like kernel compilation) running
      slower after a suspend/resume cycle on Core i5 laptops.
      
      Fix this by recomputing the cyc2ns_offset values during resume, so that
      sched_clock() continues from the point where it left off during
      suspend.
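
      A simplified sketch of the bookkeeping involved (names and the shift
      constant follow arch/x86/kernel/tsc.c of that era; treat the details
      as approximate):

      #include <stdint.h>

      #define CYC2NS_SCALE_FACTOR 10	/* cyc2ns_scale is 22.10 fixed-point */

      static uint64_t cyc2ns_scale, cyc2ns_offset;

      /* sched_clock(): convert TSC cycles to nanoseconds. */
      static uint64_t cycles_2_ns(uint64_t cyc)
      {
      	return ((cyc * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR) + cyc2ns_offset;
      }

      /* On resume, pick an offset so the clock continues where it left off,
       * even though the raw TSC was reset during suspend. */
      static void recompute_cyc2ns_offset(uint64_t tsc_now, uint64_t ns_at_suspend)
      {
      	cyc2ns_offset = ns_at_suspend -
      			((tsc_now * cyc2ns_scale) >> CYC2NS_SCALE_FACTOR);
      }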
      Reported-by: Florian Pritz <flo@xssn.at>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: <stable@kernel.org> # [v2.6.32+]
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <1282262618.2675.24.camel@sbsiddha-MOBL3.sc.intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      cd7240c0
  29. 19 Jul 2010 (1 commit)
  30. 08 Jun 2010 (1 commit)
  31. 08 Nov 2009 (1 commit)
    • hw-breakpoints: Rewrite the hw-breakpoints layer on top of perf events · 24f1e32c
      Authored by Frederic Weisbecker
      This patch rebases the implementation of the breakpoints API on top of
      perf event instances.

      Each breakpoint is now a perf event that handles
      register scheduling, thread/cpu attachment, etc.
      
      The new layering is as follows:
      
             ptrace       kgdb      ftrace   perf syscall
                \          |          /         /
                 \         |         /         /
                                              /
                  Core breakpoint API        /
                                            /
                           |               /
                           |              /
      
                    Breakpoints perf events
      
                           |
                           |
      
                     Breakpoints PMU ---- Debug Register constraints handling
                                          (Part of core breakpoint API)
                           |
                           |
      
                   Hardware debug registers
      
      Reasons for this rewrite:
      
      - Use the centralized/optimized pmu register scheduling,
        implying easier arch integration
      - More powerful register handling: perf attributes (pinned/flexible
        events, exclusive/non-exclusive, tunable period, etc...)
      
      Impact:
      
      - New perf ABI: the hardware breakpoint counters (see the sketch below)
      - Ptrace breakpoint setting remains tricky and still needs some per
        thread breakpoint references.
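
      Since breakpoints are ordinary perf events after this rewrite, the new
      ABI can be exercised from user space; a minimal sketch using
      perf_event_open() (error handling elided):

      #include <linux/hw_breakpoint.h>
      #include <linux/perf_event.h>
      #include <stdio.h>
      #include <string.h>
      #include <sys/syscall.h>
      #include <unistd.h>

      static int watched;	/* the data address to watch */

      int main(void)
      {
      	struct perf_event_attr attr;
      	long fd;

      	memset(&attr, 0, sizeof(attr));
      	attr.type    = PERF_TYPE_BREAKPOINT;
      	attr.size    = sizeof(attr);
      	attr.bp_type = HW_BREAKPOINT_W;	/* trap on writes */
      	attr.bp_addr = (unsigned long)&watched;
      	attr.bp_len  = HW_BREAKPOINT_LEN_4;

      	/* No glibc wrapper exists for perf_event_open(). */
      	fd = syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0);
      	printf("breakpoint event fd: %ld\n", fd);
      	return 0;
      }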
      
      Todo (in order):
      
      - Support breakpoints perf counter events for perf tools (ie: implement
        perf_bpcounter_event())
      - Support from perf tools
      
      Changes in v2:
      
      - Follow the perf "event" rename
      - The ptrace regression has been fixed (ptrace breakpoint perf events
        weren't released when a task ended)
      - Drop the struct hw_breakpoint and store generic fields in
        perf_event_attr.
      - Separate core and arch specific headers, drop
        asm-generic/hw_breakpoint.h and create linux/hw_breakpoint.h
      - Use new generic len/type for breakpoint
      - Handle off case: when breakpoints api is not supported by an arch
      
      Changes in v3:
      
      - Fix broken CONFIG_KVM: we need to propagate the breakpoint api
        changes to kvm when we exit the guest and restore the bp registers
        to the host.
      
      Changes in v4:
      
      - Drop the hw_breakpoint_restore() stub as it is only used by KVM
      - EXPORT_SYMBOL_GPL hw_breakpoint_restore() as KVM can be built as a
        module
      - Restore the breakpoints unconditionally on kvm guest exit:
        TIF_DEBUG_THREAD no longer covers every case of running
        breakpoints, and vcpu->arch.switch_db_regs might not always be
        set when the guest used debug registers.
        (Waiting for a reliable optimization)
      
      Changes in v5:
      
      - Split-up the asm-generic/hw-breakpoint.h moving to
        linux/hw_breakpoint.h into a separate patch
      - Optimize the breakpoints restoring while switching from kvm guest
        to host. We only want to restore the state if we have active
        breakpoints to the host, otherwise we don't care about messed-up
        address registers.
      - Add asm/hw_breakpoint.h to Kbuild
      - Fix bad breakpoint type in trace_selftest.c
      
      Changes in v6:
      
      - Fix wrong header inclusion in trace.h (triggered a build
        error with CONFIG_FTRACE_SELFTEST)
      Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Prasad <prasad@linux.vnet.ibm.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jan Kiszka <jan.kiszka@web.de>
      Cc: Jiri Slaby <jirislaby@gmail.com>
      Cc: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Avi Kivity <avi@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Masami Hiramatsu <mhiramat@redhat.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      24f1e32c