1. 07 6月, 2015 3 次提交
  2. 27 5月, 2015 1 次提交
  3. 07 5月, 2015 8 次提交
  4. 06 5月, 2015 4 次提交
  5. 05 5月, 2015 1 次提交
  6. 01 5月, 2015 1 次提交
    • J
      x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus · 2c62e849
      Jiang Liu 提交于
      An IO port or MMIO resource assigned to a PCI host bridge may be
      consumed by the host bridge itself or available to its child
      bus/devices. The ACPI specification defines a bit (Producer/Consumer)
      to tell whether the resource is consumed by the host bridge itself,
      but firmware hasn't used that bit consistently, so we can't rely on it.
      
      Before commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource
      interfaces to simplify implementation"), arch/x86/pci/acpi.c ignored
      all IO port resources defined by acpi_resource_io and
      acpi_resource_fixed_io to filter out IO ports consumed by the host
      bridge itself.
      
      Commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource interfaces
      to simplify implementation") started accepting all IO port and MMIO
      resources, which caused a regression that IO port resources consumed
      by the host bridge itself became available to its child devices.
      
      Then commit 63f1789e ("x86/PCI/ACPI: Ignore resources consumed by
      host bridge itself") ignored resources consumed by the host bridge
      itself by checking the IORESOURCE_WINDOW flag, which accidently removed
      MMIO resources defined by acpi_resource_memory24, acpi_resource_memory32
      and acpi_resource_fixed_memory32.
      
      On x86 and IA64 platforms, all IO port and MMIO resources are assumed
      to be available to child bus/devices except one special case:
          IO port [0xCF8-0xCFF] is consumed by the host bridge itself
          to access PCI configuration space.
      
      So explicitly filter out PCI CFG IO ports[0xCF8-0xCFF]. This solution
      will also ease the way to consolidate ACPI PCI host bridge common code
      from x86, ia64 and ARM64.
      
      Related ACPI table are archived at:
      https://bugzilla.kernel.org/show_bug.cgi?id=94221
      
      Related discussions at:
      http://patchwork.ozlabs.org/patch/461633/
      https://lkml.org/lkml/2015/3/29/304
      
      Fixes: 63f1789e (Ignore resources consumed by host bridge itself)
      Reported-by: NBernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com>
      Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
      Reviewed-by: NBjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2c62e849
  7. 30 4月, 2015 1 次提交
    • B
      xen: Suspend ticks on all CPUs during suspend · 2b953a5e
      Boris Ostrovsky 提交于
      Commit 77e32c89 ("clockevents: Manage device's state separately for
      the core") decouples clockevent device's modes from states. With this
      change when a Xen guest tries to resume, it won't be calling its
      set_mode op which needs to be done on each VCPU in order to make the
      hypervisor aware that we are in oneshot mode.
      
      This happens because clockevents_tick_resume() (which is an intermediate
      step of resuming ticks on a processor) doesn't call clockevents_set_state()
      anymore and because during suspend clockevent devices on all VCPUs (except
      for the one doing the suspend) are left in ONESHOT state. As result, during
      resume the clockevents state machine will assume that device is already
      where it should be and doesn't need to be updated.
      
      To avoid this problem we should suspend ticks on all VCPUs during
      suspend.
      Signed-off-by: NBoris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: NDavid Vrabel <david.vrabel@citrix.com>
      2b953a5e
  8. 27 4月, 2015 3 次提交
    • P
      x86: pvclock: Really remove the sched notifier for cross-cpu migrations · 73459e2a
      Paolo Bonzini 提交于
      This reverts commits 0a4e6be9
      and 80f7fdb1.
      
      The task migration notifier was originally introduced in order to support
      the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
      with synchronized TSC.  Hence, on KVM the race condition is only needed
      due to a bad implementation on the host side, and even then it's so rare
      that it's mostly theoretical.
      
      As far as KVM is concerned it's possible to fix the host, avoiding the
      additional complexity in the vDSO and the (re)introduction of the task
      migration notifier.
      
      Xen, on the other hand, hasn't yet implemented vsyscall support at
      all, so we do not care about its plans for non-synchronized TSC.
      Reported-by: NPeter Zijlstra <peterz@infradead.org>
      Suggested-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      73459e2a
    • R
      kvm: x86: fix kvmclock update protocol · 5dca0d91
      Radim Krčmář 提交于
      The kvmclock spec says that the host will increment a version field to
      an odd number, then update stuff, then increment it to an even number.
      The host is buggy and doesn't do this, and the result is observable
      when one vcpu reads another vcpu's kvmclock data.
      
      There's no good way for a guest kernel to keep its vdso from reading
      a different vcpu's kvmclock data, but we don't need to care about
      changing VCPUs as long as we read a consistent data from kvmclock.
      (VCPU can change outside of this loop too, so it doesn't matter if we
      return a value not fit for this VCPU.)
      
      Based on a patch by Radim Krčmář.
      Reviewed-by: NRadim Krčmář <rkrcmar@redhat.com>
      Acked-by: NMarcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      5dca0d91
    • A
      x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue · 61f01dd9
      Andy Lutomirski 提交于
      AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
      SS == 0 results in an invalid usermode state in which SS is apparently
      equal to __USER_DS but causes #SS if used.
      
      Work around the issue by setting SS to __KERNEL_DS __switch_to, thus
      ensuring that SYSRET never happens with SS set to NULL.
      
      This was exposed by a recent vDSO cleanup.
      
      Fixes: e7d6eefa x86/vdso32/syscall.S: Do not load __USER32_DS to %ss
      Signed-off-by: NAndy Lutomirski <luto@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      61f01dd9
  9. 24 4月, 2015 2 次提交
    • L
      x86: fix special __probe_kernel_write() tail zeroing case · d869844b
      Linus Torvalds 提交于
      Commit cae2a173 ("x86: clean up/fix 'copy_in_user()' tail zeroing")
      fixed the failure case tail zeroing of one special case of the x86-64
      generic user-copy routine, namely when used for the user-to-user case
      ("copy_in_user()").
      
      But in the process it broke an even more unusual case: using the user
      copy routine for kernel-to-kernel copying.
      
      Now, normally kernel-kernel copies are obviously done using memcpy(),
      but we have a couple of special cases when we use the user-copy
      functions.  One is when we pass a kernel buffer to a regular user-buffer
      routine, using set_fs(KERNEL_DS).  That's a "normal" case, and continued
      to work fine, because it never takes any faults (with the possible
      exception of a silent and successful vmalloc fault).
      
      But Jan Beulich pointed out another, very unusual, special case: when we
      use the user-copy routines not because it's a path that expects a user
      pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
      copy, but do so using "unsafe" buffers, and use the user-copy routine to
      gracefully handle faults.  IOW, for probe_kernel_write().
      
      And that broke for the case of a faulting kernel destination, because we
      saw the kernel destination and wanted to try to clear the tail of the
      buffer.  Which doesn't work, since that's what faults.
      
      This only triggers for things like kgdb and ftrace users (eg trying
      setting a breakpoint on read-only memory), but it's definitely a bug.
      The fix is to not compare against the kernel address start (TASK_SIZE),
      but instead use the same limits "access_ok()" uses.
      Reported-and-tested-by: NJan Beulich <jbeulich@suse.com>
      Cc: stable@vger.kernel.org # 4.0
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d869844b
    • A
      crypto: x86/sha512_ssse3 - fixup for asm function prototype change · 00425bb1
      Ard Biesheuvel 提交于
      Patch e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512
      SSSE3 implementation to base layer") changed the prototypes of the
      core asm SHA-512 implementations so that they are compatible with
      the prototype used by the base layer.
      
      However, in one instance, the register that was used for passing the
      input buffer was reused as a scratch register later on in the code,
      and since the input buffer param changed places with the digest param
      -which needs to be written back before the function returns- this
      resulted in the scratch register to be dereferenced in a memory write
      operation, causing a GPF.
      
      Fix this by changing the scratch register to use the same register as
      the input buffer param again.
      
      Fixes: e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer")
      Reported-By: NBobby Powers <bobbypowers@gmail.com>
      Tested-By: NBobby Powers <bobbypowers@gmail.com>
      Signed-off-by: NArd Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      00425bb1
  10. 22 4月, 2015 4 次提交
    • S
      perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver · 0140e614
      Sonny Rao 提交于
      This keeps all the related PCI IDs together in the driver where
      they are used.
      Signed-off-by: NSonny Rao <sonnyrao@chromium.org>
      Acked-by: NBjorn Helgaas <bhelgaas@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      0140e614
    • S
      perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile... · 80bcffb3
      Sonny Rao 提交于
      perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
      
      This uncore is the same as the Haswell desktop part but uses a
      different PCI ID.
      Signed-off-by: NSonny Rao <sonnyrao@chromium.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      80bcffb3
    • J
      perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu · 3b6e0421
      Jiri Olsa 提交于
      The core_pmu does not define cpu_* callbacks, which handles
      allocation of 'struct cpu_hw_events::shared_regs' data,
      initialization of debug store and PMU_FL_EXCL_CNTRS counters.
      
      While this probably won't happen on bare metal, virtual CPU can
      define x86_pmu.extra_regs together with PMU version 1 and thus
      be using core_pmu -> using shared_regs data without it being
      allocated. That could could leave to following panic:
      
      	BUG: unable to handle kernel NULL pointer dereference at (null)
      	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40
      
      	SNIP
      
      	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
      	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
      	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
      	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
      	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
      	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
      	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
      	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
      	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
      	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
      	 [<ffffffff811a993a>] ? dput+0x9a/0x150
      	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
      	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
      	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
      	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
      	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      	 [<ffffffff8119c731>] ? path_put+0x31/0x40
      	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
      	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
      	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
      	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
      	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
      	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
      	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
      	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
      	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
      	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
      	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0
      
      Adding cpu_(prepare|starting|dying) for core_pmu to have
      shared_regs data allocated for core_pmu. AFAICS there's no harm
      to initialize debug store and PMU_FL_EXCL_CNTRS either for
      core_pmu.
      Signed-off-by: NJiri Olsa <jolsa@kernel.org>
      Acked-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3b6e0421
    • B
      KVM: VMX: Preserve host CR4.MCE value while in guest mode. · 085e68ee
      Ben Serebrin 提交于
      The host's decision to enable machine check exceptions should remain
      in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
      and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
      
      Tested: Built.
      On earlier version, tested by injecting machine check
      while a guest is spinning.
      
      Before the change, if guest CR4.MCE==0, then the machine check is
      escalated to Catastrophic Error (CATERR) and the machine dies.
      If guest CR4.MCE==1, then the machine check causes VMEXIT and is
      handled normally by host Linux. After the change, injecting a machine
      check causes normal Linux machine check handling.
      Signed-off-by: NBen Serebrin <serebrin@google.com>
      Reviewed-by: NVenkatesh Srinivas <venkateshs@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      085e68ee
  11. 20 4月, 2015 1 次提交
  12. 19 4月, 2015 1 次提交
  13. 18 4月, 2015 2 次提交
  14. 17 4月, 2015 7 次提交
  15. 16 4月, 2015 1 次提交
    • O
      x86/ptrace: Fix the TIF_FORCED_TF logic in handle_signal() · fd0f86b6
      Oleg Nesterov 提交于
      When the TIF_SINGLESTEP tracee dequeues a signal,
      handle_signal() clears TIF_FORCED_TF and X86_EFLAGS_TF but
      leaves TIF_SINGLESTEP set.
      
      If the tracer does PTRACE_SINGLESTEP again, enable_single_step()
      sets X86_EFLAGS_TF but not TIF_FORCED_TF.  This means that the
      subsequent PTRACE_CONT doesn't not clear X86_EFLAGS_TF, and the
      tracee gets the wrong SIGTRAP.
      
      Test-case (needs -O2 to avoid prologue insns in signal handler):
      
      	#include <unistd.h>
      	#include <stdio.h>
      	#include <sys/ptrace.h>
      	#include <sys/wait.h>
      	#include <sys/user.h>
      	#include <assert.h>
      	#include <stddef.h>
      
      	void handler(int n)
      	{
      		asm("nop");
      	}
      
      	int child(void)
      	{
      		assert(ptrace(PTRACE_TRACEME, 0,0,0) == 0);
      		signal(SIGALRM, handler);
      		kill(getpid(), SIGALRM);
      		return 0x23;
      	}
      
      	void *getip(int pid)
      	{
      		return (void*)ptrace(PTRACE_PEEKUSER, pid,
      					offsetof(struct user, regs.rip), 0);
      	}
      
      	int main(void)
      	{
      		int pid, status;
      
      		pid = fork();
      		if (!pid)
      			return child();
      
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGALRM);
      
      		assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
      		assert((getip(pid) - (void*)handler) == 0);
      
      		assert(ptrace(PTRACE_SINGLESTEP, pid, 0, SIGALRM) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFSTOPPED(status) && WSTOPSIG(status) == SIGTRAP);
      		assert((getip(pid) - (void*)handler) == 1);
      
      		assert(ptrace(PTRACE_CONT, pid, 0,0) == 0);
      		assert(wait(&status) == pid);
      		assert(WIFEXITED(status) && WEXITSTATUS(status) == 0x23);
      
      		return 0;
      	}
      
      The last assert() fails because PTRACE_CONT wrongly triggers
      another single-step and X86_EFLAGS_TF can't be cleared by
      debugger until the tracee does sys_rt_sigreturn().
      
      Change handle_signal() to do user_disable_single_step() if
      stepping, we do not need to preserve TIF_SINGLESTEP because we
      are going to do ptrace_notify(), and it is simply wrong to leak
      this bit.
      
      While at it, change the comment to explain why we also need to
      clear TF unconditionally after setup_rt_frame().
      
      Note: in the longer term we should probably change
      setup_sigcontext() to use get_flags() and then just remove this
      user_disable_single_step().  And, the state of TIF_FORCED_TF can
      be wrong after restore_sigcontext() which can set/clear TF, this
      needs another fix.
      
      This fix fixes the 'single_step_syscall_32' testcase in
      the x86 testsuite:
      
      Before:
      
      	~/linux/tools/testing/selftests/x86> ./single_step_syscall_32
      	[RUN]   Set TF and check nop
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check int80
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check a fast syscall
      	[WARN]  Hit 10000 SIGTRAPs with si_addr 0xf7789cc0, ip 0xf7789cc0
      	Trace/breakpoint trap (core dumped)
      
      After:
      
      	~/linux/linux/tools/testing/selftests/x86> ./single_step_syscall_32
      	[RUN]   Set TF and check nop
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check int80
      	[OK]    Survived with TF set and 9 traps
      	[RUN]   Set TF and check a fast syscall
      	[OK]    Survived with TF set and 39 traps
      	[RUN]   Fast syscall with TF cleared
      	[OK]    Nothing unexpected happened
      Reported-by: NEvan Teran <eteran@alum.rit.edu>
      Reported-by: NPedro Alves <palves@redhat.com>
      Tested-by: NAndres Freund <andres@anarazel.de>
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      [ Added x86 self-test info. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      fd0f86b6