1. 14 May 2015 (2 commits)
  2. 11 May 2015 (1 commit)
    • x86/alternatives: Switch AMD F15h and later to the P6 NOPs · f21262b8
      Authored by Borislav Petkov
      Software optimization guides for both F15h and F16h cite those
      NOPs as the optimal ones. A microbenchmark confirms that even
      older families are actually better off with the single-insn
      NOPs, so switch to them for the alternatives (a sketch of the
      selection change follows the measurements below).

      The cycle counts below include the loop overhead of the
      measurement, but that overhead is the same across all runs.
      
      	F10h, revE:
      	-----------
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     288.212282 cycles
      			   66 90     288.220840 cycles
      			66 66 90     288.219447 cycles
      		     66 66 66 90     288.223204 cycles
      		  66 66 90 66 90     571.393424 cycles
      	       66 66 90 66 66 90     571.374919 cycles
      	    66 66 66 90 66 66 90     572.249281 cycles
      	 66 66 66 90 66 66 66 90     571.388651 cycles
      
      	P6:
      			      90     288.214193 cycles
      			   66 90     288.225550 cycles
      			0f 1f 00     288.224441 cycles
      		     0f 1f 40 00     288.225030 cycles
      		  0f 1f 44 00 00     288.233558 cycles
      	       66 0f 1f 44 00 00     324.792342 cycles
      	    0f 1f 80 00 00 00 00     325.657462 cycles
      	 0f 1f 84 00 00 00 00 00     430.246643 cycles
      
      	F14h:
      	----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     510.404890 cycles
      			   66 90     510.432117 cycles
      			66 66 90     510.561858 cycles
      		     66 66 66 90     510.541865 cycles
      		  66 66 90 66 90    1014.192782 cycles
      	       66 66 90 66 66 90    1014.226546 cycles
      	    66 66 66 90 66 66 90    1014.334299 cycles
      	 66 66 66 90 66 66 66 90    1014.381205 cycles
      
      	P6:
      			      90     510.436710 cycles
      			   66 90     510.448229 cycles
      			0f 1f 00     510.545100 cycles
      		     0f 1f 40 00     510.502792 cycles
      		  0f 1f 44 00 00     510.589517 cycles
      	       66 0f 1f 44 00 00     510.611462 cycles
      	    0f 1f 80 00 00 00 00     511.166794 cycles
      	 0f 1f 84 00 00 00 00 00     511.651641 cycles
      
      	F15h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     243.128396 cycles
      			   66 90     243.129883 cycles
      			66 66 90     243.131631 cycles
      		     66 66 66 90     242.499324 cycles
      		  66 66 90 66 90     481.829083 cycles
      	       66 66 90 66 66 90     481.884413 cycles
      	    66 66 66 90 66 66 90     481.851446 cycles
      	 66 66 66 90 66 66 66 90     481.409220 cycles
      
      	P6:
      			      90     243.127026 cycles
      			   66 90     243.130711 cycles
      			0f 1f 00     243.122747 cycles
      		     0f 1f 40 00     242.497617 cycles
      		  0f 1f 44 00 00     245.354461 cycles
      	       66 0f 1f 44 00 00     361.930417 cycles
      	    0f 1f 80 00 00 00 00     362.844944 cycles
      	 0f 1f 84 00 00 00 00 00     480.514948 cycles
      
      	F16h:
      	-----
      	Running NOP tests, 1000 NOPs x 1000000 repetitions
      
      	K8:
      			      90     507.793298 cycles
      			   66 90     507.789636 cycles
      			66 66 90     507.826490 cycles
      		     66 66 66 90     507.859075 cycles
      		  66 66 90 66 90    1008.663129 cycles
      	       66 66 90 66 66 90    1008.696259 cycles
      	    66 66 66 90 66 66 90    1008.692517 cycles
      	 66 66 66 90 66 66 66 90    1008.755399 cycles
      
      	P6:
      			      90     507.795232 cycles
      			   66 90     507.794761 cycles
      			0f 1f 00     507.834901 cycles
      		     0f 1f 40 00     507.822629 cycles
      		  0f 1f 44 00 00     507.838493 cycles
      	       66 0f 1f 44 00 00     507.908597 cycles
      	    0f 1f 80 00 00 00 00     507.946417 cycles
      	 0f 1f 84 00 00 00 00 00     507.954960 cycles
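
      For reference, a minimal sketch of what the switch amounts to; the
      function shape and the family cutoff shown here are inferred from the
      commit title, not copied from the committed hunk:

      	/* Hedged sketch: prefer the single-instruction P6 NOPs on AMD
      	 * family 0x15 (F15h) and later; the exact cutoff is an assumption. */
      	static void arch_init_ideal_nops_sketch(void)
      	{
      		if (boot_cpu_data.x86_vendor == X86_VENDOR_AMD &&
      		    boot_cpu_data.x86 >= 0x15) {
      			ideal_nops = p6_nops;	/* 0f 1f /0 encodings */
      			return;
      		}
      		ideal_nops = k8_nops;		/* 66 66 ... 90 sequences */
      	}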
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Aravind Gopalakrishnan <aravind.gopalakrishnan@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1431332153-18566-2-git-send-email-bp@alien8.de
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  3. 10 May 2015 (4 commits)
  4. 08 May 2015 (6 commits)
    • x86/entry: Define 'cpu_current_top_of_stack' for 64-bit code · 3a23208e
      Authored by Denys Vlasenko
      32-bit code has PER_CPU_VAR(cpu_current_top_of_stack).
      64-bit code uses the somewhat more obscure PER_CPU_VAR(cpu_tss + TSS_sp0).
      
      Define the 'cpu_current_top_of_stack' macro on CONFIG_X86_64
      as well so that the PER_CPU_VAR(cpu_current_top_of_stack)
      expression can be used in both 32-bit and 64-bit code.
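
      A minimal sketch of what such a definition could look like (the exact
      header it lives in and the CPP spelling are assumptions):

      	#ifdef CONFIG_X86_64
      	/* sketch: let 64-bit code use the same name as 32-bit code does;
      	 * both resolve to the top-of-stack slot, i.e. cpu_tss + TSS_sp0 */
      	# define cpu_current_top_of_stack (cpu_tss + TSS_sp0)
      	#endif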
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-3-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry: Remove unused 'kernel_stack' per-cpu variable · fed7c3f0
      Authored by Denys Vlasenko
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-2-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/entry: Stop using PER_CPU_VAR(kernel_stack) · 63332a84
      Authored by Denys Vlasenko
      PER_CPU_VAR(kernel_stack) is redundant:
      
        - On the 64-bit build, we can use PER_CPU_VAR(cpu_tss + TSS_sp0).
        - On the 32-bit build, we can use PER_CPU_VAR(cpu_current_top_of_stack).
      
      PER_CPU_VAR(kernel_stack) will be deleted by a separate change.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429889495-27850-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86: Force inlining of atomic ops · 2a4e90b1
      Authored by Denys Vlasenko
      With both gcc 4.7.2 and 4.9.2, sometimes gcc mysteriously
      doesn't inline very small functions we expect to be inlined:
      
      $ nm --size-sort vmlinux | grep -iF ' t ' | uniq -c | grep -v '^ *1 ' | sort -rn
          473 000000000000000b t spin_unlock_irqrestore
          449 000000000000005f t rcu_read_unlock
          355 0000000000000009 t atomic_inc                <== THIS
          353 000000000000006e t rcu_read_lock
          350 0000000000000075 t rcu_read_lock_sched_held
          291 000000000000000b t spin_unlock
          266 0000000000000019 t arch_local_irq_restore
          215 000000000000000b t spin_lock
          180 0000000000000011 t kzalloc
          165 0000000000000012 t list_add_tail
          161 0000000000000019 t arch_local_save_flags
          153 0000000000000016 t test_and_set_bit
          134 000000000000000b t spin_unlock_irq
          134 0000000000000009 t atomic_dec                <== THIS
          130 000000000000000b t spin_unlock_bh
          122 0000000000000010 t brelse
          120 0000000000000016 t test_and_clear_bit
          120 000000000000000b t spin_lock_irq
          119 000000000000001e t get_dma_ops
          117 0000000000000053 t cpumask_next
          116 0000000000000036 t kref_get
          114 000000000000001a t schedule_work
          106 000000000000000b t spin_lock_bh
          103 0000000000000019 t arch_local_irq_disable
      ...
      
      Note the sizes of the marked functions: they are merely 9 bytes
      long! Selecting functions with 'atomic' in their names:
      
          355 0000000000000009 t atomic_inc
          134 0000000000000009 t atomic_dec
           98 0000000000000014 t atomic_dec_and_test
           31 000000000000000e t atomic_add_return
           27 000000000000000a t atomic64_inc
           26 000000000000002f t kmap_atomic
           24 0000000000000009 t atomic_add
           12 0000000000000009 t atomic_sub
           10 0000000000000021 t __atomic_add_unless
           10 000000000000000a t atomic64_add
            5 000000000000001f t __atomic_add_unless.constprop.7
            5 000000000000000a t atomic64_dec
            4 000000000000001f t __atomic_add_unless.constprop.18
            4 000000000000001f t __atomic_add_unless.constprop.12
            4 000000000000001f t __atomic_add_unless.constprop.10
            3 000000000000001f t __atomic_add_unless.constprop.13
            3 0000000000000011 t atomic64_add_return
            2 000000000000001f t __atomic_add_unless.constprop.9
            2 000000000000001f t __atomic_add_unless.constprop.8
            2 000000000000001f t __atomic_add_unless.constprop.6
            2 000000000000001f t __atomic_add_unless.constprop.5
            2 000000000000001f t __atomic_add_unless.constprop.3
            2 000000000000001f t __atomic_add_unless.constprop.22
            2 000000000000001f t __atomic_add_unless.constprop.14
            2 000000000000001f t __atomic_add_unless.constprop.11
            2 000000000000001e t atomic_dec_if_positive
            2 0000000000000014 t atomic_inc_and_test
            2 0000000000000011 t atomic_add_return.constprop.4
            2 0000000000000011 t atomic_add_return.constprop.17
            2 0000000000000011 t atomic_add_return.constprop.16
            2 000000000000000d t atomic_inc.constprop.4
            2 000000000000000c t atomic_cmpxchg
      
      This patch fixes this for the x86 atomic ops via
      s/inline/__always_inline/. It decreases the allyesconfig kernel
      by about 25k (a sketch of the change follows the size comparison):
      
          text     data      bss       dec     hex filename
      82399481 22255416 20627456 125282353 777a831 vmlinux.before
      82375570 22255544 20627456 125258570 7774b4a vmlinux
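
      A sketch of the kind of change involved, shown on one representative
      op; the body is illustrative of the existing asm wrapper, not a
      verbatim copy of the header:

      	/* before: static inline void atomic_inc(atomic_t *v) { ... } */
      	static __always_inline void atomic_inc(atomic_t *v)
      	{
      		asm volatile(LOCK_PREFIX "incl %0" : "+m" (v->counter));
      	}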
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1431080762-17797-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry/64: Clean up usage of TEST insns · 03335e95
      Authored by Denys Vlasenko
      By the nature of the TEST operation, it is often possible
      to test a narrower part of the operand:

          "testl $3, mem"  -> "testb $3, mem"

      This results in shorter insns, because the TEST insn has no
      sign-extending byte-immediate forms, unlike other ALU ops.
      
         text	   data	    bss	    dec	    hex	filename
        11674	      0	      0	  11674	   2d9a	entry_64.o.before
        11658	      0	      0	  11658	   2d8a	entry_64.o
      
      Changes in object code:
      
      -	f7 84 24 88 00 00 00 03 00 00 00 	testl  $0x3,0x88(%rsp)
      +	f6 84 24 88 00 00 00 03	         	testb  $0x3,0x88(%rsp)
      -	f7 44 24 68 03 00 00 00          	testl  $0x3,0x68(%rsp)
      +	f6 44 24 68 03                  	testb  $0x3,0x68(%rsp)
      -	f7 84 24 90 00 00 00 03 00 00 00	testl  $0x3,0x90(%rsp)
      +	f6 84 24 90 00 00 00 03         	testb  $0x3,0x90(%rsp)
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1430140912-7960-2-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry/64: Tidy up JZ insns after TESTs · dde74f2e
      Authored by Denys Vlasenko
      After TESTs, use the logically correct JZ/JNZ mnemonics instead of
      JE/JNE. This doesn't change the generated code.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1430140912-7960-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  5. 06 May 2015 (6 commits)
  6. 05 May 2015 (1 commit)
  7. 01 May 2015 (1 commit)
    • x86/PCI/ACPI: Make all resources except [io 0xcf8-0xcff] available on PCI bus · 2c62e849
      Authored by Jiang Liu
      An IO port or MMIO resource assigned to a PCI host bridge may be
      consumed by the host bridge itself or available to its child
      bus/devices. The ACPI specification defines a bit (Producer/Consumer)
      to tell whether the resource is consumed by the host bridge itself,
      but firmware hasn't used that bit consistently, so we can't rely on it.
      
      Before commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource
      interfaces to simplify implementation"), arch/x86/pci/acpi.c ignored
      all IO port resources defined by acpi_resource_io and
      acpi_resource_fixed_io to filter out IO ports consumed by the host
      bridge itself.
      
      Commit 593669c2 ("x86/PCI/ACPI: Use common ACPI resource interfaces
      to simplify implementation") started accepting all IO port and MMIO
      resources, which caused a regression that IO port resources consumed
      by the host bridge itself became available to its child devices.
      
      Then commit 63f1789e ("x86/PCI/ACPI: Ignore resources consumed by
      host bridge itself") ignored resources consumed by the host bridge
      itself by checking the IORESOURCE_WINDOW flag, which accidentally removed
      MMIO resources defined by acpi_resource_memory24, acpi_resource_memory32
      and acpi_resource_fixed_memory32.
      
      On x86 and IA64 platforms, all IO port and MMIO resources are assumed
      to be available to child bus/devices except one special case:
          IO port [0xCF8-0xCFF] is consumed by the host bridge itself
          to access PCI configuration space.
      
      So explicitly filter out the PCI CFG IO ports [0xCF8-0xCFF]. This solution
      will also ease the way to consolidate ACPI PCI host bridge common code
      from x86, ia64 and ARM64.
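
      A hedged sketch of such a filter; the helper name and the exact flag
      test are assumptions, not necessarily the committed code:

      	/* sketch: only the legacy PCI config-space ports are treated as
      	 * consumed by the host bridge itself */
      	static bool resource_is_pcicfg_ioport(struct resource *res)
      	{
      		return (res->flags & IORESOURCE_IO) &&
      		       res->start == 0xCF8 && res->end == 0xCFF;
      	}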
      
      The related ACPI tables are archived at:
      https://bugzilla.kernel.org/show_bug.cgi?id=94221
      
      Related discussions at:
      http://patchwork.ozlabs.org/patch/461633/
      https://lkml.org/lkml/2015/3/29/304
      
      Fixes: 63f1789e ("x86/PCI/ACPI: Ignore resources consumed by host bridge itself")
      Reported-by: Bernhard Thaler <bernhard.thaler@wvnet.at>
      Signed-off-by: Jiang Liu <jiang.liu@linux.intel.com>
      Cc: 4.0+ <stable@vger.kernel.org> # 4.0+
      Reviewed-by: Bjorn Helgaas <bhelgaas@google.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
  8. 30 April 2015 (1 commit)
    • xen: Suspend ticks on all CPUs during suspend · 2b953a5e
      Authored by Boris Ostrovsky
      Commit 77e32c89 ("clockevents: Manage device's state separately for
      the core") decouples a clockevent device's modes from its states. With
      this change, when a Xen guest tries to resume, it won't call its
      set_mode op, which needs to be done on each VCPU in order to make the
      hypervisor aware that we are in oneshot mode.

      This happens because clockevents_tick_resume() (which is an intermediate
      step of resuming ticks on a processor) no longer calls
      clockevents_set_state(), and because during suspend the clockevent
      devices on all VCPUs (except for the one doing the suspend) are left in
      the ONESHOT state. As a result, during resume the clockevents state
      machine will assume that the device is already where it should be and
      doesn't need to be updated.
      
      To avoid this problem we should suspend ticks on all VCPUs during
      suspend.
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: David Vrabel <david.vrabel@citrix.com>
  9. 27 April 2015 (3 commits)
    • x86: pvclock: Really remove the sched notifier for cross-cpu migrations · 73459e2a
      Authored by Paolo Bonzini
      This reverts commits 0a4e6be9
      and 80f7fdb1.
      
      The task migration notifier was originally introduced in order to support
      the pvclock vsyscall with non-synchronized TSC, but KVM only supports it
      with synchronized TSC.  Hence, on KVM the race condition is only needed
      due to a bad implementation on the host side, and even then it's so rare
      that it's mostly theoretical.
      
      As far as KVM is concerned it's possible to fix the host, avoiding the
      additional complexity in the vDSO and the (re)introduction of the task
      migration notifier.
      
      Xen, on the other hand, hasn't yet implemented vsyscall support at
      all, so we do not care about its plans for non-synchronized TSC.
      Reported-by: Peter Zijlstra <peterz@infradead.org>
      Suggested-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: x86: fix kvmclock update protocol · 5dca0d91
      Authored by Radim Krčmář
      The kvmclock spec says that the host will increment a version field to
      an odd number, then update stuff, then increment it to an even number.
      The host is buggy and doesn't do this, and the result is observable
      when one vcpu reads another vcpu's kvmclock data.
      
      There's no good way for a guest kernel to keep its vdso from reading
      a different vcpu's kvmclock data, but we don't need to care about
      changing VCPUs as long as we read consistent data from kvmclock.
      (VCPU can change outside of this loop too, so it doesn't matter if we
      return a value not fit for this VCPU.)
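
      The protocol the spec describes is a plain odd/even versioning on the
      host side; a sketch, with illustrative field names:

      	/* sketch of the update protocol required by the kvmclock spec */
      	hv_clock->version++;		/* now odd: update in progress */
      	smp_wmb();
      	/* ... write tsc_timestamp, system_time, mul/shift ... */
      	smp_wmb();
      	hv_clock->version++;		/* even again: data is consistent */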
      
      Based on a patch by Radim Krčmář.
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Acked-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • x86_64, asm: Work around AMD SYSRET SS descriptor attribute issue · 61f01dd9
      Authored by Andy Lutomirski
      AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET with
      SS == 0 results in an invalid usermode state in which SS is apparently
      equal to __USER_DS but causes #SS if used.
      
      Work around the issue by setting SS to __KERNEL_DS in __switch_to(),
      thus ensuring that SYSRET never happens with SS set to NULL.
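
      A hedged sketch of the workaround in __switch_to(); the bug flag and
      the segment helpers shown here are assumptions:

      	/* sketch: make sure SS holds a valid selector before returning to
      	 * a task that may later be resumed via SYSRET */
      	if (static_cpu_has_bug(X86_BUG_SYSRET_SS_ATTRS)) {
      		unsigned short ss_sel;

      		savesegment(ss, ss_sel);
      		if (ss_sel != __KERNEL_DS)
      			loadsegment(ss, __KERNEL_DS);
      	}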
      
      This was exposed by a recent vDSO cleanup.
      
      Fixes: e7d6eefa ("x86/vdso32/syscall.S: Do not load __USER32_DS to %ss")
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Peter Anvin <hpa@zytor.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. 24 April 2015 (2 commits)
    • x86: fix special __probe_kernel_write() tail zeroing case · d869844b
      Authored by Linus Torvalds
      Commit cae2a173 ("x86: clean up/fix 'copy_in_user()' tail zeroing")
      fixed the failure case tail zeroing of one special case of the x86-64
      generic user-copy routine, namely when used for the user-to-user case
      ("copy_in_user()").
      
      But in the process it broke an even more unusual case: using the user
      copy routine for kernel-to-kernel copying.
      
      Now, normally kernel-kernel copies are obviously done using memcpy(),
      but we have a couple of special cases when we use the user-copy
      functions.  One is when we pass a kernel buffer to a regular user-buffer
      routine, using set_fs(KERNEL_DS).  That's a "normal" case, and continued
      to work fine, because it never takes any faults (with the possible
      exception of a silent and successful vmalloc fault).
      
      But Jan Beulich pointed out another, very unusual, special case: when we
      use the user-copy routines not because it's a path that expects a user
      pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
      copy, but do so using "unsafe" buffers, and use the user-copy routine to
      gracefully handle faults.  IOW, for probe_kernel_write().
      
      And that broke for the case of a faulting kernel destination, because we
      saw the kernel destination and wanted to try to clear the tail of the
      buffer.  Which doesn't work, since that's what faults.
      
      This only triggers for things like kgdb and ftrace users (e.g. trying
      to set a breakpoint on read-only memory), but it's definitely a bug.
      The fix is to not compare against the kernel address start (TASK_SIZE),
      but instead use the same limits "access_ok()" uses.
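
      A hedged sketch of what the tail-zeroing condition becomes; __addr_ok()
      is the obvious candidate for an access_ok()-style limit test, but treat
      it as an assumption:

      	/* sketch: clear the tail only when the destination lies above the
      	 * current addr_limit, i.e. a kernel buffer reached through a real
      	 * user copy; under set_fs(KERNEL_DS), as in probe_kernel_write(),
      	 * the faulting kernel destination is left untouched */
      	if (!__addr_ok(to))
      		memset(to, 0, len);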
      Reported-and-tested-by: Jan Beulich <jbeulich@suse.com>
      Cc: stable@vger.kernel.org # 4.0
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • crypto: x86/sha512_ssse3 - fixup for asm function prototype change · 00425bb1
      Authored by Ard Biesheuvel
      Patch e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512
      SSSE3 implementation to base layer") changed the prototypes of the
      core asm SHA-512 implementations so that they are compatible with
      the prototype used by the base layer.
      
      However, in one instance, the register that was used for passing the
      input buffer was reused as a scratch register later on in the code.
      Since the input buffer param changed places with the digest param
      (which needs to be written back before the function returns), this
      resulted in the scratch register being dereferenced in a memory write
      operation, causing a GPF.
      
      Fix this by changing the scratch register to use the same register as
      the input buffer param again.
      
      Fixes: e68410eb ("crypto: x86/sha512_ssse3 - move SHA-384/512 SSSE3 implementation to base layer")
      Reported-by: Bobby Powers <bobbypowers@gmail.com>
      Tested-by: Bobby Powers <bobbypowers@gmail.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
  11. 22 April 2015 (9 commits)
    • x86/asm/entry/32: Update -ENOSYS handling to match the 64-bit logic · 3f5159a9
      Authored by Denys Vlasenko
      Recently Andy changed the 64-bit syscall logic so that
      pt_regs->ax is initially set to -ENOSYS, and on syscall exit,
      it is updated with the actual return value. This simplified
      the logic there.
      
      This patch does the same for 32-bit syscall entry points.
      
      The check for %rax being too big is moved to be just before
      the call instruction which dispatches execution through the
      syscall table.
      
      There is no way to accidentally skip this check now by jumping
      to a label after it. This allows us to remove redundant checks
      after ptrace et al.
      
      If %rax is too big, we just skip over the (call, write %rax to
      pt_regs->ax) instruction pair. pt_regs->ax remains set to -ENOSYS,
      and it gets returned to userspace.
      
      Similar to 64-bit code, this eliminates the "ia32_badsys" code path.
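
      In C-like pseudocode, the resulting flow is roughly the following; the
      real code is assembly in the 32-bit entry paths and the table call
      signature here is illustrative:

      	regs->ax = -ENOSYS;			/* set at syscall entry */
      	if (nr < NR_syscalls)			/* checked right before dispatch */
      		regs->ax = ia32_sys_call_table[nr](/* args */);
      	/* if nr was too big, both the call and the store are skipped:
      	 * pt_regs->ax still holds -ENOSYS and that is what userspace sees */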
      
      Run-tested.
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429632194-13445-2-git-send-email-dvlasenk@redhat.com
      [ Changelog massage. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry/64: Merge 32-bit execve stubs with x32 ones, as they are identical · ac7f5dfb
      Authored by Denys Vlasenko
      Run-tested.
      Suggested-by: Brian Gerst <brgerst@gmail.com>
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429632194-13445-1-git-send-email-dvlasenk@redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm/entry/64: Implement better check for canonical addresses · 17be0aec
      Authored by Denys Vlasenko
      This change makes the check exact (no more false positives
      on "negative" addresses).
      
      Andy explains:
      
       "Canonical addresses either start with 17 zeros or 17 ones.
      
        In the old code, we checked that the top (64-47) = 17 bits were all
        zero.  We did this by shifting right by 47 bits and making sure that
        nothing was left.
      
        In the new code, we're shifting left by (64 - 48) = 16 bits and then
        signed shifting right by the same amount, thus propagating the 17th
        highest bit to all positions to its left.  If we get the same value we
        started with, then we're good to go."
      
      While it isn't really important to be fully correct here (almost
      all addresses we'll ever see will be userspace ones), the new check
      looks cheap enough: it uses two more ALU ops but preserves %rcx,
      allowing us not to reload it from pt_regs->cx again (a C rendering
      of the check follows the disassembly below).
      
      On disassembly level, the changes are:
      
        cmp %rcx,0x80(%rsp) -> mov 0x80(%rsp),%r11; cmp %rcx,%r11
        shr $0x2f,%rcx      -> shl $0x10,%rcx; sar $0x10,%rcx; cmp %rcx,%r11
        mov 0x58(%rsp),%rcx -> (eliminated)
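
      A C rendering of the new test, for illustration only; the real code is
      the shl/sar instruction pair shown above:

      	/* canonical iff sign-extending from bit 47 reproduces the value */
      	static inline bool addr_is_canonical(unsigned long addr)
      	{
      		return ((long)(addr << 16) >> 16) == (long)addr;
      	}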
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      Acked-by: Andy Lutomirski <luto@kernel.org>
      Cc: Alexei Starovoitov <ast@plumgrid.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1429633649-20169-1-git-send-email-dvlasenk@redhat.com
      [ Changelog massage. ]
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver · 0140e614
      Authored by Sonny Rao
      This keeps all the related PCI IDs together in the driver where
      they are used.
      Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
      Acked-by: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs · 80bcffb3
      Authored by Sonny Rao
      
      This uncore is the same as the Haswell desktop part but uses a
      different PCI ID.
      Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu · 3b6e0421
      Authored by Jiri Olsa
      The core_pmu does not define the cpu_* callbacks, which handle
      allocation of the 'struct cpu_hw_events::shared_regs' data,
      initialization of the debug store, and the PMU_FL_EXCL_CNTRS counters.

      While this probably won't happen on bare metal, a virtual CPU can
      define x86_pmu.extra_regs together with PMU version 1 and thus
      be using core_pmu -> using shared_regs data without it being
      allocated. That could lead to the following panic:
      
      	BUG: unable to handle kernel NULL pointer dereference at (null)
      	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40
      
      	SNIP
      
      	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
      	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
      	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
      	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
      	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
      	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
      	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
      	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
      	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
      	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
      	 [<ffffffff811a993a>] ? dput+0x9a/0x150
      	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
      	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
      	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
      	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
      	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
      	 [<ffffffff8119c731>] ? path_put+0x31/0x40
      	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
      	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
      	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
      	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
      	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
      	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
      	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
      	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
      	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
      	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
      	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0
      
      Add cpu_(prepare|starting|dying) callbacks for core_pmu so that
      its shared_regs data gets allocated. AFAICS there's no harm in
      initializing the debug store and PMU_FL_EXCL_CNTRS for core_pmu
      either.
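
      A hedged sketch of the wiring, assuming the existing intel_pmu_cpu_*
      helpers are simply reused in the core_pmu initializer:

      	/* sketch: give core_pmu the same CPU hotplug callbacks as intel_pmu
      	 * so that shared_regs (and the DS area) get set up per CPU */
      	.cpu_prepare	= intel_pmu_cpu_prepare,
      	.cpu_starting	= intel_pmu_cpu_starting,
      	.cpu_dying	= intel_pmu_cpu_dying,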
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/asm: Always inline atomics · 3462bd2a
      Authored by Hagen Paul Pfeifer
      During some code analysis I realized that atomic_add(), atomic_sub()
      and friends are not necessarily inlined AND that each function
      is defined multiple times:
      
      	atomic_inc:          544 duplicates
      	atomic_dec:          215 duplicates
      	atomic_dec_and_test: 107 duplicates
      	atomic64_inc:         38 duplicates
      	[...]
      
      Each definition is exactly the same, e.g.:
      
      	ffffffff813171b8 <atomic_add>:
      	55         push   %rbp
      	48 89 e5   mov    %rsp,%rbp
      	f0 01 3e   lock add %edi,(%rsi)
      	5d         pop    %rbp
      	c3         retq
      
      In turn, each definition has one or more callsites:
      
      	ffffffff81317c78: e8 3b f5 ff ff  callq  ffffffff813171b8 <atomic_add> [...]
      	ffffffff8131a062: e8 51 d1 ff ff  callq  ffffffff813171b8 <atomic_add> [...]
      	ffffffff8131a190: e8 23 d0 ff ff  callq  ffffffff813171b8 <atomic_add> [...]
      
      The alternative would be to remove the static linkage, but I
      prefer enforced inlining here.
      
      	Before:
      	  text     data	  bss      dec       hex     filename
      	  81467393 19874720 20168704 121510817 73e1ba1 vmlinux.orig
      
      	After:
      	  text     data     bss      dec       hex     filename
      	  81461323 19874720 20168704 121504747 73e03eb vmlinux.inlined
      
      Yes, the inlining here makes the kernel even smaller! ;)
      
      Linus further observed:
      
      	"I have this memory of having seen that before - the size
      	 heuristics for gcc getting confused by inlining.
      	 [...]
      
      	 It might be a good idea to mark things that are basically just
      	 wrappers around a single (or a couple of) asm instruction to be
      	 always_inline."
      Signed-off-by: Hagen Paul Pfeifer <hagen@jauu.net>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1429565231-4609-1-git-send-email-hagen@jauu.net
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86, paravirt, xen: Remove the 64-bit ->irq_enable_sysexit() pvop · aac82d31
      Authored by Andy Lutomirski
      We don't use irq_enable_sysexit on 64-bit kernels any more.
      Remove all the paravirt and Xen machinery to support it on
      64-bit kernels.
      Tested-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Denys Vlasenko <vda.linux@googlemail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/8a03355698fe5b94194e9e7360f19f91c1b2cf1f.1428100853.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • KVM: VMX: Preserve host CR4.MCE value while in guest mode. · 085e68ee
      Authored by Ben Serebrin
      The host's decision to enable machine check exceptions should remain
      in force during non-root mode.  KVM was writing 0 to cr4 on VCPU reset
      and passed a slightly-modified 0 to the vmcs.guest_cr4 value.
      
      Tested: Built.
      On an earlier version, tested by injecting a machine check
      while a guest is spinning.
      
      Before the change, if guest CR4.MCE==0, then the machine check is
      escalated to Catastrophic Error (CATERR) and the machine dies.
      If guest CR4.MCE==1, then the machine check causes VMEXIT and is
      handled normally by host Linux. After the change, injecting a machine
      check causes normal Linux machine check handling.
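
      A hedged sketch of the idea in vmx_set_cr4(); the exact helpers are
      assumptions:

      	/* sketch: carry the host's CR4.MCE setting into the CR4 value the
      	 * guest runs with, regardless of what the guest asked for */
      	unsigned long hw_cr4 = (cr4_read_shadow() & X86_CR4_MCE) |
      			       (cr4 & ~X86_CR4_MCE);
      	vmcs_writel(GUEST_CR4, hw_cr4);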
      Signed-off-by: Ben Serebrin <serebrin@google.com>
      Reviewed-by: Venkatesh Srinivas <venkateshs@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  12. 20 April 2015 (1 commit)
  13. 19 April 2015 (1 commit)
  14. 18 April 2015 (2 commits)