1. 03 1月, 2015 1 次提交
    • A
      x86, entry: Switch stacks on a paranoid entry from userspace · 48e08d0f
      Andy Lutomirski 提交于
      This causes all non-NMI, non-double-fault kernel entries from
      userspace to run on the normal kernel stack.  Double-fault is
      exempt to minimize confusion if we double-fault directly from
      userspace due to a bad kernel stack.
      
      This is, suprisingly, simpler and shorter than the current code.  It
      removes the IMO rather frightening paranoid_userspace path, and it
      make sync_regs much simpler.
      
      There is no risk of stack overflow due to this change -- the kernel
      stack that we switch to is empty.
      
      This will also enable us to create non-atomic sections within
      machine checks from userspace, which will simplify memory failure
      handling.  It will also allow the upcoming fsgsbase code to be
      simplified, because it doesn't need to worry about usergs when
      scheduling in paranoid_exit, as that code no longer exists.
      
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Tony Luck <tony.luck@intel.com>
      Acked-by: NBorislav Petkov <bp@alien8.de>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      48e08d0f
  2. 18 12月, 2014 1 次提交
  3. 16 12月, 2014 26 次提交
  4. 14 12月, 2014 3 次提交
    • A
      x86/tls: Disallow unusual TLS segments · 0e58af4e
      Andy Lutomirski 提交于
      Users have no business installing custom code segments into the
      GDT, and segments that are not present but are otherwise valid
      are a historical source of interesting attacks.
      
      For completeness, block attempts to set the L bit.  (Prior to
      this patch, the L bit would have been silently dropped.)
      
      This is an ABI break.  I've checked glibc, musl, and Wine, and
      none of them look like they'll have any trouble.
      
      Note to stable maintainers: this is a hardening patch that fixes
      no known bugs.  Given the possibility of ABI issues, this
      probably shouldn't be backported quickly.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org # optional
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      0e58af4e
    • A
      x86/tls: Validate TLS entries to protect espfix · 41bdc785
      Andy Lutomirski 提交于
      Installing a 16-bit RW data segment into the GDT defeats espfix.
      AFAICT this will not affect glibc, Wine, or dosemu at all.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NH. Peter Anvin <hpa@zytor.com>
      Cc: stable@vger.kernel.org
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: security@kernel.org <security@kernel.org>
      Cc: Willy Tarreau <w@1wt.eu>
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      41bdc785
    • D
      x86: hook up execveat system call · 27d6ec7a
      David Drysdale 提交于
      Hook up x86-64, i386 and x32 ABIs.
      Signed-off-by: NDavid Drysdale <drysdale@google.com>
      Cc: Meredydd Luff <meredydd@senatehouse.org>
      Cc: Shuah Khan <shuah.kh@samsung.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Rich Felker <dalias@aerifal.cx>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27d6ec7a
  5. 13 12月, 2014 1 次提交
  6. 12 12月, 2014 1 次提交
  7. 11 12月, 2014 4 次提交
    • B
      x86/asm: Guard against building the 32/64-bit versions of the asm-offsets*.c file directly · 5de2b61a
      Borislav Petkov 提交于
      Sometimes it is helpful to build a kernel compilation unit
      directly, i.e.:
      
        make .../<filename>.i
      
      in order to look at compiler output.
      
      Since asm-offsets_{32,64}.c are included by asm-offsets.c and
      building them directly doesn't evaluate the macros used (thus
      making the preprocessor output not very useful), error out when
      an attempt is made to build them. Issue a hint for the user to
      build asm-offsets.c instead.
      Suggested-by: NMichael Matz <matz@suse.de>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/1418139917-12722-1-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      5de2b61a
    • A
      x86_64, switch_to(): Load TLS descriptors before switching DS and ES · f647d7c1
      Andy Lutomirski 提交于
      Otherwise, if buggy user code points DS or ES into the TLS
      array, they would be corrupted after a context switch.
      
      This also significantly improves the comments and documents some
      gotchas in the code.
      
      Before this patch, the both tests below failed.  With this
      patch, the es test passes, although the gsbase test still fails.
      
       ----- begin es test -----
      
      /*
       * Copyright (c) 2014 Andy Lutomirski
       * GPL v2
       */
      
      static unsigned short GDT3(int idx)
      {
      	return (idx << 3) | 3;
      }
      
      static int create_tls(int idx, unsigned int base)
      {
      	struct user_desc desc = {
      		.entry_number    = idx,
      		.base_addr       = base,
      		.limit           = 0xfffff,
      		.seg_32bit       = 1,
      		.contents        = 0, /* Data, grow-up */
      		.read_exec_only  = 0,
      		.limit_in_pages  = 1,
      		.seg_not_present = 0,
      		.useable         = 0,
      	};
      
      	if (syscall(SYS_set_thread_area, &desc) != 0)
      		err(1, "set_thread_area");
      
      	return desc.entry_number;
      }
      
      int main()
      {
      	int idx = create_tls(-1, 0);
      	printf("Allocated GDT index %d\n", idx);
      
      	unsigned short orig_es;
      	asm volatile ("mov %%es,%0" : "=rm" (orig_es));
      
      	int errors = 0;
      	int total = 1000;
      	for (int i = 0; i < total; i++) {
      		asm volatile ("mov %0,%%es" : : "rm" (GDT3(idx)));
      		usleep(100);
      
      		unsigned short es;
      		asm volatile ("mov %%es,%0" : "=rm" (es));
      		asm volatile ("mov %0,%%es" : : "rm" (orig_es));
      		if (es != GDT3(idx)) {
      			if (errors == 0)
      				printf("[FAIL]\tES changed from 0x%hx to 0x%hx\n",
      				       GDT3(idx), es);
      			errors++;
      		}
      	}
      
      	if (errors) {
      		printf("[FAIL]\tES was corrupted %d/%d times\n", errors, total);
      		return 1;
      	} else {
      		printf("[OK]\tES was preserved\n");
      		return 0;
      	}
      }
      
       ----- end es test -----
      
       ----- begin gsbase test -----
      
      /*
       * gsbase.c, a gsbase test
       * Copyright (c) 2014 Andy Lutomirski
       * GPL v2
       */
      
      static unsigned char *testptr, *testptr2;
      
      static unsigned char read_gs_testvals(void)
      {
      	unsigned char ret;
      	asm volatile ("movb %%gs:%1, %0" : "=r" (ret) : "m" (*testptr));
      	return ret;
      }
      
      int main()
      {
      	int errors = 0;
      
      	testptr = mmap((void *)0x200000000UL, 1, PROT_READ | PROT_WRITE,
      		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
      	if (testptr == MAP_FAILED)
      		err(1, "mmap");
      
      	testptr2 = mmap((void *)0x300000000UL, 1, PROT_READ | PROT_WRITE,
      		       MAP_PRIVATE | MAP_FIXED | MAP_ANONYMOUS, -1, 0);
      	if (testptr2 == MAP_FAILED)
      		err(1, "mmap");
      
      	*testptr = 0;
      	*testptr2 = 1;
      
      	if (syscall(SYS_arch_prctl, ARCH_SET_GS,
      		    (unsigned long)testptr2 - (unsigned long)testptr) != 0)
      		err(1, "ARCH_SET_GS");
      
      	usleep(100);
      
      	if (read_gs_testvals() == 1) {
      		printf("[OK]\tARCH_SET_GS worked\n");
      	} else {
      		printf("[FAIL]\tARCH_SET_GS failed\n");
      		errors++;
      	}
      
      	asm volatile ("mov %0,%%gs" : : "r" (0));
      
      	if (read_gs_testvals() == 0) {
      		printf("[OK]\tWriting 0 to gs worked\n");
      	} else {
      		printf("[FAIL]\tWriting 0 to gs failed\n");
      		errors++;
      	}
      
      	usleep(100);
      
      	if (read_gs_testvals() == 0) {
      		printf("[OK]\tgsbase is still zero\n");
      	} else {
      		printf("[FAIL]\tgsbase was corrupted\n");
      		errors++;
      	}
      
      	return errors == 0 ? 0 : 1;
      }
      
       ----- end gsbase test -----
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Cc: <stable@vger.kernel.org>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Link: http://lkml.kernel.org/r/509d27c9fec78217691c3dad91cec87e1006b34a.1418075657.git.luto@amacapital.netSigned-off-by: NIngo Molnar <mingo@kernel.org>
      f647d7c1
    • X
      x86/mm: Use min() instead of min_t() in the e820 printout code · 29258cf4
      Xishi Qiu 提交于
      The type of "MAX_DMA_PFN" and "xXx_pfn" are both unsigned long
      now, so use min() instead of min_t().
      Suggested-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NXishi Qiu <qiuxishi@huawei.com>
      Cc: Linux MM <linux-mm@kvack.org>
      Cc: <dave@sr71.net>
      Cc: Rik van Riel <riel@redhat.com>
      Link: http://lkml.kernel.org/r/5487AB3F.7050807@huawei.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      29258cf4
    • J
      perf/x86/intel/uncore: Make sure only uncore events are collected · af91568e
      Jiri Olsa 提交于
      The uncore_collect_events functions assumes that event group
      might contain only uncore events which is wrong, because it
      might contain any type of events.
      
      This bug leads to uncore framework touching 'not' uncore events,
      which could end up all sorts of bugs.
      
      One was triggered by Vince's perf fuzzer, when the uncore code
      touched breakpoint event private event space as if it was uncore
      event and caused BUG:
      
         BUG: unable to handle kernel paging request at ffffffff82822068
         IP: [<ffffffff81020338>] uncore_assign_events+0x188/0x250
         ...
      
      The code in uncore_assign_events() function was looking for
      event->hw.idx data while the event was initialized as a
      breakpoint with different members in event->hw union.
      
      This patch forces uncore_collect_events() to collect only uncore
      events.
      Reported-by: NVince Weaver <vince@deater.net>
      Signed-off-by: NJiri Olsa <jolsa@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Yan, Zheng <zheng.z.yan@intel.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/1418243031-20367-2-git-send-email-jolsa@kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      af91568e
  8. 10 12月, 2014 2 次提交
  9. 08 12月, 2014 1 次提交