1. 01 11月, 2014 1 次提交
    • S
      ftrace/x86: Add dynamic allocated trampoline for ftrace_ops · f3bea491
      Steven Rostedt (Red Hat) 提交于
      The current method of handling multiple function callbacks is to register
      a list function callback that calls all the other callbacks based on
      their hash tables and compare it to the function that the callback was
      called on. But this is very inefficient.
      
      For example, if you are tracing all functions in the kernel and then
      add a kprobe to a function such that the kprobe uses ftrace, the
      mcount trampoline will switch from calling the function trace callback
      to calling the list callback that will iterate over all registered
      ftrace_ops (in this case, the function tracer and the kprobes callback).
      That means for every function being traced it checks the hash of the
      ftrace_ops for function tracing and kprobes, even though the kprobes
      is only set at a single function. The kprobes ftrace_ops is checked
      for every function being traced!
      
      Instead of calling the list function for functions that are only being
      traced by a single callback, we can call a dynamically allocated
      trampoline that calls the callback directly. The function graph tracer
      already uses a direct call trampoline when it is being traced by itself
      but it is not dynamically allocated. It's trampoline is static in the
      kernel core. The infrastructure that called the function graph trampoline
      can also be used to call a dynamically allocated one.
      
      For now, only ftrace_ops that are not dynamically allocated can have
      a trampoline. That is, users such as function tracer or stack tracer.
      kprobes and perf allocate their ftrace_ops, and until there's a safe
      way to free the trampoline, it can not be used. The dynamically allocated
      ftrace_ops may, although, use the trampoline if the kernel is not
      compiled with CONFIG_PREEMPT. But that will come later.
      Tested-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Tested-by: NJiri Kosina <jkosina@suse.cz>
      Signed-off-by: NSteven Rostedt <rostedt@goodmis.org>
      f3bea491
  2. 19 10月, 2014 1 次提交
    • A
      x86,kvm,vmx: Preserve CR4 across VM entry · d974baa3
      Andy Lutomirski 提交于
      CR4 isn't constant; at least the TSD and PCE bits can vary.
      
      TBH, treating CR0 and CR3 as constant scares me a bit, too, but it looks
      like it's correct.
      
      This adds a branch and a read from cr4 to each vm entry.  Because it is
      extremely likely that consecutive entries into the same vcpu will have
      the same host cr4 value, this fixes up the vmcs instead of restoring cr4
      after the fact.  A subsequent patch will add a kernel-wide cr4 shadow,
      reducing the overhead in the common case to just two memory reads and a
      branch.
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NPaolo Bonzini <pbonzini@redhat.com>
      Cc: stable@vger.kernel.org
      Cc: Petr Matousek <pmatouse@redhat.com>
      Cc: Gleb Natapov <gleb@kernel.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d974baa3
  3. 17 10月, 2014 1 次提交
  4. 15 10月, 2014 1 次提交
    • A
      x86: bpf_jit: fix two bugs in eBPF JIT compiler · e0ee9c12
      Alexei Starovoitov 提交于
      1.
      JIT compiler using multi-pass approach to converge to final image size,
      since x86 instructions are variable length. It starts with large
      gaps between instructions (so some jumps may use imm32 instead of imm8)
      and iterates until total program size is the same as in previous pass.
      This algorithm works only if program size is strictly decreasing.
      Programs that use LD_ABS insn need additional code in prologue, but it
      was not emitted during 1st pass, so there was a chance that 2nd pass would
      adjust imm32->imm8 jump offsets to the same number of bytes as increase in
      prologue, which may cause algorithm to erroneously decide that size converged.
      Fix it by always emitting largest prologue in the first pass which
      is detected by oldproglen==0 check.
      Also change error check condition 'proglen != oldproglen' to fail gracefully.
      
      2.
      while staring at the code realized that 64-byte buffer may not be enough
      when 1st insn is large, so increase it to 128 to avoid buffer overflow
      (theoretical maximum size of prologue+div is 109) and add runtime check.
      
      Fixes: 62258278 ("net: filter: x86: internal BPF JIT")
      Reported-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
      Tested-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e0ee9c12
  5. 14 10月, 2014 7 次提交
  6. 13 10月, 2014 1 次提交
  7. 10 10月, 2014 2 次提交
    • G
      nosave: consolidate __nosave_{begin,end} in <asm/sections.h> · 7f8998c7
      Geert Uytterhoeven 提交于
      The different architectures used their own (and different) declarations:
      
          extern __visible const void __nosave_begin, __nosave_end;
          extern const void __nosave_begin, __nosave_end;
          extern long __nosave_begin, __nosave_end;
      
      Consolidate them using the first variant in <asm/sections.h>.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7f8998c7
    • M
      mm: remove misleading ARCH_USES_NUMA_PROT_NONE · 6a33979d
      Mel Gorman 提交于
      ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented
      _PAGE_NUMA using _PROT_NONE.  This saved using an additional PTE bit and
      relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting
      fault scanner.  This was found to be conceptually confusing with a lot of
      implicit assumptions and it was asked that an alternative be found.
      
      Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the
      PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE
      bits and shrunk the maximum possible swap size but it did not go far
      enough.  There are no architectures that reuse _PROT_NONE as _PROT_NUMA
      but the relics still exist.
      
      This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary
      duplication in powerpc vs the generic implementation by defining the types
      the core NUMA helpers expected to exist from x86 with their ppc64
      equivalent.  This necessitated that a PTE bit mask be created that
      identified the bits that distinguish present from NUMA pte entries but it
      is expected this will only differ between arches based on _PAGE_PROTNONE.
      The naming for the generic helpers was taken from x86 originally but ppc64
      has types that are equivalent for the purposes of the helper so they are
      mapped instead of duplicating code.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Cyrill Gorcunov <gorcunov@gmail.com>
      Reviewed-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6a33979d
  8. 09 10月, 2014 2 次提交
  9. 08 10月, 2014 7 次提交
  10. 07 10月, 2014 1 次提交
    • A
      x86_64, entry: Filter RFLAGS.NT on entry from userspace · 8c7aa698
      Andy Lutomirski 提交于
      The NT flag doesn't do anything in long mode other than causing IRET
      to #GP.  Oddly, CPL3 code can still set NT using popf.
      
      Entry via hardware or software interrupt clears NT automatically, so
      the only relevant entries are fast syscalls.
      
      If user code causes kernel code to run with NT set, then there's at
      least some (small) chance that it could cause trouble.  For example,
      user code could cause a call to EFI code with NT set, and who knows
      what would happen?  Apparently some games on Wine sometimes do
      this (!), and, if an IRET return happens, they will segfault.  That
      segfault cannot be handled, because signal delivery fails, too.
      
      This patch programs the CPU to clear NT on entry via SYSCALL (both
      32-bit and 64-bit, by my reading of the AMD APM), and it clears NT
      in software on entry via SYSENTER.
      
      To save a few cycles, this borrows a trick from Jan Beulich in Xen:
      it checks whether NT is set before trying to clear it.  As a result,
      it seems to have very little effect on SYSENTER performance on my
      machine.
      
      There's another minor bug fix in here: it looks like the CFI
      annotations were wrong if CONFIG_AUDITSYSCALL=n.
      
      Testers beware: on Xen, SYSENTER with NT set turns into a GPF.
      
      I haven't touched anything on 32-bit kernels.
      
      The syscall mask change comes from a variant of this patch by Anish
      Bhatt.
      
      Note to stable maintainers: there is no known security issue here.
      A misguided program can set NT and cause the kernel to try and fail
      to deliver SIGSEGV, crashing the program.  This patch fixes Far Cry
      on Wine: https://bugs.winehq.org/show_bug.cgi?id=33275
      
      Cc: <stable@vger.kernel.org>
      Reported-by: NAnish Bhatt <anish@chelsio.com>
      Signed-off-by: NAndy Lutomirski <luto@amacapital.net>
      Link: http://lkml.kernel.org/r/395749a5d39a29bd3e4b35899cf3a3c1340e5595.1412189265.git.luto@amacapital.netSigned-off-by: NH. Peter Anvin <hpa@zytor.com>
      8c7aa698
  11. 06 10月, 2014 1 次提交
  12. 03 10月, 2014 6 次提交
  13. 02 10月, 2014 4 次提交
  14. 27 9月, 2014 1 次提交
  15. 25 9月, 2014 1 次提交
    • M
      x86/efi: Truncate 64-bit values when calling 32-bit OutputString() · 115c6628
      Matt Fleming 提交于
      If we're executing the 32-bit efi_char16_printk() code path (i.e.
      running on top of 32-bit firmware) we know that efi_early->text_output
      will be a 32-bit value, even though ->text_output has type u64.
      
      Unfortunately, we currently pass ->text_output directly to
      efi_early->call() so for CONFIG_X86_32 the compiler will push a 64-bit
      value onto the stack, causing the other parameters to be misaligned.
      
      The way we handle this in the rest of the EFI boot stub is to pass
      pointers as arguments to efi_early->call(), which automatically do the
      right thing (pointers are 32-bit on CONFIG_X86_32, and we simply ignore
      the upper 32-bits of the argument register if running in 64-bit mode
      with 32-bit firmware).
      
      This fixes a corruption bug when printing strings from the 32-bit EFI
      boot stub.
      
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=84241Signed-off-by: NMatt Fleming <matt.fleming@intel.com>
      115c6628
  16. 24 9月, 2014 3 次提交