1. 23 3月, 2016 1 次提交
    • D
      kernel: add kcov code coverage · 5c9a8750
      Dmitry Vyukov 提交于
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is function of syscall inputs.
      To achieve this goal it does not collect coverage in soft/hard
      interrupts and instrumentation of some inherently non-deterministic or
      non-interesting parts of kernel is disbled (e.g.  scheduler, locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on kernel side.  The complimentary
      compiler support was added in gcc revision 231296.
      
      We've used this support to build syzkaller system call fuzzer, which has
      found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over wire.
      
      Why not gcov.  Typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen of basic blocks (e.g.  an invalid
      input).  In such context gcov becomes prohibitively expensive as
      reset/collect coverage steps depend on total number of basic
      blocks/edges in program (in case of kernel it is about 2M).  Cost of
      kcov depends only on number of executed basic blocks/edges.  On top of
      that, kernel requires per-thread coverage because there are always
      background threads and unrelated processes that also produce coverage.
      With inlined gcov instrumentation per-thread coverage is not possible.
      
      kcov exposes kernel PCs and control flow to user-space which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: NDmitry Vyukov <dvyukov@google.com>
      Reviewed-by: NKees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5c9a8750
  2. 14 3月, 2016 1 次提交
  3. 10 3月, 2016 1 次提交
    • B
      x86/delay: Avoid preemptible context checks in delay_mwaitx() · 84477336
      Borislav Petkov 提交于
      We do use this_cpu_ptr(&cpu_tss) as a cacheline-aligned, seldomly
      accessed per-cpu var as the MONITORX target in delay_mwaitx(). However,
      when called in preemptible context, this_cpu_ptr -> smp_processor_id() ->
      debug_smp_processor_id() fires:
      
        BUG: using smp_processor_id() in preemptible [00000000] code: udevd/312
        caller is delay_mwaitx+0x40/0xa0
      
      But we don't care about that check - we only need cpu_tss as a MONITORX
      target and it doesn't really matter which CPU's var we're touching as
      we're going idle anyway. Fix that.
      Suggested-by: NAndy Lutomirski <luto@kernel.org>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Huang Rui <ray.huang@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: spg_linux_kernel@amd.com
      Link: http://lkml.kernel.org/r/20160309205622.GG6564@pd.tnicSigned-off-by: NIngo Molnar <mingo@kernel.org>
      84477336
  4. 09 3月, 2016 1 次提交
    • T
      x86/mm, x86/mce: Add memcpy_mcsafe() · 92b0729c
      Tony Luck 提交于
      Make use of the EXTABLE_FAULT exception table entries to write
      a kernel copy routine that doesn't crash the system if it
      encounters a machine check. Prime use case for this is to copy
      from large arrays of non-volatile memory used as storage.
      
      We have to use an unrolled copy loop for now because current
      hardware implementations treat a machine check in "rep mov"
      as fatal. When that is fixed we can simplify.
      
      Return type is a "bool". True means that we copied OK, false means
      that it didn't.
      Signed-off-by: NTony Luck <tony.luck@intel.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@gmail.com>
      Link: http://lkml.kernel.org/r/a44e1055efc2d2a9473307b22c91caa437aa3f8b.1456439214.git.tony.luck@intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      92b0729c
  5. 03 3月, 2016 1 次提交
    • J
      x86/asm/decoder: Use explicitly signed chars · 19072f23
      Josh Poimboeuf 提交于
      When running objtool on a ppc64le host to analyze x86 binaries, it
      reports a lot of false warnings like:
      
        ipc/compat_mq.o: warning: objtool: compat_SyS_mq_open()+0x91: can't find jump dest instruction at .text+0x3a5
      
      The warnings are caused by the x86 instruction decoder setting the wrong
      value for the jump instruction's immediate field because it assumes that
      "char == signed char", which isn't true for all architectures.  When
      converting char to int, gcc sign-extends on x86 but doesn't sign-extend
      on ppc64le.
      
      According to the gcc man page, that's a feature, not a bug:
      
        > Each kind of machine has a default for what "char" should be.  It is
        > either like "unsigned char" by default or like "signed char" by
        > default.
        >
        > Ideally, a portable program should always use "signed char" or
        > "unsigned char" when it depends on the signedness of an object.
      
      Conform to the "standards" by changing the "char" casts to "signed
      char".  This results in no actual changes to the object code on x86.
      
      Note: the x86 decoder now lives in three different locations in the
      kernel tree, which are all kept in sync via makefile checks and
      warnings: in-kernel, perf, and objtool.  This fixes all three locations.
      Eventually we should probably try to at least converge the two separate
      "tools" locations into a single shared location.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/9dd4161719b20e6def9564646d68bfbe498c549f.1456962210.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      19072f23
  6. 24 2月, 2016 1 次提交
    • J
      x86/asm: Create stack frames in rwsem functions · 3387a535
      Josh Poimboeuf 提交于
      rwsem.S has several callable non-leaf functions which don't honor
      CONFIG_FRAME_POINTER, which can result in bad stack traces.
      
      Create stack frames for them when CONFIG_FRAME_POINTER is enabled.
      Signed-off-by: NJosh Poimboeuf <jpoimboe@redhat.com>
      Reviewed-by: NBorislav Petkov <bp@suse.de>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Bernd Petrovitsch <bernd@petrovitsch.priv.at>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Chris J Arges <chris.j.arges@canonical.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Marek <mmarek@suse.cz>
      Cc: Namhyung Kim <namhyung@gmail.com>
      Cc: Pedro Alves <palves@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: live-patching@vger.kernel.org
      Link: http://lkml.kernel.org/r/ad0932bbead975b15f9578e4f2cf2ee5961eb840.1453405861.git.jpoimboe@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3387a535
  7. 17 2月, 2016 2 次提交
    • T
      x86/uaccess/64: Handle the caching of 4-byte nocache copies properly in __copy_user_nocache() · a82eee74
      Toshi Kani 提交于
      Data corruption issues were observed in tests which initiated
      a system crash/reset while accessing BTT devices.  This problem
      is reproducible.
      
      The BTT driver calls pmem_rw_bytes() to update data in pmem
      devices.  This interface calls __copy_user_nocache(), which
      uses non-temporal stores so that the stores to pmem are
      persistent.
      
      __copy_user_nocache() uses non-temporal stores when a request
      size is 8 bytes or larger (and is aligned by 8 bytes).  The
      BTT driver updates the BTT map table, which entry size is
      4 bytes.  Therefore, updates to the map table entries remain
      cached, and are not written to pmem after a crash.
      
      Change __copy_user_nocache() to use non-temporal store when
      a request size is 4 bytes.  The change extends the current
      byte-copy path for a less-than-8-bytes request, and does not
      add any overhead to the regular path.
      Reported-and-tested-by: NMicah Parrish <micah.parrish@hpe.com>
      Reported-and-tested-by: NBrian Boylston <brian.boylston@hpe.com>
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vishal Verma <vishal.l.verma@intel.com>
      Cc: linux-nvdimm@lists.01.org
      Link: http://lkml.kernel.org/r/1455225857-12039-3-git-send-email-toshi.kani@hpe.com
      [ Small readability edits. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      a82eee74
    • T
      x86/uaccess/64: Make the __copy_user_nocache() assembly code more readable · ee9737c9
      Toshi Kani 提交于
      Add comments to __copy_user_nocache() to clarify its procedures
      and alignment requirements.
      
      Also change numeric branch target labels to named local labels.
      
      No code changed:
      
       arch/x86/lib/copy_user_64.o:
      
          text    data     bss     dec     hex filename
          1239       0       0    1239     4d7 copy_user_64.o.before
          1239       0       0    1239     4d7 copy_user_64.o.after
      
       md5:
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.before.asm
          58bed94c2db98c1ca9a2d46d0680aaae  copy_user_64.o.after.asm
      Signed-off-by: NToshi Kani <toshi.kani@hpe.com>
      Cc: <stable@vger.kernel.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Luis R. Rodriguez <mcgrof@suse.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: brian.boylston@hpe.com
      Cc: dan.j.williams@intel.com
      Cc: linux-nvdimm@lists.01.org
      Cc: micah.parrish@hpe.com
      Cc: ross.zwisler@linux.intel.com
      Cc: vishal.l.verma@intel.com
      Link: http://lkml.kernel.org/r/1455225857-12039-2-git-send-email-toshi.kani@hpe.com
      [ Small readability edits and added object file comparison. ]
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      ee9737c9
  8. 03 2月, 2016 4 次提交
    • D
      x86/boot: Pass in size to early cmdline parsing · 8c051775
      Dave Hansen 提交于
      We will use this in a few patches to implement tests for early parsing.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      [ Aligned args properly. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: fenghua.yu@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20151222225243.5CC47EB6@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      8c051775
    • D
      x86/boot: Simplify early command line parsing · 4de07ea4
      Dave Hansen 提交于
      __cmdline_find_option_bool() tries to account for both NULL-terminated
      and non-NULL-terminated strings. It keeps 'pos' to look for the end of
      the buffer and also looks for '!c' in a bunch of places to look for NULL
      termination.
      
      But, it also calls strlen(). You can't call strlen on a
      non-NULL-terminated string.
      
      If !strlen(cmdline), then cmdline[0]=='\0'. In that case, we will go in
      to the while() loop, set c='\0', hit st_wordstart, notice !c, and will
      immediately return 0.
      
      So, remove the strlen().  It is unnecessary and unsafe.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: fenghua.yu@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20151222225241.15365E43@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      4de07ea4
    • D
      x86/boot: Fix early command-line parsing when partial word matches · abcdc1c6
      Dave Hansen 提交于
      cmdline_find_option_bool() keeps track of position in two strings:
      
       1. the command-line
       2. the option we are searchign for in the command-line
      
      We plow through each character in the command-line one at a time, always
      moving forward. We move forward in the option ('opptr') when we match
      characters in 'cmdline'. We reset the 'opptr' only when we go in to the
      'st_wordstart' state.
      
      But, if we fail to match an option because we see a space
      (state=st_wordcmp, *opptr='\0',c=' '), we set state='st_wordskip' and
      'break', moving to the next character. But, that move to the next
      character is the one *after* the ' '. This means that we will miss a
      'st_wordstart' state.
      
      For instance, if we have
      
        cmdline = "foo fool";
      
      and are searching for "fool", we have:
      
      	  "fool"
        opptr = ----^
      
                 "foo fool"
         c = --------^
      
      We see that 'l' != ' ', set state=st_wordskip, break, and then move 'c', so:
      
                "foo fool"
        c = ---------^
      
      and are still in state=st_wordskip. We will stay in wordskip until we
      have skipped "fool", thus missing the option we were looking for. This
      *only* happens when you have a partially- matching word followed by a
      matching one.
      
      To fix this, we always fall *into* the 'st_wordskip' state when we set
      it.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: fenghua.yu@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20151222225239.8E1DCA58@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      abcdc1c6
    • D
      x86/boot: Fix early command-line parsing when matching at end · 02afeaae
      Dave Hansen 提交于
      The x86 early command line parsing in cmdline_find_option_bool() is
      buggy. If it matches a specified 'option' all the way to the end of the
      command-line, it will consider it a match.
      
      For instance,
      
        cmdline = "foo";
        cmdline_find_option_bool(cmdline, "fool");
      
      will return 1. This is particularly annoying since we have actual FPU
      options like "noxsave" and "noxsaves" So, command-line "foo bar noxsave"
      will match *BOTH* a "noxsave" and "noxsaves". (This turns out not to be
      an actual problem because "noxsave" implies "noxsaves", but it's still
      confusing.)
      
      To fix this, we simplify the code and stop tracking 'len'. 'len'
      was trying to indicate either the NULL terminator *OR* the end of a
      non-NULL-terminated command line at 'COMMAND_LINE_SIZE'. But, each of the
      three states is *already* checking 'cmdline' for a NULL terminator.
      
      We _only_ need to check if we have overrun 'COMMAND_LINE_SIZE', and that
      we can do without keeping 'len' around.
      
      Also add some commends to clarify what is going on.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: fenghua.yu@intel.com
      Cc: yu-cheng.yu@intel.com
      Link: http://lkml.kernel.org/r/20151222225238.9AEB560C@viggo.jf.intel.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      02afeaae
  9. 30 1月, 2016 1 次提交
  10. 06 12月, 2015 1 次提交
    • A
      x86, tracing, perf: Add trace point for MSR accesses · 7f47d8cc
      Andi Kleen 提交于
      For debugging low level code interacting with the CPU it is often
      useful to trace the MSR read/writes. This gives a concise summary of
      PMU and other operations.
      
      perf has an ad-hoc way to do this using trace_printk, but it's
      somewhat limited (and also now spews ugly boot messages when enabled)
      
      Instead define real trace points for all MSR accesses.
      
      This adds three new trace points: read_msr and write_msr and rdpmc.
      
      They also report if the access faulted (if *_safe is used)
      
      This allows filtering and triggering on specific MSR values, which
      allows various more advanced debugging techniques.
      
      All the values are well defined in the CPU documentation.
      
      The trace can be post processed with
      Documentation/trace/postprocess/decode_msr.py to add symbolic MSR
      names to the trace.
      
      I only added it to native MSR accesses in C, not paravirtualized or in
      entry*.S (which is not too interesting)
      
      Originally the patch kit moved the MSRs out of line.  This uses an
      alternative approach recommended by Steven Rostedt of only moving the
      trace calls out of line, but open coding the access to the jump label.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: NSteven Rostedt <rostedt@goodmis.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mike Galbraith <efault@gmx.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1449018060-1742-3-git-send-email-andi@firstfloor.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>
      7f47d8cc
  11. 24 11月, 2015 1 次提交
  12. 04 9月, 2015 5 次提交
    • A
      x86/insn: perf tools: Add new xsave instructions · f83b6b64
      Adrian Hunter 提交于
      Add xsavec, xsaves and xrstors to the op code map and the perf tools new
      instructions test.  To run the test:
      
        $ tools/perf/perf test "x86 ins"
        39: Test x86 instruction decoder - new instructions          : Ok
      
      Or to see the details:
      
        $ tools/perf/perf test -v "x86 ins" 2>&1 | grep 'xsave\|xrst'
      
      For information about xsavec, xsaves and xrstors, refer the Intel SDM.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441196131-20632-8-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      f83b6b64
    • A
      x86/insn: perf tools: Add new memory protection keys instructions · 978260cd
      Adrian Hunter 提交于
      Add rdpkru and wrpkru to the op code map and the perf tools new
      instructions test.  In the case of the test, only the bytes can be
      tested at the moment since binutils doesn't support the instructions
      yet.  To run the test:
      
        $ tools/perf/perf test "x86 ins"
        39: Test x86 instruction decoder - new instructions          : Ok
      
      Or to see the details:
      
        $ tools/perf/perf test -v "x86 ins" 2>&1 | grep pkru
      
      For information about rdpkru and wrpkru, refer the Intel SDM.
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441196131-20632-7-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      978260cd
    • A
      x86/insn: perf tools: Add new memory instructions · ac1c8859
      Adrian Hunter 提交于
      Intel Architecture Instruction Set Extensions Programing Reference (Oct
      2014) describes 3 new memory instructions, namely clflushopt, clwb and
      pcommit.  Add them to the op code map and the perf tools new
      instructions test. e.g.
      
        $ tools/perf/perf test "x86 ins"
        39: Test x86 instruction decoder - new instructions          : Ok
      
      Or to see the details:
      
        $ tools/perf/perf test -v "x86 ins"
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441196131-20632-6-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      ac1c8859
    • A
      x86/insn: perf tools: Add new SHA instructions · 3fe78d6a
      Adrian Hunter 提交于
      Intel SHA Extensions are explained in the Intel Architecture
      Instruction Set Extensions Programing Reference (Oct 2014).
      There are 7 new instructions.  Add them to the op code map
      and the perf tools new instructions test. e.g.
      
        $ tools/perf/perf test "x86 ins"
        39: Test x86 instruction decoder - new instructions          : Ok
      
      Or to see the details:
      
        $ tools/perf/perf test -v "x86 ins" 2>&1 | grep sha
      
      Committer note:
      
      3 lines of details, for the curious:
      
        $ perf test -v "x86 ins" 2>&1 | grep sha256msg1 | tail -3
        Decoded ok: 0f 38 cc 84 08 78 56 34 12 	sha256msg1 0x12345678(%rax,%rcx,1),%xmm0
        Decoded ok: 0f 38 cc 84 c8 78 56 34 12 	sha256msg1 0x12345678(%rax,%rcx,8),%xmm0
        Decoded ok: 44 0f 38 cc bc c8 78 56 34 12 	sha256msg1 0x12345678(%rax,%rcx,8),%xmm15
        $
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441196131-20632-5-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      3fe78d6a
    • A
      x86/insn: perf tools: Pedantically tweak opcode map for MPX instructions · 78173ec6
      Adrian Hunter 提交于
      The MPX instructions are presently not described in the SDM
      opcode maps, and there are not encoding characters for bnd
      registers, address method or operand type.  So the kernel
      opcode map is using 'Gv' for bnd registers and 'Ev' for
      everything else.  That is fine because the instruction
      decoder does not use that information anyway, except as
      an indication that there is a ModR/M byte.
      
      Nevertheless, in some cases the 'Gv' and 'Ev' are the wrong
      way around, BNDLDX and BNDSTX have 2 operands not 3, and it
      wouldn't hurt to identify the mandatory prefixes.
      
      This has no effect on the decoding of valid instructions,
      but the addition of the mandatory prefixes will cause some
      invalid instructions to error out that wouldn't have
      previously.
      
      Note that perf tools has a copy of the instruction decoder
      and provides a test for new instructions which includes MPX
      instructions e.g.
      
        $ perf test "x86 ins"
        39: Test x86 instruction decoder - new instructions          : Ok
      
      Or to see the details:
      
        $ perf test -v "x86 ins"
      
      Commiter notes:
      
      And to see these MPX instructions specifically:
      
        $ perf test -v "x86 ins" 2>&1 | grep bndldx | head -3
        Decoded ok: 0f 1a 00             	bndldx (%eax),%bnd0
        Decoded ok: 0f 1a 05 78 56 34 12 	bndldx 0x12345678,%bnd0
        Decoded ok: 0f 1a 18             	bndldx (%eax),%bnd3
        $ perf test -v "x86 ins" 2>&1 | grep bndstx | head -3
        Decoded ok: 0f 1b 00             	bndstx %bnd0,(%eax)
        Decoded ok: 0f 1b 05 78 56 34 12 	bndstx %bnd0,0x12345678
        Decoded ok: 0f 1b 18             	bndstx %bnd3,(%eax)
        $
      Signed-off-by: NAdrian Hunter <adrian.hunter@intel.com>
      Acked-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Qiaowei Ren <qiaowei.ren@intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: http://lkml.kernel.org/r/1441196131-20632-4-git-send-email-adrian.hunter@intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      78173ec6
  13. 22 8月, 2015 1 次提交
    • H
      x86/asm/delay: Introduce an MWAITX-based delay with a configurable timer · b466bdb6
      Huang Rui 提交于
      MWAITX can enable a timer and a corresponding timer value
      specified in SW P0 clocks. The SW P0 frequency is the same as
      TSC. The timer provides an upper bound on how long the
      instruction waits before exiting.
      
      This way, a delay function in the kernel can leverage that
      MWAITX timer of MWAITX.
      
      When a CPU core executes MWAITX, it will be quiesced in a
      waiting phase, diminishing its power consumption. This way, we
      can save power in comparison to our default TSC-based delays.
      
      A simple test shows that:
      
      	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
      	$ sleep 10000s
      	$ cat /sys/bus/pci/devices/0000\:00\:18.4/hwmon/hwmon0/power1_acc
      
      Results:
      
      	* TSC-based default delay:      485115 uWatts average power
      	* MWAITX-based delay:           252738 uWatts average power
      
      Thus, that's about 240 milliWatts less power consumption. The
      test method relies on the support of AMD CPU accumulated power
      algorithm in fam15h_power for which patches are forthcoming.
      Suggested-by: NAndy Lutomirski <luto@amacapital.net>
      Suggested-by: NBorislav Petkov <bp@suse.de>
      Suggested-by: NPeter Zijlstra <peterz@infradead.org>
      Signed-off-by: NHuang Rui <ray.huang@amd.com>
      [ Fix delay truncation. ]
      Signed-off-by: NBorislav Petkov <bp@suse.de>
      Cc: Aaron Lu <aaron.lu@intel.com>
      Cc: Andreas Herrmann <herrmann.der.user@gmail.com>
      Cc: Aravind Gopalakrishnan <Aravind.Gopalakrishnan@amd.com>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hector Marco-Gisbert <hecmargi@upv.es>
      Cc: Jacob Shin <jacob.w.shin@gmail.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paolo Bonzini <pbonzini@redhat.com>
      Cc: Rafael J. Wysocki <rjw@rjwysocki.net>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Li <tony.li@amd.com>
      Link: http://lkml.kernel.org/r/1438744732-1459-3-git-send-email-ray.huang@amd.com
      Link: http://lkml.kernel.org/r/1439201994-28067-4-git-send-email-bp@alien8.deSigned-off-by: NIngo Molnar <mingo@kernel.org>
      b466bdb6
  14. 06 7月, 2015 5 次提交
  15. 07 6月, 2015 1 次提交
  16. 04 6月, 2015 1 次提交
    • I
      x86/asm/entry: Move the 'thunk' functions to arch/x86/entry/ · e6b93f4e
      Ingo Molnar 提交于
      These are all calling x86 entry code functions, so move them close
      to other entry code.
      
      Change lib-y to obj-y: there's no real difference between the two
      as we don't really drop any of them during the linking stage, and
      obj-y is the more common approach for core kernel object code.
      
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      e6b93f4e
  17. 02 6月, 2015 1 次提交
    • I
      x86/debug: Remove perpetually broken, unmaintainable dwarf annotations · 131484c8
      Ingo Molnar 提交于
      So the dwarf2 annotations in low level assembly code have
      become an increasing hindrance: unreadable, messy macros
      mixed into some of the most security sensitive code paths
      of the Linux kernel.
      
      These debug info annotations don't even buy the upstream
      kernel anything: dwarf driven stack unwinding has caused
      problems in the past so it's out of tree, and the upstream
      kernel only uses the much more robust framepointers based
      stack unwinding method.
      
      In addition to that there's a steady, slow bitrot going
      on with these annotations, requiring frequent fixups.
      There's no tooling and no functionality upstream that
      keeps it correct.
      
      So burn down the sick forest, allowing new, healthier growth:
      
         27 files changed, 350 insertions(+), 1101 deletions(-)
      
      Someone who has the willingness and time to do this
      properly can attempt to reintroduce dwarf debuginfo in x86
      assembly code plus dwarf unwinding from first principles,
      with the following conditions:
      
       - it should be maximally readable, and maximally low-key to
         'ordinary' code reading and maintenance.
      
       - find a build time method to insert dwarf annotations
         automatically in the most common cases, for pop/push
         instructions that manipulate the stack pointer. This could
         be done for example via a preprocessing step that just
         looks for common patterns - plus special annotations for
         the few cases where we want to depart from the default.
         We have hundreds of CFI annotations, so automating most of
         that makes sense.
      
       - it should come with build tooling checks that ensure that
         CFI annotations are sensible. We've seen such efforts from
         the framepointer side, and there's no reason it couldn't be
         done on the dwarf side.
      
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Brian Gerst <brgerst@gmail.com>
      Cc: Denys Vlasenko <dvlasenk@redhat.com>
      Cc: Frédéric Weisbecker <fweisbec@gmail.com
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jan Beulich <JBeulich@suse.com>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: NIngo Molnar <mingo@kernel.org>
      131484c8
  18. 19 5月, 2015 2 次提交
  19. 14 5月, 2015 3 次提交
  20. 24 4月, 2015 1 次提交
    • L
      x86: fix special __probe_kernel_write() tail zeroing case · d869844b
      Linus Torvalds 提交于
      Commit cae2a173 ("x86: clean up/fix 'copy_in_user()' tail zeroing")
      fixed the failure case tail zeroing of one special case of the x86-64
      generic user-copy routine, namely when used for the user-to-user case
      ("copy_in_user()").
      
      But in the process it broke an even more unusual case: using the user
      copy routine for kernel-to-kernel copying.
      
      Now, normally kernel-kernel copies are obviously done using memcpy(),
      but we have a couple of special cases when we use the user-copy
      functions.  One is when we pass a kernel buffer to a regular user-buffer
      routine, using set_fs(KERNEL_DS).  That's a "normal" case, and continued
      to work fine, because it never takes any faults (with the possible
      exception of a silent and successful vmalloc fault).
      
      But Jan Beulich pointed out another, very unusual, special case: when we
      use the user-copy routines not because it's a path that expects a user
      pointer, but for a couple of ftrace/kgdb cases that want to do a kernel
      copy, but do so using "unsafe" buffers, and use the user-copy routine to
      gracefully handle faults.  IOW, for probe_kernel_write().
      
      And that broke for the case of a faulting kernel destination, because we
      saw the kernel destination and wanted to try to clear the tail of the
      buffer.  Which doesn't work, since that's what faults.
      
      This only triggers for things like kgdb and ftrace users (eg trying
      setting a breakpoint on read-only memory), but it's definitely a bug.
      The fix is to not compare against the kernel address start (TASK_SIZE),
      but instead use the same limits "access_ok()" uses.
      Reported-and-tested-by: NJan Beulich <jbeulich@suse.com>
      Cc: stable@vger.kernel.org # 4.0
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d869844b
  21. 09 4月, 2015 1 次提交
    • L
      x86: clean up/fix 'copy_in_user()' tail zeroing · cae2a173
      Linus Torvalds 提交于
      The rule for 'copy_from_user()' is that it zeroes the remaining kernel
      buffer even when the copy fails halfway, just to make sure that we don't
      leave uninitialized kernel memory around.  Because even if we check for
      errors, some kernel buffers stay around after thge copy (think page
      cache).
      
      However, the x86-64 logic for user copies uses a copy_user_generic()
      function for all the cases, that set the "zerorest" flag for any fault
      on the source buffer.  Which meant that it didn't just try to clear the
      kernel buffer after a failure in copy_from_user(), it also tried to
      clear the destination user buffer for the "copy_in_user()" case.
      
      Not only is that pointless, it also means that the clearing code has to
      worry about the tail clearing taking page faults for the user buffer
      case.  Which is just stupid, since that case shouldn't happen in the
      first place.
      
      Get rid of the whole "zerorest" thing entirely, and instead just check
      if the destination is in kernel space or not.  And then just use
      memset() to clear the tail of the kernel buffer if necessary.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cae2a173
  22. 07 3月, 2015 1 次提交
    • D
      x86/asm: Optimize unnecessarily wide TEST instructions · 3e1aa7cb
      Denys Vlasenko 提交于
      By the nature of the TEST operation, it is often possible to test
      a narrower part of the operand:
      
          "testl $3,  mem"  ->  "testb $3, mem",
          "testq $3, %rcx"  ->  "testb $3, %cl"
      
      This results in shorter instructions, because the TEST instruction
      has no sign-entending byte-immediate forms unlike other ALU ops.
      
      Note that this change does not create any LCP (Length-Changing Prefix)
      stalls, which happen when adding a 0x66 prefix, which happens when
      16-bit immediates are used, which changes such TEST instructions:
      
        [test_opcode] [modrm] [imm32]
      
      to:
      
        [0x66] [test_opcode] [modrm] [imm16]
      
      where [imm16] has a *different length* now: 2 bytes instead of 4.
      This confuses the decoder and slows down execution.
      
      REX prefixes were carefully designed to almost never hit this case:
      adding REX prefix does not change instruction length except MOVABS
      and MOV [addr],RAX instruction.
      
      This patch does not add instructions which would use a 0x66 prefix,
      code changes in assembly are:
      
          -48 f7 07 01 00 00 00 	testq  $0x1,(%rdi)
          +f6 07 01             	testb  $0x1,(%rdi)
          -48 f7 c1 01 00 00 00 	test   $0x1,%rcx
          +f6 c1 01             	test   $0x1,%cl
          -48 f7 c1 02 00 00 00 	test   $0x2,%rcx
          +f6 c1 02             	test   $0x2,%cl
          -41 f7 c2 01 00 00 00 	test   $0x1,%r10d
          +41 f6 c2 01          	test   $0x1,%r10b
          -48 f7 c1 04 00 00 00 	test   $0x4,%rcx
          +f6 c1 04             	test   $0x4,%cl
          -48 f7 c1 08 00 00 00 	test   $0x8,%rcx
          +f6 c1 08             	test   $0x8,%cl
      
      Linus further notes:
      
         "There are no stalls from using 8-bit instruction forms.
      
          Now, changing from 64-bit or 32-bit 'test' instructions to 8-bit ones
          *could* cause problems if it ends up having forwarding issues, so that
          instead of just forwarding the result, you end up having to wait for
          it to be stable in the L1 cache (or possibly the register file). The
          forwarding from the store buffer is simplest and most reliable if the
          read is done at the exact same address and the exact same size as the
          write that gets forwarded.
      
          But that's true only if:
      
           (a) the write was very recent and is still in the write queue. I'm
               not sure that's the case here anyway.
      
           (b) on at least most Intel microarchitectures, you have to test a
               different byte than the lowest one (so forwarding a 64-bit write
               to a 8-bit read ends up working fine, as long as the 8-bit read
               is of the low 8 bits of the written data).
      
          A very similar issue *might* show up for registers too, not just
          memory writes, if you use 'testb' with a high-byte register (where
          instead of forwarding the value from the original producer it needs to
          go through the register file and then shifted). But it's mainly a
          problem for store buffers.
      
          But afaik, the way Denys changed the test instructions, neither of the
          above issues should be true.
      
          The real problem for store buffer forwarding tends to be "write 8
          bits, read 32 bits". That can be really surprisingly expensive,
          because the read ends up having to wait until the write has hit the
          cacheline, and we might talk tens of cycles of latency here. But
          "write 32 bits, read the low 8 bits" *should* be fast on pretty much
          all x86 chips, afaik."
      Signed-off-by: NDenys Vlasenko <dvlasenk@redhat.com>
      Acked-by: NAndy Lutomirski <luto@amacapital.net>
      Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: H. Peter Anvin <hpa@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Will Drewry <wad@chromium.org>
      Link: http://lkml.kernel.org/r/1425675332-31576-1-git-send-email-dvlasenk@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      3e1aa7cb
  23. 05 3月, 2015 2 次提交
  24. 23 2月, 2015 1 次提交