1. 19 May 2016 (3 commits)
  2. 13 May 2016 (1 commit)
    • KVM: halt_polling: provide a way to qualify wakeups during poll · 3491caf2
      Authored by Christian Borntraeger
      Some wakeups should not be considered a successful poll. For example, on
      s390, I/O interrupts are usually floating, which means that _ALL_ CPUs
      would be considered runnable, letting all vCPUs poll all the time for
      transaction-like workloads even if one vCPU would be enough.
      This can result in huge CPU usage for large guests.
      This patch lets architectures provide a way to qualify wakeups as
      good or bad with regard to polls.
      
      For s390 the implementation will fence off halt polling for anything but
      known good, single-vCPU events. The s390 implementation for floating
      interrupts does a wakeup for one vCPU, but the interrupt will be delivered
      by whatever CPU checks first for a pending interrupt. We give preference
      to the woken-up CPU by marking its poll as a "good" poll.
      This code will also mark several other wakeup reasons, like IPIs or
      expired timers, as "good". It will of course also mark some events as
      not successful. Since KVM on z always runs as a second-level hypervisor,
      we prefer not to poll unless we are really sure, though.
      
      This patch successfully limits the CPU usage for cases like the uperf
      1-byte transactional ping-pong workload or wakeup-heavy workloads like
      OLTP, while still providing a proper speedup.
      
      This also introduces a new vcpu stat, "halt_poll_no_tuning", that marks
      wakeups considered not good for polling.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Acked-by: Radim Krčmář <rkrcmar@redhat.com> (for an earlier version)
      Cc: David Matlack <dmatlack@google.com>
      Cc: Wanpeng Li <kernellwp@gmail.com>
      [Rename config symbol. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      3491caf2
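
      A minimal sketch of the wakeup-qualification hook described above. The
      names here are illustrative and may not match the merged patch exactly;
      treat this as an assumption-laden outline, not the actual API:

          #ifdef CONFIG_HAVE_KVM_INVALID_WAKEUPS
          /* Architectures set vcpu->valid_wakeup when a wakeup is known
           * good, e.g. a single-vCPU event such as an IPI or an expired
           * timer; the generic halt-polling loop consults it to decide
           * whether the poll counts as successful. */
          static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu)
          {
                  return vcpu->valid_wakeup;
          }
          #else
          /* Default: every wakeup qualifies as a good poll. */
          static inline bool vcpu_valid_wakeup(struct kvm_vcpu *vcpu)
          {
                  return true;
          }
          #endif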
  3. 12 May 2016 (1 commit)
  4. 09 May 2016 (1 commit)
  5. 03 May 2016 (1 commit)
  6. 29 April 2016 (1 commit)
    • KVM: x86: fix ordering of cr0 initialization code in vmx_cpu_reset · f2463247
      Authored by Bruce Rogers
      Commit d28bc9dd reversed the order of two lines that initialize cr0,
      allowing the current (old) cr0 value to mess up vcpu initialization.
      This was observed in the checks for the X86_CR0_WP bit of cr0 in the
      context of kvm_mmu_reset_context(). Besides, setting vcpu->arch.cr0
      after vmx_set_cr0() is completely redundant. Change the order back to
      ensure proper vcpu initialization.
      
      The combination of booting with OVMF firmware, more than one guest vCPU,
      and KVM's ept=N option being set results in a VM-entry failure. This
      patch fixes that.
      
      Fixes: d28bc9dd ("KVM: x86: INIT and reset sequences are different")
      Cc: stable@vger.kernel.org
      Signed-off-by: Bruce Rogers <brogers@suse.com>
      Signed-off-by: Radim Krčmář <rkrcmar@redhat.com>
      f2463247
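
      A simplified sketch of the corrected ordering in the vcpu reset path
      (assumed shape; the key point is that the cached cr0 must be updated
      before vmx_set_cr0() runs, since it and kvm_mmu_reset_context()
      consult it):

          vmx->vcpu.arch.cr0 = cr0;      /* update the cached value first */
          vmx_set_cr0(&vmx->vcpu, cr0);  /* then program the VMCS */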
  7. 20 April 2016 (3 commits)
  8. 11 April 2016 (3 commits)
    • KVM: x86: mask CPUID(0xD,0x1).EAX against host value · 316314ca
      Authored by Paolo Bonzini
      This ensures that the guest doesn't see XSAVE extensions
      (e.g. xgetbv1 or xsavec) that the host lacks.
      
      Cc: stable@vger.kernel.org
      Reviewed-by: Radim Krčmář <rkrcmar@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      316314ca
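
      A sketch of the masking described above (simplified; helper names
      assumed from KVM's cpuid code): the guest-visible CPUID(0xD,0x1).EAX
      is ANDed with both the KVM-supported feature set and the host CPU's
      own CPUID output.

          entry->eax &= kvm_cpuid_D_1_eax_x86_features; /* KVM-supported bits */
          cpuid_mask(&entry->eax, CPUID_D_1_EAX);       /* drop bits the host lacks */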
    • kvm: x86: do not leak guest xcr0 into host interrupt handlers · fc5b7f3b
      Authored by David Matlack
      An interrupt handler that uses the fpu can kill a KVM VM if it runs
      under the following conditions:
       - the guest's xcr0 register is loaded on the cpu
       - the guest's fpu context is not loaded
       - the host is using eagerfpu
      
      Note that the guest's xcr0 register and fpu context are not loaded as
      part of the atomic world switch into "guest mode". They are loaded by
      KVM while the cpu is still in "host mode".
      
      Usage of the fpu in interrupt context is gated by irq_fpu_usable(). The
      interrupt handler will look something like this:
      
      if (irq_fpu_usable()) {
              kernel_fpu_begin();
      
              [... code that uses the fpu ...]
      
              kernel_fpu_end();
      }
      
      As long as the guest's fpu is not loaded and the host is using eager
      fpu, irq_fpu_usable() returns true (interrupted_kernel_fpu_idle()
      returns true). The interrupt handler proceeds to use the fpu with
      the guest's xcr0 live.
      
      kernel_fpu_begin() saves the current fpu context. If this uses
      XSAVE[OPT], it may leave the xsave area in an undesirable state.
      According to the SDM, during XSAVE bit i of XSTATE_BV is not modified
      if bit i is 0 in xcr0. So it's possible that XSTATE_BV[i] == 1 and
      xcr0[i] == 0 following an XSAVE.
      
      kernel_fpu_end() restores the fpu context. Now if any bit i in
      XSTATE_BV == 1 while xcr0[i] == 0, XRSTOR generates a #GP. The
      fault is trapped and SIGSEGV is delivered to the current process.
      
      Only pre-4.2 kernels appear to be vulnerable to this sequence of
      events. Commit 653f52c3 ("kvm,x86: load guest FPU context more eagerly")
      from 4.2 forces the guest's fpu to always be loaded on eagerfpu hosts.
      
      This patch fixes the bug by keeping the host's xcr0 loaded outside
      of the interrupts-disabled region where KVM switches into guest mode.
      
      Cc: stable@vger.kernel.org
      Suggested-by: Andy Lutomirski <luto@amacapital.net>
      Signed-off-by: David Matlack <dmatlack@google.com>
      [Move load after goto cancel_injection. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      fc5b7f3b
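
      A simplified sketch of the fix in the vcpu entry path: the guest's
      xcr0 is only live while interrupts are disabled, so no interrupt
      handler can run kernel_fpu_begin()/kernel_fpu_end() on top of it
      (function names per KVM's x86.c; the surrounding code is elided):

          local_irq_disable();
          /* final request checks, guest-entry preparation */
          kvm_load_guest_xcr0(vcpu);   /* guest xcr0 live, IRQs off */

          /* atomic world switch into guest mode and back */

          kvm_put_guest_xcr0(vcpu);    /* host xcr0 restored, IRQs still off */
          local_irq_enable();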
    • KVM: MMU: fix permission_fault() · 7a98205d
      Authored by Xiao Guangrong
      kvm-unit-tests complained that the PFEC was not set properly, e.g.:
      test pte.rw pte.d pte.nx pde.p pde.rw pde.pse user fetch: FAIL: error code 15
      expected 5
      Dump mapping: address: 0x123400000000
      ------L4: 3e95007
      ------L3: 3e96007
      ------L2: 2000083
      
      This is because the PFEC returned to the guest is copied from the PFEC
      triggered by the shadow page table.
      
      This patch fixes it and makes the errcode update logic cleaner.
      Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
      [Do not assume pfec.p=1. - Paolo]
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      7a98205d
  9. 08 April 2016 (1 commit)
  10. 07 April 2016 (1 commit)
  11. 05 April 2016 (1 commit)
    • kvm: x86: make lapic hrtimer pinned · 61abdbe0
      Authored by Luiz Capitulino
      When a vCPU runs on a nohz_full core, the hrtimer used by
      the lapic emulation code can be migrated to another core.
      When this happens, it's possible to observe millisecond
      latency when delivering timer IRQs to KVM guests.
      
      The huge latency is mainly due to the fact that
      apic_timer_fn() expects to run during a kvm exit. It
      sets KVM_REQ_PENDING_TIMER and lets it be handled on kvm
      entry. However, if the timer fires on a different core,
      we have to wait until the next kvm exit for the guest
      to see KVM_REQ_PENDING_TIMER set.
      
      This problem became visible after commit 9642d18e, which
      changed the timer migration code to always attempt to
      migrate timers away from nohz_full cores. While it's
      debatable whether this is correct/desirable (I don't think
      it is), it's clear that the lapic emulation code requires
      the hrtimer to fire on the same core where it was started.
      This is achieved by making the hrtimer pinned.
      
      Lastly, note that KVM has code to migrate timers when a
      vCPU is scheduled to run on a different core. However, this
      forced migration may fail. When it does, we can hit the
      same problem. If we want 100% correctness, we'll have to
      modify apic_timer_fn() to cause a kvm exit when it runs on
      a different core than the vCPU. It's not clear whether this
      is possible.
      
      Here's a reproducer for the issue being fixed:
      
       1. Set all cores but core0 to be nohz_full cores
       2. Start a guest with a single vCPU
       3. Trace apic_timer_fn() and kvm_inject_apic_timer_irqs()
      
      You'll see that apic_timer_fn() will run in core0 while
      kvm_inject_apic_timer_irqs() runs in a different core. If
      you get both on core0, try running a program that takes 100%
      of the CPU and pin it to core0 to force the vCPU out.
      Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      61abdbe0
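
      The change boils down to starting the lapic hrtimer in pinned mode
      (a sketch; HRTIMER_MODE_ABS_PINNED is the stock hrtimer API, and
      "expire" stands for the expiry time computed earlier):

          hrtimer_init(&apic->lapic_timer.timer, CLOCK_MONOTONIC,
                       HRTIMER_MODE_ABS_PINNED);
          /* ... */
          hrtimer_start(&apic->lapic_timer.timer, expire,
                        HRTIMER_MODE_ABS_PINNED);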
  12. 02 April 2016 (2 commits)
  13. 01 April 2016 (4 commits)
    • kvm: set page dirty only if page has been writable · 14f47605
      Authored by Yu Zhao
      In the absence of a shadow dirty mask, there is no need to set a page
      dirty if the page has never been writable. This is a tiny optimization,
      but good to have for people who care a lot about dirty page tracking.
      Signed-off-by: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      14f47605
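
      A sketch of the check described above (simplified): when there is no
      hardware dirty bit (shadow_dirty_mask == 0), fall back to the writable
      bit, so a page that was never writable is not marked dirty.

          if (old_spte & (shadow_dirty_mask ? shadow_dirty_mask :
                          PT_WRITABLE_MASK))
                  kvm_set_pfn_dirty(spte_to_pfn(old_spte));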
    • KVM: x86: reduce default value of halt_poll_ns parameter · 14ebda33
      Authored by Paolo Bonzini
      Windows lets applications choose the frequency of the timer tick,
      and in Windows 10 the maximum rate was changed from 1024 Hz to
      2048 Hz.  Unfortunately, because of the way the Windows API
      works, most applications that need a higher rate than the default
      64 Hz will just do
      
         timeGetDevCaps(&tc, sizeof(tc));
         timeBeginPeriod(tc.wPeriodMin);
      
      and pick the maximum rate.  This causes very high CPU usage when
      playing media or games on Windows 10, even if the guest does not
      actually use the CPU very much, because the frequent timer tick
      causes halt_poll_ns to kick in.
      
      There is no really good solution, especially because Microsoft
      could sooner or later bump the limit to 4096 Hz, but for now
      the best we can do is lower the upper limit for halt_poll_ns
      a bit. :-(
      Reported-by: Jon Panozzo <jonp@lime-technology.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      14ebda33
    • KVM: Hyper-V: do not do hypercall userspace exits if SynIC is disabled · a2b5c3c0
      Authored by Paolo Bonzini
      If SynIC is disabled, there is nothing that userspace can do to
      handle these exits; on the other hand, userspace probably will
      not know about KVM_EXIT_HYPERV_HCALL and complain about it or
      even exit.  Just prevent anything bad from happening by handling
      the hypercall in KVM and returning an "invalid hypercall" code.
      
      Fixes: 83326e43
      Cc: Andrey Smetanin <irqlevel@gmail.com>
      Reviewed-by: Roman Kagan <rkagan@virtuozzo.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      a2b5c3c0
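
      A sketch of the guard described above, as it would sit in the
      hypercall dispatch switch (simplified; helper names assumed from
      KVM's hyperv.c):

          case HVCALL_POST_MESSAGE:
          case HVCALL_SIGNAL_EVENT:
                  /* don't bother userspace if it has no way to handle it */
                  if (!vcpu_to_synic(vcpu)->active) {
                          res = HV_STATUS_INVALID_HYPERCALL_CODE;
                          break;
                  }
                  /* otherwise fall through to the userspace exit as before */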
    • KVM: x86: Inject pending interrupt even if pending nmi exist · 321c5658
      Authored by Yuki Shibuya
      Non-maskable interrupts (NMIs) are preferred over regular interrupts
      in the current implementation. If an NMI is pending and NMIs are
      blocked according to nmi_allowed(), the pending interrupt is not
      injected and enable_irq_window() is not executed, even if interrupt
      injection is allowed.
      
      In old kernels (e.g. 2.6.32), schedule() is often called in NMI context.
      In this case, interrupts are needed to execute the iret that marks the
      end of the NMI. The flag blocking new NMIs is not cleared until the
      guest executes the iret, yet interrupts are blocked by the pending NMI.
      Because of this, iret can't be invoked in the guest, and the guest is
      starved until the block is cleared by some event (e.g. canceling
      injection).
      
      This patch injects pending interrupts, when allowed, even if NMIs
      are blocked. And if an interrupt is pending after executing
      inject_pending_event(), enable_irq_window() is executed regardless of
      the NMI pending counter.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Yuki Shibuya <shibuya.yk@ncos.nec.co.jp>
      Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      321c5658
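
      A simplified sketch of the resulting injection policy (not the literal
      patch; op names follow the kvm_x86_ops of that era):

          if (vcpu->arch.nmi_pending && kvm_x86_ops->nmi_allowed(vcpu)) {
                  /* inject the NMI */
          } else if (kvm_cpu_has_injectable_intr(vcpu) &&
                     kvm_x86_ops->interrupt_allowed(vcpu)) {
                  /* inject the interrupt even though a blocked NMI
                   * is still pending */
          }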
  14. 31 March 2016 (1 commit)
    • perf/x86/amd/ibs: Fix pmu::stop() nesting · 85dc6002
      Authored by Peter Zijlstra
      Patch 5a50f529 ("perf/x86/ibs: Fix race with IBS_STARTING state")
      closed a big hole while opening another, smaller hole.
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 5a50f529 ("perf/x86/ibs: Fix race with IBS_STARTING state")
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      85dc6002
  15. 29 March 2016 (8 commits)
  16. 26 March 2016 (3 commits)
    • ACPI / processor: Request native thermal interrupt handling via _OSC · a2121167
      Authored by Srinivas Pandruvada
      There are several reports of freezes when enabling the HWP (Hardware
      P-states) feature on Skylake-based systems via the Intel P-states
      driver. The root cause has been identified as HWP interrupts causing
      BIOS code to freeze.
      HWP interrupts use the thermal LVT which can be handled by Linux
      natively, but on the affected Skylake-based systems SMM will respond
      to it by default.  This is a problem for several reasons:
       - On the affected systems the SMM thermal LVT handler is broken (it
         will crash when invoked) and a BIOS update is necessary to fix it.
       - With thermal interrupt handled in SMM we lose all of the reporting
         features of the arch/x86/kernel/cpu/mcheck/therm_throt driver.
       - Some thermal drivers like x86-package-temp depend on the thermal
         threshold interrupts signaled via the thermal LVT.
       - The HWP interrupts are useful for debugging and tuning
         performance (if the kernel can handle them).
      Because of that, native handling of thermal interrupts needs to be
      enabled.
      
      This requires some way to tell SMM that the OS can handle thermal
      interrupts.  That can be done by using _OSC/_PDC in processor
      scope very early during ACPI initialization.
      
      The meaning of _OSC/_PDC bit 12 in processor scope is whether or
      not the OS supports native handling of interrupts for Collaborative
      Processor Performance Control (CPPC) notifications.  Since on
      HWP-capable systems CPPC is a firmware interface to HWP, setting
      this bit effectively tells the firmware that the OS will handle
      thermal interrupts natively going forward.
      
      For details on _OSC/_PDC refer to:
      http://www.intel.com/content/www/us/en/standards/processor-vendor-specific-acpi-specification.html
      
      To implement the _OSC/_PDC handshake as described, introduce a new
      function, acpi_early_processor_osc(), that walks the ACPI
      namespace looking for ACPI processor objects and invokes _OSC for
      them with bit 12 in the capabilities buffer set and terminates the
      namespace walk on the first success.
      
      Also modify intel_thermal_interrupt() to clear HWP status bits in
      the HWP_STATUS MSR to acknowledge HWP interrupts (which prevents
      them from firing continuously).
      Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
      [ rjw: Subject & changelog, function rename ]
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      a2121167
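
      A sketch of the thermal-interrupt side of the change (simplified;
      MSR_HWP_STATUS and X86_FEATURE_HWP are the stock definitions, the
      rest of the handler is elided):

          static void intel_thermal_interrupt(void)  /* simplified excerpt */
          {
                  if (static_cpu_has(X86_FEATURE_HWP))
                          /* ack HWP interrupts so they don't fire forever */
                          wrmsrl_safe(MSR_HWP_STATUS, 0);
                  /* ... existing threshold/logging handling ... */
          }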
    • mm, kasan: stackdepot implementation. Enable stackdepot for SLAB · cd11016e
      Authored by Alexander Potapenko
      Implement the stack depot and provide CONFIG_STACKDEPOT.  The stack
      depot allows KASAN to store allocation/deallocation stack traces for
      memory chunks.  The stack traces are stored in a hash table and
      referenced by handles which reside in the kasan_alloc_meta and
      kasan_free_meta structures in the allocated memory chunks.
      
      IRQ stack traces are cut below the IRQ entry point to avoid unnecessary
      duplication.
      
      Right now stackdepot support is only enabled in the SLAB allocator.
      Once KASAN features in SLAB are on par with those in SLUB we can switch
      SLUB to stackdepot as well, thus removing the dependency on SLUB stack
      bookkeeping, which wastes a lot of memory.
      
      This patch is based on the "mm: kasan: stack depots" patch originally
      prepared by Dmitry Chernenkov.
      
      Joonsoo has said that he plans to reuse the stackdepot code for the
      mm/page_owner.c debugging facility.
      
      [akpm@linux-foundation.org: s/depot_stack_handle/depot_stack_handle_t]
      [aryabinin@virtuozzo.com: comment style fixes]
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cd11016e
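
      A sketch of how a caller stores a deduplicated trace and gets back a
      compact handle (depot_save_stack() is the API this patch introduces;
      the helper below is hypothetical):

          #include <linux/stackdepot.h>
          #include <linux/stacktrace.h>

          static depot_stack_handle_t save_alloc_stack(gfp_t flags)
          {
                  unsigned long entries[16];
                  struct stack_trace trace = {
                          .entries     = entries,
                          .max_entries = ARRAY_SIZE(entries),
                          .skip        = 2,   /* drop allocator frames */
                  };

                  save_stack_trace(&trace);
                  /* hashed and deduplicated; the handle is what gets
                   * stored in kasan_alloc_meta/kasan_free_meta */
                  return depot_save_stack(&trace, flags);
          }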
    • arch, ftrace: for KASAN put hard/soft IRQ entries into separate sections · be7635e7
      Authored by Alexander Potapenko
      KASAN needs to know whether the allocation happens in an IRQ handler.
      This lets us strip everything below the IRQ entry point to reduce the
      number of unique stack traces that need to be stored.
      
      Move the definition of __irq_entry to <linux/interrupt.h> so that the
      users don't need to pull in <linux/ftrace.h>.  Also introduce the
      __softirq_entry macro, which is similar to __irq_entry but puts the
      corresponding functions into the .softirqentry.text section.
      Signed-off-by: Alexander Potapenko <glider@google.com>
      Acked-by: Steven Rostedt <rostedt@goodmis.org>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Andrey Konovalov <adech.fo@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Konstantin Serebryany <kcc@google.com>
      Cc: Dmitry Chernenkov <dmitryc@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      be7635e7
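
      Usage is a per-function annotation (a sketch; the handler name is
      hypothetical, the macro is the one introduced here):

          #include <linux/interrupt.h>

          /* placed in .softirqentry.text, so KASAN can cut stack traces
           * below the softirq entry point */
          static void __softirq_entry my_softirq_action(struct softirq_action *h)
          {
                  /* handler body */
          }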
  17. 25 March 2016 (2 commits)
    • xen/apic: Provide Xen-specific version of cpu_present_to_apicid APIC op · ed6069be
      Authored by Boris Ostrovsky
      Currently Xen uses default_cpu_present_to_apicid() which will always
      report BAD_APICID for PV guests since x86_bios_cpu_apic_id is initialised
      to that value and is never updated.
      
      With commit 1f12e32f ("x86/topology: Create logical package id"), this
      op is now called by smp_init_package_map() when deciding whether to call
      topology_update_package_map(), which sets cpu_data(cpu).logical_proc_id.
      The latter (as topology_logical_package_id(cpu)) may be used, for
      example, by cpu_to_rapl_pmu() as an array index. Since an uninitialized
      logical_package_id is set to -1, the index becomes 64K, which is
      obviously problematic.
      
      While the RAPL code (and any other users of logical_package_id) should
      be careful in their assumptions about an id's validity, Xen's
      cpu_present_to_apicid op should still provide a value consistent with
      its own xen_apic_read(APIC_ID).
      Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      ed6069be
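
      A sketch of the op described above (simplified; xen_apic_read() is
      named in the changelog, the xen_get_apic_id() helper is assumed from
      arch/x86/xen/apic.c):

          static int xen_cpu_present_to_apicid(int cpu)
          {
                  if (cpu_present(cpu))
                          return xen_get_apic_id(xen_apic_read(APIC_ID));
                  else
                          return BAD_APICID;
          }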
    • perf/x86: Move events_sysfs_show() outside CPU_SUP_INTEL · a49ac9f8
      Authored by Huang Rui
      randconfig builds can sometimes disable CONFIG_CPU_SUP_INTEL while
      enabling the AMD power reporting PMU driver, resulting in this
      build failure:
      
        arch/x86/kernel/cpu/perf_event.h:663:31: error: 'events_sysfs_show' undeclared here (not in a function)
      
      To fix it, move events_sysfs_show() outside of #ifdef CONFIG_CPU_SUP_INTEL.
      Reported-by: Randy Dunlap <rdunlap@infradead.org>
      Reported-by: build test robot <lkp@intel.com>
      Signed-off-by: Huang Rui <ray.huang@amd.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sherry Hurwitz <sherry.hurwitz@amd.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: acme@kernel.org
      Cc: kbuild-all@01.org
      Cc: linux-next@vger.kernel.org
      Cc: spg_linux_kernel@amd.com
      Link: http://lkml.kernel.org/r/1458875905-4278-1-git-send-email-ray.huang@amd.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      a49ac9f8
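
      The shape of the fix in perf_event.h (a sketch; only the declaration
      moves, so non-Intel PMU drivers can reference it too):

          ssize_t events_sysfs_show(struct device *dev,
                                    struct device_attribute *attr, char *page);

          #ifdef CONFIG_CPU_SUP_INTEL
          /* Intel-only declarations remain guarded */
          #endif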
  18. 23 March 2016 (3 commits)
    • x86/msr: Remove unused native_read_tscp() · 9da77666
      Authored by Prarit Bhargava
      After e76b027e ("x86,vdso: Use LSL unconditionally for vgetcpu"),
      native_read_tscp() is unused in the kernel. The function can be removed,
      as native_read_tsc() was.
      Signed-off-by: Prarit Bhargava <prarit@redhat.com>
      Acked-by: Andy Lutomirski <luto@amacapital.net>
      Cc: Borislav Petkov <bp@suse.de>
      Link: http://lkml.kernel.org/r/1458687968-9106-1-git-send-email-prarit@redhat.com
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      9da77666
    • x86/extable: use generic search and sort routines · 29934b0f
      Authored by Ard Biesheuvel
      Replace the arch-specific versions of search_extable() and
      sort_extable() with calls to the generic ones, which now support
      relative exception tables as well.
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Acked-by: H. Peter Anvin <hpa@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      29934b0f
    • kernel: add kcov code coverage · 5c9a8750
      Authored by Dmitry Vyukov
      kcov provides code coverage collection for coverage-guided fuzzing
      (randomized testing).  Coverage-guided fuzzing is a testing technique
      that uses coverage feedback to determine new interesting inputs to a
      system.  A notable user-space example is AFL
      (http://lcamtuf.coredump.cx/afl/).  However, this technique is not
      widely used for kernel testing due to missing compiler and kernel
      support.
      
      kcov does not aim to collect as much coverage as possible.  It aims to
      collect more or less stable coverage that is a function of syscall
      inputs.  To achieve this goal it does not collect coverage in soft/hard
      interrupts, and instrumentation of some inherently non-deterministic or
      uninteresting parts of the kernel is disabled (e.g.  scheduler,
      locking).
      
      Currently there is a single coverage collection mode (tracing), but the
      API anticipates additional collection modes.  Initially I also
      implemented a second mode which exposes coverage in a fixed-size hash
      table of counters (what Quentin used in his original patch).  I've
      dropped the second mode for simplicity.
      
      This patch adds the necessary support on the kernel side.  The
      complementary compiler support was added in gcc revision 231296.
      
      We've used this support to build the syzkaller system call fuzzer,
      which has found 90 kernel bugs in just 2 months:
      
        https://github.com/google/syzkaller/wiki/Found-Bugs
      
      We've also found 30+ bugs in our internal systems with syzkaller.
      Another (yet unexplored) direction where kcov coverage would greatly
      help is more traditional "blob mutation".  For example, mounting a
      random blob as a filesystem, or receiving a random blob over the wire.
      
      Why not gcov?  A typical fuzzing loop looks as follows: (1) reset
      coverage, (2) execute a bit of code, (3) collect coverage, repeat.  A
      typical coverage can be just a dozen basic blocks (e.g.  an invalid
      input).  In such a context gcov becomes prohibitively expensive, as the
      reset/collect coverage steps depend on the total number of basic
      blocks/edges in the program (about 2M in the case of the kernel).  The
      cost of kcov depends only on the number of executed basic blocks/edges.
      On top of that, the kernel requires per-thread coverage because there
      are always background threads and unrelated processes that also produce
      coverage.  With inlined gcov instrumentation, per-thread coverage is
      not possible.
      
      kcov exposes kernel PCs and control flow to user-space, which is
      insecure.  But debugfs should not be mapped as user accessible.
      
      Based on a patch by Quentin Casasnovas.
      
      [akpm@linux-foundation.org: make task_struct.kcov_mode have type `enum kcov_mode']
      [akpm@linux-foundation.org: unbreak allmodconfig]
      [akpm@linux-foundation.org: follow x86 Makefile layout standards]
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: syzkaller <syzkaller@googlegroups.com>
      Cc: Vegard Nossum <vegard.nossum@oracle.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tavis Ormandy <taviso@google.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Quentin Casasnovas <quentin.casasnovas@oracle.com>
      Cc: Kostya Serebryany <kcc@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Kees Cook <keescook@google.com>
      Cc: Bjorn Helgaas <bhelgaas@google.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Cc: David Drysdale <drysdale@google.com>
      Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Jiri Slaby <jslaby@suse.cz>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5c9a8750
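
      A condensed user-space usage sketch, following the documentation added
      with this patch (ioctl numbers and the debugfs path as described there;
      error handling omitted):

          #include <fcntl.h>
          #include <stdio.h>
          #include <sys/ioctl.h>
          #include <sys/mman.h>
          #include <unistd.h>

          #define KCOV_INIT_TRACE _IOR('c', 1, unsigned long)
          #define KCOV_ENABLE     _IO('c', 100)
          #define KCOV_DISABLE    _IO('c', 101)
          #define COVER_SIZE      (64 << 10)   /* in unsigned longs */

          int main(void)
          {
                  unsigned long *cover, n, i;
                  int fd = open("/sys/kernel/debug/kcov", O_RDWR);

                  ioctl(fd, KCOV_INIT_TRACE, COVER_SIZE);
                  cover = mmap(NULL, COVER_SIZE * sizeof(unsigned long),
                               PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
                  ioctl(fd, KCOV_ENABLE, 0);

                  cover[0] = 0;            /* reset coverage */
                  read(-1, NULL, 0);       /* the syscall being traced */
                  n = cover[0];            /* number of PCs collected */
                  for (i = 0; i < n; i++)
                          printf("0x%lx\n", cover[i + 1]);

                  ioctl(fd, KCOV_DISABLE, 0);
                  return 0;
          }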