1. 20 May, 2019 1 commit
  2. 18 May, 2019 1 commit
  3. 17 May, 2019 1 commit
  4. 16 May, 2019 4 commits
    • A
      x86/speculation/mds: Revert CPU buffer clear on double fault exit · 88640e1d
      Committed by Andy Lutomirski
      The double fault ESPFIX path doesn't return to user mode at all --
      it returns back to the kernel by simulating a #GP fault.
      prepare_exit_to_usermode() will run on the way out of
      general_protection before running user code.
      Signed-off-by: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jon Masters <jcm@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: stable@vger.kernel.org
      Fixes: 04dcbdb8 ("x86/speculation/mds: Clear CPU buffers on exit to user")
      Link: http://lkml.kernel.org/r/ac97612445c0a44ee10374f6ea79c222fe22a5c4.1557865329.git.luto@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      88640e1d
    • S
      Revert "KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU" · f93f7ede
      Committed by Sean Christopherson
      The RDPMC-exiting control is dependent on the existence of the RDPMC
      instruction itself, i.e. is not tied to the "Architectural Performance
      Monitoring" feature.  For all intents and purposes, the control exists
      on all CPUs with VMX support since RDPMC also exists on all VCPUs with
      VMX supported.  Per Intel's SDM:
      
        The RDPMC instruction was introduced into the IA-32 Architecture in
        the Pentium Pro processor and the Pentium processor with MMX technology.
        The earlier Pentium processors have performance-monitoring counters, but
        they must be read with the RDMSR instruction.
      
      Because RDPMC-exiting always exists, KVM requires the control and refuses
      to load if it's not available.  As a result, hiding the PMU from a guest
      breaks nested virtualization if the guest attempts to use KVM.
      
      While it's not explicitly stated in the RDPMC pseudocode, the VM-Exit
      check for RDPMC-exiting follows standard fault vs. VM-Exit prioritization
      for privileged instructions, e.g. occurs after the CPL/CR0.PE/CR4.PCE
      checks, but before the counter referenced in ECX is checked for validity.
      
      In other words, the original KVM behavior of injecting a #GP was correct,
      and the KVM unit test needs to be adjusted accordingly, e.g. eat the #GP
      when the unit test guest (L3 in this case) executes RDPMC without
      RDPMC-exiting set in the unit test host (L2).
      
      This reverts commit e51bfdb6.
      
      Fixes: e51bfdb6 ("KVM: nVMX: Expose RDPMC-exiting only when guest supports PMU")
      Reported-by: David Hill <hilld@binarystorm.net>
      Cc: Saar Amar <saaramar@microsoft.com>
      Cc: Mihai Carabas <mihai.carabas@oracle.com>
      Cc: Jim Mattson <jmattson@google.com>
      Cc: Liran Alon <liran.alon@oracle.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      f93f7ede
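The check ordering described in the commit above can be sketched as a small decision function. This is a simplified user-space model, not KVM or hardware code: the `rdpmc_state` struct, the `rdpmc_outcome()` function and the enum names are hypothetical, and the privilege test is reduced to the CPL/CR0.PE/CR4.PCE inputs named in the message.

```c
#include <assert.h>
#include <stdbool.h>

/* Possible outcomes of executing RDPMC in a guest (simplified model). */
enum rdpmc_result { RDPMC_GP_FAULT, RDPMC_VM_EXIT, RDPMC_OK };

/* Hypothetical snapshot of the state the checks consult. */
struct rdpmc_state {
	unsigned int cpl;    /* current privilege level, 0..3 */
	bool cr0_pe;         /* protected mode enabled */
	bool cr4_pce;        /* RDPMC permitted at CPL > 0 */
	bool rdpmc_exiting;  /* VMX "RDPMC exiting" control */
	bool counter_valid;  /* counter selected by ECX exists */
};

enum rdpmc_result rdpmc_outcome(const struct rdpmc_state *s)
{
	assert(s->cpl <= 3);

	/* 1. Privilege checks fault first. */
	if (s->cr0_pe && s->cpl != 0 && !s->cr4_pce)
		return RDPMC_GP_FAULT;

	/* 2. The VM-Exit check comes next. */
	if (s->rdpmc_exiting)
		return RDPMC_VM_EXIT;

	/* 3. Only now is the counter index validated. */
	if (!s->counter_valid)
		return RDPMC_GP_FAULT;

	return RDPMC_OK;
}
```

With `rdpmc_exiting` clear and no valid counter, the model returns `RDPMC_GP_FAULT`, which corresponds to the #GP that the unit test is expected to absorb.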
    • K
      kvm: x86: Fix L1TF mitigation for shadow MMU · 61455bf2
      Committed by Kai Huang
      Currently KVM sets the 5 most significant bits of the physical address
      bits reported by CPUID (boot_cpu_data.x86_phys_bits) in nonpresent or
      reserved-bit SPTEs to mitigate L1TF attacks from the guest when using the
      shadow MMU. However, on some particular Intel CPUs the number of physical
      address bits of the internal cache is greater than the number of physical
      address bits reported by CPUID.
      
      Use the kernel's existing boot_cpu_data.x86_cache_bits to determine the
      five most significant bits. Doing so improves KVM's L1TF mitigation in
      the unlikely scenario that system RAM overlaps the high order bits of
      the "real" physical address space as reported by CPUID. This aligns with
      the kernel's warnings regarding L1TF mitigation, e.g. in the above
      scenario the kernel won't warn the user about lack of L1TF mitigation
      if x86_cache_bits is greater than x86_phys_bits.
      
      Also initialize shadow_nonpresent_or_rsvd_mask explicitly to make it
      consistent with other 'shadow_{xxx}_mask', and opportunistically add a
      WARN once if KVM's L1TF mitigation cannot be applied on a system that
      is marked as being susceptible to L1TF.
      Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Kai Huang <kai.huang@linux.intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      61455bf2
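A minimal sketch of how a mask covering the five most significant bits below a given address width might be computed. The function names, the `MAX_PHYS_BITS` bound of 52 and the "return 0 when the mask does not fit" behavior are assumptions for illustration; the commit message itself only states that the five most significant bits are derived from `x86_cache_bits`.

```c
#include <assert.h>
#include <stdint.h>

#define RSVD_MASK_LEN 5u   /* number of high bits to set, per the commit text */
#define MAX_PHYS_BITS 52u  /* assumed upper bound on physical address bits */

/* Mask with bits [lo, hi] set, inclusive. */
static uint64_t bit_range(unsigned int lo, unsigned int hi)
{
	assert(lo <= hi && hi < 64);
	return ((2ULL << (hi - lo)) - 1) << lo;
}

/*
 * Return the mask of the five most significant bits below 'cache_bits',
 * or 0 when there is no room for it (mitigation cannot be applied).
 */
uint64_t nonpresent_or_rsvd_mask(unsigned int cache_bits)
{
	if (cache_bits < RSVD_MASK_LEN ||
	    cache_bits >= MAX_PHYS_BITS - RSVD_MASK_LEN)
		return 0;

	return bit_range(cache_bits - RSVD_MASK_LEN, cache_bits - 1);
}
```

For a CPU that reports 36 physical address bits through CPUID but has 44 cache address bits, calling the function with 44 rather than 36 moves the mask from bits 31..35 up to bits 39..43.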
    • S
      KVM: nVMX: Disable intercept for FS/GS base MSRs in vmcs02 when possible · d69129b4
      Committed by Sean Christopherson
      If L1 is using an MSR bitmap, unconditionally merge the MSR bitmaps from
      L0 and L1 for MSR_{KERNEL,}_{FS,GS}_BASE.  KVM unconditionally exposes
      the MSRs to L1.  If KVM is also running in L1 then it's highly likely L1 is
      also exposing the MSRs to L2, i.e. KVM doesn't need to intercept L2
      accesses.
      
      Based on code from Jintack Lim.
      
      Cc: Jintack Lim <jintack@xxxxxxxxxxxxxxx>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      d69129b4
  5. 15 May, 2019 9 commits
  6. 14 May, 2019 1 commit
  7. 13 May, 2019 1 commit
  8. 11 May, 2019 4 commits
    • S
      x86: Hide the int3_emulate_call/jmp functions from UML · 693713cb
      Committed by Steven Rostedt (VMware)
      User Mode Linux does not have access to the ip or sp fields of the pt_regs,
      and accessing them causes UML to fail to build. Hide the int3_emulate_jmp()
      and int3_emulate_call() functions from UML, as it doesn't need them
      anyway.
      Reported-by: kbuild test robot <lkp@intel.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      693713cb
    • J
      livepatch: Remove klp_check_compiler_support() · 56e33afd
      Committed by Jiri Kosina
      The only purpose of klp_check_compiler_support() is to make sure that we
      are not using ftrace on x86 via mcount (because that's executed only after
      prologue has already happened, and that's too late for livepatching
      purposes).
      
      Now that mcount is not supported by ftrace any more, there is no need for
      klp_check_compiler_support() either.
      
      Link: http://lkml.kernel.org/r/nycvar.YFH.7.76.1905102346100.17054@cbobk.fhfr.pm
      Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      56e33afd
    • S
      ftrace/x86: Remove mcount support · 562e14f7
      Committed by Steven Rostedt (VMware)
      There are two methods of enabling function tracing in Linux on x86. One is
      with just "gcc -pg" and the other is "gcc -pg -mfentry". The former will use
      calls to a special function "mcount" after the frame is set up in all C
      functions. The latter will add calls to a special function called "fentry"
      as the very first instruction of all C functions.
      
      At compile time, there is a check to see if gcc supports -mfentry, and if
      it does, it will use that, because it is more versatile and less error prone
      for function tracing.
      
      Starting with v4.19, the minimum gcc version supported for building the
      Linux kernel was raised to 4.6. That also happens to be the first gcc
      version to support -mfentry. Since on x86, gcc 4.6 and later will
      unconditionally enable -mfentry, the kernel no longer uses mcount as the
      method for inserting calls into its C functions. This means that there is
      no point in continuing to maintain mcount in x86.
      
      Remove support for using mcount. This makes the code less complex, and will
      also allow it to be simplified in the future.
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Jiri Kosina <jkosina@suse.cz>
      Acked-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      562e14f7
    • S
      ftrace/x86_32: Remove support for non DYNAMIC_FTRACE · 518049d9
      Committed by Steven Rostedt (VMware)
      When DYNAMIC_FTRACE is enabled in the kernel, all the functions that can be
      traced by the function tracer have a "nop" placeholder at the start of the
      function. When function tracing is enabled, the nop is converted into a call
      to the tracing infrastructure where the functions get traced. This also
      allows for specifying specific functions to trace, and a lot of
      infrastructure is built on top of this.
      
      When DYNAMIC_FTRACE is not enabled, all the functions have a call to the
      ftrace trampoline. A check is made to see if a function pointer is the
      ftrace_stub or not, and if it is not, it calls the function pointer to trace
      the code. This adds over 10% overhead to the kernel even when tracing is
      disabled.
      
      When an architecture supports DYNAMIC_FTRACE there really is no reason to
      use the static tracing. I have kept non DYNAMIC_FTRACE available in x86 so
      that the generic code for non DYNAMIC_FTRACE can be tested. There is no
      reason to support non DYNAMIC_FTRACE for both x86_64 and x86_32. As the non
      DYNAMIC_FTRACE for x86_32 does not even support fentry, and we want to
      remove mcount completely, there's no reason to keep non DYNAMIC_FTRACE
      around for x86_32.
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      518049d9
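The static (non DYNAMIC_FTRACE) scheme described in the commit above can be modeled in user space roughly as follows. This is only a sketch: `trace_stub` stands in for ftrace_stub, and `trampoline()`, `set_tracer()`, `counting_tracer()` and `traced_add()` are hypothetical names. The point is that the pointer comparison runs on every call, whether or not tracing is enabled.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*trace_fn_t)(const char *name);

static unsigned long hits; /* number of traced calls recorded */

/* Default no-op target, standing in for ftrace_stub. */
static void trace_stub(const char *name) { (void)name; }

/* Example tracer that just counts calls. */
void counting_tracer(const char *name) { (void)name; hits++; }

/* Currently installed tracer; the stub means "tracing disabled". */
static trace_fn_t trace_function = trace_stub;

/* Trampoline executed at every function entry in the static scheme. */
static void trampoline(const char *name)
{
	assert(trace_function != NULL);
	if (trace_function != trace_stub) /* evaluated even when tracing is off */
		trace_function(name);
}

/* An ordinary function that pays for the trampoline on each call. */
int traced_add(int a, int b)
{
	trampoline("traced_add");
	return a + b;
}

void set_tracer(trace_fn_t fn) { trace_function = fn ? fn : trace_stub; }
unsigned long trace_hits(void) { return hits; }
```

In the dynamic scheme the call site is a nop until tracing is enabled, so the disabled case has no call and no comparison at all.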
  9. 10 May, 2019 3 commits
    • V
      cpufreq: Call transition notifier only once for each policy · df24014a
      Committed by Viresh Kumar
      Currently, the notifiers are called once for each CPU of the policy->cpus
      cpumask. It would be more optimal if the notifier can be called only
      once and all the relevant information be provided to it. Out of the 23
      drivers that register for the transition notifiers today, only 4 of them
      do per-cpu updates and the callback for the rest can be called only once
      for the policy without any impact.
      
      This would also avoid multiple function calls to the notifier callbacks
      and reduce multiple iterations of notifier core's code (which does
      locking as well).
      
      This patch adds a pointer to the cpufreq policy to the struct
      cpufreq_freqs, so the notifier callback has all the information
      available to it with a single call. The five drivers which perform
      per-cpu updates are updated to use the cpufreq policy. The freqs->cpu
      field is redundant now and is removed.
      
      Acked-by: David S. Miller <davem@davemloft.net> (sparc)
      Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      df24014a
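The once-per-policy notification described in the commit above can be illustrated with a simplified model. The `policy` and `freqs` structs below are hypothetical stand-ins, not the kernel's `struct cpufreq_policy` or `struct cpufreq_freqs`; the sketch only shows how a callback that needs per-CPU updates can walk the policy's CPUs itself when it is called a single time.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_CPUS 8

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct policy {
	unsigned int cpus[MAX_CPUS]; /* CPUs sharing this frequency policy */
	size_t nr_cpus;
};

struct freqs {
	const struct policy *policy; /* replaces the old per-call 'cpu' field */
	unsigned int old_khz;
	unsigned int new_khz;
};

static unsigned int cpu_khz[MAX_CPUS]; /* per-CPU state a callback maintains */
static unsigned int callback_calls;

/* A callback doing per-CPU updates walks the policy's CPUs itself. */
void transition_callback(const struct freqs *f)
{
	callback_calls++;
	for (size_t i = 0; i < f->policy->nr_cpus; i++) {
		assert(f->policy->cpus[i] < MAX_CPUS);
		cpu_khz[f->policy->cpus[i]] = f->new_khz;
	}
}

/* The core notifies once per policy rather than once per CPU. */
void notify_transition(const struct freqs *f)
{
	transition_callback(f);
}
```

For a policy covering four CPUs, the callback runs once instead of four times, and callbacks that do not keep per-CPU state can ignore the CPU list entirely.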
    • R
      x86: intel_epb: Take CONFIG_PM into account · 9ed09853
      Committed by Rafael J. Wysocki
      Commit b9c273ba ("PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs
      interface") caused kernels built with CONFIG_PM unset to crash on
      systems supporting the Performance and Energy Bias Hint (EPB),
      because it attempts to add files to sysfs directories that don't
      exist on those systems.
      
      Prevent that from happening by taking CONFIG_PM into account so
      that the code depending on it is not compiled at all when it is
      not set.
      
      Fixes: b9c273ba ("PM / arch: x86: MSR_IA32_ENERGY_PERF_BIAS sysfs interface")
      Reported-by: Ido Schimmel <idosch@mellanox.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Ingo Molnar <mingo@kernel.org>
      9ed09853
    • S
      perf/x86/intel: Fix INTEL_FLAGS_EVENT_CONSTRAINT* masking · 6b89d4c1
      Committed by Stephane Eranian
      On Intel Westmere, a cmdline as follows:
      
        $ perf record -e cpu/event=0xc4,umask=0x2,name=br_inst_retired.near_call/p ....
      
      was failing, even though the event + umask combination supports PEBS.
      
      It turns out this is due to a bug in the PEBS event constraint table for
      Westmere. All forms of BR_INST_RETIRED.* support PEBS. Therefore the constraint
      mask should ignore the umask. The name of the macro INTEL_FLAGS_EVENT_CONSTRAINT()
      hints that this is the case, but it was not. That macro was checking both the
      event code and the event umask. Therefore, it was only matching on 0x00c4.
      There are code+umask macros; they all have *UEVENT* in their names.
      
      This patch fixes the issue by checking only the event code in the mask.
      Both the single and range versions are modified.
      Signed-off-by: Stephane Eranian <eranian@google.com>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Cc: kan.liang@intel.com
      Link: http://lkml.kernel.org/r/20190509214556.123493-1-eranian@google.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6b89d4c1
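The masking difference described in the commit above can be shown with a small sketch. It assumes the usual event-select layout (event code in bits 0-7, umask in bits 8-15); the helper names are hypothetical, and the real constraint macros also carry flag bits that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define EVENTSEL_EVENT 0x00FFull /* event code field, bits 0-7 */
#define EVENTSEL_UMASK 0xFF00ull /* unit mask field, bits 8-15 */

/* A constraint matches when the masked config equals the constraint code. */
static bool constraint_matches(uint64_t config, uint64_t code, uint64_t cmask)
{
	assert((code & ~cmask) == 0);
	return (config & cmask) == code;
}

/* Encode event/umask the way the perf command line in the commit does. */
uint64_t make_config(uint8_t event, uint8_t umask)
{
	return (uint64_t)event | ((uint64_t)umask << 8);
}

/* Behavior before the fix: both event code and umask are compared. */
bool matches_event_and_umask(uint64_t config, uint64_t code)
{
	return constraint_matches(config, code, EVENTSEL_EVENT | EVENTSEL_UMASK);
}

/* Behavior after the fix: only the event code is compared. */
bool matches_event_only(uint64_t config, uint64_t code)
{
	return constraint_matches(config, code, EVENTSEL_EVENT);
}
```

For `event=0xc4,umask=0x2` the config is 0x02c4. Compared against 0x00c4 with the event+umask mask it does not match, while with the event-only mask it does.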
  10. 09 May, 2019 5 commits
    • D
      x86/mpx, mm/core: Fix recursive munmap() corruption · 5a28fc94
      Committed by Dave Hansen
      This is a bit of a mess, to put it mildly.  But, it's a bug
      that only seems to have shown up in 4.20 but wasn't noticed
      until now, because nobody uses MPX.
      
      MPX has the arch_unmap() hook inside of munmap() because MPX
      uses bounds tables that protect other areas of memory.  When
      memory is unmapped, there is also a need to unmap the MPX
      bounds tables.  Barring this, unused bounds tables can eat 80%
      of the address space.
      
      But, the recursive do_munmap() that gets called via arch_unmap()
      wreaks havoc with __do_munmap()'s state.  It can result in
      freeing populated page tables, accessing bogus VMA state,
      double-freed VMAs and more.
      
      See the "long story" further below for the gory details.
      
      To fix this, call arch_unmap() before __do_munmap() has a chance
      to do anything meaningful.  Also, remove the 'vma' argument
      and force the MPX code to do its own, independent VMA lookup.
      
      == UML / unicore32 impact ==
      
      Remove unused 'vma' argument to arch_unmap().  No functional
      change.
      
      I compile tested this on UML but not unicore32.
      
      == powerpc impact ==
      
      powerpc uses arch_unmap() as well, to watch for munmap() on the
      VDSO and zeroes out 'current->mm->context.vdso_base'.  Moving
      arch_unmap() makes this happen earlier in __do_munmap().  But,
      'vdso_base' seems to only be used in perf and in the signal
      delivery that happens near the return to userspace.  I can not
      find any likely impact to powerpc, other than the zeroing
      happening a little earlier.
      
      powerpc does not use the 'vma' argument and is unaffected by
      its removal.
      
      I compile-tested a 64-bit powerpc defconfig.
      
      == x86 impact ==
      
      For the common success case this is functionally identical to
      what was there before.  For the munmap() failure case, it's
      possible that some MPX tables will be zapped for memory that
      continues to be in use.  But, this is an extraordinarily
      unlikely scenario and the harm would be that MPX provides no
      protection since the bounds table got reset (zeroed).
      
      I can't imagine anyone doing this:
      
      	ptr = mmap();
      	// use ptr
      	ret = munmap(ptr);
      	if (ret)
      		// oh, there was an error, I'll
      		// keep using ptr.
      
      Because if you're doing munmap(), you are *done* with the
      memory.  There's probably no good data in there _anyway_.
      
      This passes the original reproducer from Richard Biener as
      well as the existing mpx selftests/.
      
      The long story:
      
      munmap() has a couple of pieces:
      
       1. Find the affected VMA(s)
       2. Split the start/end one(s) if necessary
       3. Pull the VMAs out of the rbtree
       4. Actually zap the memory via unmap_region(), including
          freeing page tables (or queueing them to be freed).
       5. Fix up some of the accounting (like fput()) and actually
          free the VMA itself.
      
      This specific ordering was actually introduced by:
      
        dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      
      during the 4.20 merge window.  The previous __do_munmap() code
      was actually safe because the only thing after arch_unmap() was
      remove_vma_list().  arch_unmap() could not see 'vma' in the
      rbtree because it was detached, so it is not even capable of
      doing operations unsafe for remove_vma_list()'s use of 'vma'.
      
      Richard Biener reported a test that shows this in dmesg:
      
        [1216548.787498] BUG: Bad rss-counter state mm:0000000017ce560b idx:1 val:551
        [1216548.787500] BUG: non-zero pgtables_bytes on freeing mm: 24576
      
      What triggered this was the recursive do_munmap() called via
      arch_unmap().  It was freeing page tables that have not been
      properly zapped.
      
      But, the problem was bigger than this.  For one, arch_unmap()
      can free VMAs.  But, the calling __do_munmap() has variables
      that *point* to VMAs and obviously can't handle them just
      getting freed while the pointer is still in use.
      
      I tried a couple of things here.  First, I tried to fix the page
      table freeing problem in isolation, but I then found the VMA
      issue.  I also tried having the MPX code return a flag if it
      modified the rbtree which would force __do_munmap() to re-walk
      to restart.  That spiralled out of control in complexity pretty
      fast.
      
      Just moving arch_unmap() and accepting that the bonkers failure
      case might eat some bounds tables seems like the simplest viable
      fix.
      
      This was also reported in the following kernel bugzilla entry:
      
        https://bugzilla.kernel.org/show_bug.cgi?id=203123
      
      There are some reports that this commit triggered this bug:
      
        dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      
      While that commit certainly made the issues easier to hit, I believe
      the fundamental issue has been with us as long as MPX itself, thus
      the Fixes: tag below is for one of the original MPX commits.
      
      [ mingo: Minor edits to the changelog and the patch. ]
      Reported-by: Richard Biener <rguenther@suse.de>
      Reported-by: H.J. Lu <hjl.tools@gmail.com>
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
      Reviewed-by: Yang Shi <yang.shi@linux.alibaba.com>
      Acked-by: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: linux-arch@vger.kernel.org
      Cc: linux-mm@kvack.org
      Cc: linux-um@lists.infradead.org
      Cc: linuxppc-dev@lists.ozlabs.org
      Cc: stable@vger.kernel.org
      Fixes: dd2283f2 ("mm: mmap: zap pages with read mmap_sem in munmap")
      Link: http://lkml.kernel.org/r/20190419194747.5E1AD6DC@viggo.jf.intel.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      5a28fc94
    • B
      x86/mm: Do not use set_{pud, pmd}_safe() when splitting a large page · eccd9064
      Committed by Brijesh Singh
      The commit
      
        0a9fe8ca ("x86/mm: Validate kernel_physical_mapping_init() PTE population")
      
      triggers this warning in SEV guests:
      
        WARNING: CPU: 0 PID: 0 at arch/x86/include/asm/pgalloc.h:87 phys_pmd_init+0x30d/0x386
        Call Trace:
         kernel_physical_mapping_init+0xce/0x259
         early_set_memory_enc_dec+0x10f/0x160
         kvm_smp_prepare_boot_cpu+0x71/0x9d
         start_kernel+0x1c9/0x50b
         secondary_startup_64+0xa4/0xb0
      
      A SEV guest calls kernel_physical_mapping_init() to clear the encryption
      mask from an existing mapping. While doing so, it also splits large
      pages into smaller ones.
      
      To split a page, kernel_physical_mapping_init() allocates a new page and
      updates the existing entry. The set_{pud,pmd}_safe() helpers trigger a
      warning when updating an entry with a page in the present state.
      
      Add a new kernel_physical_mapping_change() helper which uses the
      non-safe variants of set_{pmd,pud,p4d}() and {pmd,pud,p4d}_populate()
      routines when updating the entry.
      
      Since kernel_physical_mapping_change() may replace an existing
      entry with a new entry, the caller is responsible for flushing
      the TLB at the end. Change early_set_memory_enc_dec() to use
      kernel_physical_mapping_change() when it wants to clear the memory
      encryption mask from the page table entry.
      
       [ bp:
         - massage commit message.
         - flesh out comment according to dhansen's request.
         - align function arguments at opening brace. ]
      
      Fixes: 0a9fe8ca ("x86/mm: Validate kernel_physical_mapping_init() PTE population")
      Signed-off-by: Brijesh Singh <brijesh.singh@amd.com>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Reviewed-by: Dave Hansen <dave.hansen@intel.com>
      Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Lendacky <Thomas.Lendacky@amd.com>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190417154102.22613-1-brijesh.singh@amd.com
      eccd9064
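The difference between the "safe" and plain setters described in the commit above can be modeled as follows. This is a user-space sketch with hypothetical names (`entry_t`, `set_entry()`, `set_entry_safe()`, `ENTRY_PRESENT`), not the kernel's page table helpers: the safe variant flags any attempt to change an entry that is already present, which is exactly what splitting a large page has to do.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRY_PRESENT 0x1ull /* hypothetical "present" bit of an entry */

typedef uint64_t entry_t;

static unsigned int warnings; /* counts would-be WARN splats */

/* Plain setter: overwrites whatever is there. */
void set_entry(entry_t *slot, entry_t val)
{
	assert(slot != NULL);
	*slot = val;
}

/* "Safe" setter: flags an attempt to change an already-present entry. */
void set_entry_safe(entry_t *slot, entry_t val)
{
	assert(slot != NULL);
	if ((*slot & ENTRY_PRESENT) && *slot != val)
		warnings++;
	set_entry(slot, val);
}

unsigned int warning_count(void) { return warnings; }
```

Populating an empty slot through `set_entry_safe()` is silent, while replacing a present large-page entry with a pointer to a new lower-level table bumps the warning counter. A path that legitimately replaces present entries therefore needs the plain setter, plus a TLB flush by the caller.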
    • P
      ftrace/x86_64: Emulate call function while updating in breakpoint handler · 9e298e86
      Committed by Peter Zijlstra
      Nicolai Stange discovered[1] that if live kernel patching is enabled, and the
      function tracer started tracing the same function that was patched, the
      conversion of the fentry call site during the transition from calling the
      live kernel patch trampoline to calling the iterator trampoline would have
      a slight window where it didn't call anything.
      
      Live kernel patching depends on ftrace to always call its code (to
      prevent the function being traced from being called, as it will redirect
      it). This small window would allow the old buggy function to be called, and
      this can cause undesirable results.
      
      Nicolai submitted new patches[2] but these were controversial, as this is
      similar to the static call emulation issues that came up a while ago[3].
      But after some debate[4][5], adding a gap in the stack when entering the
      breakpoint handler allows for pushing the return address onto the stack to
      easily emulate a call.
      
      [1] http://lkml.kernel.org/r/20180726104029.7736-1-nstange@suse.de
      [2] http://lkml.kernel.org/r/20190427100639.15074-1-nstange@suse.de
      [3] http://lkml.kernel.org/r/3cf04e113d71c9f8e4be95fb84a510f085aa4afa.1541711457.git.jpoimboe@redhat.com
      [4] http://lkml.kernel.org/r/CAHk-=wh5OpheSU8Em_Q3Hg8qw_JtoijxOdPtHru6d+5K8TWM=A@mail.gmail.com
      [5] http://lkml.kernel.org/r/CAHk-=wjvQxY4DvPrJ6haPgAa6b906h=MwZXO6G8OtiTGe=N7_w@mail.gmail.com
      
      [
        Live kernel patching is not implemented on x86_32, thus the emulate
        calls are only for x86_64.
      ]
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: the arch/x86 maintainers <x86@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: b700e7f0 ("livepatch: kernel: add support for live patching")
      Tested-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Changed to only implement emulated calls for x86_64 ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      9e298e86
    • P
      x86_64: Allow breakpoints to emulate call instructions · 4b33dadf
      Committed by Peter Zijlstra
      In order to allow breakpoints to emulate call instructions, they need to push
      the return address onto the stack. The x86_64 int3 handler adds a small gap
      to allow the stack to grow some. Use this gap to add the return address to
      be able to emulate a call instruction at the breakpoint location.
      
      These helper functions are added:
      
        int3_emulate_jmp(): changes the location of the regs->ip to return there.
      
       (The next two are only for x86_64)
        int3_emulate_push(): to push the address onto the gap in the stack
        int3_emulate_call(): push the return address and change regs->ip
      
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Nicolai Stange <nstange@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: the arch/x86 maintainers <x86@kernel.org>
      Cc: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Miroslav Benes <mbenes@suse.cz>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Joe Lawrence <joe.lawrence@redhat.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Tim Chen <tim.c.chen@linux.intel.com>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: Mimi Zohar <zohar@linux.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Nick Desaulniers <ndesaulniers@google.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Masahiro Yamada <yamada.masahiro@socionext.com>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: "open list:KERNEL SELFTEST FRAMEWORK" <linux-kselftest@vger.kernel.org>
      Cc: stable@vger.kernel.org
      Fixes: b700e7f0 ("livepatch: kernel: add support for live patching")
      Tested-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      [ Modified to only work for x86_64 and added comment to int3_emulate_push() ]
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      4b33dadf
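The three helpers named in the commit above can be modeled in user space along these lines. The function names come from the commit message; the bodies, the reduced `struct regs` (standing in for pt_regs) and the instruction-size constants are a sketch under the assumption that, on entry to the handler, `ip` points just past the one-byte int3 and that free space exists below `sp`.

```c
#include <assert.h>
#include <stdint.h>

#define INT3_INSN_SIZE 1 /* int3 is a one-byte instruction */
#define CALL_INSN_SIZE 5 /* near call with a 32-bit displacement */

/* Minimal stand-in for the two pt_regs fields the helpers touch. */
struct regs {
	uintptr_t ip; /* on int3 entry: address just past the int3 byte */
	uintptr_t sp; /* stack pointer; the stack grows down */
};

/* Resume execution at 'ip' when the handler returns. */
void int3_emulate_jmp(struct regs *regs, uintptr_t ip)
{
	regs->ip = ip;
}

/* Push 'val', relying on free space (the "gap") below regs->sp. */
void int3_emulate_push(struct regs *regs, uintptr_t val)
{
	assert(regs->sp % sizeof(uintptr_t) == 0);
	regs->sp -= sizeof(uintptr_t);
	*(uintptr_t *)regs->sp = val;
}

/* Emulate "call func" located where the int3 byte was written. */
void int3_emulate_call(struct regs *regs, uintptr_t func)
{
	/* Return address = start of the call site + size of a call. */
	int3_emulate_push(regs, regs->ip - INT3_INSN_SIZE + CALL_INSN_SIZE);
	int3_emulate_jmp(regs, func);
}
```

If the patched call site starts at 0x1000, the handler sees `ip == 0x1001`; after `int3_emulate_call()` the pushed return address is 0x1005 and execution resumes at the target function, as if the original five-byte call had run.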
    • J
      x86_64: Add gap to int3 to allow for call emulation · 2700fefd
      Committed by Josh Poimboeuf
      To allow an int3 handler to emulate a call instruction, it must be able to
      push a return address onto the stack. Add a gap to the stack to allow the
      int3 handler to push the return address and change the return from int3 to
      jump straight to the emulated called function target.
      
      Link: http://lkml.kernel.org/r/20181130183917.hxmti5josgq4clti@treble
      Link: http://lkml.kernel.org/r/20190502162133.GX2623@hirez.programming.kicks-ass.net
      
      [
        Note, this is needed to allow Live Kernel Patching to not miss calling a
        patched function when tracing is enabled. -- Steven Rostedt
      ]
      
      Cc: stable@vger.kernel.org
      Fixes: b700e7f0 ("livepatch: kernel: add support for live patching")
      Tested-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Nicolai Stange <nstange@suse.de>
      Reviewed-by: Masami Hiramatsu <mhiramat@kernel.org>
      Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
      2700fefd
  11. 08 May, 2019 5 commits
  12. 06 May, 2019 3 commits
    • K
      *: convert stream-like files from nonseekable_open -> stream_open · c5bf68fe
      Committed by Kirill Smelkov
      Using scripts/coccinelle/api/stream_open.cocci added in 10dce8af
      ("fs: stream_open - opener for stream-like files so that read and write
      can run simultaneously without deadlock"), search and convert to
      stream_open all in-kernel nonseekable_open users for which read and
      write actually do not depend on ppos and where there are no other methods
      in file_operations which assume @offset access.
      
      I've verified each generated change manually - that it is correct to convert -
      and each other nonseekable_open instance left - that it is either not correct
      to convert there, or that it is not converted due to current stream_open.cocci
      limitations. The script also does not convert files that should be valid to
      convert, but that currently have .llseek = noop_llseek or generic_file_llseek
      for unknown reason despite file being opened with nonseekable_open (e.g.
      drivers/input/mousedev.c)
      
      Among the cases converted, 14 were potentially vulnerable to read vs write deadlock
      (see details in 10dce8af):
      
      	drivers/char/pcmcia/cm4000_cs.c:1685:7-23: ERROR: cm4000_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/gnss/core.c:45:1-17: ERROR: gnss_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/hid/uhid.c:635:1-17: ERROR: uhid_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/infiniband/core/user_mad.c:988:1-17: ERROR: umad_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/input/evdev.c:527:1-17: ERROR: evdev_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/input/misc/uinput.c:401:1-17: ERROR: uinput_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/isdn/capi/capi.c:963:8-24: ERROR: capi_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/leds/uleds.c:77:1-17: ERROR: uleds_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/media/rc/lirc_dev.c:198:1-17: ERROR: lirc_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/s390/char/fs3270.c:488:1-17: ERROR: fs3270_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/usb/misc/ldusb.c:310:1-17: ERROR: ld_usb_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	drivers/xen/evtchn.c:667:8-24: ERROR: evtchn_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	net/batman-adv/icmp_socket.c:80:1-17: ERROR: batadv_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      	net/rfkill/core.c:1146:8-24: ERROR: rfkill_fops: .read() can deadlock .write(); change nonseekable_open -> stream_open to fix.
      
      and the rest were just safe to convert to stream_open because their read and
      write do not use ppos at all and corresponding file_operations do not
      have methods that assume @offset file access(*):
      
      	arch/powerpc/platforms/52xx/mpc52xx_gpt.c:631:8-24: WARNING: mpc52xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_ibox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_mbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/powerpc/platforms/cell/spufs/file.c:591:8-24: WARNING: spufs_wbox_stat_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/um/drivers/harddog_kern.c:88:8-24: WARNING: harddog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	arch/x86/kernel/cpu/microcode/core.c:430:33-49: WARNING: microcode_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/char/ds1620.c:215:8-24: WARNING: ds1620_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/char/dtlk.c:301:1-17: WARNING: dtlk_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/char/ipmi/ipmi_watchdog.c:840:9-25: WARNING: ipmi_wdog_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/char/pcmcia/scr24x_cs.c:95:8-24: WARNING: scr24x_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/char/tb0219.c:246:9-25: WARNING: tb0219_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/firewire/nosy.c:306:8-24: WARNING: nosy_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/hwmon/fschmd.c:840:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/hwmon/w83793.c:1344:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/infiniband/core/ucma.c:1747:8-24: WARNING: ucma_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/infiniband/core/ucm.c:1178:8-24: WARNING: ucm_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/infiniband/core/uverbs_main.c:1086:8-24: WARNING: uverbs_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/input/joydev.c:282:1-17: WARNING: joydev_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/pci/switch/switchtec.c:393:1-17: WARNING: switchtec_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/platform/chrome/cros_ec_debugfs.c:135:8-24: WARNING: cros_ec_console_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/rtc/rtc-ds1374.c:470:9-25: WARNING: ds1374_wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/rtc/rtc-m41t80.c:805:9-25: WARNING: wdt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/s390/char/tape_char.c:293:2-18: WARNING: tape_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/s390/char/zcore.c:194:8-24: WARNING: zcore_reipl_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/s390/crypto/zcrypt_api.c:528:8-24: WARNING: zcrypt_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/spi/spidev.c:594:1-17: WARNING: spidev_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/staging/pi433/pi433_if.c:974:1-17: WARNING: pi433_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/acquirewdt.c:203:8-24: WARNING: acq_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/advantechwdt.c:202:8-24: WARNING: advwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/alim1535_wdt.c:252:8-24: WARNING: ali_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/alim7101_wdt.c:217:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/ar7_wdt.c:166:8-24: WARNING: ar7_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/at91rm9200_wdt.c:113:8-24: WARNING: at91wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/ath79_wdt.c:135:8-24: WARNING: ath79_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/bcm63xx_wdt.c:119:8-24: WARNING: bcm63xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/cpu5wdt.c:143:8-24: WARNING: cpu5wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/cpwd.c:397:8-24: WARNING: cpwd_fops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/eurotechwdt.c:319:8-24: WARNING: eurwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/f71808e_wdt.c:528:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/gef_wdt.c:232:8-24: WARNING: gef_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/geodewdt.c:95:8-24: WARNING: geodewdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/ib700wdt.c:241:8-24: WARNING: ibwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/ibmasr.c:326:8-24: WARNING: asr_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/indydog.c:80:8-24: WARNING: indydog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/intel_scu_watchdog.c:307:8-24: WARNING: intel_scu_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/iop_wdt.c:104:8-24: WARNING: iop_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/it8712f_wdt.c:330:8-24: WARNING: it8712f_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/ixp4xx_wdt.c:68:8-24: WARNING: ixp4xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/ks8695_wdt.c:145:8-24: WARNING: ks8695wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/m54xx_wdt.c:88:8-24: WARNING: m54xx_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/machzwd.c:336:8-24: WARNING: zf_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/mixcomwd.c:153:8-24: WARNING: mixcomwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/mtx-1_wdt.c:121:8-24: WARNING: mtx1_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/mv64x60_wdt.c:136:8-24: WARNING: mv64x60_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/nuc900_wdt.c:134:8-24: WARNING: nuc900wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/nv_tco.c:164:8-24: WARNING: nv_tco_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pc87413_wdt.c:289:8-24: WARNING: pc87413_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pcwd.c:698:8-24: WARNING: pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pcwd.c:737:8-24: WARNING: pcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pcwd_pci.c:581:8-24: WARNING: pcipcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pcwd_pci.c:623:8-24: WARNING: pcipcwd_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pcwd_usb.c:488:8-24: WARNING: usb_pcwd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pcwd_usb.c:527:8-24: WARNING: usb_pcwd_temperature_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pika_wdt.c:121:8-24: WARNING: pikawdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/pnx833x_wdt.c:119:8-24: WARNING: pnx833x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/rc32434_wdt.c:153:8-24: WARNING: rc32434_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/rdc321x_wdt.c:145:8-24: WARNING: rdc321x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/riowd.c:79:1-17: WARNING: riowd_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sa1100_wdt.c:62:8-24: WARNING: sa1100dog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sbc60xxwdt.c:211:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sbc7240_wdt.c:139:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sbc8360.c:274:8-24: WARNING: sbc8360_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sbc_epx_c3.c:81:8-24: WARNING: epx_c3_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sbc_fitpc2_wdt.c:78:8-24: WARNING: fitpc2_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sb_wdog.c:108:1-17: WARNING: sbwdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sc1200wdt.c:181:8-24: WARNING: sc1200wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sc520_wdt.c:261:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/sch311x_wdt.c:319:8-24: WARNING: sch311x_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/scx200_wdt.c:105:8-24: WARNING: scx200_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/smsc37b787_wdt.c:369:8-24: WARNING: wb_smsc_wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/w83877f_wdt.c:227:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/w83977f_wdt.c:301:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wafer5823wdt.c:200:8-24: WARNING: wafwdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/watchdog_dev.c:828:8-24: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdrtas.c:379:8-24: WARNING: wdrtas_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdrtas.c:445:8-24: WARNING: wdrtas_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdt285.c:104:1-17: WARNING: watchdog_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdt977.c:276:8-24: WARNING: wdt977_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdt.c:424:8-24: WARNING: wdt_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdt.c:484:8-24: WARNING: wdt_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdt_pci.c:464:8-24: WARNING: wdtpci_fops: .write() has stream semantic; safe to change nonseekable_open -> stream_open.
      	drivers/watchdog/wdt_pci.c:527:8-24: WARNING: wdtpci_temp_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	net/batman-adv/log.c:105:1-17: WARNING: batadv_log_fops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	sound/core/control.c:57:7-23: WARNING: snd_ctl_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      	sound/core/rawmidi.c:385:7-23: WARNING: snd_rawmidi_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	sound/core/seq/seq_clientmgr.c:310:7-23: WARNING: snd_seq_f_ops: .read() and .write() have stream semantic; safe to change nonseekable_open -> stream_open.
      	sound/core/timer.c:1428:7-23: WARNING: snd_timer_f_ops: .read() has stream semantic; safe to change nonseekable_open -> stream_open.
      
      One can also recheck/review the patch by generating it with explanation comments included:
      
      	$ make coccicheck MODE=patch COCCI=scripts/coccinelle/api/stream_open.cocci SPFLAGS="-D explain"
      
      (*) This second group also contains cases with read/write deadlocks that
      stream_open.cocci doesn't yet detect, but which are still valid to convert to
      stream_open since ppos is not used. For example drivers/pci/switch/switchtec.c
      calls wait_for_completion_interruptible() in its .read, but stream_open.cocci
      currently detects only "wait_event*" as blocking.
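
      The difference between the two open helpers can be sketched in user
      space. This is only a model: the MODEL_* flag values and the model_*
      names are hypothetical, and it assumes, following the description in
      10dce8af, that stream_open additionally drops the position-locking
      mode bit that nonseekable_open leaves set.

```c
#include <stdbool.h>

/* Hypothetical flag values for this model; the kernel's real FMODE_*
 * constants are defined elsewhere and differ from these. */
enum {
	MODEL_FMODE_LSEEK      = 1 << 0, /* file may be lseek()ed */
	MODEL_FMODE_PREAD      = 1 << 1, /* pread() allowed */
	MODEL_FMODE_PWRITE     = 1 << 2, /* pwrite() allowed */
	MODEL_FMODE_ATOMIC_POS = 1 << 3, /* read/write serialize on the position lock */
	MODEL_FMODE_STREAM     = 1 << 4, /* file has no position at all */
};

struct model_file {
	unsigned int f_mode;
};

/* Models nonseekable_open(): seeking is disabled, position locking stays. */
void model_nonseekable_open(struct model_file *f)
{
	f->f_mode &= ~(MODEL_FMODE_LSEEK | MODEL_FMODE_PREAD | MODEL_FMODE_PWRITE);
}

/* Models stream_open(): also drops position locking and marks a stream. */
void model_stream_open(struct model_file *f)
{
	f->f_mode &= ~(MODEL_FMODE_LSEEK | MODEL_FMODE_PREAD |
		       MODEL_FMODE_PWRITE | MODEL_FMODE_ATOMIC_POS);
	f->f_mode |= MODEL_FMODE_STREAM;
}

/* True if read() and write() would contend for the same position lock,
 * so that a blocked read() could stall a concurrent write(). */
bool model_read_can_block_write(const struct model_file *f)
{
	return (f->f_mode & MODEL_FMODE_ATOMIC_POS) != 0;
}
```

      In this model a file opened through model_nonseekable_open() still
      reports that a blocked read can stall a write, while one opened through
      model_stream_open() does not.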
      
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Cc: Yongzhi Pan <panyongzhi@gmail.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: David Vrabel <david.vrabel@citrix.com>
      Cc: Juergen Gross <jgross@suse.com>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Nikolaus Rath <Nikolaus@rath.org>
      Cc: Han-Wen Nienhuys <hanwen@google.com>
      Cc: Anatolij Gustschin <agust@denx.de>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Anton Ivanov <anton.ivanov@cambridgegreys.com>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "James R. Van Zandt" <jrv@vanzandt.mv.com>
      Cc: Corey Minyard <minyard@acm.org>
      Cc: Harald Welte <laforge@gnumonks.org>
      Acked-by: Lubomir Rintel <lkundrak@v3.sk> [scr24x_cs]
      Cc: Stefan Richter <stefanr@s5r6.in-berlin.de>
      Cc: Johan Hovold <johan@kernel.org>
      Cc: David Herrmann <dh.herrmann@googlemail.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Benjamin Tissoires <benjamin.tissoires@redhat.com>
      Cc: Jean Delvare <jdelvare@suse.com>
      Acked-by: Guenter Roeck <linux@roeck-us.net>	[watchdog/* hwmon/*]
      Cc: Rudolf Marek <r.marek@assembler.cz>
      Cc: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Cc: Karsten Keil <isdn@linux-pingi.de>
      Cc: Jacek Anaszewski <jacek.anaszewski@gmail.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Kurt Schwemmer <kurt.schwemmer@microsemi.com>
      Acked-by: Logan Gunthorpe <logang@deltatee.com> [drivers/pci/switch/switchtec]
      Acked-by: Bjorn Helgaas <bhelgaas@google.com> [drivers/pci/switch/switchtec]
      Cc: Benson Leung <bleung@chromium.org>
      Acked-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> [platform/chrome]
      Cc: Alessandro Zummo <a.zummo@towertech.it>
      Acked-by: Alexandre Belloni <alexandre.belloni@bootlin.com> [rtc/*]
      Cc: Mark Brown <broonie@kernel.org>
      Cc: Wim Van Sebroeck <wim@linux-watchdog.org>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: bcm-kernel-feedback-list@broadcom.com
      Cc: Wan ZongShun <mcuos.com@gmail.com>
      Cc: Zwane Mwaikambo <zwanem@gmail.com>
      Cc: Marek Lindner <mareklindner@neomailbox.ch>
      Cc: Simon Wunderlich <sw@simonwunderlich.de>
      Cc: Antonio Quartulli <a@unstable.cc>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: Jaroslav Kysela <perex@perex.cz>
      Cc: Takashi Iwai <tiwai@suse.com>
      Signed-off-by: Kirill Smelkov <kirr@nexedi.com>
      c5bf68fe
    • S
      x86/fpu: Fault-in user stack if copy_fpstate_to_sigframe() fails · d9c9ce34
      Sebastian Andrzej Siewior committed
      In the compacted form, XSAVES may save only the XMM+SSE state but skip
      FP (x87 state).
      
      This is denoted by header->xfeatures = 6. The fastpath
      (copy_fpregs_to_sigframe()) does that but _also_ initialises the FP
      state (cwd to 0x37f, mxcsr as we do, remaining fields to 0).
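
      As a quick check of what xfeatures = 6 means, here is a small sketch
      assuming the usual XSAVE component layout (bit 0 x87, bit 1 SSE,
      bit 2 AVX/YMM). The SKETCH_* names and the helper are local to this
      example, not kernel identifiers.

```c
#include <stdbool.h>
#include <stdint.h>

/* XSAVE state-component bits 0..2; names are local to this sketch. */
#define SKETCH_XFEATURE_MASK_FP   (UINT64_C(1) << 0) /* x87 state */
#define SKETCH_XFEATURE_MASK_SSE  (UINT64_C(1) << 1) /* MXCSR and XMM registers */
#define SKETCH_XFEATURE_MASK_YMM  (UINT64_C(1) << 2) /* upper halves of YMM registers */

/* Returns true if the header says the x87 (FP) component was saved. */
bool sketch_fp_state_saved(uint64_t xfeatures)
{
	return (xfeatures & SKETCH_XFEATURE_MASK_FP) != 0;
}
```

      With xfeatures = 6 the SSE and YMM bits are set and the FP bit is
      clear, so sketch_fp_state_saved(6) is false.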
      
      The slowpath (copy_xstate_to_user()) leaves most of the FP
      state untouched. Only mxcsr and mxcsr_flags are set due to
      xfeatures_mxcsr_quirk(). Now that XFEATURE_MASK_FP is set
      unconditionally, see
      
        04944b79 ("x86: xsave: set FP, SSE bits in the xsave header in the user sigcontext"),
      
      on return from the signal, random garbage is loaded as the FP state.
      
      Instead of utilizing copy_xstate_to_user(), fault-in the user memory
      and retry the fast path. Ideally, the fast path succeeds on the second
      attempt but may be retried again if the memory is swapped out due
      to memory pressure. If the user memory cannot be faulted in, then
      get_user_pages() returns an error, so we don't loop forever.
      
      Fault in memory via get_user_pages_unlocked() so
      copy_fpregs_to_sigframe() succeeds without a fault.
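
      The retry scheme described above can be modelled in user space roughly
      as follows. Everything here is a hypothetical stand-in: sketch_state,
      sketch_fast_copy() and sketch_fault_in() only simulate the fast path
      and the fault-in step, and the -EFAULT return value is an assumption of
      this sketch.

```c
#include <errno.h>

struct sketch_state {
	int pages_missing; /* how many more times the fast path will fault */
	int can_fault_in;  /* 0 models user memory that cannot be faulted in */
	int attempts;      /* number of fast-path attempts made */
};

/* Simulated fast path: fails while the user pages are not resident. */
static int sketch_fast_copy(struct sketch_state *s)
{
	s->attempts++;
	return s->pages_missing > 0 ? -1 : 0;
}

/* Simulated fault-in step: fails for memory that cannot be mapped. */
static int sketch_fault_in(struct sketch_state *s)
{
	if (!s->can_fault_in)
		return -EFAULT;
	s->pages_missing--;
	return 0;
}

/* Retry the fast path after each successful fault-in; stop on the first
 * fault-in error so the loop always terminates. */
int sketch_copy_fpstate(struct sketch_state *s)
{
	for (;;) {
		if (sketch_fast_copy(s) == 0)
			return 0;
		if (sketch_fault_in(s) != 0)
			return -EFAULT;
	}
}
```

      The loop ends either when the fast path succeeds or when faulting in
      fails, which is the property the commit message relies on.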
      
      Fixes: 69277c98 ("x86/fpu: Always store the registers in copy_fpstate_to_sigframe()")
      Reported-by: Kurt Kanzenbach <kurt.kanzenbach@linutronix.de>
      Suggested-by: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Signed-off-by: Borislav Petkov <bp@suse.de>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jann Horn <jannh@google.com>
      Cc: "linux-mm@kvack.org" <linux-mm@kvack.org>
      Cc: Qian Cai <cai@lca.pw>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: x86-ml <x86@kernel.org>
      Link: https://lkml.kernel.org/r/20190502171139.mqtegctsg35cir2e@linutronix.de
      d9c9ce34
    • N
      x86/mm: Initialize PGD cache during mm initialization · caa84136
      Nadav Amit committed
      Poking-mm initialization might require duplicating the PGD at an early
      stage. Initialize the PGD cache earlier to prevent boot failures.
      Reported-by: kernel test robot <lkp@intel.com>
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rick Edgecombe <rick.p.edgecombe@intel.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 4fc19708 ("x86/alternatives: Initialize temporary mm for patching")
      Link: http://lkml.kernel.org/r/20190505011124.39692-1-namit@vmware.com
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      caa84136
  13. 05 May, 2019 1 commit
    • J
      perf/x86/intel: Fix race in intel_pmu_disable_event() · 6f55967a
      Jiri Olsa committed
      A new race in x86_pmu_stop() was introduced by replacing the
      atomic __test_and_clear_bit() of cpuc->active_mask by separate
      test_bit() and __clear_bit() calls in the following commit:
      
        3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      
      The race causes a panic for PEBS events with callchains enabled:
      
        BUG: unable to handle kernel NULL pointer dereference at 0000000000000000
        ...
        RIP: 0010:perf_prepare_sample+0x8c/0x530
        Call Trace:
         <NMI>
         perf_event_output_forward+0x2a/0x80
         __perf_event_overflow+0x51/0xe0
         handle_pmi_common+0x19e/0x240
         intel_pmu_handle_irq+0xad/0x170
         perf_event_nmi_handler+0x2e/0x50
         nmi_handle+0x69/0x110
         default_do_nmi+0x3e/0x100
         do_nmi+0x11a/0x180
         end_repeat_nmi+0x16/0x1a
        RIP: 0010:native_write_msr+0x6/0x20
        ...
         </NMI>
         intel_pmu_disable_event+0x98/0xf0
         x86_pmu_stop+0x6e/0xb0
         x86_pmu_del+0x46/0x140
         event_sched_out.isra.97+0x7e/0x160
        ...
      
      The event is configured to make samples from PEBS drain code,
      but when it's disabled, we'll go through NMI path instead,
      where data->callchain will not get allocated and we'll crash:
      
                x86_pmu_stop
                  test_bit(hwc->idx, cpuc->active_mask)
                  intel_pmu_disable_event(event)
                  {
                    ...
                    intel_pmu_pebs_disable(event);
                    ...
      
      EVENT OVERFLOW ->  <NMI>
                           intel_pmu_handle_irq
                             handle_pmi_common
         TEST PASSES ->        test_bit(bit, cpuc->active_mask))
                                 perf_event_overflow
                                   perf_prepare_sample
                                   {
                                     ...
                                     if (!(sample_type & __PERF_SAMPLE_CALLCHAIN_EARLY))
                                           data->callchain = perf_callchain(event, regs);
      
               CRASH ->              size += data->callchain->nr;
                                   }
                         </NMI>
                    ...
                    x86_pmu_disable_event(event)
                  }
      
                  __clear_bit(hwc->idx, cpuc->active_mask);
      
      Fix this by disabling the event itself before clearing
      the PEBS bit.
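
      The effect of the ordering can be illustrated with a deterministic
      user-space model. It is a simplification under stated assumptions: the
      sketch_* names and the four boolean fields are hypothetical, and the
      overflow is injected by hand into the window between the two disable
      steps instead of arriving asynchronously.

```c
#include <stdbool.h>

struct sketch_pmu {
	bool counter_enabled; /* hardware counter can still overflow */
	bool pebs_enabled;    /* samples are taken by the PEBS drain code */
	bool active;          /* models the event's bit in cpuc->active_mask */
	bool crashed;         /* set when the NMI path hits the NULL callchain */
};

/* Models an overflow arriving between the two disable steps. */
static void sketch_overflow_nmi(struct sketch_pmu *p)
{
	if (!p->counter_enabled)
		return;            /* nothing overflows, so no NMI */
	if (!p->active)
		return;            /* handler skips inactive events */
	if (!p->pebs_enabled)
		p->crashed = true; /* NMI path runs without an allocated callchain */
}

/* Old order: PEBS is switched off while the counter is still running. */
void sketch_stop_old_order(struct sketch_pmu *p)
{
	p->pebs_enabled = false;
	sketch_overflow_nmi(p);
	p->counter_enabled = false;
	p->active = false;
}

/* New order: the event is disabled first, so the window is harmless. */
void sketch_stop_new_order(struct sketch_pmu *p)
{
	p->counter_enabled = false;
	sketch_overflow_nmi(p);
	p->pebs_enabled = false;
	p->active = false;
}
```

      Starting from a fully enabled event, the old order sets crashed in this
      model and the new order does not.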
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: David Arcari <darcari@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Lendacky Thomas <Thomas.Lendacky@amd.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Fixes: 3966c3fe ("x86/perf/amd: Remove need to check "running" bit in NMI handler")
      Link: http://lkml.kernel.org/r/20190504151556.31031-1-jolsa@kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
      6f55967a
  14. 03 May, 2019 1 commit