1. 22 2月, 2019 2 次提交
  2. 21 2月, 2019 1 次提交
    • S
      KVM: Call kvm_arch_memslots_updated() before updating memslots · 15248258
      Sean Christopherson 提交于
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NSean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
      15248258
  3. 05 2月, 2019 16 次提交
  4. 12 1月, 2019 5 次提交
    • D
      s390/smp: Fix calling smp_call_ipl_cpu() from ipl CPU · 60f1bf29
      David Hildenbrand 提交于
      When calling smp_call_ipl_cpu() from the IPL CPU, we will try to read
      from pcpu_devices->lowcore. However, due to prefixing, that will result
      in reading from absolute address 0 on that CPU. We have to go via the
      actual lowcore instead.
      
      This means that right now, we will read lc->nodat_stack == 0 and
      therfore work on a very wrong stack.
      
      This BUG essentially broke rebooting under QEMU TCG (which will report
      a low address protection exception). And checking under KVM, it is
      also broken under KVM. With 1 VCPU it can be easily triggered.
      
      :/# echo 1 > /proc/sys/kernel/sysrq
      :/# echo b > /proc/sysrq-trigger
      [   28.476745] sysrq: SysRq : Resetting
      [   28.476793] Kernel stack overflow.
      [   28.476817] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
      [   28.476820] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
      [   28.476826] Krnl PSW : 0400c00180000000 0000000000115c0c (pcpu_delegate+0x12c/0x140)
      [   28.476861]            R:0 T:1 IO:0 EX:0 Key:0 M:0 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
      [   28.476863] Krnl GPRS: ffffffffffffffff 0000000000000000 000000000010dff8 0000000000000000
      [   28.476864]            0000000000000000 0000000000000000 0000000000ab7090 000003e0006efbf0
      [   28.476864]            000000000010dff8 0000000000000000 0000000000000000 0000000000000000
      [   28.476865]            000000007fffc000 0000000000730408 000003e0006efc58 0000000000000000
      [   28.476887] Krnl Code: 0000000000115bfe: 4170f000            la      %r7,0(%r15)
      [   28.476887]            0000000000115c02: 41f0a000            la      %r15,0(%r10)
      [   28.476887]           #0000000000115c06: e370f0980024        stg     %r7,152(%r15)
      [   28.476887]           >0000000000115c0c: c0e5fffff86e        brasl   %r14,114ce8
      [   28.476887]            0000000000115c12: 41f07000            la      %r15,0(%r7)
      [   28.476887]            0000000000115c16: a7f4ffa8            brc     15,115b66
      [   28.476887]            0000000000115c1a: 0707                bcr     0,%r7
      [   28.476887]            0000000000115c1c: 0707                bcr     0,%r7
      [   28.476901] Call Trace:
      [   28.476902] Last Breaking-Event-Address:
      [   28.476920]  [<0000000000a01c4a>] arch_call_rest_init+0x22/0x80
      [   28.476927] Kernel panic - not syncing: Corrupt kernel stack, can't continue.
      [   28.476930] CPU: 0 PID: 424 Comm: sh Not tainted 5.0.0-rc1+ #13
      [   28.476932] Hardware name: IBM 2964 NE1 716 (KVM/Linux)
      [   28.476932] Call Trace:
      
      Fixes: 2f859d0d ("s390/smp: reduce size of struct pcpu")
      Cc: stable@vger.kernel.org # 4.0+
      Reported-by: NCornelia Huck <cohuck@redhat.com>
      Signed-off-by: NDavid Hildenbrand <david@redhat.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      60f1bf29
    • V
      s390/vdso: correct vdso mapping for compat tasks · 190f056f
      Vasily Gorbik 提交于
      While "s390/vdso: avoid 64-bit vdso mapping for compat tasks" fixed
      64-bit vdso mapping for compat tasks under gdb it introduced another
      problem. "compat_mm" flag is not inherited during fork and when
      31-bit process forks a child (but does not perform exec) it ends up
      with 64-bit vdso. To address that, init_new_context (which is called
      during fork and exec) now initialize compat_mm based on thread TIF_31BIT
      flag. Later compat_mm is adjusted in arch_setup_additional_pages, which
      is called during exec.
      
      Fixes: d1befa65 ("s390/vdso: avoid 64-bit vdso mapping for compat tasks")
      Reported-by: NStefan Liebler <stli@linux.ibm.com>
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@vger.kernel.org> # v4.20+
      Signed-off-by: NVasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      190f056f
    • G
      s390/smp: fix CPU hotplug deadlock with CPU rescan · b7cb707c
      Gerald Schaefer 提交于
      smp_rescan_cpus() is called without the device_hotplug_lock, which can lead
      to a dedlock when a new CPU is found and immediately set online by a udev
      rule.
      
      This was observed on an older kernel version, where the cpu_hotplug_begin()
      loop was still present, and it resulted in hanging chcpu and systemd-udev
      processes. This specific deadlock will not show on current kernels. However,
      there may be other possible deadlocks, and since smp_rescan_cpus() can still
      trigger a CPU hotplug operation, the device_hotplug_lock should be held.
      
      For reference, this was the deadlock with the old cpu_hotplug_begin() loop:
      
              chcpu (rescan)                       systemd-udevd
      
       echo 1 > /sys/../rescan
       -> smp_rescan_cpus()
       -> (*) get_online_cpus()
          (increases refcount)
       -> smp_add_present_cpu()
          (new CPU found)
       -> register_cpu()
       -> device_add()
       -> udev "add" event triggered -----------> udev rule sets CPU online
                                               -> echo 1 > /sys/.../online
                                               -> lock_device_hotplug_sysfs()
                                                  (this is missing in rescan path)
                                               -> device_online()
                                               -> (**) device_lock(new CPU dev)
                                               -> cpu_up()
                                               -> cpu_hotplug_begin()
                                                  (loops until refcount == 0)
                                                  -> deadlock with (*)
       -> bus_probe_device()
       -> device_attach()
       -> device_lock(new CPU dev)
          -> deadlock with (**)
      
      Fix this by taking the device_hotplug_lock in the CPU rescan path.
      
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      b7cb707c
    • M
      s390/mm: always force a load of the primary ASCE on context switch · a3866208
      Martin Schwidefsky 提交于
      The ASCE of an mm_struct can be modified after a task has been created,
      e.g. via crst_table_downgrade for a compat process. The active_mm logic
      to avoid the switch_mm call if the next task is a kernel thread can
      lead to a situation where switch_mm is called where 'prev == next' is
      true but 'prev->context.asce == next->context.asce' is not.
      
      This can lead to a situation where a CPU uses the outdated ASCE to run
      a task. The result can be a crash, endless loops and really subtle
      problem due to TLBs being created with an invalid ASCE.
      
      Cc: stable@kernel.org # v3.15+
      Fixes: 53e857f3 ("s390/mm,tlb: race of lazy TLB flush vs. recreation")
      Reported-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      a3866208
    • C
      s390/early: improve machine detection · 03aa047e
      Christian Borntraeger 提交于
      Right now the early machine detection code check stsi 3.2.2 for "KVM"
      and set MACHINE_IS_VM if this is different. As the console detection
      uses diagnose 8 if MACHINE_IS_VM returns true this will crash Linux
      early for any non z/VM system that sets a different value than KVM.
      So instead of assuming z/VM, do not set any of MACHINE_IS_LPAR,
      MACHINE_IS_VM, or MACHINE_IS_KVM.
      
      CC: stable@vger.kernel.org
      Reviewed-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NChristian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
      03aa047e
  5. 06 1月, 2019 5 次提交
  6. 05 1月, 2019 2 次提交
    • J
      mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Joel Fernandes (Google) 提交于
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There is concern that the extra
      'address' argument that mremap passes to pte_alloc may do something
      subtle architecture related in the future that may make the scheme not
      work.  Also we find that there is no point in passing the 'address' to
      pte_alloc since its unused.  This patch therefore removes this argument
      tree-wide resulting in a nice negative diff as well.  Also ensuring
      along the way that the enabled architectures do not do anything funky
      with the 'address' argument that goes unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script.
      (thanks Julia for answering all Coccinelle questions!).
      Following fix ups were done manually:
      * Removal of address argument from  pte_fragment_alloc
      * Removal of pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.comSigned-off-by: NJoel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: NKirill A. Shutemov <kirill@shutemov.name>
      Acked-by: NKirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4cf58924
    • M
      fls: change parameter to unsigned int · 3fc2579e
      Matthew Wilcox 提交于
      When testing in userspace, UBSAN pointed out that shifting into the sign
      bit is undefined behaviour.  It doesn't really make sense to ask for the
      highest set bit of a negative value, so just turn the argument type into
      an unsigned int.
      
      Some architectures (eg ppc) already had it declared as an unsigned int,
      so I don't expect too many problems.
      
      Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.orgSigned-off-by: NMatthew Wilcox <willy@infradead.org>
      Acked-by: NThomas Gleixner <tglx@linutronix.de>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3fc2579e
  7. 04 1月, 2019 1 次提交
    • L
      Remove 'type' argument from access_ok() function · 96d4f267
      Linus Torvalds 提交于
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted this patch, because there is no way we're going to
      move the old VERIFY_xyz interface to that model.  And it's best done at
      the end of the merge window when I've done most of my merges, so let's
      just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  8. 02 1月, 2019 1 次提交
  9. 29 12月, 2018 3 次提交
  10. 21 12月, 2018 1 次提交
  11. 20 12月, 2018 1 次提交
    • C
      dma-mapping: zero memory returned from dma_alloc_* · 518a2f19
      Christoph Hellwig 提交于
      If we want to map memory from the DMA allocator to userspace it must be
      zeroed at allocation time to prevent stale data leaks.   We already do
      this on most common architectures, but some architectures don't do this
      yet, fix them up, either by passing GFP_ZERO when we use the normal page
      allocator or doing a manual memset otherwise.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> [m68k]
      Acked-by: Sam Ravnborg <sam@ravnborg.org> [sparc]
      518a2f19
  12. 18 12月, 2018 2 次提交
新手
引导
客服 返回
顶部