1. 22 Feb 2019 (1 commit)
  2. 21 Feb 2019 (1 commit)
    • KVM: Call kvm_arch_memslots_updated() before updating memslots · 15248258
      Authored by Sean Christopherson
      kvm_arch_memslots_updated() is at this point in time an x86-specific
      hook for handling MMIO generation wraparound.  x86 stashes 19 bits of
      the memslots generation number in its MMIO sptes in order to avoid
      full page fault walks for repeat faults on emulated MMIO addresses.
      Because only 19 bits are used, wrapping the MMIO generation number is
      possible, if unlikely.  kvm_arch_memslots_updated() alerts x86 that
      the generation has changed so that it can invalidate all MMIO sptes in
      case the effective MMIO generation has wrapped so as to avoid using a
      stale spte, e.g. a (very) old spte that was created with generation==0.
      
      Given that the purpose of kvm_arch_memslots_updated() is to prevent
      consuming stale entries, it needs to be called before the new generation
      is propagated to memslots.  Invalidating the MMIO sptes after updating
      memslots means that there is a window where a vCPU could dereference
      the new memslots generation, e.g. 0, and incorrectly reuse an old MMIO
      spte that was created with (pre-wrap) generation==0.
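
      A minimal userspace sketch of the wrap hazard described above; the
      19-bit width comes from this commit, while MMIO_GEN_BITS, spte_gen()
      and the sample values are names invented for the illustration:

      #include <stdint.h>
      #include <stdio.h>

      /* Only 19 bits of the memslots generation fit in an MMIO spte, so
       * the stored value eventually wraps. */
      #define MMIO_GEN_BITS 19
      #define MMIO_GEN_MASK ((1u << MMIO_GEN_BITS) - 1)

      static uint32_t spte_gen(uint64_t memslots_gen)
      {
              return (uint32_t)(memslots_gen & MMIO_GEN_MASK);
      }

      int main(void)
      {
              uint64_t stale = 0;                       /* spte created at generation 0 */
              uint64_t wrapped = 1ull << MMIO_GEN_BITS; /* generation has wrapped to 0 */

              /* The truncated values collide, so the stale spte would be
               * reused unless the architecture hook zapped it before the
               * new generation became visible. */
              printf("stale=%u wrapped=%u -> %s\n",
                     spte_gen(stale), spte_gen(wrapped),
                     spte_gen(stale) == spte_gen(wrapped) ? "collision" : "ok");
              return 0;
      }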
      
      Fixes: e59dbe09 ("KVM: Introduce kvm_arch_memslots_updated()")
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      15248258
  3. 05 Feb 2019 (8 commits)
  4. 12 Jan 2019 (2 commits)
  5. 06 Jan 2019 (2 commits)
  6. 05 Jan 2019 (2 commits)
    • mm: treewide: remove unused address argument from pte_alloc functions · 4cf58924
      Authored by Joel Fernandes (Google)
      Patch series "Add support for fast mremap".
      
      This series speeds up the mremap(2) syscall by copying page tables at
      the PMD level even for non-THP systems.  There was a concern that the
      extra 'address' argument that mremap passes to pte_alloc might do
      something subtle and architecture-specific in the future that would
      make the scheme not work.  We also find that there is no point in
      passing the 'address' to pte_alloc, since it is unused.  This patch
      therefore removes the argument tree-wide, resulting in a nice negative
      diff, while also verifying along the way that the enabled
      architectures do not do anything funky with the 'address' argument
      that would go unnoticed by the optimization.
      
      Build and boot tested on x86-64.  Build tested on arm64.  The config
      enablement patch for arm64 will be posted in the future after more
      testing.
      
      The changes were obtained by applying the following Coccinelle script
      (thanks to Julia for answering all my Coccinelle questions!).  The
      following fix-ups were done manually:
      * Removal of the address argument from pte_fragment_alloc
      * Removal of the pte_alloc_one_fast definitions from m68k and microblaze.
      
      // Options: --include-headers --no-includes
      // Note: I split the 'identifier fn' line, so if you are manually
      // running it, please unsplit it so it runs for you.
      
      virtual patch
      
      @pte_alloc_func_def depends on patch exists@
      identifier E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      type T2;
      @@
      
       fn(...
      - , T2 E2
       )
       { ... }
      
      @pte_alloc_func_proto_noarg depends on patch exists@
      type T1, T2, T3, T4;
      identifier fn =~ "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1, T2);
      + T3 fn(T1);
      |
      - T3 fn(T1, T2, T4);
      + T3 fn(T1, T2);
      )
      
      @pte_alloc_func_proto depends on patch exists@
      identifier E1, E2, E4;
      type T1, T2, T3, T4;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
      (
      - T3 fn(T1 E1, T2 E2);
      + T3 fn(T1 E1);
      |
      - T3 fn(T1 E1, T2 E2, T4 E4);
      + T3 fn(T1 E1, T2 E2);
      )
      
      @pte_alloc_func_call depends on patch exists@
      expression E2;
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      @@
      
       fn(...
      -,  E2
       )
      
      @pte_alloc_macro depends on patch exists@
      identifier fn =~
      "^(__pte_alloc|pte_alloc_one|pte_alloc|__pte_alloc_kernel|pte_alloc_one_kernel)$";
      identifier a, b, c;
      expression e;
      position p;
      @@
      
      (
      - #define fn(a, b, c) e
      + #define fn(a, b) e
      |
      - #define fn(a, b) e
      + #define fn(a) e
      )
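
      For concreteness, this is the shape of the change the script applies
      to one of the matched functions; the before/after prototypes below
      are a sketch of __pte_alloc as I read the interface, not a quote
      from the diff:

      /* before: 'address' was threaded through but never used */
      int __pte_alloc(struct mm_struct *mm, pmd_t *pmd, unsigned long address);

      /* after */
      int __pte_alloc(struct mm_struct *mm, pmd_t *pmd);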
      
      Link: http://lkml.kernel.org/r/20181108181201.88826-2-joelaf@google.com
      Signed-off-by: Joel Fernandes (Google) <joel@joelfernandes.org>
      Suggested-by: Kirill A. Shutemov <kirill@shutemov.name>
      Acked-by: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Julia Lawall <Julia.Lawall@lip6.fr>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: William Kucharski <william.kucharski@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4cf58924
    • fls: change parameter to unsigned int · 3fc2579e
      Authored by Matthew Wilcox
      When testing in userspace, UBSAN pointed out that shifting into the sign
      bit is undefined behaviour.  It doesn't really make sense to ask for the
      highest set bit of a negative value, so just turn the argument type into
      an unsigned int.
      
      Some architectures (e.g. ppc) already had it declared as an unsigned
      int, so I don't expect too many problems.
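
      A minimal sketch of a generic fls() with the new signature;
      fls_generic is a name made up for this example, and the kernel's
      real implementations are per-architecture:

      #include <stdio.h>

      /* Position of the highest set bit, 1-based; returns 0 for 0.
       * Taking unsigned int sidesteps the undefined behaviour of
       * shifting a signed value into the sign bit. */
      static int fls_generic(unsigned int x)
      {
              int r = 0;

              while (x) {
                      x >>= 1;
                      r++;
              }
              return r;
      }

      int main(void)
      {
              printf("fls(1) = %d\n", fls_generic(1u));                   /* 1 */
              printf("fls(0x80000000) = %d\n", fls_generic(0x80000000u)); /* 32 */
              return 0;
      }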
      
      Link: http://lkml.kernel.org/r/20181105221117.31828-1-willy@infradead.org
      Signed-off-by: Matthew Wilcox <willy@infradead.org>
      Acked-by: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: <linux-arch@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      3fc2579e
  7. 04 Jan 2019 (1 commit)
    • Remove 'type' argument from access_ok() function · 96d4f267
      Authored by Linus Torvalds
      Nobody has actually used the type (VERIFY_READ vs VERIFY_WRITE) argument
      of the user address range verification function since we got rid of the
      old racy i386-only code to walk page tables by hand.
      
      It existed because the original 80386 would not honor the write protect
      bit when in kernel mode, so you had to do COW by hand before doing any
      user access.  But we haven't supported that in a long time, and these
      days the 'type' argument is a purely historical artifact.
      
      A discussion about extending 'user_access_begin()' to do the range
      checking resulted in this patch, because there is no way we're going
      to move the old VERIFY_xyz interface to that model.  And it's best
      done at the end of the merge window when I've done most of my merges,
      so let's just get this done once and for all.
      
      This patch was mostly done with a sed-script, with manual fix-ups for
      the cases that weren't of the trivial 'access_ok(VERIFY_xyz' form.
      
      There were a couple of notable cases:
      
       - csky still had the old "verify_area()" name as an alias.
      
       - the iter_iov code had magical hardcoded knowledge of the actual
         values of VERIFY_{READ,WRITE} (not that they mattered, since nothing
         really used it)
      
       - microblaze used the type argument for a debug printout
      
      but other than those oddities this should be a total no-op patch.
      
      I tried to fix up all architectures, did fairly extensive grepping for
      access_ok() uses, and the changes are trivial, but I may have missed
      something.  Any missed conversion should be trivially fixable, though.
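
      The shape of the conversion, with a userspace stand-in for the macro
      so the fragment compiles; only the dropped VERIFY_WRITE argument is
      the point here:

      #include <stdio.h>

      /* Stand-in so this compiles in userspace; the real access_ok()
       * validates a user address range and, after this patch, takes no
       * 'type' argument. */
      #define access_ok(addr, size) ((addr) != 0 && (size) > 0)

      int main(void)
      {
              char buf[16];

              /* before: if (!access_ok(VERIFY_WRITE, buf, sizeof(buf))) */
              if (!access_ok(buf, sizeof(buf)))
                      return 1;
              printf("range ok\n");
              return 0;
      }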
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      96d4f267
  8. 07 Dec 2018 (1 commit)
  9. 30 Nov 2018 (2 commits)
  10. 06 Nov 2018 (1 commit)
  11. 02 Nov 2018 (2 commits)
    • s390/kasan: increase instrumented stack size to 64k · 9fed920e
      Authored by Vasily Gorbik
      Increase the kasan instrumented kernel stack size from 32k to 64k.
      Other architectures seem to get away with just doubling the kernel
      stack size under kasan, but on s390 this appears to be not enough due
      to the bigger frame size.  The particular pain point is kasan inlined
      checks (CONFIG_KASAN_INLINE vs CONFIG_KASAN_OUTLINE).  With inlined
      checks, one particular case hitting a stack overflow is an fs sync on
      an xfs filesystem:
      
       #0 [9a0681e8]  704 bytes  check_usage at 34b1fc
       #1 [9a0684a8]  432 bytes  check_usage at 34c710
       #2 [9a068658]  1048 bytes  validate_chain at 35044a
       #3 [9a068a70]  312 bytes  __lock_acquire at 3559fe
       #4 [9a068ba8]  440 bytes  lock_acquire at 3576ee
       #5 [9a068d60]  104 bytes  _raw_spin_lock at 21b44e0
       #6 [9a068dc8]  1992 bytes  enqueue_entity at 2dbf72
       #7 [9a069590]  1496 bytes  enqueue_task_fair at 2df5f0
       #8 [9a069b68]  64 bytes  ttwu_do_activate at 28f438
       #9 [9a069ba8]  552 bytes  try_to_wake_up at 298c4c
       #10 [9a069dd0]  168 bytes  wake_up_worker at 23f97c
       #11 [9a069e78]  200 bytes  insert_work at 23fc2e
       #12 [9a069f40]  648 bytes  __queue_work at 2487c0
       #13 [9a06a1c8]  200 bytes  __queue_delayed_work at 24db28
       #14 [9a06a290]  248 bytes  mod_delayed_work_on at 24de84
       #15 [9a06a388]  24 bytes  kblockd_mod_delayed_work_on at 153e2a0
       #16 [9a06a3a0]  288 bytes  __blk_mq_delay_run_hw_queue at 158168c
       #17 [9a06a4c0]  192 bytes  blk_mq_run_hw_queue at 1581a3c
       #18 [9a06a580]  184 bytes  blk_mq_sched_insert_requests at 15a2192
       #19 [9a06a638]  1024 bytes  blk_mq_flush_plug_list at 1590f3a
       #20 [9a06aa38]  704 bytes  blk_flush_plug_list at 1555028
       #21 [9a06acf8]  320 bytes  schedule at 219e476
       #22 [9a06ae38]  760 bytes  schedule_timeout at 21b0aac
       #23 [9a06b130]  408 bytes  wait_for_common at 21a1706
       #24 [9a06b2c8]  360 bytes  xfs_buf_iowait at fa1540
       #25 [9a06b430]  256 bytes  __xfs_buf_submit at fadae6
       #26 [9a06b530]  264 bytes  xfs_buf_read_map at fae3f6
       #27 [9a06b638]  656 bytes  xfs_trans_read_buf_map at 10ac9a8
       #28 [9a06b8c8]  304 bytes  xfs_btree_kill_root at e72426
       #29 [9a06b9f8]  288 bytes  xfs_btree_lookup_get_block at e7bc5e
       #30 [9a06bb18]  624 bytes  xfs_btree_lookup at e7e1a6
       #31 [9a06bd88]  2664 bytes  xfs_alloc_ag_vextent_near at dfa070
       #32 [9a06c7f0]  144 bytes  xfs_alloc_ag_vextent at dff3ca
       #33 [9a06c880]  1128 bytes  xfs_alloc_vextent at e05fce
       #34 [9a06cce8]  584 bytes  xfs_bmap_btalloc at e58342
       #35 [9a06cf30]  1336 bytes  xfs_bmapi_write at e618de
       #36 [9a06d468]  776 bytes  xfs_iomap_write_allocate at ff678e
       #37 [9a06d770]  720 bytes  xfs_map_blocks at f82af8
       #38 [9a06da40]  928 bytes  xfs_writepage_map at f83cd6
       #39 [9a06dde0]  320 bytes  xfs_do_writepage at f85872
       #40 [9a06df20]  1320 bytes  write_cache_pages at 73dfe8
       #41 [9a06e448]  208 bytes  xfs_vm_writepages at f7f892
       #42 [9a06e518]  88 bytes  do_writepages at 73fe6a
       #43 [9a06e570]  872 bytes  __writeback_single_inode at a20cb6
       #44 [9a06e8d8]  664 bytes  writeback_sb_inodes at a23be2
       #45 [9a06eb70]  296 bytes  __writeback_inodes_wb at a242e0
       #46 [9a06ec98]  928 bytes  wb_writeback at a2500e
       #47 [9a06f038]  848 bytes  wb_do_writeback at a260ae
       #48 [9a06f388]  536 bytes  wb_workfn at a28228
       #49 [9a06f5a0]  1088 bytes  process_one_work at 24a234
       #50 [9a06f9e0]  1120 bytes  worker_thread at 24ba26
       #51 [9a06fe40]  104 bytes  kthread at 26545a
       #52 [9a06fea8]             kernel_thread_starter at 21b6b62
      
      To be able to increase the stack size to 64k, reuse the LLILL
      instruction in the __switch_to function to load the
      64k - STACK_FRAME_OVERHEAD - __PT_SIZE (65192) value as an unsigned
      immediate.
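
      The arithmetic, for reference: 65192 = 65536 - 344, where 344 is the
      combined STACK_FRAME_OVERHEAD + __PT_SIZE, derived from the figures
      in the text.  Since 65192 does not fit in a signed 16-bit immediate,
      it has to be loaded as an unsigned halfword, which is what LLILL
      does:

      #include <stdio.h>

      int main(void)
      {
              /* 64k stack minus STACK_FRAME_OVERHEAD + __PT_SIZE (344 combined) */
              int val = 64 * 1024 - 344;

              printf("%d\n", val);         /* 65192, per the commit message */
              printf("%d\n", val > 32767); /* 1: too big for a signed 16-bit
                                              immediate, hence the unsigned load */
              return 0;
      }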
      Reported-by: Benjamin Block <bblock@linux.ibm.com>
      Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      9fed920e
    • s390/mm: fix mis-accounting of pgtable_bytes · e12e4044
      Authored by Martin Schwidefsky
      In case a fork or a clone system call fails in copy_process and the
      error handling does the mmput() at the bad_fork_cleanup_mm label, the
      following warning message will appear on the console:
      
        BUG: non-zero pgtables_bytes on freeing mm: 16384
      
      The reason for that is the tricks we play with mm_inc_nr_puds() and
      mm_inc_nr_pmds() in init_new_context().
      
      A normal 64-bit process has 3 levels of page tables; the p4d level
      and the pud level are folded.  On process termination the
      free_pud_range() function in mm/memory.c will subtract 16KB from
      pgtable_bytes with a mm_dec_nr_puds() call, but there is not actually
      a real pud table.
      
      One issue with this is the fact that pgtable_bytes is usually off by
      a few kilobytes, but the more severe problem is that for a failed
      fork or clone the free_pgtables() function is not called.  In this
      case there is no mm_dec_nr_puds() or mm_dec_nr_pmds() to balance the
      mm_inc_nr_puds() and mm_inc_nr_pmds() in init_new_context().
      pgtable_bytes will be off by 16384 or 32768 bytes and we get the BUG
      message.  The message itself is purely cosmetic, but annoying.
      
      To fix this, override the mm_pmd_folded, mm_pud_folded and
      mm_p4d_folded functions to check for the true size of the address
      space.
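
      A sketch of the shape of one such override; the comparison of the
      address-space (ASCE) limit against a region size reflects my reading
      of the fix, so treat the field and constant names as illustrative:

      /* The pud level only really exists for sufficiently large address
       * spaces, so report it as folded otherwise and skip the accounting. */
      static inline int mm_pud_folded(struct mm_struct *mm)
      {
              return mm->context.asce_limit <= _REGION2_SIZE;
      }
      #define mm_pud_folded(mm) mm_pud_folded(mm)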
      Reported-by: Li Wang <liwang@redhat.com>
      Tested-by: Li Wang <liwang@redhat.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
      e12e4044
  12. 31 Oct 2018 (1 commit)
  13. 22 Oct 2018 (1 commit)
  14. 10 Oct 2018 (2 commits)
  15. 09 Oct 2018 (13 commits)