1. 04 May, 2022 1 commit
  2. 30 Apr, 2022 1 commit
    • KVM: fix bad user ABI for KVM_EXIT_SYSTEM_EVENT · d495f942
      Paolo Bonzini committed
      When KVM_EXIT_SYSTEM_EVENT was introduced, it included a flags
      member that at the time was unused.  Unfortunately this extensibility
      mechanism has several issues:
      
      - x86 is not writing the member, so it would not be possible to use it
        on x86 except for new events
      
      - the member is not aligned to 64 bits, so the definition of the
        uAPI struct is incorrect for 32- on 64-bit userspace.  This is a
        problem for RISC-V, which supports CONFIG_KVM_COMPAT, but fortunately
        usage of flags was only introduced in 5.18.
      
      Since padding has to be introduced, place a new field in there
      that tells if the flags field is valid.  To allow further extensibility,
      in fact, change flags to an array of 16 values, and store how many
      of the values are valid.  The availability of the new ndata field
      is tied to a system capability; all architectures are changed to
      fill in the field.
      
      To avoid breaking compilation of userspace that was using the flags
      field, provide a userspace-only union to overlap flags with data[0].
      The new field is placed at the same offset for both 32- and 64-bit
      userspace.
      
      Cc: Will Deacon <will@kernel.org>
      Cc: Marc Zyngier <maz@kernel.org>
      Cc: Peter Gonda <pgonda@google.com>
      Cc: Sean Christopherson <seanjc@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
      Reported-by: kernel test robot <lkp@intel.com>
      Message-Id: <20220422103013.34832-1-pbonzini@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
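The layout change described in this commit can be sketched in plain C. The struct below is a standalone illustration, not the kernel header: `struct system_event_sketch`, `SYSTEM_EVENT_MAX_DATA`, and `system_event_flags()` are hypothetical names, and the caller is assumed to have already checked the system capability the commit mentions. It shows how a 32-bit `ndata` placed in the former padding keeps the 64-bit values at offset 8 for both 32- and 64-bit userspace.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name: the commit only says "an array of 16 values". */
#define SYSTEM_EVENT_MAX_DATA 16

/* Illustrative mirror of the layout described in the commit message. */
struct system_event_sketch {
    uint32_t type;
    uint32_t ndata;      /* how many entries of data[] are valid */
    union {
        uint64_t flags;  /* userspace-only alias for data[0] */
        uint64_t data[SYSTEM_EVENT_MAX_DATA];
    };
};

/* Two 32-bit members precede the union, so it starts at offset 8
 * regardless of whether uint64_t is 4- or 8-byte aligned. */
_Static_assert(offsetof(struct system_event_sketch, data) == 8,
               "data[] must sit at the same offset on 32- and 64-bit ABIs");

/*
 * Copy the flags value into *flags_out and return 1 only when the
 * capability is present and ndata says data[0] is valid; otherwise
 * return 0 and leave *flags_out untouched.
 */
static int system_event_flags(const struct system_event_sketch *ev,
                              int has_ndata_cap, uint64_t *flags_out)
{
    if (!has_ndata_cap)
        return 0;
    if (ev->ndata < 1 || ev->ndata > SYSTEM_EVENT_MAX_DATA)
        return 0;
    *flags_out = ev->data[0];
    return 1;
}
```

Existing code that read `flags` keeps compiling through the union, but it should treat the value as meaningful only when `ndata` is at least 1.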
  3. 28 Apr, 2022 1 commit
  4. 18 Apr, 2022 1 commit
  5. 11 Apr, 2022 1 commit
    • io_uring: flag the fact that linked file assignment is sane · c4212f3e
      Jens Axboe committed
      Give applications a way to tell if the kernel supports sane linked files,
      as in files being assigned at the right time to be able to reliably
      do <open file direct into slot X><read file from slot X> while using
      IOSQE_IO_LINK to order them.
      
      Not really a bug fix, but flag it as such so that it gets pulled in with
      backports of the deferred file assignment.
      
      Fixes: 6bf9c47a ("io_uring: defer file assignment")
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
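A minimal sketch of how an application might test for this, assuming the flag is reported in the `features` field that `io_uring_setup(2)` fills in and that it is named `IORING_FEAT_LINKED_FILE` with value `1U << 12`. Neither the name nor the value is stated in the commit message above, so check both against your `<linux/io_uring.h>`. The helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value; the fallback is used only when the uapi header
 * has not already defined the flag. */
#ifndef IORING_FEAT_LINKED_FILE
#define IORING_FEAT_LINKED_FILE (1U << 12)
#endif

/*
 * features: the feature bitmask reported by the kernel at ring setup.
 * Returns nonzero when an <open direct into slot X> -> <read slot X>
 * chain ordered with IOSQE_IO_LINK can be relied on.
 */
static int linked_file_assignment_is_sane(uint32_t features)
{
    return (features & IORING_FEAT_LINKED_FILE) != 0;
}
```

When the helper returns zero, one conservative option is to submit the open and wait for its completion before queuing the read, rather than linking the two.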
  6. 04 Apr, 2022 1 commit
  7. 03 Apr, 2022 1 commit
  8. 02 Apr, 2022 2 commits
  9. 01 Apr, 2022 1 commit
  10. 30 Mar, 2022 1 commit
  11. 29 Mar, 2022 5 commits
  12. 25 Mar, 2022 1 commit
    • mm: madvise: MADV_DONTNEED_LOCKED · 9457056a
      Johannes Weiner committed
      MADV_DONTNEED historically rejects mlocked ranges, but with MLOCK_ONFAULT
      and MCL_ONFAULT allowing to mlock without populating, there are valid use
      cases for depopulating locked ranges as well.
      
      Users mlock memory to protect secrets.  There are allocators for secure
      buffers that want in-use memory generally mlocked, but cleared and
      invalidated memory to give up the physical pages.  This could be done with
      explicit munlock -> mlock calls on free -> alloc of course, but that adds
      two unnecessary syscalls, heavy mmap_sem write locks, vma splits and
      re-merges - only to get rid of the backing pages.
      
      Users also mlockall(MCL_ONFAULT) to suppress sustained paging, but are
      okay with on-demand initial population.  It seems valid to selectively
      free some memory during the lifetime of such a process, without having to
      mess with its overall policy.
      
      Why add a separate flag? Isn't this a pretty niche usecase?
      
      - MADV_DONTNEED has been bailing on locked vmas forever. It's at least
        conceivable that someone, somewhere is relying on mlock to protect
        data from perhaps broader invalidation calls. Changing this behavior
        now could lead to quiet data corruption.
      
      - It also clarifies expectations around MADV_FREE and maybe
        MADV_REMOVE. It avoids the situation where one quietly behaves
        differently from the others. MADV_FREE_LOCKED can be added later.
      
      - The combination of mlock() and madvise() in the first place is
        probably niche. But where it happens, I'd say that dropping pages
        from a locked region once they don't contain secrets or won't page
        anymore is much saner than relying on mlock to protect memory from
        speculative or errant invalidation calls. It's just that we can't
        change the default behavior because of the two previous points.
      
      Given that, an explicit new flag seems to make the most sense.
      
      [hannes@cmpxchg.org: fix mips build]
      
      Link: https://lkml.kernel.org/r/20220304171912.305060-1-hannes@cmpxchg.org
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
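The allocator use case from this commit can be sketched as follows. This is an illustration under assumptions, not code from the patch: `release_locked_range()` is a hypothetical helper, the fallback value 24 for `MADV_DONTNEED_LOCKED` is assumed for libc headers that predate the flag, and kernels that do not know the flag are assumed to fail with `EINVAL`, in which case the sketch falls back to the munlock -> madvise -> mlock sequence the commit message calls wasteful.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Assumed value, used only when the libc headers lack the flag. */
#ifndef MADV_DONTNEED_LOCKED
#define MADV_DONTNEED_LOCKED 24
#endif

/*
 * Clear a (possibly mlocked) range and give its physical pages back.
 * addr must be page-aligned. Returns 0 on success, -errno on failure.
 */
static int release_locked_range(void *addr, size_t len)
{
    /* Wipe the secrets before the pages are dropped. */
    memset(addr, 0, len);

    /* Preferred path: one call, the range stays locked. */
    if (madvise(addr, len, MADV_DONTNEED_LOCKED) == 0)
        return 0;
    if (errno != EINVAL)
        return -errno;

    /* Fallback path: unlock, drop the pages, lock again. */
    if (munlock(addr, len) != 0)
        return -errno;
    if (madvise(addr, len, MADV_DONTNEED) != 0)
        return -errno;
    if (mlock(addr, len) != 0)
        return -errno;
    return 0;
}
```

On a private anonymous mapping, the next access to the range faults in fresh zero-filled pages, so the allocator can hand the buffer out again without another syscall.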
  13. 24 Mar, 2022 4 commits
  14. 23 Mar, 2022 1 commit
    • userfaultfd: provide unmasked address on page-fault · 824ddc60
      Nadav Amit committed
      Userfaultfd is supposed to provide the full address (i.e., unmasked) of
      the faulting access back to userspace.  However, that has not been the
      case for quite some time.
      
      Even running "userfaultfd_demo" from the userfaultfd man page provides the
      wrong output (and contradicts the man page).  Notice that
      "UFFD_EVENT_PAGEFAULT event" shows the masked address (7fc5e30b3000) and
      not the first read address (0x7fc5e30b300f).
      
      	Address returned by mmap() = 0x7fc5e30b3000
      
      	fault_handler_thread():
      	    poll() returns: nready = 1; POLLIN = 1; POLLERR = 0
      	    UFFD_EVENT_PAGEFAULT event: flags = 0; address = 7fc5e30b3000
      		(uffdio_copy.copy returned 4096)
      	Read address 0x7fc5e30b300f in main(): A
      	Read address 0x7fc5e30b340f in main(): A
      	Read address 0x7fc5e30b380f in main(): A
      	Read address 0x7fc5e30b3c0f in main(): A
      
      The exact address is useful for various reasons and specifically for
      prefetching decisions.  If it is known that the memory is populated by
      certain objects whose size is not page-aligned, then based on the faulting
      address, the uffd-monitor can decide whether to prefetch and prefault the
      adjacent page.
      
      This bug has been in the kernel for quite some time: since commit
      1a29d85e ("mm: use vmf->address instead of of vmf->virtual_address"),
      which dates back to 2016.  A concern has been
      raised that existing userspace applications might rely on the old/wrong
      behavior in which the address is masked.  Therefore, it was suggested to
      provide the masked address unless the user explicitly asks for the exact
      address.
      
      Add a new userfaultfd feature UFFD_FEATURE_EXACT_ADDRESS to direct
      userfaultfd to provide the exact address.  Add a new "real_address" field
      to vmf to hold the unmasked address.  Provide the address to userspace
      accordingly.
      
      Initialize real_address in various code-paths to be consistent with
      address, even when it is not used, to be on the safe side.
      
      [namit@vmware.com: initialize real_address on all code paths, per Jan]
        Link: https://lkml.kernel.org/r/20220226022655.350562-1-namit@vmware.com
      [akpm@linux-foundation.org: fix typo in comment, per Jan]
      
      Link: https://lkml.kernel.org/r/20220218041003.3508-1-namit@vmware.com
      Signed-off-by: Nadav Amit <namit@vmware.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: David Hildenbrand <david@redhat.com>
      Acked-by: Mike Rapoport <rppt@linux.ibm.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
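The prefetching decision mentioned in this commit can be illustrated with a small sketch. It assumes, purely for illustration, that the monitored region is packed with fixed-size objects starting at `region_base`; the function names and that layout are hypothetical and do not come from the patch. The first helper reproduces the masking that turns 0x7fc5e30b300f into 0x7fc5e30b3000 in the demo output, and the second shows what a monitor could decide once it receives the exact address.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Round an address down to its page boundary (page_size is a power of two). */
static uint64_t page_mask_address(uint64_t addr, uint64_t page_size)
{
    return addr & ~(page_size - 1);
}

/*
 * Decide whether the object that contains fault_addr extends past the
 * faulting page. Objects of obj_size bytes are assumed to be laid out
 * back to back from region_base, and fault_addr >= region_base.
 */
static bool should_prefetch_next_page(uint64_t fault_addr,
                                      uint64_t region_base,
                                      uint64_t obj_size,
                                      uint64_t page_size)
{
    uint64_t index = (fault_addr - region_base) / obj_size;
    uint64_t obj_last = region_base + (index + 1) * obj_size - 1;

    return page_mask_address(obj_last, page_size) !=
           page_mask_address(fault_addr, page_size);
}
```

With only the masked address, every fault in a page looks like a fault at offset 0, so a monitor cannot tell an object that fits in the page from one that straddles the boundary.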
  15. 21 Mar, 2022 2 commits
  16. 18 Mar, 2022 12 commits
  17. 17 Mar, 2022 1 commit
  18. 16 Mar, 2022 1 commit
  19. 14 Mar, 2022 2 commits