1. 27 May, 2021 13 commits
    • KVM: selftests: refactor vm_mem_backing_src_type flags · b3784bc2
      Axel Rasmussen committed
      Each struct vm_mem_backing_src_alias has a flags field, which denotes
      the flags used to mmap() an area of that type. Previously, this field
      never included MAP_PRIVATE | MAP_ANONYMOUS, because
      vm_userspace_mem_region_add assumed that *all* types would always use
      those flags, and so it hardcoded them.
      
      In a follow-up commit, we'll add a new type: shmem. Areas of this type
      must not have MAP_PRIVATE | MAP_ANONYMOUS, and instead they must have
      MAP_SHARED.
      
      So, refactor things. Make it so that the flags field of
      struct vm_mem_backing_src_alias really is a complete set of flags, and
      don't add in any extras in vm_userspace_mem_region_add. This will let us
      easily tack on shmem.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-7-axelrasmussen@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
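As a rough illustration of the refactor described in this commit, the sketch below stores the complete mmap() flag set in each table entry, so no caller has to OR in MAP_PRIVATE | MAP_ANONYMOUS. The `struct backing_src` type, the `backing_srcs` table and `backing_src_flags()` are hypothetical names made up for this example; they are not the selftest library's real definitions.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/* Hypothetical stand-in for struct vm_mem_backing_src_alias. */
struct backing_src {
	const char *name;
	int flags;	/* the complete set of flags handed to mmap() */
};

static const struct backing_src backing_srcs[] = {
	{ "anonymous", MAP_PRIVATE | MAP_ANONYMOUS },
	{ "shmem",     MAP_SHARED },	/* must not carry MAP_PRIVATE | MAP_ANONYMOUS */
};

/* Return the full mmap() flags for a named type, or -1 if the name is unknown. */
static int backing_src_flags(const char *name)
{
	size_t i;

	assert(name != NULL);
	for (i = 0; i < sizeof(backing_srcs) / sizeof(backing_srcs[0]); i++) {
		if (strcmp(backing_srcs[i].name, name) == 0)
			return backing_srcs[i].flags;
	}
	return -1;
}
```

With the flags stored per type, adding a new type is a one-line table change rather than a special case in the mapping code.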
    • KVM: selftests: allow different backing source types · 0368c2c1
      Axel Rasmussen committed
      Add an argument which lets us specify a different backing memory type
      for the test. The default is just to use anonymous, matching existing
      behavior.
      
      This is in preparation for testing UFFD minor faults. For that, we'll
      need to use a new backing memory type which is set up with MAP_SHARED.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-6-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: compute correct demand paging size · 32ffa4f7
      Axel Rasmussen committed
      This is a preparatory commit needed before we can use different kinds of
      backing pages for guest memory.
      
      Previously, we used perf_test_args.host_page_size, which is the host's
      native page size (commonly 4K). For VM_MEM_SRC_ANONYMOUS this turns out
      to be okay, but in a follow-up commit we want to allow using different
      kinds of backing memory.
      
      Take VM_MEM_SRC_ANONYMOUS_HUGETLB for example. Without this change, if
      we used that backing page type, when we issued a UFFDIO_COPY ioctl we'd
      only do so with 4K, rather than the full 2M of a backing hugepage. In
      this case, UFFDIO_COPY returns -EINVAL (__mcopy_atomic_hugetlb checks
      the size).
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-5-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
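The size mismatch described in this commit can be sketched as follows: the length used for a copy has to be the backing page size, not the host's native page size, and the fault address has to be aligned down to that size. The enum, `demand_paging_size()` and `copy_start()` are hypothetical helpers for illustration, and the 4K and 2M values are simply the example sizes from the commit message; real sizes depend on the system.

```c
#include <assert.h>
#include <stdint.h>

#define SZ_4K (4ULL << 10)
#define SZ_2M (2ULL << 20)

/* Hypothetical, pared-down list of backing types. */
enum backing_type { BACKING_ANON, BACKING_ANON_HUGETLB };

/* Size of one backing page, i.e. the length a single copy must cover. */
static uint64_t demand_paging_size(enum backing_type type)
{
	return type == BACKING_ANON_HUGETLB ? SZ_2M : SZ_4K;
}

/* Align a faulting address down to the start of its backing page. */
static uint64_t copy_start(uint64_t fault_addr, uint64_t size)
{
	/* size must be a non-zero power of two for the mask to work */
	assert(size != 0 && (size & (size - 1)) == 0);
	return fault_addr & ~(size - 1);
}
```

For a fault at 0x201234 in a hugetlb-backed region, this yields a 2M copy starting at 0x200000, where a 4K length would be rejected.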
    • KVM: selftests: simplify setup_demand_paging error handling · 25408e5a
      Axel Rasmussen committed
      A small cleanup. Our caller writes:
      
        r = setup_demand_paging(...);
        if (r < 0) exit(-r);
      
      Since we're just going to exit anyway, instead of returning an error we
      can just re-use TEST_ASSERT. This makes the caller simpler, as well as
      the function itself: no need to write out branches, etc.
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-3-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Print a message if /dev/kvm is missing · 2aab4b35
      David Matlack committed
      If a KVM selftest is run on a machine without /dev/kvm, it will exit
      silently. Make it easy to tell what's happening by printing an error
      message.
      
      Opportunistically consolidate all codepaths that open /dev/kvm into a
      single function so they all print the same message.
      
      This slightly changes the semantics of vm_is_unrestricted_guest() by
      changing a TEST_ASSERT() to exit(KSFT_SKIP). However,
      vm_is_unrestricted_guest() is only called in one place
      (x86_64/mmio_warning_test.c), and that is to determine whether the test
      should be skipped.
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210511202120.1371800-1-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
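A minimal sketch of the consolidation idea in this commit: one helper opens the device node and prints a message on failure, and a thin wrapper turns that failure into a skip. The function names here are hypothetical and not taken from the selftest library; the value 4 for KSFT_SKIP matches the exit status shown in the hardware_disable_test output elsewhere in this log.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>

#define KSFT_SKIP 4	/* kselftest exit code for a skipped test */

/* Open path; on failure print one consistent message and return -1. */
static int open_dev_or_report(const char *path, int flags)
{
	int fd;

	assert(path != NULL);
	fd = open(path, flags);
	if (fd < 0)
		printf("%s not available, skipping test\n", path);
	return fd;
}

/* Same as above, but exit with KSFT_SKIP instead of returning an error. */
static int open_dev_or_exit(const char *path, int flags)
{
	int fd = open_dev_or_report(path, flags);

	if (fd < 0)
		exit(KSFT_SKIP);
	return fd;
}
```

Routing every open of the device through one function is what keeps the message identical across all tests.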
    • KVM: selftests: trivial comment/logging fixes · c887d6a1
      Axel Rasmussen committed
      Some trivial fixes I found while touching related code in this series,
      factored out into a separate commit for easier reviewing:
      
      - s/gor/got/ and add a newline in demand_paging_test.c
      - s/backing_src/src_type/ in a comment to be consistent with the real
        function signature in kvm_util.c
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Message-Id: <20210519200339.829146-2-axelrasmussen@google.com>
      Reviewed-by: Ben Gardon <bgardon@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Fix hang in hardware_disable_test · a10453c0
      David Matlack committed
      If /dev/kvm is not available then hardware_disable_test will hang
      indefinitely because the child process exits before posting to the
      semaphore for which the parent is waiting.
      
      Fix this by making the parent periodically check if the child has
      exited. We have to be careful to forward the child's exit status to
      preserve a KSFT_SKIP status.
      
      I considered just checking for /dev/kvm before creating the child
      process, but there are so many other reasons why the child could exit
      early that it seemed better to handle that as the general case.
      
      Tested:
      
      $ ./hardware_disable_test
      /dev/kvm not available, skipping test
      $ echo $?
      4
      $ modprobe kvm_intel
      $ ./hardware_disable_test
      $ echo $?
      0
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210514230521.2608768-1-dmatlack@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Ignore CPUID.0DH.1H in get_cpuid_test · 50bc913d
      David Matlack committed
      Similar to CPUID.0DH.0H, this entry depends on the vCPU's XCR0 register
      and IA32_XSS MSR. Since this test does not control for either before
      assigning the vCPU's CPUID, these entries will not necessarily match
      the supported CPUID exposed by KVM.
      
      This fixes get_cpuid_test on Cascade Lake CPUs.
      Suggested-by: Jim Mattson <jmattson@google.com>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210519211345.3944063-1-dmatlack@google.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: Fix 32-bit truncation of vm_get_max_gfn() · ef4c9f4f
      David Matlack committed
      vm_get_max_gfn() casts vm->max_gfn from a uint64_t to an unsigned int,
      which causes the upper 32 bits of the max_gfn to get truncated.
      
      Nobody noticed until now likely because vm_get_max_gfn() is only used
      as a mechanism to create a memslot in an unused region of the guest
      physical address space (the top), and the top of the 32-bit physical
      address space was always good enough.
      
      This fix reveals a bug in memslot_modification_stress_test which was
      trying to create a dummy memslot past the end of guest physical memory.
      Fix that by moving the dummy memslot lower.
      
      Fixes: 52200d0d ("KVM: selftests: Remove duplicate guest mode handling")
      Reviewed-by: Venkatesh Srinivas <venkateshs@chromium.org>
      Signed-off-by: David Matlack <dmatlack@google.com>
      Message-Id: <20210521173828.1180619-1-dmatlack@google.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • KVM: selftests: add a memslot-related performance benchmark · cad347fa
      Maciej S. Szmigiero committed
      This benchmark contains the following tests:
      * Map test, where the host unmaps guest memory while the guest writes to
      it (maps it).
      
      The test is designed in a way to make the unmap operation on the host
      take a negligible amount of time in comparison with the mapping
      operation in the guest.
      
      The test area is actually split in two: the first half is being mapped
      by the guest while the second half is being unmapped by the host.
      Then a guest <-> host sync happens and the areas are reversed.
      
      * Unmap test which is broadly similar to the above map test, but it is
      designed in an opposite way: to make the mapping operation in the guest
      take a negligible amount of time in comparison with the unmap operation
      on the host.
      This test is available in two variants: with per-page unmap operation
      or a chunked one (using 2 MiB chunk size).
      
      * Move active area test which involves moving the last (highest gfn)
      memslot a bit back and forth on the host while the guest is
      concurrently writing around the area being moved (including over the
      moved memslot).
      
      * Move inactive area test which is similar to the previous move active
      area test, but now guest writes all happen outside of the area being
      moved.
      
      * Read / write test in which the guest writes to the beginning of each
      page of the test area while the host writes to the middle of each such
      page.
      Then each side checks the values the other side has written.
      This particular test is not expected to give different results depending
      on particular memslots implementation, it is meant as a rough sanity
      check and to provide insight on the spread of test results expected.
      
      Each test performs its operation in a loop until a test period ends
      (this is 5 seconds by default, but it is configurable).
      Then the total count of loops done is divided by the actual elapsed
      time to give the test result.
      
      The tests have a configurable memslot cap, set with the "-s" test option;
      by default the system maximum is used.
      Each test is repeated a particular number of times (by default 20
      times), and the best result achieved is printed.
      
      The test memory area is divided equally between memslots, and the
      remainder is added to the last memslot.
      The test area size does not depend on the number of memslots in use.
      
      The tests also measure the time that it took to add all these memslots.
      The best result from the tests that use the whole test area is printed
      after all the requested tests are done.
      
      In general, these tests are designed to use as much memory as possible
      (within reason) while still doing 100+ loops even on high memslot counts
      with the default test length.
      Increasing the test runtime makes it increasingly more likely that some
      event will happen on the system during the test run, which might lower
      the test result.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Message-Id: <8d31bb3d92bc8fa33a9756fa802ee14266ab994e.1618253574.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
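Two pieces of arithmetic from this benchmark description can be sketched directly: how the test area is split across memslots (equal shares, remainder to the last slot) and how a result is computed (loops divided by elapsed time). The function names are hypothetical and the code is not taken from the benchmark itself.

```c
#include <assert.h>
#include <stdint.h>

/* Pages given to memslot idx when total_pages are split across nslots. */
static uint64_t memslot_pages(uint64_t total_pages, uint64_t nslots, uint64_t idx)
{
	uint64_t share;

	assert(nslots > 0 && idx < nslots);
	share = total_pages / nslots;
	if (idx == nslots - 1)
		share += total_pages % nslots;	/* remainder goes to the last slot */
	return share;
}

/* Test result: loops completed per second of actual elapsed time. */
static double loops_per_second(uint64_t loops, double elapsed_seconds)
{
	assert(elapsed_seconds > 0.0);
	return (double)loops / elapsed_seconds;
}
```

Because the shares always sum to total_pages, the test area size stays independent of the memslot count, as the description states.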
    • KVM: selftests: Keep track of memslots more efficiently · 22721a56
      Maciej S. Szmigiero committed
      The KVM selftest framework was using a simple list for keeping track of
      the memslots currently in use.
      This resulted in lookups and adding a single memslot being O(n), the
      latter due to linear scanning of the existing memslot set to check for
      the presence of any conflicting entries.
      
      Before this change, benchmarking a high count of memslots was more or less
      impossible, as pretty much all the benchmark time was spent in the
      selftest framework code.
      
      We can simply use an rbtree for keeping track of both gfn and hva.
      We don't need an interval tree for hva here as we can't have overlapping
      memslots because we allocate a completely new memory chunk for each new
      memslot.
      Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
      Reviewed-by: Andrew Jones <drjones@redhat.com>
      Message-Id: <b12749d47ee860468240cf027412c91b76dbe3db.1618253574.git.maciej.szmigiero@oracle.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
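To show why non-overlapping memslots make an ordered lookup sufficient, here is a small sketch. It deliberately swaps in a binary search over a sorted array in place of the rbtree the commit uses; the point is only that ordering by base gfn gives an O(log n) lookup with no interval tree. `struct region` and `find_region()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical memslot record: a range of guest frame numbers. */
struct region {
	uint64_t base_gfn;
	uint64_t npages;
};

/*
 * Find the region containing gfn. The array must be sorted by base_gfn
 * and the regions must not overlap. Returns NULL if no region matches.
 */
static const struct region *find_region(const struct region *regions,
					size_t count, uint64_t gfn)
{
	size_t lo = 0, hi = count;

	assert(regions != NULL || count == 0);
	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (gfn < regions[mid].base_gfn)
			hi = mid;
		else if (gfn >= regions[mid].base_gfn + regions[mid].npages)
			lo = mid + 1;
		else
			return &regions[mid];
	}
	return NULL;
}
```

A balanced tree gives the same ordered search while also keeping insertion logarithmic, which a sorted array does not.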
    • selftests: kvm: fix potential issue with ELF loading · a13534d6
      Paolo Bonzini committed
      vm_vaddr_alloc() sets up GVA to GPA mapping page by page; therefore, GPAs
      may not be contiguous if the same memslot is used for data and page table allocation.
      
      kvm_vm_elf_load() however expects a contiguous range of HVAs (and thus GPAs)
      because it does not try to read file data page by page.  Fix this mismatch
      by allocating memory in one step.
      Reported-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • selftests: kvm: make allocation of extra memory take effect · 39fe2fc9
      Zhenzhong Duan committed
      The extra memory pages are not allocated during VM creation.
      perf_test_util and kvm_page_table_test currently use them to allocate
      extra memory.
      
      Fix it by adding extra_mem_pages to the total memory calculation before
      allocating.
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
      Message-Id: <20210512043107.30076-1-zhenzhong.duan@intel.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  2. 07 May, 2021 7 commits
  3. 06 May, 2021 5 commits
    • selftests/vm: gup_test: test faulting in kernel, and verify pinnable pages · e44605a8
      Pavel Tatashin committed
      When pages are pinned they can be faulted in userland and migrated, and
      they can be faulted right in kernel without migration.
      
      In either case, the pinned pages must end up being pinnable (not
      movable).
      
      Add a new test to gup_test, to help verify that the gup/pup
      (get_user_pages() / pin_user_pages()) behavior with respect to pinnable
      and movable pages is reasonable and correct.  Specifically, provide a
      way to:
      
      1) Verify that only "pinnable" pages are pinned.  This is checked
         automatically for you.
      
      2) Verify that gup/pup performance is reasonable.  This requires
         comparing benchmarks between doing gup/pup on pages that have been
         pre-faulted in from user space, vs.  doing gup/pup on pages that are
         not faulted in until gup/pup time (via FOLL_TOUCH).  This decision is
         controlled with the new -z command line option.
      
      Link: https://lkml.kernel.org/r/20210215161349.246722-15-pasha.tatashin@soleen.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • selftests/vm: gup_test: fix test flag · 79dbf135
      Pavel Tatashin committed
      In gup_test both gup_flags and test_flags use the same flags field.
      This is broken.
      
      Further, in the actual gup_test.c all the passed gup_flags are erased
      and unconditionally replaced with FOLL_WRITE.
      
      This means that test_flags are ignored, and code like this always
      performs the pin dump test:
      
        if (gup->flags & GUP_TEST_FLAG_DUMP_PAGES_USE_PIN)
                nr = pin_user_pages(addr, nr, gup->flags,
                                    pages + i, NULL);
        else
                nr = get_user_pages(addr, nr, gup->flags,
                                    pages + i, NULL);
        break;
      
      Add a new test_flags field, to allow raw gup_flags to work.  Add a new
      subcommand for DUMP_USER_PAGES_TEST to specify that the pin test should
      be performed.
      
      Remove the unconditional overwriting of gup_flags via FOLL_WRITE.  But
      preserve the previous behaviour where FOLL_WRITE was the default flag,
      and add a new option "-W" to unset FOLL_WRITE.
      
      Rename flags to gup_flags.
      
      With the fix, dump works like this:
      
        root@virtme:/# gup_test  -c
        ---- page #0, starting from user virt addr: 0x7f8acb9e4000
        page:00000000d3d2ee27 refcount:2 mapcount:1 mapping:0000000000000000
        index:0x0 pfn:0x100bcf
        anon flags: 0x300000000080016(referenced|uptodate|lru|swapbacked)
        raw: 0300000000080016 ffffd0e204021608 ffffd0e208df2e88 ffff8ea04243ec61
        raw: 0000000000000000 0000000000000000 0000000200000000 0000000000000000
        page dumped because: gup_test: dump_pages() test
        DUMP_USER_PAGES_TEST: done
      
        root@virtme:/# gup_test  -c -p
        ---- page #0, starting from user virt addr: 0x7fd19701b000
        page:00000000baed3c7d refcount:1025 mapcount:1 mapping:0000000000000000
        index:0x0 pfn:0x108008
        anon flags: 0x300000000080014(uptodate|lru|swapbacked)
        raw: 0300000000080014 ffffd0e204200188 ffffd0e205e09088 ffff8ea04243ee71
        raw: 0000000000000000 0000000000000000 0000040100000000 0000000000000000
        page dumped because: gup_test: dump_pages() test
        DUMP_USER_PAGES_TEST: done
      
      The refcount shows the difference between the pin and no-pin cases.
      Also change the type of nr from int to long, as it counts pages.
      
      Link: https://lkml.kernel.org/r/20210215161349.246722-14-pasha.tatashin@soleen.com
      Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ira Weiny <ira.weiny@intel.com>
      Cc: James Morris <jmorris@namei.org>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sasha Levin <sashal@kernel.org>
      Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
      Cc: Tyler Hicks <tyhicks@linux.microsoft.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • userfaultfd/selftests: add test exercising minor fault handling · f0fa9433
      Axel Rasmussen committed
      Fix a dormant bug in userfaultfd_events_test(), where we did `return
      faulting_process(0)` instead of `exit(faulting_process(0))`.  This
      caused the forked process to keep running, trying to execute any further
      test cases after the events test in parallel with the "real" process.
      
      Add a simple test case which exercises minor faults.  In short, it does
      the following:
      
      1. "Sets up" an area (area_dst) and a second shared mapping to the same
         underlying pages (area_dst_alias).
      
      2. Register one of these areas with userfaultfd, in minor fault mode.
      
      3. Start a second thread to handle any minor faults.
      
      4. Populate the underlying pages with the non-UFFD-registered side of
         the mapping. Basically, memset() each page with some arbitrary
         contents.
      
      5. Then, using the UFFD-registered mapping, read all of the page
         contents, asserting that the contents match expectations (we expect
         the minor fault handling thread can modify the page contents before
         resolving the fault).
      
      The minor fault handling thread, upon receiving an event, flips all the
      bits (~) in that page, just to prove that it can modify it in some
      arbitrary way.  Then it issues a UFFDIO_CONTINUE ioctl, to set up the
      mapping and resolve the fault.  The reading thread should wake up and
      see this modification.
      
      Currently the minor fault test is only enabled in hugetlb_shared mode,
      as this is the only configuration the kernel feature supports.
      
      Link: https://lkml.kernel.org/r/20210301222728.176417-7-axelrasmussen@google.com
      Signed-off-by: Axel Rasmussen <axelrasmussen@google.com>
      Reviewed-by: Peter Xu <peterx@redhat.com>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutný" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
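The aliasing that this test relies on (two shared mappings of the same underlying pages) can be sketched without userfaultfd at all. The example below is an assumption-laden simplification: it uses memfd_create() as the backing object, does not register anything with userfaultfd, and performs the bit flip inline where the real test would do it in the fault-handling thread. `alias_flip_demo()` is a hypothetical name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 1 if a bit flip made through one mapping is visible through the
 * other, 0 if it is not, and -1 if the setup failed.
 */
static int alias_flip_demo(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *dst, *alias;
	int fd, ok = 1;
	long i;

	assert(page > 0);
	fd = memfd_create("alias_demo", 0);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, page) != 0) {
		close(fd);
		return -1;
	}

	/* Two MAP_SHARED mappings of the same pages. */
	dst = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	alias = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	close(fd);	/* the mappings stay valid after the descriptor is closed */
	if (dst == MAP_FAILED || alias == MAP_FAILED) {
		if (dst != MAP_FAILED)
			munmap(dst, (size_t)page);
		if (alias != MAP_FAILED)
			munmap(alias, (size_t)page);
		return -1;
	}

	/* Populate through one side, then flip every bit (~) in the page. */
	memset(alias, 0x5a, (size_t)page);
	for (i = 0; i < page; i++)
		alias[i] = (unsigned char)~alias[i];

	/* Read back through the other side and check for the flipped value. */
	for (i = 0; i < page; i++) {
		if (dst[i] != 0xa5)
			ok = 0;
	}

	munmap(dst, (size_t)page);
	munmap(alias, (size_t)page);
	return ok;
}
```

In the real test the read through the registered mapping would block on a minor fault until the handler issues UFFDIO_CONTINUE; here the read simply succeeds because nothing is registered.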
    • mm: huge_memory: debugfs for file-backed THP split · fbe37501
      Zi Yan committed
      Further extend <debugfs>/split_huge_pages to accept
      "<path>,<pgoff_start>,<pgoff_end>" for file-backed THP split tests, since
      tmpfs may have files backed by THPs that are mapped nowhere.
      
      Update selftest program to test file-backed THP split too.
      
      Link: https://lkml.kernel.org/r/20210331235309.332292-2-zi.yan@sent.com
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Suggested-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Mika Penttila <mika.penttila@nextfour.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: huge_memory: a new debugfs interface for splitting THP tests · fa6c0231
      Zi Yan committed
      We did not have a direct user interface for splitting the compound page
      backing a THP, and there is no need for one unless we want to expose the THP
      implementation details to users.  Make <debugfs>/split_huge_pages accept a
      new command to do that.
      
      By writing "<pid>,<vaddr_start>,<vaddr_end>" to
      <debugfs>/split_huge_pages, THPs within the given virtual address range
      from the process with the given pid are split. It is used to test the
      split_huge_page function. In addition, a selftest program is added to
      tools/testing/selftests/vm to utilize the interface by splitting
      PMD THPs and PTE-mapped THPs.
      
      This does not change the old behavior, i.e., writing 1 to the interface
      to split all THPs in the system.
      
      Link: https://lkml.kernel.org/r/20210331235309.332292-1-zi.yan@sent.com
      Signed-off-by: Zi Yan <ziy@nvidia.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Mika Penttila <mika.penttila@nextfour.com>
      Cc: Sandipan Das <sandipan@linux.ibm.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
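A sketch of how a caller might build the "<pid>,<vaddr_start>,<vaddr_end>" command string described in this commit. The commit message only gives the field order; printing the addresses in hexadecimal with a 0x prefix is an assumption made for this example, and `format_split_cmd()` is a hypothetical helper. The sketch stops at formatting and does not write to debugfs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format "<pid>,<vaddr_start>,<vaddr_end>" into buf.
 * Returns the string length, or -1 if buf is too small.
 * Hex formatting of the addresses is an assumption of this sketch.
 */
static int format_split_cmd(char *buf, size_t len, int pid,
			    unsigned long vaddr_start, unsigned long vaddr_end)
{
	int n;

	assert(buf != NULL && vaddr_start <= vaddr_end);
	n = snprintf(buf, len, "%d,0x%lx,0x%lx", pid, vaddr_start, vaddr_end);
	if (n < 0 || (size_t)n >= len)
		return -1;
	return n;
}
```

A caller would then write the resulting string to <debugfs>/split_huge_pages, which typically requires root privileges.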
  4. 04 May, 2021 6 commits
  5. 01 May, 2021 5 commits
  6. 28 April, 2021 1 commit
  7. 27 April, 2021 3 commits
    • bpf: Fix propagation of 32 bit unsigned bounds from 64 bit bounds · 10bf4e83
      Daniel Borkmann committed
      Similar to b0270958 ("bpf: Fix propagation of 32-bit signed bounds
      from 64-bit bounds."), we also need to fix the propagation of 32 bit
      unsigned bounds from 64 bit counterparts. That is, really only set the
      u32_{min,max}_value when /both/ {umin,umax}_value safely fit in 32 bit
      space. For example, the register with a umin_value == 1 does /not/ imply
      that u32_min_value is also equal to 1, since umax_value could be much
      larger than 32 bit subregister can hold, and thus u32_min_value is in
      the interval [0,1] instead.
      
      Before the fix, the invalid tracking result is R2_w=inv1:
      
        [...]
        5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
        5: (35) if r2 >= 0x1 goto pc+1
        [...] // goto path
        7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
        7: (b6) if w2 <= 0x1 goto pc+1
        [...] // goto path
        9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smin_value=-9223372036854775807,smax_value=9223372032559808513,umin_value=1,umax_value=18446744069414584321,var_off=(0x1; 0xffffffff00000000),s32_min_value=1,s32_max_value=1,u32_max_value=1) R10=fp0
        9: (bc) w2 = w2
        10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv1 R10=fp0
        [...]
      
      After the fix, the correct tracking result is R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)):
      
        [...]
        5: R0_w=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0) R10=fp0
        5: (35) if r2 >= 0x1 goto pc+1
        [...] // goto path
        7: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,umin_value=1) R10=fp0
        7: (b6) if w2 <= 0x1 goto pc+1
        [...] // goto path
        9: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2=inv(id=0,smax_value=9223372032559808513,umax_value=18446744069414584321,var_off=(0x0; 0xffffffff00000001),s32_min_value=0,s32_max_value=1,u32_max_value=1) R10=fp0
        9: (bc) w2 = w2
        10: R0=inv1337 R1=ctx(id=0,off=0,imm=0) R2_w=inv(id=0,umax_value=1,var_off=(0x0; 0x1)) R10=fp0
        [...]
      
      Thus, same issue as in b0270958 holds for unsigned subregister tracking.
      Also, align __reg64_bound_u32() similarly to __reg64_bound_s32() as done in
      b0270958 to make them uniform again.
      
      Fixes: 3f50f132 ("bpf: Verifier, do explicit ALU32 bounds tracking")
      Reported-by: Manfred Paul (@_manfp)
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
    • selftests/bpf: Fix core_reloc test runner · bede0ebf
      Andrii Nakryiko committed
      Fix the failed-test checks in the core_reloc test runner, which allowed
      failing tests to pass quietly. Also add an extra check to make sure that
      expected-to-fail test cases with invalid names are caught as test failures
      anyway, as this is not an expected failure mode. Also fix mislabeled probed
      vs direct bitfield test cases.
      
      Fixes: 124a892d ("selftests/bpf: Test TYPE_EXISTS and TYPE_SIZE CO-RE relocations")
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-6-andrii@kernel.org
    • selftests/bpf: Fix field existence CO-RE reloc tests · 5a30eb23
      Andrii Nakryiko committed
      Negative field existence cases have a broken assumption that the FIELD_EXISTS
      CO-RE relo will fail for fields that match the name but have an incompatible
      type signature. That's not how CO-RE relocations generally behave. Types and
      fields that match by name but not by expected type are treated as non-matching
      candidates and are skipped. An error is reported later if no matching candidate
      was found. That's what happens for most relocations, but existence relocations
      (FIELD_EXISTS and TYPE_EXISTS) are more permissive: they are designed to
      return 0 or 1, depending on whether a match is found. This makes it easy to
      handle name-conflicting but incompatible types in BPF code. Combined with
      ___flavor suffixes, it's possible to handle pretty much any structural type
      change in the kernel within compile-once BPF source code.
      
      So, long story short, negative field existence test cases are invalid in their
      assumptions, so this patch reworks them into a single consolidated positive
      case that doesn't match any of the fields.
      
      Fixes: c7566a69 ("selftests/bpf: Add field existence CO-RE relocs tests")
      Reported-by: Lorenz Bauer <lmb@cloudflare.com>
      Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Acked-by: Lorenz Bauer <lmb@cloudflare.com>
      Link: https://lore.kernel.org/bpf/20210426192949.416837-5-andrii@kernel.org