1. 20 Aug 2021 (6 commits)
  2. 28 Jun 2021 (2 commits)
  3. 24 Jun 2021 (1 commit)
  4. 18 Jun 2021 (3 commits)
  5. 17 Jun 2021 (1 commit)
    • sched/cpufreq: Consider reduced CPU capacity in energy calculation · 8f1b971b
      Authored by Lukasz Luba
      Energy Aware Scheduling (EAS) needs to predict the decisions made by
      SchedUtil; map_util_freq() exists to do that.
      
      There are corner cases where the maximum allowed frequency might be
      reduced (due to thermal capping). SchedUtil, as a CPUFreq governor, is
      aware of that, but EAS is not. This patch aims to address it.
      
      SchedUtil stores the maximum allowed frequency in the
      'sugov_policy::next_freq' field. EAS has to predict that value, which
      is the frequency actually used. That value is produced by a call to
      cpufreq_driver_resolve_freq(), which clamps it to the CPUFreq policy
      limits. The existing code gives EAS no way to predict that real
      frequency, which leads to energy estimation errors.
      
      To avoid wrong energy estimation in EAS (due to frequency
      mis-prediction), make sure that the step which calculates the
      Performance Domain frequency is also aware of the allowed CPU capacity.
      
      Furthermore, modify map_util_freq() so that it no longer extends the
      frequency value. Instead, use map_util_perf() to extend the util value
      in both places, SchedUtil and EAS, but for EAS clamp it to the maximum
      allowed CPU capacity. In the end, both subsystems show the same
      desirable behavior and agree on the real CPU frequency.
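
      A minimal sketch of the resulting helpers (reconstructed from the
      description above; the exact 25% margin and the call-site names are
      assumptions based on mainline, not quoted from this commit):

        /* Sketch of include/linux/sched/cpufreq.h after the change. */
        static inline unsigned long map_util_freq(unsigned long util,
                                                  unsigned long freq,
                                                  unsigned long cap)
        {
                /* No longer extends the value: a pure util -> freq mapping. */
                return freq * util / cap;
        }

        static inline unsigned long map_util_perf(unsigned long util)
        {
                /* Extend util by a 25% margin instead (assumed value). */
                return util + (util >> 2);
        }

        /* EAS side, em_cpu_energy()-like logic: extend util exactly as
         * SchedUtil would, then clamp to the allowed CPU capacity so the
         * predicted frequency matches the thermally capped one.
         */
        max_util = map_util_perf(max_util);
        max_util = min(max_util, allowed_cpu_cap);
        freq = map_util_freq(max_util, cpu_max_freq, scale_cpu);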
      Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (For the schedutil part)
      Link: https://lore.kernel.org/r/20210614191238.23224-1-lukasz.luba@arm.com
  6. 13 Jun 2021 (1 commit)
    • mm: relocate 'write_protect_seq' in struct mm_struct · 2e302543
      Authored by Feng Tang
      The 0day robot reported a 9.2% regression in the will-it-scale mmap1
      test case [1], caused by commit 57efa1fe ("mm/gup: prevent gup_fast
      from racing with COW during fork").
      
      Further debugging shows that the regression occurs because the commit
      changes the offset of the hot field 'mmap_lock' inside 'struct
      mm_struct', and with it the field's cache alignment.
      
      The perf data shows that contention on 'mmap_lock' is very severe,
      taking around 95% of CPU cycles. 'mmap_lock' is an rw_semaphore:
      
              struct rw_semaphore {
                      atomic_long_t count;	/* 8 bytes */
                      atomic_long_t owner;	/* 8 bytes */
                      struct optimistic_spin_queue osq; /* spinner MCS lock */
                      ...
      
      Before commit 57efa1fe added 'write_protect_seq', the structure
      happened to have an optimal cache-alignment layout, as Linus explained:
      
       "and before the addition of the 'write_protect_seq' field, the
        mmap_sem was at offset 120 in 'struct mm_struct'.
      
        Which meant that count and owner were in two different cachelines,
        and then when you have contention and spend time in
        rwsem_down_write_slowpath(), this is probably *exactly* the kind
        of layout you want.
      
        Because first the rwsem_write_trylock() will do a cmpxchg on the
        first cacheline (for the optimistic fast-path), and then in the
        case of contention, rwsem_down_write_slowpath() will just access
        the second cacheline.
      
        Which is probably just optimal for a load that spends a lot of
        time contended - new waiters touch that first cacheline, and then
        they queue themselves up on the second cacheline."
      
      After the commit, the rw_semaphore is at offset 128, which means that
      the 'count' and 'owner' fields are now in the same cacheline, causing
      more cache bouncing.
      
      Currently there are three "#ifdef CONFIG_XXX" blocks before 'mmap_lock'
      that affect its offset:
      
        CONFIG_MMU
        CONFIG_MEMBARRIER
        CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
      
      The layout above is from a 64-bit system with 0day's default kernel
      config (similar to RHEL-8.3's config), in which all three options are
      'y'. The layout can vary with different kernel configs.
      
      Re-laying-out a structure is usually a double-edged sword: it can help
      one case but hurt others.  For this case, one solution presents itself:
      since the newly added 'write_protect_seq' is a 4-byte seqcount_t (when
      CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4-byte hole in
      'mm_struct' will not change the alignment of any other field, while
      fixing the regression.
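
      A toy layout illustrating the idea (illustrative only; apart from
      'write_protect_seq' and 'mmap_lock' the field names are placeholders,
      not the real 'mm_struct'):

        struct example_mm {
                ...
                spinlock_t page_table_lock;    /* 4 bytes; without the    */
                                               /* next field, 4 bytes of  */
                                               /* padding would follow,   */
                                               /* since rw_semaphore      */
                                               /* needs 8-byte alignment  */
                seqcount_t write_protect_seq;  /* 4 bytes, fills the      */
                                               /* hole, so mmap_lock's    */
                                               /* offset stays unchanged  */
                struct rw_semaphore mmap_lock;
                ...
        };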
      
      Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 10 Jun 2021 (1 commit)
  8. 09 Jun 2021 (2 commits)
    • kvm: fix previous commit for 32-bit builds · 4422829e
      Authored by Paolo Bonzini
      array_index_nospec() does not work for uint64_t on 32-bit builds.
      However, the size of a memory slot must be less than 20 bits wide
      on those systems, since the memory slot must fit in the user
      address space.  So just store it in an unsigned long.
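
      A sketch of the resulting code (paraphrased; the full function is
      shown under the next entry):

        /* gfn_t is a u64, and array_index_nospec() cannot mask a u64 on
         * 32-bit builds.  The offset fits in an unsigned long anyway,
         * because the slot must fit in the user address space.
         */
        unsigned long offset = gfn - slot->base_gfn;

        offset = array_index_nospec(offset, slot->npages);
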
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: avoid speculation-based attacks from out-of-range memslot accesses · da27a83f
      Authored by Paolo Bonzini
      KVM's mechanism for accessing guest memory translates a guest physical
      address (gpa) to a host virtual address using the right-shifted gpa
      (also known as gfn) and a struct kvm_memory_slot.  The translation is
      performed in __gfn_to_hva_memslot using the following formula:
      
            hva = slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE
      
      It is expected that gfn falls within the boundaries of the guest's
      physical memory.  However, a guest can access invalid physical addresses
      in such a way that the gfn is invalid.
      
      __gfn_to_hva_memslot is called from kvm_vcpu_gfn_to_hva_prot, which
      first retrieves a memslot through __gfn_to_memslot.  While
      __gfn_to_memslot does check whether the gfn falls within the boundaries
      of the guest's physical memory, a CPU can speculate the result of the
      check and continue execution speculatively using an illegal gfn. The
      speculation
      can result in calculating an out-of-bounds hva.  If the resulting host
      virtual address is used to load another guest physical address, this
      is effectively a Spectre gadget consisting of two consecutive reads,
      the second of which is data dependent on the first.
      
      Right now it's not clear if there are any cases in which this is
      exploitable.  One interesting case was reported by the original author
      of this patch, and involves visiting guest page tables on x86.  Right
      now these are not vulnerable because the hva read goes through get_user(),
      which contains an LFENCE speculation barrier.  However, there are
      patches in progress for x86 uaccess.h to mask kernel addresses instead of
      using LFENCE; once these land, a guest could use speculation to read
      from the VMM's ring 3 address space.  Other architectures such as ARM
      already use the address-masking method and would be susceptible to
      the same kind of data-dependent access gadget.  Therefore, this patch
      proactively protects against these attacks by masking out-of-bounds
      gfns in __gfn_to_hva_memslot, which blocks speculation of invalid hvas.
      
      Sean Christopherson noted that this patch does not cover
      kvm_read_guest_offset_cached.  This however is limited to a few bytes
      past the end of the cache, and therefore it is unlikely to be useful in
      the context of building a chain of data dependent accesses.
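
      A sketch of the resulting __gfn_to_hva_memslot() (reconstructed from
      the description above; the comment is mine):

        static inline unsigned long
        __gfn_to_hva_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
        {
                /* The gfn was range-checked when the memslot was looked
                 * up, but the CPU may speculate past that check.  Mask
                 * the offset so that speculation cannot yield an
                 * out-of-bounds hva.
                 */
                unsigned long offset = gfn - slot->base_gfn;

                offset = array_index_nospec(offset, slot->npages);
                return slot->userspace_addr + offset * PAGE_SIZE;
        }
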
      Reported-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
      Co-developed-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 05 Jun 2021 (1 commit)
  10. 04 Jun 2021 (3 commits)
  11. 03 Jun 2021 (1 commit)
    • sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling · 68d7a190
      Authored by Dietmar Eggemann
      The internal util_est UTIL_AVG_UNCHANGED flag, which is used to prevent
      unnecessary util_est updates, uses the LSB of util_est.enqueued. It is
      exposed via _task_util_est() (and task_util_est()).
      
      Commit 92a801e5 ("sched/fair: Mask UTIL_AVG_UNCHANGED usages")
      mentions that the LSB is lost to util_est resolution, but
      find_energy_efficient_cpu() checks whether task_util_est() returns 0
      in order to return prev_cpu early.
      
      _task_util_est() returns the maximum of util_est.ewma and
      util_est.enqueued or'ed with UTIL_AVG_UNCHANGED.
      So task_util_est(), which returns the maximum of task_util() and
      _task_util_est(), will never return 0 under the default
      SCHED_FEAT(UTIL_EST, true).
      
      To fix this, use the MSB of util_est.enqueued instead and keep the flag
      internal to util_est, i.e. don't export it via _task_util_est().

      The maximum possible util_avg value for a task is 1024, so the MSB of
      'unsigned int util_est.enqueued' is never needed to store a util value.
      
      As a caveat, the code behind the util_est_se trace point has to filter
      out UTIL_AVG_UNCHANGED to see the real util_est.enqueued value, which
      should be easy to do.
      
      This also fixes an issue reported by Xuewen Yan: util_est_update()
      applied UTIL_AVG_UNCHANGED only to the subtrahend of the equation:
      
        last_enqueued_diff = ue.enqueued - (task_util() | UTIL_AVG_UNCHANGED)
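
      A sketch of the fix (the flag value and the stripped read are
      reconstructed from the description above):

        /* util_avg of a task never exceeds 1024, so bit 31 of the 32-bit
         * 'util_est.enqueued' is free to carry the flag; previously the
         * flag lived in bit 0 (the LSB).
         */
        #define UTIL_AVG_UNCHANGED 0x80000000

        static inline unsigned long _task_util_est(struct task_struct *p)
        {
                struct util_est ue = READ_ONCE(p->se.avg.util_est);

                /* Strip the flag: it stays internal to util_est. */
                return max(ue.ewma, ue.enqueued & ~UTIL_AVG_UNCHANGED);
        }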
      
      Fixes: b89997aa ("sched/pelt: Fix task util_est update filtering")
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Xuewen Yan <xuewen.yan@unisoc.com>
      Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lore.kernel.org/r/20210602145808.1562603-1-dietmar.eggemann@arm.com
  12. 02 Jun 2021 (2 commits)
  13. 01 Jun 2021 (1 commit)
  14. 31 May 2021 (1 commit)
  15. 27 May 2021 (3 commits)
  16. 26 May 2021 (2 commits)
  17. 25 May 2021 (3 commits)
  18. 24 May 2021 (2 commits)
  19. 23 May 2021 (1 commit)
  20. 22 May 2021 (1 commit)
  21. 20 May 2021 (1 commit)
  22. 19 May 2021 (1 commit)
    • platform/surface: aggregator: avoid clang -Wconstant-conversion warning · ba6e1d84
      Authored by Arnd Bergmann
      Clang complains about the assignment of SSAM_ANY_IID to
      ssam_device_uid->instance:
      
      drivers/platform/surface/surface_aggregator_registry.c:478:25: error: implicit conversion from 'int' to '__u8' (aka 'unsigned char') changes value from 65535 to 255 [-Werror,-Wconstant-conversion]
              { SSAM_VDEV(HUB, 0x02, SSAM_ANY_IID, 0x00) },
              ~                      ^~~~~~~~~~~~
      include/linux/surface_aggregator/device.h:71:23: note: expanded from macro 'SSAM_ANY_IID'
       #define SSAM_ANY_IID            0xffff
                                      ^~~~~~
      include/linux/surface_aggregator/device.h:126:63: note: expanded from macro 'SSAM_VDEV'
              SSAM_DEVICE(SSAM_DOMAIN_VIRTUAL, SSAM_VIRTUAL_TC_##cat, tid, iid, fun)
                                                                           ^~~
      include/linux/surface_aggregator/device.h:102:41: note: expanded from macro 'SSAM_DEVICE'
              .instance = ((iid) != SSAM_ANY_IID) ? (iid) : 0,                        \
                                                     ^~~
      
      The assignment never actually happens, but clang checks the type limits
      before checking whether the assignment is reachable. Replace the ?:
      operator with a __builtin_choose_expr() invocation, which avoids the
      warning for the branch that is not taken.
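
      A standalone illustration of the trick (the ANY_ID macro and struct
      are hypothetical stand-ins for the SSAM ones):

        #define ANY_ID 0xffff

        struct uid { unsigned char instance; };

        /* With ?: the compiler type-checks both arms, so the constant
         * 0xffff is flagged for the u8 field even though that arm is
         * never assigned.  __builtin_choose_expr() selects one arm at
         * compile time; the other is never converted, so no warning.
         */
        #define UID(iid) \
                { .instance = __builtin_choose_expr((iid) != ANY_ID, (iid), 0) }

        struct uid a = UID(0x07);   /* .instance == 7 */
        struct uid b = UID(ANY_ID); /* .instance == 0, no warning */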
      
      Fixes: eb0e90a8 ("platform/surface: aggregator: Add dedicated bus and device type")
      Cc: platform-driver-x86@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
      Link: https://lore.kernel.org/r/20210514200453.1542978-1-arnd@kernel.org
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>