1. 20 Aug 2021 (6 commits)
  2. 28 Jun 2021 (2 commits)
  3. 24 Jun 2021 (1 commit)
  4. 18 Jun 2021 (3 commits)
  5. 17 Jun 2021 (1 commit)
    • sched/cpufreq: Consider reduced CPU capacity in energy calculation · 8f1b971b
      Authored by Lukasz Luba
      Energy Aware Scheduling (EAS) needs to predict the decisions made by
      SchedUtil; map_util_freq() exists to do that.
      
      There are corner cases where the maximum allowed frequency might be
      reduced (due to thermal capping). SchedUtil, as a CPUFreq governor, is
      aware of that, but EAS is not. This patch aims to address it.
      
      SchedUtil stores the maximum allowed frequency in the
      'sugov_policy::next_freq' field. EAS has to predict that value, which
      is the frequency actually used. That value is produced by a call to
      cpufreq_driver_resolve_freq(), which clamps it to the CPUFreq policy
      limits. The existing code gives EAS no way to predict that real
      frequency, which leads to energy estimation errors.
      
      To avoid wrong energy estimation in EAS (due to frequency
      mis-prediction), make sure that the step which calculates the
      Performance Domain frequency is also aware of the allowed CPU capacity.
      
      Furthermore, modify map_util_freq() so that it no longer extends the
      frequency value. Instead, use map_util_perf() to extend the util value
      in both places, SchedUtil and EAS, but for EAS clamp it to the maximum
      allowed CPU capacity. In the end, both subsystems show the same
      desirable behavior and agree on the real CPU frequency.
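
      A minimal sketch of the resulting helpers (reconstructed from the
      description above; the exact 25% margin and the call-site names are
      assumptions based on mainline, not quoted from this commit):

        /* Sketch of include/linux/sched/cpufreq.h after the change. */
        static inline unsigned long map_util_freq(unsigned long util,
                                                  unsigned long freq,
                                                  unsigned long cap)
        {
                /* No longer extends the value: a pure util -> freq mapping. */
                return freq * util / cap;
        }

        static inline unsigned long map_util_perf(unsigned long util)
        {
                /* Extend util by a 25% margin instead (assumed value). */
                return util + (util >> 2);
        }

        /* EAS side, em_cpu_energy()-like logic: extend util exactly as
         * SchedUtil would, then clamp to the allowed CPU capacity so the
         * predicted frequency matches the thermally capped one.
         */
        max_util = map_util_perf(max_util);
        max_util = min(max_util, allowed_cpu_cap);
        freq = map_util_freq(max_util, cpu_max_freq, scale_cpu);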
      Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> (For the schedutil part)
      Link: https://lore.kernel.org/r/20210614191238.23224-1-lukasz.luba@arm.com
  6. 13 Jun 2021 (1 commit)
    • mm: relocate 'write_protect_seq' in struct mm_struct · 2e302543
      Authored by Feng Tang
      The 0day robot reported a 9.2% regression in the will-it-scale mmap1
      test case [1], caused by commit 57efa1fe ("mm/gup: prevent gup_fast
      from racing with COW during fork").
      
      Further debugging shows that the regression occurs because the commit
      changes the offset of the hot field 'mmap_lock' inside 'struct
      mm_struct', and with it the field's cache alignment.
      
      The perf data shows that contention on 'mmap_lock' is very severe,
      taking around 95% of CPU cycles. 'mmap_lock' is an rw_semaphore:
      
              struct rw_semaphore {
                      atomic_long_t count;	/* 8 bytes */
                      atomic_long_t owner;	/* 8 bytes */
                      struct optimistic_spin_queue osq; /* spinner MCS lock */
                      ...
      
      Before commit 57efa1fe added 'write_protect_seq', the structure
      happened to have an optimal cache-alignment layout, as Linus explained:
      
       "and before the addition of the 'write_protect_seq' field, the
        mmap_sem was at offset 120 in 'struct mm_struct'.
      
        Which meant that count and owner were in two different cachelines,
        and then when you have contention and spend time in
        rwsem_down_write_slowpath(), this is probably *exactly* the kind
        of layout you want.
      
        Because first the rwsem_write_trylock() will do a cmpxchg on the
        first cacheline (for the optimistic fast-path), and then in the
        case of contention, rwsem_down_write_slowpath() will just access
        the second cacheline.
      
        Which is probably just optimal for a load that spends a lot of
        time contended - new waiters touch that first cacheline, and then
        they queue themselves up on the second cacheline."
      
      After the commit, the rw_semaphore is at offset 128, which means that
      the 'count' and 'owner' fields are now in the same cacheline, causing
      more cache bouncing.
      
      Currently there are three "#ifdef CONFIG_XXX" blocks before 'mmap_lock'
      that affect its offset:
      
        CONFIG_MMU
        CONFIG_MEMBARRIER
        CONFIG_HAVE_ARCH_COMPAT_MMAP_BASES
      
      The layout above is from a 64-bit system with 0day's default kernel
      config (similar to RHEL-8.3's config), in which all three options are
      'y'. The layout can vary with different kernel configs.
      
      Re-laying-out a structure is usually a double-edged sword: it can help
      one case but hurt others.  For this case, one solution presents itself:
      since the newly added 'write_protect_seq' is a 4-byte seqcount_t (when
      CONFIG_DEBUG_LOCK_ALLOC=n), placing it into an existing 4-byte hole in
      'mm_struct' will not change the alignment of any other field, while
      fixing the regression.
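
      A toy layout illustrating the idea (illustrative only; apart from
      'write_protect_seq' and 'mmap_lock' the field names are placeholders,
      not the real 'mm_struct'):

        struct example_mm {
                ...
                spinlock_t page_table_lock;    /* 4 bytes; without the    */
                                               /* next field, 4 bytes of  */
                                               /* padding would follow,   */
                                               /* since rw_semaphore      */
                                               /* needs 8-byte alignment  */
                seqcount_t write_protect_seq;  /* 4 bytes, fills the      */
                                               /* hole, so mmap_lock's    */
                                               /* offset stays unchanged  */
                struct rw_semaphore mmap_lock;
                ...
        };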
      
      Link: https://lore.kernel.org/lkml/20210525031636.GB7744@xsang-OptiPlex-9020/ [1]
      Reported-by: kernel test robot <oliver.sang@intel.com>
      Signed-off-by: Feng Tang <feng.tang@intel.com>
      Reviewed-by: John Hubbard <jhubbard@nvidia.com>
      Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Peter Xu <peterx@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  7. 10 Jun 2021 (1 commit)
  8. 09 Jun 2021 (2 commits)
    • kvm: fix previous commit for 32-bit builds · 4422829e
      Authored by Paolo Bonzini
      array_index_nospec() does not work for uint64_t on 32-bit builds.
      However, the size of a memory slot must be less than 20 bits wide
      on those systems, since the memory slot must fit in the user
      address space.  So just store it in an unsigned long.
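
      A sketch of the resulting code (paraphrased; the full function is
      shown under the next entry):

        /* gfn_t is a u64, and array_index_nospec() cannot mask a u64 on
         * 32-bit builds.  The offset fits in an unsigned long anyway,
         * because the slot must fit in the user address space.
         */
        unsigned long offset = gfn - slot->base_gfn;

        offset = array_index_nospec(offset, slot->npages);
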
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
    • kvm: avoid speculation-based attacks from out-of-range memslot accesses · da27a83f
      Authored by Paolo Bonzini
      KVM's mechanism for accessing guest memory translates a guest physical
      address (gpa) to a host virtual address using the right-shifted gpa
      (also known as gfn) and a struct kvm_memory_slot.  The translation is
      performed in __gfn_to_hva_memslot using the following formula:
      
            hva = slot->userspace_addr + (gfn - slot->base_gfn) * PAGE_SIZE
      
      It is expected that gfn falls within the boundaries of the guest's
      physical memory.  However, a guest can access invalid physical addresses
      in such a way that the gfn is invalid.
      
      __gfn_to_hva_memslot is called from kvm_vcpu_gfn_to_hva_prot, which
      first retrieves a memslot through __gfn_to_memslot.  While
      __gfn_to_memslot does check whether the gfn falls within the boundaries
      of the guest's physical memory, a CPU can speculate the result of the
      check and continue execution speculatively using an illegal gfn. The
      speculation
      can result in calculating an out-of-bounds hva.  If the resulting host
      virtual address is used to load another guest physical address, this
      is effectively a Spectre gadget consisting of two consecutive reads,
      the second of which is data dependent on the first.
      
      Right now it's not clear if there are any cases in which this is
      exploitable.  One interesting case was reported by the original author
      of this patch, and involves visiting guest page tables on x86.  Right
      now these are not vulnerable because the hva read goes through get_user(),
      which contains an LFENCE speculation barrier.  However, there are
      patches in progress for x86 uaccess.h to mask kernel addresses instead of
      using LFENCE; once these land, a guest could use speculation to read
      from the VMM's ring 3 address space.  Other architectures such as ARM
      already use the address-masking method and would be susceptible to
      the same kind of data-dependent access gadget.  Therefore, this patch
      proactively protects against these attacks by masking out-of-bounds
      gfns in __gfn_to_hva_memslot, which blocks speculation of invalid hvas.
      
      Sean Christopherson noted that this patch does not cover
      kvm_read_guest_offset_cached.  This however is limited to a few bytes
      past the end of the cache, and therefore it is unlikely to be useful in
      the context of building a chain of data dependent accesses.
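
      A sketch of the resulting __gfn_to_hva_memslot() (reconstructed from
      the description above; the comment is mine):

        static inline unsigned long
        __gfn_to_hva_memslot(const struct kvm_memory_slot *slot, gfn_t gfn)
        {
                /* The gfn was range-checked when the memslot was looked
                 * up, but the CPU may speculate past that check.  Mask
                 * the offset so that speculation cannot yield an
                 * out-of-bounds hva.
                 */
                unsigned long offset = gfn - slot->base_gfn;

                offset = array_index_nospec(offset, slot->npages);
                return slot->userspace_addr + offset * PAGE_SIZE;
        }
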
      Reported-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
      Co-developed-by: Artemiy Margaritov <artemiy.margaritov@gmail.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
  9. 05 Jun 2021 (1 commit)
  10. 04 Jun 2021 (3 commits)
  11. 03 Jun 2021 (1 commit)
    • sched/fair: Fix util_est UTIL_AVG_UNCHANGED handling · 68d7a190
      Authored by Dietmar Eggemann
      The internal util_est UTIL_AVG_UNCHANGED flag, which is used to prevent
      unnecessary util_est updates, uses the LSB of util_est.enqueued. It is
      exposed via _task_util_est() (and task_util_est()).
      
      Commit 92a801e5 ("sched/fair: Mask UTIL_AVG_UNCHANGED usages")
      mentions that the LSB is lost to util_est resolution, but
      find_energy_efficient_cpu() checks whether task_util_est() returns 0
      in order to return prev_cpu early.
      
      _task_util_est() returns the maximum of util_est.ewma and
      util_est.enqueued or'ed with UTIL_AVG_UNCHANGED.
      So task_util_est(), which returns the maximum of task_util() and
      _task_util_est(), will never return 0 under the default
      SCHED_FEAT(UTIL_EST, true).
      
      To fix this, use the MSB of util_est.enqueued instead and keep the flag
      internal to util_est, i.e. don't export it via _task_util_est().

      The maximum possible util_avg value for a task is 1024, so the MSB of
      'unsigned int util_est.enqueued' is never needed to store a util value.
      
      As a caveat, the code behind the util_est_se trace point has to filter
      out UTIL_AVG_UNCHANGED to see the real util_est.enqueued value, which
      should be easy to do.
      
      This also fixes an issue reported by Xuewen Yan: util_est_update()
      applied UTIL_AVG_UNCHANGED only to the subtrahend of the equation:
      
        last_enqueued_diff = ue.enqueued - (task_util() | UTIL_AVG_UNCHANGED)
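
      A sketch of the fix (the flag value and the stripped read are
      reconstructed from the description above):

        /* util_avg of a task never exceeds 1024, so bit 31 of the 32-bit
         * 'util_est.enqueued' is free to carry the flag; previously the
         * flag lived in bit 0 (the LSB).
         */
        #define UTIL_AVG_UNCHANGED 0x80000000

        static inline unsigned long _task_util_est(struct task_struct *p)
        {
                struct util_est ue = READ_ONCE(p->se.avg.util_est);

                /* Strip the flag: it stays internal to util_est. */
                return max(ue.ewma, ue.enqueued & ~UTIL_AVG_UNCHANGED);
        }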
      
      Fixes: b89997aa ("sched/pelt: Fix task util_est update filtering")
      Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Reviewed-by: Xuewen Yan <xuewen.yan@unisoc.com>
      Reviewed-by: Vincent Donnefort <vincent.donnefort@arm.com>
      Reviewed-by: Vincent Guittot <vincent.guittot@linaro.org>
      Link: https://lore.kernel.org/r/20210602145808.1562603-1-dietmar.eggemann@arm.com
  12. 02 Jun 2021 (2 commits)
  13. 01 Jun 2021 (1 commit)
  14. 31 May 2021 (1 commit)
  15. 27 May 2021 (3 commits)
  16. 26 May 2021 (2 commits)
  17. 25 May 2021 (3 commits)
  18. 24 May 2021 (2 commits)
  19. 23 May 2021 (1 commit)
  20. 22 May 2021 (1 commit)
  21. 20 May 2021 (1 commit)
  22. 19 May 2021 (1 commit)
    • platform/surface: aggregator: avoid clang -Wconstant-conversion warning · ba6e1d84
      Authored by Arnd Bergmann
      Clang complains about the assignment of SSAM_ANY_IID to
      ssam_device_uid->instance:
      
      drivers/platform/surface/surface_aggregator_registry.c:478:25: error: implicit conversion from 'int' to '__u8' (aka 'unsigned char') changes value from 65535 to 255 [-Werror,-Wconstant-conversion]
              { SSAM_VDEV(HUB, 0x02, SSAM_ANY_IID, 0x00) },
              ~                      ^~~~~~~~~~~~
      include/linux/surface_aggregator/device.h:71:23: note: expanded from macro 'SSAM_ANY_IID'
       #define SSAM_ANY_IID            0xffff
                                      ^~~~~~
      include/linux/surface_aggregator/device.h:126:63: note: expanded from macro 'SSAM_VDEV'
              SSAM_DEVICE(SSAM_DOMAIN_VIRTUAL, SSAM_VIRTUAL_TC_##cat, tid, iid, fun)
                                                                           ^~~
      include/linux/surface_aggregator/device.h:102:41: note: expanded from macro 'SSAM_DEVICE'
              .instance = ((iid) != SSAM_ANY_IID) ? (iid) : 0,                        \
                                                     ^~~
      
      The assignment never actually happens, but clang checks the type limits
      before checking whether the assignment is reachable. Replace the ?:
      operator with a __builtin_choose_expr() invocation, which avoids the
      warning for the branch that is not taken.
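
      A standalone illustration of the trick (the ANY_ID macro and struct
      are hypothetical stand-ins for the SSAM ones):

        #define ANY_ID 0xffff

        struct uid { unsigned char instance; };

        /* With ?: the compiler type-checks both arms, so the constant
         * 0xffff is flagged for the u8 field even though that arm is
         * never assigned.  __builtin_choose_expr() selects one arm at
         * compile time; the other is never converted, so no warning.
         */
        #define UID(iid) \
                { .instance = __builtin_choose_expr((iid) != ANY_ID, (iid), 0) }

        struct uid a = UID(0x07);   /* .instance == 7 */
        struct uid b = UID(ANY_ID); /* .instance == 0, no warning */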
      
      Fixes: eb0e90a8 ("platform/surface: aggregator: Add dedicated bus and device type")
      Cc: platform-driver-x86@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>
      Link: https://lore.kernel.org/r/20210514200453.1542978-1-arnd@kernel.org
      Signed-off-by: Hans de Goede <hdegoede@redhat.com>