- 29 June 2020, 25 commits
-
-
Submitted by James Morse

fix #28612342

commit 255097c82d821bb2bb18e9c7011841ee7342840f upstream

Now that the estatus queue can be used by more than one notification method, we can move notifications that have NMI-like behaviour over.

Switch NOTIFY_SEA over to use the estatus queue. This makes it behave in the same way as x86's NOTIFY_NMI.

Remove Kconfig's ability to turn ACPI_APEI_SEA off if ACPI_APEI_GHES is selected. This roughly matches the x86 NOTIFY_NMI behaviour, and means each architecture has at least one user of the estatus queue, so it doesn't need guarding with ifdef.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit 9c9d08051380ad3f6e6376d4383615771c59fd99 upstream

The estatus-queue code is currently hidden by the NOTIFY_NMI #ifdefs. Once NOTIFY_SEA starts using the estatus queue we can stop hiding it, as each architecture has a user that can't be turned off.

Split the existing CONFIG_HAVE_ACPI_APEI_NMI block in two, and move the SEA code into the gap.

Move the code around ... and change the stale comment describing why the estatus queue is necessary: printk() is no longer the issue, it's the helpers like memory_failure_queue() that aren't NMI safe.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit 06ddeadc8d1c4f704b8956f239263bca75a3add8 upstream

During ghes_proc() we use ghes_ack_error() to tell an external agent we are done with these records and it can re-use the memory.

rc may hold an error returned by ghes_read_estatus(); ENOENT causes us to skip ghes_ack_error() (as there is nothing to ack), but rc may also be EIO, which gets suppressed.

ghes_clear_estatus() is where we mark the records as processed for non-GHESv2 error sources, and it already spots the ENOENT case as buf_paddr is set to 0 by ghes_read_estatus(). Move the ghes_ack_error() call in here to avoid extra logic with the return code in ghes_proc().

This enables GHESv2 acking for NMI-like error sources. This is safe as the buffer is pre-mapped by map_gen_v2() before the GHES is added to any NMI handler lists.

This same pre-mapping step means we can't receive an error from apei_read()/write() here, as apei_check_gar() succeeded when it was mapped, and the mapping was cached, so the address can't be rejected at runtime. Remove the error-returns as this is now called from a function with no return.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
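A simplified sketch of the resulting flow (illustrative only; the real functions in drivers/acpi/apei/ghes.c take more parameters than shown here):

    /* Illustrative sketch: ack handling folded into the clear step. */
    static void ghes_clear_estatus(struct ghes *ghes, u64 buf_paddr)
    {
            ghes->estatus->block_status = 0;

            if (!buf_paddr)
                    return;         /* ghes_read_estatus() found nothing: nothing to ack */

            /* ... write the zeroed block_status back to the firmware buffer ... */

            if (is_hest_type_generic_v2(ghes))
                    ghes_ack_error(ghes->generic_v2);       /* GHESv2: release the record */
    }

    static int ghes_proc(struct ghes *ghes)
    {
            u64 buf_paddr;
            int rc;

            rc = ghes_read_estatus(ghes, &buf_paddr);
            if (!rc)
                    ghes_do_proc(ghes, ghes->estatus);

            ghes_clear_estatus(ghes, buf_paddr);    /* acks GHESv2 sources too */
            return rc;
    }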
-
Submitted by James Morse

fix #28612342

commit ee2eb3d4ee175c2fb5c7f67e84f5fe40a8147d92 upstream

Refactor the estatus queue's pool notification routine from NOTIFY_NMI's handlers. This will allow another notification method to use the estatus queue without duplicating this code.

Add rcu_read_lock()/rcu_read_unlock() around the list_for_each_entry_rcu() list walker. These aren't strictly necessary as the whole nmi_enter/nmi_exit() window is a spooky RCU read-side critical section.

in_nmi_queue_one_entry() is separate from the rcu-list walker for a later caller that doesn't need to walk a list.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Tyler Baicar <tbaicar@codeaurora.org>
[ rjw: Drop unnecessary err variable in two places ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
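A rough sketch of the refactored shape (names and parameters simplified relative to drivers/acpi/apei/ghes.c; the single-entry helper is what a non-list caller such as NOTIFY_SEA can later reuse):

    /* Illustrative only: queue every GHES on an RCU list that has records pending. */
    static int ghes_in_nmi_spool_from_list(struct list_head *rcu_list)
    {
            int ret = -ENOENT;
            struct ghes *ghes;

            rcu_read_lock();
            list_for_each_entry_rcu(ghes, rcu_list, list) {
                    if (!ghes_in_nmi_queue_one_entry(ghes))
                            ret = 0;        /* at least one error was queued */
            }
            rcu_read_unlock();

            return ret;
    }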
-
Submitted by James Morse

fix #28612342

commit 5cc6c68287ae4be22c40b41cf6844746cddebbcc upstream

ghes_read_estatus() sets a flag in struct ghes if the buffer of CPER records needs to be cleared once the records have been processed. This flag value is a problem if a struct ghes can be processed concurrently, as happens at probe time if an NMI arrives for the same error source. The NMI clears the flag, meaning the interrupted handler may never do the ghes_estatus_clear() work.

The GHES_TO_CLEAR flag is only set at the same time as buffer_paddr, which is now owned by the caller and passed to ghes_clear_estatus(). Use this value as the flag. A non-zero buf_paddr returned by ghes_read_estatus() means ghes_clear_estatus() should clear this address.

ghes_read_estatus() already checks for a read of error_status_address being zero, so CPER records cannot be written here.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit 7d49f2c75af22f980fd716a13634a16cfb7dd8a7 upstream

ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going on to __process_error(). This is pointless, as ghes_read_estatus() will always set this flag if it returns success, which was checked earlier in the loop. Remove it.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit eeb2555779471abdbcc6289a52dc54ce513feaf2 upstream

When CPER records are found, the address of the records is stashed in the struct ghes. Once the records have been processed, this address is overwritten with zero so that it won't be processed again without being re-populated by firmware.

This goes wrong if a struct ghes can be processed concurrently, as can happen at probe time when an NMI occurs. If the NMI arrives on another CPU, the probing CPU may call ghes_clear_estatus() on the records before the handler has finished with them. Even on the same CPU, once the interrupted handler is resumed, it will call ghes_clear_estatus() on the NMI's records; this memory may already have been re-used by firmware.

Avoid this stashing by letting the caller hold the address. A later patch will do away with the use of ghes->flags in the read/clear code too.

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit fb7be08f1a091ec243780bfdad4bf0c492057808 upstream

Adding new NMI-like notifications duplicates the calls that grow and shrink the estatus pool. This is all pretty pointless, as the size is capped to 64K. Allocate this for each ghes and drop the code that grows and shrinks the pool.

Suggested-by: Borislav Petkov <bp@suse.de>
Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit e147133a42cb9df6cbc99503fdf58d0e6388bf2a upstream

ghes.c has a memory pool it uses for the estatus cache and the estatus queue. The cache is initialised when registering the platform driver. For the queue, an NMI-like notification has to grow/shrink the pool as it is registered and unregistered. This is all pretty noisy when adding new NMI-like notifications; it would be better to replace this with a static pool size based on the number of users.

As a precursor, move the call that creates the pool from ghes_init() into hest.c. Later this will take the number of ghes entries and consolidate the queue allocations. Remove ghes_estatus_pool_exit() as hest.c doesn't have anywhere to put this.

The pool is now initialised as part of ACPI's subsys_initcall(): (acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init()). Before this patch it happened later, as a GHES-specific device_initcall().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by James Morse

fix #28612342

commit 93066e9aefa16beb10bb4a32c2f1657822b57753 upstream

Subsequent patches will split up ghes_read_estatus(), at which point passing around the 'silent' flag gets annoying. This flag is there to suppress printk() messages, which prior to commit 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI") were unsafe in NMI context.

This is no longer necessary, so remove the flag. printk() messages are now batched in a per-cpu buffer and printed via irq-work, or a call back from panic().

Signed-off-by: James Morse <james.morse@arm.com>
Reviewed-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
Acked-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Submitted by Jens Axboe

fix #28871358

commit b2c5d16b72df1116f05c9be16a630ac939d34101 upstream

If we have that hook, we know the driver handles bd->last == true in a smart fashion. If it does, even for multiple hardware queues, it's a good idea to flush batches of requests to the device, if we have batches of requests from the submitter.

Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Jens Axboe

fix #28871358

commit be94f058f2bde6f0b0ee9059a35daa8e15be308f upstream

If we are issuing a list of requests, we know if we're at the last one. If we fail issuing, ensure that we call ->commit_rqs() to flush any potential previous requests.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Jens Axboe

fix #28871358

commit 944e7c87967c820a0f34a935b1f2799944099750 upstream

We need this for blk-mq to kick things into gear, if we told it that we had more IO coming, but then failed to deliver on that promise.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Jens Axboe

fix #28871358

commit 04f3eafda6e05adc56afed4d3ae6e24aaa429058 upstream

Split the command submission and the SQ doorbell ring, and add the doorbell ring as our ->commit_rqs() hook. This allows a list of requests to be issued, with nvme only writing the SQ update when it's necessary. This is more efficient if we have lists of requests to issue, particularly on virtualized hardware, where writing the SQ doorbell is more expensive than on real hardware. For those cases, performance increases of 2-3x have been observed.

The use case for this is plugged IO, where blk-mq flushes a batch of requests at a time.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
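The batching idea in miniature (a hypothetical, heavily simplified queue structure; the real driver protects sq_tail with a lock and also handles CMB and shadow doorbells):

    /* Hypothetical sketch of deferring the submission-queue doorbell write. */
    struct demo_nvme_queue {
            void            *sq_cmds;       /* submission queue entries, 64 bytes each */
            u16              sq_tail;
            u16              q_depth;
            u32 __iomem     *q_db;          /* SQ doorbell register */
    };

    static void demo_submit_cmd(struct demo_nvme_queue *nvmeq,
                                const void *cmd, bool write_sq)
    {
            /* copy the 64-byte command into the next SQ slot */
            memcpy(nvmeq->sq_cmds + (nvmeq->sq_tail << 6), cmd, 64);
            if (++nvmeq->sq_tail == nvmeq->q_depth)
                    nvmeq->sq_tail = 0;
            if (write_sq)                           /* only ring the doorbell when asked */
                    writel(nvmeq->sq_tail, nvmeq->q_db);
    }

    static void demo_commit_rqs(struct demo_nvme_queue *nvmeq)
    {
            /* ->commit_rqs(): flush commands queued without a doorbell write */
            writel(nvmeq->sq_tail, nvmeq->q_db);
    }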
-
Submitted by Jens Axboe

fix #28871358

commit d666ba98f849ad44c4405ecc2180390ebe80f4f9 upstream

blk-mq passes information to the hardware about any given request being the last that we will issue in this sequence. The point is that hardware can defer costly doorbell type writes to the last request. But if we run into errors issuing a sequence of requests, we may never send the request with bd->last == true set. For that case, we need a hook that tells the hardware that nothing else is coming right now.

For failures returned by the driver's ->queue_rq() hook, the driver is responsible for flushing pending requests, if it uses bd->last to optimize that part. This works like before, no changes there.

Reviewed-by: Omar Sandoval <osandov@fb.com>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
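A hedged sketch of how a driver wires the two hooks together (demo_post_request() and demo_ring_doorbell() are made-up driver helpers; the blk_mq_ops fields and blk_mq_queue_data layout are the upstream ones of this era):

    #include <linux/blk-mq.h>

    /* Hypothetical driver cooperating with bd->last and ->commit_rqs(). */
    static blk_status_t demo_queue_rq(struct blk_mq_hw_ctx *hctx,
                                      const struct blk_mq_queue_data *bd)
    {
            demo_post_request(hctx->driver_data, bd->rq);   /* queue it, no doorbell yet */

            if (bd->last)                                   /* last of the batch: kick the hardware */
                    demo_ring_doorbell(hctx->driver_data);

            return BLK_STS_OK;
    }

    static void demo_commit_rqs(struct blk_mq_hw_ctx *hctx)
    {
            /* blk-mq never issued the bd->last request: flush what is already queued */
            demo_ring_doorbell(hctx->driver_data);
    }

    static const struct blk_mq_ops demo_mq_ops = {
            .queue_rq       = demo_queue_rq,
            .commit_rqs     = demo_commit_rqs,
    };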
-
Submitted by Jens Axboe

fix #28871358

Only do it if we have requests for multiple queues in the same plug.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Submitted by Vincent Guittot

to #28739709

commit bef69dd87828ef5d8ecdab8d857cd3a33cf98675 upstream

update_cfs_rq_load_avg() calls cfs_rq_util_change() every time PELT decays, which might be inefficient when the cpufreq driver has rate limitation.

When a task is attached on a CPU, we have this call path:

update_load_avg()
  update_cfs_rq_load_avg()
    cfs_rq_util_change --> trig frequency update
  attach_entity_load_avg()
    cfs_rq_util_change --> trig frequency update

The 1st frequency update will not take into account the utilization of the newly attached task, and the 2nd one might be discarded because of rate limitation of the cpufreq driver.

update_cfs_rq_load_avg() is only called by update_blocked_averages() and update_load_avg(), so we can move the call to cfs_rq_util_change/cpufreq_update_util() into these two functions.

It's also interesting to note that update_load_avg() already calls cfs_rq_util_change() directly for the !SMP case.

This change will also ensure that cpufreq_update_util() is called even when there is no more CFS rq in the leaf_cfs_rq_list to update, but only IRQ, RT or DL PELT signals.

[ mingo: Minor updates. ]

Reported-by: Doug Smythies <dsmythies@telus.net>
Tested-by: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: juri.lelli@redhat.com
Cc: linux-pm@vger.kernel.org
Cc: mgorman@suse.de
Cc: rostedt@goodmis.org
Cc: sargun@sargun.me
Cc: srinivas.pandruvada@linux.intel.com
Cc: tj@kernel.org
Cc: xiexiuqi@huawei.com
Cc: xiezhipeng1@huawei.com
Fixes: 039ae8b ("sched/fair: Fix O(nr_cgroups) in the load balancing path")
Link: https://lkml.kernel.org/r/1574083279-799-1-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
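A rough sketch of the shape after the change (illustrative only; the real update_load_avg() in kernel/sched/fair.c handles more flags and also updates task-group load):

    /* Illustrative: one cpufreq update per call, issued after any attach. */
    static void update_load_avg(struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
    {
            u64 now = cfs_rq_clock_pelt(cfs_rq);    /* cfs_rq_clock_task() on older trees */
            int decayed;

            /* no cfs_rq_util_change() inside update_cfs_rq_load_avg() any more */
            decayed = update_cfs_rq_load_avg(now, cfs_rq);

            if (!se->avg.last_update_time && (flags & DO_ATTACH)) {
                    attach_entity_load_avg(cfs_rq, se);     /* also stripped of its util-change call */
                    decayed = 1;
            }

            if (decayed)
                    cfs_rq_util_change(cfs_rq, 0);          /* single update, sees the attached task */
    }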
-
Submitted by Vincent Guittot

to #28739709

commit 039ae8bcf7a5f4476f4487e6bf816885fb3fb617 upstream

This re-applies the commit reverted here:

  commit c40f7d74c741 ("sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f654")

I.e. now that cfs_rq can be safely removed/added in the list, we can re-apply:

  commit a9e7f654 ("sched/fair: Fix O(nr_cgroups) in load balance path")

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sargun@sargun.me
Cc: tj@kernel.org
Cc: xiexiuqi@huawei.com
Cc: xiezhipeng1@huawei.com
Link: https://lkml.kernel.org/r/1549469662-13614-3-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Submitted by Vincent Guittot

to #28739709

commit 31bc6aeaab1d1de8959b67edbed5c7a4b3cdbe7c upstream

Removing a cfs_rq from rq->leaf_cfs_rq_list can break the parent/child ordering of the list when it will be added back. In order to remove an empty and fully decayed cfs_rq, we must remove its children too, so they will be added back in the right order next time.

With a normal decay of PELT, a parent will be empty and fully decayed if all children are empty and fully decayed too. In such a case, we just have to ensure that the whole branch will be added when a new task is enqueued. This is the default behaviour since:

  commit f6783319737f ("sched/fair: Fix insertion in rq->leaf_cfs_rq_list")

In case of throttling, the PELT of a throttled cfs_rq will not be updated whereas the parent's will. This breaks the assumption made above unless we remove the children of a cfs_rq that is throttled. Then, they will be added back when unthrottled and a sched_entity will be enqueued.

As throttled cfs_rq are now removed from the list, we can remove the associated test in update_blocked_averages().

Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: sargun@sargun.me
Cc: tj@kernel.org
Cc: xiexiuqi@huawei.com
Cc: xiezhipeng1@huawei.com
Link: https://lkml.kernel.org/r/1549469662-13614-2-git-send-email-vincent.guittot@linaro.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Submitted by Yihao Wu

to #29028845

cpuacct_update_latency's declaration was changed since 6dbaddaa, but was not changed for the case when CONFIG_SCHED_SLI=n. This leads to a compilation error.

Fixes: 6dbaddaa ("alinux: sched: Add cgroup's scheduling latency histograms")
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
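The usual shape of such a fix is keeping the CONFIG_SCHED_SLI=n stub's prototype in sync with the real declaration; a hypothetical sketch (the argument list below is made up, since the out-of-tree function isn't shown here):

    /* Hypothetical: the =n stub must mirror the updated declaration. */
    #ifdef CONFIG_SCHED_SLI
    void cpuacct_update_latency(struct task_struct *tsk, u64 delta);
    #else
    static inline void cpuacct_update_latency(struct task_struct *tsk, u64 delta)
    {
    }
    #endif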
-
Submitted by Christoph Hellwig

fix #28339081

commit edfbcb321faf07ca970e4191abe061deeb7d3788 upstream

The USB buffer allocation code is the only place in the USB core (and in fact the whole kernel) that uses is_device_dma_capable, while the URB mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer allocation to use the uses_dma flag used by the rest of the USB code, and create a helper in hcd.h that checks this flag as well as CONFIG_HAS_DMA to simplify the caller a bit.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Submitted by Christoph Hellwig

fix #28339081

commit dd3ecf17ba70a70d2c9ef9ba725281b84f8eef12 upstream

If the HCD provides a localmem pool we will never use the DMA pools, so don't create them.

Fixes: b0310c2f09bb ("USB: use genalloc for USB HCs with local memory")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20190811080520.21712-2-hch@lst.de
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Submitted by Laurentiu Tudor

fix #28339081

commit 2d7a3dc3e24f43504b1f25eae8195e600f4cce8b upstream

With the addition of the local memory allocator, the HCD_LOCAL_MEM flag can be dropped and the checks against it replaced with a check for the localmem_pool ptr being initialized.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Fredrik Noring <noring@nocrew.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Submitted by Laurentiu Tudor

fix #28339081

commit b0310c2f09bbe8aebefb97ed67949a3a7092aca6 upstream

For HCs that have local memory, replace the current DMA API usage with a genalloc generic allocator to manage the mappings for these devices. To help users, introduce a new HCD API, usb_hcd_setup_local_mem(), that will set up the genalloc backing the device local memory. It will be used in subsequent patches.

This is in preparation for dropping the existing "coherent" dma mem declaration APIs. The current implementation was relying on a short circuit in the DMA API that, in the end, was acting as an allocator for these types of devices.

Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com>
Tested-by: Fredrik Noring <noring@nocrew.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Submitted by Fredrik Noring

fix #28339081

commit da83a722959a82733c3ca60030cc364ca2318c5a upstream

gen_pool_dma_zalloc() is a zeroed memory variant of gen_pool_dma_alloc(). Also document the return values of both, and indicate NULL as a "%NULL" constant.

Signed-off-by: Fredrik Noring <noring@nocrew.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
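A small usage sketch of the new helper (pool construction and error handling trimmed; the wrapper name is made up):

    #include <linux/genalloc.h>

    /* Sketch: grab zeroed, DMA-addressable memory from an existing gen_pool. */
    static void *demo_alloc_coherent_buf(struct gen_pool *pool, size_t size,
                                         dma_addr_t *dma)
    {
            /* gen_pool_dma_zalloc() behaves like gen_pool_dma_alloc() + memset(0),
             * returning %NULL on failure as documented by this patch. */
            return gen_pool_dma_zalloc(pool, size, dma);
    }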
-
- 28 June 2020, 1 commit
-
-
Submitted by Zelin Deng

fix #28886284

On AMD platforms the CPU frequency could not be tuned because no cpufreq driver was registered: intel_pstate is enabled, but it can only be loaded on Intel CPUs. Hence, after evaluation and validation on AMD platforms, we decided to enable acpi-cpufreq.

acpi-cpufreq won't impact intel_pstate on Intel platforms, as intel_pstate is loaded in device_initcall while acpi-cpufreq is loaded in late_initcall. This sequence ensures intel_pstate can be loaded, and acpi-cpufreq cannot, on Intel platforms.

Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
Reviewed-by: Artie Ding <fulin.dn@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
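The ordering argument in miniature (a hedged sketch; the early-exit check is the usual way acpi-cpufreq bows out when another cpufreq driver is already registered, and the init function name here is made up):

    #include <linux/cpufreq.h>

    /* device_initcall() levels run before late_initcall(), so on Intel systems
     * intel_pstate registers first and the late acpi-cpufreq init finds it. */
    static int __init demo_acpi_cpufreq_init(void)
    {
            if (cpufreq_get_current_driver())       /* e.g. intel_pstate already there */
                    return -EEXIST;

            return cpufreq_register_driver(&acpi_cpufreq_driver);
    }
    late_initcall(demo_acpi_cpufreq_init);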
-
- 24 June 2020, 14 commits
-
-
Submitted by Dietmar Eggemann

to #28739709

commit af75d1a9a9f75bf030c2f35705f1ff6d226f96fe upstream

Since sg_lb_stats::sum_weighted_load is now identical to sg_lb_stats::group_load, remove it and replace its use case (calculating load per task) with the latter.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Link: https://lkml.kernel.org/r/20190527062116.11512-7-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit 0e1fef63d92d61ed561e504c3a078a827a0f9bfe upstream

The sched domain per-rq load index files also disappear from the /proc/sys/kernel/sched_domain/cpuX/domainY directories.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20190527062116.11512-6-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit 55627e3cd22c315c4a02fe3bbbb7234ec439cb1d upstream

The per-rq load array values also disappear from the cpu#X sections in /proc/sched_debug.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20190527062116.11512-5-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit 3d8d53554405952993bb0279ef3ebebc51740074 upstream

This reverts:

  commit 201c373e ("sched/debug: Limit sd->*_idx range on sysctl")

Load indexes (sd->*_idx) are no longer needed without rq->cpu_load[]. The range check for load indexes can be removed as well. Get rid of it before the rq->cpu_load[] since it uses CPU_LOAD_IDX_MAX.

At the same time, fix the following coding style issues detected by scripts/checkpatch.pl:

  ERROR: space prohibited before that ','
  ERROR: space prohibited before that close parenthesis ')'

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20190527062116.11512-4-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit 1c1b8a7b03ef50f80f5d0c871ee261c04a6c967e upstream

With LB_BIAS disabled, source_load() & target_load() return weighted_cpuload(). Replace both with calls to weighted_cpuload().

The function to obtain the load index (sd->*_idx) for an sd, get_sd_load_idx(), can be removed as well.

Finally, get rid of the sched feature LB_BIAS.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20190527062116.11512-3-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit 5e83eafbfd3b351537c0d74467fc43e8a88f4ae4 upstream

With LB_BIAS disabled, there is no need to update the rq->cpu_load[idx] any more.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Rik van Riel <riel@surriel.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Morten Rasmussen <morten.rasmussen@arm.com>
Cc: Patrick Bellasi <patrick.bellasi@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Perret <quentin.perret@arm.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Valentin Schneider <valentin.schneider@arm.com>
Cc: Vincent Guittot <vincent.guittot@linaro.org>
Link: https://lkml.kernel.org/r/20190527062116.11512-2-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit f2bedc4705659216bd60948029ad8dfedf923ad9 upstream

The CFS class is the only one maintaining and using the CPU wide load (rq->load(.weight)). The last use case of the CPU wide load in CFS's set_next_entity() can be replaced by using the load of the CFS class (rq->cfs.load(.weight)) instead.

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: https://lkml.kernel.org/r/20190424084556.604-1-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Daniel Lezcano

to #28739709

commit a7fe5190c03f8137ef08db84a58dd4daf2c4785d upstream

The function get_loadavg() returns almost always zero. To be more precise, statistically speaking, for a total of 1023379 times passing in the function, the load is equal to zero 1020728 times, greater than 100 in 610 cases, and the remaining is between 0 and 5.

In 2011, get_loadavg() was removed from the Android tree because of the above [1]. At this time, the load was:

unsigned long this_cpu_load(void)
{
	struct rq *this = this_rq();
	return this->cpu_load[0];
}

In 2014, the code was changed by commit 372ba8cb (cpuidle: menu: Lookup CPU runqueues less) and the load is:

void get_iowait_load(unsigned long *nr_waiters, unsigned long *load)
{
	struct rq *rq = this_rq();
	*nr_waiters = atomic_read(&rq->nr_iowait);
	*load = rq->load.weight;
}

with the same result.

Both measurements show that using the load in this code path does not matter anymore. Removing it.

[1] https://android.googlesource.com/kernel/common/+/4dedd9f124703207895777ac6e91dacde0f7cc17

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Mel Gorman <mgorman@suse.de>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Dietmar Eggemann

to #28739709

commit fdf5f315d5cfaefb7bb8a62ec4bf37b9891837aa upstream

LB_BIAS allows the adjustment of how conservatively load should be balanced. The rq->cpu_load[idx] array is used for this functionality. It contains weighted CPU load decayed average values over different intervals (idx = 1..4). Idx = 0 is the weighted CPU load itself. The values are updated during scheduler_tick, before idle balance and at nohz exit.

There are 5 different types of idx's per sched domain (sd). Each of them is used to index into the rq->cpu_load[idx] array in a specific scenario (busy, idle and newidle for load balancing, forkexec for wake-up slow-path load balancing and wake for affine wakeup based on weight). Only the sd idx's for busy and idle load balancing are set to 2,3 or 1,2 respectively. All the other sd idx's are set to 0.

Conservative load balancing is achieved for sd idx's >= 1 by using the min/max (source_load()/target_load()) value between the current weighted CPU load and the rq->cpu_load[sd idx -1] for the busiest(idlest)/local CPU load in load balancing, or vice versa in the wake-up slow-path load balancing. There is no conservative balancing for sd idx = 0 since only the current weighted CPU load is used in this case.

It is very likely that LB_BIAS' influence on load balancing can be neglected (see test results below). This is further supported by:

(1) Weighted CPU load today is by itself a decayed average value (PELT) (cfs_rq->avg->runnable_load_avg) and not the instantaneous load (rq->load.weight) it was when LB_BIAS was introduced.

(2) Sd imbalance_pct is used for CPU_NEWLY_IDLE and CPU_NOT_IDLE (relate to sd's newidle and busy idx) in find_busiest_group() when comparing busiest and local avg load to make load balancing even more conservative.

(3) The sd forkexec and newidle idx are always set to 0 so there is no adjustment on how conservatively load balancing is done here.

(4) Affine wakeup based on weight (wake_affine_weight()) will not be impacted since the sd wake idx is always set to 0.

Let's disable LB_BIAS by default for a few kernel releases to make sure that no workload and no scheduler topology is affected. The benefit of being able to remove the LB_BIAS dependency from source_load() and target_load() is that the entire rq->cpu_load[idx] code could be removed in this case.

It is really hard to say if there is no regression w/o testing this with a lot of different workloads on a lot of different platforms, especially NUMA machines. The following 104 LKP (Linux Kernel Performance) tests were run by the 0-Day guys mostly on multi-socket hosts with a larger number of logical cpus (88, 192).

The base for the test was commit b3dae109 ("sched/swait: Rename to exclusive") (tip/sched/core v4.18-rc1).

Only 2 out of the 104 tests had a significant change in one of the metrics (fsmark/1x-1t-1HDD-btrfs-nfsv4-4M-60G-NoSync-performance +7% files_per_sec, unixbench/300s-100%-syscall-performance -11% score).

Tests which showed a change in one of the metrics are marked with a '*' and this change is listed as well.
(a) lkp-bdw-ep3: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 64G

  dd-write/10m-1HDD-cfq-btrfs-100dd-performance
  fsmark/1x-1t-1HDD-xfs-nfsv4-4M-60G-NoSync-performance
* fsmark/1x-1t-1HDD-btrfs-nfsv4-4M-60G-NoSync-performance
    7.50  7%  8.00 ± 6%  fsmark.files_per_sec
  fsmark/1x-1t-1HDD-btrfs-nfsv4-4M-60G-fsyncBeforeClose-performance
  fsmark/1x-1t-1HDD-btrfs-4M-60G-NoSync-performance
  fsmark/1x-1t-1HDD-btrfs-4M-60G-fsyncBeforeClose-performance
  kbuild/300s-50%-vmlinux_prereq-performance
  kbuild/300s-200%-vmlinux_prereq-performance
  kbuild/300s-50%-vmlinux_prereq-performance-1HDD-ext4
  kbuild/300s-200%-vmlinux_prereq-performance-1HDD-ext4

(b) lkp-skl-4sp1: 192 threads Intel(R) Xeon(R) Platinum 8160 768G

  dbench/100%-performance
  ebizzy/200%-100x-10s-performance
  hackbench/1600%-process-pipe-performance
  iperf/300s-cs-localhost-tcp-performance
  iperf/300s-cs-localhost-udp-performance
  perf-bench-numa-mem/2t-300M-performance
  perf-bench-sched-pipe/10000000ops-process-performance
  perf-bench-sched-pipe/10000000ops-threads-performance
  schbench/2-16-300-30000-30000-performance
  tbench/100%-cs-localhost-performance

(c) lkp-bdw-ep6: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz 128G

  stress-ng/100%-60s-pipe-performance
  unixbench/300s-1-whetstone-double-performance
  unixbench/300s-1-shell1-performance
  unixbench/300s-1-shell8-performance
  unixbench/300s-1-pipe-performance
* unixbench/300s-1-context1-performance
    312  315  unixbench.score
  unixbench/300s-1-spawn-performance
  unixbench/300s-1-syscall-performance
  unixbench/300s-1-dhry2reg-performance
  unixbench/300s-1-fstime-performance
  unixbench/300s-1-fsbuffer-performance
  unixbench/300s-1-fsdisk-performance
  unixbench/300s-100%-whetstone-double-performance
  unixbench/300s-100%-shell1-performance
  unixbench/300s-100%-shell8-performance
  unixbench/300s-100%-pipe-performance
  unixbench/300s-100%-context1-performance
  unixbench/300s-100%-spawn-performance
* unixbench/300s-100%-syscall-performance
    3571 ± 3%  -11%  3183 ± 4%  unixbench.score
  unixbench/300s-100%-dhry2reg-performance
  unixbench/300s-100%-fstime-performance
  unixbench/300s-100%-fsbuffer-performance
  unixbench/300s-100%-fsdisk-performance
  unixbench/300s-1-execl-performance
  unixbench/300s-100%-execl-performance
* will-it-scale/brk1-performance
    365004  360387  will-it-scale.per_thread_ops
* will-it-scale/dup1-performance
    432401  437596  will-it-scale.per_thread_ops
  will-it-scale/eventfd1-performance
  will-it-scale/futex1-performance
  will-it-scale/futex2-performance
  will-it-scale/futex3-performance
  will-it-scale/futex4-performance
  will-it-scale/getppid1-performance
  will-it-scale/lock1-performance
  will-it-scale/lseek1-performance
  will-it-scale/lseek2-performance
* will-it-scale/malloc1-performance
    47025  45817  will-it-scale.per_thread_ops
    77499  76529  will-it-scale.per_process_ops
  will-it-scale/malloc2-performance
* will-it-scale/mmap1-performance
    123399  120815  will-it-scale.per_thread_ops
    152219  149833  will-it-scale.per_process_ops
* will-it-scale/mmap2-performance
    107327  104714  will-it-scale.per_thread_ops
    136405  133765  will-it-scale.per_process_ops
  will-it-scale/open1-performance
* will-it-scale/open2-performance
    171570  168805  will-it-scale.per_thread_ops
    532644  526202  will-it-scale.per_process_ops
  will-it-scale/page_fault1-performance
  will-it-scale/page_fault2-performance
  will-it-scale/page_fault3-performance
  will-it-scale/pipe1-performance
  will-it-scale/poll1-performance
* will-it-scale/poll2-performance
    176134  172848  will-it-scale.per_thread_ops
    281361  275053  will-it-scale.per_process_ops
  will-it-scale/posix_semaphore1-performance
  will-it-scale/pread1-performance
  will-it-scale/pread2-performance
  will-it-scale/pread3-performance
  will-it-scale/pthread_mutex1-performance
  will-it-scale/pthread_mutex2-performance
  will-it-scale/pwrite1-performance
  will-it-scale/pwrite2-performance
  will-it-scale/pwrite3-performance
* will-it-scale/read1-performance
    1190563  1174833  will-it-scale.per_thread_ops
* will-it-scale/read2-performance
    1105369  1080427  will-it-scale.per_thread_ops
  will-it-scale/readseek1-performance
* will-it-scale/readseek2-performance
    261818  259040  will-it-scale.per_thread_ops
  will-it-scale/readseek3-performance
* will-it-scale/sched_yield-performance
    2408059  2382034  will-it-scale.per_thread_ops
  will-it-scale/signal1-performance
  will-it-scale/unix1-performance
  will-it-scale/unlink1-performance
  will-it-scale/unlink2-performance
* will-it-scale/write1-performance
    976701  961588  will-it-scale.per_thread_ops
* will-it-scale/writeseek1-performance
    831898  822448  will-it-scale.per_thread_ops
* will-it-scale/writeseek2-performance
    228248  225065  will-it-scale.per_thread_ops
* will-it-scale/writeseek3-performance
    226670  224058  will-it-scale.per_thread_ops
  will-it-scale/context_switch1-performance
  aim7/performance-fork_test-2000
* aim7/performance-brk_test-3000
    74869  76676  aim7.jobs-per-min
  aim7/performance-disk_cp-3000
  aim7/performance-disk_rd-3000
  aim7/performance-sieve-3000
  aim7/performance-page_test-3000
  aim7/performance-creat-clo-3000
  aim7/performance-mem_rtns_1-8000
  aim7/performance-disk_wrt-8000
  aim7/performance-pipe_cpy-8000
  aim7/performance-ram_copy-8000

(d) lkp-avoton3: 8 threads Intel(R) Atom(TM) CPU C2750 @ 2.40GHz 16G

  netperf/ipv4-900s-200%-cs-localhost-TCP_STREAM-performance

Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Fengguang Wu <fengguang.wu@intel.com>
Cc: Li Zhijian <zhijianx.li@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20180809135753.21077-1-dietmar.eggemann@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Submitted by Yihao Wu

to #28739709

Many samples are between 10ms-50ms. To display a more informative distribution of latency, divide 10ms-50ms into 5 parts uniformly.

Example:

$ cat /sys/fs/cgroup/cpuacct/a/cpuacct.wait_latency
0-1ms: 59726433
1-4ms: 167
4-7ms: 0
7-10ms: 0
10-20ms: 5
20-30ms: 0
30-40ms: 3
40-50ms: 0
50-100ms: 0
100-500ms: 0
500-1000ms: 0
1000-5000ms: 0
5000-10000ms: 0
>=10000ms: 0
total(ms): 45554
nr: 59726600

Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Submitted by Yihao Wu

to #28739709

Sometimes the histogram is not precise enough, because each sample is roughly accounted into a histogram bar, and an average latency is more practical for some users. This patch adds a "nr" field to the 4 latency histogram interfaces, so that

lat(avg) = total(ms) / nr

Compared to the histogram, average latency is also better suited as an SLI because of its simplicity.

Example:

$ cat /sys/fs/cgroup/cpuacct/a/cpuacct.wait_latency
0-1ms: 4139
1-4ms: 317
4-7ms: 568
7-10ms: 0
10-100ms: 42324
100-500ms: 9131
500-1000ms: 95
1000-5000ms: 134
5000-10000ms: 0
>=10000ms: 0
total(ms): 4256455
nr: 182128

Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
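As a quick worked example using the sample output above: the average wait is total(ms) / nr = 4256455 / 182128 ≈ 23.4 ms per scheduling wait.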
-
Submitted by Yihao Wu

to #28739709

This patch adds the cpuacct.cgroup_wait_latency interface. It exports the histogram of the sched entity's schedule latency. Unlike wait_latency, the sched entity is a cgroup rather than a task.

This is useful when tasks are not directly clustered under one cgroup. For example:

cgroup1 --- cgroupA --- task1
        --- cgroupB --- task2
cgroup2 --- cgroupC --- task3
        --- cgroupD --- task4

This is a common cgroup hierarchy used by many applications. With cgroup_wait_latency, we can just read from cgroup1 to know the aggregated wait latency information of task1 and task2.

The interface output format is identical to cpuacct.wait_latency.

Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Submitted by Yihao Wu

to #28739709

This patch measures the time that tasks in a cpuacct cgroup are blocked. There are two types: blocked due to IO, and others like locks. They are exported in "cpuacct.ioblock_latency" and "cpuacct.block_latency" respectively.

According to the histogram, we know the detailed distribution of the duration. And according to total(ms), we know the percentage of time tasks spent off the rq, waiting for resources:

(△ioblock_latency.total(ms) + △block_latency.total(ms)) / △wall_time

The interface output format is identical to cpuacct.wait_latency.

Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Xunlei Pang <xlpang@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
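A worked example with made-up numbers: if over a sampling window △wall_time = 10000 ms, △ioblock_latency.total(ms) = 200 and △block_latency.total(ms) = 100, then tasks in the cgroup spent (200 + 100) / 10000 = 3% of the window blocked off the runqueue waiting for resources.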
-
Submitted by Yihao Wu

to #28739709

Export wait_latency in "cpuacct.wait_latency", which indicates the time that tasks in a cpuacct cgroup wait on a cfs_rq to be scheduled. This is like "perf sched", but with a smaller overhead, so it can be used as a constant monitor.

wait_latency is useful to debug an application's high-RT problem. It can tell whether the problem is caused by scheduling or not. If it is, loadavg can tell whether it's caused by bad scheduling behaviour or by system overload.

System admins can also use wait_latency to define an SLA. To ensure the SLA is guaranteed, there are various ways to decrease wait_latency.

This feature is disabled by default for performance concerns. It can be switched on dynamically by "echo 0 > /proc/cpusli/sched_lat_enable".

Example:

$ cat /sys/fs/cgroup/cpuacct/a/cpuacct.wait_latency
0-1ms: 4139
1-4ms: 317
4-7ms: 568
7-10ms: 0
10-100ms: 42324
100-500ms: 9131
500-1000ms: 95
1000-5000ms: 134
5000-10000ms: 0
>=10000ms: 0
total(ms): 4256455

Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
Acked-by: Xunlei Pang <xlpang@linux.alibaba.com>
Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-