- 29 Jun, 2020: 32 commits
-
-
Committed by James Morse
fix #28612342 commit e00a6e3392cb623b7ac4d61c5e1c1234b4520cad upstream ghes_read_estatus() reads the record address, then the record's header, then performs some sanity checks before reading the records into the provided estatus buffer. To provide this estatus buffer, the caller must know the size of the records in advance, or always provide a worst-case sized buffer, as happens today for the non-NMI notifications. Add a function to peek at the record's header to find the size. This will let the NMI path allocate the right amount of memory before reading the records, instead of using the worst-case size and having to copy the records. Split ghes_read_estatus() to create __ghes_peek_estatus(), which returns the address and size of the CPER records. Signed-off-by: James Morse <james.morse@arm.com> Changes since v7: * Grammar * Consistent argument ordering Changes since v6: * Additional buf_addr = 0 error handling * Moved checking out of peek-estatus * Reworded an error message so we can tell them apart Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
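A minimal userspace sketch of the peek-then-read pattern described above. It assumes a simplified header with the same five 32-bit fields as the ACPI generic error status block and ignores raw data. `struct estatus_hdr`, `peek_estatus_len()` and `read_estatus()` are hypothetical names, not the kernel's `__ghes_peek_estatus()`/`ghes_read_estatus()`:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified copy of the generic error status block header
 * (five 32-bit fields); raw data is ignored in this sketch. */
struct estatus_hdr {
	uint32_t block_status;    /* non-zero when a record is present */
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;     /* bytes of error data after the header */
	uint32_t error_severity;
};

/* Peek: copy only the header and return the size of the whole record,
 * or 0 if there is no record or it would not fit in fw_size bytes. */
static size_t peek_estatus_len(const void *fw, size_t fw_size)
{
	struct estatus_hdr hdr;
	size_t len;

	if (fw_size < sizeof(hdr))
		return 0;
	memcpy(&hdr, fw, sizeof(hdr));
	if (!hdr.block_status)
		return 0;
	len = sizeof(hdr) + (size_t)hdr.data_length;
	return len <= fw_size ? len : 0;
}

/* Read: allocate exactly the peeked size, then copy the full record,
 * instead of always using a worst-case sized buffer. */
static void *read_estatus(const void *fw, size_t fw_size, size_t *len_out)
{
	size_t len = peek_estatus_len(fw, fw_size);
	void *buf = len ? malloc(len) : NULL;

	*len_out = 0;
	if (!buf)
		return NULL;
	memcpy(buf, fw, len);
	*len_out = len;
	return buf;
}
```

Splitting the peek from the read means the caller can choose where the memory comes from (for example, a pre-allocated pool in NMI context) before copying anything.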
-
Committed by James Morse
fix #28612342 commit f2a681b9160b9c80826b3062e71371cfc82b4863 upstream ghes_read_estatus() checks various lengths in the top-level header to ensure the CPER records to be read aren't obviously corrupt. Take the opportunity to make this more user-friendly by printing a (ratelimited) message about the nature of the header format error. Suggested-by: Borislav Petkov <bp@alien8.de> Signed-off-by: James Morse <james.morse@arm.com> [ rjw: Add missing 'static' ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
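A hedged sketch of header sanity checks that each report a distinct reason, so that different failures can be told apart in the log. The struct and `check_estatus_hdr()` are hypothetical, and the specific checks are illustrative rather than the exact set the kernel performs. A caller would pass the returned string to a ratelimited print:

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical simplified header with the same fields as the generic
 * error status block. */
struct estatus_hdr {
	uint32_t block_status;
	uint32_t raw_data_offset;
	uint32_t raw_data_length;
	uint32_t data_length;
	uint32_t error_severity;
};

/* Returns NULL if the header looks sane, otherwise a message naming the
 * specific problem found. */
static const char *check_estatus_hdr(const struct estatus_hdr *h,
				     size_t max_len)
{
	size_t total = sizeof(*h) + (size_t)h->data_length;

	if (total > max_len)
		return "record larger than buffer";
	if (h->raw_data_offset && h->raw_data_offset < total)
		return "raw data overlaps error data";
	if ((size_t)h->raw_data_offset + h->raw_data_length > max_len)
		return "raw data extends past buffer";
	return NULL;
}
```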
-
Committed by James Morse
fix #28612342 commit f2a7e059aa7a6a22a6f4612f31ee29e726a3bfd0 upstream The NMI-like notifications scribble over ghes->estatus before copying it somewhere else. If this interrupts the ghes_probe() code calling ghes_proc() on each struct ghes, the data is corrupted. All the NMI-like notifications should use a queued estatus entry from the beginning, instead of using the ghes version and then copying it. To do this, break up any use of "ghes->estatus" so that all functions take the estatus as an argument. This patch just moves these ghes->estatus dereferences into separate arguments; there is no change in behaviour. struct ghes becomes unused in ghes_clear_estatus(), as it only wanted ghes->estatus, which we now pass directly. This is removed. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit b484079b9f520cc9a0797d885f1cd7f64b72b1b2 upstream ghes_copy_tofrom_phys() uses a different fixmap slot depending on in_nmi(). This doesn't work when there are multiple NMI-like notifications that could interrupt each other. As with the locking, move the choice of fixmap_idx to the notification helper. This only matters for NMI-like notifications; anything calling ghes_proc() can use the IRQ fixmap slot, as it's already holding an irqsave spinlock. This lets us collapse the ghes_ioremap_pfn_*() helpers. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit 3b880cbe4df5dd78a2b2279dbe16db9d193412ca upstream ghes_copy_tofrom_phys() takes different locks depending on in_nmi(). This doesn't work if there are multiple NMI-like notifications that can interrupt each other. Now that NOTIFY_SEA is always called in the same context, move the lock-taking to the notification helper. The helper will always know which lock to take. This avoids ghes_copy_tofrom_phys() taking a guess based on in_nmi(). This splits NOTIFY_NMI and NOTIFY_SEA to use different locks. All the other notifications use ghes_proc() and are called in process or IRQ context. Move the spin_lock_irqsave() around their ghes_proc() calls. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit d44f1b8dd7e66d80cc4205809e5ace866bd851da upstream To split up APEI's in_nmi() path, the caller always needs to be in_nmi(). Add a helper to do the work and claim the notification. When KVM or the arch code takes an exception that might be a RAS notification, it asks the APEI firmware-first code whether it wants to claim the exception. A future kernel-first mechanism may be queried afterwards and claim the notification; otherwise we fall through to the existing default behaviour. The NOTIFY_SEA code was merged before considering multiple, possibly interacting, NMI-like notifications and the need to consider kernel-first in the future. Make the 'claiming' behaviour explicit. Restructuring the APEI code to allow multiple NMI-like notifications means any notification that might interrupt interrupts-masked code must always be wrapped in nmi_enter()/nmi_exit(). This will allow APEI to use in_nmi() to pick the right fixmap entries. Mask SError over this window to prevent an asynchronous RAS error arriving and tripping nmi_enter()'s BUG_ON(in_nmi()). Signed-off-by: James Morse <james.morse@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
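The explicit 'claiming' behaviour can be sketched as a chain of handlers tried in order (firmware-first, then a possible kernel-first mechanism), each returning 0 if it claims the notification. An unclaimed notification falls through to the caller's default path. All names here are hypothetical, and the sketch leaves out nmi_enter()/nmi_exit() and SError masking:

```c
#include <errno.h>
#include <stddef.h>

/* Each hypothetical handler returns 0 if it claimed the notification,
 * or -ENOENT if it did not. */
typedef int (*ras_claim_fn)(unsigned long esr);

/* Try each mechanism in order; return 0 as soon as one claims the
 * notification, otherwise -ENOENT so the caller falls through to the
 * existing default behaviour. */
static int claim_ras_notification(const ras_claim_fn *handlers, size_t n,
				  unsigned long esr)
{
	for (size_t i = 0; i < n; i++) {
		if (handlers[i](esr) == 0)
			return 0;
	}
	return -ENOENT;
}
```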
-
Committed by James Morse
fix #28612342 commit 0db5e0223035b2c84e6186831fc27511270af812 upstream To split up APEI's in_nmi() path, the caller always needs to be in_nmi(). KVM shouldn't have to know about this; pull the RAS plumbing out into a header file. Currently guest synchronous external aborts are claimed as RAS notifications by handle_guest_sea(), which is hidden in the arch code's mm/fault.c. 32-bit gets a dummy declaration in system_misc.h. There is going to be more of this in the future if/when the kernel supports the SError-based firmware-first notification mechanism and/or kernel-first notifications for both synchronous external abort and SError. Each of these will come with some Kconfig symbols and a handful of header files. Create a header file for all this. This patch gives handle_guest_sea() a 'kvm_' prefix and moves the declarations to kvm_ras.h, as preparation for a future patch that moves the ACPI-specific RAS code out of mm/fault.c. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit 255097c82d821bb2bb18e9c7011841ee7342840f upstream Now that the estatus queue can be used by more than one notification method, we can move notifications that have NMI-like behaviour over to it. Switch NOTIFY_SEA over to use the estatus queue. This makes it behave in the same way as x86's NOTIFY_NMI. Remove Kconfig's ability to turn ACPI_APEI_SEA off if ACPI_APEI_GHES is selected. This roughly matches the x86 NOTIFY_NMI behaviour, and means each architecture has at least one user of the estatus queue, so it doesn't need guarding with ifdef. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit 9c9d08051380ad3f6e6376d4383615771c59fd99 upstream The estatus-queue code is currently hidden by the NOTIFY_NMI #ifdefs. Once NOTIFY_SEA starts using the estatus queue, we can stop hiding it, as each architecture has a user that can't be turned off. Split the existing CONFIG_HAVE_ACPI_APEI_NMI block in two, and move the SEA code into the gap. Move the code around ... and change the stale comment describing why the status queue is necessary: printk() is no longer the issue; it's the helpers like memory_failure_queue() that aren't NMI-safe. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit 06ddeadc8d1c4f704b8956f239263bca75a3add8 upstream During ghes_proc() we use ghes_ack_error() to tell an external agent that we are done with these records and it can re-use the memory. rc may hold an error returned by ghes_read_estatus(); ENOENT causes us to skip ghes_ack_error() (as there is nothing to ack), but rc may also be EIO, which gets suppressed. ghes_clear_estatus() is where we mark the records as processed for non-GHESv2 error sources, and it already spots the ENOENT case, as buf_paddr is set to 0 by ghes_read_estatus(). Move the ghes_ack_error() call in here to avoid extra logic with the return code in ghes_proc(). This enables GHESv2 acking for NMI-like error sources. This is safe, as the buffer is pre-mapped by map_gen_v2() before the GHES is added to any NMI handler lists. This same pre-mapping step means we can't receive an error from apei_read()/write() here: apei_check_gar() succeeded when the address was mapped, and the mapping was cached, so the address can't be rejected at runtime. Remove the error returns, as this is now called from a function with no return value. Signed-off-by: James Morse <james.morse@arm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit ee2eb3d4ee175c2fb5c7f67e84f5fe40a8147d92 upstream Refactor the estatus queue's pool notification routine out of NOTIFY_NMI's handlers. This will allow another notification method to use the estatus queue without duplicating this code. Add rcu_read_lock()/rcu_read_unlock() around the list_for_each_entry_rcu() walker. These aren't strictly necessary, as the whole nmi_enter()/nmi_exit() window is a spooky RCU read-side critical section. in_nmi_queue_one_entry() is separate from the RCU-list walker for a later caller that doesn't need to walk a list. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Punit Agrawal <punit.agrawal@arm.com> Tested-by: Tyler Baicar <tbaicar@codeaurora.org> [ rjw: Drop unnecessary err variable in two places ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit 5cc6c68287ae4be22c40b41cf6844746cddebbcc upstream ghes_read_estatus() sets a flag in struct ghes if the buffer of CPER records needs to be cleared once the records have been processed. This flag value is a problem if a struct ghes can be processed concurrently, as happens at probe time if an NMI arrives for the same error source. The NMI clears the flag, meaning the interrupted handler may never do the ghes_estatus_clear() work. The GHES_TO_CLEAR flag is only set at the same time as buffer_paddr, which is now owned by the caller and passed to ghes_clear_estatus(). Use this value as the flag. A non-zero buf_paddr returned by ghes_read_estatus() means ghes_clear_estatus() should clear this address. ghes_read_estatus() already checks for a read of error_status_address being zero, so CPER records cannot be written here (at address zero). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
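A minimal sketch of using a caller-owned, non-zero buffer address as the "needs clearing" flag instead of a flag stored in the shared per-source structure. `paddr_t`, `struct err_source`, `read_estatus()` and `clear_estatus()` are hypothetical simplifications. Because `buf_paddr` lives in each caller's own frame, an NMI handling the same source cannot reset it underneath the interrupted handler:

```c
#include <errno.h>
#include <stdint.h>

typedef uint64_t paddr_t;   /* hypothetical physical-address type */

/* Hypothetical per-source state shared between contexts. */
struct err_source {
	paddr_t error_status_address;  /* where firmware wrote the records */
	uint32_t block_status;         /* non-zero while records are pending */
};

/* On success, store the record address in the caller-owned *buf_paddr.
 * On any failure, *buf_paddr stays 0, meaning "nothing to clear". */
static int read_estatus(const struct err_source *src, paddr_t *buf_paddr)
{
	*buf_paddr = 0;
	if (!src->error_status_address || !src->block_status)
		return -ENOENT;
	*buf_paddr = src->error_status_address;
	return 0;
}

/* A non-zero buf_paddr is the flag: only then hand the buffer back. */
static void clear_estatus(struct err_source *src, paddr_t buf_paddr)
{
	if (!buf_paddr)
		return;
	src->block_status = 0;
}
```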
-
Committed by James Morse
fix #28612342 commit 7d49f2c75af22f980fd716a13634a16cfb7dd8a7 upstream ghes_notify_nmi() checks ghes->flags for GHES_TO_CLEAR before going on to __process_error(). This is pointless, as ghes_read_estatus() will always set this flag if it returns success, which was checked earlier in the loop. Remove it. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit eeb2555779471abdbcc6289a52dc54ce513feaf2 upstream When CPER records are found, the address of the records is stashed in the struct ghes. Once the records have been processed, this address is overwritten with zero so that it won't be processed again without being re-populated by firmware. This goes wrong if a struct ghes can be processed concurrently, as can happen at probe time when an NMI occurs. If the NMI arrives on another CPU, the probing CPU may call ghes_clear_estatus() on the records before the handler has finished with them. Even on the same CPU, once the interrupted handler is resumed, it will call ghes_clear_estatus() on the NMI's records; this memory may already have been re-used by firmware. Avoid this stashing by letting the caller hold the address. A later patch will do away with the use of ghes->flags in the read/clear code too. Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit fb7be08f1a091ec243780bfdad4bf0c492057808 upstream Adding new NMI-like notifications duplicates the calls that grow and shrink the estatus pool. This is all pretty pointless, as the size is capped to 64K. Allocate this for each ghes and drop the code that grows and shrinks the pool. Suggested-by: Borislav Petkov <bp@suse.de> Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit e147133a42cb9df6cbc99503fdf58d0e6388bf2a upstream ghes.c has a memory pool it uses for the estatus cache and the estatus queue. The cache is initialised when registering the platform driver. For the queue, an NMI-like notification has to grow/shrink the pool as it is registered and unregistered. This is all pretty noisy when adding new NMI-like notifications; it would be better to replace this with a static pool size based on the number of users. As a precursor, move the call that creates the pool from ghes_init() into hest.c. Later this will take the number of ghes entries and consolidate the queue allocations. Remove ghes_estatus_pool_exit(), as hest.c doesn't have anywhere to put this. The pool is now initialised as part of ACPI's subsys_initcall(): (acpi_init(), acpi_scan_init(), acpi_pci_root_init(), acpi_hest_init()). Before this patch it happened later, as a GHES-specific device_initcall(). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by James Morse
fix #28612342 commit 93066e9aefa16beb10bb4a32c2f1657822b57753 upstream Subsequent patches will split up ghes_read_estatus(), at which point passing around the 'silent' flag gets annoying. The flag exists to suppress printk() messages, which, prior to commit 42a0bb3f ("printk/nmi: generic solution for safe printk in NMI"), were unsafe in NMI context. This is no longer necessary; remove the flag. printk() messages are batched in a per-CPU buffer and printed via irq-work, or a callback from panic(). Signed-off-by: James Morse <james.morse@arm.com> Reviewed-by: Borislav Petkov <bp@suse.de> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com> Acked-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: luanshi <zhangliguang@linux.alibaba.com>
-
Committed by Jens Axboe
fix #28871358 commit b2c5d16b72df1116f05c9be16a630ac939d34101 upstream If we have that hook, we know the driver handles bd->last == true in a smart fashion. If it does, even for multiple hardware queues, it's a good idea to flush batches of requests to the device, if we have batches of requests from the submitter. Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jens Axboe
fix #28871358 commit be94f058f2bde6f0b0ee9059a35daa8e15be308f upstream If we are issuing a list of requests, we know if we're at the last one. If we fail issuing, ensure that we call ->commit_rqs() to flush any potential previous requests. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jens Axboe
fix #28871358 commit 944e7c87967c820a0f34a935b1f2799944099750 upstream We need this for blk-mq to kick things into gear if we told it that we had more IO coming, but then failed to deliver on that promise. Reviewed-by: Omar Sandoval <osandov@fb.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jens Axboe
fix #28871358 commit 04f3eafda6e05adc56afed4d3ae6e24aaa429058 upstream Split the command submission and the SQ doorbell ring, and add the doorbell ring as our ->commit_rqs() hook. This allows a list of requests to be issued, with nvme only writing the SQ update when it's necessary. This is more efficient if we have lists of requests to issue, particularly on virtualized hardware, where writing the SQ doorbell is more expensive than on real hardware. For those cases, performance increases of 2-3x have been observed. The use case for this is plugged IO, where blk-mq flushes a batch of requests at a time. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
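A hedged model of splitting command submission from the SQ doorbell write. Here `queue_rq()` writes the command and rings the doorbell only for the request marked last, while `commit_rqs()` flushes anything written but not yet rung. `struct sq` and these functions are hypothetical stand-ins, not the nvme driver's code. Skipping a doorbell write when the tail hasn't moved is a choice made for this sketch:

```c
#include <stdbool.h>

/* Hypothetical model of a submission queue with a doorbell register. */
struct sq {
	unsigned int tail;            /* entries written by the driver */
	unsigned int doorbell;        /* last tail value made visible to HW */
	unsigned int doorbell_writes; /* number of (costly) doorbell writes */
};

static void ring_doorbell(struct sq *q)
{
	if (q->doorbell != q->tail) {  /* skip redundant writes */
		q->doorbell = q->tail;
		q->doorbell_writes++;
	}
}

/* ->queue_rq() analogue: write the command, ring only on the last one. */
static void queue_rq(struct sq *q, bool last)
{
	q->tail++;
	if (last)
		ring_doorbell(q);
}

/* ->commit_rqs() analogue: flush anything written but not yet rung. */
static void commit_rqs(struct sq *q)
{
	ring_doorbell(q);
}
```

With a batch of eight requests, this model performs one doorbell write instead of eight, which is where the savings on virtualized hardware come from.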
-
Committed by Jens Axboe
fix #28871358 commit d666ba98f849ad44c4405ecc2180390ebe80f4f9 upstream blk-mq passes information to the hardware about any given request being the last that we will issue in this sequence. The point is that hardware can defer costly doorbell-type writes to the last request. But if we run into errors issuing a sequence of requests, we may never send the request with bd->last == true set. For that case, we need a hook that tells the hardware that nothing else is coming right now. For failures returned by the driver's ->queue_rq() hook, the driver is responsible for flushing pending requests, if it uses bd->last to optimize that part. This works like before; no changes there. Reviewed-by: Omar Sandoval <osandov@fb.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
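The core-side contract can be sketched as a dispatch loop that calls the driver's commit hook whenever it stops before the request flagged last has been accepted. `struct driver_ops` and `dispatch_list()` are hypothetical and only loosely modelled on blk-mq's `->queue_rq()`/`->commit_rqs()`:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver hooks; queue_rq returns 0 on success. */
struct driver_ops {
	int (*queue_rq)(void *drv, int rq, bool last);
	void (*commit_rqs)(void *drv);
};

/* Issue a list of requests. If issuing stops before the request flagged
 * last was accepted, call commit_rqs() so the driver can flush whatever
 * it deferred. Returns the number of requests issued. */
static size_t dispatch_list(const struct driver_ops *ops, void *drv,
			    const int *rqs, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		bool last = (i == n - 1);

		if (ops->queue_rq(drv, rqs[i], last) != 0)
			break;
	}
	if (i < n && ops->commit_rqs)
		ops->commit_rqs(drv);
	return i;
}
```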
-
Committed by Jens Axboe
fix #28871358 Only do it if we have requests for multiple queues in the same plug. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Vincent Guittot
to #28739709 commit bef69dd87828ef5d8ecdab8d857cd3a33cf98675 upstream update_cfs_rq_load_avg() calls cfs_rq_util_change() every time PELT decays, which might be inefficient when the cpufreq driver has rate limitation. When a task is attached on a CPU, we have this call path:
    update_load_avg()
      update_cfs_rq_load_avg()
        cfs_rq_util_change() --> triggers a frequency update
      attach_entity_load_avg()
        cfs_rq_util_change() --> triggers a frequency update
The 1st frequency update will not take into account the utilization of the newly attached task, and the 2nd one might be discarded because of rate limitation of the cpufreq driver. update_cfs_rq_load_avg() is only called by update_blocked_averages() and update_load_avg(), so we can move the call to cfs_rq_util_change()/cpufreq_update_util() into these two functions. It's also interesting to note that update_load_avg() already calls cfs_rq_util_change() directly for the !SMP case. This change will also ensure that cpufreq_update_util() is called even when there is no more CFS rq in the leaf_cfs_rq_list to update, but only IRQ, RT or DL PELT signals. [ mingo: Minor updates. ] Reported-by: Doug Smythies <dsmythies@telus.net> Tested-by: Doug Smythies <dsmythies@telus.net> Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: juri.lelli@redhat.com Cc: linux-pm@vger.kernel.org Cc: mgorman@suse.de Cc: rostedt@goodmis.org Cc: sargun@sargun.me Cc: srinivas.pandruvada@linux.intel.com Cc: tj@kernel.org Cc: xiexiuqi@huawei.com Cc: xiezhipeng1@huawei.com Fixes: 039ae8b ("sched/fair: Fix O(nr_cgroups) in the load balancing path") Link: https://lkml.kernel.org/r/1574083279-799-1-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Committed by Vincent Guittot
to #28739709 commit 039ae8bcf7a5f4476f4487e6bf816885fb3fb617 upstream This re-applies the commit reverted here: commit c40f7d74c741 ("sched/fair: Fix infinite loop in update_blocked_averages() by reverting a9e7f654"). I.e. now that cfs_rq can be safely removed/added in the list, we can re-apply: commit a9e7f654 ("sched/fair: Fix O(nr_cgroups) in load balance path"). Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: sargun@sargun.me Cc: tj@kernel.org Cc: xiexiuqi@huawei.com Cc: xiezhipeng1@huawei.com Link: https://lkml.kernel.org/r/1549469662-13614-3-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Committed by Vincent Guittot
to #28739709 commit 31bc6aeaab1d1de8959b67edbed5c7a4b3cdbe7c upstream Removing a cfs_rq from rq->leaf_cfs_rq_list can break the parent/child ordering of the list when it is added back. In order to remove an empty and fully decayed cfs_rq, we must remove its children too, so they will be added back in the right order next time. With a normal decay of PELT, a parent will be empty and fully decayed if all children are empty and fully decayed too. In such a case, we just have to ensure that the whole branch will be added when a new task is enqueued. This has been the default behavior since commit f6783319737f ("sched/fair: Fix insertion in rq->leaf_cfs_rq_list"). In case of throttling, the PELT of a throttled cfs_rq will not be updated, whereas the parent's will. This breaks the assumption made above unless we remove the children of a cfs_rq that is throttled. Then, they will be added back when unthrottled and a sched_entity is enqueued. As throttled cfs_rqs are now removed from the list, we can remove the associated test in update_blocked_averages(). Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: sargun@sargun.me Cc: tj@kernel.org Cc: xiexiuqi@huawei.com Cc: xiezhipeng1@huawei.com Link: https://lkml.kernel.org/r/1549469662-13614-2-git-send-email-vincent.guittot@linaro.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Committed by Yihao Wu
to #29028845 The declaration of cpuacct_update_latency() was changed by 6dbaddaa, but the variant used when CONFIG_SCHED_SLI=n was not updated to match. This leads to a compilation error. Fixes: 6dbaddaa ("alinux: sched: Add cgroup's scheduling latency histograms") Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Acked-by: Michael Wang <yun.wang@linux.alibaba.com>
-
Committed by Christoph Hellwig
fix #28339081 commit edfbcb321faf07ca970e4191abe061deeb7d3788 upstream The USB buffer allocation code is the only place in the USB core (and in fact the whole kernel) that uses is_device_dma_capable, while the URB mapping code uses the uses_dma flag in struct usb_bus. Switch the buffer allocation to use the uses_dma flag used by the rest of the USB code, and create a helper in hcd.h that checks this flag as well as CONFIG_HAS_DMA, to simplify the caller a bit. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190811080520.21712-3-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Committed by Christoph Hellwig
fix #28339081 commit dd3ecf17ba70a70d2c9ef9ba725281b84f8eef12 upstream If the HCD provides a localmem pool, we will never use the DMA pools, so don't create them. Fixes: b0310c2f09bb ("USB: use genalloc for USB HCs with local memory") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20190811080520.21712-2-hch@lst.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Committed by Laurentiu Tudor
fix #28339081 commit 2d7a3dc3e24f43504b1f25eae8195e600f4cce8b upstream With the addition of the local memory allocator, the HCD_LOCAL_MEM flag can be dropped and the checks against it replaced with a check for the localmem_pool pointer being initialized. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Tested-by: Fredrik Noring <noring@nocrew.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Committed by Laurentiu Tudor
fix #28339081 commit b0310c2f09bbe8aebefb97ed67949a3a7092aca6 upstream For HCs that have local memory, replace the current DMA API usage with a genalloc generic allocator to manage the mappings for these devices. To help users, introduce a new HCD API, usb_hcd_setup_local_mem(), that will set up the genalloc backing the device local memory. It will be used in subsequent patches. This is in preparation for dropping the existing "coherent" DMA memory declaration APIs. The current implementation was relying on a short circuit in the DMA API that, in the end, was acting as an allocator for this type of device. Signed-off-by: Laurentiu Tudor <laurentiu.tudor@nxp.com> Tested-by: Fredrik Noring <noring@nocrew.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
-
Committed by Fredrik Noring
fix #28339081 commit da83a722959a82733c3ca60030cc364ca2318c5a upstream gen_pool_dma_zalloc() is a zeroed-memory variant of gen_pool_dma_alloc(). Also document the return values of both, and indicate NULL as a "%NULL" constant. Signed-off-by: Fredrik Noring <noring@nocrew.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: Alex Shi <alex.shi@linux.alibaba.com>
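To illustrate the alloc/zalloc pairing, here is a sketch using a hypothetical bump allocator over a fixed arena (not genalloc's actual algorithm, and not its API). The zeroed variant keeps the same contract, returning NULL on failure, and only adds the memset():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed "local memory" arena with a bump pointer. */
struct pool {
	unsigned char *base;
	size_t size;
	size_t used;
};

/* Returns a pointer to len bytes, or NULL if the pool is exhausted. */
static void *pool_alloc(struct pool *p, size_t len)
{
	void *ret;

	if (len > p->size - p->used)
		return NULL;
	ret = p->base + p->used;
	p->used += len;
	return ret;
}

/* Zeroed variant: same contract as pool_alloc(), memory cleared. */
static void *pool_zalloc(struct pool *p, size_t len)
{
	void *ret = pool_alloc(p, len);

	if (ret)
		memset(ret, 0, len);
	return ret;
}
```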
-
- 28 Jun, 2020: 1 commit
-
-
Committed by Zelin Deng
fix #28886284 On AMD platforms the CPU frequency could not be tuned, because no cpufreq driver was registered: intel_pstate is enabled, but it can only be loaded on Intel CPUs. Hence, after evaluating and validating it on AMD platforms, we decided to enable acpi-cpufreq. acpi-cpufreq does not affect intel_pstate on Intel platforms, because intel_pstate is loaded at device_initcall while acpi-cpufreq is loaded at late_initcall. This ordering ensures that, on Intel platforms, intel_pstate is loaded and acpi-cpufreq is not. Signed-off-by: Zelin Deng <zelin.deng@linux.alibaba.com> Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com> Reviewed-by: Artie Ding <fulin.dn@linux.alibaba.com> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
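A minimal model of why initcall order decides which driver wins. It assumes, as the text states, that intel_pstate runs at device_initcall and acpi-cpufreq at late_initcall, and that only one cpufreq driver can be registered at a time. All functions are hypothetical stand-ins, not the kernel's code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: at most one cpufreq driver can be registered. */
static const char *current_driver;

static int model_register_driver(const char *name)
{
	if (current_driver)
		return -EEXIST;
	current_driver = name;
	return 0;
}

/* intel_pstate stand-in: device_initcall, refuses non-Intel CPUs. */
static int model_intel_pstate_init(int is_intel)
{
	if (!is_intel)
		return -ENODEV;
	return model_register_driver("intel_pstate");
}

/* acpi-cpufreq stand-in: late_initcall, backs off if a driver exists. */
static int model_acpi_cpufreq_init(void)
{
	if (current_driver)
		return -EEXIST;
	return model_register_driver("acpi-cpufreq");
}

/* Device initcalls run before late initcalls, so call them in that
 * order and report which driver ended up registered. */
static const char *model_boot(int is_intel)
{
	current_driver = NULL;
	model_intel_pstate_init(is_intel);
	model_acpi_cpufreq_init();
	return current_driver;
}
```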
-
- 24 Jun, 2020: 7 commits
-
-
由 Dietmar Eggemann 提交于
to #28739709 commit af75d1a9a9f75bf030c2f35705f1ff6d226f96fe upstream Since sg_lb_stats::sum_weighted_load is now identical to sg_lb_stats::group_load, remove it and replace its use case (calculating load per task) with the latter. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Acked-by: Vincent Guittot <vincent.guittot@linaro.org> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Link: https://lkml.kernel.org/r/20190527062116.11512-7-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com>
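After this change, the per-task load estimate for a sched group comes from group_load. The sketch below is a trimmed-down, hypothetical model of that calculation. The struct keeps only group_load, a runnable-task count, and the result. The field name sum_nr_running and the zero result for an idle group are assumptions about the surrounding scheduler code, not taken from the commit text.

```c
#include <assert.h>

/* Hypothetical, trimmed-down group statistics after the change. */
struct toy_sg_lb_stats {
	unsigned long group_load;      /* summed weighted load of the group's CPUs */
	unsigned int sum_nr_running;   /* runnable tasks in the group (assumed field) */
	unsigned long load_per_task;   /* derived value */
};

/* Load per task is derived from group_load; an idle group yields 0. */
static void toy_update_load_per_task(struct toy_sg_lb_stats *sgs)
{
	if (sgs->sum_nr_running)
		sgs->load_per_task = sgs->group_load / sgs->sum_nr_running;
	else
		sgs->load_per_task = 0;
}
```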
-
Committed by Dietmar Eggemann
to #28739709 commit 0e1fef63d92d61ed561e504c3a078a827a0f9bfe upstream The sched domain per rq load index files also disappear from the /proc/sys/kernel/sched_domain/cpuX/domainY directories. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-6-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Committed by Dietmar Eggemann
to #28739709 commit 55627e3cd22c315c4a02fe3bbbb7234ec439cb1d upstream The per rq load array values also disappear from the cpu#X sections in /proc/sched_debug. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-5-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Committed by Dietmar Eggemann
to #28739709 commit 3d8d53554405952993bb0279ef3ebebc51740074 upstream This reverts: commit 201c373e ("sched/debug: Limit sd->*_idx range on sysctl") Load indexes (sd->*_idx) are no longer needed without rq->cpu_load[]. The range check for load indexes can be removed as well. Get rid of it before removing rq->cpu_load[], since it uses CPU_LOAD_IDX_MAX. At the same time, fix the following coding style issues detected by scripts/checkpatch.pl: ERROR: space prohibited before that ',' ERROR: space prohibited before that close parenthesis ')' Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-4-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Committed by Dietmar Eggemann
to #28739709 commit 1c1b8a7b03ef50f80f5d0c871ee261c04a6c967e upstream With LB_BIAS disabled, source_load() & target_load() return weighted_cpuload(). Replace both with calls to weighted_cpuload(). The function to obtain the load index (sd->*_idx) for an sd, get_sd_load_idx(), can be removed as well. Finally, get rid of the sched feature LB_BIAS. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-3-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
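Why the two helpers collapse into one call can be seen in a small model. The sketch below assumes the pre-change shape of source_load() and target_load(). With the bias feature on, the source takes a low guess (min of history and current load) and the target a high guess (max). With the feature off, or for index 0, both return the current weighted load. All names are hypothetical stand-ins (`lb_bias` for sched_feat(LB_BIAS), `weighted_load` for weighted_cpuload()), and the history size of 5 is illustrative.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CPU_LOAD_IDX_MAX 5   /* illustrative history length */

/* Hypothetical per-CPU state: current load plus decayed history. */
struct toy_rq {
	unsigned long weighted_load;                   /* weighted_cpuload() stand-in */
	unsigned long cpu_load[TOY_CPU_LOAD_IDX_MAX];  /* old rq->cpu_load[] stand-in */
};

static bool lb_bias;   /* stand-in for sched_feat(LB_BIAS) */

static unsigned long min_ul(unsigned long a, unsigned long b) { return a < b ? a : b; }
static unsigned long max_ul(unsigned long a, unsigned long b) { return a > b ? a : b; }

/* Low estimate of a migration source's load. */
static unsigned long toy_source_load(const struct toy_rq *rq, int type)
{
	unsigned long total = rq->weighted_load;

	if (type == 0 || !lb_bias)
		return total;
	return min_ul(rq->cpu_load[type - 1], total);
}

/* High estimate of a migration target's load. */
static unsigned long toy_target_load(const struct toy_rq *rq, int type)
{
	unsigned long total = rq->weighted_load;

	if (type == 0 || !lb_bias)
		return total;
	return max_ul(rq->cpu_load[type - 1], total);
}
```

With `lb_bias` false, the `type` argument and the `cpu_load[]` history are never read, which is why the load index and the history can be dropped together.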
-
Committed by Dietmar Eggemann
to #28739709 commit 5e83eafbfd3b351537c0d74467fc43e8a88f4ae4 upstream With LB_BIAS disabled, there is no need to update the rq->cpu_load[idx] any more. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Rik van Riel <riel@surriel.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Morten Rasmussen <morten.rasmussen@arm.com> Cc: Patrick Bellasi <patrick.bellasi@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Quentin Perret <quentin.perret@arm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <valentin.schneider@arm.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/20190527062116.11512-2-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-
Committed by Dietmar Eggemann
to #28739709 commit f2bedc4705659216bd60948029ad8dfedf923ad9 upstream The CFS class is the only one maintaining and using the CPU wide load (rq->load(.weight)). The last use case of the CPU wide load in CFS's set_next_entity() can be replaced by using the load of the CFS class (rq->cfs.load(.weight)) instead. Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20190424084556.604-1-dietmar.eggemann@arm.com Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Yihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: Shanpei Chen <shanpeic@linux.alibaba.com>
-