- 22 4月, 2020 25 次提交
-
-
由 wuxu.wu 提交于
fix #25872428 commit 19b61392c5a852b4e8a0bf35aecb969983c5932d upstream dw_spi_irq() and dw_spi_transfer_one concurrent calls. I find a panic in dw_writer(): txw = *(u8 *)(dws->tx), when dw->tx==null, dw->len==4, and dw->tx_end==1. When tpm driver's message overtime dw_spi_irq() and dw_spi_transfer_one may concurrent visit dw_spi, so I think dw_spi structure lack of protection. Otherwise dw_spi_transfer_one set dw rx/tx buffer and then open irq, store dw rx/tx instructions and other cores handle irq load dw rx/tx instructions may out of order. [ 1025.321302] Call trace: ... [ 1025.321319] __crash_kexec+0x98/0x148 [ 1025.321323] panic+0x17c/0x314 [ 1025.321329] die+0x29c/0x2e8 [ 1025.321334] die_kernel_fault+0x68/0x78 [ 1025.321337] __do_kernel_fault+0x90/0xb0 [ 1025.321346] do_page_fault+0x88/0x500 [ 1025.321347] do_translation_fault+0xa8/0xb8 [ 1025.321349] do_mem_abort+0x68/0x118 [ 1025.321351] el1_da+0x20/0x8c [ 1025.321362] dw_writer+0xc8/0xd0 [ 1025.321364] interrupt_transfer+0x60/0x110 [ 1025.321365] dw_spi_irq+0x48/0x70 ... Signed-off-by: Nwuxu.wu <wuxu.wu@huawei.com> Link: https://lore.kernel.org/r/1577849981-31489-1-git-send-email-wuxu.wu@huawei.comSigned-off-by: NMark Brown <broonie@kernel.org> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Suravee Suthikulpanit 提交于
fix #26319040 commit 730ad0ede130015a773229573559e97ba0943065 upstream. Commit b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") accidentally left out the ir_data pointer when calling modity_irte_ga(), which causes the function amd_iommu_update_ga() to return prematurely due to struct amd_ir_data.ref is NULL and the "is_run" bit of IRTE does not get updated properly. This results in bad I/O performance since IOMMU AVIC always generate GA Log entry and notify IOMMU driver and KVM when it receives interrupt from the PCI pass-through device instead of directly inject interrupt to the vCPU. Fixes by passing ir_data when calling modify_irte_ga() as done previously. Fixes: b9c6ff94e43a ("iommu/amd: Re-factor guest virtual APIC (de-)activation code") Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Suthikulpanit, Suravee 提交于
fix #26319040 commit b9c6ff94e43a0ee053e0c1d983fba1ac4953b762 upstream. Re-factore the logic for activate/deactivate guest virtual APIC mode (GAM) into helper functions, and export them for other drivers (e.g. SVM). to support run-time activate/deactivate of SVM AVIC. Cc: Joerg Roedel <joro@8bytes.org> Signed-off-by: NSuravee Suthikulpanit <suravee.suthikulpanit@amd.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Joerg Roedel 提交于
fix #26319040 commit 2a78f9962565e53b78363eaf516eb052009e8020 upstream. The traversing of this list requires protection_domain->lock to be taken to avoid nasty races with attach/detach code. Make sure the lock is held on all code-paths traversing this list. Reported-by: NFilippo Sironi <sironi@amazon.de> Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path") Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Joerg Roedel 提交于
fix #26319040 commit ab7b2577f0d119052b98b8d913bad369ac2760eb upstream. Make sure that attaching a detaching a device can't race against each other and protect the iommu_dev_data with a spin_lock in these code paths. Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path") Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 tianyi 提交于
fix #26319040 commit 45e528d9c479aeef2d3d1db1e619b243f91e324f upstream. Check early in attach_device whether the device is already attached to a domain. This also simplifies the code path so that __attach_device() can be removed. Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path") Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Joerg Roedel 提交于
fix #26319040 commit f6c0bfce271b2dd613e8b8e009eefe89c1f788e8 upstream. The code-paths before __attach_device() and __detach_device() are called also access and modify domain state, so take the domain lock there too. This allows to get rid of the __detach_device() function. Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path") Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Joerg Roedel 提交于
fix #26319040 commit 3a11905b69eb026402448c750f97a0eadfa76b08 upstream. The lock is not necessary because the device table does not contain shared state that needs protection. Locking is only needed on an individual entry basis, and that needs to happen on the iommu_dev_data level. Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path") Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Joerg Roedel 提交于
fix #26319040 commit f15d9a992f901d4f22db868adf800844d1cac9f2 upstream. iommu/amd: Remove domain->updated This struct member was used to track whether a domain change requires updates to the device-table and IOMMU cache flushes. The problem is, that access to this field is racy since locking in the common mapping code-paths has been eliminated. Move the updated field to the stack to get rid of all potential races and remove the field from the struct. Fixes: 92d420ec ("iommu/amd: Relax locking in dma_ops path") Reviewed-by: NFilippo Sironi <sironi@amazon.de> Reviewed-by: NJerry Snitselaar <jsnitsel@redhat.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de> Signed-off-by: Ntianyi <fujunkang@linux.alibaba.com> Reviewed-by: Nzhangliguang <zhangliguang@linux.alibaba.com> Acked-by: Nzhangliguang <zhangliguang@linux.alibaba.com>
-
由 Tian Tao 提交于
to #25688970 commit 643956e61ced913a2bbdcf2c95f3d03026b39d1c upstream The fourth parameter 'level' of function 'acpi_find_cache_level()' is a signed interger, but its caller 'acpi_find_cache_node()' passes that parameter an unsigned interger. Make the paramter type inconsistency go away. Signed-off-by: NTian Tao <tiantao6@huawei.com> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> [ rjw: Subject/changelog ] Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: Nzou cao <zoucao@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Jeremy Linton 提交于
to #25688970 commit 56855a99f3d0d1e9f1f4e24f5851f9bf14c83296 upstream ACPI 6.3 adds a flag to indicate that child nodes are all identical cores. This is useful to authoritatively determine if a set of (possibly offline) cores are identical or not. Since the flag doesn't give us a unique id we can generate one and use it to create bitmaps of sibling nodes, or simply in a loop to determine if a subset of cores are identical. Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NHanjun Guo <hanjun.guo@linaro.org> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NJeremy Linton <jeremy.linton@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: Nzou cao <zoucao@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Jeremy Linton 提交于
to #25688970 commit ed2b664fcc8073c09394393756df3fc86977bbac upstream The ACPI specification implies that the IDENTICAL flag should be set on all non leaf nodes where the children are identical. This means that we need to be searching for the last node with the identical flag set rather than the first one. Since this flag is also dependent on the table revision, we need to add a bit of extra code to verify the table revision, and the next node's state in the traversal. Since we want to avoid function pointers here, lets just special case the IDENTICAL flag. Acked-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Tested-by: NHanjun Guo <hanjun.guo@linaro.org> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NJeremy Linton <jeremy.linton@arm.com> Signed-off-by: NWill Deacon <will@kernel.org> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: Nzou cao <zoucao@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Bjorn Helgaas 提交于
to #25688970 commit 603fadf33604a2e170eb833f99f569d3597f1f09 upstream Fix some misspellings in comments. No functional change intended. Signed-off-by: NBjorn Helgaas <bhelgaas@google.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: Nzou cao <zoucao@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Jeremy Linton 提交于
to #25688970 commit 4909e6df213a7c3e5e282538356f31ab68828793 upstream ACPI 6.3 bumps the PPTT table revision and adds a LEAF_NODE flag. This allows us to avoid a second pass through the table to assure that the node in question is a leaf. Signed-off-by: NJeremy Linton <jeremy.linton@arm.com> Reviewed-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: Nzou cao <zoucao@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 John Garry 提交于
to #25688970 commit 6cafe700b08cfd261a279b9e5ed99f3a346fe3b0 upstream For a system using ACPI-based FW without a PPTT, we may get many warnings about the lack of a PPTT, as shown: root@(none)$ dmesg | grep -i pptt [ 0.010125] ACPI PPTT: No PPTT table found, cpu topology may be inaccurate [ 7.138339] ACPI PPTT: No PPTT table found, cache topology may be inaccurate [ 7.145368] ACPI PPTT: No PPTT table found, cache topology may be inaccurate These logs are generated with pr_warn_once(), so the intention was for a single log, but the logs overlap, so consolidate them. Signed-off-by: NJohn Garry <john.garry@huawei.com> Reviewed-by: NJeremy Linton <jeremy.linton@arm.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Nluanshi <zhangliguang@linux.alibaba.com> Acked-by: Nzou cao <zoucao@linux.alibaba.com> Acked-by: NCaspar Zhang <caspar@linux.alibaba.com>
-
由 Christian König 提交于
to #25447038 commit b4ff0f8a85f3c523942e57b716e8722e7f6799cc upstream. This allows to invalidate VM entries without taking the reservation lock. v3: use -EBUSY Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: Nmfkang <mfkang@linux.alibaba.com> Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>
-
由 Christian König 提交于
to #25447038 commit 6ceeb144b1d6952a36afa6c29718beac575f2a3f upstream. When a page tables needs to be evicted the VM code should decide if that is possible or not. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NFelix Kuehling <Felix.Kuehling@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: Nmfkang <mfkang@linux.alibaba.com> Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>
-
由 Christian König 提交于
to #25447038 commit 1bd4e4ca7bb8f681ff4e2b05c97ce975ccd781d6 upstream. Otherwise we won't be able to cleanly handle page faults. Signed-off-by: NChristian König <christian.koenig@amd.com> Reviewed-by: NChunming Zhou <david1.zhou@amd.com> Signed-off-by: NAlex Deucher <alexander.deucher@amd.com> Signed-off-by: Nmfkang <mfkang@linux.alibaba.com> Reviewed-by: Nluanshi <zhangliguang@linux.alibaba.com>
-
由 Christian Brauner 提交于
fix #27124689 commit 7f2923c4f73f21cfd714d12a2d48de8c21f11cfe upstream. proc_get_long() is a funny function. It uses simple_strtoul() and for a good reason. proc_get_long() wants to always succeed the parse and return the maybe incorrect value and the trailing characters to check against a pre-defined list of acceptable trailing values. However, simple_strtoul() explicitly ignores overflows which can cause funny things like the following to happen: echo 18446744073709551616 > /proc/sys/fs/file-max cat /proc/sys/fs/file-max 0 (Which will cause your system to silently die behind your back.) On the other hand kstrtoul() does do overflow detection but does not return the trailing characters, and also fails the parse when anything other than '\n' is a trailing character whereas proc_get_long() wants to be more lenient. Now, before adding another kstrtoul() function let's simply add a static parse strtoul_lenient() which: - fails on overflow with -ERANGE - returns the trailing characters to the caller The reason why we should fail on ERANGE is that we already do a partial fail on overflow right now. Namely, when the TMPBUFLEN is exceeded. So we already reject values such as 184467440737095516160 (21 chars) but accept values such as 18446744073709551616 (20 chars) but both are overflows. So we should just always reject 64bit overflows and not special-case this based on the number of chars. Link: http://lkml.kernel.org/r/20190107222700.15954-2-christian@brauner.ioSigned-off-by: NChristian Brauner <christian@brauner.io> Acked-by: NKees Cook <keescook@chromium.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Waiman Long <longman@redhat.com> Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Alexey Dobriyan <adobriyan@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Shile Zhang 提交于
to #26536261 Keep the common configs same between x86_64 and aarch64. Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Shile Zhang 提交于
to #24582903 Update aarch64 configs since gcc version and more minor changes. Signed-off-by: NShile Zhang <shile.zhang@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Yihao Wu 提交于
fix #25707555 commit 43e33924c38e8faeb0c12035481cb150e602e39d linux-next Deleting list entry within hlist_for_each_entry_safe is not safe unless next pointer (tmp) is protected too. It's not, because once hash_lock is released, cache_clean may delete the entry that tmp points to. Then cache_purge can walk to a deleted entry and tries to double free it. Fix this bug by holding only the deleted entry's reference. Suggested-by: NNeilBrown <neilb@suse.de> Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com> Reviewed-by: NNeilBrown <neilb@suse.de> [ cel: removed unused variable ] Signed-off-by: NChuck Lever <chuck.lever@oracle.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
由 Michael Wang 提交于
fix #26198889 commit 26cf52229efc87e2effa9d788f9b33c40fb3358a linux-next During our testing, we found a case that shares no longer working correctly, the cgroup topology is like: /sys/fs/cgroup/cpu/A (shares=102400) /sys/fs/cgroup/cpu/A/B (shares=2) /sys/fs/cgroup/cpu/A/B/C (shares=1024) /sys/fs/cgroup/cpu/D (shares=1024) /sys/fs/cgroup/cpu/D/E (shares=1024) /sys/fs/cgroup/cpu/D/E/F (shares=1024) The same benchmark is running in group C & F, no other tasks are running, the benchmark is capable to consumed all the CPUs. We suppose the group C will win more CPU resources since it could enjoy all the shares of group A, but it's F who wins much more. The reason is because we have group B with shares as 2, since A->cfs_rq.load.weight == B->se.load.weight == B->shares/nr_cpus, so A->cfs_rq.load.weight become very small. And in calc_group_shares() we calculate shares as: load = max(scale_load_down(cfs_rq->load.weight), cfs_rq->avg.load_avg); shares = (tg_shares * load) / tg_weight; Since the 'cfs_rq->load.weight' is too small, the load become 0 after scale down, although 'tg_shares' is 102400, shares of the se which stand for group A on root cfs_rq become 2. While the se of D on root cfs_rq is far more bigger than 2, so it wins the battle. Thus when scale_load_down() scale real weight down to 0, it's no longer telling the real story, the caller will have the wrong information and the calculation will be buggy. This patch add check in scale_load_down(), so the real weight will be >= MIN_SHARES after scale, after applied the group C wins as expected. Suggested-by: NPeter Zijlstra <peterz@infradead.org> Signed-off-by: NMichael Wang <yun.wang@linux.alibaba.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: NVincent Guittot <vincent.guittot@linaro.org> Link: https://lkml.kernel.org/r/38e8e212-59a1-64b2-b247-b6d0b52d8dc1@linux.alibaba.comAcked-by: NShanpei Chen <shanpeic@linux.alibaba.com> Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
-
由 Huaixin Chang 提交于
fix #25892693 commit 26a8b12747c975b33b4a82d62e4a307e1c07f31b upstream Currently, there is a potential race between distribute_cfs_runtime() and assign_cfs_rq_runtime(). Race happens when cfs_b->runtime is read, distributes without holding lock and finds out there is not enough runtime to charge against after distribution. Because assign_cfs_rq_runtime() might be called during distribution, and use cfs_b->runtime at the same time. Fibtest is the tool to test this race. Assume all gcfs_rq is throttled and cfs period timer runs, slow threads might run and sleep, returning unused cfs_rq runtime and keeping min_cfs_rq_runtime in their local pool. If all this happens sufficiently quickly, cfs_b->runtime will drop a lot. If runtime distributed is large too, over-use of runtime happens. A runtime over-using by about 70 percent of quota is seen when we test fibtest on a 96-core machine. We run fibtest with 1 fast thread and 95 slow threads in test group, configure 10ms quota for this group and see the CPU usage of fibtest is 17.0%, which is far from than the expected 10%. On a smaller machine with 32 cores, we also run fibtest with 96 threads. CPU usage is more than 12%, which is also more than expected 10%. This shows that on similar workloads, this race do affect CPU bandwidth control. Solve this by holding lock inside distribute_cfs_runtime(). Fixes: c06f04c7 ("sched: Fix potential near-infinite distribute_cfs_runtime() loop") Signed-off-by: NHuaixin Chang <changhuaixin@linux.alibaba.com> Reviewed-by: NBen Segall <bsegall@google.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com> Link: https://lore.kernel.org/lkml/20200325092602.22471-1-changhuaixin@linux.alibaba.com/Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com>
-
由 Xunlei Pang 提交于
to #26424323 task_css() should be protected by rcu, fix several callers. Fixes: 1f49a738 ("alinux: psi: Support PSI under cgroup v1") Acked-by: NMichael Wang <yun.wany@linux.alibaba.com> Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: NYihao Wu <wuyihao@linux.alibaba.com> Acked-by: NYang Shi <yang.shi@linux.alibaba.com>
-
- 21 4月, 2020 1 次提交
-
-
由 Chunmei Xu 提交于
to #26616987 Disable CONFIG_NFS_V3_ACL and CONFIG_NFSD_V3_ACL for aarch64, to be same with x86 Signed-off-by: NChunmei Xu <xuchunmei@linux.alibaba.com> Acked-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
-
- 17 4月, 2020 9 次提交
-
-
由 zhongjiang-ali 提交于
to #26788859 We've met several real-world issues that the child reaper (i.e. systemd) gets stuck in some aborted status and cann't reap its zombie children, so we provide the interface to do By specified the pid. Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com> Signed-off-by: Nzhongjiang-ali <zhongjiang-ali@linux.alibaba.com>
-
由 zhongjiang-ali 提交于
to #26424311 Commit 5028e358 ("alinux: mm: oom_kill: show killed task's cgroup info in global oom") introduces an potential null pointer reference. It is because the task 'p' maybe an null pointer in same code path. Fixes: 5028e358 ("alinux: mm: oom_kill: show killed task's cgroup info in global oom") Signed-off-by: Nzhongjiang-ali <zhongjiang-ali@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Michal Hocko 提交于
task #25182720 commit 12e967fd8e4e6c3d275b4c69c890adc838891300 upstream Jann has brought up a very interesting point [1]. While shared pages are excluded from MADV_PAGEOUT normally, CoW pages can be easily reclaimed that way. This can lead to all sorts of hard to debug problems. E.g. performance problems outlined by Daniel [2]. There are runtime environments where there is a substantial memory shared among security domains via CoW memory and a easy to reclaim way of that memory, which MADV_{COLD,PAGEOUT} offers, can lead to either performance degradation in for the parent process which might be more privileged or even open side channel attacks. The feasibility of the latter is not really clear to me TBH but there is no real reason for exposure at this stage. It seems there is no real use case to depend on reclaiming CoW memory via madvise at this stage so it is much easier to simply disallow it and this is what this patch does. Put it simply MADV_{PAGEOUT,COLD} can operate only on the exclusively owned memory which is a straightforward semantic. [1] http://lkml.kernel.org/r/CAG48ez0G3JkMq61gUmyQAaCq=_TwHbi1XKzWRooxZkv08PQKuw@mail.gmail.com [2] http://lkml.kernel.org/r/CAKOZueua_v8jHCpmEtTB6f3i9e2YnmX4mqdYVWhV4E=Z-n+zRQ@mail.gmail.com Fixes: 9c276cc65a58 ("mm: introduce MADV_COLD") Reported-by: NJann Horn <jannh@google.com> Signed-off-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Acked-by: NVlastimil Babka <vbabka@suse.cz> Cc: Minchan Kim <minchan@kernel.org> Cc: Daniel Colascione <dancol@google.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: "Joel Fernandes (Google)" <joel@joelfernandes.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200312082248.GS23944@dhcp22.suse.czSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Xunlei Pang 提交于
to #26782094 Pin code section of process in memory for the corresponding VMAs like mlock does. Usage: - pin process "PID" echo PID > /proc/unevictable/add_pid - unpin it echo PID > /proc/unevictable/del_pid - show all pinned process pids cat /proc/unevictable/add_pid For easy maintenance, we place it in kernel because it has no side effect if don't use it. Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Signed-off-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Xu Yu 提交于
fix #26416752 The idle page age shown in idle_page_stats is one scan period behind the theoretical idle age. The cause is that kidled_inc_page_age returned the ancient value, instead of the latest value, which leads to not accounting in corresponding memcg. This makes kidled_inc_page_age return the increased age of the page, i.e., the latest page age, when KIDLED_AGE_NOT_IN_PAGE_FLAGS is not set. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Michal Hocko 提交于
fix #25820910 commit 93b3a674485f6a4b8ffff85d1682d5e8b7c51560 upstream. pagetypeinfo_showfree_print is called by zone->lock held in irq mode. This is not really nice because it blocks both any interrupts on that cpu and the page allocator. On large machines this might even trigger the hard lockup detector. Considering the pagetypeinfo is a debugging tool we do not really need exact numbers here. The primary reason to look at the outuput is to see how pageblocks are spread among different migratetypes and low number of pages is much more interesting therefore putting a bound on the number of pages on the free_list sounds like a reasonable tradeoff. The new output will simply tell [...] Node 6, zone Normal, type Movable >100000 >100000 >100000 >100000 41019 31560 23996 10054 3229 983 648 instead of Node 6, zone Normal, type Movable 399568 294127 221558 102119 41019 31560 23996 10054 3229 983 648 The limit has been chosen arbitrary and it is a subject of a future change should there be a need for that. While we are at it, also drop the zone lock after each free_list iteration which will help with the IRQ and page allocator responsiveness even further as the IRQ lock held time is always bound to those 100k pages. [akpm@linux-foundation.org: tweak comment text, per David Hildenbrand] Link: http://lkml.kernel.org/r/20191025072610.18526-3-mhocko@kernel.orgSigned-off-by: NMichal Hocko <mhocko@suse.com> Suggested-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NWaiman Long <longman@redhat.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NDavid Hildenbrand <david@redhat.com> Acked-by: NRafael Aquini <aquini@redhat.com> Acked-by: NDavid Rientjes <rientjes@google.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Jann Horn <jannh@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Mel Gorman <mgorman@suse.de> Cc: Roman Gushchin <guro@fb.com> Cc: Song Liu <songliubraving@fb.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 Specifically, replace `val / 1000000` with `val >> 20` to do the optimization. This also fixes the possible compiling error when building with ARCH=i386, which reports undefined reference to `__udivdi3`. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 This reworks memsli "start", "end", "update" interfaces to make it more clear and symmetrical, by merging "update" action into "end", just like what psi_memstall_{enter, leave} does. Now the latency probe pattern of memsli is as follows: memcg_lat_stat_start(&start); /* kernel codes being probed */ memcg_lat_stat_end(MEM_LAT_XXX, start); This also formats the codes and fixes the warning(s) produced when CONFIG_MEMSLI is not set. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 Enable CONFIG_MEMSLI by default. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
- 16 4月, 2020 5 次提交
-
-
由 Xu Yu 提交于
to #26424368 This introduces the new bool kconfig MEMSLI, determining whether the memsli (memory latency histogram) feature should be built-in or not. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 Since memsli also records latency histogram for swapout and swapin, which are NOT in the slow memory path, the overhead of memsli could be nonnegligible in some specific scenarios. For example, in scenarios with frequent swapping out and in, memsli could introduce overhead of ~1% of total run time of the synthetic testcase. This adds procfs interface for memsli switch. The memsli feature is enabled by default, and you can now disable it by: $ echo 0 > /proc/memsli/enabled Apparently, you can check current memsli switch status by: $ cat /proc/memsli/enabled Note that disabling memsli at runtime will NOT clear the existing latency histogram. You still need to manually reset the specified latency histogram(s) by echo 0 into the corresponding cgroup control file(s). Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 CPU hotplug may occur in some business scenarios, resulting in unavailable per-cpu memsli/exstat data on those non-present or offline CPU(s). This fixes the potential problem by using for_each_possible_cpu macro when gathering per-cpu memsli/exstat data. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 Commit 6202ab24 ("mm, memcg: throttle allocators when failing reclaim over memory.high") introduces explicit throttling when reclaim is failing to keep memcg size contained at the memory.high setting. Just account this latency on memcg direct reclaim latency histogram. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com>
-
由 Xu Yu 提交于
to #26424368 Probe and calculate the latency of global swapout, memcg swapout and swapin respectively, and then group into the latency histogram in struct mem_cgroup. Note that the latency in each memcg is aggregated from all child memcgs. Usage: $ cat memory.direct_swapout_global_latency 0-1ms: 98313 1-5ms: 0 5-10ms: 0 10-100ms: 0 100-500ms: 0 500-1000ms: 0 >=1000ms: 0 total(ms): 52 Each line is the count of global swapout within the appropriate latency range. To clear the latency histogram: $ echo 0 > memory.direct_swapout_global_latency $ cat memory.direct_swapout_global_latency 0-1ms: 0 1-5ms: 0 5-10ms: 0 10-100ms: 0 100-500ms: 0 500-1000ms: 0 >=1000ms: 0 total(ms): 0 The usage of memory.direct_swapout_memcg_latency and memory.direct_swapin_latency is the same as memory.direct_swapout_global_latency. Signed-off-by: NXu Yu <xuyu@linux.alibaba.com> Reviewed-by: NYang Shi <yang.shi@linux.alibaba.com> Reviewed-by: NXunlei Pang <xlpang@linux.alibaba.com>
-