- 26 9月, 2021 32 次提交
-
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 86750830 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Use per memcg's nr_deferred for memcg aware shrinkers. The shrinker's nr_deferred will be used in the following cases: 1. Non memcg aware shrinkers 2. !CONFIG_MEMCG 3. memcg is disabled by boot parameter Link: https://lkml.kernel.org/r/20210311190845.9708-11-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 3c6f17e6 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Currently the number of deferred objects are per shrinker, but some slabs, for example, vfs inode/dentry cache are per memcg, this would result in poor isolation among memcgs. The deferred objects typically are generated by __GFP_NOFS allocations, one memcg with excessive __GFP_NOFS allocations may blow up deferred objects, then other innocent memcgs may suffer from over shrink, excessive reclaim latency, etc. For example, two workloads run in memcgA and memcgB respectively, workload in B is vfs heavy workload. Workload in A generates excessive deferred objects, then B's vfs cache might be hit heavily (drop half of caches) by B's limit reclaim or global reclaim. We observed this hit in our production environment which was running vfs heavy workload shown as the below tracing log: <...>-409454 [016] .... 28286961.747146: mm_shrink_slab_start: super_cache_scan+0x0/0x1a0 ffff9a83046f3458: nid: 1 objects to shrink 3641681686040 gfp_flags GFP_HIGHUSER_MOVABLE|__GFP_ZERO pgs_scanned 1 lru_pgs 15721 cache items 246404277 delta 31345 total_scan 123202138 <...>-409454 [022] .... 28287105.928018: mm_shrink_slab_end: super_cache_scan+0x0/0x1a0 ffff9a83046f3458: nid: 1 unused scan count 3641681686040 new scan count 3641798379189 total_scan 602 last shrinker return val 123186855 The vfs cache and page cache ratio was 10:1 on this machine, and half of caches were dropped. This also resulted in significant amount of page caches were dropped due to inodes eviction. Make nr_deferred per memcg for memcg aware shrinkers would solve the unfairness and bring better isolation. The following patch will add nr_deferred to parent memcg when memcg offline. To preserve nr_deferred when reparenting memcgs to root, root memcg needs shrinker_info allocated too. When memcg is not enabled (!CONFIG_MEMCG or memcg disabled), the shrinker's nr_deferred would be used. And non memcg aware shrinkers use shrinker's nr_deferred all the time. Link: https://lkml.kernel.org/r/20210311190845.9708-10-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 41ca668a category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Currently registered shrinker is indicated by non-NULL shrinker->nr_deferred. This approach is fine with nr_deferred at the shrinker level, but the following patches will move MEMCG_AWARE shrinkers' nr_deferred to memcg level, so their shrinker->nr_deferred would always be NULL. This would prevent the shrinkers from unregistering correctly. Remove SHRINKER_REGISTERING since we could check if shrinker is registered successfully by the new flag. Link: https://lkml.kernel.org/r/20210311190845.9708-9-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 468ab843 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- The shrinker_info is dereferenced in a couple of places via rcu_dereference_protected with different calling conventions, for example, using mem_cgroup_nodeinfo helper or dereferencing memcg->nodeinfo[nid]->shrinker_info. And the later patch will add more dereference places. So extract the dereference into a helper to make the code more readable. No functional change. [akpm@linux-foundation.org: retain rcu_dereference_protected() in free_shrinker_info(), per Hugh] Link: https://lkml.kernel.org/r/20210311190845.9708-8-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit e4262c4f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- The following patch is going to add nr_deferred into shrinker_map, the change will make shrinker_map not only include map anymore, so rename it to "memcg_shrinker_info". And this should make the patch adding nr_deferred cleaner and readable and make review easier. Also remove the "memcg_" prefix. Link: https://lkml.kernel.org/r/20210311190845.9708-7-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 72673e86 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Using kvfree_rcu() to free the old shrinker_maps instead of call_rcu(). We don't have to define a dedicated callback for call_rcu() anymore. Link: https://lkml.kernel.org/r/20210311190845.9708-6-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit a2fb1261 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Both memcg_shrinker_map_size and shrinker_nr_max is maintained, but actually the map size can be calculated via shrinker_nr_max, so it seems unnecessary to keep both. Remove memcg_shrinker_map_size since shrinker_nr_max is also used by iterating the bit map. Link: https://lkml.kernel.org/r/20210311190845.9708-5-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit d27cf2aa category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Since memcg_shrinker_map_size just can be changed under holding shrinker_rwsem exclusively, the read side can be protected by holding read lock, so it sounds superfluous to have a dedicated mutex. Kirill Tkhai suggested use write lock since: * We want the assignment to shrinker_maps is visible for shrink_slab_memcg(). * The rcu_dereference_protected() dereferrencing in shrink_slab_memcg(), but in case of we use READ lock in alloc_shrinker_maps(), the dereferrencing is not actually protected. * READ lock makes alloc_shrinker_info() racy against memory allocation fail. alloc_shrinker_info()->free_shrinker_info() may free memory right after shrink_slab_memcg() dereferenced it. You may say shrink_slab_memcg()->mem_cgroup_online() protects us from it? Yes, sure, but this is not the thing we want to remember in the future, since this spreads modularity. And a test with heavy paging workload didn't show write lock makes things worse. Link: https://lkml.kernel.org/r/20210311190845.9708-4-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 2bfd3637 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- The shrinker map management is not purely memcg specific, it is at the intersection between memory cgroup and shrinkers. It's allocation and assignment of a structure, and the only memcg bit is the map is being stored in a memcg structure. So move the shrinker_maps handling code into vmscan.c for tighter integration with shrinker code, and remove the "memcg_" prefix. There is no functional change. Link: https://lkml.kernel.org/r/20210311190845.9708-3-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Acked-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Conflicts: mm/memcontrol.c Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yang Shi 提交于
mainline inclusion from mainline-v5.13-rc1 commit 8efb4b59 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I48N0H CVE: NA ------------------------------------------------- Patch series "Make shrinker's nr_deferred memcg aware", v10. Recently huge amount one-off slab drop was seen on some vfs metadata heavy workloads, it turned out there were huge amount accumulated nr_deferred objects seen by the shrinker. On our production machine, I saw absurd number of nr_deferred shown as the below tracing result: <...>-48776 [032] .... 27970562.458916: mm_shrink_slab_start: super_cache_scan+0x0/0x1a0 ffff9a83046f3458: nid: 0 objects to shrink 2531805877005 gfp_flags GFP_HIGHUSER_MOVABLE pgs_scanned 32 lru_pgs 9300 cache items 1667 delta 11 total_scan 833 There are 2.5 trillion deferred objects on one node, assuming all of them are dentry (192 bytes per object), so the total size of deferred on one node is ~480TB. It is definitely ridiculous. I managed to reproduce this problem with kernel build workload plus negative dentry generator. First step, run the below kernel build test script: NR_CPUS=`cat /proc/cpuinfo | grep -e processor | wc -l` cd /root/Buildarea/linux-stable for i in `seq 1500`; do cgcreate -g memory:kern_build echo 4G > /sys/fs/cgroup/memory/kern_build/memory.limit_in_bytes echo 3 > /proc/sys/vm/drop_caches cgexec -g memory:kern_build make clean > /dev/null 2>&1 cgexec -g memory:kern_build make -j$NR_CPUS > /dev/null 2>&1 cgdelete -g memory:kern_build done Then run the below negative dentry generator script: NR_CPUS=`cat /proc/cpuinfo | grep -e processor | wc -l` mkdir /sys/fs/cgroup/memory/test echo $$ > /sys/fs/cgroup/memory/test/tasks for i in `seq $NR_CPUS`; do while true; do FILE=`head /dev/urandom | tr -dc A-Za-z0-9 | head -c 64` cat $FILE 2>/dev/null done & done Then kswapd will shrink half of dentry cache in just one loop as the below tracing result showed: kswapd0-475 [028] .... 305968.252561: mm_shrink_slab_start: super_cache_scan+0x0/0x190 0000000024acf00c: nid: 0 objects to shrink 4994376020 gfp_flags GFP_KERNEL cache items 93689873 delta 45746 total_scan 46844936 priority 12 kswapd0-475 [021] .... 306013.099399: mm_shrink_slab_end: super_cache_scan+0x0/0x190 0000000024acf00c: nid: 0 unused scan count 4994376020 new scan count 4947576838 total_scan 8 last shrinker return val 46844928 There were huge number of deferred objects before the shrinker was called, the behavior does match the code but it might be not desirable from the user's stand of point. The excessive amount of nr_deferred might be accumulated due to various reasons, for example: * GFP_NOFS allocation * Significant times of small amount scan (< scan_batch, 1024 for vfs metadata) However the LRUs of slabs are per memcg (memcg-aware shrinkers) but the deferred objects is per shrinker, this may have some bad effects: * Poor isolation among memcgs. Some memcgs which happen to have frequent limit reclaim may get nr_deferred accumulated to a huge number, then other innocent memcgs may take the fall. In our case the main workload was hit. * Unbounded deferred objects. There is no cap for deferred objects, it can outgrow ridiculously as the tracing result showed. * Easy to get out of control. Although shrinkers take into account deferred objects, but it can go out of control easily. One misconfigured memcg could incur absurd amount of deferred objects in a period of time. * Sort of reclaim problems, i.e. over reclaim, long reclaim latency, etc. There may be hundred GB slab caches for vfe metadata heavy workload, shrink half of them may take minutes. We observed latency spike due to the prolonged reclaim. These issues also have been discussed in https://lore.kernel.org/linux-mm/20200916185823.5347-1-shy828301@gmail.com/. The patchset is the outcome of that discussion. So this patchset makes nr_deferred per-memcg to tackle the problem. It does: * Have memcg_shrinker_deferred per memcg per node, just like what shrinker_map does. Instead it is an atomic_long_t array, each element represent one shrinker even though the shrinker is not memcg aware, this simplifies the implementation. For memcg aware shrinkers, the deferred objects are just accumulated to its own memcg. The shrinkers just see nr_deferred from its own memcg. Non memcg aware shrinkers still use global nr_deferred from struct shrinker. * Once the memcg is offlined, its nr_deferred will be reparented to its parent along with LRUs. * The root memcg has memcg_shrinker_deferred array too. It simplifies the handling of reparenting to root memcg. * Cap nr_deferred to 2x of the length of lru. The idea is borrowed from Dave Chinner's series (https://lore.kernel.org/linux-xfs/20191031234618.15403-1-david@fromorbit.com/) The downside is each memcg has to allocate extra memory to store the nr_deferred array. On our production environment, there are typically around 40 shrinkers, so each memcg needs ~320 bytes. 10K memcgs would need ~3.2MB memory. It seems fine. We have been running the patched kernel on some hosts of our fleet (test and production) for months, it works very well. The monitor data shows the working set is sustained as expected. This patch (of 13): The tracepoint's nid should show what node the shrink happens on, the start tracepoint uses nid from shrinkctl, but the nid might be set to 0 before end tracepoint if the shrinker is not NUMA aware, so the tracing log may show the shrink happens on one node but end up on the other node. It seems confusing. And the following patch will remove using nid directly in do_shrink_slab(), this patch also helps cleanup the code. Link: https://lkml.kernel.org/r/20210311190845.9708-1-shy828301@gmail.com Link: https://lkml.kernel.org/r/20210311190845.9708-2-shy828301@gmail.comSigned-off-by: NYang Shi <shy828301@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill Tkhai <ktkhai@virtuozzo.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NRoman Gushchin <guro@fb.com> Cc: Dave Chinner <david@fromorbit.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NChen Wandun <chenwandun@huawei.com> Reviewed-by: NTong Tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 chenguangli 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I47Z9P ----------------------------------------------------------------------- The memory will be leaked when driver set the port_poto_cfg invalid value. Signed-off-by: Nchenguangli <chenguangli2@huawei.com> Reviewed-by: Nbaowenyi <baowenyi@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I477YJ?from=project-issue ---------------------------------------------------------------------- Add one page memory for device or qp status. Set a qp error flag in this page when device resetting. This error flag can be seen in the userspace. It helps users to stop tasks when device hardware occurs on the device. Signed-off-by: NKai Ye <yekai13@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Longfang Liu 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I473Q4?from=project-issue ---------------------------------------------------------------------- This driver will provide all device operations as long as it's probed successfully by vfio-pci driver. Signed-off-by: NLongfang Liu <liulongfang@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Longfang Liu 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I473Q4?from=project-issue ---------------------------------------------------------------------- Add device status recording function for accelerator live migration driver. Signed-off-by: NLongfang Liu <liulongfang@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Longfang Liu 提交于
driver inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I473Q4?from=project-issue ---------------------------------------------------------------------- vfio_pci_vendor_driver_ops includes these parts: (1) .probe() and .remove() interface to be called by vfio_pci_probe() and vfio_pci_remove(). (2) pointer to struct vfio_device_ops. It will be registered as ops of vfio device if .probe() succeeds. (3) vendor modules call macro module_vfio_pci_register_vendor_handler to generate module_init and module_exit. (4) export functions vfio_pci_vendor_data(), vfio_pci_irq_type(), vfio_pci_num_regions(), vfio_pci_pdev(), and functions in vfio_pci_ops, so they are able to be called from outside modules and make them a kind of inherited by vfio_device_ops provided by vendor modules (5) allows a simpler VFIO_DEVICE_GET_INFO ioctl in vendor driver, let vfio_pci know number of vendor regions and vendor irqs (6) allows vendor driver to read/write to bars directly which is useful in security checking condition. (7) allows vendor driver triggers this VFIO_IRQ_TYPE_REMAP_BAR_REGION when it wants to notify userspace to remap PCI BARs. Signed-off-by: NLongfang Liu <liulongfang@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhangfei Gao 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I472UK?from=project-issue ---------------------------------------------------------------------- HiSilicon KunPeng920 and KunPeng930 have devices appear as PCI but are actually on the AMBA bus. These fake PCI devices can support SVA via SMMU stall feature, by setting dma-can-stall for ACPI platforms. Signed-off-by: NZhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: NJean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: NZhou Wang <wangzhou1@hisilicon.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhangfei Gao 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I472UK?from=project-issue ---------------------------------------------------------------------- HiSilicon KunPeng920 and KunPeng930 have devices appear as PCI but are actually on the AMBA bus. These fake PCI devices have PASID capability though not supporting TLP. Add a quirk to set pasid_no_tlp for these devices. Signed-off-by: NZhangfei Gao <zhangfei.gao@linaro.org> Signed-off-by: NJean-Philippe Brucker <jean-philippe@linaro.org> Signed-off-by: NZhou Wang <wangzhou1@hisilicon.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Zhangfei Gao 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I472UK?from=project-issue ---------------------------------------------------------------------- A PASID-like feature is implemented on AMBA without using TLP prefixes and these devices have PASID capability though not supporting TLP. Adding a pasid_no_tlp bit for "PASID works without TLP prefixes" and pci_enable_pasid() checks pasid_no_tlp as well as eetlp_prefix_path. Suggested-by: NBjorn Helgaas <helgaas@kernel.org> Signed-off-by: NZhangfei Gao <zhangfei.gao@linaro.org> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4750I?from=project-issue ---------------------------------------------------------------------- The CTR counter is 32bit rollover default on the BD. But the NIST standard is 128bit rollover. it cause the testing failed, so need to fix the BD configuration. Signed-off-by: NKai Ye <yekai13@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4750I?from=project-issue ---------------------------------------------------------------------- Fix the maximum length of AAD for the CCM mode due to the hardware limited. Signed-off-by: NKai Ye <yekai13@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
driver inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I4751W?from=project-issue ---------------------------------------------------------------------- Fixup icv(integrity check value) checking enabled wrong on Kunpeng 930 Signed-off-by: NKai Ye <yekai13@huawei.com> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit 3e1d2c52 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=3e1d2c52b2045ba7f90966b02daeb6c438432570 ---------------------------------------------------------------------- To support runtime PM, use the function 'pci_set_power_state' to change the power state. Therefore, method _PS0 or _PR0 needs to be filled by platform. So check whether the method is supported, if not, print a prompt information. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit 74f5edbf category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=74f5edbffcd37162084b6883e059bb6bb686151d ---------------------------------------------------------------------- To avoid repeatedly obtaining 'qm' from 'filp', parameter passing of debugfs function directly use 'qm' instead of 'filp'. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit 607c191b category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=607c191b371d72952c11dc209e583303a4515f14 ---------------------------------------------------------------------- Add runtime PM support for Kunpeng930 accelerator device. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit d7ea5339 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=d7ea53395b723b1a87b9c0afb3301cc33fbe35e6 ---------------------------------------------------------------------- Accelerator devices support runtime PM to reduce power consumption. This patch adds the runtime PM suspend/resume callbacks to the accelerator devices. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit 1295292d category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=1295292d65b729fc8b234fcdf884d79ff5a63ca1 ---------------------------------------------------------------------- The accelerator devices support runtime PM, when device is in suspended, an exception will occur if reading registers. Therefore, this patch uses 'debugfs_create_file' instead of 'debugfs_create_regset32' to create debugfs file, and then the driver can get the device status before reading the register. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
mainline inclusion from mainline-master commit a5262610 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I475XT?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=a52626106d6f7edf3d106c065e13a0313cfeb82f ---------------------------------------------------------------------- When the endian configuration of the hardware is abnormal, it will cause the SEC engine is faulty that reports empty message. And it will affect the normal function of the hardware. Currently the soft configuration method can't restore the faulty device. The endian needs to be configured according to the system properties. So fix it. Signed-off-by: NKai Ye <yekai13@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
mainline inclusion from mainline-master commit 90367a02 category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I475XT?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/?id=90367a027a22c3a9ca8b8bac15df34d9e859fc11 ---------------------------------------------------------------------- Because the algs registration process has added a judgment. So need to add the judgment for the abnormal exiting process. Signed-off-by: NKai Ye <yekai13@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit ea5202df category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/drivers/crypto/hisilicon?id=ea5202dff79ce23e1a9fee3e1b2f09e28b77ba3a ---------------------------------------------------------------------- Kunpeng930 hpre device supports dynamic clock gating. When doing tasks, the algorithm core is opened, and when idle, the algorithm core is closed. This patch enables hpre dynamic clock gating by writing hardware registers. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit 3d845d49 category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/drivers/crypto/hisilicon?id=3d845d497b23547150fe7f9b3261ead9f4295686 ---------------------------------------------------------------------- Kunpeng930 sec device supports dynamic clock gating. When doing tasks, the algorithm core is opened, and when idle, the algorithm core is closed. This patch enables sec dynamic clock gating by writing hardware registers. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Weili Qian 提交于
mainline inclusion from mainline-master commit ed5fa39f category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I475PF?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/herbert/cryptodev-2.6.git/commit/drivers/crypto/hisilicon?id=ed5fa39fa8a62fc55c1c4d53b71f3f4f08a90d22 ---------------------------------------------------------------------- Kunpeng930 zip device supports dynamic clock gating. When executing tasks, the algorithm core is opened, and when idle, the algorithm core is closed. This patch enables zip dynamic clock gating by writing hardware registers. Signed-off-by: NWeili Qian <qianweili@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Kai Ye 提交于
mainline inclusion from mainline-master commit 66192b2e category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I475XT?from=project-issue CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/drivers/crypto/hisilicon?id=66192b2e3fd8ab97ed518d6c0240e26655a20b4b ---------------------------------------------------------------------- The open interface of the sva prefetching function is distinguish the chip version. But the close interface of the sva prefetching function doesn't distinguish the chip version. As a result, the sva prefetching close operation is also performed on Kunpeng920, those registers are important on Kunpeng920, which eventually leads to abnormal hardware problems. So need to fix it immediately. Signed-off-by: NKai Ye <yekai13@huawei.com> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NHao Fang <fanghao11@huawei.com> Reviewed-by: NMingqiang Ling <lingmingqiang@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
- 02 9月, 2021 8 次提交
-
-
由 Mel Gorman 提交于
mainline inclusion from mainline-5.14-rc1 commit ff4b2b40 category: bugfix bugzilla: https://gitee.com/src-openeuler/nfs-utils/issues/I46NSS CVE: NA Reference: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=ff4b2b4014cbffb3d32b22629252f4dc8616b0fe ------------------------------------------------- Dave Jones reported the following This made it into 5.13 final, and completely breaks NFSD for me (Serving tcp v3 mounts). Existing mounts on clients hang, as do new mounts from new clients. Rebooting the server back to rc7 everything recovers. The commit b3b64ebd ("mm/page_alloc: do bulk array bounds check after checking populated elements") returns the wrong value if the array is already populated which is interpreted as an allocation failure. Dave reported this fixes his problem and it also passed a test running dbench over NFS. Link: https://lkml.kernel.org/r/20210628150219.GC3840@techsingularity.net Fixes: b3b64ebd ("mm/page_alloc: do bulk array bounds check after checking populated elements") Signed-off-by: NMel Gorman <mgorman@techsingularity.net> Reported-by: NDave Jones <davej@codemonkey.org.uk> Tested-by: NDave Jones <davej@codemonkey.org.uk> Cc: Dan Carpenter <dan.carpenter@oracle.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> [5.13+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> (cherry picked from commit ff4b2b40) Signed-off-by: NYongqiang Liu <liuyongqiang13@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Jing Xiangfeng 提交于
hulk inclusion category: bugfix bugzilla: https://gitee.com/openeuler/kernel/issues/I46IUJ CVE: NA -------------------------------------- In oom_next_task(), if points is equal to LONG_MIN, then we choose it to kill. That is not correct. LONG_MIN means to disable killing it. So fix it. Fixes: 4da32073 ("memcg: support priority for oom") Signed-off-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: NChen Wandun <chenwandun@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Xiongfeng Wang 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40AXF CVE: NA -------------------------------------- Enable CONFIG_USERSWAP for openeuler_defconfig Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Ntong tiangen <tongtiangen@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40JRR CVE: NA -------------------------------------- Enable eulerfs as module by default. Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40JRR CVE: NA -------------------------------------- Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40JRR CVE: NA -------------------------------------- Implement super_operations and module_init/exit interfaces. Signed-off-by: NMingkai Dong <dongmingkai1@huawei.com> Signed-off-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40JRR CVE: NA -------------------------------------- Implement get_link, setattr and listxattr interfaces for symlink inode operations. Signed-off-by: NMingkai Dong <dongmingkai1@huawei.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-
由 Yu Kuai 提交于
hulk inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I40JRR CVE: NA -------------------------------------- Impelement file_operations for dir inode. Signed-off-by: NMingkai Dong <dongmingkai1@huawei.com> Signed-off-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZhikang Zhang <zhangzhikang1@huawei.com> Signed-off-by: NYu Kuai <yukuai3@huawei.com> Reviewed-by: NHou Tao <houtao1@huawei.com> Signed-off-by: NZheng Zengkai <zhengzengkai@huawei.com>
-