- 25 2月, 2021 40 次提交
-
-
由 Muchun Song 提交于
We use a global percpu int_active_memcg variable to store the remote memcg when we are in the interrupt context. But get_active_memcg always return the current->active_memcg or root_mem_cgroup. The remote memcg (set in the interrupt context) is ignored. This is not what we want. So fix it. Link: https://lkml.kernel.org/r/20210223091101.42150-1-songmuchun@bytedance.com Fixes: 37d5985c ("mm: kmem: prepare remote memcg charging infra for interrupt contexts") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
When pages are swapped in, the VM may retain the swap copy to avoid repeated writes in the future. It's also retained if shared pages are faulted back in some processes, but not in others. During that time we have an in-memory copy of the page, as well as an on-swap copy. Cgroup1 and cgroup2 handle these overlapping lifetimes slightly differently due to the nature of how they account memory and swap: Cgroup1 has a unified memory+swap counter that tracks a data page regardless whether it's in-core or swapped out. On swapin, we transfer the charge from the swap entry to the newly allocated swapcache page, even though the swap entry might stick around for a while. That's why we have a mem_cgroup_uncharge_swap() call inside mem_cgroup_charge(). Cgroup2 tracks memory and swap as separate, independent resources and thus has split memory and swap counters. On swapin, we charge the newly allocated swapcache page as memory, while the swap slot in turn must remain charged to the swap counter as long as its allocated too. The cgroup2 logic was broken by commit 2d1c4980 ("mm: memcontrol: make swap tracking an integral part of memory control"), because it accidentally removed the do_memsw_account() check in the branch inside mem_cgroup_uncharge() that was supposed to tell the difference between the charge transfer in cgroup1 and the separate counters in cgroup2. As a result, cgroup2 currently undercounts retained swap to varying degrees: swap slots are cached up to 50% of the configured limit or total available swap space; partially faulted back shared pages are only limited by physical capacity. This in turn allows cgroups to significantly overconsume their alloted swap space. Add the do_memsw_account() check back to fix this problem. Link: https://lkml.kernel.org/r/20210217153237.92484-1-songmuchun@bytedance.com Fixes: 2d1c4980 ("mm: memcontrol: make swap tracking an integral part of memory control") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: <stable@vger.kernel.org> [5.8+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
alloc_page_buffers() currently uses get_mem_cgroup_from_page() for charging the buffers to the page owner, which does an rcu-protected page->memcg lookup and acquires a reference. But buffer allocation has the page lock held throughout, which pins the page to the memcg and thereby the memcg - neither rcu nor holding an extra reference during the allocation are necessary. Use a raw page_memcg() instead. This was the last user of get_mem_cgroup_from_page(), delete it. Link: https://lkml.kernel.org/r/20210209190126.97842-1-hannes@cmpxchg.orgSigned-off-by: NJohannes Weiner <hannes@cmpxchg.org> Reported-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NRoman Gushchin <guro@fb.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shakeel Butt 提交于
The list_lru file used to have local kvfree_rcu() which was renamed by commit e0feed08 ("mm/list_lru.c: Rename kvfree_rcu() to local variant") to introduce the globally visible kvfree_rcu(). Now we have global kvfree_rcu(), so remove the local kvfree_rcu_local() and just use the global one. Link: https://lkml.kernel.org/r/20210207152148.1285842-1-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Reviewed-by: NUladzislau Rezki <urezki@gmail.com> Reviewed-by: NKirill Tkhai <ktkhai@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
The rule of list walk has gone since commit a9d5adee ("mm/memcontrol: allow to uncharge page without using page->lru field") So remove the strange comment and replace the loop with a list_for_each_entry(). There is only one caller of the uncharge_list(). So just fold it into mem_cgroup_uncharge_list() and remove it. Link: https://lkml.kernel.org/r/20210204163055.56080-1-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Li 提交于
Fix below warnings reported by coccicheck: mm/memcontrol.c:451:3-9: WARNING: NULL check before some freeing functions is not needed. Link: https://lkml.kernel.org/r/1611216029-34397-1-git-send-email-abaci-bugfix@linux.alibaba.comSigned-off-by: NYang Li <abaci-bugfix@linux.alibaba.com> Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Feng Tang 提交于
When checking a memory cgroup related performance regression [1], from the perf c2c profiling data, we found high false sharing for accessing 'usage' and 'parent'. On 64 bit system, the 'usage' and 'parent' are close to each other, and easy to be in one cacheline (for cacheline size == 64+ B). 'usage' is usally written, while 'parent' is usually read as the cgroup's hierarchical counting nature. So move the 'parent' to the end of the structure to make sure they are in different cache lines. Following are some performance data with the patch, against v5.11-rc1. [ In the data, A means a platform with 2 sockets 48C/96T, B is a platform of 4 sockests 72C/144T, and if a %stddev will be shown bigger than 2%, P100/P50 means number of test tasks equals to 100%/50% of nr_cpu] will-it-scale/malloc1 --------------------- v5.11-rc1 v5.11-rc1+patch A-P100 15782 ± 2% -0.1% 15765 ± 3% will-it-scale.per_process_ops A-P50 21511 +8.9% 23432 will-it-scale.per_process_ops B-P100 9155 +2.2% 9357 will-it-scale.per_process_ops B-P50 10967 +7.1% 11751 ± 2% will-it-scale.per_process_ops will-it-scale/pagefault2 ------------------------ v5.11-rc1 v5.11-rc1+patch A-P100 79028 +3.0% 81411 will-it-scale.per_process_ops A-P50 183960 ± 2% +4.4% 192078 ± 2% will-it-scale.per_process_ops B-P100 85966 +9.9% 94467 ± 3% will-it-scale.per_process_ops B-P50 198195 +9.8% 217526 will-it-scale.per_process_ops fio (4k/1M is block size) ------------------------- v5.11-rc1 v5.11-rc1+patch A-P50-r-4k 16881 ± 2% +1.2% 17081 ± 2% fio.read_bw_MBps A-P50-w-4k 3931 +4.5% 4111 ± 2% fio.write_bw_MBps A-P50-r-1M 15178 -0.2% 15154 fio.read_bw_MBps A-P50-w-1M 3924 +0.1% 3929 fio.write_bw_MBps [1].https://lore.kernel.org/lkml/20201102091543.GM31092@shao2-debian/ Link: https://lkml.kernel.org/r/1611040814-33449-1-git-send-email-feng.tang@intel.comSigned-off-by: NFeng Tang <feng.tang@intel.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roman Gushchin 提交于
I've noticed that __memcg_kmem_charge() and __memcg_kmem_uncharge() are not used anywhere except memcontrol.c. Yet they are not declared as non-static and are declared in memcontrol.h. This patch makes them static. Link: https://lkml.kernel.org/r/20210108020332.4096911-1-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Shakeel Butt 提交于
This patch adds swapcache stat for the cgroup v2. The swapcache represents the memory that is accounted against both the memory and the swap limit of the cgroup. The main motivation behind exposing the swapcache stat is for enabling users to gracefully migrate from cgroup v1's memsw counter to cgroup v2's memory and swap counters. Cgroup v1's memsw limit allows users to limit the memory+swap usage of a workload but without control on the exact proportion of memory and swap. Cgroup v2 provides separate limits for memory and swap which enables more control on the exact usage of memory and swap individually for the workload. With some little subtleties, the v1's memsw limit can be switched with the sum of the v2's memory and swap limits. However the alternative for memsw usage is not yet available in cgroup v2. Exposing per-cgroup swapcache stat enables that alternative. Adding the memory usage and swap usage and subtracting the swapcache will approximate the memsw usage. This will help in the transparent migration of the workloads depending on memsw usage and limit to v2' memory and swap counters. The reasons these applications are still interested in this approximate memsw usage are: (1) these applications are not really interested in two separate memory and swap usage metrics. A single usage metric is more simple to use and reason about for them. (2) The memsw usage metric hides the underlying system's swap setup from the applications. Applications with multiple instances running in a datacenter with heterogeneous systems (some have swap and some don't) will keep seeing a consistent view of their usage. [akpm@linux-foundation.org: fix CONFIG_SWAP=n build] Link: https://lkml.kernel.org/r/20210108155813.2914586-3-shakeelb@google.comSigned-off-by: NShakeel Butt <shakeelb@google.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alex Shi 提交于
lock_page_lruvec() and its variants used rcu_read_lock() with the intention of safeguarding against the mem_cgroup being destroyed concurrently; but so long as they are called under the specified conditions (as they are), there is no way for the page's mem_cgroup to be destroyed. Delete the unnecessary rcu_read_lock() and _unlock(). Hugh Dickins polished the commit log. Thanks a lot! Link: https://lkml.kernel.org/r/1608614453-10739-2-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: NAlex Shi <alex.shi@linux.alibaba.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Alex Shi 提交于
lock_page_lruvec() and its variants are safe to use under the same conditions as commit_charge(): add lock_page_memcg() to the comment. Polished with Hugh Dickins' suggestions, thanks! Link: https://lkml.kernel.org/r/1608614453-10739-1-git-send-email-alex.shi@linux.alibaba.comSigned-off-by: NAlex Shi <alex.shi@linux.alibaba.com> Acked-by: NHugh Dickins <hughd@google.com> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Although the ratio of the slab is one, we also should read the ratio from the related memory_stats instead of hard-coding. And the local variable of size is already the value of slab_unreclaimable. So we do not need to read again. To do this we need some code like below: if (unlikely(memory_stats[i].idx == NR_SLAB_UNRECLAIMABLE_B)) { - size = memcg_page_state(memcg, NR_SLAB_RECLAIMABLE_B) + - memcg_page_state(memcg, NR_SLAB_UNRECLAIMABLE_B); + VM_BUG_ON(i < 1); + VM_BUG_ON(memory_stats[i - 1].idx != NR_SLAB_RECLAIMABLE_B); + size += memcg_page_state(memcg, memory_stats[i - 1].idx) * + memory_stats[i - 1].ratio; It requires a series of VM_BUG_ONs or comments to ensure these two items are actually adjacent and in the right order. So it would probably be easier to implement this using a wrapper that has a big switch() for unit conversion. More details about this discussion can refer to: https://lore.kernel.org/patchwork/patch/1348611/ This would fix the ratio inconsistency and get rid of the order guarantee. Link: https://lkml.kernel.org/r/20201228164110.2838-8-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_FILE_PMDMAPPED account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-7-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_PMDMAPPED account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-6-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_SHMEM_THPS account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-5-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with if hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_FILE_THPS account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-4-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Michal Hocko <mhocko@suse.com> Cc: NeilBrown <neilb@suse.de> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Currently we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. So we convert the NR_ANON_THPS account to pages. This patch is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Doing this also can make the unit of vmstat counters more unified. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. Link: https://lkml.kernel.org/r/20201228164110.2838-3-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Hugh Dickins <hughd@google.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Feng Tang <feng.tang@intel.com> Cc: NeilBrown <neilb@suse.de> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Pankaj Gupta <pankaj.gupta@cloud.ionos.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
Patch series "Convert all THP vmstat counters to pages", v6. This patch series is aimed to convert all THP vmstat counters to pages. The unit of some vmstat counters are pages, some are bytes, some are HPAGE_PMD_NR, and some are KiB. When we want to expose these vmstat counters to the userspace, we have to know the unit of the vmstat counters is which one. When the unit is bytes or kB, both clearly distinguishable by the B/KB suffix. But for the THP vmstat counters, we may make mistakes. For example, the below is some bug fix for the THP vmstat counters: - 7de2e9f1 ("mm: memcontrol: correct the NR_ANON_THPS counter of hierarchical memcg") - The first commit in this series ("fix NR_ANON_THPS accounting in charge moving") This patch series can make the code clear. And make all the unit of the THP vmstat counters in pages. Finally, the unit of the vmstat counters are pages, kB and bytes. The B/KB suffix can tell us that the unit is bytes or kB. The rest which is without suffix are pages. In this series, I changed the following vmstat counters unit from HPAGE_PMD_NR to pages. However, there is no change to the print format of output to user space. - NR_ANON_THPS - NR_FILE_THPS - NR_SHMEM_THPS - NR_SHMEM_PMDMAPPED - NR_FILE_PMDMAPPED Doing this also can make the statistics more accuracy for the THP vmstat counters. This series is consistent with 8f182270 ("mm/swap.c: flush lru pvecs on compound page arrival"). Because we use struct per_cpu_nodestat to cache the vmstat counters, which leads to inaccurate statistics especially THP vmstat counters. In the systems with hundreds of processors it can be GBs of memory. For example, for a 96 CPUs system, the threshold is the maximum number of 125. And the per cpu counters can cache 23.4375 GB in total. The THP page is already a form of batched addition (it will add 512 worth of memory in one go) so skipping the batching seems like sensible. Although every THP stats update overflows the per-cpu counter, resorting to atomic global updates. But it can make the statistics more accuracy for the THP vmstat counters. From this point of view, I think that do this converting is reasonable. Thanks Hugh for mentioning this. This was inspired by Johannes and Roman. Thanks to them. This patch (of 7): The unit of NR_ANON_THPS is HPAGE_PMD_NR already. So it should inc/dec by one rather than nr_pages. Link: https://lkml.kernel.org/r/20201228164110.2838-1-songmuchun@bytedance.com Link: https://lkml.kernel.org/r/20201228164110.2838-2-songmuchun@bytedance.com Fixes: 468c3982 ("mm: memcontrol: switch to native NR_ANON_THPS counter") Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NMichal Hocko <mhocko@suse.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NPankaj Gupta <pankaj.gupta@cloud.ionos.com> Reviewed-by: NRoman Gushchin <guro@fb.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Hugh Dickins <hughd@google.com> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: NeilBrown <neilb@suse.de> Cc: Rafael. J. Wysocki <rafael@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Muchun Song 提交于
The vmstat threshold is 32 (MEMCG_CHARGE_BATCH), Actually the threshold can be as big as MEMCG_CHARGE_BATCH * PAGE_SIZE. It still fits into s32. So introduce struct batched_lruvec_stat to optimize memory usage. The size of struct lruvec_stat is 304 bytes on 64 bit systems. As it is a per-cpu structure. So with this patch, we can save 304 / 2 * ncpu bytes per-memcg per-node where ncpu is the number of the possible CPU. If there are c memory cgroup (include dying cgroup) and n NUMA node in the system. Finally, we can save (152 * ncpu * c * n) bytes. [akpm@linux-foundation.org: fix typo in comment] Link: https://lkml.kernel.org/r/20201210042121.39665-1-songmuchun@bytedance.comSigned-off-by: NMuchun Song <songmuchun@bytedance.com> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Shakeel Butt <shakeelb@google.com> Cc: Roman Gushchin <guro@fb.com> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Cc: Chris Down <chris@chrisdown.name> Cc: Yafang Shao <laoar.shao@gmail.com> Cc: Wei Yang <richard.weiyang@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Roman Gushchin 提交于
In general it's unknown in advance if a slab page will contain accounted objects or not. In order to avoid memory waste, an obj_cgroup vector is allocated dynamically when a need to account of a new object arises. Such approach is memory efficient, but requires an expensive cmpxchg() to set up the memcg/objcgs pointer, because an allocation can race with a different allocation on another cpu. But in some common cases it's known for sure that a slab page will contain accounted objects: if the page belongs to a slab cache with a SLAB_ACCOUNT flag set. It includes such popular objects like vm_area_struct, anon_vma, task_struct, etc. In such cases we can pre-allocate the objcgs vector and simple assign it to the page without any atomic operations, because at this early stage the page is not visible to anyone else. A very simplistic benchmark (allocating 10000000 64-bytes objects in a row) shows ~15% win. In the real life it seems that most workloads are not very sensitive to the speed of (accounted) slab allocations. [guro@fb.com: open-code set_page_objcgs() and add some comments, by Johannes] Link: https://lkml.kernel.org/r/20201113001926.GA2934489@carbon.dhcp.thefacebook.com [akpm@linux-foundation.org: fix it for mm-slub-call-account_slab_page-after-slab-page-initialization-fix.patch] Link: https://lkml.kernel.org/r/20201110195753.530157-2-guro@fb.comSigned-off-by: NRoman Gushchin <guro@fb.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Reviewed-by: NShakeel Butt <shakeelb@google.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Christoph Lameter <cl@linux.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yu Zhao 提交于
We are capable of SetPageWorkingset based on refault distances after commit aae466b0 ("mm/swap: implement workingset detection for anonymous LRU"). This is done by workingset_refault(), which is right above the unconditional SetPageWorkingset deleted by this patch. The unconditional SetPageWorkingset miscategorizes pages that are read ahead or never belonged to the working set (e.g., tmpfs pages accessed only once by fd). When those pages are swapped in (after they were swapped out) for the first time, they skew PSI (when using async swap). When this happens again, depending on their refault distances, they might skew workingset_restore_anon counter in addition to PSI because their shadows indicate they were part of the working set. Historically, SetPageWorkingset was added as part of the PSI series, and Johannes said: "It was meant to mark incoming pages under IO with SetPageWorkingset when waiting for them constituted a memory stall. On the page cache side, because we HAVE workingset detection, this was specific to recently evicted pages that had been active in their previous life. On the anon side, the aging algorithm had no distinction between workingset and sporadically used pages. Given the choice between a) no swapin stalls are pressure and b) all swapin stalls are pressure, I went with the latter in order to detect swap storms. The false positive case - high rate of swapin without severe memory pressure - was relatively unlikely, because we tried to avoid swapping until everything was completely on fire in the first place." Link: https://lkml.kernel.org/r/20201209012400.1771150-1-yuzhao@google.com Link: https://lkml.kernel.org/r/20201214231253.62313-1-yuzhao@google.comSigned-off-by: NYu Zhao <yuzhao@google.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Rikard Falkeborn 提交于
The only usage of swap_attr_group is to pass its address to sysfs_create_group() which takes a pointer to const attribute_group. Make it const to allow the compiler to put it in read-only memory. Link: https://lkml.kernel.org/r/20210201233254.91809-1-rikard.falkeborn@gmail.comSigned-off-by: NRikard Falkeborn <rikard.falkeborn@gmail.com> Reviewed-by: NAmy Parker <enbyamy@gmail.com> Acked-by: N"Huang, Ying" <ying.huang@intel.com> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Georgi Djakov 提交于
If there are errors during swap read or write, they can easily fill the log buffer and remove any previous messages that might be useful for debugging, especially on systems that rely for logging only on the kernel ring-buffer. For example, on a systems using zram as swap, we are more likely to see any page allocation errors preceding the swap write errors if the alerts are ratelimited. Link: https://lkml.kernel.org/r/20210201142055.29068-1-georgi.djakov@linaro.orgSigned-off-by: NGeorgi Djakov <georgi.djakov@linaro.org> Acked-by: NMinchan Kim <minchan@kernel.org> Reviewed-by: NMiaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Stephen Zhang 提交于
Once the function name is changed, it may be easy to forget to modify the corresponding code here. Link: https://lkml.kernel.org/r/1611369120-2276-1-git-send-email-stephenzhangzsd@gmail.comSigned-off-by: NStephen Zhang <stephenzhangzsd@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Li 提交于
Fix below warnings reported by coccicheck: mm/swap_slots.c:197:3-9: WARNING: NULL check before some freeing functions is not needed. Link: https://lkml.kernel.org/r/1611214229-11225-1-git-send-email-abaci-bugfix@linux.alibaba.comSigned-off-by: NYang Li <abaci-bugfix@linux.alibaba.com> Reported-by: NAbaci Robot <abaci@linux.alibaba.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NDavid Hildenbrand <david@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Baolin Wang 提交于
Move the K() macro a little forward to remove the same macro definition. Link: https://lkml.kernel.org/r/d1ccdf2d3116dce9814f2bcc1f0415ecb4c76ea5.1612862230.git.baolin.wang@linux.alibaba.comSigned-off-by: NBaolin Wang <baolin.wang@linux.alibaba.com> Reviewed-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Yang Guo 提交于
clear_buffer_new() is used to clear buffer new stat. When PAGE_SIZE is 64K, most buffer heads in the list are not needed to clear. clear_buffer_new() has an enpensive atomic modification operation, Let's add checking buffer head before clear it as __block_write_begin_int does which is good for performance. Link: https://lkml.kernel.org/r/1612332890-57918-1-git-send-email-zhangshaokun@hisilicon.comSigned-off-by: NYang Guo <guoyang2@huawei.com> Signed-off-by: NShaokun Zhang <zhangshaokun@hisilicon.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Nick Piggin <npiggin@suse.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Avoid the pointless goto out just for returning retval. Link: https://lkml.kernel.org/r/20210122160140.223228-19-willy@infradead.orgSigned-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Christoph Hellwig 提交于
Rename generic_file_buffered_read to match the naming of filemap_fault, also update the written parameter to a more descriptive name and improve the kerneldoc comment. Link: https://lkml.kernel.org/r/20210122160140.223228-18-willy@infradead.orgSigned-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
We don't need to get the page lock again; we just need to wait for the I/O to finish, so use wait_on_page_locked_killable() like the other callers of ->readpage. Link: https://lkml.kernel.org/r/20210122160140.223228-17-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Remove the got_pages label, remove indentation, rename find_page to retry, simplify error handling. Link: https://lkml.kernel.org/r/20210122160140.223228-16-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
This simplifies the error handling. Link: https://lkml.kernel.org/r/20210122160140.223228-15-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Move the complicated condition and the calculations out of filemap_update_page() into its own function. [willy@infradead.org: unlock page before dropping its refcount] Link: https://lkml.kernel.org/r/20210201125229.GO308988@casper.infradead.org Link: https://lkml.kernel.org/r/20210122160140.223228-14-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
We don't need to give up when a non-blocking request sees a !Uptodate page. We may be able to satisfy the read from a partially-uptodate page. Link: https://lkml.kernel.org/r/20210122160140.223228-13-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Use AOP_TRUNCATED_PAGE to indicate that no error occurred, but the page we looked up is no longer valid. In this case, the reference to the page will have been removed; if we hit any other error, the caller will release the reference. Link: https://lkml.kernel.org/r/20210122160140.223228-12-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
By moving the iocb flag checks to the caller, we can pass the file and the page index instead of the iocb. It never needed the iter. By passing the pagevec, we can return an errno (or AOP_TRUNCATED_PAGE) instead of an ERR_PTR. Link: https://lkml.kernel.org/r/20210122160140.223228-11-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
Make this function more generic by passing the file instead of the iocb. Check in the callers whether we should call readpage or not. Also make it return an errno / 0 / AOP_TRUNCATED_PAGE, and make calling put_page() the caller's responsibility. Link: https://lkml.kernel.org/r/20210122160140.223228-10-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
The readpage operation can block in many (most?) filesystems, so we should punt to a work queue instead of calling it. This was the last caller of lock_page_for_iocb(), so remove it. Link: https://lkml.kernel.org/r/20210122160140.223228-9-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
The previous patch removed wait_on_page_locked_async(), so inline __wait_on_page_locked_async into __lock_page_async(). Link: https://lkml.kernel.org/r/20210122160140.223228-8-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox (Oracle) 提交于
For page splitting to succeed, the thread asking to split the page has to be the only one with a reference to the page. Calling wait_on_page_locked() while holding a reference to the page will effectively prevent this from happening with sufficient threads waiting on the same page. Use put_and_wait_on_page_locked() to sleep without holding a reference to the page, then retry the page lookup after the page is unlocked. Since we now get the page lock a little earlier in filemap_update_page(), we can eliminate a number of duplicate checks. The original intent (commit ebded027 ("avoid unnecessary calls to lock_page when waiting for IO to complete during a read")) behind getting the page lock later was to avoid re-locking the page after it has been brought uptodate by another thread. We still avoid that because we go through the normal lookup path again after the winning thread has brought the page uptodate. Link: https://lkml.kernel.org/r/20210122160140.223228-7-willy@infradead.orgSigned-off-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: NKent Overstreet <kent.overstreet@gmail.com> Reviewed-by: NChristoph Hellwig <hch@lst.de> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-