- 21 June 2022 (1 commit)
-
-
Submitted by Joe Perches

mainline inclusion
from mainline-v5.10-rc1
commit 7981593b
category: bugfix
bugzilla: https://gitee.com/src-openeuler/kernel/issues/I5C32F
CVE: CVE-2022-20166

--------------------------------------------------

Convert the unbound sprintf in hugetlb_report_node_meminfo to use sysfs_emit_at so that no possible overrun of a PAGE_SIZE buf can occur.

Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Link: https://lore.kernel.org/r/894b351b82da6013cde7f36ff4b5493cd0ec30d0.1600285923.git.joe@perches.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
  drivers/base/node.c
  include/linux/hugetlb.h

Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
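For reference, a minimal sketch of what the sysfs_emit_at conversion looks like; the function shape follows the mainline change, but the exact openEuler 4.19 backport (note the Conflicts above) may differ in detail:

  int hugetlb_report_node_meminfo(char *buf, int len, int nid)
  {
          struct hstate *h = &default_hstate;

          /*
           * sysfs_emit_at() refuses to write past the single PAGE_SIZE
           * buffer, unlike the previous unbound sprintf().
           */
          return sysfs_emit_at(buf, len,
                               "Node %d HugePages_Total: %5u\n"
                               "Node %d HugePages_Free:  %5u\n"
                               "Node %d HugePages_Surp:  %5u\n",
                               nid, h->nr_huge_pages_node[nid],
                               nid, h->free_huge_pages_node[nid],
                               nid, h->surplus_huge_pages_node[nid]);
  }

Callers pass the running offset into the PAGE_SIZE buffer as len, so the helper can never overrun it.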
-
- 24 March 2022 (1 commit)
-
-
Submitted by Kefeng Wang

hulk inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4YTLN
CVE: NA

--------------------------------

The user may want to reserve a certain amount of memory for normal, non-huge pages; that is, hugetlb must not be allowed to use all of the memory. Add a new kernel parameter "hugepage_prohibit_sz=" to set the size reserved for normal non-huge pages, and fail a huge page allocation if it would exceed this limit.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Peng Liu <liupeng256@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
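A rough sketch of how such a limit can be wired up; the parameter name comes from the commit, but the helper name, bookkeeping variables, and check below are illustrative assumptions rather than the actual openEuler implementation:

  static unsigned long hugepage_prohibit_sz __read_mostly;

  /* Parse "hugepage_prohibit_sz=4G" style values from the kernel cmdline. */
  static int __init early_hugepage_prohibit_sz(char *p)
  {
          hugepage_prohibit_sz = memparse(p, &p);
          return 0;
  }
  early_param("hugepage_prohibit_sz", early_hugepage_prohibit_sz);

  /*
   * Hypothetical check before committing one more huge page of huge_sz bytes;
   * hugepages_bytes stands for the memory already consumed by huge pages.
   */
  static bool hugetlb_alloc_allowed(unsigned long total_ram_bytes,
                                    unsigned long hugepages_bytes,
                                    unsigned long huge_sz)
  {
          if (!hugepage_prohibit_sz)
                  return true;
          return total_ram_bytes - hugepages_bytes - huge_sz >= hugepage_prohibit_sz;
  }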
-
- 08 March 2022 (1 commit)
-
-
Submitted by Liu Yuntao

mainline inclusion
from mainline-v5.17-rc6
commit e79ce983
category: bugfix
bugzilla: 186043
CVE: NA

-------------------------------------------------

When we specify a large number for node in the hugepages parameter, it may be parsed to another number due to truncation in this statement:

  node = tmp;

For example, add the following parameter on the command line:

  hugepagesz=1G hugepages=4294967297:5

and the kernel will allocate 5 hugepages for node 1 instead of ignoring it.

I move the validation check earlier to fix this issue, and slightly simplify the condition here.

Link: https://lkml.kernel.org/r/20220209134018.8242-1-liuyuntao10@huawei.com
Fixes: b5389086 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Liu Yuntao <liuyuntao10@huawei.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
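The truncation is easy to reproduce in plain C; a standalone illustration (the MAX_NUMNODES value is only an example, it depends on CONFIG_NODES_SHIFT) of why the check must run on the wide type before the narrowing assignment:

  #include <stdio.h>

  #define MAX_NUMNODES 1024       /* example value, not from the commit */

  int main(void)
  {
          unsigned long long tmp = 4294967297ULL; /* from hugepages=4294967297:5 */
          int node = tmp;                         /* silently truncates to 1 */

          printf("truncated node = %d\n", node);

          /* The fix: validate the wide value before the narrowing assignment. */
          if (tmp >= MAX_NUMNODES)
                  fprintf(stderr, "invalid node, hugepages parameter ignored\n");
          return 0;
  }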
-
- 17 January 2022 (2 commits)
-
-
Submitted by Zhenguo Yao

mainline inclusion
from mainline-v5.16-rc5
commit 4178158e
category: bugfix
bugzilla: 186043
CVE: NA

--------------------------------

Preallocation of gigantic pages can't work because of commit b5389086 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation"). When nid is NUMA_NO_NODE(-1), alloc_bootmem_huge_page will always return without doing allocation. Fix this by adding more checks.

Link: https://lkml.kernel.org/r/20211129133803.15653-1-yaozhenguo1@gmail.com
Fixes: b5389086 ("hugetlbfs: extend the definition of hugepages parameter to support node allocation")
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Tested-by: Maxim Levitsky <mlevitsk@redhat.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Zhenguo Yao

mainline inclusion
from mainline-v5.16-rc1
commit b5389086
category: feature
bugzilla: 186043
CVE: NA

--------------------------------

We can specify the number of hugepages to allocate at boot, but at present they are balanced across all nodes. In some scenarios we only need hugepages on one node. For example: DPDK needs hugepages which are in the same node as the NIC. If DPDK needs four 1G hugepages in node1 and the system has 16 NUMA nodes, we must reserve 64 hugepages on the kernel cmdline, but only four of them are used; the others should be freed after boot. If system memory is low (for example, 64G), that becomes an impossible task.

So extend the hugepages parameter to support specifying hugepages on a specific node. For example, add the following parameter:

  hugepagesz=1G hugepages=0:1,1:3

It will allocate 1 hugepage on node0 and 3 hugepages on node1.

Link: https://lkml.kernel.org/r/20211005054729.86457-1-yaozhenguo1@gmail.com
Signed-off-by: Zhenguo Yao <yaozhenguo1@gmail.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Zhenguo Yao <yaozhenguo1@gmail.com>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
  Documentation/admin-guide/kernel-parameters.txt
  Documentation/admin-guide/mm/hugetlbpage.rst
  arch/powerpc/mm/hugetlbpage.c
  include/linux/hugetlb.h
  mm/hugetlb.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
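After booting with the example parameter, the per-node pools can be checked from the standard hugetlb sysfs layout; a small standalone reader (node count and 1G page size hard-coded to match the example):

  #include <stdio.h>

  int main(void)
  {
          int node;

          for (node = 0; node <= 1; node++) {
                  char path[160];
                  unsigned long nr = 0;
                  FILE *f;

                  snprintf(path, sizeof(path),
                           "/sys/devices/system/node/node%d/hugepages/"
                           "hugepages-1048576kB/nr_hugepages", node);
                  f = fopen(path, "r");
                  if (!f)
                          continue;       /* node absent or no 1G support */
                  if (fscanf(f, "%lu", &nr) != 1)
                          nr = 0;
                  fclose(f);
                  printf("node%d: %lu x 1G huge pages\n", node, nr);
          }
          return 0;
  }

With hugepages=0:1,1:3 this should report 1 page on node0 and 3 pages on node1.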
-
- 16 December 2021 (1 commit)
-
-
Submitted by Nadav Amit

stable inclusion
from linux-v4.19.219
commit b0313bc7f5fbb6beee327af39d818ffdc921821a
category: bugfix
bugzilla: 185854
CVE: CVE-2021-4002

-----------------------------------------------

commit a4a118f2 upstream.

When __unmap_hugepage_range() calls to huge_pmd_unshare() succeed, a TLB flush is missing. This TLB flush must be performed before releasing the i_mmap_rwsem, in order to prevent an unshared PMDs page from being released and reused before the TLB flush took place.

Arguably, a comprehensive solution would use the mmu_gather interface to batch the TLB flushes and the PMDs page release, however it is not an easy solution: (1) try_to_unmap_one() and try_to_migrate_one() also call huge_pmd_unshare() and they cannot use the mmu_gather interface; and (2) deferring the release of the page reference for the PMDs page until after i_mmap_rwsem is dropped can confuse huge_pmd_unshare() into thinking PMDs are shared when they are not.

Fix __unmap_hugepage_range() by adding the missing TLB flush, and forcing a flush when unshare is successful.

Fixes: 24669e58 ("hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages") # 3.6
Signed-off-by: Nadav Amit <namit@vmware.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

Conflicts:
  include/asm-generic/tlb.h
  mm/mmu_gather.c

Signed-off-by: Liu Shixin <liushixin2@huawei.com>
Reviewed-by: tong tiangen <tongtiangen@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
- 02 December 2021 (1 commit)
-
-
Submitted by Mike Kravetz
mainline inclusion
from mainline-5.15-rc1
commit e32d20c0
category: bugfix
bugzilla: 180680
CVE: NA

---------------------------

When removing a hugetlb page from the pool the ref count is set to one (as the free page has no ref count) and compound page destructor is set to NULL_COMPOUND_DTOR. Since a subsequent call to free the hugetlb page will call __free_pages for non-gigantic pages and free_gigantic_page for gigantic pages the destructor is not used.

However, consider the following race with code taking a speculative reference on the page:

  Thread 0                                Thread 1
  --------                                --------
  remove_hugetlb_page
    set_page_refcounted(page);
    set_compound_page_dtor(page,
             NULL_COMPOUND_DTOR);
                                          get_page_unless_zero(page)
  __update_and_free_page
    __free_pages(page,
             huge_page_order(h));
    /* Note that __free_pages() will
       simply drop the reference to the
       page. */
                                          put_page(page)
                                            __put_compound_page()
                                              destroy_compound_page
                                                NULL_COMPOUND_DTOR
                                                BUG: kernel NULL pointer
                                                dereference, address:
                                                0000000000000000

To address this race, set the dtor to the normal compound page dtor for non-gigantic pages. The dtor for gigantic pages does not matter as gigantic pages are changed from a compound page to 'just a group of pages' before freeing. Hence, the destructor is not used.

Link: https://lkml.kernel.org/r/20210809184832.18342-4-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Muchun Song <songmuchun@bytedance.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Naoya Horiguchi <naoya.horiguchi@linux.dev>
Cc: Mina Almasry <almasrymina@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Conflicts:
  mm/hugetlb.c

Signed-off-by: Chen Wandun <chenwandun@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
- 30 October 2021 (4 commits)
-
-
Submitted by guomengqi
ascend inclusion category: feature bugzilla: https://gitee.com/openeuler/kernel/issues/I4EUVI CVE: NA ------------------------------------------------- Modified hugetlb_insert_hugepage_pte_by_pa to assure k2u hugepages can be set as READONLY. Signed-off-by: Nguomengqi <guomengqi3@huawei.com> Reviewed-by: NDing Tianhong <dingtianhong@huawei.com> Signed-off-by: NZhou Guanghui <zhouguanghui1@huawei.com> Reviewed-by: NWeilong Chen <chenweilong@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
Submitted by Weilong Chen

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EUVI
CVE: NA

-------------------------------------------------

Add a flag VM_SHAREPOOL to avoid calling vfree() on a shared kva.

Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Weilong Chen

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EUVI
CVE: NA

-------------------------------------------------

Sharepool uses a dedicated interface to request huge pages, which improves the efficiency of memory allocation.

Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Ding Tianhong

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4EUVI
CVE: NA

-------------------------------------------------

do_mmap/mmap_region/__mm_populate can only operate on the current process. The share pool needs to handle other processes and create memory mappings for them, so export new functions that distinguish the target process and handle it. This does not break the current logic and is only valid for the share pool.

The share pool also needs to remap vmalloc pages into user space, so introduce hugetlb_insert_hugepage to support hugepage remapping.

Signed-off-by: Tang Yizhou <tangyizhou@huawei.com>
Signed-off-by: Li Ming <limingming.li@huawei.com>
Signed-off-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
- 22 October 2021 (3 commits)
-
-
Submitted by Zhou Guanghui

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4D63I
CVE: NA

----------------------------------------------------

The current function hugetlb_alloc_hugepage allocates from static hugepages first; when the static hugepages are used up, it attempts to allocate hugepages from the buddy system. Two additional modes are now supported: static hugepages only, and buddy hugepages only.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Zhou Guanghui

ascend inclusion
category: feature
bugzilla: https://gitee.com/openeuler/kernel/issues/I4D63I
CVE: NA

-------------------------------------------------------------

The following functions are used only in the ascend scenario: hugetlb_get_hstate, hugetlb_alloc_hugepage, hugetlb_insert_hugepage_pte, hugetlb_insert_hugepage_pte_by_pa.

Remove the unused interface hugetlb_insert_hugepage.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Signed-off-by: Guo Mengqi <guomengqi3@huawei.com>
Reviewed-by: Weilong Chen <chenweilong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Zhou Guanghui

ascend inclusion
category: bugfix
bugzilla: https://gitee.com/openeuler/kernel/issues/I4D63I
CVE: NA

-------------------------------------------------------

Since commit bd177f8f0548f, only allocations marked __GFP_THISNODE will come from a CDM node. Therefore, when we allocate normal hugepages, if __GFP_THISNODE is set, hugepages can be allocated from the specified nid.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
- 02 August 2021 (1 commit)
-
-
Submitted by Hugh Dickins
stable inclusion from linux-4.19.197 commit 2445837e9cd084ba849a7c1c70086a6cdc608f48 -------------------------------- [ Upstream commit fe19bd3d ] If more than one futex is placed on a shmem huge page, it can happen that waking the second wakes the first instead, and leaves the second waiting: the key's shared.pgoff is wrong. When 3.11 commit 13d60f4b ("futex: Take hugepages into account when generating futex_key"), the only shared huge pages came from hugetlbfs, and the code added to deal with its exceptional page->index was put into hugetlb source. Then that was missed when 4.8 added shmem huge pages. page_to_pgoff() is what others use for this nowadays: except that, as currently written, it gives the right answer on hugetlbfs head, but nonsense on hugetlbfs tails. Fix that by calling hugetlbfs-specific hugetlb_basepage_index() on PageHuge tails as well as on head. Yes, it's unconventional to declare hugetlb_basepage_index() there in pagemap.h, rather than in hugetlb.h; but I do not expect anything but page_to_pgoff() ever to need it. [akpm@linux-foundation.org: give hugetlb_basepage_index() prototype the correct scope] Link: https://lkml.kernel.org/r/b17d946b-d09-326e-b42a-52884c36df32@google.com Fixes: 800d8c63 ("shmem: add huge pages support") Reported-by: NNeel Natu <neelnatu@google.com> Signed-off-by: NHugh Dickins <hughd@google.com> Reviewed-by: NMatthew Wilcox (Oracle) <willy@infradead.org> Acked-by: NThomas Gleixner <tglx@linutronix.de> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Zhang Yi <wetpzy@gmail.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Darren Hart <dvhart@infradead.org> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Note on stable backport: leave redundant #include <linux/hugetlb.h> in kernel/futex.c, to avoid conflict over the header files included. Signed-off-by: NHugh Dickins <hughd@google.com> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 29 July 2021 (1 commit)
-
-
Submitted by Liu Xiang

mainline inclusion
from mainline-v5.11-rc1
commit 0a4f3d1b
category: bugfix
bugzilla: NA
CVE: NA

-----------------------------------------------

On a 64-bit machine, the delta variable in hugetlb_acct_memory() may be larger than 0xffffffff, but gather_surplus_pages() can only use the low 32-bit value now. So we need to fix the type of the delta parameter and related local variables in gather_surplus_pages().

Link: https://lkml.kernel.org/r/1605793733-3573-1-git-send-email-liu.xiang@zlingsmart.com
Reported-by: Ma Chenggong <ma.chenggong@zlingsmart.com>
Signed-off-by: Liu Xiang <liu.xiang@zlingsmart.com>
Signed-off-by: Pan Jiagen <pan.jiagen@zlingsmart.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Liu Xiang <liuxiang_1999@126.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

conflict:
  mm/hugetlb.c

Signed-off-by: Tong Tiangen <tongtiangen@huawei.com>
Reviewed-by: Chen Wandun <chenwandun@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
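The loss here is the ordinary C narrowing conversion; a standalone illustration of why an int parameter cannot carry a delta above 0xffffffff (the function names below only echo the ones in the commit, they are not the kernel code):

  #include <stdio.h>

  static long surplus_before_fix(int delta)   /* old: int parameter */
  {
          return delta;
  }

  static long surplus_after_fix(long delta)   /* new: long parameter */
  {
          return delta;
  }

  int main(void)
  {
          long delta = 0x100000001L;  /* possible in hugetlb_acct_memory() on 64-bit */

          printf("int sees:  %ld\n", surplus_before_fix(delta)); /* 1: high bits lost */
          printf("long sees: %ld\n", surplus_after_fix(delta));  /* 4294967297 */
          return 0;
  }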
-
- 19 July 2021 (1 commit)
-
-
Submitted by Mina Almasry
stable inclusion from linux-4.19.194 commit 7de60c2d5a2a66ef2c5d76952a5a3a9a4ea4d436 -------------------------------- [ Upstream commit d84cf06e ] The userfaultfd hugetlb tests cause a resv_huge_pages underflow. This happens when hugetlb_mcopy_atomic_pte() is called with !is_continue on an index for which we already have a page in the cache. When this happens, we allocate a second page, double consuming the reservation, and then fail to insert the page into the cache and return -EEXIST. To fix this, we first check if there is a page in the cache which already consumed the reservation, and return -EEXIST immediately if so. There is still a rare condition where we fail to copy the page contents AND race with a call for hugetlb_no_page() for this index and again we will underflow resv_huge_pages. That is fixed in a more complicated patch not targeted for -stable. Test: Hacked the code locally such that resv_huge_pages underflows produce a warning, then: ./tools/testing/selftests/vm/userfaultfd hugetlb_shared 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success ./tools/testing/selftests/vm/userfaultfd hugetlb 10 2 /tmp/kokonut_test/huge/userfaultfd_test && echo test success Both tests succeed and produce no warnings. After the test runs number of free/resv hugepages is correct. [mike.kravetz@oracle.com: changelog fixes] Link: https://lkml.kernel.org/r/20210528004649.85298-1-almasrymina@google.com Fixes: 8fb5debc ("userfaultfd: hugetlbfs: add hugetlb_mcopy_atomic_pte for userfaultfd support") Signed-off-by: NMina Almasry <almasrymina@google.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Peter Xu <peterx@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 30 June 2021 (2 commits)
-
-
Submitted by Mike Kravetz

stable inclusion
from linux-4.19.193
commit a92212ef6326c8dc09003c7af4e1ba7da0b77e44

--------------------------------

commit 55254636 upstream.

A new clang diagnostic (-Wsizeof-array-div) warns about the calculation to determine the number of u32's in an array of unsigned longs. Suppress the warning by adding parentheses.

While looking at the above issue, noticed that the 'address' parameter to hugetlb_fault_mutex_hash is no longer used. So, remove it from the definition and all callers. No functional change.

Link: http://lkml.kernel.org/r/20190919011847.18400-1-mike.kravetz@oracle.com
Signed-off-by: Mike Kravetz <mike.kravetz@oracle.com>
Reported-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Nathan Chancellor <natechancellor@gmail.com>
Reviewed-by: Davidlohr Bueso <dbueso@suse.de>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Ilie Halip <ilie.halip@gmail.com>
Cc: David Bolvansky <david.bolvansky@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
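The diagnostic and the parenthesized fix can be reproduced standalone (the kernel code feeds a count like this into jhash2()):

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
          unsigned long key[2];   /* array of unsigned long, as in the mutex hash */

          /* clang -Wsizeof-array-div: element type differs from the divisor type */
          size_t warned = sizeof(key) / sizeof(uint32_t);

          /* Parenthesizing the divisor states the intent and silences the warning */
          size_t quiet  = sizeof(key) / (sizeof(uint32_t));

          printf("%zu %zu\n", warned, quiet);  /* both print 4 on a 64-bit build */
          return 0;
  }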
-
Submitted by Miaohe Lin
stable inclusion from linux-4.19.191 commit 2e8b30d7f8b5f55539659039a5e1ae2803002a22 -------------------------------- [ Upstream commit da56388c ] A rare out of memory error would prevent removal of the reserve map region for a page. hugetlb_fix_reserve_counts() handles this rare case to avoid dangling with incorrect counts. Unfortunately, hugepage_subpool_get_pages and hugetlb_acct_memory could possibly fail too. We should correctly handle these cases. Link: https://lkml.kernel.org/r/20210410072348.20437-5-linmiaohe@huawei.com Fixes: b5cec28d ("hugetlbfs: truncate_hugepages() takes a range of pages") Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Cc: Feilong Lin <linfeilong@huawei.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 14 April 2021 (4 commits)
-
-
Submitted by Peter Xu
mainline inclusion from mainline-5.6 commit 4426e945 category: bugfix bugzilla: 47439 CVE: NA --------------------------- This is the gup counterpart of the change that allows the VM_FAULT_RETRY to happen for more than once. One thing to mention is that we must check the fatal signal here before retry because the GUP can be interrupted by that, otherwise we can loop forever. Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Tested-by: NBrian Geffon <bgeffon@google.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Bobby Powers <bobbypowers@gmail.com> Cc: David Hildenbrand <david@redhat.com> Cc: Denis Plotnikov <dplotnikov@virtuozzo.com> Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: "Kirill A . Shutemov" <kirill@shutemov.name> Cc: Martin Cracauer <cracauer@cons.org> Cc: Marty McFadden <mcfadden8@llnl.gov> Cc: Matthew Wilcox <willy@infradead.org> Cc: Maya Gokhale <gokhale2@llnl.gov> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: Pavel Emelyanov <xemul@openvz.org> Link: http://lkml.kernel.org/r/20200220195357.16371-1-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NXiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: NJing Xiangfeng <jingxiangfeng@huawei.com> Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
Submitted by Li Xinhai
stable inclusion from linux-4.19.179 commit 66a013879cdb304c214e94a49d2129363f29e174 -------------------------------- commit a1ba9da8 upstream. The current code would unnecessarily expand the address range. Consider one example, (start, end) = (1G-2M, 3G+2M), and (vm_start, vm_end) = (1G-4M, 3G+4M), the expected adjustment should be keep (1G-2M, 3G+2M) without expand. But the current result will be (1G-4M, 3G+4M). Actually, the range (1G-4M, 1G) and (3G, 3G+4M) would never been involved in pmd sharing. After this patch, we will check that the vma span at least one PUD aligned size and the start,end range overlap the aligned range of vma. With above example, the aligned vma range is (1G, 3G), so if (start, end) range is within (1G-4M, 1G), or within (3G, 3G+4M), then no adjustment to both start and end. Otherwise, we will have chance to adjust start downwards or end upwards without exceeding (vm_start, vm_end). Mike: : The 'adjusted range' is used for calls to mmu notifiers and cache(tlb) : flushing. Since the current code unnecessarily expands the range in some : cases, more entries than necessary would be flushed. This would/could : result in performance degradation. However, this is highly dependent on : the user runtime. Is there a combination of vma layout and calls to : actually hit this issue? If the issue is hit, will those entries : unnecessarily flushed be used again and need to be unnecessarily reloaded? Link: https://lkml.kernel.org/r/20210104081631.2921415-1-lixinhai.lxh@gmail.com Fixes: 75802ca6 ("mm/hugetlb: fix calculation of adjust_range_if_pmd_sharing_possible") Signed-off-by: NLi Xinhai <lixinhai.lxh@gmail.com> Suggested-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Peter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
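A standalone model of the adjusted logic, using the numbers from the example above; the alignment checks are written the way the commit describes them, so treat this as a sketch (64-bit assumed) rather than the exact kernel code:

  #include <stdio.h>

  #define PUD_SIZE        (1UL << 30)
  #define ALIGN_UP(x)     (((x) + PUD_SIZE - 1) & ~(PUD_SIZE - 1))
  #define ALIGN_DOWN(x)   ((x) & ~(PUD_SIZE - 1))

  int main(void)
  {
          unsigned long vm_start = (1UL << 30) - (4UL << 20);      /* 1G - 4M */
          unsigned long vm_end   = 3 * (1UL << 30) + (4UL << 20);  /* 3G + 4M */
          unsigned long start    = (1UL << 30) - (2UL << 20);      /* 1G - 2M */
          unsigned long end      = 3 * (1UL << 30) + (2UL << 20);  /* 3G + 2M */

          /* PUD-aligned part of the vma: the only region where PMDs can be shared */
          unsigned long v_start = ALIGN_UP(vm_start);   /* 1G */
          unsigned long v_end   = ALIGN_DOWN(vm_end);   /* 3G */

          if (v_start < v_end && end > v_start && start < v_end) {
                  if (start > v_start)
                          start = ALIGN_DOWN(start);
                  if (end < v_end)
                          end = ALIGN_UP(end);
          }
          /* stays (1G-2M, 3G+2M); the old code expanded it to (1G-4M, 3G+4M) */
          printf("adjusted: [0x%lx, 0x%lx)\n", start, end);
          return 0;
  }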
-
Submitted by Mike Kravetz
stable inclusion from linux-4.19.179 commit 08831f662b88f7117be51c5e55bd1f120087f90c -------------------------------- commit dbfee5ae upstream. page structs are not guaranteed to be contiguous for gigantic pages. The routine update_and_free_page can encounter a gigantic page, yet it assumes page structs are contiguous when setting page flags in subpages. If update_and_free_page encounters non-contiguous page structs, we can see “BUG: Bad page state in process …” errors. Non-contiguous page structs are generally not an issue. However, they can exist with a specific kernel configuration and hotplug operations. For example: Configure the kernel with CONFIG_SPARSEMEM and !CONFIG_SPARSEMEM_VMEMMAP. Then, hotplug add memory for the area where the gigantic page will be allocated. Zi Yan outlined steps to reproduce here [1]. [1] https://lore.kernel.org/linux-mm/16F7C58B-4D79-41C5-9B64-A1A1628F4AF2@nvidia.com/ Link: https://lkml.kernel.org/r/20210217184926.33567-1-mike.kravetz@oracle.com Fixes: 944d9fec ("hugetlb: add support for gigantic page allocation at runtime") Signed-off-by: NZi Yan <ziy@nvidia.com> Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Davidlohr Bueso <dbueso@suse.de> Cc: "Kirill A . Shutemov" <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Oscar Salvador <osalvador@suse.de> Cc: Joao Martins <joao.m.martins@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
Submitted by Miaohe Lin
stable inclusion from linux-4.19.178 commit 2d0324108fa80446d6b41228bac40c03cd3b5d35 -------------------------------- [ Upstream commit cc2205a6 ] In hugetlb_sysfs_add_hstate(), we would do kobject_put() on hstate_kobjs when failed to create sysfs group but forget to set hstate_kobjs to NULL. Then in hugetlb_register_node() error path, we may free it again via hugetlb_unregister_node(). Link: https://lkml.kernel.org/r/20210107123249.36964-1-linmiaohe@huawei.com Fixes: a3437870 ("hugetlb: new sysfs interface") Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NMuchun Song <smuchun@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NSasha Levin <sashal@kernel.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
- 11 March 2021 (4 commits)
-
-
Submitted by Muchun Song

stable inclusion
from linux-4.19.175
commit 6bf5461ae968b870f81c813a880e0e3a2684dfc1

--------------------------------

commit ecbf4724 upstream.

page_huge_active() can be called from scan_movable_pages(), which does not hold a reference count on the HugeTLB page. So when we call page_huge_active() from scan_movable_pages(), the HugeTLB page can be freed in parallel. Then we will trigger the BUG_ON in page_huge_active() when CONFIG_DEBUG_VM is enabled. Just remove the VM_BUG_ON_PAGE.

Link: https://lkml.kernel.org/r/20210115124942.46403-6-songmuchun@bytedance.com
Fixes: 7e1f049e ("mm: hugetlb: cleanup using paeg_huge_active()")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Submitted by Muchun Song
stable inclusion
from linux-4.19.175
commit 532574ae2586940729419253fd2defd9c9880490

--------------------------------

commit 0eb2df2b upstream.

There is a race between isolate_huge_page() and __free_huge_page().

  CPU0:                                  CPU1:

  if (PageHuge(page))
                                         put_page(page)
                                           __free_huge_page(page)
                                             spin_lock(&hugetlb_lock)
                                             update_and_free_page(page)
                                               set_compound_page_dtor(page,
                                                 NULL_COMPOUND_DTOR)
                                             spin_unlock(&hugetlb_lock)
  isolate_huge_page(page)
    // trigger BUG_ON
    VM_BUG_ON_PAGE(!PageHead(page), page)
    spin_lock(&hugetlb_lock)
    page_huge_active(page)
      // trigger BUG_ON
      VM_BUG_ON_PAGE(!PageHuge(page), page)
    spin_unlock(&hugetlb_lock)

When we isolate a HugeTLB page on CPU0. Meanwhile, we free it to the buddy allocator on CPU1. Then, we can trigger a BUG_ON on CPU0, because it is already freed to the buddy allocator.

Link: https://lkml.kernel.org/r/20210115124942.46403-5-songmuchun@bytedance.com
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Submitted by Muchun Song
stable inclusion
from linux-4.19.175
commit db510d8f98c38a953ae5d3b51b01f435740729b4

--------------------------------

commit 7ffddd49 upstream.

There is a race condition between __free_huge_page() and dissolve_free_huge_page().

  CPU0:                                  CPU1:

  // page_count(page) == 1
  put_page(page)
    __free_huge_page(page)
                                         dissolve_free_huge_page(page)
                                           spin_lock(&hugetlb_lock)
                                           // PageHuge(page) && !page_count(page)
                                           update_and_free_page(page)
                                           // page is freed to the buddy
                                           spin_unlock(&hugetlb_lock)
      spin_lock(&hugetlb_lock)
      clear_page_huge_active(page)
      enqueue_huge_page(page)
      // It is wrong, the page is already freed
      spin_unlock(&hugetlb_lock)

The race window is between put_page() and dissolve_free_huge_page(). We should make sure that the page is already on the free list when it is dissolved. As a result __free_huge_page would corrupt page(s) already in the buddy allocator.

Link: https://lkml.kernel.org/r/20210115124942.46403-4-songmuchun@bytedance.com
Fixes: c8721bbb ("mm: memory-hotplug: enable memory hotplug to handle hugepage")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Oscar Salvador <osalvador@suse.de>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Submitted by Muchun Song
stable inclusion from linux-4.19.175 commit b6e04c19c5b2060c91b07acec5d650a1beb6855f -------------------------------- commit 585fc0d2 upstream. If a new hugetlb page is allocated during fallocate it will not be marked as active (set_page_huge_active) which will result in a later isolate_huge_page failure when the page migration code would like to move that page. Such a failure would be unexpected and wrong. Only export set_page_huge_active, just leave clear_page_huge_active as static. Because there are no external users. Link: https://lkml.kernel.org/r/20210115124942.46403-3-songmuchun@bytedance.com Fixes: 70c3547e (hugetlbfs: add hugetlbfs_fallocate()) Signed-off-by: NMuchun Song <songmuchun@bytedance.com> Acked-by: NMichal Hocko <mhocko@suse.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Reviewed-by: NOscar Salvador <osalvador@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Yang Shi <shy828301@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
- 22 February 2021 (3 commits)
-
-
Submitted by Miaohe Lin
stable inclusion from linux-4.19.169 commit bba1a0da5bbdd938907cf7f5c3573b3d8e199074 -------------------------------- commit 0eb98f15 upstream. The huge page size is encoded for VM_FAULT_HWPOISON errors only. So if we return VM_FAULT_HWPOISON, huge page size would just be ignored. Link: https://lkml.kernel.org/r/20210107123449.38481-1-linmiaohe@huawei.com Fixes: aa50d3a7 ("Encode huge page size for VM_FAULT_HWPOISON errors") Signed-off-by: NMiaohe Lin <linmiaohe@huawei.com> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com> Signed-off-by: NCheng Jian <cj.chengjian@huawei.com>
-
Submitted by Fang Lijun

ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

vm_flags overflows on arm32 because it is left-shifted by CHECKNODE_BITS (48), which exceeds the width of the type. The checknode function is only used by the CDM feature.

Fixes: cdccf4d4b7b5 ("arm64/ascend: mm: Add MAP_CHECKNODE flag to check node hugetlb")
Signed-off-by: Fang Lijun <fanglijun3@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Xie XiuQi <xiexiuqi@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
Submitted by Fang Lijun

ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA

-------------------------------------------------

DVPP uses the MAP_CHECKNODE flag to enable node-checked hugetlb. The global variable numanode made mmap non-reentrant, so pass the node through the flag bits BITS[26:31] directly.

Fixes: cbdbfc7514ab ("mm: Check numa node hugepages enough when mmap hugetlb")
Signed-off-by: Fang Lijun <fanglijun3@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Cheng Jian <cj.chengjian@huawei.com>
-
- 15 October 2020 (1 commit)
-
-
Submitted by Peter Xu
stable inclusion from linux-4.19.142 commit 734654ae7962be55c44ff3fb0bb0652b5149cc17 -------------------------------- commit 75802ca6 upstream. This is found by code observation only. Firstly, the worst case scenario should assume the whole range was covered by pmd sharing. The old algorithm might not work as expected for ranges like (1g-2m, 1g+2m), where the adjusted range should be (0, 1g+2m) but the expected range should be (0, 2g). Since at it, remove the loop since it should not be required. With that, the new code should be faster too when the invalidating range is huge. Mike said: : With range (1g-2m, 1g+2m) within a vma (0, 2g) the existing code will only : adjust to (0, 1g+2m) which is incorrect. : : We should cc stable. The original reason for adjusting the range was to : prevent data corruption (getting wrong page). Since the range is not : always adjusted correctly, the potential for corruption still exists. : : However, I am fairly confident that adjust_range_if_pmd_sharing_possible : is only gong to be called in two cases: : : 1) for a single page : 2) for range == entire vma : : In those cases, the current code should produce the correct results. : : To be safe, let's just cc stable. Fixes: 017b1660 ("mm: migration: fix migration of huge PMD shared pages") Signed-off-by: NPeter Xu <peterx@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Reviewed-by: NMike Kravetz <mike.kravetz@oracle.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: <stable@vger.kernel.org> Link: http://lkml.kernel.org/r/20200730201636.74778-1-peterx@redhat.comSigned-off-by: NLinus Torvalds <torvalds@linux-foundation.org> Signed-off-by: NMike Kravetz <mike.kravetz@oracle.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: NYang Yingliang <yangyingliang@huawei.com>
-
- 22 September 2020 (4 commits)
-
-
Submitted by Muchun Song
mainline inclusion
from mainline-v5.9-rc4
commit 17743798
category: bugfix
bugzilla: NA
CVE: CVE-2020-25285

--------------------------------

There is a race between the assignment of `table->data` and write value to the pointer of `table->data` in the __do_proc_doulongvec_minmax() on the other thread.

  CPU0:                                  CPU1:
                                         proc_sys_write
  hugetlb_sysctl_handler                   proc_sys_call_handler
  hugetlb_sysctl_handler_common            hugetlb_sysctl_handler
    table->data = &tmp;                      hugetlb_sysctl_handler_common
                                               table->data = &tmp;
    proc_doulongvec_minmax
      do_proc_doulongvec_minmax              sysctl_head_finish
        __do_proc_doulongvec_minmax            unuse_table
          i = table->data;
          *i = val; // corrupt CPU1's stack

Fix this by duplicating the `table`, and only update the duplicate of it. And introduce a helper of proc_hugetlb_doulongvec_minmax() to simplify the code.

The following oops was seen:

  BUG: kernel NULL pointer dereference, address: 0000000000000000
  #PF: supervisor instruction fetch in kernel mode
  #PF: error_code(0x0010) - not-present page
  Code: Bad RIP value.
  ...
  Call Trace:
   ? set_max_huge_pages+0x3da/0x4f0
   ? alloc_pool_huge_page+0x150/0x150
   ? proc_doulongvec_minmax+0x46/0x60
   ? hugetlb_sysctl_handler_common+0x1c7/0x200
   ? nr_hugepages_store+0x20/0x20
   ? copy_fd_bitmaps+0x170/0x170
   ? hugetlb_sysctl_handler+0x1e/0x20
   ? proc_sys_call_handler+0x2f1/0x300
   ? unregister_sysctl_table+0xb0/0xb0
   ? __fd_install+0x78/0x100
   ? proc_sys_write+0x14/0x20
   ? __vfs_write+0x4d/0x90
   ? vfs_write+0xef/0x240
   ? ksys_write+0xc0/0x160
   ? __ia32_sys_read+0x50/0x50
   ? __close_fd+0x129/0x150
   ? __x64_sys_write+0x43/0x50
   ? do_syscall_64+0x6c/0x200
   ? entry_SYSCALL_64_after_hwframe+0x44/0xa9

Fixes: e5ff2159 ("hugetlb: multiple hstates for multiple page sizes")
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20200828031146.43035-1-songmuchun@bytedance.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
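The shape of the fix is to hand proc_doulongvec_minmax() a private copy of the table, so concurrent writers never see each other's on-stack tmp; a sketch of such a helper (the signature follows the 4.19-era sysctl handlers and may differ in detail from the actual backport):

  static int proc_hugetlb_doulongvec_minmax(struct ctl_table *table, int write,
                                            void __user *buffer, size_t *length,
                                            loff_t *ppos, unsigned long *out)
  {
          struct ctl_table dup_table;

          /* Work on a stack copy: ->data points at this caller's tmp only. */
          dup_table = *table;
          dup_table.data = out;

          return proc_doulongvec_minmax(&dup_table, write, buffer, length, ppos);
  }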
-
Submitted by Weilong Chen

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Add a variable to control the gfp flags used when allocating hugepages, and export it. Hugepages need to be accounted by cgroup, so we use this to set the ACCOUNT flag for the memory subsystem.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Weilong Chen

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

Add a helper function for changing the system hugepage count, and export it.

Signed-off-by: Weilong Chen <chenweilong@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Longpeng
stable inclusion
from linux-4.19.119
commit dcca7d2f751014d8caa3711e93b2119305e837df

--------------------------------

commit 3c1d7e6c upstream.

Our machine encountered a panic (addressing exception) after run for a long time and the calltrace is:

  RIP: hugetlb_fault+0x307/0xbe0
  RSP: 0018:ffff9567fc27f808 EFLAGS: 00010286
  RAX: e800c03ff1258d48 RBX: ffffd3bb003b69c0 RCX: e800c03ff1258d48
  RDX: 17ff3fc00eda72b7 RSI: 00003ffffffff000 RDI: e800c03ff1258d48
  RBP: ffff9567fc27f8c8 R08: e800c03ff1258d48 R09: 0000000000000080
  R10: ffffaba0704c22a8 R11: 0000000000000001 R12: ffff95c87b4b60d8
  R13: 00005fff00000000 R14: 0000000000000000 R15: ffff9567face8074
  FS:  00007fe2d9ffb700(0000) GS:ffff956900e40000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: ffffd3bb003b69c0 CR3: 000000be67374000 CR4: 00000000003627e0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   follow_hugetlb_page+0x175/0x540
   __get_user_pages+0x2a0/0x7e0
   __get_user_pages_unlocked+0x15d/0x210
   __gfn_to_pfn_memslot+0x3c5/0x460 [kvm]
   try_async_pf+0x6e/0x2a0 [kvm]
   tdp_page_fault+0x151/0x2d0 [kvm]
   ...
   kvm_arch_vcpu_ioctl_run+0x330/0x490 [kvm]
   kvm_vcpu_ioctl+0x309/0x6d0 [kvm]
   do_vfs_ioctl+0x3f0/0x540
   SyS_ioctl+0xa1/0xc0
   system_call_fastpath+0x22/0x27

For 1G hugepages, huge_pte_offset() wants to return NULL or pudp, but it may return a wrong 'pmdp' if there is a race. Please look at the following code snippet:

  ...
  pud = pud_offset(p4d, addr);
  if (sz != PUD_SIZE && pud_none(*pud))
          return NULL;
  /* hugepage or swap? */
  if (pud_huge(*pud) || !pud_present(*pud))
          return (pte_t *)pud;

  pmd = pmd_offset(pud, addr);
  if (sz != PMD_SIZE && pmd_none(*pmd))
          return NULL;
  /* hugepage or swap? */
  if (pmd_huge(*pmd) || !pmd_present(*pmd))
          return (pte_t *)pmd;
  ...

The following sequence would trigger this bug:

  - CPU0: sz = PUD_SIZE and *pud = 0, continue
  - CPU0: "pud_huge(*pud)" is false
  - CPU1: calling hugetlb_no_page and set *pud to xxxx8e7 (PRESENT)
  - CPU0: "!pud_present(*pud)" is false, continue
  - CPU0: pmd = pmd_offset(pud, addr) and maybe return a wrong pmdp

However, we want CPU0 to return NULL or pudp in this case. We must make sure there is exactly one dereference of pud and pmd.

Signed-off-by: Longpeng <longpeng2@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Reviewed-by: Jason Gunthorpe <jgg@mellanox.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Sean Christopherson <sean.j.christopherson@intel.com>
Cc: <stable@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200413010342.771-1-longpeng2@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
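The rule "exactly one dereference" is easy to see with stand-in types; a self-contained sketch of the racy double read versus the snapshot approach the fix uses (the kernel code takes the snapshot with READ_ONCE()):

  #include <stdio.h>

  typedef struct { unsigned long val; } pud_t;                 /* stand-in type */

  static int pud_huge(pud_t pud)    { return pud.val & 0x1; }  /* toy predicates */
  static int pud_present(pud_t pud) { return pud.val != 0; }

  /*
   * Racy: *pudp is read twice, so another CPU can change the entry between
   * the two tests and the caller walks down to a bogus pmd.
   */
  static int check_racy(pud_t *pudp)
  {
          return pud_huge(*pudp) || !pud_present(*pudp);
  }

  /* Safe: one snapshot, and every test runs against that snapshot. */
  static int check_safe(pud_t *pudp)
  {
          pud_t entry = *pudp;

          return pud_huge(entry) || !pud_present(entry);
  }

  int main(void)
  {
          pud_t pud = { .val = 0x8e7 };

          printf("racy=%d safe=%d\n", check_racy(&pud), check_safe(&pud));
          return 0;
  }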
-
- 31 August 2020 (3 commits)
-
-
Submitted by Ding Tianhong

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

-------------------------------------------------

There are too many ascend feature enable flags, and so far all of them are used on every ascend SoC, so add a single enable flag that turns all of them on for the ascend platform by default; this cleans up and simplifies the bootargs. Also clean up some code warnings.

v2: fix the wrong config name.
v3: fix the wrong include header file.

Signed-off-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Zefan Li <lizefan@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Zhou Guanghui

ascend inclusion
category: feature
Bugzilla: N/A
CVE: N/A

-------------------------------------------------------------

When the driver gets huge pages via alloc_huge_page_node, it attempts to allocate migrate hugepages after the reserved memory hugepages are used up. We expect the migrate hugepages allocated this way to be charged to memcg so that memory usage can be limited. The __GFP_ACCOUNT flag is added to the gfp mask before we allocate migrate hugepages. Then, if a memcg is set by memalloc_use_memcg(), the allocated migrate hugepages will be charged to this memcg.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Fang Lijun

ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA

--------------

The system can't use the CDM nodes' memory, but it can mmap huge pages from all nodes, so a Bus error occurs when the mmap succeeds but the huge pages are not enough. When cdmmask is set, users pass the NUMA node id via an mmap flag to map hugepages from that specific node; if there are not enough hugepages on this node, return -ENOMEM.

v2: Fix a compile error when CONFIG_COHERENT_DEVICE is disabled.

Signed-off-by: Fang Lijun <fanglijun3@huawei.com>
Reviewed-by: Ding Tianhong <dingtianhong@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
- 27 December 2019 (2 commits)
-
-
Submitted by Lijun Fang

ascend inclusion
category: feature
bugzilla: NA
CVE: NA

--------

Delete CONFIG_ARCH_ASCEND for code versatility.

Signed-off-by: Lijun Fang <fanglijun3@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Reviewed-by: Wenan Mao <maowenan@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-
Submitted by Zhou Guanghui

ascend inclusion
category: bugfix
bugzilla: NA
CVE: NA

------------

Check the nid when allocating a hugepage.

Signed-off-by: Zhou Guanghui <zhouguanghui1@huawei.com>
Signed-off-by: Lijun Fang <fanglijun3@huawei.com>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
-