提交 5ce1a70e 编写于 作者: L Linus Torvalds

Merge branch 'akpm' (more incoming from Andrew)

Merge second patch-bomb from Andrew Morton:

 - A little DM fix

 - the MM queue

* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (154 commits)
  ksm: allocate roots when needed
  mm: cleanup "swapcache" in do_swap_page
  mm,ksm: swapoff might need to copy
  mm,ksm: FOLL_MIGRATION do migration_entry_wait
  ksm: shrink 32-bit rmap_item back to 32 bytes
  ksm: treat unstable nid like in stable tree
  ksm: add some comments
  tmpfs: fix mempolicy object leaks
  tmpfs: fix use-after-free of mempolicy object
  mm/fadvise.c: drain all pagevecs if POSIX_FADV_DONTNEED fails to discard all pages
  mm: export mmu notifier invalidates
  mm: accelerate mm_populate() treatment of THP pages
  mm: use long type for page counts in mm_populate() and get_user_pages()
  mm: accurately document nr_free_*_pages functions with code comments
  HWPOISON: change order of error_states[]'s elements
  HWPOISON: fix misjudgement of page_action() for errors on mlocked pages
  memcg: stop warning on memcg_propagate_kmem
  net: change type of virtio_chan->p9_max_pages
  vmscan: change type of vm_total_pages to unsigned long
  fs/nfsd: change type of max_delegations, nfsd_drc_max_mem and nfsd_drc_mem_used
  ...
What: /sys/kernel/mm/ksm
Date: September 2009
KernelVersion: 2.6.32
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Interface for Kernel Samepage Merging (KSM)
What: /sys/kernel/mm/ksm/full_scans
What: /sys/kernel/mm/ksm/pages_shared
What: /sys/kernel/mm/ksm/pages_sharing
What: /sys/kernel/mm/ksm/pages_to_scan
What: /sys/kernel/mm/ksm/pages_unshared
What: /sys/kernel/mm/ksm/pages_volatile
What: /sys/kernel/mm/ksm/run
What: /sys/kernel/mm/ksm/sleep_millisecs
Date: September 2009
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Kernel Samepage Merging daemon sysfs interface
full_scans: how many times all mergeable areas have been
scanned.
pages_shared: how many shared pages are being used.
pages_sharing: how many more sites are sharing them i.e. how
much saved.
pages_to_scan: how many present pages to scan before ksmd goes
to sleep.
pages_unshared: how many pages unique but repeatedly checked
for merging.
pages_volatile: how many pages changing too fast to be placed
in a tree.
run: write 0 to disable ksm, read 0 while ksm is disabled.
write 1 to run ksm, read 1 while ksm is running.
write 2 to disable ksm and unmerge all its pages.
sleep_millisecs: how many milliseconds ksm should sleep between
scans.
See Documentation/vm/ksm.txt for more information.
What: /sys/kernel/mm/ksm/merge_across_nodes
Date: January 2013
KernelVersion: 3.9
Contact: Linux memory management mailing list <linux-mm@kvack.org>
Description: Control merging pages across different NUMA nodes.
When it is set to 0 only pages from the same node are merged,
otherwise pages from all nodes can be merged together (default).
......@@ -1640,6 +1640,42 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
that the amount of memory usable for all allocations
is not too small.
movablemem_map=acpi
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
This option inform the kernel to use Hot Pluggable bit
in flags from SRAT from ACPI BIOS to determine which
memory devices could be hotplugged. The corresponding
memory ranges will be set as ZONE_MOVABLE.
NOTE: Whatever node the kernel resides in will always
be un-hotpluggable.
movablemem_map=nn[KMG]@ss[KMG]
[KNL,X86,IA-64,PPC] This parameter is similar to
memmap except it specifies the memory map of
ZONE_MOVABLE.
If user specifies memory ranges, the info in SRAT will
be ingored. And it works like the following:
- If more ranges are all within one node, then from
lowest ss to the end of the node will be ZONE_MOVABLE.
- If a range is within a node, then from ss to the end
of the node will be ZONE_MOVABLE.
- If a range covers two or more nodes, then from ss to
the end of the 1st node will be ZONE_MOVABLE, and all
the rest nodes will only have ZONE_MOVABLE.
If memmap is specified at the same time, the
movablemem_map will be limited within the memmap
areas. If kernelcore or movablecore is also specified,
movablemem_map will have higher priority to be
satisfied. So the administrator should be careful that
the amount of movablemem_map areas are not too large.
Otherwise kernel won't have enough memory to start.
NOTE: We don't stop users specifying the node the
kernel resides in as hotpluggable so that this
option can be used as a workaround of firmware
bugs.
MTD_Partition= [MTD]
Format: <name>,<region-number>,<size>,<offset>
......
......@@ -58,6 +58,21 @@ sleep_millisecs - how many milliseconds ksmd should sleep before next scan
e.g. "echo 20 > /sys/kernel/mm/ksm/sleep_millisecs"
Default: 20 (chosen for demonstration purposes)
merge_across_nodes - specifies if pages from different numa nodes can be merged.
When set to 0, ksm merges only pages which physically
reside in the memory area of same NUMA node. That brings
lower latency to access of shared pages. Systems with more
nodes, at significant NUMA distances, are likely to benefit
from the lower latency of setting 0. Smaller systems, which
need to minimize memory usage, are likely to benefit from
the greater sharing of setting 1 (default). You may wish to
compare how your system performs under each setting, before
deciding on which to use. merge_across_nodes setting can be
changed only when there are no ksm shared pages in system:
set run 2 to unmerge pages first, then to 1 after changing
merge_across_nodes, to remerge according to the new setting.
Default: 1 (merging across nodes as in earlier releases)
run - set 0 to stop ksmd from running but keep merged pages,
set 1 to run ksmd e.g. "echo 1 > /sys/kernel/mm/ksm/run",
set 2 to stop ksmd and unmerge all pages currently merged,
......
......@@ -434,4 +434,7 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
#endif /* CONFIG_ARM64_64K_PAGES */
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
......@@ -93,7 +93,7 @@ void show_mem(unsigned int filter)
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
printk(KERN_INFO "%ld free buffer pages\n", nr_free_buffer_pages());
}
......
......@@ -666,7 +666,7 @@ void show_mem(unsigned int filter)
printk(KERN_INFO "%d pages swap cached\n", total_cached);
printk(KERN_INFO "Total of %ld pages in page table cache\n",
quicklist_total_size());
printk(KERN_INFO "%d free buffer pages\n", nr_free_buffer_pages());
printk(KERN_INFO "%ld free buffer pages\n", nr_free_buffer_pages());
}
/**
......@@ -822,4 +822,8 @@ int __meminit vmemmap_populate(struct page *start_page,
{
return vmemmap_populate_basepages(start_page, size, node);
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif
......@@ -688,6 +688,24 @@ int arch_add_memory(int nid, u64 start, u64 size)
return ret;
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
int ret;
zone = page_zone(pfn_to_page(start_pfn));
ret = __remove_pages(zone, start_pfn, nr_pages);
if (ret)
pr_warn("%s: Problem encountered in __remove_pages() as"
" ret=%d\n", __func__, ret);
return ret;
}
#endif
#endif
/*
......
......@@ -297,5 +297,10 @@ int __meminit vmemmap_populate(struct page *start_page,
return 0;
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
......@@ -133,6 +133,18 @@ int arch_add_memory(int nid, u64 start, u64 size)
return __add_pages(nid, zone, start_pfn, nr_pages);
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
zone = page_zone(pfn_to_page(start_pfn));
return __remove_pages(zone, start_pfn, nr_pages);
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
/*
......
......@@ -228,4 +228,16 @@ int arch_add_memory(int nid, u64 start, u64 size)
vmem_remove_mapping(start, size);
return rc;
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
/*
* There is no hardware or firmware interface which could trigger a
* hot memory remove on s390. So there is nothing that needs to be
* implemented.
*/
return -EBUSY;
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
......@@ -268,6 +268,10 @@ int __meminit vmemmap_populate(struct page *start, unsigned long nr, int node)
return ret;
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
/*
* Add memory segment to the segment list if it doesn't overlap with
* an already present segment.
......
......@@ -558,4 +558,21 @@ int memory_add_physaddr_to_nid(u64 addr)
EXPORT_SYMBOL_GPL(memory_add_physaddr_to_nid);
#endif
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
int ret;
zone = page_zone(pfn_to_page(start_pfn));
ret = __remove_pages(zone, start_pfn, nr_pages);
if (unlikely(ret))
pr_warn("%s: Failed, __remove_pages() == %d\n", __func__,
ret);
return ret;
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
......@@ -57,7 +57,7 @@ void show_mem(unsigned int filter)
printk("Mem-info:\n");
show_free_areas(filter);
printk("Free swap: %6ldkB\n",
nr_swap_pages << (PAGE_SHIFT-10));
get_nr_swap_pages() << (PAGE_SHIFT-10));
printk("%ld pages of RAM\n", totalram_pages);
printk("%ld free pages\n", nr_free_pages());
}
......
......@@ -2235,6 +2235,11 @@ void __meminit vmemmap_populate_print_last(void)
node_start = 0;
}
}
void vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
}
#endif /* CONFIG_SPARSEMEM_VMEMMAP */
static void prot_init_common(unsigned long page_none,
......
......@@ -130,7 +130,6 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
if (!retval) {
unsigned long addr = MEM_USER_INTRPT;
addr = mmap_region(NULL, addr, INTRPT_SIZE,
MAP_FIXED|MAP_ANONYMOUS|MAP_PRIVATE,
VM_READ|VM_EXEC|
VM_MAYREAD|VM_MAYWRITE|VM_MAYEXEC, 0);
if (addr > (unsigned long) -PAGE_SIZE)
......
......@@ -935,6 +935,14 @@ int remove_memory(u64 start, u64 size)
{
return -EINVAL;
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
/* TODO */
return -EBUSY;
}
#endif
#endif
struct kmem_cache *pgd_cache;
......
......@@ -61,7 +61,7 @@ void show_mem(unsigned int filter)
global_page_state(NR_PAGETABLE),
global_page_state(NR_BOUNCE),
global_page_state(NR_FILE_PAGES),
nr_swap_pages);
get_nr_swap_pages());
for_each_zone(zone) {
unsigned long flags, order, total = 0, largest_order = -1;
......
......@@ -57,8 +57,8 @@ static inline int numa_cpu_node(int cpu)
#endif
#ifdef CONFIG_NUMA
extern void __cpuinit numa_set_node(int cpu, int node);
extern void __cpuinit numa_clear_node(int cpu);
extern void numa_set_node(int cpu, int node);
extern void numa_clear_node(int cpu);
extern void __init init_cpu_to_node(void);
extern void __cpuinit numa_add_cpu(int cpu);
extern void __cpuinit numa_remove_cpu(int cpu);
......
......@@ -351,6 +351,7 @@ static inline void update_page_count(int level, unsigned long pages) { }
* as a pte too.
*/
extern pte_t *lookup_address(unsigned long address, unsigned int *level);
extern int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase);
extern phys_addr_t slow_virt_to_phys(void *__address);
#endif /* !__ASSEMBLY__ */
......
......@@ -696,6 +696,10 @@ EXPORT_SYMBOL(acpi_map_lsapic);
int acpi_unmap_lsapic(int cpu)
{
#ifdef CONFIG_ACPI_NUMA
set_apicid_to_node(per_cpu(x86_cpu_to_apicid, cpu), NUMA_NO_NODE);
#endif
per_cpu(x86_cpu_to_apicid, cpu) = -1;
set_cpu_present(cpu, false);
num_processors--;
......
......@@ -1056,6 +1056,15 @@ void __init setup_arch(char **cmdline_p)
setup_bios_corruption_check();
#endif
/*
* In the memory hotplug case, the kernel needs info from SRAT to
* determine which memory is hotpluggable before allocating memory
* using memblock.
*/
acpi_boot_table_init();
early_acpi_boot_init();
early_parse_srat();
#ifdef CONFIG_X86_32
printk(KERN_DEBUG "initial memory mapped: [mem 0x00000000-%#010lx]\n",
(max_pfn_mapped<<PAGE_SHIFT) - 1);
......@@ -1101,10 +1110,6 @@ void __init setup_arch(char **cmdline_p)
/*
* Parse the ACPI tables for possible boot-time SMP configuration.
*/
acpi_boot_table_init();
early_acpi_boot_init();
initmem_init();
memblock_find_dma_reserve();
......
......@@ -862,6 +862,18 @@ int arch_add_memory(int nid, u64 start, u64 size)
return __add_pages(nid, zone, start_pfn, nr_pages);
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
zone = page_zone(pfn_to_page(start_pfn));
return __remove_pages(zone, start_pfn, nr_pages);
}
#endif
#endif
/*
......
......@@ -707,6 +707,343 @@ int arch_add_memory(int nid, u64 start, u64 size)
}
EXPORT_SYMBOL_GPL(arch_add_memory);
#define PAGE_INUSE 0xFD
static void __meminit free_pagetable(struct page *page, int order)
{
struct zone *zone;
bool bootmem = false;
unsigned long magic;
unsigned int nr_pages = 1 << order;
/* bootmem page has reserved flag */
if (PageReserved(page)) {
__ClearPageReserved(page);
bootmem = true;
magic = (unsigned long)page->lru.next;
if (magic == SECTION_INFO || magic == MIX_SECTION_INFO) {
while (nr_pages--)
put_page_bootmem(page++);
} else
__free_pages_bootmem(page, order);
} else
free_pages((unsigned long)page_address(page), order);
/*
* SECTION_INFO pages and MIX_SECTION_INFO pages
* are all allocated by bootmem.
*/
if (bootmem) {
zone = page_zone(page);
zone_span_writelock(zone);
zone->present_pages += nr_pages;
zone_span_writeunlock(zone);
totalram_pages += nr_pages;
}
}
static void __meminit free_pte_table(pte_t *pte_start, pmd_t *pmd)
{
pte_t *pte;
int i;
for (i = 0; i < PTRS_PER_PTE; i++) {
pte = pte_start + i;
if (pte_val(*pte))
return;
}
/* free a pte talbe */
free_pagetable(pmd_page(*pmd), 0);
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
}
static void __meminit free_pmd_table(pmd_t *pmd_start, pud_t *pud)
{
pmd_t *pmd;
int i;
for (i = 0; i < PTRS_PER_PMD; i++) {
pmd = pmd_start + i;
if (pmd_val(*pmd))
return;
}
/* free a pmd talbe */
free_pagetable(pud_page(*pud), 0);
spin_lock(&init_mm.page_table_lock);
pud_clear(pud);
spin_unlock(&init_mm.page_table_lock);
}
/* Return true if pgd is changed, otherwise return false. */
static bool __meminit free_pud_table(pud_t *pud_start, pgd_t *pgd)
{
pud_t *pud;
int i;
for (i = 0; i < PTRS_PER_PUD; i++) {
pud = pud_start + i;
if (pud_val(*pud))
return false;
}
/* free a pud table */
free_pagetable(pgd_page(*pgd), 0);
spin_lock(&init_mm.page_table_lock);
pgd_clear(pgd);
spin_unlock(&init_mm.page_table_lock);
return true;
}
static void __meminit
remove_pte_table(pte_t *pte_start, unsigned long addr, unsigned long end,
bool direct)
{
unsigned long next, pages = 0;
pte_t *pte;
void *page_addr;
phys_addr_t phys_addr;
pte = pte_start + pte_index(addr);
for (; addr < end; addr = next, pte++) {
next = (addr + PAGE_SIZE) & PAGE_MASK;
if (next > end)
next = end;
if (!pte_present(*pte))
continue;
/*
* We mapped [0,1G) memory as identity mapping when
* initializing, in arch/x86/kernel/head_64.S. These
* pagetables cannot be removed.
*/
phys_addr = pte_val(*pte) + (addr & PAGE_MASK);
if (phys_addr < (phys_addr_t)0x40000000)
return;
if (IS_ALIGNED(addr, PAGE_SIZE) &&
IS_ALIGNED(next, PAGE_SIZE)) {
/*
* Do not free direct mapping pages since they were
* freed when offlining, or simplely not in use.
*/
if (!direct)
free_pagetable(pte_page(*pte), 0);
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
/* For non-direct mapping, pages means nothing. */
pages++;
} else {
/*
* If we are here, we are freeing vmemmap pages since
* direct mapped memory ranges to be freed are aligned.
*
* If we are not removing the whole page, it means
* other page structs in this page are being used and
* we canot remove them. So fill the unused page_structs
* with 0xFD, and remove the page when it is wholly
* filled with 0xFD.
*/
memset((void *)addr, PAGE_INUSE, next - addr);
page_addr = page_address(pte_page(*pte));
if (!memchr_inv(page_addr, PAGE_INUSE, PAGE_SIZE)) {
free_pagetable(pte_page(*pte), 0);
spin_lock(&init_mm.page_table_lock);
pte_clear(&init_mm, addr, pte);
spin_unlock(&init_mm.page_table_lock);
}
}
}
/* Call free_pte_table() in remove_pmd_table(). */
flush_tlb_all();
if (direct)
update_page_count(PG_LEVEL_4K, -pages);
}
static void __meminit
remove_pmd_table(pmd_t *pmd_start, unsigned long addr, unsigned long end,
bool direct)
{
unsigned long next, pages = 0;
pte_t *pte_base;
pmd_t *pmd;
void *page_addr;
pmd = pmd_start + pmd_index(addr);
for (; addr < end; addr = next, pmd++) {
next = pmd_addr_end(addr, end);
if (!pmd_present(*pmd))
continue;
if (pmd_large(*pmd)) {
if (IS_ALIGNED(addr, PMD_SIZE) &&
IS_ALIGNED(next, PMD_SIZE)) {
if (!direct)
free_pagetable(pmd_page(*pmd),
get_order(PMD_SIZE));
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
pages++;
} else {
/* If here, we are freeing vmemmap pages. */
memset((void *)addr, PAGE_INUSE, next - addr);
page_addr = page_address(pmd_page(*pmd));
if (!memchr_inv(page_addr, PAGE_INUSE,
PMD_SIZE)) {
free_pagetable(pmd_page(*pmd),
get_order(PMD_SIZE));
spin_lock(&init_mm.page_table_lock);
pmd_clear(pmd);
spin_unlock(&init_mm.page_table_lock);
}
}
continue;
}
pte_base = (pte_t *)pmd_page_vaddr(*pmd);
remove_pte_table(pte_base, addr, next, direct);
free_pte_table(pte_base, pmd);
}
/* Call free_pmd_table() in remove_pud_table(). */
if (direct)
update_page_count(PG_LEVEL_2M, -pages);
}
static void __meminit
remove_pud_table(pud_t *pud_start, unsigned long addr, unsigned long end,
bool direct)
{
unsigned long next, pages = 0;
pmd_t *pmd_base;
pud_t *pud;
void *page_addr;
pud = pud_start + pud_index(addr);
for (; addr < end; addr = next, pud++) {
next = pud_addr_end(addr, end);
if (!pud_present(*pud))
continue;
if (pud_large(*pud)) {
if (IS_ALIGNED(addr, PUD_SIZE) &&
IS_ALIGNED(next, PUD_SIZE)) {
if (!direct)
free_pagetable(pud_page(*pud),
get_order(PUD_SIZE));
spin_lock(&init_mm.page_table_lock);
pud_clear(pud);
spin_unlock(&init_mm.page_table_lock);
pages++;
} else {
/* If here, we are freeing vmemmap pages. */
memset((void *)addr, PAGE_INUSE, next - addr);
page_addr = page_address(pud_page(*pud));
if (!memchr_inv(page_addr, PAGE_INUSE,
PUD_SIZE)) {
free_pagetable(pud_page(*pud),
get_order(PUD_SIZE));
spin_lock(&init_mm.page_table_lock);
pud_clear(pud);
spin_unlock(&init_mm.page_table_lock);
}
}
continue;
}
pmd_base = (pmd_t *)pud_page_vaddr(*pud);
remove_pmd_table(pmd_base, addr, next, direct);
free_pmd_table(pmd_base, pud);
}
if (direct)
update_page_count(PG_LEVEL_1G, -pages);
}
/* start and end are both virtual address. */
static void __meminit
remove_pagetable(unsigned long start, unsigned long end, bool direct)
{
unsigned long next;
pgd_t *pgd;
pud_t *pud;
bool pgd_changed = false;
for (; start < end; start = next) {
next = pgd_addr_end(start, end);
pgd = pgd_offset_k(start);
if (!pgd_present(*pgd))
continue;
pud = (pud_t *)pgd_page_vaddr(*pgd);
remove_pud_table(pud, start, next, direct);
if (free_pud_table(pud, pgd))
pgd_changed = true;
}
if (pgd_changed)
sync_global_pgds(start, end - 1);
flush_tlb_all();
}
void __ref vmemmap_free(struct page *memmap, unsigned long nr_pages)
{
unsigned long start = (unsigned long)memmap;
unsigned long end = (unsigned long)(memmap + nr_pages);
remove_pagetable(start, end, false);
}
static void __meminit
kernel_physical_mapping_remove(unsigned long start, unsigned long end)
{
start = (unsigned long)__va(start);
end = (unsigned long)__va(end);
remove_pagetable(start, end, true);
}
#ifdef CONFIG_MEMORY_HOTREMOVE
int __ref arch_remove_memory(u64 start, u64 size)
{
unsigned long start_pfn = start >> PAGE_SHIFT;
unsigned long nr_pages = size >> PAGE_SHIFT;
struct zone *zone;
int ret;
zone = page_zone(pfn_to_page(start_pfn));
kernel_physical_mapping_remove(start, start + size);
ret = __remove_pages(zone, start_pfn, nr_pages);
WARN_ON_ONCE(ret);
return ret;
}
#endif
#endif /* CONFIG_MEMORY_HOTPLUG */
static struct kcore_list kcore_vsyscall;
......@@ -1019,6 +1356,66 @@ vmemmap_populate(struct page *start_page, unsigned long size, int node)
return 0;
}
#if defined(CONFIG_MEMORY_HOTPLUG_SPARSE) && defined(CONFIG_HAVE_BOOTMEM_INFO_NODE)
void register_page_bootmem_memmap(unsigned long section_nr,
struct page *start_page, unsigned long size)
{
unsigned long addr = (unsigned long)start_page;
unsigned long end = (unsigned long)(start_page + size);
unsigned long next;
pgd_t *pgd;
pud_t *pud;
pmd_t *pmd;
unsigned int nr_pages;
struct page *page;
for (; addr < end; addr = next) {
pte_t *pte = NULL;
pgd = pgd_offset_k(addr);
if (pgd_none(*pgd)) {
next = (addr + PAGE_SIZE) & PAGE_MASK;
continue;
}
get_page_bootmem(section_nr, pgd_page(*pgd), MIX_SECTION_INFO);
pud = pud_offset(pgd, addr);
if (pud_none(*pud)) {
next = (addr + PAGE_SIZE) & PAGE_MASK;
continue;
}
get_page_bootmem(section_nr, pud_page(*pud), MIX_SECTION_INFO);
if (!cpu_has_pse) {
next = (addr + PAGE_SIZE) & PAGE_MASK;
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
continue;
get_page_bootmem(section_nr, pmd_page(*pmd),
MIX_SECTION_INFO);
pte = pte_offset_kernel(pmd, addr);
if (pte_none(*pte))
continue;
get_page_bootmem(section_nr, pte_page(*pte),
SECTION_INFO);
} else {
next = pmd_addr_end(addr, end);
pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd))
continue;
nr_pages = 1 << (get_order(PMD_SIZE));
page = pmd_page(*pmd);
while (nr_pages--)
get_page_bootmem(section_nr, page++,
SECTION_INFO);
}
}
}
#endif
void __meminit vmemmap_populate_print_last(void)
{
if (p_start) {
......
......@@ -56,7 +56,7 @@ early_param("numa", numa_setup);
/*
* apicid, cpu, node mappings
*/
s16 __apicid_to_node[MAX_LOCAL_APIC] __cpuinitdata = {
s16 __apicid_to_node[MAX_LOCAL_APIC] = {
[0 ... MAX_LOCAL_APIC-1] = NUMA_NO_NODE
};
......@@ -78,7 +78,7 @@ EXPORT_SYMBOL(node_to_cpumask_map);
DEFINE_EARLY_PER_CPU(int, x86_cpu_to_node_map, NUMA_NO_NODE);
EXPORT_EARLY_PER_CPU_SYMBOL(x86_cpu_to_node_map);
void __cpuinit numa_set_node(int cpu, int node)
void numa_set_node(int cpu, int node)
{
int *cpu_to_node_map = early_per_cpu_ptr(x86_cpu_to_node_map);
......@@ -101,7 +101,7 @@ void __cpuinit numa_set_node(int cpu, int node)
set_cpu_numa_node(cpu, node);
}
void __cpuinit numa_clear_node(int cpu)
void numa_clear_node(int cpu)
{
numa_set_node(cpu, NUMA_NO_NODE);
}
......@@ -213,10 +213,9 @@ static void __init setup_node_data(int nid, u64 start, u64 end)
* Allocate node data. Try node-local memory and then any node.
* Never allocate in DMA zone.
*/
nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid);
nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid);
if (!nd_pa) {
pr_err("Cannot find %zu bytes in node %d\n",
nd_size, nid);
pr_err("Cannot find %zu bytes in any node\n", nd_size);
return;
}
nd = __va(nd_pa);
......@@ -561,10 +560,12 @@ static int __init numa_init(int (*init_func)(void))
for (i = 0; i < MAX_LOCAL_APIC; i++)
set_apicid_to_node(i, NUMA_NO_NODE);
nodes_clear(numa_nodes_parsed);
/*
* Do not clear numa_nodes_parsed or zero numa_meminfo here, because
* SRAT was parsed earlier in early_parse_srat().
*/
nodes_clear(node_possible_map);
nodes_clear(node_online_map);
memset(&numa_meminfo, 0, sizeof(numa_meminfo));
WARN_ON(memblock_set_node(0, ULLONG_MAX, MAX_NUMNODES));
numa_reset_distance();
......
......@@ -529,21 +529,13 @@ try_preserve_large_page(pte_t *kpte, unsigned long address,
return do_split;
}
static int split_large_page(pte_t *kpte, unsigned long address)
int __split_large_page(pte_t *kpte, unsigned long address, pte_t *pbase)
{
unsigned long pfn, pfninc = 1;
unsigned int i, level;
pte_t *pbase, *tmp;
pte_t *tmp;
pgprot_t ref_prot;
struct page *base;
if (!debug_pagealloc)
spin_unlock(&cpa_lock);
base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
if (!debug_pagealloc)
spin_lock(&cpa_lock);
if (!base)
return -ENOMEM;
struct page *base = virt_to_page(pbase);
spin_lock(&pgd_lock);
/*
......@@ -551,10 +543,11 @@ static int split_large_page(pte_t *kpte, unsigned long address)
* up for us already:
*/
tmp = lookup_address(address, &level);
if (tmp != kpte)
goto out_unlock;
if (tmp != kpte) {
spin_unlock(&pgd_lock);
return 1;
}
pbase = (pte_t *)page_address(base);
paravirt_alloc_pte(&init_mm, page_to_pfn(base));
ref_prot = pte_pgprot(pte_clrhuge(*kpte));
/*
......@@ -601,17 +594,27 @@ static int split_large_page(pte_t *kpte, unsigned long address)
* going on.
*/
__flush_tlb_all();
spin_unlock(&pgd_lock);
base = NULL;
return 0;
}
out_unlock:
/*
* If we dropped out via the lookup_address check under
* pgd_lock then stick the page back into the pool:
*/
if (base)
static int split_large_page(pte_t *kpte, unsigned long address)
{
pte_t *pbase;
struct page *base;
if (!debug_pagealloc)
spin_unlock(&cpa_lock);
base = alloc_pages(GFP_KERNEL | __GFP_NOTRACK, 0);
if (!debug_pagealloc)
spin_lock(&cpa_lock);
if (!base)
return -ENOMEM;
pbase = (pte_t *)page_address(base);
if (__split_large_page(kpte, address, pbase))
__free_page(base);
spin_unlock(&pgd_lock);
return 0;
}
......
......@@ -141,11 +141,126 @@ static inline int save_add_info(void) {return 1;}
static inline int save_add_info(void) {return 0;}
#endif
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
static void __init
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
int overlap, i;
unsigned long start_pfn, end_pfn;
start_pfn = PFN_DOWN(start);
end_pfn = PFN_UP(end);
/*
* For movablemem_map=acpi:
*
* SRAT: |_____| |_____| |_________| |_________| ......
* node id: 0 1 1 2
* hotpluggable: n y y n
* movablemem_map: |_____| |_________|
*
* Using movablemem_map, we can prevent memblock from allocating memory
* on ZONE_MOVABLE at boot time.
*
* Before parsing SRAT, memblock has already reserve some memory ranges
* for other purposes, such as for kernel image. We cannot prevent
* kernel from using these memory, so we need to exclude these memory
* even if it is hotpluggable.
* Furthermore, to ensure the kernel has enough memory to boot, we make
* all the memory on the node which the kernel resides in
* un-hotpluggable.
*/
if (hotpluggable && movablemem_map.acpi) {
/* Exclude ranges reserved by memblock. */
struct memblock_type *rgn = &memblock.reserved;
for (i = 0; i < rgn->cnt; i++) {
if (end <= rgn->regions[i].base ||
start >= rgn->regions[i].base +
rgn->regions[i].size)
continue;
/*
* If the memory range overlaps the memory reserved by
* memblock, then the kernel resides in this node.
*/
node_set(node, movablemem_map.numa_nodes_kernel);
goto out;
}
/*
* If the kernel resides in this node, then the whole node
* should not be hotpluggable.
*/
if (node_isset(node, movablemem_map.numa_nodes_kernel))
goto out;
insert_movablemem_map(start_pfn, end_pfn);
/*
* numa_nodes_hotplug nodemask represents which nodes are put
* into movablemem_map.map[].
*/
node_set(node, movablemem_map.numa_nodes_hotplug);
goto out;
}
/*
* For movablemem_map=nn[KMG]@ss[KMG]:
*
* SRAT: |_____| |_____| |_________| |_________| ......
* node id: 0 1 1 2
* user specified: |__| |___|
* movablemem_map: |___| |_________| |______| ......
*
* Using movablemem_map, we can prevent memblock from allocating memory
* on ZONE_MOVABLE at boot time.
*
* NOTE: In this case, SRAT info will be ingored.
*/
overlap = movablemem_map_overlap(start_pfn, end_pfn);
if (overlap >= 0) {
/*
* If part of this range is in movablemem_map, we need to
* add the range after it to extend the range to the end
* of the node, because from the min address specified to
* the end of the node will be ZONE_MOVABLE.
*/
start_pfn = max(start_pfn,
movablemem_map.map[overlap].start_pfn);
insert_movablemem_map(start_pfn, end_pfn);
/*
* Set the nodemask, so that if the address range on one node
* is not continuse, we can add the subsequent ranges on the
* same node into movablemem_map.
*/
node_set(node, movablemem_map.numa_nodes_hotplug);
} else {
if (node_isset(node, movablemem_map.numa_nodes_hotplug))
/*
* Insert the range if we already have movable ranges
* on the same node.
*/
insert_movablemem_map(start_pfn, end_pfn);
}
out:
return;
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
static inline void
handle_movablemem(int node, u64 start, u64 end, u32 hotpluggable)
{
}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
/* Callback for parsing of the Proximity Domain <-> Memory Area mappings */
int __init
acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
{
u64 start, end;
u32 hotpluggable;
int node, pxm;
if (srat_disabled())
......@@ -154,7 +269,8 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
goto out_err_bad_srat;
if ((ma->flags & ACPI_SRAT_MEM_ENABLED) == 0)
goto out_err;
if ((ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE) && !save_add_info())
hotpluggable = ma->flags & ACPI_SRAT_MEM_HOT_PLUGGABLE;
if (hotpluggable && !save_add_info())
goto out_err;
start = ma->base_address;
......@@ -174,9 +290,12 @@ acpi_numa_memory_affinity_init(struct acpi_srat_mem_affinity *ma)
node_set(node, numa_nodes_parsed);
printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx]\n",
printk(KERN_INFO "SRAT: Node %u PXM %u [mem %#010Lx-%#010Lx] %s\n",
node, pxm,
(unsigned long long) start, (unsigned long long) end - 1);
(unsigned long long) start, (unsigned long long) end - 1,
hotpluggable ? "Hot Pluggable": "");
handle_movablemem(node, start, end, hotpluggable);
return 0;
out_err_bad_srat:
......
......@@ -18,6 +18,7 @@
#include <linux/mutex.h>
#include <linux/idr.h>
#include <linux/log2.h>
#include <linux/pm_runtime.h>
#include "blk.h"
......@@ -534,6 +535,14 @@ static void register_disk(struct gendisk *disk)
return;
}
}
/*
* avoid probable deadlock caused by allocating memory with
* GFP_KERNEL in runtime_resume callback of its all ancestor
* devices
*/
pm_runtime_set_memalloc_noio(ddev, true);
disk->part0.holder_dir = kobject_create_and_add("holders", &ddev->kobj);
disk->slave_dir = kobject_create_and_add("slaves", &ddev->kobj);
......@@ -663,6 +672,7 @@ void del_gendisk(struct gendisk *disk)
disk->driverfs_dev = NULL;
if (!sysfs_deprecated)
sysfs_remove_link(block_depr, dev_name(disk_to_dev(disk)));
pm_runtime_set_memalloc_noio(disk_to_dev(disk), false);
device_del(disk_to_dev(disk));
}
EXPORT_SYMBOL(del_gendisk);
......
......@@ -280,9 +280,11 @@ static int acpi_memory_enable_device(struct acpi_memory_device *mem_device)
static int acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
{
int result = 0;
int result = 0, nid;
struct acpi_memory_info *info, *n;
nid = acpi_get_node(mem_device->device->handle);
list_for_each_entry_safe(info, n, &mem_device->res_list, list) {
if (info->failed)
/* The kernel does not use this memory block */
......@@ -295,7 +297,9 @@ static int acpi_memory_remove_memory(struct acpi_memory_device *mem_device)
*/
return -EBUSY;
result = remove_memory(info->start_addr, info->length);
if (nid < 0)
nid = memory_add_physaddr_to_nid(info->start_addr);
result = remove_memory(nid, info->start_addr, info->length);
if (result)
return result;
......
......@@ -282,10 +282,10 @@ acpi_table_parse_srat(enum acpi_srat_type id,
handler, max_entries);
}
int __init acpi_numa_init(void)
{
int cnt = 0;
static int srat_mem_cnt;
void __init early_parse_srat(void)
{
/*
* Should not limit number with cpu num that is from NR_CPUS or nr_cpus=
* SRAT cpu entries could have different order with that in MADT.
......@@ -295,21 +295,24 @@ int __init acpi_numa_init(void)
/* SRAT: Static Resource Affinity Table */
if (!acpi_table_parse(ACPI_SIG_SRAT, acpi_parse_srat)) {
acpi_table_parse_srat(ACPI_SRAT_TYPE_X2APIC_CPU_AFFINITY,
acpi_parse_x2apic_affinity, 0);
acpi_parse_x2apic_affinity, 0);
acpi_table_parse_srat(ACPI_SRAT_TYPE_CPU_AFFINITY,
acpi_parse_processor_affinity, 0);
cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity,
NR_NODE_MEMBLKS);
acpi_parse_processor_affinity, 0);
srat_mem_cnt = acpi_table_parse_srat(ACPI_SRAT_TYPE_MEMORY_AFFINITY,
acpi_parse_memory_affinity,
NR_NODE_MEMBLKS);
}
}
int __init acpi_numa_init(void)
{
/* SLIT: System Locality Information Table */
acpi_table_parse(ACPI_SIG_SLIT, acpi_parse_slit);
acpi_numa_arch_fixup();
if (cnt < 0)
return cnt;
if (srat_mem_cnt < 0)
return srat_mem_cnt;
else if (!parsed_numa_memblks)
return -ENOENT;
return 0;
......
......@@ -45,6 +45,7 @@
#include <linux/cpuidle.h>
#include <linux/slab.h>
#include <linux/acpi.h>
#include <linux/memory_hotplug.h>
#include <asm/io.h>
#include <asm/cpu.h>
......@@ -641,6 +642,7 @@ static int acpi_processor_remove(struct acpi_device *device)
per_cpu(processors, pr->id) = NULL;
per_cpu(processor_device_array, pr->id) = NULL;
try_offline_node(cpu_to_node(pr->id));
free:
free_cpumask_var(pr->throttling.shared_cpu_map);
......
......@@ -693,6 +693,12 @@ int offline_memory_block(struct memory_block *mem)
return ret;
}
/* return true if the memory block is offlined, otherwise, return false */
bool is_memblock_offlined(struct memory_block *mem)
{
return mem->state == MEM_OFFLINE;
}
/*
* Initialize the sysfs support for memory devices...
*/
......
......@@ -124,6 +124,76 @@ unsigned long pm_runtime_autosuspend_expiration(struct device *dev)
}
EXPORT_SYMBOL_GPL(pm_runtime_autosuspend_expiration);
static int dev_memalloc_noio(struct device *dev, void *data)
{
return dev->power.memalloc_noio;
}
/*
* pm_runtime_set_memalloc_noio - Set a device's memalloc_noio flag.
* @dev: Device to handle.
* @enable: True for setting the flag and False for clearing the flag.
*
* Set the flag for all devices in the path from the device to the
* root device in the device tree if @enable is true, otherwise clear
* the flag for devices in the path whose siblings don't set the flag.
*
* The function should only be called by block device, or network
* device driver for solving the deadlock problem during runtime
* resume/suspend:
*
* If memory allocation with GFP_KERNEL is called inside runtime
* resume/suspend callback of any one of its ancestors(or the
* block device itself), the deadlock may be triggered inside the
* memory allocation since it might not complete until the block
* device becomes active and the involed page I/O finishes. The
* situation is pointed out first by Alan Stern. Network device
* are involved in iSCSI kind of situation.
*
* The lock of dev_hotplug_mutex is held in the function for handling
* hotplug race because pm_runtime_set_memalloc_noio() may be called
* in async probe().
*
* The function should be called between device_add() and device_del()
* on the affected device(block/network device).
*/
void pm_runtime_set_memalloc_noio(struct device *dev, bool enable)
{
static DEFINE_MUTEX(dev_hotplug_mutex);
mutex_lock(&dev_hotplug_mutex);
for (;;) {
bool enabled;
/* hold power lock since bitfield is not SMP-safe. */
spin_lock_irq(&dev->power.lock);
enabled = dev->power.memalloc_noio;
dev->power.memalloc_noio = enable;
spin_unlock_irq(&dev->power.lock);
/*
* not need to enable ancestors any more if the device
* has been enabled.
*/
if (enabled && enable)
break;
dev = dev->parent;
/*
* clear flag of the parent device only if all the
* children don't set the flag because ancestor's
* flag was set by any one of the descendants.
*/
if (!dev || (!enable &&
device_for_each_child(dev, NULL,
dev_memalloc_noio)))
break;
}
mutex_unlock(&dev_hotplug_mutex);
}
EXPORT_SYMBOL_GPL(pm_runtime_set_memalloc_noio);
/**
* rpm_check_suspend_allowed - Test whether a device may be suspended.
* @dev: Device to test.
......@@ -278,7 +348,24 @@ static int rpm_callback(int (*cb)(struct device *), struct device *dev)
if (!cb)
return -ENOSYS;
retval = __rpm_callback(cb, dev);
if (dev->power.memalloc_noio) {
unsigned int noio_flag;
/*
* Deadlock might be caused if memory allocation with
* GFP_KERNEL happens inside runtime_suspend and
* runtime_resume callbacks of one block device's
* ancestor or the block device itself. Network
* device might be thought as part of iSCSI block
* device, so network device and its ancestor should
* be marked as memalloc_noio too.
*/
noio_flag = memalloc_noio_save();
retval = __rpm_callback(cb, dev);
memalloc_noio_restore(noio_flag);
} else {
retval = __rpm_callback(cb, dev);
}
dev->power.runtime_error = retval;
return retval != -EACCES ? retval : -EIO;
......
......@@ -21,6 +21,7 @@
#include <linux/types.h>
#include <linux/bootmem.h>
#include <linux/slab.h>
#include <linux/mm.h>
/*
* Data types ------------------------------------------------------------------
......@@ -52,6 +53,9 @@ static ssize_t start_show(struct firmware_map_entry *entry, char *buf);
static ssize_t end_show(struct firmware_map_entry *entry, char *buf);
static ssize_t type_show(struct firmware_map_entry *entry, char *buf);
static struct firmware_map_entry * __meminit
firmware_map_find_entry(u64 start, u64 end, const char *type);
/*
* Static data -----------------------------------------------------------------
*/
......@@ -79,7 +83,52 @@ static const struct sysfs_ops memmap_attr_ops = {
.show = memmap_attr_show,
};
static struct kobj_type memmap_ktype = {
/* Firmware memory map entries. */
static LIST_HEAD(map_entries);
static DEFINE_SPINLOCK(map_entries_lock);
/*
* For memory hotplug, there is no way to free memory map entries allocated
* by boot mem after the system is up. So when we hot-remove memory whose
* map entry is allocated by bootmem, we need to remember the storage and
* reuse it when the memory is hot-added again.
*/
static LIST_HEAD(map_entries_bootmem);
static DEFINE_SPINLOCK(map_entries_bootmem_lock);
static inline struct firmware_map_entry *
to_memmap_entry(struct kobject *kobj)
{
return container_of(kobj, struct firmware_map_entry, kobj);
}
static void __meminit release_firmware_map_entry(struct kobject *kobj)
{
struct firmware_map_entry *entry = to_memmap_entry(kobj);
if (PageReserved(virt_to_page(entry))) {
/*
* Remember the storage allocated by bootmem, and reuse it when
* the memory is hot-added again. The entry will be added to
* map_entries_bootmem here, and deleted from &map_entries in
* firmware_map_remove_entry().
*/
if (firmware_map_find_entry(entry->start, entry->end,
entry->type)) {
spin_lock(&map_entries_bootmem_lock);
list_add(&entry->list, &map_entries_bootmem);
spin_unlock(&map_entries_bootmem_lock);
}
return;
}
kfree(entry);
}
static struct kobj_type __refdata memmap_ktype = {
.release = release_firmware_map_entry,
.sysfs_ops = &memmap_attr_ops,
.default_attrs = def_attrs,
};
......@@ -88,13 +137,6 @@ static struct kobj_type memmap_ktype = {
* Registration functions ------------------------------------------------------
*/
/*
* Firmware memory map entries. No locking is needed because the
* firmware_map_add() and firmware_map_add_early() functions are called
* in firmware initialisation code in one single thread of execution.
*/
static LIST_HEAD(map_entries);
/**
* firmware_map_add_entry() - Does the real work to add a firmware memmap entry.
* @start: Start of the memory range.
......@@ -118,11 +160,25 @@ static int firmware_map_add_entry(u64 start, u64 end,
INIT_LIST_HEAD(&entry->list);
kobject_init(&entry->kobj, &memmap_ktype);
spin_lock(&map_entries_lock);
list_add_tail(&entry->list, &map_entries);
spin_unlock(&map_entries_lock);
return 0;
}
/**
* firmware_map_remove_entry() - Does the real work to remove a firmware
* memmap entry.
* @entry: removed entry.
*
* The caller must hold map_entries_lock, and release it properly.
**/
static inline void firmware_map_remove_entry(struct firmware_map_entry *entry)
{
list_del(&entry->list);
}
/*
* Add memmap entry on sysfs
*/
......@@ -144,6 +200,78 @@ static int add_sysfs_fw_map_entry(struct firmware_map_entry *entry)
return 0;
}
/*
* Remove memmap entry on sysfs
*/
static inline void remove_sysfs_fw_map_entry(struct firmware_map_entry *entry)
{
kobject_put(&entry->kobj);
}
/*
* firmware_map_find_entry_in_list() - Search memmap entry in a given list.
* @start: Start of the memory range.
* @end: End of the memory range (exclusive).
* @type: Type of the memory range.
* @list: In which to find the entry.
*
* This function is to find the memmap entey of a given memory range in a
* given list. The caller must hold map_entries_lock, and must not release
* the lock until the processing of the returned entry has completed.
*
* Return: Pointer to the entry to be found on success, or NULL on failure.
*/
static struct firmware_map_entry * __meminit
firmware_map_find_entry_in_list(u64 start, u64 end, const char *type,
struct list_head *list)
{
struct firmware_map_entry *entry;
list_for_each_entry(entry, list, list)
if ((entry->start == start) && (entry->end == end) &&
(!strcmp(entry->type, type))) {
return entry;
}
return NULL;
}
/*
* firmware_map_find_entry() - Search memmap entry in map_entries.
* @start: Start of the memory range.
* @end: End of the memory range (exclusive).
* @type: Type of the memory range.
*
* This function is to find the memmap entey of a given memory range.
* The caller must hold map_entries_lock, and must not release the lock
* until the processing of the returned entry has completed.
*
* Return: Pointer to the entry to be found on success, or NULL on failure.
*/
static struct firmware_map_entry * __meminit
firmware_map_find_entry(u64 start, u64 end, const char *type)
{
return firmware_map_find_entry_in_list(start, end, type, &map_entries);
}
/*
* firmware_map_find_entry_bootmem() - Search memmap entry in map_entries_bootmem.
* @start: Start of the memory range.
* @end: End of the memory range (exclusive).
* @type: Type of the memory range.
*
* This function is similar to firmware_map_find_entry except that it find the
* given entry in map_entries_bootmem.
*
* Return: Pointer to the entry to be found on success, or NULL on failure.
*/
static struct firmware_map_entry * __meminit
firmware_map_find_entry_bootmem(u64 start, u64 end, const char *type)
{
return firmware_map_find_entry_in_list(start, end, type,
&map_entries_bootmem);
}
/**
* firmware_map_add_hotplug() - Adds a firmware mapping entry when we do
* memory hotplug.
......@@ -161,9 +289,19 @@ int __meminit firmware_map_add_hotplug(u64 start, u64 end, const char *type)
{
struct firmware_map_entry *entry;
entry = kzalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
if (!entry)
return -ENOMEM;
entry = firmware_map_find_entry_bootmem(start, end, type);
if (!entry) {
entry = kzalloc(sizeof(struct firmware_map_entry), GFP_ATOMIC);
if (!entry)
return -ENOMEM;
} else {
/* Reuse storage allocated by bootmem. */
spin_lock(&map_entries_bootmem_lock);
list_del(&entry->list);
spin_unlock(&map_entries_bootmem_lock);
memset(entry, 0, sizeof(*entry));
}
firmware_map_add_entry(start, end, type, entry);
/* create the memmap entry */
......@@ -196,6 +334,36 @@ int __init firmware_map_add_early(u64 start, u64 end, const char *type)
return firmware_map_add_entry(start, end, type, entry);
}
/**
* firmware_map_remove() - remove a firmware mapping entry
* @start: Start of the memory range.
* @end: End of the memory range.
* @type: Type of the memory range.
*
* removes a firmware mapping entry.
*
* Returns 0 on success, or -EINVAL if no entry.
**/
int __meminit firmware_map_remove(u64 start, u64 end, const char *type)
{
struct firmware_map_entry *entry;
spin_lock(&map_entries_lock);
entry = firmware_map_find_entry(start, end - 1, type);
if (!entry) {
spin_unlock(&map_entries_lock);
return -EINVAL;
}
firmware_map_remove_entry(entry);
spin_unlock(&map_entries_lock);
/* remove the memmap entry */
remove_sysfs_fw_map_entry(entry);
return 0;
}
/*
* Sysfs functions -------------------------------------------------------------
*/
......@@ -217,8 +385,10 @@ static ssize_t type_show(struct firmware_map_entry *entry, char *buf)
return snprintf(buf, PAGE_SIZE, "%s\n", entry->type);
}
#define to_memmap_attr(_attr) container_of(_attr, struct memmap_attribute, attr)
#define to_memmap_entry(obj) container_of(obj, struct firmware_map_entry, kobj)
static inline struct memmap_attribute *to_memmap_attr(struct attribute *attr)
{
return container_of(attr, struct memmap_attribute, attr);
}
static ssize_t memmap_attr_show(struct kobject *kobj,
struct attribute *attr, char *buf)
......
......@@ -25,8 +25,8 @@ struct shadow_info {
/*
* It would be nice if we scaled with the size of transaction.
*/
#define HASH_SIZE 256
#define HASH_MASK (HASH_SIZE - 1)
#define DM_HASH_SIZE 256
#define DM_HASH_MASK (DM_HASH_SIZE - 1)
struct dm_transaction_manager {
int is_clone;
......@@ -36,7 +36,7 @@ struct dm_transaction_manager {
struct dm_space_map *sm;
spinlock_t lock;
struct hlist_head buckets[HASH_SIZE];
struct hlist_head buckets[DM_HASH_SIZE];
};
/*----------------------------------------------------------------*/
......@@ -44,7 +44,7 @@ struct dm_transaction_manager {
static int is_shadow(struct dm_transaction_manager *tm, dm_block_t b)
{
int r = 0;
unsigned bucket = dm_hash_block(b, HASH_MASK);
unsigned bucket = dm_hash_block(b, DM_HASH_MASK);
struct shadow_info *si;
struct hlist_node *n;
......@@ -71,7 +71,7 @@ static void insert_shadow(struct dm_transaction_manager *tm, dm_block_t b)
si = kmalloc(sizeof(*si), GFP_NOIO);
if (si) {
si->where = b;
bucket = dm_hash_block(b, HASH_MASK);
bucket = dm_hash_block(b, DM_HASH_MASK);
spin_lock(&tm->lock);
hlist_add_head(&si->hlist, tm->buckets + bucket);
spin_unlock(&tm->lock);
......@@ -86,7 +86,7 @@ static void wipe_shadow_table(struct dm_transaction_manager *tm)
int i;
spin_lock(&tm->lock);
for (i = 0; i < HASH_SIZE; i++) {
for (i = 0; i < DM_HASH_SIZE; i++) {
bucket = tm->buckets + i;
hlist_for_each_entry_safe(si, n, tmp, bucket, hlist)
kfree(si);
......@@ -115,7 +115,7 @@ static struct dm_transaction_manager *dm_tm_create(struct dm_block_manager *bm,
tm->sm = sm;
spin_lock_init(&tm->lock);
for (i = 0; i < HASH_SIZE; i++)
for (i = 0; i < DM_HASH_SIZE; i++)
INIT_HLIST_HEAD(tm->buckets + i);
return tm;
......
......@@ -404,7 +404,7 @@ static inline struct page *zbud_unuse_zbudpage(struct zbudpage *zbudpage,
else
zbud_pers_pageframes--;
zbudpage_spin_unlock(zbudpage);
reset_page_mapcount(page);
page_mapcount_reset(page);
init_page_count(page);
page->index = 0;
return page;
......
......@@ -472,7 +472,7 @@ static void reset_page(struct page *page)
set_page_private(page, 0);
page->mapping = NULL;
page->freelist = NULL;
reset_page_mapcount(page);
page_mapcount_reset(page);
}
static void free_zspage(struct page *first_page)
......
......@@ -5177,6 +5177,7 @@ int usb_reset_device(struct usb_device *udev)
{
int ret;
int i;
unsigned int noio_flag;
struct usb_host_config *config = udev->actconfig;
if (udev->state == USB_STATE_NOTATTACHED ||
......@@ -5186,6 +5187,17 @@ int usb_reset_device(struct usb_device *udev)
return -EINVAL;
}
/*
* Don't allocate memory with GFP_KERNEL in current
* context to avoid possible deadlock if usb mass
* storage interface or usbnet interface(iSCSI case)
* is included in current configuration. The easist
* approach is to do it for every device reset,
* because the device 'memalloc_noio' flag may have
* not been set before reseting the usb device.
*/
noio_flag = memalloc_noio_save();
/* Prevent autosuspend during the reset */
usb_autoresume_device(udev);
......@@ -5230,6 +5242,7 @@ int usb_reset_device(struct usb_device *udev)
}
usb_autosuspend_device(udev);
memalloc_noio_restore(noio_flag);
return ret;
}
EXPORT_SYMBOL_GPL(usb_reset_device);
......
......@@ -101,7 +101,7 @@ static int aio_setup_ring(struct kioctx *ctx)
struct aio_ring *ring;
struct aio_ring_info *info = &ctx->ring_info;
unsigned nr_events = ctx->max_reqs;
unsigned long size;
unsigned long size, populate;
int nr_pages;
/* Compensate for the ring buffer's head/tail overlap entry */
......@@ -129,7 +129,8 @@ static int aio_setup_ring(struct kioctx *ctx)
down_write(&ctx->mm->mmap_sem);
info->mmap_base = do_mmap_pgoff(NULL, 0, info->mmap_size,
PROT_READ|PROT_WRITE,
MAP_ANONYMOUS|MAP_PRIVATE, 0);
MAP_ANONYMOUS|MAP_PRIVATE, 0,
&populate);
if (IS_ERR((void *)info->mmap_base)) {
up_write(&ctx->mm->mmap_sem);
info->mmap_size = 0;
......@@ -147,6 +148,8 @@ static int aio_setup_ring(struct kioctx *ctx)
aio_free_ring(ctx);
return -EAGAIN;
}
if (populate)
mm_populate(info->mmap_base, populate);
ctx->user_id = info->mmap_base;
......
......@@ -3227,7 +3227,7 @@ static struct kmem_cache *bh_cachep __read_mostly;
* Once the number of bh's in the machine exceeds this level, we start
* stripping them in writeback.
*/
static int max_buffer_heads;
static unsigned long max_buffer_heads;
int buffer_heads_over_limit;
......@@ -3343,7 +3343,7 @@ EXPORT_SYMBOL(bh_submit_read);
void __init buffer_init(void)
{
int nrpages;
unsigned long nrpages;
bh_cachep = kmem_cache_create("buffer_head",
sizeof(struct buffer_head), 0,
......
......@@ -151,7 +151,7 @@ get_nfs4_file(struct nfs4_file *fi)
}
static int num_delegations;
unsigned int max_delegations;
unsigned long max_delegations;
/*
* Open owner state (share locks)
......@@ -700,8 +700,8 @@ static int nfsd4_get_drc_mem(int slotsize, u32 num)
num = min_t(u32, num, NFSD_MAX_SLOTS_PER_SESSION);
spin_lock(&nfsd_drc_lock);
avail = min_t(int, NFSD_MAX_MEM_PER_SESSION,
nfsd_drc_max_mem - nfsd_drc_mem_used);
avail = min((unsigned long)NFSD_MAX_MEM_PER_SESSION,
nfsd_drc_max_mem - nfsd_drc_mem_used);
num = min_t(int, num, avail / slotsize);
nfsd_drc_mem_used += num * slotsize;
spin_unlock(&nfsd_drc_lock);
......
......@@ -56,8 +56,8 @@ extern struct svc_version nfsd_version2, nfsd_version3,
extern u32 nfsd_supported_minorversion;
extern struct mutex nfsd_mutex;
extern spinlock_t nfsd_drc_lock;
extern unsigned int nfsd_drc_max_mem;
extern unsigned int nfsd_drc_mem_used;
extern unsigned long nfsd_drc_max_mem;
extern unsigned long nfsd_drc_mem_used;
extern const struct seq_operations nfs_exports_op;
......@@ -106,7 +106,7 @@ static inline int nfsd_v4client(struct svc_rqst *rq)
* NFSv4 State
*/
#ifdef CONFIG_NFSD_V4
extern unsigned int max_delegations;
extern unsigned long max_delegations;
void nfs4_state_init(void);
int nfsd4_init_slabs(void);
void nfsd4_free_slabs(void);
......
......@@ -59,8 +59,8 @@ DEFINE_MUTEX(nfsd_mutex);
* nfsd_drc_pages_used tracks the current version 4.1 DRC memory usage.
*/
spinlock_t nfsd_drc_lock;
unsigned int nfsd_drc_max_mem;
unsigned int nfsd_drc_mem_used;
unsigned long nfsd_drc_max_mem;
unsigned long nfsd_drc_mem_used;
#if defined(CONFIG_NFSD_V2_ACL) || defined(CONFIG_NFSD_V3_ACL)
static struct svc_stat nfsd_acl_svcstats;
......@@ -342,7 +342,7 @@ static void set_max_drc(void)
>> NFSD_DRC_SIZE_SHIFT) * PAGE_SIZE;
nfsd_drc_mem_used = 0;
spin_lock_init(&nfsd_drc_lock);
dprintk("%s nfsd_drc_max_mem %u \n", __func__, nfsd_drc_max_mem);
dprintk("%s nfsd_drc_max_mem %lu \n", __func__, nfsd_drc_max_mem);
}
static int nfsd_get_default_max_blksize(void)
......
......@@ -40,7 +40,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
* sysctl_overcommit_ratio / 100) + total_swap_pages;
cached = global_page_state(NR_FILE_PAGES) -
total_swapcache_pages - i.bufferram;
total_swapcache_pages() - i.bufferram;
if (cached < 0)
cached = 0;
......@@ -109,7 +109,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
K(i.freeram),
K(i.bufferram),
K(cached),
K(total_swapcache_pages),
K(total_swapcache_pages()),
K(pages[LRU_ACTIVE_ANON] + pages[LRU_ACTIVE_FILE]),
K(pages[LRU_INACTIVE_ANON] + pages[LRU_INACTIVE_FILE]),
K(pages[LRU_ACTIVE_ANON]),
......@@ -158,7 +158,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
vmi.used >> 10,
vmi.largest_chunk >> 10
#ifdef CONFIG_MEMORY_FAILURE
,atomic_long_read(&mce_bad_pages) << (PAGE_SHIFT - 10)
,atomic_long_read(&num_poisoned_pages) << (PAGE_SHIFT - 10)
#endif
#ifdef CONFIG_TRANSPARENT_HUGEPAGE
,K(global_page_state(NR_ANON_TRANSPARENT_HUGEPAGES) *
......
......@@ -485,6 +485,14 @@ static inline bool acpi_driver_match_device(struct device *dev,
#endif /* !CONFIG_ACPI */
#ifdef CONFIG_ACPI_NUMA
void __init early_parse_srat(void);
#else
static inline void early_parse_srat(void)
{
}
#endif
#ifdef CONFIG_ACPI
void acpi_os_set_prepare_sleep(int (*func)(u8 sleep_state,
u32 pm1a_ctrl, u32 pm1b_ctrl));
......
......@@ -53,6 +53,7 @@ extern void free_bootmem_node(pg_data_t *pgdat,
unsigned long size);
extern void free_bootmem(unsigned long physaddr, unsigned long size);
extern void free_bootmem_late(unsigned long physaddr, unsigned long size);
extern void __free_pages_bootmem(struct page *page, unsigned int order);
/*
* Flags for reserve_bootmem (also if CONFIG_HAVE_ARCH_BOOTMEM_NODE,
......
......@@ -23,7 +23,7 @@ extern int fragmentation_index(struct zone *zone, unsigned int order);
extern unsigned long try_to_compact_pages(struct zonelist *zonelist,
int order, gfp_t gfp_mask, nodemask_t *mask,
bool sync, bool *contended);
extern int compact_pgdat(pg_data_t *pgdat, int order);
extern void compact_pgdat(pg_data_t *pgdat, int order);
extern void reset_isolation_suitable(pg_data_t *pgdat);
extern unsigned long compaction_suitable(struct zone *zone, int order);
......@@ -80,9 +80,8 @@ static inline unsigned long try_to_compact_pages(struct zonelist *zonelist,
return COMPACT_CONTINUE;
}
static inline int compact_pgdat(pg_data_t *pgdat, int order)
static inline void compact_pgdat(pg_data_t *pgdat, int order)
{
return COMPACT_CONTINUE;
}
static inline void reset_isolation_suitable(pg_data_t *pgdat)
......
......@@ -25,6 +25,7 @@
int firmware_map_add_early(u64 start, u64 end, const char *type);
int firmware_map_add_hotplug(u64 start, u64 end, const char *type);
int firmware_map_remove(u64 start, u64 end, const char *type);
#else /* CONFIG_FIRMWARE_MEMMAP */
......@@ -38,6 +39,11 @@ static inline int firmware_map_add_hotplug(u64 start, u64 end, const char *type)
return 0;
}
static inline int firmware_map_remove(u64 start, u64 end, const char *type)
{
return 0;
}
#endif /* CONFIG_FIRMWARE_MEMMAP */
#endif /* _LINUX_FIRMWARE_MAP_H */
......@@ -219,12 +219,6 @@ static inline void zero_user(struct page *page,
zero_user_segments(page, start, start + size, 0, 0);
}
static inline void __deprecated memclear_highpage_flush(struct page *page,
unsigned int offset, unsigned int size)
{
zero_user(page, offset, size);
}
#ifndef __HAVE_ARCH_COPY_USER_HIGHPAGE
static inline void copy_user_highpage(struct page *to, struct page *from,
......
......@@ -113,7 +113,7 @@ extern void __split_huge_page_pmd(struct vm_area_struct *vma,
do { \
pmd_t *____pmd = (__pmd); \
anon_vma_lock_write(__anon_vma); \
anon_vma_unlock(__anon_vma); \
anon_vma_unlock_write(__anon_vma); \
BUG_ON(pmd_trans_splitting(*____pmd) || \
pmd_trans_huge(*____pmd)); \
} while (0)
......
......@@ -43,9 +43,9 @@ int hugetlb_mempolicy_sysctl_handler(struct ctl_table *, int,
#endif
int copy_hugetlb_page_range(struct mm_struct *, struct mm_struct *, struct vm_area_struct *);
int follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, struct vm_area_struct **,
unsigned long *, int *, int, unsigned int flags);
long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
struct page **, struct vm_area_struct **,
unsigned long *, unsigned long *, long, unsigned int);
void unmap_hugepage_range(struct vm_area_struct *,
unsigned long, unsigned long, struct page *);
void __unmap_hugepage_range_final(struct mmu_gather *tlb,
......
......@@ -16,9 +16,6 @@
struct stable_node;
struct mem_cgroup;
struct page *ksm_does_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address);
#ifdef CONFIG_KSM
int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
unsigned long end, int advice, unsigned long *vm_flags);
......@@ -73,15 +70,8 @@ static inline void set_page_stable_node(struct page *page,
* We'd like to make this conditional on vma->vm_flags & VM_MERGEABLE,
* but what if the vma was unmerged while the page was swapped out?
*/
static inline int ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
struct anon_vma *anon_vma = page_anon_vma(page);
return anon_vma &&
(anon_vma->root != vma->anon_vma->root ||
page->index != linear_page_index(vma, address));
}
struct page *ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address);
int page_referenced_ksm(struct page *page,
struct mem_cgroup *memcg, unsigned long *vm_flags);
......@@ -113,10 +103,10 @@ static inline int ksm_madvise(struct vm_area_struct *vma, unsigned long start,
return 0;
}
static inline int ksm_might_need_to_copy(struct page *page,
static inline struct page *ksm_might_need_to_copy(struct page *page,
struct vm_area_struct *vma, unsigned long address)
{
return 0;
return page;
}
static inline int page_referenced_ksm(struct page *page,
......
......@@ -42,6 +42,7 @@ struct memblock {
extern struct memblock memblock;
extern int memblock_debug;
extern struct movablemem_map movablemem_map;
#define memblock_dbg(fmt, ...) \
if (memblock_debug) printk(KERN_INFO pr_fmt(fmt), ##__VA_ARGS__)
......@@ -60,6 +61,7 @@ int memblock_reserve(phys_addr_t base, phys_addr_t size);
void memblock_trim_memory(phys_addr_t align);
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
void __next_mem_pfn_range(int *idx, int nid, unsigned long *out_start_pfn,
unsigned long *out_end_pfn, int *out_nid);
......
......@@ -116,7 +116,6 @@ void mem_cgroup_iter_break(struct mem_cgroup *, struct mem_cgroup *);
* For memory reclaim.
*/
int mem_cgroup_inactive_anon_is_low(struct lruvec *lruvec);
int mem_cgroup_inactive_file_is_low(struct lruvec *lruvec);
int mem_cgroup_select_victim_node(struct mem_cgroup *memcg);
unsigned long mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list);
void mem_cgroup_update_lru_size(struct lruvec *, enum lru_list, int);
......@@ -321,12 +320,6 @@ mem_cgroup_inactive_anon_is_low(struct lruvec *lruvec)
return 1;
}
static inline int
mem_cgroup_inactive_file_is_low(struct lruvec *lruvec)
{
return 1;
}
static inline unsigned long
mem_cgroup_get_lru_size(struct lruvec *lruvec, enum lru_list lru)
{
......
......@@ -96,6 +96,7 @@ extern void __online_page_free(struct page *page);
#ifdef CONFIG_MEMORY_HOTREMOVE
extern bool is_pageblock_removable_nolock(struct page *page);
extern int arch_remove_memory(u64 start, u64 size);
#endif /* CONFIG_MEMORY_HOTREMOVE */
/* reasonably generic interface to expand the physical pages in a zone */
......@@ -173,17 +174,16 @@ static inline void arch_refresh_nodedata(int nid, pg_data_t *pgdat)
#endif /* CONFIG_NUMA */
#endif /* CONFIG_HAVE_ARCH_NODEDATA_EXTENSION */
#ifdef CONFIG_SPARSEMEM_VMEMMAP
#ifdef CONFIG_HAVE_BOOTMEM_INFO_NODE
extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
#else
static inline void register_page_bootmem_info_node(struct pglist_data *pgdat)
{
}
static inline void put_page_bootmem(struct page *page)
{
}
#else
extern void register_page_bootmem_info_node(struct pglist_data *pgdat);
extern void put_page_bootmem(struct page *page);
#endif
extern void put_page_bootmem(struct page *page);
extern void get_page_bootmem(unsigned long ingo, struct page *page,
unsigned long type);
/*
* Lock for memory hotplug guarantees 1) all callbacks for memory hotplug
......@@ -233,6 +233,7 @@ static inline void unlock_memory_hotplug(void) {}
#ifdef CONFIG_MEMORY_HOTREMOVE
extern int is_mem_section_removable(unsigned long pfn, unsigned long nr_pages);
extern void try_offline_node(int nid);
#else
static inline int is_mem_section_removable(unsigned long pfn,
......@@ -240,6 +241,8 @@ static inline int is_mem_section_removable(unsigned long pfn,
{
return 0;
}
static inline void try_offline_node(int nid) {}
#endif /* CONFIG_MEMORY_HOTREMOVE */
extern int mem_online_node(int nid);
......@@ -247,7 +250,8 @@ extern int add_memory(int nid, u64 start, u64 size);
extern int arch_add_memory(int nid, u64 start, u64 size);
extern int offline_pages(unsigned long start_pfn, unsigned long nr_pages);
extern int offline_memory_block(struct memory_block *mem);
extern int remove_memory(u64 start, u64 size);
extern bool is_memblock_offlined(struct memory_block *mem);
extern int remove_memory(int nid, u64 start, u64 size);
extern int sparse_add_one_section(struct zone *zone, unsigned long start_pfn,
int nr_pages);
extern void sparse_remove_one_section(struct zone *zone, struct mem_section *ms);
......
......@@ -40,11 +40,9 @@ extern void putback_movable_pages(struct list_head *l);
extern int migrate_page(struct address_space *,
struct page *, struct page *, enum migrate_mode);
extern int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode, int reason);
unsigned long private, enum migrate_mode mode, int reason);
extern int migrate_huge_page(struct page *, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode);
unsigned long private, enum migrate_mode mode);
extern int fail_migrate_page(struct address_space *,
struct page *, struct page *);
......@@ -62,11 +60,11 @@ extern int migrate_huge_page_move_mapping(struct address_space *mapping,
static inline void putback_lru_pages(struct list_head *l) {}
static inline void putback_movable_pages(struct list_head *l) {}
static inline int migrate_pages(struct list_head *l, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode, int reason) { return -ENOSYS; }
unsigned long private, enum migrate_mode mode, int reason)
{ return -ENOSYS; }
static inline int migrate_huge_page(struct page *page, new_page_t x,
unsigned long private, bool offlining,
enum migrate_mode mode) { return -ENOSYS; }
unsigned long private, enum migrate_mode mode)
{ return -ENOSYS; }
static inline int migrate_prep(void) { return -ENOSYS; }
static inline int migrate_prep_local(void) { return -ENOSYS; }
......
......@@ -87,6 +87,7 @@ extern unsigned int kobjsize(const void *objp);
#define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */
#define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. */
#define VM_POPULATE 0x00001000
#define VM_LOCKED 0x00002000
#define VM_IO 0x00004000 /* Memory mapped I/O or similar */
......@@ -366,7 +367,7 @@ static inline struct page *compound_head(struct page *page)
* both from it and to it can be tracked, using atomic_inc_and_test
* and atomic_add_negative(-1).
*/
static inline void reset_page_mapcount(struct page *page)
static inline void page_mapcount_reset(struct page *page)
{
atomic_set(&(page)->_mapcount, -1);
}
......@@ -580,50 +581,11 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
* sets it, so none of the operations on it need to be atomic.
*/
/*
* page->flags layout:
*
* There are three possibilities for how page->flags get
* laid out. The first is for the normal case, without
* sparsemem. The second is for sparsemem when there is
* plenty of space for node and section. The last is when
* we have run out of space and have to fall back to an
* alternate (slower) way of determining the node.
*
* No sparsemem or sparsemem vmemmap: | NODE | ZONE | ... | FLAGS |
* classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS |
* classic sparse no space for node: | SECTION | ZONE | ... | FLAGS |
*/
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTIONS_WIDTH SECTIONS_SHIFT
#else
#define SECTIONS_WIDTH 0
#endif
#define ZONES_WIDTH ZONES_SHIFT
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
#define NODES_WIDTH NODES_SHIFT
#else
#ifdef CONFIG_SPARSEMEM_VMEMMAP
#error "Vmemmap: No space for nodes field in page flags"
#endif
#define NODES_WIDTH 0
#endif
/* Page flags: | [SECTION] | [NODE] | ZONE | ... | FLAGS | */
/* Page flags: | [SECTION] | [NODE] | ZONE | [LAST_NID] | ... | FLAGS | */
#define SECTIONS_PGOFF ((sizeof(unsigned long)*8) - SECTIONS_WIDTH)
#define NODES_PGOFF (SECTIONS_PGOFF - NODES_WIDTH)
#define ZONES_PGOFF (NODES_PGOFF - ZONES_WIDTH)
/*
* We are going to use the flags for the page to node mapping if its in
* there. This includes the case where there is no node, so it is implicit.
*/
#if !(NODES_WIDTH > 0 || NODES_SHIFT == 0)
#define NODE_NOT_IN_PAGE_FLAGS
#endif
#define LAST_NID_PGOFF (ZONES_PGOFF - LAST_NID_WIDTH)
/*
* Define the bit shifts to access each section. For non-existent
......@@ -633,6 +595,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
#define SECTIONS_PGSHIFT (SECTIONS_PGOFF * (SECTIONS_WIDTH != 0))
#define NODES_PGSHIFT (NODES_PGOFF * (NODES_WIDTH != 0))
#define ZONES_PGSHIFT (ZONES_PGOFF * (ZONES_WIDTH != 0))
#define LAST_NID_PGSHIFT (LAST_NID_PGOFF * (LAST_NID_WIDTH != 0))
/* NODE:ZONE or SECTION:ZONE is used to ID a zone for the buddy allocator */
#ifdef NODE_NOT_IN_PAGE_FLAGS
......@@ -654,6 +617,7 @@ static inline pte_t maybe_mkwrite(pte_t pte, struct vm_area_struct *vma)
#define ZONES_MASK ((1UL << ZONES_WIDTH) - 1)
#define NODES_MASK ((1UL << NODES_WIDTH) - 1)
#define SECTIONS_MASK ((1UL << SECTIONS_WIDTH) - 1)
#define LAST_NID_MASK ((1UL << LAST_NID_WIDTH) - 1)
#define ZONEID_MASK ((1UL << ZONEID_SHIFT) - 1)
static inline enum zone_type page_zonenum(const struct page *page)
......@@ -661,6 +625,10 @@ static inline enum zone_type page_zonenum(const struct page *page)
return (page->flags >> ZONES_PGSHIFT) & ZONES_MASK;
}
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTION_IN_PAGE_FLAGS
#endif
/*
* The identification function is only used by the buddy allocator for
* determining if two pages could be buddies. We are not really
......@@ -693,31 +661,48 @@ static inline int page_to_nid(const struct page *page)
#endif
#ifdef CONFIG_NUMA_BALANCING
static inline int page_xchg_last_nid(struct page *page, int nid)
#ifdef LAST_NID_NOT_IN_PAGE_FLAGS
static inline int page_nid_xchg_last(struct page *page, int nid)
{
return xchg(&page->_last_nid, nid);
}
static inline int page_last_nid(struct page *page)
static inline int page_nid_last(struct page *page)
{
return page->_last_nid;
}
static inline void reset_page_last_nid(struct page *page)
static inline void page_nid_reset_last(struct page *page)
{
page->_last_nid = -1;
}
#else
static inline int page_xchg_last_nid(struct page *page, int nid)
static inline int page_nid_last(struct page *page)
{
return (page->flags >> LAST_NID_PGSHIFT) & LAST_NID_MASK;
}
extern int page_nid_xchg_last(struct page *page, int nid);
static inline void page_nid_reset_last(struct page *page)
{
int nid = (1 << LAST_NID_SHIFT) - 1;
page->flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT);
page->flags |= (nid & LAST_NID_MASK) << LAST_NID_PGSHIFT;
}
#endif /* LAST_NID_NOT_IN_PAGE_FLAGS */
#else
static inline int page_nid_xchg_last(struct page *page, int nid)
{
return page_to_nid(page);
}
static inline int page_last_nid(struct page *page)
static inline int page_nid_last(struct page *page)
{
return page_to_nid(page);
}
static inline void reset_page_last_nid(struct page *page)
static inline void page_nid_reset_last(struct page *page)
{
}
#endif
......@@ -727,7 +712,7 @@ static inline struct zone *page_zone(const struct page *page)
return &NODE_DATA(page_to_nid(page))->node_zones[page_zonenum(page)];
}
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#ifdef SECTION_IN_PAGE_FLAGS
static inline void set_page_section(struct page *page, unsigned long section)
{
page->flags &= ~(SECTIONS_MASK << SECTIONS_PGSHIFT);
......@@ -757,7 +742,7 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
{
set_page_zone(page, zone);
set_page_node(page, node);
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#ifdef SECTION_IN_PAGE_FLAGS
set_page_section(page, pfn_to_section_nr(pfn));
#endif
}
......@@ -817,18 +802,7 @@ void page_address_init(void);
#define PAGE_MAPPING_KSM 2
#define PAGE_MAPPING_FLAGS (PAGE_MAPPING_ANON | PAGE_MAPPING_KSM)
extern struct address_space swapper_space;
static inline struct address_space *page_mapping(struct page *page)
{
struct address_space *mapping = page->mapping;
VM_BUG_ON(PageSlab(page));
if (unlikely(PageSwapCache(page)))
mapping = &swapper_space;
else if ((unsigned long)mapping & PAGE_MAPPING_ANON)
mapping = NULL;
return mapping;
}
extern struct address_space *page_mapping(struct page *page);
/* Neutral page->mapping pointer to address_space or anon_vma or other */
static inline void *page_rmapping(struct page *page)
......@@ -1035,18 +1009,18 @@ static inline int fixup_user_fault(struct task_struct *tsk,
}
#endif
extern int make_pages_present(unsigned long addr, unsigned long end);
extern int access_process_vm(struct task_struct *tsk, unsigned long addr, void *buf, int len, int write);
extern int access_remote_vm(struct mm_struct *mm, unsigned long addr,
void *buf, int len, int write);
int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int len, unsigned int foll_flags,
struct page **pages, struct vm_area_struct **vmas,
int *nonblocking);
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int nr_pages, int write, int force,
struct page **pages, struct vm_area_struct **vmas);
long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int foll_flags, struct page **pages,
struct vm_area_struct **vmas, int *nonblocking);
long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
int write, int force, struct page **pages,
struct vm_area_struct **vmas);
int get_user_pages_fast(unsigned long start, int nr_pages, int write,
struct page **pages);
struct kvec;
......@@ -1359,6 +1333,24 @@ extern void free_bootmem_with_active_regions(int nid,
unsigned long max_low_pfn);
extern void sparse_memory_present_with_active_regions(int nid);
#define MOVABLEMEM_MAP_MAX MAX_NUMNODES
struct movablemem_entry {
unsigned long start_pfn; /* start pfn of memory segment */
unsigned long end_pfn; /* end pfn of memory segment (exclusive) */
};
struct movablemem_map {
bool acpi; /* true if using SRAT info */
int nr_map;
struct movablemem_entry map[MOVABLEMEM_MAP_MAX];
nodemask_t numa_nodes_hotplug; /* on which nodes we specify memory */
nodemask_t numa_nodes_kernel; /* on which nodes kernel resides in */
};
extern void __init insert_movablemem_map(unsigned long start_pfn,
unsigned long end_pfn);
extern int __init movablemem_map_overlap(unsigned long start_pfn,
unsigned long end_pfn);
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
#if !defined(CONFIG_HAVE_MEMBLOCK_NODE_MAP) && \
......@@ -1395,6 +1387,9 @@ extern void setup_per_cpu_pageset(void);
extern void zone_pcp_update(struct zone *zone);
extern void zone_pcp_reset(struct zone *zone);
/* page_alloc.c */
extern int min_free_kbytes;
/* nommu.c */
extern atomic_long_t mmap_pages_allocated;
extern int nommu_shrink_inode_mappings(struct inode *, size_t, size_t);
......@@ -1472,13 +1467,24 @@ extern int install_special_mapping(struct mm_struct *mm,
extern unsigned long get_unmapped_area(struct file *, unsigned long, unsigned long, unsigned long, unsigned long);
extern unsigned long mmap_region(struct file *file, unsigned long addr,
unsigned long len, unsigned long flags,
vm_flags_t vm_flags, unsigned long pgoff);
extern unsigned long do_mmap_pgoff(struct file *, unsigned long,
unsigned long, unsigned long,
unsigned long, unsigned long);
unsigned long len, vm_flags_t vm_flags, unsigned long pgoff);
extern unsigned long do_mmap_pgoff(struct file *file, unsigned long addr,
unsigned long len, unsigned long prot, unsigned long flags,
unsigned long pgoff, unsigned long *populate);
extern int do_munmap(struct mm_struct *, unsigned long, size_t);
#ifdef CONFIG_MMU
extern int __mm_populate(unsigned long addr, unsigned long len,
int ignore_errors);
static inline void mm_populate(unsigned long addr, unsigned long len)
{
/* Ignore errors */
(void) __mm_populate(addr, len, 1);
}
#else
static inline void mm_populate(unsigned long addr, unsigned long len) {}
#endif
/* These take the mm semaphore themselves */
extern unsigned long vm_brk(unsigned long, unsigned long);
extern int vm_munmap(unsigned long, size_t);
......@@ -1623,8 +1629,17 @@ int vm_insert_pfn(struct vm_area_struct *vma, unsigned long addr,
int vm_insert_mixed(struct vm_area_struct *vma, unsigned long addr,
unsigned long pfn);
struct page *follow_page(struct vm_area_struct *, unsigned long address,
unsigned int foll_flags);
struct page *follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int foll_flags,
unsigned int *page_mask);
static inline struct page *follow_page(struct vm_area_struct *vma,
unsigned long address, unsigned int foll_flags)
{
unsigned int unused_page_mask;
return follow_page_mask(vma, address, foll_flags, &unused_page_mask);
}
#define FOLL_WRITE 0x01 /* check pte is writable */
#define FOLL_TOUCH 0x02 /* mark page accessed */
#define FOLL_GET 0x04 /* do get_page on page */
......@@ -1636,6 +1651,7 @@ struct page *follow_page(struct vm_area_struct *, unsigned long address,
#define FOLL_SPLIT 0x80 /* don't return transhuge pages, split them */
#define FOLL_HWPOISON 0x100 /* check page is hwpoisoned */
#define FOLL_NUMA 0x200 /* force NUMA hinting page fault */
#define FOLL_MIGRATION 0x400 /* wait for page to replace migration entry */
typedef int (*pte_fn_t)(pte_t *pte, pgtable_t token, unsigned long addr,
void *data);
......@@ -1707,7 +1723,11 @@ int vmemmap_populate_basepages(struct page *start_page,
unsigned long pages, int node);
int vmemmap_populate(struct page *start_page, unsigned long pages, int node);
void vmemmap_populate_print_last(void);
#ifdef CONFIG_MEMORY_HOTPLUG
void vmemmap_free(struct page *memmap, unsigned long nr_pages);
#endif
void register_page_bootmem_memmap(unsigned long section_nr, struct page *map,
unsigned long size);
enum mf_flags {
MF_COUNT_INCREASED = 1 << 0,
......@@ -1720,7 +1740,7 @@ extern int unpoison_memory(unsigned long pfn);
extern int sysctl_memory_failure_early_kill;
extern int sysctl_memory_failure_recovery;
extern void shake_page(struct page *p, int access);
extern atomic_long_t mce_bad_pages;
extern atomic_long_t num_poisoned_pages;
extern int soft_offline_page(struct page *page, int flags);
extern void dump_page(struct page *page);
......
......@@ -12,6 +12,7 @@
#include <linux/cpumask.h>
#include <linux/page-debug-flags.h>
#include <linux/uprobes.h>
#include <linux/page-flags-layout.h>
#include <asm/page.h>
#include <asm/mmu.h>
......@@ -173,7 +174,7 @@ struct page {
void *shadow;
#endif
#ifdef CONFIG_NUMA_BALANCING
#ifdef LAST_NID_NOT_IN_PAGE_FLAGS
int _last_nid;
#endif
}
......@@ -414,9 +415,9 @@ struct mm_struct {
#endif
#ifdef CONFIG_NUMA_BALANCING
/*
* numa_next_scan is the next time when the PTEs will me marked
* pte_numa to gather statistics and migrate pages to new nodes
* if necessary
* numa_next_scan is the next time that the PTEs will be marked
* pte_numa. NUMA hinting faults will gather statistics and migrate
* pages to new nodes if necessary.
*/
unsigned long numa_next_scan;
......
......@@ -79,6 +79,8 @@ calc_vm_flag_bits(unsigned long flags)
{
return _calc_vm_trans(flags, MAP_GROWSDOWN, VM_GROWSDOWN ) |
_calc_vm_trans(flags, MAP_DENYWRITE, VM_DENYWRITE ) |
_calc_vm_trans(flags, MAP_LOCKED, VM_LOCKED );
((flags & MAP_LOCKED) ? (VM_LOCKED | VM_POPULATE) : 0) |
(((flags & (MAP_POPULATE | MAP_NONBLOCK)) == MAP_POPULATE) ?
VM_POPULATE : 0);
}
#endif /* _LINUX_MMAN_H */
......@@ -15,7 +15,7 @@
#include <linux/seqlock.h>
#include <linux/nodemask.h>
#include <linux/pageblock-flags.h>
#include <generated/bounds.h>
#include <linux/page-flags-layout.h>
#include <linux/atomic.h>
#include <asm/page.h>
......@@ -57,7 +57,9 @@ enum {
*/
MIGRATE_CMA,
#endif
#ifdef CONFIG_MEMORY_ISOLATION
MIGRATE_ISOLATE, /* can't allocate from here */
#endif
MIGRATE_TYPES
};
......@@ -308,24 +310,6 @@ enum zone_type {
#ifndef __GENERATING_BOUNDS_H
/*
* When a memory allocation must conform to specific limitations (such
* as being suitable for DMA) the caller will pass in hints to the
* allocator in the gfp_mask, in the zone modifier bits. These bits
* are used to select a priority ordered list of memory zones which
* match the requested limits. See gfp_zone() in include/linux/gfp.h
*/
#if MAX_NR_ZONES < 2
#define ZONES_SHIFT 0
#elif MAX_NR_ZONES <= 2
#define ZONES_SHIFT 1
#elif MAX_NR_ZONES <= 4
#define ZONES_SHIFT 2
#else
#error ZONES_SHIFT -- too many zones configured adjust calculation
#endif
struct zone {
/* Fields commonly accessed by the page allocator */
......@@ -543,6 +527,26 @@ static inline int zone_is_oom_locked(const struct zone *zone)
return test_bit(ZONE_OOM_LOCKED, &zone->flags);
}
static inline unsigned zone_end_pfn(const struct zone *zone)
{
return zone->zone_start_pfn + zone->spanned_pages;
}
static inline bool zone_spans_pfn(const struct zone *zone, unsigned long pfn)
{
return zone->zone_start_pfn <= pfn && pfn < zone_end_pfn(zone);
}
static inline bool zone_is_initialized(struct zone *zone)
{
return !!zone->wait_table;
}
static inline bool zone_is_empty(struct zone *zone)
{
return zone->spanned_pages == 0;
}
/*
* The "priority" of VM scanning is how much of the queues we will scan in one
* go. A value of 12 for DEF_PRIORITY implies that we will scan 1/4096th of the
......@@ -752,11 +756,17 @@ typedef struct pglist_data {
#define nid_page_nr(nid, pagenr) pgdat_page_nr(NODE_DATA(nid),(pagenr))
#define node_start_pfn(nid) (NODE_DATA(nid)->node_start_pfn)
#define node_end_pfn(nid) pgdat_end_pfn(NODE_DATA(nid))
#define node_end_pfn(nid) ({\
pg_data_t *__pgdat = NODE_DATA(nid);\
__pgdat->node_start_pfn + __pgdat->node_spanned_pages;\
})
static inline unsigned long pgdat_end_pfn(pg_data_t *pgdat)
{
return pgdat->node_start_pfn + pgdat->node_spanned_pages;
}
static inline bool pgdat_is_empty(pg_data_t *pgdat)
{
return !pgdat->node_start_pfn && !pgdat->node_spanned_pages;
}
#include <linux/memory_hotplug.h>
......@@ -1053,8 +1063,6 @@ static inline unsigned long early_pfn_to_nid(unsigned long pfn)
* PA_SECTION_SHIFT physical address to/from section number
* PFN_SECTION_SHIFT pfn to/from section number
*/
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#define PA_SECTION_SHIFT (SECTION_SIZE_BITS)
#define PFN_SECTION_SHIFT (SECTION_SIZE_BITS - PAGE_SHIFT)
......
#ifndef PAGE_FLAGS_LAYOUT_H
#define PAGE_FLAGS_LAYOUT_H
#include <linux/numa.h>
#include <generated/bounds.h>
/*
* When a memory allocation must conform to specific limitations (such
* as being suitable for DMA) the caller will pass in hints to the
* allocator in the gfp_mask, in the zone modifier bits. These bits
* are used to select a priority ordered list of memory zones which
* match the requested limits. See gfp_zone() in include/linux/gfp.h
*/
#if MAX_NR_ZONES < 2
#define ZONES_SHIFT 0
#elif MAX_NR_ZONES <= 2
#define ZONES_SHIFT 1
#elif MAX_NR_ZONES <= 4
#define ZONES_SHIFT 2
#else
#error ZONES_SHIFT -- too many zones configured adjust calculation
#endif
#ifdef CONFIG_SPARSEMEM
#include <asm/sparsemem.h>
/* SECTION_SHIFT #bits space required to store a section # */
#define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS)
#endif /* CONFIG_SPARSEMEM */
/*
* page->flags layout:
*
* There are five possibilities for how page->flags get laid out. The first
* pair is for the normal case without sparsemem. The second pair is for
* sparsemem when there is plenty of space for node and section information.
* The last is when there is insufficient space in page->flags and a separate
* lookup is necessary.
*
* No sparsemem or sparsemem vmemmap: | NODE | ZONE | ... | FLAGS |
* " plus space for last_nid: | NODE | ZONE | LAST_NID ... | FLAGS |
* classic sparse with space for node:| SECTION | NODE | ZONE | ... | FLAGS |
* " plus space for last_nid: | SECTION | NODE | ZONE | LAST_NID ... | FLAGS |
* classic sparse no space for node: | SECTION | ZONE | ... | FLAGS |
*/
#if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
#define SECTIONS_WIDTH SECTIONS_SHIFT
#else
#define SECTIONS_WIDTH 0
#endif
#define ZONES_WIDTH ZONES_SHIFT
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
#define NODES_WIDTH NODES_SHIFT
#else
#ifdef CONFIG_SPARSEMEM_VMEMMAP
#error "Vmemmap: No space for nodes field in page flags"
#endif
#define NODES_WIDTH 0
#endif
#ifdef CONFIG_NUMA_BALANCING
#define LAST_NID_SHIFT NODES_SHIFT
#else
#define LAST_NID_SHIFT 0
#endif
#if SECTIONS_WIDTH+ZONES_WIDTH+NODES_SHIFT+LAST_NID_SHIFT <= BITS_PER_LONG - NR_PAGEFLAGS
#define LAST_NID_WIDTH LAST_NID_SHIFT
#else
#define LAST_NID_WIDTH 0
#endif
/*
* We are going to use the flags for the page to node mapping if its in
* there. This includes the case where there is no node, so it is implicit.
*/
#if !(NODES_WIDTH > 0 || NODES_SHIFT == 0)
#define NODE_NOT_IN_PAGE_FLAGS
#endif
#if defined(CONFIG_NUMA_BALANCING) && LAST_NID_WIDTH == 0
#define LAST_NID_NOT_IN_PAGE_FLAGS
#endif
#endif /* _LINUX_PAGE_FLAGS_LAYOUT */
#ifndef __LINUX_PAGEISOLATION_H
#define __LINUX_PAGEISOLATION_H
#ifdef CONFIG_MEMORY_ISOLATION
static inline bool is_migrate_isolate_page(struct page *page)
{
return get_pageblock_migratetype(page) == MIGRATE_ISOLATE;
}
static inline bool is_migrate_isolate(int migratetype)
{
return migratetype == MIGRATE_ISOLATE;
}
#else
static inline bool is_migrate_isolate_page(struct page *page)
{
return false;
}
static inline bool is_migrate_isolate(int migratetype)
{
return false;
}
#endif
bool has_unmovable_pages(struct zone *zone, struct page *page, int count,
bool skip_hwpoisoned_pages);
......
......@@ -537,6 +537,7 @@ struct dev_pm_info {
unsigned int irq_safe:1;
unsigned int use_autosuspend:1;
unsigned int timer_autosuspends:1;
unsigned int memalloc_noio:1;
enum rpm_request request;
enum rpm_status runtime_status;
int runtime_error;
......
......@@ -47,6 +47,7 @@ extern void pm_runtime_set_autosuspend_delay(struct device *dev, int delay);
extern unsigned long pm_runtime_autosuspend_expiration(struct device *dev);
extern void pm_runtime_update_max_time_suspended(struct device *dev,
s64 delta_ns);
extern void pm_runtime_set_memalloc_noio(struct device *dev, bool enable);
static inline bool pm_children_suspended(struct device *dev)
{
......@@ -156,6 +157,8 @@ static inline void pm_runtime_set_autosuspend_delay(struct device *dev,
int delay) {}
static inline unsigned long pm_runtime_autosuspend_expiration(
struct device *dev) { return 0; }
static inline void pm_runtime_set_memalloc_noio(struct device *dev,
bool enable){}
#endif /* !CONFIG_PM_RUNTIME */
......
......@@ -123,7 +123,7 @@ static inline void anon_vma_lock_write(struct anon_vma *anon_vma)
down_write(&anon_vma->root->rwsem);
}
static inline void anon_vma_unlock(struct anon_vma *anon_vma)
static inline void anon_vma_unlock_write(struct anon_vma *anon_vma)
{
up_write(&anon_vma->root->rwsem);
}
......
......@@ -51,6 +51,7 @@ struct sched_param {
#include <linux/cred.h>
#include <linux/llist.h>
#include <linux/uidgid.h>
#include <linux/gfp.h>
#include <asm/processor.h>
......@@ -1791,6 +1792,7 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
#define PF_FROZEN 0x00010000 /* frozen for system suspend */
#define PF_FSTRANS 0x00020000 /* inside a filesystem transaction */
#define PF_KSWAPD 0x00040000 /* I am kswapd */
#define PF_MEMALLOC_NOIO 0x00080000 /* Allocating memory without IO involved */
#define PF_LESS_THROTTLE 0x00100000 /* Throttle me less: I clean memory */
#define PF_KTHREAD 0x00200000 /* I am a kernel thread */
#define PF_RANDOMIZE 0x00400000 /* randomize virtual address space */
......@@ -1828,6 +1830,26 @@ extern void thread_group_cputime_adjusted(struct task_struct *p, cputime_t *ut,
#define tsk_used_math(p) ((p)->flags & PF_USED_MATH)
#define used_math() tsk_used_math(current)
/* __GFP_IO isn't allowed if PF_MEMALLOC_NOIO is set in current->flags */
static inline gfp_t memalloc_noio_flags(gfp_t flags)
{
if (unlikely(current->flags & PF_MEMALLOC_NOIO))
flags &= ~__GFP_IO;
return flags;
}
static inline unsigned int memalloc_noio_save(void)
{
unsigned int flags = current->flags & PF_MEMALLOC_NOIO;
current->flags |= PF_MEMALLOC_NOIO;
return flags;
}
static inline void memalloc_noio_restore(unsigned int flags)
{
current->flags = (current->flags & ~PF_MEMALLOC_NOIO) | flags;
}
/*
* task->jobctl flags
*/
......
......@@ -8,7 +8,7 @@
#include <linux/memcontrol.h>
#include <linux/sched.h>
#include <linux/node.h>
#include <linux/fs.h>
#include <linux/atomic.h>
#include <asm/page.h>
......@@ -156,7 +156,7 @@ enum {
SWP_SCANNING = (1 << 8), /* refcount in scan_swap_map */
};
#define SWAP_CLUSTER_MAX 32
#define SWAP_CLUSTER_MAX 32UL
#define COMPACT_CLUSTER_MAX SWAP_CLUSTER_MAX
/*
......@@ -202,6 +202,18 @@ struct swap_info_struct {
unsigned long *frontswap_map; /* frontswap in-use, one bit per page */
atomic_t frontswap_pages; /* frontswap pages in-use counter */
#endif
spinlock_t lock; /*
* protect map scan related fields like
* swap_map, lowest_bit, highest_bit,
* inuse_pages, cluster_next,
* cluster_nr, lowest_alloc and
* highest_alloc. other fields are only
* changed at swapon/swapoff, so are
* protected by swap_lock. changing
* flags need hold this lock and
* swap_lock. If both locks need hold,
* hold swap_lock first.
*/
};
struct swap_list_t {
......@@ -209,15 +221,12 @@ struct swap_list_t {
int next; /* swapfile to be used next */
};
/* Swap 50% full? Release swapcache more aggressively.. */
#define vm_swap_full() (nr_swap_pages*2 < total_swap_pages)
/* linux/mm/page_alloc.c */
extern unsigned long totalram_pages;
extern unsigned long totalreserve_pages;
extern unsigned long dirty_balance_reserve;
extern unsigned int nr_free_buffer_pages(void);
extern unsigned int nr_free_pagecache_pages(void);
extern unsigned long nr_free_buffer_pages(void);
extern unsigned long nr_free_pagecache_pages(void);
/* Definition of global_page_state not available yet */
#define nr_free_pages() global_page_state(NR_FREE_PAGES)
......@@ -266,7 +275,7 @@ extern unsigned long mem_cgroup_shrink_node_zone(struct mem_cgroup *mem,
extern unsigned long shrink_all_memory(unsigned long nr_pages);
extern int vm_swappiness;
extern int remove_mapping(struct address_space *mapping, struct page *page);
extern long vm_total_pages;
extern unsigned long vm_total_pages;
#ifdef CONFIG_NUMA
extern int zone_reclaim_mode;
......@@ -330,8 +339,9 @@ int generic_swapfile_activate(struct swap_info_struct *, struct file *,
sector_t *);
/* linux/mm/swap_state.c */
extern struct address_space swapper_space;
#define total_swapcache_pages swapper_space.nrpages
extern struct address_space swapper_spaces[];
#define swap_address_space(entry) (&swapper_spaces[swp_type(entry)])
extern unsigned long total_swapcache_pages(void);
extern void show_swap_cache_info(void);
extern int add_to_swap(struct page *);
extern int add_to_swap_cache(struct page *, swp_entry_t, gfp_t);
......@@ -346,8 +356,20 @@ extern struct page *swapin_readahead(swp_entry_t, gfp_t,
struct vm_area_struct *vma, unsigned long addr);
/* linux/mm/swapfile.c */
extern long nr_swap_pages;
extern atomic_long_t nr_swap_pages;
extern long total_swap_pages;
/* Swap 50% full? Release swapcache more aggressively.. */
static inline bool vm_swap_full(void)
{
return atomic_long_read(&nr_swap_pages) * 2 < total_swap_pages;
}
static inline long get_nr_swap_pages(void)
{
return atomic_long_read(&nr_swap_pages);
}
extern void si_swapinfo(struct sysinfo *);
extern swp_entry_t get_swap_page(void);
extern swp_entry_t get_swap_page_of_type(int);
......@@ -380,9 +402,10 @@ mem_cgroup_uncharge_swapcache(struct page *page, swp_entry_t ent, bool swapout)
#else /* CONFIG_SWAP */
#define nr_swap_pages 0L
#define get_nr_swap_pages() 0L
#define total_swap_pages 0L
#define total_swapcache_pages 0UL
#define total_swapcache_pages() 0UL
#define vm_swap_full() 0
#define si_swapinfo(val) \
do { (val)->freeswap = (val)->totalswap = 0; } while (0)
......
......@@ -36,7 +36,6 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
#endif
PGINODESTEAL, SLABS_SCANNED, KSWAPD_INODESTEAL,
KSWAPD_LOW_WMARK_HIT_QUICKLY, KSWAPD_HIGH_WMARK_HIT_QUICKLY,
KSWAPD_SKIP_CONGESTION_WAIT,
PAGEOUTRUN, ALLOCSTALL, PGROTATED,
#ifdef CONFIG_NUMA_BALANCING
NUMA_PTE_UPDATES,
......
......@@ -85,7 +85,7 @@ static inline void vm_events_fold_cpu(int cpu)
#define count_vm_numa_events(x, y) count_vm_events(x, y)
#else
#define count_vm_numa_event(x) do {} while (0)
#define count_vm_numa_events(x, y) do {} while (0)
#define count_vm_numa_events(x, y) do { (void)(y); } while (0)
#endif /* CONFIG_NUMA_BALANCING */
#define __count_zone_vm_events(item, zone, delta) \
......
......@@ -967,11 +967,11 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
unsigned long flags;
unsigned long prot;
int acc_mode;
unsigned long user_addr;
struct ipc_namespace *ns;
struct shm_file_data *sfd;
struct path path;
fmode_t f_mode;
unsigned long populate = 0;
err = -EINVAL;
if (shmid < 0)
......@@ -1070,13 +1070,15 @@ long do_shmat(int shmid, char __user *shmaddr, int shmflg, ulong *raddr,
goto invalid;
}
user_addr = do_mmap_pgoff(file, addr, size, prot, flags, 0);
*raddr = user_addr;
addr = do_mmap_pgoff(file, addr, size, prot, flags, 0, &populate);
*raddr = addr;
err = 0;
if (IS_ERR_VALUE(user_addr))
err = (long)user_addr;
if (IS_ERR_VALUE(addr))
err = (long)addr;
invalid:
up_write(&current->mm->mmap_sem);
if (populate)
mm_populate(addr, populate);
out_fput:
fput(file);
......
......@@ -1132,18 +1132,28 @@ EXPORT_SYMBOL_GPL(kick_process);
*/
static int select_fallback_rq(int cpu, struct task_struct *p)
{
const struct cpumask *nodemask = cpumask_of_node(cpu_to_node(cpu));
int nid = cpu_to_node(cpu);
const struct cpumask *nodemask = NULL;
enum { cpuset, possible, fail } state = cpuset;
int dest_cpu;
/* Look for allowed, online CPU in same node. */
for_each_cpu(dest_cpu, nodemask) {
if (!cpu_online(dest_cpu))
continue;
if (!cpu_active(dest_cpu))
continue;
if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
return dest_cpu;
/*
* If the node that the cpu is on has been offlined, cpu_to_node()
* will return -1. There is no cpu on the node, and we should
* select the cpu on the other node.
*/
if (nid != -1) {
nodemask = cpumask_of_node(nid);
/* Look for allowed, online CPU in same node. */
for_each_cpu(dest_cpu, nodemask) {
if (!cpu_online(dest_cpu))
continue;
if (!cpu_active(dest_cpu))
continue;
if (cpumask_test_cpu(dest_cpu, tsk_cpus_allowed(p)))
return dest_cpu;
}
}
for (;;) {
......
......@@ -105,7 +105,6 @@ extern char core_pattern[];
extern unsigned int core_pipe_limit;
#endif
extern int pid_max;
extern int min_free_kbytes;
extern int pid_max_min, pid_max_max;
extern int sysctl_drop_caches;
extern int percpu_pagelist_fraction;
......
......@@ -162,10 +162,16 @@ config MOVABLE_NODE
Say Y here if you want to hotplug a whole node.
Say N here if you want kernel to use memory on all nodes evenly.
#
# Only be set on architectures that have completely implemented memory hotplug
# feature. If you are not sure, don't touch it.
#
config HAVE_BOOTMEM_INFO_NODE
def_bool n
# eventually, we can have this option just 'select SPARSEMEM'
config MEMORY_HOTPLUG
bool "Allow for memory hot-add"
select MEMORY_ISOLATION
depends on SPARSEMEM || X86_64_ACPI_NUMA
depends on HOTPLUG && ARCH_ENABLE_MEMORY_HOTPLUG
depends on (IA64 || X86 || PPC_BOOK3S_64 || SUPERH || S390)
......@@ -176,6 +182,8 @@ config MEMORY_HOTPLUG_SPARSE
config MEMORY_HOTREMOVE
bool "Allow for memory hot remove"
select MEMORY_ISOLATION
select HAVE_BOOTMEM_INFO_NODE if X86_64
depends on MEMORY_HOTPLUG && ARCH_ENABLE_MEMORY_HOTREMOVE
depends on MIGRATION
......
......@@ -15,6 +15,7 @@
#include <linux/sysctl.h>
#include <linux/sysfs.h>
#include <linux/balloon_compaction.h>
#include <linux/page-isolation.h>
#include "internal.h"
#ifdef CONFIG_COMPACTION
......@@ -85,7 +86,7 @@ static inline bool isolation_suitable(struct compact_control *cc,
static void __reset_isolation_suitable(struct zone *zone)
{
unsigned long start_pfn = zone->zone_start_pfn;
unsigned long end_pfn = zone->zone_start_pfn + zone->spanned_pages;
unsigned long end_pfn = zone_end_pfn(zone);
unsigned long pfn;
zone->compact_cached_migrate_pfn = start_pfn;
......@@ -215,7 +216,10 @@ static bool suitable_migration_target(struct page *page)
int migratetype = get_pageblock_migratetype(page);
/* Don't interfere with memory hot-remove or the min_free_kbytes blocks */
if (migratetype == MIGRATE_ISOLATE || migratetype == MIGRATE_RESERVE)
if (migratetype == MIGRATE_RESERVE)
return false;
if (is_migrate_isolate(migratetype))
return false;
/* If the page is a large free page, then allow migration */
......@@ -611,8 +615,7 @@ isolate_migratepages_range(struct zone *zone, struct compact_control *cc,
continue;
next_pageblock:
low_pfn += pageblock_nr_pages;
low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
last_pageblock_nr = pageblock_nr;
}
......@@ -644,7 +647,7 @@ static void isolate_freepages(struct zone *zone,
struct compact_control *cc)
{
struct page *page;
unsigned long high_pfn, low_pfn, pfn, zone_end_pfn, end_pfn;
unsigned long high_pfn, low_pfn, pfn, z_end_pfn, end_pfn;
int nr_freepages = cc->nr_freepages;
struct list_head *freelist = &cc->freepages;
......@@ -663,7 +666,7 @@ static void isolate_freepages(struct zone *zone,
*/
high_pfn = min(low_pfn, pfn);
zone_end_pfn = zone->zone_start_pfn + zone->spanned_pages;
z_end_pfn = zone_end_pfn(zone);
/*
* Isolate free pages until enough are available to migrate the
......@@ -706,7 +709,7 @@ static void isolate_freepages(struct zone *zone,
* only scans within a pageblock
*/
end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
end_pfn = min(end_pfn, zone_end_pfn);
end_pfn = min(end_pfn, z_end_pfn);
isolated = isolate_freepages_block(cc, pfn, end_pfn,
freelist, false);
nr_freepages += isolated;
......@@ -795,7 +798,7 @@ static isolate_migrate_t isolate_migratepages(struct zone *zone,
low_pfn = max(cc->migrate_pfn, zone->zone_start_pfn);
/* Only scan within a pageblock boundary */
end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
end_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages);
/* Do not cross the free scanner or scan within a memory hole */
if (end_pfn > cc->free_pfn || !pfn_valid(low_pfn)) {
......@@ -920,7 +923,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
{
int ret;
unsigned long start_pfn = zone->zone_start_pfn;
unsigned long end_pfn = zone->zone_start_pfn + zone->spanned_pages;
unsigned long end_pfn = zone_end_pfn(zone);
ret = compaction_suitable(zone, cc->order);
switch (ret) {
......@@ -977,7 +980,7 @@ static int compact_zone(struct zone *zone, struct compact_control *cc)
nr_migrate = cc->nr_migratepages;
err = migrate_pages(&cc->migratepages, compaction_alloc,
(unsigned long)cc, false,
(unsigned long)cc,
cc->sync ? MIGRATE_SYNC_LIGHT : MIGRATE_ASYNC,
MR_COMPACTION);
update_nr_listpages(cc);
......@@ -1086,7 +1089,7 @@ unsigned long try_to_compact_pages(struct zonelist *zonelist,
/* Compact all zones within a node */
static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
static void __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
{
int zoneid;
struct zone *zone;
......@@ -1119,28 +1122,26 @@ static int __compact_pgdat(pg_data_t *pgdat, struct compact_control *cc)
VM_BUG_ON(!list_empty(&cc->freepages));
VM_BUG_ON(!list_empty(&cc->migratepages));
}
return 0;
}
int compact_pgdat(pg_data_t *pgdat, int order)
void compact_pgdat(pg_data_t *pgdat, int order)
{
struct compact_control cc = {
.order = order,
.sync = false,
};
return __compact_pgdat(pgdat, &cc);
__compact_pgdat(pgdat, &cc);
}
static int compact_node(int nid)
static void compact_node(int nid)
{
struct compact_control cc = {
.order = -1,
.sync = true,
};
return __compact_pgdat(NODE_DATA(nid), &cc);
__compact_pgdat(NODE_DATA(nid), &cc);
}
/* Compact all nodes in the system */
......
......@@ -17,6 +17,7 @@
#include <linux/fadvise.h>
#include <linux/writeback.h>
#include <linux/syscalls.h>
#include <linux/swap.h>
#include <asm/unistd.h>
......@@ -120,9 +121,22 @@ SYSCALL_DEFINE(fadvise64_64)(int fd, loff_t offset, loff_t len, int advice)
start_index = (offset+(PAGE_CACHE_SIZE-1)) >> PAGE_CACHE_SHIFT;
end_index = (endbyte >> PAGE_CACHE_SHIFT);
if (end_index >= start_index)
invalidate_mapping_pages(mapping, start_index,
if (end_index >= start_index) {
unsigned long count = invalidate_mapping_pages(mapping,
start_index, end_index);
/*
* If fewer pages were invalidated than expected then
* it is possible that some of the pages were on
* a per-cpu pagevec for a remote CPU. Drain all
* pagevecs and try again.
*/
if (count < (end_index - start_index + 1)) {
lru_add_drain_all();
invalidate_mapping_pages(mapping, start_index,
end_index);
}
}
break;
default:
ret = -EINVAL;
......
......@@ -129,6 +129,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
struct vm_area_struct *vma;
int err = -EINVAL;
int has_write_lock = 0;
vm_flags_t vm_flags;
if (prot)
return err;
......@@ -160,15 +161,11 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
/*
* Make sure the vma is shared, that it supports prefaulting,
* and that the remapped range is valid and fully within
* the single existing vma. vm_private_data is used as a
* swapout cursor in a VM_NONLINEAR vma.
* the single existing vma.
*/
if (!vma || !(vma->vm_flags & VM_SHARED))
goto out;
if (vma->vm_private_data && !(vma->vm_flags & VM_NONLINEAR))
goto out;
if (!vma->vm_ops || !vma->vm_ops->remap_pages)
goto out;
......@@ -177,6 +174,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
/* Must set VM_NONLINEAR before any pages are populated. */
if (!(vma->vm_flags & VM_NONLINEAR)) {
/*
* vm_private_data is used as a swapout cursor
* in a VM_NONLINEAR vma.
*/
if (vma->vm_private_data)
goto out;
/* Don't need a nonlinear mapping, exit success */
if (pgoff == linear_page_index(vma, start)) {
err = 0;
......@@ -184,6 +188,7 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
}
if (!has_write_lock) {
get_write_lock:
up_read(&mm->mmap_sem);
down_write(&mm->mmap_sem);
has_write_lock = 1;
......@@ -199,9 +204,10 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
unsigned long addr;
struct file *file = get_file(vma->vm_file);
flags &= MAP_NONBLOCK;
addr = mmap_region(file, start, size,
flags, vma->vm_flags, pgoff);
vm_flags = vma->vm_flags;
if (!(flags & MAP_NONBLOCK))
vm_flags |= VM_POPULATE;
addr = mmap_region(file, start, size, vm_flags, pgoff);
fput(file);
if (IS_ERR_VALUE(addr)) {
err = addr;
......@@ -220,32 +226,26 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
mutex_unlock(&mapping->i_mmap_mutex);
}
if (!(flags & MAP_NONBLOCK) && !(vma->vm_flags & VM_POPULATE)) {
if (!has_write_lock)
goto get_write_lock;
vma->vm_flags |= VM_POPULATE;
}
if (vma->vm_flags & VM_LOCKED) {
/*
* drop PG_Mlocked flag for over-mapped range
*/
vm_flags_t saved_flags = vma->vm_flags;
if (!has_write_lock)
goto get_write_lock;
vm_flags = vma->vm_flags;
munlock_vma_pages_range(vma, start, start + size);
vma->vm_flags = saved_flags;
vma->vm_flags = vm_flags;
}
mmu_notifier_invalidate_range_start(mm, start, start + size);
err = vma->vm_ops->remap_pages(vma, start, size, pgoff);
mmu_notifier_invalidate_range_end(mm, start, start + size);
if (!err && !(flags & MAP_NONBLOCK)) {
if (vma->vm_flags & VM_LOCKED) {
/*
* might be mapping previously unmapped range of file
*/
mlock_vma_pages_range(vma, start, start + size);
} else {
if (unlikely(has_write_lock)) {
downgrade_write(&mm->mmap_sem);
has_write_lock = 0;
}
make_pages_present(start, start+size);
}
}
/*
* We can't clear VM_NONLINEAR because we'd have to do
......@@ -254,10 +254,13 @@ SYSCALL_DEFINE5(remap_file_pages, unsigned long, start, unsigned long, size,
*/
out:
vm_flags = vma->vm_flags;
if (likely(!has_write_lock))
up_read(&mm->mmap_sem);
else
up_write(&mm->mmap_sem);
if (!err && ((vm_flags & VM_LOCKED) || !(flags & MAP_NONBLOCK)))
mm_populate(start, size);
return err;
}
......@@ -20,6 +20,7 @@
#include <linux/mman.h>
#include <linux/pagemap.h>
#include <linux/migrate.h>
#include <linux/hashtable.h>
#include <asm/tlb.h>
#include <asm/pgalloc.h>
......@@ -62,12 +63,11 @@ static DECLARE_WAIT_QUEUE_HEAD(khugepaged_wait);
static unsigned int khugepaged_max_ptes_none __read_mostly = HPAGE_PMD_NR-1;
static int khugepaged(void *none);
static int mm_slots_hash_init(void);
static int khugepaged_slab_init(void);
static void khugepaged_slab_free(void);
#define MM_SLOTS_HASH_HEADS 1024
static struct hlist_head *mm_slots_hash __read_mostly;
#define MM_SLOTS_HASH_BITS 10
static __read_mostly DEFINE_HASHTABLE(mm_slots_hash, MM_SLOTS_HASH_BITS);
static struct kmem_cache *mm_slot_cache __read_mostly;
/**
......@@ -105,7 +105,6 @@ static int set_recommended_min_free_kbytes(void)
struct zone *zone;
int nr_zones = 0;
unsigned long recommended_min;
extern int min_free_kbytes;
if (!khugepaged_enabled())
return 0;
......@@ -634,12 +633,6 @@ static int __init hugepage_init(void)
if (err)
goto out;
err = mm_slots_hash_init();
if (err) {
khugepaged_slab_free();
goto out;
}
register_shrinker(&huge_zero_page_shrinker);
/*
......@@ -1302,7 +1295,6 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
int target_nid;
int current_nid = -1;
bool migrated;
bool page_locked = false;
spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp)))
......@@ -1324,7 +1316,6 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
/* Acquire the page lock to serialise THP migrations */
spin_unlock(&mm->page_table_lock);
lock_page(page);
page_locked = true;
/* Confirm the PTE did not while locked */
spin_lock(&mm->page_table_lock);
......@@ -1337,34 +1328,26 @@ int do_huge_pmd_numa_page(struct mm_struct *mm, struct vm_area_struct *vma,
/* Migrate the THP to the requested node */
migrated = migrate_misplaced_transhuge_page(mm, vma,
pmdp, pmd, addr,
page, target_nid);
if (migrated)
current_nid = target_nid;
else {
spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp))) {
unlock_page(page);
goto out_unlock;
}
goto clear_pmdnuma;
}
pmdp, pmd, addr, page, target_nid);
if (!migrated)
goto check_same;
task_numa_fault(current_nid, HPAGE_PMD_NR, migrated);
task_numa_fault(target_nid, HPAGE_PMD_NR, true);
return 0;
check_same:
spin_lock(&mm->page_table_lock);
if (unlikely(!pmd_same(pmd, *pmdp)))
goto out_unlock;
clear_pmdnuma:
pmd = pmd_mknonnuma(pmd);
set_pmd_at(mm, haddr, pmdp, pmd);
VM_BUG_ON(pmd_numa(*pmdp));
update_mmu_cache_pmd(vma, addr, pmdp);
if (page_locked)
unlock_page(page);
out_unlock:
spin_unlock(&mm->page_table_lock);
if (current_nid != -1)
task_numa_fault(current_nid, HPAGE_PMD_NR, migrated);
task_numa_fault(current_nid, HPAGE_PMD_NR, false);
return 0;
}
......@@ -1656,7 +1639,7 @@ static void __split_huge_page_refcount(struct page *page)
page_tail->mapping = page->mapping;
page_tail->index = page->index + i;
page_xchg_last_nid(page_tail, page_last_nid(page));
page_nid_xchg_last(page_tail, page_nid_last(page));
BUG_ON(!PageAnon(page_tail));
BUG_ON(!PageUptodate(page_tail));
......@@ -1846,7 +1829,7 @@ int split_huge_page(struct page *page)
BUG_ON(PageCompound(page));
out_unlock:
anon_vma_unlock(anon_vma);
anon_vma_unlock_write(anon_vma);
put_anon_vma(anon_vma);
out:
return ret;
......@@ -1908,12 +1891,6 @@ static int __init khugepaged_slab_init(void)
return 0;
}
static void __init khugepaged_slab_free(void)
{
kmem_cache_destroy(mm_slot_cache);
mm_slot_cache = NULL;
}
static inline struct mm_slot *alloc_mm_slot(void)
{
if (!mm_slot_cache) /* initialization failed */
......@@ -1926,47 +1903,23 @@ static inline void free_mm_slot(struct mm_slot *mm_slot)
kmem_cache_free(mm_slot_cache, mm_slot);
}
static int __init mm_slots_hash_init(void)
{
mm_slots_hash = kzalloc(MM_SLOTS_HASH_HEADS * sizeof(struct hlist_head),
GFP_KERNEL);
if (!mm_slots_hash)
return -ENOMEM;
return 0;
}
#if 0
static void __init mm_slots_hash_free(void)
{
kfree(mm_slots_hash);
mm_slots_hash = NULL;
}
#endif
static struct mm_slot *get_mm_slot(struct mm_struct *mm)
{
struct mm_slot *mm_slot;
struct hlist_head *bucket;
struct hlist_node *node;
bucket = &mm_slots_hash[((unsigned long)mm / sizeof(struct mm_struct))
% MM_SLOTS_HASH_HEADS];
hlist_for_each_entry(mm_slot, node, bucket, hash) {
hash_for_each_possible(mm_slots_hash, mm_slot, node, hash, (unsigned long)mm)
if (mm == mm_slot->mm)
return mm_slot;
}
return NULL;
}
static void insert_to_mm_slots_hash(struct mm_struct *mm,
struct mm_slot *mm_slot)
{
struct hlist_head *bucket;
bucket = &mm_slots_hash[((unsigned long)mm / sizeof(struct mm_struct))
% MM_SLOTS_HASH_HEADS];
mm_slot->mm = mm;
hlist_add_head(&mm_slot->hash, bucket);
hash_add(mm_slots_hash, &mm_slot->hash, (long)mm);
}
static inline int khugepaged_test_exit(struct mm_struct *mm)
......@@ -2035,7 +1988,7 @@ void __khugepaged_exit(struct mm_struct *mm)
spin_lock(&khugepaged_mm_lock);
mm_slot = get_mm_slot(mm);
if (mm_slot && khugepaged_scan.mm_slot != mm_slot) {
hlist_del(&mm_slot->hash);
hash_del(&mm_slot->hash);
list_del(&mm_slot->mm_node);
free = 1;
}
......@@ -2368,7 +2321,7 @@ static void collapse_huge_page(struct mm_struct *mm,
BUG_ON(!pmd_none(*pmd));
set_pmd_at(mm, address, pmd, _pmd);
spin_unlock(&mm->page_table_lock);
anon_vma_unlock(vma->anon_vma);
anon_vma_unlock_write(vma->anon_vma);
goto out;
}
......@@ -2376,7 +2329,7 @@ static void collapse_huge_page(struct mm_struct *mm,
* All pages are isolated and locked so anon_vma rmap
* can't run anymore.
*/
anon_vma_unlock(vma->anon_vma);
anon_vma_unlock_write(vma->anon_vma);
__collapse_huge_page_copy(pte, new_page, vma, address, ptl);
pte_unmap(pte);
......@@ -2423,7 +2376,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
struct page *page;
unsigned long _address;
spinlock_t *ptl;
int node = -1;
int node = NUMA_NO_NODE;
VM_BUG_ON(address & ~HPAGE_PMD_MASK);
......@@ -2453,7 +2406,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
* be more sophisticated and look at more pages,
* but isn't for now.
*/
if (node == -1)
if (node == NUMA_NO_NODE)
node = page_to_nid(page);
VM_BUG_ON(PageCompound(page));
if (!PageLRU(page) || PageLocked(page) || !PageAnon(page))
......@@ -2484,7 +2437,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
if (khugepaged_test_exit(mm)) {
/* free mm_slot */
hlist_del(&mm_slot->hash);
hash_del(&mm_slot->hash);
list_del(&mm_slot->mm_node);
/*
......
......@@ -1293,8 +1293,7 @@ static void __init report_hugepages(void)
for_each_hstate(h) {
char buf[32];
printk(KERN_INFO "HugeTLB registered %s page size, "
"pre-allocated %ld pages\n",
pr_info("HugeTLB registered %s page size, pre-allocated %ld pages\n",
memfmt(buf, huge_page_size(h)),
h->free_huge_pages);
}
......@@ -1702,8 +1701,7 @@ static void __init hugetlb_sysfs_init(void)
err = hugetlb_sysfs_add_hstate(h, hugepages_kobj,
hstate_kobjs, &hstate_attr_group);
if (err)
printk(KERN_ERR "Hugetlb: Unable to add hstate %s",
h->name);
pr_err("Hugetlb: Unable to add hstate %s", h->name);
}
}
......@@ -1826,9 +1824,8 @@ void hugetlb_register_node(struct node *node)
nhs->hstate_kobjs,
&per_node_hstate_attr_group);
if (err) {
printk(KERN_ERR "Hugetlb: Unable to add hstate %s"
" for node %d\n",
h->name, node->dev.id);
pr_err("Hugetlb: Unable to add hstate %s for node %d\n",
h->name, node->dev.id);
hugetlb_unregister_node(node);
break;
}
......@@ -1924,7 +1921,7 @@ void __init hugetlb_add_hstate(unsigned order)
unsigned long i;
if (size_to_hstate(PAGE_SIZE << order)) {
printk(KERN_WARNING "hugepagesz= specified twice, ignoring\n");
pr_warning("hugepagesz= specified twice, ignoring\n");
return;
}
BUG_ON(hugetlb_max_hstate >= HUGE_MAX_HSTATE);
......@@ -1960,8 +1957,8 @@ static int __init hugetlb_nrpages_setup(char *s)
mhp = &parsed_hstate->max_huge_pages;
if (mhp == last_mhp) {
printk(KERN_WARNING "hugepages= specified twice without "
"interleaving hugepagesz=, ignoring\n");
pr_warning("hugepages= specified twice without "
"interleaving hugepagesz=, ignoring\n");
return 1;
}
......@@ -2692,9 +2689,8 @@ static int hugetlb_no_page(struct mm_struct *mm, struct vm_area_struct *vma,
* COW. Warn that such a situation has occurred as it may not be obvious
*/
if (is_vma_resv_set(vma, HPAGE_RESV_UNMAPPED)) {
printk(KERN_WARNING
"PID %d killed due to inadequate hugepage pool\n",
current->pid);
pr_warning("PID %d killed due to inadequate hugepage pool\n",
current->pid);
return ret;
}
......@@ -2924,14 +2920,14 @@ follow_huge_pud(struct mm_struct *mm, unsigned long address,
return NULL;
}
int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page **pages, struct vm_area_struct **vmas,
unsigned long *position, int *length, int i,
unsigned int flags)
long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
struct page **pages, struct vm_area_struct **vmas,
unsigned long *position, unsigned long *nr_pages,
long i, unsigned int flags)
{
unsigned long pfn_offset;
unsigned long vaddr = *position;
int remainder = *length;
unsigned long remainder = *nr_pages;
struct hstate *h = hstate_vma(vma);
spin_lock(&mm->page_table_lock);
......@@ -3001,7 +2997,7 @@ int follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
}
}
spin_unlock(&mm->page_table_lock);
*length = remainder;
*nr_pages = remainder;
*position = vaddr;
return i ? i : -EFAULT;
......
......@@ -162,8 +162,8 @@ void __vma_link_list(struct mm_struct *mm, struct vm_area_struct *vma,
struct vm_area_struct *prev, struct rb_node *rb_parent);
#ifdef CONFIG_MMU
extern long mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
extern long __mlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end, int *nonblocking);
extern void munlock_vma_pages_range(struct vm_area_struct *vma,
unsigned long start, unsigned long end);
static inline void munlock_vma_pages_all(struct vm_area_struct *vma)
......
......@@ -1300,9 +1300,8 @@ static void kmemleak_scan(void)
*/
lock_memory_hotplug();
for_each_online_node(i) {
pg_data_t *pgdat = NODE_DATA(i);
unsigned long start_pfn = pgdat->node_start_pfn;
unsigned long end_pfn = start_pfn + pgdat->node_spanned_pages;
unsigned long start_pfn = node_start_pfn(i);
unsigned long end_pfn = node_end_pfn(i);
unsigned long pfn;
for (pfn = start_pfn; pfn < end_pfn; pfn++) {
......
此差异已折叠。
......@@ -16,6 +16,9 @@
#include <linux/ksm.h>
#include <linux/fs.h>
#include <linux/file.h>
#include <linux/blkdev.h>
#include <linux/swap.h>
#include <linux/swapops.h>
/*
* Any behaviour which results in changes to the vma->vm_flags needs to
......@@ -131,6 +134,84 @@ static long madvise_behavior(struct vm_area_struct * vma,
return error;
}
#ifdef CONFIG_SWAP
static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start,
unsigned long end, struct mm_walk *walk)
{
pte_t *orig_pte;
struct vm_area_struct *vma = walk->private;
unsigned long index;
if (pmd_none_or_trans_huge_or_clear_bad(pmd))
return 0;
for (index = start; index != end; index += PAGE_SIZE) {
pte_t pte;
swp_entry_t entry;
struct page *page;
spinlock_t *ptl;
orig_pte = pte_offset_map_lock(vma->vm_mm, pmd, start, &ptl);
pte = *(orig_pte + ((index - start) / PAGE_SIZE));
pte_unmap_unlock(orig_pte, ptl);
if (pte_present(pte) || pte_none(pte) || pte_file(pte))
continue;
entry = pte_to_swp_entry(pte);
if (unlikely(non_swap_entry(entry)))
continue;
page = read_swap_cache_async(entry, GFP_HIGHUSER_MOVABLE,
vma, index);
if (page)
page_cache_release(page);
}
return 0;
}
static void force_swapin_readahead(struct vm_area_struct *vma,
unsigned long start, unsigned long end)
{
struct mm_walk walk = {
.mm = vma->vm_mm,
.pmd_entry = swapin_walk_pmd_entry,
.private = vma,
};
walk_page_range(start, end, &walk);
lru_add_drain(); /* Push any new pages onto the LRU now */
}
static void force_shm_swapin_readahead(struct vm_area_struct *vma,
unsigned long start, unsigned long end,
struct address_space *mapping)
{
pgoff_t index;
struct page *page;
swp_entry_t swap;
for (; start < end; start += PAGE_SIZE) {
index = ((start - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff;
page = find_get_page(mapping, index);
if (!radix_tree_exceptional_entry(page)) {
if (page)
page_cache_release(page);
continue;
}
swap = radix_to_swp_entry(page);
page = read_swap_cache_async(swap, GFP_HIGHUSER_MOVABLE,
NULL, 0);
if (page)
page_cache_release(page);
}
lru_add_drain(); /* Push any new pages onto the LRU now */
}
#endif /* CONFIG_SWAP */
/*
* Schedule all required I/O operations. Do not wait for completion.
*/
......@@ -140,6 +221,18 @@ static long madvise_willneed(struct vm_area_struct * vma,
{
struct file *file = vma->vm_file;
#ifdef CONFIG_SWAP
if (!file || mapping_cap_swap_backed(file->f_mapping)) {
*prev = vma;
if (!file)
force_swapin_readahead(vma, start, end);
else
force_shm_swapin_readahead(vma, start, end,
file->f_mapping);
return 0;
}
#endif
if (!file)
return -EBADF;
......@@ -371,6 +464,7 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
int error = -EINVAL;
int write;
size_t len;
struct blk_plug plug;
#ifdef CONFIG_MEMORY_FAILURE
if (behavior == MADV_HWPOISON || behavior == MADV_SOFT_OFFLINE)
......@@ -410,18 +504,19 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
if (vma && start > vma->vm_start)
prev = vma;
blk_start_plug(&plug);
for (;;) {
/* Still start < end. */
error = -ENOMEM;
if (!vma)
goto out;
goto out_plug;
/* Here start < (end|vma->vm_end). */
if (start < vma->vm_start) {
unmapped_error = -ENOMEM;
start = vma->vm_start;
if (start >= end)
goto out;
goto out_plug;
}
/* Here vma->vm_start <= start < (end|vma->vm_end) */
......@@ -432,18 +527,20 @@ SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
error = madvise_vma(vma, &prev, start, tmp, behavior);
if (error)
goto out;
goto out_plug;
start = tmp;
if (prev && start < prev->vm_end)
start = prev->vm_end;
error = unmapped_error;
if (start >= end)
goto out;
goto out_plug;
if (prev)
vma = prev->vm_next;
else /* madvise_remove dropped mmap_sem */
vma = find_vma(current->mm, start);
}
out_plug:
blk_finish_plug(&plug);
out:
if (write)
up_write(&current->mm->mmap_sem);
......
......@@ -92,9 +92,58 @@ static long __init_memblock memblock_overlaps_region(struct memblock_type *type,
*
* Find @size free area aligned to @align in the specified range and node.
*
* If we have CONFIG_HAVE_MEMBLOCK_NODE_MAP defined, we need to check if the
* memory we found if not in hotpluggable ranges.
*
* RETURNS:
* Found address on success, %0 on failure.
*/
#ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t end, phys_addr_t size,
phys_addr_t align, int nid)
{
phys_addr_t this_start, this_end, cand;
u64 i;
int curr = movablemem_map.nr_map - 1;
/* pump up @end */
if (end == MEMBLOCK_ALLOC_ACCESSIBLE)
end = memblock.current_limit;
/* avoid allocating the first page */
start = max_t(phys_addr_t, start, PAGE_SIZE);
end = max(start, end);
for_each_free_mem_range_reverse(i, nid, &this_start, &this_end, NULL) {
this_start = clamp(this_start, start, end);
this_end = clamp(this_end, start, end);
restart:
if (this_end <= this_start || this_end < size)
continue;
for (; curr >= 0; curr--) {
if ((movablemem_map.map[curr].start_pfn << PAGE_SHIFT)
< this_end)
break;
}
cand = round_down(this_end - size, align);
if (curr >= 0 &&
cand < movablemem_map.map[curr].end_pfn << PAGE_SHIFT) {
this_end = movablemem_map.map[curr].start_pfn
<< PAGE_SHIFT;
goto restart;
}
if (cand >= this_start)
return cand;
}
return 0;
}
#else /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
phys_addr_t end, phys_addr_t size,
phys_addr_t align, int nid)
......@@ -123,6 +172,7 @@ phys_addr_t __init_memblock memblock_find_in_range_node(phys_addr_t start,
}
return 0;
}
#endif /* CONFIG_HAVE_MEMBLOCK_NODE_MAP */
/**
* memblock_find_in_range - find free area in given range
......
此差异已折叠。
此差异已折叠。
......@@ -69,6 +69,10 @@
#include "internal.h"
#ifdef LAST_NID_NOT_IN_PAGE_FLAGS
#warning Unfortunate NUMA and NUMA Balancing config, growing page-frame for last_nid.
#endif
#ifndef CONFIG_NEED_MULTIPLE_NODES
/* use the per-pgdat data instead for discontigmem - mbligh */
unsigned long max_mapnr;
......@@ -1458,10 +1462,11 @@ int zap_vma_ptes(struct vm_area_struct *vma, unsigned long address,
EXPORT_SYMBOL_GPL(zap_vma_ptes);
/**
* follow_page - look up a page descriptor from a user-virtual address
* follow_page_mask - look up a page descriptor from a user-virtual address
* @vma: vm_area_struct mapping @address
* @address: virtual address to look up
* @flags: flags modifying lookup behaviour
* @page_mask: on output, *page_mask is set according to the size of the page
*
* @flags can have FOLL_ flags set, defined in <linux/mm.h>
*
......@@ -1469,8 +1474,9 @@ EXPORT_SYMBOL_GPL(zap_vma_ptes);
* an error pointer if there is a mapping to something not represented
* by a page descriptor (see also vm_normal_page()).
*/
struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
unsigned int flags)
struct page *follow_page_mask(struct vm_area_struct *vma,
unsigned long address, unsigned int flags,
unsigned int *page_mask)
{
pgd_t *pgd;
pud_t *pud;
......@@ -1480,6 +1486,8 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
struct page *page;
struct mm_struct *mm = vma->vm_mm;
*page_mask = 0;
page = follow_huge_addr(mm, address, flags & FOLL_WRITE);
if (!IS_ERR(page)) {
BUG_ON(flags & FOLL_GET);
......@@ -1526,6 +1534,7 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
page = follow_trans_huge_pmd(vma, address,
pmd, flags);
spin_unlock(&mm->page_table_lock);
*page_mask = HPAGE_PMD_NR - 1;
goto out;
}
} else
......@@ -1539,8 +1548,24 @@ struct page *follow_page(struct vm_area_struct *vma, unsigned long address,
ptep = pte_offset_map_lock(mm, pmd, address, &ptl);
pte = *ptep;
if (!pte_present(pte))
goto no_page;
if (!pte_present(pte)) {
swp_entry_t entry;
/*
* KSM's break_ksm() relies upon recognizing a ksm page
* even while it is being migrated, so for that case we
* need migration_entry_wait().
*/
if (likely(!(flags & FOLL_MIGRATION)))
goto no_page;
if (pte_none(pte) || pte_file(pte))
goto no_page;
entry = pte_to_swp_entry(pte);
if (!is_migration_entry(entry))
goto no_page;
pte_unmap_unlock(ptep, ptl);
migration_entry_wait(mm, pmd, address);
goto split_fallthrough;
}
if ((flags & FOLL_NUMA) && pte_numa(pte))
goto no_page;
if ((flags & FOLL_WRITE) && !pte_write(pte))
......@@ -1673,15 +1698,16 @@ static inline int stack_guard_page(struct vm_area_struct *vma, unsigned long add
* instead of __get_user_pages. __get_user_pages should be used only if
* you need some special @gup_flags.
*/
int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int nr_pages, unsigned int gup_flags,
struct page **pages, struct vm_area_struct **vmas,
int *nonblocking)
long __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages,
unsigned int gup_flags, struct page **pages,
struct vm_area_struct **vmas, int *nonblocking)
{
int i;
long i;
unsigned long vm_flags;
unsigned int page_mask;
if (nr_pages <= 0)
if (!nr_pages)
return 0;
VM_BUG_ON(!!pages != !!(gup_flags & FOLL_GET));
......@@ -1757,6 +1783,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
get_page(page);
}
pte_unmap(pte);
page_mask = 0;
goto next_page;
}
......@@ -1774,6 +1801,7 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
do {
struct page *page;
unsigned int foll_flags = gup_flags;
unsigned int page_increm;
/*
* If we have a pending SIGKILL, don't keep faulting
......@@ -1783,7 +1811,8 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
return i ? i : -ERESTARTSYS;
cond_resched();
while (!(page = follow_page(vma, start, foll_flags))) {
while (!(page = follow_page_mask(vma, start,
foll_flags, &page_mask))) {
int ret;
unsigned int fault_flags = 0;
......@@ -1857,13 +1886,19 @@ int __get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
flush_anon_page(vma, page, start);
flush_dcache_page(page);
page_mask = 0;
}
next_page:
if (vmas)
if (vmas) {
vmas[i] = vma;
i++;
start += PAGE_SIZE;
nr_pages--;
page_mask = 0;
}
page_increm = 1 + (~(start >> PAGE_SHIFT) & page_mask);
if (page_increm > nr_pages)
page_increm = nr_pages;
i += page_increm;
start += page_increm * PAGE_SIZE;
nr_pages -= page_increm;
} while (nr_pages && start < vma->vm_end);
} while (nr_pages);
return i;
......@@ -1977,9 +2012,9 @@ int fixup_user_fault(struct task_struct *tsk, struct mm_struct *mm,
*
* See also get_user_pages_fast, for performance critical applications.
*/
int get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, int nr_pages, int write, int force,
struct page **pages, struct vm_area_struct **vmas)
long get_user_pages(struct task_struct *tsk, struct mm_struct *mm,
unsigned long start, unsigned long nr_pages, int write,
int force, struct page **pages, struct vm_area_struct **vmas)
{
int flags = FOLL_TOUCH;
......@@ -2919,7 +2954,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned int flags, pte_t orig_pte)
{
spinlock_t *ptl;
struct page *page, *swapcache = NULL;
struct page *page, *swapcache;
swp_entry_t entry;
pte_t pte;
int locked;
......@@ -2970,9 +3005,11 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
*/
ret = VM_FAULT_HWPOISON;
delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
swapcache = page;
goto out_release;
}
swapcache = page;
locked = lock_page_or_retry(page, mm, flags);
delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
......@@ -2990,16 +3027,11 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
goto out_page;
if (ksm_might_need_to_copy(page, vma, address)) {
swapcache = page;
page = ksm_does_need_to_copy(page, vma, address);
if (unlikely(!page)) {
ret = VM_FAULT_OOM;
page = swapcache;
swapcache = NULL;
goto out_page;
}
page = ksm_might_need_to_copy(page, vma, address);
if (unlikely(!page)) {
ret = VM_FAULT_OOM;
page = swapcache;
goto out_page;
}
if (mem_cgroup_try_charge_swapin(mm, page, GFP_KERNEL, &ptr)) {
......@@ -3044,7 +3076,10 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
}
flush_icache_page(vma, page);
set_pte_at(mm, address, page_table, pte);
do_page_add_anon_rmap(page, vma, address, exclusive);
if (page == swapcache)
do_page_add_anon_rmap(page, vma, address, exclusive);
else /* ksm created a completely new copy */
page_add_new_anon_rmap(page, vma, address);
/* It's better to call commit-charge after rmap is established */
mem_cgroup_commit_charge_swapin(page, ptr);
......@@ -3052,7 +3087,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
if (vm_swap_full() || (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
try_to_free_swap(page);
unlock_page(page);
if (swapcache) {
if (page != swapcache) {
/*
* Hold the lock to avoid the swap entry to be reused
* until we take the PT lock for the pte_same() check
......@@ -3085,7 +3120,7 @@ static int do_swap_page(struct mm_struct *mm, struct vm_area_struct *vma,
unlock_page(page);
out_release:
page_cache_release(page);
if (swapcache) {
if (page != swapcache) {
unlock_page(swapcache);
page_cache_release(swapcache);
}
......@@ -3821,30 +3856,6 @@ int __pmd_alloc(struct mm_struct *mm, pud_t *pud, unsigned long address)
}
#endif /* __PAGETABLE_PMD_FOLDED */
int make_pages_present(unsigned long addr, unsigned long end)
{
int ret, len, write;
struct vm_area_struct * vma;
vma = find_vma(current->mm, addr);
if (!vma)
return -ENOMEM;
/*
* We want to touch writable mappings with a write fault in order
* to break COW, except for shared mappings because these don't COW
* and we would not want to dirty them for nothing.
*/
write = (vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE;
BUG_ON(addr >= end);
BUG_ON(end > vma->vm_end);
len = DIV_ROUND_UP(end, PAGE_SIZE) - addr/PAGE_SIZE;
ret = get_user_pages(current, current->mm, addr,
len, write, 0, NULL, NULL);
if (ret < 0)
return ret;
return ret == len ? 0 : -EFAULT;
}
#if !defined(__HAVE_ARCH_GATE_AREA)
#if defined(AT_SYSINFO_EHDR)
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
......@@ -75,7 +75,7 @@ static unsigned char mincore_page(struct address_space *mapping, pgoff_t pgoff)
/* shmem/tmpfs may return swap: account for swapcache page too. */
if (radix_tree_exceptional_entry(page)) {
swp_entry_t swap = radix_to_swp_entry(page);
page = find_get_page(&swapper_space, swap.val);
page = find_get_page(swap_address_space(swap), swap.val);
}
#endif
if (page) {
......@@ -135,7 +135,8 @@ static void mincore_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
} else {
#ifdef CONFIG_SWAP
pgoff = entry.val;
*vec = mincore_page(&swapper_space, pgoff);
*vec = mincore_page(swap_address_space(entry),
pgoff);
#else
WARN_ON(1);
*vec = 1;
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
/*
* linux/mm/mmzone.c
*
* management codes for pgdats and zones.
* management codes for pgdats, zones and page flags
*/
......@@ -96,3 +96,21 @@ void lruvec_init(struct lruvec *lruvec)
for_each_lru(lru)
INIT_LIST_HEAD(&lruvec->lists[lru]);
}
#if defined(CONFIG_NUMA_BALANCING) && !defined(LAST_NID_NOT_IN_PAGE_FLAGS)
int page_nid_xchg_last(struct page *page, int nid)
{
unsigned long old_flags, flags;
int last_nid;
do {
old_flags = flags = page->flags;
last_nid = page_nid_last(page);
flags &= ~(LAST_NID_MASK << LAST_NID_PGSHIFT);
flags |= (nid & LAST_NID_MASK) << LAST_NID_PGSHIFT;
} while (unlikely(cmpxchg(&page->flags, old_flags, flags) != old_flags));
return last_nid;
}
#endif
此差异已折叠。
此差异已折叠。
......@@ -386,8 +386,10 @@ static void dump_header(struct task_struct *p, gfp_t gfp_mask, int order,
cpuset_print_task_mems_allowed(current);
task_unlock(current);
dump_stack();
mem_cgroup_print_oom_info(memcg, p);
show_mem(SHOW_MEM_FILTER_NODES);
if (memcg)
mem_cgroup_print_oom_info(memcg, p);
else
show_mem(SHOW_MEM_FILTER_NODES);
if (sysctl_oom_dump_tasks)
dump_tasks(memcg, nodemask);
}
......
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
此差异已折叠。
Markdown is supported
0% .
You are about to add 0 people to the discussion. Proceed with caution.
先完成此消息的编辑!
想要评论请 注册