提交 · 8cebfcd074a3044780f3f9af236fc8534d89e55e · openanolis / cloud-kernel

13 12月, 2012 38 次提交

hugetlb: use N_MEMORY instead N_HIGH_MEMORY · 8cebfcd0

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8cebfcd0

mempolicy: use N_MEMORY instead N_HIGH_MEMORY · 01f13bd6

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

01f13bd6

mm,migrate: use N_MEMORY instead N_HIGH_MEMORY · 389162c2

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

389162c2

oom: use N_MEMORY instead N_HIGH_MEMORY · bd3a66c1

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bd3a66c1

memcontrol: use N_MEMORY instead N_HIGH_MEMORY · 31aaea4a

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

31aaea4a

procfs: use N_MEMORY instead N_HIGH_MEMORY · 4ff1b2c2

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4ff1b2c2

cpuset: use N_MEMORY instead N_HIGH_MEMORY · 38d7bee9

由 Lai Jiangshan 提交于 12月 12, 2012

N_HIGH_MEMORY stands for the nodes that has normal or high memory.
N_MEMORY stands for the nodes that has any memory.

The code here need to handle with the nodes which have memory, we should
use N_MEMORY instead.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

38d7bee9

mm: node_states: introduce N_MEMORY · 8219fc48

由 Lai Jiangshan 提交于 12月 12, 2012

We have N_NORMAL_MEMORY for standing for the nodes that have normal memory
with zone_type <= ZONE_NORMAL.

And we have N_HIGH_MEMORY for standing for the nodes that have normal or
high memory.

But we don't have any word to stand for the nodes that have *any* memory.

And we have N_CPU but without N_MEMORY.

Current code reuse the N_HIGH_MEMORY for this purpose because any node
which has memory must have high memory or normal memory currently.

A)	But this reusing is bad for *readability*. Because the name
	N_HIGH_MEMORY just stands for high or normal:

A.example 1)
	mem_cgroup_nr_lru_pages():
		for_each_node_state(nid, N_HIGH_MEMORY)

	The user will be confused(why this function just counts for high or
	normal memory node? does it counts for ZONE_MOVABLE's lru pages?)
	until someone else tell them N_HIGH_MEMORY is reused to stand for
	nodes that have any memory.

A.cont) If we introduce N_MEMORY, we can reduce this confusing
	AND make the code more clearly:

A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice:

	One is in page_cgroup_init(void):
		for_each_node_state(nid, N_HIGH_MEMORY) {

	It means if the node have memory, we will allocate page_cgroup map for
	the node. We should use N_MEMORY instead here to gaim more clearly.

	The second using is in alloc_page_cgroup():
		if (node_state(nid, N_HIGH_MEMORY))
			addr = vzalloc_node(size, nid);

	It means if the node has high or normal memory that can be allocated
	from kernel. We should keep N_HIGH_MEMORY here, and it will be better
	if the "any memory" semantic of N_HIGH_MEMORY is removed.

B)	This reusing is out-dated if we introduce MOVABLE-dedicated node.
	The MOVABLE-dedicated node should not appear in
	node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY],
	because MOVABLE-dedicated node has no high or normal memory.

	In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node
	is in node_stats[N_HIGH_MEMORY], it is also means it is in
	node_stats[N_NORMAL_MEMORY], it causes SLUB wrong.

	The slub uses
		for_each_node_state(nid, N_NORMAL_MEMORY)
	and creates kmem_cache_node for MOVABLE-dedicated node and cause problem.

In one word, we need a N_MEMORY.  We just intrude it as an alias to
N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late
patches.
Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Signed-off-by: NWen Congyang <wency@cn.fujitsu.com>
Cc: Lin Feng <linfeng@cn.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8219fc48

mm: use migrate_prep() instead of migrate_prep_local() · be49a6e1

由 Marek Szyprowski 提交于 12月 12, 2012

__alloc_contig_migrate_range() should use all possible ways to get all the
pages migrated from the given memory range, so pruning per-cpu lru lists
for all CPUs is required, regadless the cost of such operation.  Otherwise
some pages which got stuck at per-cpu lru list might get missed by
migration procedure causing the contiguous allocation to fail.
Reported-by: NSeongHwan Yoon <sunghwan.yun@samsung.com>
Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
Acked-by: NMichal Nazarewicz <mina86@mina86.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

be49a6e1

mm: compaction: Fix compiler warning · c8bf2d8b

由 Thierry Reding 提交于 12月 12, 2012

compact_capture_page() is only used if compaction is enabled so it should
be moved into the corresponding #ifdef.
Signed-off-by: NThierry Reding <thierry.reding@avionic-design.de>
Acked-by: NMel Gorman <mgorman@suse.de>
Cc: Rik van Riel <riel@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c8bf2d8b

thp: avoid race on multiple parallel page faults to the same page · 3ea41e62

由 Kirill A. Shutemov 提交于 12月 12, 2012

pmd value is stable only with mm->page_table_lock taken. After taking
the lock we need to check that nobody modified the pmd before changing it.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Jiri Slaby <jslaby@suse.cz>
Cc: David Rientjes <rientjes@google.com>
Reviewed-by: NBob Liu <lliubbo@gmail.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3ea41e62

thp: introduce sysfs knob to disable huge zero page · 79da5407

由 Kirill A. Shutemov 提交于 12月 12, 2012

By default kernel tries to use huge zero page on read page fault.  It's
possible to disable huge zero page by writing 0 or enable it back by
writing 1:

echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79da5407

thp, vmstat: implement HZP_ALLOC and HZP_ALLOC_FAILED events · d8a8e1f0

由 Kirill A. Shutemov 提交于 12月 12, 2012

hzp_alloc is incremented every time a huge zero page is successfully
	allocated. It includes allocations which where dropped due
	race with other allocation. Note, it doesn't count every map
	of the huge zero page, only its allocation.

hzp_alloc_failed is incremented if kernel fails to allocate huge zero
	page and falls back to using small pages.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d8a8e1f0

thp: implement refcounting for huge zero page · 97ae1749

由 Kirill A. Shutemov 提交于 12月 12, 2012

H.  Peter Anvin doesn't like huge zero page which sticks in memory forever
after the first allocation.  Here's implementation of lockless refcounting
for huge zero page.

We have two basic primitives: {get,put}_huge_zero_page(). They
manipulate reference counter.

If counter is 0, get_huge_zero_page() allocates a new huge page and takes
two references: one for caller and one for shrinker.  We free the page
only in shrinker callback if counter is 1 (only shrinker has the
reference).

put_huge_zero_page() only decrements counter.  Counter is never zero in
put_huge_zero_page() since shrinker holds on reference.

Freeing huge zero page in shrinker callback helps to avoid frequent
allocate-free.

Refcounting has cost.  On 4 socket machine I observe ~1% slowdown on
parallel (40 processes) read page faulting comparing to lazy huge page
allocation.  I think it's pretty reasonable for synthetic benchmark.

[lliubbo@gmail.com: fix mismerge]
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NBob Liu <lliubbo@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

97ae1749

thp: lazy huge zero page allocation · 78ca0e67

由 Kirill A. Shutemov 提交于 12月 12, 2012

Instead of allocating huge zero page on hugepage_init() we can postpone it
until first huge zero page map. It saves memory if THP is not in use.

cmpxchg() is used to avoid race on huge_zero_pfn initialization.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

78ca0e67

thp: setup huge zero page on non-write page fault · 80371957

由 Kirill A. Shutemov 提交于 12月 12, 2012

All code paths seems covered. Now we can map huge zero page on read page
fault.

We setup it in do_huge_pmd_anonymous_page() if area around fault address
is suitable for THP and we've got read page fault.

If we fail to setup huge zero page (ENOMEM) we fallback to
handle_pte_fault() as we normally do in THP.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

80371957

thp: implement splitting pmd for huge zero page · c5a647d0

由 Kirill A. Shutemov 提交于 12月 12, 2012

We can't split huge zero page itself (and it's bug if we try), but we
can split the pmd which points to it.

On splitting the pmd we create a table with all ptes set to normal zero
page.

[akpm@linux-foundation.org: fix build error]
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c5a647d0

thp: change split_huge_page_pmd() interface · e180377f

由 Kirill A. Shutemov 提交于 12月 12, 2012

Pass vma instead of mm and add address parameter.

In most cases we already have vma on the stack. We provides
split_huge_page_pmd_mm() for few cases when we have mm, but not vma.

This change is preparation to huge zero pmd splitting implementation.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e180377f

thp: change_huge_pmd(): make sure we don't try to make a page writable · cad7f613

由 Kirill A. Shutemov 提交于 12月 12, 2012

mprotect core never tries to make page writable using change_huge_pmd().
Let's add an assert that the assumption is true.  It's important to be
sure we will not make huge zero page writable.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Cc: David Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cad7f613

thp: do_huge_pmd_wp_page(): handle huge zero page · 93b4796d

由 Kirill A. Shutemov 提交于 12月 12, 2012

On write access to huge zero page we alloc a new huge page and clear it.

If ENOMEM, graceful fallback: we create a new pmd table and set pte around
fault address to newly allocated normal (4k) page.  All other ptes in the
pmd set to normal zero page.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

93b4796d

thp: copy_huge_pmd(): copy huge zero page · fc9fe822

由 Kirill A. Shutemov 提交于 12月 12, 2012

It's easy to copy huge zero page. Just set destination pmd to huge zero
page.

It's safe to copy huge zero page since we have none yet :-p

[rientjes@google.com: fix comment]
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fc9fe822

thp: zap_huge_pmd(): zap huge zero pmd · 479f0abb

由 Kirill A. Shutemov 提交于 12月 12, 2012

We don't have a mapped page to zap in huge zero page case.  Let's just clear
pmd and remove it from tlb.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

479f0abb

thp: huge zero page: basic preparation · 4a6c1297

由 Kirill A. Shutemov 提交于 12月 12, 2012

During testing I noticed big (up to 2.5 times) memory consumption overhead
on some workloads (e.g.  ft.A from NPB) if THP is enabled.

The main reason for that big difference is lacking zero page in THP case.
We have to allocate a real page on read page fault.

A program to demonstrate the issue:
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

#define MB 1024*1024

int main(int argc, char **argv)
{
        char *p;
        int i;

        posix_memalign((void **)&p, 2 * MB, 200 * MB);
        for (i = 0; i < 200 * MB; i+= 4096)
                assert(p[i] == 0);
        pause();
        return 0;
}

With thp-never RSS is about 400k, but with thp-always it's 200M.  After
the patcheset thp-always RSS is 400k too.

Design overview.

Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with
zeros.  The way how we allocate it changes in the patchset:

- [01/10] simplest way: hzp allocated on boot time in hugepage_init();
- [09/10] lazy allocation on first use;
- [10/10] lockless refcounting + shrinker-reclaimable hzp;

We setup it in do_huge_pmd_anonymous_page() if area around fault address
is suitable for THP and we've got read page fault.  If we fail to setup
hzp (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP.

On wp fault to hzp we allocate real memory for the huge page and clear it.
 If ENOMEM, graceful fallback: we create a new pmd table and set pte
around fault address to newly allocated normal (4k) page.  All other ptes
in the pmd set to normal zero page.

We cannot split hzp (and it's bug if we try), but we can split the pmd
which points to it.  On splitting the pmd we create a table with all ptes
set to normal zero page.

===

By hpa's request I've tried alternative approach for hzp implementation
(see Virtual huge zero page patchset): pmd table with all entries set to
zero page.  This way should be more cache friendly, but it increases TLB
pressure.

The problem with virtual huge zero page: it requires per-arch enabling.
We need a way to mark that pmd table has all ptes set to zero page.

Some numbers to compare two implementations (on 4s Westmere-EX):

Mirobenchmark1
==============

test:
        posix_memalign((void **)&p, 2 * MB, 8 * GB);
        for (i = 0; i < 100; i++) {
                assert(memcmp(p, p + 4*GB, 4*GB) == 0);
                asm volatile ("": : :"memory");
        }

hzp:
 Performance counter stats for './test_memcmp' (5 runs):

      32356.272845 task-clock                #    0.998 CPUs utilized            ( +-  0.13% )
                40 context-switches          #    0.001 K/sec                    ( +-  0.94% )
                 0 CPU-migrations            #    0.000 K/sec
             4,218 page-faults               #    0.130 K/sec                    ( +-  0.00% )
    76,712,481,765 cycles                    #    2.371 GHz                      ( +-  0.13% ) [83.31%]
    36,279,577,636 stalled-cycles-frontend   #   47.29% frontend cycles idle     ( +-  0.28% ) [83.35%]
     1,684,049,110 stalled-cycles-backend    #    2.20% backend  cycles idle     ( +-  2.96% ) [66.67%]
   134,355,715,816 instructions              #    1.75  insns per cycle
                                             #    0.27  stalled cycles per insn  ( +-  0.10% ) [83.35%]
    13,526,169,702 branches                  #  418.039 M/sec                    ( +-  0.10% ) [83.31%]
         1,058,230 branch-misses             #    0.01% of all branches          ( +-  0.91% ) [83.36%]

      32.413866442 seconds time elapsed                                          ( +-  0.13% )

vhzp:
 Performance counter stats for './test_memcmp' (5 runs):

      30327.183829 task-clock                #    0.998 CPUs utilized            ( +-  0.13% )
                38 context-switches          #    0.001 K/sec                    ( +-  1.53% )
                 0 CPU-migrations            #    0.000 K/sec
             4,218 page-faults               #    0.139 K/sec                    ( +-  0.01% )
    71,964,773,660 cycles                    #    2.373 GHz                      ( +-  0.13% ) [83.35%]
    31,191,284,231 stalled-cycles-frontend   #   43.34% frontend cycles idle     ( +-  0.40% ) [83.32%]
       773,484,474 stalled-cycles-backend    #    1.07% backend  cycles idle     ( +-  6.61% ) [66.67%]
   134,982,215,437 instructions              #    1.88  insns per cycle
                                             #    0.23  stalled cycles per insn  ( +-  0.11% ) [83.32%]
    13,509,150,683 branches                  #  445.447 M/sec                    ( +-  0.11% ) [83.34%]
         1,017,667 branch-misses             #    0.01% of all branches          ( +-  1.07% ) [83.32%]

      30.381324695 seconds time elapsed                                          ( +-  0.13% )

Mirobenchmark2
==============

test:
        posix_memalign((void **)&p, 2 * MB, 8 * GB);
        for (i = 0; i < 1000; i++) {
                char *_p = p;
                while (_p < p+4*GB) {
                        assert(*_p == *(_p+4*GB));
                        _p += 4096;
                        asm volatile ("": : :"memory");
                }
        }

hzp:
 Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs):

       3505.727639 task-clock                #    0.998 CPUs utilized            ( +-  0.26% )
                 9 context-switches          #    0.003 K/sec                    ( +-  4.97% )
             4,384 page-faults               #    0.001 M/sec                    ( +-  0.00% )
     8,318,482,466 cycles                    #    2.373 GHz                      ( +-  0.26% ) [33.31%]
     5,134,318,786 stalled-cycles-frontend   #   61.72% frontend cycles idle     ( +-  0.42% ) [33.32%]
     2,193,266,208 stalled-cycles-backend    #   26.37% backend  cycles idle     ( +-  5.51% ) [33.33%]
     9,494,670,537 instructions              #    1.14  insns per cycle
                                             #    0.54  stalled cycles per insn  ( +-  0.13% ) [41.68%]
     2,108,522,738 branches                  #  601.451 M/sec                    ( +-  0.09% ) [41.68%]
           158,746 branch-misses             #    0.01% of all branches          ( +-  1.60% ) [41.71%]
     3,168,102,115 L1-dcache-loads
          #  903.693 M/sec                    ( +-  0.11% ) [41.70%]
     1,048,710,998 L1-dcache-misses
         #   33.10% of all L1-dcache hits    ( +-  0.11% ) [41.72%]
     1,047,699,685 LLC-load
                 #  298.854 M/sec                    ( +-  0.03% ) [33.38%]
             2,287 LLC-misses
               #    0.00% of all LL-cache hits     ( +-  8.27% ) [33.37%]
     3,166,187,367 dTLB-loads
               #  903.147 M/sec                    ( +-  0.02% ) [33.35%]
         4,266,538 dTLB-misses
              #    0.13% of all dTLB cache hits   ( +-  0.03% ) [33.33%]

       3.513339813 seconds time elapsed                                          ( +-  0.26% )

vhzp:
 Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs):

      27313.891128 task-clock                #    0.998 CPUs utilized            ( +-  0.24% )
                62 context-switches          #    0.002 K/sec                    ( +-  0.61% )
             4,384 page-faults               #    0.160 K/sec                    ( +-  0.01% )
    64,747,374,606 cycles                    #    2.370 GHz                      ( +-  0.24% ) [33.33%]
    61,341,580,278 stalled-cycles-frontend   #   94.74% frontend cycles idle     ( +-  0.26% ) [33.33%]
    56,702,237,511 stalled-cycles-backend    #   87.57% backend  cycles idle     ( +-  0.07% ) [33.33%]
    10,033,724,846 instructions              #    0.15  insns per cycle
                                             #    6.11  stalled cycles per insn  ( +-  0.09% ) [41.65%]
     2,190,424,932 branches                  #   80.195 M/sec                    ( +-  0.12% ) [41.66%]
         1,028,630 branch-misses             #    0.05% of all branches          ( +-  1.50% ) [41.66%]
     3,302,006,540 L1-dcache-loads
          #  120.891 M/sec                    ( +-  0.11% ) [41.68%]
       271,374,358 L1-dcache-misses
         #    8.22% of all L1-dcache hits    ( +-  0.04% ) [41.66%]
        20,385,476 LLC-load
                 #    0.746 M/sec                    ( +-  1.64% ) [33.34%]
            76,754 LLC-misses
               #    0.38% of all LL-cache hits     ( +-  2.35% ) [33.34%]
     3,309,927,290 dTLB-loads
               #  121.181 M/sec                    ( +-  0.03% ) [33.34%]
     2,098,967,427 dTLB-misses
              #   63.41% of all dTLB cache hits   ( +-  0.03% ) [33.34%]

      27.364448741 seconds time elapsed                                          ( +-  0.24% )

===

I personally prefer implementation present in this patchset. It doesn't
touch arch-specific code.

This patch:

Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with
zeros.

For now let's allocate the page on hugepage_init().  We'll switch to lazy
allocation later.

We are not going to map the huge zero page until we can handle it properly
on all code paths.

is_huge_zero_{pfn,pmd}() functions will be used by following patches to
check whether the pfn/pmd is huge zero page.
Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@linux.intel.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a6c1297

bootmem: remove alloc_arch_preferred_bootmem() · 3f7dfe24

由 Joonsoo Kim 提交于 12月 12, 2012

The name of this function is not suitable, and removing the function and
open-coding it into each call sites makes the code more understandable.

Additionally, we shouldn't do an allocation from bootmem when
slab_is_available(), so directly return kmalloc()'s return value.
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3f7dfe24

bootmem: remove not implemented function call, bootmem_arch_preferred_node() · 2d7a6956

由 Joonsoo Kim 提交于 12月 12, 2012

There is no implementation of bootmem_arch_preferred_node() and a call to
this function will cause a compilation error.  So remove it.
Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2d7a6956

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal · 9977d9b3

由 Linus Torvalds 提交于 12月 12, 2012

Pull big execve/kernel_thread/fork unification series from Al Viro:
 "All architectures are converted to new model.  Quite a bit of that
  stuff is actually shared with architecture trees; in such cases it's
  literally shared branch pulled by both, not a cherry-pick.

  A lot of ugliness and black magic is gone (-3KLoC total in this one):

   - kernel_thread()/kernel_execve()/sys_execve() redesign.

     We don't do syscalls from kernel anymore for either kernel_thread()
     or kernel_execve():

     kernel_thread() is essentially clone(2) with callback run before we
     return to userland, the callbacks either never return or do
     successful do_execve() before returning.

     kernel_execve() is a wrapper for do_execve() - it doesn't need to
     do transition to user mode anymore.

     As a result kernel_thread() and kernel_execve() are
     arch-independent now - they live in kernel/fork.c and fs/exec.c
     resp.  sys_execve() is also in fs/exec.c and it's completely
     architecture-independent.

   - daemonize() is gone, along with its parts in fs/*.c

   - struct pt_regs * is no longer passed to do_fork/copy_process/
     copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump.

   - sys_fork()/sys_vfork()/sys_clone() unified; some architectures
     still need wrappers (ones with callee-saved registers not saved in
     pt_regs on syscall entry), but the main part of those suckers is in
     kernel/fork.c now."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits)
  do_coredump(): get rid of pt_regs argument
  print_fatal_signal(): get rid of pt_regs argument
  ptrace_signal(): get rid of unused arguments
  get rid of ptrace_signal_deliver() arguments
  new helper: signal_pt_regs()
  unify default ptrace_signal_deliver
  flagday: kill pt_regs argument of do_fork()
  death to idle_regs()
  don't pass regs to copy_process()
  flagday: don't pass regs to copy_thread()
  bfin: switch to generic vfork, get rid of pointless wrappers
  xtensa: switch to generic clone()
  openrisc: switch to use of generic fork and clone
  unicore32: switch to generic clone(2)
  score: switch to generic fork/vfork/clone
  c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone()
  take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h
  mn10300: switch to generic fork/vfork/clone
  h8300: switch to generic fork/vfork/clone
  tile: switch to generic clone()
  ...

Conflicts:
	arch/microblaze/include/asm/Kbuild

9977d9b3

Merge tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · cf4af012

由 Linus Torvalds 提交于 12月 12, 2012

Pull ARM SoC board updates from Olof Johansson:
 "This branch contains a set of various board updates for ARM platforms.

  A few shmobile platforms that are stale have been removed, some
  defconfig updates for various boards selecting new features such as
  pinctrl subsystem support, and various updates enabling peripherals,
  etc."

Fix up conflicts mostly as per Olof.

* tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (58 commits)
  ARM: S3C64XX: Add dummy supplies for Glenfarclas LDOs
  ARM: S3C64XX: Add registration of WM2200 Bells device on Cragganmore
  ARM: kirkwood: Add Plat'Home OpenBlocks A6 support
  ARM: Dove: update defconfig
  ARM: Kirkwood: update defconfig for new boards
  arm: orion5x: add DT related options in defconfig
  arm: orion5x: convert 'LaCie Ethernet Disk mini v2' to Device Tree
  arm: orion5x: basic Device Tree support
  arm: orion5x: mechanical defconfig update
  ARM: kirkwood: Add support for the MPL CEC4
  arm: kirkwood: add support for ZyXEL NSA310
  ARM: Kirkwood: new board USI Topkick
  ARM: kirkwood: use gpio-fan DT binding on lsxl
  ARM: Kirkwood: add Netspace boards to defconfig
  ARM: kirkwood: DT board setup for Network Space Mini v2
  ARM: kirkwood: DT board setup for Network Space Lite v2
  ARM: kirkwood: DT board setup for Network Space v2 and parents
  leds: leds-ns2: add device tree binding
  ARM: Kirkwood: Enable the second I2C bus
  ARM: mmp: select pinctrl driver
  ...

cf4af012

Merge tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · d027db13

由 Linus Torvalds 提交于 12月 12, 2012

Pull ARM SoC updates from Olof Johansson:
 "This contains the bulk of new SoC development for this merge window.

  Two new platforms have been added, the sunxi platforms (Allwinner A1x
  SoCs) by Maxime Ripard, and a generic Broadcom platform for a new
  series of ARMv7 platforms from them, where the hope is that we can
  keep the platform code generic enough to have them all share one mach
  directory.  The new Broadcom platform is contributed by Christian
  Daudt.

  Highbank has grown support for Calxeda's next generation of hardware,
  ECX-2000.

  clps711x has seen a lot of cleanup from Alexander Shiyan, and he's
  also taken on maintainership of the platform.

  Beyond this there has been a bunch of work from a number of people on
  converting more platforms to IRQ domains, pinctrl conversion, cleanup
  and general feature enablement across most of the active platforms."

Fix up trivial conflicts as per Olof.

* tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits)
  mfd: vexpress-sysreg: Remove LEDs code
  irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids
  clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids
  irq: versatile: delete dangling variable
  ARM: sunxi: add missing include for mdelay()
  ARM: EXYNOS: Avoid early use of of_machine_is_compatible()
  ARM: dts: add node for PL330 MDMA1 controller for exynos4
  ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412
  ARM: EXYNOS: add UART3 to DEBUG_LL ports
  ARM: S3C24XX: Add clkdev entry for camif-upll clock
  ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers
  ARM: sunxi: Add missing sun4i.dtsi file
  pinctrl: samsung: Do not initialise statics to 0
  ARM i.MX6: remove gate_mask from pllv3
  ARM i.MX6: Fix ethernet PLL clocks
  ARM i.MX6: rename PLLs according to datasheet
  ARM i.MX6: Add pwm support
  ARM i.MX51: Add pwm support
  ARM i.MX53: Add pwm support
  ARM: mx5: Replace clk_register_clkdev with clock DT lookup
  ...

d027db13

Merge tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · d01e4afd

由 Linus Torvalds 提交于 12月 12, 2012

Pull ARM SoC cleanups on various subarchitectures from Olof Johansson:
 "Cleanup patches for various ARM platforms and some of their associated
  drivers.  There's also a branch in here that enables Freescale i.MX to
  be part of the multiplatform support -- the first "big" SoC that is
  moved over (more multiplatform work comes in a separate branch later
  during the merge window)."

Conflicts fixed as per Olof, including a silent semantic one in
arch/arm/mach-omap2/board-generic.c (omap_prcm_restart() was renamed to
omap3xxx_restart(), and a new user of the old name was added).

* tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (189 commits)
  ARM: omap: fix typo on timer cleanup
  ARM: EXYNOS: Remove unused regs-mem.h file
  ARM: EXYNOS: Remove unused non-dt support for dwmci controller
  ARM: Kirkwood: Use hw_pci.ops instead of hw_pci.scan
  ARM: OMAP3: cm-t3517: use GPTIMER for system clock
  ARM: OMAP2+: timer: remove CONFIG_OMAP_32K_TIMER
  ARM: SAMSUNG: use devm_ functions for ADC driver
  ARM: EXYNOS: no duplicate mask/unmask in eint0_15
  ARM: S3C24XX: SPI clock channel setup is fixed for S3C2443
  ARM: EXYNOS: Remove i2c0 resource information and setting of device names
  ARM: Kirkwood: checkpatch cleanups
  ARM: Kirkwood: Fix sparse warnings.
  ARM: Kirkwood: Remove unused includes
  ARM: kirkwood: cleanup lsxl board includes
  ARM: integrator: use BUG_ON where possible
  ARM: integrator: push down SC dependencies
  ARM: integrator: delete static UART1 mapping
  ARM: integrator: delete SC mapping on the CP
  ARM: integrator: remove static CP syscon mapping
  ARM: integrator: remove static AP syscon mapping
  ...

d01e4afd

Merge tag 'headers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8287361a

由 Linus Torvalds 提交于 12月 12, 2012

Pull ARM SoC Header cleanups from Olof Johansson:
 "This is a collection of header file cleanups, mostly for OMAP and
  AT91, that keeps moving the platforms in the direction of
  multiplatform by removing the need for mach-dependent header files
  used in drivers and other places."

Fix up mostly trivial conflicts as per Olof.

* tag 'headers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (106 commits)
  ARM: OMAP2+: Move iommu/iovmm headers to platform_data
  ARM: OMAP2+: Make some definitions local
  ARM: OMAP2+: Move iommu2 to drivers/iommu/omap-iommu2.c
  ARM: OMAP2+: Move plat/iovmm.h to include/linux/omap-iommu.h
  ARM: OMAP2+: Move iopgtable header to drivers/iommu/
  ARM: OMAP: Merge iommu2.h into iommu.h
  atmel: move ATMEL_MAX_UART to platform_data/atmel.h
  ARM: OMAP: Remove omap_init_consistent_dma_size()
  arm: at91: move at91rm9200 rtc header in drivers/rtc
  arm: at91: move reset controller header to arm/arm/mach-at91
  arm: at91: move pit define to the driver
  arm: at91: move at91_shdwc.h to arch/arm/mach-at91
  arm: at91: move board header to arch/arm/mach-at91
  arn: at91: move at91_tc.h to arch/arm/mach-at91
  arm: at91 move at91_aic.h to arch/arm/mach-at91
  arm: at91 move board.h to arch/arm/mach-at91
  arm: at91: move platfarm_data to include/linux/platform_data/atmel.h
  arm: at91: drop machine defconfig
  ARM: OMAP: Remove NEED_MACH_GPIO_H
  ARM: OMAP: Remove unnecessary mach and plat includes
  ...

8287361a

Merge tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 2989950c

由 Linus Torvalds 提交于 12月 12, 2012

Pull ARM SoC Non-critical bug fixes from Olof Johansson:
 "Simple bug fixes that were not considered important enough for
  inclusion into 3.7, especially those that arrived late during the
  merge window.

  There's also a MAINTAINERS update for the Renesas platforms in here,
  marking Simon Horman as a maintainer and changing the git url to his
  tree."

* tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
  Update ARM/SHMOBILE section of MAINTAINERS
  ARM: Fix Kconfig symbols typo for LEDS
  ARM: pxa: add dummy SA1100 rtc clock in pxa25x
  ARM: pxa: fix pxa25x gpio wakeup setting
  ARM: OMAP4: PM: fix errata handling when CONFIG_PM=n
  ARM: cns3xxx: drop unnecessary symbol selection
  ARM: vexpress: fix ll debug code when building multiplatform
  ARM: OMAP4: retrigger localtimers after re-enabling gic
  ARM: OMAP4460: Workaround for ROM bug because of CA9 r2pX GIC control register change.
  ARM: OMAP4: PM: add errata support
  ARM: davinci: fix return value check by using IS_ERR in tnetv107x_devices_init()
  ARM: davinci: uncompress.h: bail out if uart not initialized
  ARM: davinci: serial.h: fix uart number in the comment
  ARM: davinci: dm644x evm: move pointer dereference below NULL check
  ARM: vexpress: Make the debug UART detection more specific

2989950c

Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm · b1286f4e

由 Linus Torvalds 提交于 12月 12, 2012

Pull ARM updates from Russell King:
 "Here's the updates for ARM for this merge window, which cover quite a
  variety of areas.

  There's a bunch of patch series from Will tackling various bugs like
  the PROT_NONE handling, ASID allocation, cluster boot protocol and
  ASID TLB tagging updates.

  We move to a build-time sorted exception table rather than doing the
  sorting at run-time, add support for the secure computing filter, and
  some updates to the perf code.  We also have sorted out the placement
  of some headers, fixed some build warnings, fixed some hotplug
  problems with the per-cpu TWD code."

* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (73 commits)
  ARM: 7594/1: Add .smp entry for REALVIEW_EB
  ARM: 7599/1: head: Remove boot-time HYP mode check for v5 and below
  ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets.
  ARM: 7595/1: syscall: rework ordering in syscall_trace_exit
  ARM: 7596/1: mmci: replace readsl/writesl with ioread32_rep/iowrite32_rep
  ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch.
  ARM: 7593/1: nommu: do not enable DCACHE_WORD_ACCESS when !CONFIG_MMU
  ARM: 7592/1: nommu: prevent generation of kernel unaligned memory accesses
  ARM: 7591/1: nommu: Enable the strict alignment (CR_A) bit only if ARCH < v6
  ARM: 7590/1: /proc/interrupts: limit the display of IPIs to online CPUs only
  ARM: 7587/1: implement optimized percpu variable access
  ARM: 7589/1: integrator: pass the lm resource to amba
  ARM: 7588/1: amba: create a resource parent registrator
  ARM: 7582/2: rename kvm_seq to vmalloc_seq so to avoid confusion with KVM
  ARM: 7585/1: kernel: fix nr_cpu_ids check in DT logical map init
  ARM: 7584/1: perf: fix link error when CONFIG_HW_PERF_EVENTS is not selected
  ARM: gic: use a private mapping for CPU target interfaces
  ARM: kernel: add logical mappings look-up
  ARM: kernel: add cpu logical map DT init in setup_arch
  ARM: kernel: add device tree init map function
  ...

b1286f4e

Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 6facac1a

由 Linus Torvalds 提交于 12月 12, 2012

Pull CIFS fixes from Steve French:
 "This includes a set of misc.  cifs fixes (most importantly some byte
  range lock related write fixes from Pavel, and some ACL and idmap
  related fixes from Jeff) but also includes the SMB2.02 dialect
  enablement, and a key fix for SMB3 mounts.

  Default authentication upgraded to ntlmv2 for cifs (it was already
  ntlmv2 for smb2)"

* 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (43 commits)
  CIFS: Fix write after setting a read lock for read oplock files
  cifs: parse the device name into UNC and prepath
  cifs: fix up handling of prefixpath= option
  cifs: clean up handling of unc= option
  cifs: fix SID binary to string conversion
  fix "disabling echoes and oplocks" on SMB2 mounts
  Do not send SMB2 signatures for SMB3 frames
  cifs: deal with id_to_sid embedded sid reply corner case
  cifs: fix hardcoded default security descriptor length
  cifs: extra sanity checking for cifs.idmap keys
  cifs: avoid extra allocation for small cifs.idmap keys
  cifs: simplify id_to_sid and sid_to_id mapping code
  CIFS: Fix possible data coherency problem after oplock break to None
  CIFS: Do not permit write to a range mandatory locked with a read lock
  cifs: rename cifs_readdir_lookup to cifs_prime_dcache and make it void return
  cifs: Add CONFIG_CIFS_DEBUG and rename use of CIFS_DEBUG
  cifs: Make CIFS_DEBUG possible to undefine
  SMB3 mounts fail with access denied to some servers
  cifs: Remove unused cEVENT macro
  cifs: always zero out smb_vol before parsing options
  ...

6facac1a

Merge tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs · 3f1c64f4

由 Linus Torvalds 提交于 12月 12, 2012

Pull xfs update from Ben Myers:
 "There is plenty going on, including the cleanup of xfssyncd, metadata
  verifiers, CRC infrastructure for the log, tracking of inodes with
  speculative allocation, a cleanup of xfs_fs_subr.c, fixes for
  XFS_IOC_ZERO_RANGE, and important fix related to log replay (only
  update the last_sync_lsn when a transaction completes), a fix for
  deadlock on AGF buffers, documentation and comment updates, and a few
  more cleanups and fixes.

  Details:
   - remove the xfssyncd mess
   - only update the last_sync_lsn when a transaction completes
   - zero allocation_args on the kernel stack
   - fix AGF/alloc workqueue deadlock
   - silence uninitialised f.file warning
   - Update inode alloc comments
   - Update mount options documentation
   - report projid32bit feature in geometry call
   - speculative preallocation inode tracking
   - fix attr tree double split corruption
   - fix broken error handling in xfs_vm_writepage
   - drop buffer io reference when a bad bio is built
   - add more attribute tree trace points
   - growfs infrastructure changes for 3.8
   - fs/xfs/xfs_fs_subr.c die die die
   - add CRC infrastructure
   - add CRC checks to the log
   - Remove description of nodelaylog mount option from xfs.txt
   - inode allocation should use unmapped buffers
   - byte range granularity for XFS_IOC_ZERO_RANGE
   - fix direct IO nested transaction deadlock
   - fix stray dquot unlock when reclaiming dquots
   - fix sparse reported log CRC endian issue"

Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch
having been applied twice (commits eaef8543 and 1375cb65: "xfs:
growfs: don't read garbage for new secondary superblocks") with later
updates to the affected code in the XFS tree.

* tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits)
  xfs: fix sparse reported log CRC endian issue
  xfs: fix stray dquot unlock when reclaiming dquots
  xfs: fix direct IO nested transaction deadlock.
  xfs: byte range granularity for XFS_IOC_ZERO_RANGE
  xfs: inode allocation should use unmapped buffers.
  xfs: Remove the description of nodelaylog mount option from xfs.txt
  xfs: add CRC checks to the log
  xfs: add CRC infrastructure
  xfs: convert buffer verifiers to an ops structure.
  xfs: connect up write verifiers to new buffers
  xfs: add pre-write metadata buffer verifier callbacks
  xfs: add buffer pre-write callback
  xfs: Add verifiers to dir2 data readahead.
  xfs: add xfs_da_node verification
  xfs: factor and verify attr leaf reads
  xfs: factor dir2 leaf read
  xfs: factor out dir2 data block reading
  xfs: factor dir2 free block reading
  xfs: verify dir2 block format buffers
  xfs: factor dir2 block read operations
  ...

3f1c64f4

Merge tag 'dlm-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm · 22a40fd9

由 Linus Torvalds 提交于 12月 12, 2012

Pull dlm updates from David Teigland:
 "This set fixes some conditions in which value blocks are invalidated,
  and includes two trivial cleanups."

* tag 'dlm-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm:
  dlm: fix lvb invalidation conditions
  fs/dlm: remove CONFIG_EXPERIMENTAL
  dlm: remove unused variable in *dlm_lowcomms_get_buffer()

22a40fd9

Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup · d206e090

由 Linus Torvalds 提交于 12月 12, 2012

Pull cgroup changes from Tejun Heo:
 "A lot of activities on cgroup side.  The big changes are focused on
  making cgroup hierarchy handling saner.

   - cgroup_rmdir() had peculiar semantics - it allowed cgroup
     destruction to be vetoed by individual controllers and tried to
     drain refcnt synchronously.  The vetoing never worked properly and
     caused good deal of contortions in cgroup.  memcg was the last
     reamining user.  Michal Hocko removed the usage and cgroup_rmdir()
     path has been simplified significantly.  This was done in a
     separate branch so that the memcg people can base further memcg
     changes on top.

   - The above allowed cleaning up cgroup lifecycle management and
     implementation of generic cgroup iterators which are used to
     improve hierarchy support.

   - cgroup_freezer updated to allow migration in and out of a frozen
     cgroup and handle hierarchy.  If a cgroup is frozen, all descendant
     cgroups are frozen.

   - netcls_cgroup and netprio_cgroup updated to handle hierarchy
     properly.

   - Various fixes and cleanups.

   - Two merge commits.  One to pull in memcg and rmdir cleanups (needed
     to build iterators).  The other pulled in cgroup/for-3.7-fixes for
     device_cgroup fixes so that further device_cgroup patches can be
     stacked on top."

Fixed up a trivial conflict in mm/memcontrol.c as per Tejun (due to
commit bea8c150 ("memcg: fix hotplugged memory zone oops") in master
touching code close to commit 2ef37d3f ("memcg: Simplify
mem_cgroup_force_empty_list error handling") in for-3.8)

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (65 commits)
  cgroup: update Documentation/cgroups/00-INDEX
  cgroup_rm_file: don't delete the uncreated files
  cgroup: remove subsystem files when remounting cgroup
  cgroup: use cgroup_addrm_files() in cgroup_clear_directory()
  cgroup: warn about broken hierarchies only after css_online
  cgroup: list_del_init() on removed events
  cgroup: fix lockdep warning for event_control
  cgroup: move list add after list head initilization
  netprio_cgroup: allow nesting and inherit config on cgroup creation
  netprio_cgroup: implement netprio[_set]_prio() helpers
  netprio_cgroup: use cgroup->id instead of cgroup_netprio_state->prioidx
  netprio_cgroup: reimplement priomap expansion
  netprio_cgroup: shorten variable names in extend_netdev_table()
  netprio_cgroup: simplify write_priomap()
  netcls_cgroup: move config inheritance to ->css_online() and remove .broken_hierarchy marking
  cgroup: remove obsolete guarantee from cgroup_task_migrate.
  cgroup: add cgroup->id
  cgroup, cpuset: remove cgroup_subsys->post_clone()
  cgroup: s/CGRP_CLONE_CHILDREN/CGRP_CPUSET_CLONE_CHILDREN/
  cgroup: rename ->create/post_create/pre_destroy/destroy() to ->css_alloc/online/offline/free()
  ...

d206e090

Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu · fef3ff2e

由 Linus Torvalds 提交于 12月 12, 2012

Pull percpu changes from Tejun Heo:
 "Nothing exciting here either.  Joonsoo's is almost cosmetic.  Cyrill's
  patch fixes "percpu_alloc" early kernel param handling so that the
  kernel doesn't crash when the parameter is specified w/o any argument."

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu:
  mm, percpu: Make sure percpu_alloc early parameter has an argument
  percpu: make pcpu_free_chunk() use pcpu_mem_free() instead of kfree()

fef3ff2e

Merge branch 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq · e7b55b8f

由 Linus Torvalds 提交于 12月 12, 2012

Pull workqueue changes from Tejun Heo:
 "Nothing exciting.  Just two trivial changes."

* 'for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: add WARN_ON_ONCE() on CPU number to wq_worker_waking_up()
  workqueue: trivial fix for return statement in work_busy()

e7b55b8f

12 12月, 2012 2 次提交

Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux · 50851c62

由 Linus Torvalds 提交于 12月 12, 2012

Pull thermal management update from Zhang Rui:
 "Highlights:

   - Introduction of thermal policy support, together with three new
     thermal governors, including step_wise, user_space, fire_share.

   - Introduction of ST-Ericsson db8500_thermal driver and ST-Ericsson
     db8500_cpufreq_cooling driver.

   - Thermal Kconfig file and Makefile refactor.

   - Fixes for generic thermal layer, generic cpucooling, rcar thermal
     driver and Exynos thermal driver."

* 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/rzhang/linux: (36 commits)
  Thermal: Fix DEFAULT_THERMAL_GOVERNOR
  Thermal: fix a NULL pointer dereference when generic thermal layer is built as a module
  thermal: rcar: add rcar_zone_to_priv() macro
  thermal: rcar: fixup the unit of temperature
  thermal: cpu cooling: allow module builds
  thermal: cpu cooling: use const parameter while registering
  Thermal: Add ST-Ericsson DB8500 thermal properties and platform data.
  Thermal: Add ST-Ericsson DB8500 thermal driver.
  drivers/thermal/Makefile refactor
  Exynos: Add missing dependency
  Refactor drivers/thermal/Kconfig
  thermal: cpu_cooling: Make 'notify_device' static
  Thermal: Remove the cooling_cpufreq_list.
  Thermal: fix bug of counting cpu frequencies.
  Thermal: add indent for code alignment.
  thermal: rcar_thermal: remove explicitly used devm_kfree/iounap()
  thermal: user_space: Add missing static storage class specifiers
  thermal: fair_share: Add missing static storage class specifiers
  thermal: step_wise: Add missing static storage class specifiers
  Thermal: Fix oops and unlocking in thermal_sys.c
  ...

50851c62

Merge tag 'regmap-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · 99b8f42e

由 Linus Torvalds 提交于 12月 12, 2012

Pull regmap updates from Mark Brown:
 "Quite a few enhancements this time around, helpers and diagnostics for
  the most part which is good to see:

   - Addition of table based lookups for the register access checks from
     Davide Ciminaghi, making life easier for drivers with big blocks of
     similar registers.
   - Allow drivers to get the irqdomain for regmap irq_chips, allowing
     the domain to be used with other APIs.
   - Debug improvements for paged register maps.
   - Performance improvments for some of the diagnostic infrastructure,
     very helpful for devices with large register maps."

* tag 'regmap-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
  regmap: debugfs: Cache offsets of valid regions for dump
  regmap: debugfs: Factor out initial seek
  regmap: debugfs: Avoid overflows for very small reads
  regmap: Cache register and value sizes for debugfs
  regmap: introduce tables for readable/writeable/volatile/precious checks
  regmap: core: Report registers in hex when we can't cache
  regmap: Fix printing of size_t variable
  regmap: make lock/unlock functions customizable
  regmap: silence GCC warning
  regmap: Split raw writes that cross window boundaries
  regmap: Make return code checks consistent
  regmap: Factor range lookup out of page selection
  regmap: Provide debugfs read of register ranges
  regmap: Factor out debugfs register read
  regmap: Allow ranges to be named
  regmap: When we sanity check during range adds say what errors we find
  regmap: Rename n_ranges to num_ranges
  regmap: irq: Allow users to retrieve the irq_domain

99b8f42e

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功