- 13 12月, 2012 40 次提交
-
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Since we introduced N_MEMORY, we update the initialization of node_states. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NLin Feng <linfeng@cn.fujitsu.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Hillf Danton <dhillf@gmail.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
N_HIGH_MEMORY stands for the nodes that has normal or high memory. N_MEMORY stands for the nodes that has any memory. The code here need to handle with the nodes which have memory, we should use N_MEMORY instead. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Christoph Lameter <cl@linux.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Lai Jiangshan 提交于
We have N_NORMAL_MEMORY for standing for the nodes that have normal memory with zone_type <= ZONE_NORMAL. And we have N_HIGH_MEMORY for standing for the nodes that have normal or high memory. But we don't have any word to stand for the nodes that have *any* memory. And we have N_CPU but without N_MEMORY. Current code reuse the N_HIGH_MEMORY for this purpose because any node which has memory must have high memory or normal memory currently. A) But this reusing is bad for *readability*. Because the name N_HIGH_MEMORY just stands for high or normal: A.example 1) mem_cgroup_nr_lru_pages(): for_each_node_state(nid, N_HIGH_MEMORY) The user will be confused(why this function just counts for high or normal memory node? does it counts for ZONE_MOVABLE's lru pages?) until someone else tell them N_HIGH_MEMORY is reused to stand for nodes that have any memory. A.cont) If we introduce N_MEMORY, we can reduce this confusing AND make the code more clearly: A.example 2) mm/page_cgroup.c use N_HIGH_MEMORY twice: One is in page_cgroup_init(void): for_each_node_state(nid, N_HIGH_MEMORY) { It means if the node have memory, we will allocate page_cgroup map for the node. We should use N_MEMORY instead here to gaim more clearly. The second using is in alloc_page_cgroup(): if (node_state(nid, N_HIGH_MEMORY)) addr = vzalloc_node(size, nid); It means if the node has high or normal memory that can be allocated from kernel. We should keep N_HIGH_MEMORY here, and it will be better if the "any memory" semantic of N_HIGH_MEMORY is removed. B) This reusing is out-dated if we introduce MOVABLE-dedicated node. The MOVABLE-dedicated node should not appear in node_stats[N_HIGH_MEMORY] nor node_stats[N_NORMAL_MEMORY], because MOVABLE-dedicated node has no high or normal memory. In x86_64, N_HIGH_MEMORY=N_NORMAL_MEMORY, if a MOVABLE-dedicated node is in node_stats[N_HIGH_MEMORY], it is also means it is in node_stats[N_NORMAL_MEMORY], it causes SLUB wrong. The slub uses for_each_node_state(nid, N_NORMAL_MEMORY) and creates kmem_cache_node for MOVABLE-dedicated node and cause problem. In one word, we need a N_MEMORY. We just intrude it as an alias to N_HIGH_MEMORY and fix all im-proper usages of N_HIGH_MEMORY in late patches. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Acked-by: NChristoph Lameter <cl@linux.com> Acked-by: NHillf Danton <dhillf@gmail.com> Signed-off-by: NWen Congyang <wency@cn.fujitsu.com> Cc: Lin Feng <linfeng@cn.fujitsu.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Marek Szyprowski 提交于
__alloc_contig_migrate_range() should use all possible ways to get all the pages migrated from the given memory range, so pruning per-cpu lru lists for all CPUs is required, regadless the cost of such operation. Otherwise some pages which got stuck at per-cpu lru list might get missed by migration procedure causing the contiguous allocation to fail. Reported-by: NSeongHwan Yoon <sunghwan.yun@samsung.com> Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com> Acked-by: NMichal Nazarewicz <mina86@mina86.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Thierry Reding 提交于
compact_capture_page() is only used if compaction is enabled so it should be moved into the corresponding #ifdef. Signed-off-by: NThierry Reding <thierry.reding@avionic-design.de> Acked-by: NMel Gorman <mgorman@suse.de> Cc: Rik van Riel <riel@redhat.com> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
pmd value is stable only with mm->page_table_lock taken. After taking the lock we need to check that nobody modified the pmd before changing it. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Jiri Slaby <jslaby@suse.cz> Cc: David Rientjes <rientjes@google.com> Reviewed-by: NBob Liu <lliubbo@gmail.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
By default kernel tries to use huge zero page on read page fault. It's possible to disable huge zero page by writing 0 or enable it back by writing 1: echo 0 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page echo 1 >/sys/kernel/mm/transparent_hugepage/khugepaged/use_zero_page Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
hzp_alloc is incremented every time a huge zero page is successfully allocated. It includes allocations which where dropped due race with other allocation. Note, it doesn't count every map of the huge zero page, only its allocation. hzp_alloc_failed is incremented if kernel fails to allocate huge zero page and falls back to using small pages. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
H. Peter Anvin doesn't like huge zero page which sticks in memory forever after the first allocation. Here's implementation of lockless refcounting for huge zero page. We have two basic primitives: {get,put}_huge_zero_page(). They manipulate reference counter. If counter is 0, get_huge_zero_page() allocates a new huge page and takes two references: one for caller and one for shrinker. We free the page only in shrinker callback if counter is 1 (only shrinker has the reference). put_huge_zero_page() only decrements counter. Counter is never zero in put_huge_zero_page() since shrinker holds on reference. Freeing huge zero page in shrinker callback helps to avoid frequent allocate-free. Refcounting has cost. On 4 socket machine I observe ~1% slowdown on parallel (40 processes) read page faulting comparing to lazy huge page allocation. I think it's pretty reasonable for synthetic benchmark. [lliubbo@gmail.com: fix mismerge] Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NBob Liu <lliubbo@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Instead of allocating huge zero page on hugepage_init() we can postpone it until first huge zero page map. It saves memory if THP is not in use. cmpxchg() is used to avoid race on huge_zero_pfn initialization. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
All code paths seems covered. Now we can map huge zero page on read page fault. We setup it in do_huge_pmd_anonymous_page() if area around fault address is suitable for THP and we've got read page fault. If we fail to setup huge zero page (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
We can't split huge zero page itself (and it's bug if we try), but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. [akpm@linux-foundation.org: fix build error] Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
Pass vma instead of mm and add address parameter. In most cases we already have vma on the stack. We provides split_huge_page_pmd_mm() for few cases when we have mm, but not vma. This change is preparation to huge zero pmd splitting implementation. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
mprotect core never tries to make page writable using change_huge_pmd(). Let's add an assert that the assumption is true. It's important to be sure we will not make huge zero page writable. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Cc: David Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
On write access to huge zero page we alloc a new huge page and clear it. If ENOMEM, graceful fallback: we create a new pmd table and set pte around fault address to newly allocated normal (4k) page. All other ptes in the pmd set to normal zero page. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
It's easy to copy huge zero page. Just set destination pmd to huge zero page. It's safe to copy huge zero page since we have none yet :-p [rientjes@google.com: fix comment] Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Signed-off-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
We don't have a mapped page to zap in huge zero page case. Let's just clear pmd and remove it from tlb. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
During testing I noticed big (up to 2.5 times) memory consumption overhead on some workloads (e.g. ft.A from NPB) if THP is enabled. The main reason for that big difference is lacking zero page in THP case. We have to allocate a real page on read page fault. A program to demonstrate the issue: #include <assert.h> #include <stdlib.h> #include <unistd.h> #define MB 1024*1024 int main(int argc, char **argv) { char *p; int i; posix_memalign((void **)&p, 2 * MB, 200 * MB); for (i = 0; i < 200 * MB; i+= 4096) assert(p[i] == 0); pause(); return 0; } With thp-never RSS is about 400k, but with thp-always it's 200M. After the patcheset thp-always RSS is 400k too. Design overview. Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with zeros. The way how we allocate it changes in the patchset: - [01/10] simplest way: hzp allocated on boot time in hugepage_init(); - [09/10] lazy allocation on first use; - [10/10] lockless refcounting + shrinker-reclaimable hzp; We setup it in do_huge_pmd_anonymous_page() if area around fault address is suitable for THP and we've got read page fault. If we fail to setup hzp (ENOMEM) we fallback to handle_pte_fault() as we normally do in THP. On wp fault to hzp we allocate real memory for the huge page and clear it. If ENOMEM, graceful fallback: we create a new pmd table and set pte around fault address to newly allocated normal (4k) page. All other ptes in the pmd set to normal zero page. We cannot split hzp (and it's bug if we try), but we can split the pmd which points to it. On splitting the pmd we create a table with all ptes set to normal zero page. === By hpa's request I've tried alternative approach for hzp implementation (see Virtual huge zero page patchset): pmd table with all entries set to zero page. This way should be more cache friendly, but it increases TLB pressure. The problem with virtual huge zero page: it requires per-arch enabling. We need a way to mark that pmd table has all ptes set to zero page. Some numbers to compare two implementations (on 4s Westmere-EX): Mirobenchmark1 ============== test: posix_memalign((void **)&p, 2 * MB, 8 * GB); for (i = 0; i < 100; i++) { assert(memcmp(p, p + 4*GB, 4*GB) == 0); asm volatile ("": : :"memory"); } hzp: Performance counter stats for './test_memcmp' (5 runs): 32356.272845 task-clock # 0.998 CPUs utilized ( +- 0.13% ) 40 context-switches # 0.001 K/sec ( +- 0.94% ) 0 CPU-migrations # 0.000 K/sec 4,218 page-faults # 0.130 K/sec ( +- 0.00% ) 76,712,481,765 cycles # 2.371 GHz ( +- 0.13% ) [83.31%] 36,279,577,636 stalled-cycles-frontend # 47.29% frontend cycles idle ( +- 0.28% ) [83.35%] 1,684,049,110 stalled-cycles-backend # 2.20% backend cycles idle ( +- 2.96% ) [66.67%] 134,355,715,816 instructions # 1.75 insns per cycle # 0.27 stalled cycles per insn ( +- 0.10% ) [83.35%] 13,526,169,702 branches # 418.039 M/sec ( +- 0.10% ) [83.31%] 1,058,230 branch-misses # 0.01% of all branches ( +- 0.91% ) [83.36%] 32.413866442 seconds time elapsed ( +- 0.13% ) vhzp: Performance counter stats for './test_memcmp' (5 runs): 30327.183829 task-clock # 0.998 CPUs utilized ( +- 0.13% ) 38 context-switches # 0.001 K/sec ( +- 1.53% ) 0 CPU-migrations # 0.000 K/sec 4,218 page-faults # 0.139 K/sec ( +- 0.01% ) 71,964,773,660 cycles # 2.373 GHz ( +- 0.13% ) [83.35%] 31,191,284,231 stalled-cycles-frontend # 43.34% frontend cycles idle ( +- 0.40% ) [83.32%] 773,484,474 stalled-cycles-backend # 1.07% backend cycles idle ( +- 6.61% ) [66.67%] 134,982,215,437 instructions # 1.88 insns per cycle # 0.23 stalled cycles per insn ( +- 0.11% ) [83.32%] 13,509,150,683 branches # 445.447 M/sec ( +- 0.11% ) [83.34%] 1,017,667 branch-misses # 0.01% of all branches ( +- 1.07% ) [83.32%] 30.381324695 seconds time elapsed ( +- 0.13% ) Mirobenchmark2 ============== test: posix_memalign((void **)&p, 2 * MB, 8 * GB); for (i = 0; i < 1000; i++) { char *_p = p; while (_p < p+4*GB) { assert(*_p == *(_p+4*GB)); _p += 4096; asm volatile ("": : :"memory"); } } hzp: Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs): 3505.727639 task-clock # 0.998 CPUs utilized ( +- 0.26% ) 9 context-switches # 0.003 K/sec ( +- 4.97% ) 4,384 page-faults # 0.001 M/sec ( +- 0.00% ) 8,318,482,466 cycles # 2.373 GHz ( +- 0.26% ) [33.31%] 5,134,318,786 stalled-cycles-frontend # 61.72% frontend cycles idle ( +- 0.42% ) [33.32%] 2,193,266,208 stalled-cycles-backend # 26.37% backend cycles idle ( +- 5.51% ) [33.33%] 9,494,670,537 instructions # 1.14 insns per cycle # 0.54 stalled cycles per insn ( +- 0.13% ) [41.68%] 2,108,522,738 branches # 601.451 M/sec ( +- 0.09% ) [41.68%] 158,746 branch-misses # 0.01% of all branches ( +- 1.60% ) [41.71%] 3,168,102,115 L1-dcache-loads # 903.693 M/sec ( +- 0.11% ) [41.70%] 1,048,710,998 L1-dcache-misses # 33.10% of all L1-dcache hits ( +- 0.11% ) [41.72%] 1,047,699,685 LLC-load # 298.854 M/sec ( +- 0.03% ) [33.38%] 2,287 LLC-misses # 0.00% of all LL-cache hits ( +- 8.27% ) [33.37%] 3,166,187,367 dTLB-loads # 903.147 M/sec ( +- 0.02% ) [33.35%] 4,266,538 dTLB-misses # 0.13% of all dTLB cache hits ( +- 0.03% ) [33.33%] 3.513339813 seconds time elapsed ( +- 0.26% ) vhzp: Performance counter stats for 'taskset -c 0 ./test_memcmp2' (5 runs): 27313.891128 task-clock # 0.998 CPUs utilized ( +- 0.24% ) 62 context-switches # 0.002 K/sec ( +- 0.61% ) 4,384 page-faults # 0.160 K/sec ( +- 0.01% ) 64,747,374,606 cycles # 2.370 GHz ( +- 0.24% ) [33.33%] 61,341,580,278 stalled-cycles-frontend # 94.74% frontend cycles idle ( +- 0.26% ) [33.33%] 56,702,237,511 stalled-cycles-backend # 87.57% backend cycles idle ( +- 0.07% ) [33.33%] 10,033,724,846 instructions # 0.15 insns per cycle # 6.11 stalled cycles per insn ( +- 0.09% ) [41.65%] 2,190,424,932 branches # 80.195 M/sec ( +- 0.12% ) [41.66%] 1,028,630 branch-misses # 0.05% of all branches ( +- 1.50% ) [41.66%] 3,302,006,540 L1-dcache-loads # 120.891 M/sec ( +- 0.11% ) [41.68%] 271,374,358 L1-dcache-misses # 8.22% of all L1-dcache hits ( +- 0.04% ) [41.66%] 20,385,476 LLC-load # 0.746 M/sec ( +- 1.64% ) [33.34%] 76,754 LLC-misses # 0.38% of all LL-cache hits ( +- 2.35% ) [33.34%] 3,309,927,290 dTLB-loads # 121.181 M/sec ( +- 0.03% ) [33.34%] 2,098,967,427 dTLB-misses # 63.41% of all dTLB cache hits ( +- 0.03% ) [33.34%] 27.364448741 seconds time elapsed ( +- 0.24% ) === I personally prefer implementation present in this patchset. It doesn't touch arch-specific code. This patch: Huge zero page (hzp) is a non-movable huge page (2M on x86-64) filled with zeros. For now let's allocate the page on hugepage_init(). We'll switch to lazy allocation later. We are not going to map the huge zero page until we can handle it properly on all code paths. is_huge_zero_{pfn,pmd}() functions will be used by following patches to check whether the pfn/pmd is huge zero page. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: "H. Peter Anvin" <hpa@linux.intel.com> Cc: Mel Gorman <mel@csn.ul.ie> Acked-by: NDavid Rientjes <rientjes@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
The name of this function is not suitable, and removing the function and open-coding it into each call sites makes the code more understandable. Additionally, we shouldn't do an allocation from bootmem when slab_is_available(), so directly return kmalloc()'s return value. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Joonsoo Kim 提交于
There is no implementation of bootmem_arch_preferred_node() and a call to this function will cause a compilation error. So remove it. Signed-off-by: NJoonsoo Kim <js1304@gmail.com> Cc: Haavard Skinnemoen <hskinnemoen@gmail.com> Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal由 Linus Torvalds 提交于
Pull big execve/kernel_thread/fork unification series from Al Viro: "All architectures are converted to new model. Quite a bit of that stuff is actually shared with architecture trees; in such cases it's literally shared branch pulled by both, not a cherry-pick. A lot of ugliness and black magic is gone (-3KLoC total in this one): - kernel_thread()/kernel_execve()/sys_execve() redesign. We don't do syscalls from kernel anymore for either kernel_thread() or kernel_execve(): kernel_thread() is essentially clone(2) with callback run before we return to userland, the callbacks either never return or do successful do_execve() before returning. kernel_execve() is a wrapper for do_execve() - it doesn't need to do transition to user mode anymore. As a result kernel_thread() and kernel_execve() are arch-independent now - they live in kernel/fork.c and fs/exec.c resp. sys_execve() is also in fs/exec.c and it's completely architecture-independent. - daemonize() is gone, along with its parts in fs/*.c - struct pt_regs * is no longer passed to do_fork/copy_process/ copy_thread/do_execve/search_binary_handler/->load_binary/do_coredump. - sys_fork()/sys_vfork()/sys_clone() unified; some architectures still need wrappers (ones with callee-saved registers not saved in pt_regs on syscall entry), but the main part of those suckers is in kernel/fork.c now." * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal: (113 commits) do_coredump(): get rid of pt_regs argument print_fatal_signal(): get rid of pt_regs argument ptrace_signal(): get rid of unused arguments get rid of ptrace_signal_deliver() arguments new helper: signal_pt_regs() unify default ptrace_signal_deliver flagday: kill pt_regs argument of do_fork() death to idle_regs() don't pass regs to copy_process() flagday: don't pass regs to copy_thread() bfin: switch to generic vfork, get rid of pointless wrappers xtensa: switch to generic clone() openrisc: switch to use of generic fork and clone unicore32: switch to generic clone(2) score: switch to generic fork/vfork/clone c6x: sanitize copy_thread(), get rid of clone(2) wrapper, switch to generic clone() take sys_fork/sys_vfork/sys_clone prototypes to linux/syscalls.h mn10300: switch to generic fork/vfork/clone h8300: switch to generic fork/vfork/clone tile: switch to generic clone() ... Conflicts: arch/microblaze/include/asm/Kbuild
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM SoC board updates from Olof Johansson: "This branch contains a set of various board updates for ARM platforms. A few shmobile platforms that are stale have been removed, some defconfig updates for various boards selecting new features such as pinctrl subsystem support, and various updates enabling peripherals, etc." Fix up conflicts mostly as per Olof. * tag 'boards' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (58 commits) ARM: S3C64XX: Add dummy supplies for Glenfarclas LDOs ARM: S3C64XX: Add registration of WM2200 Bells device on Cragganmore ARM: kirkwood: Add Plat'Home OpenBlocks A6 support ARM: Dove: update defconfig ARM: Kirkwood: update defconfig for new boards arm: orion5x: add DT related options in defconfig arm: orion5x: convert 'LaCie Ethernet Disk mini v2' to Device Tree arm: orion5x: basic Device Tree support arm: orion5x: mechanical defconfig update ARM: kirkwood: Add support for the MPL CEC4 arm: kirkwood: add support for ZyXEL NSA310 ARM: Kirkwood: new board USI Topkick ARM: kirkwood: use gpio-fan DT binding on lsxl ARM: Kirkwood: add Netspace boards to defconfig ARM: kirkwood: DT board setup for Network Space Mini v2 ARM: kirkwood: DT board setup for Network Space Lite v2 ARM: kirkwood: DT board setup for Network Space v2 and parents leds: leds-ns2: add device tree binding ARM: Kirkwood: Enable the second I2C bus ARM: mmp: select pinctrl driver ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM SoC updates from Olof Johansson: "This contains the bulk of new SoC development for this merge window. Two new platforms have been added, the sunxi platforms (Allwinner A1x SoCs) by Maxime Ripard, and a generic Broadcom platform for a new series of ARMv7 platforms from them, where the hope is that we can keep the platform code generic enough to have them all share one mach directory. The new Broadcom platform is contributed by Christian Daudt. Highbank has grown support for Calxeda's next generation of hardware, ECX-2000. clps711x has seen a lot of cleanup from Alexander Shiyan, and he's also taken on maintainership of the platform. Beyond this there has been a bunch of work from a number of people on converting more platforms to IRQ domains, pinctrl conversion, cleanup and general feature enablement across most of the active platforms." Fix up trivial conflicts as per Olof. * tag 'soc' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (174 commits) mfd: vexpress-sysreg: Remove LEDs code irqchip: irq-sunxi: Add terminating entry for sunxi_irq_dt_ids clocksource: sunxi_timer: Add terminating entry for sunxi_timer_dt_ids irq: versatile: delete dangling variable ARM: sunxi: add missing include for mdelay() ARM: EXYNOS: Avoid early use of of_machine_is_compatible() ARM: dts: add node for PL330 MDMA1 controller for exynos4 ARM: EXYNOS: Add support for secondary CPU bring-up on Exynos4412 ARM: EXYNOS: add UART3 to DEBUG_LL ports ARM: S3C24XX: Add clkdev entry for camif-upll clock ARM: SAMSUNG: Add s3c24xx/s3c64xx CAMIF GPIO setup helpers ARM: sunxi: Add missing sun4i.dtsi file pinctrl: samsung: Do not initialise statics to 0 ARM i.MX6: remove gate_mask from pllv3 ARM i.MX6: Fix ethernet PLL clocks ARM i.MX6: rename PLLs according to datasheet ARM i.MX6: Add pwm support ARM i.MX51: Add pwm support ARM i.MX53: Add pwm support ARM: mx5: Replace clk_register_clkdev with clock DT lookup ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM SoC cleanups on various subarchitectures from Olof Johansson: "Cleanup patches for various ARM platforms and some of their associated drivers. There's also a branch in here that enables Freescale i.MX to be part of the multiplatform support -- the first "big" SoC that is moved over (more multiplatform work comes in a separate branch later during the merge window)." Conflicts fixed as per Olof, including a silent semantic one in arch/arm/mach-omap2/board-generic.c (omap_prcm_restart() was renamed to omap3xxx_restart(), and a new user of the old name was added). * tag 'cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (189 commits) ARM: omap: fix typo on timer cleanup ARM: EXYNOS: Remove unused regs-mem.h file ARM: EXYNOS: Remove unused non-dt support for dwmci controller ARM: Kirkwood: Use hw_pci.ops instead of hw_pci.scan ARM: OMAP3: cm-t3517: use GPTIMER for system clock ARM: OMAP2+: timer: remove CONFIG_OMAP_32K_TIMER ARM: SAMSUNG: use devm_ functions for ADC driver ARM: EXYNOS: no duplicate mask/unmask in eint0_15 ARM: S3C24XX: SPI clock channel setup is fixed for S3C2443 ARM: EXYNOS: Remove i2c0 resource information and setting of device names ARM: Kirkwood: checkpatch cleanups ARM: Kirkwood: Fix sparse warnings. ARM: Kirkwood: Remove unused includes ARM: kirkwood: cleanup lsxl board includes ARM: integrator: use BUG_ON where possible ARM: integrator: push down SC dependencies ARM: integrator: delete static UART1 mapping ARM: integrator: delete SC mapping on the CP ARM: integrator: remove static CP syscon mapping ARM: integrator: remove static AP syscon mapping ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM SoC Header cleanups from Olof Johansson: "This is a collection of header file cleanups, mostly for OMAP and AT91, that keeps moving the platforms in the direction of multiplatform by removing the need for mach-dependent header files used in drivers and other places." Fix up mostly trivial conflicts as per Olof. * tag 'headers' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (106 commits) ARM: OMAP2+: Move iommu/iovmm headers to platform_data ARM: OMAP2+: Make some definitions local ARM: OMAP2+: Move iommu2 to drivers/iommu/omap-iommu2.c ARM: OMAP2+: Move plat/iovmm.h to include/linux/omap-iommu.h ARM: OMAP2+: Move iopgtable header to drivers/iommu/ ARM: OMAP: Merge iommu2.h into iommu.h atmel: move ATMEL_MAX_UART to platform_data/atmel.h ARM: OMAP: Remove omap_init_consistent_dma_size() arm: at91: move at91rm9200 rtc header in drivers/rtc arm: at91: move reset controller header to arm/arm/mach-at91 arm: at91: move pit define to the driver arm: at91: move at91_shdwc.h to arch/arm/mach-at91 arm: at91: move board header to arch/arm/mach-at91 arn: at91: move at91_tc.h to arch/arm/mach-at91 arm: at91 move at91_aic.h to arch/arm/mach-at91 arm: at91 move board.h to arch/arm/mach-at91 arm: at91: move platfarm_data to include/linux/platform_data/atmel.h arm: at91: drop machine defconfig ARM: OMAP: Remove NEED_MACH_GPIO_H ARM: OMAP: Remove unnecessary mach and plat includes ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc由 Linus Torvalds 提交于
Pull ARM SoC Non-critical bug fixes from Olof Johansson: "Simple bug fixes that were not considered important enough for inclusion into 3.7, especially those that arrived late during the merge window. There's also a MAINTAINERS update for the Renesas platforms in here, marking Simon Horman as a maintainer and changing the git url to his tree." * tag 'fixes-non-critical' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: Update ARM/SHMOBILE section of MAINTAINERS ARM: Fix Kconfig symbols typo for LEDS ARM: pxa: add dummy SA1100 rtc clock in pxa25x ARM: pxa: fix pxa25x gpio wakeup setting ARM: OMAP4: PM: fix errata handling when CONFIG_PM=n ARM: cns3xxx: drop unnecessary symbol selection ARM: vexpress: fix ll debug code when building multiplatform ARM: OMAP4: retrigger localtimers after re-enabling gic ARM: OMAP4460: Workaround for ROM bug because of CA9 r2pX GIC control register change. ARM: OMAP4: PM: add errata support ARM: davinci: fix return value check by using IS_ERR in tnetv107x_devices_init() ARM: davinci: uncompress.h: bail out if uart not initialized ARM: davinci: serial.h: fix uart number in the comment ARM: davinci: dm644x evm: move pointer dereference below NULL check ARM: vexpress: Make the debug UART detection more specific
-
git://git.linaro.org/people/rmk/linux-arm由 Linus Torvalds 提交于
Pull ARM updates from Russell King: "Here's the updates for ARM for this merge window, which cover quite a variety of areas. There's a bunch of patch series from Will tackling various bugs like the PROT_NONE handling, ASID allocation, cluster boot protocol and ASID TLB tagging updates. We move to a build-time sorted exception table rather than doing the sorting at run-time, add support for the secure computing filter, and some updates to the perf code. We also have sorted out the placement of some headers, fixed some build warnings, fixed some hotplug problems with the per-cpu TWD code." * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (73 commits) ARM: 7594/1: Add .smp entry for REALVIEW_EB ARM: 7599/1: head: Remove boot-time HYP mode check for v5 and below ARM: 7598/1: net: bpf_jit_32: fix sp-relative load/stores offsets. ARM: 7595/1: syscall: rework ordering in syscall_trace_exit ARM: 7596/1: mmci: replace readsl/writesl with ioread32_rep/iowrite32_rep ARM: 7597/1: net: bpf_jit_32: fix kzalloc gfp/size mismatch. ARM: 7593/1: nommu: do not enable DCACHE_WORD_ACCESS when !CONFIG_MMU ARM: 7592/1: nommu: prevent generation of kernel unaligned memory accesses ARM: 7591/1: nommu: Enable the strict alignment (CR_A) bit only if ARCH < v6 ARM: 7590/1: /proc/interrupts: limit the display of IPIs to online CPUs only ARM: 7587/1: implement optimized percpu variable access ARM: 7589/1: integrator: pass the lm resource to amba ARM: 7588/1: amba: create a resource parent registrator ARM: 7582/2: rename kvm_seq to vmalloc_seq so to avoid confusion with KVM ARM: 7585/1: kernel: fix nr_cpu_ids check in DT logical map init ARM: 7584/1: perf: fix link error when CONFIG_HW_PERF_EVENTS is not selected ARM: gic: use a private mapping for CPU target interfaces ARM: kernel: add logical mappings look-up ARM: kernel: add cpu logical map DT init in setup_arch ARM: kernel: add device tree init map function ...
-
git://git.samba.org/sfrench/cifs-2.6由 Linus Torvalds 提交于
Pull CIFS fixes from Steve French: "This includes a set of misc. cifs fixes (most importantly some byte range lock related write fixes from Pavel, and some ACL and idmap related fixes from Jeff) but also includes the SMB2.02 dialect enablement, and a key fix for SMB3 mounts. Default authentication upgraded to ntlmv2 for cifs (it was already ntlmv2 for smb2)" * 'for-next' of git://git.samba.org/sfrench/cifs-2.6: (43 commits) CIFS: Fix write after setting a read lock for read oplock files cifs: parse the device name into UNC and prepath cifs: fix up handling of prefixpath= option cifs: clean up handling of unc= option cifs: fix SID binary to string conversion fix "disabling echoes and oplocks" on SMB2 mounts Do not send SMB2 signatures for SMB3 frames cifs: deal with id_to_sid embedded sid reply corner case cifs: fix hardcoded default security descriptor length cifs: extra sanity checking for cifs.idmap keys cifs: avoid extra allocation for small cifs.idmap keys cifs: simplify id_to_sid and sid_to_id mapping code CIFS: Fix possible data coherency problem after oplock break to None CIFS: Do not permit write to a range mandatory locked with a read lock cifs: rename cifs_readdir_lookup to cifs_prime_dcache and make it void return cifs: Add CONFIG_CIFS_DEBUG and rename use of CIFS_DEBUG cifs: Make CIFS_DEBUG possible to undefine SMB3 mounts fail with access denied to some servers cifs: Remove unused cEVENT macro cifs: always zero out smb_vol before parsing options ...
-
git://oss.sgi.com/xfs/xfs由 Linus Torvalds 提交于
Pull xfs update from Ben Myers: "There is plenty going on, including the cleanup of xfssyncd, metadata verifiers, CRC infrastructure for the log, tracking of inodes with speculative allocation, a cleanup of xfs_fs_subr.c, fixes for XFS_IOC_ZERO_RANGE, and important fix related to log replay (only update the last_sync_lsn when a transaction completes), a fix for deadlock on AGF buffers, documentation and comment updates, and a few more cleanups and fixes. Details: - remove the xfssyncd mess - only update the last_sync_lsn when a transaction completes - zero allocation_args on the kernel stack - fix AGF/alloc workqueue deadlock - silence uninitialised f.file warning - Update inode alloc comments - Update mount options documentation - report projid32bit feature in geometry call - speculative preallocation inode tracking - fix attr tree double split corruption - fix broken error handling in xfs_vm_writepage - drop buffer io reference when a bad bio is built - add more attribute tree trace points - growfs infrastructure changes for 3.8 - fs/xfs/xfs_fs_subr.c die die die - add CRC infrastructure - add CRC checks to the log - Remove description of nodelaylog mount option from xfs.txt - inode allocation should use unmapped buffers - byte range granularity for XFS_IOC_ZERO_RANGE - fix direct IO nested transaction deadlock - fix stray dquot unlock when reclaiming dquots - fix sparse reported log CRC endian issue" Fix up trivial conflict in fs/xfs/xfs_fsops.c due to the same patch having been applied twice (commits eaef8543 and 1375cb65: "xfs: growfs: don't read garbage for new secondary superblocks") with later updates to the affected code in the XFS tree. * tag 'for-linus-v3.8-rc1' of git://oss.sgi.com/xfs/xfs: (78 commits) xfs: fix sparse reported log CRC endian issue xfs: fix stray dquot unlock when reclaiming dquots xfs: fix direct IO nested transaction deadlock. xfs: byte range granularity for XFS_IOC_ZERO_RANGE xfs: inode allocation should use unmapped buffers. xfs: Remove the description of nodelaylog mount option from xfs.txt xfs: add CRC checks to the log xfs: add CRC infrastructure xfs: convert buffer verifiers to an ops structure. xfs: connect up write verifiers to new buffers xfs: add pre-write metadata buffer verifier callbacks xfs: add buffer pre-write callback xfs: Add verifiers to dir2 data readahead. xfs: add xfs_da_node verification xfs: factor and verify attr leaf reads xfs: factor dir2 leaf read xfs: factor out dir2 data block reading xfs: factor dir2 free block reading xfs: verify dir2 block format buffers xfs: factor dir2 block read operations ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm由 Linus Torvalds 提交于
Pull dlm updates from David Teigland: "This set fixes some conditions in which value blocks are invalidated, and includes two trivial cleanups." * tag 'dlm-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix lvb invalidation conditions fs/dlm: remove CONFIG_EXPERIMENTAL dlm: remove unused variable in *dlm_lowcomms_get_buffer()
-