1. 25 Jun, 2015 40 commits
    • powerpc/mm: tracking vDSO remap · 83d3f0e9
      Laurent Dufour committed
      Some processes (CRIU, for example) move the vDSO area using the mremap
      system call.  As a consequence, the kernel's reference to the vDSO base
      address is no longer valid, and the signal return frame built once the
      vDSO has been moved does not point to the new sigreturn address.
      
      This patch handles vDSO remapping and unmapping.
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Reviewed-by: Ingo Molnar <mingo@kernel.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
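The tracking described in this commit can be sketched in user-space C. This is only an illustration under stated assumptions: `struct mm_sketch`, `sketch_arch_remap()` and `sketch_arch_unmap()` are hypothetical names, and the sketch assumes the vDSO is always moved as a whole, so comparing the start address is enough.

```c
#include <stdint.h>

/* Hypothetical stand-in for per-process memory state; not the kernel's struct. */
struct mm_sketch {
	uintptr_t vdso_base;	/* 0 means "no vDSO mapped" in this sketch */
};

/* Called after [old_start, old_end) has been moved to [new_start, new_end). */
static void sketch_arch_remap(struct mm_sketch *mm,
			      uintptr_t old_start, uintptr_t old_end,
			      uintptr_t new_start, uintptr_t new_end)
{
	(void)old_end;
	(void)new_end;
	/* Assumption: the vDSO moves as one area, so matching its start suffices. */
	if (old_start == mm->vdso_base)
		mm->vdso_base = new_start;
}

/* Called when [start, end) is unmapped: forget the vDSO if it was inside. */
static void sketch_arch_unmap(struct mm_sketch *mm, uintptr_t start, uintptr_t end)
{
	if (start <= mm->vdso_base && mm->vdso_base < end)
		mm->vdso_base = 0;
}
```

With such tracking in place, code that builds a signal frame can read the current base instead of a stale one.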
    • mm: new arch_remap() hook · 4abad2ca
      Laurent Dufour committed
      Some architectures need to be notified when a memory area is moved
      through the mremap system call.
      
      This patch introduces a new arch_remap() mm hook which is placed in the
      path of mremap, and is called before the old area is unmapped (and the
      arch_unmap() hook is called).
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: new mm hook framework · 2ae416b1
      Laurent Dufour committed
      CRIU is recreating the process memory layout by remapping the checkpointee
      memory area on top of the current process (criu).  This includes remapping
      the vDSO to the place it has at checkpoint time.
      
      However, some architectures such as powerpc keep a reference to the
      vDSO base address in order to build the signal return stack frame, which
      calls the vDSO sigreturn service.  So once the vDSO has been moved, this
      reference is no longer valid and signal frames built later are not usable.
      
      This patch series introduces a new mm hook framework, and a new
      arch_remap hook which is called when mremap is done and the mm lock is
      still held.  The next patch adds the vDSO remap and unmap tracking to the
      powerpc architecture.
      
      This patch (of 3):
      
      This patch introduces a new set of header files to manage mm hooks:
      - a per-architecture empty header file (arch/x/include/asm/mm-arch-hooks.h)
      - a generic header (include/linux/mm-arch-hooks.h)
      
      An architecture which needs to override a hook has to redefine it in its
      header file, while an architecture which doesn't need to has nothing to do.
      
      The default hooks are defined in the generic header and are used when
      the architecture does not define them.
      
      In a next step, mm hooks defined in include/asm-generic/mm_hooks.h should
      be moved here.
      Signed-off-by: Laurent Dufour <ldufour@linux.vnet.ibm.com>
      Suggested-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
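The override mechanism described in this commit (an architecture header that may redefine a hook, and a generic header that supplies defaults) can be sketched in a single C file. This is a minimal illustration, not the kernel headers: the hook signature is shortened, `remap_calls` exists only for demonstration, and `arch_other_hook` is a hypothetical second hook that the "architecture" leaves alone.

```c
/* --- What an architecture header might contain (hypothetical arch) --- */
static int remap_calls;	/* demonstration only: counts hook invocations */

static inline void arch_remap(unsigned long old_start, unsigned long new_start)
{
	(void)old_start;
	(void)new_start;
	remap_calls++;
}
#define arch_remap arch_remap	/* signals that this hook is already defined */

/* --- What the generic header might contain --- */
#ifndef arch_remap
/* Default: do nothing. Skipped here because the arch defined its own hook. */
static inline void arch_remap(unsigned long old_start, unsigned long new_start)
{
	(void)old_start;
	(void)new_start;
}
#define arch_remap arch_remap
#endif

#ifndef arch_other_hook
/* Hypothetical hook the arch did not override, so the default is used. */
static inline int arch_other_hook(void)
{
	return 0;
}
#define arch_other_hook arch_other_hook
#endif
```

Defining a macro with the same name as the function lets the generic header test for an override with a plain `#ifndef`, so architectures that need nothing can ship an empty header.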
    • mm/hugetlb: reduce arch dependent code about huge_pmd_unshare · e81f2d22
      Zhang Zhen committed
      Currently we have many duplicates in definitions of huge_pmd_unshare.  In
      all architectures this function just returns 0 when
      CONFIG_ARCH_WANT_HUGE_PMD_SHARE is N.
      
      This patch puts the default implementation in mm/hugetlb.c and lets these
      architectures use the common code.
      Signed-off-by: Zhang Zhen <zhenzhang.zhang@huawei.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: James Yang <James.Yang@freescale.com>
      Cc: Aneesh Kumar <aneesh.kumar@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: fix mprotect() behaviour on VM_LOCKED VMAs · 36f88188
      Kirill A. Shutemov committed
      On mlock(2) we trigger COW on private writable VMA to avoid faults in
      future.
      
      mm/gup.c:
       long populate_vma_page_range(struct vm_area_struct *vma,
                       unsigned long start, unsigned long end, int *nonblocking)
       {
       ...
                * We want to touch writable mappings with a write fault in order
                * to break COW, except for shared mappings because these don't COW
                * and we would not want to dirty them for nothing.
                */
               if ((vma->vm_flags & (VM_WRITE | VM_SHARED)) == VM_WRITE)
                       gup_flags |= FOLL_WRITE;
      
      But we miss this case when we make VM_LOCKED VMA writeable via
      mprotect(2). The test case:
      
      	#define _GNU_SOURCE
      	#include <fcntl.h>
      	#include <stdio.h>
      	#include <stdlib.h>
      	#include <unistd.h>
      	#include <sys/mman.h>
      	#include <sys/resource.h>
      	#include <sys/stat.h>
      	#include <sys/time.h>
      	#include <sys/types.h>
      
      	#define PAGE_SIZE 4096
      
      	int main(int argc, char **argv)
      	{
      		struct rusage usage;
      		long before;
      		char *p;
      		int fd;
      
      		/* Create a file and populate first page of page cache */
      		fd = open("/tmp", O_TMPFILE | O_RDWR, S_IRUSR | S_IWUSR);
      		write(fd, "1", 1);
      
      		/* Create a *read-only* *private* mapping of the file */
      		p = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_PRIVATE, fd, 0);
      
      		/*
      		 * Since the mapping is read-only, mlock() will populate the mapping
      		 * with PTEs pointing to page cache without triggering COW.
      		 */
      		mlock(p, PAGE_SIZE);
      
      		/*
      		 * Mapping became read-write, but it's still populated with PTEs
      		 * pointing to page cache.
      		 */
      		mprotect(p, PAGE_SIZE, PROT_READ | PROT_WRITE);
      
      		getrusage(RUSAGE_SELF, &usage);
      		before = usage.ru_minflt;
      
      		/* Trigger COW: fault in mlock()ed VMA. */
      		*p = 1;
      
      		getrusage(RUSAGE_SELF, &usage);
      		printf("faults: %ld\n", usage.ru_minflt - before);
      
      		return 0;
      	}
      
      	$ ./test
      	faults: 1
      
      Let's fix it by triggering population of the VMA in mprotect_fixup() on
      this condition. We don't care about population errors, as we don't in
      other similar cases, e.g. mremap.
      
      [akpm@linux-foundation.org: tweak comment text]
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
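The condition under which mprotect should populate the range can be expressed as a small predicate. The sketch below is an assumption about the shape of that check, written for user space: the `SK_VM_*` values and `needs_populate()` are hypothetical and do not match the kernel's flag definitions.

```c
#include <stdbool.h>

/* Illustrative flag values for this sketch only; the kernel defines its own. */
#define SK_VM_WRITE  0x1UL
#define SK_VM_SHARED 0x2UL
#define SK_VM_LOCKED 0x4UL

/*
 * True when a locked, private, previously non-writable mapping becomes
 * writable: the case where COW should be triggered up front.
 */
static bool needs_populate(unsigned long oldflags, unsigned long newflags)
{
	unsigned long mask = SK_VM_WRITE | SK_VM_SHARED | SK_VM_LOCKED;

	return (oldflags & mask) == SK_VM_LOCKED && (newflags & SK_VM_WRITE) != 0;
}
```

Shared mappings are excluded for the same reason the mm/gup.c comment gives: they do not COW, so touching them would only dirty pages.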
    • thp: cleanup how khugepaged enters freezer · cd092411
      Jiri Kosina committed
      khugepaged_do_scan() checks in every iteration whether freezing(current)
      is true, and in such case breaks out of the loop, which causes
      try_to_freeze() to be called immediately afterwards in
      khugepaged_wait_work().
      
      If nothing else, this causes an unnecessary freezing(current) test, and also
      makes the way khugepaged enters the freezer a bit less obvious than necessary.
      
      Let's just try to freeze directly, instead of splitting it into two
      (directly adjacent) phases.
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hwpoison: remove obsolete "Notebook" todo list · ebb09738
      Andi Kleen committed
      All the items mentioned here have been either addressed, or were not
      really needed.  So just remove the comment.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, hwpoison: add comment describing when to add new cases · e0de78df
      Andi Kleen committed
      Here's another comment fix for hwpoison.
      
      It describes the "guiding principle" on when to add new
      memory error recovery code.
      Signed-off-by: Andi Kleen <ak@linux.intel.com>
      Acked-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • linux/slab.h: fix three off-by-one typos in comment · 1ed58b60
      Rasmus Villemoes committed
      The first is a keyboard-off-by-one, the other two the ordinary mathy kind.
      Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
      Acked-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • slab: correct size_index table before replacing the bootstrap kmem_cache_node · 34cc6990
      Daniel Sanders committed
      This patch moves the initialization of the size_index table slightly
      earlier so that the first few kmem_cache_node's can be safely allocated
      when KMALLOC_MIN_SIZE is large.
      
      There are currently two ways to generate indices into kmalloc_caches (via
      kmalloc_index() and via the size_index table in slab_common.c) and on some
      arches (possibly only MIPS) they potentially disagree with each other
      until create_kmalloc_caches() has been called.  It seems that the
      intention is that the size_index table is a fast equivalent to
      kmalloc_index() and that create_kmalloc_caches() patches the table to
      return the correct value for the cases where kmalloc_index()'s
      if-statements apply.
      
      The failing sequence was:
      * kmalloc_caches contains NULL elements
      * kmem_cache_init initialises the element that 'struct
        kmem_cache_node' will be allocated to. For 32-bit Mips, this is a
        56-byte struct and kmalloc_index returns KMALLOC_SHIFT_LOW (7).
      * init_list is called which calls kmalloc_node to allocate a 'struct
        kmem_cache_node'.
      * kmalloc_slab selects the kmem_caches element using
        size_index[size_index_elem(size)]. For MIPS, size is 56, and the
        expression returns 6.
      * This element of kmalloc_caches is NULL and allocation fails.
      * If it had not already failed, it would have called
        create_kmalloc_caches() at this point which would have changed
        size_index[size_index_elem(size)] to 7.
      
      I don't believe the bug to be LLVM specific but GCC doesn't normally
      encounter the problem.  I haven't been able to identify exactly what GCC
      is doing better (probably inlining) but it seems that GCC is managing to
      optimize to the point that it eliminates the problematic allocations.
      This theory is supported by the fact that GCC can be made to fail in the
      same way by changing inline, __inline, __inline__, and __always_inline in
      include/linux/compiler-gcc.h such that they don't actually inline things.
      Signed-off-by: Daniel Sanders <daniel.sanders@imgtec.com>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
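The disagreement between the two lookup paths in the failing sequence above can be reproduced with a simplified model. Everything here is a sketch under assumptions: the `sk_` names are hypothetical, the table covers only sizes up to 64 bytes, the minimum size is fixed at 128 (shift 7, as in the MIPS case described), and the special 96- and 192-byte caches are ignored.

```c
#define SK_KMALLOC_MIN_SIZE  128	/* assumed large minimum, as in the MIPS report */
#define SK_KMALLOC_SHIFT_LOW 7		/* log2(SK_KMALLOC_MIN_SIZE) */

/* Table path: one entry per 8 bytes, covering sizes 8..64 only. */
static int sk_size_index[8] = { 3, 4, 5, 5, 6, 6, 6, 6 };

static unsigned int sk_size_index_elem(unsigned int bytes)
{
	return (bytes - 1) / 8;
}

/* If-statement path: small sizes are clamped to the minimum cache. */
static int sk_kmalloc_index(unsigned int size)
{
	int shift = SK_KMALLOC_SHIFT_LOW;

	if (size <= SK_KMALLOC_MIN_SIZE)
		return SK_KMALLOC_SHIFT_LOW;
	while ((1u << shift) < size)	/* simplified: powers of two only */
		shift++;
	return shift;
}

/* The fix-up that has to run before the table is first consulted. */
static void sk_fix_size_index(void)
{
	unsigned int i;

	for (i = 8; i < SK_KMALLOC_MIN_SIZE && i <= 64; i += 8)
		sk_size_index[sk_size_index_elem(i)] = SK_KMALLOC_SHIFT_LOW;
}
```

Before `sk_fix_size_index()` runs, a 56-byte request maps to index 6 through the table but to index 7 through `sk_kmalloc_index()`; after the fix-up both paths agree, which is why the ordering of the initialization matters.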
    • mm/slab_common: support the slub_debug boot option on specific object size · 4066c33d
      Gavin Guo committed
      The slub_debug=PU,kmalloc-xx boot option cannot work because, in
      create_kmalloc_caches(), s->name is set only after create_kmalloc_cache()
      has been called.  The name is still NULL inside create_kmalloc_cache(), so
      kmem_cache_flags() does not apply the slub_debug flags to s->flags.  The
      fix sets up a kmalloc_names string array for initialization and deletes
      the dynamic name creation of kmalloc_caches.
      
      [akpm@linux-foundation.org: s/kmalloc_names/kmalloc_info/, tweak comment text]
      Signed-off-by: Gavin Guo <gavin.guo@canonical.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • xtensa: use for_each_sg() · 3693a84d
      Akinobu Mita committed
      This replaces the plain loop over the sglist array with for_each_sg()
      macro which consists of sg_next() function calls.  Since xtensa doesn't
      select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in
      order to loop over each sg element.  But this can help find problems
      with drivers that do not properly initialize their sg tables when
      CONFIG_DEBUG_SG is enabled.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
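Why an iterator built on a "next" function is safer than plain array indexing can be shown with a simplified model of a chained scatterlist. This is a user-space sketch, not the kernel's scatterlist: `struct sk_sg`, its `chain` and `is_last` fields, and the `sk_` helpers are hypothetical simplifications.

```c
#include <stddef.h>

struct sk_sg {
	unsigned int length;	/* payload length of a data entry */
	struct sk_sg *chain;	/* non-NULL: this slot links to the next array */
	int is_last;		/* non-zero: final data entry of the list */
};

/* Step to the next data entry, following a chain link if one is found. */
static struct sk_sg *sk_sg_next(struct sk_sg *sg)
{
	if (sg->is_last)
		return NULL;
	sg++;
	if (sg->chain)
		sg = sg->chain;
	return sg;
}

#define sk_for_each_sg(sglist, sg, nr, i) \
	for ((i) = 0, (sg) = (sglist); (i) < (nr); (i)++, (sg) = sk_sg_next(sg))

/* Sum the lengths of nents data entries, crossing chained arrays as needed. */
static unsigned int sk_total_length(struct sk_sg *sglist, int nents)
{
	struct sk_sg *sg;
	unsigned int total = 0;
	int i;

	sk_for_each_sg(sglist, sg, nents, i)
		total += sg->length;
	return total;
}
```

A loop that indexes `sglist[i]` directly would treat the link slot as data and run past the end of the first array, whereas the iterator visits only real entries. On architectures without chaining both approaches visit the same entries.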
    • procfs: treat parked tasks as sleeping for task state · f51c0eae
      Chris Metcalf committed
      Allowing watchdog threads to be parked means that we now have the
      opportunity of actually seeing persistent parked threads in the output
      of /proc/<pid>/stat and /proc/<pid>/status.  The existing code reported
      such threads as "Running", which is kind-of true if you think of the
      case where we park them as part of taking cpus offline.  But if we allow
      parking them indefinitely, "Running" is pretty misleading, so we report
      them as "Sleeping" instead.
      
      We could simply report them with a new string, "Parked", but it feels
      like it's a bit risky for userspace to see unexpected new values; the
      output is already documented in Documentation/filesystems/proc.txt, and
      it seems like a mistake to change that lightly.
      
      The scheduler does report parked tasks with a "P" in debugging output
      from sched_show_task() or dump_cpu_task(), but that's a different API.
      Similarly, the trace_ctxwake_* routines report a "P" for parked tasks,
      but again, different API.
      
      This change seemed slightly cleaner than updating the task_state_array
      to have additional rows.  TASK_DEAD should be subsumed by the exit_state
      bits; TASK_WAKEKILL is just a modifier; and TASK_WAKING can very
      reasonably be reported as "Running" (as it is now).  Only TASK_PARKED
      shows up with unreasonable output here.
      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • watchdog: add watchdog_cpumask sysctl to assist nohz · fe4ba3c3
      Chris Metcalf committed
      Change the default behavior of watchdog so it only runs on the
      housekeeping cores when nohz_full is enabled at build and boot time.
      Allow modifying the set of cores the watchdog is currently running on
      with a new kernel.watchdog_cpumask sysctl.
      
      In the current system, the watchdog subsystem runs a periodic timer that
      schedules the watchdog kthread to run.  However, nohz_full cores are
      designed to allow userspace application code running on those cores to
      have 100% access to the CPU.  So the watchdog system prevents the
      nohz_full application code from being able to run the way it wants to,
      thus the motivation to suppress the watchdog on nohz_full cores, which
      this patchset provides by default.
      
      However, if we disable the watchdog globally, then the housekeeping
      cores can't benefit from the watchdog functionality.  So we allow
      disabling it only on some cores.  See Documentation/lockup-watchdogs.txt
      for more information.
      
      [jhubbard@nvidia.com: fix a watchdog crash in some configurations]
      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
      Acked-by: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • smpboot: allow excluding cpus from the smpboot threads · b5242e98
      Chris Metcalf committed
      This patch series allows the watchdog to run by default only on the
      housekeeping cores when nohz_full is in effect; this seems to be a good
      compromise short of turning it off completely (since the nohz_full cores
      can't tolerate a watchdog).
      
      To provide customizability, we add /proc/sys/kernel/watchdog_cpumask so
      that the set of cores running the watchdog can be tuned to different
      values after bootup.
      
      To implement this customizability, we add a new
      smpboot_update_cpumask_percpu_thread() API to the smpboot_thread
      subsystem that lets us park or unpark "unwanted" threads.
      
      And now that threads can be parked for long periods of time, we tweak the
      /proc/<pid>/stat and /proc/<pid>/status code so parked threads aren't
      reported as running, which is otherwise confusing.
      
      This patch (of 3):
      
      This change allows some cores to be excluded from running the
      smp_hotplug_thread tasks.  The following commit to update
      kernel/watchdog.c to use this functionality is the motivating example, and
      more information on the motivation is provided there.
      
      A new smp_hotplug_thread field is introduced, "cpumask", which is a cpumask
      field managed by the smpboot subsystem that indicates whether or not the
      given smp_hotplug_thread should run on that core; the cpumask is checked
      when deciding whether to unpark the thread.
      
      To limit the cpumask to less than cpu_possible, you must call
      smpboot_update_cpumask_percpu_thread() after registering.
      Signed-off-by: Chris Metcalf <cmetcalf@ezchip.com>
      Cc: Don Zickus <dzickus@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
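One way to picture a cpumask update is as two set differences: CPUs that leave the mask get their thread parked, and CPUs that enter it get their thread unparked. The sketch below models that with a 64-bit bitmask; `sk_cpumask`, `struct sk_mask_update` and `sk_update_cpumask()` are hypothetical names, and the actual parking is left out.

```c
#include <stdint.h>

/* One bit per CPU; a hypothetical stand-in for the kernel's cpumask type. */
typedef uint64_t sk_cpumask;

struct sk_mask_update {
	sk_cpumask to_park;	/* CPUs leaving the mask */
	sk_cpumask to_unpark;	/* CPUs entering the mask */
};

/* Replace *cur with new_mask and report which CPUs change state. */
static struct sk_mask_update sk_update_cpumask(sk_cpumask *cur, sk_cpumask new_mask)
{
	struct sk_mask_update u;

	u.to_park = *cur & ~new_mask;
	u.to_unpark = new_mask & ~*cur;
	*cur = new_mask;
	return u;
}
```

CPUs present in both the old and the new mask are untouched, so a thread that is already running in the right place is not disturbed by the update.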
    • sparc: use for_each_sg() · 8c07a308
      Akinobu Mita committed
      This replaces the plain loop over the sglist array with for_each_sg()
      macro which consists of sg_next() function calls.  Since sparc does select
      ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order to loop
      over each sg element.  This also helps find problems with drivers that do
      not properly initialize their sg tables when CONFIG_DEBUG_SG is enabled.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parisc: use for_each_sg() · 210bff6d
      Akinobu Mita committed
      This replaces the plain loop over the sglist array with for_each_sg()
      macro which consists of sg_next() function calls.  Since parisc doesn't
      select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in
      order to loop over each sg element.  But this can help find problems with
      drivers that do not properly initialize their sg tables when
      CONFIG_DEBUG_SG is enabled.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Helge Deller <deller@gmx.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: mark local functions as static · b519ea6d
      Joseph Qi committed
      Some functions are only used locally, so mark them as static.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: use swap() in ocfs2_double_lock() · ab1ba021
      Fabian Frederick committed
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Julia Lawall <julia.lawall@lip6.fr>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
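For readers unfamiliar with the idiom, a type-generic swap macro in the style of the one in kernel.h can be sketched as follows. The names `sk_swap`, `struct sk_rec` and `sk_swap_recs()` are hypothetical, and the macro relies on the GNU C `__typeof__` extension, so it assumes GCC or Clang.

```c
/* Type-generic swap; relies on the GNU C __typeof__ extension. */
#define sk_swap(a, b) \
	do { __typeof__(a) sk_tmp = (a); (a) = (b); (b) = sk_tmp; } while (0)

/* Hypothetical record type, used only to show that structs swap too. */
struct sk_rec {
	unsigned long long cpos;
	unsigned int refcount;
};

/* Replaces the usual three-line "tmp = *l; *l = *r; *r = tmp;" sequence. */
static void sk_swap_recs(struct sk_rec *l, struct sk_rec *r)
{
	sk_swap(*l, *r);
}
```

Because the temporary takes its type from the first argument, the same macro works for pointers, integers and whole structures without a hand-written temporary at each call site.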
    • ocfs2: use swap() in swap_refcount_rec() · a612543f
      Fabian Frederick committed
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Julia Lawall <julia.lawall@lip6.fr>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: use swap() in dx_leaf_sort_swap() · 2a28f98c
      Fabian Frederick committed
      Use kernel.h macro definition.
      
      Thanks to Julia Lawall for Coccinelle scripting support.
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Julia Lawall <julia.lawall@lip6.fr>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix wrong check in ocfs2_direct_IO_get_blocks · ae1f0814
      Joseph Qi committed
      contig_blocks returned by ocfs2_extent_map_get_blocks is a block count and
      cannot be compared directly with clusters_to_alloc.  So convert it to
      clusters first.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Reviewed-by: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: fix NULL pointer dereference in function ocfs2_abort_trigger() · 74e364ad
      Xue jiufei committed
      ocfs2_abort_trigger() uses bh->b_assoc_map to get sb.  But nothing in
      ocfs2 sets bh->b_assoc_map, so calling this function triggers a NULL
      pointer dereference.  We can get sb from bh->b_bdev->bd_super instead of
      b_assoc_map.
      
      [akpm@linux-foundation.org: update comment, per Joseph]
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: o2net: should remove debugfs in o2net_init() out branch · fce56d84
      alex chen committed
      Signed-off-by: Alex Chen <alex.chen@huawei.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove OCFS2_IOCB_SEM lock type in direct io · fa5a0eb3
      WeiWei Wang committed
      In ocfs2 direct read/write, OCFS2_IOCB_SEM lock type is used to protect
      inode->i_alloc_sem rw semaphore lock in the earlier kernel version.
      However, in the latest kernel, inode->i_alloc_sem rw semaphore lock is not
      used at all, so OCFS2_IOCB_SEM lock type needs to be removed.
      Signed-off-by: Weiwei Wang <wangww631@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Reviewed-by: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: do not BUG if jbd2_journal_dirty_metadata fails · e272e7f0
      Joseph Qi committed
      jbd2_journal_dirty_metadata may fail.  Currently ocfs2_journal_dirty does
      not handle a non-zero return value and just calls BUG.  This patch aborts
      the handle and journal instead of calling BUG.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: joyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Junxiao Bi <junxiao.bi@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: remove BUG_ON(!empty_extent) in __ocfs2_rotate_tree_left() · 099768b0
      Xue jiufei committed
      ocfs2_rotate_tree_left() calls __ocfs2_rotate_tree_left() for left
      rotation when a non-rightmost path contains an empty extent in the leaf
      block.  __ocfs2_rotate_tree_left() returns -EAGAIN if the right subtree has
      an empty extent and passes the empty_extent_path to the caller.  The caller
      ocfs2_rotate_tree_left() will restart rotation from the returned path.
      
      It will trigger the BUG_ON(!ocfs2_is_empty_extent) when the et on disk
      is as follows:
      
      eb0 is the leaf block of the path (say path_a) passed to
      ocfs2_rotate_tree_left, which has an empty rec[0].
      
      eb1 is the leaf block of the path (say path_b) just to the right of path_a,
      which has no empty record.
      
      eb2 is the leaf block of the path (say path_c) just to the right of path_b,
      which has an empty rec[0].  path_c is also the rightmost path.
      
      Now we want to remove the empty rec[0] in eb0:
      
      ocfs2_rotate_tree_left:
        -> call __ocfs2_rotate_tree_left with path_a as its input *path*
          -> call ocfs2_rotate_subtree_left with path_a as its input
             *left_path* and path_b as its input *right_path*. it will move
             rec[0] in eb1 to eb0, and rec[0] in eb0 is not empty now.
          -> continue to call ocfs2_rotate_subtree_left with path_b as its
             input *left_path* and path_c as its input *right_path*, and
             return -EAGAIN because eb2 has an empty rec[0]
        -> call __ocfs2_rotate_tree_left with path_c as its input, rotate all
           records in eb2 to left and return 0.
        -> call __ocfs2_rotate_tree_left with path_a as its input, and triggers
           the BUG_ON(!ocfs2_is_empty_extent) as the rec[0] in eb0 is not empty.
      
      So the BUG_ON() should be removed, and the function should return 0 if
      rec[0] is no longer an empty extent.
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: return error when ocfs2_figure_merge_contig_type() fails · 9f99ad08
      Xue jiufei committed
      ocfs2_figure_merge_contig_type() still returns CONTIG_NONE when an error
      occurs, which can cause unpredictable errors later.  So return a proper
      errno when ocfs2_figure_merge_contig_type() fails.
      Signed-off-by: joyce.xue <xuejiufei@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2/dlm: cleanup unused function __dlm_wait_on_lockres_flags_set · 345dc681
      Joseph Qi committed
      __dlm_wait_on_lockres_flags_set() is declared but neither implemented nor
      used.  So remove it.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ocfs2: use retval instead of status for checking error · 2e173152
      Daeseok Youn committed
      The use of 'status' in __ocfs2_add_entry() can return the wrong value.
      
      The return values of some functions called in __ocfs2_add_entry(), e.g.
      ocfs2_journal_access_di(), are saved to 'status'.  But 'status' is not
      used at the 'bail' label to return the result of __ocfs2_add_entry().
      
      So use retval instead of status.
      Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
      Reviewed-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
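The bug pattern this commit message describes, an error stored in one variable while the function returns another, can be reduced to a small user-space sketch. Everything here is an assumption made for illustration: `journal_access()` is a hypothetical stand-in for a call such as ocfs2_journal_access_di(), and the initial value of `retval` is chosen arbitrarily rather than taken from the kernel source.

```c
#include <errno.h>

/* Hypothetical helper standing in for a call that may fail. */
static int journal_access(int should_fail)
{
    return should_fail ? -EIO : 0;
}

/* Buggy shape: the error lands in 'status', but 'bail' returns 'retval'. */
int add_entry_buggy(int should_fail)
{
    int retval = 0;   /* assumed initial value, for illustration only */
    int status;

    status = journal_access(should_fail);
    if (status < 0)
        goto bail;    /* the error code is dropped here */

bail:
    return retval;
}

/* Fixed shape: the same variable carries the error to the return. */
int add_entry_fixed(int should_fail)
{
    int retval;

    retval = journal_access(should_fail);
    if (retval < 0)
        goto bail;

    retval = 0;
bail:
    return retval;
}
```

With a failing helper, the buggy variant reports success while the fixed variant propagates the negative errno to its caller.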
    • J
      ocfs2: fix a tiny race when truncate dio orohaned entry · cf1776a9
      Joseph Qi committed
      If a dio crashes, it leaves an entry in the orphan dir, and the orphan
      scan takes care of the cleanup.  There is a tiny race in which the same
      entry is truncated twice, which then triggers the BUG in
      ocfs2_del_inode_from_orphan.
      Signed-off-by: Joseph Qi <joseph.qi@huawei.com>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      ocfs2: remove __mlog_cpu_guess · e327284a
      Andrew Morton committed
      raw_smp_processor_id() is the means of avoiding the runtime preemptibility
      check.
      
      [akpm@linux-foundation.org: fix printk warning]
      Cc: Joe Perches <joe@perches.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • J
      ocfs2: reduce object size of mlog uses · 7c2bd2f9
      Joe Perches committed
      Using a function for __mlog_printk instead of a macro reduces the object
      size of built-in.o by about 190KB, or ~18% overall (x86-64 defconfig
      with all ocfs2 options)
      
        $ size fs/ocfs2/built-in.o*
           text    data     bss     dec     hex filename
         870954  118471  134408 1123833  1125f9 fs/ocfs2/built-in.o,new
        1064081  118071  134408 1316560  1416d0 fs/ocfs2/built-in.o.old
      
      Miscellanea:
      
       - Move the used-once __mlog_cpu_guess statement expression macro to the
         masklog.c file above the use in __mlog_printk function
      
       - Simplify the mlog macro moving the and/or logic and level code into
         __mlog_printk
      
      [akpm@linux-foundation.org: export __mlog_printk() to other ocfs2 modules]
      Signed-off-by: Joe Perches <joe@perches.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
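The figures quoted in this commit message can be checked against its `size` output with a short calculation. The helper names `bytes_saved()` and `percent_saved()` are hypothetical; the inputs are the text and dec columns from the table in the message.

```c
/* Absolute reduction in bytes between the old and new object. */
long bytes_saved(long old_size, long new_size)
{
    return old_size - new_size;
}

/* Reduction expressed as a percentage of the old size. */
double percent_saved(long old_size, long new_size)
{
    return 100.0 * (double)(old_size - new_size) / (double)old_size;
}
```

Applied to the text column (1064081 old, 870954 new), this gives 193127 bytes, roughly 189 KiB and a little over 18%, which appears to be where the "about 190KB, or ~18%" figure comes from. Applied to the dec column (1316560 old, 1123833 new), the reduction is 192727 bytes, a little under 15% of the whole object.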
    • F
      configfs: unexport/make static config_item_init() · 5286d20c
      Fabian Frederick committed
      config_item_init() is only used in item.c
      Signed-off-by: Fabian Frederick <fabf@skynet.be>
      Cc: Joel Becker <jlbec@evilplan.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      NTFS: use kvfree() in ntfs_free() · b0cbeee7
      Pekka Enberg committed
      Use kvfree() instead of open-coding it.
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      Cc: Anton Altaparmakov <anton@tuxera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      fsnotify: remove obsolete documentation · c3cddc4c
      Nikolay Borisov committed
      should_send_event is no longer part of struct fsnotify_ops, so remove it.
      Signed-off-by: Nikolay Borisov <kernel@kyup.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      powerpc: use for_each_sg() · 5935877a
      Akinobu Mita committed
      This replaces the plain loop over the sglist array with the for_each_sg()
      macro, which consists of sg_next() function calls.  Since powerpc does
      select ARCH_HAS_SG_CHAIN, it is necessary to use for_each_sg() in order
      to loop over each sg element.  This also helps find problems with drivers
      that do not properly initialize their sg tables when CONFIG_DEBUG_SG is
      enabled.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
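Why a plain indexed loop is not enough once scatterlists can be chained can be shown with a simplified user-space model. This is not the kernel's `struct scatterlist`: the `struct sg_entry` layout and the names `sg_next_model()`, `for_each_sg_model()` and `total_length()` are hypothetical, and only the shape of the iteration mirrors for_each_sg() stepping through sg_next().

```c
#include <stddef.h>

/* Simplified model of a scatterlist slot. */
struct sg_entry {
    unsigned int length;     /* payload length of a data slot */
    struct sg_entry *chain;  /* non-NULL: this slot only links to another array */
    int is_last;             /* marks the final data slot of the whole list */
};

/* Step to the next data slot, following a chain link if one is reached. */
static struct sg_entry *sg_next_model(struct sg_entry *sg)
{
    if (sg->is_last)
        return NULL;
    sg++;
    if (sg->chain)
        sg = sg->chain;
    return sg;
}

/* Visit nr data slots, letting sg_next_model() hide the chaining. */
#define for_each_sg_model(sglist, sg, nr, i) \
    for ((i) = 0, (sg) = (sglist); (i) < (nr); (i)++, (sg) = sg_next_model(sg))

/* Sum the lengths of all nents data slots in a possibly chained list. */
unsigned int total_length(struct sg_entry *sglist, int nents)
{
    struct sg_entry *sg;
    unsigned int total = 0;
    int i;

    for_each_sg_model(sglist, sg, nents, i)
        total += sg->length;
    return total;
}
```

A loop written as `sglist[i]` would treat the link slot as data and then run past the end of the first array, which is the situation the commit message guards against on architectures that support chaining.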
    • A
      metag: use for_each_sg() · ae70a7bb
      Akinobu Mita committed
      This replaces the plain loop over the sglist array with the for_each_sg()
      macro, which consists of sg_next() function calls.  Since metag doesn't
      select ARCH_HAS_SG_CHAIN, it is not necessary to use for_each_sg() in
      order to loop over each sg element.  But this can help find problems
      with drivers that do not properly initialize their sg tables when
      CONFIG_DEBUG_SG is enabled.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • L
      Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux · e3d8238d
      Linus Torvalds committed
      Pull arm64 updates from Catalin Marinas:
       "Mostly refactoring/clean-up:
      
         - CPU ops and PSCI (Power State Coordination Interface) refactoring
           following the merging of the arm64 ACPI support, together with
           handling of Trusted (secure) OS instances
      
         - Using fixmap for permanent FDT mapping, removing the initial dtb
           placement requirements (within 512MB from the start of the kernel
           image).  This required moving the FDT self reservation out of the
           memreserve processing
      
         - Idmap (1:1 mapping used for MMU on/off) handling clean-up
      
         - Removing flush_cache_all() - not safe on ARM unless the MMU is off.
           Last stages of CPU power down/up are handled by firmware already
      
         - "Alternatives" (run-time code patching) refactoring and support for
           immediate branch patching, GICv3 CPU interface access
      
         - User faults handling clean-up
      
        And some fixes:
      
         - Fix for VDSO building with broken ELF toolchains
      
         - Fix another case of init_mm.pgd usage for user mappings (during
           ASID roll-over broadcasting)
      
         - Fix for FPSIMD reloading after CPU hotplug
      
         - Fix for missing syscall trace exit
      
         - Workaround for .inst asm bug
      
         - Compat fix for switching the user tls tpidr_el0 register"
      
      * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (42 commits)
        arm64: use private ratelimit state along with show_unhandled_signals
        arm64: show unhandled SP/PC alignment faults
        arm64: vdso: work-around broken ELF toolchains in Makefile
        arm64: kernel: rename __cpu_suspend to keep it aligned with arm
        arm64: compat: print compat_sp instead of sp
        arm64: mm: Fix freeing of the wrong memmap entries with !SPARSEMEM_VMEMMAP
        arm64: entry: fix context tracking for el0_sp_pc
        arm64: defconfig: enable memtest
        arm64: mm: remove reference to tlb.S from comment block
        arm64: Do not attempt to use init_mm in reset_context()
        arm64: KVM: Switch vgic save/restore to alternative_insn
        arm64: alternative: Introduce feature for GICv3 CPU interface
        arm64: psci: fix !CONFIG_HOTPLUG_CPU build warning
        arm64: fix bug for reloading FPSIMD state after CPU hotplug.
        arm64: kernel thread don't need to save fpsimd context.
        arm64: fix missing syscall trace exit
        arm64: alternative: Work around .inst assembler bugs
        arm64: alternative: Merge alternative-asm.h into alternative.h
        arm64: alternative: Allow immediate branch as alternative instruction
        arm64: Rework alternate sequence for ARM erratum 845719
        ...
    • L
      Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm · 4e241557
      Linus Torvalds committed
      Pull first batch of KVM updates from Paolo Bonzini:
       "The bulk of the changes here is for x86.  And for once it's not for
        silicon that no one owns: these are really new features for everyone.
      
        Details:
      
         - ARM:
              several features are in progress but missed the 4.2 deadline.
              So here is just a smattering of bug fixes, plus enabling the
              VFIO integration.
      
         - s390:
              Some fixes/refactorings/optimizations, plus support for 2GB
              pages.
      
         - x86:
              * host and guest support for marking kvmclock as a stable
                scheduler clock.
              * support for write combining.
              * support for system management mode, needed for secure boot in
                guests.
              * a bunch of cleanups required for the above
              * support for virtualized performance counters on AMD
              * legacy PCI device assignment is deprecated and defaults to "n"
                in Kconfig; VFIO replaces it
      
              On top of this there are also bug fixes and eager FPU context
              loading for FPU-heavy guests.
      
         - Common code:
              Support for multiple address spaces; for now it is used only for
              x86 SMM but the s390 folks also have plans"
      
      * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (124 commits)
        KVM: s390: clear floating interrupt bitmap and parameters
        KVM: x86/vPMU: Enable PMU handling for AMD PERFCTRn and EVNTSELn MSRs
        KVM: x86/vPMU: Implement AMD vPMU code for KVM
        KVM: x86/vPMU: Define kvm_pmu_ops to support vPMU function dispatch
        KVM: x86/vPMU: introduce kvm_pmu_msr_idx_to_pmc
        KVM: x86/vPMU: reorder PMU functions
        KVM: x86/vPMU: whitespace and stylistic adjustments in PMU code
        KVM: x86/vPMU: use the new macros to go between PMC, PMU and VCPU
        KVM: x86/vPMU: introduce pmu.h header
        KVM: x86/vPMU: rename a few PMU functions
        KVM: MTRR: do not map huge page for non-consistent range
        KVM: MTRR: simplify kvm_mtrr_get_guest_memory_type
        KVM: MTRR: introduce mtrr_for_each_mem_type
        KVM: MTRR: introduce fixed_mtrr_addr_* functions
        KVM: MTRR: sort variable MTRRs
        KVM: MTRR: introduce var_mtrr_range
        KVM: MTRR: introduce fixed_mtrr_segment table
        KVM: MTRR: improve kvm_mtrr_get_guest_memory_type
        KVM: MTRR: do not split 64 bits MSR content
        KVM: MTRR: clean up mtrr default type
        ...