1. 09 Oct, 2012 27 commits
    • M
      mm: compaction: capture a suitable high-order page immediately when it is made available · 1fb3f8ca
      Mel Gorman committed
      While compaction is migrating pages to free up large contiguous blocks
      for allocation it races with other allocation requests that may steal
      these blocks or break them up.  This patch alters direct compaction to
      capture a suitable free page as soon as it becomes available to reduce
      this race.  It uses similar logic to split_free_page() to ensure that
      watermarks are still obeyed.
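
      A minimal sketch of the kind of watermark test implied above, assuming a
      deliberately simplified zone model.  The struct zone_model type and the
      may_capture() helper are hypothetical names for illustration; the real
      kernel check considers more state than this.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a zone's free-page accounting. */
struct zone_model {
	unsigned long free_pages;	/* pages currently free in the zone */
	unsigned long low_wmark;	/* low watermark, in pages */
};

/*
 * Return true if a free block of 2^order pages may be taken without
 * pushing the zone below its low watermark.
 */
bool may_capture(const struct zone_model *z, unsigned int order)
{
	unsigned long need = 1UL << order;

	assert(z != NULL);
	if (z->free_pages < need)
		return false;
	return z->free_pages - need >= z->low_wmark;
}
```

      In this model a caller would only take the freshly freed block when
      may_capture() returns true, and otherwise leave it on the free list.
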
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm: vmscan: scale number of pages reclaimed by reclaim/compaction based on failures · 83fde0f2
      Mel Gorman committed
      If allocation fails after compaction then compaction may be deferred for
      a number of allocation attempts.  If there are subsequent failures,
      compact_defer_shift is increased to defer for longer periods.  This
      patch uses that information to scale the number of pages reclaimed with
      compact_defer_shift until allocations succeed again.  The rationale is
      that reclaiming the normal number of pages still allowed compaction to
      fail and its success depends on the number of pages.  If it's failing,
      reclaim more pages until it succeeds again.
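
      The scaling idea can be sketched as follows.  This is an illustration
      under stated assumptions, not the code from the patch:
      scaled_reclaim_target() and reclaim_done() are hypothetical helpers, and
      MAX_DEFER_SHIFT is an assumed cap on the shift.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed upper bound on the defer shift, used here to keep the shift safe. */
#define MAX_DEFER_SHIFT 6U

/*
 * Scale the number of pages to reclaim by the number of recent
 * compaction failures, expressed as a shift: each failure doubles
 * the target until the cap is reached.
 */
unsigned long scaled_reclaim_target(unsigned long base_pages,
				    unsigned int defer_shift)
{
	if (defer_shift > MAX_DEFER_SHIFT)
		defer_shift = MAX_DEFER_SHIFT;
	assert(base_pages <= (~0UL >> MAX_DEFER_SHIFT));	/* no overflow */
	return base_pages << defer_shift;
}

/* Reclaim may stop once the (scaled) target has been reached. */
bool reclaim_done(unsigned long nr_reclaimed, unsigned long target)
{
	return nr_reclaimed >= target;
}
```

      With a base of 32 pages, three consecutive failures give a target of 256
      pages in this sketch; once allocations succeed and the shift returns to
      zero, the target falls back to the base.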
      
      Note that this is not implying that VM reclaim is not reclaiming enough
      pages or that its logic is broken.  try_to_free_pages() always asks for
      SWAP_CLUSTER_MAX pages to be reclaimed regardless of order and that is
      what it does.  Direct reclaim stops normally with this check.
      
      	if (sc->nr_reclaimed >= sc->nr_to_reclaim)
      		goto out;
      
      should_continue_reclaim delays when that check is made until a minimum
      number of pages for reclaim/compaction are reclaimed.  It is possible
      that this patch could instead set nr_to_reclaim in try_to_free_pages()
      and drive it from there but that behaves differently and not
      necessarily for the better.  If driven from do_try_to_free_pages(), it
      is also possible that priorities will rise.
      
      When they reach DEF_PRIORITY-2, it will also start stalling and setting
      pages for immediate reclaim which is more disruptive than desirable
      in this case.  That is a more wide-reaching change that could cause
      another regression related to THP requests causing interactive jitter.
      
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      mm: compaction: update comment in try_to_compact_pages · 4ffb6335
      Mel Gorman committed
      Allocation success rates have been far lower since 3.4 due to commit
      fe2c2a10 ("vmscan: reclaim at order 0 when compaction is enabled").
      This commit was introduced for good reasons and it was known in advance
      that the success rates would suffer but it was justified on the grounds
      that the high allocation success rates were achieved by aggressive
      reclaim.  Success rates are expected to suffer even more in 3.6 due to
      commit 7db8889a ("mm: have order > 0 compaction start off where it
      left") which testing has shown to severely reduce allocation success
      rates under load - to 0% in one case.
      
      This series aims to improve the allocation success rates without
      regressing the benefits of commit fe2c2a10.  The series is based on
      latest mmotm and takes into account the __GFP_NO_KSWAPD flag is going
      away.
      
      Patch 1 updates a stale comment seeing as I was in the general area.
      
      Patch 2 updates reclaim/compaction to reclaim pages scaled on the number
      	of recent failures.
      
      Patch 3 captures suitable high-order pages freed by compaction to reduce
      	races with parallel allocation requests.
      
      Patch 4 fixes the upstream commit [7db8889a: mm: have order > 0 compaction
      	start off where it left] to enable compaction again
      
      Patch 5 identifies when compaction is taking too long due to contention
      	and aborts.
      
      STRESS-HIGHALLOC
      		 3.6-rc1-akpm	  full-series
      Pass 1          36.00 ( 0.00%)    51.00 (15.00%)
      Pass 2          42.00 ( 0.00%)    63.00 (21.00%)
      while Rested    86.00 ( 0.00%)    86.00 ( 0.00%)
      
      From
      
        http://www.csn.ul.ie/~mel/postings/mmtests-20120424/global-dhp__stress-highalloc-performance-ext3/hydra/comparison.html
      
      I know that the allocation success rate in 3.3.6 was 78% in comparison
      to 36% in the current akpm tree.  With the full series applied, the
      success rates are up to around 51% with some variability in the results.
      This is not as high a success rate but it does not reclaim excessively
      which is a key point.
      
      MMTests Statistics: vmstat
      Page Ins                                     3050912     3078892
      Page Outs                                    8033528     8039096
      Swap Ins                                           0           0
      Swap Outs                                          0           0
      
      Note that swap in/out rates remain at 0. In 3.3.6 with 78% success rates
      there were 71881 pages swapped out.
      
      Direct pages scanned                           70942      122976
      Kswapd pages scanned                         1366300     1520122
      Kswapd pages reclaimed                       1366214     1484629
      Direct pages reclaimed                         70936      105716
      Kswapd efficiency                                99%         97%
      Kswapd velocity                             1072.550    1182.615
      Direct efficiency                                99%         85%
      Direct velocity                               55.690      95.672
      
      The kswapd velocity changes very little as expected.  kswapd velocity is
      around the 1000 pages/sec mark whereas in kernel 3.3.6 with the high
      allocation success rates it was 8140 pages/second.  Direct velocity is
      higher as a result of patch 2 of the series but this is expected and is
      acceptable.  The direct reclaim and kswapd velocities change very little.
      
      If these get accepted for merging then there is a difficulty in how they
      should be handled.  7db8889a ("mm: have order > 0 compaction start off
      where it left") is broken but it is already in 3.6-rc1 and needs to be
      fixed.  However, if just patch 4 from this series is applied then Jim
      Schutt's workload is known to break again as his workload also requires
      patch 5.  While it would be preferred to have all these patches in 3.6 to
      improve compaction in general, it would at least be acceptable if just
      patches 4 and 5 were merged to 3.6 to fix a known problem without breaking
      compaction completely.  On the face of it, that would force
      __GFP_NO_KSWAPD patches to be merged at the same time but I can do a
      version of this series with __GFP_NO_KSWAPD change reverted and then
      rebase it on top of this series.  That might be best overall because I
      note that the __GFP_NO_KSWAPD patch should have removed
      deferred_compaction from page_alloc.c but it didn't but fixing that causes
      collisions with this series.
      
      This patch:
      
      The comment about order applied when the check was order >
      PAGE_ALLOC_COSTLY_ORDER which has not been the case since c5a73c3d ("thp:
      use compaction for all allocation orders").  Fixing the comment while I'm
      in the general area.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • H
      mm/mmap.c: replace find_vma_prepare() with clearer find_vma_links() · 6597d783
      Hugh Dickins committed
      People get confused by find_vma_prepare(), because it doesn't care about
      what it returns in its output args, when its callers won't be interested.
      
      Clarify by passing in end-of-range address too, and returning failure if
      any existing vma overlaps the new range: instead of returning an ambiguous
      vma which most callers then must check.  find_vma_links() is a clearer
      name.
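
      The "fail on overlap" contract can be sketched over a sorted array of
      address ranges.  This is only an illustration: the kernel walks an
      rbtree and returns an errno value, while struct range and find_links()
      here are hypothetical and return -1 on overlap.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open address range [start, end). */
struct range {
	unsigned long start;
	unsigned long end;
};

/*
 * Look for the slot where the new range [addr, end) would be linked
 * into 'areas', which must be sorted and non-overlapping.
 * Returns 0 and stores the insertion index in *slot, or -1 if any
 * existing range overlaps the new one.
 */
int find_links(const struct range *areas, size_t n,
	       unsigned long addr, unsigned long end, size_t *slot)
{
	size_t i;

	assert(slot != NULL && addr < end);
	for (i = 0; i < n; i++) {
		if (areas[i].end <= addr)
			continue;	/* entirely below the new range */
		if (areas[i].start < end)
			return -1;	/* overlaps the new range */
		break;			/* entirely above: insert before it */
	}
	*slot = i;
	return 0;
}
```

      Because the end of the range is passed in, the caller gets a definite
      success or failure instead of a neighbouring entry it must check itself.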
      
      This does revert 2.6.27's dfe195fb ("mm: fix uninitialized variables
      for find_vma_prepare callers"), but it looks like gcc 4.3.0 was one of
      those releases too eager to shout about uninitialized variables: only
      copy_vma() warns with 4.5.1 and 4.7.1, which a BUG on error silences.
      
      [hughd@google.com: fix warning, remove BUG()]
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: Benny Halevy <bhalevy@tonian.com>
      Acked-by: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      mm: fix nonuniform page status when writing new file with small buffer · d741c9cd
      Robin Dong committed
      When writing a new file with a 2048-byte buffer, such as write(fd, buffer,
      2048), generic_perform_write() goes through this sequence twice for every page:
      
      	write_begin
      	mark_page_accessed(page)
      	write_end
      
      	write_begin
      	mark_page_accessed(page)
      	write_end
      
      Pages 1-13 will be added to lru-pvecs in write_begin() and will *NOT* be
      added to active_list even though they have been accessed twice, because
      they are not PageLRU(page).  But when the 14th page comes, all pages in
      lru-pvecs will be moved to inactive_list (by __lru_cache_add()) in the
      first write_begin(), so the 14th page *is* PageLRU(page).  After the
      second write_end(), only the 14th page will be in active_list.
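
      The effect described above can be reproduced with a small userspace
      model of the behaviour before the fix.  It is a sketch under stated
      assumptions: struct page_model, mark_accessed() and simulate_writes()
      are hypothetical names, and the model only tracks the three page states
      that matter here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define PAGEVEC_SIZE 14U	/* batch size of the per-CPU lru pagevec */

struct page_model {
	bool on_lru;		/* already moved from the pagevec to an LRU list */
	bool referenced;
	bool active;
};

/* Model of the two-step promotion: referenced first, active second. */
void mark_accessed(struct page_model *p)
{
	if (!p->active && p->referenced && p->on_lru) {
		p->active = true;
		p->referenced = false;
	} else if (!p->referenced) {
		p->referenced = true;
	}
}

/*
 * Write n new pages with two half-page writes each, as in the
 * scenario above, and return how many pages end up active.
 */
size_t simulate_writes(struct page_model *pages, size_t n)
{
	size_t batch_start = 0, queued = 0, active = 0, i, j;

	assert(pages != NULL);
	for (i = 0; i < n; i++) {
		pages[i].on_lru = pages[i].referenced = pages[i].active = false;

		/* first write: the new page is queued in the pagevec */
		if (++queued == PAGEVEC_SIZE) {
			/* pagevec full: drain the whole batch to the LRU */
			for (j = batch_start; j <= i; j++)
				pages[j].on_lru = true;
			batch_start = i + 1;
			queued = 0;
		}
		mark_accessed(&pages[i]);

		/* second write: the page is found in the page cache */
		mark_accessed(&pages[i]);

		if (pages[i].active)
			active++;
	}
	return active;
}
```

      In this model only every 14th page is promoted, so 13 consecutive pages
      (13 * 4k = 52k) sit between any two active pages.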
      
      In a Hadoop environment we do hit this situation: after writing a
      file, we find that only the 14th, 28th, 42nd...  pages are in active_list
      and the others are in inactive_list.  When kswapd runs and shrinks the
      inactive_list, only the 14th, 28th...  pages of the file remain in memory,
      the readahead request size is broken up into only 52k (13*4k), and the
      system's performance falls dramatically.
      
      This problem can also be reproduced with the steps below (the machine has 8G of memory):
      
      	1. dd if=/dev/zero of=/test/file.out bs=1024 count=1048576
      	2. cat another 7.5G file to /dev/null
      	3. vmtouch -m 1G -v /test/file.out, it will show:
      
      	/test/file.out
      	[oooooooooooooooooooOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO] 187847/262144
      
      	the 'o' means some pages are in memory but some are not.

      The solution for this problem is simple: the 14th page should be added to
      lru_add_pvecs before mark_page_accessed(), just like the other pages.
      
      [akpm@linux-foundation.org: tweak comment]
      [akpm@linux-foundation.org: grab better comment from the v3 patch]
      Signed-off-by: Robin Dong <sanbai@taobao.com>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      Cc: KOSAKI Motohiro <kosaki.motohiro@gmail.com>
      Reviewed-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: kill vma flag VM_RESERVED and mm->reserved_vm counter · 314e51b9
      Konstantin Khlebnikov committed
      A long time ago, in v2.4, VM_RESERVED kept the swapout process off a VMA.
      It has since lost its original meaning but still has some effects:
      
       | effect                 | alternative flags
      -+------------------------+---------------------------------------------
      1| account as reserved_vm | VM_IO
      2| skip in core dump      | VM_IO, VM_DONTDUMP
      3| do not merge or expand | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      4| do not mlock           | VM_IO, VM_DONTEXPAND, VM_HUGETLB, VM_PFNMAP
      
      This patch removes the reserved_vm counter from mm_struct.  It seems that
      nobody cares about it: it is not exported to userspace directly, it only
      reduces the total_vm shown in proc.

      Thus VM_RESERVED can be replaced with VM_IO or the pair VM_DONTEXPAND | VM_DONTDUMP.

      remap_pfn_range() and io_remap_pfn_range() set VM_IO|VM_DONTEXPAND|VM_DONTDUMP.
      remap_vmalloc_range() sets VM_DONTEXPAND | VM_DONTDUMP.
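
      The replacement rule can be sketched as a flag conversion.  This is a
      hypothetical illustration: convert_reserved() does not exist in the
      kernel, and the bit values below are placeholders chosen for the
      example, not the values from include/linux/mm.h.

```c
#include <assert.h>

/* Placeholder bit values, for illustration only. */
#define VM_IO		0x1UL
#define VM_DONTEXPAND	0x2UL
#define VM_DONTDUMP	0x4UL
#define VM_RESERVED	0x8UL	/* the flag being removed */

/*
 * Rewrite a flag word that still carries VM_RESERVED: I/O mappings
 * get VM_IO, everything else gets VM_DONTEXPAND | VM_DONTDUMP.
 */
unsigned long convert_reserved(unsigned long flags, int is_io_mapping)
{
	assert(is_io_mapping == 0 || is_io_mapping == 1);
	if (!(flags & VM_RESERVED))
		return flags;		/* nothing to convert */

	flags &= ~VM_RESERVED;
	if (is_io_mapping)
		flags |= VM_IO;
	else
		flags |= VM_DONTEXPAND | VM_DONTDUMP;
	return flags;
}
```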
      
      [akpm@linux-foundation.org: drivers/vfio/pci/vfio_pci.c fixup]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: prepare VM_DONTDUMP for using in drivers · 0103bd16
      Konstantin Khlebnikov committed
      Rename VM_NODUMP to VM_DONTDUMP: this name matches the other negative
      flags, VM_DONTEXPAND and VM_DONTCOPY.  Currently this flag is used only
      for sys_madvise.  The next patch will use it to replace the outdated flag
      VM_RESERVED.
      
      Also forbid madvise(MADV_DODUMP) for special kernel mappings VM_SPECIAL
      (VM_IO | VM_DONTEXPAND | VM_RESERVED | VM_PFNMAP)
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas · e9714acf
      Konstantin Khlebnikov committed
      Currently the kernel sets mm->exe_file during sys_execve() and then tracks
      number of vmas with VM_EXECUTABLE flag in mm->num_exe_file_vmas, as soon
      as this counter drops to zero kernel resets mm->exe_file to NULL.  Plus it
      resets mm->exe_file at last mmput() when mm->mm_users drops to zero.
      
      A VMA with the VM_EXECUTABLE flag appears after mapping a file with the
      flag MAP_EXECUTABLE.  Such vmas can appear only at sys_execve() or after
      vma splitting, because sys_mmap ignores this flag.  Usually the binfmt
      module sets mm->exe_file and mmaps executable vmas with this file; they
      hold mm->exe_file while the task is running.
      
      comment from v2.6.25-6245-g925d1c40 ("procfs task exe symlink"),
      where all this stuff was introduced:
      
      > The kernel implements readlink of /proc/pid/exe by getting the file from
      > the first executable VMA.  Then the path to the file is reconstructed and
      > reported as the result.
      >
      > Because of the VMA walk the code is slightly different on nommu systems.
      > This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
      > walking the VMAs to find the first executable file-backed VMA we store a
      > reference to the exec'd file in the mm_struct.
      >
      > That reference would prevent the filesystem holding the executable file
      > from being unmounted even after unmapping the VMAs.  So we track the number
      > of VM_EXECUTABLE VMAs and drop the new reference when the last one is
      > unmapped.  This avoids pinning the mounted filesystem.
      
      exe_file's vma accounting is hooked into every file mmap/munmap and vma
      split/merge just to avoid a hypothetical case: a filesystem pinned against
      unmounting by an mm that has already unmapped all its executable files
      but is still alive.
      
      Seems like currently nobody depends on this behaviour.  We can try to
      remove this logic and keep mm->exe_file until final mmput().
      
      mm->exe_file is still protected with mm->mmap_sem, because we want to
      change it via the new sys_prctl(PR_SET_MM_EXE_FILE).  Also, via this
      syscall a task can change its mm->exe_file and unpin the mountpoint
      explicitly.
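
      From userspace, the executable reference discussed above is visible
      through the /proc/<pid>/exe symlink.  A minimal sketch of reading it for
      the current process, assuming a Linux system with /proc mounted;
      read_exe_path() is a hypothetical helper name.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Copy the path of the running executable into buf as a
 * NUL-terminated string.  Returns the path length, or -1 on error.
 * A path longer than len - 1 bytes is silently truncated.
 */
ssize_t read_exe_path(char *buf, size_t len)
{
	ssize_t n;

	assert(buf != NULL);
	if (len == 0)
		return -1;

	/* readlink() does not append a terminating NUL itself. */
	n = readlink("/proc/self/exe", buf, len - 1);
	if (n < 0)
		return -1;
	buf[n] = '\0';
	return n;
}
```
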
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: use mm->exe_file instead of first VM_EXECUTABLE vma->vm_file · 2dd8ad81
      Konstantin Khlebnikov committed
      Some security modules and oprofile still use VM_EXECUTABLE for retrieving
      a task's executable file.  After this patch they will use mm->exe_file
      directly.  mm->exe_file is protected with mm->mmap_sem, so locking stays
      the same.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Acked-by: Chris Metcalf <cmetcalf@tilera.com>			[arch/tile]
      Acked-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>	[tomoyo]
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Acked-by: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4
      Konstantin Khlebnikov committed
      Move actual pte filling for non-linear file mappings into the new special
      vma operation: ->remap_pages().
      
      A filesystem must implement this method to get non-linear mapping
      support; if it uses filemap_fault() then generic_file_remap_pages() can
      be used.
      
      Now device drivers can implement this method and obtain nonlinear vma support.
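
      The shift from a capability flag to an optional method can be sketched
      in plain C.  The types below are simplified stand-ins, not the kernel
      structures: struct vma_model, struct vm_ops_model, supports_nonlinear()
      and demo_remap_pages() are hypothetical names, and the callback
      parameters are an assumption made for the example.

```c
#include <assert.h>
#include <stddef.h>

struct vma_model;

/* Operations table; a NULL remap_pages means "no nonlinear support". */
struct vm_ops_model {
	int (*remap_pages)(struct vma_model *vma, unsigned long addr,
			   unsigned long size, unsigned long pgoff);
};

struct vma_model {
	const struct vm_ops_model *ops;
};

/* Support is detected by the presence of the method, not by a flag bit. */
int supports_nonlinear(const struct vma_model *vma)
{
	assert(vma != NULL);
	return vma->ops != NULL && vma->ops->remap_pages != NULL;
}

/* A do-nothing implementation that a filesystem or driver might supply. */
int demo_remap_pages(struct vma_model *vma, unsigned long addr,
		     unsigned long size, unsigned long pgoff)
{
	(void)vma;
	(void)addr;
	(void)size;
	(void)pgoff;
	return 0;
}
```

      One consequence of this design is that anything able to fill in the
      method can opt in, which a single flag owned by the core could not
      express as directly.
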
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: kill vma flag VM_INSERTPAGE · 4b6e1e37
      Konstantin Khlebnikov committed
      Merge VM_INSERTPAGE into VM_MIXEDMAP.  VM_MIXEDMAP VMA can mix pure-pfn
      ptes, special ptes and normal ptes.
      
      Now copy_page_range() always copies a VM_MIXEDMAP VMA on fork, like
      VM_PFNMAP.  If a driver populates the whole VMA at mmap() it probably
      does not expect page faults.

      This patch removes the special check from vma_wants_writenotify() which
      disables page write tracking for a VMA populated via vm_insert_page().
      The BDI below the mapped file should not use dirty accounting; moreover,
      do_wp_page() can handle this.

      vm_insert_page() still marks the vma after first usage.  Usually it is
      called from the f_op->mmap() handler under the mm->mmap_sem write-lock,
      so it is able to change vma->vm_flags.  The caller must set VM_MIXEDMAP
      at mmap time if it wants to call this function from other places, for
      example from the page-fault handler.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: introduce arch-specific vma flag VM_ARCH_1 · cc2383ec
      Konstantin Khlebnikov committed
      Combine several arch-specific vma flags into one.
      
      before patch:
      
              0x00000200      0x01000000      0x20000000      0x40000000
      x86     VM_NOHUGEPAGE   VM_HUGEPAGE     -               VM_PAT
      powerpc -               -               VM_SAO          -
      parisc  VM_GROWSUP      -               -               -
      ia64    VM_GROWSUP      -               -               -
      nommu   -               VM_MAPPED_COPY  -               -
      others  -               -               -               -
      
      after patch:
      
              0x00000200      0x01000000      0x20000000      0x40000000
      x86     -               VM_PAT          VM_HUGEPAGE     VM_NOHUGEPAGE
      powerpc -               VM_SAO          -               -
      parisc  -               VM_GROWSUP      -               -
      ia64    -               VM_GROWSUP      -               -
      nommu   -               VM_MAPPED_COPY  -               -
      others  -               VM_ARCH_1       -               -
      
      And voila! One completely free bit.
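
      The "after patch" layout can be written out as preprocessor aliases.
      This is a sketch built only from the table above: in a real tree each
      alias would be selected per architecture, and FREED_BIT and
      bit_is_free() are hypothetical names added for the example.

```c
#include <assert.h>

/* Bit values taken from the "after patch" table. */
#define VM_ARCH_1	0x01000000UL	/* shared arch-specific bit */
#define VM_HUGEPAGE	0x20000000UL
#define VM_NOHUGEPAGE	0x40000000UL

/* Per-architecture names for the same bit. */
#define VM_PAT		VM_ARCH_1	/* x86 */
#define VM_SAO		VM_ARCH_1	/* powerpc */
#define VM_GROWSUP	VM_ARCH_1	/* parisc, ia64 */
#define VM_MAPPED_COPY	VM_ARCH_1	/* nommu */

/* The bit that no row of the table uses after the patch. */
#define FREED_BIT	0x00000200UL

/* Return 1 if 'bit' is not claimed by any flag modelled above. */
int bit_is_free(unsigned long bit)
{
	/* expects exactly one bit to be set */
	assert(bit != 0 && (bit & (bit - 1)) == 0);
	return !(bit & (VM_ARCH_1 | VM_HUGEPAGE | VM_NOHUGEPAGE));
}
```
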
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm, x86, pat: rework linear pfn-mmap tracking · b3b9c293
      Konstantin Khlebnikov committed
      Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT.
      
      We can toss mapping address from remap_pfn_range() into
      track_pfn_vma_new(), and collect all PAT-related logic together in
      arch/x86/.
      
      This patch also restores the original frustration-free is_cow_mapping()
      check in remap_pfn_range(), as it was before commit
      v2.6.28-rc8-88-g3c8bb73a ("x86: PAT: store vm_pgoff for all
      linear_over_vma_region mappings - v3").

      is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c,
      because this is already handled by VM_PFNMAP in the VM_NO_THP bit-mask.
      
      [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • S
      x86, pat: separate the pfn attribute tracking for remap_pfn_range and vm_insert_pfn · 5180da41
      Suresh Siddha committed
      With PAT enabled, vm_insert_pfn() looks up the existing pfn memory
      attribute and uses it.  Expectation is that the driver reserves the
      memory attributes for the pfn before calling vm_insert_pfn().
      
      remap_pfn_range() (when called for the whole vma) will setup a new
      attribute (based on the prot argument) for the specified pfn range.
      This addresses the legacy usage which typically calls remap_pfn_range()
      with a desired memory attribute.  For ranges smaller than the vma size
      (which is typically not the case), remap_pfn_range() will use the
      existing memory attribute for the pfn range.
      
      Expose two different APIs for these different behaviors:
      track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn(),
      and track_pfn_remap() for remap_pfn_range().
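
      The distinction between the two remap cases comes down to whether the
      request spans the entire vma.  A minimal sketch of that test, assuming a
      simplified vma with only start and end addresses; struct vma_model and
      remap_covers_whole_vma() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified vma: only the address range [vm_start, vm_end). */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
};

/*
 * True when [addr, addr + size) is exactly the whole vma, the case in
 * which a new memory attribute is set up for the pfn range.  For any
 * smaller range the existing attribute is looked up and reused.
 */
bool remap_covers_whole_vma(const struct vma_model *vma,
			    unsigned long addr, unsigned long size)
{
	assert(vma != NULL && vma->vm_end >= vma->vm_start);
	return addr == vma->vm_start &&
	       size == vma->vm_end - vma->vm_start;
}
```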
      
      This cleanup also prepares the ground for the track/untrack pfn vma
      routines to take over the ownership of setting PAT specific vm_flag in
      the 'vma'.
      
      [khlebnikov@openvz.org: Clear checks in track_pfn_remap()]
      [akpm@linux-foundation.org: tweak a few comments]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5180da41
    • S
      x86, pat: remove the dependency on 'vm_pgoff' in track/untrack pfn vma routines · b1a86e15
      Suresh Siddha committed
      The 'pfn' argument for track_pfn_vma_new() can be used for reserving the
      attribute for the pfn range, so there is no need to depend on 'vm_pgoff'.
      
      Similarly, untrack_pfn_vma() can depend on the 'pfn' argument if it is
      non-zero or can use follow_phys() to get the starting value of the pfn
      range.
      
      Also, the non-zero 'size' argument can be used instead of recomputing it
      from the vma.
      
      This cleanup also prepares the ground for the track/untrack pfn vma
      routines to take over the ownership of setting PAT specific vm_flag in the
      'vma'.
      
      [khlebnikov@openvz.org: Clear pfn to paddr conversion]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b1a86e15
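      The pfn-based range computation described in b1a86e15 can be sketched
      roughly as follows.  This is a user-space illustration, not the kernel
      code: the sketch_* names and the 4 KiB page size are assumptions made
      for the example.  The point it shows is that the pfn should be widened
      before shifting, so that a large pfn does not overflow a 32-bit value.

```c
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4 KiB pages */

/*
 * Convert a page frame number to a physical address.  The cast happens
 * before the shift so the result is computed in 64 bits even where
 * unsigned long is only 32 bits wide.
 */
static uint64_t sketch_pfn_to_paddr(unsigned long pfn)
{
	return (uint64_t)pfn << SKETCH_PAGE_SHIFT;
}

/*
 * End (exclusive) of the physical range that starts at 'pfn' and spans
 * 'size' bytes, using the caller-supplied arguments instead of values
 * recomputed from a vma.
 */
static uint64_t sketch_range_end(unsigned long pfn, unsigned long size)
{
	return sketch_pfn_to_paddr(pfn) + size;
}
```

      With these helpers, a range starting at pfn 1 and covering one page
      ends at physical address 8192.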
    • R
      mm: remove __GFP_NO_KSWAPD · c6543459
      Rik van Riel committed
      When transparent huge pages were introduced, memory compaction and swap
      storms were an issue, and the kernel had to be careful to not make THP
      allocations cause pageout or compaction.
      
      Now that we have working compaction deferral, kswapd is smart enough to
      invoke compaction and the quadratic behaviour around isolate_free_pages
      has been fixed, it should be safe to remove __GFP_NO_KSWAPD.
      
      [minchan@kernel.org: Comment fix]
      [mgorman@suse.de: Avoid direct reclaim for deferred compaction]
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c6543459
    • S
      CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() · 075663d1
      Srivatsa S. Bhat committed
      The synchronization between CPU hotplug readers and writers is achieved
      by means of refcounting, safeguarded by the cpu_hotplug.lock.
      
      get_online_cpus() increments the refcount, whereas put_online_cpus()
      decrements it.  If we ever hit an imbalance between the two, we end up
      compromising the guarantees of the hotplug synchronization.  For
      example, an extra call to put_online_cpus() can end up allowing a
      hotplug reader to execute concurrently with a hotplug writer.
      
      So, add a WARN_ON() in put_online_cpus() to detect such cases where the
      refcount can go negative, and also attempt to fix it up, so that we can
      continue to run.
      Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
      Reviewed-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      075663d1
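      The imbalance check described in 075663d1 can be sketched in user
      space as below.  This is only an illustration under stated
      assumptions: the struct and the sketch_* functions are hypothetical
      stand-ins, the warning is a plain fprintf() rather than WARN_ON(), and
      the locking done under cpu_hotplug.lock is left out entirely.

```c
#include <stdio.h>

/* Hypothetical stand-in for the kernel's hotplug reader accounting. */
struct hotplug_state {
	int refcount;	/* number of active hotplug readers */
	int warnings;	/* number of imbalances detected so far */
};

/* Reader side: take a reference. */
static void sketch_get_online_cpus(struct hotplug_state *s)
{
	s->refcount++;
}

/*
 * Reader side: drop a reference.  An unmatched put would drive the
 * count negative, so warn and fix the count up instead of decrementing.
 */
static void sketch_put_online_cpus(struct hotplug_state *s)
{
	if (s->refcount <= 0) {
		fprintf(stderr, "WARN: unbalanced put_online_cpus()\n");
		s->warnings++;
		s->refcount = 0;	/* fix up so execution can continue */
		return;
	}
	s->refcount--;
}
```

      A balanced get/put pair leaves the count at zero with no warning; one
      extra put is reported once and the count stays at zero rather than
      going negative.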
    • C
      Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry · 7ac57a89
      Catalin Marinas committed
      Introduce SYSCTL_EXCEPTION_TRACE config option and select it in the
      architectures requiring support for the "exception-trace" debug_table
      entry in kernel/sysctl.c.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      7ac57a89
    • C
      Kconfig: clean up the long arch list for the DEBUG_BUGVERBOSE config option · 9b2a60c4
      Catalin Marinas committed
      Introduce HAVE_DEBUG_BUGVERBOSE config option and select it in
      corresponding architecture Kconfig files.  Architectures that already
      select GENERIC_BUG don't need to select HAVE_DEBUG_BUGVERBOSE.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      9b2a60c4
    • C
      Kconfig: clean up the long arch list for the DEBUG_KMEMLEAK config option · b69ec42b
      Catalin Marinas committed
      Introduce HAVE_DEBUG_KMEMLEAK config option and select it in corresponding
      architecture Kconfig files.  DEBUG_KMEMLEAK now only depends on
      HAVE_DEBUG_KMEMLEAK.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b69ec42b
    • C
      Kconfig: clean up the long arch list for the UID16 config option · af1839eb
      Catalin Marinas committed
      Introduce HAVE_UID16 config option and select it in corresponding
      architecture Kconfig files.  UID16 now only depends on HAVE_UID16.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Jesper Nilsson <jesper.nilsson@axis.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      af1839eb
    • K
      MAINTAINERS: add Konrad as the SWIOTLB maintainer · 6e28b761
      Konrad Rzeszutek Wilk committed
      Now that I've an IA64 box on top of the other boxes (IBM with Calgary-X,
      Intel VT-d, AMD Vi, and AMD GART - that can use SWIOTLB as fallback) I can
      reliably do regression testing.
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Cc: Ingo Molnar <mingo@elte.hu>
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Acked-by: Tony Luck <tony.luck@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      6e28b761
    • L
      Merge tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · f5a246ea
      Linus Torvalds committed
      Pull sound updates from Takashi Iwai:
       "This contains quite a few small commits covering a fairly large range
        of files in the sound/ directory, partly because of additional API
        support and partly because of the constantly developed ASoC and ARM
        stuff.
      
        Some highlights:
      
         - Introduced the helper function and documentation for exposing the
           channel map via control API, as discussed in Plumbers; most of PCI
           drivers are covered, will follow more drivers later
      
         - Most of drivers have been replaced with the new PM callbacks (if
           the bus is supported)
      
         - HD-audio controller got the support of runtime PM and the support
           of D3 clock-stop.  Also changing the power_save option in sysfs
           kicks off immediately to enable / disable the power-save mode.
      
         - Another significant code change in HD-audio is the rewrite of
           firmware loading code.  Other than that, most of changes in
           HD-audio are continued cleanups and standardization for the generic
           auto parser and bug fixes (HBR, device-specific fixups), in
           addition to the support of channel-map API.
      
         - Addition of ASoC bindings for the compressed API, used by the
           mid-x86 drivers.
      
         - Lots of cleanups and API refreshes for ASoC codec drivers and
           DaVinci.
      
         - Conversion of OMAP to dmaengine.
      
         - New machine driver for Wolfson Microelectronics Bells.
      
         - New CODEC driver for Wolfson Microelectronics WM0010.
      
         - Enhancements to the ux500 and wm2000 drivers
      
         - A new driver for DA9055 and the support for regulator bypass mode."
      
      Fix up various arm soc header file reorg conflicts.
      
      * tag 'sound-3.7' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: (339 commits)
        ALSA: hda - Add new codec ALC283 ALC290 support
        ALSA: hda - avoid unneccesary indices on "Headphone Jack" controls
        ALSA: hda - fix indices on boost volume on Conexant
        ALSA: aloop - add locking to timer access
        ALSA: hda - Fix hang caused by race during suspend.
        sound: Remove unnecessary semicolon
        ALSA: hda/realtek - Fix detection of ALC271X codec
        ALSA: hda - Add inverted internal mic quirk for Lenovo IdeaPad U310
        ALSA: hda - make Realtek/Sigmatel/Conexant use the generic unsol event
        ALSA: hda - make a generic unsol event handler
        ASoC: codecs: Add DA9055 codec driver
        ASoC: eukrea-tlv320: Convert it to platform driver
        ALSA: ASoC: add DT bindings for CS4271
        ASoC: wm_hubs: Ensure volume updates are handled during class W startup
        ASoC: wm5110: Adding missing volume update bits
        ASoC: wm5110: Add OUT3R support
        ASoC: wm5110: Add AEC loopback support
        ASoC: wm5110: Rename EPOUT to HPOUT3
        ASoC: arizona: Add more clock rates
        ASoC: arizona: Add more DSP options for mixer input muxes
        ...
      f5a246ea
    • O
      exec: make de_thread() killable · d5bbd43d
      Oleg Nesterov committed
      Change de_thread() to use KILLABLE rather than UNINTERRUPTIBLE while
      waiting for other threads.  The only complication is that we should
      clear ->group_exit_task and ->notify_count before we return, and we
      should do this under tasklist_lock.  -EAGAIN is used to match the
      initial signal_group_exit() check/return, it doesn't really matter.
      
      This fixes the (unlikely) race with coredump.  de_thread() checks
      signal_group_exit() before it starts to kill the subthreads, but this
      can't help if another CLONE_VM (but non CLONE_THREAD) task starts the
      coredumping after de_thread() unlocks ->siglock.  In this case the
      killed sub-thread can block in exit_mm() waiting for coredump_finish(),
      the execing thread waits for that sub-thread, and the coredumping thread
      waits for the execing thread.  Deadlock.
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      d5bbd43d
    • L
      Merge tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64 · b5356a19
      Linus Torvalds committed
      Pull arm64 changes from Catalin Marinas:
       "arm64 fixes:
         - Use swiotlb_init() instead of swiotlb_init_with_default_size().
           The latter is now a static function (commit 74838b75 "swiotlb:
           add the late swiotlb initialization function with iotlb memory").
         - Enable interrupts before calling do_notify_resume().
      
        arm64 clean-up:
         - Use the generic implementation of compat_sys_sendfile() on arm64 as
           commit 8f9c0119 (introducing the function) has been merged."
      
      * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/cmarinas/linux-aarch64:
        arm64: Enable interrupts before calling do_notify_resume()
        arm64: Use the generic compat_sys_sendfile() implementation
        arm64: Call swiotlb_init() instead of swiotlb_init_with_default_size()
      b5356a19
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc · 3c5af8d1
      Linus Torvalds committed
      Pull sparc changes from David S Miller:
       "There is an attempt to fix a bad interaction between syscall tracing
        and force_successful_syscall() from Al Viro, but it needs to be redone
        as it introduced regressions and thus had to be reverted for now.
      
        Al is working on an updated version.
      
        But what we do have here are some significant bzero/memset
        improvements for Niagara-4.  An 8K page can be cleared in around 600
        cycles, because we essentially have a store that behaves like
        powerpc's dcbz that we can actually make real use of."
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc:
        Revert strace hiccups fix.
        sparc64: Niagara-4 bzero/memset, plus use MRU stores in page copy.
        sparc64: Fix strace hiccups when force_successful_syscall() triggers.
        sparc64: Rearrange thread info to cheaply clear syscall noerror state.
      3c5af8d1
    • C
      arm64: Enable interrupts before calling do_notify_resume() · 6916fd08
      Catalin Marinas committed
      task_work_run() implementation had the side effect of enabling
      interrupts. With commit ac3d0da8 (task_work: Make task_work_add()
      lockless), interrupts are no longer enabled revealing the bug in the
      arch code. This patch enables the interrupt explicitly before calling
      do_notify_resume().
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      6916fd08
  2. 08 Oct, 2012 11 commits
    • C
      arm64: Use the generic compat_sys_sendfile() implementation · e048d004
      Catalin Marinas committed
      The generic implementation of compat_sys_sendfile() has been introduced
      by commit 8f9c0119. This patch removes the arm64 implementation in
      favour of the generic one.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      e048d004
    • C
      arm64: Call swiotlb_init() instead of swiotlb_init_with_default_size() · 27222a3d
      Catalin Marinas committed
      Following commit 74838b75 (swiotlb: add the late swiotlb initialization
      function with iotlb memory) the swiotlb_init_with_default_size() is a
      static function. This patch changes the arm64 code to call
      swiotlb_init() instead and use the default size of 64MB. It is assumed
      that AArch64 platforms have enough RAM to afford the pre-allocated
      swiotlb memory. It also removes the #ifdef around this call since
      CONFIG_SWIOTLB is always enabled.
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      27222a3d
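      To give a sense of what the 64MB default mentioned in 27222a3d means
      in terms of bounce-buffer slots, here is a small arithmetic sketch.
      The 64MB figure comes from the commit message; the 2 KiB slab size
      (a shift of 11) and the sketch_* names are assumptions made for this
      illustration.

```c
#define SKETCH_IO_TLB_SHIFT	11		/* assumption: 2 KiB per slab */
#define SKETCH_DEFAULT_SIZE	(64UL << 20)	/* 64MB, per the commit message */

/* Number of slabs that fit in the default pre-allocated swiotlb buffer. */
static unsigned long sketch_default_nslabs(void)
{
	return SKETCH_DEFAULT_SIZE >> SKETCH_IO_TLB_SHIFT;
}
```

      Under these assumptions the default buffer holds 32768 slabs, which is
      the memory an AArch64 platform is assumed to be able to set aside.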
    • L
      Merge tag 'upstream-3.7-rc1-fastmap' of git://git.infradead.org/linux-ubi · e9eca4de
      Linus Torvalds committed
      Pull UBI fastmap changes from Artem Bityutskiy:
       "This pull request contains the UBI fastmap support implemented by
        Richard Weinberger from Linutronix.  Fastmap is designed to address
        UBI's slow scanning issues.  Namely, it introduces a new on-flash
        data-structure called "fastmap", which stores the information about
        logical<->physical eraseblocks mappings.  So now to get this
        information just read the fastmap, instead of doing full scan.  More
        information here can be found in Richard's announcement in LKML
        (Subject: UBI: Fastmap request for inclusion (v19)):
      
           http://thread.gmane.org/gmane.linux.kernel/1364922/focus=1369109
      
        One thing I want to explicitly say is that fastmap did not have large
        enough linux-next exposure.  It is partially my fault - I did not
        respond quickly enough.  I _really_ apologize for this.  But it had
        good testing and is disabled by default, so I do not expect that we'll
        break anything.
      
        Fastmap is declared as experimental so far, and it is off by default.
        We did declare that the on-flash format may be changed.  The reason
        for this is that no one used it in real production so far, so there is
        a high risk that something is missing.  Besides, we do not have
        user-space tools supporting fastmap so far.
      
        Nevertheless, I suggest we merge this feature.  Many people want UBI's
        scanning bottleneck to be fixed and merging fastmap now should
        accelerate its production use.  The plan is to make it bullet-proof,
        clean it up somewhat, and make it the default for UBI.  I do not know
        how many kernel releases it will take.
      
        Basically, what I want to do for fastmap is something like what Linus
        did for btrfs a few years ago."
      
      * tag 'upstream-3.7-rc1-fastmap' of git://git.infradead.org/linux-ubi:
        UBI: Wire-up fastmap
        UBI: Add fastmap core
        UBI: Add fastmap support to the WL sub-system
        UBI: Add fastmap stuff to attach.c
        UBI: Wire-up ->fm_sem
        UBI: Add fastmap bits to build.c
        UBI: Add self_check_eba()
        UBI: Export next_sqnum()
        UBI: Add fastmap stuff to ubi.h
        UBI: Add fastmap on-flash data structures
      e9eca4de
    • L
      Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux · 1929041b
      Linus Torvalds committed
      Pull drm updates part 2 from Dave Airlie:
       "This is the follow-up pull, 3 pieces
      
        a) exynos next stuff, was delayed but looks okay to me, one patch in
           v4l bits but it was acked by v4l person.
        b) UAPI disintegration bits
        c) intel fixes - DP fixes, hang fixes, other misc fixes."
      
      * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (52 commits)
        drm: exynos: hdmi: remove drm common hdmi platform data struct
        drm: exynos: hdmi: add support for exynos5 hdmi
        drm: exynos: hdmi: replace is_v13 with version check in hdmi
        drm: exynos: hdmi: add support for exynos5 mixer
        drm: exynos: hdmi: add support to disable video processor in mixer
        drm: exynos: hdmi: add support for platform variants for mixer
        drm: exynos: hdmi: add support for exynos5 hdmiphy
        drm: exynos: hdmi: add support for exynos5 ddc
        drm: exynos: remove drm hdmi platform data struct
        drm: exynos: hdmi: turn off HPD interrupt in HDMI chip
        drm: exynos: hdmi: use s5p-hdmi platform data
        drm: exynos: hdmi: fix interrupt handling
        drm: exynos: hdmi: support for platform variants
        media: s5p-hdmi: add HPD GPIO to platform data
        UAPI: (Scripted) Disintegrate include/drm
        drm/i915: Fix GT_MODE default value
        drm/i915: don't frob the vblank ts in finish_page_flip
        drm/i915: call drm_handle_vblank before finish_page_flip
        drm/i915: print warning if vmi915_gem_fault error is not handled
        drm/i915: EBUSY status handling added to i915_gem_fault().
        ...
      1929041b
    • L
      Merge branch 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild · d43b7167
      Linus Torvalds committed
      Pull kbuild fixes from Michal Marek:
       "Here are two fixes I intended to send after v3.6-rc7, but failed to do
        so.  So please pull them for v3.7-rc1 and they will be picked up by
        stable.
      
        The first one fixes gcc -x <language> syntax in various build-time
        tests, which icecream and possibly other gcc wrappers did not
        understand (and yes, icecream is going to be fixed as well).
      
        The second one fixes make tar-pkg so that unpacking the tarball does
        not replace the /lib -> /usr/lib symlink on recent Fedora releases."
      
      * 'rc-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/mmarek/kbuild:
        kbuild: Fix gcc -x syntax
        kbuild: Do not package /boot and /lib in make tar-pkg
      d43b7167
    • S
      localmodconfig: Document localmodconfig in README · 80b810b2
      Steven Rostedt committed
      Someone (over a year ago :-p) asked me to document localmodconfig in the
      README file in the source code.  I thought it was a good idea but other
      things were more important and I simply forgot about it.  Well, I
      stumbled on the email asking me about this and I'm sending it out now.
      Signed-off-by: Steven "Mr. Procrastinator" Rostedt <rostedt@goodmis.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      80b810b2
    • L
      Merge branch 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux · d8dc91b7
      Linus Torvalds committed
      Pull ACPI & Power Management updates from Len Brown:
       - acpidump utility added
       - intel_idle driver now supports IVB Xeon
       - turbostat utility can now count SMIs
       - ACPI can now bind to USB3 hubs
       - misc fixes
      
      * 'release' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux: (49 commits)
        ACPI: Add new sysfs interface to export device description
        ACPI: Harden acpi_table_parse_entries() against BIOS bug
        tools/power/turbostat: add option to count SMIs, re-name some options
        tools/power turbostat: add [-d MSR#][-D MSR#] options to print counter deltas
        intel_idle: enable IVB Xeon support
        tools/power turbostat: add [-m MSR#] option
        tools/power turbostat: make -M output pretty
        tools/power turbostat: print more turbo-limit information
        tools/power turbostat: delete unused line
        tools/power turbostat: run on IVB Xeon
        tools/power/acpi/acpidump: create acpidump(8), local make install targets
        tools/power/acpi/acpidump: version 20101221 - find dynamic tables in sysfs
        ACPI: run _OSC after ACPI_FULL_INITIALIZATION
        tools/power/acpi/acpidump: create acpidump(8), local make install targets
        tools/power/acpi/acpidump: version 20101221 - find dynamic tables in sysfs
        tools/power/acpi/acpidump: version 20071116
        tools/power/acpi/acpidump: version 20070714
        tools/power/acpi/acpidump: version 20060606
        tools/power/acpi/acpidump: version 20051111
        xo15-ebook: convert to module_acpi_driver()
        ...
      d8dc91b7
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 7035cdf3
      Linus Torvalds committed
      Pull ceph updates from Sage Weil:
       "The bulk of this pull is a series from Alex that refactors and cleans
        up the RBD code to lay the groundwork for supporting the new image
        format and evolving feature set.  There are also some cleanups in
        libceph, and for ceph there's fixed validation of file striping
        layouts and a bugfix in the code handling a shrinking MDS cluster."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client: (71 commits)
        ceph: avoid 32-bit page index overflow
        ceph: return EIO on invalid layout on GET_DATALOC ioctl
        rbd: BUG on invalid layout
        ceph: propagate layout error on osd request creation
        libceph: check for invalid mapping
        ceph: convert to use le32_add_cpu()
        ceph: Fix oops when handling mdsmap that decreases max_mds
        rbd: update remaining header fields for v2
        rbd: get snapshot name for a v2 image
        rbd: get the snapshot context for a v2 image
        rbd: get image features for a v2 image
        rbd: get the object prefix for a v2 rbd image
        rbd: add code to get the size of a v2 rbd image
        rbd: lay out header probe infrastructure
        rbd: encapsulate code that gets snapshot info
        rbd: add an rbd features field
        rbd: don't use index in __rbd_add_snap_dev()
        rbd: kill create_snap sysfs entry
        rbd: define rbd_dev_image_id()
        rbd: define some new format constants
        ...
      7035cdf3
    • L
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 6432f212
      Linus Torvalds committed
      Pull ext4 updates from Ted Ts'o:
       "The big new feature added this time is supporting online resizing
        using the meta_bg feature.  This allows us to resize file systems
        which are greater than 16TB.  In addition, the speed of online
        resizing has been improved in general.
      
        We also fix a number of races, some of which could lead to deadlocks,
        in ext4's Asynchronous I/O and online defrag support, thanks to good
        work by Dmitry Monakhov.
      
        There are also a large number of more minor bug fixes and cleanups
        from a number of other ext4 contributors, quite a few of whom have
        submitted fixes for the first time."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (69 commits)
        ext4: fix ext4_flush_completed_IO wait semantics
        ext4: fix mtime update in nodelalloc mode
        ext4: fix ext_remove_space for punch_hole case
        ext4: punch_hole should wait for DIO writers
        ext4: serialize truncate with owerwrite DIO workers
        ext4: endless truncate due to nonlocked dio readers
        ext4: serialize unlocked dio reads with truncate
        ext4: serialize dio nonlocked reads with defrag workers
        ext4: completed_io locking cleanup
        ext4: fix unwritten counter leakage
        ext4: give i_aiodio_unwritten a more appropriate name
        ext4: ext4_inode_info diet
        ext4: convert to use leXX_add_cpu()
        ext4: ext4_bread usage audit
        fs: reserve fallocate flag codepoint
        ext4: remove redundant offset check in mext_check_arguments()
        ext4: don't clear orphan list on ro mount with errors
        jbd2: fix assertion failure in commit code due to lacking transaction credits
        ext4: release donor reference when EXT4_IOC_MOVE_EXT ioctl fails
        ext4: enable FITRIM ioctl on bigalloc file system
        ...
      6432f212
    • L
      Merge branch 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · 1b033447
      Linus Torvalds committed
      Pull i2c updates from Jean Delvare:
       "Most visible changes are the SMBus multiplexing support added to the
        i2c-i801 driver, as well as support for the VIA VX900."
      
      * 'i2c-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        i2c-piix4: Fix build failure
        i2c: Correct struct i2c_driver doc about detection
        i2c-i801: Let i2c-mux-gpio find the GPIO chip
        i2c-mux-gpio: Update documentation
        i2c-mux-gpio: Add support for dynamically allocated GPIO pins
        i2c-mux-gpio: Use devm_kzalloc instead of kzalloc
        i2c-i801: Support SMBus multiplexing on Asus Z8 series
        i2c-viapro: Add VIA VX900 device ID
        i2c-parport: i2c_parport_irq can be static
        i2c-designware: i2c_dw_xfer_msg can be static
        i2c/scx200_*: Replace printks with pr_<level>s
        i2c: Make I2C available on UML
        i2c: Convert struct i2c_msg initialization to C99 format
        i2c-smbus: Convert kzalloc to devm_kzalloc
        i2c-mux: Add support for device auto-detection
      1b033447
    • L
      Merge tag 'iommu-updates-v3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu · c0703c12
      Linus Torvalds committed
      Pull IOMMU updates from Joerg Roedel:
       "This time the IOMMU updates contain a bunch of fixes and cleanups to
        various IOMMU drivers and the DMA debug code.  New features are the
        code for IRQ remapping support with the AMD IOMMU (preparation for
        that was already merged in the last release) and a debugfs interface
        to export some statistics in the NVidia Tegra IOMMU driver."
      
      * tag 'iommu-updates-v3.7-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (27 commits)
        iommu/amd: Remove obsolete comment line
        dma-debug: Remove local BUS_NOTIFY_UNBOUND_DRIVER define
        iommu/amd: Fix possible use after free in get_irq_table()
        iommu/amd: Report irq remapping through IOMMU-API
        iommu/amd: Print message to system log when irq remapping is enabled
        iommu/irq: Use amd_iommu_irq_ops if supported
        iommu/amd: Make sure irq remapping still works on dma init failure
        iommu/amd: Add initialization routines for AMD interrupt remapping
        iommu/amd: Add call-back routine for HPET MSI
        iommu/amd: Implement MSI routines for interrupt remapping
        iommu/amd: Add IOAPIC remapping routines
        iommu/amd: Add routines to manage irq remapping tables
        iommu/amd: Add IRTE invalidation routine
        iommu/amd: Make sure IOMMU is not considered to translate itself
        iommu/amd: Split device table initialization into irq and dma part
        iommu/amd: Check if IOAPIC information is correct
        iommu/amd: Allocate data structures to keep track of irq remapping tables
        iommu/amd: Add slab-cache for irq remapping tables
        iommu/amd: Keep track of HPET and IOAPIC device ids
        iommu/amd: Fix features reporting
        ...
      c0703c12
  3. 07 Oct, 2012 2 commits
    • L
      Merge branch 'for-linus' of git://git.linaro.org/people/rmk/linux-arm · 0e51793e
      Linus Torvalds committed
      Pull ARM updates from Russell King:
       "This is the first chunk of ARM updates for this merge window.
        Conflicts are expected in two files - asm/timex.h and
        mach-integrator/integrator_cp.c.  Nothing particularly stands out more
        than anything else.
      
        Most of the growth is down to the opcodes stuff from Dave Martin,
        which is countered by Rob's patches to use more of the asm-generic
        headers on ARM."
      
      (A few more conflicts grew since then, but it all looked fairly trivial)
      
      * 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (44 commits)
        ARM: 7548/1: include linux/sched.h in syscall.h
        ARM: 7541/1: Add ARM ERRATA 775420 workaround
        ARM: ensure vm_struct has its phys_addr member filled in
        ARM: 7540/1: kexec: Check segment memory addresses
        ARM: 7539/1: kexec: scan for dtb magic in segments
        ARM: 7538/1: delay: add registration mechanism for delay timer sources
        ARM: 7536/1: smp: Formalize an IPI for wakeup
        ARM: 7525/1: ptrace: use updated syscall number for syscall auditing
        ARM: 7524/1: support syscall tracing
        ARM: 7519/1: integrator: convert platform devices to Device Tree
        ARM: 7518/1: integrator: convert AMBA devices to device tree
        ARM: 7517/1: integrator: initial device tree support
        ARM: 7516/1: plat-versatile: add DT support to FPGA IRQ
        ARM: 7515/1: integrator: check PL010 base address from resource
        ARM: 7514/1: integrator: call common init function from machine
        ARM: 7522/1: arch_timers: register a time/cycle counter
        ARM: 7523/1: arch_timers: enable the use of the virtual timer
        ARM: 7531/1: mark kernelmode mem{cpy,set} non-experimental
        ARM: 7520/1: Build dtb files in all target
        ARM: Fix build warning in arch/arm/mm/alignment.c
        ...
      0e51793e
    • L
      Merge branch 'next' of git://git.monstr.eu/linux-2.6-microblaze · 5cad3598
      Linus Torvalds authored
      Pull microblaze arch updates from Michal Simek.
      
      * 'next' of git://git.monstr.eu/linux-2.6-microblaze:
        Revert "microblaze_mmu_v2: Update signal returning address"
        microblaze: Added more support for PCI
        microblaze: Prefer to use pr_XXX instead of printk(KERN_XX)
        microblaze: Fix bug with passing command line
        microblaze: Remove PAGE properties duplication
        microblaze: Remove additional andi which has been already done
        microblaze: Use predefined macro for ESR_DIZ
        microblaze: Support 4k/16k/64k pages
        microblaze: Do not used hardcoded value in exception handler
        microblaze: Added fdt chosen capability for timer
        microblaze: Add support for ioreadXX/iowriteXX_rep
        microblaze: Improve failure handling for GPIO reset
        microblaze: clinkage.h
      5cad3598