1. 09 Oct 2012, 7 commits
    • mm: kill vma flag VM_EXECUTABLE and mm->num_exe_file_vmas · e9714acf
      Authored by Konstantin Khlebnikov
      Currently the kernel sets mm->exe_file during sys_execve() and then tracks
      the number of vmas with the VM_EXECUTABLE flag in mm->num_exe_file_vmas; as
      soon as this counter drops to zero, the kernel resets mm->exe_file to NULL.
      It also resets mm->exe_file at the last mmput(), when mm->mm_users drops to zero.
      
      A VMA with the VM_EXECUTABLE flag appears after mapping a file with the
      MAP_EXECUTABLE flag; such vmas can appear only at sys_execve() or after vma
      splitting, because sys_mmap ignores this flag.  Usually the binfmt module sets
      mm->exe_file and mmaps the executable vmas with this file, and they hold
      mm->exe_file while the task is running.
      
      A comment from v2.6.25-6245-g925d1c40 ("procfs task exe symlink"),
      where all of this was introduced:
      
      > The kernel implements readlink of /proc/pid/exe by getting the file from
      > the first executable VMA.  Then the path to the file is reconstructed and
      > reported as the result.
      >
      > Because of the VMA walk the code is slightly different on nommu systems.
      > This patch avoids separate /proc/pid/exe code on nommu systems.  Instead of
      > walking the VMAs to find the first executable file-backed VMA we store a
      > reference to the exec'd file in the mm_struct.
      >
      > That reference would prevent the filesystem holding the executable file
      > from being unmounted even after unmapping the VMAs.  So we track the number
      > of VM_EXECUTABLE VMAs and drop the new reference when the last one is
      > unmapped.  This avoids pinning the mounted filesystem.
      
      The exe_file vma accounting is hooked into every file mmap/munmap and vma
      split/merge just to handle the hypothetical case of an mm pinning a filesystem
      against unmounting after it has already unmapped all of its executable files
      but is still alive.
      
      It seems that currently nobody depends on this behaviour, so we can try to
      remove this logic and keep mm->exe_file until the final mmput().
      
      mm->exe_file is still protected with mm->mmap_sem, because we want to
      change it via the new sys_prctl(PR_SET_MM_EXE_FILE).  Via this syscall a
      task can also change its mm->exe_file and unpin the mountpoint explicitly
      (a small userspace sketch follows this entry).
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      e9714acf
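      For illustration, a minimal userspace sketch of the prctl interface referred
      to above, assuming the PR_SET_MM / PR_SET_MM_EXE_FILE options (Linux 3.5+).
      The target path is purely illustrative, the caller needs CAP_SYS_RESOURCE,
      and other era-specific restrictions may apply:

      /* Hedged sketch: repoint /proc/self/exe at another file via prctl(). */
      #include <fcntl.h>
      #include <stdio.h>
      #include <sys/prctl.h>
      #include <unistd.h>

      int main(void)
      {
              int fd = open("/usr/bin/true", O_RDONLY);   /* illustrative target */

              if (fd < 0) {
                      perror("open");
                      return 1;
              }
              /* PR_SET_MM selects an mm field; PR_SET_MM_EXE_FILE replaces
               * mm->exe_file with the file behind fd, unpinning the old mount. */
              if (prctl(PR_SET_MM, PR_SET_MM_EXE_FILE, fd, 0, 0) < 0)
                      perror("prctl(PR_SET_MM_EXE_FILE)");
              close(fd);
              return 0;
      }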
    • mm: kill vma flag VM_CAN_NONLINEAR · 0b173bc4
      Authored by Konstantin Khlebnikov
      Move actual pte filling for non-linear file mappings into the new special
      vma operation: ->remap_pages().
      
      A filesystem must implement this method to get non-linear mapping support;
      if it uses filemap_fault(), then generic_file_remap_pages() can be used
      (see the sketch after this entry).
      
      Now device drivers can implement this method and obtain nonlinear vma support.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>	#arch/tile
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      0b173bc4
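      For context, a hedged sketch of how a filesystem that already uses
      filemap_fault() can wire up the new operation; the myfs_* names are
      hypothetical, only filemap_fault() and generic_file_remap_pages() come
      from the commit above:

      #include <linux/fs.h>
      #include <linux/mm.h>

      /* Sketch of a 3.7-era filesystem advertising non-linear mapping support. */
      static const struct vm_operations_struct myfs_file_vm_ops = {
              .fault       = filemap_fault,            /* generic page-cache faults */
              .remap_pages = generic_file_remap_pages, /* non-linear remap support  */
      };

      static int myfs_file_mmap(struct file *file, struct vm_area_struct *vma)
      {
              vma->vm_ops = &myfs_file_vm_ops;
              return 0;
      }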
    • mm: kill vma flag VM_INSERTPAGE · 4b6e1e37
      Authored by Konstantin Khlebnikov
      Merge VM_INSERTPAGE into VM_MIXEDMAP.  VM_MIXEDMAP VMA can mix pure-pfn
      ptes, special ptes and normal ptes.
      
      Now copy_page_range() always copies a VM_MIXEDMAP VMA on fork, like
      VM_PFNMAP.  If a driver populates the whole VMA at mmap() time, it probably
      does not expect page faults.
      
      This patch removes the special check from vma_wants_writenotify() which
      disabled page write tracking for VMAs populated via vm_insert_page().
      The BDI backing the mapped file should not use dirty accounting; moreover,
      do_wp_page() can handle this case.
      
      vm_insert_page() still marks the vma after its first use.  Usually it is called
      from an f_op->mmap() handler under the mm->mmap_sem write-lock, so it is able to
      change vma->vm_flags.  The caller must set VM_MIXEDMAP at mmap time if it
      wants to call this function from other places, for example from a page-fault
      handler (see the sketch after this entry).
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      4b6e1e37
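      A hedged driver-style sketch of the pattern described above: set VM_MIXEDMAP
      in ->mmap() so that vm_insert_page() may later be called from the fault
      handler.  The mydrv_* names and the single backing page are illustrative only:

      #include <linux/fs.h>
      #include <linux/mm.h>

      static struct page *mydrv_page;   /* allocated elsewhere, e.g. alloc_page() */

      static int mydrv_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
      {
              /* Insert the backing page lazily; VM_MIXEDMAP was set at mmap time. */
              if (vm_insert_page(vma, (unsigned long)vmf->virtual_address, mydrv_page))
                      return VM_FAULT_SIGBUS;
              return VM_FAULT_NOPAGE;
      }

      static const struct vm_operations_struct mydrv_vm_ops = {
              .fault = mydrv_fault,
      };

      static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
      {
              vma->vm_flags |= VM_MIXEDMAP;   /* required before fault-time inserts */
              vma->vm_ops = &mydrv_vm_ops;
              return 0;
      }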
    • mm: introduce arch-specific vma flag VM_ARCH_1 · cc2383ec
      Authored by Konstantin Khlebnikov
      Combine several arch-specific vma flags into one.
      
      before patch:
      
              0x00000200      0x01000000      0x20000000      0x40000000
      x86     VM_NOHUGEPAGE   VM_HUGEPAGE     -               VM_PAT
      powerpc -               -               VM_SAO          -
      parisc  VM_GROWSUP      -               -               -
      ia64    VM_GROWSUP      -               -               -
      nommu   -               VM_MAPPED_COPY  -               -
      others  -               -               -               -
      
      after patch:
      
              0x00000200      0x01000000      0x20000000      0x40000000
      x86     -               VM_PAT          VM_HUGEPAGE     VM_NOHUGEPAGE
      powerpc -               VM_SAO          -               -
      parisc  -               VM_GROWSUP      -               -
      ia64    -               VM_GROWSUP      -               -
      nommu   -               VM_MAPPED_COPY  -               -
      others  -               VM_ARCH_1       -               -
      
      And voila! One completely free bit.
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      cc2383ec
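      A hedged sketch of how the per-architecture aliases for the shared bit can be
      expressed in <linux/mm.h>; the comments are illustrative, and the flag values
      follow the table in the message above:

      #define VM_ARCH_1       0x01000000      /* Architecture-specific flag */

      #if defined(CONFIG_X86)
      # define VM_PAT         VM_ARCH_1       /* PAT reserves whole VMA at once (x86) */
      #elif defined(CONFIG_PPC)
      # define VM_SAO         VM_ARCH_1       /* Strong Access Ordering (powerpc) */
      #elif defined(CONFIG_PARISC)
      # define VM_GROWSUP     VM_ARCH_1
      #elif defined(CONFIG_IA64)
      # define VM_GROWSUP     VM_ARCH_1
      #elif !defined(CONFIG_MMU)
      # define VM_MAPPED_COPY VM_ARCH_1       /* T if mapped copy of data (nommu) */
      #endif

      #ifndef VM_GROWSUP
      # define VM_GROWSUP     VM_NONE
      #endif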
    • mm, x86, pat: rework linear pfn-mmap tracking · b3b9c293
      Authored by Konstantin Khlebnikov
      Replace the generic vma-flag VM_PFN_AT_MMAP with x86-only VM_PAT.
      
      We can toss the mapping address from remap_pfn_range() into
      track_pfn_vma_new(), and collect all PAT-related logic together in
      arch/x86/.
      
      This patch also restores the original frustration-free is_cow_mapping() check
      in remap_pfn_range(), as it was before commit v2.6.28-rc8-88-g3c8bb73a
      ("x86: PAT: store vm_pgoff for all linear_over_vma_region mappings - v3").
      
      The is_linear_pfn_mapping() checks can be removed from mm/huge_memory.c,
      because that case is already handled by the VM_PFNMAP bit in the VM_NO_THP
      bit-mask.
      
      [suresh.b.siddha@intel.com: Reset the VM_PAT flag as part of untrack_pfn_vma()]
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b3b9c293
    • x86, pat: separate the pfn attribute tracking for remap_pfn_range and vm_insert_pfn · 5180da41
      Authored by Suresh Siddha
      With PAT enabled, vm_insert_pfn() looks up the existing pfn memory
      attribute and uses it.  Expectation is that the driver reserves the
      memory attributes for the pfn before calling vm_insert_pfn().
      
      remap_pfn_range() (when called for the whole vma) will set up a new
      attribute (based on the prot argument) for the specified pfn range.
      This addresses the legacy usage which typically calls remap_pfn_range()
      with a desired memory attribute.  For ranges smaller than the vma size
      (which is typically not the case), remap_pfn_range() will use the
      existing memory attribute for the pfn range.
      
      Expose two different APIs for these different behaviors:
      track_pfn_insert() for tracking the pfn attribute set by vm_insert_pfn(),
      and track_pfn_remap() for remap_pfn_range().
      
      This cleanup also prepares the ground for the track/untrack pfn vma
      routines to take over the ownership of setting PAT specific vm_flag in
      the 'vma'.
      
      [khlebnikov@openvz.org: Clear checks in track_pfn_remap()]
      [akpm@linux-foundation.org: tweak a few comments]
      Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Signed-off-by: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Venkatesh Pallipadi <venki@google.com>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Nick Piggin <npiggin@kernel.dk>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Carsten Otte <cotte@de.ibm.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Eric Paris <eparis@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: James Morris <james.l.morris@oracle.com>
      Cc: Jason Baron <jbaron@redhat.com>
      Cc: Kentaro Takeda <takedakn@nttdata.co.jp>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Matt Helsley <matthltc@us.ibm.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Suresh Siddha <suresh.b.siddha@intel.com>
      Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      5180da41
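      To ground the distinction, a hedged driver-side sketch of the two pfn-mapping
      paths whose tracking is separated here: remap_pfn_range() covering the whole
      vma at mmap time, versus vm_insert_pfn() at fault time.  The mydrv_* names and
      the pfn source are hypothetical, and the vma is assumed to be a VM_PFNMAP
      mapping whose memory type the driver has already reserved:

      #include <linux/fs.h>
      #include <linux/mm.h>

      static unsigned long mydrv_base_pfn;    /* set by the driver at probe time */

      /* Path 1: map everything up front; track_pfn_remap() can install the
       * requested memory attribute for the whole vma. */
      static int mydrv_mmap(struct file *file, struct vm_area_struct *vma)
      {
              return remap_pfn_range(vma, vma->vm_start, mydrv_base_pfn,
                                     vma->vm_end - vma->vm_start,
                                     vma->vm_page_prot);
      }

      /* Path 2: insert single pfns at fault time; track_pfn_insert() only looks
       * up the attribute already reserved for that pfn. */
      static int mydrv_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
      {
              unsigned long pfn = mydrv_base_pfn + vmf->pgoff;

              if (vm_insert_pfn(vma, (unsigned long)vmf->virtual_address, pfn))
                      return VM_FAULT_SIGBUS;
              return VM_FAULT_NOPAGE;
      }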
    • mm: remove __GFP_NO_KSWAPD · c6543459
      Authored by Rik van Riel
      When transparent huge pages were introduced, memory compaction and swap
      storms were an issue, and the kernel had to be careful to not make THP
      allocations cause pageout or compaction.
      
      Now that we have working compaction deferral, kswapd is smart enough to
      invoke compaction, and the quadratic behaviour around isolate_free_pages
      has been fixed, it should be safe to remove __GFP_NO_KSWAPD.
      
      [minchan@kernel.org: Comment fix]
      [mgorman@suse.de: Avoid direct reclaim for deferred compaction]
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      c6543459
  2. 06 Oct 2012, 1 commit
  3. 03 Oct 2012, 2 commits
  4. 29 Sep 2012, 1 commit
  5. 28 Sep 2012, 1 commit
  6. 27 Sep 2012, 4 commits
  7. 26 Sep 2012, 3 commits
  8. 25 Sep 2012, 7 commits
  9. 23 Sep 2012, 1 commit
  10. 21 Sep 2012, 2 commits
    • frontswap: support exclusive gets if tmem backend is capable · e3483a5f
      Authored by Dan Magenheimer
      Tmem, as originally specified, assumes that "get" operations
      performed on persistent pools never flush the page of data out
      of tmem on a successful get, waiting instead for a flush
      operation.  This is intended to mimic the model of a swap
      disk, where a disk read is non-destructive.  Unlike a
      disk, however, freeing up the RAM can be valuable.  Over
      the years that frontswap was in the review process, several
      reviewers (and notably Hugh Dickins in 2010) pointed out that
      this would result, at least temporarily, in two copies of the
      data in RAM: one (compressed for zcache) copy in tmem,
      and one copy in the swap cache.  We wondered if this could
      be done differently, at least optionally.
      
      This patch allows tmem backends to instruct the frontswap
      code that this backend performs exclusive gets.  Zcache2
      already contains hooks to support this feature.  Other
      backends are completely unaffected unless/until they are
      updated to support this feature.
      
      While it is not clear that exclusive gets are a performance
      win on all workloads at all times, this small patch allows for
      experimentation by backends.
      
      P.S. Let's not quibble about the naming of "get" vs "read" vs
      "load" etc.  The naming is currently horribly inconsistent between
      cleancache and frontswap and existing tmem backends, so will need
      to be straightened out as a separate patch.  "Get" is used
      by the tmem architecture spec, existing backends, and
      all documentation and presentation material so I am
      using it in this patch.
      Signed-off-by: Dan Magenheimer <dan.magenheimer@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      e3483a5f
    • mm: frontswap: fix a wrong if condition in frontswap_shrink · a00bb1e9
      Authored by Zhenzhong Duan
      pages_to_unuse is set to 0 to unuse all frontswap pages, but that does not
      happen, since a wrong condition in frontswap_shrink cancels it.
      
      -v2: Add comment to explain return value of __frontswap_shrink,
      as suggested by Dan Carpenter, thanks
      Signed-off-by: Zhenzhong Duan <zhenzhong.duan@oracle.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      a00bb1e9
  11. 19 Sep 2012, 2 commits
  12. 18 Sep 2012, 6 commits
  13. 15 Sep 2012, 1 commit
    • cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them · 8c7f6edb
      Authored by Tejun Heo
      Currently, cgroup hierarchy support is a mess.  cpu related subsystems
      behave correctly - configuration, accounting and control on a parent
      properly cover its children.  blkio and freezer completely ignore
      hierarchy and treat all cgroups as if they're directly under the root
      cgroup.  Others show yet different behaviors.
      
      These differing interpretations of the cgroup hierarchy make using cgroup
      confusing and make it impossible to co-mount controllers into the same
      hierarchy and obtain sane behavior.
      
      Eventually, we want full hierarchy support from all subsystems and
      probably a unified hierarchy.  Users relying on separate hierarchies and
      expecting completely different behaviors depending on the mounted
      subsystem is detrimental to making any progress on this front.
      
      This patch adds cgroup_subsys.broken_hierarchy and sets it to %true
      for controllers which are lacking in hierarchy support.  The goal of
      this patch is two-fold.
      
      * Move users away from using hierarchy on currently non-hierarchical
        subsystems, so that implementing proper hierarchy support on those
        doesn't surprise them.
      
      * Keep track of which controllers are broken how and nudge the
        subsystems to implement proper hierarchy support.
      
      For now, start with a single warning message.  We can whine louder
      later on.
      
      v2: Fixed a typo spotted by Michal. Warning message updated.
      
      v3: Updated memcg part so that it doesn't generate warning in the
          cases where .use_hierarchy=false doesn't make the behavior
          different from root.use_hierarchy=true.  Fixed a typo spotted by
          Glauber.
      
      v4: Check ->broken_hierarchy after cgroup creation is complete so that
          ->create() can affect the result per Michal.  Dropped unnecessary
          memcg root handling per Michal.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: Michal Hocko <mhocko@suse.cz>
      Acked-by: Li Zefan <lizefan@huawei.com>
      Acked-by: Serge E. Hallyn <serue@us.ibm.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Paul Turner <pjt@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Thomas Graf <tgraf@suug.ch>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
      8c7f6edb
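      A hedged sketch of what marking a controller looks like; "mysubsys" is
      hypothetical and its real operations (.create, .destroy, .subsys_id, ...)
      are omitted here, only the .broken_hierarchy field comes from this patch:

      #include <linux/cgroup.h>

      struct cgroup_subsys mysubsys_subsys = {
              .name                   = "mysubsys",

              /*
               * Hierarchy is not properly supported yet: child cgroups behave
               * as if they sat directly under the root.  With this flag set,
               * cgroup core prints a one-time warning when cgroups are nested
               * for this subsystem.
               */
              .broken_hierarchy       = true,
      };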
  14. 12 Sep 2012, 1 commit
    • slab: fix the DEADLOCK issue on l3 alien lock · 947ca185
      Authored by Michael Wang
      A DEADLOCK will be reported while running a kernel with NUMA and LOCKDEP
      enabled; the call chain behind this false report is:
      
      	   kmem_cache_free()	//free obj in cachep
      	-> cache_free_alien()	//acquire cachep's l3 alien lock
      	-> __drain_alien_cache()
      	-> free_block()
      	-> slab_destroy()
      	-> kmem_cache_free()	//free slab in cachep->slabp_cache
      	-> cache_free_alien()	//acquire cachep->slabp_cache's l3 alien lock
      
      Since cachep's and cachep->slabp_cache's l3 alien locks are in the same lock
      class, the false report is generated.
      
      This should not happen, since we already have init_lock_keys(), which
      reassigns the lock class for both the l3 list and l3 alien locks.
      
      However, init_lock_keys() was invoked at the wrong position, before we
      invoke enable_cpucache() on each cache.
      
      Until slab_state is set to FULL we do not invoke enable_cpucache() on caches
      to build their l3 alien caches while creating them, so although we invoked
      init_lock_keys(), the l3 alien lock classes did not change: the alien caches
      do not exist until enable_cpucache() is invoked later.
      
      This patch invokes init_lock_keys() after enable_cpucache() has been done,
      instead of before, to avoid the false DEADLOCK report (see the sketch after
      this entry).
      
      Michael traced the problem back to a commit in release 3.0.0:
      
      commit 30765b92
      Author: Peter Zijlstra <peterz@infradead.org>
      Date:   Thu Jul 28 23:22:56 2011 +0200
      
          slab, lockdep: Annotate the locks before using them
      
          Fernando found we hit the regular OFF_SLAB 'recursion' before we
          annotate the locks, cure this.
      
          The relevant portion of the stack-trace:
      
          > [    0.000000]  [<c085e24f>] rt_spin_lock+0x50/0x56
          > [    0.000000]  [<c04fb406>] __cache_free+0x43/0xc3
          > [    0.000000]  [<c04fb23f>] kmem_cache_free+0x6c/0xdc
          > [    0.000000]  [<c04fb2fe>] slab_destroy+0x4f/0x53
          > [    0.000000]  [<c04fb396>] free_block+0x94/0xc1
          > [    0.000000]  [<c04fc551>] do_tune_cpucache+0x10b/0x2bb
          > [    0.000000]  [<c04fc8dc>] enable_cpucache+0x7b/0xa7
          > [    0.000000]  [<c0bd9d3c>] kmem_cache_init_late+0x1f/0x61
          > [    0.000000]  [<c0bba687>] start_kernel+0x24c/0x363
          > [    0.000000]  [<c0bba0ba>] i386_start_kernel+0xa9/0xaf
      Reported-by: Fernando Lopez-Lezcano <nando@ccrma.Stanford.EDU>
      Acked-by: Pekka Enberg <penberg@kernel.org>
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
          Link: http://lkml.kernel.org/r/1311888176.2617.379.camel@laptop
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      
      The commit moved init_lock_keys() before we build up the alien caches, so we
      failed to reclass them.
      
      Cc: <stable@vger.kernel.org> # 3.0+
      Acked-by: Christoph Lameter <cl@linux.com>
      Tested-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Michael Wang <wangyun@linux.vnet.ibm.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      947ca185
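      A simplified, hedged sketch of the reordering in kmem_cache_init_late();
      the surrounding details are elided and the body is illustrative rather than
      the exact diff:

      void __init kmem_cache_init_late(void)
      {
              struct kmem_cache *cachep;

              slab_state = UP;

              /* init_lock_keys() used to be called here, before the alien caches
               * were built, so their locks were never reclassified. */

              /* Build per-cpu arrays and l3 alien caches for every cache. */
              mutex_lock(&slab_mutex);
              list_for_each_entry(cachep, &slab_caches, list)
                      if (enable_cpucache(cachep, GFP_NOWAIT))
                              BUG();
              mutex_unlock(&slab_mutex);

              /* Now the alien locks exist, so lockdep can be told that cachep's
               * and cachep->slabp_cache's alien locks are different classes. */
              init_lock_keys();

              slab_state = FULL;
              /* ... rest of kmem_cache_init_late() ... */
      }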
  15. 10 Sep 2012, 1 commit
    • slub: Zero initial memory segment for kmem_cache and kmem_cache_node · 9df53b15
      Authored by Christoph Lameter
      Tony Luck reported the following problem on IA-64:
      
        Worked fine yesterday on next-20120905, crashes today. First sign of
        trouble was an unaligned access, then a NULL dereference. SL*B related
        bits of my config:
      
        CONFIG_SLUB_DEBUG=y
        # CONFIG_SLAB is not set
        CONFIG_SLUB=y
        CONFIG_SLABINFO=y
        # CONFIG_SLUB_DEBUG_ON is not set
        # CONFIG_SLUB_STATS is not set
      
        And the console log:
      
        PID hash table entries: 4096 (order: 1, 32768 bytes)
        Dentry cache hash table entries: 262144 (order: 7, 2097152 bytes)
        Inode-cache hash table entries: 131072 (order: 6, 1048576 bytes)
        Memory: 2047920k/2086064k available (13992k code, 38144k reserved,
        6012k data, 880k init)
        kernel unaligned access to 0xca2ffc55fb373e95, ip=0xa0000001001be550
        swapper[0]: error during unaligned kernel access
         -1 [1]
        Modules linked in:
      
        Pid: 0, CPU 0, comm:              swapper
        psr : 00001010084a2018 ifs : 800000000000060f ip  :
        [<a0000001001be550>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
        ip is at new_slab+0x90/0x680
        unat: 0000000000000000 pfs : 000000000000060f rsc : 0000000000000003
        rnat: 9666960159966a59 bsps: a0000001001441c0 pr  : 9666960159965a59
        ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
        csd : 0000000000000000 ssd : 0000000000000000
        b0  : a0000001001be500 b6  : a00000010112cb20 b7  : a0000001011660a0
        f6  : 0fff7f0f0f0f0e54f0000 f7  : 0ffe8c5c1000000000000
        f8  : 1000d8000000000000000 f9  : 100068800000000000000
        f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
        r1  : a00000010155eef0 r2  : 0000000000000000 r3  : fffffffffffc1638
        r8  : e0000040600081b8 r9  : ca2ffc55fb373e95 r10 : 0000000000000000
        r11 : e000004040001646 r12 : a000000101287e20 r13 : a000000101280000
        r14 : 0000000000004000 r15 : 0000000000000078 r16 : ca2ffc55fb373e75
        r17 : e000004040040000 r18 : fffffffffffc1646 r19 : e000004040001646
        r20 : fffffffffffc15f8 r21 : 000000000000004d r22 : a00000010132fa68
        r23 : 00000000000000ed r24 : 0000000000000000 r25 : 0000000000000000
        r26 : 0000000000000001 r27 : a0000001012b8500 r28 : a00000010135f4a0
        r29 : 0000000000000000 r30 : 0000000000000000 r31 : 0000000000000001
        Unable to handle kernel NULL pointer dereference (address
        0000000000000018)
        swapper[0]: Oops 11003706212352 [2]
        Modules linked in:
      
        Pid: 0, CPU 0, comm:              swapper
        psr : 0000121008022018 ifs : 800000000000cc18 ip  :
        [<a0000001004dc8f1>]    Not tainted (3.6.0-rc4-zx1-smp-next-20120906)
        ip is at __copy_user+0x891/0x960
        unat: 0000000000000000 pfs : 0000000000000813 rsc : 0000000000000003
        rnat: 0000000000000000 bsps: 0000000000000000 pr  : 9666960159961765
        ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c0270033f
        csd : 0000000000000000 ssd : 0000000000000000
        b0  : a00000010004b550 b6  : a00000010004b740 b7  : a00000010000c750
        f6  : 000000000000000000000 f7  : 1003e9e3779b97f4a7c16
        f8  : 1003e0a00000010001550 f9  : 100068800000000000000
        f10 : 10005f0f0f0f0e54f0000 f11 : 1003e0000000000000078
        r1  : a00000010155eef0 r2  : a0000001012870b0 r3  : a0000001012870b8
        r8  : 0000000000000298 r9  : 0000000000000013 r10 : 0000000000000000
        r11 : 9666960159961a65 r12 : a000000101287010 r13 : a000000101280000
        r14 : a000000101287068 r15 : a000000101287080 r16 : 0000000000000298
        r17 : 0000000000000010 r18 : 0000000000000018 r19 : a000000101287310
        r20 : 0000000000000290 r21 : 0000000000000000 r22 : 0000000000000000
        r23 : a000000101386f58 r24 : 0000000000000000 r25 : 000000007fffffff
        r26 : a000000101287078 r27 : a0000001013c69b0 r28 : 0000000000000000
        r29 : 0000000000000014 r30 : 0000000000000000 r31 : 0000000000000813
      
      Sedat Dilek and Hugh Dickins reported similar problems as well.
      
      Earlier patches in the common set moved the zeroing of the kmem_cache
      structure into common code. See "Move allocation of kmem_cache into
      common code".
      
      The allocation of the two special structures is still done from SLUB-specific
      code, but no zeroing is done, since the cache creation functions used to do
      the zeroing.  This now needs to be updated so that the structures are
      zeroed during allocation in kmem_cache_init().  Otherwise random pointer
      values may be followed.
      Reported-by: Tony Luck <tony.luck@intel.com>
      Reported-by: Sedat Dilek <sedat.dilek@gmail.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Reported-by: Hugh Dickins <hughd@google.com>
      Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Christoph Lameter <cl@linux.com>
      Signed-off-by: Pekka Enberg <penberg@kernel.org>
      9df53b15
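      A hedged sketch of the kind of change the message describes, assuming the
      boot kmem_cache and kmem_cache_node structures are carved out of pages taken
      straight from the page allocator in kmem_cache_init(); the variable names and
      size computation are illustrative:

      void __init kmem_cache_init(void)
      {
              unsigned long kmalloc_size;
              int order;

              kmem_size = offsetof(struct kmem_cache, node) +
                              nr_node_ids * sizeof(struct kmem_cache_node *);

              /* Allocate the two boot kmem_caches from the page allocator.
               * __GFP_ZERO guarantees they start out zeroed, since the cache
               * creation code no longer clears them itself. */
              kmalloc_size = ALIGN(kmem_size, cache_line_size());
              order = get_order(2 * kmalloc_size);
              kmem_cache = (void *)__get_free_pages(GFP_NOWAIT | __GFP_ZERO, order);
              kmem_cache_node = (void *)kmem_cache + kmalloc_size;

              /* ... bootstrap of kmem_cache_node and kmem_cache continues ... */
      }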