  1. Feb 17, 2014: 3 commits
  2. Jan 29, 2014: 1 commit
  3. Jan 22, 2014: 32 commits
    • mm/migrate: remove unused function, fail_migrate_page() · 78d5506e
      Committed by Joonsoo Kim
      fail_migrate_page() isn't used anywhere, so remove it.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Acked-by: Christoph Lameter <cl@linux.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/migrate: remove putback_lru_pages, fix comment on putback_movable_pages · 59c82b70
      Committed by Joonsoo Kim
      Parts of putback_lru_pages() and putback_movable_pages() are
      duplicated, which makes it confusing which one we should use.  We can
      remove putback_lru_pages() since it is not really needed now.  This
      makes the code easier to understand and maintain.
      
      The comment on putback_movable_pages() is also stale now, so fix it.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reviewed-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Rafael Aquini <aquini@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: compaction: encapsulate defer reset logic · de6c60a6
      Committed by Vlastimil Babka
      Currently there are several functions to manipulate the deferred
      compaction state variables.  The remaining case where the variables are
      touched directly is when a successful allocation occurs in direct
      compaction, or is expected to be successful in the future by kswapd.
      Here, the lowest order that is expected to fail is updated, and in the
      case of successful allocation, the deferred status and counter are reset
      completely.
      
      Create a new function compaction_defer_reset() to encapsulate this
      functionality and make it easier to understand the code.  No functional
      change.
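      Below is a minimal user-space sketch of the deferred-compaction bookkeeping. It shows what a helper like compaction_defer_reset() has to encapsulate. `struct zone_model` and its fields are hypothetical simplifications of the per-zone counters. The helper names only mirror the kernel's, and the cap of 6 on the back-off shift is an assumption.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_DEFER_SHIFT 6 /* assumption: cap on the back-off exponent */

/* Hypothetical, simplified stand-in for the per-zone deferral state. */
struct zone_model {
	unsigned int considered;  /* attempts seen since the last failure */
	unsigned int defer_shift; /* skip up to 1 << defer_shift attempts */
	int order_failed;         /* lowest order expected to fail */
};

/* Compaction failed at this order: back off more aggressively. */
void defer_compaction(struct zone_model *z, int order)
{
	z->considered = 0;
	if (z->defer_shift < MAX_DEFER_SHIFT)
		z->defer_shift++;
	if (order < z->order_failed)
		z->order_failed = order;
}

/* Should compaction at this order be skipped for now? */
bool compaction_deferred(struct zone_model *z, int order)
{
	unsigned int limit = 1u << z->defer_shift;

	if (order < z->order_failed)
		return false;
	if (++z->considered > limit)
		z->considered = limit;
	return z->considered < limit;
}

/* The encapsulated reset: an allocation at this order succeeded (or is
 * expected to).  Move the lowest failing order past it and, on a real
 * success, clear the back-off state completely. */
void compaction_defer_reset(struct zone_model *z, int order, bool alloc_success)
{
	assert(order >= 0);
	if (alloc_success) {
		z->considered = 0;
		z->defer_shift = 0;
	}
	if (order >= z->order_failed)
		z->order_failed = order + 1;
}
```

      Both updates sit behind one function: raising the lowest failing order, and clearing the back-off on a real success. Callers therefore cannot update one field and forget the other.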
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: numa: limit scope of lock for NUMA migrate rate limiting · 1c5e9c27
      Committed by Mel Gorman
      NUMA migrate rate limiting protects a migration counter and window using
      a lock, but in some cases this can be a contended lock.  It is not
      critical that the number of pages be perfect; lost updates are
      acceptable.  Reduce the importance of this lock.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Alex Thorlton <athorlton@sgi.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memblock: add memblock memory allocation apis · 26f09e9b
      Committed by Santosh Shilimkar
      Introduce memblock memory allocation APIs which support the PAE or
      LPAE extensions on 32 bit archs, where the physical memory start address
      can be beyond 4GB.  In such cases, the existing bootmem APIs, which
      operate on 32 bit addresses, won't work; what is needed is the memblock
      layer, which operates
      on 64 bit addresses.
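      The small sketch below shows why 32 bit address arguments are not enough. It assumes a 64-bit `phys_addr_t`, as on 32-bit kernels built with PAE/LPAE support. It also uses a hypothetical `addr32_t` to stand in for the 32-bit `unsigned long` that a bootmem-style interface would take. An address above 4GB does not survive the narrowing.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: physical addresses are 64-bit, interface addresses 32-bit. */
typedef uint64_t phys_addr_t;
typedef uint32_t addr32_t; /* hypothetical: models a 32-bit unsigned long */

static_assert(sizeof(phys_addr_t) == 8, "phys_addr_t must be 64-bit here");

/* What a 32-bit interface would actually receive for a physical address. */
addr32_t as_32bit(phys_addr_t pa)
{
	return (addr32_t)pa; /* high bits are silently discarded */
}

/* True if the address survives a round trip through a 32-bit interface. */
int fits_32bit(phys_addr_t pa)
{
	return (phys_addr_t)as_32bit(pa) == pa;
}
```

      For example, a hypothetical memory start of 0x880000000 (above 4GB) would reach a 32-bit interface as 0x80000000, which is a different physical address.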
      
      So we add equivalent APIs so that we can replace usage of bootmem with
      memblock interfaces.  Architectures already converted to NO_BOOTMEM use
      these new memblock interfaces.  The architectures which are still not
      converted to NO_BOOTMEM continue to function as is because we still
      maintain the fallback option of a bootmem back-end supporting these new
      interfaces.  So no functional change as such.
      
      In the long run, once all architectures move to NO_BOOTMEM, we can get
      rid of the bootmem layer completely.  This is one step toward removing
      the core code's dependency on bootmem, and it also gives architectures a
      path to move away from bootmem.
      
      The proposed interface will become active if both CONFIG_HAVE_MEMBLOCK
      and CONFIG_NO_BOOTMEM are specified by the arch.  In the
      !CONFIG_NO_BOOTMEM case, the memblock() wrappers will fall back to the
      existing bootmem APIs so that arches not converted to NO_BOOTMEM
      continue to work as is.
      
      The meaning of MEMBLOCK_ALLOC_ACCESSIBLE and MEMBLOCK_ALLOC_ANYWHERE
      is kept the same.
      
      [akpm@linux-foundation.org: s/depricated/deprecated/]
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memblock: switch to use NUMA_NO_NODE instead of MAX_NUMNODES · b1154233
      Committed by Grygorii Strashko
      It's recommended to use NUMA_NO_NODE everywhere to select "process any
      node" behavior or to indicate that "no node id specified".
      
      Hence, update the __next_free_mem_range*() APIs to accept both NUMA_NO_NODE
      and MAX_NUMNODES, but emit a warning once on MAX_NUMNODES, and correct
      the corresponding APIs' documentation to describe the new behavior.  Also,
      update other memblock/nobootmem APIs where MAX_NUMNODES is used
      directly.
      
      The change was suggested by Tejun Heo.
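      Below is a sketch of the accept-both, warn-once behavior described above. `normalize_nid()`, `node_matches()` and the warning text are hypothetical. `MAX_NUMNODES` is given an assumed value. `NUMA_NO_NODE` is -1, as in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define NUMA_NO_NODE (-1) /* the kernel's "no node specified" value */
#define MAX_NUMNODES 64   /* assumption: the real value depends on NODES_SHIFT */

int deprecated_nid_warnings; /* how many warnings were actually printed */

/* Accept both spellings of "any node", warn only once on the deprecated
 * MAX_NUMNODES, and hand back the canonical NUMA_NO_NODE. */
int normalize_nid(int nid)
{
	assert(nid >= NUMA_NO_NODE && nid <= MAX_NUMNODES);
	if (nid == MAX_NUMNODES) {
		if (deprecated_nid_warnings == 0) {
			deprecated_nid_warnings++;
			fprintf(stderr, "warning: MAX_NUMNODES as 'any node' is deprecated; "
					"use NUMA_NO_NODE\n");
		}
		return NUMA_NO_NODE;
	}
	return nid;
}

/* Does a region on node region_nid satisfy a request for node nid? */
bool node_matches(int region_nid, int nid)
{
	nid = normalize_nid(nid);
	return nid == NUMA_NO_NODE || region_nid == nid;
}
```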
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memblock: reorder parameters of memblock_find_in_range_node · 87029ee9
      Committed by Grygorii Strashko
      Reorder parameters of memblock_find_in_range_node to be consistent with
      other memblock APIs.
      
      The change was suggested by Tejun Heo <tj@kernel.org>.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/bootmem: remove duplicated declaration of __free_pages_bootmem() · 10e89523
      Committed by Grygorii Strashko
      __free_pages_bootmem() is used internally by the MM core and is already
      declared in internal.h, so remove the duplicated declaration.
      Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com>
      Signed-off-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
      Reviewed-by: Tejun Heo <tj@kernel.org>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: H. Peter Anvin <hpa@zytor.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: Paul Walmsley <paul@pwsan.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Tony Lindgren <tony@atomide.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • introduce for_each_thread() to replace the buggy while_each_thread() · 0c740d0a
      Committed by Oleg Nesterov
      while_each_thread() and next_thread() should die; almost every lockless
      usage is wrong.
      
      1. Unless g == current, the lockless while_each_thread() is not safe.
      
         while_each_thread(g, t) can loop forever if g exits: next_thread()
         can't reach the unhashed thread in this case. Note that this can
         happen even if g is the group leader, since it can exec.
      
      2. Even if while_each_thread() itself were correct, people often use
         it wrongly.
      
         It was never safe to just take rcu_read_lock() and loop unless
         you verify that pid_alive(g) == T; even the first next_thread()
         can point to already freed/reused memory.
      
      This patch adds signal_struct->thread_head and task->thread_node to
      create the normal rcu-safe list with the stable head.  The new
      for_each_thread(g, t) helper is always safe under rcu_read_lock() as
      long as this task_struct can't go away.
      
      Note: of course it is ugly to have both task_struct->thread_node and the
      old task_struct->thread_group; we will kill thread_group later, after we
      change the users of while_each_thread() to use for_each_thread().
      
      Perhaps we can kill it even before we convert all users: we can
      reimplement next_thread(t) using the new thread_head/thread_node.  But
      we can't do this right now because this will lead to subtle behavioural
      changes.  For example, do/while_each_thread() always sees at least one
      task, while for_each_thread() can do nothing if the whole thread group
      has died.  Or take thread_group_empty(): currently its semantics are not
      clear unless thread_group_leader(p), and we need to audit the callers
      before we can change it.
      
      So this patch adds the new interface, which has to coexist with the old
      one for some time; hopefully the next changes will be more or less
      straightforward and the old one will go away soon.
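      The difference between the two iteration styles can be modeled in user space. This sketch assumes a simplified `struct task` that carries both the old headless `thread_group` ring and the new `thread_node` entry on a stable `thread_head`. All type and helper names are hypothetical. Removal keeps the removed node's `next` pointer, as RCU-safe deletion does. The sketch is single-threaded and takes no RCU locks. It therefore only illustrates termination and the "at least one task" behavior, not the concurrency rules.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list in the style of list_head. */
struct list_node { struct list_node *next, *prev; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

void list_init(struct list_node *h) { h->next = h->prev = h; }

void list_add_tail(struct list_node *n, struct list_node *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* RCU-style removal: neighbours skip n, but n->next is left intact so a
 * reader currently standing on n can still move forward. */
void list_del_keep_next(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

struct task {
	int pid;
	struct list_node thread_group; /* old: headless ring of threads */
	struct list_node thread_node;  /* new: linked on signal->thread_head */
};

struct signal_model {
	struct list_node thread_head;  /* stable head that outlives the threads */
};

void group_init(struct signal_model *sig, struct task *leader)
{
	list_init(&sig->thread_head);
	list_init(&leader->thread_group);
	list_add_tail(&leader->thread_node, &sig->thread_head);
}

void group_add(struct signal_model *sig, struct task *leader, struct task *t)
{
	list_add_tail(&t->thread_group, &leader->thread_group);
	list_add_tail(&t->thread_node, &sig->thread_head);
}

void thread_exit(struct task *t)
{
	list_del_keep_next(&t->thread_group);
	list_del_keep_next(&t->thread_node);
}

/* Old pattern: do { ... } while ((t = next_thread(t)) != g).  Always visits
 * at least one thread; never returns to g once g is unhashed, so the walk
 * is capped at max_steps and -1 means "would loop forever". */
int count_while_each(struct task *g, int max_steps)
{
	struct task *t = g;
	int n = 0;

	assert(max_steps > 0);
	do {
		if (++n > max_steps)
			return -1;
		t = container_of(t->thread_group.next, struct task, thread_group);
	} while (t != g);
	return n;
}

/* New pattern: walk from the stable head; always ends at the head, no
 * matter which threads have exited, and may visit no thread at all. */
int count_for_each(struct signal_model *sig)
{
	int n = 0;

	for (struct list_node *p = sig->thread_head.next; p != &sig->thread_head; p = p->next)
		n++;
	return n;
}
```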
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Reviewed-by: Sergey Dyasly <dserrg@gmail.com>
      Tested-by: Sergey Dyasly <dserrg@gmail.com>
      Reviewed-by: Sameer Nanda <snanda@chromium.org>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Mandeep Singh Baines <msb@chromium.org>
      Cc: "Ma, Xindong" <xindong.ma@intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Cc: "Tu, Xiaobing" <xiaobing.tu@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: use rmap_walk() in page_referenced() · 9f32624b
      Committed by Joonsoo Kim
      Now we have infrastructure in rmap_walk() to handle the differences
      between the variants of the rmap traversing functions.
      
      So, just use it in page_referenced().
      
      In this patch, I change the following things.
      
      1. remove some variants of rmap traversing functions.
      	cf> page_referenced_ksm, page_referenced_anon,
      	page_referenced_file
      
      2. introduce new struct page_referenced_arg and pass it to
         page_referenced_one(), the main function of rmap_walk, in order to
         count references, store vm_flags, and check the finish condition.
      
      3. mechanical change to use rmap_walk() in page_referenced().
      
      [liwanp@linux.vnet.ibm.com: fix BUG at rmap_walk]
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: use rmap_walk() in try_to_munlock() · e8351ac9
      Committed by Joonsoo Kim
      Now we have infrastructure in rmap_walk() to handle the differences
      between the variants of the rmap traversing functions.
      
      So, just use it in try_to_munlock().
      
      In this patch, I change the following things.
      
      1. remove some variants of rmap traversing functions.
      	cf> try_to_unmap_ksm, try_to_unmap_anon, try_to_unmap_file
      2. mechanical change to use rmap_walk() in try_to_munlock().
      3. copy and paste comments.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: use rmap_walk() in try_to_unmap() · 52629506
      Committed by Joonsoo Kim
      Now we have infrastructure in rmap_walk() to handle the differences
      between the variants of the rmap traversing functions.
      
      So, just use it in try_to_unmap().
      
      In this patch, I change the following things.
      
      1. enable rmap_walk() if !CONFIG_MIGRATION.
      2. mechanical change to use rmap_walk() in try_to_unmap().
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: extend rmap_walk_xxx() to cope with different cases · 0dd1c7bb
      Committed by Joonsoo Kim
      There are a lot of common parts in the traversing functions, but there
      are also a few parts that differ.  By assigning the proper function
      pointers in each rmap_walk_control, we can handle these differences
      correctly.
      
      The following are the differences we should handle.
      
      1. difference of lock function in anon mapping case
      2. nonlinear handling in file mapping case
      3. prechecked condition:
      	checking memcg in page_referenced(),
      	checking VM_SHARED in page_mkclean()
      	checking temporary vma in try_to_unmap()
      4. exit condition:
      	checking page_mapped() in try_to_unmap()
      
      So, in this patch, I introduce 4 function pointers to handle the above
      differences.
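      Below is a user-space sketch of the hook-based walk. It assumes hypothetical `struct page_model`/`struct vma_model` types. Its `struct walk_control` mirrors only three of the hooks (`rmap_one`, `invalid_vma`, `done`); the anon locking and nonlinear hooks are omitted. Two callers reuse the same `walk()`. A page_referenced()-like caller stops once every mapping has been visited. A page_mkclean()-like caller uses the precheck to skip private mappings.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a page and the VMAs that map it. */
struct vma_model { bool shared; bool referenced; };
struct page_model { struct vma_model *vmas; size_t nr_vmas; int mapcount; };

enum { WALK_CONTINUE, WALK_STOP };

/* Modeled on the rmap_walk_control idea: the traversal is written once
 * and each caller plugs in its differences as hooks. */
struct walk_control {
	void *arg;                                                     /* caller state */
	int (*rmap_one)(struct page_model *, struct vma_model *, void *); /* per-mapping work */
	bool (*invalid_vma)(struct vma_model *, void *);               /* precheck, optional */
	bool (*done)(struct page_model *);                             /* exit check, optional */
};

void walk(struct page_model *page, struct walk_control *wc)
{
	assert(wc->rmap_one != NULL);
	for (size_t i = 0; i < page->nr_vmas; i++) {
		struct vma_model *vma = &page->vmas[i];

		if (wc->invalid_vma && wc->invalid_vma(vma, wc->arg))
			continue;
		if (wc->rmap_one(page, vma, wc->arg) != WALK_CONTINUE)
			break;
		if (wc->done && wc->done(page))
			break;
	}
}

/* page_referenced()-like caller: count referencing mappings and stop once
 * every mapping of the page has been seen. */
struct referenced_arg { int mapcount; int referenced; };

int referenced_one(struct page_model *page, struct vma_model *vma, void *arg)
{
	struct referenced_arg *ra = arg;

	(void)page;
	if (vma->referenced)
		ra->referenced++;
	return --ra->mapcount > 0 ? WALK_CONTINUE : WALK_STOP;
}

int page_referenced_model(struct page_model *page)
{
	struct referenced_arg ra = { .mapcount = page->mapcount, .referenced = 0 };
	struct walk_control wc = { .arg = &ra, .rmap_one = referenced_one };

	walk(page, &wc);
	return ra.referenced;
}

/* page_mkclean()-like caller: the precheck hook filters out private mappings. */
bool skip_private(struct vma_model *vma, void *arg) { (void)arg; return !vma->shared; }

int count_one(struct page_model *page, struct vma_model *vma, void *arg)
{
	(void)page;
	(void)vma;
	(*(int *)arg)++;
	return WALK_CONTINUE;
}

int count_shared_mappings(struct page_model *page)
{
	int n = 0;
	struct walk_control wc = { .arg = &n, .rmap_one = count_one,
				   .invalid_vma = skip_private };

	walk(page, &wc);
	return n;
}
```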
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/rmap: make rmap_walk to get the rmap_walk_control argument · 051ac83a
      Committed by Joonsoo Kim
      In each rmap traversal case there are some differences, so we need
      function pointers, and arguments to them, in order to handle these
      differences.
      
      For this purpose, struct rmap_walk_control is introduced in this patch,
      and will be extended in the following patch.  Introducing and extending
      are separated because this makes the changes clearer.
      Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock, mem_hotplug: make memblock skip hotpluggable regions if needed · 55ac590c
      Committed by Tang Chen
      The Linux kernel cannot migrate pages used by the kernel.  As a result,
      hotpluggable memory used by the kernel cannot be hot-removed.
      To solve this problem, the basic idea is to prevent memblock from
      allocating hotpluggable memory for the kernel during early boot, and to
      arrange all hotpluggable memory in the ACPI SRAT (System Resource
      Affinity Table) as ZONE_MOVABLE when initializing zones.
      
      In the previous patches, we have marked hotpluggable memory regions with
      MEMBLOCK_HOTPLUG flag in memblock.memory.
      
      In this patch, we make memblock skip these hotpluggable memory regions
      in the default top-down allocation function if movable_node boot option
      is specified.
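      Below is a minimal sketch of a top-down search that honors the hotplug flag. It assumes a sorted array of hypothetical `struct region` entries and a global `movable_node_enabled` that stands in for the boot option. It returns 0 on failure. The flag's value is an assumption; only its name follows the commit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define MEMBLOCK_HOTPLUG 0x1 /* name as in the commit; value is an assumption */

struct region { phys_addr_t base, size; unsigned long flags; };

bool movable_node_enabled; /* hypothetical stand-in for the boot option */

/* Walk regions from the highest address down and return the highest
 * align-aligned address where `size` bytes fit inside [start, end).
 * With movable_node set, hotpluggable regions are skipped so early kernel
 * allocations stay out of memory that may later be removed.
 * `regions` must be sorted by base; `align` must be a power of two. */
phys_addr_t find_top_down(const struct region *regions, size_t n,
			  phys_addr_t size, phys_addr_t align,
			  phys_addr_t start, phys_addr_t end)
{
	assert(align && (align & (align - 1)) == 0);
	for (size_t i = n; i-- > 0;) {
		const struct region *r = &regions[i];

		if (movable_node_enabled && (r->flags & MEMBLOCK_HOTPLUG))
			continue;
		phys_addr_t lo = r->base > start ? r->base : start;
		phys_addr_t hi = r->base + r->size < end ? r->base + r->size : end;
		if (hi <= lo || hi - lo < size)
			continue;
		phys_addr_t cand = (hi - size) & ~(align - 1);
		if (cand >= lo)
			return cand;
	}
	return 0; /* nothing suitable found */
}
```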
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Committed by Tang Chen
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock, mem_hotplug: introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions · 66b16edf
      Committed by Tang Chen
      In find_hotpluggable_memory, once we find a memory region that is
      hotpluggable, we want to mark it in memblock.memory, so that we can
      later prevent the memblock allocator from allocating hotpluggable
      memory for the kernel.
      
      To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the
      hotpluggable memory regions in memblock and a function
      memblock_mark_hotplug() to mark hotpluggable memory if we find one.
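      Below is a simplified sketch of marking hotpluggable ranges. It assumes a hypothetical `struct region` layout with a `flags` word and an assumed flag value. A fuller implementation would presumably split regions at the range boundaries first. This version only marks regions that lie entirely inside the range and leaves partial overlaps untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint64_t phys_addr_t;

#define MEMBLOCK_HOTPLUG 0x1 /* name as in the commit; value is an assumption */

struct region { phys_addr_t base, size; unsigned long flags; };

/* Set MEMBLOCK_HOTPLUG on every region lying entirely inside
 * [base, base + size).  Simplification: regions that only partially
 * overlap the range are not split and stay unmarked.
 * Returns the number of regions marked. */
size_t mark_hotplug(struct region *regions, size_t n,
		    phys_addr_t base, phys_addr_t size)
{
	phys_addr_t end = base + size;
	size_t marked = 0;

	assert(end >= base); /* reject ranges that wrap around */
	for (size_t i = 0; i < n; i++) {
		struct region *r = &regions[i];

		if (r->base >= base && r->base + r->size <= end) {
			r->flags |= MEMBLOCK_HOTPLUG;
			marked++;
		}
	}
	return marked;
}
```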
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memblock, numa: introduce flags field into memblock · 66a20757
      Committed by Tang Chen
      There is no flag in memblock to describe what type the memory is.
      Sometimes, we may use memblock to reserve some memory for special usage,
      and we want to know what kind of memory it is.  So we need a way to
      record this.
      
      In a hotplug environment, we want to reserve hotpluggable memory so the
      kernel won't be able to use it.  And when the system is up, we have to
      free this hotpluggable memory to the buddy allocator.  So we need to
      mark this memory first.
      
      In order to do so, we need to mark out this special memory in memblock.
      In this patch, we introduce a new "flags" member into memblock_region:
      
         struct memblock_region {
                 phys_addr_t base;
                 phys_addr_t size;
                 unsigned long flags;		/* This is new. */
         #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
                 int nid;
         #endif
         };
      
      This patch does the following things:
      1) Add "flags" member to memblock_region.
      2) Modify the following APIs' prototype:
      	memblock_add_region()
      	memblock_insert_region()
      3) Add memblock_reserve_region() to support reserving memory with flags, and keep
         memblock_reserve()'s prototype unmodified.
      4) Modify other APIs to support flags, but keep their prototypes unmodified.
      
      The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.
      Suggested-by: Wen Congyang <wency@cn.fujitsu.com>
      Suggested-by: Liu Jiang <jiang.liu@huawei.com>
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add overcommit_kbytes sysctl variable · 49f0ce5f
      Committed by Jerome Marchand
      Some applications that run on HPC clusters are designed around the
      availability of RAM, and the overcommit ratio is fine-tuned to get the
      maximum usage of memory without swapping.  With growing memory, the
      1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
      for these workloads (on a 2TB machine it represents no less than 20GB).
      
      This patch adds the new overcommit_kbytes sysctl variable, which allows a
      much finer grain.
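      The arithmetic behind the 2TB example can be sketched as follows. The sketch shows how a commit limit could be derived from either knob in strict overcommit mode (`vm.overcommit_memory = 2`). It assumes 4 KiB pages. It also assumes that a nonzero `overcommit_kbytes` replaces the ratio-based limit rather than combining with it. Swap is added in both cases. The function name is hypothetical.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Commit limit in pages.  A nonzero overcommit_kbytes takes precedence
 * over overcommit_ratio; swap is added in either case. */
unsigned long commit_limit_pages(unsigned long totalram_pages,
				 unsigned long hugetlb_pages,
				 unsigned long swap_pages,
				 unsigned long overcommit_ratio,
				 unsigned long overcommit_kbytes)
{
	unsigned long allowed;

	assert(hugetlb_pages <= totalram_pages);
	if (overcommit_kbytes)
		/* kB -> pages: one page is 2^(PAGE_SHIFT - 10) kB */
		allowed = overcommit_kbytes >> (PAGE_SHIFT - 10);
	else
		allowed = (totalram_pages - hugetlb_pages) * overcommit_ratio / 100;

	return allowed + swap_pages;
}
```

      With 2TB of RAM (2^29 pages of 4 KiB), a ratio of 1 yields 5368709 pages, or about 20.5 GiB. That is the smallest nonzero step the ratio can express. A kbytes value is rounded down to whole pages.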
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: Jerome Marchand <jmarchan@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT · aec6a888
      Committed by Mel Gorman
      Commit 4b59e6c4 ("mm, show_mem: suppress page counts in
      non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to
      suppress PFN walks on large memory machines.  Commit c78e9363 ("mm:
      do not walk all of system memory during show_mem") avoided a PFN walk in
      the generic show_mem helper which removes the requirement for
      SHOW_MEM_FILTER_PAGE_COUNT in that case.
      
      This patch removes PFN walkers from the arch-specific implementations
      that report on a per-node or per-zone granularity.  ARM and unicore32
      still do a PFN walk as they report memory usage on each bank which is a
      much finer granularity where the debugging information may still be of
      use.  As the remaining arches doing PFN walks have relatively small
      amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT.
      
      [akpm@linux-foundation.org: fix parisc]
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      Acked-by: David Rientjes <rientjes@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm, mempolicy: remove unneeded functions for UMA configs · d80be7c7
      Committed by David Rientjes
      Mempolicies only exist for CONFIG_NUMA configurations.  Therefore, a
      certain class of functions is unneeded in configurations where
      CONFIG_NUMA is disabled, such as functions that duplicate existing
      mempolicies, look up existing policies, set certain mempolicy traits, or
      test mempolicies for certain attributes.
      
      Remove the unneeded functions so that any future callers get a
      compile-time error and protect their code with CONFIG_NUMA as required.
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: create a separate slab for page->ptl allocation · b35f1819
      Committed by Kirill A. Shutemov
      If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled, spinlock_t on x86_64
      is 72 bytes.  For page->ptl, these locks are allocated from the kmalloc-96
      slab, so we lose 24 bytes on each.  An average system can easily allocate
      a few tens of thousands of page->ptl locks, so the overhead is significant.
      
      Let's create a separate slab for page->ptl allocation to solve this.
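      The per-lock waste quoted above can be reproduced with a small sketch. It assumes a table of generic kmalloc size classes; the table is a subset that includes the kmalloc-96 and kmalloc-192 caches. The function names are hypothetical. The sketch ignores debug metadata, alignment and per-slab overhead, so real savings differ. With debugging enabled, the slabinfo output in this commit shows kmalloc-96 objects actually occupying 128 bytes.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: a subset of the generic kmalloc size classes. */
static const size_t kmalloc_classes[] = { 8, 16, 32, 64, 96, 128, 192, 256 };

/* Smallest generic class that fits `size`, or 0 if none in this table. */
size_t kmalloc_class(size_t size)
{
	for (size_t i = 0; i < sizeof(kmalloc_classes) / sizeof(kmalloc_classes[0]); i++)
		if (size <= kmalloc_classes[i])
			return kmalloc_classes[i];
	return 0;
}

/* Bytes wasted when `count` objects of `size` bytes come from a generic
 * cache rather than a dedicated cache sized exactly for them. */
size_t slack_bytes(size_t size, size_t count)
{
	size_t slot = kmalloc_class(size);

	assert(slot != 0);
	return (slot - size) * count;
}
```

      For example, the 27516 active page->ptl objects in the "After" output would waste 27516 * 24 = 660384 bytes if they still came from kmalloc-96.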
      
      To make sure that it really works this time, some numbers from my test
      machine (just booted, no load):
      
      Before:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
      After:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
        kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0
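      The page cost of each cache can be read off these lines: in the
      /proc/slabinfo 2.1 layout, memory in slabs is num_slabs * pagesperslab.
      A small parser sketch (slab_pages() is a hypothetical helper) applied
      to the three lines above:

```c
#include <assert.h>
#include <stdio.h>

/* Pages backing one cache, from a /proc/slabinfo (version 2.1) line:
 * name active_objs num_objs objsize objperslab pagesperslab
 *   : tunables limit batchcount sharedfactor
 *   : slabdata active_slabs num_slabs sharedavail
 * Returns num_slabs * pagesperslab, or -1 if the line does not parse. */
static long slab_pages(const char *line)
{
    char name[64];
    unsigned long active_objs, num_objs, objsize, objperslab, pagesperslab;
    unsigned long active_slabs, num_slabs, sharedavail;

    int n = sscanf(line,
                   "%63s %lu %lu %lu %lu %lu : tunables %*u %*u %*u"
                   " : slabdata %lu %lu %lu",
                   name, &active_objs, &num_objs, &objsize, &objperslab,
                   &pagesperslab, &active_slabs, &num_slabs, &sharedavail);
    if (n != 9)
        return -1;
    return (long)(num_slabs * pagesperslab);
}
```

      Before the patch kmalloc-96 occupies 1073 pages; afterwards page->ptl
      plus kmalloc-96 occupy 531 + 176 = 707 pages, a saving of 366 pages
      (roughly 1.4 MiB, assuming 4 KiB pages) on this idle machine.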
      
      Note that the patch is useful not only for the debug case but also for
      PREEMPT_RT, where spinlock_t is always bloated.
      Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve · 943dca1a
      Yasuaki Ishimatsu committed
      Yasuaki Ishimatsu reported that memory hot-add took more than 5 _hours_
      on a 9TB machine because onlining memory sections is too slow.  We then
      found that setup_zone_migrate_reserve accounted for >90% of that time.
      
      The problem is that setup_zone_migrate_reserve scans all pageblocks
      unconditionally, but this is only necessary if the number of reserved
      blocks was reduced (i.e. memory hot-remove).
      
      Moreover, the maximum MIGRATE_RESERVE per zone is currently 2, which
      means that the number of reserved pageblocks almost never changes.
      
      This patch adds zone->nr_migrate_reserve_block to track the number of
      MIGRATE_RESERVE pageblocks, which reduces the overhead of
      setup_zone_migrate_reserve dramatically.  The following table shows the
      time taken to online a memory section.
      
        Amount of memory          | 128GB | 192GB | 256GB |
        ---------------------------------------------------
        linux-3.12                |  23.9 |  31.4 |  44.5 |
        This patch                |   8.3 |   8.3 |   8.6 |
        Mel's proposal patch (*1) |  10.9 |  19.2 |  31.3 |
        ---------------------------------------------------
                                     (time in milliseconds)
      
        128GB : 4 nodes and each node has 32GB of memory
        192GB : 6 nodes and each node has 32GB of memory
        256GB : 8 nodes and each node has 32GB of memory
      
        (*1) Mel proposed his idea in the following thread:
             https://lkml.org/lkml/2013/10/30/272
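      The early exit that zone->nr_migrate_reserve_block enables can be
      sketched as follows.  struct zone_model, setup_reserve_model() and the
      pageblock_scans counter are hypothetical simplifications; only the cap
      of 2 reserve blocks and the "skip when unchanged" idea come from the
      text above.

```c
#include <assert.h>

/* Hypothetical, heavily simplified zone for illustration only. */
struct zone_model {
    int nr_migrate_reserve_block;   /* reserve pageblocks currently marked */
    unsigned long pageblock_scans;  /* number of full pageblock walks done */
};

/* Cap the reserve at 2 pageblocks and skip the expensive walk whenever
 * the reserve count is unchanged, the common case on memory hot-add. */
static void setup_reserve_model(struct zone_model *zone, int wanted)
{
    int reserve = wanted < 2 ? wanted : 2;

    if (reserve == zone->nr_migrate_reserve_block)
        return;
    zone->nr_migrate_reserve_block = reserve;

    /* Stand-in for scanning every pageblock in the zone. */
    zone->pageblock_scans++;
}
```

      Repeated onlining that keeps the reserve at 2 costs a comparison rather
      than a walk over every pageblock.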
      
      [akpm@linux-foundation.org: tweak comment]
      Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Tested-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail() · 5eaf1a9e
      Oleg Nesterov committed
      get_huge_page_tail()->compound_head() looks confusing.  Every caller
      must check PageTail(page); otherwise atomic_inc(&page->_mapcount) is
      simply wrong if this page is compound-trans-head.
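      A minimal userspace model of the new contract: rather than quietly
      resolving the head page, the function insists it was handed a tail
      page.  struct page_model and get_huge_page_tail_model() are
      hypothetical; assert() stands in for BUG_ON().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical minimal page model; not the kernel's struct page. */
struct page_model {
    bool tail;      /* stands in for PageTail() */
    int mapcount;   /* stands in for page->_mapcount */
};

/* Callers must pass a tail page; bumping the mapcount of a head page here
 * would be wrong, so fail loudly instead of papering over it. */
static void get_huge_page_tail_model(struct page_model *page)
{
    assert(page->tail);     /* models BUG_ON(!PageTail(page)) */
    page->mapcount++;       /* models atomic_inc(&page->_mapcount) */
}
```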
      Signed-off-by: Oleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: tail page refcounting optimization for slab and hugetlbfs · 44518d2b
      Andrea Arcangeli committed
      This skips the _mapcount mangling for slab and hugetlbfs pages.
      
      The main difficulty is guaranteeing that PageSlab and PageHeadHuge
      remain constant for every get_page/put_page run on the tail of a slab
      or hugetlbfs compound page.  Otherwise, if they are set during get_page
      but not during put_page, the _mapcount of the tail page would
      underflow.
      
      PageHeadHuge remains true until the compound page is released and
      enters the buddy allocator, so it cannot change even if the tail page
      holds the last reference left on the page.
      
      PG_slab, instead, is cleared before the slab frees the head page with
      put_page, so if the tail pin were released after the slab freed the
      page, we would have a problem.  But in the slab case the tail pin
      cannot be the last reference left on the page.  This is because the
      slab code is free to reuse the compound page after a
      kfree/kmem_cache_free without checking whether any tail pin is left.
      In turn, all tail pins must always be released while the head is still
      pinned by the slab code, so we know PG_slab will still be set too.
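      Why the flags must not change between get and put can be shown with a
      small model.  struct compound_model, tail_get() and tail_put() are
      hypothetical simplifications of the tail-page paths; the single flag
      stands in for PageSlab/PageHeadHuge on the head page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the refcount fields of one compound page. */
struct compound_model {
    bool head_is_slab_or_hugetlb;   /* stands in for PageSlab/PageHeadHuge */
    int head_count;                 /* head page->_count */
    int tail_mapcount;              /* tail page->_mapcount used for pins */
};

/* get_page() on a tail: slab/hugetlbfs pages skip the _mapcount update. */
static void tail_get(struct compound_model *c)
{
    c->head_count++;
    if (!c->head_is_slab_or_hugetlb)
        c->tail_mapcount++;
}

/* put_page() on a tail must take the same branch as the matching get;
 * if the flag flipped in between, tail_mapcount goes below its start. */
static void tail_put(struct compound_model *c)
{
    if (!c->head_is_slab_or_hugetlb)
        c->tail_mapcount--;
    c->head_count--;
}
```

      With the flag stable, a get/put pair leaves both counters unchanged;
      clearing the flag between them drives tail_mapcount to -1, which is
      the underflow described above.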
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: thp: optimize compound_trans_huge · ca641514
      Andrea Arcangeli committed
      Currently we don't clobber page_tail->first_page during split_huge_page,
      so compound_trans_head can be set to compound_head without adverse
      effects, and this mostly optimizes away an smp_rmb.
      
      It looks worthwhile to keep around the implementation that doesn't rely
      on page_tail->first_page not being clobbered, because it would be
      necessary if we decide to enforce page->private to be zero at all times
      whenever PG_private is not set, also for anonymous pages.  For anonymous
      pages, enforcing such an invariant doesn't matter, as anonymous pages
      don't use page->private, so we can get away with this
      micro-optimization.
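      The two variants can be contrasted in a userspace sketch using C11
      atomics.  struct tpage and both functions are hypothetical; the
      acquire fence is only a rough userspace analogue of smp_rmb().  The
      careful variant re-checks the tail bit after reading first_page in
      case a concurrent split cleared the bit and clobbered the pointer; the
      fast variant is only safe because, as stated above, the split does not
      clobber first_page.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical page model; first_page mirrors page_tail->first_page. */
struct tpage {
    _Atomic bool tail;                  /* stands in for PageTail() */
    struct tpage *_Atomic first_page;
};

/* Conservative variant: read first_page, order the reads, then confirm
 * the page is still a tail before trusting the pointer. */
static struct tpage *trans_head_careful(struct tpage *page)
{
    if (atomic_load_explicit(&page->tail, memory_order_relaxed)) {
        struct tpage *head =
            atomic_load_explicit(&page->first_page, memory_order_relaxed);
        atomic_thread_fence(memory_order_acquire);  /* ~ smp_rmb() */
        if (atomic_load_explicit(&page->tail, memory_order_relaxed))
            return head;
    }
    return page;
}

/* Optimized variant: no re-check and no barrier; valid only while a split
 * never overwrites first_page. */
static struct tpage *trans_head_fast(struct tpage *page)
{
    return atomic_load_explicit(&page->tail, memory_order_relaxed)
        ? atomic_load_explicit(&page->first_page, memory_order_relaxed)
        : page;
}
```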
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages · 0e147aed
      Dave Hansen committed
      Dave Jiang reported that he was seeing oopses when running NUMA systems
      with default_hugepagesz=1G.  I traced the issue down to
      migrate_page_copy() trying to use the same code for hugetlb pages and
      transparent hugepages.  It should not have been passing thp pages in
      there.
      
      So, add some VM_BUG_ON()s for the next hapless VM developer that tries
      the same thing.
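      A sketch of the kind of guard being added.  VM_BUG_ON_MODEL,
      struct hpage_model and copy_hugetlb_model() are hypothetical; the
      kernel's VM_BUG_ON() only checks when CONFIG_DEBUG_VM is enabled,
      whereas this userspace stand-in always checks via assert().

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace stand-in for VM_BUG_ON(); always checks in this sketch. */
#define VM_BUG_ON_MODEL(cond) assert(!(cond))

/* Hypothetical page descriptor for illustration only. */
struct hpage_model {
    bool hugetlb;            /* stands in for PageHuge() */
    unsigned long nr_pages;  /* number of base pages in the huge page */
};

/* Hugetlb-only copy helper: trap a THP routed here by mistake. */
static unsigned long copy_hugetlb_model(struct hpage_model *dst,
                                        const struct hpage_model *src)
{
    VM_BUG_ON_MODEL(!src->hugetlb);
    VM_BUG_ON_MODEL(!dst->hugetlb);
    dst->nr_pages = src->nr_pages;   /* stands in for copying subpages */
    return dst->nr_pages;
}
```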
      Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Tested-by: Dave Jiang <dave.jiang@intel.com>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL · f92f455f
      Geert Uytterhoeven committed
      {,set}page_address() are macros if WANT_PAGE_VIRTUAL.  If
      !WANT_PAGE_VIRTUAL, they're plain C functions.
      
      If someone calls them with a void *, the pointer is auto-converted to
      struct page * if !WANT_PAGE_VIRTUAL, but the call causes a build
      failure on architectures using WANT_PAGE_VIRTUAL (arc, m68k and
      sparc64):
      
        drivers/md/bcache/bset.c: In function `__btree_sort':
        drivers/md/bcache/bset.c:1190: warning: dereferencing `void *' pointer
        drivers/md/bcache/bset.c:1190: error: request for member `virtual' in something not a structure or union
      
      Convert them to static inline functions to fix this.  There are already
      plenty of users of struct page members inside <linux/mm.h>, so there's
      no reason to keep them as macros.
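      The difference is ordinary C conversion rules.  In this simplified
      sketch, struct page carries only the `virtual` field and
      page_address_macro is a hypothetical name for the old macro form:
      passing a void * to the macro dereferences a void pointer (the bcache
      error quoted above), while the static inline version converts the
      argument implicitly.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for struct page under WANT_PAGE_VIRTUAL. */
struct page {
    void *virtual;
};

/* Old macro form: page_address_macro(ptr) with a void * argument fails to
 * compile, because (ptr)->virtual dereferences a void pointer. */
#define page_address_macro(page) ((page)->virtual)

/* static inline form: a void * argument is implicitly converted to
 * struct page *, matching the !WANT_PAGE_VIRTUAL functions. */
static inline void *page_address(const struct page *page)
{
    return page->virtual;
}

static inline void set_page_address(struct page *page, void *address)
{
    page->virtual = address;
}
```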
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: Michael S. Tsirkin <mst@redhat.com>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • posix_acl: uninlining · 0afaa120
      Andrew Morton committed
      Uninline vast tracts of nested inline functions in
      include/linux/posix_acl.h.
      
      This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by
      8026 bytes.
      
      The patch also regularises the positioning of the EXPORT_SYMBOLs in
      posix_acl.c.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: Benny Halevy <bhalevy@primarydata.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fsnotify: remove .should_send_event callback · 83c4c4b0
      Jan Kara committed
      Now that event structure creation has been removed from the generic
      layer, there is no reason for separate .should_send_event and
      .handle_event callbacks, so just remove the first one.
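      A sketch of the resulting shape, assuming the filtering that
      .should_send_event used to do simply moves to the top of the remaining
      callback.  struct group_model, struct backend_ops_model and
      example_handle_event() are hypothetical simplifications, not the real
      fsnotify_ops.

```c
#include <assert.h>

/* Hypothetical notification group: an interest mask and a counter. */
struct group_model {
    unsigned int mask;   /* events this group cares about */
    int delivered;       /* events queued so far */
};

/* Hypothetical backend ops with a single callback. */
struct backend_ops_model {
    int (*handle_event)(struct group_model *group, unsigned int event);
};

static int example_handle_event(struct group_model *group, unsigned int event)
{
    if (!(group->mask & event))
        return 0;        /* the check formerly done in .should_send_event */
    group->delivered++;  /* queue the event for this group */
    return 0;
}

static const struct backend_ops_model example_ops = {
    .handle_event = example_handle_event,
};
```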
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • fsnotify: do not share events between notification groups · 7053aee2
      Jan Kara committed
      Currently the fsnotify framework creates one event structure for each
      notification event and links this event into all interested
      notification groups.  This is done to save memory when several
      notification groups are interested in the event.  However, the need for
      an event structure shared between inotify and fanotify bloats the event
      structure, so the result is often higher memory consumption.
      
      Another problem is that the fsnotify framework keeps path references
      with outstanding events so that fanotify can return open file
      descriptors with its events.  This has the undesirable effect that a
      filesystem cannot be unmounted while there are outstanding events: a
      regression for inotify compared to the situation before it was
      converted to the fsnotify framework.  For fanotify this problem is hard
      to avoid, and fanotify users should expect this behavior when they ask
      for file descriptors of notified files.
      
      This patch changes fsnotify and its users to create a separate event
      structure for each group.  This allows much simpler code (~400 lines
      removed by this patch) and also smaller event structures.  For example,
      on a 64-bit system the original struct fsnotify_event consumes 120
      bytes, plus space for the file name, plus 24 bytes for the second and
      each subsequent group linking the event, plus 32 bytes of private data
      for each inotify group.  After the conversion an inotify event consumes
      48 bytes plus space for the file name, which is considerably less
      memory unless file names are long and several groups are interested in
      the events (both of which are uncommon).  A fanotify event fits in 56
      bytes after the conversion (fanotify doesn't care about file names, so
      its events need no space for them).  This is a win unless four or more
      fanotify groups are interested in the event.
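      The byte counts above can be turned into a small cost model.  The
      constants come from the text; the function names are hypothetical, and
      the model assumes that an old-style event delivered only to fanotify
      groups carries no file name (name = 0).

```c
#include <assert.h>
#include <stddef.h>

/* Per-event byte counts quoted above for a 64-bit system. */
enum {
    OLD_EVENT = 120,            /* shared struct fsnotify_event */
    OLD_PER_EXTRA_GROUP = 24,   /* link for the 2nd and later groups */
    OLD_INOTIFY_PRIV = 32,      /* private data per inotify group */
    NEW_INOTIFY_EVENT = 48,     /* per-group inotify event, plus name */
    NEW_FANOTIFY_EVENT = 56     /* per-group fanotify event, no name */
};

/* Old scheme: one shared event linked into `groups` (>= 1) groups,
 * `inotify_groups` of which are inotify; `name` is the file name size. */
static size_t old_cost(size_t groups, size_t inotify_groups, size_t name)
{
    return OLD_EVENT + name + OLD_PER_EXTRA_GROUP * (groups - 1)
           + OLD_INOTIFY_PRIV * inotify_groups;
}

/* New scheme: a private event per group; only inotify stores the name. */
static size_t new_cost(size_t inotify_groups, size_t fanotify_groups,
                       size_t name)
{
    return (NEW_INOTIFY_EVENT + name) * inotify_groups
           + NEW_FANOTIFY_EVENT * fanotify_groups;
}
```

      Under this model a single inotify group with a 16-byte name drops from
      168 to 64 bytes, while fanotify-only events break even at three groups
      (168 bytes either way) and cost more from four groups on.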
      
      The conversion also solves the unmount problem when only inotify is
      used, as we no longer have to grab path references for inotify events.
      
      [hughd@google.com: fanotify: fix corruption preventing startup]
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • dma-debug: introduce debug_dma_assert_idle() · 0abdd7a8
      Dan Williams committed
      Record actively mapped pages and provide an API for asserting that a
      given page is DMA-inactive before execution proceeds.  Placing
      debug_dma_assert_idle() in cow_user_page() flagged the violation of the
      DMA API in the NET_DMA implementation (see commit 77873803 "net_dma:
      mark broken").
      
      The implementation can count, in a limited way, repeated mappings of
      the same page that occur without an intervening unmap.  This 'overlap'
      counter is limited to the few bits of tag space in a radix tree.  The
      mechanism mitigates false-negative cases where, for example, a page is
      DMA-mapped twice and debug_dma_assert_idle() is called after the page
      has been unmapped only once.
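      A model of the saturating overlap counter.  The array-based tracking
      and all names are hypothetical, and TAG_BITS = 3 (maximum overlap 7)
      is an assumption chosen for illustration: a page is "active" while it
      has an entry, and extra mappings are counted in the limited tag bits.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 3 tag bits per entry, so at most 7 extra mappings. */
#define TAG_BITS 3
#define MAX_OVERLAP ((1 << TAG_BITS) - 1)
#define NPAGES 8

static bool active[NPAGES];   /* page has an entry in the tracking tree */
static int overlap[NPAGES];   /* extra mappings, stored in the tag bits */

static void model_map(int pfn)
{
    if (!active[pfn]) {
        active[pfn] = true;    /* first mapping creates the entry */
        return;
    }
    if (overlap[pfn] < MAX_OVERLAP)
        overlap[pfn]++;        /* saturate: the tag space is limited */
}

static void model_unmap(int pfn)
{
    if (overlap[pfn] > 0)
        overlap[pfn]--;
    else
        active[pfn] = false;   /* last tracked mapping removes the entry */
}

/* Models the idle check: true if no DMA mapping is recorded. */
static bool model_is_idle(int pfn)
{
    return !active[pfn];
}
```

      Mapping a page twice and unmapping it once still reports it as busy.
      Past the saturation limit the counter undercounts, so a false negative
      becomes possible again.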
      Signed-off-by: Dan Williams <dan.j.williams@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 21 Jan, 2014: 4 commits