1. 22 1月, 2014 20 次提交
    • J
      mm/rmap: extend rmap_walk_xxx() to cope with different cases · 0dd1c7bb
      Joonsoo Kim 提交于
      There are a lot of common parts in traversing functions, but there are
      also a little of uncommon parts in it.  By assigning proper function
      pointer on each rmap_walker_control, we can handle these difference
      correctly.
      
      Following are differences we should handle.
      
      1. difference of lock function in anon mapping case
      2. nonlinear handling in file mapping case
      3. prechecked condition:
      	checking memcg in page_referenced(),
      	checking VM_SHARE in page_mkclean()
      	checking temporary vma in try_to_unmap()
      4. exit condition:
      	checking page_mapped() in try_to_unmap()
      
      So, in this patch, I introduce 4 function pointers to handle above
      differences.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0dd1c7bb
    • J
      mm/rmap: make rmap_walk to get the rmap_walk_control argument · 051ac83a
      Joonsoo Kim 提交于
      In each rmap traverse case, there is some difference so that we need
      function pointers and arguments to them in order to handle these
      
      For this purpose, struct rmap_walk_control is introduced in this patch,
      and will be extended in following patch.  Introducing and extending are
      separate, because it clarify changes.
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Hillf Danton <dhillf@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      051ac83a
    • T
      memblock, mem_hotplug: make memblock skip hotpluggable regions if needed · 55ac590c
      Tang Chen 提交于
      Linux kernel cannot migrate pages used by the kernel.  As a result,
      hotpluggable memory used by the kernel won't be able to be hot-removed.
      To solve this problem, the basic idea is to prevent memblock from
      allocating hotpluggable memory for the kernel at early time, and arrange
      all hotpluggable memory in ACPI SRAT(System Resource Affinity Table) as
      ZONE_MOVABLE when initializing zones.
      
      In the previous patches, we have marked hotpluggable memory regions with
      MEMBLOCK_HOTPLUG flag in memblock.memory.
      
      In this patch, we make memblock skip these hotpluggable memory regions
      in the default top-down allocation function if movable_node boot option
      is specified.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      55ac590c
    • T
      memblock: make memblock_set_node() support different memblock_type · e7e8de59
      Tang Chen 提交于
      [sfr@canb.auug.org.au: fix powerpc build]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e7e8de59
    • T
      memblock, mem_hotplug: introduce MEMBLOCK_HOTPLUG flag to mark hotpluggable regions · 66b16edf
      Tang Chen 提交于
      In find_hotpluggable_memory, once we find out a memory region which is
      hotpluggable, we want to mark them in memblock.memory.  So that we could
      control memblock allocator not to allocte hotpluggable memory for the
      kernel later.
      
      To achieve this goal, we introduce MEMBLOCK_HOTPLUG flag to indicate the
      hotpluggable memory regions in memblock and a function
      memblock_mark_hotplug() to mark hotpluggable memory if we find one.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Liu Jiang <jiang.liu@huawei.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66b16edf
    • T
      memblock, numa: introduce flags field into memblock · 66a20757
      Tang Chen 提交于
      There is no flag in memblock to describe what type the memory is.
      Sometimes, we may use memblock to reserve some memory for special usage.
      And we want to know what kind of memory it is.  So we need a way to
      
      In hotplug environment, we want to reserve hotpluggable memory so the
      kernel won't be able to use it.  And when the system is up, we have to
      free these hotpluggable memory to buddy.  So we need to mark these
      memory first.
      
      In order to do so, we need to mark out these special memory in memblock.
      In this patch, we introduce a new "flags" member into memblock_region:
      
         struct memblock_region {
                 phys_addr_t base;
                 phys_addr_t size;
                 unsigned long flags;		/* This is new. */
         #ifdef CONFIG_HAVE_MEMBLOCK_NODE_MAP
                 int nid;
         #endif
         };
      
      This patch does the following things:
      1) Add "flags" member to memblock_region.
      2) Modify the following APIs' prototype:
      	memblock_add_region()
      	memblock_insert_region()
      3) Add memblock_reserve_region() to support reserve memory with flags, and keep
         memblock_reserve()'s prototype unmodified.
      4) Modify other APIs to support flags, but keep their prototype unmodified.
      
      The idea is from Wen Congyang <wency@cn.fujitsu.com> and Liu Jiang <jiang.liu@huawei.com>.
      Suggested-by: NWen Congyang <wency@cn.fujitsu.com>
      Suggested-by: NLiu Jiang <jiang.liu@huawei.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Reviewed-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Rafael J . Wysocki" <rjw@sisk.pl>
      Cc: Chen Tang <imtangchen@gmail.com>
      Cc: Gong Chen <gong.chen@linux.intel.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Prarit Bhargava <prarit@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Vasilis Liaskovitis <vasilis.liaskovitis@profitbricks.com>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      66a20757
    • J
      mm: add overcommit_kbytes sysctl variable · 49f0ce5f
      Jerome Marchand 提交于
      Some applications that run on HPC clusters are designed around the
      availability of RAM and the overcommit ratio is fine tuned to get the
      maximum usage of memory without swapping.  With growing memory, the
      1%-of-all-RAM grain provided by overcommit_ratio has become too coarse
      for these workload (on a 2TB machine it represents no less than 20GB).
      
      This patch adds the new overcommit_kbytes sysctl variable that allow a
      much finer grain.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: fix nommu build]
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      49f0ce5f
    • M
      mm, show_mem: remove SHOW_MEM_FILTER_PAGE_COUNT · aec6a888
      Mel Gorman 提交于
      Commit 4b59e6c4 ("mm, show_mem: suppress page counts in
      non-blockable contexts") introduced SHOW_MEM_FILTER_PAGE_COUNT to
      suppress PFN walks on large memory machines.  Commit c78e9363 ("mm:
      do not walk all of system memory during show_mem") avoided a PFN walk in
      the generic show_mem helper which removes the requirement for
      SHOW_MEM_FILTER_PAGE_COUNT in that case.
      
      This patch removes PFN walkers from the arch-specific implementations
      that report on a per-node or per-zone granularity.  ARM and unicore32
      still do a PFN walk as they report memory usage on each bank which is a
      much finer granularity where the debugging information may still be of
      use.  As the remaining arches doing PFN walks have relatively small
      amounts of memory, this patch simply removes SHOW_MEM_FILTER_PAGE_COUNT.
      
      [akpm@linux-foundation.org: fix parisc]
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: James Bottomley <jejb@parisc-linux.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      aec6a888
    • D
      mm, mempolicy: remove unneeded functions for UMA configs · d80be7c7
      David Rientjes 提交于
      Mempolicies only exist for CONFIG_NUMA configurations.  Therefore, a
      certain class of functions are unneeded in configurations where
      CONFIG_NUMA is disabled such as functions that duplicate existing
      mempolicies, lookup existing policies, set certain mempolicy traits, or
      test mempolicies for certain attributes.
      
      Remove the unneeded functions so that any future callers get a compile-
      time error and protect their code with CONFIG_NUMA as required.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d80be7c7
    • K
      mm: create a separate slab for page->ptl allocation · b35f1819
      Kirill A. Shutemov 提交于
      If DEBUG_SPINLOCK and DEBUG_LOCK_ALLOC are enabled spinlock_t on x86_64
      is 72 bytes.  For page->ptl they will be allocated from kmalloc-96 slab,
      so we loose 24 on each.  An average system can easily allocate few tens
      thousands of page->ptl and overhead is significant.
      
      Let's create a separate slab for page->ptl allocation to solve this.
      
      To make sure that it really works this time, some numbers from my test
      machine (just booted, no load):
      
      Before:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        kmalloc-96         31987  32190    128   30    1 : tunables  120   60    8 : slabdata   1073   1073     92
      After:
        # grep '^\(kmalloc-96\|page->ptl\)' /proc/slabinfo
        page->ptl          27516  28143     72   53    1 : tunables  120   60    8 : slabdata    531    531      9
        kmalloc-96          3853   5280    128   30    1 : tunables  120   60    8 : slabdata    176    176      0
      
      Note that the patch is useful not only for debug case, but also for
      PREEMPT_RT, where spinlock_t is always bloated.
      Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b35f1819
    • Y
      mm: get rid of unnecessary pageblock scanning in setup_zone_migrate_reserve · 943dca1a
      Yasuaki Ishimatsu 提交于
      Yasuaki Ishimatsu reported memory hot-add spent more than 5 _hours_ on
      9TB memory machine since onlining memory sections is too slow.  And we
      found out setup_zone_migrate_reserve spent >90% of the time.
      
      The problem is, setup_zone_migrate_reserve scans all pageblocks
      unconditionally, but it is only necessary if the number of reserved
      block was reduced (i.e.  memory hot remove).
      
      Moreover, maximum MIGRATE_RESERVE per zone is currently 2.  It means
      that the number of reserved pageblocks is almost always unchanged.
      
      This patch adds zone->nr_migrate_reserve_block to maintain the number of
      MIGRATE_RESERVE pageblocks and it reduces the overhead of
      setup_zone_migrate_reserve dramatically.  The following table shows time
      of onlining a memory section.
      
        Amount of memory     | 128GB | 192GB | 256GB|
        ---------------------------------------------
        linux-3.12           |  23.9 |  31.4 | 44.5 |
        This patch           |   8.3 |   8.3 |  8.6 |
        Mel's proposal patch |  10.9 |  19.2 | 31.3 |
        ---------------------------------------------
                                         (millisecond)
      
        128GB : 4 nodes and each node has 32GB of memory
        192GB : 6 nodes and each node has 32GB of memory
        256GB : 8 nodes and each node has 32GB of memory
      
        (*1) Mel proposed his idea by the following threads.
             https://lkml.org/lkml/2013/10/30/272
      
      [akpm@linux-foundation.org: tweak comment]
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reported-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Tested-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      943dca1a
    • O
      mm: thp: turn compound_head() into BUG_ON(!PageTail) in get_huge_page_tail() · 5eaf1a9e
      Oleg Nesterov 提交于
      get_huge_page_tail()->compound_head() looks confusing.  Every caller
      must check PageTail(page), otherwise atomic_inc(&page->_mapcount) is
      simply wrong if this page is compound-trans-head.
      Signed-off-by: NOleg Nesterov <oleg@redhat.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Darren Hart <dvhart@linux.intel.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NAndrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5eaf1a9e
    • A
      mm: tail page refcounting optimization for slab and hugetlbfs · 44518d2b
      Andrea Arcangeli 提交于
      This skips the _mapcount mangling for slab and hugetlbfs pages.
      
      The main trouble in doing this is to guarantee that PageSlab and
      PageHeadHuge remains constant for all get_page/put_page run on the tail
      of slab or hugetlbfs compound pages.  Otherwise if they're set during
      get_page but not set during put_page, the _mapcount of the tail page
      would underflow.
      
      PageHeadHuge will remain true until the compound page is released and
      enters the buddy allocator so it won't risk to change even if the tail
      page is the last reference left on the page.
      
      PG_slab instead is cleared before the slab frees the head page with
      put_page, so if the tail pin is released after the slab freed the page,
      we would have a problem.  But in the slab case the tail pin cannot be
      the last reference left on the page.  This is because the slab code is
      free to reuse the compound page after a kfree/kmem_cache_free without
      having to check if there's any tail pin left.  In turn all tail pins
      must be always released while the head is still pinned by the slab code
      and so we know PG_slab will be still set too.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Reviewed-by: NKhalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      44518d2b
    • A
      mm: thp: optimize compound_trans_huge · ca641514
      Andrea Arcangeli 提交于
      Currently we don't clobber page_tail->first_page during split_huge_page,
      so compound_trans_head can be set to compound_head without adverse
      effects, and this mostly optimizes away a smp_rmb.
      
      It looks worthwhile to keep around the implementation that doesn't relay
      on page_tail->first_page not to be clobbered, because it would be
      necessary if we'll decide to enforce page->private to zero at all times
      whenever PG_private is not set, also for anonymous pages.  For anonymous
      pages enforcing such an invariant doesn't matter as anonymous pages
      don't use page->private so we can get away with this microoptimization.
      Signed-off-by: NAndrea Arcangeli <aarcange@redhat.com>
      Cc: Khalid Aziz <khalid.aziz@oracle.com>
      Cc: Pravin Shelar <pshelar@nicira.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ben Hutchings <bhutchings@solarflare.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Minchan Kim <minchan@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ca641514
    • D
      mm: hugetlbfs: Add some VM_BUG_ON()s to catch non-hugetlbfs pages · 0e147aed
      Dave Hansen 提交于
      Dave Jiang reported that he was seeing oopses when running NUMA systems
      and default_hugepagesz=1G.  I traced the issue down to
      migrate_page_copy() trying to use the same code for hugetlb pages and
      transparent hugepages.  It should not have been trying to pass thp pages
      in there.
      
      So, add some VM_BUG_ON()s for the next hapless VM developer that tries
      the same thing.
      Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Tested-by: NDave Jiang <dave.jiang@intel.com>
      Acked-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0e147aed
    • G
      mm: Make {,set}page_address() static inline if WANT_PAGE_VIRTUAL · f92f455f
      Geert Uytterhoeven 提交于
      {,set}page_address() are macros if WANT_PAGE_VIRTUAL.  If
      !WANT_PAGE_VIRTUAL, they're plain C functions.
      
      If someone calls them with a void *, this pointer is auto-converted to
      struct page * if !WANT_PAGE_VIRTUAL, but causes a build failure on
      architectures using WANT_PAGE_VIRTUAL (arc, m68k and sparc64):
      
        drivers/md/bcache/bset.c: In function `__btree_sort':
        drivers/md/bcache/bset.c:1190: warning: dereferencing `void *' pointer
        drivers/md/bcache/bset.c:1190: error: request for member `virtual' in something not a structure or union
      
      Convert them to static inline functions to fix this.  There are already
      plenty of users of struct page members inside <linux/mm.h>, so there's
      no reason to keep them as macros.
      Signed-off-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Acked-by: NMichael S. Tsirkin <mst@redhat.com>
      Tested-by: NGuenter Roeck <linux@roeck-us.net>
      Tested-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f92f455f
    • A
      posix_acl: uninlining · 0afaa120
      Andrew Morton 提交于
      Uninline vast tracts of nested inline functions in
      include/linux/posix_acl.h.
      
      This reduces the text+data+bss size of x86_64 allyesconfig vmlinux by
      8026 bytes.
      
      The patch also regularises the positioning of the EXPORT_SYMBOLs in
      posix_acl.c.
      
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: J. Bruce Fields <bfields@fieldses.org>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Tested-by: NBenny Halevy <bhalevy@primarydata.com>
      Cc: Benny Halevy <bhalevy@panasas.com>
      Cc: Andreas Gruenbacher <agruen@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0afaa120
    • J
      fsnotify: remove .should_send_event callback · 83c4c4b0
      Jan Kara 提交于
      After removing event structure creation from the generic layer there is
      no reason for separate .should_send_event and .handle_event callbacks.
      So just remove the first one.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83c4c4b0
    • J
      fsnotify: do not share events between notification groups · 7053aee2
      Jan Kara 提交于
      Currently fsnotify framework creates one event structure for each
      notification event and links this event into all interested notification
      groups.  This is done so that we save memory when several notification
      groups are interested in the event.  However the need for event
      structure shared between inotify & fanotify bloats the event structure
      so the result is often higher memory consumption.
      
      Another problem is that fsnotify framework keeps path references with
      outstanding events so that fanotify can return open file descriptors
      with its events.  This has the undesirable effect that filesystem cannot
      be unmounted while there are outstanding events - a regression for
      inotify compared to a situation before it was converted to fsnotify
      framework.  For fanotify this problem is hard to avoid and users of
      fanotify should kind of expect this behavior when they ask for file
      descriptors from notified files.
      
      This patch changes fsnotify and its users to create separate event
      structure for each group.  This allows for much simpler code (~400 lines
      removed by this patch) and also smaller event structures.  For example
      on 64-bit system original struct fsnotify_event consumes 120 bytes, plus
      additional space for file name, additional 24 bytes for second and each
      subsequent group linking the event, and additional 32 bytes for each
      inotify group for private data.  After the conversion inotify event
      consumes 48 bytes plus space for file name which is considerably less
      memory unless file names are long and there are several groups
      interested in the events (both of which are uncommon).  Fanotify event
      fits in 56 bytes after the conversion (fanotify doesn't care about file
      names so its events don't have to have it allocated).  A win unless
      there are four or more fanotify groups interested in the event.
      
      The conversion also solves the problem with unmount when only inotify is
      used as we don't have to grab path references for inotify events.
      
      [hughd@google.com: fanotify: fix corruption preventing startup]
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NChristoph Hellwig <hch@lst.de>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7053aee2
    • D
      dma-debug: introduce debug_dma_assert_idle() · 0abdd7a8
      Dan Williams 提交于
      Record actively mapped pages and provide an api for asserting a given
      page is dma inactive before execution proceeds.  Placing
      debug_dma_assert_idle() in cow_user_page() flagged the violation of the
      dma-api in the NET_DMA implementation (see commit 77873803 "net_dma:
      mark broken").
      
      The implementation includes the capability to count, in a limited way,
      repeat mappings of the same page that occur without an intervening
      unmap.  This 'overlap' counter is limited to the few bits of tag space
      in a radix tree.  This mechanism is added to mitigate false negative
      cases where, for example, a page is dma mapped twice and
      debug_dma_assert_idle() is called after the page is un-mapped once.
      Signed-off-by: NDan Williams <dan.j.williams@intel.com>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Vinod Koul <vinod.koul@intel.com>
      Cc: Russell King <rmk+kernel@arm.linux.org.uk>
      Cc: James Bottomley <JBottomley@Parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0abdd7a8
  2. 21 1月, 2014 10 次提交
  3. 18 1月, 2014 1 次提交
  4. 16 1月, 2014 2 次提交
  5. 15 1月, 2014 6 次提交
    • Q
      crash_dump: fix compilation error (on MIPS at least) · 5a610fcc
      Qais Yousef 提交于
        In file included from kernel/crash_dump.c:2:0:
        include/linux/crash_dump.h:22:27: error: unknown type name `pgprot_t'
      
      when CONFIG_CRASH_DUMP=y
      
      The error was traced back to commit 9cb21813 ("vmcore: introduce
      remap_oldmem_pfn_range()")
      
      include <asm/pgtable.h> to get the missing definition
      Signed-off-by: NQais Yousef <qais.yousef@imgtec.com>
      Reviewed-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Michael Holzheu <holzheu@linux.vnet.ibm.com>
      Acked-by: NVivek Goyal <vgoyal@redhat.com>
      Cc: <stable@vger.kernel.org>	[3.12+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5a610fcc
    • V
      hwmon: (sht15) add include guard · ffcc3b2a
      Vivien Didelot 提交于
      Add include guard to include/linux/platform_data/sht15.h to prevent
      multiple inclusion.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      ffcc3b2a
    • V
      hwmon: (max197) add include guard · 4cb6409b
      Vivien Didelot 提交于
      Add include guard to include/linux/platform_data/max197.h to prevent
      multiple inclusion.
      Signed-off-by: NVivien Didelot <vivien.didelot@savoirfairelinux.com>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      4cb6409b
    • S
      hwmon: (s3c) Trivial cleanup in hwmon-s3c.h · 2ac1dfc5
      Sachin Kamat 提交于
      Commit 436d42c6 ("ARM: samsung: move platform_data definitions")
      moved the file to the current location but forgot to remove the pointer
      to its previous location. Clean it up. While at it also change the header
      file protection macros appropriately.
      Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org>
      Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
      2ac1dfc5
    • L
      dma: Indicate residue granularity in dma_slave_caps · 50720563
      Lars-Peter Clausen 提交于
      This patch adds a new field to the dma_slave_caps struct which indicates the
      granularity with which the driver is able to update the residue field of the
      dma_tx_state struct. Making this information available to dmaengine users allows
      them to make better decisions on how to operate. E.g. for audio certain features
      like wakeup less operation or timer based scheduling only make sense and work
      correctly if the reported residue is fine-grained enough.
      
      Right now four different levels of granularity are supported:
      	* DESCRIPTOR: The DMA channel is only able to tell whether a descriptor has
      	  been completed or not, which means residue reporting is not supported by
      	  this channel. The residue field of the dma_tx_state field will always be
      	  0.
      	* SEGMENT: The DMA channel updates the residue field after each successfully
      	  completed segment of the transfer (For cyclic transfers this is after each
      	  period). This is typically implemented by having the hardware generate an
      	  interrupt after each transferred segment and then the drivers updates the
      	  outstanding residue by the size of the segment. Another possibility is if
      	  the hardware supports SG and the segment descriptor has a field which gets
      	  set after the segment has been completed. The driver then counts the
      	  number of segments without the flag set to compute the residue.
      	* BURST: The DMA channel updates the residue field after each transferred
      	  burst. This is typically only supported if the hardware has a progress
      	  register of some sort (E.g. a register with the current read/write address
      	  or a register with the amount of bursts/beats/bytes that have been
      	  transferred or still need to be transferred).
      Signed-off-by: NLars-Peter Clausen <lars@metafoo.de>
      Acked-by: NVinod Koul <vinod.koul@intel.com>
      Signed-off-by: NMark Brown <broonie@linaro.org>
      50720563
    • S
      i2c: Re-instate body of i2c_parent_is_i2c_adapter() · 2fac2b89
      Stephen Warren 提交于
      The body of i2c_parent_is_i2c_adapter() is currently guarded by
      I2C_MUX. It should be CONFIG_I2C_MUX instead.
      
      Among potentially other problems, this resulted in i2c_lock_adapter()
      only locking I2C mux child adapters, and not the parent adapter. In
      turn, this could allow inter-mingling of mux child selection and I2C
      transactions, which could result in I2C transactions being directed to
      the wrong I2C bus, and possibly even switching between busses in the
      middle of a transaction.
      
      One concrete issue caused by this bug was corrupted HDMI EDID reads
      during boot on the NVIDIA Tegra Seaboard system, although this only
      became apparent in recent linux-next, when the boot timing was changed
      just enough to trigger the race condition.
      
      Fixes: 3923172b ("i2c: reduce parent checking to a NOOP in non-I2C_MUX case")
      Cc: Phil Carmody <phil.carmody@partner.samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NStephen Warren <swarren@nvidia.com>
      Signed-off-by: NWolfram Sang <wsa@the-dreams.de>
      2fac2b89
  6. 14 1月, 2014 1 次提交