1. 30 4月, 2013 18 次提交
    • J
      sparse-vmemmap: specify vmemmap population range in bytes · 0aad818b
      Johannes Weiner 提交于
      The sparse code, when asking the architecture to populate the vmemmap,
      specifies the section range as a starting page and a number of pages.
      
      This is an awkward interface, because none of the arch-specific code
      actually thinks of the range in terms of 'struct page' units and always
      translates it to bytes first.
      
      In addition, later patches mix huge page and regular page backing for
      the vmemmap.  For this, they need to call vmemmap_populate_basepages()
      on sub-section ranges with PAGE_SIZE and PMD_SIZE in mind.  But these
      are not necessarily multiples of the 'struct page' size and so this unit
      is too coarse.
      
      Just translate the section range into bytes once in the generic sparse
      code, then pass byte ranges down the stack.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Ben Hutchings <ben@decadent.org.uk>
      Cc: Bernhard Schmidt <Bernhard.Schmidt@lrz.de>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: "Luck, Tony" <tony.luck@intel.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Acked-by: NDavid S. Miller <davem@davemloft.net>
      Tested-by: NDavid S. Miller <davem@davemloft.net>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0aad818b
    • D
      mm, hugetlb: include hugepages in meminfo · 949f7ec5
      David Rientjes 提交于
      Particularly in oom conditions, it's troublesome that hugetlb memory is
      not displayed.  All other meminfo that is emitted will not add up to
      what is expected, and there is no artifact left in the kernel log to
      show that a potentially significant amount of memory is actually
      allocated as hugepages which are not available to be reclaimed.
      
      Booting with hugepages=8192 on the command line, this memory is now
      shown in oom conditions.  For example, with echo m >
      /proc/sysrq-trigger:
      
        Node 0 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
        Node 1 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
        Node 2 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
        Node 3 hugepages_total=2048 hugepages_free=2048 hugepages_surp=0 hugepages_size=2048kB
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      949f7ec5
    • H
      mm: allow arch code to control the user page table ceiling · 6ee8630e
      Hugh Dickins 提交于
      On architectures where a pgd entry may be shared between user and kernel
      (e.g.  ARM+LPAE), freeing page tables needs a ceiling other than 0.
      This patch introduces a generic USER_PGTABLES_CEILING that arch code can
      override.  It is the responsibility of the arch code setting the ceiling
      to ensure the complete freeing of the page tables (usually in
      pgd_free()).
      
      [catalin.marinas@arm.com: commit log; shift_arg_pages(), asm-generic/pgtables.h changes]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: <stable@vger.kernel.org>	[3.3+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6ee8630e
    • A
      kexec, vmalloc: export additional vmalloc layer information · 13ba3fcb
      Atsushi Kumagai 提交于
      Now, vmap_area_list is exported as VMCOREINFO for makedumpfile to get
      the start address of vmalloc region (vmalloc_start).  The address which
      contains vmalloc_start value is represented as below:
      
        vmap_area_list.next - OFFSET(vmap_area.list) + OFFSET(vmap_area.va_start)
      
      However, both OFFSET(vmap_area.va_start) and OFFSET(vmap_area.list)
      aren't exported as VMCOREINFO.
      
      So this patch exports them externally with small cleanup.
      
      [akpm@linux-foundation.org: vmalloc.h should include list.h for list_head]
      Signed-off-by: NAtsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      13ba3fcb
    • J
      mm, vmalloc: export vmap_area_list, instead of vmlist · f1c4069e
      Joonsoo Kim 提交于
      Although our intention is to unexport internal structure entirely, but
      there is one exception for kexec.  kexec dumps address of vmlist and
      makedumpfile uses this information.
      
      We are about to remove vmlist, then another way to retrieve information
      of vmalloc layer is needed for makedumpfile.  For this purpose, we
      export vmap_area_list, instead of vmlist.
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f1c4069e
    • J
      mm, vmalloc: move get_vmalloc_info() to vmalloc.c · db3808c1
      Joonsoo Kim 提交于
      Now get_vmalloc_info() is in fs/proc/mmu.c.  There is no reason that this
      code must be here and it's implementation needs vmlist_lock and it iterate
      a vmlist which may be internal data structure for vmalloc.
      
      It is preferable that vmlist_lock and vmlist is only used in vmalloc.c
      for maintainability. So move the code to vmalloc.c
      Signed-off-by: NJoonsoo Kim <js1304@gmail.com>
      Signed-off-by: NJoonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Atsushi Kumagai <kumagai-atsushi@mxc.nes.nec.co.jp>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Dave Anderson <anderson@redhat.com>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      db3808c1
    • D
      mm: make snapshotting pages for stable writes a per-bio operation · 71368511
      Darrick J. Wong 提交于
      Walking a bio's page mappings has proved problematic, so create a new
      bio flag to indicate that a bio's data needs to be snapshotted in order
      to guarantee stable pages during writeback.  Next, for the one user
      (ext3/jbd) of snapshotting, hook all the places where writes can be
      initiated without PG_writeback set, and set BIO_SNAP_STABLE there.
      
      We must also flag journal "metadata" bios for stable writeout, since
      file data can be written through the journal.  Finally, the
      MS_SNAP_STABLE mount flag (only used by ext3) is now superfluous, so get
      rid of it.
      
      [akpm@linux-foundation.org: rename _submit_bh()'s `flags' to `bio_flags', delobotomize the _submit_bh declaration]
      [akpm@linux-foundation.org: teeny cleanup]
      Signed-off-by: NDarrick J. Wong <darrick.wong@oracle.com>
      Cc: Andy Lutomirski <luto@amacapital.net>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: Artem Bityutskiy <dedekind1@gmail.com>
      Reviewed-by: NJan Kara <jack@suse.cz>
      Cc: Jens Axboe <axboe@kernel.dk>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      71368511
    • G
      mm/hugetlb: add more arch-defined huge_pte functions · 106c992a
      Gerald Schaefer 提交于
      Commit abf09bed ("s390/mm: implement software dirty bits")
      introduced another difference in the pte layout vs.  the pmd layout on
      s390, thoroughly breaking the s390 support for hugetlbfs.  This requires
      replacing some more pte_xxx functions in mm/hugetlbfs.c with a
      huge_pte_xxx version.
      
      This patch introduces those huge_pte_xxx functions and their generic
      implementation in asm-generic/hugetlb.h, which will now be included on
      all architectures supporting hugetlbfs apart from s390.  This change
      will be a no-op for those architectures.
      
      [akpm@linux-foundation.org: fix warning]
      Signed-off-by: NGerald Schaefer <gerald.schaefer@de.ibm.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Hillf Danton <dhillf@gmail.com>
      Acked-by: Michal Hocko <mhocko@suse.cz>	[for !s390 parts]
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Chris Metcalf <cmetcalf@tilera.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      106c992a
    • J
      fs: don't compile in drop_caches.c when CONFIG_SYSCTL=n · 146732ce
      Josh Triplett 提交于
      drop_caches.c provides code only invokable via sysctl, so don't compile it
      in when CONFIG_SYSCTL=n.
      Signed-off-by: NJosh Triplett <josh@joshtriplett.org>
      Acked-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      146732ce
    • M
      cgroup: remove css_get_next · 6d2488f6
      Michal Hocko 提交于
      Now that we have generic and well ordered cgroup tree walkers there is
      no need to keep css_get_next in the place.
      Signed-off-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Ying Han <yinghan@google.com>
      Cc: Tejun Heo <htejun@gmail.com>
      Cc: Glauber Costa <glommer@parallels.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      6d2488f6
    • J
      mm: introduce free_highmem_page() helper to free highmem pages into buddy system · cfa11e08
      Jiang Liu 提交于
      The original goal of this patchset is to fix the bug reported by
      
        https://bugzilla.kernel.org/show_bug.cgi?id=53501
      
      Now it has also been expanded to reduce common code used by memory
      initializion.
      
      This is the second part, which applies to the previous part at:
        http://marc.info/?l=linux-mm&m=136289696323825&w=2
      
      It introduces a helper function free_highmem_page() to free highmem
      pages into the buddy system when initializing mm subsystem.
      Introduction of free_highmem_page() is one step forward to clean up
      accesses and modificaitons of totalhigh_pages, totalram_pages and
      zone->managed_pages etc. I hope we could remove all references to
      totalhigh_pages from the arch/ subdirectory.
      
      We have only tested these patchset on x86 platforms, and have done basic
      compliation tests using cross-compilers from ftp.kernel.org. That means
      some code may not pass compilation on some architectures. So any help
      to test this patchset are welcomed!
      
      There are several other parts still under development:
      Part3: refine code to manage totalram_pages, totalhigh_pages and
      	zone->managed_pages
      Part4: introduce helper functions to simplify mem_init() and remove the
      	global variable num_physpages.
      
      This patch:
      
      Introduce helper function free_highmem_page(), which will be used by
      architectures with HIGHMEM enabled to free highmem pages into the buddy
      system.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "Suzuki K. Poulose" <suzuki@in.ibm.com>
      Cc: Alexander Graf <agraf@suse.de>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Attilio Rao <attilio.rao@citrix.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Cong Wang <amwang@redhat.com>
      Cc: David Daney <david.daney@cavium.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Cc: Linus Walleij <linus.walleij@linaro.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Stephen Boyd <sboyd@codeaurora.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Reviewed-by: NPekka Enberg <penberg@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cfa11e08
    • J
      mm: introduce common help functions to deal with reserved/managed pages · 69afade7
      Jiang Liu 提交于
      The original goal of this patchset is to fix the bug reported by
      
        https://bugzilla.kernel.org/show_bug.cgi?id=53501
      
      Now it has also been expanded to reduce common code used by memory
      initializion.
      
      This is the first part, which applies to v3.9-rc1.
      
      It introduces following common helper functions to simplify
      free_initmem() and free_initrd_mem() on different architectures:
      
      adjust_managed_page_count():
      	will be used to adjust totalram_pages, totalhigh_pages,
      	zone->managed_pages when reserving/unresering a page.
      
      __free_reserved_page():
      	free a reserved page into the buddy system without adjusting
      	page statistics info
      
      free_reserved_page():
      	free a reserved page into the buddy system and adjust page
      	statistics info
      
      mark_page_reserved():
      	mark a page as reserved and adjust page statistics info
      
      free_reserved_area():
      	free a continous ranges of pages by calling free_reserved_page()
      
      free_initmem_default():
      	default method to free __init pages.
      
      We have only tested these patchset on x86 platforms, and have done basic
      compliation tests using cross-compilers from ftp.kernel.org.  That means
      some code may not pass compilation on some architectures.  So any help to
      test this patchset are welcomed!
      
      There are several other parts still under development:
      Part2: introduce free_highmem_page() to simplify freeing highmem pages
      Part3: refine code to manage totalram_pages, totalhigh_pages and
      	zone->managed_pages
      Part4: introduce helper functions to simplify mem_init() and remove the
      	global variable num_physpages.
      
      This patch:
      
      Code to deal with reserved/managed pages are duplicated by many
      architectures, so introduce common help functions to reduce duplicated
      code.  These common help functions will also be used to concentrate code
      to modify totalram_pages and zone->managed_pages, which makes the code
      much more clear.
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Acked-by: NGeert Uytterhoeven <geert@linux-m68k.org>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
      Cc: Anatolij Gustschin <agust@denx.de>
      Cc: Aurelien Jacquiot <a-jacquiot@ti.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chen Liqin <liqin.chen@sunplusct.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: David Howells <dhowells@redhat.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Fenghua Yu <fenghua.yu@intel.com>
      Cc: Guan Xuetao <gxt@mprc.pku.edu.cn>
      Cc: Haavard Skinnemoen <hskinnemoen@gmail.com>
      Cc: Hans-Christian Egtvedt <egtvedt@samfundet.no>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Helge Deller <deller@gmx.de>
      Cc: Hirokazu Takata <takata@linux-m32r.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Cc: James Hogan <james.hogan@imgtec.com>
      Cc: Jeff Dike <jdike@addtoit.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jiang Liu <liuj97@gmail.com>
      Cc: Jonas Bonn <jonas@southpole.se>
      Cc: Koichi Yasutake <yasutake.koichi@jp.panasonic.com>
      Cc: Lennox Wu <lennox.wu@gmail.com>
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Mikael Starvik <starvik@axis.com>
      Cc: Mike Frysinger <vapier@gentoo.org>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Ralf Baechle <ralf@linux-mips.org>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Russell King <linux@arm.linux.org.uk>
      Cc: Sam Ravnborg <sam@ravnborg.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Will Deacon <will.deacon@arm.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      69afade7
    • P
      vm: adjust ifdef for TINY_RCU · 8375ad98
      Paul E. McKenney 提交于
      There is an ifdef in page_cache_get_speculative() that checks for !SMP
      and TREE_RCU, which has been an impossible combination since the advent
      of TINY_RCU.  The ifdef enables a fastpath that is valid when preemption
      is disabled by rcu_read_lock() in UP systems, which is the case when
      TINY_RCU is enabled.  This commit therefore adjusts the ifdef to
      generate the fastpath when TINY_RCU is enabled.
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Reported-by: NAndi Kleen <andi@firstfloor.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8375ad98
    • A
      mm/shmem.c: remove an ifdef · 250297ed
      Andrew Morton 提交于
      Create a CONFIG_MMU=y stub for ramfs_nommu_expand_for_mapping() in the
      usual fashion.
      
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Wolfram Sang <wsa@the-dreams.de>
      Cc: Hugh Dickins <hughd@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      250297ed
    • D
      mm, show_mem: suppress page counts in non-blockable contexts · 4b59e6c4
      David Rientjes 提交于
      On large systems with a lot of memory, walking all RAM to determine page
      types may take a half second or even more.
      
      In non-blockable contexts, the page allocator will emit a page allocation
      failure warning unless __GFP_NOWARN is specified.  In such contexts, irqs
      are typically disabled and such a lengthy delay may even result in NMI
      watchdog timeouts.
      
      To fix this, suppress the page walk in such contexts when printing the
      page allocation failure warning.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Cc: Dave Hansen <dave@linux.vnet.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      4b59e6c4
    • R
      mm: trace filemap add and del · fe0bfaaf
      Robert Jarzmik 提交于
      Use the events API to trace filemap loading and unloading of file pieces
      into the page cache.
      
      This patch aims at tracing the eviction reload cycle of executable and
      shared libraries pages in a memory constrained environment.
      
      The typical usage is to spot a specific device and inode (for example
      /lib/libc.so) to see the eviction cycles, and find out if frequently
      used code is rather spread across many pages (bad) or coallesced (good).
      Signed-off-by: NRobert Jarzmik <robert.jarzmik@free.fr>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fe0bfaaf
    • J
      debug_locks.h: make warning more verbose · 2c2fea11
      James Hogan 提交于
      The WARN_ON(1) in DEBUG_LOCKS_WARN_ON is surprisingly awkward to track
      down when it's hit, as it's usually buried in macros, causing multiple
      instances to land on the same line number.
      
      This patch makes it more useful by switching to:
      
          WARN(1, "DEBUG_LOCKS_WARN_ON(%s)", #c);
      
      so that the particular DEBUG_LOCKS_WARN_ON is more easily identified and
      grep'd for.  For example:
      
          WARNING: at kernel/mutex.c:198 _mutex_lock_nested+0x31c/0x380()
          DEBUG_LOCKS_WARN_ON(l->magic != l)
      Signed-off-by: NJames Hogan <james.hogan@imgtec.com>
      Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      2c2fea11
    • H
      drivers/video: add Hyper-V Synthetic Video Frame Buffer Driver · 68a2d20b
      Haiyang Zhang 提交于
      This is the driver for the Hyper-V Synthetic Video, which supports
      screen resolution up to Full HD 1920x1080 on Windows Server 2012 host,
      and 1600x1200 on Windows Server 2008 R2 or earlier.  It also solves the
      double mouse cursor issue of the emulated video mode.
      Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
      Reviewed-by: NK. Y. Srinivasan <kys@microsoft.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>,
      Cc: Olaf Hering <olaf@aepfle.de>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Florian Tobias Schandinat <FlorianSchandinat@gmx.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      68a2d20b
  2. 24 4月, 2013 1 次提交
    • A
      usb: phy: tegra: don't call into tegra-ehci directly · ee5d5499
      Arnd Bergmann 提交于
      Both phy-tegra-usb.c and ehci-tegra.c export symbols used by the other one,
      which does not work if one of them or both are loadable modules, resulting
      in an error like:
      
      drivers/built-in.o: In function `utmi_phy_clk_disable':
      drivers/usb/phy/phy-tegra-usb.c:302: undefined reference to `tegra_ehci_set_phcd'
      drivers/built-in.o: In function `utmi_phy_clk_enable':
      drivers/usb/phy/phy-tegra-usb.c:324: undefined reference to `tegra_ehci_set_phcd'
      drivers/built-in.o: In function `utmi_phy_power_on':
      drivers/usb/phy/phy-tegra-usb.c:447: undefined reference to `tegra_ehci_set_pts'
      
      This turns the interface into a one-way dependency by letting the tegra ehci
      driver pass two function pointers for callbacks that need to be called by
      the phy driver.
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Cc: Venu Byravarasu <vbyravarasu@nvidia.com>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ee5d5499
  3. 23 4月, 2013 4 次提交
  4. 21 4月, 2013 1 次提交
  5. 20 4月, 2013 1 次提交
    • D
      irda: small read past the end of array in debug code · e15465e1
      Dan Carpenter 提交于
      The "reason" can come from skb->data[] and it hasn't been capped so it
      can be from 0-255 instead of just 0-6.  For example in irlmp_state_dtr()
      the code does:
      
      	reason = skb->data[3];
      	...
      	irlmp_disconnect_indication(self, reason, skb);
      
      Also LMREASON has a couple other values which don't have entries in the
      irlmp_reasons[] array.  And 0xff is a valid reason as well which means
      "unknown".
      
      So far as I can see we don't actually care about "reason" except for in
      the debug code.
      Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e15465e1
  6. 19 4月, 2013 4 次提交
    • L
      pinctrl/pinconfig: add debug interface · f07512e6
      Laurent Meunier 提交于
      This update adds a debugfs interface to modify a pin configuration
      for a given state in the pinctrl map. This allows to modify the
      configuration for a non-active state, typically sleep state.
      This configuration is not applied right away, but only when the state
      will be entered.
      
      This solution is mandated for us by HW validation: in order
      to test and verify several pin configurations during sleep without
      recompiling the software.
      
      Change log in this patch set;
      Take into account latest feedback from Stephen Warren:
      - stale comments update
      - improved code efficiency and readibility
      - limit size of global variable pinconf_dbg_conf
      - remove req_type as it can easily be added later when
      add/delete requests support is implemented
      Signed-off-by: NLaurent Meunier <laurent.meunier@st.com>
      Signed-off-by: NLinus Walleij <linus.walleij@linaro.org>
      f07512e6
    • W
      mutex: Queue mutex spinners with MCS lock to reduce cacheline contention · 2bd2c92c
      Waiman Long 提交于
      The current mutex spinning code (with MUTEX_SPIN_ON_OWNER option
      turned on) allow multiple tasks to spin on a single mutex
      concurrently. A potential problem with the current approach is
      that when the mutex becomes available, all the spinning tasks
      will try to acquire the mutex more or less simultaneously. As a
      result, there will be a lot of cacheline bouncing especially on
      systems with a large number of CPUs.
      
      This patch tries to reduce this kind of contention by putting
      the mutex spinners into a queue so that only the first one in
      the queue will try to acquire the mutex. This will reduce
      contention and allow all the tasks to move forward faster.
      
      The queuing of mutex spinners is done using an MCS lock based
      implementation which will further reduce contention on the mutex
      cacheline than a similar ticket spinlock based implementation.
      This patch will add a new field into the mutex data structure
      for holding the MCS lock. This expands the mutex size by 8 bytes
      for 64-bit system and 4 bytes for 32-bit system. This overhead
      will be avoid if the MUTEX_SPIN_ON_OWNER option is turned off.
      
      The following table shows the jobs per minute (JPM) scalability
      data on an 8-node 80-core Westmere box with a 3.7.10 kernel. The
      numactl command is used to restrict the running of the fserver
      workloads to 1/2/4/8 nodes with hyperthreading off.
      
      +-----------------+-----------+-----------+-------------+----------+
      |  Configuration  | Mean JPM  | Mean JPM  |  Mean JPM   | % Change |
      |                 | w/o patch | patch 1   | patches 1&2 |  1->1&2  |
      +-----------------+------------------------------------------------+
      |                 |              User Range 1100 - 2000            |
      +-----------------+------------------------------------------------+
      | 8 nodes, HT off |  227972   |  227237   |   305043    |  +34.2%  |
      | 4 nodes, HT off |  393503   |  381558   |   394650    |   +3.4%  |
      | 2 nodes, HT off |  334957   |  325240   |   338853    |   +4.2%  |
      | 1 node , HT off |  198141   |  197972   |   198075    |   +0.1%  |
      +-----------------+------------------------------------------------+
      |                 |              User Range 200 - 1000             |
      +-----------------+------------------------------------------------+
      | 8 nodes, HT off |  282325   |  312870   |   332185    |   +6.2%  |
      | 4 nodes, HT off |  390698   |  378279   |   393419    |   +4.0%  |
      | 2 nodes, HT off |  336986   |  326543   |   340260    |   +4.2%  |
      | 1 node , HT off |  197588   |  197622   |   197582    |    0.0%  |
      +-----------------+-----------+-----------+-------------+----------+
      
      At low user range 10-100, the JPM differences were within +/-1%.
      So they are not that interesting.
      
      The fserver workload uses mutex spinning extensively. With just
      the mutex change in the first patch, there is no noticeable
      change in performance.  Rather, there is a slight drop in
      performance. This mutex spinning patch more than recovers the
      lost performance and show a significant increase of +30% at high
      user load with the full 8 nodes. Similar improvements were also
      seen in a 3.8 kernel.
      
      The table below shows the %time spent by different kernel
      functions as reported by perf when running the fserver workload
      at 1500 users with all 8 nodes.
      
      +-----------------------+-----------+---------+-------------+
      |        Function       |  % time   | % time  |   % time    |
      |                       | w/o patch | patch 1 | patches 1&2 |
      +-----------------------+-----------+---------+-------------+
      | __read_lock_failed    |  34.96%   | 34.91%  |   29.14%    |
      | __write_lock_failed   |  10.14%   | 10.68%  |    7.51%    |
      | mutex_spin_on_owner   |   3.62%   |  3.42%  |    2.33%    |
      | mspin_lock            |    N/A    |   N/A   |    9.90%    |
      | __mutex_lock_slowpath |   1.46%   |  0.81%  |    0.14%    |
      | _raw_spin_lock        |   2.25%   |  2.50%  |    1.10%    |
      +-----------------------+-----------+---------+-------------+
      
      The fserver workload for an 8-node system is dominated by the
      contention in the read/write lock. Mutex contention also plays a
      role. With the first patch only, mutex contention is down (as
      shown by the __mutex_lock_slowpath figure) which help a little
      bit. We saw only a few percents improvement with that.
      
      By applying patch 2 as well, the single mutex_spin_on_owner
      figure is now split out into an additional mspin_lock figure.
      The time increases from 3.42% to 11.23%. It shows a great
      reduction in contention among the spinners leading to a 30%
      improvement. The time ratio 9.9/2.33=4.3 indicates that there
      are on average 4+ spinners waiting in the spin_lock loop for
      each spinner in the mutex_spin_on_owner loop. Contention in
      other locking functions also go down by quite a lot.
      
      The table below shows the performance change of both patches 1 &
      2 over patch 1 alone in other AIM7 workloads (at 8 nodes,
      hyperthreading off).
      
      +--------------+---------------+----------------+-----------------+
      |   Workload   | mean % change | mean % change  | mean % change   |
      |              | 10-100 users  | 200-1000 users | 1100-2000 users |
      +--------------+---------------+----------------+-----------------+
      | alltests     |      0.0%     |     -0.8%      |     +0.6%       |
      | five_sec     |     -0.3%     |     +0.8%      |     +0.8%       |
      | high_systime |     +0.4%     |     +2.4%      |     +2.1%       |
      | new_fserver  |     +0.1%     |    +14.1%      |    +34.2%       |
      | shared       |     -0.5%     |     -0.3%      |     -0.4%       |
      | short        |     -1.7%     |     -9.8%      |     -8.3%       |
      +--------------+---------------+----------------+-----------------+
      
      The short workload is the only one that shows a decline in
      performance probably due to the spinner locking and queuing
      overhead.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Reviewed-by: NDavidlohr Bueso <davidlohr.bueso@hp.com>
      Acked-by: NRik van Riel <riel@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-4-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      2bd2c92c
    • W
      mutex: Move mutex spinning code from sched/core.c back to mutex.c · 41fcb9f2
      Waiman Long 提交于
      As mentioned by Ingo, the SCHED_FEAT_OWNER_SPIN scheduler
      feature bit was really just an early hack to make with/without
      mutex-spinning testable. So it is no longer necessary.
      
      This patch removes the SCHED_FEAT_OWNER_SPIN feature bit and
      move the mutex spinning code from kernel/sched/core.c back to
      kernel/mutex.c which is where they should belong.
      Signed-off-by: NWaiman Long <Waiman.Long@hp.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Chandramouleeswaran Aswin <aswin@hp.com>
      Cc: Davidlohr Bueso <davidlohr.bueso@hp.com>
      Cc: Norton Scott J <scott.norton@hp.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Cc: Clark Williams <williams@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Link: http://lkml.kernel.org/r/1366226594-5506-2-git-send-email-Waiman.Long@hp.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
      41fcb9f2
    • L
      Revert "block: add missing block_bio_complete() tracepoint" · 0a82a8d1
      Linus Torvalds 提交于
      This reverts commit 3a366e61.
      
      Wanlong Gao reports that it causes a kernel panic on his machine several
      minutes after boot. Reverting it removes the panic.
      
      Jens says:
       "It's not quite clear why that is yet, so I think we should just revert
        the commit for 3.9 final (which I'm assuming is pretty close).
      
        The wifi is crap at the LSF hotel, so sending this email instead of
        queueing up a revert and pull request."
      Reported-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
      Requested-by: NJens Axboe <axboe@kernel.dk>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      0a82a8d1
  7. 18 4月, 2013 3 次提交
  8. 17 4月, 2013 4 次提交
    • M
      fuse: fix type definitions in uapi header · 4c82456e
      Miklos Szeredi 提交于
      Commit 7e98d530 (Synchronize fuse header with
      one used in library) added #ifdef __linux__ around defines if it is not set.
      The kernel build is self-contained and can be built on non-Linux toolchains.
      After the mentioned commit builds on non-Linux toolchains will try to include
      stdint.h and fail due to -nostdinc, and then fail with a bunch of undefined type
      errors.
      
      Fix by checking for __KERNEL__ instead of __linux__ and using the standard int
      types instead of the linux specific ones.
      Reported-by: NArve Hjønnevåg <arve@android.com>
      Reported-by: NColin Cross <ccross@android.com>
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      4c82456e
    • L
      vm: add vm_iomap_memory() helper function · b4cbb197
      Linus Torvalds 提交于
      Various drivers end up replicating the code to mmap() their memory
      buffers into user space, and our core memory remapping function may be
      very flexible but it is unnecessarily complicated for the common cases
      to use.
      
      Our internal VM uses pfn's ("page frame numbers") which simplifies
      things for the VM, and allows us to pass physical addresses around in a
      denser and more efficient format than passing a "phys_addr_t" around,
      and having to shift it up and down by the page size.  But it just means
      that drivers end up doing that shifting instead at the interface level.
      
      It also means that drivers end up mucking around with internal VM things
      like the vma details (vm_pgoff, vm_start/end) way more than they really
      need to.
      
      So this just exports a function to map a certain physical memory range
      into user space (using a phys_addr_t based interface that is much more
      natural for a driver) and hides all the complexity from the driver.
      Some drivers will still end up tweaking the vm_page_prot details for
      things like prefetching or cacheability etc, but that's actually
      relevant to the driver, rather than caring about what the page offset of
      the mapping is into the particular IO memory region.
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b4cbb197
    • J
      xen: drop tracking of IRQ vector · dec02dea
      Jan Beulich 提交于
      For quite a few Xen versions, this wasn't the IRQ vector anymore
      anyway, and it is not being used by the kernel for anything. Hence
      drop the field from struct irq_info, and respective function
      parameters.
      Signed-off-by: NJan Beulich <jbeulich@suse.com>
      Acked-by: NStefano Stabellini <stefano.stabellini@eu.citrix.com>
      Signed-off-by: NKonrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      dec02dea
    • M
      PCI/ACPI: Remove support of ACPI PCI subdrivers · c309dbb4
      Myron Stowe 提交于
      Both sub-drivers of the "PCI Root Bridge ("pci_bridge")" driver, "acpiphp"
      and "pci_slot", have been converted to hook directly into the PCI core.
      
      With the conversions there are no remaining usages of the 'struct
      acpi_pci_driver' list based infrastructure.  This patch removes it.
      Signed-off-by: NMyron Stowe <myron.stowe@redhat.com>
      Signed-off-by: NJiang Liu <jiang.liu@huawei.com>
      Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
      Reviewed-by: NYinghai Lu <yinghai@kernel.org>
      Cc: "Rafael J. Wysocki" <rafael.j.wysocki@intel.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      c309dbb4
  9. 16 4月, 2013 3 次提交
  10. 15 4月, 2013 1 次提交
    • C
      ipv6: statically link register_inet6addr_notifier() · f88c91dd
      Cong Wang 提交于
      Tomas reported the following build error:
      
      net/built-in.o: In function `ieee80211_unregister_hw':
      (.text+0x10f0e1): undefined reference to `unregister_inet6addr_notifier'
      net/built-in.o: In function `ieee80211_register_hw':
      (.text+0x10f610): undefined reference to `register_inet6addr_notifier'
      make: *** [vmlinux] Error 1
      
      when built IPv6 as a module.
      
      So we have to statically link these symbols.
      Reported-by: NTomas Melin <tomas.melin@iki.fi>
      Cc: Tomas Melin <tomas.melin@iki.fi>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: YOSHIFUJI Hidaki <yoshfuji@linux-ipv6.org>
      Signed-off-by: NCong Wang <amwang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f88c91dd