1. 28 Feb 2013: 2 commits
  2. 26 Feb 2013: 3 commits
  3. 25 Feb 2013: 13 commits
  4. 24 Feb 2013: 22 commits
    • x86/mm/pageattr: Prevent PSE and GLOABL leftovers to confuse pmd/pte_present and pmd_huge · a8aed3e0
      Andrea Arcangeli committed
      Without this patch, any kernel code that reads kernel memory in
      non-present kernel ptes/pmds (as set by pageattr.c) will crash.
      
      With this kernel code:
      
      static struct page *crash_page;
      static unsigned long *crash_address;
      [..]
      	crash_page = alloc_pages(GFP_KERNEL, 9);
      	crash_address = page_address(crash_page);
      	if (set_memory_np((unsigned long)crash_address, 1))
      		printk("set_memory_np failure\n");
      [..]
      
      The kernel will crash if, inside the "crash tool", one tries
      to read the memory at the non-present address.
      
      crash> p crash_address
      crash_address = $8 = (long unsigned int *) 0xffff88023c000000
      crash> rd 0xffff88023c000000
      [ *lockup* ]
      
      The lockup happens because _PAGE_GLOBAL and _PAGE_PROTNONE
      share the same bit, and pageattr leaves _PAGE_GLOBAL set on a
      kernel pte, which is then mistaken for _PAGE_PROTNONE (so
      pte_present returns true by mistake, and the kernel fault then
      gets confused and loops).
      
      With THP the same can happen after we taught pmd_present to
      check _PAGE_PROTNONE and _PAGE_PSE in commit
      027ef6c8 ("mm: thp: fix pmd_present for
      split_huge_page and PROT_NONE with THP").  THP has the same
      problem with _PAGE_GLOBAL as the 4k pages, but it also has a
      problem with _PAGE_PSE, which must be cleared too.
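      The bit aliasing can be illustrated with a small userspace model. This is a
      sketch, not the pageattr.c code. The SIM_* names and helpers are hypothetical,
      the bit positions assume the x86 layout of that era (PRESENT bit 0, PSE bit 7,
      GLOBAL bit 8, PROTNONE reusing the GLOBAL bit), and the two present checks
      follow pte_present/pmd_present as described above.

```c
#include <stdint.h>

/* Hypothetical model of the x86 flag bits (assumed positions). */
#define SIM_PAGE_PRESENT  (UINT64_C(1) << 0)
#define SIM_PAGE_PSE      (UINT64_C(1) << 7)
#define SIM_PAGE_GLOBAL   (UINT64_C(1) << 8)
#define SIM_PAGE_PROTNONE SIM_PAGE_GLOBAL   /* aliases the GLOBAL bit */

/* A pte counts as present if PRESENT or PROTNONE is set. */
static int sim_pte_present(uint64_t flags)
{
	return (flags & (SIM_PAGE_PRESENT | SIM_PAGE_PROTNONE)) != 0;
}

/* After 027ef6c8 a pmd also counts as present if PSE is set. */
static int sim_pmd_present(uint64_t flags)
{
	return (flags & (SIM_PAGE_PRESENT | SIM_PAGE_PROTNONE | SIM_PAGE_PSE)) != 0;
}

/* Old behaviour: only PRESENT is cleared, GLOBAL/PSE are left over. */
static uint64_t sim_make_np_old(uint64_t flags)
{
	return flags & ~SIM_PAGE_PRESENT;
}

/* Fixed behaviour: GLOBAL and PSE are cleared together with PRESENT. */
static uint64_t sim_make_np_fixed(uint64_t flags)
{
	return flags & ~(SIM_PAGE_PRESENT | SIM_PAGE_GLOBAL | SIM_PAGE_PSE);
}
```

      For a kernel pte of PRESENT|GLOBAL, sim_pte_present(sim_make_np_old(pte))
      still returns 1, which is the misdetection behind the lockup, while
      sim_make_np_fixed() makes it return 0. The same holds for a pmd with PSE set.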
      
      After the patch is applied, copy_user correctly returns -EFAULT
      and doesn't lock up anymore.
      
      crash> p crash_address
      crash_address = $9 = (long unsigned int *) 0xffff88023c000000
      crash> rd 0xffff88023c000000
      rd: read error: kernel virtual address: ffff88023c000000  type: "64-bit KVADDR"
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • Revert "x86, mm: Make spurious_fault check explicitly check explicitly check the PRESENT bit" · 954f8571
      Andrea Arcangeli committed
      I got a report for a minor regression introduced by commit
      027ef6c8 ("mm: thp: fix pmd_present for split_huge_page and
      PROT_NONE with THP").
      
      So the problem is that pageattr creates kernel pagetables (ptes and
      pmds) that break pte_present/pmd_present, and the patch above
      exposed this invariant breakage for pmd_present.
      
      The same problem already existed for the pte and pte_present and
      it was fixed by commit 660a293e ("x86, mm: Make
      spurious_fault check explicitly check the PRESENT bit") (if it
      wasn't for that commit, it wouldn't even be a regression).  That
      fix keeps the page fault path from using pte_present.  I could
      follow through by no longer using pmd_present/pmd_huge there either.
      
      However, I think it's more robust to fix pageattr and to clear
      the PSE/GLOBAL bitflags in addition to the present bitflag,
      so that the kernel page fault can keep using the regular
      pte_present/pmd_present/pmd_huge.
      
      The confusion arises because _PAGE_GLOBAL and _PAGE_PROTNONE are
      sharing the same bit, and in the pmd case we pretend _PAGE_PSE
      to be set only in present pmds (to facilitate split_huge_page
      final tlb flush).
      Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Shaohua Li <shaohua.li@intel.com>
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • x86/mm/numa: Don't check if node is NUMA_NO_NODE · 942670d0
      Wen Congyang committed
      If we aren't debugging per_cpu maps, the cpu's node is stored in
      per_cpu variable numa_node.  If `node' is NUMA_NO_NODE, it means
      the caller wants to clear the cpu's node.  So we should also
      call set_cpu_numa_node() in this case.
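      The effect of the removed check can be sketched with two plain arrays standing
      in for the x86 cpu-to-node map and the generic per-cpu numa_node. All sim_*
      names are hypothetical, and the model ignores the early-boot and
      CONFIG_DEBUG_PER_CPU_MAPS paths.

```c
#define SIM_NUMA_NO_NODE (-1)
#define SIM_NR_CPUS      4

/* Hypothetical stand-ins for x86_cpu_to_node_map and the per-cpu numa_node. */
static int sim_cpu_to_node_map[SIM_NR_CPUS];
static int sim_numa_node[SIM_NR_CPUS];

static void sim_set_cpu_numa_node(int cpu, int node)
{
	sim_numa_node[cpu] = node;
}

/* Before the fix: clearing to NUMA_NO_NODE skipped the generic update. */
static void sim_numa_set_node_old(int cpu, int node)
{
	sim_cpu_to_node_map[cpu] = node;
	if (node != SIM_NUMA_NO_NODE)
		sim_set_cpu_numa_node(cpu, node);
}

/* After the fix: the generic per-cpu node is always updated, clears included. */
static void sim_numa_set_node_new(int cpu, int node)
{
	sim_cpu_to_node_map[cpu] = node;
	sim_set_cpu_numa_node(cpu, node);
}
```

      With the old variant, clearing a cpu leaves the stale node in sim_numa_node;
      the new variant clears both copies consistently.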
      Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Pavel Machek <pavel@ucw.cz>
      Cc: "Rafael J. Wysocki" <rjw@sisk.pl>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • xtensa: add support for TLS · c50842df
      Chris Zankel committed
      The Xtensa architecture provides a global register called THREADPTR
      for the purpose of Thread Local Storage (TLS) support. This allows us
      to use a fairly simple implementation, keeping the thread pointer in
      the regset and simply saving and restoring it on entry from and
      exit to user space.
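      The save/restore scheme can be modelled in a few lines. In this sketch a global
      variable stands in for the THREADPTR register and struct sim_regs stands in for
      the saved register set. Both are hypothetical and are not the xtensa entry code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the THREADPTR special register. */
static uint32_t sim_threadptr_reg;

/* Hypothetical subset of the per-task saved register set. */
struct sim_regs {
	uint32_t threadptr;
};

/* On entry from user space: save the task's thread pointer. */
static void sim_kernel_entry(struct sim_regs *regs)
{
	regs->threadptr = sim_threadptr_reg;
}

/* On exit to user space: restore the (possibly different) task's value. */
static void sim_kernel_exit(const struct sim_regs *regs)
{
	sim_threadptr_reg = regs->threadptr;
}
```

      Because the value lives in each task's saved registers, a context switch in
      between needs no extra handling. Returning to another task simply restores
      that task's thread pointer.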
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: add missing include asm/uaccess.h to checksum.h · b0c438e6
      Max Filippov committed
      This fixes the following build errors seen in linux-next:
      
      arch/xtensa/include/asm/checksum.h:247:2: error: implicit declaration of
      	function 'access_ok' [-Werror=implicit-function-declaration]
      arch/xtensa/include/asm/checksum.h:247:16: error: 'VERIFY_WRITE' undeclared
      	(first use in this function)
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: do not enable GENERIC_GPIO by default · e98c5b5b
      Max Filippov committed
      Now that the drivers/gpio/devres.c build does not depend on GPIOLIB,
      do not enable GENERIC_GPIO by default. This fixes the following build
      errors seen in linux-next:
      
      include/asm-generic/gpio.h:270:2: error: implicit declaration of function
      	'__gpio_get_value' [-Werror=implicit-function-declaration]
      include/asm-generic/gpio.h:276:2: error: implicit declaration of function
      	'__gpio_set_value' [-Werror=implicit-function-declaration]
      include/linux/gpio.h:60:19: error: redefinition of 'gpio_cansleep'
      include/linux/gpio.h:62:2: error: implicit declaration of function
      	'__gpio_cansleep' [-Werror=implicit-function-declaration]
      include/linux/gpio.h:67:2: error: implicit declaration of function
      	'__gpio_to_irq' [-Werror=implicit-function-declaration]
      drivers/gpio/devres.c:26:2: error: implicit declaration of function
      	'gpio_free' [-Werror=implicit-function-declaration]
      drivers/gpio/devres.c:60:2: error: implicit declaration of function
      	'gpio_request' [-Werror=implicit-function-declaration]
      drivers/gpio/devres.c:90:2: error: implicit declaration of function
      	'gpio_request_one' [-Werror=implicit-function-declaration]
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: complete ptrace handling of register windows · 4b2bb03f
      Max Filippov committed
      Compute WindowBase and WindowMask registers correctly on ptrace calls.
      Work done earlier by Maxim, Christian and Marc.
      Signed-off-by: Marc Gauthier <marc@tensilica.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: add support for oprofile · e6ffe17e
      dann committed
      Support call graph profiling.
      Keep upper two bits of PC unchanged through backtrace rather than take
      them from sp (a1). The stack pointer is usually in the same GB (same
      upper 2 bits) as PC, but technically doesn't always have to be (and
      might not in the future, when taking full advantage of MMU v3).
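      The two ways of rebuilding a caller's PC can be compared directly. This sketch
      assumes the Xtensa windowed-ABI convention that the top two bits of a return
      address hold the call window increment, so the real upper address bits must be
      borrowed from somewhere. The function names are hypothetical and this is not
      the oprofile backtrace code.

```c
#include <stdint.h>

#define SIM_ADDR_MASK 0x3fffffffu   /* low 30 bits: real address bits */
#define SIM_GB_MASK   0xc0000000u   /* top 2 bits: selects the 1 GB region */

/* Old approach: take the upper two bits from the stack pointer (a1). */
static uint32_t sim_pc_from_ra_sp(uint32_t ra, uint32_t sp)
{
	return (ra & SIM_ADDR_MASK) | (sp & SIM_GB_MASK);
}

/* Approach described above: keep the upper two bits of the current PC. */
static uint32_t sim_pc_from_ra_pc(uint32_t ra, uint32_t pc)
{
	return (ra & SIM_ADDR_MASK) | (pc & SIM_GB_MASK);
}
```

      When code runs at 0xd0005678 but the stack lives in a different GB, say
      0x5fff0000, a return address of 0x80001234 (window increment 2 in the top
      bits) becomes 0xc0001234 with the PC-based variant but 0x40001234 with the
      sp-based one.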
      Signed-off-by: Dan Nicolaescu <dann@xtensa-linux.org>
      Signed-off-by: Pete Delaney <piet@tensilica.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: move spill_registers to traps.h · 2d6f82fe
      Max Filippov committed
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: ISS: add host file-based simulated disk · b6c7e873
      Victor Prupis committed
      Simdisk is a block device that maps to a file in the host file system.
      It is usable for testing in the simulated environment, like xt-sim or
      QEMU. The device's binding to a host file may be changed at runtime via
      a proc interface, provided the device is not in use. The number of block
      devices and their initial binding to host files are controlled via
      kernel/module parameters, with defaults specified in the kernel configuration.
      Signed-off-by: Victor Prupis <vnp@tensilica.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: fix str[n]cmp return value · c5a285bb
      Max Filippov committed
      The str[n]cmp functions return a negative value if the first string is
      less than the second, a positive value if the first string is greater
      than the second, and zero if they are equal. This is important when these
      functions are used for sorting/binary search.
      
      With the incorrect strcmp return value, bsearch always failed in
      find_symbol_in_section, making it impossible to load any module.
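      Binary search needs all three outcomes of the comparison, because the sign
      decides which half to keep. Below is a portable C sketch of that contract, not
      the xtensa string.h implementation. sim_find_symbol is a hypothetical stand-in
      for a sorted-table lookup like the one in the module loader.

```c
#include <stddef.h>
#include <stdlib.h>

/* Returns <0, 0 or >0, comparing bytes as unsigned char as ISO C requires. */
static int sim_strcmp(const char *a, const char *b)
{
	const unsigned char *p = (const unsigned char *)a;
	const unsigned char *q = (const unsigned char *)b;

	while (*p && *p == *q) {
		p++;
		q++;
	}
	return (int)*p - (int)*q;
}

/* bsearch comparator: key is a string, elem points at a const char * entry. */
static int sim_cmp_key(const void *key, const void *elem)
{
	return sim_strcmp((const char *)key, *(const char *const *)elem);
}

/* Look up name in a table sorted by sim_strcmp; returns its index or -1. */
static long sim_find_symbol(const char *name, const char *const *syms, size_t n)
{
	const char *const *hit = bsearch(name, syms, n, sizeof(*syms), sim_cmp_key);

	return hit ? (long)(hit - syms) : -1;
}
```

      If the comparison gets the sign wrong, bsearch discards the half that holds
      the symbol, so lookups fail even though the table is correctly sorted.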
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: avoid mmap cache aliasing · de73b6b1
      Max Filippov committed
      Provide an arch_get_unmapped_area function that aligns shared memory
      mapping addresses to the larger of the page size and the cache way size.
      That guarantees that corresponding virtual addresses of shared mappings
      are cached in the same cache sets.
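      The alignment rule can be sketched as follows, modelled on the COLOUR_ALIGN
      idiom used by other architectures. The helper name and the 4 KiB page / 16 KiB
      cache way geometry are assumptions for illustration. Real values come from the
      processor configuration.

```c
/* Assumed example geometry; real values depend on the core configuration. */
#define SIM_PAGE_SHIFT      12
#define SIM_PAGE_SIZE       (1UL << SIM_PAGE_SHIFT)
#define SIM_DCACHE_WAY_SIZE (16UL * 1024)

/* Alignment unit: the larger of the page size and the cache way size. */
#define SIM_SHMLBA \
	(SIM_DCACHE_WAY_SIZE > SIM_PAGE_SIZE ? SIM_DCACHE_WAY_SIZE : SIM_PAGE_SIZE)

/* Round addr up to SIM_SHMLBA, then add the file offset's colour, so every
 * shared mapping of a given file page has the same address modulo the way
 * size and therefore indexes the same cache sets. */
static unsigned long sim_colour_align(unsigned long addr, unsigned long pgoff)
{
	unsigned long base = (addr + SIM_SHMLBA - 1) & ~(SIM_SHMLBA - 1);

	return base + ((pgoff << SIM_PAGE_SHIFT) & (SIM_SHMLBA - 1));
}
```

      Mapping file page 3 near 0x10001000 and near 0x20005000 yields 0x10007000
      and 0x2000b000. Both addresses are 0x3000 modulo the 16 KiB way size.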
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: add finit_module syscall · 475c32d0
      Max Filippov committed
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: pull signal definitions from signal-defs.h · 5d9f36b9
      Max Filippov committed
      This fixes the following build error in the current linux-next:
      
      include/linux/signal.h:261:2: error: unknown type name '__sigrestore_t'
      make[2]: *** [arch/xtensa/kernel/asm-offsets.s] Error 1
      make[1]: *** [prepare0] Error 2
      make: *** [sub-make] Error 2
      
      which appeared after 32dae82 ("consolidate kernel-side struct sigaction declarations").
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: fix ipc_parse_version selection · e969161b
      Max Filippov committed
      shmctl may be called with the IPC_64 flag; select the function version
      of ipc_parse_version to handle that correctly.
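      The function version strips IPC_64 from the command and reports which ABI the
      caller used. The non-function form in the generic IPC code is a macro that
      always reports IPC_64 and leaves the flag in cmd, so a command such as
      IPC_STAT | IPC_64 then matches no case. Below is a sketch of the function form
      with SIM_-prefixed constants whose values are assumed to match
      include/uapi/linux/ipc.h.

```c
/* Values assumed to match include/uapi/linux/ipc.h. */
#define SIM_IPC_OLD  0
#define SIM_IPC_64   0x0100
#define SIM_IPC_STAT 2

/* Strip IPC_64 from *cmd and report which structure layout was requested. */
static int sim_ipc_parse_version(int *cmd)
{
	if (*cmd & SIM_IPC_64) {
		*cmd ^= SIM_IPC_64;
		return SIM_IPC_64;
	}
	return SIM_IPC_OLD;
}
```

      After the call, a switch on cmd sees a plain IPC_STAT, and the returned
      version selects the 64-bit or old structure layout.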
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: dispatch medium-priority interrupts · 2d1c645c
      Marc Gauthier committed
      Add support for dispatching medium-priority interrupts, that is,
      interrupts of priority levels 2 to EXCM_LEVEL. IRQ handling may be
      preempted by a higher-priority IRQ.
      Signed-off-by: Marc Gauthier <marc@tensilica.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: Add config files for Diamond 233L - Rev C processor variant · d0b73b48
      Pete Delaney committed
      The Diamond 233L processor is a preconfigured Xtensa processor tailored
      for Linux applications.
      Signed-off-by: Pete Delaney <piet@tensilica.com>
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: use new common dtc rule · 2a02bc16
      Stephen Warren committed
      The current rules have the .dtb files built in a different directory
      from the .dts files. This patch changes xtensa to use the generic dtb
      rule which builds .dtb files in the same directory as the source .dts.
      
      This requires moving parts of arch/xtensa/boot/Makefile into newly
      created arch/xtensa/boot/dts/Makefile, and updating arch/xtensa/Makefile
      to call the new Makefile.
      Signed-off-by: Stephen Warren <swarren@nvidia.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • xtensa: rename prom_update_property to of_update_property · 127bc79e
      Max Filippov committed
      This rename happened in 79d1c712 ("powerpc+of: Rename the drivers/of prom_*
      functions to of_*").
      Signed-off-by: Max Filippov <jcmvbkbc@gmail.com>
      Signed-off-by: Chris Zankel <chris@zankel.net>
    • ia64: use %ld to print pages calculated in nr_free_buffer_pages · 6434b94a
      Zhang Yanfei committed
      Now the function nr_free_buffer_pages returns unsigned long, so use %ld
      to print its return value.
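      For reference, a minimal sketch of formatting such a page count. The patch
      itself uses %ld. Strictly, %lu is the printf conversion that matches unsigned
      long in ISO C. The helper name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: format a page count as returned by nr_free_buffer_pages. */
static int sim_format_free_pages(char *buf, size_t len, unsigned long pages)
{
	/* %lu is the conversion specifier that exactly matches unsigned long. */
	return snprintf(buf, len, "%lu pages", pages);
}
```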
      Signed-off-by: Zhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • swap: add per-partition lock for swapfile · ec8acf20
      Shaohua Li committed
      swap_lock is heavily contended when I test swapping to 3 fast SSDs (it is
      even slightly slower than swapping to 2 such SSDs).  The main contention
      comes from swap_info_get().  This patch tries to close the gap by adding a
      new per-partition lock.
      
      Global data like nr_swapfiles, total_swap_pages, least_priority and
      swap_list are still protected by swap_lock.
      
      nr_swap_pages is now an atomic, so it can be changed without swap_lock.  In
      theory, get_swap_page() may find no swap pages even though free swap pages
      exist, but that does not sound like a big problem.
      
      Accessing partition specific data (like scan_swap_map and so on) is only
      protected by swap_info_struct.lock.
      
      Changing swap_info_struct.flags requires holding both swap_lock and
      swap_info_struct.lock, because scan_swap_map() checks it.  Reading the
      flags is fine with either lock held.
      
      If both swap_lock and swap_info_struct.lock must be held, we always take
      the former first to avoid deadlock.
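      The locking rules above can be sketched with POSIX mutexes. This is a userspace
      model, not mm/swapfile.c. struct sim_swap_info and the sim_* helpers are
      hypothetical, and pthread mutexes stand in for the kernel spinlocks.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical per-partition state with its own lock. */
struct sim_swap_info {
	pthread_mutex_t lock;   /* stands in for swap_info_struct.lock */
	unsigned long flags;
};

/* Stands in for the global swap_lock. */
static pthread_mutex_t sim_swap_lock = PTHREAD_MUTEX_INITIALIZER;

static void sim_swap_info_init(struct sim_swap_info *si)
{
	pthread_mutex_init(&si->lock, NULL);
	si->flags = 0;
}

/* Writers need both locks: always swap_lock first, then si->lock,
 * released in reverse order, so no two paths can deadlock. */
static void sim_set_flags(struct sim_swap_info *si, unsigned long flags)
{
	pthread_mutex_lock(&sim_swap_lock);
	pthread_mutex_lock(&si->lock);
	si->flags = flags;
	pthread_mutex_unlock(&si->lock);
	pthread_mutex_unlock(&sim_swap_lock);
}

/* Readers may hold either lock; this one uses only the per-partition lock. */
static unsigned long sim_read_flags(struct sim_swap_info *si)
{
	unsigned long f;

	pthread_mutex_lock(&si->lock);
	f = si->flags;
	pthread_mutex_unlock(&si->lock);
	return f;
}
```

      Because every writer holds both locks, a reader holding either one sees a
      stable value. The fixed acquisition order rules out an ABBA deadlock between
      two writers.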
      
      swap_entry_free() can change swap_list.  To delete that code, we add a
      new highest_priority_index.  Whenever get_swap_page() is called, we
      check it.  If it's valid, we use it.
      
      It's a pity that get_swap_page() still holds swap_lock.  But in practice,
      swap_lock isn't heavily contended in my test with this patch (other
      bottlenecks, such as TLB flush, are much heavier).  And BTW, it looks like
      get_swap_page() doesn't really need the lock.  We never free swap_info[]
      and we check the SWAP_WRITEOK flag.  The only risk without the lock is
      that we could swap out to some low-priority swap, but we can quickly
      recover after several rounds of swap, so it doesn't sound like a big deal
      to me.  But I'd prefer to fix this if it's a real problem.
      
      "swap: make each swap partition have one address_space" improved the
      swapout speed from 1.7G/s to 2G/s.  This patch further improves the
      speed to 2.3G/s, roughly a 15% improvement.  It's a multi-process test,
      so TLB flush wasn't the biggest bottleneck before these patches.
      
      [arnd@arndb.de: fix it for nommu]
      [hughd@google.com: add missing unlock]
      [minchan@kernel.org: get rid of lockdep whinge on sys_swapon]
      Signed-off-by: Shaohua Li <shli@fusionio.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Minchan Kim <minchan.kim@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Seth Jennings <sjenning@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
      Cc: Dan Magenheimer <dan.magenheimer@oracle.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Hugh Dickins <hughd@google.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • acpi, memory-hotplug: support getting hotplug info from SRAT · 01a178a9
      Tang Chen committed
      We now provide an option for users who don't want to specify physical
      memory addresses on the kernel command line.
      
               /*
                * For movablemem_map=acpi:
                *
                * SRAT:                |_____| |_____| |_________| |_________| ......
                * node id:                0       1         1           2
                * hotpluggable:           n       y         y           n
                * movablemem_map:              |_____| |_________|
                *
                * Using movablemem_map, we can prevent memblock from allocating memory
                * on ZONE_MOVABLE at boot time.
                */
      
      So the user just specifies movablemem_map=acpi, and the kernel will use the
      hotpluggable info in SRAT to determine which memory ranges should be set
      as ZONE_MOVABLE.
      
      If all the memory ranges in SRAT are hotpluggable, then no memory can be
      used by the kernel.  But before SRAT is parsed, memblock has already
      reserved some memory ranges for other purposes, such as the kernel image.
      We cannot prevent the kernel from using this memory, so we need to
      exclude these ranges even if the memory is hotpluggable.
      
      Furthermore, there could be several memory ranges in the single node
      that the kernel resides in.  We may skip one range that has memory
      reserved by memblock, but if the rest of the memory is too small, the
      kernel will fail to boot.  So we make the whole node that the kernel
      resides in un-hotpluggable.  Then the kernel has enough memory to use.
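      The selection rule can be sketched over a simplified SRAT table. The struct and
      function names are hypothetical. The sketch captures only the node-level rule
      (hotpluggable ranges become movable unless they belong to the kernel's node)
      and omits the memblock bookkeeping.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified view of one SRAT memory affinity range. */
struct sim_srat_range {
	int node;
	bool hotpluggable;
	bool movable;   /* output: should this range become ZONE_MOVABLE? */
};

/* Mark a range movable only if SRAT says it is hotpluggable and it is not
 * on the node the kernel resides in (that whole node stays un-hotpluggable). */
static void sim_mark_movable(struct sim_srat_range *r, size_t n, int kernel_node)
{
	for (size_t i = 0; i < n; i++)
		r[i].movable = r[i].hotpluggable && r[i].node != kernel_node;
}
```

      With the four ranges from the diagram (nodes 0, 1, 1, 2; hotpluggable n, y, y,
      n) and the kernel on node 0, only the two node 1 ranges are marked movable. If
      the kernel resided on node 1 instead, none would be.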
      
      NOTE: Using this option will degrade NUMA performance, because the
            whole node will be set as ZONE_MOVABLE and the kernel cannot use
            memory on it.  Users who don't want to lose NUMA performance
            should simply not use it.
      
      [akpm@linux-foundation.org: fix warning]
      [akpm@linux-foundation.org: use strcmp()]
      Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Wu Jianguo <wujianguo@huawei.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Cc: "Brown, Len" <len.brown@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>