1. 13 11月, 2013 12 次提交
    • V
      sched: remove ARCH specific fpu_counter from task_struct · 27f69e68
      Vineet Gupta 提交于
      fpu_counter in task_struct was used only by sh/x86.  Both of these now
      carry it in ARCH specific thread_struct, hence this can now be removed
      from generic task_struct, shrinking it slightly for other arches.
      Signed-off-by: NVineet Gupta <vgupta@synopsys.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Paul Mundt <paul.mundt@gmail.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      27f69e68
    • R
      jump_label: unlikely(x) > 0 · 261adc9a
      Roel Kluin 提交于
      if (unlikely(x) > 0) doesn't seem to help branch prediction
      Signed-off-by: NRoel Kluin <roel.kluin@gmail.com>
      Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
      Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: "H. Peter Anvin" <hpa@linux.intel.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      261adc9a
    • A
      syscalls.h: use gcc alias instead of assembler aliases for syscalls · 83460ec8
      Andi Kleen 提交于
      Use standard gcc __attribute__((alias(foo))) to define the syscall aliases
      instead of custom assembler macros.
      
      This is far cleaner, and also fixes my LTO kernel build.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Cc: Al Viro <viro@ZenIV.linux.org.uk>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      83460ec8
    • M
      mm: numa: return the number of base pages altered by protection changes · 72403b4a
      Mel Gorman 提交于
      Commit 0255d491 ("mm: Account for a THP NUMA hinting update as one
      PTE update") was added to account for the number of PTE updates when
      marking pages prot_numa.  task_numa_work was using the old return value
      to track how much address space had been updated.  Altering the return
      value causes the scanner to do more work than it is configured or
      documented to in a single unit of work.
      
      This patch reverts that commit and accounts for the number of THP
      updates separately in vmstat.  It is up to the administrator to
      interpret the pair of values correctly.  This is a straight-forward
      operation and likely to only be of interest when actively debugging NUMA
      balancing problems.
      
      The impact of this patch is that the NUMA PTE scanner will scan slower
      when THP is enabled and workloads may converge slower as a result.  On
      the flip size system CPU usage should be lower than recent tests
      reported.  This is an illustrative example of a short single JVM specjbb
      test
      
      specjbb
                             3.12.0                3.12.0
                            vanilla      acctupdates
      TPut 1      26143.00 (  0.00%)     25747.00 ( -1.51%)
      TPut 7     185257.00 (  0.00%)    183202.00 ( -1.11%)
      TPut 13    329760.00 (  0.00%)    346577.00 (  5.10%)
      TPut 19    442502.00 (  0.00%)    460146.00 (  3.99%)
      TPut 25    540634.00 (  0.00%)    549053.00 (  1.56%)
      TPut 31    512098.00 (  0.00%)    519611.00 (  1.47%)
      TPut 37    461276.00 (  0.00%)    474973.00 (  2.97%)
      TPut 43    403089.00 (  0.00%)    414172.00 (  2.75%)
      
                    3.12.0      3.12.0
                   vanillaacctupdates
      User         5169.64     5184.14
      System        100.45       80.02
      Elapsed       252.75      251.85
      
      Performance is similar but note the reduction in system CPU time.  While
      this showed a performance gain, it will not be universal but at least
      it'll be behaving as documented.  The vmstats are obviously different but
      here is an obvious interpretation of them from mmtests.
      
                                      3.12.0      3.12.0
                                     vanillaacctupdates
      NUMA page range updates        1408326    11043064
      NUMA huge PMD updates                0       21040
      NUMA PTE updates               1408326      291624
      
      "NUMA page range updates" == nr_pte_updates and is the value returned to
      the NUMA pte scanner.  NUMA huge PMD updates were the number of THP
      updates which in combination can be used to calculate how many ptes were
      updated from userspace.
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Reported-by: NAlex Thorlton <athorlton@sgi.com>
      Reviewed-by: NRik van Riel <riel@redhat.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      72403b4a
    • J
      mm: factor commit limit calculation · 00619bcc
      Jerome Marchand 提交于
      The same calculation is currently done in three differents places.
      Factor that code so future changes has to be made at only one place.
      
      [akpm@linux-foundation.org: uninline vm_commit_limit()]
      Signed-off-by: NJerome Marchand <jmarchan@redhat.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Michal Hocko <mhocko@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      00619bcc
    • T
      mm/memblock.c: introduce bottom-up allocation mode · 79442ed1
      Tang Chen 提交于
      The Linux kernel cannot migrate pages used by the kernel.  As a result,
      kernel pages cannot be hot-removed.  So we cannot allocate hotpluggable
      memory for the kernel.
      
      ACPI SRAT (System Resource Affinity Table) contains the memory hotplug
      info.  But before SRAT is parsed, memblock has already started to allocate
      memory for the kernel.  So we need to prevent memblock from doing this.
      
      In a memory hotplug system, any numa node the kernel resides in should be
      unhotpluggable.  And for a modern server, each node could have at least
      16GB memory.  So memory around the kernel image is highly likely
      unhotpluggable.
      
      So the basic idea is: Allocate memory from the end of the kernel image and
      to the higher memory.  Since memory allocation before SRAT is parsed won't
      be too much, it could highly likely be in the same node with kernel image.
      
      The current memblock can only allocate memory top-down.  So this patch
      introduces a new bottom-up allocation mode to allocate memory bottom-up.
      And later when we use this allocation direction to allocate memory, we
      will limit the start address above the kernel.
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Acked-by: NToshi Kani <toshi.kani@hp.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Wanpeng Li <liwanp@linux.vnet.ibm.com>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Jiang Liu <jiang.liu@huawei.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Lai Jiangshan <laijs@cn.fujitsu.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Taku Izumi <izumi.taku@jp.fujitsu.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Kamezawa Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79442ed1
    • J
      writeback: do not sync data dirtied after sync start · c4a391b5
      Jan Kara 提交于
      When there are processes heavily creating small files while sync(2) is
      running, it can easily happen that quite some new files are created
      between WB_SYNC_NONE and WB_SYNC_ALL pass of sync(2).  That can happen
      especially if there are several busy filesystems (remember that sync
      traverses filesystems sequentially and waits in WB_SYNC_ALL phase on one
      fs before starting it on another fs).  Because WB_SYNC_ALL pass is slow
      (e.g.  causes a transaction commit and cache flush for each inode in
      ext3), resulting sync(2) times are rather large.
      
      The following script reproduces the problem:
      
        function run_writers
        {
          for (( i = 0; i < 10; i++ )); do
            mkdir $1/dir$i
            for (( j = 0; j < 40000; j++ )); do
              dd if=/dev/zero of=$1/dir$i/$j bs=4k count=4 &>/dev/null
            done &
          done
        }
      
        for dir in "$@"; do
          run_writers $dir
        done
      
        sleep 40
        time sync
      
      Fix the problem by disregarding inodes dirtied after sync(2) was called
      in the WB_SYNC_ALL pass.  To allow for this, sync_inodes_sb() now takes
      a time stamp when sync has started which is used for setting up work for
      flusher threads.
      
      To give some numbers, when above script is run on two ext4 filesystems
      on simple SATA drive, the average sync time from 10 runs is 267.549
      seconds with standard deviation 104.799426.  With the patched kernel,
      the average sync time from 10 runs is 2.995 seconds with standard
      deviation 0.096.
      Signed-off-by: NJan Kara <jack@suse.cz>
      Reviewed-by: NFengguang Wu <fengguang.wu@intel.com>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      c4a391b5
    • N
      tools/vm/page-types.c: support KPF_SOFTDIRTY bit · 46c77e2b
      Naoya Horiguchi 提交于
      Soft dirty bit allows us to track which pages are written since the last
      clear_ref (by "echo 4 > /proc/pid/clear_refs".) This is useful for
      userspace applications to know their memory footprints.
      
      Note that the kernel exposes this flag via bit[55] of /proc/pid/pagemap,
      and the semantics is not a default one (scheduled to be the default in the
      near future.) However, it shifts to the new semantics at the first
      clear_ref, and the users of soft dirty bit always do it before utilizing
      the bit, so that's not a big deal.  Users must avoid relying on the bit in
      page-types before the first clear_ref.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Pavel Emelyanov <xemul@parallels.com>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      46c77e2b
    • Z
      mm/sparsemem: use PAGES_PER_SECTION to remove redundant nr_pages parameter · 85b35fea
      Zhang Yanfei 提交于
      For below functions,
      
      - sparse_add_one_section()
      - kmalloc_section_memmap()
      - __kmalloc_section_memmap()
      - __kfree_section_memmap()
      
      they are always invoked to operate on one memory section, so it is
      redundant to always pass a nr_pages parameter, which is the page numbers
      in one section.  So we can directly use predefined macro PAGES_PER_SECTION
      instead of passing the parameter.
      Signed-off-by: NZhang Yanfei <zhangyanfei@cn.fujitsu.com>
      Cc: Wen Congyang <wency@cn.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Toshi Kani <toshi.kani@hp.com>
      Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Yasunori Goto <y-goto@jp.fujitsu.com>
      Cc: Andy Whitcroft <apw@shadowen.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      85b35fea
    • D
      mm, mempolicy: make mpol_to_str robust and always succeed · 948927ee
      David Rientjes 提交于
      mpol_to_str() should not fail.  Currently, it either fails because the
      string buffer is too small or because a string hasn't been defined for a
      mempolicy mode.
      
      If a new mempolicy mode is introduced and no string is defined for it,
      just warn and return "unknown".
      
      If the buffer is too small, just truncate the string and return, the
      same behavior as snprintf().
      
      This also fixes a bug where there was no NULL-byte termination when doing
      *p++ = '=' and *p++ ':' and maxlen has been reached.
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Chen Gang <gang.chen@asianux.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Dave Jones <davej@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      948927ee
    • T
      cpu/mem hotplug: add try_online_node() for cpu_up() · 01b0f197
      Toshi Kani 提交于
      cpu_up() has #ifdef CONFIG_MEMORY_HOTPLUG code blocks, which call
      mem_online_node() to put its node online if offlined and then call
      build_all_zonelists() to initialize the zone list.
      
      These steps are specific to memory hotplug, and should be managed in
      mm/memory_hotplug.c.  lock_memory_hotplug() should also be held for the
      whole steps.
      
      For this reason, this patch replaces mem_online_node() with
      try_online_node(), which performs the whole steps with
      lock_memory_hotplug() held.  try_online_node() is named after
      try_offline_node() as they have similar purpose.
      
      There is no functional change in this patch.
      Signed-off-by: NToshi Kani <toshi.kani@hp.com>
      Reviewed-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01b0f197
    • Q
      mm: add a helper function to check may oom condition · b9921ecd
      Qiang Huang 提交于
      Use helper function to check if we need to deal with oom condition.
      Signed-off-by: NQiang Huang <h.huangqiang@huawei.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b9921ecd
  2. 09 11月, 2013 1 次提交
  3. 07 11月, 2013 3 次提交
  4. 06 11月, 2013 1 次提交
  5. 05 11月, 2013 2 次提交
  6. 04 11月, 2013 2 次提交
  7. 01 11月, 2013 1 次提交
  8. 31 10月, 2013 2 次提交
    • S
      of: Move definition of of_find_next_cache_node into common code. · a3e31b45
      Sudeep KarkadaNagesha 提交于
      Since the definition of_find_next_cache_node is architecture independent,
      the existing definition in powerpc can be moved to driver/of/base.c
      
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Grant Likely <grant.likely@linaro.org>
      Cc: Rob Herring <rob.herring@calxeda.com>
      Signed-off-by: NSudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
      Signed-off-by: NBenjamin Herrenschmidt <benh@kernel.crashing.org>
      a3e31b45
    • G
      percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Greg Thelen 提交于
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before negation to
      sign extend the adjustment.  This helps in cases where the counter type
      is wider than an unsigned adjustment.  An alternative to this patch is
      to declare such operations unsupported, but it seemed useful to avoid
      surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1
        preempt_disable()
        this_cpu_write(long_counter, 0)
        this_cpu_sub(long_counter, delta)
        preempt_enable()
      
      Before this change long_counter on a 64 bit machine ends with value
      0xffffffff, rather than 0xffffffffffffffff.  This is because
      this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
      which is basically:
        long_counter = 0 + 0xffffffff
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko passes, especially the following cases which
      previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: NGreg Thelen <gthelen@google.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bd09d9a3
  9. 30 10月, 2013 7 次提交
  10. 29 10月, 2013 9 次提交