提交 · 21c527a3cba07f9a9ce17b3a445f110a847793e2 · openanolis / cloud-kernel

06 11月, 2015 40 次提交

mm/compaction.c: add an is_via_compact_memory() helper · 21c527a3

由 Yaowei Bai 提交于 11月 05, 2015

Introduce is_via_compact_memory() helper indicating compacting via
/proc/sys/vm/compact_memory to improve readability.

To catch this situation in __compaction_suitable, use order as parameter
directly instead of using struct compact_control.

This patch has no functional changes.
Signed-off-by: NYaowei Bai <bywxiaobai@163.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

21c527a3

mm/vmscan: make inactive_anon_is_low_global return directly · 29d06bbb

由 Yaowei Bai 提交于 11月 05, 2015

Delete unnecessary if to let inactive_anon_is_low_global return
directly.

No functional changes.
Signed-off-by: NYaowei Bai <bywxiaobai@163.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

29d06bbb

mm: hugetlb: proc: add HugetlbPages field to /proc/PID/status · 5d317b2b

由 Naoya Horiguchi 提交于 11月 05, 2015

Currently there's no easy way to get per-process usage of hugetlb pages,
which is inconvenient because userspace applications which use hugetlb
typically want to control their processes on the basis of how much memory
(including hugetlb) they use.  So this patch simply provides easy access
to the info via /proc/PID/status.
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: NJoern Engel <joern@logfs.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5d317b2b

mm: hugetlb: proc: add hugetlb-related fields to /proc/PID/smaps · 25ee01a2

由 Naoya Horiguchi 提交于 11月 05, 2015

Currently /proc/PID/smaps provides no usage info for vma(VM_HUGETLB),
which is inconvenient when we want to know per-task or per-vma base
hugetlb usage.  To solve this, this patch adds new fields for hugetlb
usage like below:

  Size:              20480 kB
  Rss:                   0 kB
  Pss:                   0 kB
  Shared_Clean:          0 kB
  Shared_Dirty:          0 kB
  Private_Clean:         0 kB
  Private_Dirty:         0 kB
  Referenced:            0 kB
  Anonymous:             0 kB
  AnonHugePages:         0 kB
  Shared_Hugetlb:    18432 kB
  Private_Hugetlb:    2048 kB
  Swap:                  0 kB
  KernelPageSize:     2048 kB
  MMUPageSize:        2048 kB
  Locked:                0 kB
  VmFlags: rd wr mr mw me de ht

[hughd@google.com: fix Private_Hugetlb alignment ]
Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Acked-by: NJoern Engel <joern@logfs.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: NHugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

25ee01a2

mm: use only per-device readahead limit · 600e19af

由 Roman Gushchin 提交于 11月 05, 2015

Maximal readahead size is limited now by two values:
 1) by global 2Mb constant (MAX_READAHEAD in max_sane_readahead())
 2) by configurable per-device value* (bdi->ra_pages)

There are devices, which require custom readahead limit.
For instance, for RAIDs it's calculated as number of devices
multiplied by chunk size times 2.

Readahead size can never be larger than bdi->ra_pages * 2 value
(POSIX_FADV_SEQUNTIAL doubles readahead size).

If so, why do we need two limits?
I suggest to completely remove this max_sane_readahead() stuff and
use per-device readahead limit everywhere.

Also, using right readahead size for RAID disks can significantly
increase i/o performance:

before:
  dd if=/dev/md2 of=/dev/null bs=100M count=100
  100+0 records in
  100+0 records out
  10485760000 bytes (10 GB) copied, 12.9741 s, 808 MB/s

after:
  $ dd if=/dev/md2 of=/dev/null bs=100M count=100
  100+0 records in
  100+0 records out
  10485760000 bytes (10 GB) copied, 8.91317 s, 1.2 GB/s

(It's an 8-disks RAID5 storage).

This patch doesn't change sys_readahead and madvise(MADV_WILLNEED)
behavior introduced by 6d2be915 ("mm/readahead.c: fix readahead
failure for memoryless NUMA nodes and limit readahead pages").
Signed-off-by: NRoman Gushchin <klamm@yandex-team.ru>
Cc: Raghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Cc: David Rientjes <rientjes@google.com>
Cc: onstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

600e19af

mm/page_alloc: remove unused parameter in init_currently_empty_zone() · b171e409

由 Yaowei Bai 提交于 11月 05, 2015

Commit a2f3aa02 ("[PATCH] Fix sparsemem on Cell") fixed an oops
experienced on the Cell architecture when init-time functions,
early_*(), are called at runtime by introducing an 'enum memmap_context'
parameter to memmap_init_zone() and init_currently_empty_zone().  This
parameter is intended to be used to tell whether the call of these two
functions is being made on behalf of a hotplug event, or happening at
boot-time.  However, init_currently_empty_zone() does not use this
parameter at all, so remove it.
Signed-off-by: NYaowei Bai <bywxiaobai@163.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b171e409

mm, migrate: count pages failing all retries in vmstat and tracepoint · f2f81fb2

由 Vlastimil Babka 提交于 11月 05, 2015

Migration tries up to 10 times to migrate pages that return -EAGAIN until
it gives up.  If some pages fail all retries, they are counted towards the
number of failed pages that migrate_pages() returns.  They should also be
counted in the /proc/vmstat pgmigrate_fail and in the mm_migrate_pages
tracepoint.
Signed-off-by: NVlastimil Babka <vbabka@suse.cz>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Konstantin Khlebnikov <koct9i@gmail.com>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f2f81fb2

mm/memblock: make memblock_remove_range() static · 35bd16a2

由 Alexander Kuleshov 提交于 11月 05, 2015

memblock_remove_range() is only used in the mm/memblock.c, so we can make
it static.
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

35bd16a2

mm/mremap: use offset_in_page macro · f19cb115

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f19cb115

mm/mmap: use offset_in_page macro · de1741a1

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

de1741a1

mm/vmalloc: use offset_in_page macro · 891c49ab

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

891c49ab

mm/mlock: use offset_in_page macro · 8fd9e488

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8fd9e488

mm/util: use offset_in_page macro · ea53cde0

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

ea53cde0

mm/percpu: use offset_in_page macro · f09f1243

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Acked-by: NTejun Heo <tj@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

f09f1243

mm/early_ioremap: use offset_in_page macro · 5d57b014

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

5d57b014

mm/mincore: use offset_in_page macro · e7bbdd07

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

e7bbdd07

mm/nommu: use offset_in_page macro · 1824cb75

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

1824cb75

mm/msync: use offset_in_page macro · b0d61c7e

由 Alexander Kuleshov 提交于 11月 05, 2015

linux/mm.h provides offset_in_page() macro.  Let's use already predefined
macro instead of (addr & ~PAGE_MASK).
Signed-off-by: NAlexander Kuleshov <kuleshovmail@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b0d61c7e

arch/powerpc/mm/numa.c: do not allocate bootmem memory for non existing nodes · c118baf8

由 Raghavendra K T 提交于 11月 05, 2015

With the setup_nr_nodes(), we have already initialized
node_possible_map.  So it is safe to use for_each_node here.

There are many places in the kernel that use hardcoded 'for' loop with
nr_node_ids, because all other architectures have numa nodes populated
serially.  That should be reason we had maintained the same for
powerpc.

But, since sparse numa node ids possible on powerpc, we unnecessarily
allocate memory for non existent numa nodes.

For e.g., on a system with 0,1,16,17 as numa nodes nr_node_ids=18 and
we allocate memory for nodes 2-14.  This patch we allocate memory for
only existing numa nodes.

The patch is boot tested on a 4 node tuleta, confirming with printks
that it works as expected.
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Cc: Vladimir Davydov <vdavydov@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c118baf8

mm/list_lru.c: replace nr_node_ids for loop with for_each_node() · 145949a1

由 Raghavendra K T 提交于 11月 05, 2015

The functions used in the patch are in slowpath, which gets called
whenever alloc_super is called during mounts.

Though this should not make difference for the architectures with
sequential numa node ids, for the powerpc which can potentially have
sparse node ids (for e.g., 4 node system having numa ids, 0,1,16,17 is
common), this patch saves some unnecessary allocations for non existing
numa nodes.

Even without that saving, perhaps patch makes code more readable.

[vdavydov@parallels.com: take memcg_aware check outside for_each loop]
Signed-off-by: NRaghavendra K T <raghavendra.kt@linux.vnet.ibm.com>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Anton Blanchard <anton@samba.org>
Cc: Nishanth Aravamudan <nacc@linux.vnet.ibm.com>
Cc: Greg Kurz <gkurz@linux.vnet.ibm.com>
Cc: Grant Likely <grant.likely@linaro.org>
Cc: Nikunj A Dadhania <nikunj@linux.vnet.ibm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

145949a1

mm: fix docbook comment for get_vaddr_frames() · 61f9ec1d

由 Jonathan Corbet 提交于 11月 05, 2015

get_vaddr_frames() has a comment that's *almost* a docbook comment; add
the missing star so that the tools will find it properly.
Signed-off-by: NJonathan Corbet <corbet@lwn.net>
Cc: Jan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

61f9ec1d

memcg: drop unnecessary cold-path tests from __memcg_kmem_bypass() · 7f822c24

由 Tejun Heo 提交于 11月 05, 2015

__memcg_kmem_bypass() decides whether a kmem allocation should be bypassed
to the root memcg.  Some conditions that it tests are valid criteria
regarding who should be held accountable; however, there are a couple
unnecessary tests for cold paths - __GFP_FAIL and fatal_signal_pending().

The previous patch updated try_charge() to handle both __GFP_FAIL and
dying tasks correctly and the only thing these two tests are doing is
making accounting less accurate and sprinkling tests for cold path
conditions in the hot paths.  There's nothing meaningful gained by these
extra tests.

This patch removes the two unnecessary tests from __memcg_kmem_bypass().
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7f822c24

memcg: ratify and consolidate over-charge handling · 10d53c74

由 Tejun Heo 提交于 11月 05, 2015

try_charge() is the main charging logic of memcg.  When it hits the limit
but either can't fail the allocation due to __GFP_NOFAIL or the task is
likely to free memory very soon, being OOM killed, has SIGKILL pending or
exiting, it "bypasses" the charge to the root memcg and returns -EINTR.
While this is one approach which can be taken for these situations, it has
several issues.

* It unnecessarily lies about the reality.  The number itself doesn't
  go over the limit but the actual usage does.  memcg is either forced
  to or actively chooses to go over the limit because that is the
  right behavior under the circumstances, which is completely fine,
  but, if at all avoidable, it shouldn't be misrepresenting what's
  happening by sneaking the charges into the root memcg.

* Despite trying, we already do over-charge.  kmemcg can't deal with
  switching over to the root memcg by the point try_charge() returns
  -EINTR, so it open-codes over-charing.

* It complicates the callers.  Each try_charge() user has to handle
  the weird -EINTR exception.  memcg_charge_kmem() does the manual
  over-charging.  mem_cgroup_do_precharge() performs unnecessary
  uncharging of root memcg, which BTW is inconsistent with what
  memcg_charge_kmem() does but not broken as [un]charging are noops on
  root memcg.  mem_cgroup_try_charge() needs to switch the returned
  cgroup to the root one.

The reality is that in memcg there are cases where we are forced and/or
willing to go over the limit.  Each such case needs to be scrutinized and
justified but there definitely are situations where that is the right
thing to do.  We alredy do this but with a superficial and inconsistent
disguise which leads to unnecessary complications.

This patch updates try_charge() so that it over-charges and returns 0 when
deemed necessary.  -EINTR return is removed along with all special case
handling in the callers.

While at it, remove the local variable @ret, which was initialized to zero
and never changed, along with done: label which just returned the always
zero @ret.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

10d53c74

memcg: collect kmem bypass conditions into __memcg_kmem_bypass() · cbfb4798

由 Tejun Heo 提交于 11月 05, 2015

memcg_kmem_newpage_charge() and memcg_kmem_get_cache() are testing the
same series of conditions to decide whether to bypass kmem accounting.
Collect the tests into __memcg_kmem_bypass().

This is pure refactoring.
Signed-off-by: NTejun Heo <tj@kernel.org>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NMichal Hocko <mhocko@suse.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

cbfb4798

memcg: punt high overage reclaim to return-to-userland path · b23afb93

由 Tejun Heo 提交于 11月 05, 2015

Currently, try_charge() tries to reclaim memory synchronously when the
high limit is breached; however, if the allocation doesn't have
__GFP_WAIT, synchronous reclaim is skipped.  If a process performs only
speculative allocations, it can blow way past the high limit.  This is
actually easily reproducible by simply doing "find /".  slab/slub
allocator tries speculative allocations first, so as long as there's
memory which can be consumed without blocking, it can keep allocating
memory regardless of the high limit.

This patch makes try_charge() always punt the over-high reclaim to the
return-to-userland path.  If try_charge() detects that high limit is
breached, it adds the overage to current->memcg_nr_pages_over_high and
schedules execution of mem_cgroup_handle_over_high() which performs
synchronous reclaim from the return-to-userland path.

As long as kernel doesn't have a run-away allocation spree, this should
provide enough protection while making kmemcg behave more consistently.
It also has the following benefits.

- All over-high reclaims can use GFP_KERNEL regardless of the specific
  gfp mask in use, e.g. GFP_NOFS, when the limit was breached.

- It copes with prio inversion.  Previously, a low-prio task with
  small memory.high might perform over-high reclaim with a bunch of
  locks held.  If a higher prio task needed any of these locks, it
  would have to wait until the low prio task finished reclaim and
  released the locks.  By handing over-high reclaim to the task exit
  path this issue can be avoided.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NMichal Hocko <mhocko@kernel.org>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b23afb93

memcg: flatten task_struct->memcg_oom · 626ebc41

由 Tejun Heo 提交于 11月 05, 2015

task_struct->memcg_oom is a sub-struct containing fields which are used
for async memcg oom handling.  Most task_struct fields aren't packaged
this way and it can lead to unnecessary alignment paddings.  This patch
flattens it.

* task.memcg_oom.memcg          -> task.memcg_in_oom
* task.memcg_oom.gfp_mask	-> task.memcg_oom_gfp_mask
* task.memcg_oom.order          -> task.memcg_oom_order
* task.memcg_oom.may_oom        -> task.memcg_may_oom

In addition, task.memcg_may_oom is relocated to where other bitfields are
which reduces the size of task_struct.
Signed-off-by: NTejun Heo <tj@kernel.org>
Acked-by: NMichal Hocko <mhocko@suse.com>
Reviewed-by: NVladimir Davydov <vdavydov@parallels.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

626ebc41

mm/mmap.c: remove useless statement "vma = NULL" in find_vma() · 55e1ceaf

由 Chen Gang 提交于 11月 05, 2015

Before the main loop, vma is already is NULL.  There is no need to set it
to NULL again.
Signed-off-by: NChen Gang <gang.chen.5i5j@gmail.com>
Reviewed-by: NOleg Nesterov <oleg@redhat.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

55e1ceaf

uaccess: reimplement probe_kernel_address() using probe_kernel_read() · 0ab32b6f

由 Andrew Morton 提交于 11月 05, 2015

probe_kernel_address() is basically the same as the (later added)
probe_kernel_read().

The return value on EFAULT is a bit different: probe_kernel_address()
returns number-of-bytes-not-copied whereas probe_kernel_read() returns
-EFAULT.  All callers have been checked, none cared.

probe_kernel_read() can be overridden by the architecture whereas
probe_kernel_address() cannot.  parisc, blackfin and um do this, to insert
additional checking.  Hence this patch possibly fixes obscure bugs,
although there are only two probe_kernel_address() callsites outside
arch/.

My first attempt involved removing probe_kernel_address() entirely and
converting all callsites to use probe_kernel_read() directly, but that got
tiresome.

This patch shrinks mm/slab_common.o by 218 bytes.  For a single
probe_kernel_address() callsite.

Cc: Steven Miao <realmz6@gmail.com>
Cc: Jeff Dike <jdike@addtoit.com>
Cc: Richard Weinberger <richard@nod.at>
Cc: "James E.J. Bottomley" <jejb@parisc-linux.org>
Cc: Helge Deller <deller@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0ab32b6f

mm/mlock.c: reorganize mlockall() return values and remove goto-out label · 86d2adcc

由 Alexey Klimov 提交于 11月 05, 2015

In mlockall syscall wrapper after out-label for goto code just doing
return. Remove goto out statements and return error values directly.

Also instead of rewriting ret variable before every if-check move returns
to 'error'-like path under if-check.

Objdump asm listing showed me reducing by few asm lines. Object file size
descreased from 220592 bytes to 220528 bytes for me (for aarch64).
Signed-off-by: NAlexey Klimov <klimov.linux@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

86d2adcc

mm/kmemleak.c: remove unneeded initialization of object to NULL · 9fbed254

由 Alexey Klimov 提交于 11月 05, 2015

Few lines below object is reinitialized by lookup_object() so we don't
need to init it by NULL in the beginning of find_and_get_object().
Signed-off-by: NAlexey Klimov <alexey.klimov@linaro.org>
Acked-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9fbed254

mm: slab: only move management objects off-slab for sizes larger than KMALLOC_MIN_SIZE · d4322d88

由 Catalin Marinas 提交于 11月 05, 2015

On systems with a KMALLOC_MIN_SIZE of 128 (arm64, some mips and powerpc
configurations defining ARCH_DMA_MINALIGN to 128), the first
kmalloc_caches[] entry to be initialised after slab_early_init = 0 is
"kmalloc-128" with index 7.  Depending on the debug kernel configuration,
sizeof(struct kmem_cache) can be larger than 128 resulting in an
INDEX_NODE of 8.

Commit 8fc9cf42 ("slab: make more slab management structure off the
slab") enables off-slab management objects for sizes starting with
PAGE_SIZE >> 5 (128 bytes for a 4KB page configuration) and the creation
of the "kmalloc-128" cache would try to place the management objects
off-slab.  However, since KMALLOC_MIN_SIZE is already 128 and
freelist_size == 32 in __kmem_cache_create(), kmalloc_slab(freelist_size)
returns NULL (kmalloc_caches[7] not populated yet).  This triggers the
following bug on arm64:

  kernel BUG at /work/Linux/linux-2.6-aarch64/mm/slab.c:2283!
  Internal error: Oops - BUG: 0 [#1] SMP
  Modules linked in:
  CPU: 0 PID: 0 Comm: swapper Not tainted 4.3.0-rc4+ #540
  Hardware name: Juno (DT)
  PC is at __kmem_cache_create+0x21c/0x280
  LR is at __kmem_cache_create+0x210/0x280
  [...]
  Call trace:
    __kmem_cache_create+0x21c/0x280
    create_boot_cache+0x48/0x80
    create_kmalloc_cache+0x50/0x88
    create_kmalloc_caches+0x4c/0xf4
    kmem_cache_init+0x100/0x118
    start_kernel+0x214/0x33c

This patch introduces an OFF_SLAB_MIN_SIZE definition to avoid off-slab
management objects for sizes equal to or smaller than KMALLOC_MIN_SIZE.

Fixes: 8fc9cf42 ("slab: make more slab management structure off the slab")
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Reported-by: NGeert Uytterhoeven <geert@linux-m68k.org>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: <stable@vger.kernel.org>	[3.15+]
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d4322d88

mm/slub: calculate start order with reserved in consideration · 9f835703

由 Wei Yang 提交于 11月 05, 2015

In slub_order(), the order starts from max(min_order,
get_order(min_objects * size)).  When (min_objects * size) has different
order from (min_objects * size + reserved), it will skip this order via a
check in the loop.

This patch optimizes this a little by calculating the start order with
`reserved' in consideration and removing the check in loop.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9f835703

mm/slub: use get_order() instead of fls() · 033fd1bd

由 Wei Yang 提交于 11月 05, 2015

get_order() is more easy to understand.

This patch just replaces it.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: NPekka Enberg <penberg@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

033fd1bd

mm/slub: correct the comment in calculate_order() · 422ff4d7

由 Wei Yang 提交于 11月 05, 2015

In calculate_order(), it tries to calculate the best order by adjusting
the fraction and min_objects.  On each iteration on min_objects, fraction
iterates on 16, 8, 4.  Which means the acceptable waste increases with
1/16, 1/8, 1/4.

This patch corrects the comment according to the code.
Signed-off-by: NWei Yang <weiyang@linux.vnet.ibm.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

422ff4d7

mm/slab_common.c: initialize kmem_cache pointer to NULL · 40911a79

由 Alexandru Moise 提交于 11月 05, 2015

The assignment to NULL within the error condition was written in a 2014
patch to suppress a compiler warning.  However it would be cleaner to just
initialize the kmem_cache to NULL and just return it in case of an error
condition.
Signed-off-by: NAlexandru Moise <00moses.alexander00@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

40911a79

Doc/slub: document slabinfo-gnuplot.sh script · 76f8ec71

由 Sergey Senozhatsky 提交于 11月 05, 2015

Add documentation on how to use slabinfo-gnuplot.sh script.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Acked-by: NChristoph Lameter <cl@linux.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

76f8ec71

tools/vm/slabinfo: gnuplot slabifo extended stat · 4a981abd

由 Sergey Senozhatsky 提交于 11月 05, 2015

GNUplot `slabinfo -X' stats, collected, for example, using the
following command:
  while [ 1 ]; do slabinfo -X >> stats; sleep 1; done

`slabinfo-gnuplot.sh stats' pre-processes collected records
and generate graphs (totals, slabs sorted by size, slabs
sorted by size).

Graphs can be [individually] regenerate with different samples
range and graph width-heigh (-r %d,%d and -s %d,%d options).

To visually compare N `totals' graphs:
  slabinfo-gnuplot.sh -t FILE1-totals FILE2-totals ... FILEN-totals
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4a981abd

tools/vm/slabinfo: cosmetic globals cleanup · 2cee611a

由 Sergey Senozhatsky 提交于 11月 05, 2015

checkpatch.pl complains about globals being explicitly zeroed
out: "ERROR: do not initialise globals to 0 or NULL".

New globals, introduced in this patch set, have no explicit 0
initialization; clean up the old ones to make it less hairy.
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2cee611a

tools/vm/slabinfo: output sizes in bytes · a8ea0bf1

由 Sergey Senozhatsky 提交于 11月 05, 2015

Introduce "-B|--Bytes" opt to disable store_size() dynamic
size scaling and report size in bytes instead.

This `expands' the interface a bit, it's impossible to use
printf("%6s") anymore to output sizes.

Example:

slabinfo -X -N 2
 Slabcache Totals
 ----------------
 Slabcaches :              91   Aliases  :         119->69   Active:     63
 Memory used:       199798784   # Loss   :        10689376   MRatio:     5%
 # Objects  :          324301   # PartObj:           18151   ORatio:     5%

 Per Cache         Average              Min              Max            Total
 ----------------------------------------------------------------------------
 #Objects             5147                1            89068           324301
 #Slabs                199                1             3886            12537
 #PartSlab              12                0              240              778
 %PartSlab             32%               0%             100%               6%
 PartObjs                5                0             4569            18151
 % PartObj             26%               0%             100%               5%
 Memory            3171409             8192        127336448        199798784
 Used              3001736              160        121429728        189109408
 Loss               169672                0          5906720         10689376

 Per Object        Average              Min              Max
 -----------------------------------------------------------
 Memory                585                8             8192
 User                  583                8             8192
 Loss                    2                0               64

 Slabs sorted by size
 --------------------
 Name                   Objects Objsize           Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
 ext4_inode_cache         69948    1736       127336448      3871/0/15   18 3   0  95 a
 dentry                   89068     288        26058752      3164/0/17   28 1   0  98 a

 Slabs sorted by loss
 --------------------
 Name                   Objects Objsize            Loss Slabs/Part/Cpu  O/S O %Fr %Ef Flg
 ext4_inode_cache         69948    1736         5906720      3871/0/15   18 3   0  95 a
 inode_cache              11628     864          537472        642/0/4   18 2   0  94 a

Besides, store_size() does not use powers of two for G/M/K

    if (value > 1000000000UL) {
            divisor = 100000000UL;
            trailer = 'G';
    } else if (value > 1000000UL) {
            divisor = 100000UL;
            trailer = 'M';
    } else if (value > 1000UL) {
            divisor = 100;
            trailer = 'K';
    }
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a8ea0bf1

tools/vm/slabinfo: introduce extended totals mode · 016c6cdf

由 Sergey Senozhatsky 提交于 11月 05, 2015

Add "-X|--Xtotals" opt to output extended totals summary,
which includes:
-- totals summary
-- slabs sorted by size
-- slabs sorted by loss (waste)

Example:
=======

slabinfo --X -N 1
  Slabcache Totals
  ----------------
  Slabcaches :  91      Aliases  : 120->69  Active:  65
  Memory used: 568.3M   # Loss   :  30.4M   MRatio:     5%
  # Objects  : 920.1K   # PartObj: 161.2K   ORatio:    17%

  Per Cache    Average         Min         Max       Total
  ---------------------------------------------------------
  #Objects       14.1K           1      227.8K      920.1K
  #Slabs           533           1       11.7K       34.7K
  #PartSlab         86           0        4.3K        5.6K
  %PartSlab        24%          0%        100%         16%
  PartObjs          17           0      129.3K      161.2K
  % PartObj        17%          0%        100%         17%
  Memory          8.7M        8.1K      384.7M      568.3M
  Used            8.2M         160      366.5M      537.9M
  Loss          468.8K           0       18.2M       30.4M

  Per Object   Average         Min         Max
  ---------------------------------------------
  Memory           587           8        8.1K
  User             584           8        8.1K
  Loss               2           0          64

  Slabs sorted by size
  ----------------------
  Name                   Objects Objsize    Space Slabs/Part/Cpu  O/S O %Fr %Ef Flg
  ext4_inode_cache        211142    1736   384.7M    11732/40/10   18 3   0  95 a

  Slabs sorted by loss
  ----------------------
  Name                   Objects Objsize    Loss Slabs/Part/Cpu  O/S O %Fr %Ef Flg
  ext4_inode_cache        211142    1736    18.2M    11732/40/10   18 3   0  95 a
Signed-off-by: NSergey Senozhatsky <sergey.senozhatsky@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

016c6cdf

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功