提交 · 121d1ba0a019e1465a53533aea133b1b0f6b442d · openanolis / cloud-kernel

01 8月, 2012 40 次提交

mm, oom: fix potential killing of thread that is disabled from oom killing · 121d1ba0

由 David Rientjes 提交于 7月 31, 2012

/proc/sys/vm/oom_kill_allocating_task will immediately kill current when
the oom killer is called to avoid a potentially expensive tasklist scan
for large systems.

Currently, however, it is not checking current's oom_score_adj value which
may be OOM_SCORE_ADJ_MIN, meaning that it has been disabled from oom
killing.

This patch avoids killing current in such a condition and simply falls
back to the tasklist scan since memory still needs to be freed.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

121d1ba0

mm: clear pages_scanned only if draining a pcp adds pages to the buddy allocator again · 2a13515c

由 KOSAKI Motohiro 提交于 7月 31, 2012

commit 2ff754fa ("mm: clear pages_scanned only if draining a pcp adds
pages to the buddy allocator again") fixed one free_pcppages_bulk()
misuse.  But two another miuse still exist.

This patch fixes it.
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NMel Gorman <mel@csn.ul.ie>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Rik van Riel <riel@redhat.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2a13515c

mm, fadvise: don't return -EINVAL when filesystem cannot implement fadvise() · 3d3727cd

由 KOSAKI Motohiro 提交于 7月 31, 2012

Eric Wong reported his test suite failex when /tmp is tmpfs.

https://lkml.org/lkml/2012/2/24/479

Currentlt the input check of POSIX_FADV_WILLNEED has two problems.

- requires a_ops->readpage.  But in fact, force_page_cache_readahead()
  requires that the target filesystem has either ->readpage or ->readpages.

- returns -EINVAL when the filesystem doesn't have ->readpage.  But
  posix says that fadvise is merely a hint.  Thus fadvise() should return
  0 if filesystem has no means of implementing fadvise().  The userland
  application should not know nor care whcih type of filesystem backs the
  TMPDIR directory, as Eric pointed out.  There is nothing which userspace
  can do to solve this error.

So change the return value to 0 when filesytem doesn't support readahead.

[akpm@linux-foundation.org: checkpatch fixes]
Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Cc: Hugh Dickins <hughd@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Signed-off-by: NEric Wong <normalperson@yhbt.net>
Tested-by: NEric Wong <normalperson@yhbt.net>
Reviewed-by: NWanlong Gao <gaowanlong@cn.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3d3727cd

mm/compaction: cleanup on compaction_deferred · c59e2610

由 Gavin Shan 提交于 7月 31, 2012

When CONFIG_COMPACTION is enabled, compaction_deferred() tries to
recalculate the deferred limit again, which isn't necessary.

When CONFIG_COMPACTION is disabled, compaction_deferred() should return
"true" or "false" since it has "bool" for its return value.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Acked-by: NMinchan Kim <minchan@kernel.org>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c59e2610

memcg: make mem_cgroup_force_empty_list() return bool · 3c935d18

由 KAMEZAWA Hiroyuki 提交于 7月 31, 2012

mem_cgroup_force_empty_list() just returns 0 or -EBUSY and -EBUSY
indicates 'you need to retry'.  Make mem_cgroup_force_empty_list() return
a bool to simplify the logic.

[akpm@linux-foundation.org: rework mem_cgroup_force_empty_list()'s comment]
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3c935d18

memcg: mem_cgroup_move_parent() doesn't need gfp_mask · 6068bf01

由 KAMEZAWA Hiroyuki 提交于 7月 31, 2012

Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6068bf01

memcg: clean up force_empty_list() return value check · d845aa2c

由 Kamezawa Hiroyuki 提交于 7月 31, 2012

After bf544fdc241da8 "memcg: move charges to root cgroup if
use_hierarchy=0 in mem_cgroup_move_hugetlb_parent()"
mem_cgroup_move_parent() returns only -EBUSY or -EINVAL.  So we can remove
the -ENOMEM and -EINTR checks.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d845aa2c

memcg: remove check for signal_pending() during rmdir() · 59b8e85c

由 Kamezawa Hiroyuki 提交于 7月 31, 2012

After bf544fdc241da8 "memcg: move charges to root cgroup if
use_hierarchy=0 in mem_cgroup_move_hugetlb_parent()", no memory reclaim
will occur when removing a memory cgroup.  If -EINTR is returned here,
cgroup will show a warning.

We don't need to handle any user interruption signal.  Remove this.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

59b8e85c

mm/memblock.c:memblock_double_array(): cosmetic cleanups · fd07383b

由 Andrew Morton 提交于 7月 31, 2012

This function is an 80-column eyesore, quite unnecessarily.  Clean that
up, and use standard comment layout style.

Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Greg Pearson <greg.pearson@hp.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

fd07383b

mm, oom: do not schedule if current has been killed · 4f774b91

由 David Rientjes 提交于 7月 31, 2012

The oom killer currently schedules away from current in an uninterruptible
sleep if it does not have access to memory reserves. It's possible that
current was killed because it shares memory with the oom killed thread or
because it was killed by the user in the interim, however.

This patch only schedules away from current if it does not have a pending
kill, i.e. if it does not share memory with the oom killed thread. It's
possible that it will immediately retry its memory allocation and fail,
but it will immediately be given access to memory reserves if it calls the
oom killer again.

This prevents the delay of memory freeing when threads that share memory
with the oom killed thread get unnecessarily scheduled.
Signed-off-by: NDavid Rientjes <rientjes@google.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4f774b91

hugetlb/cgroup: remove exclude and wakeup rmdir calls from migrate · 75754681

由 Aneesh Kumar K.V 提交于 7月 31, 2012

We already hold the hugetlb_lock.  That should prevent a parallel cgroup
rmdir from touching page's hugetlb cgroup.  So remove the exclude and
wakeup calls.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

75754681

hugetlb/cgroup: assign the page hugetlb cgroup when we move the page to active list. · 94ae8ba7

由 Aneesh Kumar K.V 提交于 7月 31, 2012

A page's hugetlb cgroup assignment and movement to the active list should
occur with hugetlb_lock held.  Otherwise when we remove the hugetlb cgroup
we will iterate the active list and find pages with NULL hugetlb cgroup
values.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

94ae8ba7

hugetlb: move all the in use pages to active list · 79dbb236

由 Aneesh Kumar K.V 提交于 7月 31, 2012

When we fail to allocate pages from the reserve pool, hugetlb tries to
allocate huge pages using alloc_buddy_huge_page.  Add these to the active
list.  We also need to add the huge page we allocate when we soft offline
the oldpage to active list.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

79dbb236

hugetlb/cgroup: add HugeTLB controller documentation · 585e27ea

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

585e27ea

hugetlb/cgroup: migrate hugetlb cgroup info from oldpage to new page during migration · 8e6ac7fa

由 Aneesh Kumar K.V 提交于 7月 31, 2012

With HugeTLB pages, hugetlb cgroup is uncharged in compound page
destructor.  Since we are holding a hugepage reference, we can be sure
that old page won't get uncharged till the last put_page().
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

8e6ac7fa

hugetlb/cgroup: add hugetlb cgroup control files · abb8206c

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Add the control files for hugetlb controller

[akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
[akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/]
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

abb8206c

hugetlb/cgroup: add support for cgroup removal · da1def55

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Add support for cgroup removal.  If we don't have parent cgroup, the
charges are moved to root cgroup.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

da1def55

hugetlb/cgroup: add charge/uncharge routines for hugetlb cgroup · 6d76dcf4

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Add the charge and uncharge routines for hugetlb cgroup.  We do cgroup
charging in page alloc and uncharge in compound page destructor.
Assigning page's hugetlb cgroup is protected by hugetlb_lock.

[liwp@linux.vnet.ibm.com: add huge_page_order check to avoid incorrect uncharge]
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Signed-off-by: NWanpeng Li <liwp.linux@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6d76dcf4

hugetlb/cgroup: add the cgroup pointer to page lru · 9dd540e2

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Add the hugetlb cgroup pointer to 3rd page lru.next.  This limit the usage
to hugetlb cgroup to only hugepages with 3 or more normal pages.  I guess
that is an acceptable limitation.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9dd540e2

mm/hugetlb: add new HugeTLB cgroup · 2bc64a20

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Implement a new controller that allows us to control HugeTLB allocations.
The extension allows to limit the HugeTLB usage per control group and
enforces the controller limit during page fault.  Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies that,
the application will get SIGBUS signal if it tries to access HugeTLB pages
beyond its limit.  This requires the application to know beforehand how
much HugeTLB pages it would require for its use.

The charge/uncharge calls will be added to HugeTLB code in later patch.
Support for cgroup removal will be added in later patches.

[akpm@linux-foundation.org: s/CONFIG_CGROUP_HUGETLB_RES_CTLR/CONFIG_MEMCG_HUGETLB/g]
[akpm@linux-foundation.org: s/CONFIG_MEMCG_HUGETLB/CONFIG_CGROUP_HUGETLB/g]
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2bc64a20

hugetlb: make some static variables global · c3f38a38

由 Aneesh Kumar K.V 提交于 7月 31, 2012

We will use them later in hugetlb_cgroup.c
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Cc: David Rientjes <rientjes@google.com>
Acked-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c3f38a38

hugetlb: add a list for tracking in-use HugeTLB pages · 0edaecfa

由 Aneesh Kumar K.V 提交于 7月 31, 2012

hugepage_activelist will be used to track currently used HugeTLB pages.
We need to find the in-use HugeTLB pages to support HugeTLB cgroup removal.
On cgroup removal we update the page's HugeTLB cgroup to point to parent
cgroup.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

0edaecfa

hugetlb: simplify migrate_huge_page() · 189ebff2

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Since we migrate only one hugepage, don't use linked list for passing the
page around.  Directly pass the page that need to be migrated as argument.
This also removes the usage of page->lru in the migrate path.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Reviewed-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

189ebff2

hugetlb: use mmu_gather instead of a temporary linked list for accumulating pages · 24669e58

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Use a mmu_gather instead of a temporary linked list for accumulating pages
when we unmap a hugepage range
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

24669e58

hugetlb: add an inline helper for finding hstate index · 972dc4de

由 Aneesh Kumar K.V 提交于 7月 31, 2012

Add an inline helper and use it in the code.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: Hillf Danton <dhillf@gmail.com>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

972dc4de

hugetlb: don't use ERR_PTR with VM_FAULT* values · 76dcee75

由 Aneesh Kumar K.V 提交于 7月 31, 2012

The current use of VM_FAULT_* codes with ERR_PTR requires us to ensure
VM_FAULT_* values will not exceed MAX_ERRNO value.  Decouple the
VM_FAULT_* values from MAX_ERRNO.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Acked-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Michal Hocko <mhocko@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

76dcee75

hugetlb: rename max_hstate to hugetlb_max_hstate · 47d38344

由 Aneesh Kumar K.V 提交于 7月 31, 2012

This patchset implements a cgroup resource controller for HugeTLB pages.
The controller allows to limit the HugeTLB usage per control group and
enforces the controller limit during page fault.  Since HugeTLB doesn't
support page reclaim, enforcing the limit at page fault time implies that,
the application will get SIGBUS signal if it tries to access HugeTLB pages
beyond its limit.  This requires the application to know beforehand how
much HugeTLB pages it would require for its use.

The goal is to control how many HugeTLB pages a group of task can
allocate.  It can be looked at as an extension of the existing quota
interface which limits the number of HugeTLB pages per hugetlbfs
superblock.  HPC job scheduler requires jobs to specify their resource
requirements in the job file.  Once their requirements can be met, job
schedulers like (SLURM) will schedule the job.  We need to make sure that
the jobs won't consume more resources than requested.  If they do we
should either error out or kill the application.

This patch:

Rename max_hstate to hugetlb_max_hstate.  We will be using this from other
subsystems like hugetlb controller in later patches.
Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Reviewed-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NHillf Danton <dhillf@gmail.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

47d38344

mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads · 3965c9ae

由 Wanpeng Li 提交于 7月 31, 2012

Since per-BDI flusher threads were introduced in 2.6, the pdflush
mechanism is not used any more.  But the old interface exported through
/proc/sys/vm/nr_pdflush_threads still exists and is obviously useless.

For back-compatibility, printk warning information and return 2 to notify
the users that the interface is removed.
Signed-off-by: NWanpeng Li <liwp@linux.vnet.ibm.com>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3965c9ae

mm/buddy: cleanup on should_fail_alloc_page · deaf386e

由 Gavin Shan 提交于 7月 31, 2012

Currently, function should_fail() has "bool" for its return value, so it's
reasonable to change the return value of function should_fail_alloc_page()
into "bool" as well.

The patch does cleanup on function should_fail_alloc_page() to have "bool"
for its return value.
Signed-off-by: NGavin Shan <shangw@linux.vnet.ibm.com>
Acked-by: NDavid Rientjes <rientjes@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

deaf386e

mm: account the total_vm in the vm_stat_account() · 44de9d0c

由 Huang Shijie 提交于 7月 31, 2012

vm_stat_account() accounts the shared_vm, stack_vm and reserved_vm now.
But we can also account for total_vm in the vm_stat_account() which makes
the code tidy.

Even for mprotect_fixup(), we can get the right result in the end.
Signed-off-by: NHuang Shijie <shijie8@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

44de9d0c

documentation: update how page-cluster affects swap I/O · df858fa8

由 Christian Ehrhardt 提交于 7月 31, 2012

Fix of the documentation of /proc/sys/vm/page-cluster to match the
behavior of the code and add some comments about what the tunable will
change in that behavior.
Signed-off-by: NChristian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

df858fa8

swap: allow swap readahead to be merged · 3fb5c298

由 Christian Ehrhardt 提交于 7月 31, 2012

Swap readahead works fine, but the I/O to disk is almost always done in
page size requests, despite the fact that readahead submits
1<<page-cluster pages at a time.

On older kernels the old per device plugging behavior might have captured
this and merged the requests, but currently all comes down to much more
I/Os than required.

On a single device this might not be an issue, but as soon as a server
runs on shared san resources savin I/Os not only improves swapin
throughput but also provides a lower resource utilization.

With a load running KVM in a lot of memory overcommitment (the hot memory
is 1.5 times the host memory) swapping throughput improves significantly
and the lead feels more responsive as well as achieves more throughput.

In a test setup with 16 swap disks running blocktrace on one of those disks
shows the improved merging:
Prior:
Reads Queued:     560,888,    2,243MiB  Writes Queued:     226,242,  904,968KiB
Read Dispatches:  544,701,    2,243MiB  Write Dispatches:  159,318,  904,968KiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  544,716,    2,243MiB  Writes Completed:  159,321,  904,980KiB
Read Merges:       16,187,   64,748KiB  Write Merges:       61,744,  246,976KiB
IO unplugs:       149,614               Timer unplugs:       2,940

With the patch:
Reads Queued:     734,315,    2,937MiB  Writes Queued:     300,188,    1,200MiB
Read Dispatches:  214,972,    2,937MiB  Write Dispatches:  215,176,    1,200MiB
Reads Requeued:         0               Writes Requeued:         0
Reads Completed:  214,971,    2,937MiB  Writes Completed:  215,177,    1,200MiB
Read Merges:      519,343,    2,077MiB  Write Merges:       73,325,  293,300KiB
IO unplugs:       337,130               Timer unplugs:      11,184

I got ~10% to ~40% more throughput in my cases and at the same time much
lower cpu consumption when broken down per transferred kilobyte (the
majority of that due to saved interrupts and better cache handling).  In a
shared SAN others might get an additional benefit as well, because this
now causes less protocol overhead.
Signed-off-by: NChristian Ehrhardt <ehrhardt@linux.vnet.ibm.com>
Acked-by: NRik van Riel <riel@redhat.com>
Acked-by: NJens Axboe <axboe@kernel.dk>
Reviewed-by: NMinchan Kim <minchan@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3fb5c298

memcg: remove MEM_CGROUP_CHARGE_TYPE_FORCE · a7d6f529

由 Kamezawa Hiroyuki 提交于 7月 31, 2012

There are no users since commit b2402857 ("memcg: remove PCG_CACHE").
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NHugh Dickins <hughd@google.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a7d6f529

memcg: rename MEM_CGROUP_CHARGE_TYPE_MAPPED as MEM_CGROUP_CHARGE_TYPE_ANON · 41326c17

由 Kamezawa Hiroyuki 提交于 7月 31, 2012

Now, in memcg, 2 "MAPPED" enum/macro are found
 MEM_CGROUP_CHARGE_TYPE_MAPPED
 MEM_CGROUP_STAT_FILE_MAPPED

Thier names looks similar to each other but the former is used for
accounting anonymous memory. rename it as TYPE_ANON.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

41326c17

memcg: rename MEM_CGROUP_STAT_SWAPOUT as MEM_CGROUP_STAT_SWAP · bff6bb83

由 Kamezawa Hiroyuki 提交于 7月 31, 2012

MEM_CGROUP_STAT_SWAPOUT represents the usage of swap rather than
the number of swap-out events. Rename it to be MEM_CGROUP_STAT_SWAP.
Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
Acked-by: NMichal Hocko <mhocko@suse.cz>
Acked-by: NHugh Dickins <hughd@google.com>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bff6bb83

mm: make vb_alloc() more foolproof · aa91c4d8

由 Jan Kara 提交于 7月 31, 2012

If someone calls vb_alloc() (or vm_map_ram() for that matter) to allocate
0 bytes (0 pages), get_order() returns BITS_PER_LONG - PAGE_CACHE_SHIFT
and interesting stuff happens.  So make debugging such problems easier and
warn about 0-size allocation.

[akpm@linux-foundation.org: use WARN_ON-return-value feature]
Signed-off-by: NJan Kara <jack@suse.cz>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

aa91c4d8

vmalloc: walk vmap_areas by sorted list instead of rb_next() · 92ca922f

由 Hong zhi guo 提交于 7月 31, 2012

There's a walk by repeating rb_next to find a suitable hole.  Could be
simply replaced by walk on the sorted vmap_area_list.  More simpler and
efficient.

Mutation of the list and tree only happens in pair within
__insert_vmap_area and __free_vmap_area, under protection of
vmap_area_lock.  The patch code is also under vmap_area_lock, so the list
walk is safe, and consistent with the tree walk.

Tested on SMP by repeating batch of vmalloc anf vfree for random sizes and
rounds for hours.
Signed-off-by: NHong Zhiguo <honkiko@gmail.com>
Cc: Nick Piggin <npiggin@kernel.dk>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

92ca922f

drivers/media/video/v4l2-ioctl.c: fix build · c2cddf99

由 Andrew Morton 提交于 7月 31, 2012

Fix zillions of these:

drivers/media/video/v4l2-ioctl.c:1848: error: unknown field 'func' specified in initializer
drivers/media/video/v4l2-ioctl.c:1848: warning: missing braces around initializer
drivers/media/video/v4l2-ioctl.c:1848: warning: (near initialization for 'v4l2_ioctls[0].<anonymous>')
drivers/media/video/v4l2-ioctl.c:1848: warning: initialization makes integer from pointer without a cast
drivers/media/video/v4l2-ioctl.c:1848: error: initializer element is not computable at load time
drivers/media/video/v4l2-ioctl.c:1848: error: (near initialization for 'v4l2_ioctls[0].<anonymous>.offset')

Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c2cddf99

xtensa: select generic atomic64_t support · c66af411

由 Fengguang Wu 提交于 7月 31, 2012

This will fix build errors:

block/blk-cgroup.c:609:2: error: unknown type name 'atomic64_t'
block/blk-cgroup.c:609:2: error: implicit declaration of function 'ATOMIC64_INIT' [-Werror=implicit-function-declaration]
Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
Cc: Chris Zankel <chris@zankel.net>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c66af411

fault-injection: fix failcmd.sh warning · d6b09e75

由 Akinobu Mita 提交于 7月 31, 2012

"fault-injection: add tool to run command with failslab or
fail_page_alloc" added tools/testing/fault-injection/failcmd.sh to make it
easier to inject slab/page allocation failures by fault injection.

failcmd.sh prints the following warning when running with arguments
for command.

	# ./failcmd.sh echo aaa
	failcmd.sh: line 209: [: echo: binary operator expected
	aaa

This warning is caused by an improper check whether at least one
parameter is left after parsing command options.

Fix it by testing the length of $1 instead of $@
Signed-off-by: NAkinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

d6b09e75

openanolis / cloud-kernel 接近 2 年 前同步成功

openanolis / cloud-kernel
接近 2 年前同步成功