1. 25 Jan 2013 (14 commits)
    • workqueue: remove worker_pool->gcwq · 4e8f0a60
      Tejun Heo authored
      The only remaining user of pool->gcwq is std_worker_pool_pri().
      Reimplement it using get_gcwq() and remove worker_pool->gcwq.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      4e8f0a60
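      A minimal sketch of what the reimplementation described above could look like; the body is an assumption based on the message, not the literal patch:

        static int std_worker_pool_pri(struct worker_pool *pool)
        {
                /* a pool's index within its CPU's gcwq->pools[] is its priority */
                return pool - get_gcwq(pool->cpu)->pools;
        }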
    • workqueue: replace for_each_worker_pool() with for_each_std_worker_pool() · 38db41d9
      Tejun Heo authored
      for_each_std_worker_pool() takes @cpu instead of @gcwq.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      38db41d9
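      A hedged sketch of the @cpu-based iterator the message describes; the exact macro body and the NR_STD_WORKER_POOLS bound are assumptions:

        #define for_each_std_worker_pool(pool, cpu)                            \
                for ((pool) = &get_gcwq(cpu)->pools[0];                        \
                     (pool) < &get_gcwq(cpu)->pools[NR_STD_WORKER_POOLS];      \
                     (pool)++)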
    • workqueue: make freezing/thawing per-pool · a1056305
      Tejun Heo authored
      Instead of holding locks from both pools and then processing the pools
      together, make freezing/thawing per-pool - grab locks of one pool,
      process it, release it and then proceed to the next pool.
      
      While this patch changes processing order across pools, order within
      each pool remains the same.  As each pool is independent, this
      shouldn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      a1056305
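      An illustrative sketch of the per-pool pattern described above; the helper name freeze_pools_of_cpu() is hypothetical and only the lock-one-pool-at-a-time pattern is the point:

        static void freeze_pools_of_cpu(unsigned int cpu)
        {
                struct worker_pool *pool;

                /* one pool at a time: lock, mark frozen, unlock, move on */
                for_each_std_worker_pool(pool, cpu) {
                        spin_lock_irq(&pool->lock);
                        WARN_ON_ONCE(pool->flags & POOL_FREEZING);
                        pool->flags |= POOL_FREEZING;
                        spin_unlock_irq(&pool->lock);
                }
        }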
    • workqueue: make hotplug processing per-pool · 94cf58bb
      Tejun Heo authored
      Instead of holding locks from both pools and then processing the pools
      together, make hotplug processing per-pool - grab locks of one pool,
      process it, release it and then proceed to the next pool.
      
      rebind_workers() is updated to take and process @pool instead of @gcwq
      which results in a lot of de-indentation.  gcwq_claim_assoc_and_lock()
      and its counterpart are replaced with in-line per-pool locking.
      
      While this patch changes processing order across pools, order within
      each pool remains the same.  As each pool is independent, this
      shouldn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      94cf58bb
    • workqueue: move global_cwq->lock to worker_pool · d565ed63
      Tejun Heo authored
      Move gcwq->lock to pool->lock.  The conversion is mostly
      straight-forward.  Things worth noting are
      
      * In many places, this removes the need to use gcwq completely.  pool
        is used directly instead.  get_std_worker_pool() is added to help
        some of these conversions.  This also leaves get_work_gcwq() without
        any user.  Removed.
      
      * In hotplug and freezer paths, the pools belonging to a CPU are often
        processed together.  This patch makes those paths hold locks of all
        pools, with highpri lock nested inside, to keep the conversion
        straight-forward.  These nested lockings will be removed by
        following patches.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      d565ed63
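      A sketch of the transitional "lock all pools of a CPU" pattern mentioned above, with the highpri pool's lock nested inside; the helper names here are illustrative (the real helpers are the gcwq_claim_assoc_and_lock() family) and irq handling is omitted:

        static void gcwq_lock_all_pools(struct global_cwq *gcwq)
        {
                struct worker_pool *pool;

                /* subclass 0 for the normal pool, 1 for the highpri pool */
                for_each_worker_pool(pool, gcwq)
                        spin_lock_nested(&pool->lock, pool - gcwq->pools);
        }

        static void gcwq_unlock_all_pools(struct global_cwq *gcwq)
        {
                struct worker_pool *pool;

                for_each_worker_pool(pool, gcwq)
                        spin_unlock(&pool->lock);
        }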
    • workqueue: move global_cwq->cpu to worker_pool · ec22ca5e
      Tejun Heo authored
      Move gcwq->cpu to pool->cpu.  This introduces a couple places where
      gcwq->pools[0].cpu is used.  These will soon go away as gcwq is
      further reduced.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      ec22ca5e
    • workqueue: move busy_hash from global_cwq to worker_pool · c9e7cf27
      Tejun Heo authored
      There's no functional necessity for the two pools on the same CPU to
      share the busy hash table.  It's also likely to be a bottleneck when
      implementing pools with user-specified attributes.
      
      This patch makes busy_hash per-pool.  The conversion is mostly
      straight-forward.  Changes worth noting are,
      
      * The large block of changes in rebind_workers() comes from moving the
        block inside for_each_worker_pool(), as there are now separate hash
        tables for each pool.  This changes the order of operations but
        doesn't break anything.
      
      * The three for_each_worker_pool() loops in gcwq_unbind_fn() are
        combined into one.  This again changes the order of operations but
        doesn't break anything.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      c9e7cf27
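      A sketch of the structural change, using the generic hashtable helpers from <linux/hashtable.h>; surrounding fields are elided and only the moved member is shown:

        struct worker_pool {
                struct global_cwq       *gcwq;          /* the owning gcwq */
                /* ... idle_list, worker list, etc. unchanged ... */

                /* workers currently executing work items, hashed by the work
                 * item; previously one table shared by both pools of a CPU */
                DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);
        };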
    • workqueue: record pool ID instead of CPU in work->data when off-queue · 7c3eed5c
      Tejun Heo authored
      Currently, when a work item is off-queue, work->data records the CPU
      it was last on, which is used to locate the last executing instance
      for non-reentrance, flushing, etc.
      
      We're in the process of removing global_cwq and making worker_pool the
      top level abstraction.  This patch makes work->data point to the pool
      it was last associated with instead of CPU.
      
      After the previous WORK_OFFQ_POOL_CPU and worker_pool->id additions,
      the conversion is fairly straight-forward.  WORK_OFFQ constants and
      functions are modified to record and read back pool ID instead.
      worker_pool_by_id() is added to allow looking up pool from ID.
      get_work_pool() replaces get_work_gcwq(), which is reimplemented using
      get_work_pool().  get_work_pool_id() replaces work_cpu().
      
      This patch shouldn't introduce any observable behavior changes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      7c3eed5c
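      A hedged sketch of the off-queue encoding and lookup the message describes; the constant and helper names follow the message and neighbouring commits, but the bit layout is simplified:

        /* off-queue work->data layout (simplified):
         *   [ pool ID ........ | OFFQ flags | work_struct flag bits ] */
        static int get_work_pool_id(struct work_struct *work)
        {
                unsigned long data = atomic_long_read(&work->data);

                return data >> WORK_OFFQ_POOL_SHIFT;
        }

        static struct worker_pool *worker_pool_by_id(int pool_id)
        {
                return idr_find(&worker_pool_idr, pool_id);
        }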
    • workqueue: add worker_pool->id · 9daf9e67
      Tejun Heo authored
      Add worker_pool->id which is allocated from worker_pool_idr.  This
      will be used to record the last associated worker_pool in work->data.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      9daf9e67
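      A sketch of ID assignment from worker_pool_idr; idr_alloc() and the mutex name are used here for brevity and are assumptions - the idr interface of that kernel era differed slightly:

        static DEFINE_MUTEX(worker_pool_idr_mutex);
        static DEFINE_IDR(worker_pool_idr);

        /* give @pool an ID so work->data can later refer back to it */
        static int worker_pool_assign_id(struct worker_pool *pool)
        {
                int ret;

                mutex_lock(&worker_pool_idr_mutex);
                ret = idr_alloc(&worker_pool_idr, pool, 0, 0, GFP_KERNEL);
                if (ret >= 0)
                        pool->id = ret;
                mutex_unlock(&worker_pool_idr_mutex);

                return ret < 0 ? ret : 0;
        }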
    • workqueue: introduce WORK_OFFQ_CPU_NONE · 715b06b8
      Tejun Heo authored
      Currently, when a work item is off queue, the high bits of its data
      encode the last CPU it was on.  This is scheduled to be changed to
      pool ID, which will make it impossible to use WORK_CPU_NONE to
      indicate no association.
      
      This patch limits the number of bits which are used for off-queue cpu
      number to 31 (so that the max fits in an int) and uses the highest
      possible value - WORK_OFFQ_CPU_NONE - to indicate no association.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      715b06b8
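      Roughly, the constants work out along these lines; this is a sketch that ignores the flag-bit bookkeeping above the work_struct flags:

        enum {
                WORK_OFFQ_CPU_BITS      = 31,   /* keep the max value within an int */
                WORK_OFFQ_CPU_NONE      = (1LU << WORK_OFFQ_CPU_BITS) - 1,
        };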
    • workqueue: make GCWQ_FREEZING a pool flag · 35b6bb63
      Tejun Heo authored
      Make GCWQ_FREEZING a pool flag POOL_FREEZING.  This patch doesn't
      change locking - FREEZING on both pools of a CPU is set or cleared
      together while holding gcwq->lock.  It shouldn't cause any functional
      difference.
      
      This leaves gcwq->flags w/o any flags.  Removed.
      
      While at it, convert BUG_ON()s in freeze_workqueue_begin() and
      thaw_workqueues() to WARN_ON_ONCE().
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      35b6bb63
    • workqueue: make GCWQ_DISASSOCIATED a pool flag · 24647570
      Tejun Heo authored
      Make GCWQ_DISASSOCIATED a pool flag POOL_DISASSOCIATED.  This patch
      doesn't change locking - DISASSOCIATED on both pools of a CPU is set
      or cleared together while holding gcwq->lock.  It shouldn't cause any
      functional difference.
      
      This is part of an effort to remove global_cwq and make worker_pool
      the top level abstraction, which in turn will help implementing worker
      pools with user-specified attributes.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      24647570
    • workqueue: use std_ prefix for the standard per-cpu pools · e34cdddb
      Tejun Heo authored
      There are currently two worker pools per cpu (including the unbound
      cpu) and they are the only pools in use.  A new class of pools is
      scheduled to be added and some pool-related APIs will be added in
      between.  Call the existing pools the standard pools and prefix them
      with std_.  Do this early so that new APIs can use the std_ prefix
      from the beginning.
      
      This patch doesn't introduce any functional difference.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      e34cdddb
    • workqueue: unexport work_cpu() · e2905b29
      Tejun Heo authored
      This function no longer has any external users.  Unexport it.  It will
      be removed later on.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reviewed-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      e2905b29
  2. 19 Jan 2013 (3 commits)
  3. 18 Jan 2013 (1 commit)
    • workqueue: set PF_WQ_WORKER on rescuers · 111c225a
      Tejun Heo authored
      PF_WQ_WORKER is used to tell the scheduler that the task is a workqueue
      worker and needs wq_worker_sleeping/waking_up() invoked on it for
      concurrency management.  As rescuers never participate in concurrency
      management, PF_WQ_WORKER wasn't set on them.
      
      There's a need for an interface which can query whether %current is
      executing a work item and if so which.  Such interface requires a way
      to identify all tasks which may execute work items and PF_WQ_WORKER
      will be used for that.  As all normal workers always have PF_WQ_WORKER
      set, we only need to add it to rescuers.
      
      As rescuers start with WORKER_PREP but never clear it, it's always
      NOT_RUNNING and there's no need to worry about it interfering with
      concurrency management even if PF_WQ_WORKER is set; however, unlike
      normal workers, rescuers currently don't have their worker struct as
      kthread_data().  They use the associated workqueue_struct instead.
      This is problematic as wq_worker_sleeping/waking_up() expect struct
      worker at kthread_data().
      
      This patch adds worker->rescue_wq, starts rescuer kthreads with the
      worker struct as kthread_data() and sets PF_WQ_WORKER on rescuers.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      111c225a
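      A sketch of the two halves described above - passing the worker struct as kthread_data() at creation time and marking the task in rescuer_thread(); exact placement and helper usage in the real patch may differ:

        /* in __alloc_workqueue_key(), roughly: the rescuer's struct worker
         * becomes its kthread_data() instead of the workqueue_struct */
        rescuer = alloc_worker();
        rescuer->rescue_wq = wq;
        rescuer->task = kthread_create(rescuer_thread, rescuer, "%s", wq->name);

        static int rescuer_thread(void *__rescuer)
        {
                struct worker *rescuer = __rescuer;
                struct workqueue_struct *wq = rescuer->rescue_wq;

                /* rescuers stay NOT_RUNNING (WORKER_PREP) but must still be
                 * recognizable to wq_worker_sleeping/waking_up() */
                current->flags |= PF_WQ_WORKER;
                set_user_nice(current, RESCUER_NICE_LEVEL);

                /* ... rescue loop unchanged ... */
                return 0;
        }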
  4. 20 Dec 2012 (1 commit)
    • workqueue: fix find_worker_executing_work() breakage from hashtable conversion · 023f27d3
      Tejun Heo authored
      42f8570f ("workqueue: use new hashtable implementation") incorrectly
      made busy workers hashed by the pointer value of worker instead of
      work.  This broke find_worker_executing_work() which in turn broke a
      lot of fundamental operations of workqueue - non-reentrancy and
      flushing among others.  The flush malfunction triggered a warning in
      the disk event code in Fengguang's automated test.
      
       write_dev_root_ (3265) used greatest stack depth: 2704 bytes left
       ------------[ cut here ]------------
       WARNING: at /c/kernel-tests/src/stable/block/genhd.c:1574 disk_clear_events+0xcf/0x108()
       Hardware name: Bochs
       Modules linked in:
       Pid: 3328, comm: ata_id Not tainted 3.7.0-01930-gbff6343 #1167
       Call Trace:
        [<ffffffff810997c4>] warn_slowpath_common+0x83/0x9c
        [<ffffffff810997f7>] warn_slowpath_null+0x1a/0x1c
        [<ffffffff816aea77>] disk_clear_events+0xcf/0x108
        [<ffffffff811bd8be>] check_disk_change+0x27/0x59
        [<ffffffff822e48e2>] cdrom_open+0x49/0x68b
        [<ffffffff81ab0291>] idecd_open+0x88/0xb7
        [<ffffffff811be58f>] __blkdev_get+0x102/0x3ec
        [<ffffffff811bea08>] blkdev_get+0x18f/0x30f
        [<ffffffff811bebfd>] blkdev_open+0x75/0x80
        [<ffffffff8118f510>] do_dentry_open+0x1ea/0x295
        [<ffffffff8118f5f0>] finish_open+0x35/0x41
        [<ffffffff8119c720>] do_last+0x878/0xa25
        [<ffffffff8119c993>] path_openat+0xc6/0x333
        [<ffffffff8119cf37>] do_filp_open+0x38/0x86
        [<ffffffff81190170>] do_sys_open+0x6c/0xf9
        [<ffffffff8119021e>] sys_open+0x21/0x23
        [<ffffffff82c1c3d9>] system_call_fastpath+0x16/0x1b
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Fengguang Wu <fengguang.wu@intel.com>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      023f27d3
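      The gist of the fix, as a sketch of the busy-hash insertion done around process_one_work(); this is illustrative rather than the literal hunk:

        worker->current_work = work;
        worker->current_func = work->func;
        /* 42f8570f keyed this on "worker"; key it on the work item instead so
         * find_worker_executing_work() can locate the executing worker again */
        hash_add(gcwq->busy_hash, &worker->hentry, (unsigned long)work);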
  5. 19 Dec 2012 (2 commits)
    • workqueue: consider work function when searching for busy work items · a2c1c57b
      Tejun Heo authored
      To avoid executing the same work item concurrently, workqueue hashes
      currently busy workers according to their current work items and looks
      up the table when it wants to execute a new work item.  If there
      already is a worker which is executing the new work item, the new item
      is queued to the found worker so that it gets executed only after the
      current execution finishes.
      
      Unfortunately, a work item may be freed while being executed and thus
      recycled for different purposes.  If it gets recycled for a different
      work item and queued while the previous execution is still in
      progress, workqueue may make the new work item wait for the old one
      although the two aren't really related in any way.
      
      In extreme cases, this false dependency may lead to deadlock although
      it's extremely unlikely given that there aren't too many self-freeing
      work item users and they usually don't wait for other work items.
      
      To alleviate the problem, record the current work function in each
      busy worker and match it together with the work item address in
      find_worker_executing_work().  While this isn't complete, it ensures
      that unrelated work items don't interact with each other and in the
      very unlikely case where a twisted wq user triggers it, it's always
      onto itself making the culprit easy to spot.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Andrey Isakov <andy51@gmx.ru>
      Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=51701
      Cc: stable@vger.kernel.org
      a2c1c57b
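      A sketch of the strengthened lookup; it closely follows the description above but is not the literal patch:

        static struct worker *find_worker_executing_work(struct global_cwq *gcwq,
                                                         struct work_struct *work)
        {
                struct worker *worker;

                hash_for_each_possible(gcwq->busy_hash, worker, hentry,
                                       (unsigned long)work)
                        /* the address alone may belong to a recycled item;
                         * require the work function to match as well */
                        if (worker->current_work == work &&
                            worker->current_func == work->func)
                                return worker;

                return NULL;
        }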
    • workqueue: use new hashtable implementation · 42f8570f
      Sasha Levin authored
      Switch workqueues to use the new hashtable implementation. This reduces the
      amount of generic unrelated code in the workqueues.
      
      This patch depends on d9b482c8 ("hashtable: introduce a small and naive
      hashtable") which was merged in v3.6.
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      42f8570f
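      In outline, a sketch of the conversion: the open-coded bucket array in struct global_cwq is replaced by the generic helpers from <linux/hashtable.h> (the old member and function names below are recalled from the pre-conversion code and should be treated as assumptions):

        /* before: struct hlist_head busy_hash[BUSY_WORKER_HASH_SIZE];
         * plus a hand-rolled hashing helper for bucket selection */
        DECLARE_HASHTABLE(busy_hash, BUSY_WORKER_HASH_ORDER);

        /* insertion, removal and lookup then become hash_add(), hash_del()
         * and hash_for_each_possible() respectively */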
  6. 18 Dec 2012 (9 commits)
  7. 14 Dec 2012 (1 commit)
  8. 13 Dec 2012 (3 commits)
  9. 11 Dec 2012 (6 commits)
    • mm: sched: numa: Delay PTE scanning until a task is scheduled on a new node · 5bca2303
      Mel Gorman authored
      Because migrations are driven by the CPU a task is running on, there
      is no point tracking NUMA faults until one task runs on a new node.
      This patch tracks the first node used by an address space. Until
      it changes, PTE scanning is disabled and no NUMA hinting faults are
      trapped. This should help workloads that are short-lived, do not care
      about NUMA placement or have bound themselves to a single node.
      
      This takes advantage of the logic in "mm: sched: numa: Implement slow
      start for working set sampling" to delay when the checks are made. This
      will take advantage of processes that set their CPU and node bindings
      early in their lifetime. It will also potentially allow any initial load
      balancing to take place.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      5bca2303
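      A sketch of the idea; the mm field and constant names below follow my reading of the patch and should be treated as assumptions:

        /* in task_numa_work(), before doing any PTE scanning */
        if (mm->first_nid == NUMA_PTE_SCAN_INIT)
                mm->first_nid = numa_node_id();
        if (mm->first_nid != NUMA_PTE_SCAN_ACTIVE) {
                /* still on the node the address space started on: skip */
                if (numa_node_id() == mm->first_nid)
                        return;

                mm->first_nid = NUMA_PTE_SCAN_ACTIVE;
        }
        /* fall through to the normal PTE scanning path */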
    • mm: sched: numa: Control enabling and disabling of NUMA balancing if !SCHED_DEBUG · 3105b86a
      Mel Gorman authored
      The "mm: sched: numa: Control enabling and disabling of NUMA balancing"
      depends on scheduling debug being enabled but it's perfectly legimate to
      disable automatic NUMA balancing even without this option. This should
      take care of it.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      3105b86a
    • mm: sched: numa: Control enabling and disabling of NUMA balancing · 1a687c2e
      Mel Gorman authored
      This patch adds Kconfig options and kernel parameters to allow the
      enabling and disabling of automatic NUMA balancing. The existence
      of such a switch was and is very important when debugging problems
      related to transparent hugepages and we should have the same for
      automatic NUMA placement.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      1a687c2e
    • mm: sched: Adapt the scanning rate if a NUMA hinting fault does not migrate · b8593bfd
      Mel Gorman authored
      The PTE scanning rate and fault rates are two of the biggest sources of
      system CPU overhead with automatic NUMA placement.  Ideally a proper policy
      would detect if a workload was properly placed, schedule and adjust the
      PTE scanning rate accordingly. We do not track the necessary information
      to do that but we at least know if we migrated or not.
      
      This patch scans slower if a page was not migrated as the result of a
      NUMA hinting fault up to sysctl_numa_balancing_scan_period_max which is
      now higher than the previous default. Once every minute it will reset
      the scanner in case of phase changes.
      
      This is hilariously crude and the numbers are arbitrary. Workloads will
      converge quite slowly in comparison to what a proper policy should be able
      to do. On the plus side, we will chew up less CPU for workloads that have
      no need for automatic balancing.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      b8593bfd
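      The gist, as a sketch of the fault-side adjustment; the exact increment and the once-a-minute reset handling are assumptions:

        void task_numa_fault(int node, int pages, bool migrated)
        {
                struct task_struct *p = current;

                /* back off when the hinting fault did not lead to a migration */
                if (!migrated)
                        p->numa_scan_period = min(sysctl_numa_balancing_scan_period_max,
                                                  p->numa_scan_period + jiffies_to_msecs(10));

                task_numa_placement(p);
        }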
    • sched: numa: Slowly increase the scanning period as NUMA faults are handled · fb003b80
      Mel Gorman authored
      Currently the rate of scanning for an address space is controlled
      by the individual tasks. The next scan is simply determined by
      2*p->numa_scan_period.
      
      The 2*p->numa_scan_period is arbitrary and never changes. At this point
      there is still no proper policy that decides if a task or process is
      properly placed. It just scans and assumes the next NUMA fault will
      place it properly. As it is assumed that pages will get properly placed
      over time, increase the scan window each time a fault is incurred. This
      is a big assumption as noted in the comments.
      
      It should be noted that changing to p->numa_scan_period will increase
      system CPU usage because now the scanning rate has effectively doubled.
      If that is a problem then the min_rate should be made 200ms instead of
      restoring the 2* logic.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      fb003b80
    • mm: numa: Rate limit setting of pte_numa if node is saturated · e14808b4
      Mel Gorman authored
      If there are a large number of NUMA hinting faults and all of them
      are resulting in migrations it may indicate that memory is just
      bouncing uselessly around. NUMA balancing cost is likely exceeding
      any benefit from locality. Rate limit the PTE updates if the node
      is migration rate-limited. As noted in the comments, this distorts
      the NUMA faulting statistics.
      Signed-off-by: Mel Gorman <mgorman@suse.de>
      e14808b4