提交 · c90423d1de12fbeaf0c898e1db0e962de347302b · openeuler / raspberrypi-kernel

06 11月, 2013 1 次提交

sched: Move completion code from core.c to completion.c · b8a21626

由 Peter Zijlstra 提交于 10月 04, 2013

Completions already have their own header file: linux/completion.h
Move the implementation out of kernel/sched/core.c and into its own
file: kernel/sched/completion.c.
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/n/tip-x2y49rmxu5dljt66ai2lcfuw@git.kernel.orgSigned-off-by: NIngo Molnar <mingo@kernel.org>

b8a21626

04 11月, 2013 1 次提交

ipc, msg: forbid negative values for "msg{max,mnb,mni}" · 9bf76ca3

由 Mathias Krause 提交于 11月 03, 2013

Negative message lengths make no sense -- so don't do negative queue
lenghts or identifier counts. Prevent them from getting negative.

Also change the underlying data types to be unsigned to avoid hairy
surprises with sign extensions in cases where those variables get
evaluated in unsigned expressions with bigger data types, e.g size_t.

In case a user still wants to have "unlimited" sizes she could just use
INT_MAX instead.
Signed-off-by: NMathias Krause <minipli@googlemail.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9bf76ca3

01 11月, 2013 1 次提交

sched/wait: Fix __wait_event_interruptible_lock_irq_timeout() · 7d716456

由 Heiko Carstens 提交于 10月 31, 2013

__wait_event_interruptible_lock_irq_timeout() needs the timeout
parameter passed instead of "ret".

This magically compiled since the only user has a local ret
variable. Luckily we got a build warning:

  CC      drivers/s390/scsi/zfcp_qdio.o
  drivers/s390/scsi/zfcp_qdio.c: In function 'zfcp_qdio_sbal_get':
  include/linux/wait.h:780:15: warning: 'ret' may be used uninitialized
Signed-off-by: NHeiko Carstens <heiko.carstens@de.ibm.com>
Acked-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20131031114814.GB5551@osirisSigned-off-by: NIngo Molnar <mingo@kernel.org>

7d716456

31 10月, 2013 2 次提交

hung_task debugging: Add tracepoint to report the hang · 6a716c90

由 Oleg Nesterov 提交于 10月 19, 2013

Currently check_hung_task() prints a warning if it detects the
problem, but it is not convenient to watch the system logs if
user-space wants to be notified about the hang.

Add the new trace_sched_process_hang() into check_hung_task(),
this way a user-space monitor can easily wait for the hang and
potentially resolve a problem.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Cc: Dave Sullivan <dsulliva@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Link: http://lkml.kernel.org/r/20131019161828.GA7439@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

6a716c90

percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3

由 Greg Thelen 提交于 10月 30, 2013

this_cpu_sub() is implemented as negation and addition.

This patch casts the adjustment to the counter type before negation to
sign extend the adjustment.  This helps in cases where the counter type
is wider than an unsigned adjustment.  An alternative to this patch is
to declare such operations unsupported, but it seemed useful to avoid
surprises.

This patch specifically helps the following example:
  unsigned int delta = 1
  preempt_disable()
  this_cpu_write(long_counter, 0)
  this_cpu_sub(long_counter, delta)
  preempt_enable()

Before this change long_counter on a 64 bit machine ends with value
0xffffffff, rather than 0xffffffffffffffff.  This is because
this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
which is basically:
  long_counter = 0 + 0xffffffff

Also apply the same cast to:
  __this_cpu_sub()
  __this_cpu_sub_return()
  this_cpu_sub_return()

All percpu_test.ko passes, especially the following cases which
previously failed:

  l -= ui_one;
  __this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);

  l -= ui_one;
  this_cpu_sub(long_counter, ui_one);
  CHECK(l, long_counter, -1);
  CHECK(l, long_counter, 0xffffffffffffffff);

  ul -= ui_one;
  __this_cpu_sub(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, -1);
  CHECK(ul, ulong_counter, 0xffffffffffffffff);

  ul = this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 2);

  ul = __this_cpu_sub_return(ulong_counter, ui_one);
  CHECK(ul, ulong_counter, 1);
Signed-off-by: NGreg Thelen <gthelen@google.com>
Acked-by: NTejun Heo <tj@kernel.org>
Acked-by: NJohannes Weiner <hannes@cmpxchg.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

bd09d9a3

29 10月, 2013 1 次提交

perf: Fix perf ring buffer memory ordering · bf378d34

由 Peter Zijlstra 提交于 10月 28, 2013

The PPC64 people noticed a missing memory barrier and crufty old
comments in the perf ring buffer code. So update all the comments and
add the missing barrier.

When the architecture implements local_t using atomic_long_t there
will be double barriers issued; but short of introducing more
conditional barrier primitives this is the best we can do.
Reported-by: NVictor Kaplansky <victork@il.ibm.com>
Tested-by: NVictor Kaplansky <victork@il.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: michael@ellerman.id.au
Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
Cc: Michael Neuling <mikey@neuling.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: anton@samba.org
Cc: benh@kernel.crashing.org
Link: http://lkml.kernel.org/r/20131025173749.GG19466@laptop.lanSigned-off-by: NIngo Molnar <mingo@kernel.org>

bf378d34

23 10月, 2013 1 次提交

sched/wait: Fix build breakage · 92ec1180

由 Thierry Reding 提交于 10月 23, 2013

The wait_event_interruptible_lock_irq() macro is missing a
semi-colon which causes a build failure in the i915 DRM driver.
Signed-off-by: NThierry Reding <treding@nvidia.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1382528455-29911-1-git-send-email-treding@nvidia.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

92ec1180

22 10月, 2013 4 次提交

mac802154: correct a typo in ieee802154_alloc_device() prototype · 7e4d8a19

由 Alexandre Belloni 提交于 10月 21, 2013

This has no other impact than a cosmetic one.
Signed-off-by: NAlexandre Belloni <alexandre.belloni@free-electrons.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7e4d8a19

ipv6: fill rt6i_gateway with nexthop address · 550bab42

由 Julian Anastasov 提交于 10月 20, 2013

Make sure rt6i_gateway contains nexthop information in
all routes returned from lookup or when routes are directly
attached to skb for generated ICMP packets.

The effect of this patch should be a faster version of
rt6_nexthop() and the consideration of local addresses as
nexthop.
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

550bab42

ipv6: always prefer rt6i_gateway if present · 96dc8095

由 Julian Anastasov 提交于 10月 20, 2013

In v3.9 6fd6ce20 ("ipv6: Do not depend on rt->n in
ip6_finish_output2()." changed the behaviour of ip6_finish_output2()
such that the recently introduced rt6_nexthop() is used
instead of an assigned neighbor.

As rt6_nexthop() prefers rt6i_gateway only for gatewayed
routes this causes a problem for users like IPVS, xt_TEE and
RAW(hdrincl) if they want to use different address for routing
compared to the destination address.

Another case is when redirect can create RTF_DYNAMIC
route without RTF_GATEWAY flag, we ignore the rt6i_gateway
in rt6_nexthop().

Fix the above problems by considering the rt6i_gateway if
present, so that traffic routed to address on local subnet is
not wrongly diverted to the destination address.

Thanks to Simon Horman and Phil Oester for spotting the
problematic commit.

Thanks to Hannes Frederic Sowa for his review and help in testing.
Reported-by: NPhil Oester <kernel@linuxace.com>
Reported-by: NMark Brooks <mark@loadbalancer.org>
Signed-off-by: NJulian Anastasov <ja@ssi.bg>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

96dc8095

IB/core: Temporarily disable create_flow/destroy_flow uverbs · 7afbddfa

由 Yann Droneaud 提交于 10月 10, 2013

The create_flow/destroy_flow uverbs and the associated extensions to
the user-kernel verbs ABI are under review and are too experimental to
freeze at this point.

So userspace is not exposed to experimental features and an uinstable
ABI, temporarily disable this for v3.12 (with a Kconfig option behind
staging to reenable it if desired).

The feature will be enabled after proper cleanup for v3.13.
Signed-off-by: NYann Droneaud <ydroneaud@opteya.com>
Link: http://marc.info/?i=cover.1381351016.git.ydroneaud@opteya.com
Link: http://marc.info/?i=cover.1381177342.git.ydroneaud@opteya.com

[ Add a Kconfig option to reenable these verbs.  - Roland ]
Signed-off-by: NRoland Dreier <roland@purestorage.com>

7afbddfa

20 10月, 2013 1 次提交

net: fix cipso packet validation when !NETLABEL · f2e5ddcc

由 Seif Mazareeb 提交于 10月 17, 2013

When CONFIG_NETLABEL is disabled, the cipso_v4_validate() function could loop
forever in the main loop if opt[opt_iter +1] == 0, this will causing a kernel
crash in an SMP system, since the CPU executing this function will
stall /not respond to IPIs.

This problem can be reproduced by running the IP Stack Integrity Checker
(http://isic.sourceforge.net) using the following command on a Linux machine
connected to DUT:

"icmpsic -s rand -d <DUT IP address> -r 123456"
wait (1-2 min)
Signed-off-by: NSeif Mazareeb <seif@marvell.com>
Acked-by: NPaul Moore <paul@paul-moore.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2e5ddcc

18 10月, 2013 4 次提交

drm: Pad drm_mode_get_connector to 64-bit boundary · bc5bd37c

由 Chris Wilson 提交于 10月 16, 2013

Pavel Roskin reported that DRM_IOCTL_MODE_GETCONNECTOR was overwritting
the 4 bytes beyond the end of its structure with a 32-bit userspace
running on a 64-bit kernel. This is due to the padding gcc inserts as
the drm_mode_get_connector struct includes a u64 and its size is not a
natural multiple of u64s.

64-bit kernel:

sizeof(drm_mode_get_connector)=80, alignof=8
sizeof(drm_mode_get_encoder)=20, alignof=4
sizeof(drm_mode_modeinfo)=68, alignof=4

32-bit userspace:

sizeof(drm_mode_get_connector)=76, alignof=4
sizeof(drm_mode_get_encoder)=20, alignof=4
sizeof(drm_mode_modeinfo)=68, alignof=4

Fortuituously we can insert explicit padding to the tail of our
structures without breaking ABI.
Reported-by: NPavel Roskin <proski@gnu.org>
Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
Cc: Dave Airlie <airlied@redhat.com>
Cc: dri-devel@lists.freedesktop.org
Cc: stable@vger.kernel.org
Signed-off-by: NDave Airlie <airlied@redhat.com>

bc5bd37c

yam: integer underflow in yam_ioctl() · 9e5f1721

由 Dan Carpenter 提交于 10月 14, 2013

We cap bitrate at YAM_MAXBITRATE in yam_ioctl(), but it could also be
negative.  I don't know the impact of using a negative bitrate but let's
prevent it.
Signed-off-by: NDan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9e5f1721

net: dst: provide accessor function to dst->xfrm · e87b3998

由 Vlad Yasevich 提交于 10月 15, 2013

dst->xfrm is conditionally defined.  Provide accessor funtion that
is always available.
Signed-off-by: NVlad Yasevich <vyasevich@gmail.com>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e87b3998

usb: usb_phy_gen: refine conditional declaration of usb_nop_xceiv_register · 94468783

由 Guenter Roeck 提交于 10月 16, 2013

Commit 3fa4d734 (usb: phy: rename nop_usb_xceiv => usb_phy_gen_xceiv)
changed the conditional around the declaration of usb_nop_xceiv_register
from
	#if defined(CONFIG_NOP_USB_XCEIV) ||
		(defined(CONFIG_NOP_USB_XCEIV_MODULE) && defined(MODULE))
to
	#if IS_ENABLED(CONFIG_NOP_USB_XCEIV)

While that looks the same, it is semantically different. The first expression
is true if CONFIG_NOP_USB_XCEIV is built as module and if the including
code is built as module. The second expression is true if code depending on
CONFIG_NOP_USB_XCEIV if built as module or into the kernel.

As a result, the arm:allmodconfig build fails with

arch/arm/mach-omap2/built-in.o: In function `omap3_evm_init':
arch/arm/mach-omap2/board-omap3evm.c:703: undefined reference to
	`usb_nop_xceiv_register'

Fix the problem by reverting to the old conditional.

Cc: Josh Boyer <jwboyer@redhat.com>
Signed-off-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

94468783

17 10月, 2013 3 次提交

ACPI / PM: Drop two functions that are not used any more · 2421ad48

由 Rafael J. Wysocki 提交于 10月 17, 2013

Two functions defined in device_pm.c, acpi_dev_pm_add_dependent()
and acpi_dev_pm_remove_dependent(), have no callers and may be
dropped, so drop them.

Moreover, they are the only functions adding entries to and removing
entries from the power_dependent list in struct acpi_device, so drop
that list too.
Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>

2421ad48

mm: memcg: handle non-error OOM situations more gracefully · 49426420

由 Johannes Weiner 提交于 10月 16, 2013

Commit 3812c8c8 ("mm: memcg: do not trap chargers with full
callstack on OOM") assumed that only a few places that can trigger a
memcg OOM situation do not return VM_FAULT_OOM, like optional page cache
readahead.  But there are many more and it's impractical to annotate
them all.

First of all, we don't want to invoke the OOM killer when the failed
allocation is gracefully handled, so defer the actual kill to the end of
the fault handling as well.  This simplifies the code quite a bit for
added bonus.

Second, since a failed allocation might not be the abrupt end of the
fault, the memcg OOM handler needs to be re-entrant until the fault
finishes for subsequent allocation attempts.  If an allocation is
attempted after the task already OOMed, allow it to bypass the limit so
that it can quickly finish the fault and invoke the OOM killer.
Reported-by: NazurIt <azurit@pobox.sk>
Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
Cc: Michal Hocko <mhocko@suse.cz>
Cc: <stable@kernel.org>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

49426420

usb-storage: add quirk for mandatory READ_CAPACITY_16 · 32c37fc3

由 Oliver Neukum 提交于 10月 14, 2013

Some USB drive enclosures do not correctly report an
overflow condition if they hold a drive with a capacity
over 2TB and are confronted with a READ_CAPACITY_10.
They answer with their capacity modulo 2TB.
The generic layer cannot cope with that. It must be told
to use READ_CAPACITY_16 from the beginning.
Signed-off-by: NOliver Neukum <oneukum@suse.de>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

32c37fc3

16 10月, 2013 2 次提交

sched/wait: Introduce prepare_to_wait_event() · c2d81644

由 Oleg Nesterov 提交于 10月 07, 2013

Add the new helper, prepare_to_wait_event() which should only be used
by ___wait_event().

prepare_to_wait_event() returns -ERESTARTSYS if signal_pending_state()
is true, otherwise it does prepare_to_wait/exclusive.  This allows to
uninline the signal-pending checks in wait_event*() macros.

Also, it can initialize wait->private/func. We do not care if they were
already initialized, the values are the same. This also shaves a couple
of insns from the inlined code.

This obviously makes prepare_*() path a little bit slower, but we are
likely going to sleep anyway, so I think it makes sense to shrink .text:

               text    data      bss      dec     hex  filename
            ===================================================
   before:  5126092 2959248 10117120 18202460 115bf5c   vmlinux
    after:  5124618 2955152 10117120 18196890 115a99a   vmlinux

on my build.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131007161824.GA29757@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

c2d81644

sched/wait: Add ___wait_cond_timeout() to wait_event*_timeout() too · 8922915b

由 Oleg Nesterov 提交于 10月 07, 2013

Commit 4c663cfc ("wait: fix false timeouts when using
wait_event_timeout()") introduced the additional condition checks
after a timeout but only in the "slow" __wait*() paths.

wait_event_timeout(wq, CONDITION, 0) still returns 0 if CONDITION
is already true and we do not call __wait*().

Now that we have ___wait_cond_timeout() we can use it instead to
ensure that __ret will be properly updated.
Signed-off-by: NOleg Nesterov <oleg@redhat.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20131007183106.GA10973@redhat.comSigned-off-by: NIngo Molnar <mingo@kernel.org>

8922915b

15 10月, 2013 1 次提交

Revert "drivers: of: add initialization code for dma reserved memory" · 1931ee14

由 Marek Szyprowski 提交于 10月 11, 2013

This reverts commit 9d8eab7a. There is
still no consensus on the bindings for the reserved memory and various
drawbacks of the proposed solution has been shown, so the best now is to
revert it completely and start again from scratch later.
Signed-off-by: NMarek Szyprowski <m.szyprowski@samsung.com>
Signed-off-by: NGrant Likely <grant.likely@linaro.org>

1931ee14

12 10月, 2013 1 次提交

ASoC: rcar: fixup generation checker · c5d5a58d

由 Kuninori Morimoto 提交于 10月 11, 2013

Current rcar is using rsnd_is_gen1/gen2() to checking its
IP generation, but it needs data mask.
This patch fixes it up.
Signed-off-by: NKuninori Morimoto <kuninori.morimoto.gx@renesas.com>
Signed-off-by: NMark Brown <broonie@linaro.org>

c5d5a58d

11 10月, 2013 6 次提交

compiler/gcc4: Add quirk for 'asm goto' miscompilation bug · 3f0116c3

由 Ingo Molnar 提交于 10月 10, 2013

Fengguang Wu, Oleg Nesterov and Peter Zijlstra tracked down
a kernel crash to a GCC bug: GCC miscompiles certain 'asm goto'
constructs, as outlined here:

  http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58670

Implement a workaround suggested by Jakub Jelinek.
Reported-and-tested-by: NFengguang Wu <fengguang.wu@intel.com>
Reported-by: NOleg Nesterov <oleg@redhat.com>
Reported-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
Suggested-by: NJakub Jelinek <jakub@redhat.com>
Reviewed-by: NRichard Henderson <rth@twiddle.net>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <stable@kernel.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>

3f0116c3

Revert "i915: Update VGA arbiter support for newer devices" · ebff5fa9

由 Dave Airlie 提交于 10月 11, 2013

This reverts commit 81b5c7bc.

Adding drm/i915 into the vga arbiter chain means that X (in a piece of
well-meant paranoia) will do a get/put on the vga decoding around
_every_ accel call down into the ddx. Which results in some nice
performance disasters [1]. This really breaks userspace, by disabling
DRI for everyone, and stops OpenGL from working, this isn't limited
to just the i915 but both the integrated and discrete GPUs on
multi-gpu systems, in other words this causes untold worlds of pain,

Ville tried to come up with a Great Hack to fiddle the required VGA
I/O ops behind everyone's back using stop_machine, but that didn't
really work out [2]. Given that we're fairly late in the -rc stage for
such games let's just revert this all.

One thing we might want to keep is to delay the disabling of the vga
decoding until the fbdev emulation and the fbcon screen is set up. If
we kill vga mem decoding beforehand fbcon can end up with a white
square in the top-left corner it tried to save from the vga memory for
a seamless transition. And we have bug reports on older platforms
which seem to match these symptoms.

But again that's something to play around with in -next.

References: [1] http://lists.x.org/archives/xorg-devel/2013-September/037763.html
References: [2] http://www.spinics.net/lists/intel-gfx/msg34062.html
Cc: Alex Williamson <alex.williamson@redhat.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: NDaniel Vetter <daniel.vetter@ffwll.ch>
Signed-off-by: NDave Airlie <airlied@redhat.com>

ebff5fa9

random: allow architectures to optionally define random_get_entropy() · 61875f30

由 Theodore Ts'o 提交于 9月 21, 2013

Allow architectures which have a disabled get_cycles() function to
provide a random_get_entropy() function which provides a fine-grained,
rapidly changing counter that can be used by the /dev/random driver.

For example, an architecture might have a rapidly changing register
used to control random TLB cache eviction, or DRAM refresh that
doesn't meet the requirements of get_cycles(), but which is good
enough for the needs of the random driver.
Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
Cc: stable@vger.kernel.org

61875f30

IB/mlx5: Fix eq names to display nicely in /proc/interrupts · ada9f5d0

由 Sagi Grimberg 提交于 9月 11, 2013

It's helpful for a driver to put the pci slot name in its interrupt
names, so /proc/interrupts will show the pci slot of the device.
Signed-off-by: NSagi Grimberg <sagig@mellanox.com>
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

ada9f5d0

mlx5: Fix layout of struct mlx5_init_seg · 2f6daec1

由 Eli Cohen 提交于 9月 11, 2013

The layout of struct health_buffer was not according to firmware
specification.  Fix it to comply.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

2f6daec1

mlx5: Remove checksum on command interface commands · c1868b82

由 Eli Cohen 提交于 9月 11, 2013

Checksum calculations consume CPU resources and can be significant to
the rate of resource creation/destruction.
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NRoland Dreier <roland@purestorage.com>

c1868b82

09 10月, 2013 11 次提交

sched/numa: Skip some page migrations after a shared fault · de1c9ce6

由 Rik van Riel 提交于 10月 07, 2013

Shared faults can lead to lots of unnecessary page migrations,
slowing down the system, and causing private faults to hit the
per-pgdat migration ratelimit.

This patch adds sysctl numa_balancing_migrate_deferred, which specifies
how many shared page migrations to skip unconditionally, after each page
migration that is skipped because it is a shared fault.

This reduces the number of page migrations back and forth in
shared fault situations. It also gives a strong preference to
the tasks that are already running where most of the memory is,
and to moving the other tasks to near the memory.

Testing this with a much higher scan rate than the default
still seems to result in fewer page migrations than before.

Memory seems to be somewhat better consolidated than previously,
with multi-instance specjbb runs on a 4 node system.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-62-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

de1c9ce6

mm: numa: Revert temporarily disabling of NUMA migration · 1e3646ff

由 Rik van Riel 提交于 10月 07, 2013

With the scan rate code working (at least for multi-instance specjbb),
the large hammer that is "sched: Do not migrate memory immediately after
switching node" can be replaced with something smarter. Revert temporarily
migration disabling and all traces of numa_migrate_seq.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-61-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

1e3646ff

sched/numa: Remove the numa_balancing_scan_period_reset sysctl · 930aa174

由 Mel Gorman 提交于 10月 07, 2013

With scan rate adaptions based on whether the workload has properly
converged or not there should be no need for the scan period reset
hammer. Get rid of it.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-60-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

930aa174

sched/numa: Adjust scan rate in task_numa_placement · 04bb2f94

由 Rik van Riel 提交于 10月 07, 2013

Adjust numa_scan_period in task_numa_placement, depending on how much
useful work the numa code can do. The more local faults there are in a
given scan window the longer the period (and hence the slower the scan rate)
during the next window. If there are excessive shared faults then the scan
period will decrease with the amount of scaling depending on whether the
ratio of shared/private faults. If the preferred node changes then the
scan rate is reset to recheck if the task is properly placed.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-59-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

04bb2f94

sched/numa: Be more careful about joining numa groups · dabe1d99

由 Rik van Riel 提交于 10月 07, 2013

Due to the way the pid is truncated, and tasks are moved between
CPUs by the scheduler, it is possible for the current task_numa_fault
to group together tasks that do not actually share memory together.

This patch adds a few easy sanity checks to task_numa_fault, joining
tasks together if they share the same tsk->mm, or if the fault was on
a page with an elevated mapcount, in a shared VMA.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-57-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

dabe1d99

sched/numa: Add debugging · b32e86b4

由 Ingo Molnar 提交于 10月 07, 2013

Signed-off-by: NIngo Molnar <mingo@kernel.org>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NIngo Molnar <mingo@kernel.org>
Link: http://lkml.kernel.org/r/1381141781-10992-53-git-send-email-mgorman@suse.de

b32e86b4

sched/numa: Call task_numa_free() from do_execve() · 82727018

由 Rik van Riel 提交于 10月 07, 2013

It is possible for a task in a numa group to call exec, and
have the new (unrelated) executable inherit the numa group
association from its former self.

This has the potential to break numa grouping, and is trivial
to fix.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-51-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

82727018

sched/numa: Use group fault statistics in numa placement · 83e1d2cd

由 Mel Gorman 提交于 10月 07, 2013

This patch uses the fraction of faults on a particular node for both task
and group, to figure out the best node to place a task. If the task and
group statistics disagree on what the preferred node should be then a full
rescan will select the node with the best combined weight.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-50-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

83e1d2cd

sched/numa: Stay on the same node if CLONE_VM · 5e1576ed

由 Rik van Riel 提交于 10月 07, 2013

A newly spawned thread inside a process should stay on the same
NUMA node as its parent. This prevents processes from being "torn"
across multiple NUMA nodes every time they spawn a new thread.
Signed-off-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-49-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

5e1576ed

mm: numa: Do not group on RO pages · 6688cc05

由 Peter Zijlstra 提交于 10月 07, 2013

And here's a little something to make sure not the whole world ends up
in a single group.

As while we don't migrate shared executable pages, we do scan/fault on
them. And since everybody links to libc, everybody ends up in the same
group.
Suggested-by: NRik van Riel <riel@redhat.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1381141781-10992-47-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

6688cc05

sched/numa: Report a NUMA task group ID · e29cf08b

由 Mel Gorman 提交于 10月 07, 2013

It is desirable to model from userspace how the scheduler groups tasks
over time. This patch adds an ID to the numa_group and reports it via
/proc/PID/status.
Signed-off-by: NMel Gorman <mgorman@suse.de>
Reviewed-by: NRik van Riel <riel@redhat.com>
Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Signed-off-by: NPeter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1381141781-10992-45-git-send-email-mgorman@suse.deSigned-off-by: NIngo Molnar <mingo@kernel.org>

e29cf08b