1. 07 Jul 2016 (6 commits)
• timers: Remove set_timer_slack() leftovers · 53bf837b
  Thomas Gleixner committed
      We now have implicit batching in the timer wheel. The slack API is no longer
      used, so remove it.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Andrew F. Davis <afd@ti.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Jaehoon Chung <jh80.chung@samsung.com>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Mathias Nyman <mathias.nyman@intel.com>
      Cc: Pali Rohár <pali.rohar@gmail.com>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Sebastian Reichel <sre@kernel.org>
      Cc: Ulf Hansson <ulf.hansson@linaro.org>
      Cc: linux-block@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Cc: linux-mmc@vger.kernel.org
      Cc: linux-pm@vger.kernel.org
      Cc: linux-usb@vger.kernel.org
      Cc: netdev@vger.kernel.org
      Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.189813118@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• timers: Switch to a non-cascading wheel · 500462a9
  Thomas Gleixner committed
      The current timer wheel has some drawbacks:
      
      1) Cascading:
      
   Cascading can be an unbounded operation and is completely pointless in
   most cases because the vast majority of the timer wheel timers are
   canceled or rearmed before expiration. (They are used as timeout
   safeguards, not as real timers to measure time.)
      
      2) No fast lookup of the next expiring timer:
      
   In NOHZ scenarios the first timer soft interrupt after a long NOHZ period
   must fast-forward the base time to the current value of jiffies. As we
   have no way to find the next expiring timer quickly, the code loops
   linearly, incrementing the base time one step at a time and checking for
   expired timers in each step. This causes unbounded overhead spikes at
   exactly the moment when we should wake up as fast as possible.
      
      After a thorough analysis of real world data gathered on laptops,
      workstations, webservers and other machines (thanks Chris!) I came to the
      conclusion that the current 'classic' timer wheel implementation can be
      modified to address the above issues.
      
The vast majority of timer wheel timers are canceled or rearmed before
expiry. Most of them are timeouts for networking and other I/O tasks. The
nature of timeouts is to catch the exception from normal operation (TCP ack
timed out, disk does not respond, etc.). For these kinds of timeouts the
accuracy of the timeout is not really a concern. Timeouts are very often
approximate worst-case values, and if the timeout fires we have already
waited for a long time and performance is down the drain anyway.
      
The few timers which actually expire can be split into two categories:

 1) Short expiry times, which expect halfway-accurate expiry

 2) Long expiry times, which are already inaccurate today due to the
    batching which is done for NOHZ automatically and also via the
    set_timer_slack() API.
      
So for long-term expiry timers we can avoid the cascading property and just
leave them in the less granular outer wheels until expiry or
cancellation. Timers which are armed with a timeout larger than the wheel
capacity are no longer cascaded. We expire them with the longest possible
timeout (6+ days). We have not observed such timeouts in our data collection,
but at least we handle them, following the rule of least surprise.
      
To avoid extending the wheel levels for HZ=1000 just to accommodate the
longest observed timeouts (5 days in the network conntrack code), we reduce
the first-level granularity for HZ=1000 to 4 ms, which is effectively the
same as the HZ=250 behaviour. Our data analysis shows that nothing relies on
the 1 ms granularity, and as a side effect we get better batching and timer
locality for the networking code as well.
      
Unlike the classic wheel, the granularity of a level is not the capacity of
the level below it. In the currently chosen setting, each level's
granularity is 8 times the granularity of the previous level.
      
      So for HZ=250 we end up with the following granularity levels:
      
       Level Offset   Granularity                  Range
           0      0          4 ms                 0 ms -        252 ms
           1     64         32 ms               256 ms -       2044 ms (256ms - ~2s)
           2    128        256 ms              2048 ms -      16380 ms (~2s   - ~16s)
           3    192       2048 ms (~2s)       16384 ms -     131068 ms (~16s  - ~2m)
           4    256      16384 ms (~16s)     131072 ms -    1048572 ms (~2m   - ~17m)
           5    320     131072 ms (~2m)     1048576 ms -    8388604 ms (~17m  - ~2h)
           6    384    1048576 ms (~17m)    8388608 ms -   67108863 ms (~2h   - ~18h)
           7    448    8388608 ms (~2h)    67108864 ms -  536870911 ms (~18h  - ~6d)
      
That's a worst-case inaccuracy of 12.5% for timers which are queued at the
beginning of a level.
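
For illustration, a minimal standalone sketch (not kernel code; constants
taken from the table above, bucket search simplified) of how an expiry delta
maps to a level and its granularity:

    #include <stdio.h>

    #define LVL_CLK_SHIFT 3                  /* each level is 8x coarser */
    #define LVL_SIZE      64                 /* buckets per level */

    int main(void)
    {
        unsigned long tick_ms = 4;           /* HZ=250 base granularity */
        unsigned long delta_ms = 5000;       /* example: a 5 second timeout */
        unsigned long ticks = delta_ms / tick_ms;
        int level = 0;

        /* Find the first level whose range can still hold the delta. */
        while (ticks >= (unsigned long)LVL_SIZE << (level * LVL_CLK_SHIFT))
            level++;

        /* Prints: delta=5000ms -> level 2, granularity 256ms */
        printf("delta=%lums -> level %d, granularity %lums\n",
               delta_ms, level, tick_ms << (level * LVL_CLK_SHIFT));
        return 0;
    }

The 12.5% worst case follows directly from the 8x step: a timer queued right
at the start of a level, e.g. at 2048 ms, lands in the 256 ms-granular level,
and 256/2048 = 12.5%.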
      
      So the new wheel concept addresses the old issues:
      
      1) Cascading is avoided completely
      
2) By keeping the timers in their bucket until expiry/cancellation we can
   track the buckets which have timers enqueued via a bucket bitmap, and can
   therefore look up the next expiring timer very fast, in O(1) (sketched
   below).
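
A hedged sketch of the bitmap idea (names and layout assumed, one level only;
the kernel's real data structures differ):

    #include <stdint.h>

    struct wheel_level {
        uint64_t pending;            /* bit n set => bucket n is non-empty */
    };

    /* O(1): find the next non-empty bucket, or -1 if the level is idle.
     * __builtin_ctzll() counts trailing zero bits (GCC/Clang builtin). */
    static int next_pending_bucket(const struct wheel_level *lvl)
    {
        return lvl->pending ? __builtin_ctzll(lvl->pending) : -1;
    }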
      
      A further benefit of the concept is that the slack calculation which is done
      on every timer start is no longer necessary because the granularity levels
      provide natural batching already.
      
      Our extensive testing with various loads did not show any performance
      degradation vs. the current wheel implementation.
      
This patch does not address the 'fast lookup' issue, as we wanted to make
sure that the wheel redesign introduces no regression on its own. The
optimizations are in follow-up patches.
      
      This patch contains fixes from Anna-Maria Gleixner and Richard Cochran.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.108621834@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• timers: Reduce the CPU index space to 256k · b0d6e2dc
  Thomas Gleixner committed
      We want to store the array index in the flags space. 256k CPUs should be
      enough for a while.
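
256k is 2^18, so the CPU number needs only the low 18 bits of timer->flags,
leaving the upper bits free for the array index. A hedged sketch of the
packing (the index bit position is assumed for illustration):

    #include <stdint.h>

    #define TIMER_CPUMASK    0x0003FFFFu   /* low 18 bits: 2^18 = 256k CPUs */
    #define TIMER_ARRAYSHIFT 22            /* assumed position of the index */

    static inline unsigned int timer_cpu(uint32_t flags)
    {
        return flags & TIMER_CPUMASK;
    }

    static inline unsigned int timer_array_idx(uint32_t flags)
    {
        return flags >> TIMER_ARRAYSHIFT;
    }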
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094342.030144293@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• hlist: Add hlist_is_singular_node() helper · 15dba1e3
  Thomas Gleixner committed
      Required to figure out whether the entry is the only one in the hlist.
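
A minimal sketch of such a helper (types inlined so the snippet stands alone;
the real helper lives in include/linux/list.h):

    #include <stdbool.h>

    struct hlist_node { struct hlist_node *next, **pprev; };
    struct hlist_head { struct hlist_node *first; };

    /* True iff @n is the only entry on @h: it has no successor and its
     * pprev points back at the head's first pointer. The head itself is
     * never dereferenced; only its address is compared. */
    static inline bool hlist_is_singular_node(struct hlist_node *n,
                                              struct hlist_head *h)
    {
        return !n->next && n->pprev == &h->first;
    }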
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.867631372@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• timers: Remove the deprecated mod_timer_pinned() API · 177ec0a0
  Thomas Gleixner committed
      We switched all users to initialize the timers as pinned and call
      mod_timer(). Remove the now unused timer API function.
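
The conversion pattern looks roughly like this (a hedged before/after
fragment, not a verbatim diff from the series; the setup_pinned_timer()
helper name follows the new-style API and should be treated as illustrative):

    /* Old style: pinning requested at arm time, on every rearm. */
    setup_timer(&t, my_callback, 0);
    mod_timer_pinned(&t, jiffies + HZ);

    /* New style: pinning declared once at init time; plain mod_timer()
     * then keeps the timer on the CPU that arms it. */
    setup_pinned_timer(&t, my_callback, 0);
    mod_timer(&t, jiffies + HZ);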
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.706205231@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• timers: Make 'pinned' a timer property · e675447b
  Thomas Gleixner committed
      We want to move the timer migration logic from a 'push' to a 'pull' model.
      
      Under the current 'push' model pinned timers are handled via
      a runtime API variant: mod_timer_pinned().
      
      The 'pull' model requires us to store the pinned attribute of a timer
      in the timer_list structure itself, as a new TIMER_PINNED bit in
      timer->flags.
      
This flag must be set at initialization time, and the timer APIs then
recognize and honor it.
      
This patch:

 - implements the new flag and the associated new-style initialization
   methods,

 - makes mod_timer() recognize new-style pinned timers,

 - and adds a migration helper facility to allow step-by-step conversion
   of old-style to new-style pinned timers (see the sketch below).
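
A hedged sketch of the property itself (the bit value is illustrative, not
the exact flag layout):

    #define TIMER_PINNED  0x00100000       /* illustrative bit position */

    /* Declared once, at init time... */
    struct timer_list mytimer;
    init_timer_pinned(&mytimer);           /* sets TIMER_PINNED in flags */
    mytimer.function = my_callback;

    /* ...and honored by the regular API thereafter: a pinned timer is
     * always queued on the CPU arming it, never migrated elsewhere. */
    mod_timer(&mytimer, jiffies + HZ);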
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: George Spelvin <linux@sciencehorizons.net>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Len Brown <lenb@kernel.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: rt@linutronix.de
Link: http://lkml.kernel.org/r/20160704094341.049338558@linutronix.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2. 02 Jul 2016 (2 commits)
3. 01 Jul 2016 (1 commit)
4. 30 Jun 2016 (6 commits)
5. 29 Jun 2016 (2 commits)
6. 28 Jun 2016 (5 commits)
7. 25 Jun 2016 (5 commits)
8. 24 Jun 2016 (2 commits)
• locking/static_key: Fix concurrent static_key_slow_inc() · 4c5ea0a9
  Paolo Bonzini committed
      The following scenario is possible:
      
          CPU 1                                   CPU 2
          static_key_slow_inc()
           atomic_inc_not_zero()
            -> key.enabled == 0, no increment
           jump_label_lock()
           atomic_inc_return()
            -> key.enabled == 1 now
                                                  static_key_slow_inc()
                                                   atomic_inc_not_zero()
                                                    -> key.enabled == 1, inc to 2
                                                   return
                                                  ** static key is wrong!
           jump_label_update()
           jump_label_unlock()
      
      Testing the static key at the point marked by (**) will follow the
      wrong path for jumps that have not been patched yet.  This can
      actually happen when creating many KVM virtual machines with userspace
      LAPIC emulation; just run several copies of the following program:
      
          #include <fcntl.h>
          #include <unistd.h>
          #include <sys/ioctl.h>
          #include <linux/kvm.h>
      
    int main(void)
    {
        for (;;) {
            /* Each iteration creates and tears down a VM with one vCPU. */
            int kvmfd = open("/dev/kvm", O_RDONLY);
            int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);

            /* Each KVM_CREATE_VCPU triggers static_key_slow_inc(). */
            close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
            close(vmfd);
            close(kvmfd);
        }
        return 0;
    }
      
      Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
      The static key's purpose is to skip NULL pointer checks and indeed one
      of the processes eventually dereferences NULL.
      
      As explained in the commit that introduced the bug:
      
        706249c2 ("locking/static_keys: Rework update logic")
      
jump_label_update() needs key.enabled to be true.  The solution adopted
here is to temporarily make key.enabled == -1, and to go down the
slow path when key.enabled <= 0.
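
Sketched out, the fixed fast-path/slow-path split looks like this
(simplified from the actual patch):

    void static_key_slow_inc(struct static_key *key)
    {
        int v, v1;

        /* Fast path: take a reference only while enabled > 0. The
         * transient -1 state makes this loop fall through, funneling
         * racing callers into the locked slow path below. */
        for (v = atomic_read(&key->enabled); v > 0; v = v1) {
            v1 = atomic_cmpxchg(&key->enabled, v, v + 1);
            if (v1 == v)
                return;
        }

        jump_label_lock();
        if (atomic_read(&key->enabled) == 0) {
            atomic_set(&key->enabled, -1);  /* patching in progress */
            jump_label_update(key);
            atomic_set(&key->enabled, 1);   /* branches now patched */
        } else {
            atomic_inc(&key->enabled);
        }
        jump_label_unlock();
    }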
Reported-by: Dmitry Vyukov <dvyukov@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: <stable@vger.kernel.org> # v4.3+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Fixes: 706249c2 ("locking/static_keys: Rework update logic")
      Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
      [ Small stylistic edits to the changelog and the code. ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
• pwm: Fix pwm_apply_args() · 33cdcee0
  Boris Brezillon committed
      Commit 5ec803ed ("pwm: Add core infrastructure to allow atomic
      updates"), implemented pwm_disable() as a wrapper around
      pwm_apply_state(), and then, commit ef2bf499 ("pwm: Improve args
      checking in pwm_apply_state()") added missing checks on the ->period
      value in pwm_apply_state() to ensure we were not passing inappropriate
      values to the ->config() or ->apply() methods.
      
The conjunction of these two commits led to a case where pwm_disable()
no longer succeeded, thus preventing the polarity setting done in
pwm_apply_args().
      
      Set a valid period in pwm_apply_args() to ensure polarity setting
      won't be rejected.
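
The shape of the fix, sketched (ordering and helpers shown are illustrative,
not the verbatim include/linux/pwm.h change):

    static inline void pwm_apply_args(struct pwm_device *pwm)
    {
        /* Apply the reference period first so the polarity change
         * below cannot be rejected by the ->period sanity check in
         * pwm_apply_state(). */
        pwm_set_period(pwm, pwm->args.period);
        pwm_set_polarity(pwm, pwm->args.polarity);
        pwm_disable(pwm);
    }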
Signed-off-by: Boris Brezillon <boris.brezillon@free-electrons.com>
Reported-by: Geert Uytterhoeven <geert@linux-m68k.org>
Suggested-by: Brian Norris <briannorris@chromium.org>
Fixes: 5ec803ed ("pwm: Add core infrastructure to allow atomic updates")
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Brian Norris <briannorris@chromium.org>
Signed-off-by: Thierry Reding <thierry.reding@gmail.com>
9. 23 Jun 2016 (2 commits)
10. 21 Jun 2016 (2 commits)
11. 20 Jun 2016 (1 commit)
12. 18 Jun 2016 (2 commits)
13. 16 Jun 2016 (4 commits)
• bpf: fix matching of data/data_end in verifier · 19de99f7
  Alexei Starovoitov committed
The ctx structure passed into bpf programs differs depending on the bpf
program type. The verifier incorrectly marked ctx->data and ctx->data_end
accesses based on the ctx offset alone. That caused loads in tracing
programs such as

    int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }

to be incorrectly marked as PTR_TO_PACKET, which later caused the verifier
to reject programs that were actually valid in the tracing context.
Fix this by doing program-type-specific matching of ctx offsets.
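
A hedged sketch of the per-program-type matching (simplified; only the
socket-filter flavor maps these offsets to packet pointers, while the
tracing flavor never does):

    static bool sk_filter_is_valid_access(int off, int size,
                                          enum bpf_access_type type,
                                          enum bpf_reg_type *reg_type)
    {
        switch (off) {
        case offsetof(struct __sk_buff, data):
            *reg_type = PTR_TO_PACKET;
            break;
        case offsetof(struct __sk_buff, data_end):
            *reg_type = PTR_TO_PACKET_END;
            break;
        }

        /* Size/alignment checks shared with the other flavors. */
        return __is_valid_access(off, size, type);
    }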
      
      Fixes: 969bf05e ("bpf: direct packet access")
Reported-by: Sasha Goldshtein <goldshtn@gmail.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
• net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG · daddef76
  Jason A. Donenfeld committed
      The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
      case was added with 2c94b537 ("net: Implement net_dbg_ratelimited() for
      CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
      usual definition of the dynamic_pr_debug macro, but alter it by adding a
      call to "net_ratelimit()" in the if statement. This is, in fact, the
      correct approach.
      
      However, while doing this, the author of the commit forgot to surround
      fmt by pr_fmt, resulting in unprefixed log messages appearing in the
      console. So, this commit adds back the pr_fmt(fmt) invocation, making
      net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
      DYNAMIC_DEBUG cases, and bringing parity with the behavior of
      dynamic_pr_debug as well.
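
With the fix, the dynamic-debug variant reads approximately like this (a
sketch of the shape, not a verbatim copy of include/linux/net.h; the point
is the pr_fmt(fmt) wrapping that was previously missing):

    #define net_dbg_ratelimited(fmt, ...)                              \
    do {                                                               \
        DEFINE_DYNAMIC_DEBUG_METADATA(descriptor, fmt);                \
        if (unlikely(descriptor.flags & _DPRINTK_FLAGS_PRINT) &&       \
            net_ratelimit())                                           \
            __dynamic_pr_debug(&descriptor, pr_fmt(fmt),               \
                               ##__VA_ARGS__);                         \
    } while (0)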
      
      Fixes: 2c94b537 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Tim Bingham <tbingham@akamai.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
• rcu: sysctl: Panic on RCU Stall · 088e9d25
  Daniel Bristot de Oliveira committed
It is not always easy to determine the cause of an RCU stall just by
analysing the RCU stall messages, mainly when the problem is caused
by the indirect starvation of RCU threads, for example when preempt_rcu
is not awakened due to the starvation of a timer softirq.
      
We have been hard-coding panic() in the RCU stall functions for
some time while testing the kernel-rt, but this is not possible in
some scenarios, such as when supporting customers.
      
This patch implements the sysctl kernel.panic_on_rcu_stall. If
set to 1, the system will panic() when an RCU stall takes place,
enabling the capture of a vmcore. The vmcore provides a way to analyze
all kernel/task states, helping to pinpoint the culprit and the
solution for the stall.
      
      The kernel.panic_on_rcu_stall sysctl is disabled by default.
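
The core of the change is small; sketched here, with the helper called from
the stall-warning paths:

    int sysctl_panic_on_rcu_stall __read_mostly;

    static void panic_on_rcu_stall(void)
    {
        if (sysctl_panic_on_rcu_stall)
            panic("RCU Stall\n");
    }

At runtime the knob is toggled via /proc/sys/kernel/panic_on_rcu_stall,
e.g. sysctl -w kernel.panic_on_rcu_stall=1.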
      
      Changes from v1:
      - Fixed a typo in the git log
- The if (sysctl_panic_on_rcu_stall) panic() check is in a static function
      - Fixed the CONFIG_TINY_RCU compilation issue
      - The var sysctl_panic_on_rcu_stall is now __read_mostly
      
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Josh Triplett <josh@joshtriplett.org>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
      Cc: Lai Jiangshan <jiangshanlai@gmail.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Reviewed-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Tested-by: "Luis Claudio R. Goncalves" <lgoncalv@redhat.com>
Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
• rcu: Make call_rcu_tasks() tolerate first call with irqs disabled · 4929c913
  Paul E. McKenney committed
Currently, if the very first call to call_rcu_tasks() has irqs disabled,
it will create the rcu_tasks_kthread with irqs disabled, resulting in a
splat from the memory allocator, which kthread_run() invokes with the
expectation that irqs are enabled.
      
      This commit fixes this problem by deferring kthread creation if called
      with irqs disabled.  The first call to call_rcu_tasks() that has irqs
      enabled will create the kthread.
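
A hedged sketch of the deferral, condensed from the description above
(list and lock names follow kernel/rcu/update.c; details simplified):

    void call_rcu_tasks(struct rcu_head *rhp, rcu_callback_t func)
    {
        unsigned long flags;
        bool needwake;
        bool havetask = READ_ONCE(rcu_tasks_kthread_ptr);

        rhp->next = NULL;
        rhp->func = func;
        raw_spin_lock_irqsave(&rcu_tasks_cbs_lock, flags);
        needwake = !rcu_tasks_cbs_head;
        *rcu_tasks_cbs_tail = rhp;
        rcu_tasks_cbs_tail = &rhp->next;
        raw_spin_unlock_irqrestore(&rcu_tasks_cbs_lock, flags);

        /* The kthread cannot be created with irqs disabled: defer
         * spawning to the first call that arrives with irqs enabled,
         * and only wake the kthread once it actually exists. */
        if ((needwake && havetask) ||
            (!havetask && !irqs_disabled_flags(flags))) {
            rcu_spawn_tasks_kthread();
            wake_up(&rcu_tasks_cbs_wq);
        }
    }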
      
      This bug was detected by rcutorture changes that were motivated by
      Iftekhar Ahmed's mutation-testing efforts.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>