提交 · f151d9db4c1e7f7ac202ae75f4cbc62cfc784156 · openeuler / Kernel

29 6月, 2016 2 次提交

bpf, perf: delay release of BPF prog after grace period · ceb56070

由 Daniel Borkmann 提交于 6月 27, 2016

Commit dead9f29 ("perf: Fix race in BPF program unregister") moved
destruction of BPF program from free_event_rcu() callback to __free_event(),
which is problematic if used with tail calls: if prog A is attached as
trace event directly, but at the same time present in a tail call map used
by another trace event program elsewhere, then we need to delay destruction
via RCU grace period since it can still be in use by the program doing the
tail call (the prog first needs to be dropped from the tail call map, then
trace event with prog A attached destroyed, so we get immediate destruction).

Fixes: dead9f29 ("perf: Fix race in BPF program unregister")
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Cc: Jann Horn <jann@thejh.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ceb56070

audit: move audit_get_tty to reduce scope and kabi changes · 3f5be2da

由 Richard Guy Briggs 提交于 6月 28, 2016

The only users of audit_get_tty and audit_put_tty are internal to
audit, so move it out of include/linux/audit.h to kernel.h and create
a proper function rather than inlining it.  This also reduces kABI
changes.
Suggested-by: NPaul Moore <pmoore@redhat.com>
Signed-off-by: NRichard Guy Briggs <rgb@redhat.com>
[PM: line wrapped description]
Signed-off-by: NPaul Moore <paul@paul-moore.com>

3f5be2da

28 6月, 2016 3 次提交

sock_diag: do not broadcast raw socket destruction · 9a0fee2b

由 Willem de Bruijn 提交于 6月 24, 2016

Diag intends to broadcast tcp_sk and udp_sk socket destruction.
Testing sk->sk_protocol for IPPROTO_TCP/IPPROTO_UDP alone is not
sufficient for this. Raw sockets can have the same type.

Add a test for sk->sk_type.

Fixes: eb4cb008 ("sock_diag: define destruction multicast groups")
Signed-off-by: NWillem de Bruijn <willemb@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9a0fee2b

drivers: net: stmmac: add port selection programming · 02e57b9d

由 Giuseppe CAVALLARO 提交于 6月 24, 2016

In case of SGMII more, for example when a MAC2MAC connection
is needed, the port selection bits (inside the MAC configuration
registers) have to be programmed according to the link selected.
So the patch adds a new DT parameter to pass the port selection
and to programmed related PCS and CORE to use it.
Signed-off-by: NGiuseppe Cavallaro <peppe.cavallaro@st.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

02e57b9d

of_mdio: select fixed phy support unconditionally · a5e4bd99

由 Arnd Bergmann 提交于 6月 24, 2016

Calling the fixed-phy functions when CONFIG_FIXED_PHY=m as a previous
change tried cannot work if the caller is in built-in code:

drivers/of/built-in.o: In function `of_phy_register_fixed_link':
of_reserved_mem.c:(.text+0x85e0): undefined reference to `fixed_phy_register'

Making of_mdio depend on 'FIXED_PHY || !FIXED_PHY' would solve this
dependency by enforcing that OF_MDIO itself becomes a loadable module
when FIXED_PHY=y, but that creates a different dependency as it
breaks any built-in ethernet driver that uses of_mdio.

Making FIXED_PHY a bool option also cannot work, since it depends on
PHYLIB, which again is tristate.

This version now uses 'select FIXED_PHY' to ensure that the fixed-phy
portion of of_mdio is not optional. The main downside of this is
a small increase in code size for cases that do not need fixed phy
support, but it should avoid all of the link-time problems.
Signed-off-by: NArnd Bergmann <arnd@arndb.de>
Fixes: d1bd330a ("of_mdio: Enable fixed PHY support if driver is a module")
Acked-by: NRandy Dunlap <rdunlap@infradead.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a5e4bd99

27 6月, 2016 3 次提交

net/mlx5e: Report correct auto negotiation and allow toggling · 52244d96

由 Gal Pressman 提交于 6月 23, 2016

Previous to this patch auto negotiation was reported off although it was
on by default in hardware. This patch reports the correct information to
ethtool and allows the user to toggle it on/off.

Added another parameter to set port proto function in order to pass
the auto negotiation field to the hardware.
Signed-off-by: NGal Pressman <galp@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52244d96

net/mlx5e: Toggle link only after modifying port parameters · 667daeda

由 Gal Pressman 提交于 6月 23, 2016

Add a dedicated function to toggle port link. It should be called only
after setting a port register.
Toggle will set port link to down and bring it back up in case that it's
admin status was up.
Signed-off-by: NGal Pressman <galp@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

667daeda

net/mlx5: Rate limit tables support · 1466cc5b

由 Yevgeny Petrilin 提交于 6月 23, 2016

Configuring and managing HW rate limit tables.
The HW holds a table of rate limits, each rate is
associated with an index in that table.
Later a Send Queue uses this index to set the rate limit.
Multiple Send Queues can have the same rate limit, which is
represented by a single entry in this table.
Even though a rate can be shared, each queue is being rate
limited independently of others.

The SW shadow of this table holds the rate itself,
the index in the HW table and the refcount (number of queues)
working with this rate.

The exported functions are mlx5_rl_add_rate and mlx5_rl_remove_rate.
Number of different rates and their values are derived
from HW capabilities.
Signed-off-by: NYevgeny Petrilin <yevgenyp@mellanox.com>
Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1466cc5b

25 6月, 2016 4 次提交

Revert "mm: make faultaround produce old ptes" · 315d09bf

由 Kirill A. Shutemov 提交于 6月 24, 2016

This reverts commit 5c0a85fa.

The commit causes ~6% regression in unixbench.

Let's revert it for now and consider other solution for reclaim problem
later.

Link: http://lkml.kernel.org/r/1465893750-44080-2-git-send-email-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com>
Reported-by: N"Huang, Ying" <ying.huang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rik van Riel <riel@redhat.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Vinayak Menon <vinmenon@codeaurora.org>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

315d09bf

mm: mempool: kasan: don't poot mempool objects in quarantine · 9b75a867

由 Andrey Ryabinin 提交于 6月 24, 2016

Currently we may put reserved by mempool elements into quarantine via
kasan_kfree().  This is totally wrong since quarantine may really free
these objects.  So when mempool will try to use such element,
use-after-free will happen.  Or mempool may decide that it no longer
need that element and double-free it.

So don't put object into quarantine in kasan_kfree(), just poison it.
Rename kasan_kfree() to kasan_poison_kfree() to respect that.

Also, we shouldn't use kasan_slab_alloc()/kasan_krealloc() in
kasan_unpoison_element() because those functions may update allocation
stacktrace.  This would be wrong for the most of the remove_element call
sites.

(The only call site where we may want to update alloc stacktrace is
 in mempool_alloc(). Kmemleak solves this by calling
 kmemleak_update_trace(), so we could make something like that too.
 But this is out of scope of this patch).

Fixes: 55834c59 ("mm: kasan: initial memory quarantine implementation")
Link: http://lkml.kernel.org/r/575977C3.1010905@virtuozzo.comSigned-off-by: NAndrey Ryabinin <aryabinin@virtuozzo.com>
Reported-by: NKuthonuzo Luruo <kuthonuzo.luruo@hpe.com>
Acked-by: NAlexander Potapenko <glider@google.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

9b75a867

fix up initial thread stack pointer vs thread_info confusion · 7f1a00b6

由 Linus Torvalds 提交于 6月 24, 2016

The INIT_TASK() initializer was similarly confused about the stack vs
thread_info allocation that the allocators had, and that were fixed in
commit b235beea ("Clarify naming of thread info/stack allocators").

The task ->stack pointer only incidentally ends up having the same value
as the thread_info, and in fact that will change.

So fix the initial task struct initializer to point to 'init_stack'
instead of 'init_thread_info', and make sure the ia64 definition for
that exists.

This actually makes the ia64 tsk->stack pointer be sensible for the
initial task, but not for any other task.  As mentioned in commit
b235beea, that whole pointer isn't actually used on ia64, since
task_stack_page() there just points to the (single) allocation.

All the other architectures seem to have copied the 'init_stack'
definition, even if it tended to be generally unusued.
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7f1a00b6

Clarify naming of thread info/stack allocators · b235beea

由 Linus Torvalds 提交于 6月 24, 2016

We've had the thread info allocated together with the thread stack for
most architectures for a long time (since the thread_info was split off
from the task struct), but that is about to change.

But the patches that move the thread info to be off-stack (and a part of
the task struct instead) made it clear how confused the allocator and
freeing functions are.

Because the common case was that we share an allocation with the thread
stack and the thread_info, the two pointers were identical.  That
identity then meant that we would have things like

	ti = alloc_thread_info_node(tsk, node);
	...
	tsk->stack = ti;

which certainly _worked_ (since stack and thread_info have the same
value), but is rather confusing: why are we assigning a thread_info to
the stack? And if we move the thread_info away, the "confusing" code
just gets to be entirely bogus.

So remove all this confusion, and make it clear that we are doing the
stack allocation by renaming and clarifying the function names to be
about the stack.  The fact that the thread_info then shares the
allocation is an implementation detail, and not really about the
allocation itself.

This is a pure renaming and type fix: we pass in the same pointer, it's
just that we clarify what the pointer means.

The ia64 code that actually only has one single allocation (for all of
task_struct, thread_info and kernel thread stack) now looks a bit odd,
but since "tsk->stack" is actually not even used there, that oddity
doesn't matter.  It would be a separate thing to clean that up, I
intentionally left the ia64 changes as a pure brute-force renaming and
type change.
Acked-by: NAndy Lutomirski <luto@amacapital.net>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

b235beea

24 6月, 2016 4 次提交

locking/static_key: Fix concurrent static_key_slow_inc() · 4c5ea0a9

由 Paolo Bonzini 提交于 6月 21, 2016

The following scenario is possible:

    CPU 1                                   CPU 2
    static_key_slow_inc()
     atomic_inc_not_zero()
      -> key.enabled == 0, no increment
     jump_label_lock()
     atomic_inc_return()
      -> key.enabled == 1 now
                                            static_key_slow_inc()
                                             atomic_inc_not_zero()
                                              -> key.enabled == 1, inc to 2
                                             return
                                            ** static key is wrong!
     jump_label_update()
     jump_label_unlock()

Testing the static key at the point marked by (**) will follow the
wrong path for jumps that have not been patched yet.  This can
actually happen when creating many KVM virtual machines with userspace
LAPIC emulation; just run several copies of the following program:

    #include <fcntl.h>
    #include <unistd.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        for (;;) {
            int kvmfd = open("/dev/kvm", O_RDONLY);
            int vmfd = ioctl(kvmfd, KVM_CREATE_VM, 0);
            close(ioctl(vmfd, KVM_CREATE_VCPU, 1));
            close(vmfd);
            close(kvmfd);
        }
        return 0;
    }

Every KVM_CREATE_VCPU ioctl will attempt a static_key_slow_inc() call.
The static key's purpose is to skip NULL pointer checks and indeed one
of the processes eventually dereferences NULL.

As explained in the commit that introduced the bug:

  706249c2 ("locking/static_keys: Rework update logic")

jump_label_update() needs key.enabled to be true.  The solution adopted
here is to temporarily make key.enabled == -1, and use go down the
slow path when key.enabled <= 0.
Reported-by: NDmitry Vyukov <dvyukov@google.com>
Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org>
Cc: <stable@vger.kernel.org> # v4.3+
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 706249c2 ("locking/static_keys: Rework update logic")
Link: http://lkml.kernel.org/r/1466527937-69798-1-git-send-email-pbonzini@redhat.com
[ Small stylistic edits to the changelog and the code. ]
Signed-off-by: NIngo Molnar <mingo@kernel.org>

4c5ea0a9

qed: Add support for coalescing config read/update. · 722003ac

由 Sudarsana Reddy Kalluru 提交于 6月 21, 2016

This patch adds support for configuring the device tx/rx coalescing
timeout values in the order of micro seconds. It also adds APIs for
upper layer drivers for reading/updating the coalescing values.
Signed-off-by: NSudarsana Reddy Kalluru <sudarsana.kalluru@qlogic.com>
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

722003ac

net/mlx4_en: Add DCB PFC support through CEE netlink commands · af7d5185

由 Rana Shahout 提交于 6月 21, 2016

This patch adds support for reading and updating priority flow
control (PFC) attributes in the driver via netlink.
Signed-off-by: NRana Shahout <ranas@mellanox.com>
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NEugenia Emantayev <eugenia@mellanox.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

af7d5185

of_mdio: Enable fixed PHY support if driver is a module · d1bd330a

由 Ben Hutchings 提交于 6月 21, 2016

The fixed_phy driver doesn't have to be built-in, and it's
important that of_mdio supports it even if it's a module.
Signed-off-by: NBen Hutchings <ben.hutchings@codethink.co.uk>
Acked-by: NFlorian Fainelli <f.fainelli@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1bd330a

23 6月, 2016 2 次提交

IB/mlx5: Fix post send fence logic · c9b25495

由 Eli Cohen 提交于 6月 22, 2016

If the caller specified IB_SEND_FENCE in the send flags of the work
request and no previous work request stated that the successive one
should be fenced, the work request would be executed without a fence.
This could result in RDMA read or atomic operations failure due to a MR
being invalidated. Fix this by adding the mlx5 enumeration for fencing
RDMA/atomic operations and fix the logic to apply this.

Fixes: e126ba97 ('mlx5: Add driver for Mellanox Connect-IB adapters')
Signed-off-by: NEli Cohen <eli@mellanox.com>
Signed-off-by: NLeon Romanovsky <leon@kernel.org>
Signed-off-by: NDoug Ledford <dledford@redhat.com>

c9b25495

net/mlx4_en: Avoid unregister_netdev at shutdown flow · 9d769311

由 Eran Ben Elisha 提交于 6月 21, 2016

This allows a clean shutdown, even if some netdev clients do not
release their reference from this netdev. It is enough to release
the HW resources only as the kernel is shutting down.

Fixes: 2ba5fbd6 ('net/mlx4_core: Handle AER flow properly')
Signed-off-by: NEran Ben Elisha <eranbe@mellanox.com>
Signed-off-by: NTariq Toukan <tariqt@mellanox.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d769311

22 6月, 2016 1 次提交

rxrpc: Fix exclusive connection handling · cc8feb8e

由 David Howells 提交于 4月 04, 2016

"Exclusive connections" are meant to be used for a single client call and
then scrapped. The idea is to limit the use of the negotiated security
context. The current code, however, isn't doing this: it is instead
restricting the socket to a single virtual connection and doing all the
calls over that.

This is changed such that the socket no longer maintains a special virtual
connection over which it will do all the calls, but rather gets a new one
each time a new exclusive call is made.

Further, using a socket option for this is a poor choice. It should be
done on sendmsg with a control message marker instead so that calls can be
marked exclusive individually. To that end, add RXRPC_EXCLUSIVE_CALL
which, if passed to sendmsg() as a control message element, will cause the
call to be done on an single-use connection.

The socket option (RXRPC_EXCLUSIVE_CONNECTION) still exists and, if set,
will override any lack of RXRPC_EXCLUSIVE_CALL being specified so that
programs using the setsockopt() will appear to work the same.
Signed-off-by: NDavid Howells <dhowells@redhat.com>

cc8feb8e

20 6月, 2016 1 次提交

qed*: Don't reset statistics on inner reload · a0d26d5a

由 Yuval Mintz 提交于 6月 19, 2016

Several user APIs can cause driver to perform an inner-reload.
Currently, doing this would cause the HW/FW statistics of the
adapter to reset, which isn't the expected behavior [statistics
should only reset on explicit unloads].
Signed-off-by: NYuval Mintz <Yuval.Mintz@qlogic.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a0d26d5a

19 6月, 2016 4 次提交

ipv6: RFC 4884 partial support for SIT/GRE tunnels · 20e1954f

由 Eric Dumazet 提交于 6月 18, 2016

When receiving an ICMPv4 message containing extensions as
defined in RFC 4884, and translating it to ICMPv6 at SIT
or GRE tunnel, we need some extra manipulation in order
to properly forward the extensions.

This patch only takes care of Time Exceeded messages as they
are the ones that typically carry information from various
routers in a fabric during a traceroute session.

It also avoids complex skb logic if the data_len is not
a multiple of 8.

RFC states :

   The "original datagram" field MUST contain at least 128 octets.
   If the original datagram did not contain 128 octets, the
   "original datagram" field MUST be zero padded to 128 octets.

In practice routers use 128 bytes of original datagram, not more.

Initial translation was added in commit ca15a078
("sit: generate icmpv6 error when receiving icmpv4 error")
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Oussama Ghorbel <ghorbel@pivasoftware.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

20e1954f

ipv6: translate ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEED · 2d7a3b27

由 Eric Dumazet 提交于 6月 18, 2016

For better traceroute/mtr support for SIT and GRE tunnels,
we translate IPV4 ICMP ICMP_TIME_EXCEEDED to ICMPV6_TIME_EXCEED

We also have to translate the IPv4 source IP address of ICMP
message to IPv6 v4mapped.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2d7a3b27

ip6: move ipip6_err_gen_icmpv6_unreach() · 5fbba8ac

由 Eric Dumazet 提交于 6月 18, 2016

We want to use this helper from GRE as well, so this is
the time to move it in net/ipv6/icmp.c

Also add a @nhs parameter, since SIT and GRE have different
values for the header(s) to skip.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5fbba8ac

ipv6: icmp: add a force_saddr param to icmp6_send() · b1cadc1a

由 Eric Dumazet 提交于 6月 18, 2016

SIT or GRE tunnels might want to translate an IPV4 address
into a v4mapped one when translating ICMP to ICMPv6.

This patch adds the parameter to icmp6_send() but
does not change icmpv6_send() signature.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b1cadc1a

18 6月, 2016 4 次提交

isa: Dummy isa_register_driver should return error code · 5e25db87

由 William Breathitt Gray 提交于 5月 09, 2016

The inline isa_register_driver stub simply allows compilation on systems
with CONFIG_ISA disabled; the dummy isa_register_driver does not
register an isa_driver at all. The inline isa_register_driver should
return -ENODEV to indicate lack of support when attempting to register
an isa_driver on such a system with CONFIG_ISA disabled.

Cc: Matthew Wilcox <matthew@wil.cx>
Reported-by: NSasha Levin <sasha.levin@oracle.com>
Tested-by: Ye Xiaolong
Signed-off-by: NWilliam Breathitt Gray <vilhelm.gray@gmail.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

5e25db87

net: Remove deprecated tunnel specific UDP offload functions · 1938ee1f

由 Alexander Duyck 提交于 6月 16, 2016

Now that we have all the drivers using udp_tunnel_get_rx_ports,
ndo_add_udp_enc_rx_port, and ndo_del_udp_enc_rx_port we can drop the
function calls that were specific to VXLAN and GENEVE.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1938ee1f

net: Merge VXLAN and GENEVE push notifiers into a single notifier · 7c46a640

由 Alexander Duyck 提交于 6月 16, 2016

This patch merges the notifiers for VXLAN and GENEVE into a single UDP
tunnel notifier. The idea is that we will want to only have to make one
notifier call to receive the list of ports for VXLAN and GENEVE tunnels
that need to be offloaded.

In addition we add a new set of ndo functions named ndo_udp_tunnel_add and
ndo_udp_tunnel_del that are meant to allow us to track the tunnel meta-data
such as port and address family as tunnels are added and removed. The
tunnel meta-data is now transported in a structure named udp_tunnel_info
which for now carries the type, address family, and port number. In the
future this could be updated so that we can include a tuple of values
including things such as the destination IP address and other fields.

I also ended up going with a naming scheme that consisted of using the
prefix udp_tunnel on function names. I applied this to the notifier and
ndo ops as well so that it hopefully points to the fact that these are
primarily used in the udp_tunnel functions.
Signed-off-by: NAlexander Duyck <aduyck@mirantis.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7c46a640

isa: Allow ISA-style drivers on modern systems · 3a495511

由 William Breathitt Gray 提交于 5月 27, 2016

Several modern devices, such as PC/104 cards, are expected to run on
modern systems via an ISA bus interface. Since ISA is a legacy interface
for most modern architectures, ISA support should remain disabled in
general. Support for ISA-style drivers should be enabled on a per driver
basis.

To allow ISA-style drivers on modern systems, this patch introduces the
ISA_BUS_API and ISA_BUS Kconfig options. The ISA bus driver will now
build conditionally on the ISA_BUS_API Kconfig option, which defaults to
the legacy ISA Kconfig option. The ISA_BUS Kconfig option allows the
ISA_BUS_API Kconfig option to be selected on architectures which do not
enable ISA (e.g. X86_64).

The ISA_BUS Kconfig option is currently only implemented for X86
architectures. Other architectures may have their own ISA_BUS Kconfig
options added as required.
Reviewed-by: NGuenter Roeck <linux@roeck-us.net>
Signed-off-by: NWilliam Breathitt Gray <vilhelm.gray@gmail.com>
Acked-by: NLinus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

3a495511

17 6月, 2016 1 次提交

net: stmmac: allow to split suspend/resume from init/exit callbacks · cecbc556

由 Vincent Palatin 提交于 6月 15, 2016

Let the stmmac platform drivers provide dedicated suspend and resume
callbacks rather than always re-using the init and exits callbacks.
If the driver does not provide the suspend or resume callback, we fall
back to the old behavior trying to use exit or init.

This allows a specific platform to perform only a partial power-down on
suspend if Wake-on-Lan is enabled but always perform the full shutdown
sequence if the module is unloaded.
Signed-off-by: NVincent Palatin <vpalatin@chromium.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cecbc556

16 6月, 2016 11 次提交

bpf, maps: flush own entries on perf map release · 3b1efb19

由 Daniel Borkmann 提交于 6月 15, 2016

The behavior of perf event arrays are quite different from all
others as they are tightly coupled to perf event fds, f.e. shown
recently by commit e03e7ee3 ("perf/bpf: Convert perf_event_array
to use struct file") to make refcounting on perf event more robust.
A remaining issue that the current code still has is that since
additions to the perf event array take a reference on the struct
file via perf_event_get() and are only released via fput() (that
cleans up the perf event eventually via perf_event_release_kernel())
when the element is either manually removed from the map from user
space or automatically when the last reference on the perf event
map is dropped. However, this leads us to dangling struct file's
when the map gets pinned after the application owning the perf
event descriptor exits, and since the struct file reference will
in such case only be manually dropped or via pinned file removal,
it leads to the perf event living longer than necessary, consuming
needlessly resources for that time.

Relations between perf event fds and bpf perf event map fds can be
rather complex. F.e. maps can act as demuxers among different perf
event fds that can possibly be owned by different threads and based
on the index selection from the program, events get dispatched to
one of the per-cpu fd endpoints. One perf event fd (or, rather a
per-cpu set of them) can also live in multiple perf event maps at
the same time, listening for events. Also, another requirement is
that perf event fds can get closed from application side after they
have been attached to the perf event map, so that on exit perf event
map will take care of dropping their references eventually. Likewise,
when such maps are pinned, the intended behavior is that a user
application does bpf_obj_get(), puts its fds in there and on exit
when fd is released, they are dropped from the map again, so the map
acts rather as connector endpoint. This also makes perf event maps
inherently different from program arrays as described in more detail
in commit c9da161c ("bpf: fix clearing on persistent program
array maps").

To tackle this, map entries are marked by the map struct file that
added the element to the map. And when the last reference to that map
struct file is released from user space, then the tracked entries
are purged from the map. This is okay, because new map struct files
instances resp. frontends to the anon inode are provided via
bpf_map_new_fd() that is called when we invoke bpf_obj_get_user()
for retrieving a pinned map, but also when an initial instance is
created via map_create(). The rest is resolved by the vfs layer
automatically for us by keeping reference count on the map's struct
file. Any concurrent updates on the map slot are fine as well, it
just means that perf_event_fd_array_release() needs to delete less
of its own entires.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3b1efb19

bpf, maps: extend map_fd_get_ptr arguments · d056a788

由 Daniel Borkmann 提交于 6月 15, 2016

This patch extends map_fd_get_ptr() callback that is used by fd array
maps, so that struct file pointer from the related map can be passed
in. It's safe to remove map_update_elem() callback for the two maps since
this is only allowed from syscall side, but not from eBPF programs for these
two map types. Like in per-cpu map case, bpf_fd_array_map_update_elem()
needs to be called directly here due to the extra argument.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d056a788

bpf, maps: add release callback · 61d1b6a4

由 Daniel Borkmann 提交于 6月 15, 2016

Add a release callback for maps that is invoked when the last
reference to its struct file is gone and the struct file about
to be released by vfs. The handler will be used by fd array maps.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@kernel.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61d1b6a4

bpf: fix matching of data/data_end in verifier · 19de99f7

由 Alexei Starovoitov 提交于 6月 15, 2016

The ctx structure passed into bpf programs is different depending on bpf
program type. The verifier incorrectly marked ctx->data and ctx->data_end
access based on ctx offset only. That caused loads in tracing programs
int bpf_prog(struct pt_regs *ctx) { .. ctx->ax .. }
to be incorrectly marked as PTR_TO_PACKET which later caused verifier
to reject the program that was actually valid in tracing context.
Fix this by doing program type specific matching of ctx offsets.

Fixes: 969bf05e ("bpf: direct packet access")
Reported-by: NSasha Goldshtein <goldshtn@gmail.com>
Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
Acked-by: NDaniel Borkmann <daniel@iogearbox.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

19de99f7

net: Don't forget pr_fmt on net_dbg_ratelimited for CONFIG_DYNAMIC_DEBUG · daddef76

由 Jason A. Donenfeld 提交于 6月 15, 2016

The implementation of net_dbg_ratelimited in the CONFIG_DYNAMIC_DEBUG
case was added with 2c94b537 ("net: Implement net_dbg_ratelimited() for
CONFIG_DYNAMIC_DEBUG case"). The implementation strategy was to take the
usual definition of the dynamic_pr_debug macro, but alter it by adding a
call to "net_ratelimit()" in the if statement. This is, in fact, the
correct approach.

However, while doing this, the author of the commit forgot to surround
fmt by pr_fmt, resulting in unprefixed log messages appearing in the
console. So, this commit adds back the pr_fmt(fmt) invocation, making
net_dbg_ratelimited properly consistent across DEBUG, no DEBUG, and
DYNAMIC_DEBUG cases, and bringing parity with the behavior of
dynamic_pr_debug as well.

Fixes: 2c94b537 ("net: Implement net_dbg_ratelimited() for CONFIG_DYNAMIC_DEBUG case")
Signed-off-by: NJason A. Donenfeld <Jason@zx2c4.com>
Cc: Tim Bingham <tbingham@akamai.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

daddef76

ipv6: introduce neighbour discovery ops · f997c55c

由 Alexander Aring 提交于 6月 15, 2016

This patch introduces neighbour discovery ops callback structure. The
idea is to separate the handling for 6LoWPAN into the 6lowpan module.

These callback offers 6lowpan different handling, such as 802.15.4 short
address handling or RFC6775 (Neighbor Discovery Optimization for IPv6
over 6LoWPANs).

Cc: David S. Miller <davem@davemloft.net>
Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Cc: James Morris <jmorris@namei.org>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: Patrick McHardy <kaber@trash.net>
Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NAlexander Aring <aar@pengutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f997c55c

6lowpan: add private neighbour data · 8626a0c8

由 Alexander Aring 提交于 6月 15, 2016

This patch will introduce a 6lowpan neighbour private data. Like the
interface private data we handle private data for generic 6lowpan and
for link-layer specific 6lowpan.

The current first use case if to save the short address for a 802.15.4
6lowpan neighbour.

Cc: David S. Miller <davem@davemloft.net>
Reviewed-by: NStefan Schmidt <stefan@osg.samsung.com>
Acked-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NAlexander Aring <aar@pengutronix.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8626a0c8

net_sched: add the ability to defer skb freeing · 1b5c5493

由 Eric Dumazet 提交于 6月 13, 2016

qdisc are changed under RTNL protection and often
while blocking BH and root qdisc spinlock.

When lots of skbs need to be dropped, we free
them under these locks causing TX/RX freezes,
and more generally latency spikes.

This commit adds rtnl_kfree_skbs(), used to queue
skbs for deferred freeing.

Actual freeing happens right after RTNL is released,
with appropriate scheduling points.

rtnl_qdisc_drop() can also be used in place
of disc_drop() when RTNL is held.

qdisc_reset_queue() and __qdisc_reset_queue() get
the new behavior, so standard qdiscs like pfifo, pfifo_fast...
have their ->reset() method automatically handled.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

1b5c5493

skb_array: resize support · 7d7072e3

由 Michael S. Tsirkin 提交于 6月 13, 2016

Update skb_array after ptr_ring API changes.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7d7072e3

ptr_ring: resize support · 5d49de53

由 Michael S. Tsirkin 提交于 6月 13, 2016

This adds ring resize support. Seems to be necessary as
users such as tun allow userspace control over queue size.

If resize is used, this costs us ability to peek at queue without
consumer lock - should not be a big deal as peek and consumer are
usually run on the same CPU.

If ring is made bigger, ring contents is preserved.  If ring is made
smaller, extra pointers are passed to an optional destructor callback.

Cleanup function also gains destructor callback such that
all pointers in queue can be cleaned up.

This changes some APIs but we don't have any users yet,
so it won't break bisect.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5d49de53

skb_array: array based FIFO for skbs · ad69f35d

由 Michael S. Tsirkin 提交于 6月 13, 2016

A simple array based FIFO of pointers.  Intended for net stack so uses
skbs for type safety. Implemented as a set of wrappers around ptr_ring.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Acked-by: NJesper Dangaard Brouer <brouer@redhat.com>
Tested-by: NJesper Dangaard Brouer <brouer@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ad69f35d

openeuler / Kernel 1 年多 前同步成功

openeuler / Kernel
1 年多前同步成功