提交 · 8b929ab12fb2ab960adb3c3ec8d107fef5ff3243 · openeuler / raspberrypi-kernel

24 3月, 2015 6 次提交

inet: remove some sk_listener dependencies · 8b929ab1

由 Eric Dumazet 提交于 3月 22, 2015

listener can be source of false sharing. request sock has some
useful information like : ireq->ir_iif, ireq->ir_num, ireq->ireq_net

This patch does not solve the major problem of having to read
sk->sk_protocol which is sharing a cache line with sk->sk_wmem_alloc.
(This same field is read later in ip_build_and_send_pkt())

One idea would be to move sk_protocol close to sk_family
(using 8 bits instead of 16 for sk_family seems enough)
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8b929ab1

inet: remove sk_listener parameter from syn_ack_timeout() · 42cb80a2

由 Eric Dumazet 提交于 3月 22, 2015

It is not needed, and req->sk_listener points to the listener anyway.
request_sock argument can be const.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

42cb80a2

inet: cache listen_sock_qlen() and read rskq_defer_accept once · 2b41fab7

由 Eric Dumazet 提交于 3月 22, 2015

Cache listen_sock_qlen() to limit false sharing, and read
rskq_defer_accept once as it might change under us.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2b41fab7

switchdev: fix stp update API to work with layered netdevices · 558d51fa

由 Roopa Prabhu 提交于 3月 21, 2015

make it same as the netdev_switch_port_bridge_setlink/dellink
api (ie traverse lowerdevs to get to the switch port).

removes "WARN_ON(!ops->ndo_switch_parent_id_get)" because
direct bridge ports can be stacked netdevices (like bonds
and team of switch ports) which may not implement this ndo.

v2 to v3:
	- remove changes to bond and team. Bring back the
	transparently following lowerdevs like i initially
	had for setlink/getlink
	(http://www.spinics.net/lists/netdev/msg313436.html)
	dave and scott feldman also seem to prefer it be that
	way and move to non-transparent way of doing things
	if we see a problem down the lane.

v3 to v4:
	- fix ret initialization

v4 to v5:
	- return err on first failure (scott feldman)

v5 to v6:
	- change variable name (err) and initialize to
	-EOPNOTSUPP (scott feldman).
Signed-off-by: NRoopa Prabhu <roopa@cumulusnetworks.com>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

558d51fa

net: clear skb->priority when forwarding to another netns · 08b4b8ea

由 WANG Cong 提交于 3月 20, 2015

skb->priority can be set for two purposes:

1) With respect to IP TOS field, which is computed by a mask.
Ususally used for priority qdisc's (pfifo, prio etc.), on TX
side (we only have ingress qdisc on RX side).

2) Used as a classid or flowid, works in the same way with tc
classid. What's more, this can even override the classid
of tc filters.

For case 1), it has been respected within its netns, I don't
see any point of keeping it for another netns, especially
when packets will be forwarded to Rx path (no matter from TX
path or RX path).

For case 2) we care, our applications run inside a netns,
and we classify the packets by our own filters outside,
If some application sets this priority, it could bypass
our filters, therefore clear it when moving out of a netns,
it makes no sense to bypass tc filters out of its netns.
Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08b4b8ea

net: socket: add support for async operations · 0345f931

由 tadeusz.struk@intel.com 提交于 3月 19, 2015

Add support for async operations.
Signed-off-by: NTadeusz Struk <tadeusz.struk@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0345f931

21 3月, 2015 19 次提交

netlink: Remove netlink_compare_arg.trailer · 8f2ddaac

由 Herbert Xu 提交于 3月 21, 2015

Instead of computing the offset from trailer, this patch computes
netlink_compare_arg_len from the offset of portid and then adds 4
to it.  This allows trailer to be removed.
Reported-by: NDavid Miller <davem@davemloft.net>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f2ddaac

net: neighbour: Add mcast_resolicit to configure the number of multicast... · 8da86466

由 YOSHIFUJI Hideaki/吉藤英明提交于 3月 19, 2015

net: neighbour: Add mcast_resolicit to configure the number of multicast resolicitations in PROBE state.

We send unicast neighbor (ARP or NDP) solicitations ucast_probes
times in PROBE state.  Zhu Yanjun reported that some implementation
does not reply against them and the entry will become FAILED, which
is undesirable.

We had been dealt with such nodes by sending multicast probes mcast_
solicit times after unicast probes in PROBE state.  In 2003, I made
a change not to send them to improve compatibility with IPv6 NDP.

Let's introduce per-protocol per-interface sysctl knob "mcast_
reprobe" to configure the number of multicast (re)solicitation for
reconfirmation in PROBE state.  The default is 0, since we have
been doing so for 10+ years.
Reported-by: NZhu Yanjun <Yanjun.Zhu@windriver.com>
CC: Ulf Samuelsson <ulf.samuelsson@ericsson.com>
Signed-off-by: NYOSHIFUJI Hideaki <hideaki.yoshifuji@miraclelinux.com>
Signed-off-by: NYOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8da86466

net: dsa: make NET_DSA manually selectable from the config · c6f15070

由 Mathieu Olivari 提交于 3月 20, 2015

Change bd76a116 made all DSA drivers
depend on NET_DSA rather than selecting them. However, as the only way
to select this option was to actually select a driver, it made DSA
impossible to enable at all.

This patch adds an explicit entry which the user will have to enable
prior selecting a driver.
Signed-off-by: NMathieu Olivari <mathieu@codeaurora.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c6f15070

Revert "selinux: add a skb_owned_by() hook" · d3593b5c

由 Eric Dumazet 提交于 3月 20, 2015

This reverts commit ca10b9e9.

No longer needed after commit eb8895de
("tcp: tcp_make_synack() should use sock_wmalloc")

When under SYNFLOOD, we build lot of SYNACK and hit false sharing
because of multiple modifications done on sk_listener->sk_wmem_alloc

Since tcp_make_synack() uses sock_wmalloc(), there is no need
to call skb_set_owner_w() again, as this adds two atomic operations.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d3593b5c

act_bpf: add initial eBPF support for actions · a8cb5f55

由 Daniel Borkmann 提交于 3月 20, 2015

This work extends the "classic" BPF programmable tc action by extending
its scope also to native eBPF code!

Together with commit e2e9b654 ("cls_bpf: add initial eBPF support
for programmable classifiers") this adds the facility to implement fully
flexible classifier and actions for tc that can be implemented in a C
subset in user space, "safely" loaded into the kernel, and being run in
native speed when JITed.

Also, since eBPF maps can be shared between eBPF programs, it offers the
possibility that cls_bpf and act_bpf can share data 1) between themselves
and 2) between user space applications. That means that, f.e. customized
runtime statistics can be collected in user space, but also more importantly
classifier and action behaviour could be altered based on map input from
the user space application.

For the remaining details on the workflow and integration, see the cls_bpf
commit e2e9b654. Preliminary iproute2 part can be found under [1].

  [1] http://git.breakpoint.cc/cgit/dborkman/iproute2.git/log/?h=ebpf-actSigned-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a8cb5f55

ebpf: add sched_act_type and map it to sk_filter's verifier ops · 94caee8c

由 Daniel Borkmann 提交于 3月 20, 2015

In order to prepare eBPF support for tc action, we need to add
sched_act_type, so that the eBPF verifier is aware of what helper
function act_bpf may use, that it can load skb data and read out
currently available skb fields.

This is bascially analogous to 96be4325 ("ebpf: add sched_cls_type
and map it to sk_filter's verifier ops").

BPF_PROG_TYPE_SCHED_CLS and BPF_PROG_TYPE_SCHED_ACT need to be
separate since both will have a different set of functionality in
future (classifier vs action), thus we won't run into ABI troubles
when the point in time comes to diverge functionality from the
classifier.

The future plan for act_bpf would be that it will be able to write
into skb->data and alter selected fields mirrored in struct __sk_buff.

For an initial support, it's sufficient to map it to sk_filter_ops.
Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Cc: Jiri Pirko <jiri@resnulli.us>
Reviewed-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

94caee8c

net: validate the range we feed to iov_iter_init() in sys_sendto/sys_recvfrom · 4de930ef

由 Al Viro 提交于 3月 20, 2015

Cc: stable@vger.kernel.org # v3.19
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4de930ef

net: compat: Update get_compat_msghdr() to match copy_msghdr_from_user() behaviour · 91edd096

由 Catalin Marinas 提交于 3月 20, 2015

Commit db31c55a (net: clamp ->msg_namelen instead of returning an
error) introduced the clamping of msg_namelen when the unsigned value
was larger than sizeof(struct sockaddr_storage). This caused a
msg_namelen of -1 to be valid. The native code was subsequently fixed by
commit dbb490b9 (net: socket: error on a negative msg_namelen).

In addition, the native code sets msg_namelen to 0 when msg_name is
NULL. This was done in commit (6a2a2b3a net:socket: set msg_namelen
to 0 if msg_name is passed as NULL in msghdr struct from userland) and
subsequently updated by 08adb7da (fold verify_iovec() into
copy_msghdr_from_user()).

This patch brings the get_compat_msghdr() in line with
copy_msghdr_from_user().

Fixes: db31c55a (net: clamp ->msg_namelen instead of returning an error)
Cc: David S. Miller <davem@davemloft.net>
Cc: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: NCatalin Marinas <catalin.marinas@arm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

91edd096

tipc: Use inlined rhashtable interface · 6cca7289

由 Herbert Xu 提交于 3月 20, 2015

This patch converts tipc to the inlined rhashtable interface.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6cca7289

netfilter: Convert nft_hash to inlined rhashtable · fa377321

由 Herbert Xu 提交于 3月 20, 2015

This patch converts nft_hash to the inlined rhashtable interface.

This patch also replaces the call to rhashtable_lookup_compare with
a straight rhashtable_lookup_fast because it's simply doing a memcmp
(in fact nft_hash_lookup already uses memcmp instead of nft_data_cmp).

Furthermore, the compare function is only meant to compare, it is not
supposed to have side-effects.  The current side-effect code can
simply be moved into the nft_hash_get.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa377321

netlink: Move namespace into hash key · c428ecd1

由 Herbert Xu 提交于 3月 20, 2015

Currently the name space is a de facto key because it has to match
before we find an object in the hash table.  However, it isn't in
the hash value so all objects from different name spaces with the
same port ID hash to the same bucket.

This is bad as the number of name spaces is unbounded.

This patch fixes this by using the namespace when doing the hash.

Because the namespace field doesn't lie next to the portid field
in the netlink socket, this patch switches over to the rhashtable
interface without a fixed key.

This patch also uses the new inlined rhashtable interface where
possible.
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c428ecd1

ebpf, filter: do not convert skb->protocol to host endianess during runtime · 0b8c707d

由 Daniel Borkmann 提交于 3月 19, 2015

Commit c2497395 ("bpf: allow BPF programs access 'protocol' and 'vlan_tci'
fields") has added support for accessing protocol, vlan_present and vlan_tci
into the skb offset map.

As referenced in the below discussion, accessing skb->protocol from an eBPF
program should be converted without handling endianess.

The reason for this is that an eBPF program could simply do a check more
naturally, by f.e. testing skb->protocol == htons(ETH_P_IP), where the LLVM
compiler resolves htons() against a constant automatically during compilation
time, as opposed to an otherwise needed run time conversion.

After all, the way of programming both from a user perspective differs quite
a lot, i.e. bpf_asm ["ld proto"] versus a C subset/LLVM.

Reference: https://patchwork.ozlabs.org/patch/450819/Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
Acked-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0b8c707d

ipv6: invert join/leave anycast rtnl/socket locking order · c4a6853d

由 Marcelo Ricardo Leitner 提交于 3月 20, 2015

Commit baf606d9 ("ipv4,ipv6: grab rtnl before locking the socket")
missed to update two setsockopt options, IPV6_JOIN_ANYCAST and
IPV6_LEAVE_ANYCAST, causing a lock inverstion regarding to the updated ones.

As ipv6_sock_ac_join and ipv6_sock_ac_leave are only called from
do_ipv6_setsockopt, we are good to just move the rtnl lock upper.

Fixes: baf606d9 ("ipv4,ipv6: grab rtnl before locking the socket")
Reported-by: NYing Huang <ying.huang@intel.com>
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c4a6853d

tcp: fix tcp fin memory accounting · d22e1537

由 Josh Hunt 提交于 3月 19, 2015

tcp_send_fin() does not account for the memory it allocates properly, so
sk_forward_alloc can be negative in cases where we've sent a FIN:

ss example output (ss -amn | grep -B1 f4294):
tcp    FIN-WAIT-1 0      1            192.168.0.1:45520         192.0.2.1:8080
	skmem:(r0,rb87380,t0,tb87380,f4294966016,w1280,o0,bl0)
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d22e1537

ipv6: fix backtracking for throw routes · 73ba57bf

由 Steven Barth 提交于 3月 19, 2015

for throw routes to trigger evaluation of other policy rules
EAGAIN needs to be propagated up to fib_rules_lookup
similar to how its done for IPv4

A simple testcase for verification is:

ip -6 rule add lookup 33333 priority 33333
ip -6 route add throw 2001:db8::1
ip -6 route add 2001:db8::1 via fe80::1 dev wlan0 table 33333
ip route get 2001:db8::1
Signed-off-by: NSteven Barth <cyrus@openwrt.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

73ba57bf

ipv6: call ipv6_proxy_select_ident instead of ipv6_select_ident in udp6_ufo_fragment · 8e199dfd

由 Sabrina Dubroca 提交于 3月 19, 2015

Matt Grant reported frequent crashes in ipv6_select_ident when
udp6_ufo_fragment is called from openvswitch on a skb that doesn't
have a dst_entry set.

ipv6_proxy_select_ident generates the frag_id without using the dst
associated with the skb.  This approach was suggested by Vladislav
Yasevich.

Fixes: 0508c07f ("ipv6: Select fragment id during UFO segmentation if not set.")
Cc: Vladislav Yasevich <vyasevic@redhat.com>
Reported-by: NMatt Grant <matt@mattgrant.net.nz>
Tested-by: NMatt Grant <matt@mattgrant.net.nz>
Signed-off-by: NSabrina Dubroca <sd@queasysnail.net>
Acked-by: NVladislav Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8e199dfd

net: increase sk_[max_]ack_backlog · becb74f0

由 Eric Dumazet 提交于 3月 19, 2015

sk_ack_backlog & sk_max_ack_backlog were 16bit fields, meaning
listen() backlog was limited to 65535.

It is time to increase the width to allow much bigger backlog,
if admins change /proc/sys/net/core/somaxconn &
/proc/sys/net/ipv4/tcp_max_syn_backlog default values.

Tested:

echo 5000000 >/proc/sys/net/core/somaxconn
echo 5000000 >/proc/sys/net/ipv4/tcp_max_syn_backlog

Ran a SYNFLOOD test against a listener using listen(fd, 5000000)

myhost~# grep request_sock_TCP /proc/slabinfo
request_sock_TCP  4185642 4411940    304   13    1 : tunables   54   27    8 : slabdata 339380 339380      0
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

becb74f0

inet: get rid of central tcp/dccp listener timer · fa76ce73

由 Eric Dumazet 提交于 3月 19, 2015

One of the major issue for TCP is the SYNACK rtx handling,
done by inet_csk_reqsk_queue_prune(), fired by the keepalive
timer of a TCP_LISTEN socket.

This function runs for awful long times, with socket lock held,
meaning that other cpus needing this lock have to spin for hundred of ms.

SYNACK are sent in huge bursts, likely to cause severe drops anyway.

This model was OK 15 years ago when memory was very tight.

We now can afford to have a timer per request sock.

Timer invocations no longer need to lock the listener,
and can be run from all cpus in parallel.

With following patch increasing somaxconn width to 32 bits,
I tested a listener with more than 4 million active request sockets,
and a steady SYNFLOOD of ~200,000 SYN per second.
Host was sending ~830,000 SYNACK per second.

This is ~100 times more what we could achieve before this patch.

Later, we will get rid of the listener hash and use ehash instead.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

fa76ce73

inet: drop prev pointer handling in request sock · 52452c54

由 Eric Dumazet 提交于 3月 19, 2015

When request sock are put in ehash table, the whole notion
of having a previous request to update dl_next is pointless.

Also, following patch will get rid of big purge timer,
so we want to delete a request sock without holding listener lock.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

52452c54

20 3月, 2015 4 次提交

tipc: fix build issue when building without IPv6 · 446981e5

由 Marcelo Ricardo Leitner 提交于 3月 19, 2015

We can't directly call ipv6_sock_mc_join() but should use the stub
instead and protect it around IS_ENABLED.

Fixes: d0f91938 ("tipc: add ip/udp media type")
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

446981e5

tipc: add support for connect() on dgram/rdm sockets · f2f8036e

由 Erik Hugne 提交于 3月 19, 2015

Following the example of ip4_datagram_connect, we store the
address in the socket structure for dgram/rdm sockets and use
that as the default destination for subsequent send() calls.
It is allowed to connect to any address types, and the behaviour
of send() will be the same as a normal sendto() with this address
provided. Binding to an AF_UNSPEC address clears the association.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Reviewed-by: NYing Xue <ying.xue@windriver.com>
Reviewed-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f2f8036e

tipc: do not report -EHOSTUNREACH for failed local delivery · 3bd88ee7

由 Erik Hugne 提交于 3月 19, 2015

Since commit 1186adf7 ("tipc: simplify message forwarding and
rejection in socket layer") -EHOSTUNREACH is propagated back to
the sending process if we fail to deliver the message to another
socket local to the node.
This is wrong, host unreachable should only be reported when the
destination port/name does not exist in the cluster, and that
check is always done before sending the message. Also, this
introduces inconsistent sendmsg() behavior for local/remote
destinations. Errors occurring on the receiving side should not
trickle up to the sender. If message delivery fails TIPC should
either discard the packet or reject it back to the sender based
on the destination droppable option.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3bd88ee7

tipc: remove redundant call to tipc_node_remove_conn · 18d6c584

由 Erik Hugne 提交于 3月 19, 2015

tipc_node_remove_conn may be called twice if shutdown() is
called on a socket that have messages in the receive queue.
Calling this function twice does no harm, but is unnecessary
and we remove the redundant call.
Signed-off-by: NErik Hugne <erik.hugne@ericsson.com>
Acked-by: NYing Xue <ying.xue@windriver.com>
Acked-by: NJon Maloy <jon.maloy@ericsson.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

18d6c584

19 3月, 2015 11 次提交

mac802154: fix typo in header guard · ea6edfbc

由 Nicolas Iooss 提交于 3月 19, 2015

Signed-off-by: NNicolas Iooss <nicolas.iooss_linux@m4x.org>
Fixes: b6eea9ca ("mac802154: introduce driver-ops header")
Acked-by: NAlexander Aring <alex.aring@gmail.com>
Signed-off-by: NMarcel Holtmann <marcel@holtmann.org>

ea6edfbc

J
bridge: add ageing_time, stp_state, priority over netlink · af615762
由 Jörg Thalheim 提交于 3月 18, 2015
```
Signed-off-by: NJörg Thalheim <joerg@higgsboson.tk>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
af615762

net: Fix high overhead of vlan sub-device teardown. · 99c4a26a

由 David S. Miller 提交于 3月 18, 2015

When a networking device is taken down that has a non-trivial number
of VLAN devices configured under it, we eat a full synchronize_net()
for every such VLAN device.

This is because of the call chain:

	NETDEV_DOWN notifier
	--> vlan_device_event()
		--> dev_change_flags()
		--> __dev_change_flags()
		--> __dev_close()
		--> __dev_close_many()
		--> dev_deactivate_many()
			--> synchronize_net()

This is kind of rediculous because we already have infrastructure for
batching doing operation X to a list of net devices so that we only
incur one sync.

So make use of that by exporting dev_close_many() and adjusting it's
interfaace so that the caller can fully manage the batch list.  Use
this in vlan_device_event() and all the overhead goes away.
Reported-by: NSalam Noureddine <noureddine@arista.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

99c4a26a

inet: add a schedule point in inet_twsk_purge() · 738e6d30

由 Eric Dumazet 提交于 3月 18, 2015

On a large hash table, we can easily spend seconds to
walk over all entries. Add a cond_resched() to yield
cpu if necessary.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

738e6d30

net: add support for phys_port_name · db24a904

由 David Ahern 提交于 3月 17, 2015

Similar to port id allow netdevices to specify port names and export
the name via sysfs. Drivers can implement the netdevice operation to
assist udev in having sane default names for the devices using the
rule:

$ cat /etc/udev/rules.d/80-net-setup-link.rules
SUBSYSTEM=="net", ACTION=="add", ATTR{phys_port_name}!="",
NAME="$attr{phys_port_name}"

Use of phys_name versus phys_id was suggested-by Jiri Pirko.
Signed-off-by: NDavid Ahern <dsahern@gmail.com>
Acked-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NScott Feldman <sfeldma@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

db24a904

ipv4, ipv6: kill ip_mc_{join, leave}_group and ipv6_sock_mc_{join, drop} · 54ff9ef3

由 Marcelo Ricardo Leitner 提交于 3月 18, 2015

in favor of their inner __ ones, which doesn't grab rtnl.

As these functions need to operate on a locked socket, we can't be
grabbing rtnl by then. It's too late and doing so causes reversed
locking.

So this patch:
- move rtnl handling to callers instead while already fixing some
  reversed locking situations, like on vxlan and ipvs code.
- renames __ ones to not have the __ mark:
  __ip_mc_{join,leave}_group -> ip_mc_{join,leave}_group
  __ipv6_sock_mc_{join,drop} -> ipv6_sock_mc_{join,drop}
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

54ff9ef3

ipv4,ipv6: grab rtnl before locking the socket · baf606d9

由 Marcelo Ricardo Leitner 提交于 3月 18, 2015

There are some setsockopt operations in ipv4 and ipv6 that are grabbing
rtnl after having grabbed the socket lock. Yet this makes it impossible
to do operations that have to lock the socket when already within a rtnl
protected scope, like ndo dev_open and dev_stop.

We normally take coarse grained locks first but setsockopt inverted that.

So this patch invert the lock logic for these operations and makes
setsockopt grab rtnl if it will be needed prior to grabbing socket lock.
Signed-off-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
Acked-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baf606d9

inet: request sock should init IPv6/IPv4 addresses · 08d2cc3b

由 Eric Dumazet 提交于 3月 18, 2015

In order to be able to use sk_ehashfn() for request socks,
we need to initialize their IPv6/IPv4 addresses.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

08d2cc3b

inet: get rid of last __inet_hash_connect() argument · b4d6444e

由 Eric Dumazet 提交于 3月 18, 2015

We now always call __inet_hash_nolisten(), no need to pass it
as an argument.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b4d6444e

ipv6: get rid of __inet6_hash() · 77a6a471

由 Eric Dumazet 提交于 3月 18, 2015

We can now use inet_hash() and __inet_hash() instead of private
functions.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

77a6a471

inet: add IPv6 support to sk_ehashfn() · d1e559d0

由 Eric Dumazet 提交于 3月 18, 2015

Intent is to converge IPv4 & IPv6 inet_hash functions to
factorize code.

IPv4 sockets initialize sk_rcv_saddr and sk_v6_daddr
in this patch, thanks to new sk_daddr_set() and sk_rcv_saddr_set()
helpers.

__inet6_hash can now use sk_ehashfn() instead of a private
inet6_sk_ehashfn() and will simply use __inet_hash() in a
following patch.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1e559d0