Commits · 43815482370c510c569fd18edb57afcb0fa8cab6 · openeuler / raspberrypi-kernel

02 May, 2010 1 commit

net: sock_def_readable() and friends RCU conversion · 43815482

Authored by Eric Dumazet on Apr 29, 2010

sk_callback_lock rwlock actually protects sk->sk_sleep pointer, so we
need two atomic operations (and associated dirtying) per incoming
packet.

RCU conversion is pretty much needed :

1) Add a new structure, called "struct socket_wq" to hold all fields
that will need rcu_read_lock() protection (currently: a
wait_queue_head_t and a struct fasync_struct pointer).

[Future patch will add a list anchor for wakeup coalescing]

2) Attach one of such structure to each "struct socket" created in
sock_alloc_inode().

3) Respect RCU grace period when freeing a "struct socket_wq"

4) Replace the sk_sleep pointer in "struct sock" with sk_wq, a pointer to
"struct socket_wq"

5) Change sk_sleep() function to use new sk->sk_wq instead of
sk->sk_sleep

6) Change sk_has_sleeper() to wq_has_sleeper() that must be used inside
a rcu_read_lock() section.

7) Change all sk_has_sleeper() callers to :
  - Use rcu_read_lock() instead of read_lock(&sk->sk_callback_lock)
  - Use wq_has_sleeper() to wake up tasks if needed.
  - Use rcu_read_unlock() instead of read_unlock(&sk->sk_callback_lock)

8) sock_wake_async() is modified to use rcu protection as well.

9) Exceptions :
  macvtap, drivers/net/tun.c, af_unix use integrated "struct socket_wq"
instead of dynamically allocated ones. They don't need RCU freeing.

Some cleanups or followups are probably needed, (possible
sk_callback_lock conversion to a spinlock for example...).
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01 May, 2010 3 commits

ipv6: cleanup: remove unneeded null check · 83d7eb29

Authored by Dan Carpenter on Apr 30, 2010

We dereference "sk" unconditionally elsewhere in the function.  

This was left over from: b30bd282 "ip6_xmit: remove unnecessary NULL
ptr check". According to that commit message, "the sk argument to
ip6_xmit is never NULL nowadays since the skb->priority assignment
expects a valid socket."
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm: potential uninitialized variable num_xfrms · 4b021628

Authored by Changli Gao on Apr 27, 2010

potential uninitialized variable num_xfrms

fix compiler warning: 'num_xfrms' may be used uninitialized in this function.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
----
 net/xfrm/xfrm_policy.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
Signed-off-by: David S. Miller <davem@davemloft.net>

net: speedup sock_recv_ts_and_drops() · 767dd033

Authored by Eric Dumazet on Apr 28, 2010

sock_recv_ts_and_drops() is fat and slow (~ 4% of cpu time on some
profiles)

We can test all socket flags at once to make fast path fast again.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
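
The "test all socket flags at once" idea from the commit above can be sketched in user-space C. The flag names and bit positions here are hypothetical stand-ins, not the kernel's actual SOCK_* values; the sketch only shows how a single mask comparison can replace several individual flag tests on the fast path.

```c
#include <stdbool.h>

/* Hypothetical flag bits for illustration; the kernel's real flags differ. */
#define DEMO_RXQ_OVFL     (1UL << 0)  /* report dropped-packet count */
#define DEMO_RCVTSTAMP    (1UL << 1)  /* report software timestamp */
#define DEMO_TS_HARDWARE  (1UL << 2)  /* report hardware timestamp */
#define DEMO_KEEPOPEN     (1UL << 3)  /* unrelated flag, never needs slow path */

/* Every flag that forces the slow (ancillary data) path, OR-ed together. */
#define DEMO_SLOW_MASK (DEMO_RXQ_OVFL | DEMO_RCVTSTAMP | DEMO_TS_HARDWARE)

/* Fast path check: one AND decides whether any slow-path flag is set. */
static bool demo_needs_slow_path(unsigned long sk_flags)
{
    return (sk_flags & DEMO_SLOW_MASK) != 0;
}
```

A caller would return early when demo_needs_slow_path() is false and only then fall through to the per-flag handling.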

29 Apr, 2010 8 commits

net: ip_queue_rcv_skb() helper · f84af32c

Authored by Eric Dumazet on Apr 28, 2010

When queueing an skb to a socket, we can immediately release its dst if
the target socket does not use IP_CMSG_PKTINFO.

tcp_data_queue() can drop dst too.

This lets us benefit from a hot cache line and spares the receiver,
possibly on another CPU, from dirtying this cache line itself.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: speedup udp receive path · 4b0b72f7

Authored by Eric Dumazet on Apr 28, 2010

Since commit 95766fff ([UDP]: Add memory accounting.),
each received packet needs one extra lock_sock()/release_sock() pair.

This added latency because of possible backlog handling. Then later,
ticket spinlocks added yet another latency source in case of DDOS.

This patch introduces lock_sock_bh() and unlock_sock_bh()
synchronization primitives, avoiding one atomic operation and backlog
processing.

skb_free_datagram_locked() uses them instead of full blown
lock_sock()/release_sock(). skb is orphaned inside locked section for
proper socket memory reclaim, and finally freed outside of it.

UDP receive path now takes the socket spinlock only once.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Bugfix: Link selection was swapped in switch. · 2c485209

Authored by Sjur Braendeland on Apr 28, 2010

Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Bugfixes in CAIF netdevice for close and flow control · 8391c4aa

Authored by Sjur Braendeland on Apr 28, 2010

Changes:
o Bugfix: Flow control was causing the device to be destroyed.
o Bugfix: Handle CAIF channel connect failures.
o If the underlying link layer is gone the net-device is no longer removed,
  but closed.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Rewritten socket implementation · bece7b23

Authored by Sjur Braendeland on Apr 28, 2010

Changes:
 This is a complete re-write of the socket layer. Making the socket
 implementation more aligned with the other socket layers and using more
 of the support functions available in sock.c. Lots of code is copied
 from af_unix (and some from af_irda).
 Non-blocking mode should be working as well.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Disconnect without waiting for response · 8d545c8f

Authored by Sjur Braendeland on Apr 28, 2010

Changes:
o Function cfcnfg_disconn_adapt_layer is changed to do asynchronous
  disconnect, not waiting for any response from the modem. Due to this
  the function cfcnfg_linkdestroy_rsp does nothing anymore.
o Because disconnect may take down a connection before a connect response
  is received the function cfcnfg_linkup_rsp is checking if the client is
  still waiting for the response, if not a disconnect request is sent to
  the modem.
o cfctrl is no longer keeping track of pending disconnect requests.
o Added function cfctrl_cancel_req, which is used for deleting a pending
  connect request if disconnect is done before connect response is received.
o Removed unused function cfctrl_insert_req2
o Added better handling of connect reject from modem.
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Add reference counting to service layer · 5b208656

Authored by Sjur Braendeland on Apr 28, 2010

Changes:
o Added functions cfsrvl_get and cfsrvl_put.
o Added release_client support for use by socket and net device.
o Increase reference counting for in-flight packets from cfmuxl
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

caif: Rename functions in cfcnfg and caif_dev · e539d83c

Authored by Sjur Braendeland on Apr 28, 2010

Changes:
 o Renamed cfcnfg_del_adapt_layer to cfcnfg_disconn_adapt_layer
 o Fixed typo cfcfg to cfcnfg
 o Renamed linkid to channel_id
 o Updated documentation in caif_dev.h
 o Minor formatting changes
Signed-off-by: Sjur Braendeland <sjur.brandeland@stericsson.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

28 Apr, 2010 15 commits

bridge: multicast_flood cleanup · afe0159d

Authored by stephen hemminger on Apr 27, 2010

Move some declarations around to make it clearer which variables
are being used inside loop.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: multicast port group RCU fix · 83f6a740

Authored by stephen hemminger on Apr 27, 2010

The recently introduced bridge multicast port group list was only
partially using RCU correctly. It was missing rcu_dereference()
and missing the necessary barrier on deletion.

The code should have used one of the standard list methods (list or hlist)
instead of open coding an RCU-based linked list.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: multicast flood · 168d40ee

Authored by stephen hemminger on Apr 27, 2010

Fix unsafe usage of RCU. Would never work on Alpha SMP because
of lack of rcu_dereference()
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: simplify multicast_add_router · 7e80c124

Authored by stephen hemminger on Apr 27, 2010

By coding slightly differently, there are only two cases
to deal with: add at head and add after previous entry.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Revert "bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router()" · 709b9326

Authored by David S. Miller on Apr 27, 2010

This reverts commit ff65e827.

As explained by Stephen Hemminger, the traversal doesn't require
RCU handling as we hold a lock.

The list addition et al. calls, on the other hand, do.
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Use hlist_for_each_entry_rcu() in br_multicast_add_router() · ff65e827

Authored by David S. Miller on Apr 27, 2010

Noticed by Michał Mirosław.
Signed-off-by: David S. Miller <davem@davemloft.net>

net: disallow to use net_assign_generic externally · 05fceb4a

Authored by Jiri Pirko on Apr 23, 2010

Now there's no need to use this function directly because it's handled by
register_pernet_device. So to make this simple and easy to understand,
make this static so as not to tempt potential users.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_add_backlog() take rmem_alloc into account · c377411f

Authored by Eric Dumazet on Apr 27, 2010

The current socket backlog limit is not enough to really stop DDoS attacks,
because the user thread spends a lot of time processing a full backlog each
round, and the user might spin heavily on the socket lock.

We should add backlog size and receive_queue size (aka rmem_alloc) to
pace writers, and let the user run without being slowed down too much.

Introduce a sk_rcvqueues_full() helper, to avoid taking socket lock in
stress situations.

Under huge stress from a multiqueue/RPS enabled NIC, a single flow udp
receiver can now process ~200.000 pps (instead of ~100 pps before the
patch) on a 8 core machine.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
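
A rough user-space model of the check described in the commit above: the backlog size and the receive queue size (rmem_alloc) are summed and compared against a limit before any lock is taken. The struct, its field names, and the choice of the receive buffer size as the limit are assumptions made for illustration; only the helper name sk_rcvqueues_full() comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the relevant socket fields. */
struct demo_sock {
    unsigned int backlog_len;  /* bytes currently queued on the backlog */
    unsigned int rmem_alloc;   /* bytes already in the receive queue */
    unsigned int rcvbuf;       /* assumed limit: receive buffer size in bytes */
};

/*
 * Returns true when backlog plus receive queue plus the incoming packet
 * would exceed the limit, so the packet can be dropped without locking.
 */
static bool demo_rcvqueues_full(const struct demo_sock *sk, unsigned int truesize)
{
    unsigned int qsize = sk->backlog_len + sk->rmem_alloc;

    return qsize + truesize > sk->rcvbuf;
}
```

The design point is that the check reads counters only, so a flooded receiver can reject packets early instead of contending on the socket lock.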

net: batch skb dequeueing from softnet input_pkt_queue · 6e7676c1

Authored by Changli Gao on Apr 27, 2010

batch skb dequeueing from softnet input_pkt_queue to reduce potential lock
contention when RPS is enabled.

Note: in the worst case, the number of packets in a softnet_data may
be double of netdev_max_backlog.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Make RFS socket operations not be inet specific. · c58dc01b

Authored by David S. Miller on Apr 27, 2010

Idea from Eric Dumazet.

As for placement inside of struct sock, I tried to choose a place
that otherwise has a 32-bit hole on 64-bit systems.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>

net: reimplement softnet_data.output_queue as a FIFO queue · a9cbd588

Authored by Changli Gao on Apr 26, 2010

reimplement softnet_data.output_queue as a FIFO queue to keep the
fairness among the qdiscs rescheduled.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
----
 include/linux/netdevice.h |    1 +
 net/core/dev.c            |   22 ++++++++++++----------
 2 files changed, 13 insertions(+), 10 deletions(-)
Signed-off-by: David S. Miller <davem@davemloft.net>
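
One common way to turn a singly linked list into a FIFO with O(1) append is to keep a pointer to the last "next" slot. The sketch below models that technique in user-space C. All type and function names are hypothetical, and whether the patch above uses exactly this layout is an assumption.

```c
#include <stddef.h>

/* Hypothetical stand-in for a rescheduled qdisc. */
struct demo_qdisc {
    int id;
    struct demo_qdisc *next;
};

struct demo_output_queue {
    struct demo_qdisc *head;
    struct demo_qdisc **tailp;  /* address of the slot the next push fills */
};

static void demo_queue_init(struct demo_output_queue *q)
{
    q->head = NULL;
    q->tailp = &q->head;
}

/* Append at the tail so earlier entries are served first. */
static void demo_queue_push(struct demo_output_queue *q, struct demo_qdisc *n)
{
    n->next = NULL;
    *q->tailp = n;
    q->tailp = &n->next;
}

/* Remove from the head; reset tailp when the queue becomes empty. */
static struct demo_qdisc *demo_queue_pop(struct demo_output_queue *q)
{
    struct demo_qdisc *n = q->head;

    if (n == NULL)
        return NULL;
    q->head = n->next;
    if (q->head == NULL)
        q->tailp = &q->head;
    return n;
}
```

Compared with pushing at the head (LIFO), tail insertion keeps entries in arrival order, which is the fairness property the commit message refers to.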

TCP: avoid to send keepalive probes if receiving data · 6c37e5de

Authored by Flavio Leitner on Apr 26, 2010

RFC 1122 says the following:
...
  Keep-alive packets MUST only be sent when no data or
  acknowledgement packets have been received for the
  connection within an interval.
...

The acknowledgement packet is resetting the keepalive
timer but the data packet isn't. This patch fixes it by
checking the timestamp of the last received data packet
too when the keepalive timer expires.
Signed-off-by: Flavio Leitner <fleitner@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Signed-off-by: David S. Miller <davem@davemloft.net>
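
The rule described above, that a probe is due only when neither an ACK nor data has arrived within the keepalive interval, can be modeled as taking the smaller of the two idle times. This is a user-space sketch with hypothetical names and plain tick counters, not the kernel's actual timer code.

```c
#include <stdint.h>
#include <stdbool.h>

/*
 * Idle time is measured from the most recent event, whether that was an
 * ACK or a data packet. Unsigned subtraction keeps the result correct
 * across counter wraparound.
 */
static uint32_t demo_keepalive_elapsed(uint32_t now, uint32_t last_ack,
                                       uint32_t last_data)
{
    uint32_t since_ack = now - last_ack;
    uint32_t since_data = now - last_data;

    return since_ack < since_data ? since_ack : since_data;
}

/* A probe is due only if the connection was idle for the whole interval. */
static bool demo_should_probe(uint32_t now, uint32_t last_ack,
                              uint32_t last_data, uint32_t keepalive_time)
{
    return demo_keepalive_elapsed(now, last_ack, last_data) >= keepalive_time;
}
```

With only the ACK timestamp consulted, a peer that keeps sending data but no ACKs would look idle; adding the data timestamp closes that gap.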

bridge: multicast router list manipulation · dcdca2c4

Authored by stephen hemminger on Apr 27, 2010

I prefer that the hlist be only accessed through the hlist macro
objects. Explicit twiddling of links (especially with RCU) exposes
the code to future bugs.

Compile tested only.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: use is_multicast_ether_addr · 7180f775

Authored by stephen hemminger on Apr 27, 2010

Use existing inline function.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: Fix build of ipv6 multicast code. · d4c4f07d

Authored by David S. Miller on Apr 27, 2010

Based upon a report from Stephen Rothwell:

--------------------
net/bridge/br_multicast.c: In function 'br_ip6_multicast_alloc_query':
net/bridge/br_multicast.c:469: error: implicit declaration of function 'csum_ipv6_magic'

Introduced by commit 08b202b6 ("bridge
br_multicast: IPv6 MLD support") from the net tree.

csum_ipv6_magic is declared in net/ip6_checksum.h ...
--------------------
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Apr, 2010 5 commits

bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only. · 1fafc7a9

Authored by YOSHIFUJI Hideaki / 吉藤英明 on Apr 25, 2010

Even with commit 32dec5dd ("bridge
br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
not appropriately initialized if IGMP/MLD snooping support is
compiled and disabled, so we can see garbage.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge br_multicast: Ensure to initialize BR_INPUT_SKB_CB(skb)->mrouters_only. · 4eb8b903

Authored by YOSHIFUJI Hideaki / 吉藤英明 on Apr 25, 2010

Even with commit 32dec5dd ("bridge
br_multicast: Don't refer to BR_INPUT_SKB_CB(skb)->mrouters_only
without IGMP snooping."), BR_INPUT_SKB_CB(skb)->mrouters_only is
not appropriately initialized if IGMP snooping support is
compiled and disabled, so we can see garbage.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ieee802154: Fix oops during ieee802154_sock_ioctl · 93c0c8b4

Authored by Stefan Schmidt on Apr 26, 2010

Trying to run izlisten (from lowpan-tools tests) on a device that does not
exist, I got the oops below. The problem is that we are using dev_get_by_name
without checking if we really get a device back. We don't in this case, and
writing to dev->type generates this oops.

[Oops code removed by Dmitry Eremin-Solenikov]

If possible this patch should be applied to the current -rc fixes branch.
Signed-off-by: Stefan Schmidt <stefan@datenfreihafen.org>
Signed-off-by: Dmitry Eremin-Solenikov <dbaryshkov@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: use sk_sleep() · 4a4771a5

Authored by Eric Dumazet on Apr 25, 2010

Commit aa395145 (net: sk_sleep() helper) missed three files in the
conversion.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

phonet: use phonet_pernet instead of directly net_generic · 0db3f0f4

Authored by Jiri Pirko on Apr 26, 2010

As in pppoe, for example, introduce phonet_pernet and use it instead of calling
net_generic directly.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Apr, 2010 4 commits

net: ipmr: add support for dumping routing tables over netlink · cb6a4e46

Authored by Patrick McHardy on Apr 26, 2010

The ipmr /proc interface (ip_mr_cache) can't be extended to dump routes
from any tables but the main table in a backwards compatible fashion since
the output format ends in a variable amount of output interfaces.

Introduce a new netlink interface to dump multicast routes from all tables,
similar to the netlink interface for regular routes.
Signed-off-by: Patrick McHardy <kaber@trash.net>

net: rtnetlink: decouple rtnetlink address families from real address families · 25239cee

Authored by Patrick McHardy on Apr 26, 2010

Decouple rtnetlink address families from real address families in socket.h to
be able to add rtnetlink interfaces to code that is not a real address family
without increasing AF_MAX/NPROTO.

This will be used to add support for multicast route dumping from all tables
as the proc interface can't be extended to support anything but the main table
without breaking compatibility.

This partially undoes the patch to introduce independent families for routing
rules and converts ipmr routing rules to a new rtnetlink family. Similar to
that patch, values up to 127 are reserved for real address families, values
above that may be used arbitrarily.
Signed-off-by: Patrick McHardy <kaber@trash.net>

net: fib_rules: mark arguments to fib_rules_register const and __net_initdata · 3d0c9c4e

Authored by Patrick McHardy on Apr 26, 2010

fib_rules_register() duplicates the template passed to it without modification,
mark the argument as const. Additionally the templates are only needed when
instantiating a new namespace, so mark them as __net_initdata, which means
they can be discarded when CONFIG_NET_NS=n.
Signed-off-by: Patrick McHardy <kaber@trash.net>

ipv6: Fix inet6_csk_bind_conflict() · 6443bb1f

Authored by Eric Dumazet on Apr 25, 2010

Commit fda48a0d (tcp: bind() fix when many ports are bound)
introduced a bug on IPV6 part.
We should not call ipv6_addr_any(inet6_rcv_saddr(sk2)) but
ipv6_addr_any(inet6_rcv_saddr(sk)) because sk2 can be IPV4, while sk is
IPV6.
Reported-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Tested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Apr, 2010 2 commits

netns: rename unregister_pernet_subsys parameter · b3c981d2

Authored by Jiri Pirko on Apr 25, 2010

Stay consistent with the other functions and with the comment, and name
the pernet_operations parameter properly.
Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

rps: optimize rps_get_cpu() · 8c52d509

Authored by Changli Gao on Apr 24, 2010

Optimize rps_get_cpu().

Don't initialize ports when we can get the ports. One memory access
for ports rather than two.
Signed-off-by: Changli Gao <xiaosuo@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Apr, 2010 2 commits

IPv6: Complete IPV6_DONTFRAG support · 4b340ae2

Authored by Brian Haley on Apr 23, 2010

Finally add support to detect a local IPV6_DONTFRAG event
and return the relevant data to the user if they've enabled
IPV6_RECVPATHMTU on the socket.  The next recvmsg() will
return no data, but have an IPV6_PATHMTU as ancillary data.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

IPv6: Add dontfrag argument to relevant functions · 13b52cd4

Authored by Brian Haley on Apr 23, 2010

Add dontfrag argument to relevant functions for
IPV6_DONTFRAG support, as well as allowing the value
to be passed-in via ancillary cmsg data.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
