Commits · 8d5b2c084d2e71587e30a6ef528a8a8051e59dcd · openeuler / raspberrypi-kernel

24 Oct, 2009 6 commits

gre: convert hash tables locking to RCU · 8d5b2c08

Committed by Eric Dumazet on Oct 23, 2009

GRE tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
must already wait for an RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
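The pattern shared by these RCU conversions can be illustrated outside the kernel. What follows is a minimal userspace sketch under stated assumptions, not the code from this commit: C11 acquire/release atomics stand in for rcu_dereference() and rcu_assign_pointer(), and every name in it (tunnel, tunnel_hash, tunnel_link, tunnel_lookup, HASH_SIZE) is hypothetical. Removal is left out entirely; in the kernel an unlinked entry may only be freed after a grace period, which the commit message notes netdevice dismantle already waits for.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define HASH_SIZE 16 /* hypothetical table size, must be a power of two */

struct tunnel {
    uint32_t remote;                 /* lookup key */
    _Atomic(struct tunnel *) next;   /* chain pointer, read without a lock */
};

struct tunnel_hash {
    _Atomic(struct tunnel *) bucket[HASH_SIZE];
};

static unsigned int tunnel_slot(uint32_t remote)
{
    return (remote ^ (remote >> 4)) & (HASH_SIZE - 1);
}

/* Writer side (writers are serialized by the caller, e.g. with a mutex).
 * The entry is fully initialized first and then published with release
 * ordering, so a reader never sees a half-built node. This plays the role
 * of rcu_assign_pointer(). */
static void tunnel_link(struct tunnel_hash *h, struct tunnel *t)
{
    unsigned int slot = tunnel_slot(t->remote);
    struct tunnel *head = atomic_load_explicit(&h->bucket[slot],
                                               memory_order_relaxed);

    atomic_store_explicit(&t->next, head, memory_order_relaxed);
    atomic_store_explicit(&h->bucket[slot], t, memory_order_release);
}

/* Reader side: no lock is taken. The acquire loads play the role of
 * rcu_dereference(). */
static struct tunnel *tunnel_lookup(struct tunnel_hash *h, uint32_t remote)
{
    struct tunnel *t = atomic_load_explicit(&h->bucket[tunnel_slot(remote)],
                                            memory_order_acquire);

    while (t != NULL && t->remote != remote)
        t = atomic_load_explicit(&t->next, memory_order_acquire);
    return t;
}
```

Readers call tunnel_lookup() without taking any lock; only writers need mutual exclusion among themselves.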

ip6tnl: convert hash tables locking to RCU · 2922bc8a

Committed by Eric Dumazet on Oct 23, 2009

ip6_tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
must already wait for an RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipip: convert hash tables locking to RCU · 8f95dd63

Committed by Eric Dumazet on Oct 23, 2009

IPIP tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
must already wait for an RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

xfrm6_tunnel: RCU conversion · 91cc3bb0

Committed by Eric Dumazet on Oct 23, 2009

xfrm6_tunnels use one rwlock to protect their hash tables.

Plain and straightforward conversion to RCU locking to permit better SMP
performance.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 sit: RCU conversion phase II · 4543c10d

Committed by Eric Dumazet on Oct 23, 2009

SIT tunnels use one rwlock to protect their hash tables.

This locking scheme can be converted to RCU for free, since netdevice
must already wait for an RCU grace period at dismantle time.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6 sit: RCU conversion phase I · ef9a9d11

Committed by Eric Dumazet on Oct 23, 2009

SIT tunnels use one rwlock to protect their prl entries.

This first patch adds RCU locking for prl management,
with standard call_rcu() calls.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Oct, 2009 1 commit

pkt_sched: skbedit add support for setting mark · 1c55d62e

Committed by jamal on Oct 15, 2009

This adds support for setting the skb mark.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

22 Oct, 2009 1 commit

rtnetlink: rtnl_setlink() and rtnl_getlink() changes · a3d12891

Committed by Eric Dumazet on Oct 21, 2009

rtnl_getlink() & rtnl_setlink() run with RTNL held, so we can use the
__dev_get_by_index() and __dev_get_by_name() variants and avoid
dev_hold()/dev_put().

Also adds to rtnl_getlink() the capability to find a device by its name,
not only by its index.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

21 Oct, 2009 4 commits

net: Use sk_tx_queue_mapping for connected sockets · a4ee3ce3

Committed by Krishna Kumar on Oct 19, 2009

For connected sockets, the first run of dev_pick_tx saves the
calculated txq in sk_tx_queue_mapping. This is not saved if
either the device has a queue select or the socket is not
connected. Subsequent runs of dev_pick_tx use the cached value
of sk_tx_queue_mapping.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
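The caching rule described in this message can be sketched as follows. This is an illustration under stated assumptions rather than the kernel's dev_pick_tx(): the types sock_sketch and dev_sketch, the function pick_tx(), and the modulo used to spread flows over queues are all hypothetical stand-ins.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

struct sock_sketch {
    bool connected;         /* stand-in for "the socket is connected" */
    int  tx_queue_mapping;  /* -1 means nothing is cached yet */
};

struct dev_sketch {
    unsigned int num_tx_queues;
    /* optional driver hook; NULL when the device has no queue select */
    unsigned int (*select_queue)(const struct dev_sketch *dev,
                                 uint32_t flow_hash);
};

static unsigned int pick_tx(const struct dev_sketch *dev,
                            struct sock_sketch *sk, uint32_t flow_hash)
{
    /* Later calls on a socket with a recorded queue: reuse the cache. */
    if (sk != NULL && sk->tx_queue_mapping >= 0)
        return (unsigned int)sk->tx_queue_mapping;

    /* The driver chooses the queue; the result is not cached. */
    if (dev->select_queue != NULL)
        return dev->select_queue(dev, flow_hash);

    unsigned int txq = flow_hash % dev->num_tx_queues;

    /* First call on a connected socket: remember the computed queue. */
    if (sk != NULL && sk->connected)
        sk->tx_queue_mapping = (int)txq;
    return txq;
}
```

With this shape, an unconnected socket recomputes the queue on every packet, while a connected one pays for the computation once.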

net: Fix for dst_negative_advice · ea94ff3b

Committed by Krishna Kumar on Oct 19, 2009

dst_negative_advice() should check for changed dst and reset
sk_tx_queue_mapping accordingly. Pass sock to the callers of
dst_negative_advice.

(sk_reset_txq is defined just for use by dst_negative_advice. The
only way I could find to get around this is to move dst_negative_()
from dst.h to dst.c, include sock.h in dst.c, etc)

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: IPv6 changes · f04c8276

Committed by Krishna Kumar on Oct 19, 2009

IPv6: Reset sk_tx_queue_mapping when dst_cache is reset. Use existing
macro to do the work.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: Introduce sk_tx_queue_mapping · e022f0b4

Committed by Krishna Kumar on Oct 19, 2009

Introduce sk_tx_queue_mapping, and functions that set, test and
get this value. Reset sk_tx_queue_mapping to -1 whenever the dst
cache is set/reset, and in socket alloc. Setting txq to -1 and
using valid txq=<0 to n-1> allows the tx path to use the value
of sk_tx_queue_mapping directly instead of subtracting 1 on every
tx.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
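The encoding choice can be made concrete with a few helpers. This is a sketch with hypothetical names (txq_cache, txq_clear, txq_set, txq_recorded, txq_get), not the helpers added by the commit. The alternative the message alludes to would store txq + 1 and treat 0 as "unset", which forces a subtraction on every read; with -1 as the sentinel the stored value is the queue index itself.

```c
#include <stdbool.h>

struct txq_cache {
    int mapping;  /* -1: nothing recorded; otherwise a queue index in 0..n-1 */
};

/* Called wherever the cached route is set or reset, and at allocation. */
static void txq_clear(struct txq_cache *c)
{
    c->mapping = -1;
}

static void txq_set(struct txq_cache *c, int txq)
{
    c->mapping = txq;
}

static bool txq_recorded(const struct txq_cache *c)
{
    return c->mapping >= 0;
}

/* No "- 1" adjustment: the stored value is used directly on the tx path. */
static int txq_get(const struct txq_cache *c)
{
    return c->mapping;
}
```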

20 Oct, 2009 5 commits

filter: Add SKF_AD_QUEUE instruction · d19742fb

Committed by Eric Dumazet on Oct 20, 2009

It can be helpful to filter packets on their queue_mapping.

If filter performance is not good, we could add a "numqueue" field
in struct packet_type, so that netif_nit_deliver() and other functions
can directly ignore packets with an unexpected queue number.

Let's experiment with this simple filter extension first.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: mc_drop/flush_mclist changes · ad959e76

Committed by Eric Dumazet on Oct 16, 2009

We hold RTNL, so we can use __dev_get_by_index() instead of dev_get_by_index().

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_packet: Avoid cache line dirtying · 94b05952

Committed by Eric Dumazet on Oct 16, 2009

While doing multiple captures, I found af_packet was dirtying the cache line
containing its prot_hook.

This slows down machines where several cpus are necessary to handle capture
traffic, as each prot_hook is traversed for each packet coming into or out of
the host.

This patch moves "struct packet_type prot_hook" to the end of
packet_sock, and uses ____cacheline_aligned_in_smp to make sure
this remains shared by all cpus.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
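The layout idea can be shown in portable C11. This is only a sketch: alignas() stands in for the kernel's ____cacheline_aligned_in_smp, the 64-byte line size is an assumption, and the struct and field names other than prot_hook are hypothetical.

```c
#include <stdalign.h>
#include <stddef.h>

#define CACHE_LINE 64  /* assumed cache line size */

struct hook_sketch {             /* stand-in for struct packet_type */
    unsigned short type;
    void *af_packet_priv;
};

struct capture_sock_sketch {     /* stand-in for struct packet_sock */
    unsigned long rx_packets;    /* written for every received packet */
    unsigned long rx_drops;      /* written on drops */

    /* Read-mostly member, placed last and started on its own cache line so
     * that the counter updates above never dirty the line it lives in. */
    alignas(CACHE_LINE) struct hook_sketch prot_hook;
};
```

Because other cpus only read prot_hook while walking the hook list, keeping it away from frequently written fields lets its cache line stay shared.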

IP: Cleanups · 0eae750e

Committed by John Dykstra on Oct 19, 2009

Use symbols instead of magic constants while checking PMTU discovery
setsockopt.

Remove redundant test in ip_rt_frag_needed() (done by caller).

Signed-off-by: John Dykstra <john.dykstra1@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

pkt_sched: ingress socket filter by mark · 7e75f93e

Committed by jamal on Oct 19, 2009

Allow bpf to set a filter to drop packets that don't
match a specific mark.

Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
Signed-off-by: David S. Miller <davem@davemloft.net>

19 Oct, 2009 5 commits

xfrm: remove skb_icv_walk · eb2ff967

Committed by Steffen Klassert on Oct 07, 2009

The last users of skb_icv_walk are converted to ahash now,
so skb_icv_walk is unused and can be removed.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ah6: convert to ahash · 8631e9bd

Committed by Steffen Klassert on Oct 07, 2009

This patch converts ah6 to the new ahash interface.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ah4: convert to ahash · dff3bb06

Committed by Steffen Klassert on Oct 07, 2009

This patch converts ah4 to the new ahash interface.

Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sk_drops consolidation part 2 · 8edf19c2

Committed by Eric Dumazet on Oct 15, 2009

- skb_kill_datagram() can increment sk->sk_drops itself, instead of leaving it to callers.

- Frames dropped by UDP on IPv4 and IPv6 (because of a bad checksum or policy checks) increment sk_drops.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

inet: rename some inet_sock fields · c720c7e8

Committed by Eric Dumazet on Oct 15, 2009

In order to have better cache layouts of struct sock (separate zones
for rx/tx paths), we need this preliminary patch.

The goal is to transfer fields used at lookup time into the first
read-mostly cache line (inside struct sock_common) and move sk_refcnt
to a separate cache line (only written by rx path).

This patch adds inet_ prefix to daddr, rcv_saddr, dport, num, saddr,
sport and id fields. This allows a future patch to define these
fields as macros, like sk_refcnt, without name clashes.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
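Why the prefix matters once fields become macros can be seen in a small sketch. The struct names, the member name common, and the helper inet_match_sketch() are hypothetical; only the inet_ prefix and the idea of aliasing a field into struct sock_common come from the message above.

```c
#include <stdint.h>

struct sock_common_sketch {
    uint32_t skc_daddr;   /* lookup keys kept in the read-mostly part */
    uint16_t skc_dport;
};

struct inet_sock_sketch {
    struct sock_common_sketch common;
    uint32_t inet_saddr;  /* not moved: still a plain field */
};

/* With the inet_ prefix these aliases cannot collide with ordinary
 * identifiers. An unprefixed "#define daddr common.skc_daddr" would also
 * rewrite every local variable or parameter that happens to be named daddr. */
#define inet_daddr common.skc_daddr
#define inet_dport common.skc_dport

static int inet_match_sketch(const struct inet_sock_sketch *inet,
                             uint32_t daddr, uint16_t dport)
{
    /* daddr and dport are untouched parameters here, thanks to the prefix */
    return inet->inet_daddr == daddr && inet->inet_dport == dport;
}
```

Callers keep writing inet->inet_daddr whether the storage lives in the outer struct or has been moved into the common part.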

18 Oct, 2009 4 commits

genetlink: Optimize and one bug fix in genl_generate_id() · 988ade6b

Committed by Krishna Kumar on Oct 14, 2009

1. GENL_MIN_ID is a valid id -> no need to start at
   GENL_MIN_ID + 1.
2. Avoid going through the ids two times: if we start at
   GENL_MIN_ID+1 (*or higher*) and all ids are in use, the
   code iterates through the list twice (*or a little less*).
3. Simplify code - no need to start at idx=0 which gets
   reset to GENL_MIN_ID.

Patch on net-next-2.6. Reboot test shows that first id
passed to genl_register_family was 16, next two were
GENL_ID_GENERATE and genl_generate_id returned 17 & 18
(user level testing of same code shows expected values
across entire range of MIN/MAX).

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
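The three points above describe a single-pass, wrap-around id search. A minimal sketch of that idea follows; it is not the kernel function. The names generate_id() and id_in_use, the tiny MIN_ID..MAX_ID range, and the in-use table (standing in for a lookup of registered families) are all assumptions made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

#define MIN_ID 16  /* hypothetical lower bound; itself a valid id */
#define MAX_ID 19  /* hypothetical, deliberately tiny upper bound */

static bool id_in_use[MAX_ID + 1];

/* Return a free id in [MIN_ID, MAX_ID] and mark it used, or 0 if none is
 * left. The cursor starts at MIN_ID and each id is visited at most once. */
static uint16_t generate_id(void)
{
    static uint16_t cursor = MIN_ID;

    for (int i = 0; i <= MAX_ID - MIN_ID; i++) {
        uint16_t candidate = cursor;

        if (++cursor > MAX_ID)
            cursor = MIN_ID;   /* wrap around rather than restart at 0 */
        if (!id_in_use[candidate]) {
            id_in_use[candidate] = true;
            return candidate;
        }
    }
    return 0;                  /* every id is taken */
}
```

The loop bound is the size of the range, so an exhausted range costs one full scan and no more.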

genetlink: Optimize genl_register_family() · 93860b08

Committed by Krishna Kumar on Oct 14, 2009

genl_register_family() doesn't need to call genl_family_find_byid
when GENL_ID_GENERATE is passed during register.

Patch on net-next-2.6, compile and reboot testing only.

Signed-off-by: Krishna Kumar <krkumar2@in.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_iucv: remove duplicate sock_set_flag · 9a4ff8d4

Committed by Ursula Braun on Oct 14, 2009

Remove duplicate sock_set_flag(sk, SOCK_ZAPPED) in iucv_sock_close,
which was overlooked in September commit 7514bab0.

Cc: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

af_iucv: use sk functions to modify sk->sk_ack_backlog · 49f5eba7

Committed by Hendrik Brueckner on Oct 14, 2009

Instead of modifying sk->sk_ack_backlog directly, use respective
socket functions.

Signed-off-by: Hendrik Brueckner <brueckner@linux.vnet.ibm.com>
Signed-off-by: Ursula Braun <ursula.braun@de.ibm.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Oct, 2009 1 commit

Phonet: hold socket before giving it to sk_deliver_skb() · 21912d1c

Committed by Rémi Denis-Courmont on Oct 15, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

15 Oct, 2009 6 commits

net: sk_drops consolidation · 766e9037

Committed by Eric Dumazet on Oct 14, 2009

sock_queue_rcv_skb() can update sk_drops itself, removing need for
callers to take care of it. This is more consistent since
sock_queue_rcv_skb() also reads sk_drops when queueing a skb.

This adds sk_drops management to many protocols that did not handle it yet.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: forward incoming packets · 86a0a1e5

Committed by Rémi Denis-Courmont on Oct 14, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: route outgoing packets · aa6c45f3

Committed by Rémi Denis-Courmont on Oct 14, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: routing table Netlink interface · f062f41d

Committed by Rémi Denis-Courmont on Oct 14, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: routing table backend · 55748ac0

Committed by Rémi Denis-Courmont on Oct 14, 2009

The Phonet "universe" only has 64 addresses, so we keep a trivial flat
routing table.

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Phonet: deliver broadcast packets to broadcast sockets · f14001fc

Committed by Rémi Denis-Courmont on Oct 14, 2009

Signed-off-by: Rémi Denis-Courmont <remi.denis-courmont@nokia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

14 Oct, 2009 1 commit

net: Use netdev_alloc_skb_ip_align() · 89d71a66

Committed by Eric Dumazet on Oct 13, 2009

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Oct, 2009 6 commits

x25: bit and/or confusion in x25_ioctl()? · 06a96b33

Committed by roel kluin on Oct 07, 2009

Looking at commit ebc3f64b it appears that this was intended
and not the original, equivalent to `if (facilities.reverse & ~0x81)'.

In x25_parse_facilities() that patch changed how facilities->reverse
was set. No other bits were set than 0x80 and/or 0x01.

Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: replace ehash_size by ehash_mask · f373b53b

Committed by Eric Dumazet on Oct 09, 2009

Storing the mask (size - 1) instead of the size allows the fast path to be
a bit faster.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
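The saving is one subtraction per lookup, as the following sketch suggests. The struct and function names are hypothetical, and the bucket count is assumed to be a power of two, which is what makes masking equivalent to a modulo.

```c
#include <stdint.h>

struct ehash_sketch {
    unsigned int mask;  /* bucket count minus one; count is a power of two */
};

/* Fast path: a single AND. With a stored size this would have been
 * hash & (size - 1), recomputing the mask on every lookup. */
static unsigned int ehash_bucket(const struct ehash_sketch *t, uint32_t hash)
{
    return hash & t->mask;
}

/* Slow paths that need the bucket count can still recover it. */
static unsigned int ehash_buckets(const struct ehash_sketch *t)
{
    return t->mask + 1;
}
```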

ipv6: fix devconf after adding force_tllao option · c3faca05

Committed by Cosmin Ratiu on Oct 09, 2009

Signed-off-by: Cosmin Ratiu <cratiu@ixiacom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

udp: Fix udp_poll() and ioctl() · 85584672

Committed by Eric Dumazet on Oct 09, 2009

udp_poll() can in some circumstances drop frames with incorrect checksums.

The problem is that we now have to lock the socket while dropping frames, or risk
sk_forward corruption.

This bug has been present since commit 95766fff
([UDP]: Add memory accounting.)

While we are at it, we can correct ioctl(SIOCINQ) to also drop bad frames.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: fix tcp_defer_accept to consider the timeout · 6d01a026

Committed by Willy Tarreau on Oct 13, 2009

I was trying to use TCP_DEFER_ACCEPT and noticed that if the
client does not talk, the connection is never accepted and
remains in SYN_RECV state until the retransmits expire, where
it finally is deleted. This is bad when some firewall such as
netfilter sits between the client and the server because the
firewall sees the connection in ESTABLISHED state while the
server will finally silently drop it without sending an RST.

This behaviour contradicts the man page, which says it should
wait only for some time:

       TCP_DEFER_ACCEPT (since Linux 2.4)
          Allows a listener to be awakened only when data arrives
          on the socket. Takes an integer value (seconds), this
          can bound the maximum number of attempts TCP will
          make to complete the connection. This option should not
          be used in code intended to be portable.

Also, looking at ipv4/tcp.c, a retransmit counter is correctly
computed:

        case TCP_DEFER_ACCEPT:
                icsk->icsk_accept_queue.rskq_defer_accept = 0;
                if (val > 0) {
                        /* Translate value in seconds to number of
                         * retransmits */
                        while (icsk->icsk_accept_queue.rskq_defer_accept < 32 &&
                               val > ((TCP_TIMEOUT_INIT / HZ) <<
                                       icsk->icsk_accept_queue.rskq_defer_accept))
                                icsk->icsk_accept_queue.rskq_defer_accept++;
                        icsk->icsk_accept_queue.rskq_defer_accept++;
                }
                break;

==> rskq_defer_accept is used as a counter of retransmits.

But in tcp_minisocks.c, this counter is only checked. And in
fact, I have found no location which updates it. So I think
that what was intended was to decrease it in tcp_minisocks
whenever it is checked, which the trivial patch below does.

Signed-off-by: Willy Tarreau <w@1wt.eu>
Signed-off-by: David S. Miller <davem@davemloft.net>
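The seconds-to-retransmits translation quoted from ipv4/tcp.c can be tried in isolation. The sketch below mirrors that loop in a standalone function; the name defer_accept_budget() is hypothetical, and the 3-second value used for TCP_TIMEOUT_INIT / HZ is an assumption about the kernel of that period rather than something stated in the message.

```c
#define TIMEOUT_INIT_SECS 3u  /* assumed value of TCP_TIMEOUT_INIT / HZ */

/* Translate a TCP_DEFER_ACCEPT value in seconds into a number of
 * retransmits, following the loop quoted from ipv4/tcp.c. */
static unsigned int defer_accept_budget(int seconds)
{
    unsigned int budget = 0;

    if (seconds > 0) {
        /* Each retransmit doubles the timeout: 3s, 6s, 12s, ... */
        while (budget < 32 &&
               (unsigned long long)seconds >
                   ((unsigned long long)TIMEOUT_INIT_SECS << budget))
            budget++;
        budget++;
    }
    return budget;
}
```

Under that assumption a request of 1 to 3 seconds maps to one retransmit, 4 to 6 seconds to two, and 7 to 12 seconds to three.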

net: Introduce recvmmsg socket syscall · a2e27255

Committed by Arnaldo Carvalho de Melo on Oct 12, 2009

Meaning receive multiple messages, reducing the number of syscalls and
net stack entry/exit operations.

Next patches will introduce mechanisms where protocols that want to
optimize this operation will provide an unlocked_recvmsg operation.

This takes into account comments made by:

. Paul Moore: sock_recvmsg is called only for the first datagram,
  sock_recvmsg_nosec is used for the rest.

. Caitlin Bestler: recvmmsg now has a struct timespec timeout, that
  works in the same fashion as the ppoll one.

  If the underlying protocol returns a datagram with MSG_OOB set, this
  will make recvmmsg return right away with as many datagrams (+ the OOB
  one) it has received so far.

. Rémi Denis-Courmont & Steven Whitehouse: If we receive N < vlen
  datagrams and then recvmsg returns an error, recvmmsg will return
  the successfully received datagrams, store the error and return it
  in the next call.

This paves the way for a subsequent optimization, sk_prot->unlocked_recvmsg,
where we will be able to acquire the lock only at batch start and end, not at
every underlying recvmsg call.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
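A caller-side view of the new syscall might look like the sketch below. It assumes a Linux system with a C library that exposes recvmmsg() under _GNU_SOURCE; the helper recv_batch() and the BATCH and BUFSZ sizes are hypothetical choices for the example.

```c
#define _GNU_SOURCE
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

#define BATCH 8     /* hypothetical number of datagrams per call */
#define BUFSZ 256   /* hypothetical per-datagram buffer size */

/* Receive up to BATCH datagrams from fd with a single system call.
 * bufs[i] receives datagram i and lens[i] its length.
 * Returns the number of datagrams received, or -1 on error. */
static int recv_batch(int fd, char bufs[BATCH][BUFSZ],
                      unsigned int lens[BATCH])
{
    struct mmsghdr msgs[BATCH];
    struct iovec iov[BATCH];

    memset(msgs, 0, sizeof(msgs));
    for (int i = 0; i < BATCH; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len = BUFSZ;
        msgs[i].msg_hdr.msg_iov = &iov[i];
        msgs[i].msg_hdr.msg_iovlen = 1;
    }

    /* MSG_DONTWAIT: take what is queued right now; NULL: no timeout */
    int n = recvmmsg(fd, msgs, BATCH, MSG_DONTWAIT, NULL);

    for (int i = 0; i < n; i++)
        lens[i] = msgs[i].msg_len;
    return n;
}
```

If fewer than BATCH datagrams are queued, the call returns the ones it did receive, which matches the partial-result behaviour described in the message.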