Commits · 5e6700b3bf98fe98d630bf9c939ad4c85ce95592 · openeuler / raspberrypi-kernel

28 Jun, 2013 2 commits

Authored by Nicolas Dichtel on Jun 26, 2013

This patch allows switching the netns when a packet is encapsulated or
decapsulated. In other words, the encapsulated packet is received in one netns,
where the lookup is done to find the tunnel. Once the tunnel is found, the
packet is decapsulated and injected into the corresponding interface, which
resides in another netns.

When one of the two netns is removed, the tunnel is destroyed.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e6700b3

dev: introduce skb_scrub_packet() · 621e84d6

Authored by Nicolas Dichtel on Jun 26, 2013

The goal of this new function is to perform all needed cleanup before sending
an skb into another netns.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

27 Jun, 2013 2 commits

ipv6: rearm router solicitation timer when setting new tokenized address · 77ecaace

Authored by Hannes Frederic Sowa on Jun 26, 2013

When a new tokenized address gets installed we send out just one
router solicitation. We should send out `rtr_solicits' of them in case one
router advertisement gets lost.

So, rearm the timer as we do in addrconf_dad_complete.

Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: fix 4in4 + IPsec scenario · 963b89e8

Authored by Nicolas Dichtel on Jun 26, 2013

Since commit 32b8a8e5 "sit: add IPv4 over IPv4 support",
tunnel->parms.iph.protocol is 0 when both 4in4 and 6in4 are set up, but
xfrm_lookup() is called only when proto is != 0, thus we need to pass the real
value.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Jun, 2013 8 commits

net: poll/select low latency socket support · 2d48d67f

Authored by Eliezer Tamir on Jun 24, 2013

select/poll busy-poll support.

Split the sysctl value into two separate ones, one for read and one for poll,
and update Documentation/sysctl/net.txt accordingly.

Add a new poll flag POLL_LL. When this flag is set, sock_poll will call
sk_poll_ll if possible. sock_poll sets this flag in its return value
to indicate to select/poll when a socket that can busy poll is found.

When poll/select have nothing to report, call the low-level
sock_poll again until we are out of time or we find something.

Once the system call finds something, it stops setting POLL_LL, so it can
return the result to the user ASAP.
Signed-off-by: Eliezer Tamir <eliezer.tamir@linux.intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
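The retry behaviour described in this message can be modelled in a few lines of Python. This is only a sketch of the control flow, not kernel code: `busy_poll`, `poll_once`, and `budget_s` are hypothetical names, and the time budget merely stands in for the sysctl-controlled limit.

```python
import time


def busy_poll(sockets, poll_once, budget_s):
    """Re-poll every socket until one is ready or the time budget runs out.

    poll_once(sock) is a hypothetical callback standing in for the
    low-level poll; it returns True when the socket has something to report.
    """
    deadline = time.monotonic() + budget_s
    while True:
        ready = [s for s in sockets if poll_once(s)]
        # Stop spinning as soon as anything is ready, or when out of time.
        if ready or time.monotonic() >= deadline:
            return ready


# Usage: socket "b" only reports data on the third pass.
passes = {"a": 0, "b": 0}


def fake_poll(sock):
    passes[sock] += 1
    return sock == "b" and passes[sock] >= 3


result = busy_poll(["a", "b"], fake_poll, budget_s=1.0)
```

The loop returns as soon as any socket is ready, which mirrors the point that the system call stops busy polling once it has something to hand back to the user.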

net: sctp: simplify sctp_get_port · 62208f12

Authored by Daniel Borkmann on Jun 25, 2013

No need to have an extra ret variable when we can directly return
the value of sctp_get_port_local().
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: decouple cleaning some socket data from endpoint · 0a2fbac1

Authored by Daniel Borkmann on Jun 25, 2013

Rather than having the endpoint clean the garbage from the
socket, use a sk_destruct handler, sctp_destruct_sock(), that does
the job when there are no more references on the socket.
At least do this for our crypto transform through crypto_free_hash()
that is allocated when in listening state.

Also, perform sctp_put_port() only when sk is valid. At a later
point in time we can still determine if there's an option of
placing this into sk_prot->unhash() or sctp_endpoint_free() without
any races. For now, leave it in sctp_endpoint_destroy() though.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: minor: sctp_seq_dump_local_addrs add missing newline · b527fe69

Authored by Daniel Borkmann on Jun 25, 2013

A trailing newline was missing from the WARN() message.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: sctp: migrate cookie life from timeval to ktime · 52db882f

Authored by Daniel Borkmann on Jun 25, 2013

Currently, SCTP code defines its own timeval functions (since timeval
is rarely used inside the kernel by others), namely tv_lt() and
TIMEVAL_ADD() macros, that operate on SCTP cookie expiration.

We might as well remove all those, and operate directly on ktime
structures for a couple of reasons: ktime is available on all archs;
complexity of ktime calculations depending on the arch is less than
(reduces to simple arithmetic operations on archs with
BITS_PER_LONG == 64 or CONFIG_KTIME_SCALAR) or equal to timeval
functions (other archs); code becomes more readable; macros can be
thrown out.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Acked-by: Vlad Yasevich <vyasevich@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
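To illustrate the readability argument, here is a small Python model (not kernel code) contrasting two-field timeval arithmetic with a single scalar nanosecond count. `Timeval`, `timeval_add`, `timeval_lt`, and `to_ns` are hypothetical names chosen for this sketch; they only mimic the role of tv_lt() and TIMEVAL_ADD() and of scalar ktime values.

```python
from typing import NamedTuple

USEC_PER_SEC = 1_000_000
NSEC_PER_USEC = 1_000


class Timeval(NamedTuple):
    sec: int
    usec: int


def timeval_add(a, b):
    """Add two timevals, carrying microsecond overflow into seconds."""
    sec = a.sec + b.sec
    usec = a.usec + b.usec
    if usec >= USEC_PER_SEC:
        sec += 1
        usec -= USEC_PER_SEC
    return Timeval(sec, usec)


def timeval_lt(a, b):
    """Compare seconds first, then microseconds."""
    return (a.sec, a.usec) < (b.sec, b.usec)


def to_ns(tv):
    """Collapse a timeval into one scalar nanosecond count."""
    return (tv.sec * USEC_PER_SEC + tv.usec) * NSEC_PER_USEC


now = Timeval(100, 900_000)
cookie_life = Timeval(60, 200_000)

# Two-field style: needs helpers for both the addition and the comparison.
expiry_tv = timeval_add(now, cookie_life)
expired_tv = timeval_lt(expiry_tv, now)

# Scalar style: plain integer arithmetic and comparison.
expiry_ns = to_ns(now) + to_ns(cookie_life)
expired_ns = expiry_ns < to_ns(now)
```

Both styles agree on the expiry time; the scalar form simply needs no carry handling or field-by-field comparison.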

ipv6: remove old token ipv6 address as soon as possible · 2b9651d7

Authored by Hannes Frederic Sowa on Jun 24, 2013

If the tokenized IP address is re-set on an interface we depend on the
arrival of a new router advertisement to call addrconf_verify to clean
up the old address (whose valid_lft is now set to 0). Old addresses can
linger around for a longer time if e.g. the source of router advertisements
vanishes.

So, call addrconf_verify immediately after setting the new tokenized
address to get rid of the old tokenized addresses.

Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: don't disable interface if last ipv6 address is removed · 876fd05d

Authored by Hannes Frederic Sowa on Jun 24, 2013

The reason behind this change is that as soon as we delete
the last ipv6 address of an interface we also lose the
/proc/sys/net/ipv6/conf/<interface> directory. This seems to be a
usability problem to me.

I don't see any reason why we should shut down ipv6 on that interface in
such cases.

Cc: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: split duplicate address detection and router solicitation timer · b7b1bfce

Authored by Hannes Frederic Sowa on Jun 23, 2013

This patch splits the timers for duplicate address detection and router
solicitations apart. The router solicitations timer goes into inet6_dev
and the dad timer stays in inet6_ifaddr.

The reason behind this patch is to reduce the number of unneeded router
solicitations sent out by the host if additional link-local addresses
are created. Currently we send out RS for every link-local address on
an interface.

If the RS timer fires we pick a source address with ipv6_get_lladdr. This
change could hurt people adding additional link-local addresses and
specifying these addresses in the radvd clients section because we
no longer guarantee that we use every ll address as source address in
router solicitations.

Cc: Flavio Leitner <fleitner@redhat.com>
Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
Cc: David Stevens <dlstevens@us.ibm.com>
Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Reviewed-by: Flavio Leitner <fbl@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Jun, 2013 2 commits

ipv6: add include file to suppress sparse warnings · 6da334ee

Authored by Eric Dumazet on Jun 25, 2013

Commit f88c91dd ("ipv6: statically link
register_inet6addr_notifier()") added the following sparse warnings:

net/ipv6/addrconf_core.c:83:5: warning: symbol
'register_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:89:5: warning: symbol
'unregister_inet6addr_notifier' was not declared. Should it be static?
net/ipv6/addrconf_core.c:95:5: warning: symbol
'inet6addr_notifier_call_chain' was not declared. Should it be static?
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: netlink: virtual tap device management · bcbde0d4

Authored by Daniel Borkmann on Jun 21, 2013

Similarly to the networking receive path with ptype_all taps, we add
the possibility to register netdevices of type ARPHRD_NETLINK with
the netlink subsystem, so that they can be used by netlink analyzers
or debuggers. We do not offer a direct callback function as out-of-tree
modules could do crap with it. Instead, a netdevice must be registered
properly and only receives a clone, managed by the netlink layer. Symbols
are exported as GPL-only.
Signed-off-by: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Jun, 2013 9 commits

net: Unmap fragment page once iterator is done · aeb193ea

Authored by Wedson Almeida Filho on Jun 23, 2013

Callers of skb_seq_read() are currently forced to call skb_abort_seq_read()
even when consuming all the data because the last call to skb_seq_read() (the
one that returns 0 to indicate the end) fails to unmap the last fragment page.

With this patch callers will be allowed to traverse the SKB data by calling
skb_prepare_seq_read() once and repeatedly calling skb_seq_read() as originally
intended (and documented in the original commit 677e90ed), that is, only call
skb_abort_seq_read() if the sequential read is actually aborted.
Signed-off-by: Wedson Almeida Filho <wedsonaf@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
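The calling convention restored by this patch can be pictured with a toy Python reader. It is a sketch only and does not reproduce the skb_seq_read() signature: `SeqReader`, `read`, `abort`, and `read_all` are hypothetical names, and "mapped" is just an attribute standing in for a mapped fragment page.

```python
class SeqReader:
    """Toy sequential reader over a list of fragment buffers."""

    def __init__(self, fragments):
        self.fragments = list(fragments)
        self.next_index = 0
        self.mapped = None  # fragment currently "mapped", if any

    def read(self):
        """Return the next chunk, or b"" once all data has been consumed."""
        self.mapped = None  # release the previous fragment first
        if self.next_index == len(self.fragments):
            return b""  # end of data: nothing is left mapped
        self.mapped = self.fragments[self.next_index]
        self.next_index += 1
        return self.mapped

    def abort(self):
        """Release the current fragment when stopping early."""
        self.mapped = None


def read_all(reader):
    """Drain the reader; no abort() is needed after the final read()."""
    chunks = []
    while True:
        chunk = reader.read()
        if not chunk:
            break
        chunks.append(chunk)
    return b"".join(chunks)


reader = SeqReader([b"head", b"frag1", b"frag2"])
data = read_all(reader)

# Stopping early is the only case that still needs an explicit abort().
partial = SeqReader([b"x", b"y"])
first = partial.read()
partial.abort()
```

In this model the terminating read itself releases the last fragment, so a caller that consumes everything never has to call the abort helper.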

openvswitch: Use correct config guard. · 479b1a58

Authored by Pravin B Shelar on Jun 20, 2013

This bug was introduced by commit aa310701
(openvswitch: Add gre tunnel support.)
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bridge: fix a typo in comments · 7c77602f

Authored by Cong Wang on Jun 21, 2013

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

net: allow large number of tx queues · 60877a32

Authored by Eric Dumazet on Jun 20, 2013

netif_alloc_netdev_queues() uses kcalloc() to allocate memory
for the "struct netdev_queue *_tx" array.

For a large number of tx queues, kcalloc() might fail, so this
patch does a fallback to vzalloc().

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Fix VSOCK_HASH and VSOCK_CONN_HASH · a49dd9dc

Authored by Asias He on Jun 20, 2013

If we mod with VSOCK_HASH_SIZE - 1, we get 0, 1, ..., 249. However, we
have vsock_bind_table[0 ... 250] and vsock_connected_table[0 ... 250],
so in this case the last entry will never be used.

We should mod with VSOCK_HASH_SIZE instead.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
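The off-by-one is easy to see with a small Python model of the modulo step. This is an illustration rather than the kernel macro itself: `VSOCK_HASH_SIZE = 251` is an assumption inferred from the table bounds quoted in the message, and `buckets_reached` is a hypothetical helper.

```python
# Assumed value, inferred from the [0 ... 250] bounds in the commit message.
VSOCK_HASH_SIZE = 251


def buckets_reached(modulus, samples=10_000):
    """Return the set of bucket indices hit when reducing 0..samples-1."""
    return {value % modulus for value in range(samples)}


old = buckets_reached(VSOCK_HASH_SIZE - 1)  # modulus used before the fix
new = buckets_reached(VSOCK_HASH_SIZE)      # modulus used after the fix

print(max(old), len(old))  # highest bucket and bucket count before the fix
print(max(new), len(new))  # highest bucket and bucket count after the fix
```

Under these assumptions the old modulus never reaches bucket 250, while the new one covers every hashed bucket.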

VSOCK: Remove unnecessary label · 0fc93246

Authored by Asias He on Jun 20, 2013

Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Return VMCI_ERROR_NO_MEM when fails to allocate skb · dce1a287

Authored by Asias He on Jun 20, 2013

vmci_transport_recv_dgram_cb() always returns VMCI_SUCCESS even if we fail
to allocate an skb; return VMCI_ERROR_NO_MEM instead.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

VSOCK: Introduce vsock_auto_bind helper · b3a6dfe8

Authored by Asias He on Jun 20, 2013

This piece of code is called three times, let's have a helper for it.
Signed-off-by: Asias He <asias@redhat.com>
Acked-by: Andy King <acking@vmware.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: remove a useless pr_info() in addrconf_gre_config() · b33698e2

Authored by Cong Wang on Jun 20, 2013

This is debug info and should at least be pr_debug(), but given
that this code has been upstream for two years, there is no
need to keep this debugging printk any more, so just remove it.

Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Jun, 2013 17 commits

inet: frag , remove an empty ifdef. · af92e542

Authored by Rami Rosen on Jun 15, 2013

This patch removes an empty ifdef from inet_frag_intern()
in net/ipv4/inet_fragment.c.

Commit b67bfe0d
(hlist: drop the node parameter from iterators) removed hlist from
net/ipv4/inet_fragment.c, but did not remove the enclosing ifdef,
which is now empty.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

htb: refactor struct htb_sched fields for performance · c9364636

Authored by Eric Dumazet on Jun 15, 2013

htb_sched structures are big, and a source of false sharing on SMP.

Every time a packet is queued or dequeued, many cache lines must be
touched because structures are not laid out properly.

By carefully splitting htb_sched in two parts, and defining sub-structures
to increase data locality, we can improve performance dramatically on
SMP.

New htb_prio structure can also be used in htb_class to increase data
locality.

I got a 26% performance increase on a 24-thread machine, with 200
concurrent netperf in TCP_RR mode, using a HTB hierarchy of 4 classes.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp: introduce a per-route knob for quick ack · bcefe17c

Authored by Cong Wang on Jun 15, 2013

In previous discussions, I tried to find some reasonable heuristics
for delayed ACK, however this seems not possible, according to Eric:

	"ACKS might also be delayed because of bidirectional
	traffic, and is more controlled by the application
	response time. TCP stack can not easily estimate it."

	"ACK can be incredibly useful to recover from losses in
	a short time.

	The vast majority of TCP sessions are small lived, and we
	send one ACK per received segment anyway at beginning or
	retransmits to let the sender smoothly increase its cwnd,
	so an auto-tuning facility wont help them that much."

and according to David:

	"ACKs are the only information we have to detect loss.

	And, for the same reasons that TCP VEGAS is fundamentally
	broken, we cannot measure the pipe or some other
	receiver-side-visible piece of information to determine
	when it's "safe" to stretch ACK.

	And even if it's "safe", we should not do it so that losses are
	accurately detected and we don't spuriously retransmit.

	The only way to know when the bandwidth increases is to
	"test" it, by sending more and more packets until drops happen.
	That's why all successful congestion control algorithms must
	operate on explicited tested pieces of information.

	Similarly, it's not really possible to universally know if
	it's safe to stretch ACK or not."

It still makes sense to enable or disable quick ack mode, as the
TCP_QUICKACK socket option does.

This knob is similar to the TCP_QUICKACK option, but is meant for people
who can't modify the source code and still want to control
TCP delayed ACK behavior. As David suggested, this should have
per-path scope, since different paths may want different
behaviors.

Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Rick Jones <rick.jones2@hp.com>
Cc: Stephen Hemminger <stephen@networkplumber.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Thomas Graf <tgraf@suug.ch>
Cc: David Laight <David.Laight@ACULAB.COM>
Signed-off-by: Cong Wang <amwang@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sctp: Convert __list_for_each use to list_for_each · 2c0740e4

Authored by Dave Jones on Jun 17, 2013

Signed-off-by: Dave Jones <davej@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

tcp:typo unset should be unsent · 9ef71e0c

Authored by Weiping Pan on Jun 18, 2013

Signed-off-by: Weiping Pan <wpan@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

sit: fix an oops when IFLA_IPTUN_PROTO is not set · c2ff682a

Authored by Nicolas Dichtel on Jun 19, 2013

The use of this attribute was added in 32b8a8e5 (sit: add IPv4 over
IPv4 support). It is optional; by default proto is IPPROTO_IPV6.
Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: disallow un-init_net to change thresh of neigh · dc25c676

Authored by Gao feng on Jun 20, 2013

thresh and interval are global resources;
only init_net can change them.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: only allow init_net to change the default neigh_parms · 170d6f99

Authored by Gao feng on Jun 20, 2013

Though we don't export the /proc/sys/net/ipv[4,6]/neigh/default/
directory to the un-init_net, a command such as
"ip ntable change name arp_cache locktime 129" can still be used there
to change the locktime of the default neigh_parms.

This patch prevents the un-init_net from finding neigh_table.parms,
so the un-init_net can no longer influence the init_net.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

neigh: no need to call lookup_neigh_parms in neigh_parms_alloc · cf89d6b2

Authored by Gao feng on Jun 20, 2013

neigh_table.parms always exists and is initialized, so kmemdup
can use it to create the new neigh_parms; lookup_neigh_parms
here would return neigh_table.parms anyway.
Signed-off-by: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Add gre tunnel support. · aa310701

Authored by Pravin B Shelar on Jun 17, 2013

Add gre vport implementation. Most of the gre protocol processing
is pushed to the gre module. It makes use of the gre demultiplexer,
so it can co-exist with linux device based gre tunnels.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Optimize flow key match for non tunnel flows. · a3e82996

Authored by Pravin B Shelar on Jun 17, 2013

This patch adds a start offset for sw_flow-key, so that we can
skip tunneling information in the key for non-tunnel flows.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Expand action buffer size. · ffe3f432

Authored by Pravin B Shelar on Jun 17, 2013

MAX_ACTIONS_BUFSIZE limits the action list size. The set tunnel action
needs extra space on the action list, so for now increase the max action
list limit.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Add tunneling interface. · 7d5437c7

Authored by Pravin B Shelar on Jun 17, 2013

Add ovs tunnel interface for set tunnel action for userspace.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

openvswitch: Copy individual actions. · 74f84a57

Authored by Pravin B Shelar on Jun 17, 2013

Rather than validating actions and then copying all actions
in one block, this patch does the same operation in a single pass,
validating and copying actions one by one. This is required for
the ovs tunneling patch.

This patch does not change any functionality.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Acked-by: Jesse Gross <jesse@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnel: push generic protocol handling to ip_tunnel module. · 3d7b46cd

Authored by Pravin B Shelar on Jun 17, 2013

Process the skb tunnel header before sending the packet to the protocol
handler. This allows code sharing between gre and ovs gre modules.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ip_tunnels: extend iptunnel_xmit() · 0e6fbc5b

Authored by Pravin B Shelar on Jun 17, 2013

Refactor various ip tunnels xmit functions and extend iptunnel_xmit()
so that there is more code sharing.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

gre: export gre_handle_offloads() function. · 45f2e997

Authored by Pravin B Shelar on Jun 17, 2013

This is required for OVS GRE offloading.
Signed-off-by: Pravin B Shelar <pshelar@nicira.com>
Signed-off-by: David S. Miller <davem@davemloft.net>