提交 · 469bdcefdc47a69028029e792ff1e80680c867b9 · openeuler / raspberrypi-kernel

01 1月, 2014 1 次提交

vlan: Fix header ops passthru when doing TX VLAN offload. · 2205369a

由 David S. Miller 提交于 12月 31, 2013

When the vlan code detects that the real device can do TX VLAN offloads
in hardware, it tries to arrange for the real device's header_ops to
be invoked directly.

But it does so illegally, by simply hooking the real device's
header_ops up to the VLAN device.

This doesn't work because we will end up invoking a set of header_ops
routines which expect a device type which matches the real device, but
will see a VLAN device instead.

Fix this by providing a pass-thru set of header_ops which will arrange
to pass the proper real device instead.

To facilitate this add a dev_rebuild_header().  There are
implementations which provide a ->cache and ->create but not a
->rebuild (f.e. PLIP).  So we need a helper function just like
dev_hard_header() to avoid crashes.

Use this helper in the one existing place where the
header_ops->rebuild was being invoked, the neighbour code.

With lots of help from Florian Westphal.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2205369a

18 12月, 2013 2 次提交

net: allow netdev_all_upper_get_next_dev_rcu with rtnl lock held · 85328240

由 John Fastabend 提交于 11月 26, 2013

It is useful to be able to walk all upper devices when bringing
a device online where the RTNL lock is held. In this case it
is safe to walk the all_adj_list because the RTNL lock is used
to protect the write side as well.

This patch adds a check to see if the rtnl lock is held before
throwing a warning in netdev_all_upper_get_next_dev_rcu().

Also because we now have a call site for lockdep_rtnl_is_held()
outside COFIG_LOCK_PROVING an inline definition returning 1 is
needed. Similar to the rcu_read_lock_is_held().

Fixes: 2a47fa45 ("ixgbe: enable l2 forwarding acceleration for macvlans")
CC: Veaceslav Falico <vfalico@redhat.com>
Reported-by: NYuanhan Liu <yuanhan.liu@linux.intel.com>
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Tested-by: NPhil Schmitt <phillip.j.schmitt@intel.com>
Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>

85328240

neigh: Netlink notification for administrative NUD state change · 53385d2d

由 Bob Gilligan 提交于 12月 15, 2013

The neighbour code sends up an RTM_NEWNEIGH netlink notification if
the NUD state of a neighbour cache entry is changed by a timer (e.g.
from REACHABLE to STALE), even if the lladdr of the entry has not
changed.

But an administrative change to the the NUD state of a neighbour cache
entry that does not change the lladdr (e.g. via "ip -4 neigh change
...  nud ...") does not trigger a netlink notification.  This means
that netlink listeners will not hear about administrative NUD state
changes such as from a resolved state to PERMANENT.

This patch changes the neighbor code to generate an RTM_NEWNEIGH
message when the NUD state of an entry is changed administratively.
Signed-off-by: NBob Gilligan <gilligan@aristanetworks.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

53385d2d

11 12月, 2013 1 次提交

net: unix: allow set_peek_off to fail · 12663bfc

由 Sasha Levin 提交于 12月 07, 2013

unix_dgram_recvmsg() will hold the readlock of the socket until recv
is complete.

In the same time, we may try to setsockopt(SO_PEEK_OFF) which will hang until
unix_dgram_recvmsg() will complete (which can take a while) without allowing
us to break out of it, triggering a hung task spew.

Instead, allow set_peek_off to fail, this way userspace will not hang.
Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
Acked-by: NPavel Emelyanov <xemul@parallels.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

12663bfc

10 12月, 2013 1 次提交

net: drop_monitor: fix the value of maxattr · d323e92c

由 Changli Gao 提交于 12月 08, 2013

maxattr in genl_family should be used to save the max attribute
type, but not the max command type. Drop monitor doesn't support
any attributes, so we should leave it as zero.
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d323e92c

06 12月, 2013 1 次提交

net: clear local_df when passing skb between namespaces · 239c78db

由 Hannes Frederic Sowa 提交于 12月 05, 2013

We must clear local_df when passing the skb between namespaces as the
packet is not local to the new namespace any more and thus may not get
fragmented by local rules. Fred Templin noticed that other namespaces
do fragment IPv6 packets while forwarding. Instead they should have send
back a PTB.

The same problem should be present when forwarding DF-IPv4 packets
between namespaces.
Reported-by: NTemplin, Fred L <Fred.L.Templin@boeing.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

239c78db

02 12月, 2013 1 次提交

{pktgen, xfrm} Update IPv4 header total len and checksum after tranformation · 3868204d

由 fan.du 提交于 12月 01, 2013

commit a553e4a6 ("[PKTGEN]: IPSEC support")
tried to support IPsec ESP transport transformation for pktgen, but acctually
this doesn't work at all for two reasons(The orignal transformed packet has
bad IPv4 checksum value, as well as wrong auth value, reported by wireshark)

- After transpormation, IPv4 header total length needs update,
  because encrypted payload's length is NOT same as that of plain text.

- After transformation, IPv4 checksum needs re-caculate because of payload
  has been changed.

With this patch, armmed pktgen with below cofiguration, Wireshark is able to
decrypted ESP packet generated by pktgen without any IPv4 checksum error or
auth value error.

pgset "flag IPSEC"
pgset "flows 1"
Signed-off-by: NFan Du <fan.du@windriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3868204d

22 11月, 2013 1 次提交

gso: handle new frag_list of frags GRO packets · 9d8506cc

由 Herbert Xu 提交于 11月 21, 2013

Recently GRO started generating packets with frag_lists of frags.
This was not handled by GSO, thus leading to a crash.

Thankfully these packets are of a regular form and are easy to
handle.  This patch handles them in two ways.  For completely
non-linear frag_list entries, we simply continue to iterate over
the frag_list frags once we exhaust the normal frags.  For frag_list
entries with linear parts, we call pskb_trim on the first part
of the frag_list skb, and then process the rest of the frags in
the usual way.

This patch also kills a chunk of dead frag_list code that has
obviously never ever been run since it ends up generating a bogus
GSO-segmented packet with a frag_list entry.

Future work is planned to split super big packets into TSO
ones.

Fixes: 8a29111c ("net: gro: allow to build full sized skb")
Reported-by: NChristoph Paasch <christoph.paasch@uclouvain.be>
Reported-by: NJerry Chu <hkchu@google.com>
Reported-by: NSander Eikelenboom <linux@eikelenboom.it>
Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Tested-by: NSander Eikelenboom <linux@eikelenboom.it>
Tested-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9d8506cc

21 11月, 2013 2 次提交

net: rework recvmsg handler msg_name and msg_namelen logic · f3d33426

由 Hannes Frederic Sowa 提交于 11月 21, 2013

This patch now always passes msg->msg_namelen as 0. recvmsg handlers must
set msg_namelen to the proper size <= sizeof(struct sockaddr_storage)
to return msg_name to the user.

This prevents numerous uninitialized memory leaks we had in the
recvmsg handlers and makes it harder for new code to accidentally leak
uninitialized memory.

Optimize for the case recvfrom is called with NULL as address. We don't
need to copy the address at all, so set it to NULL before invoking the
recvmsg handler. We can do so, because all the recvmsg handlers must
cope with the case a plain read() is called on them. read() also sets
msg_name to NULL.

Also document these changes in include/linux/net.h as suggested by David
Miller.

Changes since RFC:

Set msg->msg_name = NULL if user specified a NULL in msg_name but had a
non-null msg_namelen in verify_iovec/verify_compat_iovec. This doesn't
affect sendto as it would bail out earlier while trying to copy-in the
address. It also more naturally reflects the logic by the callers of
verify_iovec.

With this change in place I could remove "
if (!uaddr || msg_sys->msg_namelen == 0)
	msg->msg_name = NULL
".

This change does not alter the user visible error logic as we ignore
msg_namelen as long as msg_name is NULL.

Also remove two unnecessary curly brackets in ___sys_recvmsg and change
comments to netdev style.

Cc: David Miller <davem@davemloft.net>
Suggested-by: NEric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f3d33426

net: core: Always propagate flag changes to interfaces · d2615bf4

由 Vlad Yasevich 提交于 11月 19, 2013

The following commit:
    b6c40d68
    net: only invoke dev->change_rx_flags when device is UP

tried to fix a problem with VLAN devices and promiscuouse flag setting.
The issue was that VLAN device was setting a flag on an interface that
was down, thus resulting in bad promiscuity count.
This commit blocked flag propagation to any device that is currently
down.

A later commit:
    deede2fa
    vlan: Don't propagate flag changes on down interfaces

fixed VLAN code to only propagate flags when the VLAN interface is up,
thus fixing the same issue as above, only localized to VLAN.

The problem we have now is that if we have create a complex stack
involving multiple software devices like bridges, bonds, and vlans,
then it is possible that the flags would not propagate properly to
the physical devices.  A simple examle of the scenario is the
following:

  eth0----> bond0 ----> bridge0 ---> vlan50

If bond0 or eth0 happen to be down at the time bond0 is added to
the bridge, then eth0 will never have promisc mode set which is
currently required for operation as part of the bridge.  As a
result, packets with vlan50 will be dropped by the interface.

The only 2 devices that implement the special flag handling are
VLAN and DSA and they both have required code to prevent incorrect
flag propagation.  As a result we can remove the generic solution
introduced in b6c40d68 and leave
it to the individual devices to decide whether they will block
flag propagation or not.
Reported-by: NStefan Priebe <s.priebe@profihost.ag>
Suggested-by: NVeaceslav Falico <vfalico@redhat.com>
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d2615bf4

20 11月, 2013 4 次提交

genetlink: make multicast groups const, prevent abuse · 2a94fe48

由 Johannes Berg 提交于 11月 19, 2013

Register generic netlink multicast groups as an array with
the family and give them contiguous group IDs. Then instead
of passing the global group ID to the various functions that
send messages, pass the ID relative to the family - for most
families that's just 0 because the only have one group.

This avoids the list_head and ID in each group, adding a new
field for the mcast group ID offset to the family.

At the same time, this allows us to prevent abusing groups
again like the quota and dropmon code did, since we can now
check that a family only uses a group it owns.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2a94fe48

genetlink: pass family to functions using groups · 68eb5503

由 Johannes Berg 提交于 11月 19, 2013

This doesn't really change anything, but prepares for the
next patch that will change the APIs to pass the group ID
within the family, rather than the global group ID.
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68eb5503

drop_monitor/genetlink: use proper genetlink multicast APIs · e5dcecba

由 Johannes Berg 提交于 11月 19, 2013

The drop monitor code is abusing the genetlink API and is
statically using the generic netlink multicast group 1, even
if that group belongs to somebody else (which it invariably
will, since it's not reserved.)

Make the drop monitor code use the proper APIs to reserve a
group ID, but also reserve the group id 1 in generic netlink
code to preserve the userspace API. Since drop monitor can
be a module, don't clear the bit for it on unregistration.
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e5dcecba

genetlink: only pass array to genl_register_family_with_ops() · c53ed742

由 Johannes Berg 提交于 11月 19, 2013

As suggested by David Miller, make genl_register_family_with_ops()
a macro and pass only the array, evaluating ARRAY_SIZE() in the
macro, this is a little safer.

The openvswitch has some indirection, assing ops/n_ops directly in
that code. This might ultimately just assign the pointers in the
family initializations, saving the struct genl_family_and_ops and
code (once mcast groups are handled differently.)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c53ed742

16 11月, 2013 1 次提交

macvlan: disable LRO on lower device instead of macvlan · 529d0489

由 Michal Kubeček 提交于 11月 15, 2013

A macvlan device has always LRO disabled so that calling
dev_disable_lro() on it does nothing. If we need to disable LRO
e.g. because

  - the macvlan device is inserted into a bridge
  - IPv6 forwarding is enabled for it
  - it is in a different namespace than lowerdev and IPv4
    forwarding is enabled in it

we need to disable LRO on its underlying device instead (as we
do for 802.1q VLAN devices).

v2: use newly introduced netif_is_macvlan()
Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

529d0489

15 11月, 2013 1 次提交

genetlink: make all genl_ops users const · 4534de83

由 Johannes Berg 提交于 11月 14, 2013

Now that genl_ops are no longer modified in place when
registering, they can be made const. This patch was done
mostly with spatch:

@@
identifier ops;
@@
+const
 struct genl_ops ops[] = {
 ...
 };

(except the struct thing in net/openvswitch/datapath.c)
Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4534de83

14 11月, 2013 1 次提交

core/dev: do not ignore dmac in dev_forward_skb() · 81b9eab5

由 Alexei Starovoitov 提交于 11月 12, 2013

commit 06a23fe3
("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
and refactoring 64261f23
("dev: move skb_scrub_packet() after eth_type_trans()")

are forcing pkt_type to be PACKET_HOST when skb traverses veth.

which means that ip forwarding will kick in inside netns
even if skb->eth->h_dest != dev->dev_addr

Fix order of eth_type_trans() and skb_scrub_packet() in dev_forward_skb()
and in ip_tunnel_rcv()

Fixes: 06a23fe3 ("core/dev: set pkt_type after eth_type_trans() in dev_forward_skb()")
CC: Isaku Yamahata <yamahatanetdev@gmail.com>
CC: Maciej Zenczykowski <zenczykowski@gmail.com>
CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

81b9eab5

11 11月, 2013 1 次提交

netfilter: push reasm skb through instead of original frag skbs · 6aafeef0

由 Jiri Pirko 提交于 11月 06, 2013

Pushing original fragments through causes several problems. For example
for matching, frags may not be matched correctly. Take following
example:

<example>
On HOSTA do:
ip6tables -I INPUT -p icmpv6 -j DROP
ip6tables -I INPUT -p icmpv6 -m icmp6 --icmpv6-type 128 -j ACCEPT

and on HOSTB you do:
ping6 HOSTA -s2000    (MTU is 1500)

Incoming echo requests will be filtered out on HOSTA. This issue does
not occur with smaller packets than MTU (where fragmentation does not happen)
</example>

As was discussed previously, the only correct solution seems to be to use
reassembled skb instead of separete frags. Doing this has positive side
effects in reducing sk_buff by one pointer (nfct_reasm) and also the reams
dances in ipvs and conntrack can be removed.

Future plan is to remove net/ipv6/netfilter/nf_conntrack_reasm.c
entirely and use code in net/ipv6/reassembly.c instead.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NJulian Anastasov <ja@ssi.bg>
Signed-off-by: NMarcelo Ricardo Leitner <mleitner@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6aafeef0

09 11月, 2013 2 次提交

net: Fix "ip rule delete table 256" · 13eb2ab2

由 Andreas Henriksson 提交于 11月 07, 2013

When trying to delete a table >= 256 using iproute2 the local table
will be deleted.
The table id is specified as a netlink attribute when it needs more then
8 bits and iproute2 then sets the table field to RT_TABLE_UNSPEC (0).
Preconditions to matching the table id in the rule delete code
doesn't seem to take the "table id in netlink attribute" into condition
so the frh_get_table helper function never gets to do its job when
matching against current rule.
Use the helper function twice instead of peaking at the table value directly.

Originally reported at: http://bugs.debian.org/724783Reported-by: NNicolas HICHER <nhicher@avencall.com>
Signed-off-by: NAndreas Henriksson <andreas@fatal.se>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

13eb2ab2

net: flow_dissector: small optimizations in IPv4 dissect · 3797d3e8

由 Eric Dumazet 提交于 11月 07, 2013

By moving code around, we avoid :

1) A reload of iph->ihl (bit field, so needs a mask)

2) A conditional test (replaced by a conditional mov on x86)
   Fast path loads iph->protocol anyway.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3797d3e8

08 11月, 2013 3 次提交

net: skbuff - kernel-doc fixes · bc32383c

由 Mathias Krause 提交于 11月 07, 2013

Use "@" to refer to parameters in the kernel-doc description. According
to Documentation/kernel-doc-nano-HOWTO.txt "&" shall be used to refer to
structures only.
Signed-off-by: NMathias Krause <mathias.krause@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bc32383c

net: move pskb_put() to core code · 0c7ddf36

由 Mathias Krause 提交于 11月 07, 2013

This function has usage beside IPsec so move it to the core skbuff code.
While doing so, give it some documentation and change its return type to
'unsigned char *' to be in line with skb_put().
Signed-off-by: NMathias Krause <mathias.krause@secunet.com>
Cc: Steffen Klassert <steffen.klassert@secunet.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0c7ddf36

net: Add layer 2 hardware acceleration operations for macvlan devices · a6cc0cfa

由 John Fastabend 提交于 11月 06, 2013

Add a operations structure that allows a network interface to export
the fact that it supports package forwarding in hardware between
physical interfaces and other mac layer devices assigned to it (such
as macvlans). This operaions structure can be used by virtual mac
devices to bypass software switching so that forwarding can be done
in hardware more efficiently.
Signed-off-by: NJohn Fastabend <john.r.fastabend@intel.com>
Signed-off-by: NNeil Horman <nhorman@tuxdriver.com>
CC: Andy Gospodarek <andy@greyhouse.net>
CC: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a6cc0cfa

05 11月, 2013 2 次提交

net: introduce skb_coalesce_rx_frag() · f8e617e1

由 Jason Wang 提交于 11月 01, 2013

Sometimes we need to coalesce the rx frags to avoid frag list. One example is
virtio-net driver which tries to use small frags for both MTU sized packet and
GSO packet. So this patch introduce skb_coalesce_rx_frag() to do this.

Cc: Rusty Russell <rusty@rustcorp.com.au>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Michael Dalton <mwdalton@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Acked-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f8e617e1

net: checksum: fix warning in skb_checksum · cea80ea8

由 Daniel Borkmann 提交于 11月 04, 2013

This patch fixes a build warning in skb_checksum() by wrapping the
csum_partial() usage in skb_checksum(). The problem is that on a few
architectures, csum_partial is used with prefix asmlinkage whereas
on most architectures it's not. So fix this up generically as we did
with csum_block_add_ext() to match the signature. Introduced by
2817a336 ("net: skb_checksum: allow custom update/combine for
walking skb").
Reported-by: NFengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

cea80ea8

04 11月, 2013 2 次提交

net: extend net_device allocation to vmalloc() · 74d332c1

由 Eric Dumazet 提交于 10月 30, 2013

Joby Poriyath provided a xen-netback patch to reduce the size of
xenvif structure as some netdev allocation could fail under
memory pressure/fragmentation.

This patch is handling the problem at the core level, allowing
any netdev structures to use vmalloc() if kmalloc() failed.

As vmalloc() adds overhead on a critical network path, add __GFP_REPEAT
to kzalloc() flags to do this fallback only when really needed.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Reported-by: NJoby Poriyath <joby.poriyath@citrix.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

74d332c1

net: skb_checksum: allow custom update/combine for walking skb · 2817a336

由 Daniel Borkmann 提交于 10月 30, 2013

Currently, skb_checksum walks over 1) linearized, 2) frags[], and
3) frag_list data and calculats the one's complement, a 32 bit
result suitable for feeding into itself or csum_tcpudp_magic(),
but unsuitable for SCTP as we're calculating CRC32c there.

Hence, in order to not re-implement the very same function in
SCTP (and maybe other protocols) over and over again, use an
update() + combine() callback internally to allow for walking
over the skb with different algorithms.
Signed-off-by: NDaniel Borkmann <dborkman@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2817a336

02 11月, 2013 1 次提交

net: flow_dissector: fail on evil iph->ihl · 6f092343

由 Jason Wang 提交于 11月 01, 2013

We don't validate iph->ihl which may lead a dead loop if we meet a IPIP
skb whose iph->ihl is zero. Fix this by failing immediately when iph->ihl
is evil (less than 5).

This issue were introduced by commit ec5efe79
(rps: support IPIP encapsulation).

Cc: Eric Dumazet <edumazet@google.com>
Cc: Petr Matousek <pmatouse@redhat.com>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Daniel Borkmann <dborkman@redhat.com>
Signed-off-by: NJason Wang <jasowang@redhat.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6f092343

29 10月, 2013 3 次提交
- Z
  net, mc: fix the incorrect comments in two mc-related functions · cdfb97bc
  由 Zhi Yong Wu 提交于 10月 28, 2013
```
Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  cdfb97bc
- Z
  net, iovec: fix the incorrect comment in memcpy_fromiovecend() · ab1a2d77
  由 Zhi Yong Wu 提交于 10月 28, 2013
```
Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  ab1a2d77
- Z
  net, datagram: fix the incorrect comment in zerocopy_sg_from_iovec() · c4e819d1
  由 Zhi Yong Wu 提交于 10月 28, 2013
```
Signed-off-by: NZhi Yong Wu <wuzhy@linux.vnet.ibm.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>
```
  c4e819d1
26 10月, 2013 5 次提交

netpoll: fix rx_hook() interface by passing the skb · 8fb479a4

由 Antonio Quartulli 提交于 10月 23, 2013

Right now skb->data is passed to rx_hook() even if the skb
has not been linearised and without giving rx_hook() a way
to linearise it.

Change the rx_hook() interface and make it accept the skb
and the offset to the UDP payload as arguments. rx_hook() is
also renamed to rx_skb_hook() to ensure that out of the tree
users notice the API change.

In this way any rx_skb_hook() implementation can perform all
the needed operations to properly (and safely) access the
skb data.
Signed-off-by: NAntonio Quartulli <antonio@meshcoding.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8fb479a4

net: fix rtnl notification in atomic context · 7f294054

由 Alexei Starovoitov 提交于 10月 23, 2013

commit 991fb3f7 "dev: always advertise rx_flags changes via netlink"
introduced rtnl notification from __dev_set_promiscuity(),
which can be called in atomic context.

Steps to reproduce:
ip tuntap add dev tap1 mode tap
ifconfig tap1 up
tcpdump -nei tap1 &
ip tuntap del dev tap1 mode tap

[  271.627994] device tap1 left promiscuous mode
[  271.639897] BUG: sleeping function called from invalid context at mm/slub.c:940
[  271.664491] in_atomic(): 1, irqs_disabled(): 0, pid: 3394, name: ip
[  271.677525] INFO: lockdep is turned off.
[  271.690503] CPU: 0 PID: 3394 Comm: ip Tainted: G        W    3.12.0-rc3+ #73
[  271.703996] Hardware name: System manufacturer System Product Name/P8Z77 WS, BIOS 3007 07/26/2012
[  271.731254]  ffffffff81a58506 ffff8807f0d57a58 ffffffff817544e5 ffff88082fa0f428
[  271.760261]  ffff8808071f5f40 ffff8807f0d57a88 ffffffff8108bad1 ffffffff81110ff8
[  271.790683]  0000000000000010 00000000000000d0 00000000000000d0 ffff8807f0d57af8
[  271.822332] Call Trace:
[  271.838234]  [<ffffffff817544e5>] dump_stack+0x55/0x76
[  271.854446]  [<ffffffff8108bad1>] __might_sleep+0x181/0x240
[  271.870836]  [<ffffffff81110ff8>] ? rcu_irq_exit+0x68/0xb0
[  271.887076]  [<ffffffff811a80be>] kmem_cache_alloc_node+0x4e/0x2a0
[  271.903368]  [<ffffffff810b4ddc>] ? vprintk_emit+0x1dc/0x5a0
[  271.919716]  [<ffffffff81614d67>] ? __alloc_skb+0x57/0x2a0
[  271.936088]  [<ffffffff810b4de0>] ? vprintk_emit+0x1e0/0x5a0
[  271.952504]  [<ffffffff81614d67>] __alloc_skb+0x57/0x2a0
[  271.968902]  [<ffffffff8163a0b2>] rtmsg_ifinfo+0x52/0x100
[  271.985302]  [<ffffffff8162ac6d>] __dev_notify_flags+0xad/0xc0
[  272.001642]  [<ffffffff8162ad0c>] __dev_set_promiscuity+0x8c/0x1c0
[  272.017917]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.033961]  [<ffffffff8162b109>] dev_set_promiscuity+0x29/0x50
[  272.049855]  [<ffffffff8172e937>] packet_dev_mc+0x87/0xc0
[  272.065494]  [<ffffffff81732052>] packet_notifier+0x1b2/0x380
[  272.080915]  [<ffffffff81731ea5>] ? packet_notifier+0x5/0x380
[  272.096009]  [<ffffffff81761c66>] notifier_call_chain+0x66/0x150
[  272.110803]  [<ffffffff8108503e>] __raw_notifier_call_chain+0xe/0x10
[  272.125468]  [<ffffffff81085056>] raw_notifier_call_chain+0x16/0x20
[  272.139984]  [<ffffffff81620190>] call_netdevice_notifiers_info+0x40/0x70
[  272.154523]  [<ffffffff816201d6>] call_netdevice_notifiers+0x16/0x20
[  272.168552]  [<ffffffff816224c5>] rollback_registered_many+0x145/0x240
[  272.182263]  [<ffffffff81622641>] rollback_registered+0x31/0x40
[  272.195369]  [<ffffffff816229c8>] unregister_netdevice_queue+0x58/0x90
[  272.208230]  [<ffffffff81547ca0>] __tun_detach+0x140/0x340
[  272.220686]  [<ffffffff81547ed6>] tun_chr_close+0x36/0x60
Signed-off-by: NAlexei Starovoitov <ast@plumgrid.com>
Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

7f294054

net: initialize hashrnd in flow_dissector with net_get_random_once · 66415cf8

由 Hannes Frederic Sowa 提交于 10月 23, 2013

We also can defer the initialization of hashrnd in flow_dissector
to its first use. Since net_get_random_once is irq safe now we don't
have to audit the call paths if one of this functions get called by an
interrupt handler.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

66415cf8

net: make net_get_random_once irq safe · f84be2bd

由 Hannes Frederic Sowa 提交于 10月 23, 2013

I initial build non irq safe version of net_get_random_once because I
would liked to have the freedom to defer even the extraction process of
get_random_bytes until the nonblocking pool is fully seeded.

I don't think this is a good idea anymore and thus this patch makes
net_get_random_once irq safe. Now someone using net_get_random_once does
not need to care from where it is called.

Cc: David S. Miller <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f84be2bd

net: add missing dev_put() in __netdev_adjacent_dev_insert · 974daef7

由 Nikolay Aleksandrov 提交于 10月 23, 2013

I think that a dev_put() is needed in the error path to preserve the
proper dev refcount.

CC: Veaceslav Falico <vfalico@redhat.com>
Signed-off-by: NNikolay Aleksandrov <nikolay@redhat.com>
Acked-by: NVeaceslav Falico <vfalico@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

974daef7

24 10月, 2013 1 次提交

net: always inline net_secret_init · 34d92d53

由 Hannes Frederic Sowa 提交于 10月 23, 2013

Currently net_secret_init does not get inlined, so we always have a call
to net_secret_init even in the fast path.

Let's specify net_secret_init as __always_inline so we have the nop in
the fast-path without the call to net_secret_init and the unlikely path
at the epilogue of the function.

jump_labels handle the inlining correctly.
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

34d92d53

23 10月, 2013 1 次提交

net: remove function sk_reset_txq() · 0a6957e7

由 ZHAO Gang 提交于 10月 22, 2013

What sk_reset_txq() does is just calls function sk_tx_queue_reset(),
and sk_reset_txq() is used only in sock.h, by dst_negative_advice().
Let dst_negative_advice() calls sk_tx_queue_reset() directly so we
can remove unneeded sk_reset_txq().
Signed-off-by: NZHAO Gang <gamerh2o@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0a6957e7

22 10月, 2013 1 次提交

ipv6: sit: add GSO/TSO support · 61c1db7f

由 Eric Dumazet 提交于 10月 20, 2013

Now ipv6_gso_segment() is stackable, its relatively easy to
implement GSO/TSO support for SIT tunnels

Performance results, when segmentation is done after tunnel
device (as no NIC is yet enabled for TSO SIT support) :

Before patch :

lpq84:~# ./netperf -H 2002:af6:1153:: -Cc
MIGRATED TCP STREAM TEST from ::0 (::) port 0 AF_INET6 to 2002:af6:1153:: () port 0 AF_INET6
Recv Send Send Utilization Service Demand
Socket Socket Message Elapsed Send Recv Send Recv
Size Size Size Time Throughput local remote local remote
bytes bytes bytes secs. 10^6bits/s % S % S us/KB us/KB

87380 16384 16384 10.00 3168.31 4.81 4.64 2.988 2.877

After patch :

87380 16384 16384 10.00 5525.00 7.76 5.17 2.763 1.840
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

61c1db7f

20 10月, 2013 1 次提交

net: switch net_secret key generation to net_get_random_once · e34c9a69

由 Hannes Frederic Sowa 提交于 10月 19, 2013

Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: NHannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e34c9a69