提交 · e3d8fabee3b66ce158b2603f270479b84b6e4ba7 · openanolis / cloud-kernel

08 12月, 2012 1 次提交

net: call notifiers for mtu change even if iface is not up · e3d8fabe

由 Jiri Pirko 提交于 12月 03, 2012

Do the same thing as in set mac. Call notifiers every time.
Signed-off-by: NJiri Pirko <jiri@resnulli.us>
Acked-by: NNeil Horman <nhorman@tuxdriver.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

e3d8fabe

05 12月, 2012 1 次提交

net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD · 4e66ae2e

由 Serge Hallyn 提交于 12月 03, 2012

When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
nothing to ns1.

This patch changes that behavior so that when moving a nic from ns1 to ns2, we
send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
sent to ns2).

The effects of this can be seen when starting and stopping containers in
an upstart based host.  Lxc will create a pair of veth nics, the kernel
sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
one nic is moved to the container, because no KOBJ_REMOVED event is
received, the network-instance job for that veth never goes away.  This
was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
With this patch the networ-instance jobs properly go away.

The other oddness solved here is that if a nic is passed into a running
upstart-based container, without this patch no network-instance job is
started in the container.  But when the container creates a new nic
itself (ip link add new type veth) then network-interface jobs are
created.  With this patch, behavior comes in line with a regular host.

v2: also send KOBJ_ADD to new netns.  There will then be a
_MOVE event from the device_rename() call, but that should
be innocuous.
Signed-off-by: NSerge Hallyn <serge.hallyn@canonical.com>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NDaniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

4e66ae2e

30 11月, 2012 1 次提交

core: make GRO methods static. · bb728820

由 Rami Rosen 提交于 11月 28, 2012

This patch changes three methods to be static and removes their
EXPORT_SYMBOLs in core/dev.c and their external declaration in
netdevice.h. The methods, dev_gro_receive(), napi_frags_finish() and
napi_skb_finish(), which are in the GRO rx path, are not used
outside core/dev.c.
Signed-off-by: NRami Rosen <ramirose@gmail.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

bb728820

27 11月, 2012 1 次提交

sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name · c91f6df2

由 Brian Haley 提交于 11月 26, 2012

Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
will then require another call like if_indextoname() to get the actual interface
name, have it return the name directly.

This also matches the existing man page description on socket(7) which mentions
the argument being an interface name.

If the value has not been set, zero is returned and optlen will be set to zero
to indicate there is no interface name present.

Added a seqlock to protect this code path, and dev_ifname(), from someone
changing the device name via dev_change_name().

v2: Added seqlock protection while copying device name.

v3: Fixed word wrap in patch.
Signed-off-by: NBrian Haley <brian.haley@hp.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c91f6df2

21 11月, 2012 1 次提交

net: Remove redundant null check before kfree in dev.c · 388dfc2d

由 Sachin Kamat 提交于 11月 20, 2012

kfree on a null pointer is a no-op.
Signed-off-by: NSachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

388dfc2d

19 11月, 2012 1 次提交

net: Allow userns root control of the core of the network stack. · 5e1fccc0

由 Eric W. Biederman 提交于 11月 16, 2012

Allow an unpriviled user who has created a user namespace, and then
created a network namespace to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is hardware NIC
that has been explicity moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow ethtool ioctls.

Allow binding to network devices.
Allow setting the socket mark.
Allow setting the socket priority.

Allow setting the network device alias via sysfs.
Allow setting the mtu via sysfs.
Allow changing the network device flags via sysfs.
Allow setting the network device group via sysfs.

Allow the following network device ioctls.
SIOCGMIIPHY
SIOCGMIIREG
SIOCSIFNAME
SIOCSIFFLAGS
SIOCSIFMETRIC
SIOCSIFMTU
SIOCSIFHWADDR
SIOCSIFSLAVE
SIOCADDMULTI
SIOCDELMULTI
SIOCSIFHWBROADCAST
SIOCSMIIREG
SIOCBONDENSLAVE
SIOCBONDRELEASE
SIOCBONDSETHWADDR
SIOCBONDCHANGEACTIVE
SIOCBRADDIF
SIOCBRDELIF
SIOCSHWTSTAMP
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

5e1fccc0

17 11月, 2012 2 次提交

net-rps: Fix brokeness causing OOO packets · baefa31d

由 Tom Herbert 提交于 11月 16, 2012

In commit c445477d which adds aRFS to the kernel, the CPU
selected for RFS is not set correctly when CPU is changing.
This is causing OOO packets and probably other issues.
Signed-off-by: NTom Herbert <therbert@google.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Acked-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

baefa31d

net: use right lock in __dev_remove_offload · c53aa505

由 Eric Dumazet 提交于 11月 16, 2012

offload_base is protected by offload_lock, not ptype_lock
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c53aa505

16 11月, 2012 3 次提交

net: Remove code duplication between offload structures · f191a1d1

由 Vlad Yasevich 提交于 11月 15, 2012

Move the offload callbacks into its own structure.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

f191a1d1

net: Switch to using the new packet offload infrustructure · 22061d80

由 Vlad Yasevich 提交于 11月 15, 2012

Convert to using the new GSO/GRO registration mechanism and new
packet offload structure.
Signed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

22061d80

net: Add generic packet offload infrastructure. · 62532da9

由 Vlad Yasevich 提交于 11月 15, 2012

Create a new data structure to contain the GRO/GSO callbacks and add
a new registration mechanism.
Singed-off-by: NVlad Yasevich <vyasevic@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

62532da9

08 11月, 2012 1 次提交

af-packet: fix oops when socket is not present · a3d744e9

由 Eric Leblond 提交于 11月 06, 2012

Due to a NULL dereference, the following patch is causing oops
in normal trafic condition:

commit c0de08d0
Author: Eric Leblond <eric@regit.org>
Date:   Thu Aug 16 22:02:58 2012 +0000

    af_packet: don't emit packet on orig fanout group

This buggy patch was a feature fix and has reached most stable
branches.

When skb->sk is NULL and when packet fanout is used, there is a
crash in match_fanout_group where skb->sk is accessed.
This patch fixes the issue by returning false as soon as the
socket is NULL: this correspond to the wanted behavior because
the kernel as to resend the skb to all the listening socket in
this case.
Signed-off-by: NEric Leblond <eric@regit.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

a3d744e9

22 10月, 2012 1 次提交

net:dev: remove double indentical assignment in dev_change_net_namespace(). · 47b70db5

由 Rami Rosen 提交于 10月 19, 2012

This patch removes double assignment of err to -EINVAL in dev_change_net_namespace().
Signed-off-by: NRami Rosen <ramirose@gmail.com>
Acked-by: NSerge E. Hallyn <serge.hallyn@ubuntu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

47b70db5

09 10月, 2012 2 次提交

vlan: don't deliver frames for unknown vlans to protocols · 48cc32d3

由 Florian Zumbiehl 提交于 10月 07, 2012

6a32e4f9 made the vlan code skip marking
vlan-tagged frames for not locally configured vlans as PACKET_OTHERHOST if
there was an rx_handler, as the rx_handler could cause the frame to be received
on a different (virtual) vlan-capable interface where that vlan might be
configured.

As rx_handlers do not necessarily return RX_HANDLER_ANOTHER, this could cause
frames for unknown vlans to be delivered to the protocol stack as if they had
been received untagged.

For example, if an ipv6 router advertisement that's tagged for a locally not
configured vlan is received on an interface with macvlan interfaces attached,
macvlan's rx_handler returns RX_HANDLER_PASS after delivering the frame to the
macvlan interfaces, which caused it to be passed to the protocol stack, leading
to ipv6 addresses for the announced prefix being configured even though those
are completely unusable on the underlying interface.

The fix moves marking as PACKET_OTHERHOST after the rx_handler so the
rx_handler, if there is one, sees the frame unchanged, but afterwards,
before the frame is delivered to the protocol stack, it gets marked whether
there is an rx_handler or not.
Signed-off-by: NFlorian Zumbiehl <florz@florz.de>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

48cc32d3

net: gro: selective flush of packets · 2e71a6f8

由 Eric Dumazet 提交于 10月 06, 2012

Current GRO can hold packets in gro_list for almost unlimited
time, in case napi->poll() handler consumes its budget over and over.

In this case, napi_complete()/napi_gro_flush() are not called.

Another problem is that gro_list is flushed in non friendly way :
We scan the list and complete packets in the reverse order.
(youngest packets first, oldest packets last)
This defeats priorities that sender could have cooked.

Since GRO currently only store TCP packets, we dont really notice the
bug because of retransmits, but this behavior can add unexpected
latencies, particularly on mice flows clamped by elephant flows.

This patch makes sure no packet can stay more than 1 ms in queue, and
only in stress situations.

It also complete packets in the right order to minimize latencies.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Cc: Jesse Gross <jesse@nicira.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Yuchung Cheng <ycheng@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2e71a6f8

08 10月, 2012 1 次提交

net: gro: fix a potential crash in skb_gro_reset_offset · ca07e43e

由 Eric Dumazet 提交于 10月 06, 2012

Before accessing skb first fragment, better make sure there
is one.

This is probably not needed for old kernels, since an ethernet frame
cannot contain only an ethernet header, but the recent GRO addition
to tunnels makes this patch needed.

Also skb_gro_reset_offset() can be static, it actually allows
compiler to inline it.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ca07e43e

02 10月, 2012 1 次提交

net: add gro_cells infrastructure · c9e6bc64

由 Eric Dumazet 提交于 9月 27, 2012

This adds a new include file (include/net/gro_cells.h), to bring GRO
(Generic Receive Offload) capability to tunnels, in a modular way.

Because tunnels receive path is lockless, and GRO adds a serialization
using a napi_struct, I chose to add an array of up to
DEFAULT_MAX_NUM_RSS_QUEUES cells, so that multi queue devices wont be
slowed down because of GRO layer.

skb_get_rx_queue() is used as selector.

In the future, we might add optional fanout capabilities, using rxhash
for example.

With help from Ben Hutchings who reminded me
netif_get_num_default_rss_queues() function.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c9e6bc64

21 9月, 2012 1 次提交

net: do not disable sg for packets requiring no checksum · c0d680e5

由 Ed Cashin 提交于 9月 19, 2012

A change in a series of VLAN-related changes appears to have
inadvertently disabled the use of the scatter gather feature of
network cards for transmission of non-IP ethernet protocols like ATA
over Ethernet (AoE).  Below is a reference to the commit that
introduces a "harmonize_features" function that turns off scatter
gather when the NIC does not support hardware checksumming for the
ethernet protocol of an sk buff.

  commit f01a5236
  Author: Jesse Gross <jesse@nicira.com>
  Date:   Sun Jan 9 06:23:31 2011 +0000

      net offloading: Generalize netif_get_vlan_features().

The can_checksum_protocol function is not equipped to consider a
protocol that does not require checksumming.  Calling it for a
protocol that requires no checksum is inappropriate.

The patch below has harmonize_features call can_checksum_protocol when
the protocol needs a checksum, so that the network layer is not forced
to perform unnecessary skb linearization on the transmission of AoE
packets.  Unnecessary linearization results in decreased performance
and increased memory pressure, as reported here:

  http://www.spinics.net/lists/linux-mm/msg15184.html

The problem has probably not been widely experienced yet, because
only recently has the kernel.org-distributed aoe driver acquired the
ability to use payloads of over a page in size, with the patchset
recently included in the mm tree:

  https://lkml.org/lkml/2012/8/28/140

The coraid.com-distributed aoe driver already could use payloads of
greater than a page in size, but its users generally do not use the
newest kernels.
Signed-off-by: NEd Cashin <ecashin@coraid.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0d680e5

20 9月, 2012 4 次提交

netpoll: call ->ndo_select_queue() in tx path · 8c4c49df

由 Amerigo Wang 提交于 9月 17, 2012

In netpoll tx path, we miss the chance of calling ->ndo_select_queue(),
thus could cause problems when bonding is involved.

This patch makes dev_pick_tx() extern (and rename it to netdev_pick_tx())
to let netpoll call it in netpoll_send_skb_on_dev().
Reported-by: NSylvain Munaut <s.munaut@whatever-company.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: NCong Wang <amwang@redhat.com>
Tested-by: NSylvain Munaut <s.munaut@whatever-company.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8c4c49df

net: provide a default dev->ethtool_ops · 2c60db03

由 Eric Dumazet 提交于 9月 16, 2012

Instead of forcing device drivers to provide empty ethtool_ops or tweak
net/core/ethtool.c again, we could provide a generic ethtool_ops.

This occurred to me when I wanted to add GSO support to GRE tunnels.
ethtool -k support should be generic for all drivers.
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Ben Hutchings <bhutchings@solarflare.com>
Cc: Maciej Żenczykowski <maze@google.com>
Reviewed-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

2c60db03

net: dev: fix incorrect getting net device's name · 828de4f6

由 Gao feng 提交于 9月 13, 2012

When moving a nic from net namespace A to net namespace B,
in dev_change_net_namesapce,we call __dev_get_by_name to
decide if the netns B has the device has the same name.

if the netns B already has the same named device,we call
dev_get_valid_name to try to get a valid name for this nic in
the netns B,but net_device->nd_net still point to netns A now.

this patch fix it.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

828de4f6

net: more accurate network taps in transmit path · b40863c6

由 Eric Dumazet 提交于 9月 18, 2012

dev_queue_xmit_nit() should be called right before ndo_start_xmit()
calls or we might give wrong packet contents to taps users :

Packet checksum can be changed, or packet can be linearized or
segmented, and segments partially sent for the later case.

Also a memory allocation can fail and packet never really hit the
driver entry point.
Reported-by: NJamie Gloudon <jamie.gloudon@gmail.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b40863c6

19 9月, 2012 1 次提交

net: fix memory leak on oom with zerocopy · 0e698bf6

由 Michael S. Tsirkin 提交于 9月 15, 2012

If orphan flags fails, we don't free the skb
on receive, which leaks the skb memory.

Return value was also wrong: netif_receive_skb
is supposed to return NET_RX_DROP, not ENOMEM.
Signed-off-by: NMichael S. Tsirkin <mst@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0e698bf6

18 9月, 2012 1 次提交

userns: Convert the audit loginuid to be a kuid · e1760bd5

由 Eric W. Biederman 提交于 9月 10, 2012

Always store audit loginuids in type kuid_t.

Print loginuids by converting them into uids in the appropriate user
namespace, and then printing the resulting uid.

Modify audit_get_loginuid to return a kuid_t.

Modify audit_set_loginuid to take a kuid_t.

Modify /proc/<pid>/loginuid on read to convert the loginuid into the
user namespace of the opener of the file.

Modify /proc/<pid>/loginud on write to convert the loginuid
rom the user namespace of the opener of the file.

Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Eric Paris <eparis@redhat.com>
Cc: Paul Moore <paul@paul-moore.com> ?
Cc: David Miller <davem@davemloft.net>
Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>

e1760bd5

17 9月, 2012 3 次提交

device and dynamic_debug: Use dev_vprintk_emit and dev_printk_emit · 666f355f

由 Joe Perches 提交于 9月 12, 2012

Convert direct calls of vprintk_emit and printk_emit to the
dev_ equivalents.

Make create_syslog_header static.
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Tested-by: NJim Cromie <jim.cromie@gmail.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

666f355f

netdev_printk/netif_printk: Remove a superfluous logging colon · c2c5a705

由 Joe Perches 提交于 9月 12, 2012

netdev_printk originally called dev_printk with %pV.

This style emitted the complete dev_printk header with
a colon followed by the netdev_name prefix followed
by a colon.

Now that netdev_printk does not call dev_printk, the
extra colon is superfluous.  Remove it.

Example:
old: sky2 0000:02:00.0: eth0: Link is up at 100 Mbps, full duplex, flow control both
new: sky2 0000:02:00.0 eth0: Link is up at 100 Mbps, full duplex, flow control both
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Tested-by: NJim Cromie <jim.cromie@gmail.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

c2c5a705

netdev_printk/dynamic_netdev_dbg: Directly call printk_emit · b004ff49

由 Joe Perches 提交于 9月 12, 2012

A lot of stack is used in recursive printks with %pV.

Using multiple levels of %pV (a logging function with %pV
that calls another logging function with %pV) can consume
more stack than necessary.

Avoid excessive stack use by not calling dev_printk from
netdev_printk and dynamic_netdev_dbg.  Duplicate the logic
and form of dev_printk instead.

Make __netdev_printk static.
Remove EXPORT_SYMBOL(__netdev_printk)
Whitespace and brace style neatening.
Signed-off-by: NJoe Perches <joe@perches.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Tested-by: NJim Cromie <jim.cromie@gmail.com>
Acked-by: NJason Baron <jbaron@redhat.com>
Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>

b004ff49

09 9月, 2012 1 次提交

net: small bug on rxhash calculation · 68622342

由 Chema Gonzalez 提交于 9月 07, 2012

In the current rxhash calculation function, while the
sorting of the ports/addrs is coherent (you get the
same rxhash for packets sharing the same 4-tuple, in
both directions), ports and addrs are sorted
independently. This implies packets from a connection
between the same addresses but crossed ports hash to
the same rxhash.

For example, traffic between A=S:l and B=L:s is hashed
(in both directions) from {L, S, {s, l}}. The same
rxhash is obtained for packets between C=S:s and D=L:l.

This patch ensures that you either swap both addrs and ports,
or you swap none. Traffic between A and B, and traffic
between C and D, get their rxhash from different sources
({L, S, {l, s}} for A<->B, and {L, S, {s, l}} for C<->D)

The patch is co-written with Eric Dumazet <edumazet@google.com>
Signed-off-by: NChema Gonzalez <chema@google.com>
Signed-off-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

68622342

01 9月, 2012 1 次提交

net: fix documentation of skb_needs_linearize(). · d1a53dfd

由 Rami Rosen 提交于 8月 27, 2012

skb_needs_linearize() does not check highmem DMA as it does not call
illegal_highdma() anymore, so there is no need to mention highmem DMA here.

(Indeed, ~NETIF_F_SG flag, which is checked in skb_needs_linearize(), can
be set when illegal_highdma() returns true, and we are assured that
illegal_highdma() is invoked prior to skb_needs_linearize() as
skb_needs_linearize() is a static method called only once.
But ~NETIF_F_SG can be set not only there in this same invocation path.
It can also be set when can_checksum_protocol() returns false).

see commit 02932ce9,
Convert skb_need_linearize() to use precomputed features.
Signed-off-by: NRami Rosen <rosenr@marvell.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

d1a53dfd

31 8月, 2012 1 次提交

net: dev: fix the incorrect hold of net namespace's lo device · 6549dd43

由 Gao feng 提交于 8月 23, 2012

When moving a net device from one net namespace to another
net namespace,dev_change_net_namespace calls NETDEV_DOWN
event,so the original net namespace's dst entries which
beloned to this net device will be put into dst_garbage
list.

then dev_change_net_namespace will set this net device's
net to the new net namespace.

If we unregister this net device's driver, this will trigger
the NETDEV_UNREGISTER_FINAL event, dst_ifdown will be called,
and get this net device's dst entries from dst_garbage list,
put these entries' dev to the new net namespace's lo device.

It's not what we want,actually we need these dst entries hold
the original net namespace's lo device,this incorrect device
holding will trigger emg message like below.
unregister_netdevice: waiting for lo to become free. Usage count = 1

so we should call NETDEV_UNREGISTER_FINAL event in
dev_change_net_namespace too,in order to make sure dst entries
already in the dst_garbage list, we need rcu_barrier before we
call NETDEV_UNREGISTER_FINAL event.

With help form Eric Dumazet.
Signed-off-by: NGao feng <gaofeng@cn.fujitsu.com>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

6549dd43

25 8月, 2012 1 次提交

net: Set device operstate at registration time · 8f4cccbb

由 Ben Hutchings 提交于 8月 20, 2012

The operstate of a device is initially IF_OPER_UNKNOWN and is updated
asynchronously by linkwatch after each change of carrier state
reported by the driver.  The default carrier state of a net device is
on, and this will never be changed on drivers that do not support
carrier detection, thus the operstate remains IF_OPER_UNKNOWN.

For devices that do support carrier detection, the driver must set the
carrier state to off initially, then poll the hardware state when the
device is opened.  However, we must not activate linkwatch for a
unregistered device, and commit b4730016 ('net: Do not fire linkwatch
events until the device is registered.') ensured that we don't.  But
this means that the operstate for many devices that support carrier
detection remains IF_OPER_UNKNOWN when it should be IF_OPER_DOWN.

The same issue exists with the dormant state.

The proper initialisation sequence, avoiding a race with opening of
the device, is:

        rtnl_lock();
        rc = register_netdevice(dev);
        if (rc)
                goto out_unlock;
        netif_carrier_off(dev); /* or netif_dormant_on(dev) */
        rtnl_unlock();

but it seems silly that this should have to be repeated in so many
drivers.  Further, the operstate seen immediately after opening the
device may still be IF_OPER_UNKNOWN due to the asynchronous nature of
linkwatch.

Commit 22604c86 ('net: Fix for initial link state in 2.6.28') attempted
to fix this by setting the operstate synchronously, but it was
reverted as it could lead to deadlock.

This initialises the operstate synchronously at registration time
only.
Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

8f4cccbb

24 8月, 2012 1 次提交

net: reinstate rtnl in call_netdevice_notifiers() · 748e2d93

由 Eric Dumazet 提交于 8月 22, 2012

Eric Biederman pointed out that not holding RTNL while calling
call_netdevice_notifiers() was racy.

This patch is a direct transcription his feedback
against commit 0115e8e3 (net: remove delay at device dismantle)

Thanks Eric !
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Acked-by: N"Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

748e2d93

23 8月, 2012 1 次提交

net: remove delay at device dismantle · 0115e8e3

由 Eric Dumazet 提交于 8月 22, 2012

I noticed extra one second delay in device dismantle, tracked down to
a call to dst_dev_event() while some call_rcu() are still in RCU queues.

These call_rcu() were posted by rt_free(struct rtable *rt) calls.

We then wait a little (but one second) in netdev_wait_allrefs() before
kicking again NETDEV_UNREGISTER.

As the call_rcu() are now completed, dst_dev_event() can do the needed
device swap on busy dst.

To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
after a rcu_barrier(), but outside of RTNL lock.

Use NETDEV_UNREGISTER_FINAL with care !

Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL

Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
IP cache removal.

With help from Gao feng
Signed-off-by: NEric Dumazet <edumazet@google.com>
Cc: Tom Herbert <therbert@google.com>
Cc: Mahesh Bandewar <maheshb@google.com>
Cc: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: Gao feng <gaofeng@cn.fujitsu.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

0115e8e3

20 8月, 2012 2 次提交

net/core/dev.c: fix kernel-doc warning · 3de7a37b

由 Randy Dunlap 提交于 8月 18, 2012

Fix kernel-doc warning:

Warning(net/core/dev.c:5745): No description found for parameter 'dev'
Signed-off-by: NRandy Dunlap <rdunlap@xenotime.net>
Cc:	"David S. Miller" <davem@davemloft.net>
Cc:	netdev@vger.kernel.org
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

3de7a37b

af_packet: don't emit packet on orig fanout group · c0de08d0

由 Eric Leblond 提交于 8月 16, 2012

If a packet is emitted on one socket in one group of fanout sockets,
it is transmitted again. It is thus read again on one of the sockets
of the fanout group. This result in a loop for software which
generate packets when receiving one.
This retransmission is not the intended behavior: a fanout group
must behave like a single socket. The packet should not be
transmitted on a socket if it originates from a socket belonging
to the same fanout group.

This patch fixes the issue by changing the transmission check to
take fanout group info account.
Reported-by: NAleksandr Kotov <a1k@mail.ru>
Signed-off-by: NEric Leblond <eric@regit.org>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

c0de08d0

15 8月, 2012 3 次提交

userns: Convert __dev_set_promiscuity to use kuids in audit logs · d04a48b0

由 Eric W. Biederman 提交于 5月 23, 2012

Cc: Klaus Heinrich Kiwi <klausk@br.ibm.com>
Cc: Eric Paris <eparis@redhat.com>
Acked-by: NDavid S. Miller <davem@davemloft.net>
Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>

d04a48b0

net: remove netdev_bonding_change() · b7bc2a5b

由 Amerigo Wang 提交于 8月 09, 2012

I don't see any benifits to use netdev_bonding_change() than
using call_netdevice_notifiers() directly.

Cc: David S. Miller <davem@davemloft.net>
Signed-off-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

b7bc2a5b

net: move and rename netif_notify_peers() · ee89bab1

由 Amerigo Wang 提交于 8月 09, 2012

I believe net/core/dev.c is a better place for netif_notify_peers(),
because other net event notify functions also stay in this file.

And rename it to netdev_notify_peers().

Cc: David S. Miller <davem@davemloft.net>
Cc: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: NCong Wang <amwang@redhat.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

ee89bab1

10 8月, 2012 2 次提交

net: Make ifindex generation per-net namespace · aa79e66e

由 Pavel Emelyanov 提交于 8月 08, 2012

Strictly speaking this is only _really_ required for checkpoint-restore to
make loopback device always have the same index.

This change appears to be safe wrt "ifindex should be unique per-system"
concept, as all the ifindex usage is either already made per net namespace
of is explicitly limited with init_net only.

There are two cool side effects of this. The first one -- ifindices of
devices in container are always small, regardless of how many containers
we've started (and re-started) so far. The second one is -- we can speed
up the loopback ifidex access as shown in the next patch.

v2: Place ifindex right after dev_base_seq : avoid two holes and use the
same cache line, dirtied in list_netdevice()/unlist_netdevice()
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

aa79e66e

net: Allow to create links with given ifindex · 9c7dafbf

由 Pavel Emelyanov 提交于 8月 08, 2012

Currently the RTM_NEWLINK results in -EOPNOTSUPP if the ifinfomsg->ifi_index
is not zero. I propose to allow requesting ifindices on link creation. This
is required by the checkpoint-restore to correctly restore a net namespace
(i.e. -- a container).
Signed-off-by: NPavel Emelyanov <xemul@parallels.com>
Acked-by: NEric Dumazet <edumazet@google.com>
Signed-off-by: NDavid S. Miller <davem@davemloft.net>

9c7dafbf

openanolis / cloud-kernel 1 年多 前同步成功

openanolis / cloud-kernel
1 年多前同步成功