Commits · e3d8fabee3b66ce158b2603f270479b84b6e4ba7 · openanolis / cloud-kernel

08 Dec, 2012 1 commit

net: call notifiers for mtu change even if iface is not up · e3d8fabe

Committed by Jiri Pirko on Dec 03, 2012

Do the same thing as in set mac. Call notifiers every time.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e3d8fabe

06 Dec, 2012 2 commits

net: fix some compiler warning in net/core/neighbour.c · b93196dc

Committed by Cong Wang on Dec 06, 2012

net/core/neighbour.c:65:12: warning: 'zero' defined but not used [-Wunused-variable]
net/core/neighbour.c:66:12: warning: 'unres_qlen_max' defined but not used [-Wunused-variable]

These variables are only used when CONFIG_SYSCTL is defined,
so move them under #ifdef CONFIG_SYSCTL.
Reported-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
Acked-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b93196dc

net: neighbour: prohibit negative value for unres_qlen_bytes parameter · ce46cc64

Committed by Shan Wei on Dec 04, 2012

unres_qlen_bytes and unres_qlen are of type int.
But the multiplicative relation (unres_qlen_bytes = unres_qlen * SKB_TRUESIZE(ETH_FRAME_LEN))
causes an integer overflow when setting unres_qlen, e.g.

$ echo 1027506 > /proc/sys/net/ipv4/neigh/eth1/unres_qlen
$ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen
1182657265
$ cat /proc/sys/net/ipv4/neigh/eth1/unres_qlen_bytes
-2147479756

The value read back is not the one that was set,
but the user/administrator cannot tell that this is caused by int overflow.

What's more, it is meaningless and even dangerous for unres_qlen_bytes to be set
to a negative number, because for an unresolved neighbour address the kernel will then cache packets
practically without limit in __neigh_event_send() (e.g. (u32)-1 is about 4GB).
Signed-off-by: Shan Wei <davidshan@tencent.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

ce46cc64
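
The wrap-around reported in ce46cc64 can be reproduced in user space. The sketch below is not kernel code: the per-packet cost of 2090 bytes is an assumption inferred from the numbers in the commit message (1027506 * 2090 = 2147487540, which wraps to the reported -2147479756), and all function names are hypothetical.

```c
#include <stdint.h>

/* Assumed value of SKB_TRUESIZE(ETH_FRAME_LEN) on the reporter's system.
 * Inferred from the commit message output, not taken from a kernel header. */
#define ASSUMED_TRUESIZE 2090

/* Exact product, computed in 64 bits so nothing wraps here. */
int64_t qlen_to_bytes_exact(int32_t unres_qlen)
{
    return (int64_t)unres_qlen * ASSUMED_TRUESIZE;
}

/* What a 32-bit two's-complement int ends up holding for that product. */
int32_t wrap_to_int32(int64_t value)
{
    uint32_t low = (uint32_t)value;      /* keep only the low 32 bits */

    if (low > (uint32_t)INT32_MAX)
        return (int32_t)((int64_t)low - 4294967296LL);
    return (int32_t)low;
}

/* Hypothetical range check: accept only values that are non-negative
 * and still fit in an int after scaling. */
int qlen_is_safe(int32_t unres_qlen)
{
    return unres_qlen >= 0 && qlen_to_bytes_exact(unres_qlen) <= INT32_MAX;
}
```

With these assumptions, `wrap_to_int32(qlen_to_bytes_exact(1027506))` yields the negative byte count shown in the commit message, and `qlen_is_safe(1027506)` rejects the input.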

05 Dec, 2012 1 commit

net: dev_change_net_namespace: send a KOBJ_REMOVED/KOBJ_ADD · 4e66ae2e

Committed by Serge Hallyn on Dec 03, 2012

When a new nic is created in namespace ns1, the kernel sends a KOBJ_ADD uevent
to ns1.  When the nic is moved to ns2, we only send a KOBJ_MOVE to ns2, and
nothing to ns1.

This patch changes that behavior so that when moving a nic from ns1 to ns2, we
send a KOBJ_REMOVED to ns1 and KOBJ_ADD to ns2.  (The KOBJ_MOVE is still
sent to ns2).

The effects of this can be seen when starting and stopping containers in
an upstart based host.  Lxc will create a pair of veth nics, the kernel
sends KOBJ_ADD, and upstart starts network-instance jobs for each.  When
one nic is moved to the container, because no KOBJ_REMOVED event is
received, the network-instance job for that veth never goes away.  This
was reported at https://bugs.launchpad.net/ubuntu/+source/lxc/+bug/1065589
With this patch the network-instance jobs properly go away.

The other oddness solved here is that if a nic is passed into a running
upstart-based container, without this patch no network-instance job is
started in the container.  But when the container creates a new nic
itself (ip link add new type veth) then network-interface jobs are
created.  With this patch, behavior comes in line with a regular host.

v2: also send KOBJ_ADD to new netns.  There will then be a
_MOVE event from the device_rename() call, but that should
be innocuous.
Signed-off-by: Serge Hallyn <serge.hallyn@canonical.com>
Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
Acked-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: David S. Miller <davem@davemloft.net>

4e66ae2e

01 Dec, 2012 1 commit

rtnelink: remove unused parameter from rtnl_create_link(). · c0713563

Committed by Rami Rosen on Nov 30, 2012

This patch removes an unused parameter (src_net) from the rtnl_create_link()
method and from the method's single invocation, in veth.
This parameter was used in the past when calling
ops->get_tx_queues(src_net, tb) in rtnl_create_link().
The get_tx_queues() member of rtnl_link_ops was replaced by two methods,
get_num_tx_queues() and get_num_rx_queues(), which do not take any
parameters. This was done in commit d40156aa by
Jiri Pirko ("rtnl: allow to specify different num for rx and tx queue count").
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c0713563

30 Nov, 2012 1 commit

core: make GRO methods static. · bb728820

Committed by Rami Rosen on Nov 28, 2012

This patch changes three methods to be static and removes their
EXPORT_SYMBOLs in core/dev.c and their external declarations in
netdevice.h. The methods, dev_gro_receive(), napi_frags_finish() and
napi_skb_finish(), which are in the GRO rx path, are not used
outside core/dev.c.
Signed-off-by: Rami Rosen <ramirose@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

bb728820

27 Nov, 2012 1 commit

sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name · c91f6df2

Committed by Brian Haley on Nov 26, 2012

Instead of having the getsockopt() of SO_BINDTODEVICE return an index, which
will then require another call like if_indextoname() to get the actual interface
name, have it return the name directly.

This also matches the existing man page description in socket(7), which mentions
the argument being an interface name.

If the value has not been set, zero is returned and optlen will be set to zero
to indicate there is no interface name present.

Added a seqlock to protect this code path, and dev_ifname(), from someone
changing the device name via dev_change_name().

v2: Added seqlock protection while copying device name.

v3: Fixed word wrap in patch.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c91f6df2

21 Nov, 2012 1 commit

net: Remove redundant null check before kfree in dev.c · 388dfc2d

Committed by Sachin Kamat on Nov 20, 2012

kfree on a null pointer is a no-op.
Signed-off-by: Sachin Kamat <sachin.kamat@linaro.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

388dfc2d

20 Nov, 2012 2 commits

net: remove unnecessary wireless includes · 01f1c6b9

Committed by Johannes Berg on Nov 16, 2012

The wireless and wext includes in net-sysfs.c aren't
needed, so remove them.
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

01f1c6b9

net: core: use this_cpu_ptr per-cpu helper · 1f743b07

Committed by Shan Wei on Nov 12, 2012

flush_tasklet is a struct, not a pointer, in the percpu variable,
so use this_cpu_ptr to get the member pointer.
Signed-off-by: Shan Wei <davidshan@tencent.com>
Reviewed-by: Christoph Lameter <cl@linux.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

1f743b07

19 Nov, 2012 7 commits

net: Enable a userns root rtnl calls that are safe for unprivilged users · b51642f6

Committed by Eric W. Biederman on Nov 16, 2012

- Only allow moving network devices to network namespaces you have
  CAP_NET_ADMIN privileges over.

- Enable creating/deleting/modifying interfaces
- Enable adding/deleting addresses
- Enable adding/setting/deleting neighbour entries
- Enable adding/removing routes
- Enable adding/removing fib rules
- Enable setting the forwarding state
- Enable adding/removing ipv6 address labels
- Enable setting bridge parameters
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

b51642f6

net: Allow userns root control of the core of the network stack. · 5e1fccc0

Committed by Eric W. Biederman on Nov 16, 2012

Allow an unprivileged user who has created a user namespace, and then
created a network namespace, to effectively use the new network
namespace, by reducing capable(CAP_NET_ADMIN) and
capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
CAP_NET_ADMIN), or ns_capable(net->user_ns, CAP_NET_RAW) calls.

Settings that merely control a single network device are allowed.
Either the network device is a logical network device where
restrictions make no difference or the network device is a hardware NIC
that has been explicitly moved from the initial network namespace.

In general policy and network stack state changes are allowed
while resource control is left unchanged.

Allow ethtool ioctls.

Allow binding to network devices.
Allow setting the socket mark.
Allow setting the socket priority.

Allow setting the network device alias via sysfs.
Allow setting the mtu via sysfs.
Allow changing the network device flags via sysfs.
Allow setting the network device group via sysfs.

Allow the following network device ioctls.
SIOCGMIIPHY
SIOCGMIIREG
SIOCSIFNAME
SIOCSIFFLAGS
SIOCSIFMETRIC
SIOCSIFMTU
SIOCSIFHWADDR
SIOCSIFSLAVE
SIOCADDMULTI
SIOCDELMULTI
SIOCSIFHWBROADCAST
SIOCSMIIREG
SIOCBONDENSLAVE
SIOCBONDRELEASE
SIOCBONDSETHWADDR
SIOCBONDCHANGEACTIVE
SIOCBRADDIF
SIOCBRDELIF
SIOCSHWTSTAMP
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

5e1fccc0

net: Allow userns root to force the scm creds · 00f70de0

Committed by Eric W. Biederman on Nov 16, 2012

If the user calling sendmsg has the appropriate privileges
in their user namespace, allow them to set the uid, gid, and
pid in the SCM_CREDENTIALS control message to any valid value.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

00f70de0

net: Push capable(CAP_NET_ADMIN) into the rtnl methods · dfc47ef8

Committed by Eric W. Biederman on Nov 16, 2012

- In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
  to ns_capable(net->user_ns, CAP_NET_ADMIN), allowing unprivileged
  users to make netlink calls to modify their local network
  namespace.

- In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
  that calls that are not safe for unprivileged users are still
  protected.

Later patches will remove the extra capable calls from methods
that are safe for unprivileged users.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

dfc47ef8

net: Don't export sysctls to unprivileged users · 464dc801

Committed by Eric W. Biederman on Nov 16, 2012

In preparation for supporting the creation of network namespaces
by unprivileged users, modify all of the per-net sysctl exports
to refuse access to unprivileged users.

This makes it safe for unprivileged users in general to access
per-net sysctls, and allows sysctls to be exported to unprivileged
users on an individual basis as they are deemed safe.
Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

464dc801

userns: make each net (net_ns) belong to a user_ns · d328b836

Committed by Eric W. Biederman on Nov 16, 2012

The user namespace which creates a new network namespace owns that
namespace and all resources created in it.  This way we can target
capability checks for privileged operations against network resources to
the user_ns which created the network namespace in which the resource
lives.  Privilege to the user namespace which owns the network
namespace, or any parent user namespace thereof, provides the same
privilege to the network resource.

This patch is reworked from a version originally by
Serge E. Hallyn <serge.hallyn@canonical.com>
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d328b836

netns: Deduplicate and fix copy_net_ns when !CONFIG_NET_NS · 2407dc25

Committed by Eric W. Biederman on Nov 16, 2012

The copy of copy_net_ns used when the network stack is not
built is broken, as it does not return -EINVAL when attempting
to create a new network namespace.  We don't even have
a previous network namespace.

Since we need a copy of copy_net_ns in net/net_namespace.h that is
available when the networking stack is not built at all, move the
correct version of copy_net_ns from net_namespace.c into net_namespace.h,
leaving us with just 2 versions of copy_net_ns: one version for when
we compile in network namespace support and another stub for all other
occasions.
Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2407dc25

17 Nov, 2012 3 commits

wireless: add back sysfs directory · 38c1a01c

Committed by Johannes Berg on Nov 16, 2012

commit 35b2a113 broke (at least)
Fedora's networking scripts; they check for the existence of the
wireless directory. As the files aren't used, add the directory
back and not the files. Also do it for both drivers based on the
old wireless extensions and cfg80211, regardless of whether the
compat code for wext is built into cfg80211 or not.

Cc: stable@vger.kernel.org [3.6]
Reported-by: Dave Airlie <airlied@gmail.com>
Reported-by: Bill Nottingham <notting@redhat.com>
Signed-off-by: Johannes Berg <johannes.berg@intel.com>

38c1a01c

net-rps: Fix brokeness causing OOO packets · baefa31d

Committed by Tom Herbert on Nov 16, 2012

In commit c445477d, which adds aRFS to the kernel, the CPU
selected for RFS is not set correctly when the CPU is changing.
This is causing out-of-order (OOO) packets and probably other issues.
Signed-off-by: Tom Herbert <therbert@google.com>
Acked-by: Eric Dumazet <edumazet@google.com>
Acked-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

baefa31d

net: use right lock in __dev_remove_offload · c53aa505

Committed by Eric Dumazet on Nov 16, 2012

offload_base is protected by offload_lock, not ptype_lock.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Vlad Yasevich <vyasevic@redhat.com>
Acked-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c53aa505

16 Nov, 2012 4 commits

net: correct check in dev_addr_del() · a652208e

Committed by Jiri Pirko on Nov 14, 2012

Check (ha->addr == dev->dev_addr) is always true because dev_addr_init()
sets this. Correct the check to behave properly on addr removal.
Signed-off-by: Jiri Pirko <jiri@resnulli.us>
Signed-off-by: David S. Miller <davem@davemloft.net>

a652208e

net: Remove code duplication between offload structures · f191a1d1

Committed by Vlad Yasevich on Nov 15, 2012

Move the offload callbacks into their own structure.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

f191a1d1

net: Switch to using the new packet offload infrustructure · 22061d80

Committed by Vlad Yasevich on Nov 15, 2012

Convert to using the new GSO/GRO registration mechanism and new
packet offload structure.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

22061d80

net: Add generic packet offload infrastructure. · 62532da9

Committed by Vlad Yasevich on Nov 15, 2012

Create a new data structure to contain the GRO/GSO callbacks and add
a new registration mechanism.
Signed-off-by: Vlad Yasevich <vyasevic@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

62532da9

08 Nov, 2012 1 commit

af-packet: fix oops when socket is not present · a3d744e9

Committed by Eric Leblond on Nov 06, 2012

Due to a NULL dereference, the following patch is causing an oops
under normal traffic conditions:

commit c0de08d0
Author: Eric Leblond <eric@regit.org>
Date:   Thu Aug 16 22:02:58 2012 +0000

    af_packet: don't emit packet on orig fanout group

This buggy patch was a feature fix and has reached most stable
branches.

When skb->sk is NULL and when packet fanout is used, there is a
crash in match_fanout_group where skb->sk is accessed.
This patch fixes the issue by returning false as soon as the
socket is NULL: this corresponds to the wanted behavior because
the kernel has to resend the skb to all the listening sockets in
this case.
Signed-off-by: Eric Leblond <eric@regit.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

a3d744e9

04 Nov, 2012 3 commits

net: fix bridge notify hook to manage flags correctly · c38e01b8

Committed by John Fastabend on Nov 02, 2012

The bridge notify hook rtnl_bridge_notify() was not handling the
case where the master flag was set or where both flags were set. First,
the flags are not being passed correctly, and second, the logic to parse
them is broken.

This patch passes the original flags value and fixes the
logic.
Reported-by: Ben Hutchings <bhutchings@solarflare.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c38e01b8

rtnetlink: Use nlmsg type RTM_NEWNEIGH from dflt fdb dump · a7a558fe

Committed by John Fastabend on Nov 01, 2012

Change the dflt fdb dump handler to use RTM_NEWNEIGH to
be compatible with bridge dump routines.

The dump reply from the network driver handlers should
match the reply from bridge handler. The fact they were
not in the ixgbe case was effectively a bug. This patch
resolves it.

Applications that were not checking the nlmsg type will
continue to work. And now applications that do check
the type will work as expected.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a7a558fe

pktgen: clean up ktime_t helpers · 398f382c

Committed by Daniel Borkmann on Oct 28, 2012

Some years ago, the ktime_t helper functions ktime_now() and ktime_lt()
were introduced. Instead of defining them inside pktgen.c, they
should either use ktime_t library functions or, if not available, they
should be defined in ktime.h, so that others can also benefit from them.
ktime_compare() is introduced with a similar notion as in timespec_compare().
Signed-off-by: Daniel Borkmann <daniel.borkmann@tik.ee.ethz.ch>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Stephen Hemminger <shemminger@vyatta.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: David S. Miller <davem@davemloft.net>

398f382c
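
The three-way comparison that 398f382c describes for ktime_compare() can be illustrated in user space. This is only a sketch: `sketch_ktime_t` is a stand-in assumed to be a signed 64-bit nanosecond count, and both function names are hypothetical rather than the kernel's own.

```c
#include <stdint.h>

/* Stand-in for ktime_t: a signed 64-bit nanosecond count (assumption). */
typedef int64_t sketch_ktime_t;

/* Three-way comparison in the style of timespec_compare():
 * returns -1 if a is earlier than b, 0 if equal, 1 if a is later. */
int sketch_ktime_compare(sketch_ktime_t a, sketch_ktime_t b)
{
    if (a < b)
        return -1;
    if (a > b)
        return 1;
    return 0;
}

/* A "less than" helper like the one pktgen used to carry locally
 * can be expressed through the generic comparison. */
int sketch_ktime_lt(sketch_ktime_t a, sketch_ktime_t b)
{
    return sketch_ktime_compare(a, b) < 0;
}
```

Returning an ordering value rather than a boolean lets one helper cover the less-than, equal, and greater-than cases, which is presumably why a single generic comparison was preferred over per-file helpers.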

03 Nov, 2012 3 commits

net: Fix continued iteration in rtnl_bridge_getlink() · 25b1e679

Committed by Ben Hutchings on Nov 02, 2012

Commit e5a55a89 ('net: create generic
bridge ops') broke the handling of a non-zero starting index in
rtnl_bridge_getlink() (based on the old br_dump_ifinfo()).

When the starting index is non-zero, we need to increment the current
index for each entry that we are skipping.  Also, we need to check the
index before both cases, since we may previously have stopped
iteration between getting information about a device from its master
and from itself.
Signed-off-by: Ben Hutchings <bhutchings@solarflare.com>
Tested-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25b1e679

skb: api to report errors for zero copy skbs · 25121173

Committed by Michael S. Tsirkin on Nov 01, 2012

Orphaning frags for zero copy skbs needs to allocate data in atomic
context so it has a chance to fail. If it does, we currently discard
the skb, which is safe, but we don't report anything to the caller,
so it cannot recover by e.g. disabling zero copy.

Add an API to free skb reporting such errors: this is used
by tun in case orphaning frags fails.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

25121173

skb: report completion status for zero copy skbs · e19d6763

Committed by Michael S. Tsirkin on Nov 01, 2012

Even if skb is marked for zero copy, net core might still decide
to copy it later which is somewhat slower than a copy in user context:
besides copying the data we need to pin/unpin the pages.

Add a parameter reporting such cases through zero copy callback:
if this happens a lot, device can take this into account
and switch to copying in user context.

This patch updates all users but ignores the passed value for now:
it will be used by follow-up patches.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e19d6763

01 Nov, 2012 5 commits

sk-filter: Add ability to get socket filter program (v2) · a8fc9277

Committed by Pavel Emelyanov on Nov 01, 2012

The SO_ATTACH_FILTER option is set-only. I propose to add the get
ability by using SO_ATTACH_FILTER in getsockopt. To be easier on
the eyes, the SO_GET_FILTER alias for it is declared. This
ability is required by the checkpoint-restore project to be able to
save the full state of a socket.

There are two issues with getting filter back.

First, the kernel modifies the sock_filter->code on filter load, thus in
order to return the filter element back to the user we have to decode it
into user-visible constants. Fortunately the modification in question
is interconvertible.

Second, the BPF_S_ALU_DIV_K code modifies the command argument k to
speed up the run-time division by doing kernel_k = reciprocal(user_k).
The bad news is that different user_k may result in the same kernel_k, so we
can't get the original user_k back. The good news is that we don't have
to. What we need to do is calculate a user2_k such that

  reciprocal(user2_k) == reciprocal(user_k) == kernel_k

i.e. if it is loaded back, the recompiled value will be exactly
the same as it was. That said, the user2_k can be calculated like this

  user2_k = reciprocal(kernel_k)

with an exception, that if kernel_k == 0, then user2_k == 1.

The optlen argument is treated like this: when zero, the kernel returns
the number of sock_fprog elements in the filter; otherwise it should be
large enough for the sock_fprog array.

changes since v1:
* Declared SO_GET_FILTER in all arch headers
* Added decode of vlan-tag codes
Signed-off-by: Pavel Emelyanov <xemul@parallels.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

a8fc9277
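
The round trip that a8fc9277 describes for BPF_S_ALU_DIV_K can be checked numerically. The sketch below assumes the kernel's reciprocal is computed as (2^32 + k - 1) / k truncated to 32 bits, which is how the reciprocal-division helper of that era is commonly described; both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed form of the kernel's reciprocal: (2^32 + k - 1) / k,
 * truncated to 32 bits. k must be non-zero. */
uint32_t sketch_reciprocal_value(uint32_t k)
{
    uint64_t val = (1ULL << 32) + (k - 1);

    return (uint32_t)(val / k);
}

/* Recover a user-visible divisor from the stored kernel_k, following
 * the rule in the commit message: user2_k = reciprocal(kernel_k),
 * except that kernel_k == 0 maps back to 1. */
uint32_t sketch_recover_user_k(uint32_t kernel_k)
{
    if (kernel_k == 0)
        return 1;
    return sketch_reciprocal_value(kernel_k);
}
```

Under this assumption a divisor of 1 is stored as 0 (the quotient 2^32 does not fit in 32 bits), which explains the special case. A large divisor such as 3000000000 is stored as 2 and recovered as 2147483648, a different number that nevertheless compiles back to the same kernel_k.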

net: filter: add vlan tag access · f3335031

Committed by Eric Dumazet on Oct 27, 2012

BPF filters lack the ability to access skb->vlan_tci.

This patch adds two new ancillary accessors:

SKF_AD_VLAN_TAG         (44) mapped to vlan_tx_tag_get(skb)

SKF_AD_VLAN_TAG_PRESENT (48) mapped to vlan_tx_tag_present(skb)

This allows libpcap/tcpdump to use a kernel filter instead of
having to fall back to accepting all packets and then filtering them in
user space.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Suggested-by: Ani Sinha <ani@aristanetworks.com>
Suggested-by: Daniel Borkmann <danborkmann@iogearbox.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

f3335031
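
A classic BPF program using the accessors from f3335031 might look like the sketch below. It assumes the UAPI header <linux/filter.h> from a kernel that carries this change; the names `vlan_only_insns` and `vlan_only_prog` are hypothetical, and the accept-or-drop policy is chosen purely for illustration.

```c
#include <linux/filter.h>

/* Accept only packets that carry a VLAN tag; drop everything else. */
struct sock_filter vlan_only_insns[] = {
    /* A = 1 if the skb has a VLAN tag, 0 otherwise */
    BPF_STMT(BPF_LD | BPF_W | BPF_ABS, SKF_AD_OFF + SKF_AD_VLAN_TAG_PRESENT),
    /* if (A == 0) skip one instruction and land on the "drop" return */
    BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, 0, 1, 0),
    /* tagged: accept up to 65535 bytes of the packet */
    BPF_STMT(BPF_RET | BPF_K, 0xffff),
    /* untagged: drop */
    BPF_STMT(BPF_RET | BPF_K, 0),
};

struct sock_fprog vlan_only_prog = {
    .len = sizeof(vlan_only_insns) / sizeof(vlan_only_insns[0]),
    .filter = vlan_only_insns,
};
```

Such a program would typically be attached with setsockopt(fd, SOL_SOCKET, SO_ATTACH_FILTER, &vlan_only_prog, sizeof(vlan_only_prog)), so that the tag check runs in the kernel instead of in user space.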

ixgbe: add setlink, getlink support to ixgbe and ixgbevf · 815cccbf

Committed by John Fastabend on Oct 24, 2012

This adds support for the net device ops to manage the embedded
hardware bridge on ixgbe devices. With this patch the bridge
mode can be toggled between VEB and VEPA to support stacking
macvlan devices or using the embedded switch without any SW
component in 802.1Qbg/br environments.

Additionally, this adds source address pruning to the ixgbevf
driver to prune any frames sent back from a reflective relay on
the switch. This is required because the existing hardware does
not support this. Without it frames get pushed into the stack
with its own src mac which is invalid per 802.1Qbg VEPA
definition.
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

815cccbf

net: set and query VEB/VEPA bridge mode via PF_BRIDGE · 2469ffd7

Committed by John Fastabend on Oct 24, 2012

Hardware switches may support enabling and disabling the
loopback switch which puts the device in a VEPA mode defined
in the IEEE 802.1Qbg specification. In this mode frames are
not switched in the hardware but sent directly to the switch.
SR-IOV capable NICs will likely support this mode; I am
aware of at least two such devices. Also I am told (but don't
have any of this hardware available) that there are devices
that only support VEPA modes. In these cases it is important
at a minimum to be able to query these attributes.

This patch adds an additional IFLA_BRIDGE_MODE attribute that can be
set and dumped via the PF_BRIDGE:{SET|GET}LINK operations. Also
anticipating bridge attributes that may be common for both embedded
bridges and software bridges this adds a flags attribute
IFLA_BRIDGE_FLAGS currently used to determine if the command or event
is being generated to/from an embedded bridge or software bridge.
Finally, the event generation is pulled out of the bridge module and
into rtnetlink proper.

For example using the macvlan driver in VEPA mode on top of
an embedded switch requires putting the embedded switch into
a VEPA mode to get the expected results.

        --------  --------
        | VEPA |  | VEPA |       <-- macvlan vepa edge relays
        --------  --------
           |        |
           |        |
        ------------------
        |      VEPA      |       <-- embedded switch in NIC
        ------------------
                |
                |
        -------------------
        | external switch |      <-- shiny new physical
        -------------------          switch with VEPA support

A packet sent from the macvlan VEPA at the top could be
looped back on the embedded switch and never seen by the
external switch. So in order for this to work the embedded
switch needs to be set in the VEPA state via the commands
described above.

By making these attributes nested in IFLA_AF_SPEC we allow
future extensions to be made as needed.

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

2469ffd7

net: create generic bridge ops · e5a55a89

Committed by John Fastabend on Oct 24, 2012

The PF_BRIDGE:RTM_{GET|SET}LINK nlmsg family and type are
currently embedded in the ./net/bridge module. This prohibits
them from being used by other bridging devices. One example
of this is hardware that has embedded bridging components.

In order to use these nlmsg types more generically this patch
adds two net_device_ops hooks: one to set link bridge attributes
and another to dump the current bridge attributes.

	ndo_bridge_setlink()
	ndo_bridge_getlink()

CC: Lennert Buytenhek <buytenh@wantstofly.org>
CC: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: John Fastabend <john.r.fastabend@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

e5a55a89

26 Oct, 2012 3 commits

cgroup: net_cls: Pass in task to sock_update_classid() · fd9a08a7

Committed by Daniel Wagner on Oct 25, 2012

sock_update_classid() assumes that the update operation is always
applied to the current task. sock_update_classid() needs to know
which task to work on in order to be able to migrate tasks between
cgroups using the struct cgroup_subsys attach() callback.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: "Michael S. Tsirkin" <mst@redhat.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Joe Perches <joe@perches.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Stanislav Kinsbursky <skinsbursky@parallels.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <cgroups@vger.kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

fd9a08a7

cgroup: net_cls: Remove rcu_read_lock/unlock · 3ace03cc

Committed by Daniel Wagner on Oct 25, 2012

As Eric pointed out:
"Hey task_cls_classid() has its own rcu protection since commit
3fb5a991 (cls_cgroup: Fix rcu lockdep warning)

So we can safely revert Paul commit (1144182a)
(We no longer need rcu_read_lock/unlock here)"
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: netdev@vger.kernel.org
Cc: cgroups@vger.kernel.org
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

3ace03cc

cgroup: net_prio: Mark local used function static · c658f19d

Committed by Daniel Wagner on Oct 25, 2012

net_prio_attach() is only accessed via cgroup_subsys callbacks,
therefore we can reduce the visibility of this function.
Signed-off-by: Daniel Wagner <daniel.wagner@bmw-carit.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: John Fastabend <john.r.fastabend@intel.com>
Cc: Li Zefan <lizefan@huawei.com>
Cc: Neil Horman <nhorman@tuxdriver.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: <netdev@vger.kernel.org>
Cc: <cgroups@vger.kernel.org>
Acked-by: Neil Horman <nhorman@tuxdriver.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c658f19d

24 Oct, 2012 1 commit

netlink: cleanup the unnecessary return value check · c80bbeae

Committed by Hans Zhang on Oct 22, 2012

There is no need to check the return value of tab since the NULL situation
has been handled already, and the rtnl_msg_handlers[PF_UNSPEC] has been
initialized as non-NULL during the rtnetlink_init().
Signed-off-by: Hans Zhang <zhanghonghui@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

c80bbeae

openanolis / cloud-kernel synced successfully more than 1 year ago