Commits · 7c28bd0b8ec4d128bd7660671d1b626b0abc471f · openanolis / cloud-kernel

24 Oct, 2009 · 1 commit

rtnetlink: speedup rtnl_dump_ifinfo() · 7c28bd0b

Committed by Eric Dumazet on Oct 24, 2009

When handling large number of netdevice, rtnl_dump_ifinfo()
is very slow because it has O(N^2) complexity.

Instead of scanning one single list, we can use the 256 sub lists
of the dev_index hash table.

This considerably speeds up "ip link" operations.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
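A minimal userspace sketch of the idea in this commit, not the kernel code itself: devices sit in 256 hash buckets keyed by ifindex, and a dump that runs out of buffer space resumes from a saved (bucket, position) pair instead of re-walking one long list from its head. The names `struct fake_dev`, `struct dump_cursor`, `table_add()` and `dump_chunk()` are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NETDEV_HASHENTRIES 256  /* the 256 sub lists mentioned in the commit */

struct fake_dev {               /* hypothetical stand-in for struct net_device */
    int ifindex;
    struct fake_dev *next;      /* next device in the same hash bucket */
};

struct dump_cursor {            /* hypothetical saved dump position */
    unsigned int bucket;        /* bucket to resume in */
    unsigned int idx;           /* entries already emitted from that bucket */
};

/* Insert a device at the head of the bucket chosen by its ifindex. */
static void table_add(struct fake_dev *table[NETDEV_HASHENTRIES],
                      struct fake_dev *dev)
{
    unsigned int h = (unsigned int)dev->ifindex & (NETDEV_HASHENTRIES - 1);

    dev->next = table[h];
    table[h] = dev;
}

/*
 * Copy up to max ifindex values into out[], starting at *cur.
 * Returns the number written and advances *cur so that the next call
 * continues where this one stopped.
 */
static size_t dump_chunk(struct fake_dev *table[NETDEV_HASHENTRIES],
                         struct dump_cursor *cur, int *out, size_t max)
{
    size_t n = 0;
    unsigned int h = cur->bucket;
    unsigned int skip = cur->idx;

    assert(cur->bucket <= NETDEV_HASHENTRIES);

    for (; h < NETDEV_HASHENTRIES; h++, skip = 0) {
        unsigned int idx = 0;

        for (struct fake_dev *d = table[h]; d != NULL; d = d->next, idx++) {
            if (idx < skip)
                continue;       /* already emitted by an earlier call */
            if (n == max) {     /* buffer full: remember where we are */
                cur->bucket = h;
                cur->idx = idx;
                return n;
            }
            out[n++] = d->ifindex;
        }
    }

    cur->bucket = NETDEV_HASHENTRIES;   /* dump finished */
    cur->idx = 0;
    return n;
}
```

With a single list, every resumed call has to skip all previously dumped devices, which is where the quadratic cost comes from. Here a resumed call skips at most the entries of one bucket.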

22 Oct, 2009 · 1 commit

rtnetlink: rtnl_setlink() and rtnl_getlink() changes · a3d12891

Committed by Eric Dumazet on Oct 21, 2009

rtnl_getlink() & rtnl_setlink() run with RTNL held, we can use
__dev_get_by_index() and __dev_get_by_name() variants and avoid
dev_hold()/dev_put()

Adds to rtnl_getlink() the capability to find a device by its name,
not only by its index.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
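The "by index, otherwise by name" lookup described in this commit can be sketched in plain C roughly as follows. This is an illustration under assumptions, not the kernel implementation: `struct fake_dev` and `fake_getlink()` are hypothetical, the device table is a flat array rather than the kernel's hash lists, and the specific error codes (`-EINVAL`, `-ENODEV`) are my choice for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct fake_dev {               /* hypothetical stand-in for struct net_device */
    int ifindex;
    const char *name;
};

/*
 * Look a device up by ifindex when one is given (> 0), otherwise by name.
 * No reference counting is done here, mirroring the point of the commit
 * that a caller already holding the lock can skip hold/put pairs.
 * Returns 0 and sets *result on success, or a negative errno value.
 */
static int fake_getlink(const struct fake_dev *devs, size_t ndevs,
                        int ifindex, const char *ifname,
                        const struct fake_dev **result)
{
    assert(result != NULL);
    *result = NULL;

    if (ifindex <= 0 && ifname == NULL)
        return -EINVAL;         /* caller gave neither key */

    for (size_t i = 0; i < ndevs; i++) {
        int match = ifindex > 0 ? devs[i].ifindex == ifindex
                                : strcmp(devs[i].name, ifname) == 0;
        if (match) {
            *result = &devs[i];
            return 0;
        }
    }
    return -ENODEV;
}
```

When both keys are supplied, the index takes precedence in this sketch.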

06 Sep, 2009 · 1 commit

net_sched: reintroduce dev->qdisc for use by sch_api · af356afa

Committed by Patrick McHardy on Sep 04, 2009

Currently the multiqueue integration with the qdisc API suffers from
a few problems:

- with multiple queues, all root qdiscs use the same handle. This means
  they can't be exposed to userspace in a backwards compatible fashion.

- all API operations always refer to queue number 0. Newly created
  qdiscs are automatically shared between all queues, its not possible
  to address individual queues or restore multiqueue behaviour once a
  shared qdisc has been attached.

- Dumps only contain the root qdisc of queue 0, in case of non-shared
  qdiscs this means the statistics are incomplete.

This patch reintroduces dev->qdisc, which points to the (single) root qdisc
from userspace's point of view. Currently it either points to the first
(non-shared) default qdisc, or a qdisc shared between all queues. The
following patches will introduce a classful dummy qdisc, which will be used
as root qdisc and contain the per-queue qdiscs as children.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

03 Sep, 2009 · 1 commit

vlan: multiqueue vlan device · 2e59af3d

Committed by Eric Dumazet on Sep 02, 2009

vlan devices are currently not multi-queue capable.

We can do that with a new rtnl_link_ops method,
get_tx_queues(), called from rtnl_create_link()

This new method gets num_tx_queues/real_num_tx_queues
from real device.

register_vlan_device() is also handled.
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Jul, 2009 · 1 commit

net: move and export get_net_ns_by_pid · 30ffee84

Committed by Johannes Berg on Jul 10, 2009

The function get_net_ns_by_pid(), to get a network
namespace from a pid_t, will be required in cfg80211
as well. Therefore, let's move it to net_namespace.c
and export it. We can't make it a static inline in
the !NETNS case because it needs to verify that the
given pid even exists (and return -ESRCH).
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

25 Feb, 2009 · 1 commit

netlink: change nlmsg_notify() return value logic · 1ce85fe4

Committed by Pablo Neira Ayuso on Feb 24, 2009

This patch changes the return value of nlmsg_notify() as follows:

If NETLINK_BROADCAST_ERROR is set by any of the listeners and
an error in the delivery happened, return the broadcast error;
else if there are no listeners apart from the socket that
requested a change with the echo flag, return the result of the
unicast notification. Thus, with this patch, the unicast
notification is handled in the same way of a broadcast listener
that has set the NETLINK_BROADCAST_ERROR socket flag.

This patch is useful in case that the caller of nlmsg_notify()
wants to know the result of the delivery of a netlink notification
(including the broadcast delivery) and take any action in case
that the delivery failed. For example, ctnetlink can drop packets
if the event delivery failed to provide reliable logging and
state-synchronization at the cost of dropping packets.

This patch also modifies the rtnetlink code to ignore the return
value of rtnl_notify() in all callers. The function rtnl_notify()
(before this patch) returned the error of the unicast notification
which makes rtnl_set_sk_err() reports errors to all listeners. This
is not of any help since the origin of the change (the socket that
requested the echoing) notices the ENOBUFS error if the notification
fails and should resync itself.
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
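The return-value rule from the first paragraph of this commit can be written down as a small pure function. This is a sketch under assumptions: `fake_notify_result()` is a hypothetical name, and it assumes the broadcast step reports `-ESRCH` when nobody was listening, a convention the commit message itself does not spell out.

```c
#include <assert.h>
#include <errno.h>

/*
 * Combine the outcome of a broadcast and an optional unicast echo.
 *   has_group: non-zero if a broadcast was attempted
 *   bcast_err: 0 or negative errno from the broadcast
 *              (assumed to be -ESRCH when there were no listeners)
 *   echo:      non-zero if the requester asked for an echo
 *   ucast_err: 0 or negative errno from the unicast echo
 */
static int fake_notify_result(int has_group, int bcast_err,
                              int echo, int ucast_err)
{
    int err = 0;

    assert(bcast_err <= 0 && ucast_err <= 0);

    if (has_group)
        err = bcast_err;

    /* A real broadcast error wins; otherwise report the unicast result. */
    if (echo && (err == 0 || err == -ESRCH))
        err = ucast_err;

    return err;
}
```

A caller that wants reliable delivery can then act on a single return value, which is the use case the commit describes for ctnetlink.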

20 Nov, 2008 · 2 commits

netdev: introduce dev_get_stats() · eeda3fd6

Committed by Stephen Hemminger on Nov 19, 2008

In order for the network device ops get_stats call to be immutable, the handling
of the default internal network device stats block has to be changed. Add a new
helper function which replaces the old use of internal_get_stats.

Note: change return code to make it clear that the caller should not
go changing the returned statistics.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
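One way such a helper could look, sketched in userspace C. The types and names (`struct fake_dev`, `struct fake_dev_ops`, `fake_dev_get_stats()`, the example driver) are hypothetical, and the fallback to a per-device stats block is an assumption about the behaviour, not a copy of the kernel function. The point being illustrated is the `const` return type, which tells callers not to modify the returned statistics.

```c
#include <assert.h>
#include <stddef.h>

struct fake_stats {
    unsigned long rx_packets;
    unsigned long tx_packets;
};

struct fake_dev;

struct fake_dev_ops {           /* hypothetical immutable ops table */
    struct fake_stats *(*get_stats)(struct fake_dev *dev);
};

struct fake_dev {
    const struct fake_dev_ops *ops;   /* may be NULL or lack get_stats */
    struct fake_stats stats;          /* default internal stats block */
};

/* Return the device statistics; callers get a read-only view. */
static const struct fake_stats *fake_dev_get_stats(struct fake_dev *dev)
{
    assert(dev != NULL);

    if (dev->ops != NULL && dev->ops->get_stats != NULL)
        return dev->ops->get_stats(dev);

    return &dev->stats;         /* no driver hook: use the default block */
}

/* Example driver that keeps its own counters. */
static struct fake_stats example_driver_stats = { 10, 20 };

static struct fake_stats *example_get_stats(struct fake_dev *dev)
{
    (void)dev;
    return &example_driver_stats;
}

static const struct fake_dev_ops example_ops = { example_get_stats };
```

Because the ops table is `const` and shared, the helper never has to patch a function pointer into it to provide the default.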

netdev: network device operations infrastructure · d314774c

Committed by Stephen Hemminger on Nov 19, 2008

This patch changes the network device internal API to move administrative
operations out of the network device structure and into a separate structure.

This patch involves some hackery to maintain compatibility between the
new and old model, so all 300+ drivers don't have to be changed at once.
For drivers that aren't converted yet, the netdevice_ops virt function list
still resides in the net_device structure. For old protocols, the new
net_device_ops are copied out to the old net_device pointers.

After the transition is completed the nag message can be changed to
a WARN_ON, and the compatibility code can be made configurable.

Some function pointers aren't moved:
* destructor can't be in net_device_ops because
  it may need to be referenced after the module is unloaded.
* neighbor setup is manipulated in a couple of places that need special
  consideration
* hard_start_xmit is in the fast path for transmit.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Nov, 2008 · 1 commit

rtnetlink: propagate error from dev_change_flags in do_setlink() · 5f9021cf

Committed by Johannes Berg on Nov 16, 2008

Unlike ifconfig, iproute doesn't report an error when setting
an interface up fails:

(example: put wireless network mac80211 interface into repeater mode
with iwconfig but do not set a peer MAC address, it should fail with
-ENOLINK)

without patch:
# ip link set wlan0 up ; echo $?
0
# 

with patch:
# ip link set wlan0 up ; echo $?
RTNETLINK answers: Link has been severed
2
# 

Propagate the return value from dev_change_flags() to fix this.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Tested-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

17 Oct, 2008 · 1 commit

net: Remove CONFIG_KMOD from net/ (towards removing CONFIG_KMOD entirely) · 95a5afca

Committed by Johannes Berg on Oct 16, 2008

Some code here depends on CONFIG_KMOD to not try to load
protocol modules or similar, replace by CONFIG_MODULES
where more than just request_module depends on CONFIG_KMOD
and also use try_then_request_module in ebtables.
Signed-off-by: Johannes Berg <johannes@sipsolutions.net>
Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

08 Oct, 2008 · 1 commit

net: Fix netdev_run_todo dead-lock · 58ec3b4d

Committed by Herbert Xu on Oct 07, 2008

Benjamin Thery tracked down a bug that explains many instances
of the error

unregister_netdevice: waiting for %s to become free. Usage count = %d

It turns out that netdev_run_todo can dead-lock with itself if
a second instance of it is run in a thread that will then free
a reference to the device waited on by the first instance.

The problem is really quite silly.  We were trying to create
parallelism where none was required.  As netdev_run_todo always
follows a RTNL section, and that todo tasks can only be added
with the RTNL held, by definition you should only need to wait
for the very ones that you've added and be done with it.

There is no need for a second mutex or spinlock.

This is exactly what the following patch does.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

23 Sep, 2008 · 1 commit

net: network device name ifalias support · 0b815a1a

Committed by Stephen Hemminger on Sep 22, 2008

This patch adds support for keeping an additional character alias
associated with a network interface. This is useful for maintaining
the SNMP ifAlias value which is a user defined value. Routers use this
to hold information like which circuit or line it is connected to. It
is just an arbitrary text label on the network device.

There are two exposed interfaces with this patch, the value can be
read/written either via netlink or sysfs.

This could be maintained just by the snmp daemon, but it is more
generally useful for other management tools, and the kernel is good
place to act as an agreed upon interface to store it.
Signed-off-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Jul, 2008 · 1 commit

netdev: Allocate multiple queues for TX. · e8a0464c

Committed by David S. Miller on Jul 17, 2008

alloc_netdev_mq() now allocates an array of netdev_queue
structures for TX, based upon the queue_count argument.

Furthermore, all accesses to the TX queues are now vectored
through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
interfaces.  This makes it easy to grep the tree for all
things that want to get to a TX queue of a net device.

Problem spots which are not really multiqueue aware yet, and
only work with one queue, can easily be spotted by grepping
for all netdev_get_tx_queue() calls that pass in a zero index.
Signed-off-by: David S. Miller <davem@davemloft.net>
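The accessor pattern this commit describes (one array of TX queues, reached only through a getter and an iterator) might be sketched like this in userspace C. All `fake_` names are hypothetical, and the allocation helper only models the queue array, not the rest of device allocation.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

struct fake_txq {
    unsigned long tx_packets;
};

struct fake_dev {
    unsigned int num_tx_queues;
    struct fake_txq *tx;        /* array of num_tx_queues entries */
};

/* Allocate queue_count zeroed TX queues for the device. */
static int fake_alloc_queues(struct fake_dev *dev, unsigned int queue_count)
{
    assert(dev != NULL && queue_count > 0);

    dev->tx = calloc(queue_count, sizeof(*dev->tx));
    if (dev->tx == NULL)
        return -ENOMEM;
    dev->num_tx_queues = queue_count;
    return 0;
}

/* Single access point: every user of a TX queue goes through here. */
static struct fake_txq *fake_get_tx_queue(const struct fake_dev *dev,
                                          unsigned int index)
{
    assert(index < dev->num_tx_queues);
    return &dev->tx[index];
}

/* Call fn once per TX queue, passing arg through unchanged. */
static void fake_for_each_tx_queue(struct fake_dev *dev,
                                   void (*fn)(struct fake_dev *,
                                              struct fake_txq *, void *),
                                   void *arg)
{
    for (unsigned int i = 0; i < dev->num_tx_queues; i++)
        fn(dev, &dev->tx[i], arg);
}

/* Example callback: count how many queues were visited. */
static void count_queue(struct fake_dev *dev, struct fake_txq *q, void *arg)
{
    (void)dev;
    (void)q;
    (*(unsigned int *)arg)++;
}
```

Funnelling access through one getter is what makes the "grep for calls that pass a zero index" audit from the commit message practical.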

09 Jul, 2008 · 1 commit

netdev: Move rest of qdisc state into struct netdev_queue · b0e1e646

Committed by David S. Miller on Jul 08, 2008

Now qdisc, qdisc_sleeping, and qdisc_list also live there.
Signed-off-by: David S. Miller <davem@davemloft.net>

04 Jun, 2008 · 1 commit

netlink: Improve returned error codes · bc3ed28c

Committed by Thomas Graf on Jun 03, 2008

Make nlmsg_trim(), nlmsg_cancel(), genlmsg_cancel(), and
nla_nest_cancel() void functions.

Return -EMSGSIZE instead of -1 if the provided message buffer is not
big enough.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
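The two conventions in this commit, a `void` trim helper and `-EMSGSIZE` for a buffer that is too small, can be illustrated with a toy message buffer. The `fake_` types and functions are hypothetical and far simpler than netlink's real attribute encoding; they only show the error-handling shape.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

struct fake_msg {
    unsigned char buf[32];
    size_t len;                 /* bytes used so far */
};

/* Append raw bytes, or fail with -EMSGSIZE if they do not fit. */
static int fake_put(struct fake_msg *msg, const void *data, size_t datalen)
{
    if (datalen > sizeof(msg->buf) - msg->len)
        return -EMSGSIZE;

    memcpy(msg->buf + msg->len, data, datalen);
    msg->len += datalen;
    return 0;
}

/* Roll the message back to an earlier length; cannot fail, hence void. */
static void fake_trim(struct fake_msg *msg, size_t mark)
{
    assert(mark <= msg->len);
    msg->len = mark;
}

/* Add a name and an address as one unit, undoing partial output on error. */
static int fake_fill(struct fake_msg *msg, const char *name,
                     const void *addr, size_t addrlen)
{
    size_t mark = msg->len;

    if (fake_put(msg, name, strlen(name) + 1) < 0 ||
        fake_put(msg, addr, addrlen) < 0) {
        fake_trim(msg, mark);
        return -EMSGSIZE;       /* a meaningful errno instead of -1 */
    }
    return 0;
}
```

Callers can pass the result straight up the stack, because the value is already a proper errno.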

22 May, 2008 · 1 commit

net: The dev->get_stats pointer is not NULL nowadays. · 96e74088

Committed by Pavel Emelyanov on May 21, 2008

And neither is the pointer it returns, but sysfs and netlink still
check for both cases.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

24 Apr, 2008 · 1 commit

[RTNETLINK]: Fix bogus ASSERT_RTNL warning · c9c1014b

Committed by Patrick McHardy on Apr 23, 2008

ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is
held. This triggers bogus warnings when running in atomic context, which
f.e. happens when adding secondary unicast addresses through
macvlan or vlan or when synchronizing multicast addresses from
wireless devices.

Mid-term we might want to consider moving all address updates
to process context since the locking seems overly complicated,
for now just fix the bogus warning by changing ASSERT_RTNL to
use mutex_is_locked().
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

16 Apr, 2008 · 2 commits

[RTNL]: Introduce the rtnl_kill_links helper. · 669f87ba

Committed by Pavel Emelyanov on Apr 16, 2008

This one is responsible for calling ->dellink on each net
device found in net to help with vlan net_exit hook in the
nearest future.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

[RTNL]: Relax for_each_netdev_safe in __rtnl_link_unregister. · 3a931a80

Committed by Pavel Emelyanov on Apr 16, 2008

Each potential list_del (happening from inside a ->dellink call)
is followed by goto restart, so there's no need in _safe iteration.
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Acked-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

26 Mar, 2008 · 2 commits

[NET] NETNS: Omit sock->sk_net without CONFIG_NET_NS. · 3b1e0a65

Committed by YOSHIFUJI Hideaki on Mar 26, 2008

Introduce per-sock inlines: sock_net(), sock_net_set()
and per-inet_timewait_sock inlines: twsk_net(), twsk_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>

[NET] NETNS: Omit net_device->nd_net without CONFIG_NET_NS. · c346dca1

Committed by YOSHIFUJI Hideaki on Mar 25, 2008

Introduce per-net_device inlines: dev_net(), dev_net_set().
Without CONFIG_NET_NS, no namespace other than &init_net exists.
Let's explicitly define them to help compiler optimizations.
Signed-off-by: YOSHIFUJI Hideaki <yoshfuji@linux-ipv6.org>
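The accessor idea can be sketched as below. The `fake_` names and the reduced `struct net` are hypothetical; only the shape is taken from the commit: with `CONFIG_NET_NS` undefined, the field is compiled out and the getter returns the address of the single initial namespace, which a compiler can treat as a constant.

```c
#include <assert.h>
#include <stddef.h>

struct net {
    int id;                     /* reduced stand-in for the real structure */
};

static struct net init_net = { 0 };

struct fake_dev {
#ifdef CONFIG_NET_NS
    struct net *nd_net;         /* only present when namespaces are enabled */
#endif
    int ifindex;
};

/* Return the namespace the device lives in. */
static inline struct net *fake_dev_net(const struct fake_dev *dev)
{
#ifdef CONFIG_NET_NS
    return dev->nd_net;
#else
    (void)dev;
    return &init_net;           /* the only namespace that can exist */
#endif
}

/* Record the namespace; a no-op when namespaces are compiled out. */
static inline void fake_dev_net_set(struct fake_dev *dev, struct net *net)
{
    assert(net != NULL);
#ifdef CONFIG_NET_NS
    dev->nd_net = net;
#else
    (void)dev;
#endif
}
```

Code that always goes through the two inlines compiles unchanged in both configurations.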

24 Feb, 2008 · 1 commit

[RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK · 1840bb13

Committed by Thomas Graf on Feb 23, 2008

RTM_NEWLINK allows for already existing links to be modified. For this
purpose do_setlink() is called which expects address attributes with a
payload length of at least dev->addr_len. This patch adds the necessary
validation for the RTM_NEWLINK case.

The address length for links to be created is not checked for now as the
actual attribute length is used when copying the address to the netdevice
structure. It might make sense to report an error if less than addr_len
bytes are provided but enforcing this might break drivers trying to be
smart with not transmitting all zero addresses.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>
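The length check described in the first paragraph of this commit could look roughly like this. `struct fake_dev`, `struct fake_attr` and `fake_validate_linkmsg()` are hypothetical, a `NULL` attribute pointer models "attribute not present", and returning `-EINVAL` is an assumption for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct fake_dev {
    unsigned char addr_len;     /* hardware address length of the device */
};

struct fake_attr {              /* hypothetical parsed attribute */
    const unsigned char *data;
    size_t len;                 /* payload length in bytes */
};

/*
 * Reject address attributes whose payload is shorter than the device's
 * address length. Absent attributes (NULL) are accepted as-is.
 */
static int fake_validate_linkmsg(const struct fake_dev *dev,
                                 const struct fake_attr *address,
                                 const struct fake_attr *broadcast)
{
    assert(dev != NULL);

    if (address != NULL && address->len < dev->addr_len)
        return -EINVAL;
    if (broadcast != NULL && broadcast->len < dev->addr_len)
        return -EINVAL;
    return 0;
}
```

Running the check before any state is modified means a malformed request leaves the device untouched.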

20 Feb, 2008 · 1 commit

[RTNL]: Add missing link netlink attribute policy definitions · 76e87306

Committed by Thomas Graf on Feb 19, 2008

IFLA_LINK is no longer a write-only attribute on the kernel side and
must thus be validated. Same goes for the newly introduced
IFLA_LINKINFO.

Fixes undefined behaviour if either of the attributes are not well
formed.
Signed-off-by: Thomas Graf <tgraf@suug.ch>
Signed-off-by: David S. Miller <davem@davemloft.net>

18 Feb, 2008 · 1 commit

Revert "[RTNETLINK]: Send a single notification on device state changes." · 93b2d4a2

Committed by David S. Miller on Feb 17, 2008

This reverts commit 45b50354.

It breaks locking around dev->link_mode and causes
other bootup problems.
Signed-off-by: David S. Miller <davem@davemloft.net>

13 Feb, 2008 · 1 commit

[RTNETLINK]: Send a single notification on device state changes. · 45b50354

Committed by Laszlo Attila Toth on Feb 12, 2008

In do_setlink() a single notification is sent at the end of the
function if any modification occurred. If the address has been changed,
another notification is sent.

Both of them are required because originally only the NETDEV_CHANGEADDR
notification was sent and although device state change implies address
change, some programs may expect the original notification. It remains
for compatibility.

If set_operstate() is called from do_setlink(), it doesn't send a
notification, only if it is called from rtnl_create_link() as earlier.
Signed-off-by: Laszlo Attila Toth <panther@balabit.hu>
Signed-off-by: David S. Miller <davem@davemloft.net>

05 Feb, 2008 · 1 commit

[NET] rtnetlink.c: remove no longer used functions · 03245ce2

Committed by Adrian Bunk on Feb 05, 2008

This patch removes the following no longer used functions:
- rtattr_parse()
- rtattr_strlcpy()
- __rtattr_parse_nested_compat()
Signed-off-by: Adrian Bunk <bunk@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

29 Jan, 2008 · 6 commits

[NETNS]: Namespace stop vs 'ip r l' race. · 775516bf

Committed by Denis V. Lunev on Jan 18, 2008

During network namespace stop process kernel side netlink sockets
belonging to a namespace should be closed. They should not prevent
namespace to stop, so they do not increment namespace usage
counter. Though this counter will be put during last sock_put.

The replacement of the correct netns for init_ns solves the problem
only partially as socket to be stopped until proper stop is a valid
netlink kernel socket and can be looked up by the user processes. This
is not a problem until it resides in initial namespace (no processes
inside this net), but this is not true for init_net.

So, hold the reference for a socket, remove it from lookup tables and
only after that change namespace and perform a last put.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Consolidate kernel netlink socket destruction. · b7c6ba6e

Committed by Denis V. Lunev on Jan 28, 2008

Create a specific helper for netlink kernel socket disposal. This just
let the code look better and provides a ground for proper disposal
inside a namespace.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETNS]: Memory leak on network namespace stop. · 4f84d82f

Committed by Denis V. Lunev on Jan 18, 2008

Network namespace allocates 2 kernel netlink sockets, fibnl &
rtnl. These sockets should be disposed properly, i.e. by
sock_release. Plain sock_put is not enough.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Tested-by: Alexey Dobriyan <adobriyan@openvz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Make the netlink methods in rtnetlink handle multiple network namespaces · 4b3da706

Committed by Eric W. Biederman on Nov 19, 2007

After the previous prep work this just consists of removing checks
limiting the code to work in the initial network namespace, and
updating rtmsg_ifinfo so we can generate events for devices in
something other than the initial network namespace.

Referring to network other network devices like the IFLA_LINK
and IFLA_MASTER attributes do, gets interesting if those network
devices happen to be in other network namespaces.  Currently
ifindex numbers are allocated globally so I have taken the path
of least resistance and not still report the information even
though the devices they are talking about are invisible.

If applications start getting confused or when ifindex
numbers become local to the network namespace we may need
to do something different in the future.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: Denis V. Lunev <den@openz.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Make rtnetlink infrastructure network namespace aware (v3) · 97c53cac

Committed by Denis V. Lunev on Nov 19, 2007

After this patch none of the netlink callback support anything
except the initial network namespace but the rtnetlink infrastructure
now handles multiple network namespaces.

Changes from v2:
- IPv6 addrlabel processing

Changes from v1:
- no need for special rtnl_unlock handling
- fixed IPv6 ndisc
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Modify all rtnetlink methods to only work in the initial namespace (v2) · b854272b

Committed by Denis V. Lunev on Dec 01, 2007

Before I can enable rtnetlink to work in all network namespaces I need
to be certain that something won't break.  So this patch deliberately
disables all of the rtnetlink methods in everything except the
initial network namespace.  After the methods have been audited this
extra check can be disabled.

Changes from v1:
- added IPv6 addrlabel protection
Signed-off-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

21 Jan, 2008 · 1 commit

[NET]: rtnl_link: fix use-after-free · 68365458

Committed by Patrick McHardy on Jan 20, 2008

When unregistering the rtnl_link_ops, all existing devices using
the ops are destroyed. With nested devices this may lead to a
use-after-free despite the use of for_each_netdev_safe() in case
the upper device is next in the device list and is destroyed
by the NETDEV_UNREGISTER notifier.

The easy fix is to restart scanning the device list after removing
a device. Alternatively we could add new devices to the front of
the list to avoid having dependent devices follow the device they
depend on. A third option would be to only restart scanning if
dev->iflink of the next device matches dev->ifindex of the current
one. For now this seems like the safest solution.

With this patch, the veth rtnl_link_ops unregistration can use
rtnl_link_unregister() directly since it now also handles destruction
of multiple devices at once.
Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
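The "restart scanning after removing a device" fix can be demonstrated with a small linked list in which destroying a device also destroys whatever is stacked on top of it. This is a userspace sketch with hypothetical names (`struct fake_dev`, `dev_add()`, `dev_destroy()`, `link_unregister()`, `dev_count()`); the recursive teardown stands in for what the commit attributes to the NETDEV_UNREGISTER notifier.

```c
#include <assert.h>
#include <stdlib.h>

struct fake_dev {
    int ops_id;                 /* which link type created this device */
    struct fake_dev *lower;     /* device this one is stacked on, or NULL */
    struct fake_dev *next;
};

/* Append a device at the tail, so upper devices follow their lower one. */
static struct fake_dev *dev_add(struct fake_dev **head, int ops_id,
                                struct fake_dev *lower)
{
    struct fake_dev *d = malloc(sizeof(*d));

    assert(d != NULL);
    d->ops_id = ops_id;
    d->lower = lower;
    d->next = NULL;
    while (*head != NULL)
        head = &(*head)->next;
    *head = d;
    return d;
}

/* Destroy dev together with every device stacked on top of it. */
static void dev_destroy(struct fake_dev **head, struct fake_dev *dev)
{
again:
    for (struct fake_dev *d = *head; d != NULL; d = d->next) {
        if (d->lower == dev) {
            dev_destroy(head, d);
            goto again;         /* list changed: rescan from the head */
        }
    }
    for (struct fake_dev **pp = head; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == dev) {
            *pp = dev->next;    /* unlink, then free */
            free(dev);
            return;
        }
    }
}

/*
 * Remove all devices of one link type. A saved "next" pointer could
 * already be freed by dev_destroy(), so restart after every removal.
 */
static void link_unregister(struct fake_dev **head, int ops_id)
{
restart:
    for (struct fake_dev *d = *head; d != NULL; d = d->next) {
        if (d->ops_id == ops_id) {
            dev_destroy(head, d);
            goto restart;
        }
    }
}

static size_t dev_count(const struct fake_dev *head)
{
    size_t n = 0;

    for (; head != NULL; head = head->next)
        n++;
    return n;
}
```

Restarting costs extra passes over the list, but it never dereferences a pointer that a nested removal may have freed.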

27 Oct, 2007 · 1 commit

[NETNS]: Fix get_net_ns_by_pid · ceaa79c4

Committed by Eric W. Biederman on Oct 26, 2007

The pid namespace patches changed the semantics of
find_task_by_pid without breaking the compile resulting
in get_net_ns_by_pid doing the wrong thing.

So switch to using the intended find_task_by_vpid.

Combined with Denis' earlier patch to make netlink traffic
fully synchronous the inadvertent race I introduced with
accessing current is actually removed.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

20 Oct, 2007 · 1 commit

Make access to task's nsproxy lighter · cf7b708c

Committed by Pavel Emelyanov on Oct 18, 2007

When someone wants to deal with some other task's namespaces it has to lock
the task and then to get the desired namespace if the one exists.  This is
slow on read-only paths and may be impossible in some cases.

E.g.  Oleg recently noticed a race between unshare() and the (sent for
review in cgroups) pid namespaces - when the task notifies the parent it
has to know the parent's namespace, but taking the task_lock() is
impossible there - the code is under write locked tasklist lock.

On the other hand switching the namespace on task (daemonize) and releasing
the namespace (after the last task exit) is rather rare operation and we
can sacrifice its speed to solve the issues above.

The access to other task namespaces is proposed to be performed
like this:

     rcu_read_lock();
     nsproxy = task_nsproxy(tsk);
     if (nsproxy != NULL) {
             /*
              * work with the namespaces here
              * e.g. get the reference on one of them
              */
     } /*
        * NULL task_nsproxy() means that this task is
        * almost dead (zombie)
        */
     rcu_read_unlock();

This patch has passed the review by Eric and Oleg :) and,
of course, tested.

[clg@fr.ibm.com: fix unshare()]
[ebiederm@xmission.com: Update get_net_ns_by_pid]
Signed-off-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Serge Hallyn <serue@us.ibm.com>
Signed-off-by: Cedric Le Goater <clg@fr.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

11 Oct, 2007 · 5 commits

[NET]: make netlink user -> kernel interface synchronious · cd40b7d3

Committed by Denis V. Lunev on Oct 10, 2007

This patch makes processing of netlink user -> kernel messages synchronous.
This change was inspired by the talk with Alexey Kuznetsov about current
netlink messages processing. He says that he was badly wrong when he introduced
asynchronous user -> kernel communication.

The call netlink_unicast is the only path to send message to the kernel
netlink socket. But, unfortunately, it is also used to send data to the
user.

Before this change the user message has been attached to the socket queue
and sk->sk_data_ready was called. The process has been blocked until all
pending messages were processed. The bad thing is that this processing
may occur in the arbitrary process context.

This patch changes nlk->data_ready callback to get 1 skb and force packet
processing right in the netlink_unicast.

Kernel -> user path in netlink_unicast remains untouched.

EINTR processing in netlink_run_queue was changed. It forces rtnl_lock
drop, but the process remains in the cycle until the message will be fully
processed. So, there is no need to use these kludges now.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: rtnl_unlock cleanups · 1536cc0d

Committed by Denis V. Lunev on Oct 10, 2007

There is no need to process outstanding netlink user->kernel packets
during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv
anymore.

Normal code path is the following:
netlink_sendmsg
   netlink_unicast
       netlink_sendskb
           skb_queue_tail
           netlink_data_ready
               rtnetlink_rcv
                   mutex_lock(&rtnl_mutex);
                   netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
                   mutex_unlock(&rtnl_mutex);

So, it is possible, that packets can be present in the rtnl->sk_receive_queue
during rtnl_unlock, but there is no need to process them at that moment as
rtnetlink_rcv for that packet is pending.
Signed-off-by: Denis V. Lunev <den@openvz.org>
Acked-by: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NETLINK]: Avoid pointer in netlink_run_queue · 0cfad075

Committed by Herbert Xu on Sep 16, 2007

I was looking at Patrick's fix to inet_diag and it occurred
to me that we're using a pointer argument to return values
unnecessarily in netlink_run_queue.  Changing it to return
the value will allow the compiler to generate better code
since the value won't have to be memory-backed.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: netlink support for moving devices between network namespaces. · d8a5ec67

Committed by Eric W. Biederman on Sep 12, 2007

The simplest thing to implement is moving network devices between
namespaces.  However with the same attribute IFLA_NET_NS_PID we can
easily implement creating devices in the destination network
namespace as well.  However that is a little bit trickier so this
patch sticks to what is simple and easy.

A pid is used to identify a process that happens to be a member
of the network namespace we want to move the network device to.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

[NET]: Make the device list and device lookups per namespace. · 881d966b

Committed by Eric W. Biederman on Sep 17, 2007

This patch makes most of the generic device layer network
namespace safe.  This patch makes dev_base_head a
network namespace variable, and then it picks up
a few associated variables.  The functions:
dev_getbyhwaddr
dev_getfirsthwbytype
dev_get_by_flags
dev_get_by_name
__dev_get_by_name
dev_get_by_index
__dev_get_by_index
dev_ioctl
dev_ethtool
dev_load
wireless_process_ioctl

were modified to take a network namespace argument, and
deal with it.

vlan_ioctl_set and brioctl_set were modified so their
hooks will receive a network namespace argument.

So basically anything in the core of the network stack that was
affected by the change of dev_base was modified to handle
multiple network namespaces.  The rest of the network stack was
simply modified to explicitly use &init_net the initial network
namespace.  This can be fixed when those components of the network
stack are modified to handle multiple network namespaces.

For now the ifindex generator is left global.

Fundamentally ifindex numbers are per namespace, or else
we will have corner case problems with migration when
we get that far.

At the same time there are assumptions in the network stack
that the ifindex of a network device won't change.  Making
the ifindex number global seems a good compromise until
the network stack can cope with ifindex changes when
you change namespaces, and the like.
Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
