1. 06 9月, 2009 1 次提交
    • P
      net_sched: reintroduce dev->qdisc for use by sch_api · af356afa
      Patrick McHardy 提交于
      Currently the multiqueue integration with the qdisc API suffers from
      a few problems:
      
      - with multiple queues, all root qdiscs use the same handle. This means
        they can't be exposed to userspace in a backwards compatible fashion.
      
      - all API operations always refer to queue number 0. Newly created
        qdiscs are automatically shared between all queues, its not possible
        to address individual queues or restore multiqueue behaviour once a
        shared qdisc has been attached.
      
      - Dumps only contain the root qdisc of queue 0, in case of non-shared
        qdiscs this means the statistics are incomplete.
      
      This patch reintroduces dev->qdisc, which points to the (single) root qdisc
      from userspace's point of view. Currently it either points to the first
      (non-shared) default qdisc, or a qdisc shared between all queues. The
      following patches will introduce a classful dummy qdisc, which will be used
      as root qdisc and contain the per-queue qdiscs as children.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      af356afa
  2. 03 9月, 2009 1 次提交
  3. 13 7月, 2009 1 次提交
  4. 25 2月, 2009 1 次提交
    • P
      netlink: change nlmsg_notify() return value logic · 1ce85fe4
      Pablo Neira Ayuso 提交于
      This patch changes the return value of nlmsg_notify() as follows:
      
      If NETLINK_BROADCAST_ERROR is set by any of the listeners and
      an error in the delivery happened, return the broadcast error;
      else if there are no listeners apart from the socket that
      requested a change with the echo flag, return the result of the
      unicast notification. Thus, with this patch, the unicast
      notification is handled in the same way of a broadcast listener
      that has set the NETLINK_BROADCAST_ERROR socket flag.
      
      This patch is useful in case that the caller of nlmsg_notify()
      wants to know the result of the delivery of a netlink notification
      (including the broadcast delivery) and take any action in case
      that the delivery failed. For example, ctnetlink can drop packets
      if the event delivery failed to provide reliable logging and
      state-synchronization at the cost of dropping packets.
      
      This patch also modifies the rtnetlink code to ignore the return
      value of rtnl_notify() in all callers. The function rtnl_notify()
      (before this patch) returned the error of the unicast notification
      which makes rtnl_set_sk_err() reports errors to all listeners. This
      is not of any help since the origin of the change (the socket that
      requested the echoing) notices the ENOBUFS error if the notification
      fails and should resync itself.
      Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1ce85fe4
  5. 20 11月, 2008 2 次提交
    • S
      netdev: introduce dev_get_stats() · eeda3fd6
      Stephen Hemminger 提交于
      In order for the network device ops get_stats call to be immutable, the handling
      of the default internal network device stats block has to be changed. Add a new
      helper function which replaces the old use of internal_get_stats.
      
      Note: change return code to make it clear that the caller should not
      go changing the returned statistics.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eeda3fd6
    • S
      netdev: network device operations infrastructure · d314774c
      Stephen Hemminger 提交于
      This patch changes the network device internal API to move adminstrative
      operations out of the network device structure and into a separate structure.
      
      This patch involves some hackery to maintain compatablity between the
      new and old model, so all 300+ drivers don't have to be changed at once.
      For drivers that aren't converted yet, the netdevice_ops virt function list
      still resides in the net_device structure. For old protocols, the new
      net_device_ops are copied out to the old net_device pointers.
      
      After the transistion is completed the nag message can be changed to
      an WARN_ON, and the compatiablity code can be made configurable.
      
      Some function pointers aren't moved:
      * destructor can't be in net_device_ops because
        it may need to be referenced after the module is unloaded.
      * neighbor setup is manipulated in a couple of places that need special
        consideration
      * hard_start_xmit is in the fast path for transmit.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d314774c
  6. 17 11月, 2008 1 次提交
  7. 17 10月, 2008 1 次提交
  8. 08 10月, 2008 1 次提交
    • H
      net: Fix netdev_run_todo dead-lock · 58ec3b4d
      Herbert Xu 提交于
      Benjamin Thery tracked down a bug that explains many instances
      of the error
      
      unregister_netdevice: waiting for %s to become free. Usage count = %d
      
      It turns out that netdev_run_todo can dead-lock with itself if
      a second instance of it is run in a thread that will then free
      a reference to the device waited on by the first instance.
      
      The problem is really quite silly.  We were trying to create
      parallelism where none was required.  As netdev_run_todo always
      follows a RTNL section, and that todo tasks can only be added
      with the RTNL held, by definition you should only need to wait
      for the very ones that you've added and be done with it.
      
      There is no need for a second mutex or spinlock.
      
      This is exactly what the following patch does.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      58ec3b4d
  9. 23 9月, 2008 1 次提交
    • S
      net: network device name ifalias support · 0b815a1a
      Stephen Hemminger 提交于
      This patch add support for keeping an additional character alias
      associated with an network interface. This is useful for maintaining
      the SNMP ifAlias value which is a user defined value. Routers use this
      to hold information like which circuit or line it is connected to. It
      is just an arbitrary text label on the network device.
      
      There are two exposed interfaces with this patch, the value can be
      read/written either via netlink or sysfs.
      
      This could be maintained just by the snmp daemon, but it is more
      generally useful for other management tools, and the kernel is good
      place to act as an agreed upon interface to store it.
      Signed-off-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0b815a1a
  10. 18 7月, 2008 1 次提交
    • D
      netdev: Allocate multiple queues for TX. · e8a0464c
      David S. Miller 提交于
      alloc_netdev_mq() now allocates an array of netdev_queue
      structures for TX, based upon the queue_count argument.
      
      Furthermore, all accesses to the TX queues are now vectored
      through the netdev_get_tx_queue() and netdev_for_each_tx_queue()
      interfaces.  This makes it easy to grep the tree for all
      things that want to get to a TX queue of a net device.
      
      Problem spots which are not really multiqueue aware yet, and
      only work with one queue, can easily be spotted by grepping
      for all netdev_get_tx_queue() calls that pass in a zero index.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e8a0464c
  11. 09 7月, 2008 1 次提交
  12. 04 6月, 2008 1 次提交
  13. 22 5月, 2008 1 次提交
  14. 24 4月, 2008 1 次提交
    • P
      [RTNETLINK]: Fix bogus ASSERT_RTNL warning · c9c1014b
      Patrick McHardy 提交于
      ASSERT_RTNL uses mutex_trylock to test whether the rtnl_mutex is
      held. This bogus warnings when running in atomic context, which
      f.e. happens when adding secondary unicast addresses through
      macvlan or vlan or when synchronizing multicast addresses from
      wireless devices.
      
      Mid-term we might want to consider moving all address updates
      to process context since the locking seems overly complicated,
      for now just fix the bogus warning by changing ASSERT_RTNL to
      use mutex_is_locked().
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c9c1014b
  15. 16 4月, 2008 2 次提交
  16. 26 3月, 2008 2 次提交
  17. 24 2月, 2008 1 次提交
    • T
      [RTNL]: Validate hardware and broadcast address attribute for RTM_NEWLINK · 1840bb13
      Thomas Graf 提交于
      RTM_NEWLINK allows for already existing links to be modified. For this
      purpose do_setlink() is called which expects address attributes with a
      payload length of at least dev->addr_len. This patch adds the necessary
      validation for the RTM_NEWLINK case.
      
      The address length for links to be created is not checked for now as the
      actual attribute length is used when copying the address to the netdevice
      structure. It might make sense to report an error if less than addr_len
      bytes are provided but enforcing this might break drivers trying to be
      smart with not transmitting all zero addresses.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1840bb13
  18. 20 2月, 2008 1 次提交
  19. 18 2月, 2008 1 次提交
  20. 13 2月, 2008 1 次提交
    • L
      [RTNETLINK]: Send a single notification on device state changes. · 45b50354
      Laszlo Attila Toth 提交于
      In do_setlink() a single notification is sent at the end of the
      function if any modification occured. If the address has been changed,
      another notification is sent.
      
      Both of them is required because originally only the NETDEV_CHANGEADDR
      notification was sent and although device state change implies address
      change, some programs may expect the original notification. It remains
      for compatibity.
      
      If set_operstate() is called from do_setlink(), it doesn't send a
      notification, only if it is called from rtnl_create_link() as earlier.
      Signed-off-by: NLaszlo Attila Toth <panther@balabit.hu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45b50354
  21. 05 2月, 2008 1 次提交
  22. 29 1月, 2008 6 次提交
  23. 21 1月, 2008 1 次提交
    • P
      [NET]: rtnl_link: fix use-after-free · 68365458
      Patrick McHardy 提交于
      When unregistering the rtnl_link_ops, all existing devices using
      the ops are destroyed. With nested devices this may lead to a
      use-after-free despite the use of for_each_netdev_safe() in case
      the upper device is next in the device list and is destroyed
      by the NETDEV_UNREGISTER notifier.
      
      The easy fix is to restart scanning the device list after removing
      a device. Alternatively we could add new devices to the front of
      the list to avoid having dependant devices follow the device they
      depend on. A third option would be to only restart scanning if
      dev->iflink of the next device matches dev->ifindex of the current
      one. For now this seems like the safest solution.
      
      With this patch, the veth rtnl_link_ops unregistration can use
      rtnl_link_unregister() directly since it now also handles destruction
      of multiple devices at once.
      Signed-off-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68365458
  24. 27 10月, 2007 1 次提交
  25. 20 10月, 2007 1 次提交
    • P
      Make access to task's nsproxy lighter · cf7b708c
      Pavel Emelyanov 提交于
      When someone wants to deal with some other taks's namespaces it has to lock
      the task and then to get the desired namespace if the one exists.  This is
      slow on read-only paths and may be impossible in some cases.
      
      E.g.  Oleg recently noticed a race between unshare() and the (sent for
      review in cgroups) pid namespaces - when the task notifies the parent it
      has to know the parent's namespace, but taking the task_lock() is
      impossible there - the code is under write locked tasklist lock.
      
      On the other hand switching the namespace on task (daemonize) and releasing
      the namespace (after the last task exit) is rather rare operation and we
      can sacrifice its speed to solve the issues above.
      
      The access to other task namespaces is proposed to be performed
      like this:
      
           rcu_read_lock();
           nsproxy = task_nsproxy(tsk);
           if (nsproxy != NULL) {
                   / *
                     * work with the namespaces here
                     * e.g. get the reference on one of them
                     * /
           } / *
               * NULL task_nsproxy() means that this task is
               * almost dead (zombie)
               * /
           rcu_read_unlock();
      
      This patch has passed the review by Eric and Oleg :) and,
      of course, tested.
      
      [clg@fr.ibm.com: fix unshare()]
      [ebiederm@xmission.com: Update get_net_ns_by_pid]
      Signed-off-by: NPavel Emelyanov <xemul@openvz.org>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@tv-sign.ru>
      Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Cc: Serge Hallyn <serue@us.ibm.com>
      Signed-off-by: NCedric Le Goater <clg@fr.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cf7b708c
  26. 11 10月, 2007 7 次提交
    • D
      [NET]: make netlink user -> kernel interface synchronious · cd40b7d3
      Denis V. Lunev 提交于
      This patch make processing netlink user -> kernel messages synchronious.
      This change was inspired by the talk with Alexey Kuznetsov about current
      netlink messages processing. He says that he was badly wrong when introduced 
      asynchronious user -> kernel communication.
      
      The call netlink_unicast is the only path to send message to the kernel
      netlink socket. But, unfortunately, it is also used to send data to the
      user.
      
      Before this change the user message has been attached to the socket queue
      and sk->sk_data_ready was called. The process has been blocked until all
      pending messages were processed. The bad thing is that this processing
      may occur in the arbitrary process context.
      
      This patch changes nlk->data_ready callback to get 1 skb and force packet
      processing right in the netlink_unicast.
      
      Kernel -> user path in netlink_unicast remains untouched.
      
      EINTR processing for in netlink_run_queue was changed. It forces rtnl_lock
      drop, but the process remains in the cycle until the message will be fully
      processed. So, there is no need to use this kludges now.
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cd40b7d3
    • D
      [NET]: rtnl_unlock cleanups · 1536cc0d
      Denis V. Lunev 提交于
      There is no need to process outstanding netlink user->kernel packets
      during rtnl_unlock now. There is no rtnl_trylock in the rtnetlink_rcv
      anymore.
      
      Normal code path is the following:
      netlink_sendmsg
         netlink_unicast
             netlink_sendskb
                 skb_queue_tail
                 netlink_data_ready
                     rtnetlink_rcv
                         mutex_lock(&rtnl_mutex);
                         netlink_run_queue(sk, qlen, &rtnetlink_rcv_msg);
                         mutex_unlock(&rtnl_mutex);
      
      So, it is possible, that packets can be present in the rtnl->sk_receive_queue
      during rtnl_unlock, but there is no need to process them at that moment as
      rtnetlink_rcv for that packet is pending.
      Signed-off-by: NDenis V. Lunev <den@openvz.org>
      Acked-by: NAlexey Kuznetsov <kuznet@ms2.inr.ac.ru>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1536cc0d
    • H
      [NETLINK]: Avoid pointer in netlink_run_queue · 0cfad075
      Herbert Xu 提交于
      I was looking at Patrick's fix to inet_diag and it occured
      to me that we're using a pointer argument to return values
      unnecessarily in netlink_run_queue.  Changing it to return
      the value will allow the compiler to generate better code
      since the value won't have to be memory-backed.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0cfad075
    • E
      [NET]: netlink support for moving devices between network namespaces. · d8a5ec67
      Eric W. Biederman 提交于
      The simplest thing to implement is moving network devices between
      namespaces.  However with the same attribute IFLA_NET_NS_PID we can
      easily implement creating devices in the destination network
      namespace as well.  However that is a little bit trickier so this
      patch sticks to what is simple and easy.
      
      A pid is used to identify a process that happens to be a member
      of the network namespace we want to move the network device to.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8a5ec67
    • E
      [NET]: Make the device list and device lookups per namespace. · 881d966b
      Eric W. Biederman 提交于
      This patch makes most of the generic device layer network
      namespace safe.  This patch makes dev_base_head a
      network namespace variable, and then it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anthing in the core of the network stack that was
      affected to by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundametally ifindex numbers are per namespace, or else
      we will have corner case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      881d966b
    • E
      [NET]: Support multiple network namespaces with netlink · b4b51029
      Eric W. Biederman 提交于
      Each netlink socket will live in exactly one network namespace,
      this includes the controlling kernel sockets.
      
      This patch updates all of the existing netlink protocols
      to only support the initial network namespace.  Request
      by clients in other namespaces will get -ECONREFUSED.
      As they would if the kernel did not have the support for
      that netlink protocol compiled in.
      
      As each netlink protocol is updated to be multiple network
      namespace safe it can register multiple kernel sockets
      to acquire a presence in the rest of the network namespaces.
      
      The implementation in af_netlink is a simple filter implementation
      at hash table insertion and hash table look up time.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4b51029
    • E
      [NET]: Make device event notification network namespace safe · e9dc8653
      Eric W. Biederman 提交于
      Every user of the network device notifiers is either a protocol
      stack or a pseudo device.  If a protocol stack that does not have
      support for multiple network namespaces receives an event for a
      device that is not in the initial network namespace it quite possibly
      can get confused and do the wrong thing.
      
      To avoid problems until all of the protocol stacks are converted
      this patch modifies all netdev event handlers to ignore events on
      devices that are not in the initial network namespace.
      
      As the rest of the code is made network namespace aware these
      checks can be removed.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9dc8653