1. 11 10月, 2007 24 次提交
    • E
      [PATCH] NET : convert IP route cache garbage collection from softirq processing to a workqueue · 86bba269
      Eric Dumazet 提交于
      When the periodic IP route cache flush is done (every 600 seconds on
      default configuration), some hosts suffer a lot and eventually trigger
      the "soft lockup" message.
      
      dst_run_gc() is doing a scan of a possibly huge list of dst_entries,
      eventually freeing some (less than 1%) of them, while holding the
      dst_lock spinlock for the whole scan.
      
      Then it rearms a timer to redo the full thing 1/10 s later...
      The slowdown can last one minute or so, depending on how active are
      the tcp sessions.
      
      This second version of the patch converts the processing from a softirq
      based one to a workqueue.
      
      Even if the list of entries in garbage_list is huge, host is still
      responsive to softirqs and can make progress.
      
      Instead of resetting gc timer to 0.1 second if one entry was freed in a
      gc run, we do this if more than 10% of entries were freed.
      
      Before patch :
      
      Aug 16 06:21:37 SRV1 kernel: BUG: soft lockup detected on CPU#0!
      Aug 16 06:21:37 SRV1 kernel:
      Aug 16 06:21:37 SRV1 kernel: Call Trace:
      Aug 16 06:21:37 SRV1 kernel:  <IRQ>  [<ffffffff802286f0>] wake_up_process+0x10/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80251e09>] softlockup_tick+0xe9/0x110
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd380>] dst_run_gc+0x0/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802376f3>] run_local_timers+0x13/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802379c7>] update_process_times+0x57/0x90
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80216034>] smp_local_timer_interrupt+0x34/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802165cc>] smp_apic_timer_interrupt+0x5c/0x80
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8020a816>] apic_timer_interrupt+0x66/0x70
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd3d3>] dst_run_gc+0x53/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd3c6>] dst_run_gc+0x46/0x140
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80237148>] run_timer_softirq+0x148/0x1c0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8023340c>] __do_softirq+0x6c/0xe0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8020ad6c>] call_softirq+0x1c/0x30
      Aug 16 06:21:37 SRV1 kernel:  <EOI>  [<ffffffff8020cb34>] do_softirq+0x34/0x90
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff802331cf>] local_bh_enable_ip+0x3f/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422913>] _spin_unlock_bh+0x13/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803dfde8>] rt_garbage_collect+0x1d8/0x320
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803cd4dd>] dst_alloc+0x1d/0xa0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803e1433>] __ip_route_output_key+0x573/0x800
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c02e2>] sock_common_recvmsg+0x32/0x50
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803e16dc>] ip_route_output_flow+0x1c/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80400160>] tcp_v4_connect+0x150/0x610
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803ebf07>] inet_bind_bucket_create+0x17/0x60
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8040cd16>] inet_stream_connect+0xa6/0x2c0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c0bdf>] lock_sock_nested+0xcf/0xe0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80422981>] _spin_lock_bh+0x11/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803be551>] sys_connect+0x71/0xa0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803eee3f>] tcp_setsockopt+0x1f/0x30
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803c030f>] sock_common_setsockopt+0xf/0x20
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff803be4bd>] sys_setsockopt+0x9d/0xc0
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff8028881e>] sys_ioctl+0x5e/0x80
      Aug 16 06:21:37 SRV1 kernel:  [<ffffffff80209c4e>] system_call+0x7e/0x83
      
      After patch : (RT_CACHE_DEBUG set to 2 to get following traces)
      
      dst_total: 75469 delayed: 74109 work_perf: 141 expires: 150 elapsed: 8092 us
      dst_total: 78725 delayed: 73366 work_perf: 743 expires: 400 elapsed: 8542 us
      dst_total: 86126 delayed: 71844 work_perf: 1522 expires: 775 elapsed: 8849 us
      dst_total: 100173 delayed: 68791 work_perf: 3053 expires: 1256 elapsed: 9748 us
      dst_total: 121798 delayed: 64711 work_perf: 4080 expires: 1997 elapsed: 10146 us
      dst_total: 154522 delayed: 58316 work_perf: 6395 expires: 25 elapsed: 11402 us
      dst_total: 154957 delayed: 58252 work_perf: 64 expires: 150 elapsed: 6148 us
      dst_total: 157377 delayed: 57843 work_perf: 409 expires: 400 elapsed: 6350 us
      dst_total: 163745 delayed: 56679 work_perf: 1164 expires: 775 elapsed: 7051 us
      dst_total: 176577 delayed: 53965 work_perf: 2714 expires: 1389 elapsed: 8120 us
      dst_total: 198993 delayed: 49627 work_perf: 4338 expires: 1997 elapsed: 8909 us
      dst_total: 226638 delayed: 46865 work_perf: 2762 expires: 2748 elapsed: 7351 us
      
      I successfully reduced the IP route cache of many hosts by a four factor
      thanks to this patch. Previously, I had to disable "ip route flush cache"
      to avoid crashes.
      Signed-off-by: NEric Dumazet <dada1@cosmosbay.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      86bba269
    • D
      [NET]: #if 0 out net_alloc() for now. · 678aa8e4
      David S. Miller 提交于
      We will undo this once it is actually used.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      678aa8e4
    • E
      [NET]: netlink support for moving devices between network namespaces. · d8a5ec67
      Eric W. Biederman 提交于
      The simplest thing to implement is moving network devices between
      namespaces.  However with the same attribute IFLA_NET_NS_PID we can
      easily implement creating devices in the destination network
      namespace as well.  However that is a little bit trickier so this
      patch sticks to what is simple and easy.
      
      A pid is used to identify a process that happens to be a member
      of the network namespace we want to move the network device to.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d8a5ec67
    • E
      [NET]: Implement network device movement between namespaces · ce286d32
      Eric W. Biederman 提交于
      This patch introduces NETIF_F_NETNS_LOCAL a flag to indicate
      a network device is local to a single network namespace and
      should never be moved.  Useful for pseudo devices that we
      need an instance in each network namespace (like the loopback
      device) and for any device we find that cannot handle multiple
      network namespaces so we may trap them in the initial network
      namespace.
      
      This patch introduces the function dev_change_net_namespace
      a function used to move a network device from one network
      namespace to another.  To the network device nothing
      special appears to happen, to the components of the network
      stack it appears as if the network device was unregistered
      in the network namespace it is in, and a new device
      was registered in the network namespace the device
      was moved to.
      
      This patch sets up a namespace device destructor that
      upon the exit of a network namespace moves all of the
      movable network devices  to the initial network namespace
      so they are not lost.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce286d32
    • E
      [NET]: Factor out __dev_alloc_name from dev_alloc_name · b267b179
      Eric W. Biederman 提交于
      When forcibly changing the network namespace of a device
      I need something that can generate a name for the device
      in the new namespace without overwriting the old name.
      
      __dev_alloc_name provides me that functionality.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b267b179
    • E
      [NET]: Make the device list and device lookups per namespace. · 881d966b
      Eric W. Biederman 提交于
      This patch makes most of the generic device layer network
      namespace safe.  This patch makes dev_base_head a
      network namespace variable, and then it picks up
      a few associated variables.  The functions:
      dev_getbyhwaddr
      dev_getfirsthwbytype
      dev_get_by_flags
      dev_get_by_name
      __dev_get_by_name
      dev_get_by_index
      __dev_get_by_index
      dev_ioctl
      dev_ethtool
      dev_load
      wireless_process_ioctl
      
      were modified to take a network namespace argument, and
      deal with it.
      
      vlan_ioctl_set and brioctl_set were modified so their
      hooks will receive a network namespace argument.
      
      So basically anthing in the core of the network stack that was
      affected to by the change of dev_base was modified to handle
      multiple network namespaces.  The rest of the network stack was
      simply modified to explicitly use &init_net the initial network
      namespace.  This can be fixed when those components of the network
      stack are modified to handle multiple network namespaces.
      
      For now the ifindex generator is left global.
      
      Fundametally ifindex numbers are per namespace, or else
      we will have corner case problems with migration when
      we get that far.
      
      At the same time there are assumptions in the network stack
      that the ifindex of a network device won't change.  Making
      the ifindex number global seems a good compromise until
      the network stack can cope with ifindex changes when
      you change namespaces, and the like.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      881d966b
    • E
      [NET]: Support multiple network namespaces with netlink · b4b51029
      Eric W. Biederman 提交于
      Each netlink socket will live in exactly one network namespace,
      this includes the controlling kernel sockets.
      
      This patch updates all of the existing netlink protocols
      to only support the initial network namespace.  Request
      by clients in other namespaces will get -ECONREFUSED.
      As they would if the kernel did not have the support for
      that netlink protocol compiled in.
      
      As each netlink protocol is updated to be multiple network
      namespace safe it can register multiple kernel sockets
      to acquire a presence in the rest of the network namespaces.
      
      The implementation in af_netlink is a simple filter implementation
      at hash table insertion and hash table look up time.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b4b51029
    • E
      [NET]: Make device event notification network namespace safe · e9dc8653
      Eric W. Biederman 提交于
      Every user of the network device notifiers is either a protocol
      stack or a pseudo device.  If a protocol stack that does not have
      support for multiple network namespaces receives an event for a
      device that is not in the initial network namespace it quite possibly
      can get confused and do the wrong thing.
      
      To avoid problems until all of the protocol stacks are converted
      this patch modifies all netdev event handlers to ignore events on
      devices that are not in the initial network namespace.
      
      As the rest of the code is made network namespace aware these
      checks can be removed.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e9dc8653
    • E
      [NET]: Initialize the network namespace of network devices. · 6d34b1c2
      Eric W. Biederman 提交于
      Except for carefully selected pseudo devices all network
      interfaces should start out in the initial network namespace.
      Ultimately it will be register_netdev that examines what
      dev->nd_net is set to and places a device in a network namespace.
      
      This patch modifies alloc_netdev to initialize the network
      namespace a device is in with the initial network namespace.
      This gets it right for the vast majority of devices so their
      drivers need not be modified and for those few pseudo devices
      that need something different they can change this parameter
      before calling register_netdevice.
      
      The network namespace parameter on a network device is not
      reference counted as the devices are inside of a network namespace
      and cannot remain in that namespace past the lifetime of the
      network namespace.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6d34b1c2
    • E
      [NET]: Make socket creation namespace safe. · 1b8d7ae4
      Eric W. Biederman 提交于
      This patch passes in the namespace a new socket should be created in
      and has the socket code do the appropriate reference counting.  By
      virtue of this all socket create methods are touched.  In addition
      the socket create methods are modified so that they will fail if
      you attempt to create a socket in a non-default network namespace.
      
      Failing if we attempt to create a socket outside of the default
      network namespace ensures that as we incrementally make the network stack
      network namespace aware we will not export functionality that someone
      has not audited and made certain is network namespace safe.
      Allowing us to partially enable network namespaces before all of the
      exotic protocols are supported.
      
      Any protocol layers I have missed will fail to compile because I now
      pass an extra parameter into the socket creation code.
      
      [ Integrated AF_IUCV build fixes from Andrew Morton... -DaveM ]
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1b8d7ae4
    • E
      [NET]: Make /proc/net per network namespace · 457c4cbc
      Eric W. Biederman 提交于
      This patch makes /proc/net per network namespace.  It modifies the global
      variables proc_net and proc_net_stat to be per network namespace.
      The proc_net file helpers are modified to take a network namespace argument,
      and all of their callers are fixed to pass &init_net for that argument.
      This ensures that all of the /proc/net files are only visible and
      usable in the initial network namespace until the code behind them
      has been updated to be handle multiple network namespaces.
      
      Making /proc/net per namespace is necessary as at least some files
      in /proc/net depend upon the set of network devices which is per
      network namespace, and even more files in /proc/net have contents
      that are relevant to a single network namespace.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      457c4cbc
    • E
      [NET]: Basic network namespace infrastructure. · 5f256bec
      Eric W. Biederman 提交于
      This is the basic infrastructure needed to support network
      namespaces.  This infrastructure is:
      - Registration functions to support initializing per network
        namespace data when a network namespaces is created or destroyed.
      
      - struct net.  The network namespace data structure.
        This structure will grow as variables are made per network
        namespace but this is the minimal starting point.
      
      - Functions to grab a reference to the network namespace.
        I provide both get/put functions that keep a network namespace
        from being freed.  And hold/release functions serve as weak references
        and will warn if their count is not zero when the data structure
        is freed.  Useful for dealing with more complicated data structures
        like the ipv4 route cache.
      
      - A list of all of the network namespaces so we can iterate over them.
      
      - A slab for the network namespace data structure allowing leaks
        to be spotted.
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5f256bec
    • J
      [NET]: Change type of owner in sock_lock_t to int, rename · d2e9117c
      John Heffner 提交于
      The type of owner in sock_lock_t is currently (struct sock_iocb *),
      presumably for historical reasons.  It is never used as this type, only
      tested as NULL or set to (void *)1.  For clarity, this changes it to type
      int, and renames to owned, to avoid any possible type casting errors.
      Signed-off-by: NJohn Heffner <jheffner@psc.edu>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d2e9117c
    • R
      [PKTGEN]: Remove softirq scheduling. · b163911f
      Robert Olsson 提交于
      It's not a job for pktgen.
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b163911f
    • R
      [PKTGEN]: Multiqueue support. · 45b270f8
      Robert Olsson 提交于
      Below some pktgen support to send into different TX queues.
      This can of course be feed into input queues on other machines
      Signed-off-by: NRobert Olsson <robert.olsson@its.uu.se>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      45b270f8
    • J
      [ETHTOOL]: Internal cleanup of ethtool_value-related handlers · 13c99b24
      Jeff Garzik 提交于
      Several get/set functions can be handled by a passing the ethtool_op
      function pointer directly to a generic function.  This permits deletion
      of a fair bit of redundant code.
      Signed-off-by: NJeff Garzik <jeff@garzik.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      13c99b24
    • J
    • J
    • J
      [ETHTOOL]: Add ETHTOOL_[GS]FLAGS sub-ioctls · 3ae7c0b2
      Jeff Garzik 提交于
      Signed-off-by: NJeff Garzik <jeff@garzik.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3ae7c0b2
    • S
      [NET] netconsole: Support dynamic reconfiguration using configfs · 0bcc1816
      Satyam Sharma 提交于
      Based upon initial work by Keiichi Kii <k-keiichi@bx.jp.nec.com>.
      
      This patch introduces support for dynamic reconfiguration (adding, removing
      and/or modifying parameters of netconsole targets at runtime) using a
      userspace interface exported via configfs.  Documentation is also updated
      accordingly.
      
      Issues and brief design overview:
      
      (1) Kernel-initiated creation / destruction of kernel objects is not
          possible with configfs -- the lifetimes of the "config items" is managed
          exclusively from userspace.  But netconsole must support boot/module
          params too, and these are parsed in kernel and hence netpolls must be
          setup from the kernel.  Joel Becker suggested to separately manage the
          lifetimes of the two kinds of netconsole_target objects -- those created
          via configfs mkdir(2) from userspace and those specified from the
          boot/module option string.  This adds complexity and some redundancy here
          and also means that boot/module param-created targets are not exposed
          through the configfs namespace (and hence cannot be updated / destroyed
          dynamically).  However, this saves us from locking / refcounting
          complexities that would need to be introduced in configfs to support
          kernel-initiated item creation / destroy there.
      
      (2) In configfs, item creation takes place in the call chain of the
          mkdir(2) syscall in the driver subsystem.  If we used an ioctl(2) to
          create / destroy objects from userspace, the special userspace program is
          able to fill out the structure to be passed into the ioctl and hence
          specify attributes such as local interface that are required at the time
          we set up the netpoll.  For configfs, this information is not available at
          the time of mkdir(2).  So, we keep all newly-created targets (via
          configfs) disabled by default.  The user is expected to set various
          attributes appropriately (including the local network interface if
          required) and then write(2) "1" to the "enabled" attribute.  Thus,
          netpoll_setup() is then called on the set parameters in the context of
          _this_ write(2) on the "enabled" attribute itself.  This design enables
          the user to reconfigure existing netconsole targets at runtime to be
          attached to newly-come-up interfaces that may not have existed when
          netconsole was loaded or when the targets were actually created.  All this
          effectively enables us to get rid of custom ioctls.
      
      (3) Ultra-paranoid configfs attribute show() and store() operations, with
          sanity and input range checking, using only safe string primitives, and
          compliant with the recommendations in Documentation/filesystems/sysfs.txt.
      
      (4) A new function netpoll_print_options() is created in the netpoll API,
          that just prints out the configured parameters for a netpoll structure.
          netpoll_parse_options() is modified to use that and it is also exported to
          be used from netconsole.
      Signed-off-by: NSatyam Sharma <satyam@infradead.org>
      Acked-by: NKeiichi Kii <k-keiichi@bx.jp.nec.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0bcc1816
    • T
      [NEIGH]: Netlink notifications · d961db35
      Thomas Graf 提交于
      Currently neighbour event notifications are limited to update
      notifications and only sent if the ARP daemon is enabled. This
      patch extends the existing notification code by also reporting
      neighbours being removed due to gc or administratively and
      removes the dependency on the ARP daemon. This allows to keep
      track of neighbour states without periodically fetching the
      complete neighbour table.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d961db35
    • T
      [NEIGH]: Combine neighbour cleanup and release · 4f494554
      Thomas Graf 提交于
      Introduces neigh_cleanup_and_release() to be used after a
      neighbour has been removed from its neighbour table. Serves
      as preparation to add event notifications.
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4f494554
    • P
      [RTNETLINK]: Introduce generic rtnl_create_link(). · e7199288
      Pavel Emelianov 提交于
      This routine gets the parsed rtnl attributes and creates a new
      link with generic info (IFLA_LINKINFO policy). Its intention
      is to help the drivers, that need to create several links at
      once (like VETH).
      
      This is nothing but a copy-paste-ed part of rtnl_newlink() function
      that is responsible for creation of new device.
      Signed-off-by: NPavel Emelianov <xemul@openvz.org>
      Acked-by: NPatrick McHardy <kaber@trash.net>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7199288
    • S
      [NET]: Make NAPI polling independent of struct net_device objects. · bea3348e
      Stephen Hemminger 提交于
      Several devices have multiple independant RX queues per net
      device, and some have a single interrupt doorbell for several
      queues.
      
      In either case, it's easier to support layouts like that if the
      structure representing the poll is independant from the net
      device itself.
      
      The signature of the ->poll() call back goes from:
      
      	int foo_poll(struct net_device *dev, int *budget)
      
      to
      
      	int foo_poll(struct napi_struct *napi, int budget)
      
      The caller is returned the number of RX packets processed (or
      the number of "NAPI credits" consumed if you want to get
      abstract).  The callee no longer messes around bumping
      dev->quota, *budget, etc. because that is all handled in the
      caller upon return.
      
      The napi_struct is to be embedded in the device driver private data
      structures.
      
      Furthermore, it is the driver's responsibility to disable all NAPI
      instances in it's ->stop() device close handler.  Since the
      napi_struct is privatized into the driver's private data structures,
      only the driver knows how to get at all of the napi_struct instances
      it may have per-device.
      
      With lots of help and suggestions from Rusty Russell, Roland Dreier,
      Michael Chan, Jeff Garzik, and Jamal Hadi Salim.
      
      Bug fixes from Thomas Graf, Roland Dreier, Peter Zijlstra,
      Joseph Fannin, Scott Wood, Hans J. Koch, and Michael Chan.
      
      [ Ported to current tree and all drivers converted.  Integrated
        Stephen's follow-on kerneldoc additions, and restored poll_list
        handling to the old style to fix mutual exclusion issues.  -DaveM ]
      Signed-off-by: NStephen Hemminger <shemminger@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bea3348e
  2. 17 9月, 2007 1 次提交
  3. 15 9月, 2007 1 次提交
    • D
      [NET]: Fix two issues wrt. SO_BINDTODEVICE. · 4878809f
      David S. Miller 提交于
      1) Comments suggest that setting optlen to zero will unbind
         the socket from whatever device it might be attached to.  This
         hasn't been the case since at least 2.2.x because the first thing
         this function does is return -EINVAL if 'optlen' is less than
         sizeof(int).
      
         This check also means that passing in a two byte string doesn't
         work so well.  It's almost as if this code was testing with "eth?"
         patterned strings and nothing else :-)
      
         Fix this by breaking the logic of this facility out into a
         seperate function which validates optlen more appropriately.
      
         The optlen==0 and small string cases now work properly.
      
      2) We should reset the cached route of the socket after we have made
         the device binding changes, not before.
      
      Reported by Ben Greear.
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4878809f
  4. 11 9月, 2007 1 次提交
  5. 31 8月, 2007 1 次提交
  6. 29 8月, 2007 1 次提交
  7. 27 8月, 2007 2 次提交
  8. 15 8月, 2007 1 次提交
  9. 14 8月, 2007 1 次提交
  10. 08 8月, 2007 1 次提交
  11. 01 8月, 2007 3 次提交
  12. 31 7月, 2007 3 次提交
    • A
      [PKTGEN]: make get_ipsec_sa() static and non-inline · fea1ab0f
      Adrian Bunk 提交于
      Non-static inline code usually doesn't makes sense.
      
      In this case making is static and non-inline is the correct solution.
      Signed-off-by: NAdrian Bunk <bunk@stusta.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fea1ab0f
    • H
      [NET]: Allow netdev REGISTER/CHANGENAME events to fail · fcc5a03a
      Herbert Xu 提交于
      This patch adds code to allow errors to be passed up from event
      handlers of NETDEV_REGISTER and NETDEV_CHANGENAME.  It also adds
      the notifier_from_errno/notifier_to_errnor helpers to pass the
      errno value up to the notifier caller.
      
      If an error is detected when a device is registered, it causes
      that operation to fail.  A NETDEV_UNREGISTER will be sent to
      all event handlers.
      
      Similarly if NETDEV_CHANGENAME fails the original name is restored
      and a new NETDEV_CHANGENAME event is sent.
      
      As such all event handlers must be idempotent with respect to
      these events.
      
      When an event handler is registered NETDEV_REGISTER events are
      sent for all devices currently registered.  Should any of them
      fail, we will send NETDEV_GOING_DOWN/NETDEV_DOWN/NETDEV_UNREGISTER
      events to that handler for the devices which have already been
      registered with it.  The handler registration itself will fail.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fcc5a03a
    • H
      [NET]: Take dev_base_lock when moving device name hash list entry · 7f988eab
      Herbert Xu 提交于
      When we added name-based hashing the dev_base_lock was designated as the
      lock to take when changing the name hash list.  Unfortunately, because
      it was a preexisting lock that just happened to be taken in the right
      spots we neglected to take it in dev_change_name.
      
      The race can affect calles of __dev_get_by_name that do so without taking
      the RTNL.  They may end up walking down the wrong hash chain and end up
      missing the device that they're looking for.
      Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f988eab