1. 18 Apr, 2011 1 commit
  2. 24 Mar, 2011 1 commit
    • ipv4: Fallback to FIB local table in __ip_dev_find(). · 406b6f97
      David S. Miller committed
      In commit 9435eb1c
      ("ipv4: Implement __ip_dev_find using new interface address hash.")
      we reimplemented __ip_dev_find() so that it doesn't have to
      do a full FIB table lookup.
      
      Instead, it consults a hash table of addresses configured to
      interfaces.
      
      This works identically to the old code in all except one case,
      and that is for loopback subnets.
      
      The old code would match the loopback device for any IP address
      that falls within a subnet configured to the loopback device.
      
      Handle this corner case by doing the FIB lookup.
      
      We could implement this via inet_addr_onlink() but:
      
      1) Someone could configure many addresses to loopback and
         inet_addr_onlink() is a simple list traversal.
      
      2) We know the old code works.
      Reported-by: Julian Anastasov <ja@ssi.bg>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
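The lookup order described in this commit (exact-match address table first, then a fallback that also matches loopback subnets) can be sketched in userspace C. This is a minimal model, not the kernel code: `struct ifaddr_entry`, `exact_lookup()`, `local_table_lookup()` and `dev_find()` are hypothetical names, and linear scans stand in for both the interface address hash and the FIB local table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified model of an address configured on an interface. */
struct ifaddr_entry {
    uint32_t addr;     /* configured address, host byte order */
    uint32_t mask;     /* subnet mask, host byte order */
    int      ifindex;  /* owning interface */
    int      loopback; /* non-zero if the interface is a loopback device */
};

/* Fast path: exact match only (stands in for the interface address hash). */
static const struct ifaddr_entry *
exact_lookup(const struct ifaddr_entry *tbl, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].addr == addr)
            return &tbl[i];
    return NULL;
}

/* Slow path: stands in for the FIB local table lookup, which also
 * matches any address inside a subnet configured on loopback. */
static const struct ifaddr_entry *
local_table_lookup(const struct ifaddr_entry *tbl, size_t n, uint32_t addr)
{
    for (size_t i = 0; i < n; i++)
        if (tbl[i].loopback && ((addr ^ tbl[i].addr) & tbl[i].mask) == 0)
            return &tbl[i];
    return NULL;
}

/* Returns the ifindex owning addr, or -1 if the address is not local. */
int dev_find(const struct ifaddr_entry *tbl, size_t n, uint32_t addr)
{
    const struct ifaddr_entry *e;

    assert(tbl != NULL);
    e = exact_lookup(tbl, n, addr);
    if (!e)
        e = local_table_lookup(tbl, n, addr); /* corner case: loopback subnets */
    return e ? e->ifindex : -1;
}
```

With 127.0.0.1/8 configured on a loopback entry, a lookup of 127.0.0.2 misses the exact-match path and is resolved by the fallback, while an unconfigured address in a non-loopback subnet is still reported as not local.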
  3. 22 Mar, 2011 2 commits
    • ipv4: optimize route adding on secondary promotion · 04024b93
      Julian Anastasov committed
      Optimize the calling of fib_add_ifaddr for all
      secondary addresses after the promoted one to start from
      their place, not from the new place of the promoted
      secondary. It will save some CPU cycles because we
      are sure the promoted secondary was first for the subnet
      and all next secondaries do not change their place.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: remove the routes on secondary promotion · 2d230e2b
      Julian Anastasov committed
      The secondary address promotion relies on fib_sync_down_addr
      to remove all routes created for the secondary addresses when
      the old primary address is deleted. It does not happen for cases
      when the primary address is also in another subnet. Fix that
      by deleting local and broadcast routes for all secondaries while
      they are on device list and by faking that all addresses from
      this subnet are to be deleted. It relies on fib_del_ifaddr being
      able to ignore the IPs from the concerned subnet while checking
      for duplication.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 10 Mar, 2011 1 commit
    • ipv4: Fix erroneous uses of ifa_address. · 6c91afe1
      David S. Miller committed
      In usual cases ifa_address == ifa_local, but in the case where
      SIOCSIFDSTADDR sets the destination address on a point-to-point
      link, ifa_address gets set to that destination address.
      
      Therefore we should use ifa_local when we want the local interface
      address.
      
      There were two cases where the selection was done incorrectly:
      
      1) When devinet_ioctl() does matching, it checks ifa_address even
         though gifconf correctly reported ifa_local to the user.
      
      2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using
         ifa_address instead of ifa_local.
      Reported-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
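The distinction between the two fields can be illustrated with a small model. This is a sketch under stated assumptions: `struct ifaddr_model`, `set_dst_addr()` and `local_addr()` are hypothetical stand-ins, not the kernel's `struct in_ifaddr` or its ioctl path. Only the field names `ifa_local` and `ifa_address` come from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the two fields discussed in the commit. */
struct ifaddr_model {
    uint32_t ifa_local;   /* the interface's own address */
    uint32_t ifa_address; /* equals ifa_local, or the peer on a point-to-point link */
};

/* Models SIOCSIFDSTADDR on a point-to-point link: only ifa_address changes. */
void set_dst_addr(struct ifaddr_model *ifa, uint32_t peer)
{
    assert(ifa != NULL);
    ifa->ifa_address = peer;
}

/* The address to match against a user request or to announce in a
 * gratuitous ARP: always the local one, never the peer. */
uint32_t local_addr(const struct ifaddr_model *ifa)
{
    assert(ifa != NULL);
    return ifa->ifa_local;
}
```

After `set_dst_addr()` the two fields differ, so any code that reads `ifa_address` while meaning "my own address" would pick the peer instead.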
  5. 04 Mar, 2011 1 commit
  6. 19 Feb, 2011 2 commits
  7. 15 Feb, 2011 1 commit
    • arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS. · d11327ad
      Ian Campbell committed
      NETDEV_NOTIFY_PEERS is an explicit request by the driver to send a link
      notification, while NETDEV_UP/NETDEV_CHANGEADDR generate link
      notifications as a sort of side effect.
      
      In the latter cases the sysctl option is present because link
      notification events can have undesired effects, e.g. if the link is
      flapping. I don't think this applies in the case of an explicit
      request from a driver.
      
      This patch makes NETDEV_NOTIFY_PEERS unconditional. If preferred, we
      could add a new sysctl for this case which defaults to on.
      
      This change causes Xen post-migration ARP notifications (which cause
      switches to relearn their MAC tables etc) to be sent by default.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 13 Dec, 2010 1 commit
    • ipv4: Don't pre-seed hoplimit metric. · 323e126f
      David S. Miller committed
      Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
      
      This allowed several simplifications:
      
      1) The interim dst_metric_hoplimit() can go as it's no longer
         used.
      
      2) The sysctl_ip_default_ttl entry no longer needs to use
         ipv4_doint_and_flush, since the sysctl is not cached in
         routing cache metrics any longer.
      
      3) ipv4_doint_and_flush no longer needs to be exported and
         therefore can be marked static.
      
      When ipv4_doint_and_flush_strategy was removed some time ago,
      the external declaration in ip.h was mistakenly left around
      so kill that off too.
      
      We have to move the sysctl_ip_default_ttl declaration into
      ipv4's route cache definition header net/route.h, because
      currently net/ip.h (where the declaration lives now) has
      a back dependency on net/route.h
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 07 Dec, 2010 1 commit
  10. 28 Nov, 2010 1 commit
    • rtnl: make link af-specific updates atomic · cf7afbfe
      Thomas Graf committed
      As David pointed out correctly, updates to af-specific attributes
      are currently not atomic. If multiple changes are requested and
      one of them fails, previous updates may have been applied already,
      leaving the link behind in an undefined state.
      
      This patch splits the function parse_link_af() into two functions,
      validate_link_af() and set_link_af(). validate_link_af() is called
      from validate_linkmsg() to check for errors as early as possible,
      before any changes to the link have been made. set_link_af() is
      called to commit the changes later.
      
      This method is not fail proof. While it is currently sufficient
      to make set_link_af() unable to fail and thus 100% atomic, the
      validation function method will not be able to detect all error
      scenarios in the future. There will likely always be errors
      depending on states which are, for example, not protected by
      rtnl_mutex and thus may change between validation and setting.
      
      Also, instead of silently ignoring unknown address families and
      config blocks for address families which did not register a set
      function, the errors EAFNOSUPPORT and EOPNOTSUPP, respectively, are
      returned to avoid committing 4 out of 5 update requests without
      notifying the user.
      Signed-off-by: Thomas Graf <tgraf@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
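The validate-then-commit split described in this commit can be sketched as a two-phase update. This is an illustrative userspace model only: `struct af_update`, `struct link_model`, `validate_updates()`, `commit_updates()` and `apply_updates()` are hypothetical, and the rule that negative values are invalid is an assumption made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_FAMILIES 4 /* hypothetical number of supported families */

/* Hypothetical per-address-family update request. */
struct af_update {
    int family; /* address family slot */
    int value;  /* new setting; negative values are invalid in this model */
};

/* Hypothetical link state: one setting per family. */
struct link_model {
    int setting[NUM_FAMILIES];
};

/* Phase 1: check every request without touching the link. */
int validate_updates(const struct af_update *u, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (u[i].family < 0 || u[i].family >= NUM_FAMILIES)
            return -EAFNOSUPPORT;
        if (u[i].value < 0)
            return -EINVAL;
    }
    return 0;
}

/* Phase 2: apply the requests; cannot fail once validation has passed. */
void commit_updates(struct link_model *link, const struct af_update *u, size_t n)
{
    assert(link != NULL);
    for (size_t i = 0; i < n; i++)
        link->setting[u[i].family] = u[i].value;
}

/* Either all requests are applied or none are. */
int apply_updates(struct link_model *link, const struct af_update *u, size_t n)
{
    int err = validate_updates(u, n);

    if (err)
        return err;
    commit_updates(link, u, n);
    return 0;
}
```

A batch containing one bad family is rejected before any setting is written, which is the property the commit aims for. As the message notes, this only holds while the checked state cannot change between the two phases.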
  11. 18 Nov, 2010 1 commit
  12. 19 Oct, 2010 1 commit
  13. 16 Sep, 2010 1 commit
  14. 31 May, 2010 1 commit
    • arp_notify: allow drivers to explicitly request a notification event. · 06c4648d
      Ian Campbell committed
      Currently such notifications are only generated when the device comes up or the
      address changes. However one use case for these notifications is to enable
      faster network recovery after a virtual machine migration (by causing switches
      to relearn their MAC tables). A migration appears to the network stack as a
      temporary loss of carrier and therefore does not trigger either of the current
      conditions. Rather than adding carrier up as a trigger (which can cause issues
      when interfaces are flapping), simply add an interface which the driver can use
      to explicitly trigger the notification.
      Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
      Cc: Stephen Hemminger <shemminger@linux-foundation.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: netdev@vger.kernel.org
      Cc: stable@kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 30 Mar, 2010 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include
      those headers directly instead of assuming availability.  As this
      conversion needs to touch a large number of source files, the
      following script is used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surroundings.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have a fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  16. 27 Mar, 2010 1 commit
  17. 19 Mar, 2010 1 commit
  18. 20 Feb, 2010 1 commit
    • net: Fix sysctl restarts... · 88af182e
      Eric W. Biederman committed
      Yuck.  It turns out that when we restart sysctls we were restarting
      with the values already changed.  Which unfortunately meant that
      the second time through we thought there was no change and skipped
      all kinds of work, despite the fact that there was indeed a change.
      
      I have fixed this the simplest way possible by restoring the changed
      values when we restart the sysctl write.
      
      One of my coworkers spotted this bug when, after disabling forwarding
      on an interface, pings were still forwarded.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
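The bug and its fix can be modelled in a few lines. This is a hedged sketch, not the kernel handler: `struct conf_model`, `write_forwarding()` and the `RESTART` code are hypothetical, and a boolean parameter stands in for whether a needed lock could be taken on this attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RESTART 1 /* hypothetical "retry the write" return code */

struct conf_model {
    int forwarding; /* the sysctl value */
    int work_done;  /* counts how often a change was acted on */
};

/* One attempt at a sysctl write. lock_available models a trylock result. */
int write_forwarding(struct conf_model *c, int new_val, bool lock_available)
{
    int old_val;

    assert(c != NULL);
    old_val = c->forwarding;
    c->forwarding = new_val;          /* the value is stored first */

    if (c->forwarding == old_val)
        return 0;                     /* no change, nothing to do */

    if (!lock_available) {
        c->forwarding = old_val;      /* the fix: restore before restarting */
        return RESTART;
    }

    c->work_done++;                   /* side effects of the change */
    return 0;
}
```

Without the restore line, the retried write would compare the new value against itself, conclude that nothing changed, and skip the side effects, which matches the symptom reported in the commit message.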
  19. 17 Feb, 2010 2 commits
  20. 07 Jan, 2010 1 commit
    • net: RFC3069, private VLAN proxy arp support · 65324144
      Jesper Dangaard Brouer committed
      This is to be used together with switch technologies, like RFC 3069,
      where the individual ports are not allowed to communicate with
      each other, but they are allowed to talk to the upstream router.  As
      described in RFC 3069, it is possible to allow these hosts to
      communicate through the upstream router by proxy_arp'ing.
      
      This patch basically allows proxy arp replies back to the same
      interface (from which the ARP request/solicitation was received).
      
      Tunable per device via proc "proxy_arp_pvlan":
        /proc/sys/net/ipv4/conf/*/proxy_arp_pvlan
      
      This switch technology is known by different vendor names:
       - In RFC 3069 it is called VLAN Aggregation.
       - Cisco and Allied Telesyn call it Private VLAN.
       - Hewlett-Packard call it Source-Port filtering or port-isolation.
       - Ericsson call it MAC-Forced Forwarding (RFC Draft).
      Signed-off-by: Jesper Dangaard Brouer <hawk@comx.dk>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 26 Dec, 2009 1 commit
    • net: restore ip source validation · 28f6aeea
      Jamal Hadi Salim committed
      When using policy routing and the skb mark,
      there are cases where a back path validation requires us
      to use a different routing table for src ip validation than
      the one used for mapping ingress dst ip.
      One such case is transparent proxying, where we pretend to be
      the destination system and therefore the local table
      is used for incoming packets but possibly a main table would
      be used on outbound.
      Make the default behavior allow the above; users who need
      the symmetry can turn it on via the sysctl src_valid_mark.
      Signed-off-by: Jamal Hadi Salim <hadi@cyberus.ca>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 04 Dec, 2009 1 commit
  23. 26 Nov, 2009 1 commit
  24. 14 Nov, 2009 1 commit
    • ipv4: speedup inet_dump_ifaddr() · eec4df98
      Eric Dumazet committed
      Stephen Hemminger wrote:
      > On Thu, 12 Nov 2009 15:11:36 +0100
      > Eric Dumazet <eric.dumazet@gmail.com> wrote:
      >
      >> When handling large number of netdevices, inet_dump_ifaddr()
      >> is very slow because it has O(N^2) complexity.
      >>
      >> Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
      >> sub lists of the dev_index hash table, and RCU lookups.
      >>
      >> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      >
      > You might be able to make RCU critical section smaller by moving
      > it into loop.
      >
      
      Indeed. But we dump at most one skb (<= 8192 bytes ?), so rcu_read_lock
      holding time is small, unless we meet many netdevices without
      addresses. I wonder if it's really common...
      
      Thanks
      
      [PATCH net-next-2.6] ipv4: speedup inet_dump_ifaddr()
      
      When handling a large number of netdevices, inet_dump_ifaddr()
      is very slow because it has O(N^2) complexity.
      
      Instead of scanning one single list, we can use the NETDEV_HASHENTRIES
      sub lists of the dev_index hash table, and RCU lookups.
      Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
      Acked-by: Stephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
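The idea of resuming a partial dump from a hash bucket, rather than rescanning one long list from its head on every call, can be sketched as below. This is a simplified model under stated assumptions: `HASHENTRIES`, `struct dev_table`, `struct dump_cursor`, `table_add()` and `dump_some()` are hypothetical, fixed-size arrays stand in for the kernel's hash chains, and RCU is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define HASHENTRIES    8 /* hypothetical; stands in for NETDEV_HASHENTRIES */
#define MAX_PER_BUCKET 4 /* hypothetical capacity of one bucket */

struct dev_table {
    int    ifindex[HASHENTRIES][MAX_PER_BUCKET];
    size_t count[HASHENTRIES];
};

/* Where the previous partial dump stopped. */
struct dump_cursor {
    size_t bucket; /* bucket to resume in */
    size_t idx;    /* position inside that bucket */
};

void table_add(struct dev_table *t, int ifindex)
{
    size_t h;

    assert(t != NULL && ifindex >= 0);
    h = (size_t)ifindex % HASHENTRIES;
    assert(t->count[h] < MAX_PER_BUCKET);
    t->ifindex[h][t->count[h]++] = ifindex;
}

/* Copy up to room entries into out, starting at *cur. Returns the number
 * copied and advances *cur so that the next call resumes where this one
 * stopped, skipping at most the entries of a single bucket. */
size_t dump_some(const struct dev_table *t, struct dump_cursor *cur,
                 int *out, size_t room)
{
    size_t n = 0;

    while (cur->bucket < HASHENTRIES && n < room) {
        if (cur->idx < t->count[cur->bucket]) {
            out[n++] = t->ifindex[cur->bucket][cur->idx++];
        } else {
            cur->bucket++; /* bucket exhausted, move to the next one */
            cur->idx = 0;
        }
    }
    return n;
}
```

Each call models one message buffer's worth of output. With a single list, every resumption has to walk past all entries already dumped, which is where the quadratic cost comes from; with per-bucket cursors the skipped work is bounded by one bucket.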
  25. 12 Nov, 2009 1 commit
    • sysctl net: Remove unused binary sysctl code · f8572d8f
      Eric W. Biederman committed
      Now that sys_sysctl is a compatibility wrapper around /proc/sys,
      all sysctl strategy routines, and all ctl_name and strategy
      entries in the sysctl tables, are unused and can be
      removed.
      
      In addition, neigh_sysctl_register has been modified to no longer
      take a strategy argument, and its callers have been modified not
      to pass one.
      
      Cc: "David Miller" <davem@davemloft.net>
      Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org>
      Cc: netdev@vger.kernel.org
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
  26. 05 Nov, 2009 1 commit
  27. 04 Nov, 2009 1 commit
  28. 02 Nov, 2009 1 commit
  29. 07 Oct, 2009 1 commit
  30. 24 Sep, 2009 1 commit
  31. 15 Sep, 2009 1 commit
  32. 19 May, 2009 1 commit
  33. 25 Feb, 2009 1 commit
    • netlink: change nlmsg_notify() return value logic · 1ce85fe4
      Pablo Neira Ayuso committed
      This patch changes the return value of nlmsg_notify() as follows:
      
      If NETLINK_BROADCAST_ERROR is set by any of the listeners and
      an error in the delivery happened, return the broadcast error;
      else if there are no listeners apart from the socket that
      requested a change with the echo flag, return the result of the
      unicast notification. Thus, with this patch, the unicast
      notification is handled in the same way as a broadcast listener
      that has set the NETLINK_BROADCAST_ERROR socket flag.
      
      This patch is useful in case that the caller of nlmsg_notify()
      wants to know the result of the delivery of a netlink notification
      (including the broadcast delivery) and take any action in case
      that the delivery failed. For example, ctnetlink can drop packets
      if the event delivery failed to provide reliable logging and
      state-synchronization at the cost of dropping packets.
      
      This patch also modifies the rtnetlink code to ignore the return
      value of rtnl_notify() in all callers. The function rtnl_notify()
      (before this patch) returned the error of the unicast notification
      which makes rtnl_set_sk_err() report errors to all listeners. This
      is not of any help since the origin of the change (the socket that
      requested the echoing) notices the ENOBUFS error if the notification
      fails and should resync itself.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Acked-by: Patrick McHardy <kaber@trash.net>
      Signed-off-by: David S. Miller <davem@davemloft.net>
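The return value rule described in this commit can be written as a small pure function. This is a sketch of the decision only, not the kernel's nlmsg_notify(): `notify_result()` is a hypothetical name, and using -ESRCH to mean "the broadcast found no listeners" is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * bcast_err: result of the broadcast delivery; 0 on success, -ESRCH is
 *            assumed here to mean that no listener received it, and any
 *            other negative value is a reported delivery error.
 * echo:      true if the requesting socket asked for an echo.
 * ucast_err: result of the unicast notification to the requester.
 */
int notify_result(int bcast_err, bool echo, int ucast_err)
{
    int err = bcast_err;

    assert(bcast_err <= 0 && ucast_err <= 0);

    /* A real broadcast error wins; otherwise the unicast result counts. */
    if (echo && (err == 0 || err == -ESRCH))
        err = ucast_err;
    return err;
}
```

A caller can then treat a failed unicast echo the same way as a failed delivery to a listener that set NETLINK_BROADCAST_ERROR, which is the behaviour the commit message describes.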
  34. 01 Feb, 2009 1 commit
  35. 03 Nov, 2008 1 commit
  36. 29 Oct, 2008 1 commit
  37. 17 Oct, 2008 1 commit