1. 19 11月, 2012 3 次提交
    • E
      net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
      Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
      arbitrary ip options.
      
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52e804c6
    • E
      net: Push capable(CAP_NET_ADMIN) into the rtnl methods · dfc47ef8
      Eric W. Biederman 提交于
      - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
        to ns_capable(net->user-ns, CAP_NET_ADMIN).  Allowing unprivileged
        users to make netlink calls to modify their local network
        namespace.
      
      - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
        that calls that are not safe for unprivileged users are still
        protected.
      
      Later patches will remove the extra capable calls from methods
      that are safe for unprivilged users.
      Acked-by: NSerge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dfc47ef8
    • E
      net: Don't export sysctls to unprivileged users · 464dc801
      Eric W. Biederman 提交于
      In preparation for supporting the creation of network namespaces
      by unprivileged users, modify all of the per net sysctl exports
      and refuse to allow them to unprivileged users.
      
      This makes it safe for unprivileged users in general to access
      per net sysctls, and allows sysctls to be exported to unprivileged
      users on an individual basis as they are deemed safe.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      464dc801
  2. 02 11月, 2012 1 次提交
  3. 29 10月, 2012 2 次提交
  4. 22 9月, 2012 1 次提交
  5. 19 9月, 2012 1 次提交
  6. 11 9月, 2012 1 次提交
  7. 08 9月, 2012 1 次提交
  8. 24 8月, 2012 1 次提交
  9. 23 8月, 2012 1 次提交
    • E
      net: remove delay at device dismantle · 0115e8e3
      Eric Dumazet 提交于
      I noticed extra one second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() are still in RCU queues.
      
      These call_rcu() were posted by rt_free(struct rtable *rt) calls.
      
      We then wait a little (but one second) in netdev_wait_allrefs() before
      kicking again NETDEV_UNREGISTER.
      
      As the call_rcu() are now completed, dst_dev_event() can do the needed
      device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care !
      
      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
      
      Also remove NETDEV_UNREGISTER_BATCH, as its not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0115e8e3
  10. 04 8月, 2012 1 次提交
    • E
      ipv4: change inet_addr_hash() · 40384999
      Eric Dumazet 提交于
      Use net_hash_mix(net) instead of hash_ptr(net, 8), and use
      hash_32() instead of using a serie of XOR
      
      Define IN4_ADDR_HSIZE_SHIFT for clarity
      
      __ip_dev_find() can perform the net_eq() call only if ifa_local
      matches the key, to avoid unneeded dereferences.
      
      remove inline attributes
      
      # size net/ipv4/devinet.o.before net/ipv4/devinet.o
         text	   data	    bss	    dec	    hex	filename
        17471	   2545	   2048	  22064	   5630	net/ipv4/devinet.o.before
        17335	   2545	   2048	  21928	   55a8	net/ipv4/devinet.o
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      40384999
  11. 13 6月, 2012 1 次提交
    • T
      ipv4: Add interface option to enable routing of 127.0.0.0/8 · d0daebc3
      Thomas Graf 提交于
      Routing of 127/8 is tradtionally forbidden, we consider
      packets from that address block martian when routing and do
      not process corresponding ARP requests.
      
      This is a sane default but renders a huge address space
      practically unuseable.
      
      The RFC states that no address within the 127/8 block should
      ever appear on any network anywhere but it does not forbid
      the use of such addresses outside of the loopback device in
      particular. For example to address a pool of virtual guests
      behind a load balancer.
      
      This patch adds a new interface option 'route_localnet'
      enabling routing of the 127/8 address block and processing
      of ARP requests on a specific interface.
      
      Note that for the feature to work, the default local route
      covering 127/8 dev lo needs to be removed.
      
      Example:
        $ sysctl -w net.ipv4.conf.eth0.route_localnet=1
        $ ip route del 127.0.0.0/8 dev lo table local
        $ ip addr add 127.1.0.1/16 dev eth0
        $ ip route flush cache
      
      V2: Fix invalid check to auto flush cache (thanks davem)
      Signed-off-by: NThomas Graf <tgraf@suug.ch>
      Acked-by: NNeil Horman <nhorman@tuxdriver.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d0daebc3
  12. 16 5月, 2012 1 次提交
  13. 21 4月, 2012 1 次提交
  14. 16 4月, 2012 1 次提交
  15. 02 4月, 2012 1 次提交
  16. 29 3月, 2012 1 次提交
  17. 23 3月, 2012 1 次提交
    • A
      bonding: remove entries for master_ip and vlan_ip and query devices instead · eaddcd76
      Andy Gospodarek 提交于
      The following patch aimed to resolve an issue where secondary, tertiary,
      etc. addresses added to bond interfaces could overwrite the
      bond->master_ip and vlan_ip values.
      
              commit 917fbdb3
              Author: Henrik Saavedra Persson <henrik.e.persson@ericsson.com>
              Date:   Wed Nov 23 23:37:15 2011 +0000
      
                  bonding: only use primary address for ARP
      
      That patch was good because it prevented bonds using ARP monitoring from
      sending frames with an invalid source IP address.  Unfortunately, it
      didn't always work as expected.
      
      When using an ioctl (like ifconfig does) to set the IP address and
      netmask, 2 separate ioctls are actually called to set the IP and netmask
      if the mask chosen doesn't match the standard mask for that class of
      address.  The first ioctl did not have a mask that matched the one in
      the primary address and would still cause the device address to be
      overwritten.  The second ioctl that was called to set the mask would
      then detect as secondary and ignored, but the damage was already done.
      
      This was not an issue when using an application that used netlink
      sockets as the setting of IP and netmask came down at once.  The
      inconsistent behavior between those two interfaces was something that
      needed to be resolved.
      
      While I was thinking about how I wanted to resolve this, Ralf Zeidler
      came with a patch that resolved this on a RHEL kernel by keeping a full
      shadow of the entries in dev->ifa_list for the bonding device and vlan
      devices in the bonding driver.  I didn't like the duplication of the
      list as I want to see the 'bonding' struct and code shrink rather than
      grow, but liked the general idea.
      
      As the Subject indicates this patch drops the master_ip and vlan_ip
      elements from the 'bonding' and 'vlan_entry' structs, respectively.
      This can be done because a device's address-list is now traversed to
      determine the optimal source IP address for ARP requests and for checks
      to see if the bonding device has a particular IP address.  This code
      could have all be contained inside the bonding driver, but it made more
      sense to me to EXPORT and call inet_confirm_addr since it did exactly
      what was needed.
      
      I tested this and a backported patch and everything works as expected.
      Ralf also helped with verification of the backported patch.
      
      Thanks to Ralf for all his help on this.
      
      v2: Whitespace and organizational changes based on suggestions from Jay
      Vosburgh and Dave Miller.
      
      v3: Fixup incorrect usage of rcu_read_unlock based on Dave Miller's
      suggestion.
      Signed-off-by: NAndy Gospodarek <andy@greyhouse.net>
      CC: Ralf Zeidler <ralf.zeidler@nsn.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eaddcd76
  18. 13 1月, 2012 1 次提交
  19. 02 12月, 2011 1 次提交
  20. 02 8月, 2011 1 次提交
  21. 26 7月, 2011 1 次提交
  22. 10 6月, 2011 1 次提交
    • G
      rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose 提交于
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: NGreg Rose <gregory.v.rose@intel.com>
      Signed-off-by: NJeff Kirsher <jeffrey.t.kirsher@intel.com>
      c7ac8679
  23. 11 5月, 2011 1 次提交
  24. 03 5月, 2011 1 次提交
  25. 30 4月, 2011 1 次提交
    • B
      ipv4, ipv6, bonding: Restore control over number of peer notifications · ad246c99
      Ben Hutchings 提交于
      For backward compatibility, we should retain the module parameters and
      sysfs attributes to control the number of peer notifications
      (gratuitous ARPs and unsolicited NAs) sent after bonding failover.
      Also, it is possible for failover to take place even though the new
      active slave does not have link up, and in that case the peer
      notification should be deferred until it does.
      
      Change ipv4 and ipv6 so they do not automatically send peer
      notifications on bonding failover.
      
      Change the bonding driver to send separate NETDEV_NOTIFY_PEERS
      notifications when the link is up, as many times as requested.  Since
      it does not directly control which protocols send notifications, make
      num_grat_arp and num_unsol_na aliases for a single parameter.  Bump
      the bonding version number and update its documentation.
      Signed-off-by: NBen Hutchings <bhutchings@solarflare.com>
      Signed-off-by: NJay Vosburgh <fubar@us.ibm.com>
      Acked-by: NBrian Haley <brian.haley@hp.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ad246c99
  26. 18 4月, 2011 1 次提交
  27. 24 3月, 2011 1 次提交
    • D
      ipv4: Fallback to FIB local table in __ip_dev_find(). · 406b6f97
      David S. Miller 提交于
      In commit 9435eb1c
      ("ipv4: Implement __ip_dev_find using new interface address hash.")
      we reimplemented __ip_dev_find() so that it doesn't have to
      do a full FIB table lookup.
      
      Instead, it consults a hash table of addresses configured to
      interfaces.
      
      This works identically to the old code in all except one case,
      and that is for loopback subnets.
      
      The old code would match the loopback device for any IP address
      that falls within a subnet configured to the loopback device.
      
      Handle this corner case by doing the FIB lookup.
      
      We could implement this via inet_addr_onlink() but:
      
      1) Someone could configure many addresses to loopback and
         inet_addr_onlink() is a simple list traversal.
      
      2) We know the old code works.
      Reported-by: NJulian Anastasov <ja@ssi.bg>
      Acked-by: NStephen Hemminger <shemminger@vyatta.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      406b6f97
  28. 22 3月, 2011 2 次提交
    • J
      ipv4: optimize route adding on secondary promotion · 04024b93
      Julian Anastasov 提交于
      Optimize the calling of fib_add_ifaddr for all
      secondary addresses after the promoted one to start from
      their place, not from the new place of the promoted
      secondary. It will save some CPU cycles because we
      are sure the promoted secondary was first for the subnet
      and all next secondaries do not change their place.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      04024b93
    • J
      ipv4: remove the routes on secondary promotion · 2d230e2b
      Julian Anastasov 提交于
      The secondary address promotion relies on fib_sync_down_addr
      to remove all routes created for the secondary addresses when
      the old primary address is deleted. It does not happen for cases
      when the primary address is also in another subnet. Fix that
      by deleting local and broadcast routes for all secondaries while
      they are on device list and by faking that all addresses from
      this subnet are to be deleted. It relies on fib_del_ifaddr being
      able to ignore the IPs from the concerned subnet while checking
      for duplication.
      Signed-off-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2d230e2b
  29. 10 3月, 2011 1 次提交
    • D
      ipv4: Fix erroneous uses of ifa_address. · 6c91afe1
      David S. Miller 提交于
      In usual cases ifa_address == ifa_local, but in the case where
      SIOCSIFDSTADDR sets the destination address on a point-to-point
      link, ifa_address gets set to that destination address.
      
      Therefore we should use ifa_local when we want the local interface
      address.
      
      There were two cases where the selection was done incorrectly:
      
      1) When devinet_ioctl() does matching, it checks ifa_address even
         though gifconf correct reported ifa_local to the user
      
      2) IN_DEV_ARP_NOTIFY handling sends a gratuitous ARP using
         ifa_address instead of ifa_local.
      Reported-by: NJulian Anastasov <ja@ssi.bg>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c91afe1
  30. 04 3月, 2011 1 次提交
  31. 19 2月, 2011 2 次提交
  32. 15 2月, 2011 1 次提交
    • I
      arp_notify: unconditionally send gratuitous ARP for NETDEV_NOTIFY_PEERS. · d11327ad
      Ian Campbell 提交于
      NETDEV_NOTIFY_PEER is an explicit request by the driver to send a link
      notification while NETDEV_UP/NETDEV_CHANGEADDR generate link
      notifications as a sort of side effect.
      
      In the later cases the sysctl option is present because link
      notification events can have undesired effects e.g. if the link is
      flapping. I don't think this applies in the case of an explicit
      request from a driver.
      
      This patch makes NETDEV_NOTIFY_PEER unconditional, if preferred we
      could add a new sysctl for this case which defaults to on.
      
      This change causes Xen post-migration ARP notifications (which cause
      switches to relearn their MAC tables etc) to be sent by default.
      Signed-off-by: NIan Campbell <ian.campbell@citrix.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d11327ad
  33. 13 12月, 2010 1 次提交
    • D
      ipv4: Don't pre-seed hoplimit metric. · 323e126f
      David S. Miller 提交于
      Always go through a new ip4_dst_hoplimit() helper, just like ipv6.
      
      This allowed several simplifications:
      
      1) The interim dst_metric_hoplimit() can go as it's no longer
         userd.
      
      2) The sysctl_ip_default_ttl entry no longer needs to use
         ipv4_doint_and_flush, since the sysctl is not cached in
         routing cache metrics any longer.
      
      3) ipv4_doint_and_flush no longer needs to be exported and
         therefore can be marked static.
      
      When ipv4_doint_and_flush_strategy was removed some time ago,
      the external declaration in ip.h was mistakenly left around
      so kill that off too.
      
      We have to move the sysctl_ip_default_ttl declaration into
      ipv4's route cache definition header net/route.h, because
      currently net/ip.h (where the declaration lives now) has
      a back dependency on net/route.h
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      323e126f
  34. 07 12月, 2010 1 次提交
  35. 28 11月, 2010 1 次提交
    • T
      rtnl: make link af-specific updates atomic · cf7afbfe
      Thomas Graf 提交于
      As David pointed out correctly, updates to af-specific attributes
      are currently not atomic. If multiple changes are requested and
      one of them fails, previous updates may have been applied already
      leaving the link behind in a undefined state.
      
      This patch splits the function parse_link_af() into two functions
      validate_link_af() and set_link_at(). validate_link_af() is placed
      to validate_linkmsg() check for errors as early as possible before
      any changes to the link have been made. set_link_af() is called to
      commit the changes later.
      
      This method is not fail proof, while it is currently sufficient
      to make set_link_af() inerrable and thus 100% atomic, the
      validation function method will not be able to detect all error
      scenarios in the future, there will likely always be errors
      depending on states which are f.e. not protected by rtnl_mutex
      and thus may change between validation and setting.
      
      Also, instead of silently ignoring unknown address families and
      config blocks for address families which did not register a set
      function the errors EAFNOSUPPORT respectively EOPNOSUPPORT are
      returned to avoid comitting 4 out of 5 update requests without
      notifying the user.
      Signed-off-by: NThomas Graf <tgraf@infradead.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cf7afbfe