1. 22 Mar, 2013 1 commit
  2. 28 Feb, 2013 1 commit
    • hlist: drop the node parameter from iterators · b67bfe0d
      Sasha Levin committed
      I'm not sure why, but the hlist for-each-entry iterators were conceived
      differently from the list ones. The list ones take three parameters:
      
              list_for_each_entry(pos, head, member)
      
      The hlist ones were greedy and wanted an extra parameter:
      
              hlist_for_each_entry(tpos, pos, head, member)
      
      Why did they need an extra pos parameter? I'm not quite sure. Not only
      do they not really need it, it also prevents the iterator from looking
      exactly like the list iterator, which is unfortunate.
      
      Besides the semantic patch, there was some manual work required:
      
       - Fix up the actual hlist iterators in linux/list.h
       - Fix up the declaration of other iterators based on the hlist ones.
       - A very small number of places were using the 'node' parameter; these
       were modified to use 'obj->member' instead.
       - Coccinelle didn't handle the hlist_for_each_entry_safe iterator
       properly, so those had to be fixed up manually.
      
      The semantic patch, which is mostly the work of Peter Senna Tschudin, is here:
      
      @@
      iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host;
      
      type T;
      expression a,c,d,e;
      identifier b;
      statement S;
      @@
      
      -T b;
          <+... when != b
      (
      hlist_for_each_entry(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue(a,
      - b,
      c) S
      |
      hlist_for_each_entry_from(a,
      - b,
      c) S
      |
      hlist_for_each_entry_rcu(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_rcu_bh(a,
      - b,
      c, d) S
      |
      hlist_for_each_entry_continue_rcu_bh(a,
      - b,
      c) S
      |
      for_each_busy_worker(a, c,
      - b,
      d) S
      |
      ax25_uid_for_each(a,
      - b,
      c) S
      |
      ax25_for_each(a,
      - b,
      c) S
      |
      inet_bind_bucket_for_each(a,
      - b,
      c) S
      |
      sctp_for_each_hentry(a,
      - b,
      c) S
      |
      sk_for_each(a,
      - b,
      c) S
      |
      sk_for_each_rcu(a,
      - b,
      c) S
      |
      sk_for_each_from
      -(a, b)
      +(a)
      S
      + sk_for_each_from(a) S
      |
      sk_for_each_safe(a,
      - b,
      c, d) S
      |
      sk_for_each_bound(a,
      - b,
      c) S
      |
      hlist_for_each_entry_safe(a,
      - b,
      c, d, e) S
      |
      hlist_for_each_entry_continue_rcu(a,
      - b,
      c) S
      |
      nr_neigh_for_each(a,
      - b,
      c) S
      |
      nr_neigh_for_each_safe(a,
      - b,
      c, d) S
      |
      nr_node_for_each(a,
      - b,
      c) S
      |
      nr_node_for_each_safe(a,
      - b,
      c, d) S
      |
      - for_each_gfn_sp(a, c, d, b) S
      + for_each_gfn_sp(a, c, d) S
      |
      - for_each_gfn_indirect_valid_sp(a, c, d, b) S
      + for_each_gfn_indirect_valid_sp(a, c, d) S
      |
      for_each_host(a,
      - b,
      c) S
      |
      for_each_host_safe(a,
      - b,
      c, d) S
      |
      for_each_mesh_entry(a,
      - b,
      c, d) S
      )
          ...+>
      
      [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c]
      [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c]
      [akpm@linux-foundation.org: checkpatch fixes]
      [akpm@linux-foundation.org: fix warnings]
      [akpm@linux-foundation.org: redo intrusive kvm changes]
      Tested-by: Peter Senna Tschudin <peter.senna@gmail.com>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Sasha Levin <sasha.levin@oracle.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
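A minimal userspace sketch of the three-argument iterator form that the commit above describes. It is not the code from linux/list.h: the types are simplified stand-ins (the kernel's hlist_node also carries a pprev pointer), `struct item`, `sum_items` and this cut-down `hlist_add_head` are illustrative, and `__typeof__` is a GCC/Clang extension.

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (no pprev back-pointer). */
struct hlist_node { struct hlist_node *next; };
struct hlist_head { struct hlist_node *first; };

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Map a possibly-NULL node pointer to its enclosing entry, or NULL.
 * Note: this sketch evaluates 'ptr' twice, so pass side-effect-free
 * expressions only. */
#define hlist_entry_safe(ptr, type, member) \
	((ptr) ? container_of(ptr, type, member) : NULL)

/* Three-argument form: 'pos' is the entry cursor and there is no
 * separate node cursor, mirroring list_for_each_entry(pos, head, member). */
#define hlist_for_each_entry(pos, head, member)                              \
	for (pos = hlist_entry_safe((head)->first, __typeof__(*(pos)), member); \
	     pos;                                                               \
	     pos = hlist_entry_safe((pos)->member.next, __typeof__(*(pos)), member))

/* Illustrative entry type embedding a list node. */
struct item {
	int value;
	struct hlist_node link;
};

/* Simplified head insertion for the sketch. */
static void hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	n->next = h->first;
	h->first = n;
}

/* Walk the list using only the entry pointer. */
static int sum_items(struct hlist_head *head)
{
	struct item *it;
	int sum = 0;

	hlist_for_each_entry(it, head, link)
		sum += it->value;
	return sum;
}
```

The caller declares only the entry pointer (`it`); the extra `struct hlist_node *` cursor that the old four-argument form required is gone.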
  3. 12 Jan, 2013 1 commit
  4. 19 Nov, 2012 3 commits
    • net: Enable a userns root rtnl calls that are safe for unprivilged users · b51642f6
      Eric W. Biederman committed
      - Only allow moving network devices to network namespaces you have
        CAP_NET_ADMIN privileges over.
      
      - Enable creating/deleting/modifying interfaces
      - Enable adding/deleting addresses
      - Enable adding/setting/deleting neighbour entries
      - Enable adding/removing routes
      - Enable adding/removing fib rules
      - Enable setting the forwarding state
      - Enable adding/removing ipv6 address labels
      - Enable setting bridge parameter
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman committed
      Allow an unprivileged user who has created a user namespace, and then
      created a network namespace, to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to ns_capable(net->user_ns,
      CAP_NET_ADMIN) or ns_capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference, or the network device is a hardware NIC
      that has been explicitly moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAGS ioctl to set network device flags.
      Allow the SIOCSIFADDR ioctl to set a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to set a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to set a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to set a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to add and delete ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IPOPT_SEC, IPOPT_SID and
      arbitrary ip options.
      
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Push capable(CAP_NET_ADMIN) into the rtnl methods · dfc47ef8
      Eric W. Biederman committed
      - In rtnetlink_rcv_msg convert the capable(CAP_NET_ADMIN) check
        to ns_capable(net->user_ns, CAP_NET_ADMIN), allowing unprivileged
        users to make netlink calls to modify their local network
        namespace.
      
      - In the rtnetlink doit methods add capable(CAP_NET_ADMIN) so
        that calls that are not safe for unprivileged users are still
        protected.
      
      Later patches will remove the extra capable calls from methods
      that are safe for unprivileged users.
      Acked-by: Serge Hallyn <serge.hallyn@canonical.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
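The difference between the global capable() check and the namespace-relative ns_capable() check used in the three commits above can be illustrated with a toy model. This is only a sketch of the idea, not the kernel's implementation: every `toy_` name, the capability bit value, and the simplified ownership rule are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy user namespace: a parent link and the uid that created it. */
struct toy_userns {
	struct toy_userns *parent;
	unsigned owner_uid;
};

/* Toy credentials: uid, the namespace the task lives in, capability bits. */
struct toy_cred {
	unsigned uid;
	struct toy_userns *user_ns;
	unsigned long caps;
};

#define TOY_CAP_NET_ADMIN (1UL << 12)	/* bit position is arbitrary here */

static struct toy_userns toy_init_user_ns = { NULL, 0 };

/* Does 'cred' hold 'cap' relative to the 'target' namespace?
 * Walk from the target towards the root: a task has the capability if
 * it holds it in that very namespace, or if it owns a namespace that is
 * a direct child of its own. */
static bool toy_ns_capable(const struct toy_cred *cred,
			   struct toy_userns *target, unsigned long cap)
{
	for (struct toy_userns *ns = target; ns; ns = ns->parent) {
		if (ns == cred->user_ns)
			return (cred->caps & cap) != 0;
		if (ns->parent == cred->user_ns && ns->owner_uid == cred->uid)
			return true;
	}
	return false;
}

/* The global check is the same test against the initial namespace. */
static bool toy_capable(const struct toy_cred *cred, unsigned long cap)
{
	return toy_ns_capable(cred, &toy_init_user_ns, cap);
}
```

In this model, a user who created a child namespace passes `toy_ns_capable()` for objects owned by that namespace but still fails `toy_capable()`, which is the split the commits rely on: namespace-local operations are opened up, while operations that are not safe keep the global check.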
  5. 09 Oct, 2012 1 commit
    • ipv4: fix sending of redirects · e81da0e1
      Julian Anastasov committed
      After "Cache input routes in fib_info nexthops" (commit
      d2d68ba9) and "Elide fib_validate_source() completely when possible"
      (commit 7a9bc9b8) we can not send ICMP redirects. It seems we
      should not cache the RTCF_DOREDIRECT flag in nh_rth_input because
      the same fib_info can be used for traffic that is not redirected,
      e.g. from other input devices or from sources that are not in the same subnet.
      
      	As a result, we have to disable the caching of the RTCF_DOREDIRECT
      flag and force source validation for the case when forwarding
      traffic to the input device. If traffic comes from a directly connected
      source we allow redirection as it was done before both changes.
      
      	Avoid setting RTCF_DOREDIRECT if IN_DEV_TX_REDIRECTS
      is disabled; this can avoid source address validation and
      help caching the routes.
      
      	After the change "Adjust semantics of rt->rt_gateway"
      (commit f8126f1d) we should make sure our ICMP_REDIR_HOST messages
      contain daddr instead of 0.0.0.0 when target is directly connected.
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 19 Sep, 2012 1 commit
  7. 11 Sep, 2012 1 commit
  8. 09 Sep, 2012 1 commit
  9. 08 Sep, 2012 1 commit
  10. 24 Aug, 2012 1 commit
  11. 23 Aug, 2012 1 commit
    • net: remove delay at device dismantle · 0115e8e3
      Eric Dumazet committed
      I noticed an extra one-second delay in device dismantle, tracked down to
      a call to dst_dev_event() while some call_rcu() are still in RCU queues.
      
      These call_rcu() were posted by rt_free(struct rtable *rt) calls.
      
      We then wait a little (but one second) in netdev_wait_allrefs() before
      kicking again NETDEV_UNREGISTER.
      
      As the call_rcu() are now completed, dst_dev_event() can do the needed
      device swap on busy dst.
      
      To solve this problem, add a new NETDEV_UNREGISTER_FINAL, called
      after a rcu_barrier(), but outside of RTNL lock.
      
      Use NETDEV_UNREGISTER_FINAL with care!
      
      Change dst_dev_event() handler to react to NETDEV_UNREGISTER_FINAL
      
      Also remove NETDEV_UNREGISTER_BATCH, as it's not used anymore after
      IP cache removal.
      
      With help from Gao feng
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Tom Herbert <therbert@google.com>
      Cc: Mahesh Bandewar <maheshb@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Gao feng <gaofeng@cn.fujitsu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 10 Aug, 2012 1 commit
  13. 01 Aug, 2012 1 commit
  14. 24 Jul, 2012 1 commit
  15. 21 Jul, 2012 1 commit
    • ipv4: Delete routing cache. · 89aef892
      David S. Miller committed
      The ipv4 routing cache is non-deterministic, performance wise, and is
      subject to reasonably easy to launch denial of service attacks.
      
      The routing cache works great for well behaved traffic, and the world
      was a much friendlier place when the tradeoffs that led to the routing
      cache's design were considered.
      
      What it boils down to is that the performance of the routing cache is
      a product of the traffic patterns seen by a system rather than being a
      product of the contents of the routing tables.  The former is
      controllable by external entities.
      
      Even for "well behaved" legitimate traffic, high volume sites can see
      hit rates in the routing cache of only ~10%.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 19 Jul, 2012 1 commit
  17. 13 Jul, 2012 1 commit
  18. 06 Jul, 2012 1 commit
  19. 30 Jun, 2012 1 commit
    • netlink: add netlink_kernel_cfg parameter to netlink_kernel_create · a31f2d17
      Pablo Neira Ayuso committed
      This patch adds the following structure:
      
      struct netlink_kernel_cfg {
              unsigned int    groups;
              void            (*input)(struct sk_buff *skb);
              struct mutex    *cb_mutex;
      };
      
      That can be passed to netlink_kernel_create to set optional configurations
      for netlink kernel sockets.
      
      I've populated this structure by looking for NULL and zero parameters in the
      existing code. The remaining parameters that always need to be set are still
      left in the original interface.
      
      That includes optional parameters for the netlink socket creation. This allows
      easy extensibility of this interface in the future.
      
      This patch also adapts all callers to use this new interface.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
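A small userspace sketch of the configuration-struct pattern this commit introduces. The `struct netlink_kernel_cfg` layout is copied from the commit message; everything prefixed `mock_` is hypothetical and stands in for the real socket creation path, whose full signature is not shown in the message. Treating a NULL cfg as "all defaults" is an assumption of this mock.

```c
#include <stddef.h>

/* Opaque stand-ins so the sketch compiles outside the kernel. */
struct sk_buff;
struct mutex;

/* Optional settings, as listed in the commit message. */
struct netlink_kernel_cfg {
	unsigned int    groups;
	void            (*input)(struct sk_buff *skb);
	struct mutex    *cb_mutex;
};

/* Hypothetical record of what a created socket ended up with. */
struct mock_nl_sock {
	int             unit;
	unsigned int    groups;
	void            (*input)(struct sk_buff *skb);
	struct mutex    *cb_mutex;
};

/* Hypothetical receive callback used by callers of the mock. */
static void mock_rcv(struct sk_buff *skb)
{
	(void)skb;
}

/* Mock creation function: the mandatory parameter stays positional,
 * optional ones arrive through 'cfg'. A NULL cfg means defaults. */
static struct mock_nl_sock
mock_netlink_kernel_create(int unit, const struct netlink_kernel_cfg *cfg)
{
	struct mock_nl_sock sk = { .unit = unit };

	if (cfg) {
		sk.groups = cfg->groups;
		sk.input = cfg->input;
		sk.cb_mutex = cfg->cb_mutex;
	}
	return sk;
}
```

With designated initializers a caller names only the options it cares about, for example `struct netlink_kernel_cfg cfg = { .input = mock_rcv };`, and new optional fields can later be added to the struct without touching existing callers.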
  20. 29 Jun, 2012 3 commits
    • ipv4: Elide fib_validate_source() completely when possible. · 7a9bc9b8
      David S. Miller committed
      If rpfilter is off (or the SKB has an IPSEC path) and there are no
      tclassid users, we don't have to do anything at all when
      fib_validate_source() is invoked besides setting the itag to zero.
      
      We monitor tclassid uses with a counter (modified only under RTNL and
      marked __read_mostly) and we protect the fib_validate_source() real
      work with a test against this counter and whether rpfilter is to be
      done.
      
      Having a way to know whether we need no tclassid processing or not
      also opens the door for future optimized rpfilter algorithms that do
      not perform full FIB lookups.
      Signed-off-by: David S. Miller <davem@davemloft.net>
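The fast path described in this commit can be sketched as a guard in front of the expensive work. This is a toy model under stated assumptions: all `toy_` names are hypothetical, the "full validation" is a stub that only counts calls, and the itag value it returns is a placeholder.

```c
#include <stdbool.h>

/* Hypothetical counter of tclassid users (the commit says the real one
 * is modified only under RTNL and marked __read_mostly). */
static int toy_tclassid_users;

/* Counts how often the expensive path ran, for demonstration only. */
static int toy_full_lookups;

/* Stub for the real work: a full FIB lookup in the kernel. */
static int toy_full_validate(unsigned *itag)
{
	toy_full_lookups++;
	*itag = 0x1234;		/* placeholder classification tag */
	return 0;
}

/* Guarded entry point: skip the real work when neither reverse-path
 * filtering nor tclassid processing is needed. */
static int toy_validate_source(bool rpfilter, bool has_ipsec_path,
			       unsigned *itag)
{
	bool need_rpf = rpfilter && !has_ipsec_path;

	if (!need_rpf && toy_tclassid_users == 0) {
		*itag = 0;	/* the only side effect of the fast path */
		return 0;
	}
	return toy_full_validate(itag);
}
```

The guard costs two cheap tests per packet, so a configuration with rpfilter off and no tclassid users never reaches the lookup.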
    • ipv4: Adjust in_dev handling in fib_validate_source() · 9e56e380
      David S. Miller committed
      Checking for in_dev being NULL is pointless.
      
      In fact, all of our callers have in_dev precomputed already,
      so just pass it in and remove the NULL checking.
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Fix bugs in fib_compute_spec_dst(). · a207a4b2
      David S. Miller committed
      Based upon feedback from Julian Anastasov.
      
      1) Use route flags to determine multicast/broadcast, not the
         packet flags.
      
      2) Leave saddr unspecified in flow key.
      
      3) Adjust how we invoke inet_select_addr().  Pass ip_hdr(skb)->saddr as
         second arg, and if it was zeronet use link scope.
      
      4) Use loopback as input interface in flow key.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  21. 28 Jun, 2012 2 commits
    • ipv4: Kill rt->rt_spec_dst, no longer used. · 41347dcd
      David S. Miller committed
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv4: Create and use fib_compute_spec_dst() helper. · 35ebf65e
      David S. Miller committed
      The specific destination is the host we direct unicast replies to.
      Usually this is the original packet source address, but if we are
      responding to a multicast or broadcast packet we have to use something
      different.
      
      Specifically we must use the source address we would use if we were to
      send a packet to the unicast source of the original packet.
      
      The routing cache precomputes this value, but we want to remove that
      precomputation because it creates a hard dependency on the expensive
      rpfilter source address validation which we'd like to make cheaper.
      
      There are only three places where this matters:
      
      1) ICMP replies.
      
      2) pktinfo CMSG
      
      3) IP options
      
      Now there will be no real users of rt->rt_spec_dst and we can simply
      remove it altogether.
      Signed-off-by: David S. Miller <davem@davemloft.net>
  22. 16 Apr, 2012 1 commit
  23. 29 Mar, 2012 1 commit
  24. 12 Mar, 2012 1 commit
    • net: Convert printks to pr_<level> · 058bd4d2
      Joe Perches committed
      Use a more current kernel messaging style.
      
      Convert a printk block to print_hex_dump.
      Coalesce formats, align arguments.
      Use %s, __func__ instead of embedding function names.
      
      Some messages that were prefixed with <foo>_close are
      now prefixed with <foo>_fini.  Some ah4 and esp messages
      are now not prefixed with "ip ".
      
      The intent of this patch is to later add something like
        #define pr_fmt(fmt) "IPv4: " fmt.
      to standardize the output messages.
      
      Text size is trivially reduced. (x86-32 allyesconfig)
      
      $ size net/ipv4/built-in.o*
         text	   data	    bss	    dec	    hex	filename
       887888	  31558	 249696	1169142	 11d6f6	net/ipv4/built-in.o.new
       887934	  31558	 249800	1169292	 11d78c	net/ipv4/built-in.o.old
      Signed-off-by: Joe Perches <joe@perches.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
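A userspace sketch of how a `pr_fmt` definition like the one quoted above prefixes every message. The `pr_fmt` line follows the commit message; `toy_pr_info` is a hypothetical stand-in that formats into a caller-supplied buffer instead of the kernel log, and `##__VA_ARGS__` is a GCC/Clang extension.

```c
#include <stddef.h>
#include <stdio.h>

/* Prefix pasted onto every format string at compile time. */
#define pr_fmt(fmt) "IPv4: " fmt

/* Hypothetical stand-in for pr_info(): writes into 'buf' via snprintf. */
#define toy_pr_info(buf, size, fmt, ...) \
	snprintf(buf, size, pr_fmt(fmt), ##__VA_ARGS__)

/* Uses "%s", __func__ rather than embedding the function name by hand,
 * as the commit message recommends. */
static int report_init_failure(char *buf, size_t size, int err)
{
	return toy_pr_info(buf, size, "%s: can't initialize, error %d\n",
			   __func__, err);
}
```

Because the prefix is joined by string-literal concatenation, it costs nothing at run time, and renaming a function automatically updates its messages through `__func__`.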
  25. 10 Jun, 2011 1 commit
    • rtnetlink: Compute and store minimum ifinfo dump size · c7ac8679
      Greg Rose committed
      The message size allocated for rtnl ifinfo dumps was limited to
      a single page.  This is not enough for additional interface info
      available with devices that support SR-IOV and caused a bug in
      which VF info would not be displayed if more than approximately
      40 VFs were created per interface.
      
      Implement a new function pointer for the rtnl_register service that will
      calculate the amount of data required for the ifinfo dump and allocate
      enough data to satisfy the request.
      Signed-off-by: Greg Rose <gregory.v.rose@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
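The idea of computing the dump buffer size up front, instead of assuming one page, can be sketched as follows. All names and both size constants are hypothetical placeholders chosen for the example; they are not the real attribute sizes or the kernel's function names.

```c
#include <stddef.h>

/* Placeholder sizes, NOT the real netlink attribute sizes. */
#define TOY_IFINFO_BASE 1024u	/* fixed part of one interface message */
#define TOY_PER_VF        96u	/* extra bytes per virtual function */

/* Toy device: only the number of SR-IOV virtual functions matters here. */
struct toy_netdev {
	unsigned num_vfs;
};

/* Bytes needed to describe one device, including all of its VFs. */
static size_t toy_if_nlmsg_size(const struct toy_netdev *dev)
{
	return TOY_IFINFO_BASE + (size_t)dev->num_vfs * TOY_PER_VF;
}

/* Size calculator in the spirit of the new callback: the dump buffer
 * must hold the largest single-device message. */
static size_t toy_min_dump_alloc(const struct toy_netdev *devs, size_t n)
{
	size_t min_alloc = 0;

	for (size_t i = 0; i < n; i++) {
		size_t need = toy_if_nlmsg_size(&devs[i]);

		if (need > min_alloc)
			min_alloc = need;
	}
	return min_alloc;
}
```

A caller would allocate `toy_min_dump_alloc()` bytes (or a page, whichever is larger) before dumping, so a device with many VFs is no longer truncated.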
  26. 11 Apr, 2011 2 commits
  27. 31 Mar, 2011 1 commit
  28. 25 Mar, 2011 1 commit
  29. 22 Mar, 2011 1 commit
    • ipv4: fix route deletion for IPs on many subnets · e6abbaa2
      Julian Anastasov committed
      Alex Sidorenko reported problems with local
      routes left after IP addresses are deleted. It happens
      when the same IPs are used in more than one subnet for the
      device.
      
      	Fix fib_del_ifaddr to restrict the checks for duplicate
      local and broadcast addresses only to the IFAs that use
      our primary IFA or another primary IFA with the same address.
      And we expect the prefsrc to be matched when the routes
      are deleted because it is possible for them to differ only by
      prefsrc. This patch prevents local and broadcast routes
      from being leaked until their primary IP is finally deleted
      from the box.
      
      	As the secondary address promotion needs to delete
      the routes for all secondaries that used the old primary IFA,
      add an option to ignore these secondaries from the checks and
      to assume they are already deleted, so that we can safely
      delete the route while these IFAs are still on the device list.
      Reported-by: Alex Sidorenko <alexandre.sidorenko@hp.com>
      Signed-off-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 13 Mar, 2011 3 commits
  31. 10 Mar, 2011 1 commit
  32. 08 Mar, 2011 1 commit
    • ipv4: Cache source address in nexthop entries. · 1fc050a1
      David S. Miller committed
      When doing output route lookups, we have to select the source address
      if the user has not specified an explicit one.
      
      First, if the route has an explicit preferred source address
      specified, then we use that.
      
      Otherwise we search the route's outgoing interface for a suitable
      address.
      
      This search can be precomputed and cached at route insertion time.
      
      The only missing part is that we have to refresh this precomputed
      value any time addresses are added or removed from the interface, and
      this is accomplished by fib_update_nh_saddrs().
      Signed-off-by: David S. Miller <davem@davemloft.net>
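The selection order and the cache refresh described in this commit can be sketched with a toy model. Every `toy_` name is hypothetical, addresses are plain integers with 0 meaning "unset", and "search the interface" is reduced to taking the first address; only the shape of the logic follows the commit message.

```c
/* Toy interface with a small fixed address table. */
struct toy_dev {
	unsigned addrs[4];
	int naddrs;
};

/* Toy route: optional explicit preferred source, outgoing device, and
 * the source address precomputed for the nexthop. */
struct toy_route {
	unsigned prefsrc;	/* 0 = not specified */
	struct toy_dev *dev;
	unsigned nh_saddr;	/* cached result of the interface search */
};

/* Stand-in for the interface search (simplified: first address wins). */
static unsigned toy_search_dev(const struct toy_dev *dev)
{
	return dev->naddrs > 0 ? dev->addrs[0] : 0;
}

/* At route insertion time, precompute and cache the search result. */
static void toy_route_insert(struct toy_route *rt)
{
	rt->nh_saddr = toy_search_dev(rt->dev);
}

/* Output lookup: explicit preferred source first, else the cache. */
static unsigned toy_select_saddr(const struct toy_route *rt)
{
	return rt->prefsrc ? rt->prefsrc : rt->nh_saddr;
}

/* Called whenever addresses on 'dev' change: refresh every cached
 * value that was derived from that device. */
static void toy_update_nh_saddrs(struct toy_route *routes, int n,
				 const struct toy_dev *dev)
{
	for (int i = 0; i < n; i++)
		if (routes[i].dev == dev)
			routes[i].nh_saddr = toy_search_dev(dev);
}
```

The cache is only as fresh as the last refresh: until `toy_update_nh_saddrs()` runs after an address change, lookups keep returning the old value, which is why the commit ties the refresh to address add and remove events.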