1. 01 4月, 2015 1 次提交
  2. 26 3月, 2015 1 次提交
  3. 13 3月, 2015 1 次提交
    • E
      net: Introduce possible_net_t · 0c5c9fb5
      Eric W. Biederman 提交于
      Having to say
      > #ifdef CONFIG_NET_NS
      > 	struct net *net;
      > #endif
      
      in structures is a little bit wordy and a little bit error prone.
      
      Instead it is possible to say:
      > typedef struct {
      > #ifdef CONFIG_NET_NS
      >       struct net *net;
      > #endif
      > } possible_net_t;
      
      And then in a header say:
      
      > 	possible_net_t net;
      
      Which is cleaner and easier to use and easier to test, as the
      possible_net_t is always there no matter what the compile options.
      
      Further this allows read_pnet and write_pnet to be functions in all
      cases which is better at catching typos.
      
      This change adds possible_net_t, updates the definitions of read_pnet
      and write_pnet, updates optional struct net * variables that
      write_pnet uses on to have the type possible_net_t, and finally fixes
      up the b0rked users of read_pnet and write_pnet.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c5c9fb5
  4. 18 1月, 2015 1 次提交
    • J
      netlink: make nlmsg_end() and genlmsg_end() void · 053c095a
      Johannes Berg 提交于
      Contrary to common expectations for an "int" return, these functions
      return only a positive value -- if used correctly they cannot even
      return 0 because the message header will necessarily be in the skb.
      
      This makes the very common pattern of
      
        if (genlmsg_end(...) < 0) { ... }
      
      be a whole bunch of dead code. Many places also simply do
      
        return nlmsg_end(...);
      
      and the caller is expected to deal with it.
      
      This also commonly (at least for me) causes errors, because it is very
      common to write
      
        if (my_function(...))
          /* error condition */
      
      and if my_function() does "return nlmsg_end()" this is of course wrong.
      
      Additionally, there's not a single place in the kernel that actually
      needs the message length returned, and if anyone needs it later then
      it'll be very easy to just use skb->len there.
      
      Remove this, and make the functions void. This removes a bunch of dead
      code as described above. The patch adds lines because I did
      
      -	return nlmsg_end(...);
      +	nlmsg_end(...);
      +	return 0;
      
      I could have preserved all the function's return values by returning
      skb->len, but instead I've audited all the places calling the affected
      functions and found that none cared. A few places actually compared
      the return value with <= 0 in dump functionality, but that could just
      be changed to < 0 with no change in behaviour, so I opted for the more
      efficient version.
      
      One instance of the error I've made numerous times now is also present
      in net/phonet/pn_netlink.c in the route_dumpit() function - it didn't
      check for <0 or <=0 and thus broke out of the loop every single time.
      I've preserved this since it will (I think) have caused the messages to
      userspace to be formatted differently with just a single message for
      every SKB returned to userspace. It's possible that this isn't needed
      for the tools that actually use this, but I don't even know what they
      are so couldn't test that changing this behaviour would be acceptable.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      053c095a
  5. 16 7月, 2014 1 次提交
    • T
      net: set name_assign_type in alloc_netdev() · c835a677
      Tom Gundersen 提交于
      Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
      all users to pass NET_NAME_UNKNOWN.
      
      Coccinelle patch:
      
      @@
      expression sizeof_priv, name, setup, txqs, rxqs, count;
      @@
      
      (
      -alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
      +alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
      |
      -alloc_netdev_mq(sizeof_priv, name, setup, count)
      +alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
      |
      -alloc_netdev(sizeof_priv, name, setup)
      +alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
      )
      
      v9: move comments here from the wrong commit
      Signed-off-by: NTom Gundersen <teg@jklm.no>
      Reviewed-by: NDavid Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c835a677
  6. 03 6月, 2014 1 次提交
    • E
      inetpeer: get rid of ip_id_count · 73f156a6
      Eric Dumazet 提交于
      Ideally, we would need to generate IP ID using a per destination IP
      generator.
      
      linux kernels used inet_peer cache for this purpose, but this had a huge
      cost on servers disabling MTU discovery.
      
      1) each inet_peer struct consumes 192 bytes
      
      2) inetpeer cache uses a binary tree of inet_peer structs,
         with a nominal size of ~66000 elements under load.
      
      3) lookups in this tree are hitting a lot of cache lines, as tree depth
         is about 20.
      
      4) If server deals with many tcp flows, we have a high probability of
         not finding the inet_peer, allocating a fresh one, inserting it in
         the tree with same initial ip_id_count, (cf secure_ip_id())
      
      5) We garbage collect inet_peer aggressively.
      
      IP ID generation do not have to be 'perfect'
      
      Goal is trying to avoid duplicates in a short period of time,
      so that reassembly units have a chance to complete reassembly of
      fragments belonging to one message before receiving other fragments
      with a recycled ID.
      
      We simply use an array of generators, and a Jenkin hash using the dst IP
      as a key.
      
      ipv6_select_ident() is put back into net/ipv6/ip6_output.c where it
      belongs (it is only used from this file)
      
      secure_ip_id() and secure_ipv6_id() no longer are needed.
      
      Rename ip_select_ident_more() to ip_select_ident_segs() to avoid
      unnecessary decrement/increment of the number of segments.
      Signed-off-by: NEric Dumazet <edumazet@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      73f156a6
  7. 31 5月, 2014 1 次提交
  8. 17 4月, 2014 1 次提交
    • C
      ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif · 6a662719
      Cong Wang 提交于
      As suggested by Julian:
      
      	Simply, flowi4_iif must not contain 0, it does not
      	look logical to ignore all ip rules with specified iif.
      
      because in fib_rule_match() we do:
      
              if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                      goto out;
      
      flowi4_iif should be LOOPBACK_IFINDEX by default.
      
      We need to move LOOPBACK_IFINDEX to include/net/flow.h:
      
      1) It is mostly used by flowi_iif
      
      2) Fix the following compile error if we use it in flow.h
      by the patches latter:
      
      In file included from include/linux/netfilter.h:277:0,
                       from include/net/netns/netfilter.h:5,
                       from include/net/net_namespace.h:21,
                       from include/linux/netdevice.h:43,
                       from include/linux/icmpv6.h:12,
                       from include/linux/ipv6.h:61,
                       from include/net/ipv6.h:16,
                       from include/linux/sunrpc/clnt.h:27,
                       from include/linux/nfs_fs.h:30,
                       from init/do_mounts.c:32:
      include/net/flow.h: In function ‘flowi4_init_output’:
      include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NCong Wang <cwang@twopensource.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a662719
  9. 21 3月, 2014 1 次提交
  10. 15 1月, 2014 1 次提交
  11. 10 12月, 2013 1 次提交
    • J
      neigh: restore old behaviour of default parms values · 1d4c8c29
      Jiri Pirko 提交于
      Previously inet devices were only constructed when addresses are added.
      Therefore the default neigh parms values they get are the ones at the
      time of these operations.
      
      Now that we're creating inet devices earlier, this changes the behaviour
      of default neigh parms values in an incompatible way (see bug #8519).
      
      This patch creates a compromise by setting the default values at the
      same point as before but only for those that have not been explicitly
      set by the user since the inet device's creation.
      
      Introduced by:
      commit 8030f544
      Author: Herbert Xu <herbert@gondor.apana.org.au>
      Date:   Thu Feb 22 01:53:47 2007 +0900
      
          [IPV4] devinet: Register inetdev earlier.
      Signed-off-by: NJiri Pirko <jiri@resnulli.us>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1d4c8c29
  12. 20 9月, 2013 1 次提交
    • A
      ip: generate unique IP identificator if local fragmentation is allowed · 703133de
      Ansis Atteka 提交于
      If local fragmentation is allowed, then ip_select_ident() and
      ip_select_ident_more() need to generate unique IDs to ensure
      correct defragmentation on the peer.
      
      For example, if IPsec (tunnel mode) has to encrypt large skbs
      that have local_df bit set, then all IP fragments that belonged
      to different ESP datagrams would have used the same identificator.
      If one of these IP fragments would get lost or reordered, then
      peer could possibly stitch together wrong IP fragments that did
      not belong to the same datagram. This would lead to a packet loss
      or data corruption.
      Signed-off-by: NAnsis Atteka <aatteka@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      703133de
  13. 04 9月, 2013 1 次提交
  14. 24 7月, 2013 1 次提交
  15. 11 6月, 2013 1 次提交
  16. 29 5月, 2013 2 次提交
  17. 29 3月, 2013 1 次提交
  18. 27 3月, 2013 1 次提交
    • P
      GRE: Refactor GRE tunneling code. · c5441932
      Pravin B Shelar 提交于
      Following patch refactors GRE code into ip tunneling code and GRE
      specific code. Common tunneling code is moved to ip_tunnel module.
      ip_tunnel module is written as generic library which can be used
      by different tunneling implementations.
      
      ip_tunnel module contains following components:
       - packet xmit and rcv generic code. xmit flow looks like
         (gre_xmit/ipip_xmit)->ip_tunnel_xmit->ip_local_out.
       - hash table of all devices.
       - lookup for tunnel devices.
       - control plane operations like device create, destroy, ioctl, netlink
         operations code.
       - registration for tunneling modules, like gre, ipip etc.
       - define single pcpu_tstats dev->tstats.
       - struct tnl_ptk_info added to pass parsed tunnel packet parameters.
      
      ipip.h header is renamed to ip_tunnel.h
      Signed-off-by: NPravin B Shelar <pshelar@nicira.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5441932
  19. 19 2月, 2013 2 次提交
  20. 23 1月, 2013 1 次提交
  21. 22 1月, 2013 1 次提交
    • N
      mcast: add multicast proxy support (IPv4 and IPv6) · 660b26dc
      Nicolas Dichtel 提交于
      This patch add the support of proxy multicast, ie being able to build a static
      multicast tree. It adds the support of (*,*) and (*,G) entries.
      
      The user should define an (*,*) entry which is not used for real forwarding.
      This entry defines the upstream in iif and contains all interfaces from the
      static tree in its oifs. It will be used to forward packet upstream when they
      come from an interface belonging to the static tree.
      Hence, the user should define (*,G) entries to build its static tree. Note that
      upstream interface must be part of oifs: packets are sent to all oifs
      interfaces except the input interface. This ensures to always join the whole
      static tree, even if the packet is not coming from the upstream interface.
      Signed-off-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NDavid L Stevens <dlstevens@us.ibm.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      660b26dc
  22. 05 12月, 2012 5 次提交
  23. 27 11月, 2012 1 次提交
  24. 26 11月, 2012 2 次提交
  25. 19 11月, 2012 1 次提交
    • E
      net: Allow userns root to control ipv4 · 52e804c6
      Eric W. Biederman 提交于
      Allow an unpriviled user who has created a user namespace, and then
      created a network namespace to effectively use the new network
      namespace, by reducing capable(CAP_NET_ADMIN) and
      capable(CAP_NET_RAW) calls to be ns_capable(net->user_ns,
      CAP_NET_ADMIN), or capable(net->user_ns, CAP_NET_RAW) calls.
      
      Settings that merely control a single network device are allowed.
      Either the network device is a logical network device where
      restrictions make no difference or the network device is hardware NIC
      that has been explicity moved from the initial network namespace.
      
      In general policy and network stack state changes are allowed
      while resource control is left unchanged.
      
      Allow creating raw sockets.
      Allow the SIOCSARP ioctl to control the arp cache.
      Allow the SIOCSIFFLAG ioctl to allow setting network device flags.
      Allow the SIOCSIFADDR ioctl to allow setting a netdevice ipv4 address.
      Allow the SIOCSIFBRDADDR ioctl to allow setting a netdevice ipv4 broadcast address.
      Allow the SIOCSIFDSTADDR ioctl to allow setting a netdevice ipv4 destination address.
      Allow the SIOCSIFNETMASK ioctl to allow setting a netdevice ipv4 netmask.
      Allow the SIOCADDRT and SIOCDELRT ioctls to allow adding and deleting ipv4 routes.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting gre tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipip tunnels.
      
      Allow the SIOCADDTUNNEL, SIOCCHGTUNNEL and SIOCDELTUNNEL ioctls for
      adding, changing and deleting ipsec virtual tunnel interfaces.
      
      Allow setting the MRT_INIT, MRT_DONE, MRT_ADD_VIF, MRT_DEL_VIF, MRT_ADD_MFC,
      MRT_DEL_MFC, MRT_ASSERT, MRT_PIM, MRT_TABLE socket options on multicast routing
      sockets.
      
      Allow setting and receiving IPOPT_CIPSO, IP_OPT_SEC, IP_OPT_SID and
      arbitrary ip options.
      
      Allow setting IP_SEC_POLICY/IP_XFRM_POLICY ipv4 socket option.
      Allow setting the IP_TRANSPARENT ipv4 socket option.
      Allow setting the TCP_REPAIR socket option.
      Allow setting the TCP_CONGESTION socket option.
      Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      52e804c6
  26. 06 10月, 2012 1 次提交
  27. 11 9月, 2012 1 次提交
  28. 31 8月, 2012 1 次提交
    • F
      net: ipv4: ipmr_expire_timer causes crash when removing net namespace · acbb219d
      Francesco Ruggeri 提交于
      When tearing down a net namespace, ipv4 mr_table structures are freed
      without first deactivating their timers. This can result in a crash in
      run_timer_softirq.
      This patch mimics the corresponding behaviour in ipv6.
      Locking and synchronization seem to be adequate.
      We are about to kfree mrt, so existing code should already make sure that
      no other references to mrt are pending or can be created by incoming traffic.
      The functions invoked here do not cause new references to mrt or other
      race conditions to be created.
      Invoking del_timer_sync guarantees that ipmr_expire_timer is inactive.
      Both ipmr_expire_process (whose completion we may have to wait in
      del_timer_sync) and mroute_clean_tables internally use mfc_unres_lock
      or other synchronizations when needed, and they both only modify mrt.
      
      Tested in Linux 3.4.8.
      Signed-off-by: NFrancesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      acbb219d
  29. 10 8月, 2012 1 次提交
  30. 21 7月, 2012 2 次提交
  31. 11 7月, 2012 1 次提交
  32. 28 6月, 2012 1 次提交