1. 04 Apr, 2017 1 commit
    • can: initial support for network namespaces · 8e8cda6d
      Mario Kicherer committed
      This patch adds initial support for network namespaces. The changes only
      enable support in the CAN raw, proc and af_can code. GW and BCM still
      have their checks that ensure that they are used only from the main
      namespace.
      
      The patch boils down to moving the global structures, i.e. the global
      filter list and their /proc stats, into a per-namespace structure and passing
      around the corresponding "struct net" in a lot of different places.
      
      Changes since v1:
       - rebased on current HEAD (2bfe01ef)
       - fixed overlong line
      Signed-off-by: Mario Kicherer <dev@kicherer.org>
      Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
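The move from global structures to per-namespace structures described in the commit above can be sketched roughly as follows. This is a minimal userspace illustration, not the patch itself: `struct can_ns_state`, `struct net_sketch` and both register functions are hypothetical names, and the fields are assumptions chosen only to show the shape of the change.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the per-namespace CAN state; the field names
 * are illustrative and not taken from the patch. */
struct can_ns_state {
	unsigned int rx_filters;   /* number of registered receive filters */
	unsigned long rx_frames;   /* counter that would back a /proc stat */
};

/* Hypothetical stand-in for the kernel's "struct net". */
struct net_sketch {
	struct can_ns_state can;
};

/* Before: one global instance shared by every caller. */
static struct can_ns_state global_can_state;

static void can_rx_register_global(void)
{
	global_can_state.rx_filters++;
}

/* After: the caller passes its namespace and only that namespace changes. */
static void can_rx_register_ns(struct net_sketch *net)
{
	assert(net != NULL);
	net->can.rx_filters++;
}
```

With the second form, two namespaces keep independent filter counts, which is why the corresponding "struct net" has to be passed through so many call sites.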
  2. 19 Nov, 2016 1 commit
  3. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      is thus an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added to or subtracted from pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, overall balance is in negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
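The sign-extension argument in the commit above can be tried out with a small pair of lookup functions. This is only a sketch under stated assumptions: `struct net_generic_sketch` and both function names are hypothetical, and whether a compiler actually emits an extra extension instruction for the signed variant depends on the compiler and its options.

```c
#include <assert.h>

/* Hypothetical miniature of a per-namespace pointer table. */
struct net_generic_sketch {
	unsigned int len;
	void *ptr[4];
};

/* Signed id: on x86_64 the compiler generally has to sign-extend "id"
 * to 64 bits before it can take part in the address computation. */
static void *lookup_signed(const struct net_generic_sketch *ng, int id)
{
	return ng->ptr[id - 1];
}

/* Unsigned id: 32-bit register writes zero-extend on x86_64, so a
 * separate extension instruction is normally not needed. */
static void *lookup_unsigned(const struct net_generic_sketch *ng,
			     unsigned int id)
{
	assert(id >= 1 && id <= ng->len);
	return ng->ptr[id - 1];
}
```

Both functions return the same pointer for valid ids; compiling them with optimization and comparing the generated assembly is one way to observe the difference the commit message describes.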
  4. 09 Aug, 2016 1 commit
  5. 03 Aug, 2016 1 commit
  6. 14 Dec, 2015 1 commit
  7. 07 Aug, 2015 1 commit
  8. 19 Jun, 2015 1 commit
  9. 18 May, 2015 1 commit
  10. 10 May, 2015 2 commits
  11. 13 Mar, 2015 2 commits
  12. 12 Mar, 2015 1 commit
    • net: add real socket cookies · 33cf7c90
      Eric Dumazet committed
      A long standing problem in netlink socket dumps is the use
      of kernel socket addresses as cookies.
      
      1) It is a security concern.
      
      2) Sockets can be reused quite quickly, so there is
         no guarantee that a cookie is used only once and
         identifies a single flow.
      
      3) request sock, establish sock, and timewait socks
         for a given flow have different cookies.
      
      Part of our effort to bring better TCP statistics requires
      switching to a different allocator.
      
      In this patch, I chose to use a per network namespace 64bit generator,
      and to use it only in the case a socket needs to be dumped to netlink.
      (This might be refined later if needed)
      
      Note that I tried to carry cookies from request sock, to establish sock,
      then timewait sockets.
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: Eric Salo <salo@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
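The idea in the commit above, a per network namespace 64bit generator whose value is assigned only when a socket is first dumped, can be sketched in userspace with C11 atomics. The types and the function name below are hypothetical and the kernel's own implementation may differ in detail; a cookie value of 0 is assumed here to mean "not assigned yet".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical per-namespace state: one 64-bit cookie generator. */
struct netns_cookie_sketch {
	atomic_uint_fast64_t cookie_gen;
};

/* Hypothetical socket: cookie 0 means "not assigned yet". */
struct sock_cookie_sketch {
	struct netns_cookie_sketch *net;
	atomic_uint_fast64_t cookie;
};

/* Return the socket's cookie, allocating it lazily on first use. */
static uint64_t sock_cookie_get(struct sock_cookie_sketch *sk)
{
	uint_fast64_t cur = atomic_load(&sk->cookie);

	if (cur == 0) {
		uint_fast64_t fresh =
			atomic_fetch_add(&sk->net->cookie_gen, 1) + 1;

		/* Only the first caller installs its value; on failure
		 * "cur" is updated to the cookie that won the race. */
		if (atomic_compare_exchange_strong(&sk->cookie, &cur, fresh))
			cur = fresh;
	}
	return (uint64_t)cur;
}
```

Because the value comes from a counter rather than from a kernel address, it leaks no pointer and is not reused when the memory of a socket is recycled.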
  13. 04 Mar, 2015 1 commit
    • mpls: Basic routing support · 0189197f
      Eric W. Biederman committed
      This change adds a new Kconfig option MPLS_ROUTING.
      
      The core of this change is the code to look at an mpls packet received
      from another machine.  Look that packet up in a routing table and
      forward the packet on.
      
      Support of MPLS over ATM is not considered or attempted here.  This
      implementation follows RFC3032 and implements the MPLS shim header that
      can pass over essentially any network.
      
      What RFC3031 refers to as the Incoming Label Map (ILM) I call
      net->mpls.platform_label[].  What RFC3031 refers to as the Next Hop
      Label Forwarding Entry (NHLFE) I call mpls_route.  Though calling it the
      label forwarding information base (lfib) might also be valid.
      
      Further, the implementation forwards packets as described in RFC3032.
      There is no need and, given the original motivation for MPLS, a strong
      disincentive to have a flexible label forwarding path.  In essence
      the logic is: the topmost label is read, looked up, removed, and
      replaced by 0 or more new labels, and the packet is then sent out the
      specified interface to its next hop.
      
      Quite a few optional features are not implemented here.  Among them
      are generation of ICMP errors when the TTL is exceeded or the packet
      is larger than the next hop MTU (those conditions are detected and the
      packets are dropped instead of generating an icmp error).  The traffic
      class field is always set to 0.  The implementation focuses on IP over
      MPLS and does not handle egress of other kinds of protocols.
      
      Instead of implementing coordination with the neighbour table and
      sorting out how to input next hops in a different address family (for
      which there is value), I was lazy and implemented a next hop mac
      address instead.  The code is simpler and there are flavors of MPLS
      such as MPLS-TP where neither an IPv4 nor an IPv6 next hop is
      appropriate, so a next hop by mac address would need to be implemented
      at some point.
      
      Two new definitions AF_MPLS and PF_MPLS are exposed to userspace.
      
      Decoding the mpls header must be done by first byteswapping a 32bit big
      endian word into the local cpu endian and then bit shifting to extract
      the pieces.  There is no C bit-field that can represent a wire format
      mpls header on a little endian machine as the low bits of the 20bit
      label wind up in the wrong half of the third byte.  Therefore internally
      everything is dealt with in cpu native byte order except when writing
      to and reading from a packet.
      
      For management simplicity, if a label is configured to forward out
      an interface that is down, the packet is dropped early.  Similarly,
      if a network interface is removed, rt_dev is updated to NULL
      (so no reference is preserved) and any packets for that label
      are dropped.  Keeping the label entries in the kernel allows
      the kernel label table to function as the definitive source
      of which labels are allocated and which are not.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
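The header decoding described in the commit above (convert the 32bit big endian word to cpu order, then shift) can be illustrated with a short sketch. The struct and function names are hypothetical; the field layout assumed here is the RFC 3032 label stack entry: 20-bit label, 3-bit traffic class, 1-bit bottom-of-stack flag, 8-bit TTL.

```c
#include <assert.h>
#include <stdint.h>

/* Decoded fields of one MPLS label stack entry, in cpu byte order. */
struct mpls_entry_sketch {
	uint32_t label;  /* 20 bits */
	uint8_t tc;      /* 3 bits, traffic class */
	uint8_t bos;     /* 1 bit, bottom of stack */
	uint8_t ttl;     /* 8 bits */
};

static struct mpls_entry_sketch mpls_decode(const uint8_t wire[4])
{
	/* Assemble the big-endian wire word into a host-order integer.
	 * On a little endian cpu this has the effect of a byte swap. */
	uint32_t word = ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16) |
			((uint32_t)wire[2] << 8) | (uint32_t)wire[3];
	struct mpls_entry_sketch e;

	e.label = word >> 12;
	e.tc = (word >> 9) & 0x7;
	e.bos = (word >> 8) & 0x1;
	e.ttl = word & 0xff;
	return e;
}
```

For example, the wire bytes `00 01 01 40` decode to label 16, traffic class 0, bottom-of-stack set, TTL 64. Working on the assembled word avoids the bit-field layout problem on little endian machines that the commit message mentions.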
  14. 20 Jan, 2015 1 commit
  15. 05 Dec, 2014 1 commit
  16. 01 Oct, 2014 1 commit
    • ipv6: remove rt6i_genid · 705f1c86
      Hannes Frederic Sowa committed
      Eric Dumazet noticed that all no-nonexthop or no-gateway routes which
      are already marked DST_HOST (e.g. input routes) will always be
      invalidated during sk_dst_check. Thus per-socket dst caching and
      early demuxing had absolutely no effect.
      
      Thus this patch removes rt6i_genid: fn_sernum already gets modified during
      add operations, so we only must ensure we mutate fn_sernum during ipv6
      address remove operations. This is a fairly costly operation,
      but address removal should not happen that often. Also our mtu update
      functions do the same and we have heard no complaints so far. xfrm policy
      changes also cause a call into fib6_flush_trees. Also plug a hole in
      rt6_info (no cacheline changes).
      
      I verified via tracing that this change has effect.
      
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: YOSHIFUJI Hideaki <hideaki@yoshifuji.org>
      Cc: Vlad Yasevich <vyasevich@gmail.com>
      Cc: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Cc: Martin Lau <kafai@fb.com>
      Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 25 Apr, 2014 1 commit
  18. 21 Apr, 2014 1 commit
  19. 17 Apr, 2014 1 commit
    • ipv4, fib: pass LOOPBACK_IFINDEX instead of 0 to flowi4_iif · 6a662719
      Cong Wang committed
      As suggested by Julian:
      
      	Simply, flowi4_iif must not contain 0, it does not
      	look logical to ignore all ip rules with specified iif.
      
      because in fib_rule_match() we do:
      
              if (rule->iifindex && (rule->iifindex != fl->flowi_iif))
                      goto out;
      
      flowi4_iif should be LOOPBACK_IFINDEX by default.
      
      We need to move LOOPBACK_IFINDEX to include/net/flow.h:
      
      1) It is mostly used by flowi_iif
      
      2) It fixes the following compile error, which occurs if the
      later patches use it in flow.h:
      
      In file included from include/linux/netfilter.h:277:0,
                       from include/net/netns/netfilter.h:5,
                       from include/net/net_namespace.h:21,
                       from include/linux/netdevice.h:43,
                       from include/linux/icmpv6.h:12,
                       from include/linux/ipv6.h:61,
                       from include/net/ipv6.h:16,
                       from include/linux/sunrpc/clnt.h:27,
                       from include/linux/nfs_fs.h:30,
                       from init/do_mounts.c:32:
      include/net/flow.h: In function ‘flowi4_init_output’:
      include/net/flow.h:84:32: error: ‘LOOPBACK_IFINDEX’ undeclared (first use in this function)
      
      Cc: Eric Biederman <ebiederm@xmission.com>
      Cc: Julian Anastasov <ja@ssi.bg>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Cong Wang <cwang@twopensource.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
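The effect of the fib_rule_match() check quoted in the commit above can be reproduced with a tiny model. The struct and function names here are hypothetical; only the condition itself is taken from the commit message, and the loopback ifindex is assumed to be 1 as it is in Linux.

```c
#include <assert.h>
#include <stdbool.h>

#define LOOPBACK_IFINDEX_SKETCH 1  /* assumed: loopback is ifindex 1 */

struct rule_sketch { int iifindex; };   /* 0 means "no iif specified" */
struct flow_sketch { int flowi_iif; };

/* Mirrors the quoted condition: a rule with an iif only matches a flow
 * whose input interface index is equal to it. */
static bool rule_iif_matches(const struct rule_sketch *rule,
			     const struct flow_sketch *fl)
{
	if (rule->iifindex && rule->iifindex != fl->flowi_iif)
		return false;
	return true;
}
```

A flow with `flowi_iif == 0` can never match a rule that names an input interface, so every such rule is skipped. Defaulting the flow to the loopback index lets a rule with "iif lo" match locally generated traffic.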
  20. 01 Mar, 2014 1 commit
  21. 10 Feb, 2014 1 commit
  22. 15 Oct, 2013 1 commit
  23. 29 Sep, 2013 1 commit
    • net: Delay default_device_exit_batch until no devices are unregistering v2 · 50624c93
      Eric W. Biederman committed
      There is currently no serialization between network namespaces exiting and
      network devices exiting, as the final part of netdev_run_todo does not
      happen under the rtnl_lock.  This is compounded by the fact that the
      only list of devices unregistering in netdev_run_todo is local to
      netdev_run_todo.
      
      This lack of serialization in extreme cases results in network devices
      unregistering in netdev_run_todo after the loopback device of their
      network namespace has been freed (making dst_ifdown unsafe), and after
      their network namespace has exited (making the NETDEV_UNREGISTER
      and NETDEV_UNREGISTER_FINAL callbacks unsafe).
      
      Add the missing serialization by a per network namespace count of how
      many network devices are unregistering and having a wait queue that is
      woken up whenever the count is decreased.  The count and wait queue
      allow default_device_exit_batch to wait until all of the unregistration
      activity for a network namespace has finished before proceeding to
      unregister the loopback device and then allowing the network namespace
      to exit.
      
      Only a single global wait queue is used because there is a single global
      lock and a single waiter; per network namespace wait queues
      would be a waste of resources.
      
      The per network namespace count of unregistering devices gives a
      progress guarantee because the number of network devices unregistering
      in an exiting network namespace must ultimately drop to zero (assuming
      network device unregistration completes).
      
      The basic logic remains the same as in v1.  This patch is now half
      comment and half rtnl_lock_unregistering, an expanded version of
      wait_event that performs no extra work in the common case where no network
      devices are unregistering when we get to default_device_exit_batch.
      Reported-by: Francesco Ruggeri <fruggeri@aristanetworks.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
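The counting scheme in the commit above (a per-namespace count of unregistering devices plus one global wait queue that is woken on every decrease) can be modelled in userspace with a mutex and a condition variable. This is a rough analogue under stated assumptions, not the kernel code: all names are hypothetical, and a pthread condition variable stands in for the kernel wait queue.

```c
#include <assert.h>
#include <pthread.h>

/* One global lock and one global wait queue, as in the commit message. */
static pthread_mutex_t reg_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t unregistering_wq = PTHREAD_COND_INITIALIZER;

/* Hypothetical per-namespace state: devices still being unregistered. */
struct ns_unreg_sketch {
	unsigned int dev_unreg_count;
};

static void dev_unregister_begin(struct ns_unreg_sketch *ns)
{
	pthread_mutex_lock(&reg_lock);
	ns->dev_unreg_count++;
	pthread_mutex_unlock(&reg_lock);
}

static void dev_unregister_end(struct ns_unreg_sketch *ns)
{
	pthread_mutex_lock(&reg_lock);
	assert(ns->dev_unreg_count > 0);
	ns->dev_unreg_count--;
	/* Wake the waiter whenever the count decreases. */
	pthread_cond_broadcast(&unregistering_wq);
	pthread_mutex_unlock(&reg_lock);
}

/* Returns with reg_lock held once no device in "ns" is unregistering. */
static void lock_when_idle(struct ns_unreg_sketch *ns)
{
	pthread_mutex_lock(&reg_lock);
	while (ns->dev_unreg_count != 0)
		pthread_cond_wait(&unregistering_wq, &reg_lock);
}
```

In the common case the count is already zero, so `lock_when_idle()` takes the lock and returns without sleeping, which matches the "no extra work" property the commit message claims for its own helper. The caller releases `reg_lock` when it is done.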
  24. 22 Sep, 2013 1 commit
  25. 01 Aug, 2013 1 commit
  26. 26 Jun, 2013 1 commit
  27. 03 Jun, 2013 1 commit
    • ipv4: use separate genid for next hop exceptions · 5aad1de5
      Timo Teräs committed
      commit 13d82bf5 (ipv4: Fix flushing of cached routing informations)
      added the support to flush learned pmtu information.
      
      However, using rt_genid is quite heavy as it is bumped on route
      add/change and multicast events amongst other places. These can
      happen quite often, especially if using dynamic routing protocols.
      
      While this is ok with routes (as they are just recreated locally),
      the pmtu information is learned from remote systems and the icmp
      notification can come with long delays. It is worth having a separate
      genid to avoid excessive pmtu resets.
      
      Cc: Steffen Klassert <steffen.klassert@secunet.com>
      Signed-off-by: Timo Teräs <timo.teras@iki.fi>
      Signed-off-by: David S. Miller <davem@davemloft.net>
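The reasoning in the commit above, that learned pmtu data should not be invalidated by every route change, can be shown with two generation counters. This is a sketch with hypothetical struct and function names; the field names are illustrative and the real kernel structures hold far more state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-namespace generation counters. */
struct genid_sketch {
	unsigned int rt_genid;    /* bumped on route add/change and the like */
	unsigned int fnhe_genid;  /* bumped only to flush learned exceptions */
};

/* Hypothetical cached next hop exception, e.g. a learned PMTU. */
struct exception_sketch {
	unsigned int pmtu;
	unsigned int genid;       /* fnhe_genid value when it was learned */
};

static void exception_learn(const struct genid_sketch *g,
			    struct exception_sketch *e, unsigned int pmtu)
{
	e->pmtu = pmtu;
	e->genid = g->fnhe_genid;
}

/* The exception stays valid until its own generation counter moves. */
static bool exception_valid(const struct genid_sketch *g,
			    const struct exception_sketch *e)
{
	return e->genid == g->fnhe_genid;
}
```

Frequent bumps of the route counter, for instance from a dynamic routing protocol, leave the learned value untouched; only an explicit bump of the second counter discards it.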
  28. 06 Apr, 2013 1 commit
  29. 20 Nov, 2012 1 commit
    • proc: Usable inode numbers for the namespace file descriptors. · 98f842e6
      Eric W. Biederman committed
      Assign a unique proc inode to each namespace, and use that
      inode number to ensure we only allocate at most one proc
      inode for every namespace in proc.
      
      A single proc inode per namespace allows userspace to test
      to see if two processes are in the same namespace.
      
      This has been a long requested feature and only blocked because
      a naive implementation would put the id in a global space and
      would ultimately require having a namespace for the names of
      namespaces, making migration and certain virtualization tricks
      impossible.
      
      We still don't have per superblock inode numbers for proc, which
      appears necessary for application unaware checkpoint/restart and
      migrations (if the application is using namespace file descriptors)
      but that is now allowed by the design if it becomes important.
      
      I have preallocated the ipc and uts initial proc inode numbers so
      their structures can be statically initialized.
      Signed-off-by: Eric W. Biederman <ebiederm@xmission.com>
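The userspace test that the commit above enables, checking whether two processes share a namespace, comes down to comparing the device and inode numbers of two namespace files. A minimal sketch follows; the function name is hypothetical and the example paths in the note below assume a mounted /proc.

```c
#include <assert.h>
#include <sys/stat.h>

/* Returns 1 if both paths resolve to the same inode on the same device,
 * 0 if they differ, and -1 if either stat() call fails. */
static int same_inode(const char *a, const char *b)
{
	struct stat sa, sb;

	if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
		return -1;
	return sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino;
}
```

A call such as `same_inode("/proc/self/ns/net", "/proc/1/ns/net")` would then report whether the caller and PID 1 are in the same network namespace, provided the caller has permission to read both entries.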
  30. 19 Nov, 2012 4 commits
  31. 06 Oct, 2012 1 commit
  32. 20 Sep, 2012 1 commit
  33. 19 Sep, 2012 1 commit
  34. 15 Aug, 2012 1 commit
  35. 10 Aug, 2012 1 commit