1. 30 Dec, 2016 2 commits
  2. 28 Dec, 2016 1 commit
  3. 07 Dec, 2016 1 commit
    • netfilter: defrag: only register defrag functionality if needed · 834184b1
      Committed by Florian Westphal
      nf_defrag modules for ipv4 and ipv6 export an empty stub function.
      Any module that needs the defragmentation hooks registered simply 'calls'
      this empty function to create a phony module dependency -- modprobe will
      then load the defrag module too.
      
      This extends netfilter ipv4/ipv6 defragmentation modules to delay the hook
      registration until the functionality is requested within a network namespace
      instead of module load time for all namespaces.
      
      Hooks are only un-registered on module unload or when a namespace that used
      such defrag functionality exits.
      
      We have to use struct net for this as the register hooks can be called
      before netns initialization here from the ipv4/ipv6 conntrack module
      init path.
      
      There is no unregister functionality support, defrag will always be
      active once it was requested inside a net namespace.
      
      The reason is that defrag has an impact on nft and iptables rulesets
      (without defrag we might see fragments).
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
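The per-namespace, on-demand registration described in this commit can be pictured with a small userspace sketch. It is not the kernel implementation: `struct net` here is a hypothetical stand-in, and `defrag_enable()`, `defrag_net_exit()`, and the `hooks_registered` counter are names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct net. */
struct net {
	bool defrag_enabled;   /* set once defrag was requested in this namespace */
	int hooks_registered;  /* counts simulated hook registrations */
};

/* Called by any user that needs defragmentation in this namespace.
 * The hooks are registered on first use only; later calls are no-ops. */
int defrag_enable(struct net *net)
{
	assert(net != NULL);
	if (net->defrag_enabled)
		return 0;
	net->hooks_registered++;  /* real code would register netfilter hooks here */
	net->defrag_enabled = true;
	return 0;
}

/* Called when the namespace exits; there is no other way to turn defrag off. */
void defrag_net_exit(struct net *net)
{
	assert(net != NULL);
	if (!net->defrag_enabled)
		return;
	net->hooks_registered--;
	net->defrag_enabled = false;
}
```

A namespace that never calls `defrag_enable()` never pays for the hooks, and a second request in the same namespace does not register them twice.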
  4. 05 Dec, 2016 3 commits
    • netfilter: conntrack: built-in support for UDPlite · 9b91c96c
      Committed by Davide Caratti
      CONFIG_NF_CT_PROTO_UDPLITE is no longer a tristate. When set to y,
      connection tracking support for the UDPlite protocol is built into
      nf_conntrack.ko.
      
      footprint test:
      $ ls -l net/netfilter/nf_conntrack{_proto_udplite,}.ko \
              net/ipv4/netfilter/nf_conntrack_ipv4.ko \
              net/ipv6/netfilter/nf_conntrack_ipv6.ko
      
      (builtin)|| udplite|  ipv4  |  ipv6  | nf_conntrack
      ---------++--------+--------+--------+--------------
      none     || 432538 | 828755 | 828676 | 6141434
      UDPlite  ||   -    | 829649 | 829362 | 6498204
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: built-in support for SCTP · a85406af
      Committed by Davide Caratti
      CONFIG_NF_CT_PROTO_SCTP is no longer a tristate. When set to y, connection
      tracking support for the SCTP protocol is built into nf_conntrack.ko.
      
      footprint test:
      $ ls -l net/netfilter/nf_conntrack{_proto_sctp,}.ko \
              net/ipv4/netfilter/nf_conntrack_ipv4.ko \
              net/ipv6/netfilter/nf_conntrack_ipv6.ko
      
      (builtin)||  sctp  |  ipv4  |  ipv6  | nf_conntrack
      ---------++--------+--------+--------+--------------
      none     || 498243 | 828755 | 828676 | 6141434
      SCTP     ||   -    | 829254 | 829175 | 6547872
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: conntrack: built-in support for DCCP · c51d3901
      Committed by Davide Caratti
      CONFIG_NF_CT_PROTO_DCCP is no longer a tristate. When set to y, connection
      tracking support for the DCCP protocol is built into nf_conntrack.ko.
      
      footprint test:
      $ ls -l net/netfilter/nf_conntrack{_proto_dccp,}.ko \
              net/ipv4/netfilter/nf_conntrack_ipv4.ko \
              net/ipv6/netfilter/nf_conntrack_ipv6.ko
      
      (builtin)||  dccp  |  ipv4  |  ipv6  | nf_conntrack
      ---------++--------+--------+--------+--------------
      none     || 469140 | 828755 | 828676 | 6141434
      DCCP     ||   -    | 830566 | 829935 | 6533526
      Signed-off-by: Davide Caratti <dcaratti@redhat.com>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
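The three footprint tables above can be reduced to a single net figure per protocol by summing each row. The sketch below does that arithmetic. The helper names are invented for illustration, and because the inputs are `ls -l` file sizes, the result is an on-disk difference and says nothing about runtime memory.

```c
#include <assert.h>
#include <stddef.h>

/* Sum n file sizes given in bytes. */
long total_size(const long *sizes, size_t n)
{
	long sum = 0;

	assert(sizes != NULL);
	for (size_t i = 0; i < n; i++)
		sum += sizes[i];
	return sum;
}

/* Bytes saved on disk when the protocol tracker is built into nf_conntrack.ko.
 * modular: { proto.ko, ipv4.ko, ipv6.ko, nf_conntrack.ko } from the "none" row
 * builtin: { ipv4.ko, ipv6.ko, nf_conntrack.ko } from the protocol row */
long builtin_saving(const long modular[4], const long builtin[3])
{
	return total_size(modular, 4) - total_size(builtin, 3);
}
```

Fed with the table rows, this gives a net reduction of 74188 bytes for UDPlite, 90807 bytes for SCTP, and 73978 bytes for DCCP.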
  5. 04 Dec, 2016 3 commits
    • ipv4: fib: Allow for consistent FIB dumping · cacaad11
      Committed by Ido Schimmel
      The next patch will enable listeners of the FIB notification chain to
      request a dump of the FIB tables. However, since RTNL isn't taken during
      the dump, it's possible for the FIB tables to change mid-dump, which
      will result in inconsistency between the listener's table and the
      kernel's.
      
      Allow listeners to know about changes that occurred mid-dump, by adding
      a change sequence counter to each net namespace. The counter is
      incremented just before a notification is sent in the FIB chain.
      Signed-off-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
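A listener can use such a change sequence counter roughly as follows: read the counter, take the dump, read the counter again, and retry if the two values differ. The sketch below is a single-threaded userspace illustration only. `struct fib_table`, `fib_dump_consistent()`, the retry limit, and the `mid_dump_hook` callback (which simulates a concurrent change) are all assumptions and do not come from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_ROUTES 8
#define DUMP_MAX_RETRIES 5  /* hypothetical retry limit */

struct fib_table {
	unsigned int seq;        /* bumped just before every change is announced */
	size_t n;
	int routes[MAX_ROUTES];
};

void fib_add(struct fib_table *t, int route)
{
	assert(t != NULL);
	if (t->n == MAX_ROUTES)
		return;
	t->seq++;
	t->routes[t->n++] = route;
}

/* Optional callback fired once mid-dump to simulate a concurrent change. */
typedef void (*mid_dump_hook)(struct fib_table *t);

/* Copy the table into out[]; succeed only if nothing changed during the copy. */
bool fib_dump_consistent(struct fib_table *t, int *out, size_t *out_n,
			 mid_dump_hook hook)
{
	for (int attempt = 0; attempt < DUMP_MAX_RETRIES; attempt++) {
		unsigned int before = t->seq;
		size_t n = t->n;

		for (size_t i = 0; i < n; i++)
			out[i] = t->routes[i];
		if (hook) {
			hook(t);
			hook = NULL;  /* only disturb the first attempt */
		}
		if (t->seq == before) {
			*out_n = n;
			return true;
		}
	}
	return false;
}

/* Demo helper: adds a route while a dump is in progress. */
void demo_change_mid_dump(struct fib_table *t)
{
	fib_add(t, 99);
}
```

With `demo_change_mid_dump` as the hook, the first attempt sees the counter move and is discarded. The second attempt runs undisturbed and returns all three routes.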
    • netns: fix net_generic() "id - 1" bloat · 6af2d5ff
      Committed by Alexey Dobriyan
      net_generic() function is both a) inline and b) used ~600 times.
      
      It has the following code inside
      
      		...
      	ptr = ng->ptr[id - 1];
      		...
      
      "id" is never compile time constant so compiler is forced to subtract 1.
      Those decrements or LEA [r32 - 1] instructions add up.
      
      We also start id'ing from 1 to catch bugs where a pernet subsystem id
      is not initialized and is 0. This is a rather pointless idea in general
      (either nothing will work, or there is immediate interference with the
      first registered subsystem), but it hints at what needs to be done to
      reduce code size.
      
      Namely, overlay the allocation of the pointer array with the fixed part
      of the structure at the beginning, and use the usual base-0 addressing.
      
      Ids are just cookies and their exact values do not matter, so let's
      start with 3 on x86_64.
      
      Code size savings (oh boy): -4.2 KB
      
      As usual, ignore the initial compiler stupidity part of the table.
      
      	add/remove: 0/0 grow/shrink: 12/670 up/down: 89/-4297 (-4208)
      	function                                     old     new   delta
      	tipc_nametbl_insert_publ                    1250    1270     +20
      	nlmclnt_lookup_host                          686     703     +17
      	nfsd4_encode_fattr                          5930    5941     +11
      	nfs_get_client                              1050    1061     +11
      	register_pernet_operations                   333     342      +9
      	tcf_mirred_init                              843     849      +6
      	tcf_bpf_init                                1143    1149      +6
      	gss_setup_upcall                             990     994      +4
      	idmap_name_to_id                             432     434      +2
      	ops_init                                     274     275      +1
      	nfsd_inject_forget_client                    259     260      +1
      	nfs4_alloc_client                            612     613      +1
      	tunnel_key_walker                            164     163      -1
      
      		...
      
      	tipc_bcbase_select_primary                   392     360     -32
      	mac80211_hwsim_new_radio                    2808    2767     -41
      	ipip6_tunnel_ioctl                          2228    2186     -42
      	tipc_bcast_rcv                               715     672     -43
      	tipc_link_build_proto_msg                   1140    1089     -51
      	nfsd4_lock                                  3851    3796     -55
      	tipc_mon_rcv                                1012     956     -56
      	Total: Before=156643951, After=156639743, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
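The overlay idea can be sketched in portable userspace C: the fixed header occupies the first few pointer-sized slots of one allocation, and ids simply start after it, so a lookup is a plain `slots[id]`. The header layout, `MIN_ID`, and the `ng_*` helpers below are hypothetical; the kernel uses its own structure and a union rather than `memcpy`.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed header, loosely modeled on a length plus RCU bookkeeping. */
struct ng_header {
	unsigned int len;          /* total number of slots, header slots included */
	void *rcu_next;            /* placeholder field */
	void (*rcu_func)(void *);  /* placeholder field */
};

/* First id that does not collide with the header, in pointer-sized slots. */
#define MIN_ID ((sizeof(struct ng_header) + sizeof(void *) - 1) / sizeof(void *))

/* Allocate len slots; the header is stored in the first MIN_ID of them. */
void **ng_alloc(unsigned int len)
{
	struct ng_header hdr = { .len = len };
	void **slots;

	assert(len >= MIN_ID);
	slots = calloc(len, sizeof(void *));
	if (slots)
		memcpy(slots, &hdr, sizeof(hdr));
	return slots;
}

/* Base-0 lookup: no "id - 1" adjustment is needed. */
void *ng_get(void **slots, unsigned int id)
{
	assert(slots != NULL && id >= MIN_ID);
	return slots[id];
}

void ng_set(void **slots, unsigned int id, void *data)
{
	assert(slots != NULL && id >= MIN_ID);
	slots[id] = data;
}
```

With this particular header, `MIN_ID` works out to 3 on a typical x86_64 ABI (4 bytes of length, 4 bytes of padding, and two 8-byte pointers), which is in line with the starting id mentioned in the commit message.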
    • netns: add dummy struct inside "struct net_generic" · 9bfc7b99
      Committed by Alexey Dobriyan
      This is a precursor to fixing the "[id - 1]" bloat inside net_generic().
      
      The name "s" is chosen to complement the name "u", which is often used
      for dummy unions.
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Committed by Alexey Dobriyan
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and is thus
      an unsigned entity. Using a negative value is an out-of-bounds
      access by definition.
      
      2)
      On x86_64, unsigned 32-bit data that is mixed with pointers via
      array indexing, or via offsets added to or subtracted from pointers,
      is preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction which isn't necessary if the variable is
      unsigned, because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      The patch shaves ~1730 bytes off an allyesconfig kernel (without all the
      junk that interferes with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately, some functions actually grow bigger.
      This is a seemingly random artefact of code generation, with the
      register allocator being used differently. gcc decides that some
      variable needs to live in the new r8+ registers, and every access now
      requires a REX prefix. Or it is shifted into r12, so the [r12+0]
      addressing mode has to be used, which is longer than [r8].
      
      However, the overall balance is negative:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
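To observe the effect described in this commit on a given toolchain, a pair of otherwise identical functions can be compiled to assembly (for example with `gcc -O2 -S`) and compared. The two helpers below are hypothetical and exist only for that comparison. The exact instructions emitted depend on the compiler, its version, and the target ABI, so a difference may or may not show up.

```c
#include <assert.h>
#include <stddef.h>

/* Signed index: on x86_64 the compiler typically has to sign-extend i
 * to 64 bits before it can be used in the address computation. */
long load_signed(const long *p, int i)
{
	assert(p != NULL);
	return p[i];
}

/* Unsigned index: a zero-extended 32-bit value can often be used directly,
 * so the explicit extension instruction may disappear. */
long load_unsigned(const long *p, unsigned int i)
{
	assert(p != NULL);
	return p[i];
}
```

For non-negative indexes both functions return the same element; only the generated code may differ.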
  7. 14 Nov, 2016 1 commit
  8. 10 Nov, 2016 1 commit
    • ipv6: sr: add code base for control plane support of SR-IPv6 · 915d7e5e
      Committed by David Lebrun
      This patch adds the necessary hooks and structures to provide support
      for SR-IPv6 control plane, essentially the Generic Netlink commands
      that will be used for userspace control over the Segment Routing
      kernel structures.
      
      The genetlink commands provide control over two different structures:
      tunnel source and HMAC data. The tunnel source is the source address
      that will be used by default when encapsulating packets into an
      outer IPv6 header + SRH. If the tunnel source is set to :: then an
      address of the outgoing interface will be selected as the source.
      
      The HMAC commands currently just return ENOTSUPP and will be implemented
      in a future patch.
      Signed-off-by: David Lebrun <david.lebrun@uclouvain.be>
      Signed-off-by: David S. Miller <davem@davemloft.net>
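The tunnel source rule from this commit message (use the configured address unless it is `::`, in which case take an address of the outgoing interface) can be sketched with the standard `IN6_IS_ADDR_UNSPECIFIED` macro. `pick_tunnel_src()` is a hypothetical helper, and how the interface address itself is chosen is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <netinet/in.h>

/* Return the source address to use for the outer IPv6 header:
 * the configured tunnel source, or the outgoing interface's address
 * when the configured value is the unspecified address (::). */
const struct in6_addr *pick_tunnel_src(const struct in6_addr *configured,
				       const struct in6_addr *oif_addr)
{
	assert(configured != NULL && oif_addr != NULL);
	if (IN6_IS_ADDR_UNSPECIFIED(configured))
		return oif_addr;
	return configured;
}
```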
  9. 25 Sep, 2016 1 commit
  10. 24 Aug, 2016 1 commit
  11. 13 Aug, 2016 1 commit
    • netfilter: remove ip_conntrack* sysctl compat code · adf05168
      Committed by Pablo Neira Ayuso
      This backward compatibility has been around for more than ten years,
      since Yasuyuki Kozakai introduced IPv6 in conntrack. These days we have
      the alternative /proc/net/nf_conntrack* entries and the ctnetlink
      interface, and the conntrack utility has been adopted by many people in
      the user community, judging by what I have observed on the netfilter
      user mailing list.
      
      So let's get rid of this.
      
      Note that nf_conntrack_htable_size and nf_conntrack_max no longer need
      to be exported as symbols.
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  12. 12 Aug, 2016 2 commits
  13. 10 Aug, 2016 2 commits
  14. 25 May, 2016 1 commit
    • netfilter: nf_queue: Make the queue_handler pernet · dc3ee32e
      Committed by Eric W. Biederman
      Florian Weber reported:
      > Under full load (unshare() in loop -> OOM conditions) we can
      > get kernel panic:
      >
      > BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      > IP: [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
      > [..]
      > task: ffff88012dfa3840 ti: ffff88012dffc000 task.ti: ffff88012dffc000
      > RIP: 0010:[<ffffffff81476c85>]  [<ffffffff81476c85>] nfqnl_nf_hook_drop+0x35/0x70
      > RSP: 0000:ffff88012dfffd80  EFLAGS: 00010206
      > RAX: 0000000000000008 RBX: ffffffff81add0c0 RCX: ffff88013fd80000
      > [..]
      > Call Trace:
      >  [<ffffffff81474d98>] nf_queue_nf_hook_drop+0x18/0x20
      >  [<ffffffff814738eb>] nf_unregister_net_hook+0xdb/0x150
      >  [<ffffffff8147398f>] netfilter_net_exit+0x2f/0x60
      >  [<ffffffff8141b088>] ops_exit_list.isra.4+0x38/0x60
      >  [<ffffffff8141b652>] setup_net+0xc2/0x120
      >  [<ffffffff8141bd09>] copy_net_ns+0x79/0x120
      >  [<ffffffff8106965b>] create_new_namespaces+0x11b/0x1e0
      >  [<ffffffff810698a7>] unshare_nsproxy_namespaces+0x57/0xa0
      >  [<ffffffff8104baa2>] SyS_unshare+0x1b2/0x340
      >  [<ffffffff81608276>] entry_SYSCALL_64_fastpath+0x1e/0xa8
      > Code: 65 00 48 89 e5 41 56 41 55 41 54 53 83 e8 01 48 8b 97 70 12 00 00 48 98 49 89 f4 4c 8b 74 c2 18 4d 8d 6e 08 49 81 c6 88 00 00 00 <49> 8b 5d 00 48 85 db 74 1a 48 89 df 4c 89 e2 48 c7 c6 90 68 47
      >
      
      The simple fix for this requires a new pernet variable for struct
      nf_queue that indicates when it is safe to use the dynamically
      allocated nf_queue state.
      
      As we need a variable anyway, make nf_register_queue_handler and
      nf_unregister_queue_handler pernet.  This allows the existing logic of
      when it is safe to use the state from the nfnetlink_queue module to be
      reused with no changes except for making it per net.
      
      The synchronize_rcu from nf_unregister_queue_handler is moved to a new
      function nfnl_queue_net_exit_batch so that the worst case of having a
      synchronize_rcu in the pernet exit path is not experienced in batch
      mode.
      Reported-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
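The shape of this fix, a handler pointer stored per namespace and checked before use, can be shown with a small userspace sketch. It leaves out RCU entirely. `struct net`, `struct queue_handler`, and all function names below are simplified stand-ins and do not reproduce the kernel definitions.

```c
#include <assert.h>
#include <stddef.h>

struct queue_handler {
	void (*hook_drop)(int *dropped);
};

/* Hypothetical stand-in for the kernel's struct net. */
struct net {
	const struct queue_handler *queue_handler;  /* NULL until registered */
};

void register_queue_handler(struct net *net, const struct queue_handler *qh)
{
	assert(net != NULL && net->queue_handler == NULL);
	net->queue_handler = qh;
}

void unregister_queue_handler(struct net *net)
{
	assert(net != NULL);
	net->queue_handler = NULL;
}

/* Safe to call for a namespace whose setup failed before a handler
 * was registered: the per-net pointer is simply still NULL. */
int queue_hook_drop(struct net *net, int *dropped)
{
	const struct queue_handler *qh;

	assert(net != NULL);
	qh = net->queue_handler;
	if (qh == NULL)
		return 0;
	qh->hook_drop(dropped);
	return 1;
}

/* Demo handler that counts how often it was invoked. */
void demo_hook_drop(int *dropped)
{
	(*dropped)++;
}

const struct queue_handler demo_handler = { .hook_drop = demo_hook_drop };
```

Because the pointer lives in the namespace rather than in a global, a half-initialized namespace never reaches handler state that was not allocated for it.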
  15. 09 May, 2016 2 commits
  16. 06 May, 2016 1 commit
  17. 05 May, 2016 1 commit
  18. 25 Apr, 2016 1 commit
  19. 12 Apr, 2016 1 commit
    • net: ipv4: Consider failed nexthops in multipath routes · a6db4494
      Committed by David Ahern
      Multipath route lookups should consider knowledge about next hops and not
      select a hop that is known to have failed.
      
      Example:
      
                           [h2]                   [h3]   15.0.0.5
                            |                      |
                           3|                     3|
                          [SP1]                  [SP2]--+
                           1  2                   1     2
                           |  |     /-------------+     |
                           |   \   /                    |
                           |     X                      |
                           |    / \                     |
                           |   /   \---------------\    |
                           1  2                     1   2
               12.0.0.2  [TOR1] 3-----------------3 [TOR2] 12.0.0.3
                           4                         4
                            \                       /
                              \                    /
                               \                  /
                                -------|   |-----/
                                       1   2
                                      [TOR3]
                                        3|
                                         |
                                        [h1]  12.0.0.1
      
      host h1 with IP 12.0.0.1 has 2 paths to host h3 at 15.0.0.5:
      
          root@h1:~# ip ro ls
          ...
          12.0.0.0/24 dev swp1  proto kernel  scope link  src 12.0.0.1
          15.0.0.0/16
                  nexthop via 12.0.0.2  dev swp1 weight 1
                  nexthop via 12.0.0.3  dev swp1 weight 1
          ...
      
      If the link between tor3 and tor1 is down, and the link between tor1
      and tor2 is down as well, then tor1 is effectively cut off from h1. Yet
      the route lookups in h1 alternate between the 2 routes: ping 15.0.0.5
      gets one and ssh 15.0.0.5 gets the other. Connections that attempt to
      use the 12.0.0.2 nexthop fail since that neighbor is not reachable:
      
          root@h1:~# ip neigh show
          ...
          12.0.0.3 dev swp1 lladdr 00:02:00:00:00:1b REACHABLE
          12.0.0.2 dev swp1  FAILED
          ...
      
      The failed path can be avoided by considering known neighbor information
      when selecting next hops. If the neighbor lookup fails we have no
      knowledge about the nexthop, so give it a shot. If there is an entry
      then only select the nexthop if the state is sane. This is similar to
      what fib_detect_death does.
      
      To maintain backward compatibility, use of the neighbor information is
      controlled by a new sysctl, fib_multipath_use_neigh.
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Reviewed-by: Julian Anastasov <ja@ssi.bg>
      Signed-off-by: David S. Miller <davem@davemloft.net>
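The selection rule described in this commit can be sketched as follows. This is a simplified illustration, not the kernel code: the hash-threshold scheme, the three-value `neigh_state`, and every name below are assumptions, and "sane" is reduced to "not FAILED". A nexthop with no neighbor entry is still tried, and when every candidate is known to have failed the first matching one is used as a fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum neigh_state { NEIGH_UNKNOWN, NEIGH_REACHABLE, NEIGH_FAILED };

struct nexthop {
	const char *gateway;
	int upper_bound;         /* hypothetical hash threshold for this hop */
	enum neigh_state state;  /* what the neighbor table knows, if anything */
};

/* No neighbor entry means no knowledge, so the hop is still worth trying. */
static bool nexthop_usable(const struct nexthop *nh)
{
	return nh->state != NEIGH_FAILED;
}

/* Return the index of the selected nexthop, or -1 if none matches the hash.
 * use_neigh plays the role of the fib_multipath_use_neigh sysctl. */
int select_nexthop(const struct nexthop *nhs, size_t n, int hash, bool use_neigh)
{
	int first = -1;

	assert(nhs != NULL);
	for (size_t i = 0; i < n; i++) {
		if (hash > nhs[i].upper_bound)
			continue;
		if (!use_neigh || nexthop_usable(&nhs[i]))
			return (int)i;
		if (first < 0)
			first = (int)i;  /* remember a fallback */
	}
	return first;
}
```

With the two nexthops from the example above and 12.0.0.2 marked FAILED, a flow whose hash would normally pick 12.0.0.2 is steered to 12.0.0.3 when `use_neigh` is true, and keeps the old behavior when it is false.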
  20. 17 Mar, 2016 1 commit
  21. 09 Mar, 2016 2 commits
  22. 17 Feb, 2016 3 commits
  23. 11 Feb, 2016 4 commits
  24. 08 Feb, 2016 3 commits