1. 09 Apr, 2019 1 commit
  2. 03 Apr, 2019 1 commit
  3. 29 Mar, 2019 1 commit
    • openvswitch: Add timeout support to ct action · 06bd2bdf
      Yi-Hung Wei committed
      Add fine-grained timeout support to the conntrack action.
      The new OVS_CT_ATTR_TIMEOUT attribute of the conntrack action
      specifies a timeout to be associated with this connection.
      If no timeout is specified, behavior is unchanged: the default
      timeout for the connection is applied automatically.
      
      Example usage:
      $ nfct timeout add timeout_1 inet tcp syn_sent 100 established 200
      $ ovs-ofctl add-flow br0 in_port=1,ip,tcp,action=ct(commit,timeout=timeout_1)
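
The effect of the example above can be modelled with a minimal Python sketch. This is not the kernel code: `effective_timeout`, `POLICIES`, and `DEFAULT_TCP_TIMEOUTS` are hypothetical names, and the default values are assumptions based on the usual nf_conntrack TCP sysctl defaults. A named policy overrides the per-state timeout, and a connection without a policy keeps the defaults.

```python
# Assumed system defaults in seconds (illustrative; check your own sysctls).
DEFAULT_TCP_TIMEOUTS = {"syn_sent": 120, "established": 432000}

# Mirrors: nfct timeout add timeout_1 inet tcp syn_sent 100 established 200
POLICIES = {"timeout_1": {"syn_sent": 100, "established": 200}}


def effective_timeout(state, policy_name=None):
    """Return the timeout for a TCP state, preferring the named policy."""
    policy = POLICIES.get(policy_name, {}) if policy_name else {}
    # States the policy does not mention fall back to the default.
    return policy.get(state, DEFAULT_TCP_TIMEOUTS[state])
```

With `timeout=timeout_1` an established connection would expire after 200 seconds in this model; without the attribute it keeps the default.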
      
      CC: Pravin Shelar <pshelar@ovn.org>
      CC: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 27 Mar, 2019 1 commit
  5. 22 Mar, 2019 1 commit
    • genetlink: make policy common to family · 3b0f31f2
      Johannes Berg committed
      Since maxattr is common, the policy can't really differ sanely,
      so make it common as well.
      
      The only user that did in fact manage to make a non-common policy
      is taskstats, which has to be really careful about it (since it's
      still using a common maxattr!). This is no longer supported, but
      we can fake it using pre_doit.
      
      This reduces the size of e.g. nl80211.o (which has lots of commands):
      
         text	   data	    bss	    dec	    hex	filename
       398745	  14323	   2240	 415308	  6564c	net/wireless/nl80211.o (before)
       397913	  14331	   2240	 414484	  65314	net/wireless/nl80211.o (after)
      --------------------------------
         -832      +8       0    -824
      
      Which is obviously just 8 bytes for each command, and an added 8
      bytes for the new policy pointer. I'm not sure why the ops list is
      counted as .text though.
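
The size figures can be cross-checked with a few lines of Python. This is only a worked-arithmetic sketch; the 8-byte pointer size is an assumption (a 64-bit build), and the implied number of ops entries is derived from the table, not stated in the commit message.

```python
POINTER_SIZE = 8  # assumption: 64-bit build, so one pointer is 8 bytes

before = {"text": 398745, "data": 14323, "bss": 2240}
after = {"text": 397913, "data": 14331, "bss": 2240}

# Per-section change, matching the "-832  +8  0" row of the table.
delta = {section: after[section] - before[section] for section in before}
total_delta = sum(delta.values())

# One .policy pointer was removed from every genl_ops entry.
implied_ops = -delta["text"] // POINTER_SIZE
```

Under that assumption the 832 bytes removed from .text correspond to 104 ops entries, and the 8 bytes added to .data are the single new family-level policy pointer.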
      
      Most of the code transformations were done using the following spatch:
          @ops@
          identifier OPS;
          expression POLICY;
          @@
          struct genl_ops OPS[] = {
          ...,
           {
          -	.policy = POLICY,
           },
          ...
          };
      
          @@
          identifier ops.OPS;
          expression ops.POLICY;
          identifier fam;
          expression M;
          @@
          struct genl_family fam = {
                  .ops = OPS,
                  .maxattr = M,
          +       .policy = POLICY,
                  ...
          };
      
      This also gets rid of devlink_nl_cmd_region_read_dumpit() accessing
      the cb->data as ops, which we want to change in a later genl patch.
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  6. 27 Feb, 2019 2 commits
    • netfilter: nat: remove nf_nat_l3proto.h and nf_nat_core.h · d2c5c103
      Florian Westphal committed
      The l3proto name is gone; its header file is the last trace.
      While at it, also remove nf_nat_core.h: it is very small and all users
      include nf_nat.h too.
      
      before:
         text    data     bss     dec     hex filename
        22948    1612    4136   28696    7018 nf_nat.ko
      
      after removal of l3proto register/unregister functions:
         text	   data	    bss	    dec	    hex	filename
        22196	   1516	   4136	  27848	   6cc8 nf_nat.ko
      
      checkpatch complains about overly long lines, but line breaks
      do not make things more readable and the line length gets smaller
      here, not larger.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • netfilter: nat: merge nf_nat_ipv4,6 into nat core · 3bf195ae
      Florian Westphal committed
      before:
         text    data     bss     dec     hex filename
        16566    1576    4136   22278    5706 nf_nat.ko
         3598	    844	      0	   4442	   115a	nf_nat_ipv6.ko
         3187	    844	      0	   4031	    fbf	nf_nat_ipv4.ko
      
      after:
         text    data     bss     dec     hex filename
        22948    1612    4136   28696    7018 nf_nat.ko
      
      ... with ipv4/v6 nat now provided directly via nf_nat.ko.
      
      Also changes:
             ret = nf_nat_ipv4_fn(priv, skb, state);
             if (ret != NF_DROP && ret != NF_STOLEN &&
      into
      	if (ret != NF_ACCEPT)
      		return ret;
      
      everywhere.
      
      The nat hooks should never return anything other than
      ACCEPT or DROP (and the latter only in rare error cases).
      
      The original code uses multi-line ANDing including assignment-in-if:
              if (ret != NF_DROP && ret != NF_STOLEN &&
                 !(IPCB(skb)->flags & IPSKB_XFRM_TRANSFORMED) &&
                  (ct = nf_ct_get(skb, &ctinfo)) != NULL) {
      
      I removed this while moving, breaking those into separate conditionals
      and moving the assignments onto extra lines.
      
      checkpatch still generates some warnings:
       1. Overly long lines (of moved code).
          Breaking them is even uglier, so I kept them as-is.
       2. Use of extern function declarations in a .c file.
          This is a necessary evil: we must call
          nf_nat_l3proto_register() from the nat core now.
          All l3proto related functions are removed later in this series,
          and those prototypes are then removed as well.
      
      v2: keep empty nf_nat_ipv6_csum_update stub for CONFIG_IPV6=n case.
      v3: remove IS_ENABLED(NF_NAT_IPV4/6) tests, NF_NAT_IPVx toggles
          are removed here.
      v4: also get rid of the assignments in conditionals.
      Signed-off-by: Florian Westphal <fw@strlen.de>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  7. 18 Jan, 2019 1 commit
  8. 01 Dec, 2018 1 commit
  9. 04 Nov, 2018 1 commit
  10. 05 Oct, 2018 1 commit
  11. 02 Oct, 2018 1 commit
  12. 20 Sep, 2018 1 commit
  13. 18 Jul, 2018 2 commits
  14. 16 Jul, 2018 1 commit
  15. 26 May, 2018 1 commit
    • openvswitch: Support conntrack zone limit · 11efd5cb
      Yi-Hung Wei committed
      Currently, nf_conntrack_max is used to limit the maximum number of
      conntrack entries in the conntrack table for every network namespace.
      VMs and containers that reside in the same namespace share the same
      conntrack table, and the total number of conntrack entries for all of
      them is limited by nf_conntrack_max.  In this case, if one VM or
      container abuses its usage of conntrack entries, it blocks the others
      from committing valid conntrack entries into the conntrack table.
      Even if we put each VM in a different network namespace, the current
      nf_conntrack_max configuration is too rigid: we cannot limit
      different VMs/containers to different numbers of conntrack entries.
      
      To address this issue, this patch proposes a fine-grained mechanism
      that further limits the number of conntrack entries per zone.  For
      example, we can assign a different zone to each VM and set a
      conntrack limit for each zone.  With this isolation, a misbehaving VM
      only consumes the conntrack entries in its own zone and does not
      affect other well-behaved VMs.  Moreover, users can set different
      conntrack limits for different zones based on their preference.
      
      The proposed implementation utilizes Netfilter's nf_conncount backend
      to count the number of connections in a particular zone.  If the number
      of connections is above the configured limit, ovs will return ENOMEM to
      userspace.  If userspace does not configure the zone limit, the limit
      defaults to zero, meaning no limit, which is backward compatible with
      the behavior without this patch.
      
      The following high-level APIs are provided to userspace:
        - OVS_CT_LIMIT_CMD_SET:
          * set default connection limit for all zones
          * set the connection limit for a particular zone
        - OVS_CT_LIMIT_CMD_DEL:
          * remove the connection limit for a particular zone
        - OVS_CT_LIMIT_CMD_GET:
          * get the default connection limit for all zones
          * get the connection limit for a particular zone
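
The semantics described above can be sketched as a small Python model. It is not the kernel implementation (which counts connections through nf_conncount); the class and method names are hypothetical, and the exact boundary at which ENOMEM is returned is an assumption.

```python
import errno


class ZoneLimiter:
    """Toy model of per-zone connection limits; a limit of 0 means unlimited."""

    def __init__(self, default_limit=0):
        self.default_limit = default_limit  # applies to zones without a limit
        self.zone_limits = {}               # zone id -> configured limit
        self.counts = {}                    # zone id -> committed connections

    def set_default(self, limit):           # models OVS_CT_LIMIT_CMD_SET (default)
        self.default_limit = limit

    def set_limit(self, zone, limit):       # models OVS_CT_LIMIT_CMD_SET (zone)
        self.zone_limits[zone] = limit

    def del_limit(self, zone):              # models OVS_CT_LIMIT_CMD_DEL
        self.zone_limits.pop(zone, None)

    def get_limit(self, zone):              # models OVS_CT_LIMIT_CMD_GET
        return self.zone_limits.get(zone, self.default_limit)

    def commit(self, zone):
        """Try to commit one connection; return 0 or -ENOMEM."""
        limit = self.get_limit(zone)
        count = self.counts.get(zone, 0)
        if limit and count >= limit:
            return -errno.ENOMEM
        self.counts[zone] = count + 1
        return 0
```

A zone with a limit of 2 accepts two commits and rejects the third, while other zones are unaffected; this is the isolation property the commit message is after.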
      Signed-off-by: Yi-Hung Wei <yihung.wei@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 24 Apr, 2018 1 commit
    • netfilter: add NAT support for shifted portmap ranges · 2eb0f624
      Thierry Du Tre committed
      This is a patch proposal to support shifted ranges in portmaps (e.g. tcp/udp
      incoming port 5000-5100 on WAN redirected to LAN 192.168.1.5:2000-2100).
      
      Currently DNAT only works for a single port or identical port ranges (e.g.
      ports 5000-5100 on the WAN interface redirected to a LAN host while the
      original destination port is not altered).  When different port ranges are
      configured, either 'random' mode should be used, or else all incoming
      connections are mapped onto the first port in the redirect range (in the
      described example WAN:5000-5100 will all be mapped to 192.168.1.5:2000).
      
      This patch introduces a new mode indicated by flag NF_NAT_RANGE_PROTO_OFFSET
      which uses a base port value to calculate an offset with the destination port
      present in the incoming stream. That offset is then applied as index within the
      redirect port range (index modulo rangewidth to handle range overflow).
      
      In described example the base port would be 5000. An incoming stream with
      destination port 5004 would result in an offset value 4 which means that the
      NAT'ed stream will be using destination port 2004.
      
      Other possibilities include deterministic mapping of larger or multiple ranges
      to a smaller range: WAN:5000-5999 -> LAN:5000-5099 (maps WAN port 5*xx to port
      50xx)
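
The offset calculation described above can be written as a minimal Python sketch. `shifted_port` and its parameter names are hypothetical; the real logic lives in the kernel NAT code and operates on network-order values.

```python
def shifted_port(dport, base, range_min, range_max):
    """Map dport into [range_min, range_max] by its offset from base."""
    range_size = range_max - range_min + 1
    offset = dport - base
    # The offset is used as an index into the redirect range, modulo its
    # width, so offsets beyond the range wrap around deterministically.
    return range_min + offset % range_size
```

For the example in the text, `shifted_port(5004, 5000, 2000, 2100)` gives 2004. With a hypothetical wider source range, `shifted_port(6250, 6000, 3000, 3099)` wraps around to 3050.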
      
      This patch does not change any current behavior. It only adds new NAT proto
      range functionality, which must be selected explicitly via the specific flag.
      
      A patch for iptables (libipt_DNAT.c + libip6t_DNAT.c) will also be proposed
      which makes this functionality immediately available.
      Signed-off-by: Thierry Du Tre <thierry@dtsystems.be>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
  17. 01 Feb, 2018 1 commit
    • openvswitch: Remove padding from packet before L3+ conntrack processing · 9382fe71
      Ed Swierk committed
      IPv4 and IPv6 packets may arrive with lower-layer padding that is not
      included in the L3 length. For example, a short IPv4 packet may have
      up to 6 bytes of padding following the IP payload when received on an
      Ethernet device with a minimum packet length of 64 bytes.
      
      Higher-layer processing functions in netfilter (e.g. nf_ip_checksum(),
      and help() in nf_conntrack_ftp) assume skb->len reflects the length of
      the L3 header and payload, rather than referring back to
      ip_hdr->tot_len or ipv6_hdr->payload_len, and get confused by
      lower-layer padding.
      
      In the normal IPv4 receive path, ip_rcv() trims the packet to
      ip_hdr->tot_len before invoking netfilter hooks. In the IPv6 receive
      path, ip6_rcv() does the same using ipv6_hdr->payload_len. Similarly
      in the br_netfilter receive path, br_validate_ipv4() and
      br_validate_ipv6() trim the packet to the L3 length before invoking
      netfilter hooks.
      
      Currently in the OVS conntrack receive path, ovs_ct_execute() pulls
      the skb to the L3 header but does not trim it to the L3 length before
      calling nf_conntrack_in(NF_INET_PRE_ROUTING). When
      nf_conntrack_proto_tcp encounters a packet with lower-layer padding,
      nf_ip_checksum() fails causing a "nf_ct_tcp: bad TCP checksum" log
      message. While extra zero bytes don't affect the checksum, the length
      in the IP pseudoheader does. That length is based on skb->len, and
      without trimming, it doesn't match the length the sender used when
      computing the checksum.
      
      In ovs_ct_execute(), trim the skb to the L3 length before higher-layer
      processing.
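
The trimming step can be illustrated with a minimal Python sketch for the IPv4 case. This is not the kernel code: `trim_to_l3_length` is a hypothetical helper that works on a byte string starting at the IPv4 header, and the sample packet is fabricated purely for illustration.

```python
import struct


def trim_to_l3_length(packet: bytes) -> bytes:
    """Drop lower-layer padding that follows an IPv4 packet.

    `packet` starts at the IPv4 header; bytes 2-3 hold the total length.
    """
    if len(packet) < 20:
        raise ValueError("truncated IPv4 header")
    (tot_len,) = struct.unpack_from("!H", packet, 2)
    if tot_len < 20 or tot_len > len(packet):
        raise ValueError("bad IPv4 total length")
    return packet[:tot_len]


# A 40-byte packet (20-byte IP header + 20-byte TCP header) padded to the
# 46-byte minimum Ethernet payload, i.e. 6 bytes of padding.
header = struct.pack("!BBH", 0x45, 0, 40) + bytes(16)
padded = header + bytes(20) + bytes(6)
```

`trim_to_l3_length(padded)` returns the first 40 bytes, so any length-dependent computation (such as the pseudoheader checksum) sees the length the sender used.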
      Signed-off-by: Ed Swierk <eswierk@skyportsystems.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  18. 03 Jan, 2018 1 commit
  19. 22 Oct, 2017 1 commit
  20. 11 Oct, 2017 1 commit
  21. 25 Aug, 2017 1 commit
  22. 12 Aug, 2017 1 commit
  23. 25 Jul, 2017 1 commit
  24. 16 Jul, 2017 1 commit
    • openvswitch: Fix for force/commit action failures · 8b97ac5b
      Greg Rose committed
      When there is an established connection in direction A->B, it is
      possible to receive a packet on port B which then executes
      ct(commit,force) without first performing ct(), i.e. a lookup.
      In this case, we would expect that this packet can delete the existing
      entry so that we can commit a connection with direction B->A. However,
      currently we only perform a check in skb_nfct_cached() for whether
      OVS_CS_F_TRACKED is set and OVS_CS_F_INVALID is not set, i.e. that a
      lookup previously occurred. In the above scenario, a lookup has not
      occurred but we should still be able to statelessly look up the
      existing entry and potentially delete the entry if it is in the
      opposite direction.
      
      This patch extends the check to also hint that if the action has the
      force flag set, then we will look up the existing entry so that the
      force check at the end of skb_nfct_cached() has the ability to delete
      the connection.
      
      Fixes: dd41d33f0b03 ("openvswitch: Add force commit.")
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: dev@openvswitch.org
      Signed-off-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  25. 15 May, 2017 1 commit
  26. 25 Apr, 2017 3 commits
    • openvswitch: Delete conntrack entry clashing with an expectation. · cf5d7091
      Jarno Rajahalme committed
      Conntrack helpers do not check for a potentially clashing conntrack
      entry when creating a new expectation.  Also, nf_conntrack_in() will
      check expectations (via init_conntrack()) only if a conntrack entry
      can not be found.  The expectation for a packet which also matches an
      existing conntrack entry will not be removed by conntrack, and is
      currently handled inconsistently by OVS, as OVS expects the
      expectation to be removed when the connection tracking entry matching
      that expectation is confirmed.
      
      It should be noted that normally an IP stack would not allow reuse of
      a 5-tuple of an old (possibly lingering) connection for a new data
      connection, so this is a somewhat unlikely corner case.  However, it is
      possible that a misbehaving source could cause conntrack entries to be
      created that could then interfere with new related connections.
      
      Fix this in the OVS module by deleting the clashing conntrack entry
      after an expectation has been matched.  This causes the following
      nf_conntrack_in() call to also find the expectation and remove it when
      creating the new conntrack entry, and causes the forthcoming reply
      direction packets to match the new related connection instead of the
      old clashing conntrack entry.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Reported-by: Yang Song <yangsong@vmware.com>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
    • openvswitch: Add eventmask support to CT action. · 12064551
      Jarno Rajahalme committed
      Add a new optional conntrack action attribute OVS_CT_ATTR_EVENTMASK,
      which can be used in conjunction with the commit flag
      (OVS_CT_ATTR_COMMIT) to set the mask of bits specifying which
      conntrack events (IPCT_*) should be delivered via the Netfilter
      netlink multicast groups.  Default behavior depends on the system
      configuration, but typically a lot of events are delivered.  This can be
      very chatty for the NFNLGRP_CONNTRACK_UPDATE group, even if only some
      types of events are of interest.
      
      Netfilter core init_conntrack() adds the event cache extension, so we
      only need to set the ctmask value.  However, if the system is
      configured without support for events, the setting will be skipped
      because the extension is not found.
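
How such a mask filters events can be shown with a short Python sketch. The bit positions are assumptions following the order of `enum ip_conntrack_events` in the kernel headers, and the helper names are hypothetical.

```python
# Assumed bit positions, following the order of enum ip_conntrack_events.
IPCT_NEW, IPCT_RELATED, IPCT_DESTROY, IPCT_REPLY, IPCT_ASSURED = range(5)


def make_eventmask(*events):
    """Build a mask with one bit set per event of interest."""
    mask = 0
    for event in events:
        mask |= 1 << event
    return mask


def is_delivered(event, ctmask):
    """An event is reported only if its bit is set in the mask."""
    return bool(ctmask & (1 << event))
```

A mask built from `IPCT_NEW` and `IPCT_DESTROY` reports connection creation and teardown but suppresses the chattier update events such as `IPCT_REPLY`.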
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Reviewed-by: Greg Rose <gvrose8192@gmail.com>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Typo fix. · abd0a4f2
      Jarno Rajahalme committed
      Fix typo in a comment.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Greg Rose <gvrose8192@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 15 Apr, 2017 1 commit
  28. 29 Mar, 2017 1 commit
    • openvswitch: Fix refcount leak on force commit. · b768b16d
      Jarno Rajahalme committed
      The reference count held for skb needs to be released when the skb's
      nfct pointer is cleared, regardless of whether nf_ct_delete() is
      called or not.
      
      Failing to release the skb's reference count led to deferred conntrack
      cleanup spinning forever within nf_conntrack_cleanup_net_list() when
      cleaning up a network namespace:
      
         kworker/u16:0-19025 [004] 45981067.173642: sched_switch: kworker/u16:0:19025 [120] R ==> rcu_preempt:7 [120]
         kworker/u16:0-19025 [004] 45981067.173651: kernel_stack: <stack trace>
      => ___preempt_schedule (ffffffffa001ed36)
      => _raw_spin_unlock_bh (ffffffffa0713290)
      => nf_ct_iterate_cleanup (ffffffffc00a4454)
      => nf_conntrack_cleanup_net_list (ffffffffc00a5e1e)
      => nf_conntrack_pernet_exit (ffffffffc00a63dd)
      => ops_exit_list.isra.1 (ffffffffa06075f3)
      => cleanup_net (ffffffffa0607df0)
      => process_one_work (ffffffffa0084c31)
      => worker_thread (ffffffffa008592b)
      => kthread (ffffffffa008bee2)
      => ret_from_fork (ffffffffa071b67c)
      
      Fixes: dd41d33f ("openvswitch: Add force commit.")
      Reported-by: Yang Song <yangsong@vmware.com>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  29. 02 Mar, 2017 1 commit
    • ipv6: orphan skbs in reassembly unit · 48cac18e
      Eric Dumazet committed
      Andrey reported a use-after-free in the IPv6 stack.
      
      The issue here is that we free the socket while it still has skbs
      in the TX path and in some queues.
      
      It happens here because the IPv6 reassembly unit messes up
      skb->truesize, breaking skb_set_owner_w() badly.
      
      We fixed a similar issue for IPV4 in commit 8282f274 ("inet: frag:
      Always orphan skbs inside ip_defrag()")
      Acked-by: Joe Stringer <joe@ovn.org>
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
      Read of size 8 at addr ffff880062da0060 by task a.out/4140
      
      page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
      index:0x0 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
      raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
      page dumped because: kasan: bad access detected
      
      CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       describe_address mm/kasan/report.c:262
       kasan_report_error+0x121/0x560 mm/kasan/report.c:370
       kasan_report mm/kasan/report.c:392
       __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
       sock_flag ./arch/x86/include/asm/bitops.h:324
       sock_wfree+0x118/0x120 net/core/sock.c:1631
       skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put ./include/net/inet_frag.h:133
       nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613
       rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x620 net/socket.c:848
       new_sync_write fs/read_write.c:499
       __vfs_write+0x483/0x760 fs/read_write.c:512
       vfs_write+0x187/0x530 fs/read_write.c:560
       SYSC_write fs/read_write.c:607
       SyS_write+0xfb/0x230 fs/read_write.c:599
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      RIP: 0033:0x7ff26e6f5b79
      RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
      RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
      RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
      R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
      
      The buggy address belongs to the object at ffff880062da0000
       which belongs to the cache RAWv6 of size 1504
      The buggy address ffff880062da0060 is located 96 bytes inside
       of 1504-byte region [ffff880062da0000, ffff880062da05e0)
      
      Freed by task 4113:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1352
       slab_free_freelist_hook mm/slub.c:1374
       slab_free mm/slub.c:2951
       kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
       sk_prot_free net/core/sock.c:1377
       __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sk_free+0x23/0x30 net/core/sock.c:1479
       sock_put ./include/net/sock.h:1638
       sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
       rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
       sock_release+0x8d/0x1e0 net/socket.c:599
       sock_close+0x16/0x20 net/socket.c:1063
       __fput+0x332/0x7f0 fs/file_table.c:208
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x19b/0x270 kernel/task_work.c:116
       exit_task_work ./include/linux/task_work.h:21
       do_exit+0x186b/0x2800 kernel/exit.c:839
       do_group_exit+0x149/0x420 kernel/exit.c:943
       SYSC_exit_group kernel/exit.c:954
       SyS_exit_group+0x1d/0x20 kernel/exit.c:952
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
       slab_post_alloc_hook mm/slab.h:432
       slab_alloc_node mm/slub.c:2708
       slab_alloc mm/slub.c:2716
       kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
       sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
       sk_alloc+0x105/0x1010 net/core/sock.c:1396
       inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
       __sock_create+0x4f6/0x880 net/socket.c:1199
       sock_create net/socket.c:1239
       SYSC_socket net/socket.c:1269
       SyS_socket+0xf9/0x230 net/socket.c:1249
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Memory state around the buggy address:
       ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  30. 20 Feb, 2017 1 commit
  31. 10 Feb, 2017 6 commits
    • openvswitch: Pack struct sw_flow_key. · 316d4d78
      Jarno Rajahalme committed
      struct sw_flow_key has two 16-bit holes. Move the most matched
      conntrack match fields there.  In some typical cases this halves the
      size of the key that needs to be hashed, so that it fits into one
      cache line.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add force commit. · dd41d33f
      Jarno Rajahalme committed
      Stateful network admission policy may allow connections to one
      direction and reject connections initiated in the other direction.
      After policy change it is possible that for a new connection an
      overlapping conntrack entry already exists, where the original
      direction of the existing connection is opposed to the new
      connection's initial packet.
      
      Most importantly, conntrack state relating to the current packet gets
      the "reply" designation based on whether the original direction tuple
      or the reply direction tuple matched.  If this "directionality" is
      wrong w.r.t. the stateful network admission policy, it may happen
      that packets in neither direction are correctly admitted.
      
      This patch adds a new "force commit" option to the OVS conntrack
      action that checks the original direction of an existing conntrack
      entry.  If that direction is opposed to the current packet, the
      existing conntrack entry is deleted and a new one is subsequently
      created in the correct direction.
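
The behaviour described above can be modelled with a toy conntrack table in Python. This is a sketch of the semantics only, not the OVS code: `Tuple5`, `ConntrackTable`, and their methods are hypothetical names, and entries are identified solely by their original-direction tuple.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tuple5:
    src: str
    dst: str
    sport: int
    dport: int
    proto: str

    def reversed(self):
        """Tuple as seen in the opposite direction."""
        return Tuple5(self.dst, self.src, self.dport, self.sport, self.proto)


class ConntrackTable:
    def __init__(self):
        self.originals = set()  # original-direction tuples of known entries

    def lookup(self, pkt):
        """Return (original tuple of the matching entry, is_reply)."""
        if pkt in self.originals:
            return pkt, False
        if pkt.reversed() in self.originals:
            return pkt.reversed(), True
        return None, False

    def commit(self, pkt, force=False):
        orig, is_reply = self.lookup(pkt)
        if orig is not None and is_reply and force:
            # Existing entry has the opposite original direction: delete it
            # so a new entry is created in the packet's direction.
            self.originals.discard(orig)
            orig = None
        if orig is None:
            self.originals.add(pkt)
            return pkt, False
        return orig, is_reply
```

Without `force`, a B->A packet that matches an existing A->B entry is classified as a reply; with `force=True` the old entry is replaced and the same packet becomes the original direction of a new entry.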
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Add original direction conntrack tuple to sw_flow_key. · 9dd7f890
      Jarno Rajahalme committed
      Add the fields of the conntrack original direction 5-tuple to struct
      sw_flow_key.  The new fields are initially marked as non-existent, and
      are populated whenever a conntrack action is executed and either finds
      or generates a conntrack entry.  This means that these fields exist
      for all packets that were not rejected by conntrack as untrackable.
      
      The original tuple fields in the sw_flow_key are filled from the
      original direction tuple of the conntrack entry relating to the
      current packet, or from the original direction tuple of the master
      conntrack entry, if the current conntrack entry has a master.
      Generally, expected connections of connections having an assigned
      helper (e.g., FTP), have a master conntrack entry.
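
The selection rule in the paragraph above can be sketched in a few lines of Python. `Conn` and `key_orig_tuple` are hypothetical names used only for illustration; `None` stands in for the "non-existent" marking of the key fields.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Conn:
    orig_tuple: Tuple[str, str, int, int, str]  # src, dst, sport, dport, proto
    master: Optional["Conn"] = None             # set for expected connections


def key_orig_tuple(ct: Optional[Conn]):
    """Pick the original-direction tuple to copy into the flow key."""
    if ct is None:
        return None  # untrackable packet: fields stay non-existent
    source = ct.master if ct.master is not None else ct
    return source.orig_tuple
```

For an FTP data connection whose `master` is the control connection, the key would carry the control connection's original tuple, so one ACL rule on the control connection can cover both.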
      
      The main purpose of the new conntrack original tuple fields is to
      allow matching on them for policy decision purposes, with the premise
      that the admissibility of tracked connections' reply packets (as well
      as original direction packets), and of packets in both directions of
      any related connections, may be based on ACL rules applying to the
      master connection's original direction 5-tuple.  This also makes it
      easier to make policy decisions when the actual packet headers might
      have been transformed by NAT, as the original direction 5-tuple
      represents the packet headers before any such transformation.
      
      When using the original direction 5-tuple the admissibility of return
      and/or related packets need not be based on the mere existence of a
      conntrack entry, allowing separation of admission policy from the
      established conntrack state.  While existence of a conntrack entry is
      required for admission of the return or related packets, policy
      changes can render connections that were initially admitted to be
      rejected or dropped afterwards.  If the admission of the return and
      related packets was based on mere conntrack state (e.g., connection
      being in an established state), a policy change that would make the
      connection rejected or dropped would need to find and delete all
      conntrack entries affected by such a change.  When using the original
      direction 5-tuple matching the affected conntrack entries can be
      allowed to time out instead, as the established state of the
      connection would not need to be the basis for packet admission any
      more.
      
      It should be noted that the directionality of related connections may
      be the same or different than that of the master connection, and
      neither the original direction 5-tuple nor the conntrack state bits
      carry this information.  If needed, the directionality of the master
      connection can be stored in master's conntrack mark or labels, which
      are automatically inherited by the expected related connections.
      
      The fact that neither ARP nor ND packets are trackable by conntrack
      allows mutual exclusion between ARP/ND and the new conntrack original
      tuple fields.  Hence, the IP addresses are overlaid in union with ARP
      and ND fields.  This allows the sw_flow_key to not grow much due to
      this patch, but it also means that we must be careful to never use the
      new key fields with ARP or ND packets.  ARP is easy to distinguish and
      keep mutually exclusive based on the ethernet type, but ND being an
      ICMPv6 protocol requires a bit more attention.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Inherit master's labels. · 09aa98ad
      Jarno Rajahalme committed
      We avoid calling into nf_conntrack_in() for expected connections, as
      that would remove the expectation that we want to stick around until
      we are ready to commit the connection.  Instead, we do a lookup in the
      expectation table directly.  However, after a successful expectation
      lookup we have set the flow key label field from the master
      connection, whereas nf_conntrack_in() does not do this.  This leads to
      master's labels being inherited after an expectation lookup, but those
      labels not being inherited after the corresponding conntrack action
      with a commit flag.
      
      This patch resolves the problem by changing the commit code path to
      also inherit the master's labels to the expected connection.
      Resolving this conflict in favor of inheriting the labels allows more
      information to be passed from the master connection to related
      connections, which would otherwise be much harder if the 32 bits in
      the connmark are not enough.  Labels can still be set explicitly, so
      this change only affects the default values of the labels in the
      presence of a master connection.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Refactor labels initialization. · 6ffcea79
      Jarno Rajahalme committed
      Refactoring conntrack labels initialization makes changes in later
      patches easier to review.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Simplify labels length logic. · b87cec38
      Jarno Rajahalme committed
      Since 23014011 ("netfilter: conntrack: support a fixed size of 128
      distinct labels"), the size of the conntrack labels extension has been
      fixed at 128 bits, so we do not need to check for label sizes shorter
      than 128 at run-time.  This patch simplifies the labels length logic
      accordingly, but allows the conntrack labels size to be increased in
      the future without breaking the build.  In the event of conntrack
      labels increasing in size, OVS would still be able to deal with the
      first 128 label bits.
      Suggested-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>