1. 21 8月, 2023 1 次提交
    • F
      net: openvswitch: fix flow memory leak in ovs_flow_cmd_new · 0023f0c4
      Fedor Pchelkin 提交于
      stable inclusion
      from stable-v5.10.168
      commit 70154489f531587996f3e9d7cceeee65cff0001d
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7URR4
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=70154489f531587996f3e9d7cceeee65cff0001d
      
      ----------------------------------------------------
      
      [ Upstream commit 0c598aed ]
      
      Syzkaller reports a memory leak of new_flow in ovs_flow_cmd_new() as it is
      not freed when an allocation of a key fails.
      
      BUG: memory leak
      unreferenced object 0xffff888116668000 (size 632):
        comm "syz-executor231", pid 1090, jiffies 4294844701 (age 18.871s)
        hex dump (first 32 bytes):
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<00000000defa3494>] kmem_cache_zalloc include/linux/slab.h:654 [inline]
          [<00000000defa3494>] ovs_flow_alloc+0x19/0x180 net/openvswitch/flow_table.c:77
          [<00000000c67d8873>] ovs_flow_cmd_new+0x1de/0xd40 net/openvswitch/datapath.c:957
          [<0000000010a539a8>] genl_family_rcv_msg_doit+0x22d/0x330 net/netlink/genetlink.c:739
          [<00000000dff3302d>] genl_family_rcv_msg net/netlink/genetlink.c:783 [inline]
          [<00000000dff3302d>] genl_rcv_msg+0x328/0x590 net/netlink/genetlink.c:800
          [<000000000286dd87>] netlink_rcv_skb+0x153/0x430 net/netlink/af_netlink.c:2515
          [<0000000061fed410>] genl_rcv+0x24/0x40 net/netlink/genetlink.c:811
          [<000000009dc0f111>] netlink_unicast_kernel net/netlink/af_netlink.c:1313 [inline]
          [<000000009dc0f111>] netlink_unicast+0x545/0x7f0 net/netlink/af_netlink.c:1339
          [<000000004a5ee816>] netlink_sendmsg+0x8e7/0xde0 net/netlink/af_netlink.c:1934
          [<00000000482b476f>] sock_sendmsg_nosec net/socket.c:651 [inline]
          [<00000000482b476f>] sock_sendmsg+0x152/0x190 net/socket.c:671
          [<00000000698574ba>] ____sys_sendmsg+0x70a/0x870 net/socket.c:2356
          [<00000000d28d9e11>] ___sys_sendmsg+0xf3/0x170 net/socket.c:2410
          [<0000000083ba9120>] __sys_sendmsg+0xe5/0x1b0 net/socket.c:2439
          [<00000000c00628f8>] do_syscall_64+0x30/0x40 arch/x86/entry/common.c:46
          [<000000004abfdcf4>] entry_SYSCALL_64_after_hwframe+0x61/0xc6
      
      To fix this the patch rearranges the goto labels to reflect the order of
      object allocations and adds appropriate goto statements on the error
      paths.
      
      Found by Linux Verification Center (linuxtesting.org) with Syzkaller.
      
      Fixes: 68bb1010 ("openvswitch: Fix flow lookup to use unmasked key")
      Signed-off-by: NFedor Pchelkin <pchelkin@ispras.ru>
      Signed-off-by: NAlexey Khoroshilov <khoroshilov@ispras.ru>
      Acked-by: NEelco Chaudron <echaudro@redhat.com>
      Reviewed-by: NSimon Horman <simon.horman@corigine.com>
      Link: https://lore.kernel.org/r/20230201210218.361970-1-pchelkin@ispras.ruSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: Nzhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com>
      0023f0c4
  2. 14 8月, 2023 1 次提交
    • E
      openvswitch: Fix flow lookup to use unmasked key · 1d732b60
      Eelco Chaudron 提交于
      stable inclusion
      from stable-v5.10.163
      commit 6736b61ecf230dd656464de0f514bdeadb384f20
      category: bugfix
      bugzilla: https://gitee.com/openeuler/kernel/issues/I7PJ9N
      
      Reference: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/?id=6736b61ecf230dd656464de0f514bdeadb384f20
      
      ----------------------------------------------------
      
      [ Upstream commit 68bb1010 ]
      
      The commit mentioned below causes the ovs_flow_tbl_lookup() function
      to be called with the masked key. However, it's supposed to be called
      with the unmasked key. This due to the fact that the datapath supports
      installing wider flows, and OVS relies on this behavior. For example
      if ipv4(src=1.1.1.1/192.0.0.0, dst=1.1.1.2/192.0.0.0) exists, a wider
      flow (smaller mask) of ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/
      128.0.0.0) is allowed to be added.
      
      However, if we try to add a wildcard rule, the installation fails:
      
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
      $ ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
        ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
      ovs-vswitchd: updating flow table (File exists)
      
      The reason is that the key used to determine if the flow is already
      present in the system uses the original key ANDed with the mask.
      This results in the IP address not being part of the (miniflow) key,
      i.e., being substituted with an all-zero value. When doing the actual
      lookup, this results in the key wrongfully matching the first flow,
      and therefore the flow does not get installed.
      
      This change reverses the commit below, but rather than having the key
      on the stack, it's allocated.
      
      Fixes: 190aa3e7 ("openvswitch: Fix Frame-size larger than 1024 bytes warning.")
      Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      Signed-off-by: NSasha Levin <sashal@kernel.org>
      Signed-off-by: Nzhaoxiaoqiang11 <zhaoxiaoqiang11@jd.com>
      1d732b60
  3. 23 5月, 2023 1 次提交
  4. 19 4月, 2023 2 次提交
  5. 04 11月, 2020 1 次提交
  6. 03 10月, 2020 1 次提交
  7. 02 9月, 2020 2 次提交
  8. 14 8月, 2020 1 次提交
  9. 04 8月, 2020 2 次提交
  10. 25 7月, 2020 1 次提交
  11. 18 7月, 2020 1 次提交
    • E
      net: openvswitch: reorder masks array based on usage · eac87c41
      Eelco Chaudron 提交于
      This patch reorders the masks array every 4 seconds based on their
      usage count. This greatly reduces the masks per packet hit, and
      hence the overall performance. Especially in the OVS/OVN case for
      OpenShift.
      
      Here are some results from the OVS/OVN OpenShift test, which use
      8 pods, each pod having 512 uperf connections, each connection
      sends a 64-byte request and gets a 1024-byte response (TCP).
      All uperf clients are on 1 worker node while all uperf servers are
      on the other worker node.
      
      Kernel without this patch     :  7.71 Gbps
      Kernel with this patch applied: 14.52 Gbps
      
      We also run some tests to verify the rebalance activity does not
      lower the flow insertion rate, which does not.
      Signed-off-by: NEelco Chaudron <echaudro@redhat.com>
      Tested-by: NAndrew Theurer <atheurer@redhat.com>
      Reviewed-by: NPaolo Abeni <pabeni@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      eac87c41
  12. 21 4月, 2020 1 次提交
    • T
      net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce
      Tonghao Zhang 提交于
      syzbot wrote:
      | =============================
      | WARNING: suspicious RCU usage
      | 5.7.0-rc1+ #45 Not tainted
      | -----------------------------
      | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
      |
      | other info that might help us debug this:
      | rcu_scheduler_active = 2, debug_locks = 1
      | ...
      |
      | stack backtrace:
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
      | Workqueue: netns cleanup_net
      | Call Trace:
      | ...
      | ovs_ct_exit
      | ovs_exit_net
      | ops_exit_list.isra.7
      | cleanup_net
      | process_one_work
      | worker_thread
      
      To avoid that warning, invoke the ovs_ct_exit under ovs_lock and add
      lockdep_ovsl_is_held as optional lockdep expression.
      
      Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
      Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Yi-Hung Wei <yihung.wei@gmail.com>
      Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
      Signed-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      27de77ce
  13. 30 3月, 2020 1 次提交
  14. 04 3月, 2020 1 次提交
  15. 19 2月, 2020 1 次提交
  16. 15 1月, 2020 1 次提交
  17. 10 12月, 2019 1 次提交
  18. 02 12月, 2019 2 次提交
  19. 27 11月, 2019 1 次提交
  20. 16 11月, 2019 1 次提交
  21. 15 11月, 2019 1 次提交
    • T
      net: openvswitch: add hash info to upcall · bd1903b7
      Tonghao Zhang 提交于
      When using the kernel datapath, the upcall don't
      include skb hash info relatived. That will introduce
      some problem, because the hash of skb is important
      in kernel stack. For example, VXLAN module uses
      it to select UDP src port. The tx queue selection
      may also use the hash in stack.
      
      Hash is computed in different ways. Hash is random
      for a TCP socket, and hash may be computed in hardware,
      or software stack. Recalculation hash is not easy.
      
      Hash of TCP socket is computed:
      tcp_v4_connect
          -> sk_set_txhash (is random)
      
      __tcp_transmit_skb
          -> skb_set_hash_from_sk
      
      There will be one upcall, without information of skb
      hash, to ovs-vswitchd, for the first packet of a TCP
      session. The rest packets will be processed in Open vSwitch
      modules, hash kept. If this tcp session is forward to
      VXLAN module, then the UDP src port of first tcp packet
      is different from rest packets.
      
      TCP packets may come from the host or dockers, to Open vSwitch.
      To fix it, we store the hash info to upcall, and restore hash
      when packets sent back.
      
      +---------------+          +-------------------------+
      |   Docker/VMs  |          |     ovs-vswitchd        |
      +----+----------+          +-+--------------------+--+
           |                       ^                    |
           |                       |                    |
           |                       |  upcall            v restore packet hash (not recalculate)
           |                     +-+--------------------+--+
           |  tap netdev         |                         |   vxlan module
           +--------------->     +-->  Open vSwitch ko     +-->
             or internal type    |                         |
                                 +-------------------------+
      
      Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.htmlSigned-off-by: NTonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bd1903b7
  22. 04 11月, 2019 3 次提交
  23. 26 10月, 2019 1 次提交
    • G
      netns: fix GFP flags in rtnl_net_notifyid() · d4e4fdf9
      Guillaume Nault 提交于
      In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
      rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
      but there are a few paths calling rtnl_net_notifyid() from atomic
      context or from RCU critical sections. The later also precludes the use
      of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
      call is wrong too, as it uses GFP_KERNEL unconditionally.
      
      Therefore, we need to pass the GFP flags as parameter and propagate it
      through function calls until the proper flags can be determined.
      
      In most cases, GFP_KERNEL is fine. The exceptions are:
        * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
          indirectly call rtnl_net_notifyid() from RCU critical section,
      
        * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
          parameter.
      
      Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
      by nlmsg_new(). The function is allowed to sleep, so better make the
      flags consistent with the ones used in the following
      ovs_vport_cmd_fill_info() call.
      
      Found by code inspection.
      
      Fixes: 9a963454 ("netns: notify netns id events")
      Signed-off-by: NGuillaume Nault <gnault@redhat.com>
      Acked-by: NNicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d4e4fdf9
  24. 26 9月, 2019 1 次提交
  25. 06 9月, 2019 1 次提交
    • P
      net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Paul Blakey 提交于
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel though tc, and if they aren't stolen
      by it, like in the above rule, they will continue to OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      the proccessing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used for translating tc chain to ovs recirc_id to
      handle these miss cases. Last tc chain index will be set
      by tc goto chain action and read by OvS datapath.
      Signed-off-by: NPaul Blakey <paulb@mellanox.com>
      Signed-off-by: NVlad Buslov <vladbu@mellanox.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95a7233c
  26. 07 8月, 2019 1 次提交
  27. 25 7月, 2019 1 次提交
    • A
      ovs: datapath: hide clang frame-overflow warnings · 26063790
      Arnd Bergmann 提交于
      Some functions in the datapath code are factored out so that each
      one has a stack frame smaller than 1024 bytes with gcc. However,
      when compiling with clang, the functions are inlined more aggressively
      and combined again so we get
      
      net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
      
      Marking both get_flow_actions() and ovs_nla_init_match_and_action()
      as 'noinline_for_stack' gives us the same behavior that we see with
      gcc, and no warning. Note that this does not mean we actually use
      less stack, as the functions call each other, and we still get
      three copies of the large 'struct sw_flow_key' type on the stack.
      
      The comment tells us that this was previously considered safe,
      presumably since the netlink parsing functions are called with
      a known backchain that does not also use a lot of stack space.
      
      Fixes: 9cc9a5cb ("datapath: Avoid using stack larger than 1024.")
      Signed-off-by: NArnd Bergmann <arnd@arndb.de>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      26063790
  28. 13 7月, 2019 1 次提交
  29. 06 6月, 2019 1 次提交
  30. 05 6月, 2019 1 次提交
  31. 04 5月, 2019 1 次提交
  32. 28 4月, 2019 3 次提交
    • J
      genetlink: optionally validate strictly/dumps · ef6243ac
      Johannes Berg 提交于
      Add options to strictly validate messages and dump messages,
      sometimes perhaps validating dump messages non-strictly may
      be required, so add an option for that as well.
      
      Since none of this can really be applied to existing commands,
      set the options everwhere using the following spatch:
      
          @@
          identifier ops;
          expression X;
          @@
          struct genl_ops ops[] = {
          ...,
           {
                  .cmd = X,
          +       .validate = GENL_DONT_VALIDATE_STRICT | GENL_DONT_VALIDATE_DUMP,
                  ...
           },
          ...
          };
      
      For new commands one should just not copy the .validate 'opt-out'
      flags and thus get strict validation.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ef6243ac
    • J
      netlink: make validation more configurable for future strictness · 8cb08174
      Johannes Berg 提交于
      We currently have two levels of strict validation:
      
       1) liberal (default)
           - undefined (type >= max) & NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
           - garbage at end of message accepted
       2) strict (opt-in)
           - NLA_UNSPEC attributes accepted
           - attribute length >= expected accepted
      
      Split out parsing strictness into four different options:
       * TRAILING     - check that there's no trailing data after parsing
                        attributes (in message or nested)
       * MAXTYPE      - reject attrs > max known type
       * UNSPEC       - reject attributes with NLA_UNSPEC policy entries
       * STRICT_ATTRS - strictly validate attribute size
      
      The default for future things should be *everything*.
      The current *_strict() is a combination of TRAILING and MAXTYPE,
      and is renamed to _deprecated_strict().
      The current regular parsing has none of this, and is renamed to
      *_parse_deprecated().
      
      Additionally it allows us to selectively set one of the new flags
      even on old policies. Notably, the UNSPEC flag could be useful in
      this case, since it can be arranged (by filling in the policy) to
      not be an incompatible userspace ABI change, but would then going
      forward prevent forgetting attribute entries. Similar can apply
      to the POLICY flag.
      
      We end up with the following renames:
       * nla_parse           -> nla_parse_deprecated
       * nla_parse_strict    -> nla_parse_deprecated_strict
       * nlmsg_parse         -> nlmsg_parse_deprecated
       * nlmsg_parse_strict  -> nlmsg_parse_deprecated_strict
       * nla_parse_nested    -> nla_parse_nested_deprecated
       * nla_validate_nested -> nla_validate_nested_deprecated
      
      Using spatch, of course:
          @@
          expression TB, MAX, HEAD, LEN, POL, EXT;
          @@
          -nla_parse(TB, MAX, HEAD, LEN, POL, EXT)
          +nla_parse_deprecated(TB, MAX, HEAD, LEN, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, TB, MAX, POL, EXT;
          @@
          -nlmsg_parse_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
          +nlmsg_parse_deprecated_strict(NLH, HDRLEN, TB, MAX, POL, EXT)
      
          @@
          expression TB, MAX, NLA, POL, EXT;
          @@
          -nla_parse_nested(TB, MAX, NLA, POL, EXT)
          +nla_parse_nested_deprecated(TB, MAX, NLA, POL, EXT)
      
          @@
          expression START, MAX, POL, EXT;
          @@
          -nla_validate_nested(START, MAX, POL, EXT)
          +nla_validate_nested_deprecated(START, MAX, POL, EXT)
      
          @@
          expression NLH, HDRLEN, MAX, POL, EXT;
          @@
          -nlmsg_validate(NLH, HDRLEN, MAX, POL, EXT)
          +nlmsg_validate_deprecated(NLH, HDRLEN, MAX, POL, EXT)
      
      For this patch, don't actually add the strict, non-renamed versions
      yet so that it breaks compile if I get it wrong.
      
      Also, while at it, make nla_validate and nla_parse go down to a
      common __nla_validate_parse() function to avoid code duplication.
      
      Ultimately, this allows us to have very strict validation for every
      new caller of nla_parse()/nlmsg_parse() etc as re-introduced in the
      next patch, while existing things will continue to work as is.
      
      In effect then, this adds fully strict validation for any new command.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8cb08174
    • M
      netlink: make nla_nest_start() add NLA_F_NESTED flag · ae0be8de
      Michal Kubecek 提交于
      Even if the NLA_F_NESTED flag was introduced more than 11 years ago, most
      netlink based interfaces (including recently added ones) are still not
      setting it in kernel generated messages. Without the flag, message parsers
      not aware of attribute semantics (e.g. wireshark dissector or libmnl's
      mnl_nlmsg_fprintf()) cannot recognize nested attributes and won't display
      the structure of their contents.
      
      Unfortunately we cannot just add the flag everywhere as there may be
      userspace applications which check nlattr::nla_type directly rather than
      through a helper masking out the flags. Therefore the patch renames
      nla_nest_start() to nla_nest_start_noflag() and introduces nla_nest_start()
      as a wrapper adding NLA_F_NESTED. The calls which add NLA_F_NESTED manually
      are rewritten to use nla_nest_start().
      
      Except for changes in include/net/netlink.h, the patch was generated using
      this semantic patch:
      
      @@ expression E1, E2; @@
      -nla_nest_start(E1, E2)
      +nla_nest_start_noflag(E1, E2)
      
      @@ expression E1, E2; @@
      -nla_nest_start_noflag(E1, E2 | NLA_F_NESTED)
      +nla_nest_start(E1, E2)
      Signed-off-by: NMichal Kubecek <mkubecek@suse.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Acked-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ae0be8de