1. 29 Aug, 2022 1 commit
    • genetlink: start to validate reserved header bytes · 9c5d03d3
      Jakub Kicinski committed
      We had historically not checked that genlmsghdr.reserved
      is 0 on input which prevents us from using those precious
      bytes in the future.
      
      One use case would be to extend the cmd field, which is
      currently just 8 bits wide and 256 is not a lot of commands
      for some core families.
      
      To make sure that new families do the right thing by default
      put the onus of opting out of validation on existing families.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Paul Moore <paul@paul-moore.com> (NetLabel)
      Signed-off-by: David S. Miller <davem@davemloft.net>
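A minimal userspace sketch of the idea described in the commit above: reject requests whose reserved header bytes are non-zero, unless the family has opted out. The struct mirrors the field layout of `struct genlmsghdr`, but the names `genl_hdr_sketch`, `genl_hdr_reserved_ok` and the `family_opted_out` flag are hypothetical and are not taken from the kernel patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same field layout as struct genlmsghdr: 8-bit cmd, 8-bit version,
 * 16 reserved bits. The type name is hypothetical. */
struct genl_hdr_sketch {
	uint8_t  cmd;
	uint8_t  version;
	uint16_t reserved;
};

/*
 * Hypothetical check: a header is acceptable if its reserved bytes are
 * zero, or if the family explicitly opted out of validation (the
 * "existing family" case from the commit message).
 */
static bool genl_hdr_reserved_ok(const struct genl_hdr_sketch *hdr,
				 bool family_opted_out)
{
	assert(hdr != NULL);

	if (family_opted_out)
		return true;
	return hdr->reserved == 0;
}
```

With validation on by default, a new family gets the strict behaviour without doing anything, which is what leaves the reserved bytes free for later use.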
  2. 27 Aug, 2022 2 commits
  3. 22 Aug, 2022 2 commits
  4. 05 Feb, 2022 1 commit
  5. 27 Jul, 2021 2 commits
  6. 17 Jul, 2021 1 commit
  7. 23 Jun, 2021 1 commit
    • openvswitch: add trace points · c4ab7b56
      Aaron Conole committed
      This makes openvswitch module use the event tracing framework
      to log the upcall interface and action execution pipeline.  When
      using openvswitch as the packet forwarding engine, some types of
      debugging are made possible simply by using the ovs-vswitchd's
      ofproto/trace command.  However, such a command has some
      limitations:
      
        1. When trying to trace packets that go through the CT action,
           the state of the packet can't be determined, and would
           probably be wrong.
      
        2. Deducing problem packets can sometimes be difficult as well,
           even if many of the flows are known.
      
        3. It's possible to use the openvswitch module even without
           ovs-vswitchd (although this is not a common use).
      
      Introduce the event tracing points here to make it possible to
      work through these problems in kernel space.  The style is
      copied from the mac80211 driver-trace / trace code for
      consistency - this creates some checkpatch splats, but the
      official 'guide' for adding tracepoints, as well as the existing
      examples all add the same splats so it seems acceptable.
      Signed-off-by: Aaron Conole <aconole@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 04 Nov, 2020 1 commit
  9. 03 Oct, 2020 1 commit
  10. 02 Sep, 2020 2 commits
  11. 14 Aug, 2020 1 commit
  12. 04 Aug, 2020 2 commits
  13. 25 Jul, 2020 1 commit
  14. 18 Jul, 2020 1 commit
    • net: openvswitch: reorder masks array based on usage · eac87c41
      Eelco Chaudron committed
      This patch reorders the masks array every 4 seconds based on their
      usage count. This greatly reduces the number of masks hit per packet,
      and hence improves the overall performance, especially in the OVS/OVN
      case for OpenShift.
      
      Here are some results from the OVS/OVN OpenShift test, which use
      8 pods, each pod having 512 uperf connections, each connection
      sends a 64-byte request and gets a 1024-byte response (TCP).
      All uperf clients are on 1 worker node while all uperf servers are
      on the other worker node.
      
      Kernel without this patch     :  7.71 Gbps
      Kernel with this patch applied: 14.52 Gbps
      
      We also ran some tests to verify that the rebalance activity does not
      lower the flow insertion rate, which it does not.
      Signed-off-by: Eelco Chaudron <echaudro@redhat.com>
      Tested-by: Andrew Theurer <atheurer@redhat.com>
      Reviewed-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
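The reordering described in the commit above can be pictured as sorting mask entries by hit count so that the most-used masks are tried first. The following is only a userspace sketch of that idea, not the kernel code: `struct mask_usage`, `cmp_hits_desc` and `reorder_masks` are hypothetical names, and the periodic 4-second trigger is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-mask usage record. */
struct mask_usage {
	int      mask_id;
	uint64_t hits;   /* lookups that matched this mask since last reorder */
};

/* qsort comparator: higher hit counts sort first. */
static int cmp_hits_desc(const void *a, const void *b)
{
	const struct mask_usage *ma = a;
	const struct mask_usage *mb = b;

	if (ma->hits < mb->hits)
		return 1;
	if (ma->hits > mb->hits)
		return -1;
	return 0;
}

/* Reorder the array so the most frequently hit masks come first. */
static void reorder_masks(struct mask_usage *masks, size_t n)
{
	assert(masks != NULL || n == 0);

	if (n > 1)
		qsort(masks, n, sizeof(*masks), cmp_hits_desc);
}
```

Because a lookup walks the masks in array order, moving frequently hit masks to the front lowers the average number of masks probed per packet.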
  15. 21 Apr, 2020 1 commit
    • net: openvswitch: ovs_ct_exit to be done under ovs_lock · 27de77ce
      Tonghao Zhang committed
      syzbot wrote:
      | =============================
      | WARNING: suspicious RCU usage
      | 5.7.0-rc1+ #45 Not tainted
      | -----------------------------
      | net/openvswitch/conntrack.c:1898 RCU-list traversed in non-reader section!!
      |
      | other info that might help us debug this:
      | rcu_scheduler_active = 2, debug_locks = 1
      | ...
      |
      | stack backtrace:
      | Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.0-0-ga698c8995f-prebuilt.qemu.org 04/01/2014
      | Workqueue: netns cleanup_net
      | Call Trace:
      | ...
      | ovs_ct_exit
      | ovs_exit_net
      | ops_exit_list.isra.7
      | cleanup_net
      | process_one_work
      | worker_thread
      
      To avoid that warning, invoke ovs_ct_exit under ovs_lock and add
      lockdep_ovsl_is_held as an optional lockdep expression.
      
      Link: https://lore.kernel.org/lkml/000000000000e642a905a0cbee6e@google.com
      Fixes: 11efd5cb ("openvswitch: Support conntrack zone limit")
      Cc: Pravin B Shelar <pshelar@ovn.org>
      Cc: Yi-Hung Wei <yihung.wei@gmail.com>
      Reported-by: syzbot+7ef50afd3a211f879112@syzkaller.appspotmail.com
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 30 Mar, 2020 1 commit
  17. 04 Mar, 2020 1 commit
  18. 19 Feb, 2020 1 commit
  19. 15 Jan, 2020 1 commit
  20. 10 Dec, 2019 1 commit
  21. 02 Dec, 2019 2 commits
  22. 27 Nov, 2019 1 commit
  23. 16 Nov, 2019 1 commit
  24. 15 Nov, 2019 1 commit
    • net: openvswitch: add hash info to upcall · bd1903b7
      Tonghao Zhang committed
      When using the kernel datapath, the upcall does not
      include the skb hash info. This causes problems,
      because the skb hash is important in the kernel
      stack. For example, the VXLAN module uses it to
      select the UDP src port. The tx queue selection
      may also use the hash.
      
      The hash is computed in different ways. It is random
      for a TCP socket, and it may be computed in hardware
      or in the software stack. Recalculating the hash is not easy.
      
      Hash of TCP socket is computed:
      tcp_v4_connect
          -> sk_set_txhash (is random)
      
      __tcp_transmit_skb
          -> skb_set_hash_from_sk
      
      There will be one upcall to ovs-vswitchd, without skb
      hash information, for the first packet of a TCP
      session. The remaining packets are processed in the Open vSwitch
      module with the hash kept. If this TCP session is forwarded to
      the VXLAN module, the UDP src port of the first TCP packet
      differs from that of the remaining packets.
      
      TCP packets may come to Open vSwitch from the host or from Docker containers.
      To fix this, we store the hash info in the upcall and restore the hash
      when packets are sent back.
      
      +---------------+          +-------------------------+
      |   Docker/VMs  |          |     ovs-vswitchd        |
      +----+----------+          +-+--------------------+--+
           |                       ^                    |
           |                       |                    |
           |                       |  upcall            v restore packet hash (not recalculate)
           |                     +-+--------------------+--+
           |  tap netdev         |                         |   vxlan module
           +--------------->     +-->  Open vSwitch ko     +-->
             or internal type    |                         |
                                 +-------------------------+
      
      Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-October/364062.html
      Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
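The store-and-restore step from the commit above can be sketched in userspace as packing the 32-bit hash and its type flags into one value that travels with the upcall, then unpacking it when the packet comes back. All names here (`pkt_sketch`, `SKETCH_HASH_SW_BIT`, `SKETCH_HASH_L4_BIT`, `upcall_pack_hash`, `upcall_restore_hash`) and the bit positions are assumptions for illustration, not the kernel's actual attribute encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits stored above the 32-bit hash value. */
#define SKETCH_HASH_SW_BIT (1ULL << 32)
#define SKETCH_HASH_L4_BIT (1ULL << 33)

/* Hypothetical stand-in for the hash-related fields of an skb. */
struct pkt_sketch {
	uint32_t hash;
	bool     sw_hash;   /* hash was computed in software */
	bool     l4_hash;   /* hash covers L4 (port) information */
};

/* Pack the hash and its flags into one value carried by the upcall. */
static uint64_t upcall_pack_hash(const struct pkt_sketch *pkt)
{
	uint64_t v;

	assert(pkt != NULL);
	v = pkt->hash;
	if (pkt->sw_hash)
		v |= SKETCH_HASH_SW_BIT;
	if (pkt->l4_hash)
		v |= SKETCH_HASH_L4_BIT;
	return v;
}

/* Restore the saved hash on the way back instead of recalculating it. */
static void upcall_restore_hash(struct pkt_sketch *pkt, uint64_t v)
{
	assert(pkt != NULL);
	pkt->hash    = (uint32_t)(v & 0xFFFFFFFFULL);
	pkt->sw_hash = (v & SKETCH_HASH_SW_BIT) != 0;
	pkt->l4_hash = (v & SKETCH_HASH_L4_BIT) != 0;
}
```

Restoring rather than recomputing matters because a TCP socket's hash is random, so a recomputed value would not match the one used for later packets of the same session.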
  25. 04 Nov, 2019 3 commits
  26. 26 Oct, 2019 1 commit
    • netns: fix GFP flags in rtnl_net_notifyid() · d4e4fdf9
      Guillaume Nault committed
      In rtnl_net_notifyid(), we certainly can't pass a null GFP flag to
      rtnl_notify(). A GFP_KERNEL flag would be fine in most circumstances,
      but there are a few paths calling rtnl_net_notifyid() from atomic
      context or from RCU critical sections. The latter also precludes the use
      of gfp_any() as it wouldn't detect the RCU case. Also, the nlmsg_new()
      call is wrong too, as it uses GFP_KERNEL unconditionally.
      
      Therefore, we need to pass the GFP flags as parameter and propagate it
      through function calls until the proper flags can be determined.
      
      In most cases, GFP_KERNEL is fine. The exceptions are:
        * openvswitch: ovs_vport_cmd_get() and ovs_vport_cmd_dump()
          indirectly call rtnl_net_notifyid() from RCU critical section,
      
        * rtnetlink: rtmsg_ifinfo_build_skb() already receives GFP flags as
          parameter.
      
      Also, in ovs_vport_cmd_build_info(), let's change the GFP flags used
      by nlmsg_new(). The function is allowed to sleep, so better make the
      flags consistent with the ones used in the following
      ovs_vport_cmd_fill_info() call.
      
      Found by code inspection.
      
      Fixes: 9a963454 ("netns: notify netns id events")
      Signed-off-by: Guillaume Nault <gnault@redhat.com>
      Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  27. 26 Sep, 2019 1 commit
  28. 06 Sep, 2019 1 commit
    • net: openvswitch: Set OvS recirc_id from tc chain index · 95a7233c
      Paul Blakey committed
      Offloaded OvS datapath rules are translated one to one to tc rules,
      for example the following simplified OvS rule:
      
      recirc_id(0),in_port(dev1),eth_type(0x0800),ct_state(-trk) actions:ct(),recirc(2)
      
      Will be translated to the following tc rule:
      
      $ tc filter add dev dev1 ingress \
      	    prio 1 chain 0 proto ip \
      		flower tcp ct_state -trk \
      		action ct pipe \
      		action goto chain 2
      
      Received packets will first travel through tc, and if they aren't stolen
      by it, like in the above rule, they will continue to OvS datapath.
      Since we already did some actions (action ct in this case) which might
      modify the packets, and updated action stats, we would like to continue
      the processing with the correct recirc_id in OvS (here recirc_id(2))
      where we left off.
      
      To support this, introduce a new skb extension for tc, which
      will be used for translating tc chain to ovs recirc_id to
      handle these miss cases. Last tc chain index will be set
      by tc goto chain action and read by OvS datapath.
      Signed-off-by: Paul Blakey <paulb@mellanox.com>
      Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
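As a rough userspace illustration of the miss handling described in the commit above: if a packet carries an extension recording the last tc chain, that chain index is used as the starting recirc_id; otherwise processing starts from recirc_id 0. The types `tc_ext_sketch` and `flow_key_sketch` and the function `key_set_recirc_from_tc` are hypothetical stand-ins, not the kernel's skb extension API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-packet tc extension. */
struct tc_ext_sketch {
	uint32_t chain;   /* last chain index set by "goto chain" */
};

/* Hypothetical stand-in for the part of the flow key that matters here. */
struct flow_key_sketch {
	uint32_t recirc_id;
};

/*
 * Seed the flow key's recirc_id from the tc chain index when an
 * extension is present; packets that never went through a tc
 * "goto chain" (ext == NULL) start at recirc_id 0.
 */
static void key_set_recirc_from_tc(struct flow_key_sketch *key,
				   const struct tc_ext_sketch *ext)
{
	assert(key != NULL);
	key->recirc_id = ext ? ext->chain : 0;
}
```

For the example rule in the commit message, a packet that missed after `action goto chain 2` would resume in the datapath with recirc_id 2.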
  29. 07 Aug, 2019 1 commit
  30. 25 Jul, 2019 1 commit
    • ovs: datapath: hide clang frame-overflow warnings · 26063790
      Arnd Bergmann committed
      Some functions in the datapath code are factored out so that each
      one has a stack frame smaller than 1024 bytes with gcc. However,
      when compiling with clang, the functions are inlined more aggressively
      and combined again so we get
      
      net/openvswitch/datapath.c:1124:12: error: stack frame size of 1528 bytes in function 'ovs_flow_cmd_set' [-Werror,-Wframe-larger-than=]
      
      Marking both get_flow_actions() and ovs_nla_init_match_and_action()
      as 'noinline_for_stack' gives us the same behavior that we see with
      gcc, and no warning. Note that this does not mean we actually use
      less stack, as the functions call each other, and we still get
      three copies of the large 'struct sw_flow_key' type on the stack.
      
      The comment tells us that this was previously considered safe,
      presumably since the netlink parsing functions are called with
      a known backchain that does not also use a lot of stack space.
      
      Fixes: 9cc9a5cb ("datapath: Avoid using stack larger than 1024.")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
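The effect of marking helpers as non-inlinable can be sketched outside the kernel with the GCC/Clang `noinline` function attribute. In the sketch below, the macro `noinline_for_stack_sketch`, the 512-byte `struct big_key` and both functions are hypothetical; the intent is only to show that each helper keeps its large local in its own stack frame rather than having it merged into the caller's frame.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for the kernel annotation, assuming GCC or Clang. */
#define noinline_for_stack_sketch __attribute__((noinline))

/* Hypothetical large on-stack object, playing the role of a flow key. */
struct big_key {
	unsigned char bytes[512];
};

/*
 * The large local lives in this function's own frame. Without the
 * attribute, an aggressive inliner may fold it into the caller.
 */
static noinline_for_stack_sketch size_t parse_part(const char *s)
{
	struct big_key key;
	size_t len;

	assert(s != NULL);
	len = strlen(s);
	if (len > sizeof(key.bytes))
		len = sizeof(key.bytes);
	memset(&key, 0, sizeof(key));
	memcpy(key.bytes, s, len);
	return len;
}

/* The caller's frame stays small because parse_part is not inlined. */
static size_t parse_both(const char *a, const char *b)
{
	return parse_part(a) + parse_part(b);
}
```

As the commit message notes, this bounds the size of each individual frame and silences the warning; it does not reduce total stack use when such functions call one another.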
  31. 13 Jul, 2019 1 commit
  32. 06 Jun, 2019 1 commit