1. 14 Apr, 2017 1 commit
  2. 02 Apr, 2017 1 commit
  3. 29 Mar, 2017 1 commit
    • openvswitch: Fix refcount leak on force commit. · b768b16d
      Jarno Rajahalme committed
      The reference count held for skb needs to be released when the skb's
      nfct pointer is cleared, regardless of whether nf_ct_delete() is called
      or not.
      
      Failing to release the skb's reference count led to deferred conntrack
      cleanup spinning forever within nf_conntrack_cleanup_net_list() when
      cleaning up a network namespace:
      
         kworker/u16:0-19025 [004] 45981067.173642: sched_switch: kworker/u16:0:19025 [120] R ==> rcu_preempt:7 [120]
         kworker/u16:0-19025 [004] 45981067.173651: kernel_stack: <stack trace>
      => ___preempt_schedule (ffffffffa001ed36)
      => _raw_spin_unlock_bh (ffffffffa0713290)
      => nf_ct_iterate_cleanup (ffffffffc00a4454)
      => nf_conntrack_cleanup_net_list (ffffffffc00a5e1e)
      => nf_conntrack_pernet_exit (ffffffffc00a63dd)
      => ops_exit_list.isra.1 (ffffffffa06075f3)
      => cleanup_net (ffffffffa0607df0)
      => process_one_work (ffffffffa0084c31)
      => worker_thread (ffffffffa008592b)
      => kthread (ffffffffa008bee2)
      => ret_from_fork (ffffffffa071b67c)
      
      Fixes: dd41d33f ("openvswitch: Add force commit.")
      Reported-by: Yang Song <yangsong@vmware.com>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
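The rule stated in this fix (release the packet's reference whenever its conntrack pointer is cleared, whether or not the entry is also deleted) can be illustrated with a small user-space sketch. Every name below (`toy_conn`, `toy_pkt`, `toy_pkt_clear_ct`) is hypothetical and only models the counting; it is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for a conntrack entry and a packet buffer. */
struct toy_conn {
    int refcnt;    /* number of holders, including the table itself */
    bool deleted;  /* set once the entry is unlinked from the table */
};

struct toy_pkt {
    struct toy_conn *ct;  /* reference held by the packet, or NULL */
};

/* Drop one reference; a real implementation would free at zero. */
static void toy_conn_put(struct toy_conn *ct)
{
    assert(ct->refcnt > 0);
    ct->refcnt--;
}

/* Attach a connection to a packet, taking a reference for the packet. */
static void toy_pkt_attach(struct toy_pkt *pkt, struct toy_conn *ct)
{
    ct->refcnt++;
    pkt->ct = ct;
}

/*
 * Clear the packet's connection pointer. The table entry is deleted only
 * when requested, but the packet's own reference is released either way.
 */
static void toy_pkt_clear_ct(struct toy_pkt *pkt, bool delete_entry)
{
    struct toy_conn *ct = pkt->ct;

    if (ct == NULL)
        return;
    if (delete_entry && !ct->deleted) {
        ct->deleted = true;
        toy_conn_put(ct);   /* the table's reference */
    }
    toy_conn_put(ct);       /* the packet's reference, unconditionally */
    pkt->ct = NULL;
}
```

If the second `toy_conn_put()` ran only on the delete path, the count would never reach zero for entries that were cleared without deletion, which is the kind of leak that keeps a cleanup loop waiting.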
  4. 23 Mar, 2017 4 commits
  5. 17 Mar, 2017 1 commit
  6. 16 Mar, 2017 1 commit
  7. 03 Mar, 2017 1 commit
  8. 02 Mar, 2017 1 commit
    • ipv6: orphan skbs in reassembly unit · 48cac18e
      Eric Dumazet committed
      Andrey reported a use-after-free in the IPv6 stack.
      
      The issue here is that we free the socket while it still has skbs
      in the TX path and in some queues.
      
      It happens here because the IPv6 reassembly unit messes with
      skb->truesize, breaking skb_set_owner_w() badly.
      
      We fixed a similar issue for IPv4 in commit 8282f274 ("inet: frag:
      Always orphan skbs inside ip_defrag()")
      Acked-by: Joe Stringer <joe@ovn.org>
      
      ==================================================================
      BUG: KASAN: use-after-free in sock_wfree+0x118/0x120
      Read of size 8 at addr ffff880062da0060 by task a.out/4140
      
      page:ffffea00018b6800 count:1 mapcount:0 mapping:          (null)
      index:0x0 compound_mapcount: 0
      flags: 0x100000000008100(slab|head)
      raw: 0100000000008100 0000000000000000 0000000000000000 0000000180130013
      raw: dead000000000100 dead000000000200 ffff88006741f140 0000000000000000
      page dumped because: kasan: bad access detected
      
      CPU: 0 PID: 4140 Comm: a.out Not tainted 4.10.0-rc3+ #59
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:15
       dump_stack+0x292/0x398 lib/dump_stack.c:51
       describe_address mm/kasan/report.c:262
       kasan_report_error+0x121/0x560 mm/kasan/report.c:370
       kasan_report mm/kasan/report.c:392
       __asan_report_load8_noabort+0x3e/0x40 mm/kasan/report.c:413
       sock_flag ./arch/x86/include/asm/bitops.h:324
       sock_wfree+0x118/0x120 net/core/sock.c:1631
       skb_release_head_state+0xfc/0x250 net/core/skbuff.c:655
       skb_release_all+0x15/0x60 net/core/skbuff.c:668
       __kfree_skb+0x15/0x20 net/core/skbuff.c:684
       kfree_skb+0x16e/0x4e0 net/core/skbuff.c:705
       inet_frag_destroy+0x121/0x290 net/ipv4/inet_fragment.c:304
       inet_frag_put ./include/net/inet_frag.h:133
       nf_ct_frag6_gather+0x1125/0x38b0 net/ipv6/netfilter/nf_conntrack_reasm.c:617
       ipv6_defrag+0x21b/0x350 net/ipv6/netfilter/nf_defrag_ipv6_hooks.c:68
       nf_hook_entry_hookfn ./include/linux/netfilter.h:102
       nf_hook_slow+0xc3/0x290 net/netfilter/core.c:310
       nf_hook ./include/linux/netfilter.h:212
       __ip6_local_out+0x52c/0xaf0 net/ipv6/output_core.c:160
       ip6_local_out+0x2d/0x170 net/ipv6/output_core.c:170
       ip6_send_skb+0xa1/0x340 net/ipv6/ip6_output.c:1722
       ip6_push_pending_frames+0xb3/0xe0 net/ipv6/ip6_output.c:1742
       rawv6_push_pending_frames net/ipv6/raw.c:613
       rawv6_sendmsg+0x2cff/0x4130 net/ipv6/raw.c:927
       inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744
       sock_sendmsg_nosec net/socket.c:635
       sock_sendmsg+0xca/0x110 net/socket.c:645
       sock_write_iter+0x326/0x620 net/socket.c:848
       new_sync_write fs/read_write.c:499
       __vfs_write+0x483/0x760 fs/read_write.c:512
       vfs_write+0x187/0x530 fs/read_write.c:560
       SYSC_write fs/read_write.c:607
       SyS_write+0xfb/0x230 fs/read_write.c:599
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      RIP: 0033:0x7ff26e6f5b79
      RSP: 002b:00007ff268e0ed98 EFLAGS: 00000206 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 00007ff268e0f9c0 RCX: 00007ff26e6f5b79
      RDX: 0000000000000010 RSI: 0000000020f50fe1 RDI: 0000000000000003
      RBP: 00007ff26ebc1220 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000206 R12: 0000000000000000
      R13: 00007ff268e0f9c0 R14: 00007ff26efec040 R15: 0000000000000003
      
      The buggy address belongs to the object at ffff880062da0000
       which belongs to the cache RAWv6 of size 1504
      The buggy address ffff880062da0060 is located 96 bytes inside
       of 1504-byte region [ffff880062da0000, ffff880062da05e0)
      
      Freed by task 4113:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578
       slab_free_hook mm/slub.c:1352
       slab_free_freelist_hook mm/slub.c:1374
       slab_free mm/slub.c:2951
       kmem_cache_free+0xb2/0x2c0 mm/slub.c:2973
       sk_prot_free net/core/sock.c:1377
       __sk_destruct+0x49c/0x6e0 net/core/sock.c:1452
       sk_destruct+0x47/0x80 net/core/sock.c:1460
       __sk_free+0x57/0x230 net/core/sock.c:1468
       sk_free+0x23/0x30 net/core/sock.c:1479
       sock_put ./include/net/sock.h:1638
       sk_common_release+0x31e/0x4e0 net/core/sock.c:2782
       rawv6_close+0x54/0x80 net/ipv6/raw.c:1214
       inet_release+0xed/0x1c0 net/ipv4/af_inet.c:425
       inet6_release+0x50/0x70 net/ipv6/af_inet6.c:431
       sock_release+0x8d/0x1e0 net/socket.c:599
       sock_close+0x16/0x20 net/socket.c:1063
       __fput+0x332/0x7f0 fs/file_table.c:208
       ____fput+0x15/0x20 fs/file_table.c:244
       task_work_run+0x19b/0x270 kernel/task_work.c:116
       exit_task_work ./include/linux/task_work.h:21
       do_exit+0x186b/0x2800 kernel/exit.c:839
       do_group_exit+0x149/0x420 kernel/exit.c:943
       SYSC_exit_group kernel/exit.c:954
       SyS_exit_group+0x1d/0x20 kernel/exit.c:952
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Allocated by task 4115:
       save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
       save_stack+0x43/0xd0 mm/kasan/kasan.c:502
       set_track mm/kasan/kasan.c:514
       kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:544
       slab_post_alloc_hook mm/slab.h:432
       slab_alloc_node mm/slub.c:2708
       slab_alloc mm/slub.c:2716
       kmem_cache_alloc+0x1af/0x250 mm/slub.c:2721
       sk_prot_alloc+0x65/0x2a0 net/core/sock.c:1334
       sk_alloc+0x105/0x1010 net/core/sock.c:1396
       inet6_create+0x44d/0x1150 net/ipv6/af_inet6.c:183
       __sock_create+0x4f6/0x880 net/socket.c:1199
       sock_create net/socket.c:1239
       SYSC_socket net/socket.c:1269
       SyS_socket+0xf9/0x230 net/socket.c:1249
       entry_SYSCALL_64_fastpath+0x1f/0xc2 arch/x86/entry/entry_64.S:203
      
      Memory state around the buggy address:
       ffff880062d9ff00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
       ffff880062d9ff80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
      >ffff880062da0000: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
                                                             ^
       ffff880062da0080: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
       ffff880062da0100: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
      ==================================================================
      Reported-by: Andrey Konovalov <andreyknvl@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  9. 20 Feb, 2017 1 commit
  10. 16 Feb, 2017 1 commit
  11. 10 Feb, 2017 10 commits
    • openvswitch: Pack struct sw_flow_key. · 316d4d78
      Jarno Rajahalme committed
      struct sw_flow_key has two 16-bit holes. Move the most frequently
      matched conntrack fields there.  In some typical cases this halves
      the size of the key that needs to be hashed, so that it fits into
      one cache line.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
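The effect of moving hot 16-bit fields into padding holes can be shown with a reduced layout. This is a sketch with made-up field names, not the real struct sw_flow_key, and it assumes a common ABI where uint32_t is 4-byte aligned. The helper functions model "the part of the key that must be hashed" as everything up to the last conntrack field, which is an assumption made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical key layout with two 16-bit padding holes. */
struct toy_key_loose {
    uint32_t in_port;
    uint16_t eth_type;   /* followed by a 2-byte hole */
    uint32_t ipv4_src;
    uint16_t tp_src;     /* followed by a 2-byte hole */
    uint32_t ipv4_dst;
    uint16_t ct_state;   /* hot field stored late in the key */
    uint16_t ct_zone;    /* hot field stored late in the key */
};

/* Same fields, with the two hot 16-bit fields moved into the holes. */
struct toy_key_packed {
    uint32_t in_port;
    uint16_t eth_type;
    uint16_t ct_state;   /* fills the first hole */
    uint32_t ipv4_src;
    uint16_t tp_src;
    uint16_t ct_zone;    /* fills the second hole */
    uint32_t ipv4_dst;
};

/* Bytes from the start of the key through the last conntrack field. */
static size_t toy_loose_hash_len(void)
{
    return offsetof(struct toy_key_loose, ct_zone) + sizeof(uint16_t);
}

static size_t toy_packed_hash_len(void)
{
    return offsetof(struct toy_key_packed, ct_zone) + sizeof(uint16_t);
}
```

Under the stated alignment assumption the loose layout occupies 24 bytes and the packed one 20, and the prefix ending at the last conntrack field shrinks from 24 to 16 bytes.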
    • openvswitch: Add force commit. · dd41d33f
      Jarno Rajahalme committed
      Stateful network admission policy may allow connections in one
      direction and reject connections initiated in the other direction.
      After a policy change it is possible that, for a new connection, an
      overlapping conntrack entry already exists, where the original
      direction of the existing connection is opposed to the new
      connection's initial packet.
      
      Most importantly, conntrack state relating to the current packet gets
      the "reply" designation based on whether the original direction tuple
      or the reply direction tuple matched.  If this "directionality" is
      wrong w.r.t. the stateful network admission policy, it may happen
      that packets in neither direction are correctly admitted.
      
      This patch adds a new "force commit" option to the OVS conntrack
      action that checks the original direction of an existing conntrack
      entry.  If that direction is opposed to the current packet, the
      existing conntrack entry is deleted and a new one is subsequently
      created in the correct direction.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
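The force commit decision described above can be sketched as a small state update. The types and the function `toy_commit` are hypothetical; the sketch only models the rule that an existing entry matched in the reply direction is replaced when force is requested.

```c
#include <stdbool.h>

/* Direction in which the current packet matched an existing entry. */
enum toy_dir { TOY_DIR_ORIGINAL, TOY_DIR_REPLY };

/* Hypothetical connection-table slot for one flow. */
struct toy_entry {
    bool valid;               /* an entry exists for this flow */
    unsigned int generation;  /* bumped each time the entry is (re)created */
};

/*
 * Commit the current packet and return the direction it has relative to
 * the entry that exists afterwards.
 */
static enum toy_dir toy_commit(struct toy_entry *e, enum toy_dir matched,
                               bool force)
{
    if (!e->valid) {
        /* No entry yet: create one with this packet as the original. */
        e->valid = true;
        e->generation++;
        return TOY_DIR_ORIGINAL;
    }
    if (force && matched == TOY_DIR_REPLY) {
        /* Opposed direction: delete the old entry and create a new one. */
        e->generation++;
        return TOY_DIR_ORIGINAL;
    }
    /* Plain commit, or direction already correct: keep the entry. */
    return matched;
}
```

Without force, a packet that matched in the reply direction keeps that designation; with force, the entry is recreated so the same packet becomes the original direction.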
    • openvswitch: Add original direction conntrack tuple to sw_flow_key. · 9dd7f890
      Jarno Rajahalme committed
      Add the fields of the conntrack original direction 5-tuple to struct
      sw_flow_key.  The new fields are initially marked as non-existent, and
      are populated whenever a conntrack action is executed and either finds
      or generates a conntrack entry.  This means that these fields exist
      for all packets that were not rejected by conntrack as untrackable.
      
      The original tuple fields in the sw_flow_key are filled from the
      original direction tuple of the conntrack entry relating to the
      current packet, or from the original direction tuple of the master
      conntrack entry, if the current conntrack entry has a master.
      Generally, expected connections of connections having an assigned
      helper (e.g., FTP), have a master conntrack entry.
      
      The main purpose of the new conntrack original tuple fields is to
      allow matching on them for policy decision purposes, with the premise
      that the admissibility of tracked connections' reply packets (as well
      as original direction packets), and of packets in both directions of
      any related connections, may be based on ACL rules applying to the
      master connection's original direction 5-tuple.  This also makes it
      easier to make policy decisions when the actual packet headers might
      have been transformed by NAT, as the original direction 5-tuple
      represents the packet headers before any such transformation.
      
      When using the original direction 5-tuple the admissibility of return
      and/or related packets need not be based on the mere existence of a
      conntrack entry, allowing separation of admission policy from the
      established conntrack state.  While existence of a conntrack entry is
      required for admission of the return or related packets, policy
      changes can render connections that were initially admitted to be
      rejected or dropped afterwards.  If the admission of the return and
      related packets was based on mere conntrack state (e.g., connection
      being in an established state), a policy change that would make the
      connection rejected or dropped would need to find and delete all
      conntrack entries affected by such a change.  When using the original
      direction 5-tuple matching, the affected conntrack entries can be
      allowed to time out instead, as the established state of the
      connection would not need to be the basis for packet admission any
      more.
      
      It should be noted that the directionality of related connections may
      be the same or different than that of the master connection, and
      neither the original direction 5-tuple nor the conntrack state bits
      carry this information.  If needed, the directionality of the master
      connection can be stored in master's conntrack mark or labels, which
      are automatically inherited by the expected related connections.
      
      The fact that neither ARP nor ND packets are trackable by conntrack
      allows mutual exclusion between ARP/ND and the new conntrack original
      tuple fields.  Hence, the IP addresses are overlaid in union with ARP
      and ND fields.  This allows the sw_flow_key to not grow much due to
      this patch, but it also means that we must be careful to never use the
      new key fields with ARP or ND packets.  ARP is easy to distinguish and
      keep mutually exclusive based on the ethernet type, but ND being an
      ICMPv6 protocol requires a bit more attention.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
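The selection rule for the original tuple (use the master's original direction tuple when the entry has a master, otherwise the entry's own) is simple enough to sketch. The structures and `toy_key_orig_tuple` below are hypothetical simplifications, with addresses as plain integers.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 5-tuple. */
struct toy_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* Hypothetical conntrack entry with an optional master. */
struct toy_ct {
    struct toy_tuple orig;        /* original direction tuple */
    const struct toy_ct *master;  /* set for expected (related) connections */
};

/*
 * Pick the tuple that would populate the flow key's original tuple fields.
 * Returns NULL for untracked packets, whose fields stay non-existent.
 */
static const struct toy_tuple *toy_key_orig_tuple(const struct toy_ct *ct)
{
    if (ct == NULL)
        return NULL;
    if (ct->master != NULL)
        return &ct->master->orig;
    return &ct->orig;
}
```

For an FTP-like pair, a data connection whose `master` points at the control connection resolves to the control connection's tuple, so one ACL rule on the control 5-tuple can cover both.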
    • openvswitch: Inherit master's labels. · 09aa98ad
      Jarno Rajahalme committed
      We avoid calling into nf_conntrack_in() for expected connections, as
      that would remove the expectation that we want to stick around until
      we are ready to commit the connection.  Instead, we do a lookup in the
      expectation table directly.  However, after a successful expectation
      lookup we have set the flow key label field from the master
      connection, whereas nf_conntrack_in() does not do this.  This leads to
      master's labels being inherited after an expectation lookup, but those
      labels not being inherited after the corresponding conntrack action
      with a commit flag.
      
      This patch resolves the problem by changing the commit code path to
      also inherit the master's labels to the expected connection.
      Resolving this conflict in favor of inheriting the labels allows more
      information to be passed from the master connection to related
      connections, which would otherwise be much harder if the 32 bits in
      the connmark are not enough.  Labels can still be set explicitly, so
      this change only affects the default values of the labels in the
      presence of a master connection.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Refactor labels initialization. · 6ffcea79
      Jarno Rajahalme committed
      Refactoring conntrack labels initialization makes changes in later
      patches easier to review.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Simplify labels length logic. · b87cec38
      Jarno Rajahalme committed
      Since 23014011 ("netfilter: conntrack: support a fixed size of 128
      distinct labels"), the size of the conntrack labels extension has been
      fixed at 128 bits, so we do not need to check for label sizes shorter
      than 128 bits at run-time.  This patch simplifies the labels length
      logic accordingly, but allows the conntrack labels size to be
      increased in the future without breaking the build.  In the event of
      conntrack labels increasing in size, OVS would still be able to deal
      with the first 128 label bits.
      Suggested-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Unionize ovs_key_ct_label with a u32 array. · cb80d58f
      Jarno Rajahalme committed
      Make the array of labels in struct ovs_key_ct_label a union, adding a
      u32 array of the same byte size as the existing u8 array.  It is
      faster to loop through the labels 32 bits at a time, which is also
      the alignment of netlink attributes.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
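A minimal sketch of the union idea, using hypothetical `toy_` names rather than the real uapi definitions: the same 16 bytes are viewable as a u8 array or a u32 array, and a check such as "is any label bit set" then needs four word comparisons instead of sixteen byte comparisons.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_CT_LABELS_LEN    16  /* 128 bits of labels */
#define TOY_CT_LABELS_LEN_32 (TOY_CT_LABELS_LEN / sizeof(uint32_t))

/* Hypothetical labels structure: two views over the same storage. */
struct toy_ct_labels {
    union {
        uint8_t  ct_labels[TOY_CT_LABELS_LEN];
        uint32_t ct_labels_32[TOY_CT_LABELS_LEN_32];
    };
};

/* Reset all label bits. */
static void toy_labels_clear(struct toy_ct_labels *labels)
{
    memset(labels, 0, sizeof(*labels));
}

/* Return true if any label bit is set, scanning 32 bits at a time. */
static bool toy_labels_nonzero(const struct toy_ct_labels *labels)
{
    size_t i;

    for (i = 0; i < TOY_CT_LABELS_LEN_32; i++) {
        if (labels->ct_labels_32[i] != 0)
            return true;
    }
    return false;
}
```

The anonymous union requires C11 (or the GNU extension); the byte view stays available for code that sets individual labels.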
    • openvswitch: Do not trigger events for unconfirmed connections. · 193e3096
      Jarno Rajahalme committed
      Receiving change events before the 'new' event for the connection has
      been received can be confusing.  Avoid triggering change events for
      setting conntrack mark or labels before the conntrack entry has been
      confirmed.
      
      Fixes: 182e3042 ("openvswitch: Allow matching on conntrack mark")
      Fixes: c2ac6673 ("openvswitch: Allow matching on conntrack label")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted. · 9ff464db
      Jarno Rajahalme committed
      The conntrack lookup for existing connections fails to invert the
      packet 5-tuple for NATted packets, and therefore fails to find the
      existing conntrack entry.  Conntrack only stores 5-tuples for incoming
      packets, and there are various situations where a lookup on a packet
      that has already been transformed by NAT needs to be made.  Looking up
      an existing conntrack entry when executing a packet received from
      userspace is one of them.
      
      This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
      for the conntrack lookup whenever the packet has already been
      transformed by conntrack from its input form as evidenced by one of
      the NAT flags being set in the conntrack state metadata.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
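Why inverting the tuple helps can be seen in a toy model where a connection stores both of its tuples in the form they have on input. A packet that has already been source-NATted matches neither stored tuple directly, but its inverse equals the stored reply tuple. All names here are hypothetical and addresses are plain integers; this is not the ovs_ct_find_existing() implementation.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 5-tuple. */
struct toy_tuple {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t proto;
};

/* A tracked connection stores both tuples as they would arrive on input. */
struct toy_conn {
    struct toy_tuple orig;   /* first packet, before NAT */
    struct toy_tuple reply;  /* expected reply, addressed to the NAT address */
};

static bool toy_tuple_equal(const struct toy_tuple *a,
                            const struct toy_tuple *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->proto == b->proto;
}

/* Swap source and destination. */
static struct toy_tuple toy_tuple_invert(struct toy_tuple t)
{
    struct toy_tuple inv = {
        .src_ip = t.dst_ip, .dst_ip = t.src_ip,
        .src_port = t.dst_port, .dst_port = t.src_port,
        .proto = t.proto,
    };
    return inv;
}

/* Look up a packet, inverting its tuple first if NAT was already applied. */
static bool toy_find_existing(const struct toy_conn *ct,
                              struct toy_tuple pkt, bool natted)
{
    struct toy_tuple key = natted ? toy_tuple_invert(pkt) : pkt;

    return toy_tuple_equal(&key, &ct->orig) ||
           toy_tuple_equal(&key, &ct->reply);
}
```

In this model the `natted` flag plays the role of the NAT bits in the conntrack state metadata mentioned in the commit message.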
    • openvswitch: Fix comments for skb->_nfct · 5e17da63
      Jarno Rajahalme committed
      Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
      they are combined into '_nfct'.
      Signed-off-by: Jarno Rajahalme <jarno@ovn.org>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Acked-by: Joe Stringer <joe@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  12. 02 Feb, 2017 2 commits
  13. 30 Jan, 2017 1 commit
  14. 16 Jan, 2017 1 commit
    • openvswitch: maintain correct checksum state in conntrack actions · 75f01a4c
      Lance Richardson committed
      When executing conntrack actions on skbuffs with checksum mode
      CHECKSUM_COMPLETE, the checksum must be updated to account for
      header pushes and pulls. Otherwise we get "hw csum failure"
      logs similar to this (ICMP packet received on geneve tunnel
      via ixgbe NIC):
      
      [  405.740065] genev_sys_6081: hw csum failure
      [  405.740106] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G          I     4.10.0-rc3+ #1
      [  405.740108] Call Trace:
      [  405.740110]  <IRQ>
      [  405.740113]  dump_stack+0x63/0x87
      [  405.740116]  netdev_rx_csum_fault+0x3a/0x40
      [  405.740118]  __skb_checksum_complete+0xcf/0xe0
      [  405.740120]  nf_ip_checksum+0xc8/0xf0
      [  405.740124]  icmp_error+0x1de/0x351 [nf_conntrack_ipv4]
      [  405.740132]  nf_conntrack_in+0xe1/0x550 [nf_conntrack]
      [  405.740137]  ? find_bucket.isra.2+0x62/0x70 [openvswitch]
      [  405.740143]  __ovs_ct_lookup+0x95/0x980 [openvswitch]
      [  405.740145]  ? netif_rx_internal+0x44/0x110
      [  405.740149]  ovs_ct_execute+0x147/0x4b0 [openvswitch]
      [  405.740153]  do_execute_actions+0x22e/0xa70 [openvswitch]
      [  405.740157]  ovs_execute_actions+0x40/0x120 [openvswitch]
      [  405.740161]  ovs_dp_process_packet+0x84/0x120 [openvswitch]
      [  405.740166]  ovs_vport_receive+0x73/0xd0 [openvswitch]
      [  405.740168]  ? udp_rcv+0x1a/0x20
      [  405.740170]  ? ip_local_deliver_finish+0x93/0x1e0
      [  405.740172]  ? ip_local_deliver+0x6f/0xe0
      [  405.740174]  ? ip_rcv_finish+0x3a0/0x3a0
      [  405.740176]  ? ip_rcv_finish+0xdb/0x3a0
      [  405.740177]  ? ip_rcv+0x2a7/0x400
      [  405.740180]  ? __netif_receive_skb_core+0x970/0xa00
      [  405.740185]  netdev_frame_hook+0xd3/0x160 [openvswitch]
      [  405.740187]  __netif_receive_skb_core+0x1dc/0xa00
      [  405.740194]  ? ixgbe_clean_rx_irq+0x46d/0xa20 [ixgbe]
      [  405.740197]  __netif_receive_skb+0x18/0x60
      [  405.740199]  netif_receive_skb_internal+0x40/0xb0
      [  405.740201]  napi_gro_receive+0xcd/0x120
      [  405.740204]  gro_cell_poll+0x57/0x80 [geneve]
      [  405.740206]  net_rx_action+0x260/0x3c0
      [  405.740209]  __do_softirq+0xc9/0x28c
      [  405.740211]  irq_exit+0xd9/0xf0
      [  405.740213]  do_IRQ+0x51/0xd0
      [  405.740215]  common_interrupt+0x93/0x93
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: Lance Richardson <lrichard@redhat.com>
      Acked-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
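The bookkeeping behind "the checksum must be updated to account for header pushes and pulls" can be sketched with 16-bit ones' complement arithmetic. The buffer type and the `toy_` helpers are hypothetical, and the sketch assumes headers of even length so that 16-bit word boundaries are preserved; it models the idea only, not the kernel's skb helpers.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum of a buffer, taken as big-endian 16-bit words. */
static uint16_t toy_csum(const uint8_t *data, size_t len)
{
    uint64_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;  /* pad the odd byte with zero */
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);   /* end-around carry */
    return (uint16_t)sum;
}

static uint16_t toy_csum_add(uint16_t a, uint16_t b)
{
    uint32_t s = (uint32_t)a + b;

    return (uint16_t)((s & 0xffff) + (s >> 16));
}

/* Subtracting in ones' complement is adding the bitwise complement. */
static uint16_t toy_csum_sub(uint16_t a, uint16_t b)
{
    return toy_csum_add(a, (uint16_t)~b);
}

/* Hypothetical buffer carrying a checksum over [data, data + len). */
struct toy_buf {
    const uint8_t *data;
    size_t len;
    uint16_t csum;
};

/* Remove an even-length header and take its bytes out of the checksum. */
static void toy_pull(struct toy_buf *b, size_t hdr_len)
{
    b->csum = toy_csum_sub(b->csum, toy_csum(b->data, hdr_len));
    b->data += hdr_len;
    b->len -= hdr_len;
}

/* Restore an even-length header and add its bytes back to the checksum. */
static void toy_push(struct toy_buf *b, size_t hdr_len)
{
    b->data -= hdr_len;
    b->len += hdr_len;
    b->csum = toy_csum_add(b->csum, toy_csum(b->data, hdr_len));
}
```

If `toy_pull()` moved the data pointer without adjusting `csum`, a later verification over the shorter buffer would compare against a sum that still includes the header bytes, which is the situation reported as "hw csum failure" in the log above.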
  15. 09 Jan, 2017 1 commit
  16. 28 Dec, 2016 1 commit
    • openvswitch: upcall: Fix vlan handling. · df30f740
      pravin shelar committed
      The networking stack accelerates vlan tag handling by
      keeping the topmost vlan header in the skb. This works as
      long as the packet remains in the OVS datapath. But during
      an OVS upcall the vlan header is pushed onto the packet.
      When such a packet is sent back to the OVS datapath, the core
      networking stack might not handle it correctly. This
      patch avoids the issue by accelerating the vlan tag
      during flow key extraction. This simplifies the datapath by
      bringing uniform packet processing to packets from
      all code paths.
      
      Fixes: 5108bbad ("openvswitch: add processing of L3 packets").
      CC: Jarno Rajahalme <jarno@ovn.org>
      CC: Jiri Benc <jbenc@redhat.com>
      Signed-off-by: Pravin B Shelar <pshelar@ovn.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 21 Dec, 2016 1 commit
  18. 01 Dec, 2016 1 commit
  19. 18 Nov, 2016 1 commit
    • netns: make struct pernet_operations::id unsigned int · c7d03a00
      Alexey Dobriyan committed
      Make struct pernet_operations::id unsigned.
      
      There are 2 reasons to do so:
      
      1)
      This field is really an index into a zero-based array and
      thus is an unsigned entity. Using a negative value is an
      out-of-bounds access by definition.
      
      2)
      On x86_64, unsigned 32-bit data which are mixed with pointers
      via array indexing or offsets added to or subtracted from pointers
      are preferred to signed 32-bit data.
      
      "int" being used as an array index needs to be sign-extended
      to 64-bit before being used.
      
      	void f(long *p, int i)
      	{
      		g(p[i]);
      	}
      
        roughly translates to
      
      	movsx	rsi, esi
      	mov	rdi, [rsi+...]
      	call 	g
      
      MOVSX is a 3-byte instruction, which isn't necessary if the variable is
      unsigned because x86_64 zero-extends by default.
      
      Now, there is net_generic() function which, you guessed it right, uses
      "int" as an array index:
      
      	static inline void *net_generic(const struct net *net, int id)
      	{
      		...
      		ptr = ng->ptr[id - 1];
      		...
      	}
      
      And this function is used a lot, so those sign extensions add up.
      
      Patch snipes ~1730 bytes on allyesconfig kernel (without all junk
      messing with code generation):
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      
      Unfortunately some functions actually grow bigger.
      This is a seemingly random artefact of code generation with the register
      allocator being used differently. gcc decides that some variable
      needs to live in the new r8+ registers and every access now requires a REX
      prefix. Or it is shifted into r12, so the [r12+0] addressing mode has to be
      used, which is longer than [r8].
      
      However, the overall balance is in the negative direction:
      
      	add/remove: 0/0 grow/shrink: 70/598 up/down: 396/-2126 (-1730)
      	function                                     old     new   delta
      	nfsd4_lock                                  3886    3959     +73
      	tipc_link_build_proto_msg                   1096    1140     +44
      	mac80211_hwsim_new_radio                    2776    2808     +32
      	tipc_mon_rcv                                1032    1058     +26
      	svcauth_gss_legacy_init                     1413    1429     +16
      	tipc_bcbase_select_primary                   379     392     +13
      	nfsd4_exchange_id                           1247    1260     +13
      	nfsd4_setclientid_confirm                    782     793     +11
      		...
      	put_client_renew_locked                      494     480     -14
      	ip_set_sockfn_get                            730     716     -14
      	geneve_sock_add                              829     813     -16
      	nfsd4_sequence_done                          721     703     -18
      	nlmclnt_lookup_host                          708     686     -22
      	nfsd4_lockt                                 1085    1063     -22
      	nfs_get_client                              1077    1050     -27
      	tcf_bpf_init                                1106    1076     -30
      	nfsd4_encode_fattr                          5997    5930     -67
      	Total: Before=154856051, After=154854321, chg -0.00%
      Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  20. 13 Nov, 2016 8 commits