1. 11 2月, 2017 11 次提交
  2. 10 2月, 2017 29 次提交
    • D
      a507c346
    • J
      openvswitch: Pack struct sw_flow_key. · 316d4d78
      Jarno Rajahalme 提交于
      struct sw_flow_key has two 16-bit holes. Move the most matched
      conntrack match fields there.  In some typical cases this reduces the
      size of the key that needs to be hashed into half and into one cache
      line.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      316d4d78
    • J
      openvswitch: Add force commit. · dd41d33f
      Jarno Rajahalme 提交于
      Stateful network admission policy may allow connections to one
      direction and reject connections initiated in the other direction.
      After policy change it is possible that for a new connection an
      overlapping conntrack entry already exists, where the original
      direction of the existing connection is opposed to the new
      connection's initial packet.
      
      Most importantly, conntrack state relating to the current packet gets
      the "reply" designation based on whether the original direction tuple
      or the reply direction tuple matched.  If this "directionality" is
      wrong w.r.t. to the stateful network admission policy it may happen
      that packets in neither direction are correctly admitted.
      
      This patch adds a new "force commit" option to the OVS conntrack
      action that checks the original direction of an existing conntrack
      entry.  If that direction is opposed to the current packet, the
      existing conntrack entry is deleted and a new one is subsequently
      created in the correct direction.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd41d33f
    • J
      openvswitch: Add original direction conntrack tuple to sw_flow_key. · 9dd7f890
      Jarno Rajahalme 提交于
      Add the fields of the conntrack original direction 5-tuple to struct
      sw_flow_key.  The new fields are initially marked as non-existent, and
      are populated whenever a conntrack action is executed and either finds
      or generates a conntrack entry.  This means that these fields exist
      for all packets that were not rejected by conntrack as untrackable.
      
      The original tuple fields in the sw_flow_key are filled from the
      original direction tuple of the conntrack entry relating to the
      current packet, or from the original direction tuple of the master
      conntrack entry, if the current conntrack entry has a master.
      Generally, expected connections of connections having an assigned
      helper (e.g., FTP), have a master conntrack entry.
      
      The main purpose of the new conntrack original tuple fields is to
      allow matching on them for policy decision purposes, with the premise
      that the admissibility of tracked connections reply packets (as well
      as original direction packets), and both direction packets of any
      related connections may be based on ACL rules applying to the master
      connection's original direction 5-tuple.  This also makes it easier to
      make policy decisions when the actual packet headers might have been
      transformed by NAT, as the original direction 5-tuple represents the
      packet headers before any such transformation.
      
      When using the original direction 5-tuple the admissibility of return
      and/or related packets need not be based on the mere existence of a
      conntrack entry, allowing separation of admission policy from the
      established conntrack state.  While existence of a conntrack entry is
      required for admission of the return or related packets, policy
      changes can render connections that were initially admitted to be
      rejected or dropped afterwards.  If the admission of the return and
      related packets was based on mere conntrack state (e.g., connection
      being in an established state), a policy change that would make the
      connection rejected or dropped would need to find and delete all
      conntrack entries affected by such a change.  When using the original
      direction 5-tuple matching the affected conntrack entries can be
      allowed to time out instead, as the established state of the
      connection would not need to be the basis for packet admission any
      more.
      
      It should be noted that the directionality of related connections may
      be the same or different than that of the master connection, and
      neither the original direction 5-tuple nor the conntrack state bits
      carry this information.  If needed, the directionality of the master
      connection can be stored in master's conntrack mark or labels, which
      are automatically inherited by the expected related connections.
      
      The fact that neither ARP nor ND packets are trackable by conntrack
      allows mutual exclusion between ARP/ND and the new conntrack original
      tuple fields.  Hence, the IP addresses are overlaid in union with ARP
      and ND fields.  This allows the sw_flow_key to not grow much due to
      this patch, but it also means that we must be careful to never use the
      new key fields with ARP or ND packets.  ARP is easy to distinguish and
      keep mutually exclusive based on the ethernet type, but ND being an
      ICMPv6 protocol requires a bit more attention.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9dd7f890
    • J
      openvswitch: Inherit master's labels. · 09aa98ad
      Jarno Rajahalme 提交于
      We avoid calling into nf_conntrack_in() for expected connections, as
      that would remove the expectation that we want to stick around until
      we are ready to commit the connection.  Instead, we do a lookup in the
      expectation table directly.  However, after a successful expectation
      lookup we have set the flow key label field from the master
      connection, whereas nf_conntrack_in() does not do this.  This leads to
      master's labels being inherited after an expectation lookup, but those
      labels not being inherited after the corresponding conntrack action
      with a commit flag.
      
      This patch resolves the problem by changing the commit code path to
      also inherit the master's labels to the expected connection.
      Resolving this conflict in favor of inheriting the labels allows more
      information be passed from the master connection to related
      connections, which would otherwise be much harder if the 32 bits in
      the connmark are not enough.  Labels can still be set explicitly, so
      this change only affects the default values of the labels in presense
      of a master connection.
      
      Fixes: 7f8a436e ("openvswitch: Add conntrack action")
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      09aa98ad
    • J
      openvswitch: Refactor labels initialization. · 6ffcea79
      Jarno Rajahalme 提交于
      Refactoring conntrack labels initialization makes changes in later
      patches easier to review.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6ffcea79
    • J
      openvswitch: Simplify labels length logic. · b87cec38
      Jarno Rajahalme 提交于
      Since 23014011 ("netfilter: conntrack: support a fixed size of 128
      distinct labels"), the size of conntrack labels extension has fixed to
      128 bits, so we do not need to check for labels sizes shorter than 128
      at run-time.  This patch simplifies labels length logic accordingly,
      but allows the conntrack labels size to be increased in the future
      without breaking the build.  In the event of conntrack labels
      increasing in size OVS would still be able to deal with the 128 first
      label bits.
      Suggested-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b87cec38
    • J
      openvswitch: Unionize ovs_key_ct_label with a u32 array. · cb80d58f
      Jarno Rajahalme 提交于
      Make the array of labels in struct ovs_key_ct_label an union, adding a
      u32 array of the same byte size as the existing u8 array.  It is
      faster to loop through the labels 32 bits at the time, which is also
      the alignment of netlink attributes.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cb80d58f
    • J
      openvswitch: Do not trigger events for unconfirmed connections. · 193e3096
      Jarno Rajahalme 提交于
      Receiving change events before the 'new' event for the connection has
      been received can be confusing.  Avoid triggering change events for
      setting conntrack mark or labels before the conntrack entry has been
      confirmed.
      
      Fixes: 182e3042 ("openvswitch: Allow matching on conntrack mark")
      Fixes: c2ac6673 ("openvswitch: Allow matching on conntrack label")
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      193e3096
    • J
      openvswitch: Use inverted tuple in ovs_ct_find_existing() if NATted. · 9ff464db
      Jarno Rajahalme 提交于
      The conntrack lookup for existing connections fails to invert the
      packet 5-tuple for NATted packets, and therefore fails to find the
      existing conntrack entry.  Conntrack only stores 5-tuples for incoming
      packets, and there are various situations where a lookup on a packet
      that has already been transformed by NAT needs to be made.  Looking up
      an existing conntrack entry upon executing packet received from the
      userspace is one of them.
      
      This patch fixes ovs_ct_find_existing() to invert the packet 5-tuple
      for the conntrack lookup whenever the packet has already been
      transformed by conntrack from its input form as evidenced by one of
      the NAT flags being set in the conntrack state metadata.
      
      Fixes: 05752523 ("openvswitch: Interface with NAT.")
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9ff464db
    • J
      openvswitch: Fix comments for skb->_nfct · 5e17da63
      Jarno Rajahalme 提交于
      Fix comments referring to skb 'nfct' and 'nfctinfo' fields now that
      they are combined into '_nfct'.
      Signed-off-by: NJarno Rajahalme <jarno@ovn.org>
      Acked-by: NPravin B Shelar <pshelar@ovn.org>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5e17da63
    • D
      Merge branch 'ena-bug-fixes' · 9878f602
      David S. Miller 提交于
      Netanel Belgazal says:
      
      ====================
      Bug Fixes in ENA driver
      
      Changes from V3:
      * Rebase patchset to master and solve merge conflicts.
      * Remove redundant bug fix (fix error handling when probe fails)
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9878f602
    • N
      3e5d6897
    • N
      net/ena: change condition for host attribute configuration · dd8427a7
      Netanel Belgazal 提交于
      Move the host info config to be the first admin command that is executed.
      This change require the driver to remove the 'feature check'
      from host info configuration flow.
      The check is removed since the supported features bitmask field
      is retrieved only after calling ENA_ADMIN_DEVICE_ATTRIBUTES admin command.
      
      If set host info is not supported an error will be returned by the device.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dd8427a7
    • N
      net/ena: change driver's default timeouts · 7102a18a
      Netanel Belgazal 提交于
      The timeouts were too agressive and sometimes cause false alarms.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7102a18a
    • N
    • N
      net/ena: use READ_ONCE to access completion descriptors · a8496eb8
      Netanel Belgazal 提交于
      Completion descriptors are accessed from the driver and from the device.
      To avoid reading the old value, use READ_ONCE macro.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a8496eb8
    • N
      net/ena: use napi_complete_done() return value · b1669c9f
      Netanel Belgazal 提交于
      Do not unamsk interrupts if we are in busy poll mode.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b1669c9f
    • N
      net/ena: fix potential access to freed memory during device reset · 3f6159db
      Netanel Belgazal 提交于
      If the ena driver detects that the device is not behave as expected,
      it tries to reset the device.
      The reset flow calls ena_down, which will frees all the resources
      the driver allocates and then it will reset the device.
      
      This flow can cause memory corruption if the device is still writes
      to the driver's memory space.
      To overcome this potential race, move the reset before the device
      resources are freed.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      3f6159db
    • N
      net/ena: refactor ena_get_stats64 to be atomic context safe · d81db240
      Netanel Belgazal 提交于
      ndo_get_stat64() can be called from atomic context, but the current
      implementation sends an admin command to retrieve the statistics from
      the device. This admin command can sleep.
      
      This patch re-factors the implementation of ena_get_stats64() to use
      the {rx,tx}bytes/count from the driver's inner counters, and to obtain
      the rx drop counter from the asynchronous keep alive (heart bit)
      event.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d81db240
    • N
      net/ena: fix NULL dereference when removing the driver after device reset failed · 22b331c9
      Netanel Belgazal 提交于
      If for some reason the device stops responding, and the device reset
      failes to recover the device, the mmio register read data structure
      will not be reinitialized.
      
      On driver removal, the driver will also try to reset the device, but
      this time the mmio data structure will be NULL.
      
      To solve this issue, perform the device reset in the remove function
      only if the device is runnig.
      
      Crash log
         54.240382] BUG: unable to handle kernel NULL pointer dereference at           (null)
      [   54.244186] IP: [<ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
      [   54.244186] PGD 0
      [   54.244186] Oops: 0002 [#1] SMP
      [   54.244186] Modules linked in: ena_drv(OE-) snd_hda_codec_generic kvm_intel kvm crct10dif_pclmul ppdev crc32_pclmul ghash_clmulni_intel aesni_intel snd_hda_intel aes_x86_64 snd_hda_controller lrw gf128mul cirrus glue_helper ablk_helper ttm snd_hda_codec drm_kms_helper cryptd snd_hwdep drm snd_pcm pvpanic snd_timer syscopyarea sysfillrect snd parport_pc sysimgblt serio_raw soundcore i2c_piix4 mac_hid lp parport psmouse floppy
      [   54.244186] CPU: 5 PID: 1841 Comm: rmmod Tainted: G           OE 3.16.0-031600-generic #201408031935
      [   54.244186] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
      [   54.244186] task: ffff880135852880 ti: ffff8800bb640000 task.ti: ffff8800bb640000
      [   54.244186] RIP: 0010:[<ffffffffc067de5a>]  [<ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
      [   54.244186] RSP: 0018:ffff8800bb643d50  EFLAGS: 00010083
      [   54.244186] RAX: 000000000000deb0 RBX: 0000000000030d40 RCX: 0000000000000003
      [   54.244186] RDX: 0000000000000202 RSI: 0000000000000058 RDI: ffffc90000775104
      [   54.244186] RBP: ffff8800bb643d88 R08: 0000000000000000 R09: cf00000000000000
      [   54.244186] R10: 0000000fffffffe0 R11: 0000000000000001 R12: 0000000000000000
      [   54.244186] R13: ffffc90000765000 R14: ffffc90000775104 R15: 00007fca1fa98090
      [   54.244186] FS:  00007fca1f1bd740(0000) GS:ffff88013fd40000(0000) knlGS:0000000000000000
      [   54.244186] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [   54.244186] CR2: 0000000000000000 CR3: 00000000b9cf6000 CR4: 00000000001406e0
      [   54.244186] Stack:
      [   54.244186]  0000000000000202 0000005800000286 ffffc90000765000 ffffc90000765000
      [   54.244186]  ffff880135f6b000 ffff8800b9360000 00007fca1fa98090 ffff8800bb643db8
      [   54.244186]  ffffffffc0680b3d ffff8800b93608c0 ffffc90000765000 ffff880135f6b000
      [   54.244186] Call Trace:
      [   54.244186]  [<ffffffffc0680b3d>] ena_com_dev_reset+0x1d/0x1b0 [ena_drv]
      [   54.244186]  [<ffffffffc0678497>] ena_remove+0xa7/0x130 [ena_drv]
      [   54.244186]  [<ffffffff813d4df6>] pci_device_remove+0x46/0xc0
      [   54.244186]  [<ffffffff814c3b7f>] __device_release_driver+0x7f/0xf0
      [   54.244186]  [<ffffffff814c4738>] driver_detach+0xc8/0xd0
      [   54.244186]  [<ffffffff814c3969>] bus_remove_driver+0x59/0xd0
      [   54.244186]  [<ffffffff814c4fde>] driver_unregister+0x2e/0x60
      [   54.244186]  [<ffffffff810f0a80>] ? show_refcnt+0x40/0x40
      [   54.244186]  [<ffffffff813d4ec3>] pci_unregister_driver+0x23/0xa0
      [   54.244186]  [<ffffffffc068413f>] ena_cleanup+0x10/0xed1 [ena_drv]
      [   54.244186]  [<ffffffff810f3a47>] SyS_delete_module+0x157/0x1e0
      [   54.244186]  [<ffffffff81014fb7>] ? do_notify_resume+0xc7/0xd0
      [   54.244186]  [<ffffffff81793fad>] system_call_fastpath+0x1a/0x1f
      [   54.244186] Code: c3 4d 8d b5 04 01 01 00 4c 89 f7 e8 e1 5a 11 c1 48 89 45 c8 41 0f b7 85 00 01 01 00 8d 48 01 66 2d 52 21 66 41 89 8d 00 01 01 00 <66> 41 89 04 24 0f b7 45 d4 89 45 d0 89 c1 41 0f b7 85 00 01 01
      [   54.244186] RIP  [<ffffffffc067de5a>] ena_com_reg_bar_read32+0x8a/0x180 [ena_drv]
      [   54.244186]  RSP <ffff8800bb643d50>
      [   54.244186] CR2: 0000000000000000
      [   54.244186] ---[ end trace 18dd9889b6497810 ]---
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      22b331c9
    • N
      net/ena: fix RSS default hash configuration · 422e21e7
      Netanel Belgazal 提交于
      ENA default hash configures IPv4_frag hash twice instead of
      configure non-IP packets.
      
      The bug caused IPv4 fragmented packets to be calculated based on
      L2 source and destination address instead of L3 source and destination.
      IPv4 packets can reach to the wrong Rx queue.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      422e21e7
    • N
      net/ena: fix ethtool RSS flow configuration · 6e2de20d
      Netanel Belgazal 提交于
      ena_flow_data_to_flow_hash and ena_flow_hash_to_flow_type
      treat the ena_flow_hash_to_flow_type enum as power of two values.
      
      Change the values of ena_admin_flow_hash_fields to be power of two values.
      
      This bug effect the ethtool set/get rxnfc.
      ethtool will report wrong values hash fields for get and will
      configure wrong hash fields in set.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6e2de20d
    • N
      net/ena: fix queues number calculation · 6a1ce2fb
      Netanel Belgazal 提交于
      The ENA driver tries to open a queue per vCPU.
      To determine how many vCPUs the instance have it uses num_possible_cpus()
      while it should have use num_online_cpus() instead.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6a1ce2fb
    • N
      net/ena: remove ntuple filter support from device feature list · fdeea0ad
      Netanel Belgazal 提交于
      Remove NETIF_F_NTUPLE from netdev->features.
      The ENA device driver does not support ntuple filtering.
      Signed-off-by: NNetanel Belgazal <netanel@annapurnalabs.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fdeea0ad
    • D
      Merge branch 'enic-vxlan-offload' · b66a8043
      David S. Miller 提交于
      Govindarajulu Varadarajan says:
      
      ====================
      enic: add vxlan offload support
      
      This series adds vxlan offload support for enic driver. The first
      patch adds vxlan devcmd for configuring vxland offload parameters.
      Second patch adds ndo_udp_tunnel_add/del and offload on rx path.
      There are to modes in which fw supports vxlan offload.
      
      mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
      if checksum of csum is ok. This bit is or of ip_csum_ok and
      tcp_udp_csum_ok
      
      mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
              BIT(1) is set if outer_ip_csum_ok/
              BIT(2) is set if outer_tcp_csum_ok
      
      Some hw supports only mode 0, some support mode 0 and 2. Driver gets
      the supported modes bitmap using get_supported_feature_ver devcmd
      and selects the highest mode both driver and fw supports.
      
      Third patch adds offload support on tx path by adding
      enic_features_check().
      
      v2: Order local variable declarations from longest to shortest line,
          on all three patches.
      ====================
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b66a8043
    • G
      enic: add vxlan offload on tx path · 9c744d10
      Govindarajulu Varadarajan 提交于
      Define ndo_features_check. Hw supports offload only for ipv4 inner and
      ipv4 outer pkt.
      
      Code refactor for setting inner tcp pseudo csum.
      Signed-off-by: NGovindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      9c744d10
    • G
      enic: add udp_tunnel ndo for vxlan offload · 257e7382
      Govindarajulu Varadarajan 提交于
      Defines enic_udp_tunnel_add/del for configuring vxlan tunnel offload.
      enic supports offload of only one ipv4/udp port.
      
      There are two modes that fw supports for vxlan offload.
      
      mode 0: fcoe bit is set for encapsulated packet. fcoe_fc_crc_ok is set
      if checksum of csum is ok. This bit is or of ip_csum_ok and
      tcp_udp_csum_ok
      
      mode 2: BIT(0) in rss_hash is set if it is encapsulated packet.
      	BIT(1) is set if outer_ip_csum_ok/
      	BIT(2) is set if outer_tcp_csum_ok
      
      tcp_udp_csum_ok/ipv4_csum_ok is set if inner csum is OK.
      Signed-off-by: NGovindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      257e7382
    • G
      enic: add devcmds for vxlan offload · ca029179
      Govindarajulu Varadarajan 提交于
      This patch adds devcmds needed for vxlan offload. Implement 3 new devcmd
      
      overlay_offload_ctrl: enable/disable offload
      overlay_offload_cfg: update offload udp port number
      get_supported_feature_ver: get hw supported offload version. Each
      			   version has different bitmap for csum_ok/encap
      Signed-off-by: NGovindarajulu Varadarajan <gvaradar@cisco.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ca029179