1. 18 Nov, 2016 1 commit
  2. 10 Nov, 2016 5 commits
  3. 08 Nov, 2016 1 commit
    • qdisc: catch misconfig of attaching qdisc to tx_queue_len zero device · 84c46dd8
      Committed by Jesper Dangaard Brouer
      It is a clear misconfiguration to attach a qdisc to a device with
      tx_queue_len zero, because some qdiscs (namely pfifo, bfifo, gred,
      htb, plug and sfb) inherit/copy this value as their queue length.
      
      Why should the kernel catch such a misconfiguration?  Because prior to
      introducing the IFF_NO_QUEUE device flag, userspace found a loophole
      in the qdisc config system that allowed them to achieve the equivalent
      of IFF_NO_QUEUE, which is to remove the qdisc code path entirely from
      a device.  The loophole on older kernels is setting tx_queue_len=0
      *prior* to device qdisc init (the config time is significant; simply
      setting tx_queue_len=0 later doesn't trigger the loophole).
      
      This loophole is currently used by Docker[1] to get better performance
      and scalability out of the veth device.  The Docker developers were
      warned[1] that they needed to adjust the tx_queue_len if ever
      attaching a qdisc.  The OpenShift project didn't remember this warning
      and attached a qdisc; this was caught and fixed in [2].
      
      [1] https://github.com/docker/libcontainer/pull/193
      [2] https://github.com/openshift/origin/pull/11126
      
      Instead of fixing every userspace program that used this loophole and
      forgot to reset the tx_queue_len prior to attaching a qdisc, let's
      catch the misconfiguration on the kernel side.
      Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
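The check described in this commit can be sketched as a small userspace model. This is only an illustration, not kernel code: the `Device` class, the `attach_qdisc` helper, and the choice to fall back to a default length of 1000 are assumptions made for the example, since the commit message itself does not say what corrective action is taken.

```python
DEFAULT_TX_QUEUE_LEN = 1000  # assumed fallback value for this sketch


class Device:
    """Hypothetical stand-in for a network device (not the kernel struct)."""

    def __init__(self, name, tx_queue_len):
        self.name = name
        self.tx_queue_len = tx_queue_len
        self.qdisc = None


def attach_qdisc(dev, kind):
    """Attach a qdisc, repairing a zero tx_queue_len before it is copied."""
    messages = []
    if dev.tx_queue_len == 0:
        # A zero-length queue would be inherited by the qdisc, so catch
        # the misconfiguration here instead of in every userspace tool.
        dev.tx_queue_len = DEFAULT_TX_QUEUE_LEN
        messages.append(f"{dev.name}: caught tx_queue_len zero misconfig")
    # pfifo-style qdiscs copy the device value as their own queue limit.
    dev.qdisc = {"kind": kind, "limit": dev.tx_queue_len}
    return messages


veth = Device("veth0", tx_queue_len=0)
msgs = attach_qdisc(veth, "pfifo")
print(veth.qdisc, msgs)
```

Without the check, the `pfifo` limit in this model would be 0, which is the situation the commit message calls a clear misconfiguration.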
  4. 04 Nov, 2016 1 commit
  5. 03 Nov, 2016 3 commits
  6. 30 Oct, 2016 2 commits
  7. 28 Oct, 2016 2 commits
    • net sched filters: fix notification of filter delete with proper handle · 9ee78374
      Committed by Jamal Hadi Salim
      Daniel says:
      
      While trying out [1][2], I noticed that tc monitor doesn't show the
      correct handle on delete:
      
      $ tc monitor
      qdisc clsact ffff: dev eno1 parent ffff:fff1
      filter dev eno1 ingress protocol all pref 49152 bpf handle 0x2a [...]
      deleted filter dev eno1 ingress protocol all pref 49152 bpf handle 0xf3be0c80
      
      Some context to explain the above:
      The user identity of any tc filter is represented by a 32-bit
      identifier encoded in tcm->tcm_handle. Example 0x2a in the bpf filter
      above. A user wishing to delete, get or even modify a specific filter
      uses this handle to reference it.
      Every classifier is free to provide its own semantics for the 32 bit handle.
      Example: classifiers like u32 use schemes like 800:1:801 to describe
      the semantics of their filters represented as hash table, bucket and
      node ids etc.
      Classifiers also have internal per-filter representation which is different
      from this externally visible identity. Most classifiers set this
      internal representation to be a pointer address (which allows fast retrieval
      of said filters in their implementations). This internal representation
      is referenced with the "fh" variable in the kernel control code.
      
      When a user successfully deletes a specific filter, by specifying the correct
      tcm->tcm_handle, an event is generated to user space which indicates
      which specific filter was deleted.
      
      Before this patch, the "fh" value was sent to user space as the identity.
      For example, the handle shown in the sample bpf filter delete event above
      is 0xf3be0c80. This is in fact a 32-bit truncation of 0xffff8807f3be0c80,
      which happens to be the 64-bit memory address of the internal filter
      representation (the address of the corresponding filter's struct cls_bpf_prog).
      
      After this patch, the appropriate user-identifiable handle, as encoded
      in the originating request's tcm->tcm_handle, is sent in the event.
      One of the cardinal rules of netlink is to be able to take an
      event (such as a delete in this case) and reflect it back to the
      kernel and successfully delete the filter. This patch achieves that.
      
      Note, this issue has existed since the original TC action
      infrastructure code patch back in 2004 as found in:
      https://git.kernel.org/cgit/linux/kernel/git/history/history.git/commit/
      
      [1] http://patchwork.ozlabs.org/patch/682828/
      [2] http://patchwork.ozlabs.org/patch/682829/
      
      Fixes: 4e54c4816bfe ("[NET]: Add tc extensions infrastructure.")
      Reported-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
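The truncation described in this commit message can be checked with a few lines of arithmetic. The sketch below uses the two values quoted in the message (the internal pointer 0xffff8807f3be0c80 and the user handle 0x2a); the helper name `delete_event_handle` and its `patched` flag are hypothetical and only model the before/after behaviour.

```python
U32_MASK = 0xFFFFFFFF

fh = 0xFFFF8807F3BE0C80   # internal per-filter pointer quoted in the message
tcm_handle = 0x2A         # user-visible handle of the bpf filter


def delete_event_handle(fh, tcm_handle, patched):
    """Return the 32-bit handle reported in the delete notification."""
    if patched:
        # After the patch: echo the handle the user supplied.
        return tcm_handle & U32_MASK
    # Before the patch: the pointer is squeezed into a 32-bit field.
    return fh & U32_MASK


print(hex(delete_event_handle(fh, tcm_handle, patched=False)))
print(hex(delete_event_handle(fh, tcm_handle, patched=True)))
```

Only the patched value can be reflected back to the kernel to reference the same filter, because the truncated pointer is not a handle the user ever assigned.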
    • skbedit: allow the user to specify bitmask for mark · 4fe77d82
      Committed by Antonio Quartulli
      The user may want to use only some bits of the skb mark in
      his skbedit rules because the remaining part might be used by
      something else.
      
      Introduce the "mask" parameter to the skbedit actor in order
      to implement such functionality.
      
      When the mask is specified, only those bits selected by the
      latter are actually changed by the actor, while the
      rest is left untouched.
      Signed-off-by: Antonio Quartulli <antonio@open-mesh.com>
      Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
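The masked update described here amounts to a standard read-modify-write on a 32-bit value. The following is a minimal sketch of that arithmetic; the function name `apply_mark` and the sample values are made up for illustration, and the default mask of all ones is an assumption standing in for "no mask given".

```python
U32_MASK = 0xFFFFFFFF


def apply_mark(old_mark, new_mark, mask=U32_MASK):
    """Replace only the bits of old_mark that are selected by mask."""
    kept = old_mark & ~mask        # bits outside the mask stay untouched
    changed = new_mark & mask      # bits inside the mask take the new value
    return (kept | changed) & U32_MASK


# Only the low byte is owned by this rule; the upper bits belong to
# something else and must survive.
result = apply_mark(0xAABBCCDD, 0x00000012, mask=0x000000FF)
print(hex(result))
```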
  8. 27 Oct, 2016 1 commit
  9. 24 Oct, 2016 1 commit
    • net/sched: em_meta: Fix 'meta vlan' to correctly recognize zero VID frames · d65f2fa6
      Committed by Shmulik Ladkani
      META_COLLECTOR int_vlan_tag() assumes that if the accel tag (vlan_tci)
      is zero, then no vlan accel tag is present.
      
      This is incorrect for zero VID vlan accel packets, making the following
      match fail:
        tc filter add ... basic match 'meta(vlan mask 0xfff eq 0)' ...
      
      Apparently 'int_vlan_tag' was implemented before VLAN_TAG_PRESENT was
      introduced in 05423b24 "vlan: allow null VLAN ID to be used"
      (and at the time it was introduced, the 'vlan_tx_tag_get' call in
       em_meta was not adapted).
      
      Fix by testing skb_vlan_tag_present instead of testing
      skb_vlan_tag_get's value.
      
      Fixes: 05423b24 ("vlan: allow null VLAN ID to be used")
      Fixes: 1a31f204 ("netsched: Allow meta match on vlan tag on receive")
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
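The difference between the two tests can be illustrated with a toy model of the accelerated tag field. The sketch assumes that the "present" flag is a single bit (0x1000 here) stored alongside the 12-bit VID, and it simplifies the no-tag case to "no match" rather than falling back to parsing the packet; the function names are hypothetical and do not mirror the kernel helpers exactly.

```python
VLAN_TAG_PRESENT = 0x1000   # assumed flag bit marking "an accel tag exists"
VLAN_VID_MASK = 0x0FFF


def tag_get(tci):
    """Tag value with the presence flag stripped."""
    return tci & ~VLAN_TAG_PRESENT & 0xFFFF


def old_has_tag(tci):
    # Old logic: a zero tag value is taken to mean "no tag at all".
    return tag_get(tci) != 0


def new_has_tag(tci):
    # Fixed logic: rely on the explicit presence flag.
    return bool(tci & VLAN_TAG_PRESENT)


def meta_vlan_eq(tci, mask, value, has_tag):
    """Model of the match 'meta(vlan mask <mask> eq <value>)'."""
    return has_tag(tci) and (tag_get(tci) & mask) == value


zero_vid_frame = VLAN_TAG_PRESENT | 0   # tagged frame carrying VID 0
print(meta_vlan_eq(zero_vid_frame, VLAN_VID_MASK, 0, old_has_tag))
print(meta_vlan_eq(zero_vid_frame, VLAN_VID_MASK, 0, new_has_tag))
```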
  10. 21 Oct, 2016 1 commit
    • net: use core MTU range checking in core net infra · 91572088
      Committed by Jarod Wilson
      geneve:
      - Merge __geneve_change_mtu back into geneve_change_mtu, set max_mtu
      - This one isn't quite as straight-forward as others, could use some
        closer inspection and testing
      
      macvlan:
      - set min/max_mtu
      
      tun:
      - set min/max_mtu, remove tun_net_change_mtu
      
      vxlan:
      - Merge __vxlan_change_mtu back into vxlan_change_mtu
      - Set max_mtu to IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      - This one is also not as straight-forward and could use closer inspection
        and testing from vxlan folks
      
      bridge:
      - set max_mtu of IP_MAX_MTU and retain dynamic MTU range checks in
        change_mtu function
      
      openvswitch:
      - set min/max_mtu, remove internal_dev_change_mtu
      - note: max_mtu wasn't checked previously, it's been set to 65535, which
        is the largest possible size supported
      
      sch_teql:
      - set min/max_mtu (note: max_mtu previously unchecked, used max of 65535)
      
      macsec:
      - min_mtu = 0, max_mtu = 65535
      
      macvlan:
      - min_mtu = 0, max_mtu = 65535
      
      ntb_netdev:
      - min_mtu = 0, max_mtu = 65535
      
      veth:
      - min_mtu = 68, max_mtu = 65535
      
      8021q:
      - min_mtu = 0, max_mtu = 65535
      
      CC: netdev@vger.kernel.org
      CC: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      CC: Hannes Frederic Sowa <hannes@stressinduktion.org>
      CC: Tom Herbert <tom@herbertland.com>
      CC: Daniel Borkmann <daniel@iogearbox.net>
      CC: Alexander Duyck <alexander.h.duyck@intel.com>
      CC: Paolo Abeni <pabeni@redhat.com>
      CC: Jiri Benc <jbenc@redhat.com>
      CC: WANG Cong <xiyou.wangcong@gmail.com>
      CC: Roopa Prabhu <roopa@cumulusnetworks.com>
      CC: Pravin B Shelar <pshelar@ovn.org>
      CC: Sabrina Dubroca <sd@queasysnail.net>
      CC: Patrick McHardy <kaber@trash.net>
      CC: Stephen Hemminger <stephen@networkplumber.org>
      CC: Pravin Shelar <pshelar@nicira.com>
      CC: Maxim Krasnyansky <maxk@qti.qualcomm.com>
      Signed-off-by: Jarod Wilson <jarod@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
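The idea behind moving the checks into the core can be sketched as a single inclusive range test driven by per-device limits. The snippet is a simplified model under stated assumptions: the `MTU_LIMITS` table only copies the min/max pairs listed in the commit message above, and `check_mtu` is a hypothetical helper, not the kernel's own function.

```python
# (min_mtu, max_mtu) pairs as listed in the commit message above.
MTU_LIMITS = {
    "veth": (68, 65535),
    "macsec": (0, 65535),
    "ntb_netdev": (0, 65535),
    "8021q": (0, 65535),
}


def check_mtu(driver, new_mtu):
    """Return True when new_mtu lies inside the driver's declared range."""
    min_mtu, max_mtu = MTU_LIMITS[driver]
    return min_mtu <= new_mtu <= max_mtu


print(check_mtu("veth", 1500))    # inside the range
print(check_mtu("veth", 67))      # below min_mtu
print(check_mtu("veth", 65536))   # above max_mtu
```

Drivers such as vxlan and bridge, which the message says retain dynamic checks in their change_mtu function, would still need additional logic beyond this static range.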
  11. 20 Oct, 2016 1 commit
  12. 14 Oct, 2016 3 commits
  13. 13 Oct, 2016 2 commits
  14. 04 Oct, 2016 1 commit
    • net/sched: act_vlan: Push skb->data to mac_header prior calling skb_vlan_*() functions · f39acc84
      Committed by Shmulik Ladkani
      Generic skb_vlan_push/skb_vlan_pop functions don't properly handle the
      case where the input skb data pointer does not point at the mac header:
      
      - They're doing push/pop, but fail to properly unwind data back to its
        original location.
        For example, in the skb_vlan_push case, any subsequent
        'skb_push(skb, skb->mac_len)' calls make the skb->data point 4 bytes
        BEFORE start of frame, leading to bogus frames that may be transmitted.
      
      - They update rcsum per the added/removed 4 bytes tag.
        Alas if data is originally after the vlan/eth headers, then these
        bytes were already pulled out of the csum.
      
      OTOH calling skb_vlan_push/skb_vlan_pop with skb->data at mac_header
      presents no issues.
      
      act_vlan is the only caller of skb_vlan_*() that has skb->data pointing
      at the network header (upon ingress).
      Other callers (ovs, bpf) already adjust skb->data to mac_header.
      
      This patch fixes act_vlan to point to the mac_header prior to calling
      skb_vlan_*() functions, as other callers do.
      Signed-off-by: Shmulik Ladkani <shmulik.ladkani@gmail.com>
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Pravin Shelar <pshelar@ovn.org>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
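The 4-byte offset error can be reproduced with a toy model that tracks only buffer offsets. Everything here is a simplification for illustration: the `Skb` class is hypothetical, and `vlan_push` assumes that inserting a tag moves both the data pointer and the mac header back by 4 bytes and grows mac_len by 4, without touching any packet bytes or checksums.

```python
ETH_HLEN = 14
VLAN_HLEN = 4


class Skb:
    """Offsets only; a hypothetical stand-in for the real packet buffer."""

    def __init__(self, mac_header, data):
        self.mac_header = mac_header
        self.data = data
        self.mac_len = ETH_HLEN


def skb_push(skb, n):
    skb.data -= n


def skb_pull(skb, n):
    skb.data += n


def vlan_push(skb):
    # Toy version: the frame grows by one tag towards the headroom.
    skb_push(skb, VLAN_HLEN)
    skb.mac_header -= VLAN_HLEN
    skb.mac_len += VLAN_HLEN


def act_vlan_buggy(skb):
    vlan_push(skb)                    # data still at the network header


def act_vlan_fixed(skb):
    skb_push(skb, skb.mac_len)        # move data to the mac header first
    vlan_push(skb)
    skb_pull(skb, skb.mac_len)        # restore data to the network header


def offset_after_later_push(action):
    skb = Skb(mac_header=100, data=100 + ETH_HLEN)   # ingress layout
    action(skb)
    skb_push(skb, skb.mac_len)        # a later caller rewinds to the frame
    return skb.data - skb.mac_header  # 0 means data is at frame start


print(offset_after_later_push(act_vlan_buggy))
print(offset_after_later_push(act_vlan_fixed))
```

In this model the buggy path ends 4 bytes before the start of the frame, matching the symptom described in the commit message, while the fixed path lands exactly on the mac header.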
  15. 28 Sep, 2016 1 commit
  16. 27 Sep, 2016 2 commits
    • act_ife: Fix false encoding · c006da0b
      Committed by Yotam Gigi
      On ife encode side, the action stores the different tlvs inside the ife
      header, where each tlv length field should refer to the length of the
      whole tlv (without additional padding) and not just the data length.
      
      On ife decode side, the action iterates over the tlvs in the ife header
      and parses them one by one, where in each iteration the current pointer is
      advanced according to the tlv size.
      
      Before, the encoding stored only the data length inside the tlv, which led
      to false parsing of the ife header. In addition, because the loop counter
      was unsigned, this could lead to an infinite parsing loop.
      
      This fix changes the loop counter to be signed and fixes the encoding to
      take into account the tlv type and size.
      
      Fixes: 28a10c42 ("net sched: fix encoding to use real length")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
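The encode/decode contract described in this commit can be sketched in userspace as follows. The layout is an assumption made for the example (a 2-byte type and a 2-byte length in network byte order, followed by the data and padding to a 4-byte boundary), and the type values 1 and 2 are arbitrary; what the sketch shows is a length field that covers the whole tlv without padding, and a decoder that advances by the padded size using a signed remaining-bytes counter.

```python
import struct

TLV_HDR = struct.Struct("!HH")   # assumed layout: type, length


def align4(n):
    return (n + 3) & ~3


def encode_tlv(tlv_type, data):
    """Length field covers header + data, excluding trailing padding."""
    total = TLV_HDR.size + len(data)
    padding = b"\x00" * (align4(total) - total)
    return TLV_HDR.pack(tlv_type, total) + data + padding


def decode_tlvs(buf):
    """Walk the tlvs, advancing by the padded size of each one."""
    out = []
    offset = 0
    remaining = len(buf)   # signed: may drop below zero and end the loop
    while remaining > 0:
        if remaining < TLV_HDR.size:
            raise ValueError("truncated tlv header")
        tlv_type, total = TLV_HDR.unpack_from(buf, offset)
        if total < TLV_HDR.size or total > remaining:
            raise ValueError("bad tlv length")
        out.append((tlv_type, buf[offset + TLV_HDR.size:offset + total]))
        step = align4(total)
        offset += step
        remaining -= step
    return out


packet = encode_tlv(1, b"\x00\x00\x00\x2a") + encode_tlv(2, b"\xab\xcd")
print(decode_tlvs(packet))
```

If the length field held only the data length, the decoder in this sketch would reject the first tlv as malformed, since a length smaller than the header cannot describe a whole tlv.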
    • act_ife: Fix external mac header on encode · 4b1d488a
      Committed by Yotam Gigi
      On ife encode side, the external mac header is copied from the original
      packet and may be overridden if the user requests. Before, the mac header
      copy was done from a memory region that might no longer be accessible, as
      skb_cow_head might free it and copy the packet. This led to random values
      in the external mac header whenever the values were not set by the user.
      
      This fix takes the internal mac header from the packet, after the call to
      skb_cow_head.
      
      Fixes: ef6980b6 ("net sched: introduce IFE action")
      Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: Yotam Gigi <yotamg@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 23 Sep, 2016 4 commits
  18. 22 Sep, 2016 6 commits
  19. 21 Sep, 2016 2 commits