1. 20 12月, 2017 1 次提交
  2. 16 12月, 2017 4 次提交
  3. 14 12月, 2017 2 次提交
  4. 09 12月, 2017 13 次提交
    • J
      net: sched: fix use-after-free in tcf_block_put_ext · df45bf84
      Jiri Pirko 提交于
      Since the block is freed with last chain being put, once we reach the
      end of iteration of list_for_each_entry_safe, the block may be
      already freed. I'm hitting this only by creating and deleting clsact:
      
      [  202.171952] ==================================================================
      [  202.180182] BUG: KASAN: use-after-free in tcf_block_put_ext+0x240/0x390
      [  202.187590] Read of size 8 at addr ffff880225539a80 by task tc/796
      [  202.194508]
      [  202.196185] CPU: 0 PID: 796 Comm: tc Not tainted 4.15.0-rc2jiri+ #5
      [  202.203200] Hardware name: Mellanox Technologies Ltd. "MSN2100-CB2F"/"SA001017", BIOS 5.6.5 06/07/2016
      [  202.213613] Call Trace:
      [  202.216369]  dump_stack+0xda/0x169
      [  202.220192]  ? dma_virt_map_sg+0x147/0x147
      [  202.224790]  ? show_regs_print_info+0x54/0x54
      [  202.229691]  ? tcf_chain_destroy+0x1dc/0x250
      [  202.234494]  print_address_description+0x83/0x3d0
      [  202.239781]  ? tcf_block_put_ext+0x240/0x390
      [  202.244575]  kasan_report+0x1ba/0x460
      [  202.248707]  ? tcf_block_put_ext+0x240/0x390
      [  202.253518]  tcf_block_put_ext+0x240/0x390
      [  202.258117]  ? tcf_chain_flush+0x290/0x290
      [  202.262708]  ? qdisc_hash_del+0x82/0x1a0
      [  202.267111]  ? qdisc_hash_add+0x50/0x50
      [  202.271411]  ? __lock_is_held+0x5f/0x1a0
      [  202.275843]  clsact_destroy+0x3d/0x80 [sch_ingress]
      [  202.281323]  qdisc_destroy+0xcb/0x240
      [  202.285445]  qdisc_graft+0x216/0x7b0
      [  202.289497]  tc_get_qdisc+0x260/0x560
      
      Fix this by holding the block also by chain 0 and put chain 0
      explicitly, out of the list_for_each_entry_safe loop at the very
      end of tcf_block_put_ext.
      
      Fixes: efbf7897 ("net_sched: get rid of rcu_barrier() in tcf_block_put_ext()")
      Signed-off-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      df45bf84
    • J
      net: sched: pfifo_fast use skb_array · c5ad119f
      John Fastabend 提交于
      This converts the pfifo_fast qdisc to use the skb_array data structure
      and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
      the lockless bit that can be a child of a qdisc requiring locking. So
      we add logic to clear the lock bit on initialization in these cases when
      the qdisc graft operation occurs.
      
      This also removes the logic used to pick the next band to dequeue from
      and instead just checks a per priority array for packets from top priority
      to lowest. This might need to be a bit more clever but seems to work
      for now.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c5ad119f
    • J
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mqprio · ce679e8d
      John Fastabend 提交于
      The sch_mqprio qdisc creates a sub-qdisc per tx queue which are then
      called independently for enqueue and dequeue operations. However
      statistics are aggregated and pushed up to the "master" qdisc.
      
      This patch adds support for any of the sub-qdiscs to be per cpu
      statistic qdiscs. To handle this case add a check when calculating
      stats and aggregate the per cpu stats if needed.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ce679e8d
    • J
      net: sched: add support for TCQ_F_NOLOCK subqueues to sch_mq · b01ac095
      John Fastabend 提交于
      The sch_mq qdisc creates a sub-qdisc per tx queue which are then
      called independently for enqueue and dequeue operations. However
      statistics are aggregated and pushed up to the "master" qdisc.
      
      This patch adds support for any of the sub-qdiscs to be per cpu
      statistic qdiscs. To handle this case add a check when calculating
      stats and aggregate the per cpu stats if needed.
      
      Also exports __gnet_stats_copy_queue() to use as a helper function.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      b01ac095
    • J
      net: sched: helpers to sum qlen and qlen for per cpu logic · 7e66016f
      John Fastabend 提交于
      Add qdisc qlen helper routines for lockless qdiscs to use.
      
      The qdisc qlen is no longer used in the hotpath but it is reported
      via stats query on the qdisc so it still needs to be tracked. This
      adds the per cpu operations needed along with a helper to return
      the summation of per cpu stats.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7e66016f
    • J
      net: sched: check for frozen queue before skb_bad_txq check · fd8e8d1a
      John Fastabend 提交于
      I can not think of any reason to pull the bad txq skb off the qdisc if
      the txq we plan to send this on is still frozen. So check for frozen
      queue first and abort before dequeuing either skb_bad_txq skb or
      normal qdisc dequeue() skb.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fd8e8d1a
    • J
      net: sched: use skb list for skb_bad_tx · 70e57d5e
      John Fastabend 提交于
      Similar to how gso is handled use skb list for skb_bad_tx this is
      required with lockless qdiscs because we may have multiple cores
      attempting to push skbs into skb_bad_tx concurrently
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      70e57d5e
    • J
      net: sched: drop qdisc_reset from dev_graft_qdisc · 7bbde83b
      John Fastabend 提交于
      In qdisc_graft_qdisc a "new" qdisc is attached and the 'qdisc_destroy'
      operation is called on the old qdisc. The destroy operation will wait
      a rcu grace period and call qdisc_rcu_free(). At which point
      gso_cpu_skb is free'd along with all stats so no need to zero stats
      and gso_cpu_skb from the graft operation itself.
      
      Further after dropping the qdisc locks we can not continue to call
      qdisc_reset before waiting an rcu grace period so that the qdisc is
      detached from all cpus. By removing the qdisc_reset() here we get
      the correct property of waiting an rcu grace period and letting the
      qdisc_destroy operation clean up the qdisc correctly.
      
      Note, a refcnt greater than 1 would cause the destroy operation to
      be aborted however if this ever happened the reference to the qdisc
      would be lost and we would have a memory leak.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7bbde83b
    • J
      net: sched: explicit locking in gso_cpu fallback · a53851e2
      John Fastabend 提交于
      This work is preparing the qdisc layer to support egress lockless
      qdiscs. If we are running the egress qdisc lockless in the case we
      overrun the netdev, for whatever reason, the netdev returns a busy
      error code and the skb is parked on the gso_skb pointer. With many
      cores all hitting this case at once its possible to have multiple
      sk_buffs here so we turn gso_skb into a queue.
      
      This should be the edge case and if we see this frequently then
      the netdev/qdisc layer needs to back off.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a53851e2
    • J
      net: sched: a dflt qdisc may be used with per cpu stats · d59f5ffa
      John Fastabend 提交于
      Enable dflt qdisc support for per cpu stats before this patch a dflt
      qdisc was required to use the global statistics qstats and bstats.
      
      This adds a static flags field to qdisc_ops that is propagated
      into qdisc->flags in qdisc allocate call. This allows the allocation
      block to completely allocate the qdisc object so we don't have
      dangling allocations after qdisc init.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d59f5ffa
    • J
      net: sched: remove remaining uses for qdisc_qlen in xmit path · 29b86cda
      John Fastabend 提交于
      sch_direct_xmit() uses qdisc_qlen as a return value but all call sites
      of the routine only check if it is zero or not. Simplify the logic so
      that we don't need to return an actual queue length value.
      
      This introduces a case now where sch_direct_xmit would have returned
      a qlen of zero but now it returns true. However in this case all
      call sites of sch_direct_xmit will implement a dequeue() and get
      a null skb and abort. This trades tracking qlen in the hotpath for
      an extra dequeue operation. Overall this seems to be good for
      performance.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      29b86cda
    • J
      net: sched: allow qdiscs to handle locking · 6b3ba914
      John Fastabend 提交于
      This patch adds a flag for queueing disciplines to indicate the stack
      does not need to use the qdisc lock to protect operations. This can
      be used to build lockless scheduling algorithms and improving
      performance.
      
      The flag is checked in the tx path and the qdisc lock is only taken
      if it is not set. For now use a conditional if statement. Later we
      could be more aggressive if it proves worthwhile and use a static key
      or wrap this in a likely().
      
      Also the lockless case drops the TCQ_F_CAN_BYPASS logic. The reason
      for this is synchronizing a qlen counter across threads proves to
      cost more than doing the enqueue/dequeue operations when tested with
      pktgen.
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6b3ba914
    • J
      net: sched: cleanup qdisc_run and __qdisc_run semantics · 6c148184
      John Fastabend 提交于
      Currently __qdisc_run calls qdisc_run_end() but does not call
      qdisc_run_begin(). This makes it hard to track pairs of
      qdisc_run_{begin,end} across function calls.
      
      To simplify reading these code paths this patch moves begin/end calls
      into qdisc_run().
      Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6c148184
  5. 07 12月, 2017 3 次提交
  6. 06 12月, 2017 6 次提交
  7. 30 11月, 2017 1 次提交
  8. 29 11月, 2017 3 次提交
  9. 25 11月, 2017 1 次提交
    • R
      net: sched: crash on blocks with goto chain action · a60b3f51
      Roman Kapl 提交于
      tcf_block_put_ext has assumed that all filters (and thus their goto
      actions) are destroyed in RCU callback and thus can not race with our
      list iteration. However, that is not true during netns cleanup (see
      tcf_exts_get_net comment).
      
      Prevent the user after free by holding all chains (except 0, that one is
      already held). foreach_safe is not enough in this case.
      
      To reproduce, run the following in a netns and then delete the ns:
          ip link add dtest type dummy
          tc qdisc add dev dtest ingress
          tc filter add dev dtest chain 1 parent ffff: handle 1 prio 1 flower action goto chain 2
      
      Fixes: 822e86d9 ("net_sched: remove tcf_block_put_deferred()")
      Signed-off-by: NRoman Kapl <code@rkapl.cz>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a60b3f51
  10. 24 11月, 2017 2 次提交
    • W
      net: accept UFO datagrams from tuntap and packet · 0c19f846
      Willem de Bruijn 提交于
      Tuntap and similar devices can inject GSO packets. Accept type
      VIRTIO_NET_HDR_GSO_UDP, even though not generating UFO natively.
      
      Processes are expected to use feature negotiation such as TUNSETOFFLOAD
      to detect supported offload types and refrain from injecting other
      packets. This process breaks down with live migration: guest kernels
      do not renegotiate flags, so destination hosts need to expose all
      features that the source host does.
      
      Partially revert the UFO removal from 182e0b6b~1..d9d30adf.
      This patch introduces nearly(*) no new code to simplify verification.
      It brings back verbatim tuntap UFO negotiation, VIRTIO_NET_HDR_GSO_UDP
      insertion and software UFO segmentation.
      
      It does not reinstate protocol stack support, hardware offload
      (NETIF_F_UFO), SKB_GSO_UDP tunneling in SKB_GSO_SOFTWARE or reception
      of VIRTIO_NET_HDR_GSO_UDP packets in tuntap.
      
      To support SKB_GSO_UDP reappearing in the stack, also reinstate
      logic in act_csum and openvswitch. Achieve equivalence with v4.13 HEAD
      by squashing in commit 93991221 ("net: skb_needs_check() removes
      CHECKSUM_UNNECESSARY check for tx.") and reverting commit 8d63bee6
      ("net: avoid skb_warn_bad_offload false positives on UFO").
      
      (*) To avoid having to bring back skb_shinfo(skb)->ip6_frag_id,
      ipv6_proxy_select_ident is changed to return a __be32 and this is
      assigned directly to the frag_hdr. Also, SKB_GSO_UDP is inserted
      at the end of the enum to minimize code churn.
      
      Tested
        Booted a v4.13 guest kernel with QEMU. On a host kernel before this
        patch `ethtool -k eth0` shows UFO disabled. After the patch, it is
        enabled, same as on a v4.13 host kernel.
      
        A UFO packet sent from the guest appears on the tap device:
          host:
            nc -l -p -u 8000 &
            tcpdump -n -i tap0
      
          guest:
            dd if=/dev/zero of=payload.txt bs=1 count=2000
            nc -u 192.16.1.1 8000 < payload.txt
      
        Direct tap to tap transmission of VIRTIO_NET_HDR_GSO_UDP succeeds,
        packets arriving fragmented:
      
          ./with_tap_pair.sh ./tap_send_ufo tap0 tap1
          (from https://github.com/wdebruij/kerneltools/tree/master/tests)
      
      Changes
        v1 -> v2
          - simplified set_offload change (review comment)
          - documented test procedure
      
      Link: http://lkml.kernel.org/r/<CAF=yD-LuUeDuL9YWPJD9ykOZ0QCjNeznPDr6whqZ9NGMNF12Mw@mail.gmail.com>
      Fixes: fb652fdf ("macvlan/macvtap: Remove NETIF_F_UFO advertisement.")
      Reported-by: NMichal Kubecek <mkubecek@suse.cz>
      Signed-off-by: NWillem de Bruijn <willemb@google.com>
      Acked-by: NJason Wang <jasowang@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0c19f846
    • R
      net: sched: fix crash when deleting secondary chains · d7aa04a5
      Roman Kapl 提交于
      If you flush (delete) a filter chain other than chain 0 (such as when
      deleting the device), the kernel may run into a use-after-free. The
      chain refcount must not be decremented unless we are sure we are done
      with the chain.
      
      To reproduce the bug, run:
          ip link add dtest type dummy
          tc qdisc add dev dtest ingress
          tc filter add dev dtest chain 1  parent ffff: flower
          ip link del dtest
      
      Introduced in: commit f93e1cdc ("net/sched: fix filter flushing"),
      but unless you have KAsan or luck, you won't notice it until
      commit 0dadc117 ("cls_flower: use tcf_exts_get_net() before call_rcu()")
      
      Fixes: f93e1cdc ("net/sched: fix filter flushing")
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NRoman Kapl <code@rkapl.cz>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      d7aa04a5
  11. 21 11月, 2017 1 次提交
  12. 15 11月, 2017 2 次提交
  13. 13 11月, 2017 1 次提交
    • A
      net/sched/sch_red.c: work around gcc-4.4.4 anon union initializer issue · ee9d3429
      Andrew Morton 提交于
      gcc-4.4.4 (at lest) has issues with initializers and anonymous unions:
      
      net/sched/sch_red.c: In function 'red_dump_offload':
      net/sched/sch_red.c:282: error: unknown field 'stats' specified in initializer
      net/sched/sch_red.c:282: warning: initialization makes integer from pointer without a cast
      net/sched/sch_red.c:283: error: unknown field 'stats' specified in initializer
      net/sched/sch_red.c:283: warning: initialization makes integer from pointer without a cast
      net/sched/sch_red.c: In function 'red_dump_stats':
      net/sched/sch_red.c:352: error: unknown field 'xstats' specified in initializer
      net/sched/sch_red.c:352: warning: initialization makes integer from pointer without a cast
      
      Work around this.
      
      Fixes: 602f3baf ("net_sch: red: Add offload ability to RED qdisc")
      Cc: Nogah Frankel <nogahf@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Simon Horman <simon.horman@netronome.com>
      Cc: David S. Miller <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      ee9d3429