1. 08 7月, 2018 11 次提交
  2. 07 7月, 2018 3 次提交
  3. 04 7月, 2018 6 次提交
    • J
      net/sched: Make etf report drops on error_queue · 4b15c707
      Jesus Sanchez-Palencia 提交于
      Use the socket error queue for reporting dropped packets if the
      socket has enabled that feature through the SO_TXTIME API.
      
      Packets are dropped either on enqueue() if they aren't accepted by the
      qdisc or on dequeue() if the system misses their deadline. Those are
      reported as different errors so applications can react accordingly.
      
      Userspace can retrieve the errors through the socket error queue and the
      corresponding cmsg interfaces. A struct sock_extended_err* is used for
      returning the error data, and the packet's timestamp can be retrieved by
      adding both ee_data and ee_info fields as e.g.:
      
          ((__u64) serr->ee_data << 32) + serr->ee_info
      
      This feature is disabled by default and must be explicitly enabled by
      applications. Enabling it can bring some overhead for the Tx cycles
      of the application.
      Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      4b15c707
    • J
      net/sched: Add HW offloading capability to ETF · 88cab771
      Jesus Sanchez-Palencia 提交于
      Add infra so etf qdisc supports HW offload of time-based transmission.
      
      For hw offload, the time sorted list is still used, so packets are
      dequeued always in order of txtime.
      
      Example:
      
      $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
                 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
      
      $ tc qdisc add dev enp2s0 parent 100:1 etf offload delta 100000 \
      	   clockid CLOCK_REALTIME
      
      In this example, the Qdisc will use HW offload for the control of the
      transmission time through the network adapter. The hrtimer used for
      packets scheduling inside the qdisc will use the clockid CLOCK_REALTIME
      as reference and packets leave the Qdisc "delta" (100000) nanoseconds
      before their transmission time. Because this will be using HW offload and
      since dynamic clocks are not supported by the hrtimer, the system clock
      and the PHC clock must be synchronized for this mode to behave as
      expected.
      Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      88cab771
    • V
      net/sched: Introduce the ETF Qdisc · 25db26a9
      Vinicius Costa Gomes 提交于
      The ETF (Earliest TxTime First) qdisc uses the information added
      earlier in this series (the socket option SO_TXTIME and the new
      role of sk_buff->tstamp) to schedule packets transmission based
      on absolute time.
      
      For some workloads, just bandwidth enforcement is not enough, and
      precise control of the transmission of packets is necessary.
      
      Example:
      
      $ tc qdisc replace dev enp2s0 parent root handle 100 mqprio num_tc 3 \
                 map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 queues 1@0 1@1 2@2 hw 0
      
      $ tc qdisc add dev enp2s0 parent 100:1 etf delta 100000 \
                 clockid CLOCK_TAI
      
      In this example, the Qdisc will provide SW best-effort for the control
      of the transmission time to the network adapter, the time stamp in the
      socket will be in reference to the clockid CLOCK_TAI and packets
      will leave the qdisc "delta" (100000) nanoseconds before its transmission
      time.
      
      The ETF qdisc will buffer packets sorted by their txtime. It will drop
      packets on enqueue() if their skbuff clockid does not match the clock
      reference of the Qdisc. Moreover, on dequeue(), a packet will be dropped
      if it expires while being enqueued.
      
      The qdisc also supports the SO_TXTIME deadline mode. For this mode, it
      will dequeue a packet as soon as possible and change the skb timestamp
      to 'now' during etf_dequeue().
      
      Note that both the qdisc's and the SO_TXTIME ABIs allow for a clockid
      to be configured, but it's been decided that usage of CLOCK_TAI should
      be enforced until we decide to allow for other clockids to be used.
      The rationale here is that PTP times are usually in the TAI scale, thus
      no other clocks should be necessary. For now, the qdisc will return
      EINVAL if any clocks other than CLOCK_TAI are used.
      Signed-off-by: NJesus Sanchez-Palencia <jesus.sanchez-palencia@intel.com>
      Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      25db26a9
    • V
      net/sched: Allow creating a Qdisc watchdog with other clocks · 860b642b
      Vinicius Costa Gomes 提交于
      This adds 'qdisc_watchdog_init_clockid()' that allows a clockid to be
      passed, this allows other time references to be used when scheduling
      the Qdisc to run.
      Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      860b642b
    • W
      net: sched: act_pedit: fix possible memory leak in tcf_pedit_init() · 30e99ed6
      Wei Yongjun 提交于
      'keys_ex' is malloced by tcf_pedit_keys_ex_parse() in tcf_pedit_init()
      but not all of the error handle path free it, this may cause memory
      leak. This patch fix it.
      
      Fixes: 71d0ed70 ("net/act_pedit: Support using offset relative to the conventional network headers")
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Acked-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      30e99ed6
    • Q
      net:sched: add action inheritdsfield to skbedit · e7e3728b
      Qiaobin Fu 提交于
      The new action inheritdsfield copies the field DS of
      IPv4 and IPv6 packets into skb->priority. This enables
      later classification of packets based on the DS field.
      
      v5:
      *Update the drop counter for TC_ACT_SHOT
      
      v4:
      *Not allow setting flags other than the expected ones.
      
      *Allow dumping the pure flags.
      
      v3:
      *Use optional flags, so that it won't break old versions of tc.
      
      *Allow users to set both SKBEDIT_F_PRIORITY and SKBEDIT_F_INHERITDSFIELD flags.
      
      v2:
      *Fix the style issue
      
      *Move the code from skbmod to skbedit
      
      Original idea by Jamal Hadi Salim <jhs@mojatatu.com>
      Signed-off-by: NQiaobin Fu <qiaobinf@bu.edu>
      Reviewed-by: NMichel Machado <michel@digirati.com.br>
      Acked-by: NJamal Hadi Salim <jhs@mojatatu.com>
      Reviewed-by: NMarcelo Ricardo Leitner <marcelo.leitner@gmail.com>
      Acked-by: NDavide Caratti <dcaratti@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e7e3728b
  4. 02 7月, 2018 1 次提交
  5. 29 6月, 2018 3 次提交
  6. 28 6月, 2018 6 次提交
  7. 26 6月, 2018 6 次提交
  8. 24 6月, 2018 1 次提交
  9. 23 6月, 2018 1 次提交
  10. 22 6月, 2018 1 次提交
    • P
      cls_flower: fix use after free in flower S/W path · 44a5cd43
      Paolo Abeni 提交于
      If flower filter is created without the skip_sw flag, fl_mask_put()
      can race with fl_classify() and we can destroy the mask rhashtable
      while a lookup operation is accessing it.
      
       BUG: unable to handle kernel paging request at 00000000000911d1
       PGD 0 P4D 0
       SMP PTI
       CPU: 3 PID: 5582 Comm: vhost-5541 Not tainted 4.18.0-rc1.vanilla+ #1950
       Hardware name: Dell Inc. PowerEdge R730/072T6D, BIOS 2.1.7 06/16/2016
       RIP: 0010:rht_bucket_nested+0x20/0x60
       Code: 31 c8 c1 c1 18 29 c8 c3 66 90 8b 4f 04 ba 01 00 00 00 8b 07 48 8b bf 80 00 00 0
       RSP: 0018:ffffafc5cfbb7a48 EFLAGS: 00010206
       RAX: 0000000000001978 RBX: ffff9f12dff88a00 RCX: 00000000ffff9f12
       RDX: 00000000000911d1 RSI: 0000000000000148 RDI: 0000000000000001
       RBP: ffff9f12dff88a00 R08: 000000005f1cc119 R09: 00000000a715fae2
       R10: ffffafc5cfbb7aa8 R11: ffff9f1cb4be804e R12: ffff9f1265e13000
       R13: 0000000000000000 R14: ffffafc5cfbb7b48 R15: ffff9f12dff88b68
       FS:  0000000000000000(0000) GS:ffff9f1d3f0c0000(0000) knlGS:0000000000000000
       CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
       CR2: 00000000000911d1 CR3: 0000001575a94006 CR4: 00000000001626e0
       Call Trace:
        fl_lookup+0x134/0x140 [cls_flower]
        fl_classify+0xf3/0x180 [cls_flower]
        tcf_classify+0x78/0x150
        __netif_receive_skb_core+0x69e/0xa50
        netif_receive_skb_internal+0x42/0xf0
        tun_get_user+0xdd5/0xfd0 [tun]
        tun_sendmsg+0x52/0x70 [tun]
        handle_tx+0x2b3/0x5f0 [vhost_net]
        vhost_worker+0xab/0x100 [vhost]
        kthread+0xf8/0x130
        ret_from_fork+0x35/0x40
       Modules linked in: act_mirred act_gact cls_flower vhost_net vhost tap sch_ingress
       CR2: 00000000000911d1
      
      Fix the above waiting for a RCU grace period before destroying the
      rhashtable: we need to use tcf_queue_work(), as rhashtable_destroy()
      must run in process context, as pointed out by Cong Wang.
      
      v1 -> v2: use tcf_queue_work to run rhashtable_destroy().
      
      Fixes: 05cd271f ("cls_flower: Support multiple masks per priority")
      Signed-off-by: NPaolo Abeni <pabeni@redhat.com>
      Acked-by: NJiri Pirko <jiri@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      44a5cd43
  11. 20 6月, 2018 1 次提交