1. 09 Oct, 2018 13 commits
  2. 08 Oct, 2018 1 commit
    • net: sched: pie: fix coding style issues · ac4a02c5
      Leslie Monis committed
      Fix 5 warnings and 14 checks issued by checkpatch.pl:
      
      CHECK: Logical continuations should be on the previous line
      +	if ((q->vars.qdelay < q->params.target / 2)
      +	    && (q->vars.prob < MAX_PROB / 5))
      
      WARNING: line over 80 characters
      +		q->params.tupdate = usecs_to_jiffies(nla_get_u32(tb[TCA_PIE_TUPDATE]));
      
      CHECK: Blank lines aren't necessary after an open brace '{'
      +{
      +
      
      CHECK: braces {} should be used on all arms of this statement
      +			if (qlen < QUEUE_THRESHOLD)
      [...]
      +			else {
      [...]
      
      CHECK: Unbalanced braces around else statement
      +			else {
      
      CHECK: No space is necessary after a cast
      +	if (delta > (s32) (MAX_PROB / (100 / 2)) &&
      
      CHECK: Unnecessary parentheses around 'qdelay == 0'
      +	if ((qdelay == 0) && (qdelay_old == 0) && update_prob)
      
      CHECK: Unnecessary parentheses around 'qdelay_old == 0'
      +	if ((qdelay == 0) && (qdelay_old == 0) && update_prob)
      
      CHECK: Unnecessary parentheses around 'q->vars.prob == 0'
      +	if ((q->vars.qdelay < q->params.target / 2) &&
      +	    (q->vars.qdelay_old < q->params.target / 2) &&
      +	    (q->vars.prob == 0) &&
      +	    (q->vars.avg_dq_rate > 0))
      
      CHECK: Unnecessary parentheses around 'q->vars.avg_dq_rate > 0'
      +	if ((q->vars.qdelay < q->params.target / 2) &&
      +	    (q->vars.qdelay_old < q->params.target / 2) &&
      +	    (q->vars.prob == 0) &&
      +	    (q->vars.avg_dq_rate > 0))
      
      CHECK: Blank lines aren't necessary before a close brace '}'
      +
      +}
      
      CHECK: Comparison to NULL could be written "!opts"
      +	if (opts == NULL)
      
      CHECK: No space is necessary after a cast
      +			((u32) PSCHED_TICKS2NS(q->params.target)) /
      
      WARNING: line over 80 characters
      +	    nla_put_u32(skb, TCA_PIE_TUPDATE, jiffies_to_usecs(q->params.tupdate)) ||
      
      CHECK: Blank lines aren't necessary before a close brace '}'
      +
      +}
      
      CHECK: No space is necessary after a cast
      +		.delay		= ((u32) PSCHED_TICKS2NS(q->vars.qdelay)) /
      
      WARNING: Missing a blank line after declarations
      +	struct sk_buff *skb;
      +	skb = qdisc_dequeue_head(sch);
      
      WARNING: Missing a blank line after declarations
      +	struct pie_sched_data *q = qdisc_priv(sch);
      +	qdisc_reset_queue(sch);
      
      WARNING: Missing a blank line after declarations
      +	struct pie_sched_data *q = qdisc_priv(sch);
      +	q->params.tupdate = 0;
      Signed-off-by: Leslie Monis <lesliemonis@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  3. 06 Oct, 2018 8 commits
    • net/ncsi: Add NCSI OEM command support · fb4ee675
      Vijay Khemka committed
      This patch adds OEM command and response handling. It also defines the
      OEM command and response structures as per the NCSI specification, along
      with their handlers.
      
      ncsi_cmd_handler_oem: This is a generic command request handler for OEM
      commands
      ncsi_rsp_handler_oem: This is a generic response handler for OEM commands
      Signed-off-by: Vijay Khemka <vijaykhemka@fb.com>
      Reviewed-by: Justin Lee <justin.lee1@dell.com>
      Reviewed-by: Samuel Mendoza-Jonas <sam@mendozajonas.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: take rcu lock in rawv6_send_hdrinc() · a688caa3
      Wei Wang committed
      In rawv6_send_hdrinc(), in order to avoid an extra dst_hold(), we
      directly assign the dst to skb and set the passed-in dst to NULL to
      avoid a double free.
      However, in the error case, we free skb and then do the stats update
      with the dst pointer passed in. This causes a use-after-free on the dst.
      Fix it by taking the rcu read lock right before dst could get released,
      to make sure dst does not get freed until the stats update is done.
      Note: we don't have this issue in ipv4 because dst is not used for the
      stats update in v4.
      
      Syzkaller reported the following crash:
      BUG: KASAN: use-after-free in rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
      BUG: KASAN: use-after-free in rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
      Read of size 8 at addr ffff8801d95ba730 by task syz-executor0/32088
      
      CPU: 1 PID: 32088 Comm: syz-executor0 Not tainted 4.19.0-rc2+ #93
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
      Call Trace:
       __dump_stack lib/dump_stack.c:77 [inline]
       dump_stack+0x1c4/0x2b4 lib/dump_stack.c:113
       print_address_description.cold.8+0x9/0x1ff mm/kasan/report.c:256
       kasan_report_error mm/kasan/report.c:354 [inline]
       kasan_report.cold.9+0x242/0x309 mm/kasan/report.c:412
       __asan_report_load8_noabort+0x14/0x20 mm/kasan/report.c:433
       rawv6_send_hdrinc net/ipv6/raw.c:692 [inline]
       rawv6_sendmsg+0x4421/0x4630 net/ipv6/raw.c:921
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      RIP: 0033:0x457099
      Code: fd b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b4 fb ff c3 66 2e 0f 1f 84 00 00 00 00
      RSP: 002b:00007f83756edc78 EFLAGS: 00000246 ORIG_RAX: 000000000000002e
      RAX: ffffffffffffffda RBX: 00007f83756ee6d4 RCX: 0000000000457099
      RDX: 0000000000000000 RSI: 0000000020003840 RDI: 0000000000000004
      RBP: 00000000009300a0 R08: 0000000000000000 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000246 R12: 00000000ffffffff
      R13: 00000000004d4b30 R14: 00000000004c90b1 R15: 0000000000000000
      
      Allocated by task 32088:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       kasan_kmalloc+0xc7/0xe0 mm/kasan/kasan.c:553
       kasan_slab_alloc+0x12/0x20 mm/kasan/kasan.c:490
       kmem_cache_alloc+0x12e/0x730 mm/slab.c:3554
       dst_alloc+0xbb/0x1d0 net/core/dst.c:105
       ip6_dst_alloc+0x35/0xa0 net/ipv6/route.c:353
       ip6_rt_cache_alloc+0x247/0x7b0 net/ipv6/route.c:1186
       ip6_pol_route+0x8f8/0xd90 net/ipv6/route.c:1895
       ip6_pol_route_output+0x54/0x70 net/ipv6/route.c:2093
       fib6_rule_lookup+0x277/0x860 net/ipv6/fib6_rules.c:122
       ip6_route_output_flags+0x2c5/0x350 net/ipv6/route.c:2121
       ip6_route_output include/net/ip6_route.h:88 [inline]
       ip6_dst_lookup_tail+0xe27/0x1d60 net/ipv6/ip6_output.c:951
       ip6_dst_lookup_flow+0xc8/0x270 net/ipv6/ip6_output.c:1079
       rawv6_sendmsg+0x12d9/0x4630 net/ipv6/raw.c:905
       inet_sendmsg+0x1a1/0x690 net/ipv4/af_inet.c:798
       sock_sendmsg_nosec net/socket.c:621 [inline]
       sock_sendmsg+0xd5/0x120 net/socket.c:631
       ___sys_sendmsg+0x7fd/0x930 net/socket.c:2114
       __sys_sendmsg+0x11d/0x280 net/socket.c:2152
       __do_sys_sendmsg net/socket.c:2161 [inline]
       __se_sys_sendmsg net/socket.c:2159 [inline]
       __x64_sys_sendmsg+0x78/0xb0 net/socket.c:2159
       do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
       entry_SYSCALL_64_after_hwframe+0x49/0xbe
      
      Freed by task 5356:
       save_stack+0x43/0xd0 mm/kasan/kasan.c:448
       set_track mm/kasan/kasan.c:460 [inline]
       __kasan_slab_free+0x102/0x150 mm/kasan/kasan.c:521
       kasan_slab_free+0xe/0x10 mm/kasan/kasan.c:528
       __cache_free mm/slab.c:3498 [inline]
       kmem_cache_free+0x83/0x290 mm/slab.c:3756
       dst_destroy+0x267/0x3c0 net/core/dst.c:141
       dst_destroy_rcu+0x16/0x19 net/core/dst.c:154
       __rcu_reclaim kernel/rcu/rcu.h:236 [inline]
       rcu_do_batch kernel/rcu/tree.c:2576 [inline]
       invoke_rcu_callbacks kernel/rcu/tree.c:2880 [inline]
       __rcu_process_callbacks kernel/rcu/tree.c:2847 [inline]
       rcu_process_callbacks+0xf23/0x2670 kernel/rcu/tree.c:2864
       __do_softirq+0x30b/0xad8 kernel/softirq.c:292
      
      Fixes: 1789a640 ("raw: avoid two atomics in xmit")
      Signed-off-by: Wei Wang <weiwan@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • socket: Tighten no-error check in bind() · 068b88cc
      Jakub Sitnicki committed
      move_addr_to_kernel() returns only negative values on error, or zero on
      success. Rewrite the error check to an idiomatic form to avoid confusing
      the reader.
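
As a rough illustration of the idiom described above, the following user-space sketch uses a made-up helper, copy_addr(), that follows the same return convention (zero on success, a negative errno value on failure, never a positive value). Neither copy_addr() nor bind_like() is kernel code; both names are assumptions for this example only.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a helper with the kernel convention:
 * returns 0 on success or a negative errno value, never a positive one. */
static int copy_addr(void *dst, size_t dst_len, const void *src, int src_len)
{
	if (src_len < 0 || (size_t)src_len > dst_len)
		return -EINVAL;
	if (src_len == 0)
		return 0;
	if (!src)
		return -EFAULT;
	memcpy(dst, src, (size_t)src_len);
	return 0;
}

/* Hypothetical caller: because the helper never returns a positive value,
 * "if (err)" covers every failure and reads more clearly than testing
 * "err >= 0" for success. */
static int bind_like(const void *uaddr, int addrlen)
{
	char kaddr[128];
	int err;

	err = copy_addr(kaddr, sizeof(kaddr), uaddr, addrlen);
	if (err)
		return err;

	/* ... continue with the validated copy in kaddr ... */
	return 0;
}
```

With this convention a reader does not have to wonder whether a positive return value carries some extra meaning.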
      Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: sched: Add policy validation for tc attributes · 8b4c3cdd
      David Ahern committed
      A number of TC attributes are processed without proper validation
      (e.g., length checks). Add a tca policy for all input attributes and use
      when invoking nlmsg_parse.
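
The idea of a per-attribute policy can be sketched in user space as a table indexed by attribute type and consulted before any payload is read. The attribute numbers, the attr_policy structure and validate_attr() below are all hypothetical; they are not the kernel's TCA_* values or its nla_policy API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical attribute types, for illustration only. */
enum { ATTR_UNSPEC, ATTR_KIND, ATTR_CHAIN, ATTR_MAX = ATTR_CHAIN };

enum attr_kind { KIND_UNSPEC, KIND_U32, KIND_STRING };

struct attr_policy {
	enum attr_kind kind;
	size_t max_len;		/* only consulted for strings */
};

/* One entry per known attribute; unlisted types default to KIND_UNSPEC. */
static const struct attr_policy policy[ATTR_MAX + 1] = {
	[ATTR_KIND]  = { KIND_STRING, 16 },
	[ATTR_CHAIN] = { KIND_U32, 0 },
};

/* Returns 0 if a payload of `len` bytes is acceptable for `type`,
 * -1 otherwise. Types outside the table are accepted here on the
 * assumption that no code will ever read them. */
static int validate_attr(int type, size_t len)
{
	if (type <= 0 || type > ATTR_MAX)
		return 0;

	switch (policy[type].kind) {
	case KIND_U32:
		return len >= sizeof(uint32_t) ? 0 : -1;
	case KIND_STRING:
		return (len >= 1 && len <= policy[type].max_len) ? 0 : -1;
	default:
		return 0;
	}
}
```

Without such a check, a two-byte payload for a u32 attribute would be read as four bytes by whatever code consumes it.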
      
      The 2 Fixes tags below cover the latest additions. The other attributes
      are a string (KIND), a nested attribute (OPTIONS, which does seem to be
      validated in most cases), used for dumps only, or a flag.
      
      Fixes: 5bc17018 ("net: sched: introduce multichain support for filters")
      Fixes: d47a6b0e ("net: sched: introduce ingress/egress block index attributes for qdisc")
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: fix rtnl_fdb_dump() for ndmsg header · bd961c9b
      Mauricio Faria de Oliveira committed
      Currently, rtnl_fdb_dump() assumes the family header is 'struct ifinfomsg',
      which is not always true -- 'struct ndmsg' is used by iproute2 ('ip neigh').
      
      The problem is, the function bails out early if nlmsg_parse() fails, which
      does occur for iproute2 usage of 'struct ndmsg' because the payload length
      is shorter than the family header alone (as 'struct ifinfomsg' is assumed).
      
      This breaks backward compatibility with userspace -- nothing is sent back.
      
      Some examples with iproute2 and netlink library for go [1]:
      
       1) $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
            This one works, as it uses 'struct ifinfomsg'.
      
            fdb_show() @ iproute2/bridge/fdb.c
              """
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
              ...
              if (rtnl_dump_request(&rth, RTM_GETNEIGH, [...]
              """
      
       2) $ ip --family bridge neigh
          RTNETLINK answers: Invalid argument
          Dump terminated
      
            This one fails, as it uses 'struct ndmsg'.
      
            do_show_or_flush() @ iproute2/ip/ipneigh.c
              """
              .n.nlmsg_type = RTM_GETNEIGH,
              .n.nlmsg_len = NLMSG_LENGTH(sizeof(struct ndmsg)),
              """
      
       3) $ ./neighlist
          < no output >
      
            This one fails, as its request is 'struct ndmsg'-based.
      
            neighList() @ netlink/neigh_linux.go
              """
              req := h.newNetlinkRequest(unix.RTM_GETNEIGH, [...]
              msg := Ndmsg{
              """
      
      The actual breakage was introduced by commit 0ff50e83 ("net: rtnetlink:
      bail out from rtnl_fdb_dump() on parse error"), because nlmsg_parse() fails
      if the payload length (with the _actual_ family header) is less than the
      family header length alone (which is assumed, in parameter 'hdrlen').
      This is true in the examples above with struct ndmsg, with size and payload
      length shorter than struct ifinfomsg.
      
      However, that commit only intends to fix something under the assumption
      that the family header is indeed a 'struct ifinfomsg', by preventing
      access to the payload as such (via the 'ifm' pointer) if the payload
      length is not sufficient to actually contain it.
      
      The assumption was introduced by commit 5e6d2435 ("bridge: netlink dump
      interface at par with brctl"), to support iproute2's 'bridge fdb' command
      (not 'ip neigh') which indeed uses 'struct ifinfomsg', thus is not broken.
      
      So, in order to unbreak the 'struct ndmsg' family headers and still allow
      'struct ifinfomsg' to continue to work, check for the known message sizes
      used with 'struct ndmsg' in iproute2 (with zero or one attribute which is
      not used in this function anyway) then do not parse the data as ifinfomsg.
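
The size mismatch and the length heuristic described above can be sketched in user space as follows. The two structures mirror the field layout of the UAPI headers, but the *_mirror names, U32_ATTR_SIZE and both helper functions are made up for this example. The assumption that the single optional attribute carries a u32 is mine and is not stated in the text above.

```c
#include <stddef.h>
#include <stdint.h>

/* User-space mirrors of the two family headers (hypothetical names). */
struct ifinfomsg_mirror {
	uint8_t  ifi_family;
	uint8_t  ifi_pad;
	uint16_t ifi_type;
	int32_t  ifi_index;
	uint32_t ifi_flags;
	uint32_t ifi_change;
};				/* 16 bytes */

struct ndmsg_mirror {
	uint8_t  ndm_family;
	uint8_t  ndm_pad1;
	uint16_t ndm_pad2;
	int32_t  ndm_ifindex;
	uint16_t ndm_state;
	uint8_t  ndm_flags;
	uint8_t  ndm_type;
};				/* 12 bytes */

/* Assumed size of one attribute carrying a u32:
 * 4-byte attribute header plus 4-byte payload. */
#define U32_ATTR_SIZE 8u

/* Parsing is refused when the payload cannot even hold the header
 * that the parser was told to expect. */
static int payload_holds_header(size_t payload_len, size_t hdrlen)
{
	return payload_len >= hdrlen;
}

/* Heuristic: payload lengths produced for an ndmsg request with
 * zero or one attribute. */
static int looks_like_ndmsg(size_t payload_len)
{
	return payload_len == sizeof(struct ndmsg_mirror) ||
	       payload_len == sizeof(struct ndmsg_mirror) + U32_ATTR_SIZE;
}
```

A bare ndmsg request has a 12-byte payload, which cannot hold the 16-byte ifinfomsg header, so a parser told to expect ifinfomsg rejects it. The heuristic recognizes such requests by length alone and skips the ifinfomsg parsing for them.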
      
      Same examples with this patch applied (or with the original fix reverted, or on a kernel that predates it):
      
          $ bridge fdb show
          33:33:00:00:00:01 dev ens3 self permanent
          01:00:5e:00:00:01 dev ens3 self permanent
          33:33:ff:15:98:30 dev ens3 self permanent
      
          $ ip --family bridge neigh
          dev ens3 lladdr 33:33:00:00:00:01 PERMANENT
          dev ens3 lladdr 01:00:5e:00:00:01 PERMANENT
          dev ens3 lladdr 33:33:ff:15:98:30 PERMANENT
      
          $ ./neighlist
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0x0, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x1, 0x0, 0x5e, 0x0, 0x0, 0x1}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
          netlink.Neigh{LinkIndex:2, Family:7, State:128, Type:0, Flags:2, IP:net.IP(nil), HardwareAddr:net.HardwareAddr{0x33, 0x33, 0xff, 0x15, 0x98, 0x30}, LLIPAddr:net.IP(nil), Vlan:0, VNI:0}
      
      Tested on mainline (v4.19-rc6) and net-next (3bd09b05b068).
      
      References:
      
      [1] netlink library for go (test-case)
          https://github.com/vishvananda/netlink
      
          $ cat ~/go/src/neighlist/main.go
          package main
          import ("fmt"; "syscall"; "github.com/vishvananda/netlink")
          func main() {
              neighs, _ := netlink.NeighList(0, syscall.AF_BRIDGE)
              for _, neigh := range neighs { fmt.Printf("%#v\n", neigh) }
          }
      
          $ export GOPATH=~/go
          $ go get github.com/vishvananda/netlink
          $ go build neighlist
          $ ~/go/src/neighlist/neighlist
      
      Thanks to David Ahern for suggestions to improve this patch.
      
      Fixes: 0ff50e83 ("net: rtnetlink: bail out from rtnl_fdb_dump() on parse error")
      Fixes: 5e6d2435 ("bridge: netlink dump interface at par with brctl")
      Reported-by: Aidan Obley <aobley@pivotal.io>
      Signed-off-by: Mauricio Faria de Oliveira <mfo@canonical.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipv6: do not leave garbage in rt->fib6_metrics · fda21d46
      Eric Dumazet committed
      In case ip_fib_metrics_init() returns an error, we had better
      reset rt->fib6_metrics to &dst_default_metrics so that
      we do not crash later in ip_fib_metrics_put().
      
      Fixes: 767a2217 ("net: common metrics init helper for FIB entries")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: syzbot <syzkaller@googlegroups.com>
      Reviewed-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • udp: gro behind static key · f2e9de21
      Willem de Bruijn committed
      Avoid the socket lookup cost in udp_gro_receive if no socket has a
      udp tunnel callback configured.
      
      udp_sk(sk)->gro_receive requires a registration with
      setup_udp_tunnel_sock, which enables the static key.
      Signed-off-by: Willem de Bruijn <willemb@google.com>
      Acked-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: bpfilter: Fix type cast and pointer warnings · 33aa8da1
      Shanthosh RK committed
      Fixes the following Sparse warnings:
      
      net/bpfilter/bpfilter_kern.c:62:21: warning: cast removes address space
      of expression
      net/bpfilter/bpfilter_kern.c:101:49: warning: Using plain integer as
      NULL pointer
      Signed-off-by: Shanthosh RK <shanthosh.rk@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  4. 05 Oct, 2018 13 commits
    • net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Cong Wang committed
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      trouble here:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.
      
      2. The chain refcnt is already 0, so we can't simply hold it again.
         We could check refcnt==1, but it is ugly.
      
      3. The chain with refcnt 0 is still visible in its block, which means
         it could still be found and used!
      
      4. The block has a refcnt too, and we can't hold it without introducing
         a proper API either.
      
      We could make it work, but the end result is ugly. Instead of wasting
      time on reviewing it, let's just convert the troublesome spinlock to
      a mutex, which allows us to use non-atomic allocations too.
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/neigh: Extend dump filter to proxy neighbor dumps · 6f52f80e
      David Ahern committed
      Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
      pass the filter arguments down to neigh_dump_table in a new struct. Add
      the filter option to proxy neigh dumps as well to make them consistent.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/packet: fix packet drop as of virtio gso · 9d2f67e4
      Jianfeng Tan committed
      When we use a raw socket as the vhost backend, a packet from virtio with
      gso offloading information cannot be sent out, because it fails the later
      validation in the xmit path: we did not set the correct skb->protocol,
      which is further used for looking up the gso function.
      
      To fix this, we set this field according to the virtio hdr information.
      
      Fixes: e858fae2 ("virtio_net: use common code for virtio_net_hdr and skb GSO conversion")
      Signed-off-by: Jianfeng Tan <jianfeng.tan@linux.alibaba.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Move free of dst_metrics to helper · 1620a336
      David Ahern committed
      Move the refcounting and potential free of the dst metrics
      for ipv4 and ipv6 to a common helper.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: common metrics init helper for dst_entry · e1255ed4
      David Ahern committed
      ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
      Move the common initialization code to a helper and use it for both protocols.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: Move free of fib_metrics to helper · cc5f0eb2
      David Ahern committed
      Move the refcounting and potential free of dst metrics associated
      with a fib entry to a helper and use it in both ipv4 and ipv6.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: common metrics init helper for FIB entries · 767a2217
      David Ahern committed
      Consolidate initialization of ipv4 and ipv6 metrics when fib entries
      are created into a single helper, ip_fib_metrics_init, that handles
      the call to ip_metrics_convert.
      
      If no metrics are defined for the fib entry, the metrics are set
      to dst_default_metrics.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • openvswitch: load NAT helper · 17c357ef
      Flavio Leitner committed
      Load the respective NAT helper module if the flow uses it.
      Signed-off-by: Flavio Leitner <fbl@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tc: Add support for configuring the taprio scheduler · 5a781ccb
      Vinicius Costa Gomes committed
      This traffic scheduler allows traffic class states (transmission
      allowed/not allowed, in the simplest case) to be scheduled according
      to a pre-generated time sequence. This is the basis of the IEEE
      802.1Qbv specification.
      
      Example configuration:
      
      tc qdisc replace dev enp3s0 parent root handle 100 taprio \
                num_tc 3 \
      	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      	  queues 1@0 1@1 2@2 \
      	  base-time 1528743495910289987 \
      	  sched-entry S 01 300000 \
      	  sched-entry S 02 300000 \
      	  sched-entry S 04 300000 \
      	  clockid CLOCK_TAI
      
      The configuration format is similar to mqprio. The main difference is
      the presence of a schedule, built from multiple "sched-entry"
      definitions. Each entry has the following format:
      
           sched-entry <CMD> <GATE MASK> <INTERVAL>
      
      The only supported <CMD> is "S", which means "SetGateStates",
      following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
      is a bitmask where each bit is associated with a traffic class, so
      bit 0 (the least significant bit) being "on" means that traffic class
      0 is "active" for that schedule entry. <INTERVAL> is a time duration
      in nanoseconds that specifies how long the state defined by <CMD>
      and <GATE MASK> should be held before moving to the next entry.
      
      This schedule is circular, that is, after the last entry is executed
      it starts from the first one, indefinitely.
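
The gate mask and the circular schedule described above can be sketched in user space as follows. The structure and function names are hypothetical and this is not the qdisc's implementation; it only illustrates how a bit in the mask maps to a traffic class and how an elapsed time maps to a schedule entry.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical in-memory form of one "sched-entry S <mask> <interval>". */
struct sched_entry {
	uint32_t gate_mask;	/* bit N set: traffic class N may transmit */
	uint64_t interval_ns;	/* how long this entry stays in effect */
};

/* True if traffic class `tc` is allowed to transmit under `gate_mask`. */
static bool gate_open(uint32_t gate_mask, unsigned int tc)
{
	return tc < 32 && ((gate_mask >> tc) & 1u);
}

/* Index of the entry in effect `elapsed_ns` after the schedule started.
 * The schedule wraps around, so only the offset within one cycle matters. */
static size_t active_entry(const struct sched_entry *e, size_t n,
			   uint64_t elapsed_ns)
{
	uint64_t cycle = 0, offset;
	size_t i;

	for (i = 0; i < n; i++)
		cycle += e[i].interval_ns;
	if (cycle == 0)
		return 0;	/* empty or degenerate schedule */

	offset = elapsed_ns % cycle;
	for (i = 0; i < n; i++) {
		if (offset < e[i].interval_ns)
			return i;
		offset -= e[i].interval_ns;
	}
	return n - 1;		/* not reached when cycle > 0 */
}
```

With the three entries from the example configuration (masks 01, 02 and 04, each held for 300000 ns), the cycle lasts 900000 ns, and 650000 ns into it the third entry is active, so only traffic class 2 may transmit.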
      
      The other parameters can be defined as follows:
      
       - base-time: specifies the instant when the schedule starts. If
         'base-time' is a time in the past, the schedule will start at
      
              base-time + (N * cycle-time)
      
         where N is the smallest integer such that the resulting time is
         greater than "now", and "cycle-time" is the sum of all the
         intervals of the entries in the schedule;
      
       - clockid: specifies the reference clock to be used;
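
The base-time rule above amounts to a small piece of integer arithmetic. The sketch below is a user-space illustration with a hypothetical function name. It treats a base-time equal to "now" as already past, which is one reading of "greater than" in the text, and it ignores overflow.

```c
#include <stdint.h>

/* First instant at which the schedule starts, in the same unit as the
 * arguments (nanoseconds in the example configuration). */
static int64_t first_start(int64_t base_time, int64_t cycle_time, int64_t now)
{
	int64_t n;

	/* A base-time in the future is used as is; a non-positive cycle
	 * time cannot be advanced, so fall back to base-time as well. */
	if (base_time > now || cycle_time <= 0)
		return base_time;

	/* Smallest N with base_time + N * cycle_time > now. */
	n = (now - base_time) / cycle_time + 1;
	return base_time + n * cycle_time;
}
```

For instance, with a cycle-time of 900000 (three entries of 300000 each), a base-time of 1000 and a current time of 2000000, N is 3 and the schedule starts at 2701000.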
      
      The parameters should be similar to what the IEEE 802.1Q family of
      specifications defines.
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add generic parameter msix_vec_per_pf_min · 16511789
      Vasundhara Volam committed
      msix_vec_per_pf_min - This param sets the minimum number of MSIX
      vectors required for device initialization. The value is set
      in the device, which limits the MSIX vectors per PF.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add generic parameter msix_vec_per_pf_max · f61cba42
      Vasundhara Volam committed
      msix_vec_per_pf_max - This param sets the number of MSIX vectors
      that the device requests from the host on driver initialization.
      The value is set in the device and applies per PF.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • devlink: Add generic parameter ignore_ari · e3b51061
      Vasundhara Volam committed
      ignore_ari - The device ignores the ARI (Alternate Routing ID) capability,
      even when the platform supports it, and creates the same number of
      partitions as when the platform does not support the ARI capability.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dns: Allow the dns resolver to retrieve a server set · bbb4c432
      David Howells committed
      Allow the DNS resolver to retrieve a set of servers and their associated
      addresses, ports, preference and weight ratings.
      
      In terms of communication with userspace, "srv=1" is added to the callout
      string (the '1' indicating the maximum data version supported by the
      kernel) to ask the userspace side for this.
      
      If the userspace side doesn't recognise it, it will ignore the option and
      return the usual text address list.
      
      If the userspace side does recognise it, it will return some binary data
      that begins with a zero byte that would cause the string parsers to give an
      error.  A content-type byte follows, and then a byte containing the version
      of the data in the blob (this may be between 1 and the version specified in
      the callout data).  The remainder of the payload is version-specific.
      
      In version 1, the payload looks like (note that this is packed):
      
      	u8	Non-string marker (ie. 0)
      	u8	Content (0 => Server list)
      	u8	Version (ie. 1)
      	u8	Source (eg. DNS_RECORD_FROM_DNS_SRV)
      	u8	Status (eg. DNS_LOOKUP_GOOD)
      	u8	Number of servers
      	foreach-server {
      		u16	Name length (LE)
      		u16	Priority (as per SRV record) (LE)
      		u16	Weight (as per SRV record) (LE)
      		u16	Port (LE)
      		u8	Source (eg. DNS_RECORD_FROM_NSS)
      		u8	Status (eg. DNS_LOOKUP_GOT_NOT_FOUND)
      		u8	Protocol (eg. DNS_SERVER_PROTOCOL_UDP)
      		u8	Number of addresses
      		char[]	Name (not NUL-terminated)
      		foreach-address {
      			u8		Family (AF_INET{,6})
      			union {
      				u8[4]	ipv4_addr
      				u8[16]	ipv6_addr
      			}
      		}
      	}
      
      This can then be used to fetch a whole cell's VL-server configuration for
      AFS, for example.
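
A reader of the version 1 layout above could decode the fixed-size parts along these lines. This is a user-space sketch with hypothetical structure and function names, following the byte order of the layout listing. It does not use the kernel's own definitions and leaves the source, status and protocol values as raw bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoded form of the 6-byte blob header. */
struct srv_blob_header {
	uint8_t content, version, source, status, nr_servers;
};

/* Hypothetical decoded form of the 12 fixed bytes of one server record. */
struct srv_fixed {
	uint16_t name_len, priority, weight, port;
	uint8_t source, status, protocol, nr_addrs;
};

/* Read a little-endian u16 regardless of host byte order. */
static uint16_t get_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/* Returns 0 for a version 1 server list, -1 for text replies, other
 * content types or versions, and truncated input. */
static int parse_blob_header(const uint8_t *buf, size_t len,
			     struct srv_blob_header *h)
{
	if (len < 6 || buf[0] != 0)	/* non-zero first byte: text reply */
		return -1;
	h->content = buf[1];
	h->version = buf[2];
	h->source = buf[3];
	h->status = buf[4];
	h->nr_servers = buf[5];
	if (h->content != 0 || h->version != 1)
		return -1;
	return 0;
}

/* Decodes the fixed part of one server record and checks that the name
 * fits in the remaining bytes. Returns the bytes consumed (12) or -1. */
static int parse_server_fixed(const uint8_t *buf, size_t len,
			      struct srv_fixed *s)
{
	if (len < 12)
		return -1;
	s->name_len = get_le16(buf);
	s->priority = get_le16(buf + 2);
	s->weight = get_le16(buf + 4);
	s->port = get_le16(buf + 6);
	s->source = buf[8];
	s->status = buf[9];
	s->protocol = buf[10];
	s->nr_addrs = buf[11];
	if (len - 12 < s->name_len)
		return -1;
	return 12;
}
```

The name and the per-address entries follow the fixed part; each address is one family byte plus 4 or 16 address bytes, so a full parser would continue from the returned offset.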
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 04 Oct, 2018 5 commits