1. 05 10月, 2018 11 次提交
    • C
      net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Cong Wang 提交于
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we move fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      troubles here:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.
      
      2. The chain refcnt is already 0, we can't even just hold it again.
         We can check refcnt==1 but it is ugly.
      
      3. The chain with refcnt 0 is still visible in its block, which means
         it could be still found and used!
      
      4. The block has a refcnt too, we can't hold it without introducing a
         proper API either.
      
      We can make it working but the end result is ugly. Instead of wasting
      time on reviewing it, let's just convert the troubling spinlock to
      a mutex, which allows us to use non-atomic allocations too.
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: NIdo Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com>
      Tested-by: NIdo Schimmel <idosch@mellanox.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      95278dda
    • D
      net/neigh: Extend dump filter to proxy neighbor dumps · 6f52f80e
      David Ahern 提交于
      Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
      pass the filter arguments down to neigh_dump_table in a new struct. Add
      the filter option to proxy neigh dumps as well to make them consistent.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6f52f80e
    • D
      net: Move free of dst_metrics to helper · 1620a336
      David Ahern 提交于
      Move the refcounting and potential free of dst metrics associated
      for ipv4 and ipv6 to a common helper.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      1620a336
    • D
      net: common metrics init helper for dst_entry · e1255ed4
      David Ahern 提交于
      ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
      Move the common initialization code to a helper and use for both protocols.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e1255ed4
    • D
      net: Move free of fib_metrics to helper · cc5f0eb2
      David Ahern 提交于
      Move the refcounting and potential free of dst metrics associated
      with a fib entry to a helper and use it in both ipv4 and ipv6.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      cc5f0eb2
    • D
      net: common metrics init helper for FIB entries · 767a2217
      David Ahern 提交于
      Consolidate initialization of ipv4 and ipv6 metrics when fib entries
      are created into a single helper, ip_fib_metrics_init, that handles
      the call to ip_metrics_convert.
      
      If no metrics are defined for the fib entry, then the metrics is set
      to dst_default_metrics.
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      767a2217
    • V
      tc: Add support for configuring the taprio scheduler · 5a781ccb
      Vinicius Costa Gomes 提交于
      This traffic scheduler allows traffic classes states (transmission
      allowed/not allowed, in the simplest case) to be scheduled, according
      to a pre-generated time sequence. This is the basis of the IEEE
      802.1Qbv specification.
      
      Example configuration:
      
      tc qdisc replace dev enp3s0 parent root handle 100 taprio \
                num_tc 3 \
      	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      	  queues 1@0 1@1 2@2 \
      	  base-time 1528743495910289987 \
      	  sched-entry S 01 300000 \
      	  sched-entry S 02 300000 \
      	  sched-entry S 04 300000 \
      	  clockid CLOCK_TAI
      
      The configuration format is similar to mqprio. The main difference is
      the presence of a schedule, built by multiple "sched-entry"
      definitions, each entry has the following format:
      
           sched-entry <CMD> <GATE MASK> <INTERVAL>
      
      The only supported <CMD> is "S", which means "SetGateStates",
      following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
      is a bitmask where each bit is a associated with a traffic class, so
      bit 0 (the least significant bit) being "on" means that traffic class
      0 is "active" for that schedule entry. <INTERVAL> is a time duration
      in nanoseconds that specifies for how long that state defined by <CMD>
      and <GATE MASK> should be held before moving to the next entry.
      
      This schedule is circular, that is, after the last entry is executed
      it starts from the first one, indefinitely.
      
      The other parameters can be defined as follows:
      
       - base-time: specifies the instant when the schedule starts, if
        'base-time' is a time in the past, the schedule will start at
      
       	      base-time + (N * cycle-time)
      
         where N is the smallest integer so the resulting time is greater
         than "now", and "cycle-time" is the sum of all the intervals of the
         entries in the schedule;
      
       - clockid: specifies the reference clock to be used;
      
      The parameters should be similar to what the IEEE 802.1Q family of
      specification defines.
      Signed-off-by: NVinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5a781ccb
    • V
      devlink: Add generic parameter msix_vec_per_pf_min · 16511789
      Vasundhara Volam 提交于
      msix_vec_per_pf_min - This param sets the number of minimal MSIX
      vectors required for the device initialization. This value is set
      in the device which limits MSIX vectors per PF.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      16511789
    • V
      devlink: Add generic parameter msix_vec_per_pf_max · f61cba42
      Vasundhara Volam 提交于
      msix_vec_per_pf_max - This param sets the number of MSIX vectors
      that the device requests from the host on driver initialization.
      This value is set in the device which is applicable per PF.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      f61cba42
    • V
      devlink: Add generic parameter ignore_ari · e3b51061
      Vasundhara Volam 提交于
      ignore_ari - Device ignores ARI(Alternate Routing ID) capability,
      even when platforms has the support and creates same number of
      partitions when platform does not support ARI capability.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: NVasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      e3b51061
    • D
      dns: Allow the dns resolver to retrieve a server set · bbb4c432
      David Howells 提交于
      Allow the DNS resolver to retrieve a set of servers and their associated
      addresses, ports, preference and weight ratings.
      
      In terms of communication with userspace, "srv=1" is added to the callout
      string (the '1' indicating the maximum data version supported by the
      kernel) to ask the userspace side for this.
      
      If the userspace side doesn't recognise it, it will ignore the option and
      return the usual text address list.
      
      If the userspace side does recognise it, it will return some binary data
      that begins with a zero byte that would cause the string parsers to give an
      error.  The second byte contains the version of the data in the blob (this
      may be between 1 and the version specified in the callout data).  The
      remainder of the payload is version-specific.
      
      In version 1, the payload looks like (note that this is packed):
      
      	u8	Non-string marker (ie. 0)
      	u8	Content (0 => Server list)
      	u8	Version (ie. 1)
      	u8	Source (eg. DNS_RECORD_FROM_DNS_SRV)
      	u8	Status (eg. DNS_LOOKUP_GOOD)
      	u8	Number of servers
      	foreach-server {
      		u16	Name length (LE)
      		u16	Priority (as per SRV record) (LE)
      		u16	Weight (as per SRV record) (LE)
      		u16	Port (LE)
      		u8	Source (eg. DNS_RECORD_FROM_NSS)
      		u8	Status (eg. DNS_LOOKUP_GOT_NOT_FOUND)
      		u8	Protocol (eg. DNS_SERVER_PROTOCOL_UDP)
      		u8	Number of addresses
      		char[]	Name (not NUL-terminated)
      		foreach-address {
      			u8		Family (AF_INET{,6})
      			union {
      				u8[4]	ipv4_addr
      				u8[16]	ipv6_addr
      			}
      		}
      	}
      
      This can then be used to fetch a whole cell's VL-server configuration for
      AFS, for example.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      bbb4c432
  2. 04 10月, 2018 8 次提交
  3. 03 10月, 2018 18 次提交
  4. 02 10月, 2018 3 次提交
    • D
      bond: take rcu lock in netpoll_send_skb_on_dev · 6fe94878
      Dave Jones 提交于
      The bonding driver lacks the rcu lock when it calls down into
      netdev_lower_get_next_private_rcu from bond_poll_controller, which
      results in a trace like:
      
      WARNING: CPU: 2 PID: 179 at net/core/dev.c:6567 netdev_lower_get_next_private_rcu+0x34/0x40
      CPU: 2 PID: 179 Comm: kworker/u16:15 Not tainted 4.19.0-rc5-backup+ #1
      Workqueue: bond0 bond_mii_monitor
      RIP: 0010:netdev_lower_get_next_private_rcu+0x34/0x40
      Code: 48 89 fb e8 fe 29 63 ff 85 c0 74 1e 48 8b 45 00 48 81 c3 c0 00 00 00 48 8b 00 48 39 d8 74 0f 48 89 45 00 48 8b 40 f8 5b 5d c3 <0f> 0b eb de 31 c0 eb f5 0f 1f 40 00 0f 1f 44 00 00 48 8>
      RSP: 0018:ffffc9000087fa68 EFLAGS: 00010046
      RAX: 0000000000000000 RBX: ffff880429614560 RCX: 0000000000000000
      RDX: 0000000000000001 RSI: 00000000ffffffff RDI: ffffffffa184ada0
      RBP: ffffc9000087fa80 R08: 0000000000000001 R09: 0000000000000000
      R10: ffffc9000087f9f0 R11: ffff880429798040 R12: ffff8804289d5980
      R13: ffffffffa1511f60 R14: 00000000000000c8 R15: 00000000ffffffff
      FS:  0000000000000000(0000) GS:ffff88042f880000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 00007f4b78fce180 CR3: 000000018180f006 CR4: 00000000001606e0
      Call Trace:
       bond_poll_controller+0x52/0x170
       netpoll_poll_dev+0x79/0x290
       netpoll_send_skb_on_dev+0x158/0x2c0
       netpoll_send_udp+0x2d5/0x430
       write_ext_msg+0x1e0/0x210
       console_unlock+0x3c4/0x630
       vprintk_emit+0xfa/0x2f0
       printk+0x52/0x6e
       ? __netdev_printk+0x12b/0x220
       netdev_info+0x64/0x80
       ? bond_3ad_set_carrier+0xe9/0x180
       bond_select_active_slave+0x1fc/0x310
       bond_mii_monitor+0x709/0x9b0
       process_one_work+0x221/0x5e0
       worker_thread+0x4f/0x3b0
       kthread+0x100/0x140
       ? process_one_work+0x5e0/0x5e0
       ? kthread_delayed_work_timer_fn+0x90/0x90
       ret_from_fork+0x24/0x30
      
      We're also doing rcu dereferences a layer up in netpoll_send_skb_on_dev
      before we call down into netpoll_poll_dev, so just take the lock there.
      Suggested-by: NCong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: NDave Jones <davej@codemonkey.org.uk>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      6fe94878
    • D
      rtnetlink: Fail dump if target netnsid is invalid · 893626d6
      David Ahern 提交于
      Link dumps can return results from a target namespace. If the namespace id
      is invalid, then the dump request should fail if get_target_net fails
      rather than continuing with a dump of the current namespace.
      
      Fixes: 79e1ad14 ("rtnetlink: use netnsid to query interface")
      Signed-off-by: NDavid Ahern <dsahern@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      893626d6
    • F
      Revert "openvswitch: Fix template leak in error cases." · 7f6d6558
      Flavio Leitner 提交于
      This reverts commit 90c7afc9.
      
      When the commit was merged, the code used nf_ct_put() to free
      the entry, but later on commit 76644232 ("openvswitch: Free
      tmpl with tmpl_free.") replaced that with nf_ct_tmpl_free which
      is a more appropriate. Now the original problem is removed.
      
      Then 44d6e2f2 ("net: Replace NF_CT_ASSERT() with WARN_ON().")
      replaced a debug assert with a WARN_ON() which is trigged now.
      Signed-off-by: NFlavio Leitner <fbl@redhat.com>
      Acked-by: NJoe Stringer <joe@ovn.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f6d6558