1. 06 Oct, 2018 14 commits
  2. 05 Oct, 2018 26 commits
    • C
      net_sched: convert idrinfo->lock from spinlock to a mutex · 95278dda
      Committed by Cong Wang
      In commit ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      we moved fl_hw_destroy_tmplt() to a workqueue to avoid blocking
      with the spinlock held. Unfortunately, this causes a lot of
      trouble here:
      
      1. tcf_chain_destroy() could be called right after we queue the work
         but before the work runs. This is a use-after-free.
      
      2. The chain refcnt is already 0, we can't even just hold it again.
         We can check refcnt==1 but it is ugly.
      
      3. The chain with refcnt 0 is still visible in its block, which means
         it could be still found and used!
      
      4. The block has a refcnt too, we can't hold it without introducing a
         proper API either.
      
      We could make it work, but the end result would be ugly. Instead of
      wasting time on reviewing it, let's just convert the troublesome
      spinlock to a mutex, which allows us to use non-atomic allocations too.
      
      Fixes: ec3ed293 ("net_sched: change tcf_del_walker() to take idrinfo->lock")
      Reported-by: Ido Schimmel <idosch@idosch.org>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Tested-by: Ido Schimmel <idosch@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net/neigh: Extend dump filter to proxy neighbor dumps · 6f52f80e
      Committed by David Ahern
      Move the attribute parsing from neigh_dump_table to neigh_dump_info, and
      pass the filter arguments down to neigh_dump_table in a new struct. Add
      the filter option to proxy neigh dumps as well to make them consistent.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'net-metrics-consolidate' · 2970f2a8
      Committed by David S. Miller
      David Ahern says:
      
      ====================
      net: Consolidate metrics handling for ipv4 and ipv6
      
      As part of the IPv6 fib info refactoring, the intent was to make metrics
      handling for ipv6 identical to ipv4. One oversight in ip6_dst_destroy
      led to confusion and a couple of incomplete attempts at finding and
      fixing the resulting memory leak which was ultimately resolved by
      ce7ea4af ("ipv6: fix memory leak on dst->_metrics").
      
      Refactor metrics handling to make the code really identical for v4 and
      v6, and add a few test cases.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      fib_tests: Add tests for metrics on routes · a0e11da7
      Committed by David Ahern
      Add ipv4 and ipv6 test cases for metrics (mtu) when fib entries are
      created. Can be used with kmemleak to see leaks with both fib entries
      and dst_entry.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net: Move free of dst_metrics to helper · 1620a336
      Committed by David Ahern
      Move the refcounting and potential free of dst metrics associated
      for ipv4 and ipv6 to a common helper.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net: common metrics init helper for dst_entry · e1255ed4
      Committed by David Ahern
      ipv4 and ipv6 both use refcounted metrics if FIB entries have metrics set.
      Move the common initialization code to a helper and use for both protocols.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net: Move free of fib_metrics to helper · cc5f0eb2
      Committed by David Ahern
      Move the refcounting and potential free of dst metrics associated
      with a fib entry to a helper and use it in both ipv4 and ipv6.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      net: common metrics init helper for FIB entries · 767a2217
      Committed by David Ahern
      Consolidate initialization of ipv4 and ipv6 metrics when fib entries
      are created into a single helper, ip_fib_metrics_init, that handles
      the call to ip_metrics_convert.
      
      If no metrics are defined for the fib entry, then the metrics is set
      to dst_default_metrics.
      Signed-off-by: David Ahern <dsahern@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • J
      net: sched: remove unused helpers · d26d4b19
      Committed by Jakub Kicinski
      tcf_block_dev() doesn't seem to be used anywhere in the tree.
      Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • H
      geneve: allow to clear ttl inherit · a97d97ba
      Committed by Hangbin Liu
      As Michal reminded us, we should allow clearing ttl inherit. Then we
      will have three states:
      
      1. set the flag, and do ttl inherit.
      2. do not set the flag, use configured ttl value, or default ttl (0) if
         not set.
      3. disable ttl inherit, use previous configured ttl value, or default ttl (0).
      
      Fixes: 52d0d404 ("geneve: add ttl inherit support")
      CC: Michal Kubecek <mkubecek@suse.cz>
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      tc: Add support for configuring the taprio scheduler · 5a781ccb
      Committed by Vinicius Costa Gomes
      This traffic scheduler allows traffic class states (transmission
      allowed/not allowed, in the simplest case) to be scheduled, according
      to a pre-generated time sequence. This is the basis of the IEEE
      802.1Qbv specification.
      
      Example configuration:
      
      tc qdisc replace dev enp3s0 parent root handle 100 taprio \
                num_tc 3 \
      	  map 2 2 1 0 2 2 2 2 2 2 2 2 2 2 2 2 \
      	  queues 1@0 1@1 2@2 \
      	  base-time 1528743495910289987 \
      	  sched-entry S 01 300000 \
      	  sched-entry S 02 300000 \
      	  sched-entry S 04 300000 \
      	  clockid CLOCK_TAI
      
      The configuration format is similar to mqprio. The main difference is
      the presence of a schedule, built by multiple "sched-entry"
      definitions, each entry has the following format:
      
           sched-entry <CMD> <GATE MASK> <INTERVAL>
      
      The only supported <CMD> is "S", which means "SetGateStates",
      following the IEEE 802.1Qbv-2015 definition (Table 8-6). <GATE MASK>
      is a bitmask where each bit is associated with a traffic class, so
      bit 0 (the least significant bit) being "on" means that traffic class
      0 is "active" for that schedule entry. <INTERVAL> is a time duration
      in nanoseconds that specifies for how long that state defined by <CMD>
      and <GATE MASK> should be held before moving to the next entry.
      
      This schedule is circular, that is, after the last entry is executed
      it starts from the first one, indefinitely.
      
      The other parameters can be defined as follows:
      
       - base-time: specifies the instant when the schedule starts, if
        'base-time' is a time in the past, the schedule will start at
      
       	      base-time + (N * cycle-time)
      
         where N is the smallest integer so the resulting time is greater
         than "now", and "cycle-time" is the sum of all the intervals of the
         entries in the schedule;
      
       - clockid: specifies the reference clock to be used;
      
      The parameters should be similar to what the IEEE 802.1Q family of
      specifications defines.
      Signed-off-by: Vinicius Costa Gomes <vinicius.gomes@intel.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
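The base-time rule and the circular schedule described in the taprio commit message can be illustrated with a small Python sketch. This is not the kernel implementation; the function names `schedule_start` and `active_gate_mask` are hypothetical, and the gate masks from the example configuration (01, 02, 04) are assumed to be hexadecimal.

```python
def schedule_start(base_time_ns, cycle_time_ns, now_ns):
    """Return the instant at which the schedule starts.

    If base-time is still in the future it is used directly; otherwise the
    result is base-time + N * cycle-time for the smallest N that puts the
    start after now.
    """
    if base_time_ns > now_ns:
        return base_time_ns
    # Number of whole cycles already elapsed, plus one to land after now.
    n = (now_ns - base_time_ns) // cycle_time_ns + 1
    return base_time_ns + n * cycle_time_ns


def active_gate_mask(entries, base_time_ns, t_ns):
    """Return the gate mask in effect at t_ns for a circular schedule.

    entries is a list of (gate_mask, interval_ns) pairs, one per sched-entry.
    Assumes t_ns is not earlier than the start of the schedule.
    """
    cycle_time_ns = sum(interval for _, interval in entries)
    offset = (t_ns - base_time_ns) % cycle_time_ns
    for gate_mask, interval in entries:
        if offset < interval:
            return gate_mask
        offset -= interval
    raise AssertionError("unreachable: offset always falls inside the cycle")


# The three sched-entry lines from the example configuration
# (masks assumed to be hexadecimal, intervals in nanoseconds).
ENTRIES = [(0x01, 300_000), (0x02, 300_000), (0x04, 300_000)]
```

With these entries the cycle time is 900000 ns, and bit 0 of the returned mask being set means traffic class 0 is active for that part of the cycle.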
    • D
      Merge branch 'bnxt_en-devlink-param-updates' · 34f8c58f
      Committed by David S. Miller
      Vasundhara Volam says:
      
      ====================
      bnxt_en: devlink param updates
      
      This patchset adds support for 3 generic and 1 driver-specific devlink
      parameters. Add documentation for these configuration parameters.
      
      Also, this patchset adds support to return proper error code if
      HWRM_NVM_GET/SET_VARIABLE commands return error code
      HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED.
      
      v3->v4:
      -Remove extra definition of NVM_OFF_HW_TC_OFFLOAD from bnxt_devlink.h
      -Remove type information for generic parameters from
      devlink-params-bnxt.txt
      
      v2->v3:
      -Remove description of generic parameters from devlink-params-bnxt.txt
      
      v1->v2:
      -Remove hw_tc_offload parameter.
      -Update all patches with Cc of MAINTAINERS.
      -Add more description in commit message for device specific parameter.
      -Add a new Documentation/networking/devlink-params.txt with some
      generic devlink parameters information.
      -Add a new Documentation/networking/devlink-params-bnxt.txt with devlink
      parameters information that are supported by bnxt_en driver.
      ====================
      Acked-by: Jiri Pirko <jiri@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      devlink: Add Documentation/networking/devlink-params-bnxt.txt · 53e233ea
      Committed by Vasundhara Volam
      This patch adds a new file to add information about configuration
      parameters that are supported by bnxt_en driver via devlink.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-doc@vger.kernel.org
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      devlink: Add Documentation/networking/devlink-params.txt · 9bff98bb
      Committed by Vasundhara Volam
      This patch adds a new file to add information about some of the
      generic configuration parameters set via devlink.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: linux-doc@vger.kernel.org
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      bnxt_en: Add a driver specific gre_ver_check devlink parameter. · 2dc0865e
      Committed by Vasundhara Volam
      This patch adds the following driver-specific permanent mode boolean
      parameter.
      
      gre_ver_check - Generic Routing Encapsulation(GRE) version check
      will be enabled in the device. If disabled, device skips version
      checking for GRE packets.
      
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      bnxt_en: Use msix_vec_per_pf_max and msix_vec_per_pf_min devlink params. · f399e849
      Committed by Vasundhara Volam
      This patch adds support for the following generic permanent mode
      devlink parameters. They can be modified using devlink param
      commands.
      
      msix_vec_per_pf_max - This param sets the number of MSIX vectors
      that the device requests from the host on driver initialization.
      This value is set in the device which limits MSIX vectors per PF.
      
      msix_vec_per_pf_min - This param sets the number of minimal MSIX
      vectors required for the device initialization. Value 0 indicates
      a default value is selected. This value is set in the device which
      limits MSIX vectors per PF.
      
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      bnxt_en: return proper error when FW returns HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED · 3a1d52a5
      Committed by Vasundhara Volam
      Return proper error code when Firmware returns
      HWRM_ERR_CODE_RESOURCE_ACCESS_DENIED for HWRM_NVM_GET/SET_VARIABLE
      commands.
      
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      bnxt_en: Use ignore_ari devlink parameter · 7d859234
      Committed by Vasundhara Volam
      This patch adds support for ignore_ari generic permanent mode
      devlink parameter. This parameter is disabled by default. It can be
      enabled using devlink param commands.
      
      ignore_ari - If enabled, the device ignores the ARI (Alternate Routing
      ID) capability, even when the platform supports it, and creates the same
      number of partitions as when the platform does not support ARI.
      
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      devlink: Add generic parameter msix_vec_per_pf_min · 16511789
      Committed by Vasundhara Volam
      msix_vec_per_pf_min - This param sets the number of minimal MSIX
      vectors required for the device initialization. This value is set
      in the device which limits MSIX vectors per PF.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      devlink: Add generic parameter msix_vec_per_pf_max · f61cba42
      Committed by Vasundhara Volam
      msix_vec_per_pf_max - This param sets the number of MSIX vectors
      that the device requests from the host on driver initialization.
      This value is set in the device which is applicable per PF.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      devlink: Add generic parameter ignore_ari · e3b51061
      Committed by Vasundhara Volam
      ignore_ari - The device ignores the ARI (Alternate Routing ID)
      capability, even when the platform supports it, and creates the same
      number of partitions as when the platform does not support ARI.
      
      Cc: Jiri Pirko <jiri@mellanox.com>
      Cc: Michael Chan <michael.chan@broadcom.com>
      Signed-off-by: Vasundhara Volam <vasundhara-v.volam@broadcom.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • N
      qed: Avoid implicit enum conversion in qed_ooo_submit_tx_buffers · 8fa74e3c
      Committed by Nathan Chancellor
      Clang warns when one enumerated type is implicitly converted to another.
      
      drivers/net/ethernet/qlogic/qed/qed_ll2.c:799:32: warning: implicit
      conversion from enumeration type 'enum core_tx_dest' to different
      enumeration type 'enum qed_ll2_tx_dest' [-Wenum-conversion]
                      tx_pkt.tx_dest = p_ll2_conn->tx_dest;
                                     ~ ~~~~~~~~~~~~^~~~~~~
      1 warning generated.
      
      Fix this by using a switch statement to convert between the enumerated
      values since they are not 1 to 1, which matches how the rest of the
      driver handles this conversion.
      
      Link: https://github.com/ClangBuiltLinux/linux/issues/125
      Suggested-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
      Acked-by: Tomer Tayar <Tomer.Tayar@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
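The reason an explicit conversion is needed when two enumerations are not 1 to 1 can be shown with a short Python sketch. The driver itself is C and uses a switch statement; the enum classes and members below are hypothetical stand-ins, not the definitions from the qed driver.

```python
from enum import Enum


class CoreTxDest(Enum):
    """Hypothetical stand-in for the firmware-side destination enum."""
    NW = 0
    LB = 1
    RESERVED = 2
    DROP = 3


class Ll2TxDest(Enum):
    """Hypothetical stand-in for the driver-side destination enum."""
    NW = 0
    LB = 1
    DROP = 2


# Explicit table playing the role of the switch statement: every supported
# source value is mapped by name, never by its numeric value.
_CORE_TO_LL2 = {
    CoreTxDest.NW: Ll2TxDest.NW,
    CoreTxDest.LB: Ll2TxDest.LB,
    CoreTxDest.DROP: Ll2TxDest.DROP,
}


def core_to_ll2(dest):
    """Convert a CoreTxDest to the matching Ll2TxDest, or raise ValueError."""
    try:
        return _CORE_TO_LL2[dest]
    except KeyError:
        raise ValueError(f"no LL2 equivalent for {dest!r}") from None
```

In this sketch, reusing the numeric value would turn `CoreTxDest.RESERVED` into `Ll2TxDest.DROP`, which is the kind of silent mismatch the explicit mapping avoids.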
    • D
      Merge tag 'mlx5-updates-2018-10-03' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux · 9e50727f
      Committed by David S. Miller
      Saeed Mahameed says:
      
      ====================
      mlx5-updates-2018-10-03
      
      mlx5 core driver and ethernet netdev updates, please note there is a small
      devlink related update to allow extack argument to eswitch operations.
      
      From Eli Britstein,
      1) devlink: Add extack argument to the eswitch related operations
      2) net/mlx5e: E-Switch, return extack messages for failures in the e-switch devlink callbacks
      3) net/mlx5e: Add extack messages for TC offload failures
      
      From Eran Ben Elisha,
      4) mlx5e: Add counter for aRFS rule insertion failures
      
      From Feras Daoud
      5) Fast teardown support for mlx5 device
      This change introduces the enhanced version of the "Force teardown" that
      allows SW to perform teardown in a faster way without the need to reclaim
      all the FW pages.
      Fast teardown provides the following advantages:
          1- Fix a FW race condition that could cause command timeout
          2- Avoid moving to polling mode
          3- Close the vport to prevent PCI ACK from being sent without being
          scattered to memory
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge tag 'rxrpc-next-20181004' of git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs · f0e834e1
      Committed by David S. Miller
      David Howells says:
      
      ====================
      rxrpc: Development
      
      Here are some development patches for AF_RXRPC.  The most significant points
      are:
      
       (1) Change the tracepoint that indicates a packet has been transmitted
           into one that indicates a packet is about to be transmitted.  Without
           this, the response tracepoint may occur first if the round trip is
           fast enough.
      
       (2) Sort out AFS address list handling to better enforce maximum
           capacity, to use helper functions to fill them and to do an
           insertion sort to order them.  This is here to make (3) easier.
      
       (3) Keep AF_INET addresses as AF_INET addresses rather than converting
           them to AF_INET6 in both AF_RXRPC and kAFS.  I hadn't realised that a
           UDP6 socket would just call down into UDP4 if given an AF_INET
           address.
      
       (4) Allow the timestamp on the first DATA packet of a reply to be
           retrieved by a kernel service.  This will give the kAFS a more
           accurate base from which to calculate the callback promise expiration.
      
       (5) Allow the rxrpc protocol epoch value to be retrieved from an incoming
           call.  This will allow kAFS to determine if the fileserver restarted
           and if two addresses apparently assigned to the same fileserver
           actually are different boxes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      dns: Allow the dns resolver to retrieve a server set · bbb4c432
      Committed by David Howells
      Allow the DNS resolver to retrieve a set of servers and their associated
      addresses, ports, preference and weight ratings.
      
      In terms of communication with userspace, "srv=1" is added to the callout
      string (the '1' indicating the maximum data version supported by the
      kernel) to ask the userspace side for this.
      
      If the userspace side doesn't recognise it, it will ignore the option and
      return the usual text address list.
      
      If the userspace side does recognise it, it will return some binary data
      that begins with a zero byte that would cause the string parsers to give an
      error.  The second byte contains the version of the data in the blob (this
      may be between 1 and the version specified in the callout data).  The
      remainder of the payload is version-specific.
      
      In version 1, the payload looks like (note that this is packed):
      
      	u8	Non-string marker (ie. 0)
      	u8	Content (0 => Server list)
      	u8	Version (ie. 1)
      	u8	Source (eg. DNS_RECORD_FROM_DNS_SRV)
      	u8	Status (eg. DNS_LOOKUP_GOOD)
      	u8	Number of servers
      	foreach-server {
      		u16	Name length (LE)
      		u16	Priority (as per SRV record) (LE)
      		u16	Weight (as per SRV record) (LE)
      		u16	Port (LE)
      		u8	Source (eg. DNS_RECORD_FROM_NSS)
      		u8	Status (eg. DNS_LOOKUP_GOT_NOT_FOUND)
      		u8	Protocol (eg. DNS_SERVER_PROTOCOL_UDP)
      		u8	Number of addresses
      		char[]	Name (not NUL-terminated)
      		foreach-address {
      			u8		Family (AF_INET{,6})
      			union {
      				u8[4]	ipv4_addr
      				u8[16]	ipv6_addr
      			}
      		}
      	}
      
      This can then be used to fetch a whole cell's VL-server configuration for
      AFS, for example.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
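A userspace-style parser for the version 1 payload can be sketched in Python as follows. It follows the field listing in the commit message, not the kernel code. Several points are assumptions: the function name `parse_server_list` is hypothetical, the family byte is assumed to carry the Linux values AF_INET = 2 and AF_INET6 = 10, and each address is assumed to occupy exactly 4 or 16 bytes depending on its family.

```python
import ipaddress
import struct

FAMILY_IPV4 = 2    # assumption: Linux AF_INET
FAMILY_IPV6 = 10   # assumption: Linux AF_INET6
ADDR_SIZE = {FAMILY_IPV4: 4, FAMILY_IPV6: 16}

# Packed layouts taken from the listing: six u8 fields in the header, then
# four little-endian u16 fields and four u8 fields per server.
HEADER = struct.Struct("<6B")
SERVER = struct.Struct("<4H4B")


def parse_server_list(blob):
    """Parse a version 1 server-list blob into a list of dicts."""
    marker, content, version, _source, _status, nr_servers = HEADER.unpack_from(blob, 0)
    if marker != 0 or content != 0 or version != 1:
        raise ValueError("not a version 1 server list")
    pos = HEADER.size
    servers = []
    for _ in range(nr_servers):
        (name_len, priority, weight, port,
         _srv_source, _srv_status, protocol, nr_addrs) = SERVER.unpack_from(blob, pos)
        pos += SERVER.size
        # The name is not NUL-terminated, so its length field delimits it.
        name = blob[pos:pos + name_len].decode("ascii")
        pos += name_len
        addresses = []
        for _ in range(nr_addrs):
            family = blob[pos]
            pos += 1
            if family not in ADDR_SIZE:
                raise ValueError(f"unknown address family {family}")
            size = ADDR_SIZE[family]
            addresses.append(str(ipaddress.ip_address(blob[pos:pos + size])))
            pos += size
        servers.append({
            "name": name,
            "priority": priority,
            "weight": weight,
            "port": port,
            "protocol": protocol,
            "addresses": addresses,
        })
    return servers
```

The leading zero byte is what lets a consumer distinguish this binary form from the usual text address list, so the sketch rejects any blob that does not start with it.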
    • C
      liquidio: fix a couple of spelling mistakes · 0aa63eb9
      Committed by Colin Ian King
      Trivial fix to spelling mistakes in dev_dbg warning messages
      
      "Reloade" -> "Reload"
      "chang" -> "change"
      Signed-off-by: Colin Ian King <colin.king@canonical.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>