1. 14 Feb, 2018 17 commits
  2. 13 Feb, 2018 14 commits
    • K
      net: Convert loopback_net_ops · 9a4d105d
      Kirill Tkhai committed
      These pernet_operations have only an init() method. It allocates
      memory for net_device, calls register_netdev() and assigns
      net::loopback_dev.
      
      register_netdev() is allowed to be used without additional locks,
      as it's synchronized on rtnl_lock(). There are many examples
      of using this function directly from ioctl().
      
      The only difference, compared to ioctl(), is that net is not
      completely alive at this moment. But it looks like there is
      no way for parallel pernet_operations to dereference
      the net_device, as most of the struct net_device lists
      where it's linked are related to net, and the net is not linked.
      
      The exceptions are net_device::unreg_list, close_list, todo_list,
      used for unregistration, and ::link_watch_list, where net_device
      may be linked to global lists.
      
      Unregistration of loopback_dev obviously can't happen while
      loopback_net_init() is executing, as the net is alive. It occurs
      in default_device_ops, which currently requires net_mutex,
      and it behaves as a barrier at the moment. It will be considered
      in the next patch.
      
      As for link_watch_list, there seems to be no way for
      loopback_dev to be linked into lweventlist at registration time
      and become available to another pernet_operations.
      Signed-off-by: Kirill Tkhai <ktkhai@virtuozzo.com>
      Acked-by: Andrei Vagin <avagin@virtuozzo.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • A
      i40e/i40evf: Add support for new mechanism of updating adaptive ITR · a0073a4b
      Alexander Duyck committed
      This patch replaces the existing mechanism for determining the correct
      value to program for adaptive ITR with yet another new and more
      complicated approach.
      
      The basic idea from a 30K foot view is that this new approach will push the
      Rx interrupt moderation up so that by default it starts in low latency and
      is gradually pushed up into a higher latency setup as long as doing so
      increases the number of packets processed. If the number of packets drops
      to 4 to 1 per packet we will reset and just base our ITR on the size of the
      packets being received. For Tx we leave it floating at a high interrupt
      delay and do not pull it down unless we start processing more than 112
      packets per interrupt. If we start exceeding that we will cut our interrupt
      rates in half until we are back below 112.
      
      The side effect of these patches is that we will be processing more
      packets per interrupt. This is both a good and a bad thing as it means we
      will not be blocking processing in the case of things like pktgen and XDP,
      but we will also be consuming a bit more CPU in the cases of things such as
      network throughput tests using netperf.
      
      One delta from this versus the ixgbe version of the changes is that I have
      made the interrupt moderation a bit more aggressive when we are in bulk
      mode by moving our "goldilocks zone" up from the 48 to 96 range to the
      56 to 112 range. The main motivation behind moving this is to address the
      fact that we need to update less frequently, and have more fine grained
      control due to the separate Tx and Rx ITR times.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      i40e/i40evf: Split container ITR into current_itr and target_itr · 556fdfd6
      Alexander Duyck committed
      This patch is mostly prep-work for replacing the current approach to
      programming the dynamic aka adaptive ITR. Specifically here what we are
      doing is splitting the Tx and Rx ITR each into two separate values.
      
      The first value current_itr represents the current value of the register.
      
      The second value target_itr represents the desired value of the register.
      
      The general plan by doing this is to allow for deferring the update of the
      ITR value under certain circumstances. For now we will work with what we
      have, but in the future I hope to change the behavior so that we always
      only update one ITR at a time using some simple logic to determine which
      ITR requires an update.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
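The current_itr/target_itr split described in 556fdfd6 can be pictured with a small userspace sketch. This is not the driver code: the struct name, the fake register variable, and the helper `demo_sync_itr()` are all hypothetical, and the only idea taken from the commit message is that one field mirrors the register while the other holds the desired value, so a write can be deferred or skipped.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-direction ring container. */
struct demo_ring_container {
    uint16_t target_itr;   /* value we would like the register to hold */
    uint16_t current_itr;  /* value last written to the register */
};

/* Hypothetical stand-in for the hardware ITR register. */
uint16_t demo_itr_register;

/*
 * Write the register only when the desired value differs from what the
 * hardware already holds. Returns true if a write happened.
 */
bool demo_sync_itr(struct demo_ring_container *rc)
{
    if (rc->current_itr == rc->target_itr)
        return false;              /* nothing to do, skip the write */

    demo_itr_register = rc->target_itr;
    rc->current_itr = rc->target_itr;
    return true;
}
```

Keeping both values lets the caller decide when a register write is worthwhile, which matches the stated goal of updating only one ITR at a time in later patches.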
    • A
      i40evf: Correctly populate rxitr_idx and txitr_idx · d4942d58
      Alexander Duyck committed
      While testing code for the recent ITR changes I found that updating the Tx
      ITR appeared to have no effect with everything defaulting to the Rx ITR. A
      bit of digging narrowed it down to the fact that we were asking the PF to
      associate all causes with ITR 0 as we weren't populating the itr_idx values
      for either Rx or Tx.
      
      To correct it I have added the configuration for these values to this
      patch. In addition I did some minor clean-up to just add a local pointer
      for the vector map instead of dereferencing it based off of the index
      repeatedly. In my opinion this makes the resultant code a bit more readable
      and saves us a few characters.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      i40e/i40evf: Use usec value instead of reg value for ITR defines · 92418fb1
      Alexander Duyck committed
      Instead of using the register value for the defines when setting up the
      ring ITR we can just use the actual values and avoid the use of shifts and
      macros to translate between the values we have and the values we want.
      
      This helps to make the code more readable as we can quickly translate from
      one value to the other.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
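To illustrate what 92418fb1 means by keeping defines in microseconds rather than register units, here is a rough sketch. It assumes the ITR register counts in 2-microsecond steps. That granularity, the define name, and the helper names are assumptions for illustration and are not stated in the commit message.

```c
/* Hypothetical define: interrupt interval expressed directly in usecs. */
#define DEMO_ITR_20K_USECS 50u   /* 50 us between interrupts */

/* Assumed register granularity: one register unit == 2 usecs. */
unsigned int demo_itr_usecs_to_reg(unsigned int usecs)
{
    return usecs >> 1;
}

unsigned int demo_itr_reg_to_usecs(unsigned int reg)
{
    return reg << 1;
}

/* With usec-based defines, the interrupt rate is easy to read off. */
unsigned int demo_ints_per_sec(unsigned int usecs)
{
    return 1000000u / usecs;
}
```

With the define in microseconds, the conversion to register units happens once at the point of the register write instead of being baked into every constant.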
    • D
      net: make getname() functions return length rather than use int* parameter · 9b2c45d4
      Denys Vlasenko committed
      Changes since v1:
      Added changes in these files:
          drivers/infiniband/hw/usnic/usnic_transport.c
          drivers/staging/lustre/lnet/lnet/lib-socket.c
          drivers/target/iscsi/iscsi_target_login.c
          drivers/vhost/net.c
          fs/dlm/lowcomms.c
          fs/ocfs2/cluster/tcp.c
          security/tomoyo/network.c
      
      Before:
      All these functions either return a negative error indicator,
      or store length of sockaddr into "int *socklen" parameter
      and return zero on success.
      
      "int *socklen" parameter is awkward. For example, if caller does not
      care, it still needs to provide on-stack storage for the value
      it does not need.
      
      None of the many FOO_getname() functions of various protocols
      ever used old value of *socklen. They always just overwrite it.
      
      This change drops this parameter, and makes all these functions, on success,
      return length of sockaddr. It's always >= 0 and can be differentiated
      from an error.
      
      Tests in callers are changed from "if (err)" to "if (err < 0)", where needed.
      
      rpc_sockname() lost "int buflen" parameter, since its only use was
      to be passed to kernel_getsockname() as &buflen and subsequently
      not used in any way.
      
      Userspace API is not changed.
      
          text    data     bss      dec     hex filename
      30108430 2633624  873672 33615726 200ef6e vmlinux.before.o
      30108109 2633612  873672 33615393 200ee21 vmlinux.o
      Signed-off-by: Denys Vlasenko <dvlasenk@redhat.com>
      CC: David S. Miller <davem@davemloft.net>
      CC: linux-kernel@vger.kernel.org
      CC: netdev@vger.kernel.org
      CC: linux-bluetooth@vger.kernel.org
      CC: linux-decnet-user@lists.sourceforge.net
      CC: linux-wireless@vger.kernel.org
      CC: linux-rdma@vger.kernel.org
      CC: linux-sctp@vger.kernel.org
      CC: linux-nfs@vger.kernel.org
      CC: linux-x25@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
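The calling-convention change in 9b2c45d4 can be sketched in plain userspace C. Everything below is hypothetical (`demo_sock`, `demo_sockaddr`, and the `demo_getname_*` functions are invented for illustration and are not kernel APIs). It only contrasts the old "return 0 and store the length through a pointer" shape with the new "return the length" shape, and shows why callers need `if (err < 0)`.

```c
#include <errno.h>

/* Hypothetical address and socket types, for illustration only. */
struct demo_sockaddr { unsigned short family; unsigned short port; };
struct demo_sock { int bound; struct demo_sockaddr addr; };

/* Old shape: 0 on success, length stored through an out-parameter. */
int demo_getname_old(const struct demo_sock *sk,
                     struct demo_sockaddr *out, int *socklen)
{
    if (!sk->bound)
        return -ENOTCONN;
    *out = sk->addr;
    *socklen = (int)sizeof(*out);   /* caller must supply storage */
    return 0;
}

/* New shape: negative errno on failure, address length on success. */
int demo_getname_new(const struct demo_sock *sk, struct demo_sockaddr *out)
{
    if (!sk->bound)
        return -ENOTCONN;
    *out = sk->addr;
    return (int)sizeof(*out);
}

/* A caller that does not care about the length at all. */
int demo_copy_addr(const struct demo_sock *sk, struct demo_sockaddr *out)
{
    int err = demo_getname_new(sk, out);

    if (err < 0)    /* "if (err)" would now treat success as failure */
        return err;
    return 0;
}
```

The caller no longer needs a throwaway on-stack `int` just to satisfy the signature, which is the awkwardness the commit message describes.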
    • A
      i40e/i40evf: Don't bother setting the CLEARPBA bit · 4ff17929
      Alexander Duyck committed
      The CLEARPBA bit in the dynamic interrupt control register actually has
      no effect either way on the hardware. As per errata 28 in the XL710
      specification update, the interrupt is actually cleared any time the
      register is written with the INTENA_MSK bit set to 0. As such the act of
      toggling the enable bit actually will trigger the interrupt being
      cleared and could lead to potential lost events if auto-masking is
      not enabled.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      i40e/i40evf: Clean-up of bits related to using q_vector->reg_idx · 8b99b117
      Alexander Duyck committed
      This patch is a further clean-up related to the change over to using
      q_vector->reg_idx when accessing the ITR registers. Specifically the code
      appears to have several other spots where we were computing the register
      offset manually and this resulted in errors in a few spots.
      
      Specifically in the i40evf functions for mapping queues to vectors it
      appears we may have had an off by 1 error since (v_idx - 1) for the first
      q_vector with an index of 0 would result in us returning -1 if I am not
      mistaken.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      i40e: use changed_flags to check I40E_FLAG_DISABLE_FW_LLDP · fe09ed0e
      Alan Brady committed
      Currently in i40e_set_priv_flags we use new_flags to check for the
      I40E_FLAG_DISABLE_FW_LLDP flag.  This is an issue for a few reasons.
      DISABLE_FW_LLDP is persistent across reboots/driver reloads.  This means
      we need some way to detect if FW LLDP is enabled on init.  We do this by
      trying to init_dcb and if it fails with EPERM we know LLDP is disabled
      in FW.
      
      This could be a problem on older FW versions or NPAR enabled PFs because
      there are situations where the FW could disable LLDP, but they do _not_
      support using this flag to change it.  If we do end up in this
      situation, the flag will be set, then when the user tries to change any
      priv flags, the driver thinks the user is trying to disable FW LLDP on a
      FW that doesn't support it and essentially forbids any priv flag
      changes.
      
      The fix is simple: instead of checking if this flag is set, we should be
      checking if the user is trying to _change_ the flag on unsupported FW
      versions.
      
      This patch also adds a comment explaining that the cmpxchg is the point
      of no return.  Once we put the new flags into pf->flags we can't back
      out.
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
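The difference between testing whether a flag is set and testing whether it changed, as described in fe09ed0e, can be sketched as follows. The flag names and `demo_set_priv_flags()` are hypothetical, and the final store is a plain assignment here, whereas the commit message says the driver uses cmpxchg at that point.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical private flags, for illustration only. */
#define DEMO_FLAG_DISABLE_FW_LLDP (1u << 0)
#define DEMO_FLAG_OTHER           (1u << 1)

/*
 * Reject the request only if the user is trying to *change* the LLDP
 * flag on firmware that cannot toggle it. A flag that was already set
 * at init time and is left untouched must not block other changes.
 */
int demo_set_priv_flags(uint32_t *flags, uint32_t new_flags,
                        bool fw_can_toggle_lldp)
{
    uint32_t orig_flags = *flags;
    uint32_t changed_flags = orig_flags ^ new_flags;  /* bits that differ */

    if ((changed_flags & DEMO_FLAG_DISABLE_FW_LLDP) && !fw_can_toggle_lldp)
        return -EOPNOTSUPP;

    /* Point of no return: after this store we cannot back out. */
    *flags = new_flags;
    return 0;
}
```

Checking `new_flags & DEMO_FLAG_DISABLE_FW_LLDP` instead of `changed_flags` would reject every request on such firmware once the flag had been set during init, which is the failure mode the commit describes.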
    • P
      i40e: Warn when setting link-down-on-close while in MFP · 17b4d25c
      Paweł Jabłoński committed
      This patch adds a warning message when the link-down-on-close flag is
      set on. The warning is printed only on MFP devices.
      Signed-off-by: Paweł Jabłoński <pawel.jablonski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • F
      i40e: Add delay after EMP reset for firmware to recover · 1fa51a65
      Filip Sadowski committed
      This patch adds the necessary delay for 4.33 firmware to recover after
      an EMP reset. Without this patch the driver occasionally reinitializes
      structures too quickly to communicate with firmware after an EMP reset,
      causing AdminQ to time out.
      Signed-off-by: Filip Sadowski <filip.sadowski@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      i40e/i40evf: Clean up logic for adaptive ITR · 71dc3719
      Alexander Duyck committed
      The logic for dynamic ITR update is confusing at best as there were odd
      paths chosen for how to find the rings associated with a given queue based
      on the vector index and other inconsistencies throughout the code.
      
      This patch is an attempt to clean up the logic so that we can more easily
      understand what is going on. Specifically if there is a Rx or Tx ring that
      is enabled in dynamic mode on the q_vector it is allowed to override the
      other side of the interrupt moderation. While that behavior isn't
      correct, all this patch is doing is cleaning up the logic for now so that
      when we come through and fix it we can more easily identify that this is wrong.
      
      The other big change made here is that we replace references to:
      	vsi->rx_rings[q_vector->v_idx]->itr_setting
      with:
      	q_vector->rx.ring->itr_setting
      
      The general idea is we can avoid the long pointer chase since just
      accessing q_vector->rx.ring is a single pointer access versus having to
      chase down vsi->rx_rings, and then finding the pointer in the array, and
      finally chasing down the itr_setting from there.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
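The two access paths contrasted in 71dc3719 can be laid side by side in a small sketch. The struct layouts below are simplified, hypothetical stand-ins (prefixed `demo_`) and are not the real i40e definitions. They exist only to show that both expressions reach the same `itr_setting` when the vector's ring container points at the ring stored at `rx_rings[v_idx]`.

```c
/* Simplified, hypothetical stand-ins for the driver structures. */
struct demo_ring { unsigned short itr_setting; };
struct demo_ring_container { struct demo_ring *ring; };
struct demo_vsi { struct demo_ring **rx_rings; };

struct demo_q_vector {
    struct demo_vsi *vsi;
    unsigned short v_idx;
    struct demo_ring_container rx;
};

/* Old path: go through the VSI, index the ring array, then the ring. */
unsigned short demo_itr_via_vsi(const struct demo_q_vector *q)
{
    return q->vsi->rx_rings[q->v_idx]->itr_setting;
}

/* New path: follow the ring pointer held in the q_vector itself. */
unsigned short demo_itr_via_container(const struct demo_q_vector *q)
{
    return q->rx.ring->itr_setting;
}
```

The second form also avoids relying on the vector index matching the ring index, an assumption the first form makes implicitly.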
    • A
      i40e/i40evf: Only track one ITR setting per ring instead of Tx/Rx · 40588ca6
      Alexander Duyck committed
      The rings are already split out into Tx and Rx rings so it doesn't make
      sense to have any single ring store both a Tx and Rx itr_setting value.
      Since that is the case drop the pair in favor of storing just a single ITR
      value.
      Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • A
      i40e: fix typo in function description · 11a350c9
      Alan Brady committed
      'bufer' should be 'buffer'
      Signed-off-by: Alan Brady <alan.brady@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
  3. 12 Feb, 2018 1 commit
    • L
      vfs: do bulk POLL* -> EPOLL* replacement · a9a08845
      Linus Torvalds committed
      This is the mindless scripted replacement of kernel use of POLL*
      variables as described by Al, done by this script:
      
          for V in IN OUT PRI ERR RDNORM RDBAND WRNORM WRBAND HUP RDHUP NVAL MSG; do
              L=`git grep -l -w POLL$V | grep -v '^t' | grep -v /um/ | grep -v '^sa' | grep -v '/poll.h$'|grep -v '^D'`
              for f in $L; do sed -i "-es/^\([^\"]*\)\(\<POLL$V\>\)/\\1E\\2/" $f; done
          done
      
      with de-mangling cleanups yet to come.
      
      NOTE! On almost all architectures, the EPOLL* constants have the same
      values as the POLL* constants do.  But the keyword here is "almost".
      For various bad reasons they aren't the same, and epoll() doesn't
      actually work quite correctly in some cases due to this on Sparc et al.
      
      The next patch from Al will sort out the final differences, and we
      should be all done.
      Scripted-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. 10 Feb, 2018 8 commits