1. 22 Jun, 2017 24 commits
  2. 21 Jun, 2017 16 commits
    • sock: avoid dirtying incoming_cpu if not needed · 34cfb542
      Paolo Abeni committed
      For a connected socket, the incoming_cpu field in the sock struct
      is not going to change frequently, but we are setting it
      unconditionally for each packet.
      
      Since sk_incoming_cpu and sk_flags share the same cacheline,
      and the latter is accessed by udp_recvmsg(), this causes a cache
      miss for each packet on a connected UDP socket.
      
      With this patch, we set the incoming cpu field only when the
      ingress cpu really changes.
      
      This gives a small but measurable performance improvement for
      connected UDP sockets.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
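The idea behind this commit can be sketched in user-space C. This is a minimal illustration and not the kernel code: `struct sock_sketch` and `incoming_cpu_update_sketch()` are hypothetical names, standing in for the `sk_incoming_cpu` and `sk_flags` fields the message mentions. The point is only that comparing before storing leaves the shared cacheline clean whenever the CPU has not changed.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two fields that share a cacheline. */
struct sock_sketch {
    int incoming_cpu;     /* plays the role of sk_incoming_cpu */
    unsigned long flags;  /* plays the role of sk_flags, read on receive */
};

/*
 * Store the CPU only when it differs from the recorded one.
 * Returns true if a store happened (the cacheline was dirtied).
 */
bool incoming_cpu_update_sketch(struct sock_sketch *sk, int cpu)
{
    if (sk->incoming_cpu == cpu)
        return false;     /* read-only path: no store is issued */
    sk->incoming_cpu = cpu;
    return true;
}
```

For a connected socket that keeps receiving on the same CPU, every call after the first takes the read-only path.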
    • net: introduce SO_PEERGROUPS getsockopt · 28b5ba2a
      David Herrmann committed
      This adds the new getsockopt(2) option SO_PEERGROUPS on SOL_SOCKET to
      retrieve the auxiliary groups of the remote peer. It is designed to
      naturally extend SO_PEERCRED. That is, the underlying data is from the
      same credentials. Regarding its syntax, it is based on SO_PEERSEC. That
      is, if the provided buffer is too small, ERANGE is returned and @optlen
      is updated. Otherwise, the information is copied, @optlen is set to the
      actual size, and 0 is returned.
      
      While SO_PEERCRED (and thus `struct ucred') already returns the primary
      group, it lacks the auxiliary group vector. However, nearly all access
      controls (including kernel side VFS and SYSVIPC, but also user-space
      polkit, DBus, ...) consider the entire set of groups, rather than just
      the primary group. But this is currently not possible with pure
      SO_PEERCRED. Instead, user-space has to work around this and query the
      system database for the auxiliary groups of a UID retrieved via
      SO_PEERCRED.
      
      Unfortunately, there is no race-free way to query the auxiliary groups
      of the PID/UID retrieved via SO_PEERCRED. Hence, the current user-space
      solution is to use getgrouplist(3p), which itself falls back to NSS and
      whatever is configured in nsswitch.conf(5). This effectively checks
      which groups we *would* assign to the user if it logged in *now*. On
      normal systems it is as easy as reading /etc/group, but with NSS it can
      resort to querying network databases (e.g., LDAP), using IPC or network
      communication.
      
      Long story short: Whenever we want to use auxiliary groups for access
      checks on IPC, we need further IPC to talk to the user/group databases,
      rather than just relying on SO_PEERCRED and the incoming socket. This
      is unfortunate, and might even result in dead-locks if the database
      query uses the same IPC as the original request.
      
      So far, those recursions / dead-locks have been avoided by using
      primitive IPC for all crucial NSS modules. However, we want to avoid
      re-inventing the wheel for each NSS module that might be involved in
      user/group queries. Hence, we would preferably make DBus (and other IPC
      that supports access-management based on groups) work without resorting
      to the user/group database. This new SO_PEERGROUPS socket option would
      allow us to make dbus-daemon work without ever calling into NSS.
      
      Cc: Michal Sekletar <msekleta@redhat.com>
      Cc: Simon McVittie <simon.mcvittie@collabora.co.uk>
      Reviewed-by: Tom Gundersen <teg@jklm.no>
      Signed-off-by: David Herrmann <dh.herrmann@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
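A user-space caller might use the ERANGE behaviour described in this commit roughly as follows. This is a sketch under stated assumptions: `peer_groups_sketch()` is a hypothetical helper name, and the fallback value 59 for `SO_PEERGROUPS` is the asm-generic value, which may differ on some architectures and is only used when the system headers predate the option.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <sys/types.h>

#ifndef SO_PEERGROUPS
#define SO_PEERGROUPS 59  /* assumption: asm-generic value, for old headers */
#endif

/*
 * Return a malloc'd array with the peer's auxiliary groups and store the
 * number of entries in *count. Returns NULL with errno set on failure.
 */
gid_t *peer_groups_sketch(int fd, size_t *count)
{
    socklen_t len = 16 * sizeof(gid_t);  /* initial guess */

    for (;;) {
        gid_t *buf = malloc(len ? len : 1);
        socklen_t optlen = len;
        int saved;

        if (!buf)
            return NULL;
        if (getsockopt(fd, SOL_SOCKET, SO_PEERGROUPS, buf, &optlen) == 0) {
            *count = optlen / sizeof(gid_t);  /* optlen is the actual size */
            return buf;
        }
        saved = errno;
        free(buf);
        if (saved != ERANGE) {
            errno = saved;  /* e.g. ENOPROTOOPT on kernels without the option */
            return NULL;
        }
        len = optlen;  /* buffer too small: optlen now holds the needed size */
    }
}
```

The caller owns the returned array and releases it with `free()`. Because the credentials come from the socket itself, no NSS lookup is involved.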
    • udp: prefetch rmem_alloc in udp_queue_rcv_skb() · dd99e425
      Paolo Abeni committed
      During UDP packet processing, if the BH is the bottleneck, it
      always sees a cache miss while updating rmem_alloc; try to
      avoid it by prefetching the value as soon as we have the socket
      available.
      
      Performance under flood with multiple NIC rx queues in use is
      unaffected, but when a single NIC rx queue is in use, this
      gives a ~10% performance improvement.
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
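The prefetch-early pattern from this commit can be illustrated outside the kernel. The sketch below is not the kernel change itself: `struct udp_sock_sketch` and `queue_rcv_sketch()` are hypothetical names, and the GCC/Clang builtin `__builtin_prefetch` is used here as a stand-in for whatever prefetch helper the kernel code calls.

```c
/* Hypothetical stand-in for the socket fields involved. */
struct udp_sock_sketch {
    int rmem_alloc;  /* plays the role of the receive memory counter */
    int rcvbuf;      /* receive buffer limit */
};

/*
 * Account a packet of `size` bytes against the receive buffer.
 * Returns 0 on success, -1 if the buffer limit would be exceeded.
 */
int queue_rcv_sketch(struct udp_sock_sketch *sk, int size)
{
    /* Request the cacheline with write intent as soon as sk is known. */
    __builtin_prefetch(&sk->rmem_alloc, 1);

    /* ... other per-packet work would run here while the line loads ... */

    if (sk->rmem_alloc + size > sk->rcvbuf)
        return -1;
    sk->rmem_alloc += size;
    return 0;
}
```

The prefetch is only a hint; the function behaves identically without it, which is why the gain depends on how much work sits between the hint and the update.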
    • qede: Fix compilation without QED_RDMA · da2e9cf0
      Chad Dupuis committed
      When CONFIG_QED_RDMA isn't defined, we'd hit the following:
      
       /include/linux/qed/qede_rdma.h:84:19:
       warning: ‘qede_rdma_dev_add’ used but never defined [enabled by default]
       static inline int qede_rdma_dev_add(struct qede_dev *dev);
      
      Fixes: bbfcd1e8 ("qed*: Set rdma generic functions prefix")
      Signed-off-by: Chad Dupuis <chad.dupuis@cavium.com>
      Signed-off-by: Yuval Mintz <Yuval.Mintz@cavium.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
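The warning quoted in this commit comes from a `static inline` function that is declared but has no body. A common header pattern that avoids it is sketched below; it is an illustration of the general technique, not the actual patch. The names `CONFIG_QED_RDMA_SKETCH`, `struct qede_dev_sketch` and `qede_rdma_dev_add_sketch()` are hypothetical.

```c
/* Opaque, hypothetical device type. */
struct qede_dev_sketch;

#ifdef CONFIG_QED_RDMA_SKETCH
/* Feature enabled: declare only; the definition lives in another file. */
int qede_rdma_dev_add_sketch(struct qede_dev_sketch *dev);
#else
/* Feature disabled: a static inline stub needs a body of its own. */
static inline int qede_rdma_dev_add_sketch(struct qede_dev_sketch *dev)
{
    (void)dev;
    return 0;  /* nothing to register when the feature is compiled out */
}
#endif
```

Callers can then use the function unconditionally, and the compiler drops the stub when the feature is disabled.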
    • r8152: correct the definition · b65c0c9b
      hayeswang committed
      Replace VLAN_HLEN and CRC_SIZE with ETH_FCS_LEN.
      Signed-off-by: Hayes Wang <hayeswang@realtek.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch '40GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/next-queue · adff41f9
      David S. Miller committed
      Jeff Kirsher says:
      
      ====================
      40GbE Intel Wired LAN Driver Updates 2017-06-20
      
      This series contains updates to i40e and i40evf only.
      
      Björn adds additional XDP support for i40e, by adding pass and drop actions
      and XDP_TX action support.
      
      Jake fixes a possible NULL pointer dereference in
      i40evf_get_ethtool_stats() which could occur if the VF fails to recover
      from a reset, and then a user requests statistics.  Changed the use of
      dev_info() to dev_dbg() for vf_capability client routine so that the
      standard log is not spammed with this information which "might" cause
      administrators to worry.  Also added more code comments to help explain
      why udp_port has to be in host byte order and to avoid future changes which
      may cause this to break.  Fixed the holding of the RTNL lock for the
      entire reset routine, reduced the scope so that the reset function will
      handle its own lock, so that we do not have to wrap every reference
      to i40e_do_reset() with RTNL lock/unlock.
      
      Alice updates flags related to firmware interactions for WoL and admin
      queue command address with the correct value.
      
      Sudheer makes a fix to ensure that the array is not accessed past the
      size of the array.
      
      Greg fixes the parsing of firmware 4.33 admin queue command "Get CEE
      DCBX PER CFG" because the firmware now creates the oper_prio_tc nibbles
      reversed from those in the CDD Priority Group sub-TLV.
      
      Carolyn adds a check and message to let users know that when in MFP mode,
      changing RSS hash input set is not supported.
      
      Shannon makes the partition bandwidth control more generic since it is not
      in just one form of multi-function partitioning (MFP).  Also fixes a bug
      which was causing the firmware confusion in some reset sequences, when
      we were disabling interrupts and we were clearing the whole register.
      Instead we should only be clearing the CAUSE_ENA bit when disabling
      interrupts.
      
      Filip adds support for OEM firmware version, so that if an OEM specific
      adapter is detected, ethtool reports the OEM product version in the
      firmware version string instead of etrack id.
      
      Alan fixes a bug where the driver was not correctly exiting overflow
      promiscuous mode, which can happen if "too many" MAC filters are added,
      putting the driver into overflow promiscuous mode, and the filters are
      then removed.  The bug occurs because the conditional for toggling
      promiscuous mode was only being executed when enabled and not when it was
      disabled.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • Merge branch 'ipmr-ip6mr-add-Netlink-notifications-on-cache-reports' · 2f89a795
      David S. Miller committed
      Julien Gomes says:
      
      ====================
      ipmr/ip6mr: add Netlink notifications on cache reports
      
      Currently, all ipmr/ip6mr cache reports are sent through the
      mroute/mroute6 socket only.
      This forces the use of a single socket for mroute programming, cache
      reports and, regarding ipmr, IGMP messages without Router Alert option
      reception.
      
      The present patches are aiming to send Netlink notifications in addition
      to the existing igmpmsg/mrt6msg to give user programs a way to handle
      cache reports in parallel with multiple sockets other than the
      mroute/mroute6 socket.
      
      Changes in v2:
      - Changed attributes naming from {IPMRA,IP6MRA}_CACHEREPORTA_* to
        {IPMRA,IP6MRA}_CREPORT_*
      - Improved packet data copy to handle non-linear packets in
        ipmr/ip6mr cache report Netlink notification creation
      - Added two rtnetlink groups with restricted-binding
      - Changed cache report notified groups from RTNL_{IPV4,IPV6}_MROUTE to
        the new restricted groups in ipmr/ip6mr
      
      Changes in v3:
      - Put message size calculation for {igmp,mrt6}msg_netlink_event in separate
        functions
      - Increased vif id attributes size from u8 to u32
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ip6mr: add netlink notifications on mrt6msg cache reports · dd12d15c
      Julien Gomes committed
      Add Netlink notifications on cache reports in ip6mr, in addition to the
      existing mrt6msg sent to mroute6_sk.
      Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV6_MROUTE_R.
      
      MSGTYPE, MIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
      same data as their equivalent fields in the mrt6msg header.
      PKT attribute is the packet sent to mroute6_sk, without the added
      mrt6msg header.
      Suggested-by: Ryan Halbrook <halbrook@arista.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • ipmr: add netlink notifications on igmpmsg cache reports · 5a645dd8
      Julien Gomes committed
      Add Netlink notifications on cache reports in ipmr, in addition to the
      existing igmpmsg sent to mroute_sk.
      Send RTM_NEWCACHEREPORT notifications to RTNLGRP_IPV4_MROUTE_R.
      
      MSGTYPE, VIF_ID, SRC_ADDR and DST_ADDR Netlink attributes contain the
      same data as their equivalent fields in the igmpmsg header.
      PKT attribute is the packet sent to mroute_sk, without the added igmpmsg
      header.
      Suggested-by: Ryan Halbrook <halbrook@arista.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: add restricted rtnl groups for ipv4 and ipv6 mroute · 5f729eaa
      Julien Gomes committed
      Add RTNLGRP_{IPV4,IPV6}_MROUTE_R as two new restricted groups for the
      NETLINK_ROUTE family.
      Binding to these groups specifically requires CAP_NET_ADMIN to allow
      multicast of sensitive messages (e.g. mroute cache reports).
      Suggested-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Signed-off-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • rtnetlink: add NEWCACHEREPORT message type · 94df30a6
      Julien Gomes committed
      New NEWCACHEREPORT message type to be used for cache reports sent
      via Netlink, effectively allowing splitting cache report reception from
      mroute programming.
      Suggested-by: Ryan Halbrook <halbrook@arista.com>
      Signed-off-by: Julien Gomes <julien@arista.com>
      Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • tcp: md5: hide unused variable · 083a0326
      Arnd Bergmann committed
      Changing from a memcpy to per-member comparison left the
      size variable unused:
      
      net/ipv4/tcp_ipv4.c: In function 'tcp_md5_do_lookup':
      net/ipv4/tcp_ipv4.c:910:15: error: unused variable 'size' [-Werror=unused-variable]
      
      This does not show up when CONFIG_IPV6 is enabled, but the
      variable can be removed either way, along with the now unused
      assignment.
      
      Fixes: 6797318e ("tcp: md5: add an address prefix for key lookup")
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • idsn: fix wrong skb_put() used · e1d20e22
      yuan linyu committed
      in my commit b952f4df,
      -	*(u8 *)skb_put(skb_out, 1) = (u8)(accm >> 24);	\
      +	skb_put(skb_out, (u8)(accm >> 24));	\
      it should be skb_put_u8()
      
      Fixes: b952f4df ("net: manual clean code which call skb_put_[data:zero])")
      Signed-off-by: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Signed-off-by: David S. Miller <davem@davemloft.net>
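The difference between the two calls in this fix can be shown with a toy buffer. This is a simplified user-space model, not the kernel sk_buff API: `struct buf_sketch`, `buf_put()` and `buf_put_u8()` are hypothetical names that mimic only the "extend by a length" versus "append one byte value" semantics.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-size stand-in for an sk_buff data area. */
struct buf_sketch {
    uint8_t data[64];
    size_t len;
};

/* Like skb_put(): reserve `n` bytes at the tail and return a pointer to them.
 * No bounds checking in this sketch. */
uint8_t *buf_put(struct buf_sketch *b, size_t n)
{
    uint8_t *tail = b->data + b->len;

    b->len += n;
    return tail;
}

/* Like skb_put_u8(): append exactly one byte holding `val`. */
void buf_put_u8(struct buf_sketch *b, uint8_t val)
{
    *buf_put(b, 1) = val;
}
```

Passing a byte value where a length is expected, as in `buf_put(&b, 0x12)`, grows the buffer by 0x12 bytes and stores nothing, which mirrors the mistake the commit corrects.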
    • i40e: don't hold RTNL lock for the entire reset · dfc4ff64
      Jacob Keller committed
      We recently refactored i40e_do_reset() and its friends to be able to
      hold the RTNL lock only for the portions that actually need to be
      protected. However, a separate refactoring added several new callers of
      these functions during the PCIe error recovery and suspend/resume
      cycles.
      
      When merging the changes together, it was not noticed that we could
      reduce the RTNL scope by letting the reset function handle the lock
      itself, as previously it was not possible.
      
      Fix this by replacing these call sites to indicate that the reset
      function should handle its own lock. This enables multiple PFs to reset
      or resume simultaneously without serializing the resets via the RTNL
      lock. The end result is that on systems with lots of PFs and VFs the
      resets don't stall waiting for each other to finish.
      
      It is probable that we can also do the same for i40e_do_reset_safe, but
      this author did not research that change carefully enough to be
      confident.
      Signed-off-by: Jacob Keller <jacob.e.keller@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: Handle PE_CRITERR properly with IWARP enabled · 7642984b
      Catherine Sullivan committed
      When IWARP is enabled, we weren't clearing the PE_CRITERR, just logging
      it and removing it from the mask. We need to do a core reset (CORER) to
      reset the PE_CRITERR register, so set the bit for that as we handle the
      interrupt.
      
      We should also be checking for the error against the PFINT_ICR0 register,
      and only need to clear it in the value getting written to
      PFINT_ICR0_ENA.
      Signed-off-by: Catherine Sullivan <catherine.sullivan@intel.com>
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
    • i40e: clear only cause_ena bit · 2e5c26ea
      Shannon Nelson committed
      When disabling interrupts, we should only be clearing the CAUSE_ENA bit,
      not clearing the whole register.  Clearing the whole register sets the
      NEXTQ_IDX field to 0 instead of 0x7ff which can confuse the Firmware in
      some reset sequences.
      Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>
      Signed-off-by: Mitch Williams <mitch.a.williams@intel.com>
      Tested-by: Andrew Bowers <andrewx.bowers@intel.com>
      Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
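The difference between clearing the whole register and clearing a single bit can be shown with plain bit arithmetic. The sketch below is illustrative only: the bit positions and field width are assumptions made for this example (chosen so that the field can hold 0x7ff) and are not taken from the driver's register definitions, and both function names are hypothetical.

```c
#include <stdint.h>

/* Assumed layout for this sketch only. */
#define NEXTQ_IDX_SHIFT 16
#define NEXTQ_IDX_MASK  (0x7FFu << NEXTQ_IDX_SHIFT)
#define CAUSE_ENA_MASK  (1u << 30)

/* Old behaviour: writing 0 also zeroes the NEXTQ_IDX field. */
uint32_t disable_irq_clear_all_sketch(uint32_t reg)
{
    (void)reg;
    return 0;
}

/* New behaviour: read-modify-write that clears only CAUSE_ENA. */
uint32_t disable_irq_clear_cause_ena_sketch(uint32_t reg)
{
    return reg & ~CAUSE_ENA_MASK;
}
```

With a starting value that has NEXTQ_IDX set to 0x7ff and CAUSE_ENA set, the second function leaves NEXTQ_IDX at 0x7ff while the first resets it to 0.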