1. 01 11月, 2017 24 次提交
  2. 29 10月, 2017 9 次提交
    • M
      ipvlan: implement VEPA mode · fe89aa6b
      Mahesh Bandewar 提交于
      This is very similar to the Macvlan VEPA mode, however, there is some
      difference. IPvlan uses the mac-address of the lower device, so the VEPA
      mode has implications of ICMP-redirects for packets destined for its
      immediate neighbors sharing same master since the packets will have same
      source and dest mac. The external switch/router will send redirect msg.
      
      Having said that, this will be useful tool in terms of debugging
      since IPvlan will not switch packets within its slaves and rely completely
      on the external entity as intended in 802.1Qbg.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      fe89aa6b
    • M
      ipvlan: introduce 'private' attribute for all existing modes. · a190d04d
      Mahesh Bandewar 提交于
      IPvlan has always operated in bridge mode. However there are scenarios
      where each slave should be able to talk through the master device but
      not necessarily across each other. Think of an environment where each
      of a namespace is a private and independant customer. In this scenario
      the machine which is hosting these namespaces neither want to tell who
      their neighbor is nor the individual namespaces care to talk to neighbor
      on short-circuited network path.
      
      This patch implements the mode that is very similar to the 'private' mode
      in macvlan where individual slaves can send and receive traffic through
      the master device, just that they can not talk among slave devices.
      Signed-off-by: NMahesh Bandewar <maheshb@google.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a190d04d
    • W
      net: aquantia: Make local functions static · 2660d226
      Wei Yongjun 提交于
      Fixes the following sparse warnings:
      
      drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:224:5: warning:
       symbol 'aq_ethtool_get_coalesce' was not declared. Should it be static?
      drivers/net/ethernet/aquantia/atlantic/aq_ethtool.c:245:5: warning:
       symbol 'aq_ethtool_set_coalesce' was not declared. Should it be static?
      Signed-off-by: NWei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2660d226
    • F
      net: dsa: b53: Export b53_configure_vlan() · 5c1a6eaf
      Florian Fainelli 提交于
      bcm_sf2 and b53 replicate the same operations: clear all VLANs and set
      their ports to the default VLAN tag (1 for these devices) so export the
      b53 function doing just that.
      Signed-off-by: NFlorian Fainelli <f.fainelli@gmail.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      5c1a6eaf
    • F
      liquidio: get rid of false alarm "Unknown cmd 27" in dmesg · 641da8ed
      Felix Manlunas 提交于
      Creating a macvtap interface with the liquidio VF driver as lower device
      causes this alarming message to show up in dmesg:
      
          liquidio_link_ctrl_cmd_completion Unknown cmd 27
      
      That's actually a false alarm because cmd 27 is the value of the macro
      OCTNET_CMD_SET_UC_LIST which is known.  It's a control command sent from
      host to NIC firmware to set the unicast MAC address list of the macvtap
      lower device.
      
      Make the false alarm go away by adding a case for OCTNET_CMD_SET_UC_LIST
      in liquidio_link_ctrl_cmd_completion().
      Signed-off-by: NFelix Manlunas <felix.manlunas@cavium.com>
      Signed-off-by: NRaghu Vatsavayi <raghu.vatsavayi@cavium.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      641da8ed
    • H
      hv_netvsc: Set tx_table to equal weight after subchannels open · a6fb6aa3
      Haiyang Zhang 提交于
      In some cases, like internal vSwitch, the host doesn't provide
      send indirection table updates. This patch sets the table to be
      equal weight after subchannels are all open. Otherwise, all workload
      will be on one TX channel.
      
      As tested, this patch has largely increased the throughput over
      internal vSwitch.
      Signed-off-by: NHaiyang Zhang <haiyangz@microsoft.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      a6fb6aa3
    • M
      ppp: allow usage in namespaces · 90e229ef
      Matteo Croce 提交于
      Check for CAP_NET_ADMIN with ns_capable() instead of capable()
      to allow usage of ppp in user namespace other than the init one.
      Signed-off-by: NMatteo Croce <mcroce@redhat.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      90e229ef
    • A
      cxgb3: Check and handle the dma mapping errors · c69fe407
      Arjun Vynipadath 提交于
      This patch adds checks at approprate places whether *dma_map*() call has
      succeeded or not.
      
      Original Work by: Santosh Rastapur <santosh@chelsio.com>
      Signed-off-by: NArjun Vynipadath <arjun@chelsio.com>
      Signed-off-by: NGanesh Goudar <ganeshgr@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c69fe407
    • F
      r8169: Add support for interrupt coalesce tuning (ethtool -C) · 50970831
      Francois Romieu 提交于
      Kirr: In particular with
      
      	ethtool -C <ifname> rx-usecs 0 rx-frames 0
      
      now it is possible to disable RX delays when NIC usage requires low-latency.
      
      See this thread for context:
      
      	https://www.spinics.net/lists/netdev/msg217665.html
      
      My specific case is that:
      
      We have many computers with gigabit Realtek NICs. For 2 such computers
      connected to a gigabit store-and-forward switch the minimum round-trip
      time for small pings (`ping -i 0 -w 3 -s 56 -q peer`) is ~ 30μs.
      
      However it turned out that when Ethernet frame length transitions 127 ->
      128 bytes (`ping -i 0 -w 3 -s {81 -> 82} -q peer`) the lowest RTT
      transitions step-wise to ~ 270μs.
      
      As David Light said this is RX interrupt mitigation done by NIC which creates
      the latency. For workloads when low-latency is required with e.g. Intel,
      BCM etc NIC drivers one just uses `ethtool -C rx-usecs ...` to reduce
      the time NIC delays before interrupting CPU, but it turned out
      `ethtool -C` is not supported by r8169 driver.
      
      Like Stéphane ANCELOT I've traced the problem down to IntrMitigate being
      hardcoded to != 0 for our chips (we have 8168 based NICs):
      
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n5460
      static void rtl_hw_start_8169(struct net_device *dev) {
              ...
              /*
               * Undocumented corner. Supposedly:
               * (TxTimer << 12) | (TxPackets << 8) | (RxTimer << 4) | RxPackets
               */
              RTL_W16(IntrMitigate, 0x0000);
      
      https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/net/ethernet/realtek/r8169.c#n6346
      static void rtl_hw_start_8168(struct net_device *dev) {
              ...
              RTL_W16(IntrMitigate, 0x5151);
      
      and then I've also found
      
      	https://www.spinics.net/lists/netdev/msg217665.html
      
      and original Francois' patch:
      
      	https://www.spinics.net/lists/netdev/msg217984.html
      	https://www.spinics.net/lists/netdev/msg218207.html
      
      So could we please finally get support for tuning r8169 interrupt
      coalescing in tree? (so that next poor soul who hits the problem does
      not need to go all the way to dig into driver sources and internet
      wildly and finally patch locally
      
              -RTL_W16(IntrMitigate, 0x5151);
              +RTL_W16(IntrMitigate, 0x5100);
      
      guessing whether it is right or not and also having to care to deploy
      the patch everywhere it needs to be used, etc...).
      
      To do so I've took original Francois's patch from 2012 and reworked it a bit:
      
      - updated to latest net-next.git;
      - adjusted scaling setup based on feedback from Hayes to pick up scaling
        vector depending not only on link speed but also on CPlusCmd[0:1] and to
        adjust CPlusCmd[0:1] correspondingly when setting timings;
      - improved a bit (I think so) error handling.
      
      I've tested the patch on "RTL8168d/8111d" (XID 083000c0) and with it and
      `ethtool -C rx-usecs 0 rx-frames 0` on both ends it improves:
      
      - minimum RTT latency:
      
              ~270μs ->  ~30μs (small packet),
              ~330μs -> ~110μs (full 1.5K ethernet frame)
      
      - average RTT latency:
      
              ~480μs ->  ~50μs (small packet),
              ~560μs -> ~125μs (full 1.5K ethernet frame)
      
      ( before:
      
              root@neo1:# ping -i 0 -w 3 -s 82 -q neo2
              PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
      
              --- neo2.kirr.nexedi.com ping statistics ---
              5906 packets transmitted, 5905 received, 0% packet loss, time 2999ms
              rtt min/avg/max/mdev = 0.274/0.485/0.607/0.026 ms, ipg/ewma 0.508/0.489 ms
      
              root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
              PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
      
              --- neo2.kirr.nexedi.com ping statistics ---
              5073 packets transmitted, 5073 received, 0% packet loss, time 2999ms
              rtt min/avg/max/mdev = 0.330/0.566/0.710/0.028 ms, ipg/ewma 0.591/0.544 ms
      
        after:
      
              root@neo1# ping -i 0 -w 3 -s 82 -q neo2
              PING neo2.kirr.nexedi.com (192.168.102.21) 82(110) bytes of data.
      
              --- neo2.kirr.nexedi.com ping statistics ---
              45815 packets transmitted, 45815 received, 0% packet loss, time 3000ms
              rtt min/avg/max/mdev = 0.036/0.051/0.368/0.010 ms, ipg/ewma 0.065/0.053 ms
      
              root@neo1:# ping -i 0 -w 3 -s 1472 -q neo2
              PING neo2.kirr.nexedi.com (192.168.102.21) 1472(1500) bytes of data.
      
              --- neo2.kirr.nexedi.com ping statistics ---
              21250 packets transmitted, 21250 received, 0% packet loss, time 3000ms
              rtt min/avg/max/mdev = 0.112/0.125/0.390/0.007 ms, ipg/ewma 0.141/0.125 ms
      
        the small -> 1.5K latency growth is understandable as it takes ~15μs
        to transmit 1.5K on 1Gbps on the wire and with 2 hosts and 1 switch
        and ICMP ECHO + ECHO reply the packet has to travel 4 ethernet
        segments which is already 60μs;
      
        probably something a bit else is also there as e.g. on Linux, even
        with `cpupower frequency-set -g performance`, on some computers I've
        noticed the kernel can be spending more time in software-only mode
        when incoming packets go in less frequently. E.g. this program can
        demonstrate the effect for ICMP ECHO processing:
      
        https://lab.nexedi.com/kirr/bcc/blob/43cfc13b/tools/pinglat.py
      
        (later this was found to be partly due to C-states exit latencies) )
      
      We have this patch running in our testing setup for 1 months already
      without any issues observed.
      
      It remains to be clarified whether RX and TX timers use the same base.
      For now I've set them equally, but Francois's original patch version
      suggests it could be not the same.
      
      I've got no feedback at all to my original posting of this patch and questions
      
      	https://www.spinics.net/lists/netdev/msg457173.html
      
      neither from Francois, nor from any people from Realtek during one month.
      
      So I suggest we simply apply it to net-next.git now.
      
      Cc: Francois Romieu <romieu@fr.zoreil.com>
      Cc: Hayes Wang <hayeswang@realtek.com>
      Cc: Realtek linux nic maintainers <nic_swsd@realtek.com>
      Cc: David Laight <David.Laight@ACULAB.COM>
      Cc: Stéphane ANCELOT <sancelot@free.fr>
      Cc: Eric Dumazet <edumazet@google.com>
      Signed-off-by: NKirill Smelkov <kirr@nexedi.com>
      Tested-by: NHolger Hoffstätte <holger@applied-asynchrony.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      50970831
  3. 28 10月, 2017 7 次提交
    • G
      tap: reference to KVA of an unloaded module causes kernel panic · dea6e19f
      Girish Moodalbail 提交于
      The commit 9a393b5d ("tap: tap as an independent module") created a
      separate tap module that implements tap functionality and exports
      interfaces that will be used by macvtap and ipvtap modules to create
      create respective tap devices.
      
      However, that patch introduced a regression wherein the modules macvtap
      and ipvtap can be removed (through modprobe -r) while there are
      applications using the respective /dev/tapX devices. These applications
      cause kernel to hold reference to /dev/tapX through 'struct cdev
      macvtap_cdev' and 'struct cdev ipvtap_dev' defined in macvtap and ipvtap
      modules respectively. So,  when the application is later closed the
      kernel panics because we are referencing KVA that is present in the
      unloaded modules.
      
      ----------8<------- Example ----------8<----------
      $ sudo ip li add name mv0 link enp7s0 type macvtap
      $ sudo ip li show mv0 |grep mv0| awk -e '{print $1 $2}'
        14:mv0@enp7s0:
      $ cat /dev/tap14 &
      $ lsmod |egrep -i 'tap|vlan'
      macvtap                16384  0
      macvlan                24576  1 macvtap
      tap                    24576  3 macvtap
      $ sudo modprobe -r macvtap
      $ fg
      cat /dev/tap14
      ^C
      
      <...system panics...>
      BUG: unable to handle kernel paging request at ffffffffa038c500
      IP: cdev_put+0xf/0x30
      ----------8<-----------------8<----------
      
      The fix is to set cdev.owner to the module that creates the tap device
      (either macvtap or ipvtap). With this set, the operations (in
      fs/char_dev.c) on char device holds and releases the module through
      cdev_get() and cdev_put() and will not allow the module to unload
      prematurely.
      
      Fixes: 9a393b5d (tap: tap as an independent module)
      Signed-off-by: NGirish Moodalbail <girish.moodalbail@oracle.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      dea6e19f
    • K
      drivers/net: smsc: Convert timers to use timer_setup() · 267146d4
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Philippe Reynes <tremyfr@gmail.com>
      Cc: Allen Pais <allen.lkml@gmail.com>
      Cc: Tobias Klauser <tklauser@distanz.ch>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      267146d4
    • K
      drivers/net: packetengines: Convert timers to use timer_setup() · 8089c6f4
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Allen Pais <allen.lkml@gmail.com>
      Cc: yuan linyu <Linyu.Yuan@alcatel-sbell.com.cn>
      Cc: Philippe Reynes <tremyfr@gmail.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8089c6f4
    • K
      drivers/net: natsemi: Convert timers to use timer_setup() · 15735c9d
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Allen Pais <allen.lkml@gmail.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Philippe Reynes <tremyfr@gmail.com>
      Cc: Wei Yongjun <weiyongjun1@huawei.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      15735c9d
    • K
      drivers/net: mellanox: Convert timers to use timer_setup() · 0365b047
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: Saeed Mahameed <saeedm@mellanox.com>
      Cc: Matan Barak <matanb@mellanox.com>
      Cc: Leon Romanovsky <leonro@mellanox.com>
      Cc: netdev@vger.kernel.org
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      0365b047
    • K
      drivers/net: korina: Convert timers to use timer_setup() · 34309b36
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Roman Yeryomin <leroi.lists@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      34309b36
    • K
      drivers/net: fealnx: Convert timers to use timer_setup() · 8b3718dc
      Kees Cook 提交于
      In preparation for unconditionally passing the struct timer_list pointer to
      all timer callbacks, switch to using the new timer_setup() and from_timer()
      to pass the timer pointer explicitly.
      
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: "yuval.shaia@oracle.com" <yuval.shaia@oracle.com>
      Cc: Allen Pais <allen.lkml@gmail.com>
      Cc: Stephen Hemminger <stephen@networkplumber.org>
      Cc: Philippe Reynes <tremyfr@gmail.com>
      Cc: Johannes Berg <johannes.berg@intel.com>
      Cc: netdev@vger.kernel.org
      Signed-off-by: NKees Cook <keescook@chromium.org>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      8b3718dc