1. 10 Aug, 2021 1 commit
  2. 05 Aug, 2021 1 commit
  3. 04 Aug, 2021 1 commit
    • net: add netif_set_real_num_queues() for device reconfig · 271e5b7d
      Committed by Jakub Kicinski
      netif_set_real_num_rx_queues() and netif_set_real_num_tx_queues()
      can fail which breaks drivers trying to implement reconfiguration
      in a way that can't leave the device half-broken. In other words
      those functions are incompatible with prepare/commit approach.
      
      Luckily setting real number of queues can fail only if the number
      is increased, meaning that if we order operations correctly we
      can guarantee ending up with either new config (success), or
      the old one (on error).
      
      Provide a helper implementing such logic so that drivers don't
      have to duplicate it.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
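
The ordering argument in this commit can be sketched in user space. The following is a minimal model, not the kernel helper itself: `struct mock_dev`, the `mock_*` functions and the `fail_tx_increase` test knob are hypothetical names introduced for illustration. It assumes, as the commit message states, that only an increase can fail.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a net device; only the fields this sketch needs. */
struct mock_dev {
	unsigned int num_tx, num_rx;           /* allocated queues (upper bound) */
	unsigned int real_num_tx, real_num_rx; /* currently active queues */
	int fail_tx_increase;                  /* test knob: make a TX increase fail */
};

/* Mock setters: as in the commit message, only an increase may fail. */
static int mock_set_rx(struct mock_dev *d, unsigned int n)
{
	d->real_num_rx = n;
	return 0;
}

static int mock_set_tx(struct mock_dev *d, unsigned int n)
{
	if (n > d->real_num_tx && d->fail_tx_increase)
		return -ENOMEM;
	d->real_num_tx = n;
	return 0;
}

/* Do the increases first (they may fail), the decreases last (they cannot). */
static int mock_set_real_num_queues(struct mock_dev *d,
				    unsigned int txq, unsigned int rxq)
{
	unsigned int old_rx = d->real_num_rx;
	int err, undo;

	if (txq < 1 || txq > d->num_tx || rxq < 1 || rxq > d->num_rx)
		return -EINVAL;

	if (rxq > d->real_num_rx) {
		err = mock_set_rx(d, rxq);
		if (err)
			return err;	/* nothing has changed yet */
	}
	if (txq > d->real_num_tx) {
		err = mock_set_tx(d, txq);
		if (err) {
			/* Restoring RX is a no-op or a decrease, so it cannot fail. */
			undo = mock_set_rx(d, old_rx);
			assert(undo == 0);
			(void)undo;
			return err;
		}
	}
	if (rxq < d->real_num_rx)
		mock_set_rx(d, rxq);
	if (txq < d->real_num_tx)
		mock_set_tx(d, txq);
	return 0;
}
```

With this ordering the caller ends up with either the full new configuration or the untouched old one, which is what a prepare/commit style reconfiguration needs.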
  4. 03 Aug, 2021 1 commit
  5. 29 Jul, 2021 1 commit
  6. 28 Jul, 2021 5 commits
    • net: bonding: move ioctl handling to private ndo operation · 3d9d00bd
      Committed by Arnd Bergmann
      All other user triggered operations are gone from ndo_ioctl, so move
      the SIOCBOND family into a custom operation as well.
      
      The .ndo_ioctl() helper is no longer called by the dev_ioctl.c code now,
      but there are still a few definitions in obsolete wireless drivers as well
      as the appletalk and ieee802154 layers to call SIOCSIFADDR/SIOCGIFADDR
      helpers from inside the kernel.
      
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: split out ndo_siowandev ioctl · ad7eab2a
      Committed by Arnd Bergmann
      In order to further reduce the scope of ndo_do_ioctl(), move
      out the SIOCWANDEV handling into a new network device operation
      function.
      
      Adjust the prototype to only pass the if_settings sub-structure
      in place of the ifreq, and remove the redundant 'cmd' argument
      in the process.
      
      Cc: Krzysztof Halasa <khc@pm.waw.pl>
      Cc: "Jan \"Yenya\" Kasprzak" <kas@fi.muni.cz>
      Cc: Kevin Curtis <kevin.curtis@farsite.co.uk>
      Cc: Zhao Qiang <qiang.zhao@nxp.com>
      Cc: Martin Schiller <ms@dev.tdt.de>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: linux-x25@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dev_ioctl: split out ndo_eth_ioctl · a7605370
      Committed by Arnd Bergmann
      Most users of ndo_do_ioctl are ethernet drivers that implement
      the MII commands SIOCGMIIPHY/SIOCGMIIREG/SIOCSMIIREG, or hardware
      timestamping with SIOCSHWTSTAMP/SIOCGHWTSTAMP.
      
      Separate these from the few drivers that use ndo_do_ioctl to
      implement SIOCBOND, SIOCBR and SIOCWANDEV commands.
      
      This is a purely cosmetic change intended to help readers find
      their way through the implementation.
      
      Cc: Doug Ledford <dledford@redhat.com>
      Cc: Jason Gunthorpe <jgg@ziepe.ca>
      Cc: Jay Vosburgh <j.vosburgh@gmail.com>
      Cc: Veaceslav Falico <vfalico@gmail.com>
      Cc: Andy Gospodarek <andy@greyhouse.net>
      Cc: Andrew Lunn <andrew@lunn.ch>
      Cc: Vivien Didelot <vivien.didelot@gmail.com>
      Cc: Florian Fainelli <f.fainelli@gmail.com>
      Cc: Vladimir Oltean <olteanv@gmail.com>
      Cc: Leon Romanovsky <leon@kernel.org>
      Cc: linux-rdma@vger.kernel.org
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Acked-by: Jason Gunthorpe <jgg@nvidia.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • dev_ioctl: pass SIOCDEVPRIVATE data separately · a554bf96
      Committed by Arnd Bergmann
      The compat handlers for SIOCDEVPRIVATE are incorrect for any driver that
      passes data as part of struct ifreq rather than as an ifr_data pointer, or
      that passes data back this way, since the compat_ifr_data_ioctl() helper
      overwrites the ifr_data pointer and does not copy anything back out.
      
      Since all drivers using devprivate commands are now converted to the
      new .ndo_siocdevprivate callback, fix this by adding the missing piece
      and passing the pointer separately the whole way.
      
      This further unifies the native and compat logic for socket ioctls,
      as the new code now passes the correct pointer as well as the correct
      data for both native and compat ioctls.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: split out SIOCDEVPRIVATE handling from dev_ioctl · b9067f5d
      Committed by Arnd Bergmann
      SIOCDEVPRIVATE ioctl commands are mainly used in really old
      drivers, and they have a number of problems:
      
      - They hide behind the normal .ndo_do_ioctl function that
        is also used for other things in modern drivers, so it's
        hard to spot a driver that actually uses one of these.
      
      - Since drivers use a number of different calling conventions,
        it is impossible to support compat mode for them in
        a generic way.
      
      - With all drivers using the same 16 command codes, there
        is no way to introspect the data being passed through
        things like strace.
      
      Add a new net_device_ops callback pointer, to address the
      first two of these. Separating them from .ndo_do_ioctl
      makes it easy to grep for drivers with a .ndo_siocdevprivate
      callback, and the unwieldy name hopefully makes it easier
      to spot in code review.
      
      By passing the ifreq structure and the ifr_data pointer
      separately, it is no longer necessary to overload these,
      and the driver can use either one for a given command.
      
      Cc: Cong Wang <cong.wang@bytedance.com>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
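
The split described in this commit can be illustrated with a small user-space model. This is only a sketch under stated assumptions: `struct mock_ifreq`, `struct mock_ops` and the `mock_*` functions are hypothetical, simplified stand-ins rather than the kernel's types, and `MOCK_SIOCDEVPRIVATE` mirrors the 16-command range starting at 0x89F0 that `<linux/sockios.h>` reserves for SIOCDEVPRIVATE.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Start of the 16 device-private ioctl numbers (0x89F0..0x89FF). */
#define MOCK_SIOCDEVPRIVATE 0x89F0

/* Hypothetical, simplified request structure. */
struct mock_ifreq {
	char name[16];
	void *ifr_data;
};

struct mock_ops {
	/* Catch-all callback used for everything else. */
	int (*do_ioctl)(struct mock_ifreq *ifr, int cmd);
	/* Dedicated callback: the request and the data pointer arrive separately. */
	int (*siocdevprivate)(struct mock_ifreq *ifr, void *data, int cmd);
};

/* Route private commands to their own callback, all others to the catch-all. */
static int mock_dev_ioctl(const struct mock_ops *ops, struct mock_ifreq *ifr,
			  void *data, int cmd)
{
	assert(ops != NULL);

	if (cmd >= MOCK_SIOCDEVPRIVATE && cmd <= MOCK_SIOCDEVPRIVATE + 15) {
		if (!ops->siocdevprivate)
			return -EOPNOTSUPP;
		return ops->siocdevprivate(ifr, data, cmd);
	}
	if (!ops->do_ioctl)
		return -EOPNOTSUPP;
	return ops->do_ioctl(ifr, cmd);
}

/* Example driver callback that uses only the separately passed pointer. */
static int mock_priv(struct mock_ifreq *ifr, void *data, int cmd)
{
	(void)ifr;
	if (!data)
		return -EINVAL;
	*(int *)data = cmd - MOCK_SIOCDEVPRIVATE; /* report which command ran */
	return 0;
}
```

Because the private range has its own slot in the ops table, a driver that handles these commands can be found by searching for that one callback name.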
  7. 23 Jul, 2021 3 commits
    • net: socket: rework compat_ifreq_ioctl() · 29c49648
      Committed by Arnd Bergmann
      compat_ifreq_ioctl() is one of the last users of copy_in_user() and
      compat_alloc_user_space(), as it attempts to convert the 'struct ifreq'
      arguments from 32-bit to 64-bit format as used by dev_ioctl() and a
      couple of socket family specific interpretations.
      
      The current implementation works correctly when calling dev_ioctl(),
      inet_ioctl(), ieee802154_sock_ioctl(), atalk_ioctl(), qrtr_ioctl()
      and packet_ioctl(). The ioctl handlers for ax25, netrom, rose and x25 do
      not interpret the arguments and only block the corresponding commands,
      so they do not care.
      
      For af_inet6 and af_decnet however, the compat conversion is slightly
      incorrect, as it will copy more data than the native handler accesses:
      both of them use a structure that is shorter than ifreq.
      
      Replace the copy_in_user() conversion with a pair of accessor functions
      to read and write the ifreq data in place with the correct length where
      needed, while leaving the other ones to copy the (already compatible)
      structures directly.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socket: simplify dev_ifconf handling · 876f0bf9
      Committed by Arnd Bergmann
      The dev_ifconf() calling conventions make compat handling
      more complicated than necessary. Simplify this by moving
      the in_compat_syscall() check into the function.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: socket: remove register_gifconf · b0e99d03
      Committed by Arnd Bergmann
      Since dynamic registration of the gifconf() helper is only used for
      IPv4, and this can not be in a loadable module, this can be simplified
      noticeably by turning it into a direct function call as a preparation
      for cleaning up the compat handling.
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. 08 Jul, 2021 1 commit
  9. 26 Jun, 2021 1 commit
    • dev_forward_skb: do not scrub skb mark within the same name space · ff70202b
      Committed by Nicolas Dichtel
      The goal is to keep the mark during a bpf_redirect(), like it is done for
      legacy encapsulation / decapsulation, when there is no x-netns.
      This was initially done in commit 213dd74a ("skbuff: Do not scrub skb
      mark within the same name space").
      
      When the call to skb_scrub_packet() was added in dev_forward_skb() (commit
      8b27f277 ("skb: allow skb_scrub_packet() to be used by tunnels")), the
      second argument (xnet) was set to true to force a call to skb_orphan(). At
      that time, the mark was always cleaned up by skb_scrub_packet(), whatever
      the value of xnet was.
      This call to skb_orphan() was removed later in commit
      9c4c3252 ("skbuff: preserve sock reference when scrubbing the skb.").
      But this 'true' stayed here without any real reason.
      
      Let's correctly set xnet in ____dev_forward_skb(); this function has access
      to both the previous interface and the new interface.
      Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
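
The effect of computing xnet from the two interfaces can be sketched as follows. This is a simplified user-space model: the `mock_*` types and functions are hypothetical, and the scrub function models only the mark handling discussed in the commit message, not the other fields the real skb_scrub_packet() resets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for a network namespace, a device and a packet. */
struct mock_net { int id; };
struct mock_dev { struct mock_net *net; };
struct mock_skb {
	uint32_t mark;
	struct mock_dev *dev;	/* interface the packet currently belongs to */
};

/* Model of the mark handling only: cleared when crossing namespaces. */
static void mock_scrub_packet(struct mock_skb *skb, bool xnet)
{
	if (!xnet)
		return;
	skb->mark = 0;
}

/* Forward to another device, deriving xnet from the old and new interface. */
static void mock_forward(struct mock_skb *skb, struct mock_dev *to)
{
	bool xnet;

	assert(skb && skb->dev && to);
	xnet = skb->dev->net != to->net;
	mock_scrub_packet(skb, xnet);
	skb->dev = to;
}
```

Passing a hard-coded `true` instead of the computed value would clear the mark on every forward, which is the behaviour the commit removes for the same-namespace case.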
  10. 04 Jun, 2021 1 commit
    • mlx5: count all link events · 490dceca
      Committed by Jakub Kicinski
      mlx5 devices were observed generating MLX5_PORT_CHANGE_SUBTYPE_ACTIVE
      events without an intervening MLX5_PORT_CHANGE_SUBTYPE_DOWN. This
      breaks link flap detection based on Linux carrier state transition
      count as netif_carrier_on() does nothing if carrier is already on.
      Make sure we count such events.
      
      netif_carrier_event() increments the counters and fires the linkwatch
      events. The latter is not necessary for the use case but seems like
      the right thing to do.
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
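
Why a repeated "active" event goes uncounted, and what the new helper changes, can be shown with a small model. The names below (`struct mock_link`, `mock_carrier_on`, `mock_carrier_event`) are hypothetical. The sketch assumes that "increments the counters" means both the up and the down transition counters, and it leaves out the linkwatch notification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical carrier state with transition counters. */
struct mock_link {
	bool carrier;
	unsigned int up_count;
	unsigned int down_count;
};

/* Like netif_carrier_on() in the commit text: a no-op if already on. */
static void mock_carrier_on(struct mock_link *l)
{
	assert(l);
	if (l->carrier)
		return;		/* already up: no transition is counted */
	l->carrier = true;
	l->up_count++;
}

/* Count a flap the device reported even though the state never changed. */
static void mock_carrier_event(struct mock_link *l)
{
	assert(l);
	l->down_count++;
	l->up_count++;
}
```

A driver that receives an "active" event while the carrier is already on would call the event helper rather than the plain "on" function, so that flap detection based on the counters still sees the transition.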
  11. 08 Apr, 2021 1 commit
  12. 06 Apr, 2021 1 commit
  13. 25 Mar, 2021 7 commits
  14. 24 Mar, 2021 1 commit
    • net: make unregister netdev warning timeout configurable · 5aa3afe1
      Committed by Dmitry Vyukov
      netdev_wait_allrefs() issues a warning if refcount does not drop to 0
      after 10 seconds. While a 10-second wait generally should not happen
      under a normal workload in a normal environment, the warning seems to
      fire falsely very often during fuzzing and/or in qemu emulation (~10x
      slower). At the very least, it is not possible to tell whether it is
      really a false positive or not. Automated testing generally bumps all
      timeouts to very high values to avoid flaky failures.
      Add net.core.netdev_unregister_timeout_secs sysctl to make
      the timeout configurable for automated testing systems.
      Lowering the timeout may also be useful for e.g. manual bisection.
      The default value matches the current behavior.
      Signed-off-by: Dmitry Vyukov <dvyukov@google.com>
      Fixes: https://bugzilla.kernel.org/show_bug.cgi?id=211877
      Cc: netdev@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: David S. Miller <davem@davemloft.net>
  15. 23 Mar, 2021 3 commits
    • net: set initial device refcount to 1 · add2d736
      Committed by Eric Dumazet
      When adding CONFIG_PCPU_DEV_REFCNT, I forgot that the
      initial net device refcount was 0.
      
      When CONFIG_PCPU_DEV_REFCNT is not set, this means
      the first dev_hold() triggers an illegal refcount
      operation (addition on 0)
      
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 0 PID: 1 at lib/refcount.c:25 refcount_warn_saturate+0x128/0x1a4
      
      Fix is to change initial (and final) refcount to be 1.
      
      Also add a missing kerneldoc piece, as reported by
      Stephen Rothwell.
      
      Fixes: 919067cc ("net: add CONFIG_PCPU_DEV_REFCNT")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Reported-by: Guenter Roeck <groeck@google.com>
      Tested-by: Guenter Roeck <groeck@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
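
The "addition on 0" problem and the fix can be illustrated with a tiny user-space model of a checked reference counter. The `mock_*` names are hypothetical and the behaviour is simplified: the model only records that a warning would have fired, whereas the kernel's refcount_t also saturates the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical checked counter: incrementing from 0 is treated as a bug. */
struct mock_refcount {
	unsigned int refs;
	bool warned;
};

static void mock_refcount_inc(struct mock_refcount *r)
{
	assert(r);
	if (r->refs == 0) {
		/* Corresponds to "refcount_t: addition on 0; use-after-free." */
		r->warned = true;
		return;
	}
	r->refs++;
}

static void mock_refcount_dec(struct mock_refcount *r)
{
	assert(r && r->refs > 0);
	r->refs--;
}

/* With an initial value of 1, "no users left" means the count is back at 1. */
static bool mock_dev_is_idle(const struct mock_refcount *r)
{
	return r->refs == 1;
}
```

Starting the counter at 1 keeps the first hold a legal increment, and the idle check compares against 1 instead of 0, matching the "initial (and final) refcount" wording of the commit.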
    • net: move the ptype_all and ptype_base declarations to include/linux/netdevice.h · 744b8376
      Committed by Vladimir Oltean
      ptype_all and ptype_base are declared in net/core/dev.c as non-static,
      because they are used by net-procfs.c too. However, a "make W=1" build
      complains that there was no previous declaration of ptype_all and
      ptype_base in a header file, so this way of declaring things constitutes
      a violation of coding style.
      
      Let's move the extern declarations of ptype_all and ptype_base to the
      linux/netdevice.h file, which is included by net-procfs.c too.
      Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • netdev: add netdev_queue_set_dql_min_limit() · f57bac3c
      Committed by Vincent Mailhol
      Add a function to set the dynamic queue limit minimum value.
      
      Some specific drivers might have legitimate reasons to configure
      dql.min_limit to a given value. Typically, this is the case when the
      PDU of the protocol is smaller than the packet size used to
      carry those frames to the device.

      Concrete example: a CAN (Controller Area Network) device with a USB 2.0
      interface. The PDUs of the classical CAN protocol are roughly 16 bytes, but
      the USB packet size (which is used to carry the CAN frames to the
      device) might be up to 512 bytes. When a small traffic burst occurs, the BQL
      algorithm is not able to adjust immediately, and this would result in
      having to send many small USB packets (i.e. a packet of 16 bytes for each
      CAN frame). Filling up the USB packet with CAN frames is relatively
      fast (small latency issue) but the gain of not having to send several
      small USB packets is huge (big throughput increase). In this case,
      forcing dql.min_limit to a given value that allows the USB packet to be
      filled is always a win.

      This function is to be used by network drivers which are able to prove
      through a rationale and through empirical tests on several environments
      (with other applications, heavy context switching, virtualization...),
      that they consistently reach better performance with a specific
      predefined dql.min_limit value with no noticeable latency impact.
      Signed-off-by: Vincent Mailhol <mailhol.vincent@wanadoo.fr>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  16. 20 Mar, 2021 1 commit
    • net: add CONFIG_PCPU_DEV_REFCNT · 919067cc
      Committed by Eric Dumazet
      I was working on a syzbot issue, claiming one device could not be
      dismantled because its refcount was -1
      
      unregister_netdevice: waiting for sit0 to become free. Usage count = -1
      
      It would be nice if syzbot could trigger a warning at the time
      this reference count became negative.
      
      This patch adds a CONFIG_PCPU_DEV_REFCNT option, which defaults
      to per-cpu variables (as before this patch) on SMP builds.
      
      v2: free_dev label in alloc_netdev_mqs() is moved to avoid
          a compiler warning (-Wunused-label), as reported
          by kernel test robot <lkp@intel.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  17. 19 Mar, 2021 3 commits
  18. 18 Mar, 2021 1 commit
    • net: fix race between napi kthread mode and busy poll · cb038357
      Committed by Wei Wang
      Currently, napi_thread_wait() checks for NAPI_STATE_SCHED bit to
      determine if the kthread owns this napi and could call napi->poll() on
      it. However, if socket busy poll is enabled, it is possible that the
      busy poll thread grabs this SCHED bit (after the previous napi->poll()
      invokes napi_complete_done() and clears SCHED bit) and tries to poll
      on the same napi. napi_disable() could grab the SCHED bit as well.
      This patch tries to fix this race by adding a new bit
      NAPI_STATE_SCHED_THREADED in napi->state. This bit gets set in
      ____napi_schedule() if the threaded mode is enabled, and gets cleared
      in napi_complete_done(), and we only poll the napi in kthread if this
      bit is set. This helps distinguish the ownership of the napi between
      kthread and other scenarios and fixes the race issue.
      
      Fixes: 29863d41 ("net: implement threaded-able napi poll loop support")
      Reported-by: Martin Zaharinov <micron10@gmail.com>
      Suggested-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Wei Wang <weiwan@google.com>
      Cc: Alexander Duyck <alexanderduyck@fb.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: Hannes Frederic Sowa <hannes@stressinduktion.org>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
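
The ownership problem described in this commit can be modelled with plain bit flags. The sketch below is single-threaded and non-atomic, so it only illustrates the state logic, not the real synchronisation. All `MOCK_*` and `mock_*` names are hypothetical stand-ins for the NAPI state bits and functions named in the commit message.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_SCHED          0x1u /* somebody owns the napi for polling */
#define MOCK_THREADED       0x2u /* threaded mode is enabled */
#define MOCK_SCHED_THREADED 0x4u /* the owner is the kthread */

struct mock_napi { unsigned int state; };

/* Try to take ownership; fails if SCHED is already held. */
static bool mock_grab_sched(struct mock_napi *n)
{
	assert(n);
	if (n->state & MOCK_SCHED)
		return false;
	n->state |= MOCK_SCHED;
	return true;
}

/* Scheduling path: in threaded mode, also mark the kthread as the owner. */
static bool mock_napi_schedule(struct mock_napi *n)
{
	if (!mock_grab_sched(n))
		return false;
	if (n->state & MOCK_THREADED)
		n->state |= MOCK_SCHED_THREADED;
	return true;
}

/* Busy poll takes SCHED only; it never sets the kthread ownership bit. */
static bool mock_busy_poll_grab(struct mock_napi *n)
{
	return mock_grab_sched(n);
}

/* Completion releases both bits. */
static void mock_napi_complete(struct mock_napi *n)
{
	assert(n);
	n->state &= ~(MOCK_SCHED | MOCK_SCHED_THREADED);
}

/* Old check: SCHED alone cannot tell the kthread apart from busy poll. */
static bool mock_kthread_may_poll_old(const struct mock_napi *n)
{
	return (n->state & MOCK_SCHED) != 0;
}

/* New check: poll from the kthread only when it actually owns the napi. */
static bool mock_kthread_may_poll(const struct mock_napi *n)
{
	return (n->state & MOCK_SCHED_THREADED) != 0;
}
```

After a completion followed by a busy-poll grab, the old check would still let the kthread poll, while the new check does not, which is the distinction the extra bit is meant to provide.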
  19. 17 Mar, 2021 1 commit
  20. 04 Mar, 2021 1 commit
  21. 25 Feb, 2021 3 commits
  22. 13 Feb, 2021 1 commit