1. 03 Oct, 2019 10 commits
    • R
      Clean up the net/caif/Kconfig menu · 0903102f
      rd.dunlab@gmail.com committed
      Clean up the net/caif/Kconfig menu:
      - remove extraneous space
      - minor language tweaks
      - fix punctuation
      Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • E
      net_sched: remove need_resched() from qdisc_run() · b60fa1c5
      Eric Dumazet committed
      The introduction of this schedule point was done in commit
      2ba2506c ("[NET]: Add preemption point in qdisc_run")
      at a time the loop was not bounded.
      
      Then later in commit d5b8aa1d ("net_sched: fix dequeuer fairness")
      we added a limit on the number of packets.
      
      Now is the time to remove the schedule point, since the default
      limit of 64 packets matches the number of packets a typical NAPI
      poll can process in a row.
      
      This solves a latency problem for most TCP receivers under moderate load:
      
      1) host receives a packet.
         NET_RX_SOFTIRQ is raised by NIC hard IRQ handler
      
      2) __do_softirq() does its first loop, handling NET_RX_SOFTIRQ
         and calling the driver napi->poll() function

      3) TCP stores the skb in the socket receive queue.

      4) TCP calls sk->sk_data_ready() and wakes up a user thread
         waiting for EPOLLIN (as a result, need_resched() might now be true)

      5) TCP cooks an ACK and sends it.

      6) qdisc_run() processes one packet from the qdisc, sees that
         need_resched() is true, and raises NET_TX_SOFTIRQ (even if there
         are no more packets in the qdisc)
      
      Then we go back to the __do_softirq() in 2), and we see that new
      softirqs were raised. Since need_resched() is true, we end up waking
      ksoftirqd in this path:
      
          if (pending) {
                  if (time_before(jiffies, end) && !need_resched() &&
                      --max_restart)
                          goto restart;
      
                  wakeup_softirqd();
          }
      
      So we have many wakeups of ksoftirqd kernel threads,
      and more calls to qdisc_run() with associated lock overhead.
      
      Note that another way to solve the issue would be to change TCP
      to first send the ACK packet, then signal the EPOLLIN,
      but this changes P99 latencies, as sending the ACK packet
      can add a long delay.
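
The sequence in steps 1) to 6) can be approximated with a small user-space model. This is only a sketch under simplifying assumptions, not kernel code: every `model_*` name is hypothetical, each received packet is assumed to wake a user thread and queue exactly one ACK, and the `time_before()` and `max_restart` conditions of `__do_softirq()` are left out.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_QDISC_QUOTA 64 /* per-run packet limit named in the text */

/*
 * Toy qdisc_run(): send queued packets until the queue is empty or the
 * budget runs out. Returns true when NET_TX_SOFTIRQ would be raised again.
 * check_resched selects the old behaviour (true) or the new one (false).
 */
static bool model_qdisc_run(int *qlen, bool need_resched, bool check_resched)
{
	int quota = MODEL_QDISC_QUOTA;

	while (*qlen > 0) {
		(*qlen)--; /* transmit one packet */
		if (--quota <= 0)
			return true; /* budget exhausted */
		if (check_resched && need_resched)
			return true; /* old code: reschedule even if the queue is now empty */
	}
	return false;
}

/* Count ksoftirqd wakeups while receiving `packets` packets, one ACK each. */
int model_ksoftirqd_wakeups(int packets, bool check_resched)
{
	int wakeups = 0;

	assert(packets >= 0);
	for (int i = 0; i < packets; i++) {
		bool need_resched = true; /* step 4: a user thread was woken */
		int qlen = 1;             /* step 5: one ACK is queued */
		bool pending = model_qdisc_run(&qlen, need_resched, check_resched);

		/* __do_softirq(): softirqs are pending and need_resched() is set */
		if (pending && need_resched)
			wakeups++; /* wakeup_softirqd() */
	}
	return wakeups;
}
```

In this model, calling `model_ksoftirqd_wakeups(1000, true)` counts one wakeup per received packet, while `model_ksoftirqd_wakeups(1000, false)` counts none, which is the effect the commit message describes.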
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: dsa: Remove unused __DSA_SKB_CB macro · 37048e94
      Vladimir Oltean committed
      The struct __dsa_skb_cb is supposed to span the entire 48-byte skb
      control block, while the struct dsa_skb_cb spans only the portion of it
      that is used by the DSA core (the rest is available as private data to
      drivers).
      
      The DSA_SKB_CB and __DSA_SKB_CB helpers are supposed to help retrieve
      this pointer based on a skb, but it turns out there is nobody directly
      interested in the struct __dsa_skb_cb in the kernel. So remove it.
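
The relationship between the two views of the control block can be sketched in user space as follows. The `model_*` types and their fields are hypothetical stand-ins, not the kernel definitions; only the 48-byte size comes from the text. The sketch computes pointers only and never dereferences them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_CB_SIZE 48 /* skb control block size stated in the text */

/* Hypothetical stand-in for struct sk_buff: only the control block. */
struct model_skb {
	_Alignas(void *) unsigned char cb[MODEL_CB_SIZE];
};

/* Part of the control block owned by the core (illustrative fields). */
struct model_core_cb {
	void *clone;
	bool deferred;
};

/* Full-size view: core part first, remainder is driver-private. */
struct model_full_cb {
	struct model_core_cb core;
	unsigned char priv[MODEL_CB_SIZE - sizeof(struct model_core_cb)];
};

static_assert(sizeof(struct model_full_cb) == MODEL_CB_SIZE,
	      "full view must cover the whole control block");

/* Pointer to the core-owned part of the control block. */
struct model_core_cb *model_core_cb(struct model_skb *skb)
{
	return (struct model_core_cb *)skb->cb;
}

/* Pointer to the driver-private remainder of the control block. */
void *model_cb_priv(struct model_skb *skb)
{
	return skb->cb + offsetof(struct model_full_cb, priv);
}

/* Number of private bytes left for drivers. */
size_t model_cb_priv_size(void)
{
	return MODEL_CB_SIZE - offsetof(struct model_full_cb, priv);
}
```

The full-size struct is needed only to compute the offset and size of the private area; callers work with the core view or the private pointer, which fits the observation that nothing uses the full-size struct directly.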
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'sja1105-cleanups' · b74d402e
      David S. Miller committed
      Vladimir Oltean says:
      
      ====================
      SJA1105 DSA coding style cleanup
      
      This series provides some mechanical cleanup patches related to function
      names and prototypes.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: dsa: sja1105: Rename sja1105_spi_send_packed_buf to sja1105_xfer_buf · 1bd44870
      Vladimir Oltean committed
      The most commonly called function in the driver is long overdue for a
      rename. The "packed" word is redundant (it doesn't make sense to
      transfer an unpacked structure, since that is in CPU endianness yadda
      yadda), and the "spi" word is also redundant since argument 2 of the
      function is SPI_READ or SPI_WRITE.
      
      As for the sja1105_spi_send_long_packed_buf function, it is only being
      used from sja1105_spi.c, so remove its global prototype.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: dsa: sja1105: Replace sja1105_spi_send_int with sja1105_xfer_{u32, u64} · dff79620
      Vladimir Oltean committed
      Having a function that takes a variable number of unpacked bytes which
      it generically calls an "int" is confusing and makes auditing patches
      next to impossible.
      
      We only use spi_send_int with the int sizes of 32 and 64 bits. So just
      make the spi_send_int function less generic and replace it with the
      appropriate two explicit functions, which can now type-check the int
      pointer type.
      
      Note that there is still a small weirdness in the u32 function, which
      has to convert it to a u64 temporary. This is because of how the packing
      API works at the moment, but the weirdness is at least hidden from
      callers of sja1105_xfer_u32 now.
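
The idea of two explicit, type-checked helpers built on one generic buffer transfer can be sketched as below. This is a user-space illustration, not the driver's code: all `model_*` names, the in-memory register array, and the big-endian byte order are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum model_rw { MODEL_READ, MODEL_WRITE };

static uint8_t model_regs[64]; /* hypothetical device register space */

/* Generic transfer of an already packed buffer. */
static int model_xfer_buf(enum model_rw rw, size_t addr, uint8_t *buf, size_t len)
{
	if (addr > sizeof(model_regs) || len > sizeof(model_regs) - addr)
		return -1;
	if (rw == MODEL_READ)
		memcpy(buf, model_regs + addr, len);
	else
		memcpy(model_regs + addr, buf, len);
	return 0;
}

/* Packing helpers that only understand u64 values (big-endian assumed). */
static void model_pack(uint8_t *buf, uint64_t val, size_t len)
{
	for (size_t i = 0; i < len; i++)
		buf[i] = (uint8_t)(val >> (8 * (len - 1 - i)));
}

static uint64_t model_unpack(const uint8_t *buf, size_t len)
{
	uint64_t val = 0;

	for (size_t i = 0; i < len; i++)
		val = (val << 8) | buf[i];
	return val;
}

/* 64-bit transfer: the pointer type is checked by the compiler. */
int model_xfer_u64(enum model_rw rw, size_t addr, uint64_t *value)
{
	uint8_t buf[8];
	int rc;

	assert(value != NULL);
	if (rw == MODEL_WRITE)
		model_pack(buf, *value, sizeof(buf));
	rc = model_xfer_buf(rw, addr, buf, sizeof(buf));
	if (rc < 0)
		return rc;
	if (rw == MODEL_READ)
		*value = model_unpack(buf, sizeof(buf));
	return 0;
}

/* 32-bit transfer: the u64 temporary stays hidden from callers. */
int model_xfer_u32(enum model_rw rw, size_t addr, uint32_t *value)
{
	uint8_t buf[4];
	uint64_t tmp; /* the packing helpers only take u64 */
	int rc;

	assert(value != NULL);
	if (rw == MODEL_WRITE) {
		tmp = *value;
		model_pack(buf, tmp, sizeof(buf));
	}
	rc = model_xfer_buf(rw, addr, buf, sizeof(buf));
	if (rc < 0)
		return rc;
	if (rw == MODEL_READ) {
		tmp = model_unpack(buf, sizeof(buf));
		*value = (uint32_t)tmp;
	}
	return 0;
}
```

Passing a `uint64_t *` to `model_xfer_u32()` now produces a compiler diagnostic, which a single helper taking a byte count and an untyped pointer could not provide.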
      Suggested-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • V
      net: dsa: sja1105: Don't use "inline" function declarations in C files · 09c1b412
      Vladimir Oltean committed
      Let the compiler decide.
      Signed-off-by: Vladimir Oltean <olteanv@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • D
      Merge branch 'SMB-rootfs' · 5cf37738
      David S. Miller committed
      Paulo Alcantara says:
      
      ====================
      Experimental SMB rootfs support
      
      This patch series enables Linux to mount root file systems over the
      network by utilizing SMB protocol.
      
      Upstream commit 8eecd1c2 ("cifs: Add support for root file
      systems") introduced a new CONFIG_CIFS_ROOT option, a virtual device
      (Root_CIFS) and a kernel cmdline parameter "cifsroot=" which tells the
      kernel to actually mount the root filesystem over a SMB share.
      
      The feature relies on ipconfig to set up the network prior to mounting
      the rootfs, so when it is set along with "cifsroot=" parameter:
      
          (1) cifs_root_setup() parses all necessary data out of the "cifsroot="
          parameter so that the init process knows how to mount the SMB rootfs
          (e.g. SMB server address, mount options).
      
          (2) If DHCP failed for some reason in ipconfig, we keep retrying
          forever as we have nowhere to go for NFS or SMB root
          filesystems (see PATCH 2/2). Otherwise go to (3).
      
          (3) mount_cifs_root() is then called by mount_root() (ROOT_DEV ==
          Root_CIFS), retrieves the data parsed earlier in (1), then attempts to
          mount the SMB rootfs at most CIFSROOT_RETRY_MAX times (see PATCH
          1/2).
      
          (4) If all attempts failed, fall back to floppy drive, otherwise
          continue the boot process with rootfs mounted over a SMB share.
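
The bounded retry in steps (3) and (4) can be sketched as a small user-space loop. The names below are hypothetical, and `MODEL_RETRY_MAX` is an arbitrary illustrative value, since the text names CIFSROOT_RETRY_MAX without giving its value; any delay between attempts is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_RETRY_MAX 5 /* illustrative only; the real limit is not stated here */

enum model_root { MODEL_ROOT_SMB, MODEL_ROOT_FALLBACK };

typedef bool (*model_mount_fn)(void *ctx);

/*
 * Try to mount the SMB rootfs at most MODEL_RETRY_MAX times.
 * *attempts receives the number of tries that were made.
 */
enum model_root model_mount_cifs_root(model_mount_fn try_mount, void *ctx,
				      int *attempts)
{
	assert(try_mount != NULL && attempts != NULL);
	*attempts = 0;
	for (int i = 1; i <= MODEL_RETRY_MAX; i++) {
		*attempts = i;
		if (try_mount(ctx))
			return MODEL_ROOT_SMB; /* step (4): continue booting */
	}
	return MODEL_ROOT_FALLBACK; /* step (4): all attempts failed */
}

/* Example callback: fails until the int countdown behind ctx reaches zero. */
bool model_flaky_mount(void *ctx)
{
	int *failures_left = ctx;

	if (*failures_left > 0) {
		(*failures_left)--;
		return false;
	}
	return true;
}
```

With a countdown of 2, the model succeeds on the third attempt; with a countdown larger than the limit, it gives up after `MODEL_RETRY_MAX` tries and reports the fallback path.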
      
      My idea was to keep the same behavior of nfsroot - as it seems to work
      for most users so far.
      
      For more information on how this feature works, see
      Documentation/filesystems/cifs/cifsroot.txt.
      ====================
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      ipconfig: Handle CONFIG_CIFS_ROOT option · 51976f47
      Paulo Alcantara (SUSE) committed
      The experimental root file system support in cifs.ko relies on
      ipconfig to set up the network stack before accessing the SMB share
      that contains the rootfs files.
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • P
      init: Support mounting root file systems over SMB · 8902dd52
      Paulo Alcantara (SUSE) committed
      Add a new virtual device named /dev/cifs (0xfe) to tell the kernel to
      mount the root file system over the network by using SMB protocol.
      
      cifs_root_data() is responsible for retrieving the parsed
      information of the new command-line option (cifsroot=) and then calling
      do_mount_root() with the appropriate mount options for cifs.ko.
      Signed-off-by: Paulo Alcantara (SUSE) <pc@cjr.nz>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 02 Oct, 2019 27 commits
  3. 29 Sep, 2019 3 commits
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · 02dc96ef
      Linus Torvalds committed
      Pull networking fixes from David Miller:
      
       1) Sanity check URB networking device parameters to avoid divide by
          zero, from Oliver Neukum.
      
       2) Disable global multicast filter in NCSI, otherwise LLDP and IPV6
          don't work properly. Longer term this needs a better fix tho. From
          Vijay Khemka.
      
       3) Small fixes to selftests (use ping when ping6 is not present, etc.)
          from David Ahern.
      
       4) Bring back rt_uses_gateway member of struct rtable, its semantics
          were not well understood and trying to remove it broke things. From
          David Ahern.
      
       5) Move usbnet sanity checking, ignore endpoints with invalid
          wMaxPacketSize. From Bjørn Mork.
      
       6) Missing Kconfig deps for sja1105 driver, from Mao Wenan.
      
       7) Various small fixes to the mlx5 DR steering code, from Alaa Hleihel,
          Alex Vesker, and Yevgeny Kliteynik.
      
       8) Missing CAP_NET_RAW checks in various places, from Ori Nimron.
      
       9) Fix crash when removing sch_cbs entry while offloading is enabled,
          from Vinicius Costa Gomes.
      
      10) Signedness bug fixes, generally in looking at the result given by
          of_get_phy_mode() and friends. From Dan Carpenter.
      
      11) Disable preemption around BPF_PROG_RUN() calls, from Eric Dumazet.
      
      12) Don't create VRF ipv6 rules if ipv6 is disabled, from David Ahern.
      
      13) Fix quantization code in tcp_bbr, from Kevin Yang.
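
The class of signedness bug mentioned in item 10 can be illustrated with a short sketch. The functions below are hypothetical and do not reproduce any of the fixed drivers; they only show why a negative error code stored in an unsigned variable escapes a `< 0` check.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static_assert(ENODEV > 0, "errno constants are positive, so -ENODEV is negative");

/* Hypothetical lookup: returns a mode >= 0, or a negative errno on failure. */
int model_get_mode(bool present)
{
	return present ? 3 : -ENODEV;
}

/* Buggy pattern: the result lands in an unsigned variable before the check. */
bool model_error_seen_unsigned(bool present)
{
	unsigned int mode = (unsigned int)model_get_mode(present);

	/* -ENODEV has wrapped to a large positive value: never true */
	return mode < 0;
}

/* Fixed pattern: keep the result in a signed int until it has been checked. */
bool model_error_seen_signed(bool present)
{
	int ret = model_get_mode(present);

	return ret < 0;
}
```

With `present` set to false, the unsigned variant reports no error while the signed variant does. Some compilers can flag the always-false comparison when extra warnings are enabled.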
      
      * git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (127 commits)
        net: tap: clean up an indentation issue
        nfp: abm: fix memory leak in nfp_abm_u32_knode_replace
        tcp: better handle TCP_USER_TIMEOUT in SYN_SENT state
        sk_buff: drop all skb extensions on free and skb scrubbing
        tcp_bbr: fix quantization code to not raise cwnd if not probing bandwidth
        mlxsw: spectrum_flower: Fail in case user specifies multiple mirror actions
        Documentation: Clarify trap's description
        mlxsw: spectrum: Clear VLAN filters during port initialization
        net: ena: clean up indentation issue
        NFC: st95hf: clean up indentation issue
        net: phy: micrel: add Asym Pause workaround for KSZ9021
        net: socionext: ave: Avoid using netdev_err() before calling register_netdev()
        ptp: correctly disable flags on old ioctls
        lib: dimlib: fix help text typos
        net: dsa: microchip: Always set regmap stride to 1
        nfp: flower: fix memory leak in nfp_flower_spawn_vnic_reprs
        nfp: flower: prevent memory leak in nfp_flower_spawn_phy_reprs
        net/sched: Set default of CONFIG_NET_TC_SKB_EXT to N
        vrf: Do not attempt to create IPv6 mcast rule if IPv6 is disabled
        net: sched: sch_sfb: don't call qdisc_put() while holding tree lock
        ...
    • L
      Merge branch 'hugepage-fallbacks' (hugepage patches from David Rientjes) · edf445ad
      Linus Torvalds committed
      Merge hugepage allocation updates from David Rientjes:
       "We (mostly Linus, Andrea, and myself) have been discussing offlist how
        to implement a sane default allocation strategy for hugepages on NUMA
        platforms.
      
        With these reverts in place, the page allocator will happily allocate
        a remote hugepage immediately rather than try to make a local hugepage
        available. This incurs a substantial performance degradation when
        memory compaction would have otherwise made a local hugepage
        available.
      
        This series reverts those reverts and attempts to propose a more sane
        default allocation strategy specifically for hugepages. Andrea
        acknowledges this is likely to fix the swap storms that he originally
        reported that resulted in the patches that removed __GFP_THISNODE from
        hugepage allocations.
      
        The immediate goal is to return 5.3 to the behavior the kernel has
        implemented over the past several years so that remote hugepages are
        not immediately allocated when local hugepages could have been made
        available because the increased access latency is untenable.
      
        The next goal is to introduce a sane default allocation strategy for
        hugepages allocations in general regardless of the configuration of
        the system so that we prevent thrashing of local memory when
        compaction is unlikely to succeed and can prefer remote hugepages over
        remote native pages when the local node is low on memory."
      
      Note on timing: this reverts the hugepage VM behavior changes that got
      introduced fairly late in the 5.3 cycle, and that fixed a huge
      performance regression for certain loads that had been around since
      4.18.
      
      Andrea had this note:
      
       "The regression of 4.18 was that it was taking hours to start a VM
        where 3.10 was only taking a few seconds, I reported all the details
        on lkml when it was finally tracked down in August 2018.
      
           https://lore.kernel.org/linux-mm/20180820032640.9896-2-aarcange@redhat.com/
      
        __GFP_THISNODE in MADV_HUGEPAGE made the above enterprise vfio
        workload degrade like in the "current upstream" above. And it still
        would have been that bad as above until 5.3-rc5"
      
      where the bad behavior ends up happening as you fill up a local node,
      and without that change, you'd get into the nasty swap storm behavior
      due to compaction working overtime to make room for more memory on the
      nodes.
      
      As a result 5.3 got the two performance fix reverts in rc5.
      
      However, David Rientjes then noted that those performance fixes in turn
      regressed performance for other loads - although not quite to the same
      degree.  He suggested reverting the reverts and instead replacing them
      with two small changes to how hugepage allocations are done (patch
      descriptions rephrased by me):
      
       - "avoid expensive reclaim when compaction may not succeed": just admit
         that the allocation failed when you're trying to allocate a huge-page
         and compaction wasn't successful.
      
       - "allow hugepage fallback to remote nodes when madvised": when that
         node-local huge-page allocation failed, retry without forcing the
         local node.
      
      but by then I judged it too late to replace the fixes for a 5.3 release.
      So 5.3 was released with behavior that harked back to the pre-4.18 logic.
      
      But now we're in the merge window for 5.4, and we can see if this
      alternate model fixes not just the horrendous swap storm behavior, but
      also the performance regression that the late reverts caused.
      
      Fingers crossed.
      
      * emailed patches from David Rientjes <rientjes@google.com>:
        mm, page_alloc: allow hugepage fallback to remote nodes when madvised
        mm, page_alloc: avoid expensive reclaim when compaction may not succeed
        Revert "Revert "Revert "mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask""
        Revert "Revert "mm, thp: restore node-local hugepage allocations""
    • D
      mm, page_alloc: allow hugepage fallback to remote nodes when madvised · 76e654cc
      David Rientjes committed
      For systems configured to always try hard to allocate transparent
      hugepages (thp defrag setting of "always") or for memory that has been
      explicitly madvised to MADV_HUGEPAGE, it is often better to fall back to
      remote memory to allocate the hugepage if the local allocation fails
      first.
      
      The point is to allow the initial call to __alloc_pages_node() to attempt
      to defragment local memory to make a hugepage available, if possible,
      rather than immediately fall back to remote memory.  Local hugepages will
      always have a better access latency than remote (huge)pages, so an attempt
      to make a hugepage available locally is always preferred.
      
      If memory compaction cannot be successful locally, however, it is likely
      better to fall back to remote memory.  This could take on two forms: either
      allow immediate fallback to remote memory or do per-zone watermark checks.
      It would be possible to fall back only when per-zone watermarks fail for
      order-0 memory, since that would require local reclaim for all subsequent
      faults so remote huge allocation is likely better than thrashing the local
      zone for large workloads.
      
      In this case, it is assumed that because the system is configured to try
      hard to allocate hugepages or the vma is advised to explicitly want to try
      hard for hugepages that remote allocation is better when local allocation
      and memory compaction have both failed.
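
The resulting policy can be summarized as a decision function. This is a simplified user-space sketch, not the page allocator: the `model_*` names and the three boolean inputs are assumptions, and `try_hard` stands for either the thp defrag setting of "always" or a MADV_HUGEPAGE mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum model_alloc { MODEL_LOCAL_HUGE, MODEL_REMOTE_HUGE, MODEL_NO_HUGE };

struct model_node_state {
	bool local_hugepage_free;    /* a hugepage is already free on the local node */
	bool local_compaction_works; /* compaction can make one available locally */
	bool remote_hugepage_free;   /* a hugepage is free on some remote node */
};

enum model_alloc model_alloc_hugepage(const struct model_node_state *s,
				      bool try_hard)
{
	assert(s != NULL);

	/* First pass: local node only, including an attempt at compaction. */
	if (s->local_hugepage_free || s->local_compaction_works)
		return MODEL_LOCAL_HUGE;

	/* Local allocation and compaction both failed: go remote if asked to try hard. */
	if (try_hard && s->remote_hugepage_free)
		return MODEL_REMOTE_HUGE;

	return MODEL_NO_HUGE; /* the caller would use base pages instead */
}
```

The ordering is the point of the sketch: the local attempt always comes first, and the remote fallback is reached only after it has failed and only for callers that asked to try hard.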
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Stefan Priebe - Profihost AG <s.priebe@profihost.ag>
      Cc: "Kirill A. Shutemov" <kirill@shutemov.name>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>