1. 25 Sep, 2020 4 commits
    • tcp: skip DSACKs with dubious sequence ranges · ad2b9b0f
      Committed by Priyaranjan Jha
      Currently, we use length of DSACKed range to compute number of
      delivered packets. And if sequence range in DSACK is corrupted,
      we can get bogus dsacked/acked count, and bogus cwnd.
      
      This patch puts bounds on DSACKed range to skip update of data
      delivery and spurious retransmission information, if the DSACK
      is unlikely caused by sender's action:
      - DSACKed range shouldn't be greater than maximum advertised rwnd.
      - Total no. of DSACKed segments shouldn't be greater than total
        no. of retransmitted segs. Unlike spurious retransmits, network
        duplicates or corrupted DSACKs shouldn't be counted as delivery.
      Signed-off-by: Priyaranjan Jha <priyarjha@google.com>
      Signed-off-by: Neal Cardwell <ncardwell@google.com>
      Signed-off-by: Yuchung Cheng <ycheng@google.com>
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
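The two bounds described in this commit can be illustrated with a minimal C sketch. It is not the kernel implementation: `struct dsack_ctx`, its fields, and `dsack_segs_to_credit()` are hypothetical names chosen for illustration, and the byte-to-segment rounding is an assumption.

```c
#include <stdint.h>

/* Hypothetical, simplified sender state needed by the two checks. */
struct dsack_ctx {
    uint32_t max_rwnd;      /* largest receive window advertised, in bytes */
    uint32_t mss;           /* segment size in bytes, must be non-zero */
    uint32_t total_retrans; /* segments retransmitted so far */
    uint32_t dsack_seen;    /* DSACKed segments accepted so far */
};

/*
 * Return the number of segments to credit for the DSACK block
 * [start_seq, end_seq), or 0 if the block is implausible and
 * should be skipped.
 */
static uint32_t dsack_segs_to_credit(struct dsack_ctx *ctx,
                                     uint32_t start_seq, uint32_t end_seq)
{
    /* Unsigned subtraction handles sequence-number wraparound. */
    uint32_t len = end_seq - start_seq;
    uint32_t segs;

    /* Bound 1: the range must not exceed the maximum advertised rwnd. */
    if (len == 0 || len > ctx->max_rwnd)
        return 0;

    /* Round up to whole segments without overflowing. */
    segs = 1 + (len - 1) / ctx->mss;

    /* Bound 2: total DSACKed segments must not exceed retransmits. */
    if (ctx->dsack_seen + segs > ctx->total_retrans)
        return 0;

    ctx->dsack_seen += segs;
    return segs;
}
```

A caller would update delivery and spurious-retransmit accounting only when the function returns a non-zero count.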
    • net: mscc: ocelot: fix fields offset in SG_CONFIG_REG_3 · 4ab810a4
      Committed by Xiaoliang Yang
      INIT_IPS and GATE_ENABLE fields have a wrong offset in SG_CONFIG_REG_3.
      This register is used by stream gate control of PSFP, and it has not
      been used before, because PSFP is not implemented in ocelot driver.
      Signed-off-by: Xiaoliang Yang <xiaoliang.yang_1@nxp.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/ipv4: always honour route mtu during forwarding · 02a1b175
      Committed by Maciej Żenczykowski
      Documentation/networking/ip-sysctl.txt:46 says:
        ip_forward_use_pmtu - BOOLEAN
          By default we don't trust protocol path MTUs while forwarding
          because they could be easily forged and can lead to unwanted
          fragmentation by the router.
          You only need to enable this if you have user-space software
          which tries to discover path mtus by itself and depends on the
          kernel honoring this information. This is normally not the case.
          Default: 0 (disabled)
          Possible values:
          0 - disabled
          1 - enabled
      
      Which makes it pretty clear that setting it to 1 is a potential
      security/safety/DoS issue, and yet it is entirely reasonable to want
      forwarded traffic to honour explicitly administrator configured
      route mtus (instead of defaulting to device mtu).
      
      Indeed, I can't think of a single reason why you wouldn't want to.
      Since you configured a route mtu you probably know better...
      
      It is pretty common to have a higher device mtu to allow receiving
      large (jumbo) frames, while having some routes via that interface
      (potentially including the default route to the internet) specify
      a lower mtu.
      
      Note that ipv6 forwarding uses device mtu unless the route is locked
      (in which case it will use the route mtu).
      
      This approach is not usable for IPv4 where an 'mtu lock' on a route
      also has the side effect of disabling TCP path mtu discovery via
      disabling the IPv4 DF (don't frag) bit on all outgoing frames.
      
      I'm not aware of a way to lock a route from an IPv6 RA, so that also
      potentially seems wrong.
      Signed-off-by: Maciej Żenczykowski <maze@google.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Cc: Lorenzo Colitti <lorenzo@google.com>
      Cc: Sunmeet Gill (Sunny) <sgill@quicinc.com>
      Cc: Vinay Paradkar <vparadka@qti.qualcomm.com>
      Cc: Tyler Wear <twear@quicinc.com>
      Cc: David Ahern <dsahern@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
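The intended MTU choice for forwarded IPv4 traffic can be modelled roughly as follows. This is a simplified sketch, not the kernel code: `struct fwd_route`, its fields, and `forwarding_mtu()` are hypothetical, and route locking is deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified view of a route used for forwarding. */
struct fwd_route {
    uint32_t route_mtu; /* administrator-configured route mtu, 0 if unset */
    uint32_t path_mtu;  /* learned path mtu (possibly forged), 0 if none */
    uint32_t dev_mtu;   /* mtu of the outgoing device */
};

/* Pick the mtu applied to a forwarded packet. */
static uint32_t forwarding_mtu(const struct fwd_route *rt, bool fwd_use_pmtu)
{
    /* Only trust learned path mtus when ip_forward_use_pmtu is set. */
    if (fwd_use_pmtu && rt->path_mtu)
        return rt->path_mtu;

    /* Always honour an explicitly configured route mtu. */
    if (rt->route_mtu)
        return rt->route_mtu;

    /* Otherwise fall back to the device mtu. */
    return rt->dev_mtu;
}
```

With a jumbo-frame device (mtu 9000) and a default route configured with mtu 1400, the sketch returns 1400 even when `fwd_use_pmtu` is false, which is the behaviour the commit message argues for.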
    • net_sched: defer tcf_idr_insert() in tcf_action_init_1() · e49d8c22
      Committed by Cong Wang
      All TC actions call tcf_idr_insert() for new action at the end
      of their ->init(), so we can actually move it to a central place
      in tcf_action_init_1().
      
      And once the action is inserted into the global IDR, other parallel
      process could free it immediately as its refcnt is still 1, so we can
      not fail after this, we need to move it after the goto action
      validation to avoid handling the failure case after insertion.
      
      This is found during code review, is not directly triggered by syzbot.
      And this prepares for the next patch.
      
      Cc: Vlad Buslov <vladbu@mellanox.com>
      Cc: Jamal Hadi Salim <jhs@mojatatu.com>
      Cc: Jiri Pirko <jiri@resnulli.us>
      Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  2. 21 Sep, 2020 2 commits
  3. 20 Sep, 2020 3 commits
  4. 19 Sep, 2020 4 commits
  5. 18 Sep, 2020 3 commits
    • ethtool: add and use message type for tunnel info reply · 19a83d36
      Committed by Michal Kubecek
      Tunnel offload info code uses ETHTOOL_MSG_TUNNEL_INFO_GET message type (cmd
      field in genetlink header) for replies to tunnel info netlink request, i.e.
      the same value as the request has. This is a problem because we are using
      two separate enums for userspace to kernel and kernel to userspace message
      types so that this ETHTOOL_MSG_TUNNEL_INFO_GET (28) collides with
      ETHTOOL_MSG_CABLE_TEST_TDR_NTF which is what message type 28 means for
      kernel to userspace messages.
      
      As the tunnel info request reached mainline in 5.9 merge window, we should
      still be able to fix the reply message type without breaking backward
      compatibility.
      
      Fixes: c7d759eb ("ethtool: add tunnel info interface")
      Signed-off-by: Michal Kubecek <mkubecek@suse.cz>
      Reviewed-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
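The collision comes from numbering requests and replies in two independent enums. The sketch below reproduces it with stand-in `DEMO_` names rather than the real ethtool UAPI header; the value 28 is taken from the commit message, while the reply value 29 is an assumption made for illustration.

```c
#include <stdbool.h>

/* Stand-in for the user space -> kernel message type enum. */
enum demo_user_msg {
    DEMO_USER_TUNNEL_INFO_GET = 28,
};

/* Stand-in for the kernel -> user space message type enum. */
enum demo_kernel_msg {
    DEMO_KERNEL_CABLE_TEST_TDR_NTF = 28,
    DEMO_KERNEL_TUNNEL_INFO_GET_REPLY = 29, /* assumed value */
};

/* How a receiver would classify a kernel -> user space message. */
static bool is_cable_test_tdr_ntf(int cmd)
{
    return cmd == DEMO_KERNEL_CABLE_TEST_TDR_NTF;
}

static bool is_tunnel_info_reply(int cmd)
{
    return cmd == DEMO_KERNEL_TUNNEL_INFO_GET_REPLY;
}
```

If the kernel answers with the request's own value, `is_cable_test_tdr_ntf(DEMO_USER_TUNNEL_INFO_GET)` is true, so the receiver sees a cable test notification instead of a tunnel info reply. A dedicated reply value removes the ambiguity.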
    • mm: allow a controlled amount of unfairness in the page lock · 5ef64cc8
      Committed by Linus Torvalds
      Commit 2a9127fc ("mm: rewrite wait_on_page_bit_common() logic") made
      the page locking entirely fair, in that if a waiter came in while the
      lock was held, the lock would be transferred to the lockers strictly in
      order.
      
      That was intended to finally get rid of the long-reported watchdog
      failures that involved the page lock under extreme load, where a process
      could end up waiting essentially forever, as other page lockers stole
      the lock from under it.
      
      It also improved some benchmarks, but it ended up causing huge
      performance regressions on others, simply because fair lock behavior
      doesn't end up giving out the lock as aggressively, causing better
      worst-case latency, but potentially much worse average latencies and
      throughput.
      
      Instead of reverting that change entirely, this introduces a controlled
      amount of unfairness, with a sysctl knob to tune it if somebody needs
      to.  But the default value should hopefully be good for any normal load,
      allowing a few rounds of lock stealing, but enforcing the strict
      ordering before the lock has been stolen too many times.
      
      There is also a hint from Matthieu Baerts that the fair page coloring
      may end up exposing an ABBA deadlock that is hidden by the usual
      optimistic lock stealing, and while the unfairness doesn't fix the
      fundamental issue (and I'm still looking at that), it avoids it in
      practice.
      
      The amount of unfairness can be modified by writing a new value to the
      'sysctl_page_lock_unfairness' variable (default value of 5, exposed
      through /proc/sys/vm/page_lock_unfairness), but that is hopefully
      something we'd use mainly for debugging rather than being necessary for
      any deep system tuning.
      
      This whole issue has exposed just how critical the page lock can be, and
      how contended it gets under certain locks.  And the main contention
      doesn't really seem to be anything related to IO (which was the origin
      of this lock), but for things like just verifying that the page file
      mapping is stable while faulting in the page into a page table.
      
      Link: https://lore.kernel.org/linux-fsdevel/ed8442fd-6f54-dd84-cd4a-941e8b7ee603@MichaelLarabel.com/
      Link: https://www.phoronix.com/scan.php?page=article&item=linux-50-59&num=1
      Link: https://lore.kernel.org/linux-fsdevel/c560a38d-8313-51fb-b1ec-e904bd8836bc@tessares.net/
      Reported-and-tested-by: Michael Larabel <Michael@michaellarabel.com>
      Tested-by: Matthieu Baerts <matthieu.baerts@tessares.net>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Chris Mason <clm@fb.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Amir Goldstein <amir73il@gmail.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
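The idea of bounded lock stealing can be sketched as a per-waiter budget. This is a toy model under stated assumptions, not the code in the kernel's page wait path: `struct page_waiter` and both functions are hypothetical, and the kernel's exact counting may differ.

```c
#include <stdbool.h>

/* Hypothetical per-waiter state for a lock with bounded unfairness. */
struct page_waiter {
    int steals_left;    /* how many more times the lock may be stolen */
    bool wants_handoff; /* true once strict ordering must be enforced */
};

static void page_waiter_init(struct page_waiter *w, int unfairness)
{
    w->steals_left = unfairness;
    w->wants_handoff = false;
}

/*
 * Called each time the waiter wakes up and finds that another
 * locker took the lock first.
 */
static void page_waiter_lock_stolen(struct page_waiter *w)
{
    if (w->steals_left > 0)
        w->steals_left--;        /* tolerate this round of stealing */
    else
        w->wants_handoff = true; /* budget exhausted: demand fair handoff */
}
```

With a budget of 5 (the default named in the commit message), the waiter tolerates five steals and asks for a direct handoff on the sixth. A budget of 0 makes the model strictly fair.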
    • arm64: paravirt: Initialize steal time when cpu is online · 75df529b
      Committed by Andrew Jones
      Steal time initialization requires mapping a memory region which
      invokes a memory allocation. Doing this at CPU starting time results
      in the following trace when CONFIG_DEBUG_ATOMIC_SLEEP is enabled:
      
      BUG: sleeping function called from invalid context at mm/slab.h:498
      in_atomic(): 1, irqs_disabled(): 128, non_block: 0, pid: 0, name: swapper/1
      CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.9.0-rc5+ #1
      Call trace:
       dump_backtrace+0x0/0x208
       show_stack+0x1c/0x28
       dump_stack+0xc4/0x11c
       ___might_sleep+0xf8/0x130
       __might_sleep+0x58/0x90
       slab_pre_alloc_hook.constprop.101+0xd0/0x118
       kmem_cache_alloc_node_trace+0x84/0x270
       __get_vm_area_node+0x88/0x210
       get_vm_area_caller+0x38/0x40
       __ioremap_caller+0x70/0xf8
       ioremap_cache+0x78/0xb0
       memremap+0x9c/0x1a8
       init_stolen_time_cpu+0x54/0xf0
       cpuhp_invoke_callback+0xa8/0x720
       notify_cpu_starting+0xc8/0xd8
       secondary_start_kernel+0x114/0x180
      CPU1: Booted secondary processor 0x0000000001 [0x431f0a11]
      
      However we don't need to initialize steal time at CPU starting time.
      We can simply wait until CPU online time, just sacrificing a bit of
      accuracy by returning zero for steal time until we know better.
      
      While at it, add __init to the functions that are only called by
      pv_time_init() which is __init.
      Signed-off-by: Andrew Jones <drjones@redhat.com>
      Fixes: e0685fa2 ("arm64: Retrieve stolen time as paravirtualized guest")
      Cc: stable@vger.kernel.org
      Reviewed-by: Steven Price <steven.price@arm.com>
      Link: https://lore.kernel.org/r/20200916154530.40809-1-drjones@redhat.com
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
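The "return zero until we know better" behaviour can be pictured with a small sketch. The names here (`struct stolen_time_region`, `read_steal_time()`) are hypothetical and the mapping step is omitted; the point is only that a reader must cope with a region that has not been set up yet.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-CPU handle; NULL until mapped at CPU online time. */
struct stolen_time_region {
    const uint64_t *stolen_ns;
};

/* Report steal time, or zero while the region is not mapped yet. */
static uint64_t read_steal_time(const struct stolen_time_region *reg)
{
    if (reg->stolen_ns == NULL)
        return 0; /* small loss of accuracy between starting and online */

    return *reg->stolen_ns;
}
```

Deferring the mapping to the online stage moves the allocation out of the atomic CPU-starting context, at the cost of reporting zero for a short window.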
  6. 17 Sep, 2020 2 commits
  7. 16 Sep, 2020 2 commits
  8. 15 Sep, 2020 2 commits
  9. 12 Sep, 2020 1 commit
    • KVM: MIPS: Change the definition of kvm type · 15e9e35c
      Committed by Huacai Chen
      MIPS defines two kvm types:
      
       #define KVM_VM_MIPS_TE          0
       #define KVM_VM_MIPS_VZ          1
      
      In Documentation/virt/kvm/api.rst it is said that "You probably want to
      use 0 as machine type", which implies that type 0 is the "automatic" or
      "default" type. And in user space, libvirt uses the null-machine (with
      type 0) to detect the kvm capability, which returns "KVM not supported"
      on a VZ platform.
      
      I tried to fix it in QEMU, but the result is ugly:
      https://lists.nongnu.org/archive/html/qemu-devel/2020-08/msg05629.html
      
      Thomas Huth suggested changing the definition of kvm type instead:
      https://lists.nongnu.org/archive/html/qemu-devel/2020-09/msg03281.html
      
      So I define like this:
      
       #define KVM_VM_MIPS_AUTO        0
       #define KVM_VM_MIPS_VZ          1
       #define KVM_VM_MIPS_TE          2
      
      Since VZ and TE cannot co-exist, using type 0 on a TE platform will
      still return success (so old user-space tools have no problems on new
      kernels); the advantage is that using type 0 on a VZ platform will not
      return failure. So, the only problem is "new user-space tools use type
      2 on old kernels", but if we treat this as a kernel bug, we can backport
      this patch to old stable kernels.
      Signed-off-by: Huacai Chen <chenhc@lemote.com>
      Message-Id: <1599734031-28746-1-git-send-email-chenhc@lemote.com>
      Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
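The compatibility argument can be checked with a short sketch. The three constants are the ones from the commit message; `resolve_kvm_type()` and its `platform_has_vz` parameter are hypothetical and only model the described behaviour, not the kernel's VM creation path.

```c
#include <stdbool.h>

#define KVM_VM_MIPS_AUTO 0
#define KVM_VM_MIPS_VZ   1
#define KVM_VM_MIPS_TE   2

/*
 * Resolve the requested machine type on a platform that supports
 * exactly one of VZ or TE. Returns the resolved type, or -1 if the
 * request cannot be satisfied.
 */
static int resolve_kvm_type(int requested, bool platform_has_vz)
{
    int native = platform_has_vz ? KVM_VM_MIPS_VZ : KVM_VM_MIPS_TE;

    /* Type 0 now means "whatever this platform supports". */
    if (requested == KVM_VM_MIPS_AUTO)
        return native;

    /* An explicit type must match the platform. */
    return requested == native ? native : -1;
}
```

Old tools that pass 0 on a TE platform still succeed, and passing 0 on a VZ platform no longer fails. An explicit type only succeeds when it matches what the platform supports.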
  10. 11 Sep, 2020 5 commits
  11. 10 Sep, 2020 2 commits
  12. 09 Sep, 2020 1 commit
  13. 08 Sep, 2020 3 commits
  14. 07 Sep, 2020 3 commits
  15. 06 Sep, 2020 1 commit
  16. 05 Sep, 2020 2 commits