- 04 Jun, 2019 40 commits
-
-
Committed by George Zhang
LVS fullnat replaces the source IP of network traffic with its local IP, so the backend servers cannot obtain the real client IP. To solve this, LVS introduced the TCP option address (TOA), which stores the essential IP address information in the last TCP ACK packet of the 3-way handshake; the backend servers need to retrieve it from the packet header.
This patch introduces the sk_toa_data member in the sock structure to hold the TOA information. There used to be an in-tree module for managing TOA, but it is now maintained as a standalone module. The toa module should therefore register its hook function(s) using the interfaces provided by the hookers module.
TOA in sock structure:
    __be32 sk_toa_data[16];
The hookers module only provides the sk_toa_data placeholder, and the toa module can use this variable with whatever layout it needs.
Hook interfaces:
The hookers module replaces the kernel's syn_recv_sock and getname handlers with a stub that chains the toa module's hook function(s) to the original handling function. The hookers module allows hook functions to be installed and uninstalled in any order.
toa module:
The external toa module will be provided in a separate RPM package.
[xuyu@linux.alibaba.com: amend commit log]
Signed-off-by: George Zhang <georgezhang@linux.alibaba.com>
Signed-off-by: Xu Yu <xuyu@linux.alibaba.com>
Reviewed-by: Caspar Zhang <caspar@linux.alibaba.com>
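The stub-and-chain idea described above can be sketched in user space as follows. This is only an illustration under stated assumptions: every name here (`hook_install`, `hook_uninstall`, `getname_stub`, the slot table) is hypothetical and is not taken from the hookers module, and the choice to run the original handler first and the hooks afterwards is an assumption as well.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; none of these names come from the patch. */
#define MAX_HOOKS 4

typedef int (*getname_fn)(int *addr);

static getname_fn original_getname;   /* handler that the stub replaced */
static getname_fn hooks[MAX_HOOKS];   /* installed hook functions */

/* Install a hook in the first free slot; returns 0 on success, -1 if full. */
int hook_install(getname_fn fn)
{
	for (size_t i = 0; i < MAX_HOOKS; i++) {
		if (hooks[i] == NULL) {
			hooks[i] = fn;
			return 0;
		}
	}
	return -1;
}

/* Remove a hook wherever it sits, so uninstall order does not matter. */
int hook_uninstall(getname_fn fn)
{
	for (size_t i = 0; i < MAX_HOOKS; i++) {
		if (hooks[i] == fn) {
			hooks[i] = NULL;
			return 0;
		}
	}
	return -1;
}

/* Stub: call the original handler, then let each installed hook adjust
 * the result (assumed ordering). */
int getname_stub(int *addr)
{
	assert(original_getname != NULL);
	int ret = original_getname(addr);

	for (size_t i = 0; i < MAX_HOOKS; i++)
		if (hooks[i] != NULL)
			ret = hooks[i](addr);
	return ret;
}

/* Example handlers: the original reports the load balancer's address (10),
 * the hook overrides it with the client address recovered from TOA (42). */
int plain_getname(int *addr) { *addr = 10; return 0; }
int toa_hook(int *addr) { *addr = 42; return 0; }
```

Because slots are searched by function pointer rather than popped from a stack, a hook can be removed no matter when it was installed relative to the others.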
-
Committed by Changpeng Liu
commit 1f23816b8eb8fdc39990abe166c10a18c16f6b21 upstream.
In commit 88c85538, "virtio-blk: add discard and write zeroes features to specification" (https://github.com/oasis-tcs/virtio-spec), the virtio block specification has been extended to add VIRTIO_BLK_T_DISCARD and VIRTIO_BLK_T_WRITE_ZEROES commands. This patch enables support for discard and write zeroes in the virtio-blk driver when the device advertises the corresponding features, VIRTIO_BLK_F_DISCARD and VIRTIO_BLK_F_WRITE_ZEROES.
Signed-off-by: Changpeng Liu <changpeng.liu@intel.com>
Signed-off-by: Daniel Verkamp <dverkamp@chromium.org>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Liu Bo <bo.liu@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jiufei Xue
An unstable TSC triggers the clocksource watchdog, which then disables the TSC. As a result, another clocksource is elected as the current clocksource, which causes performance issues on our servers.
RHEL7 also disabled this feature because of some issues; see its changelog:
    [x86] disable clocksource watchdog (Prarit Bhargava) [914709]
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jiufei Xue
This reverts commit 76d3b851.
The return value of check_tsc_warp() is no longer used, so remove it.
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Jiufei Xue
This reverts commit cc4db268.
When we hot-add and enable a vCPU, the time inside the VM jumps and then the VM gets stuck. The dmesg output looks like this:
    [ 48.402948] CPU2 has been hot-added
    [ 48.413774] smpboot: Booting Node 0 Processor 2 APIC 0x2
    [ 48.415155] kvm-clock: cpu 2, msr 6b615081, secondary cpu clock
    [ 48.453690] TSC ADJUST compensate: CPU2 observed 139318776350 warp. Adjust: 139318776350
    [ 102.060874] clocksource: timekeeping watchdog on CPU0: Marking clocksource 'tsc' as unstable because the skew is too large:
    [ 102.060874] clocksource: 'kvm-clock' wd_now: 1cb1cfc4bf8 wd_last: 1be9588f1fe mask: ffffffffffffffff
    [ 102.060874] clocksource: 'tsc' cs_now: 207d794f7e cs_last: 205a32697a mask: ffffffffffffffff
    [ 102.060874] tsc: Marking TSC unstable due to clocksource watchdog
    [ 102.070188] KVM setup async PF for cpu 2
    [ 102.071461] kvm-stealtime: cpu 2, msr 13ba95000
    [ 102.074530] Will online and init hotplugged CPU: 2
This is because the TSC of the newly added vCPU is initialized to 0 while the others are ahead. The guest performs the TSC ADJUST compensation, which causes the time jump.
Commit bd8fab39("KVM: x86: fix maintaining of kvm_clock stability on guest CPU hotplug") can fix this problem. However, the host kernel version may be older, so do not adjust the TSC if the sync test fails; just mark it unstable.
Signed-off-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Joseph Qi
ECI may have a use case in which the throttling policy of each device mapper disk is configured only under the root blkio cgroup, while the disks are actually used in different containers. Hierarchical throttling is currently supported only on cgroup v2 and ECI uses cgroup v1, so we have to enable hierarchical throttling on cgroup v1.
This is ported from redhat 7u, and a year ago Jiufei already ported it to alikernel 4.9 as well, so I think this change should be acceptable.
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Eryu Guan
Prior to the xdragon platform 20181230 release (e.g. the 0930 release), vring_use_dma_api() is required to return 'true' unconditionally.
Introduce a new kernel boot parameter called "vring_force_dma_api" to control this behavior. Boot an xdragon host with "vring_force_dma_api" on the command line to make ENI hotplug work; normal ECS hosts keep the original behavior.
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
-
Committed by Arjan van de Ven
Cherry-pick from clear-linux patches:
https://github.com/clearlinux-pkgs/linux-kvm/0104-give-rdrand-some-credit.patch
try to credit rdrand/rdseed with some entropy
In VMs but even modern hardware, we're super starved for entropy, and while we can and do wear a tin foil hat, it's very hard to argue that rdrand and rdtsc add zero entropy.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Julio Montes
Cherry-pick from kata-container patches:
https://github.com/kata-containers/packaging/tree/master/kernel/patches/0001-NO-UPSTREAM-9P-always-use-cached-inode-to-fill-in-v9.patch
So that in cache=none mode we don't have to look up a server that might not support the open-unlink-fstat operation.
fixes https://github.com/01org/cc-oci-runtime/issues/47
fixes https://github.com/01org/cc-oci-runtime/issues/1062
Signed-off-by: Julio Montes <julio.montes@intel.com>
Signed-off-by: Peng Tao <bergwolf@gmail.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Arjan van de Ven
Cherry-pick from kata-container patches:
https://github.com/kata-containers/packaging/tree/master/kernel/patches/0002-Compile-in-evged-always.patch
We need evged for NEMU (and in general for hw reduced).
The config option cannot be set normally since it breaks all regular systems, and hardware reduced is really a runtime choice.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Eryu Guan <eguan@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com>
-
Committed by Eric Whitney
commit f456767d3391e9f7d9d25a2e7241d75676dc19da upstream.
Add new code to count canceled pending cluster reservations on bigalloc file systems and to reduce the cluster reservation count on all file systems using delayed allocation. This replaces old code in ext4_da_page_release_reservations that was incorrect.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Eric Whitney
commit 9fe671496b6c286f9033aedfc1718d67721da0ae upstream.
Modify ext4_ext_remove_space() and the code it calls to correct the reserved cluster count for pending reservations (delayed allocated clusters shared with allocated blocks) when a block range is removed from the extent tree. Pending reservations may be found for the clusters at the ends of written or unwritten extents when a block range is removed. If a physical cluster at the end of an extent is freed, it's necessary to increment the reserved cluster count to maintain correct accounting if the corresponding logical cluster is shared with at least one delayed and unwritten extent as found in the extents status tree.
Add a new function, ext4_rereserve_cluster(), to reapply a reservation on a delayed allocated cluster sharing blocks with a freed allocated cluster. To avoid ENOSPC on reservation, a flag is applied to ext4_free_blocks() to briefly defer updating the freeclusters counter when an allocated cluster is freed. This prevents another thread from allocating the freed block before the reservation can be reapplied.
Redefine the partial cluster object as a struct to carry more state information and to clarify the code using it.
Adjust the conditional code structure in ext4_ext_remove_space to reduce the indentation level in the main body of the code to improve readability.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Eric Whitney
commit b6bf9171ef5c37b66d446378ba63af5339a56a97 upstream.
Ext4 does not always reduce the reserved cluster count by the number of clusters allocated when mapping a delayed extent. It sometimes adds back one or more clusters after allocation if delalloc blocks adjacent to the range allocated by ext4_ext_map_blocks() share the clusters newly allocated for that range. However, this overcounts the number of clusters needed to satisfy future mapping requests (holding one or more reservations for clusters that have already been allocated) and premature ENOSPC and quota failures, etc., result.
Ext4 also does not reduce the reserved cluster count when allocating clusters for non-delayed allocated writes that have previously been reserved for delayed writes. This also results in overcounts.
To make it possible to handle reserved cluster accounting for fallocated regions in the same manner as used for other non-delayed writes, do the reserved cluster accounting for them at the time of allocation. In the current code, this is only done later when a delayed extent sharing the fallocated region is finally mapped.
Address comment correcting handling of unsigned long long constant from Jan Kara's review of RFC version of this patch.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Eric Whitney
commit 0b02f4c0d6d9e2c611dfbdd4317193e9dca740e6 upstream.
The code in ext4_da_map_blocks sometimes reserves space for more delayed allocated clusters than it should, resulting in premature ENOSPC, exceeded quota, and inaccurate free space reporting.
Fix this by checking for written and unwritten blocks shared in the same cluster with the newly delayed allocated block. A cluster reservation should not be made for a cluster for which physical space has already been allocated.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
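The rule stated in this commit (no new reservation for a cluster that already has physical space) can be illustrated with a toy model. This is a sketch only: the block array, the cluster ratio of 4, and all function names are hypothetical and do not reflect ext4's extents status tree or its real accounting code.

```c
#include <assert.h>
#include <stdbool.h>

#define BLOCKS_PER_CLUSTER 4   /* hypothetical bigalloc cluster ratio */
#define NUM_BLOCKS 16

/* ALLOCATED stands for a written or unwritten block with physical space. */
enum block_state { HOLE, DELAYED, ALLOCATED };

static enum block_state blocks[NUM_BLOCKS];
static int reserved_clusters;

/* True if any block in the same cluster as blk is in state st. */
bool cluster_has(int blk, enum block_state st)
{
	int start = blk - blk % BLOCKS_PER_CLUSTER;

	for (int i = start; i < start + BLOCKS_PER_CLUSTER; i++)
		if (blocks[i] == st)
			return true;
	return false;
}

/* Mark blk as delayed-allocated, reserving a cluster only when needed. */
void delay_alloc_block(int blk)
{
	assert(blk >= 0 && blk < NUM_BLOCKS);
	if (blocks[blk] != HOLE)
		return;
	/* Skip the reservation if the cluster already has physical space
	 * or is already covered by an earlier delayed-allocation reservation. */
	if (!cluster_has(blk, ALLOCATED) && !cluster_has(blk, DELAYED))
		reserved_clusters++;
	blocks[blk] = DELAYED;
}
```

In this model, a delayed write that lands next to an already allocated block in the same cluster leaves `reserved_clusters` unchanged, which is the overcount the commit message describes avoiding.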
-
Committed by Eric Whitney
commit 1dc0aa46e74a3366e12f426b7caaca477853e9c3 upstream.
Add new pending reservation mechanism to help manage reserved cluster accounting. Its primary function is to avoid the need to read extents from the disk when invalidating pages as a result of a truncate, punch hole, or collapse range operation.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Eric Whitney
commit ad431025aecda85d3ebef5e4a3aca5c1c681d0c7 upstream.
Ext4 contains a few functions that are used to search for delayed extents or blocks in the extents status tree. Rather than duplicate code to add new functions to search for extents with different status values, such as written or a combination of delayed and unwritten, generalize the existing code to search for caller-specified extents status values. Also, move this code into extents_status.c where it is better associated with the data structures it operates upon, and where it can be more readily used to implement new extents status tree functions that might want a broader scope for i_es_lock.
Three missing static specifiers in RFC version of patch reported and fixed by Fengguang Wu <fengguang.wu@intel.com>.
Signed-off-by: Eric Whitney <enwlinux@gmail.com>
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com>
Reviewed-by: Jiufei Xue <jiufei.xue@linux.alibaba.com>
-
Committed by Greg Kroah-Hartman
-
Committed by Junwei Hu
commit 526f5b851a96566803ee4bee60d0a34df56c77f8 upstream.
The following error message is printed:
    modprobe: ERROR: could not insert 'tipc': Address family not supported by protocol.
when running modprobe tipc after commit 7e27e8d6130c ("tipc: switch order of device registration to fix a crash"), which switched the order of device registration.
Because sock_create_kern(net, AF_TIPC, ...) is called by tipc_topsrv_create_listener() during the initialization done in tipc_init_net(), tipc_socket_init() must be executed before that. Meanwhile, tipc_net_id needs to be initialized by the time sock_create() is called, and tipc_socket_init() does not need to be called for each namespace.
I add a variable tipc_topsrv_net_ops, split the register_pernet_subsys() of tipc into two parts, and separate tipc_socket_init() from the initialization of the pernet params.
By the way, I fixed a resource rollback error when tipc_bcast_init() fails in tipc_init_net().
Fixes: 7e27e8d6130c ("tipc: switch order of device registration to fix a crash")
Signed-off-by: Junwei Hu <hujunwei4@huawei.com>
Reported-by: Wang Wang <wangwang2@huawei.com>
Reported-by: syzbot+1e8114b61079bfe9cbc5@syzkaller.appspotmail.com
Reviewed-by: Kang Zhou <zhoukang7@huawei.com>
Reviewed-by: Suanming Mou <mousuanming@huawei.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by David S. Miller
commit 5593530e56943182ebb6d81eca8a3be6db6dbba4 upstream.
This reverts commit 532b0f7ece4cb2ffd24dc723ddf55242d1188e5e.
More revisions coming up.
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Konrad Rzeszutek Wilk
commit 7681f31ec9cdacab4fd10570be924f2cef6669ba upstream.
There is no need for this at all. Worse, it means that if the guest tries to write to BARs it could lead (on certain platforms) to PCI SERR errors.
Please note that with af6fc858 "xen-pciback: limit guest control of command register" a guest is still allowed to enable those control bits (safely), but is not allowed to disable them, and that therefore a well behaved frontend which enables things before using them will still function correctly.
This is done via a write to the configuration register 0x4, which triggers on the backend side:
    command_write
     \- pci_enable_device
        \- pci_enable_device_flags
           \- do_pci_enable_device
              \- pcibios_enable_device
                 \-pci_enable_resourcess [which enables the PCI_COMMAND_MEMORY|PCI_COMMAND_IO]
However, guests (and drivers) which don't do this could cause problems, including the security issues which XSA-120 sought to address.
Reported-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Cc: Ben Hutchings <ben.hutchings@codethink.co.uk>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Masahiro Yamada
commit e9666d10a5677a494260d60d1fa0b73cc7646eb3 upstream.
Currently, CONFIG_JUMP_LABEL just means "I _want_ to use jump label".
The jump label is controlled by HAVE_JUMP_LABEL, which is defined like this:
    #if defined(CC_HAVE_ASM_GOTO) && defined(CONFIG_JUMP_LABEL)
    # define HAVE_JUMP_LABEL
    #endif
We can improve this by testing 'asm goto' support in Kconfig, then make JUMP_LABEL depend on CC_HAS_ASM_GOTO.
Ugly #ifdef HAVE_JUMP_LABEL will go away, and CONFIG_JUMP_LABEL will match to the real kernel capability.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc)
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
[nc: Fix trivial conflicts in 4.19
     arch/xtensa/kernel/jump_label.c doesn't exist yet
     Ensured CC_HAVE_ASM_GOTO and HAVE_JUMP_LABEL were sufficiently eliminated]
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Masahiro Yamada
commit 81b45683487a51b0f4d3b29d37f20d6d078544e4 upstream.
__compiletime_assert_fallback() is supposed to stop building earlier by using the negative-array-size method in case the compiler does not support "error" attribute, but has never worked like that.
You can simply try:
    BUILD_BUG_ON(1);
GCC immediately terminates the build, but Clang does not report anything because Clang does not support the "error" attribute now. It will later fail at link time, but __compiletime_assert_fallback() is not working at least.
The root cause is commit 1d6a0d19 ("bug.h: prevent double evaluation of `condition' in BUILD_BUG_ON"). Prior to that commit, BUILD_BUG_ON() was checked by the negative-array-size method *and* the link-time trick. Since that commit, the negative-array-size is not effective because '__cond' is no longer constant. As the comment in <linux/build_bug.h> says, GCC (and Clang as well) only emits the error for obvious cases.
When '__cond' is a variable,
    ((void)sizeof(char[1 - 2 * __cond]))
... is not obvious for the compiler to know the array size is negative.
Reverting that commit would break BUILD_BUG() because negative-size-array is evaluated before the code is optimized out.
Let's give up __compiletime_assert_fallback(). This commit does not change the current behavior since it just rips off the useless code.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
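For readers unfamiliar with the negative-array-size method mentioned here, the following stand-alone sketch shows the case where it does work, namely when the condition is an integer constant expression. The macro name `NEG_ARRAY_CHECK` and the helper function are hypothetical and are not kernel definitions.

```c
#include <assert.h>

/*
 * Illustrative macro (hypothetical name). With an integer constant
 * expression, the array size is 1 when cond is false and -1 when cond is
 * true, so a true condition is rejected at compile time.
 */
#define NEG_ARRAY_CHECK(cond) ((void)sizeof(char[1 - 2 * !!(cond)]))

int checked_value(void)
{
	/* Compiles: the condition is a constant that evaluates to false. */
	NEG_ARRAY_CHECK(sizeof(char) != 1);

	/* NEG_ARRAY_CHECK(1); would not compile: the array size is -1. */

	/* Run-time counterpart of the same check, for comparison. */
	assert(sizeof(char) == 1);
	return 1;
}
```

When the condition is held in a variable rather than a constant expression, the array type becomes variable-length and the size is no longer known at compile time, which matches the commit's point that the compiler only diagnoses the obvious cases.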
-
Committed by ndesaulniers@google.com
commit 8bd66d147c88bd441178c7b4c774ae5a185f19b8 upstream.
asm_volatile_goto should also be defined for other compilers that support asm goto.
Fixes commit 815f0ddb ("include/linux/compiler*.h: make compiler-*.h mutually exclusive").
Signed-off-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
Signed-off-by: Nathan Chancellor <natechancellor@gmail.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Daniel Axtens
commit 357d065a44cdd77ed5ff35155a989f2a763e96ef upstream.
VMX ghash was using a fallback that did not support interleaving simd and nosimd operations, leading to failures in the extended test suite.
If I understood correctly, Eric's suggestion was to use the same data format that the generic code uses, allowing us to call into it with the same contexts. I wasn't able to get that to work - I think there's a very different key structure and data layout being used.
So instead steal the arm64 approach and perform the fallback operations directly if required.
Fixes: cc333cd6 ("crypto: vmx - Adding GHASH routines for VMX module")
Cc: stable@vger.kernel.org # v4.1+
Reported-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Tested-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Daniel Axtens <dja@axtens.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Jakub Kicinski
[ Upstream commit c3f4a6c39cf269a40d45f813c05fa830318ad875 ]
On device surprise removal path (the notifier) we can't bail just because the features are disabled. They may have been enabled during the lifetime of the device. This bug leads to leaking netdev references and use-after-frees if there are active connections while device features are cleared.
Fixes: e8f69799 ("net/tls: Add generic NIC offload infrastructure")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Jakub Kicinski
[ Upstream commit 3686637e507b48525fcea6fb91e1988bdbc14530 ]
TLS offload drivers shouldn't (and currently don't) block the TLS offload feature changes based on whether there are active offloaded connections or not.
This seems to be a good idea, because we want the admin to be able to disable the TLS offload at any time, and there is no clean way of disabling it for active connections (TX side is quite problematic). So if features are cleared existing connections will stay offloaded until they close, and new connections will not attempt offload to a given device.
However, the offload state removal handling is currently broken if feature flags get cleared while there are active TLS offloads.
RX side will completely bail from cleanup, even on normal remove path, leaving device state dangling, potentially causing issues when the 5-tuple is reused. It will also fail to release the netdev reference.
Remove the RX-side warning message, in next release cycle it should be printed when features are disabled, rather than when connection dies, but for that we need a more efficient method of finding connection of a given netdev (a'la BPF offload code).
Fixes: 4799ac81 ("tls: Add rx inline crypto offload")
Signed-off-by: Jakub Kicinski <jakub.kicinski@netronome.com>
Reviewed-by: Dirk van der Merwe <dirk.vandermerwe@netronome.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Michael Chan
[ Upstream commit 296d5b54163964b7ae536b8b57dfbd21d4e868e1 ]
For every RX packet, the driver replenishes all buffers used for that packet and puts them back into the RX ring and RX aggregation ring. In one code path where the RX packet has one RX buffer and one or more aggregation buffers, we missed recycling the aggregation buffer(s) if we are unable to allocate a new SKB buffer. This leads to the aggregation ring slowly running out of buffers over time. Fix it by properly recycling the aggregation buffers.
Fixes: c0c050c5 ("bnxt_en: New Broadcom ethernet driver.")
Reported-by: Rakesh Hemnani <rhemnani@fb.com>
Signed-off-by: Michael Chan <michael.chan@broadcom.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Weifeng Voon
stmmac_init_chan() needs to be called before stmmac_init_rx_chan() and stmmac_init_tx_chan(). This is because if PBLx8 is to be used, "DMA_CH(#i)_Control.PBLx8" needs to be set before programming "DMA_CH(#i)_TX_Control.TxPBL" and "DMA_CH(#i)_RX_Control.RxPBL".
Fixes: 47f2a9ce ("net: stmmac: dma channel init prepared for multiple queues")
Reviewed-by: Zhang, Baoli <baoli.zhang@intel.com>
Signed-off-by: Ong Boon Leong <boon.leong.ong@intel.com>
Signed-off-by: Weifeng Voon <weifeng.voon@intel.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Saeed Mahameed
[ Upstream commit c0194e2d0ef0e5ce5e21a35640d23a706827ae28 ]
When CQE compression is enabled (Multi-host systems), compressed CQEs might arrive at the driver rx. Compressed CQEs don't have a valid hash offload, and the driver already reports a hash value of 0 and an invalid hash type on the skb for compressed CQEs, but this is not good enough.
On a congested PCIe, where CQE compression will kick in aggressively, gro will deliver lots of out of order packets due to the invalid hash, and this might cause a serious performance drop.
The only valid solution is to disable rxhash offload entirely when CQE compression is favorable (Multi-host systems).
Fixes: 7219ab34 ("net/mlx5e: CQE compression")
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Parav Pandit
[ Upstream commit 25fa506b70cadb580c1e9cbd836d6417276d4bcd ]
root ns is yet another fs core node which is freed using kfree() by tree_put_node(). Rest of the other fs core objects are also allocated using kmalloc variants.
However, root ns memory is allocated using kvzalloc(). Hence allocate root ns memory using kzalloc().
Fixes: 25302363 ("net/mlx5_core: Flow steering tree initialization")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Chris Packham
TLV_SET is called with a data pointer and a len parameter that tells us how many bytes are pointed to by data. When invoking memcpy() we need to be careful to only copy len bytes.
Previously we would copy TLV_LENGTH(len) bytes, which would copy an extra 4 bytes past the end of the data pointer, which newer GCC versions complain about.
    In file included from test.c:17:
    In function 'TLV_SET',
        inlined from 'test' at test.c:186:5:
    /usr/include/linux/tipc_config.h:317:3:
    warning: 'memcpy' forming offset [33, 36] is out of the bounds [0, 32]
    of object 'bearer_name' with type 'char[32]' [-Warray-bounds]
        memcpy(TLV_DATA(tlv_ptr), data, tlv_len);
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    test.c: In function 'test':
    test.c::161:10: note:
    'bearer_name' declared here
        char bearer_name[TIPC_MAX_BEARER_NAME];
             ^~~~~~~~~~~
We still want to ensure any padding bytes at the end are initialised, so do this with an explicit memset() rather than copying bytes past the end of data. Apply the same logic to TCM_SET.
Signed-off-by: Chris Packham <chris.packham@alliedtelesis.co.nz>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
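The copy-then-pad approach described in this commit might look roughly like the stand-alone sketch below. It is an illustration under assumptions rather than the header's actual code: `tlv_set`, `HDR_LEN`, `TLV_LEN`, and `TLV_SPC` are hypothetical names, and a 4-byte header with 4-byte alignment is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define HDR_LEN 4u                                  /* assumed TLV header size */
#define ALIGN4(n) (((n) + 3u) & ~3u)
#define TLV_LEN(datalen) (HDR_LEN + (datalen))      /* header plus payload */
#define TLV_SPC(datalen) ALIGN4(TLV_LEN(datalen))   /* padded to 4 bytes */

/* Write one TLV into buf; returns the bytes consumed including padding. */
size_t tlv_set(unsigned char *buf, uint16_t type, const void *data,
	       uint16_t len)
{
	uint16_t tlv_len = (uint16_t)TLV_LEN(len);

	assert(buf != NULL);
	/* Header fields are stored in network byte order. */
	buf[0] = (unsigned char)(type >> 8);
	buf[1] = (unsigned char)(type & 0xff);
	buf[2] = (unsigned char)(tlv_len >> 8);
	buf[3] = (unsigned char)(tlv_len & 0xff);
	if (len && data) {
		/* Copy exactly len bytes so we never read past data. */
		memcpy(buf + HDR_LEN, data, len);
		/* Zero the padding explicitly instead of over-copying. */
		memset(buf + HDR_LEN + len, 0, TLV_SPC(len) - tlv_len);
	}
	return TLV_SPC(len);
}
```

With a 5-byte payload, this writes a 4-byte header, 5 data bytes, and 3 zeroed padding bytes, and it reads only the 5 bytes the caller actually supplied.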
-
Committed by Parav Pandit
[ Upstream commit 9414277a5df3669c67e818708c0f881597e0118e ]
In the code flow below, the ingress acl table root ns memory is freed twice.
    mlx5_init_fs
      init_ingress_acls_root_ns()
        init_ingress_acl_root_ns
          kfree(steering->esw_ingress_root_ns);
          /* steering->esw_ingress_root_ns is not marked NULL */
    mlx5_cleanup_fs
      cleanup_ingress_acls_root_ns
        steering->esw_ingress_root_ns non NULL check passes.
        kfree(steering->esw_ingress_root_ns);
        /* double free */
A similar issue exists for other tables.
Hence zero out the pointers so that the tables are not processed again.
Fixes: 9b93ab98 ("net/mlx5: Separate ingress/egress namespaces for each vport")
Fixes: 40c3eebb49e51 ("net/mlx5: Add support in RDMA RX steering")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
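The free-then-clear pattern this fix relies on can be shown with a small user-space model. The struct and function names are hypothetical stand-ins, and `free()` takes the place of `kfree()`.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the steering object and one of its tables. */
struct steering {
	int *ingress_root_ns;
};

/* Free the table and clear the pointer so later cleanup skips it. */
void cleanup_ingress(struct steering *s)
{
	assert(s != NULL);
	if (!s->ingress_root_ns)
		return;                  /* already released earlier */
	free(s->ingress_root_ns);
	s->ingress_root_ns = NULL;       /* without this, a second call double frees */
}
```

Calling `cleanup_ingress()` twice on the same object is harmless in this model because the non-NULL check fails on the second call.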
-
Committed by Kloetzke Jan
[ Upstream commit ad70411a978d1e6e97b1e341a7bde9a79af0c93d ]
When disconnecting cdc_ncm the kernel sporadically crashes shortly after the disconnect:
    [ 57.868812] Unable to handle kernel NULL pointer dereference at virtual address 00000000
    ...
    [ 58.006653] PC is at 0x0
    [ 58.009202] LR is at call_timer_fn+0xec/0x1b4
    [ 58.013567] pc : [<0000000000000000>] lr : [<ffffff80080f5130>] pstate: 00000145
    [ 58.020976] sp : ffffff8008003da0
    [ 58.024295] x29: ffffff8008003da0 x28: 0000000000000001
    [ 58.029618] x27: 000000000000000a x26: 0000000000000100
    [ 58.034941] x25: 0000000000000000 x24: ffffff8008003e68
    [ 58.040263] x23: 0000000000000000 x22: 0000000000000000
    [ 58.045587] x21: 0000000000000000 x20: ffffffc68fac1808
    [ 58.050910] x19: 0000000000000100 x18: 0000000000000000
    [ 58.056232] x17: 0000007f885aff8c x16: 0000007f883a9f10
    [ 58.061556] x15: 0000000000000001 x14: 000000000000006e
    [ 58.066878] x13: 0000000000000000 x12: 00000000000000ba
    [ 58.072201] x11: ffffffc69ff1db30 x10: 0000000000000020
    [ 58.077524] x9 : 8000100008001000 x8 : 0000000000000001
    [ 58.082847] x7 : 0000000000000800 x6 : ffffff8008003e70
    [ 58.088169] x5 : ffffffc69ff17a28 x4 : 00000000ffff138b
    [ 58.093492] x3 : 0000000000000000 x2 : 0000000000000000
    [ 58.098814] x1 : 0000000000000000 x0 : 0000000000000000
    ...
    [ 58.205800] [< (null)>] (null)
    [ 58.210521] [<ffffff80080f5298>] expire_timers+0xa0/0x14c
    [ 58.215937] [<ffffff80080f542c>] run_timer_softirq+0xe8/0x128
    [ 58.221702] [<ffffff8008081120>] __do_softirq+0x298/0x348
    [ 58.227118] [<ffffff80080a6304>] irq_exit+0x74/0xbc
    [ 58.232009] [<ffffff80080e17dc>] __handle_domain_irq+0x78/0xac
    [ 58.237857] [<ffffff8008080cf4>] gic_handle_irq+0x80/0xac
    ...
The crash happens roughly 125..130ms after the disconnect. This correlates with the 'delay' timer that is started on certain USB tx/rx errors in the URB completion handler.
The problem is a race of usbnet_stop() with usbnet_start_xmit(). In usbnet_stop() we call usbnet_terminate_urbs() to cancel all URBs in flight. This only makes sense if no new URBs are submitted concurrently, though. But the usbnet_start_xmit() can run at the same time on another CPU which almost unconditionally submits an URB. The error callback of the new URB will then schedule the timer after it was already stopped.
The fix adds a check if the tx queue is stopped after the tx list lock has been taken. This should reliably prevent the submission of new URBs while usbnet_terminate_urbs() does its job. The same thing is done on the rx side even though it might be safe due to other flags that are checked there.
Signed-off-by: Jan Klötzke <Jan.Kloetzke@preh.de>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Jisheng Zhang
[ Upstream commit 49ce881c0d4c4a7a35358d9dccd5f26d0e56fc61 ]
Commit 984203ce ("net: stmmac: mdio: remove reset gpio free") removed the reset gpio free, so when the driver is unbound or removed with rmmod, the gpio is never freed.
This patch uses the managed API to request the reset gpio, so that the gpio is freed properly.
Fixes: 984203ce ("net: stmmac: mdio: remove reset gpio free")
Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Vlad Buslov
[ Upstream commit 4097e9d250fb17958c1d9b94538386edd3f20144 ]
Function tcf_action_dump() relies on tc_action->order field when starting nested nla to send action data to userspace. This approach breaks in several cases:
- When multiple filters point to same shared action, tc_action->order field is overwritten each time it is attached to filter. This causes filter dump to output action with incorrect attribute for all filters that have the action in different position (different order) from the last set tc_action->order value.
- When action data is displayed using tc action API (RTM_GETACTION), action order is overwritten by tca_action_gd() according to its position in resulting array of nl attributes, which will break filter dump for all filters attached to that shared action that expect it to have different order value.
Don't rely on tc_action->order when dumping actions. Set nla according to action position in resulting array of actions instead.
Signed-off-by: Vlad Buslov <vladbu@mellanox.com>
Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
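A minimal model of "number each action by its position in the list being dumped" is sketched below. The `struct action` type and `dump_actions()` are hypothetical simplifications, and the netlink nesting itself is replaced by a plain output array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of an action that several filters may share. */
struct action {
	int order;          /* last order written by whichever filter attached it */
	const char *kind;
};

/* Fill out[] with the attribute number used for each action of one filter. */
void dump_actions(struct action *const acts[], size_t n, int out[])
{
	for (size_t i = 0; i < n; i++) {
		assert(acts[i] != NULL);
		/* Use the position in this list, not acts[i]->order, which
		 * another filter sharing the action may have overwritten. */
		out[i] = (int)i + 1;
	}
}
```

In this model a shared action that is first in one filter and second in another is dumped as 1 and 2 respectively, whatever its stored `order` field currently holds.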
-
Committed by Russell King
[ Upstream commit 3d3ced2ec5d71b99d72ae6910fbdf890bc2eccf0 ] Some boards do not have the PHY firmware programmed in the 3310's flash, which leads to the PHY not working as expected. Warn the user when the PHY fails to boot the firmware and refuse to initialise. Fixes: 20b2af32 ("net: phy: add Marvell Alaska X 88X3310 10Gigabit PHY support") Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Antoine Tenart
[ Upstream commit 21808437214637952b61beaba6034d97880fbeb3 ] MVPP2_TXQ_SCHED_TOKEN_CNTR_REG() expects the logical queue id but the current code is passing the global tx queue offset, so it ends up writing to unknown registers (between 0x8280 and 0x82fc, which seemed to be unused by the hardware). This fixes the issue by using the logical queue id instead. Fixes: 3f518509 ("ethernet: Add new driver for Marvell Armada 375 network unit") Signed-off-by: Antoine Tenart <antoine.tenart@bootlin.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Committed by Jisheng Zhang
[ Upstream commit d484e06e25ebb937d841dac02ac1fe76ec7d4ddd ] Fix the issues below in the error code path of probe:
1. We don't need to call unregister_netdev() because the netdev isn't registered.
2. When register_netdev() fails, we also need to destroy the bm pool for the HWBM case.
Fixes: dc35a10f ("net: mvneta: bm: add support for hardware buffer management") Signed-off-by: Jisheng Zhang <Jisheng.Zhang@synaptics.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
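The two points above amount to the usual rule for probe error paths: undo exactly what has succeeded, and nothing else. A minimal userspace sketch follows, assuming a hypothetical `struct dev_state` and `probe()` in which boolean flags stand in for the buffer pool and the registered netdev.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical resource state standing in for a buffer pool and a netdev. */
struct dev_state {
    bool pool_created;
    bool registered;
};

/* Hypothetical probe; fail_register simulates a registration failure.
 * Returns 0 on success and -1 on failure. */
static int probe(struct dev_state *d, bool fail_register)
{
    assert(d != NULL);

    d->pool_created = true;     /* step 1: create the buffer pool */

    if (fail_register)          /* step 2: registration may fail */
        goto err_destroy_pool;
    d->registered = true;
    return 0;

err_destroy_pool:
    /* The pool exists but the registration never happened, so destroy
     * the pool and do not attempt to unregister anything. */
    d->pool_created = false;
    return -1;
}
```

After a failed `probe()`, both flags are false again, which is the state the caller started from.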
-
Committed by Eric Dumazet
[ Upstream commit a4270d6795b0580287453ea55974d948393e66ef ] If a network driver provides to napi_gro_frags() an skb with a page fragment of exactly 14 bytes, the call to gro_pull_from_frag0() will 'consume' the fragment by calling skb_frag_unref(skb, 0), and the page might be freed and reused. Reading eth->h_proto at the end of napi_frags_skb() might read mangled data, or crash under specific debugging features.
BUG: KASAN: use-after-free in napi_frags_skb net/core/dev.c:5833 [inline]
BUG: KASAN: use-after-free in napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
Read of size 2 at addr ffff88809366840c by task syz-executor599/8957
CPU: 1 PID: 8957 Comm: syz-executor599 Not tainted 5.2.0-rc1+ #32
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011
Call Trace:
__dump_stack lib/dump_stack.c:77 [inline]
dump_stack+0x172/0x1f0 lib/dump_stack.c:113
print_address_description.cold+0x7c/0x20d mm/kasan/report.c:188
__kasan_report.cold+0x1b/0x40 mm/kasan/report.c:317
kasan_report+0x12/0x20 mm/kasan/common.c:614
__asan_report_load_n_noabort+0xf/0x20 mm/kasan/generic_report.c:142
napi_frags_skb net/core/dev.c:5833 [inline]
napi_gro_frags+0xc6f/0xd10 net/core/dev.c:5841
tun_get_user+0x2f3c/0x3ff0 drivers/net/tun.c:1991
tun_chr_write_iter+0xbd/0x156 drivers/net/tun.c:2037
call_write_iter include/linux/fs.h:1872 [inline]
do_iter_readv_writev+0x5f8/0x8f0 fs/read_write.c:693
do_iter_write fs/read_write.c:970 [inline]
do_iter_write+0x184/0x610 fs/read_write.c:951
vfs_writev+0x1b3/0x2f0 fs/read_write.c:1015
do_writev+0x15b/0x330 fs/read_write.c:1058
Fixes: a50e233c ("net-gro: restore frag0 optimization") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: syzbot <syzkaller@googlegroups.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
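The hazard described above is a read through a pointer into a buffer that the pull step may already have released. One way to avoid it, sketched here in userspace C, is to read the protocol field from the copy that the pull leaves behind. `struct pkt`, `pull_header()` and `pull_and_get_proto()` are hypothetical names; this illustrates the pattern and is not a reproduction of the upstream change.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define ETH_HDR_LEN 14  /* 6 + 6 + 2 bytes: destination, source, protocol */

/* Hypothetical packet: a linear header area plus one heap fragment. */
struct pkt {
    uint8_t linear[ETH_HDR_LEN];
    uint8_t *frag;
    size_t frag_len;
};

/* Copy the header into the linear area. A fragment that is fully consumed
 * is freed, so pointers into it must not be used afterwards. */
static void pull_header(struct pkt *p)
{
    assert(p->frag != NULL && p->frag_len >= ETH_HDR_LEN);

    memcpy(p->linear, p->frag, ETH_HDR_LEN);
    p->frag_len -= ETH_HDR_LEN;
    if (p->frag_len == 0) {
        free(p->frag);
        p->frag = NULL;
    } else {
        /* Keep the remaining payload at the start of the fragment. */
        memmove(p->frag, p->frag + ETH_HDR_LEN, p->frag_len);
    }
}

/* Read the protocol from the linear copy, never from the old fragment. */
static uint16_t pull_and_get_proto(struct pkt *p)
{
    pull_header(p);
    /* Bytes 12 and 13 hold the protocol in network byte order. */
    return (uint16_t)((p->linear[12] << 8) | p->linear[13]);
}
```

With a fragment of exactly 14 bytes, the fragment is gone once `pull_header()` returns, and the protocol is still readable because it comes from `linear`.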
-
Committed by Andy Duan
[ Upstream commit ce8d24f9a5965a58c588f9342689702a1024433c ] Fix the clk mismatch in the error path "failed_reset". Because the error path below it disables clk_ahb and clk_ipg directly, it should use pm_runtime_put_noidle() instead of pm_runtime_put() to avoid calling the runtime resume callback. Reported-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: Fugang Duan <fugang.duan@nxp.com> Tested-by: Baruch Siach <baruch@tkos.co.il> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
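Why the choice of "put" variant matters for clock balance can be shown with a deliberately simplified model. The assumptions are loud ones: `struct dev_pm`, `put()`, `put_noidle()` and `error_path()` are hypothetical, the callback runs synchronously, and the callback is assumed to disable the clock. The real runtime PM core is asynchronous and more involved.

```c
#include <assert.h>

/* Hypothetical model: a usage counter plus a clock enable count. */
struct dev_pm {
    int usage;       /* usage counter held by the probe path */
    int clk_enable;  /* clock enable count; a negative value is a bug */
};

/* Drop a reference and, when it reaches zero, run a callback that
 * disables the clock in this model. */
static void put(struct dev_pm *d)
{
    assert(d->usage > 0);
    if (--d->usage == 0)
        d->clk_enable--;
}

/* Drop a reference without running any callback. */
static void put_noidle(struct dev_pm *d)
{
    assert(d->usage > 0);
    d->usage--;
}

/* Error path that disables the clock itself after dropping the reference.
 * Returns the resulting enable count: 0 is balanced, negative means the
 * clock was disabled twice. */
static int error_path(struct dev_pm *d, int use_noidle)
{
    if (use_noidle)
        put_noidle(d);
    else
        put(d);
    d->clk_enable--;  /* explicit clock disable in the error path */
    return d->clk_enable;
}
```

In this model the "noidle" variant leaves the count balanced, because only the error path touches the clock; the other variant lets the callback and the error path both disable it.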
-