- 08 7月, 2018 5 次提交
-
-
由 John Fastabend 提交于
After latest lock updates there is no longer anything preventing a close and recvmsg call running in parallel. Additionally, we can race update with close if we close a socket and simultaneously update if via the BPF userspace API (note the cgroup ops are already run with sock_lock held). To resolve this take sock_lock in close and update paths. Reported-by: syzbot+b680e42077a0d7c9a0c4@syzkaller.appspotmail.com Fixes: e9db4ef6 ("bpf: sockhash fix omitted bucket lock in sock_close") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 John Fastabend 提交于
Multiple BPF helpers in use by sk_skb programs calculate the max skb length using the __bpf_skb_max_len function. However, this calculates the max length using the skb->dev pointer which can be NULL when an sk_skb program is paired with an sk_msg program. To force this a sk_msg program needs to redirect into the ingress path of a sock with an attach sk_skb program. Then the the sk_skb program would need to call one of the helpers that adjust the skb size. To fix the null ptr dereference use SKB_MAX_ALLOC size if no dev is available. Fixes: 8934ce2f ("bpf: sockmap redirect ingress support") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Alexei Starovoitov 提交于
John Fastabend says: ==================== I missed fixing the error path in the sockhash code to align with supporting socks in multiple maps. Simply checking if the psock is present does not mean we can decrement the reference count because it could be part of another map. Fix this by cleaning up the error path so this situation does not happen. ==================== Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 John Fastabend 提交于
This removes locking from readers of RCU hash table. Its not necessary. Fixes: 81110384 ("bpf: sockmap, add hash map support") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 John Fastabend 提交于
The current code, in the error path of sock_hash_ctx_update_elem, checks if the sock has a psock in the user data and if so decrements the reference count of the psock. However, if the error happens early in the error path we may have never incremented the psock reference count and if the psock exists because the sock is in another map then we may inadvertently decrement the reference count. Fix this by making the error path only call smap_release_sock if the error happens after the increment. Reported-by: syzbot+d464d2c20c717ef5a6a8@syzkaller.appspotmail.com Fixes: 81110384 ("bpf: sockmap, add hash map support") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
- 05 7月, 2018 4 次提交
-
-
由 Taeung Song 提交于
For untracked executables of samples/bpf, add this. Untracked files: (use "git add <file>..." to include in what will be committed) samples/bpf/cpustat samples/bpf/fds_example samples/bpf/lathist samples/bpf/load_sock_ops ... Signed-off-by: NTaeung Song <treeze.taeung@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Taeung Song 提交于
test_task_rename() and test_urandom_read() can be failed during write() and read(), So check the result of them. Reviewed-by: NDavid Laight <David.Laight@ACULAB.COM> Signed-off-by: NTaeung Song <treeze.taeung@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Taeung Song 提交于
To avoid the below build warning message, use new generate_load() checking the return value. ignoring return value of ‘system’, declared with attribute warn_unused_result And it also refactors the duplicate code of both test_perf_event_all_cpu() and test_perf_event_task() Cc: Teng Qin <qinteng@fb.com> Signed-off-by: NTaeung Song <treeze.taeung@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 Taeung Song 提交于
This fixes build error regarding redefinition: CLANG-bpf samples/bpf/parse_varlen.o samples/bpf/parse_varlen.c:111:8: error: redefinition of 'vlan_hdr' struct vlan_hdr { ^ ./include/linux/if_vlan.h:38:8: note: previous definition is here So remove duplicate 'struct vlan_hdr' in sample code and include if_vlan.h Signed-off-by: NTaeung Song <treeze.taeung@gmail.com> Acked-by: NDavid S. Miller <davem@davemloft.net> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 04 7月, 2018 1 次提交
-
-
由 Mauricio Vasquez B 提交于
Decrement the number of elements in the map in case the allocation of a new node fails. Fixes: 6c905981 ("bpf: pre-allocate hash map elements") Signed-off-by: NMauricio Vasquez B <mauricio.vasquez@polito.it> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
- 03 7月, 2018 8 次提交
-
-
由 Alexei Starovoitov 提交于
Magnus Karlsson says: ==================== This patch set fixes three bugs in the SKB TX path of AF_XDP. Details in the individual commits. The structure of the patch set is as follows: Patch 1: Fix for lost completion message Patch 2-3: Fix for possible multiple completions of single packet Patch 4: Fix potential race during error Changes from v1: * Added explanation of race in commit message of patch 4. ==================== Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Magnus Karlsson 提交于
There is a potential race in the TX completion code for the SKB case. One process enters the sendmsg code of an AF_XDP socket in order to send a frame. The execution eventually trickles down to the driver that is told to send the packet. However, it decides to drop the packet due to some error condition (e.g., rings full) and frees the SKB. This will trigger the SKB destructor and a completion will be sent to the AF_XDP user space through its single-producer/single-consumer queues. At the same time a TX interrupt has fired on another core and it dispatches the TX completion code in the driver. It does its HW specific things and ends up freeing the SKB associated with the transmitted packet. This will trigger the SKB destructor and a completion will be sent to the AF_XDP user space through its single-producer/single-consumer queues. With a pseudo call stack, it would look like this: Core 1: sendmsg() being called in the application netdev_start_xmit() Driver entered through ndo_start_xmit Driver decides to free the SKB for some reason (e.g., rings full) Destructor of SKB called xskq_produce_addr() is called to signal completion to user space Core 2: TX completion irq NAPI loop Driver irq handler for TX completions Frees the SKB Destructor of SKB called xskq_produce_addr() is called to signal completion to user space We now have a violation of the single-producer/single-consumer principle for our queues as there are two threads trying to produce at the same time on the same queue. Fixed by introducing a spin_lock in the destructor. In regards to the performance, I get around 1.74 Mpps for txonly before and after the introduction of the spinlock. There is of course some impact due to the spin lock but it is in the less significant digits that are too noisy for me to measure. But let us say that the version without the spin lock got 1.745 Mpps in the best case and the version with 1.735 Mpps in the worst case, then that would mean a maximum drop in performance of 0.5%. Fixes: 35fcde7f ("xsk: support for Tx") Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Magnus Karlsson 提交于
Sendmsg in the SKB path of AF_XDP can now return EBUSY when a packet was discarded and completed by the driver. Just ignore this message in the sample application. Fixes: b4b8faa1 ("samples/bpf: sample application and documentation for AF_XDP sockets") Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Reported-by: NPavel Odintsov <pavel@fastnetmon.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Magnus Karlsson 提交于
Fixed a bug in which a frame could be completed more than once when an error was returned from dev_direct_xmit(). The code erroneously retried sending the message leading to multiple calls to the SKB destructor and therefore multiple completions of the same buffer to user space. The error code in this case has been changed from EAGAIN to EBUSY in order to tell user space that the sending of the packet failed and the buffer has been return to user space through the completion queue. Fixes: 35fcde7f ("xsk: support for Tx") Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Reported-by: NPavel Odintsov <pavel@fastnetmon.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
由 Magnus Karlsson 提交于
The code in xskq_produce_addr erroneously checked if there was up to LAZY_UPDATE_THRESHOLD amount of space in the completion queue. It only needs to check if there is one slot left in the queue. This bug could under some circumstances lead to a WARN_ON_ONCE being triggered and the completion message to user space being lost. Fixes: 35fcde7f ("xsk: support for Tx") Signed-off-by: NMagnus Karlsson <magnus.karlsson@intel.com> Reported-by: NPavel Odintsov <pavel@fastnetmon.com> Signed-off-by: NAlexei Starovoitov <ast@kernel.org>
-
git://git.kernel.org/pub/scm/linux/kernel/git/shli/md由 Linus Torvalds 提交于
Pull MD fixes from Shaohua Li: "Two small fixes for MD: - an error handling fix from me - a recover bug fix for raid10 from BingJing" * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/shli/md: md/raid10: fix that replacement cannot complete recovery after reassemble MD: cleanup resources in failure
-
git://github.com/stffrdhrn/linux由 Linus Torvalds 提交于
Pull OpenRISC fixes from Stafford Horne: "Two fixes for issues which were breaking OpenRISC boot: - Fix bug in __pte_free_tlb() exposed in 4.18 by Matthew Wilcox's page table flag addition. - Fix issue booting on real hardware if delay slot detection emulation is disabled" * tag 'for-linus' of git://github.com/stffrdhrn/linux: openrisc: entry: Fix delay slot exception detection openrisc: Call destructor during __pte_free_tlb
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/net由 Linus Torvalds 提交于
Pull networking fixes from David Miller: 1) Verify netlink attributes properly in nf_queue, from Eric Dumazet. 2) Need to bump memory lock rlimit for test_sockmap bpf test, from Yonghong Song. 3) Fix VLAN handling in lan78xx driver, from Dave Stevenson. 4) Fix uninitialized read in nf_log, from Jann Horn. 5) Fix raw command length parsing in mlx5, from Alex Vesker. 6) Cleanup loopback RDS connections upon netns deletion, from Sowmini Varadhan. 7) Fix regressions in FIB rule matching during create, from Jason A. Donenfeld and Roopa Prabhu. 8) Fix mpls ether type detection in nfp, from Pieter Jansen van Vuuren. 9) More bpfilter build fixes/adjustments from Masahiro Yamada. 10) Fix XDP_{TX,REDIRECT} flushing in various drivers, from Jesper Dangaard Brouer. 11) fib_tests.sh file permissions were broken, from Shuah Khan. 12) Make sure BH/preemption is disabled in data path of mac80211, from Denis Kenzior. 13) Don't ignore nla_parse_nested() return values in nl80211, from Johannes berg. 14) Properly account sock objects ot kmemcg, from Shakeel Butt. 15) Adjustments to setting bpf program permissions to read-only, from Daniel Borkmann. 16) TCP Fast Open key endianness was broken, it always took on the host endiannness. Whoops. Explicitly make it little endian. From Yuching Cheng. 17) Fix prefix route setting for link local addresses in ipv6, from David Ahern. 18) Potential Spectre v1 in zatm driver, from Gustavo A. R. Silva. 19) Various bpf sockmap fixes, from John Fastabend. 20) Use after free for GRO with ESP, from Sabrina Dubroca. 21) Passing bogus flags to crypto_alloc_shash() in ipv6 SR code, from Eric Biggers. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits) qede: Adverstise software timestamp caps when PHC is not available. qed: Fix use of incorrect size in memcpy call. qed: Fix setting of incorrect eswitch mode. qed: Limit msix vectors in kdump kernel to the minimum required count. ipvlan: call dev_change_flags when ipvlan mode is reset ipv6: sr: fix passing wrong flags to crypto_alloc_shash() net: fix use-after-free in GRO with ESP tcp: prevent bogus FRTO undos with non-SACK flows bpf: sockhash, add release routine bpf: sockhash fix omitted bucket lock in sock_close bpf: sockmap, fix smap_list_map_remove when psock is in many maps bpf: sockmap, fix crash when ipv6 sock is added net: fib_rules: bring back rule_exists to match rule during add hv_netvsc: split sub-channel setup into async and sync net: use dev_change_tx_queue_len() for SIOCSIFTXQLEN atm: zatm: Fix potential Spectre v1 s390/qeth: consistently re-enable device features s390/qeth: don't clobber buffer on async TX completion s390/qeth: avoid using is_multicast_ether_addr_64bits on (u8 *)[6] s390/qeth: fix race when setting MAC address ...
-
- 02 7月, 2018 15 次提交
-
-
由 David S. Miller 提交于
Sudarsana Reddy Kalluru says: ==================== qed*: Fix series. The patch series addresses few issues in the qed* drivers. Please consider applying it to 'net' branch. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
When ptp clock is not available for a PF (e.g., higher PFs in NPAR mode), get-tsinfo() callback should return the software timestamp capabilities instead of returning the error. Fixes: 4c55215c ("qede: Add driver support for PTP") Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
Use the correct size value while copying chassis/port id values. Fixes: 6ad8c632 ("qed: Add support for query/config dcbx.") Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
By default, driver sets the eswitch mode incorrectly as VEB (virtual Ethernet bridging). Need to set VEB eswitch mode only when sriov is enabled, and it should be to set NONE by default. The patch incorporates this change. Fixes: 0fefbfba ("qed*: Management firmware - notifications and defaults") Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sudarsana Reddy Kalluru 提交于
Memory size is limited in the kdump kernel environment. Allocation of more msix-vectors (or queues) consumes few tens of MBs of memory, which might lead to the kdump kernel failure. This patch adds changes to limit the number of MSI-X vectors in kdump kernel to minimum required value (i.e., 2 per engine). Fixes: fe56b9e6 ("qed: Add module with basic common support") Signed-off-by: NSudarsana Reddy Kalluru <Sudarsana.Kalluru@cavium.com> Signed-off-by: NMichal Kalderon <Michal.Kalderon@cavium.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Hangbin Liu 提交于
After we change the ipvlan mode from l3 to l2, or vice versa, we only reset IFF_NOARP flag, but don't flush the ARP table cache, which will cause eth->h_dest to be equal to eth->h_source in ipvlan_xmit_mode_l2(). Then the message will not come out of host. Here is the reproducer on local host: ip link set eth1 up ip addr add 192.168.1.1/24 dev eth1 ip link add link eth1 ipvlan1 type ipvlan mode l3 ip netns add net1 ip link set ipvlan1 netns net1 ip netns exec net1 ip link set ipvlan1 up ip netns exec net1 ip addr add 192.168.2.1/24 dev ipvlan1 ip route add 192.168.2.0/24 via 192.168.1.2 ping 192.168.2.2 -c 2 ip netns exec net1 ip link set ipvlan1 type ipvlan mode l2 ping 192.168.2.2 -c 2 Add the same configuration on remote host. After we set the mode to l2, we could find that the src/dst MAC addresses are the same on eth1: 21:26:06.648565 00:b7:13:ad:d3:05 > 00:b7:13:ad:d3:05, ethertype IPv4 (0x0800), length 98: (tos 0x0, ttl 64, id 58356, offset 0, flags [DF], proto ICMP (1), length 84) 192.168.2.1 > 192.168.2.2: ICMP echo request, id 22686, seq 1, length 64 Fix this by calling dev_change_flags(), which will call netdevice notifier with flag change info. v2: a) As pointed out by Wang Cong, check return value for dev_change_flags() when change dev flags. b) As suggested by Stefano and Sabrina, move flags setting before l3mdev_ops. So we don't need to redo ipvlan_{, un}register_nf_hook() again in err path. Reported-by: NJianlin Shi <jishi@redhat.com> Reviewed-by: NStefano Brivio <sbrivio@redhat.com> Reviewed-by: NSabrina Dubroca <sd@queasysnail.net> Fixes: 2ad7bf36 ("ipvlan: Initial check-in of the IPVLAN driver.") Signed-off-by: NHangbin Liu <liuhangbin@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Eric Biggers 提交于
The 'mask' argument to crypto_alloc_shash() uses the CRYPTO_ALG_* flags, not 'gfp_t'. So don't pass GFP_KERNEL to it. Fixes: bf355b8d ("ipv6: sr: add core files for SR HMAC support") Signed-off-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Sabrina Dubroca 提交于
Since the addition of GRO for ESP, gro_receive can consume the skb and return -EINPROGRESS. In that case, the lower layer GRO handler cannot touch the skb anymore. Commit 5f114163 ("net: Add a skb_gro_flush_final helper.") converted some of the gro_receive handlers that can lead to ESP's gro_receive so that they wouldn't access the skb when -EINPROGRESS is returned, but missed other spots, mainly in tunneling protocols. This patch finishes the conversion to using skb_gro_flush_final(), and adds a new helper, skb_gro_flush_final_remcsum(), used in VXLAN and GUE. Fixes: 5f114163 ("net: Add a skb_gro_flush_final helper.") Signed-off-by: NSabrina Dubroca <sd@queasysnail.net> Reviewed-by: NStefano Brivio <sbrivio@redhat.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Linus Torvalds 提交于
-
git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux由 Linus Torvalds 提交于
Pull btrfs fixes from David Sterba: "We have a few regression fixes for qgroup rescan status tracking and the vm_fault_t conversion that mixed up the error values" * tag 'for-4.18-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: Btrfs: fix mount failure when qgroup rescan is in progress Btrfs: fix regression in btrfs_page_mkwrite() from vm_fault_t conversion btrfs: quota: Set rescan progress to (u64)-1 if we hit last leaf
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs由 Linus Torvalds 提交于
Pull vfs fix from Al Viro: "Followup to procfs-seq_file series this window" This fixes a memory leak by making sure that proc seq files release any private data on close. The 'proc_seq_open' has to be properly paired with 'proc_seq_release' that releases the extra private data. * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: proc: add proc_seq_release
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging由 Linus Torvalds 提交于
Pull staging/IIO fixes from Greg KH: "Here are a few small staging and IIO driver fixes for 4.18-rc3. Nothing major or big, all just fixes for reported problems since 4.18-rc1. All of these have been in linux-next this week with no reported problems" * tag 'staging-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging: staging: android: ion: Return an ERR_PTR in ion_map_kernel staging: comedi: quatech_daqp_cs: fix no-op loop daqp_ao_insn_write() iio: imu: inv_mpu6050: Fix probe() failure on older ACPI based machines iio: buffer: fix the function signature to match implementation iio: mma8452: Fix ignoring MMA8452_INT_DRDY iio: tsl2x7x/tsl2772: avoid potential division by zero iio: pressure: bmp280: fix relative humidity unit
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty由 Linus Torvalds 提交于
Pull tty/serial fixes from Greg KH: "Here are five fixes for the tty core and some serial drivers. The tty core ones fix some security and other issues reported by the syzbot that I have taken too long in responding to (sorry Tetsuo!). The 8350 serial driver fix resolves an issue of devices that used to work properly stopping working as they shouldn't have been added to a blacklist. All of these have been in linux-next for a few days with no reported issues" * tag 'tty-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty: vt: prevent leaking uninitialized data to userspace via /dev/vcs* serdev: fix memleak on module unload serial: 8250_pci: Remove stalled entries in blacklist n_tty: Access echo_* variables carefully. n_tty: Fix stall at n_tty_receive_char_special().
-
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb由 Linus Torvalds 提交于
Pull USB fixes from Greg KH: "Here is a number of USB gadget and other driver fixes for 4.18-rc3. There's a bunch of them here, most of them being gadget driver and xhci host controller fixes for reported issues (as normal), but there are also some new device ids, and some fixes for the typec code. There is an acpi core patch in here that was acked by the acpi maintainer as it is needed for the typec fixes in order to properly solve a problem in that driver. All of these have been in linux-next this week with no reported issues" * tag 'usb-4.18-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb: (33 commits) usb: chipidea: host: fix disconnection detect issue usb: typec: tcpm: fix logbuffer index is wrong if _tcpm_log is re-entered typec: tcpm: Fix a msecs vs jiffies bug NFC: pn533: Fix wrong GFP flag usage usb: cdc_acm: Add quirk for Uniden UBC125 scanner staging/typec: fix tcpci_rt1711h build errors usb: typec: ucsi: Fix for incorrect status data issue usb: typec: ucsi: acpi: Workaround for cache mode issue acpi: Add helper for deactivating memory region usb: xhci: increase CRS timeout value usb: xhci: tegra: fix runtime PM error handling usb: xhci: remove the code build warning xhci: Fix kernel oops in trace_xhci_free_virt_device xhci: Fix perceived dead host due to runtime suspend race with event handler dwc2: gadget: Fix ISOC IN DDMA PID bitfield value calculation usb: gadget: dwc2: fix memory leak in gadget_init() usb: gadget: composite: fix delayed_status race condition when set_interface usb: dwc2: fix isoc split in transfer with no data usb: dwc2: alloc dma aligned buffer for isoc split in usb: dwc2: fix the incorrect bitmaps for the ports of multi_tt hub ...
-
git://git.infradead.org/users/hch/dma-mapping由 Linus Torvalds 提交于
Pull dma mapping fixlet from Christoph Hellwig: "Add a missing export required by riscv and unicore" * tag 'dma-mapping-4.18-2' of git://git.infradead.org/users/hch/dma-mapping: swiotlb: export swiotlb_dma_ops
-
- 01 7月, 2018 7 次提交
-
-
由 Ilpo Järvinen 提交于
If SACK is not enabled and the first cumulative ACK after the RTO retransmission covers more than the retransmitted skb, a spurious FRTO undo will trigger (assuming FRTO is enabled for that RTO). The reason is that any non-retransmitted segment acknowledged will set FLAG_ORIG_SACK_ACKED in tcp_clean_rtx_queue even if there is no indication that it would have been delivered for real (the scoreboard is not kept with TCPCB_SACKED_ACKED bits in the non-SACK case so the check for that bit won't help like it does with SACK). Having FLAG_ORIG_SACK_ACKED set results in the spurious FRTO undo in tcp_process_loss. We need to use more strict condition for non-SACK case and check that none of the cumulatively ACKed segments were retransmitted to prove that progress is due to original transmissions. Only then keep FLAG_ORIG_SACK_ACKED set, allowing FRTO undo to proceed in non-SACK case. (FLAG_ORIG_SACK_ACKED is planned to be renamed to FLAG_ORIG_PROGRESS to better indicate its purpose but to keep this change minimal, it will be done in another patch). Besides burstiness and congestion control violations, this problem can result in RTO loop: When the loss recovery is prematurely undoed, only new data will be transmitted (if available) and the next retransmission can occur only after a new RTO which in case of multiple losses (that are not for consecutive packets) requires one RTO per loss to recover. Signed-off-by: NIlpo Järvinen <ilpo.jarvinen@helsinki.fi> Tested-by: NNeal Cardwell <ncardwell@google.com> Acked-by: NNeal Cardwell <ncardwell@google.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Stafford Horne 提交于
Originally in patch e6d20c55 ("openrisc: entry: Fix delay slot detection") I fixed delay slot detection, but only for QEMU. We missed that hardware delay slot detection using delay slot exception flag (DSX) was still broken. This was because QEMU set the DSX flag in both pre-exception supervision register (ESR) and supervision register (SR) register, but on real hardware the DSX flag is only set on the SR register during exceptions. Fix this by carrying the DSX flag into the SR register during exception. We also update the DSX flag read locations to read the value from the SR register not the pt_regs SR register which represents ESR. The ESR should never have the DSX flag set. In the process I updated/removed a few comments to match the current state. Including removing a comment saying that the DSX detection logic was inefficient and needed to be rewritten. I have tested this on QEMU with a patch ensuring it matches the hardware specification. Link: https://lists.gnu.org/archive/html/qemu-devel/2018-07/msg00000.html Fixes: e6d20c55 ("openrisc: entry: Fix delay slot detection") Signed-off-by: NStafford Horne <shorne@gmail.com>
-
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf由 David S. Miller 提交于
Daniel Borkmann says: ==================== pull-request: bpf 2018-07-01 The following pull-request contains BPF updates for your *net* tree. The main changes are: 1) A bpf_fib_lookup() helper fix to change the API before freeze to return an encoding of the FIB lookup result and return the nexthop device index in the params struct (instead of device index as return code that we had before), from David. 2) Various BPF JIT fixes to address syzkaller fallout, that is, do not reject progs when set_memory_*() fails since it could still be RO. Also arm32 JIT was not using bpf_jit_binary_lock_ro() API which was an issue, and a memory leak in s390 JIT found during review, from Daniel. 3) Multiple fixes for sockmap/hash to address most of the syzkaller triggered bugs. Usage with IPv6 was crashing, a GPF in bpf_tcp_close(), a missing sock_map_release() routine to hook up to callbacks, and a fix for an omitted bucket lock in sock_close(), from John. 4) Two bpftool fixes to remove duplicated error message on program load, and another one to close the libbpf object after program load. One additional fix for nfp driver's BPF offload to avoid stopping offload completely if replace of program failed, from Jakub. 5) Couple of BPF selftest fixes that bail out in some of the test scripts if the user does not have the right privileges, from Jeffrin. 6) Fixes in test_bpf for s390 when CONFIG_BPF_JIT_ALWAYS_ON is set where we need to set the flag that some of the test cases are expected to fail, from Kleber. 7) Fix to detangle BPF_LIRC_MODE2 dependency from CONFIG_CGROUP_BPF since it has no relation to it and lirc2 users often have configs without cgroups enabled and thus would not be able to use it, from Sean. 8) Fix a selftest failure in sockmap by removing a useless setrlimit() call that would set a too low limit where at the same time we are already including bpf_rlimit.h that does the job, from Yonghong. 9) Fix BPF selftest config with missing missing NET_SCHED, from Anders. ==================== Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Daniel Borkmann 提交于
John Fastabend says: ==================== This addresses two syzbot issues that lead to identifying (by Eric and Wei) a class of bugs where we don't correctly check for IPv4/v6 sockets and their associated state. The second issue was a locking omission in sockhash. The first patch addresses IPv6 socks and fixing an error where sockhash would overwrite the prot pointer with IPv4 prot. To fix this build similar solution to TLS ULP. Although we continue to allow socks in all states not just ESTABLISH in this patch set because as Martin points out there should be no issue with this on the sockmap ULP because we don't use the ctx in this code. Once multiple ULPs coexist we may need to revisit this. However we can do this in *next trees. The other issue syzbot found that the tcp_close() handler missed locking the hash bucket lock which could result in corrupting the sockhash bucket list if delete and close ran at the same time. And also the smap_list_remove() routine was not working correctly at all. This was not caught in my testing because in general my tests (to date at least lets add some more robust selftest in bpf-next) do things in the "expected" order, create map, add socks, delete socks, then tear down maps. The tests we have that do the ops out of this order where only working on single maps not multi- maps so we never saw the issue. Thanks syzbot. The fix is to restructure the tcp_close() lock handling. And fix the obvious bug in smap_list_remove(). Finally, during review I noticed the release handler was omitted from the upstream code (patch 4) due to an incorrect merge conflict fix when I ported the code to latest bpf-next before submitting. This would leave references to the map around if the user never closes the map. v3: rework patches, dropping ESTABLISH check and adding rcu annotation along with the smap_list_remove fix v4: missed one more case where maps was being accessed without the sk_callback_lock, spoted by Martin as well. v5: changed to use a specific lock for maps and reduced callback lock so that it is only used to gaurd sk callbacks. I think this makes the logic a bit cleaner and avoids confusion ovoer what each lock is doing. Also big thanks to Martin for thorough review he caught at least one case where I missed a rcu_call(). ==================== Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 John Fastabend 提交于
Add map_release_uref pointer to hashmap ops. This was dropped when original sockhash code was ported into bpf-next before initial commit. Fixes: 81110384 ("bpf: sockmap, add hash map support") Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 John Fastabend 提交于
First the sk_callback_lock() was being used to protect both the sock callback hooks and the psock->maps list. This got overly convoluted after the addition of sockhash (in sockmap it made some sense because masp and callbacks were tightly coupled) so lets split out a specific lock for maps and only use the callback lock for its intended purpose. This fixes a couple cases where we missed using maps lock when it was in fact needed. Also this makes it easier to follow the code because now we can put the locking closer to the actual code its serializing. Next, in sock_hash_delete_elem() the pattern was as follows, sock_hash_delete_elem() [...] spin_lock(bucket_lock) l = lookup_elem_raw() if (l) hlist_del_rcu() write_lock(sk_callback_lock) .... destroy psock ... write_unlock(sk_callback_lock) spin_unlock(bucket_lock) The ordering is necessary because we only know the {p}sock after dereferencing the hash table which we can't do unless we have the bucket lock held. Once we have the bucket lock and the psock element it is deleted from the hashmap to ensure any other path doing a lookup will fail. Finally, the refcnt is decremented and if zero the psock is destroyed. In parallel with the above (or free'ing the map) a tcp close event may trigger tcp_close(). Which at the moment omits the bucket lock altogether (oops!) where the flow looks like this, bpf_tcp_close() [...] write_lock(sk_callback_lock) for each psock->maps // list of maps this sock is part of hlist_del_rcu(ref_hash_node); .... destroy psock ... write_unlock(sk_callback_lock) Obviously, and demonstrated by syzbot, this is broken because we can have multiple threads deleting entries via hlist_del_rcu(). To fix this we might be tempted to wrap the hlist operation in a bucket lock but that would create a lock inversion problem. In summary to follow locking rules the psocks maps list needs the sk_callback_lock (after this patch maps_lock) but we need the bucket lock to do the hlist_del_rcu. To resolve the lock inversion problem pop the head of the maps list repeatedly and remove the reference until no more are left. If a delete happens in parallel from the BPF API that is OK as well because it will do a similar action, lookup the lock in the map/hash, delete it from the map/hash, and dec the refcnt. We check for this case before doing a destroy on the psock to ensure we don't have two threads tearing down a psock. The new logic is as follows, bpf_tcp_close() e = psock_map_pop(psock->maps) // done with map lock bucket_lock() // lock hash list bucket l = lookup_elem_raw(head, hash, key, key_size); if (l) { //only get here if elmnt was not already removed hlist_del_rcu() ... destroy psock... } bucket_unlock() And finally for all the above to work add missing locking around map operations per above. Then add RCU annotations and use rcu_dereference/rcu_assign_pointer to manage values relying on RCU so that the object is not free'd from sock_hash_free() while it is being referenced in bpf_tcp_close(). Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com Fixes: 81110384 ("bpf: sockmap, add hash map support") Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-
由 John Fastabend 提交于
If a hashmap is free'd with open socks it removes the reference to the hash entry from the psock. If that is the last reference to the psock then it will also be free'd by the reference counting logic. However the current logic that removes the hash reference from the list of references is broken. In smap_list_remove() we first check if the sockmap entry matches and then check if the hashmap entry matches. But, the sockmap entry sill always match because its NULL in this case which causes the first entry to be removed from the list. If this is always the "right" entry (because the user adds/removes entries in order) then everything is OK but otherwise a subsequent bpf_tcp_close() may reference a free'd object. To fix this create two list handlers one for sockmap and one for sockhash. Reported-by: syzbot+0ce137753c78f7b6acc1@syzkaller.appspotmail.com Fixes: 81110384 ("bpf: sockmap, add hash map support") Acked-by: NMartin KaFai Lau <kafai@fb.com> Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Signed-off-by: NDaniel Borkmann <daniel@iogearbox.net>
-