1. 23 March 2022, 1 commit
    • swiotlb: fix info leak with DMA_FROM_DEVICE · 04c20fc8
      Authored by Halil Pasic
      mainline inclusion
      from mainline-v5.17-rc6
      commit ddbd89de
      category: bugfix
      bugzilla: 186478, https://gitee.com/openeuler/kernel/issues/I4Z86P
      CVE: CVE-2022-0854
      
      --------------------------------
      
      The problem I'm addressing was discovered by the LTP test covering
      cve-2018-1000204.
      
      A short description of what happens follows:
      1) The test case issues a command code 00 (TEST UNIT READY) via the SG_IO
         interface with: dxfer_len == 524288, dxfer_dir == SG_DXFER_FROM_DEV
         and a corresponding dxferp. The peculiar thing about this is that TUR
         is not reading from the device.
      2) In sg_start_req() the invocation of blk_rq_map_user() effectively
         bounces the user-space buffer, as if the device were going to transfer
         into it. Since commit a45b599a ("scsi: sg: allocate with __GFP_ZERO in
         sg_build_indirect()") we make sure this first bounce buffer is
         allocated with GFP_ZERO.
      3) For the rest of the story we keep ignoring that we have a TUR, so the
         device won't touch the buffer we prepare, as if we had a
         DMA_FROM_DEVICE type of situation. My setup uses a virtio-scsi device
         and the buffer allocated by SG is mapped by the function
         virtqueue_add_split() which uses DMA_FROM_DEVICE for the "in" sgs (here
         scatter-gather and not scsi generics). This mapping involves bouncing
         via the swiotlb (we need swiotlb to do virtio in protected guests like
         s390 Secure Execution, or AMD SEV).
      4) When the SCSI TUR is done, we first copy back the content of the second
         (that is swiotlb) bounce buffer (which most likely contains some
         previous IO data), to the first bounce buffer, which contains all
         zeros.  Then we copy back the content of the first bounce buffer to
         the user-space buffer.
      5) The test case detects that the buffer, which it zero-initialized,
         is not all zeros and fails.
      
      One can argue that this is a swiotlb problem, because without swiotlb
      we leak all zeros, and the swiotlb should be transparent in the sense
      that it does not affect the outcome (if all other participants are well
      behaved).
      
      Copying the content of the original buffer into the swiotlb buffer is
      the only way I can think of to make swiotlb transparent in such
      scenarios. So let's do just that if in doubt, but allow the driver
      to tell us that the whole mapped buffer is going to be overwritten,
      in which case we can preserve the old behavior and avoid the performance
      impact of the extra bounce.
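      The transparency argument above can be sketched as a minimal user-space
      model (all names and structure here are hypothetical, not the actual
      swiotlb code): stale data sits in the bounce slot, the device never
      writes for a TUR, and the unmap path copies the slot back anyway.
      Copying the original buffer into the slot at map time makes the bounce
      invisible again.

      ```c
      #include <assert.h>
      #include <string.h>

      #define BUF_LEN 8

      /* Toy bounce slot, prefilled with stale data from a previous I/O. */
      static char bounce[BUF_LEN];

      /* Map for DMA_FROM_DEVICE. With the fix, the original buffer is
       * copied into the slot even though the transfer direction is
       * "from device", so a device that never writes cannot leak the
       * slot's old contents back to the caller. */
      static void map_from_device(const char *orig, int fixed)
      {
          if (fixed)
              memcpy(bounce, orig, BUF_LEN);
          /* pre-fix: the slot keeps whatever the previous I/O left there */
      }

      /* Unmap for DMA_FROM_DEVICE always copies the slot back. */
      static void unmap_from_device(char *orig)
      {
          memcpy(orig, bounce, BUF_LEN);
      }

      int main(void)
      {
          char user[BUF_LEN] = { 0 };     /* zero-initialized, as in the LTP test */

          memset(bounce, 0xAA, BUF_LEN);  /* stale bytes from an earlier transfer */
          map_from_device(user, 0);       /* old behavior */
          unmap_from_device(user);        /* device wrote nothing (TUR) */
          assert(user[0] == (char)0xAA);  /* info leak reaches userspace */

          memset(user, 0, BUF_LEN);
          memset(bounce, 0xAA, BUF_LEN);
          map_from_device(user, 1);       /* fixed behavior */
          unmap_from_device(user);
          assert(user[0] == 0);           /* swiotlb is transparent again */
          return 0;
      }
      ```

      The driver opt-out described above would simply skip the extra
      memcpy when the driver promises to overwrite the whole buffer.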
      Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Conflicts:
      	Documentation/core-api/dma-attributes.rst
      	include/linux/dma-mapping.h
      	kernel/dma/swiotlb.c
      Signed-off-by: Liu Shixin <liushixin2@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Signed-off-by: Yongqiang Liu <liuyongqiang13@huawei.com>
  2. 22 March 2022, 7 commits
  3. 21 March 2022, 5 commits
  4. 18 March 2022, 2 commits
  5. 17 March 2022, 3 commits
    • audit: improve audit queue handling when "audit=1" on cmdline · 1518b80b
      Authored by Paul Moore
      mainline inclusion
      from mainline-v5.17-rc3
      commit f26d0433
      category: bugfix
      bugzilla: 186384 https://gitee.com/openeuler/kernel/issues/I4X1AI?from=project-issue
      CVE: NA
      
      --------------------------------
      
      When an admin enables audit at early boot via the "audit=1" kernel
      command line, the audit queue behavior is slightly different; the
      audit subsystem goes to greater lengths to avoid dropping records,
      which unfortunately can result in problems when the audit daemon is
      forcibly stopped for an extended period of time.
      
      This patch makes a number of changes designed to improve the audit
      queuing behavior so that leaving the audit daemon in a stopped state
      for an extended period does not cause a significant impact to the
      system.
      
      - kauditd_send_queue() is now limited to looping through the
        passed queue only once per call.  This not only prevents the
        function from looping indefinitely when records are returned
        to the current queue, it also allows any recovery handling in
        kauditd_thread() to take place when kauditd_send_queue()
        returns.
      
      - Transient netlink send errors seen as -EAGAIN now cause the
        record to be returned to the retry queue instead of going to
        the hold queue.  The intention of the hold queue is to store,
        perhaps for an extended period of time, the events which led
        up to the audit daemon going offline.  The retry queue remains
        a temporary queue intended to protect against transient issues
        between the kernel and the audit daemon.
      
      - The retry queue is now limited by the audit_backlog_limit
        setting, the same as the other queues.  This allows admins
        to bound the size of all of the audit queues on the system.
      
      - kauditd_rehold_skb() now returns records to the end of the
        hold queue to ensure ordering is preserved in the face of
        recent changes to kauditd_send_queue().
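      The first change above, bounding the loop to one pass over the queue,
      can be sketched with a toy queue (names and structure hypothetical,
      not the actual kauditd code): when every send fails transiently, each
      record goes back onto the same queue, so an unbounded "while queue is
      not empty" loop would spin forever.

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define QLEN 8

      /* Minimal ring-buffer queue standing in for an skb queue. */
      struct queue { int buf[QLEN]; size_t head, tail, count; };

      static void enqueue(struct queue *q, int rec)
      {
          q->buf[q->tail] = rec;
          q->tail = (q->tail + 1) % QLEN;
          q->count++;
      }

      static int dequeue(struct queue *q)
      {
          int rec = q->buf[q->head];
          q->head = (q->head + 1) % QLEN;
          q->count--;
          return rec;
      }

      /* A netlink send that always fails transiently, as when the audit
       * daemon is stopped: every record bounces back. */
      static int send_record(int rec) { (void)rec; return -11; /* -EAGAIN */ }

      /* Fixed behavior: walk the queue at most once per call, so records
       * returned to the same queue cannot make the loop run indefinitely,
       * and the caller's recovery handling gets a chance to run. */
      static size_t send_queue_once(struct queue *q)
      {
          size_t budget = q->count, failed = 0;

          while (budget--) {
              int rec = dequeue(q);
              if (send_record(rec) < 0) {
                  enqueue(q, rec);  /* -EAGAIN: back to the retry queue */
                  failed++;
              }
          }
          return failed;
      }

      int main(void)
      {
          struct queue retry = { .head = 0, .tail = 0, .count = 0 };

          for (int i = 0; i < 5; i++)
              enqueue(&retry, i);

          /* Terminates despite every send failing; all 5 records survive
           * in order for the caller's recovery logic. */
          assert(send_queue_once(&retry) == 5);
          assert(retry.count == 5);
          return 0;
      }
      ```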
      
      Cc: stable@vger.kernel.org
      Fixes: 5b52330b ("audit: fix auditd/kernel connection state tracking")
      Fixes: f4b3ee3c ("audit: improve robustness of the audit queue handling")
      Reported-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Tested-by: Gaosheng Cui <cuigaosheng1@huawei.com>
      Reviewed-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
      Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • Revert "audit: bugfix for infinite loop when flush the hold queue" · a9140c08
      Authored by Cui GaoSheng
      hulk inclusion
      category: bugfix
      bugzilla: 186384 https://gitee.com/openeuler/kernel/issues/I4X1AI?from=project-issue
      CVE: NA
      
      --------------------------------
      
      This reverts commit 67ab712f.
      Signed-off-by: Cui GaoSheng <cuigaosheng1@huawei.com>
      Reviewed-by: Xiu Jianfeng <xiujianfeng@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
    • veth: Do not record rx queue hint in veth_xmit · 2632c58b
      Authored by Daniel Borkmann
      stable inclusion
      from linux-4.19.226
      commit bd6e97e2b6f59a19894c7032a83f03ad38ede28e
      
      --------------------------------
      
      commit 710ad98c upstream.
      
      Laurent reported that they have seen a significant amount of TCP retransmissions
      at high throughput from applications residing in network namespaces talking to
      the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
      as per systemd default) of the phys device such as ena or virtio_net due to all
      traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
      setup was _not_ using XDP on veths as the issue is generic.)
      
      More specifically, after edbea922 ("veth: Store queue_mapping independently
      of XDP prog presence") which made it all the way back to v4.19.184+,
      skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
      queue by default for veths) instead of leaving at 0.
      
      This is eventually retained and callbacks like ena_select_queue() will also pick
      single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
      is forwarded to that device via upper stack or other means. Similarly, for others
      not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
      call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.
      
      In general, it is a _bad_ idea for virtual devices like veth to mess around with
      queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
      the skb->queue_mapping was left untouched, and so prior to edbea922 the
      netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.
      
      Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
      from veth_xmit() altogether.
      
      If the veth peer has an XDP program attached, then it would return the first RX
      queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
      However, this is still better than breaking the generic case.
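      The queue-collapse mechanism described above can be sketched with a
      toy model (names hypothetical; the real logic lives in skb_tx_hash()
      and netdev_pick_tx()): once an RX queue is recorded on the skb, it
      wins over the flow hash, so every flow lands on the same TX queue of
      the physical device.

      ```c
      #include <assert.h>
      #include <stdbool.h>

      /* Minimal stand-in for the fields that matter here. */
      struct sk_buff_model { bool rx_recorded; unsigned rx_queue; unsigned hash; };

      /* Simplified TX-queue pick: a recorded RX queue takes precedence
       * over the flow hash, which is how all traffic collapsed onto a
       * single queue pre-fix. */
      static unsigned pick_tx_queue(const struct sk_buff_model *skb, unsigned ntxq)
      {
          if (skb->rx_recorded)
              return skb->rx_queue % ntxq;  /* all flows collapse here */
          return skb->hash % ntxq;          /* normal hash-based spreading */
      }

      int main(void)
      {
          unsigned ntxq = 8;

          /* Pre-fix: veth_xmit() recorded RX queue 1, so flows with
           * different hashes still share TX queue 1 on the phys device. */
          struct sk_buff_model a = { true, 1, 12345 };
          struct sk_buff_model b = { true, 1, 67890 };
          assert(pick_tx_queue(&a, ntxq) == 1);
          assert(pick_tx_queue(&b, ntxq) == 1);

          /* Post-fix: no recorded queue, flows spread by hash again. */
          a.rx_recorded = b.rx_recorded = false;
          assert(pick_tx_queue(&a, ntxq) != pick_tx_queue(&b, ntxq));
          return 0;
      }
      ```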
      
      Fixes: edbea922 ("veth: Store queue_mapping independently of XDP prog presence")
      Fixes: 638264dc ("veth: Support per queue XDP ring")
      Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
      Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Paolo Abeni <pabeni@redhat.com>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: Willem de Bruijn <willemb@google.com>
      Acked-by: John Fastabend <john.fastabend@gmail.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Acked-by: Toshiaki Makita <toshiaki.makita1@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Conflicts:
      	drivers/net/veth.c
      Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
      Reviewed-by: Wei Yongjun <weiyongjun1@huawei.com>
      Signed-off-by: Laibin Qiu <qiulaibin@huawei.com>
  6. 15 March 2022, 1 commit
  7. 14 March 2022, 21 commits