1. 20 Oct 2022, 1 commit
  2. 17 Oct 2022, 1 commit
  3. 16 Oct 2022, 1 commit
  4. 15 Oct 2022, 1 commit
  5. 14 Oct 2022, 1 commit
  6. 13 Oct 2022, 9 commits
    • mm/migrate_device.c: add migrate_device_range() · e778406b
      Committed by Alistair Popple
      Device drivers can use the migrate_vma family of functions to migrate
      existing private anonymous mappings to device private pages.  These pages
      are backed by memory on the device with drivers being responsible for
      copying data to and from device memory.
      
      Device private pages are freed via the pgmap->page_free() callback when
      they are unmapped and their refcount drops to zero.  Alternatively they
      may be freed indirectly via migration back to CPU memory in response to a
      pgmap->migrate_to_ram() callback called whenever the CPU accesses an
      address mapped to a device private page.
      
      In other words drivers cannot control the lifetime of data allocated on
      the devices and must wait until these pages are freed from userspace. 
      This causes issues when memory needs to be reclaimed on the device, either
      because the device is going away due to a ->release() callback or because
      another user needs to use the memory.
      
      Drivers could use the existing migrate_vma functions to migrate data off
      the device.  However this would require them to track the mappings of each
      page which is both complicated and not always possible.  Instead drivers
      need to be able to migrate device pages directly so they can free up
      device memory.
      
      To allow that, this patch introduces the migrate_device family of functions,
      which are functionally similar to migrate_vma but which skip the initial
      lookup based on mapping.
      
      Link: https://lkml.kernel.org/r/868116aab70b0c8ee467d62498bb2cf0ef907295.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      e778406b
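      The idea above can be pictured with a toy userspace model (plain C, not kernel code; every name in it is made up for illustration): selecting migration candidates by device pfn range needs only the range itself, with no per-page knowledge of which process mappings still reference each page.

```c
#include <assert.h>
#include <stdio.h>

/* Toy model (not the kernel API): a driver owning a contiguous range of
 * device pfns wants to evict them without tracking per-page mappings. */
struct toy_page {
    unsigned long pfn;
    int in_use;
};

/* Count every in-use page whose pfn falls in [start, start + npages);
 * these are the migration sources a range-based interface would pick. */
static int select_range(const struct toy_page *pages, int n,
                        unsigned long start, unsigned long npages)
{
    int selected = 0;
    for (int i = 0; i < n; i++)
        if (pages[i].in_use &&
            pages[i].pfn >= start && pages[i].pfn < start + npages)
            selected++;
    return selected;
}
```

      A mapping-based interface would instead have to walk every VMA that might map each page, which is exactly the bookkeeping the commit says drivers cannot always do.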
    • mm: free device private pages have zero refcount · ef233450
      Committed by Alistair Popple
      Since 27674ef6 ("mm: remove the extra ZONE_DEVICE struct page
      refcount") device private pages have no longer had an extra reference
      count when the page is in use.  However before handing them back to the
      owning device driver we add an extra reference count such that free pages
      have a reference count of one.
      
      This makes it difficult to tell if a page is free or not, because both free
      and in-use pages will have a non-zero refcount.  Instead we should return
      pages to the driver's page allocator with a zero reference count.  Kernel
      code can then safely use kernel functions such as get_page_unless_zero().
      
      Link: https://lkml.kernel.org/r/cf70cf6f8c0bdb8aaebdbfb0d790aea4c683c3c6.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      ef233450
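      Why zero is the right resting state can be sketched with a simplified userspace model of get_page_unless_zero() (an assumption: this mirrors its intent, not its exact kernel implementation): a zero count is the one state the "unless zero" acquire can never resurrect, so free pages at refcount zero can be safely probed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy page with a C11 atomic refcount. */
struct toy_page {
    atomic_int refcount;
};

/* Take a reference only if the count is currently non-zero; a page
 * resting at zero (i.e. free) can never be re-acquired this way. */
static bool toy_get_page_unless_zero(struct toy_page *p)
{
    int old = atomic_load(&p->refcount);
    while (old != 0) {
        /* On failure, old is reloaded with the current value. */
        if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
            return true;
    }
    return false; /* page is free; caller must not touch it */
}
```

      With the pre-fix convention of free pages holding one reference, this probe would wrongly "succeed" on a free page, which is exactly the ambiguity the commit removes.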
    • mm/memory.c: fix race when faulting a device private page · 16ce101d
      Committed by Alistair Popple
      Patch series "Fix several device private page reference counting issues",
      v2
      
      This series aims to fix a number of page reference counting issues in
      drivers dealing with device private ZONE_DEVICE pages.  These result in
      use-after-free type bugs, either from accessing a struct page which no
      longer exists because it has been removed or accessing fields within the
      struct page which are no longer valid because the page has been freed.
      
      During normal usage it is unlikely these will cause any problems.  However
      without these fixes it is possible to crash the kernel from userspace. 
      These crashes can be triggered either by unloading the kernel module or
      unbinding the device from the driver prior to a userspace task exiting. 
      In modules such as Nouveau it is also possible to trigger some of these
      issues by explicitly closing the device file-descriptor prior to the task
      exiting and then accessing device private memory.
      
      This involves some minor changes to both PowerPC and AMD GPU code. 
      Unfortunately I lack hardware to test either of those so any help there
      would be appreciated.  The changes mimic what is done for both Nouveau
      and hmm-tests, though, so I doubt they will cause problems.
      
      
      This patch (of 8):
      
      When the CPU tries to access a device private page the migrate_to_ram()
      callback associated with the pgmap for the page is called.  However no
      reference is taken on the faulting page.  Therefore a concurrent migration
      of the device private page can free the page and possibly the underlying
      pgmap.  This results in a race which can crash the kernel due to the
      migrate_to_ram() function pointer becoming invalid.  It also means drivers
      can't reliably read the zone_device_data field because the page may have
      been freed with memunmap_pages().
      
      Close the race by getting a reference on the page while holding the ptl to
      ensure it has not been freed.  Unfortunately the elevated reference count
      will cause the migration required to handle the fault to fail.  To avoid
      this failure pass the faulting page into the migrate_vma functions so that
      if an elevated reference count is found it can be checked to see if it's
      expected or not.
      
      [mpe@ellerman.id.au: fix build]
        Link: https://lkml.kernel.org/r/87fsgbf3gh.fsf@mpe.ellerman.id.au
      Link: https://lkml.kernel.org/r/cover.60659b549d8509ddecafad4f498ee7f03bb23c69.1664366292.git-series.apopple@nvidia.com
      Link: https://lkml.kernel.org/r/d3e813178a59e565e8d78d9b9a4e2562f6494f90.1664366292.git-series.apopple@nvidia.com
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Acked-by: Felix Kuehling <Felix.Kuehling@amd.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Cc: Ralph Campbell <rcampbell@nvidia.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Lyude Paul <lyude@redhat.com>
      Cc: Alex Deucher <alexander.deucher@amd.com>
      Cc: Alex Sierra <alex.sierra@amd.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Christian König <christian.koenig@amd.com>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: "Huang, Ying" <ying.huang@intel.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      16ce101d
    • mm/damon: move sz_damon_region to damon_sz_region · 652e0446
      Committed by Xin Hao
      Rename sz_damon_region() to damon_sz_region(), and move it to
      "include/linux/damon.h", because we need to use this function in many places.
      
      Link: https://lkml.kernel.org/r/20220927001946.85375-1-xhao@linux.alibaba.com
      Signed-off-by: Xin Hao <xhao@linux.alibaba.com>
      Suggested-by: SeongJae Park <sj@kernel.org>
      Reviewed-by: SeongJae Park <sj@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      652e0446
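      The helper being moved is tiny; a userspace sketch of its assumed shape (mirroring the kernel's damon_sz_region(), which returns the length of a region's address range) shows why many call sites want it inlined from a header:

```c
#include <assert.h>

/* Toy mirrors of DAMON's region structures (names simplified). */
struct toy_addr_range {
    unsigned long start;
    unsigned long end;
};

struct toy_region {
    struct toy_addr_range ar;
};

/* Size of a monitored region in bytes: simply end - start. */
static inline unsigned long toy_damon_sz_region(const struct toy_region *r)
{
    return r->ar.end - r->ar.start;
}
```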
    • tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). · d38afeec
      Committed by Kuniyuki Iwashima
      Originally, inet6_sk(sk)->XXX were changed under lock_sock(), so we were
      able to clean them up by calling inet6_destroy_sock() during the IPv6 ->
      IPv4 conversion by IPV6_ADDRFORM.  However, commit 03485f2a ("udpv6:
      Add lockless sendmsg() support") added a lockless memory allocation path,
      which could cause a memory leak:
      
      setsockopt(IPV6_ADDRFORM)                 sendmsg()
      +-----------------------+                 +-------+
      - do_ipv6_setsockopt(sk, ...)             - udpv6_sendmsg(sk, ...)
        - sockopt_lock_sock(sk)                   ^._ called via udpv6_prot
          - lock_sock(sk)                             before WRITE_ONCE()
        - WRITE_ONCE(sk->sk_prot, &tcp_prot)
        - inet6_destroy_sock()                    - if (!corkreq)
        - sockopt_release_sock(sk)                  - ip6_make_skb(sk, ...)
          - release_sock(sk)                          ^._ lockless fast path for
                                                          the non-corking case
      
                                                      - __ip6_append_data(sk, ...)
                                                        - ipv6_local_rxpmtu(sk, ...)
                                                          - xchg(&np->rxpmtu, skb)
                                                            ^._ rxpmtu is never freed.
      
                                                      - goto out_no_dst;
      
                                                  - lock_sock(sk)
      
      For now, rxpmtu is the only such case, but so as not to miss future
      changes, and given a similar bug fixed in commit e2732600 ("net:
      ping6: Fix memleak in ipv6_renew_options()."), let's set a new function
      to IPv6 sk->sk_destruct() and call inet6_cleanup_sock() there.  Since the
      conversion does not change sk->sk_destruct(), we can guarantee that
      we can clean up IPv6 resources finally.
      
      We can now remove all inet6_destroy_sock() calls from IPv6 protocol
      specific ->destroy() functions, but such changes are invasive to
      backport.  So they can be posted as a follow-up later for net-next.
      
      Fixes: 03485f2a ("udpv6: Add lockless sendmsg() support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      d38afeec
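      The guarantee the commit leans on ("the conversion does not change sk->sk_destruct()") can be modeled in a toy userspace sketch (not kernel code; all names invented): cleanup reachable only through the protocol ops is lost when IPV6_ADDRFORM swaps sk_prot, but cleanup installed in the socket's own destructor pointer always runs at destruction.

```c
#include <assert.h>
#include <stdbool.h>

/* Minimal toy socket: a swappable protocol name and a destructor
 * pointer that the IPv6 -> IPv4 conversion leaves untouched. */
struct toy_sock {
    const char *prot_name;                    /* swapped by ADDRFORM */
    void (*sk_destruct)(struct toy_sock *);   /* survives the swap   */
    bool ipv6_resources_freed;
};

/* Stand-in for the new IPv6 destructor calling inet6_cleanup_sock(). */
static void toy_inet6_sock_destruct(struct toy_sock *sk)
{
    sk->ipv6_resources_freed = true;          /* e.g. free np->rxpmtu */
}

/* IPV6_ADDRFORM: changes the protocol, not the destructor. */
static void toy_addrform(struct toy_sock *sk)
{
    sk->prot_name = "tcp";                    /* was "tcpv6" */
}

/* Socket teardown runs sk_destruct regardless of the current sk_prot. */
static void toy_destroy(struct toy_sock *sk)
{
    sk->sk_destruct(sk);
}
```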
    • udp: Call inet6_destroy_sock() in setsockopt(IPV6_ADDRFORM). · 21985f43
      Committed by Kuniyuki Iwashima
      Commit 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support") forgot
      to add a change to free inet6_sk(sk)->rxpmtu while converting an IPv6
      socket into IPv4 with IPV6_ADDRFORM.  After conversion, sk_prot is
      changed to udp_prot and ->destroy() never cleans it up, resulting in
      a memory leak.
      
      This is due to the discrepancy between inet6_destroy_sock() and
      IPV6_ADDRFORM, so let's call inet6_destroy_sock() from IPV6_ADDRFORM
      to remove the difference.
      
      However, this is not enough for now because rxpmtu can be changed
      without lock_sock() after commit 03485f2a ("udpv6: Add lockless
      sendmsg() support").  We will fix this case in the following patch.
      
      Note we will rename inet6_destroy_sock() to inet6_cleanup_sock() and
      remove unnecessary inet6_destroy_sock() calls in sk_prot->destroy()
      in the future.
      
      Fixes: 4b340ae2 ("IPv6: Complete IPV6_DONTFRAG support")
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      21985f43
    • io_uring: remove notif leftovers · b7a81775
      Committed by Pavel Begunkov
      Notifications were killed, but a couple of fields and struct
      declarations were left behind; remove them.
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Link: https://lore.kernel.org/r/8df8877d677be5a2b43afd936d600e60105ea960.1664849941.git.asml.silence@gmail.com
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      b7a81775
    • io_uring/af_unix: defer registered files gc to io_uring release · 0091bfc8
      Committed by Pavel Begunkov
      Instead of putting io_uring's registered files in unix_gc() we want it
      to be done by io_uring itself. The trick here is to consider io_uring
      registered files for cycle detection but not actually putting them down.
      Because io_uring can't register other ring instances, this will remove
      all refs to the ring file triggering the ->release path and clean up
      with io_ring_ctx_free().
      
      Cc: stable@vger.kernel.org
      Fixes: 6b06314c ("io_uring: add file set registration")
      Reported-and-tested-by: David Bouman <dbouman03@gmail.com>
      Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
      Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com>
      [axboe: add kerneldoc comment to skb, fold in skb leak fix]
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      0091bfc8
  7. 12 Oct 2022, 5 commits
    • mm/hugetlb: fix races when looking up a CONT-PTE/PMD size hugetlb page · fac35ba7
      Committed by Baolin Wang
      Some architectures (like ARM64) can support CONT-PTE/PMD size
      hugetlb, meaning they support not only PMD/PUD size hugetlb (2M and
      1G), but also CONT-PTE/PMD sizes (64K and 32M) when a 4K page size is used.
      
      So when looking up a CONT-PTE size hugetlb page by follow_page(), it will
      use pte_offset_map_lock() to get the pte entry lock for the CONT-PTE size
      hugetlb in follow_page_pte().  However this pte entry lock is incorrect
      for the CONT-PTE size hugetlb, since we should use huge_pte_lock() to get
      the correct lock, which is mm->page_table_lock.
      
      That means the pte entry of the CONT-PTE size hugetlb under the current
      pte lock is unstable in follow_page_pte(); we can continue to migrate or
      poison the pte entry of the CONT-PTE size hugetlb, which can cause some
      potential race issues, even though they are under the 'pte lock'.
      
      For example, suppose thread A is trying to look up a CONT-PTE size hugetlb
      page by the move_pages() syscall under the lock, while another thread B
      migrates the CONT-PTE hugetlb page at the same time.  This will cause
      thread A to get an incorrect page; if thread A also wants to do page
      migration, a data inconsistency error occurs.
      
      Moreover we have the same issue for CONT-PMD size hugetlb in
      follow_huge_pmd().
      
      To fix above issues, rename the follow_huge_pmd() as follow_huge_pmd_pte()
      to handle PMD and PTE level size hugetlb, which uses huge_pte_lock() to
      get the correct pte entry lock to make the pte entry stable.
      
      Mike said:
      
      Support for CONT_PMD/_PTE was added with bb9dd3df ("arm64: hugetlb:
      refactor find_num_contig()").  Patch series "Support for contiguous pte
      hugepages", v4.  However, I do not believe these code paths were
      executed until migration support was added with 5480280d ("arm64/mm:
      enable HugeTLB migration for contiguous bit HugeTLB pages").  I would go
      with 5480280d for the Fixes: target.
      
      Link: https://lkml.kernel.org/r/635f43bdd85ac2615a58405da82b4d33c6e5eb05.1662017562.git.baolin.wang@linux.alibaba.com
      Fixes: 5480280d ("arm64/mm: enable HugeTLB migration for contiguous bit HugeTLB pages")
      Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
      Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
      Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      fac35ba7
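      The lock-selection rule at the heart of the fix can be sketched in a toy form (an assumption: this mirrors the intent of huge_pte_lockptr(), with invented names and only two sizes): hugetlb entries smaller than PMD_SIZE, such as the 64K CONT-PTE case, must be stabilised by the mm-wide page_table_lock rather than the split lock follow_page_pte() would take.

```c
#include <assert.h>

/* Which lock stabilises a hugetlb entry of a given size (toy model)? */
enum toy_lock { TOY_SPLIT_PMD_LOCK, TOY_MM_PAGE_TABLE_LOCK };

#define TOY_CONT_PTE_SIZE (64UL * 1024)        /* 64K with 4K base pages */
#define TOY_PMD_SIZE      (2UL * 1024 * 1024)  /* 2M */

/* Toy huge_pte_lock(): sub-PMD hugetlb sizes (CONT-PTE) must use the
 * mm-wide page_table_lock; PMD-and-larger sizes may use the split
 * per-PMD lock. */
static enum toy_lock toy_huge_pte_lock(unsigned long sz)
{
    return sz >= TOY_PMD_SIZE ? TOY_SPLIT_PMD_LOCK
                              : TOY_MM_PAGE_TABLE_LOCK;
}
```

      The bug was that follow_page_pte() bypassed this size-aware selection and took the per-PTE-page split lock, which does not serialise against migration of the contiguous entry.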
    • include/linux/entry-common.h: remove has_signal comment of arch_do_signal_or_restart() prototype · 6a961bff
      Committed by Tiezhu Yang
      The has_signal argument of arch_do_signal_or_restart() was removed in
      commit 8ba62d37 ("task_work: Call tracehook_notify_signal from
      get_signal on all architectures"), so let us remove the related comment.
      
      Link: https://lkml.kernel.org/r/1662090106-5545-1-git-send-email-yangtiezhu@loongson.cn
      Fixes: 8ba62d37 ("task_work: Call tracehook_notify_signal from get_signal on all architectures")
      Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      6a961bff
    • prandom: remove unused functions · de492c83
      Committed by Jason A. Donenfeld
      With no callers left of prandom_u32() and prandom_bytes(), as well as
      get_random_int(), remove these deprecated wrappers, in favor of
      get_random_u32() and get_random_bytes().
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      de492c83
    • treewide: use get_random_u32() when possible · a251c17a
      Committed by Jason A. Donenfeld
      The prandom_u32() function has been a deprecated inline wrapper around
      get_random_u32() for several releases now, and compiles down to the
      exact same code. Replace the deprecated wrapper with a direct call to
      the real function. The same also applies to get_random_int(), which is
      just a wrapper around get_random_u32(). This was done as a basic find
      and replace.
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
      Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
      Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Acked-by: Helge Deller <deller@gmx.de> # for parisc
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      a251c17a
    • treewide: use prandom_u32_max() when possible, part 1 · 81895a65
      Committed by Jason A. Donenfeld
      Rather than incurring a division or requesting too many random bytes for
      the given range, use the prandom_u32_max() function, which only takes
      the minimum required bytes from the RNG and avoids divisions. This was
      done mechanically with this coccinelle script:
      
      @basic@
      expression E;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      typedef u64;
      @@
      (
      - ((T)get_random_u32() % (E))
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ((E) - 1))
      + prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
      |
      - ((u64)(E) * get_random_u32() >> 32)
      + prandom_u32_max(E)
      |
      - ((T)get_random_u32() & ~PAGE_MASK)
      + prandom_u32_max(PAGE_SIZE)
      )
      
      @multi_line@
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      identifier RAND;
      expression E;
      @@
      
      -       RAND = get_random_u32();
              ... when != RAND
      -       RAND %= (E);
      +       RAND = prandom_u32_max(E);
      
      // Find a potential literal
      @literal_mask@
      expression LITERAL;
      type T;
      identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
      position p;
      @@
      
              ((T)get_random_u32()@p & (LITERAL))
      
      // Add one to the literal.
      @script:python add_one@
      literal << literal_mask.LITERAL;
      RESULT;
      @@
      
      value = None
      if literal.startswith('0x'):
              value = int(literal, 16)
      elif literal[0] in '123456789':
              value = int(literal, 10)
      if value is None:
              print("I don't know how to handle %s" % (literal))
              cocci.include_match(False)
      elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 2**16 - 1 or value == 2**8 - 1:
              print("Skipping 0x%x for cleanup elsewhere" % (value))
              cocci.include_match(False)
      elif value & (value + 1) != 0:
              print("Skipping 0x%x because it's not a power of two minus one" % (value))
              cocci.include_match(False)
      elif literal.startswith('0x'):
              coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
      else:
              coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))
      
      // Replace the literal mask with the calculated result.
      @plus_one@
      expression literal_mask.LITERAL;
      position literal_mask.p;
      expression add_one.RESULT;
      identifier FUNC;
      @@
      
      -       (FUNC()@p & (LITERAL))
      +       prandom_u32_max(RESULT)
      
      @collapse_ret@
      type T;
      identifier VAR;
      expression E;
      @@
      
       {
      -       T VAR;
      -       VAR = (E);
      -       return VAR;
      +       return E;
       }
      
      @drop_var@
      type T;
      identifier VAR;
      @@
      
       {
      -       T VAR;
              ... when != VAR
       }
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Yury Norov <yury.norov@gmail.com>
      Reviewed-by: KP Singh <kpsingh@kernel.org>
      Reviewed-by: Jan Kara <jack@suse.cz> # for ext4 and sbitmap
      Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> # for drbd
      Acked-by: Jakub Kicinski <kuba@kernel.org>
      Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
      Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # for mmc
      Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
      81895a65
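      The division-free mapping the commit relies on is the multiply-shift trick visible in the coccinelle rule `((u64)(E) * get_random_u32() >> 32)`: the high 32 bits of a 32-bit random value times the bound land uniformly-enough in [0, bound).  A minimal userspace sketch of that arithmetic (names are illustrative, not the kernel's):

```c
#include <assert.h>
#include <stdint.h>

/* Map a full-range 32-bit random value r into [0, bound) without a
 * division: take the high 32 bits of the 64-bit product r * bound.
 * This is the arithmetic behind the prandom_u32_max() replacement. */
static uint32_t u32_below_demo(uint32_t r, uint32_t bound)
{
    return (uint32_t)(((uint64_t)r * bound) >> 32);
}
```

      Note the result is never equal to bound: even r = 0xffffffff times bound shifted right by 32 stays strictly below bound.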
  8. 10 Oct 2022, 3 commits
  9. 09 Oct 2022, 1 commit
  10. 08 Oct 2022, 5 commits
  11. 07 Oct 2022, 6 commits
    • vfio: Add vfio_file_is_group() · 4b22ef04
      Committed by Jason Gunthorpe
      This replaces uses of vfio_file_iommu_group() which were only detecting if
      the file is a VFIO file with no interest in the actual group.
      
      The only remaining user of vfio_file_iommu_group() is in KVM for the SPAPR
      stuff. It passes the iommu_group into the arch code through kvm for some
      reason.
      Tested-by: Matthew Rosato <mjrosato@linux.ibm.com>
      Tested-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Tested-by: Eric Farman <farman@linux.ibm.com>
      Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
      Link: https://lore.kernel.org/r/1-v2-15417f29324e+1c-vfio_group_disassociate_jgg@nvidia.com
      Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
      4b22ef04
    • virtio_blk: add SECURE ERASE command support · e60d6407
      Committed by Alvaro Karsz
      Support for the VIRTIO_BLK_F_SECURE_ERASE VirtIO feature.
      
      A device that offers this feature can receive VIRTIO_BLK_T_SECURE_ERASE
      commands.
      
      A device which supports this feature has the following fields in the
      virtio config:
      
      - max_secure_erase_sectors
      - max_secure_erase_seg
      - secure_erase_sector_alignment
      
      max_secure_erase_sectors and secure_erase_sector_alignment are expressed
      in 512-byte units.
      
      Every secure erase command has the following fields:
      
      - sectors: The starting offset in 512-byte units.
      - num_sectors: The number of sectors.
      Signed-off-by: Alvaro Karsz <alvaro.karsz@solid-run.com>
      Message-Id: <20220921082729.2516779-1-alvaro.karsz@solid-run.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
      e60d6407
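      The 512-byte-unit convention above can be sketched as a small userspace check (the struct and function names here are illustrative, not the driver's actual code): a byte-addressed erase request must be converted to sectors and validated against the device's advertised limits.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SECTOR_SIZE 512ULL

/* Toy mirror of the two limits named in the virtio config (both are
 * expressed in 512-byte sectors, per the commit message). */
struct toy_erase_limits {
    uint64_t max_secure_erase_sectors;
    uint64_t secure_erase_sector_alignment;
};

/* Is the byte range [off, off + len) an acceptable secure-erase
 * request?  Offset and length must align to the advertised sector
 * alignment, and the segment must fit the per-segment sector limit. */
static bool toy_erase_ok(const struct toy_erase_limits *l,
                         uint64_t off, uint64_t len)
{
    uint64_t align_bytes = l->secure_erase_sector_alignment * SECTOR_SIZE;
    if (off % align_bytes != 0 || len % align_bytes != 0)
        return false;
    return len / SECTOR_SIZE <= l->max_secure_erase_sectors;
}
```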
    • vdpa: device feature provisioning · 90fea5a8
      Committed by Jason Wang
      This patch allows the device features to be provisioned through
      netlink.  A new attribute is introduced to allow userspace to pass
      64-bit device features when adding a device.
      
      This provides several advantages:
      
      - Allows provisioning a subset of the features to ease cross-vendor
        live migration.
      - Better debuggability for the vDPA framework and parent.
      Reviewed-by: Eli Cohen <elic@nvidia.com>
      Signed-off-by: Jason Wang <jasowang@redhat.com>
      Message-Id: <20220927074810.28627-2-jasowang@redhat.com>
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      90fea5a8
    • virtio: drop vp_legacy_set_queue_size · cdbd952b
      Committed by Michael S. Tsirkin
      There's actually no way to set queue size on legacy virtio pci.
      Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
      Message-Id: <20220815220447.155860-1-mst@redhat.com>
      cdbd952b
    • wifi: wext: use flex array destination for memcpy() · e3e6e1d1
      Committed by Hawkins Jiawei
      Syzkaller reports a buffer overflow false positive as follows:
      ------------[ cut here ]------------
      memcpy: detected field-spanning write (size 8) of single field
      	"&compat_event->pointer" at net/wireless/wext-core.c:623 (size 4)
      WARNING: CPU: 0 PID: 3607 at net/wireless/wext-core.c:623
      	wireless_send_event+0xab5/0xca0 net/wireless/wext-core.c:623
      Modules linked in:
      CPU: 1 PID: 3607 Comm: syz-executor659 Not tainted
      	6.0.0-rc6-next-20220921-syzkaller #0
      [...]
      Call Trace:
       <TASK>
       ioctl_standard_call+0x155/0x1f0 net/wireless/wext-core.c:1022
       wireless_process_ioctl+0xc8/0x4c0 net/wireless/wext-core.c:955
       wext_ioctl_dispatch net/wireless/wext-core.c:988 [inline]
       wext_ioctl_dispatch net/wireless/wext-core.c:976 [inline]
       wext_handle_ioctl+0x26b/0x280 net/wireless/wext-core.c:1049
       sock_ioctl+0x285/0x640 net/socket.c:1220
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:870 [inline]
       __se_sys_ioctl fs/ioctl.c:856 [inline]
       __x64_sys_ioctl+0x193/0x200 fs/ioctl.c:856
       do_syscall_x64 arch/x86/entry/common.c:50 [inline]
       do_syscall_64+0x35/0xb0 arch/x86/entry/common.c:80
       entry_SYSCALL_64_after_hwframe+0x63/0xcd
       [...]
       </TASK>
      
      Wireless events will be sent on the appropriate channels in
      wireless_send_event(). Different wireless events may have different
      payload structure and size, so kernel uses **len** and **cmd** field
      in struct __compat_iw_event as wireless event common LCP part, uses
      **pointer** as a label to mark the position of remaining different part.
      
      Yet the problem is that **pointer** is a compat_caddr_t type, which may
      be smaller than the relative structure at the same position.  So when
      wireless_send_event() tries to parse the wireless event's payload, it may
      trigger the memcpy() run-time destination buffer bounds checking when the
      relative structure's data is copied to the position marked by **pointer**.
      
      This patch solves it by introducing flexible-array field **ptr_bytes**,
      to mark the position of the wireless events remaining part next to
      LCP part. What's more, this patch also adds **ptr_len** variable in
      wireless_send_event() to improve its maintainability.
      
      Reported-and-tested-by: syzbot+473754e5af963cf014cf@syzkaller.appspotmail.com
      Link: https://lore.kernel.org/all/00000000000070db2005e95a5984@google.com/
      Suggested-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: Hawkins Jiawei <yin31149@gmail.com>
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      e3e6e1d1
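      The shape of the fix can be shown with a userspace sketch (struct and field names are simplified stand-ins, not the wext-core definitions): a fixed 4-byte "pointer" field used only as a position marker makes larger copies look like field-spanning writes, while a flexible array member at the same offset tells the compiler the destination legitimately extends to the end of the allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Old layout: a 4-byte field marks where the payload starts; an 8-byte
 * memcpy into it "spans" the field and trips fortified bounds checks. */
struct toy_event_old {
    uint16_t len;
    uint16_t cmd;
    uint32_t pointer;
};

/* New layout: a flexible array member at the same offset, sized by the
 * allocation, so the copy destination is genuinely open-ended. */
struct toy_event_new {
    uint16_t len;
    uint16_t cmd;
    uint8_t  ptr_bytes[];
};

/* Both layouts put the payload at the same byte offset. */
_Static_assert(offsetof(struct toy_event_old, pointer) ==
               offsetof(struct toy_event_new, ptr_bytes),
               "payload offset unchanged");

/* Copy an arbitrary-size payload after the LCP header and record the
 * total event length, as wireless_send_event() conceptually does. */
static void toy_fill(struct toy_event_new *ev,
                     const void *payload, size_t n)
{
    memcpy(ev->ptr_bytes, payload, n);
    ev->len = (uint16_t)(offsetof(struct toy_event_new, ptr_bytes) + n);
}
```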
    • net: ieee802154: return -EINVAL for unknown addr type · 30393181
      Committed by Alexander Aring
      This patch adds handling to return -EINVAL for an unknown addr type.  The
      current behaviour is to return 0 as success, but the size of an
      unknown addr type is not defined, so an error like -EINVAL should be returned.
      
      Fixes: 94160108 ("net/ieee802154: fix uninit value bug in dgram_sendmsg")
      Signed-off-by: Alexander Aring <aahringo@redhat.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
      30393181
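      The pattern the fix applies can be sketched in userspace (constants and names are assumptions mirroring the short/extended address kinds of IEEE 802.15.4, not the kernel's definitions): a size lookup keyed on the address type must reject unknown values instead of silently reporting size 0 as success.

```c
#include <assert.h>

#define TOY_EINVAL (-22)  /* userspace stand-in for -EINVAL */

/* Toy address modes, modeled on 802.15.4's short/extended kinds. */
enum toy_addr_mode {
    TOY_ADDR_NONE  = 0,
    TOY_ADDR_SHORT = 2,
    TOY_ADDR_LONG  = 3,
};

/* Address length in bytes, or TOY_EINVAL for an unknown type.  Before
 * the fix, the default case would have returned 0 as if successful. */
static int toy_addr_len(int mode)
{
    switch (mode) {
    case TOY_ADDR_SHORT:
        return 2;           /* 16-bit short address */
    case TOY_ADDR_LONG:
        return 8;           /* 64-bit extended address */
    default:
        return TOY_EINVAL;  /* unknown type: explicit error */
    }
}
```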
  12. 06 Oct 2022, 6 commits