1. 13 Dec, 2022: 1 commit
      net: add IFF_NO_ADDRCONF and use it in bonding to prevent ipv6 addrconf · 8a321cf7
      Xin Long committed
      Currently, bonding reuses the IFF_SLAVE flag, which is checked in
      ipv6 addrconf to prevent ipv6 address autoconfiguration.

      However, it is not a proper flag for suppressing ipv6 addrconf:
      bonding has to move the IFF_SLAVE flag setting ahead of dev_open()
      in bond_enslave(). Also, as Jiri mentioned, IFF_MASTER/SLAVE are
      historical flags used in bonding and eql; newer devices like Team
      and Failover do not use them.

      So, as Jiri suggested, this patch adds IFF_NO_ADDRCONF to the
      device's priv_flags to indicate no ipv6 addrconf, uses it in bonding,
      and moves the IFF_SLAVE flag setting back to its original place.
      Signed-off-by: Xin Long <lucien.xin@gmail.com>
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
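The flag split described in this commit can be illustrated with a small user-space model. This is a sketch, not the kernel code. `struct model_net_device`, the `model_*` helpers and the bit position chosen for `MODEL_IFF_NO_ADDRCONF` are hypothetical; only the 0x800 value of `IFF_SLAVE` matches the UAPI header. The order in which the model sets each flag around the `dev_open()` stand-in is inferred from the commit text.

```c
#include <stdbool.h>
#include <stdint.h>

/* IFF_SLAVE is 0x800 in the UAPI header; the IFF_NO_ADDRCONF bit
 * position here is a hypothetical placeholder, not the kernel's value. */
#define MODEL_IFF_SLAVE        (1u << 11)
#define MODEL_IFF_NO_ADDRCONF  (UINT64_C(1) << 30)

struct model_net_device {
    unsigned int flags;   /* user-visible IFF_* flags */
    uint64_t priv_flags;  /* kernel-internal flags */
    bool up;
};

/* addrconf side: only the private flag decides, not IFF_SLAVE. */
static bool model_addrconf_allowed(const struct model_net_device *dev)
{
    return !(dev->priv_flags & MODEL_IFF_NO_ADDRCONF);
}

/* Enslave: suppress addrconf before the device is opened, so the
 * "device up" handling already sees it; IFF_SLAVE can be set later. */
static void model_bond_enslave(struct model_net_device *slave)
{
    slave->priv_flags |= MODEL_IFF_NO_ADDRCONF;
    slave->up = true;                 /* stands in for dev_open() */
    slave->flags |= MODEL_IFF_SLAVE;
}

/* Release: undo both flags. */
static void model_bond_release(struct model_net_device *slave)
{
    slave->flags &= ~MODEL_IFF_SLAVE;
    slave->priv_flags &= ~MODEL_IFF_NO_ADDRCONF;
}
```

Keeping the addrconf decision on a private flag means a device can opt out without pretending to be a bonding slave, which is what Team or Failover would need.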
  2. 12 Dec, 2022: 22 commits
  3. 11 Dec, 2022: 2 commits
      bpf: states_equal() must build idmap for all function frames · 5dd9cdbc
      Eduard Zingerman committed
      verifier.c:states_equal() must maintain register ID mapping across all
      function frames. Otherwise the following example might be erroneously
      marked as safe:
      
      main:
          fp[-24] = map_lookup_elem(...)  ; frame[0].fp[-24].id == 1
          fp[-32] = map_lookup_elem(...)  ; frame[0].fp[-32].id == 2
          r1 = &fp[-24]
          r2 = &fp[-32]
          call foo()
          r0 = 0
          exit
      
      foo:
        0: r9 = r1
        1: r8 = r2
        2: r7 = ktime_get_ns()
        3: r6 = ktime_get_ns()
        4: if (r6 > r7) goto skip_assign
        5: r9 = r8
      
      skip_assign:                ; <--- checkpoint
        6: r9 = *r9               ; (a) frame[1].r9.id == 2
                                  ; (b) frame[1].r9.id == 1
      
        7: if r9 == 0 goto exit:  ; mark_ptr_or_null_regs() transfers != 0 info
                                  ; for all regs sharing ID:
                                  ;   (a) r9 != 0 => &frame[0].fp[-32] != 0
                                  ;   (b) r9 != 0 => &frame[0].fp[-24] != 0
      
        8: r8 = *r8               ; (a) r8 == &frame[0].fp[-32]
                                  ; (b) r8 == &frame[0].fp[-32]
        9: r0 = *r8               ; (a) safe
                                  ; (b) unsafe
      
      exit:
       10: exit
      
      While processing the call to foo(), the verifier considers the
      following execution paths:
      
      (a) 0-10
      (b) 0-4,6-10
      (There is also path 0-7,10 but it is not interesting for the issue at
       hand. (a) is verified first.)
      
      Suppose that a checkpoint is created at (6) while path (a) is verified;
      next, path (b) is verified and (6) is reached.
      
      If states_equal() maintains a separate 'idmap' for each frame, the
      mapping at (6) for frame[1] would be empty and
      regsafe(r9)::check_ids() would add a pair 2->1 and return true,
      which is an error.

      If states_equal() maintains a single 'idmap' for all frames, the
      mapping at (6) would be { 1->1, 2->2 } and regsafe(r9)::check_ids()
      would return false when trying to add a pair 2->1.
      
      This issue was suggested in the following discussion:
      https://lore.kernel.org/bpf/CAEf4BzbFB5g4oUfyxk9rHy-PJSLQ3h8q9mV=rVoXfr_JVm8+1Q@mail.gmail.com/
      Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
      Signed-off-by: Eduard Zingerman <eddyz87@gmail.com>
      Link: https://lore.kernel.org/r/20221209135733.28851-4-eddyz87@gmail.com
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
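Here is a minimal C model of why one shared map matters. It assumes a simplified `check_ids()` that binds an old ID to a current ID the first time it sees that old ID, and afterwards requires the binding to hold. The map size, the `id_pair` type and `example_states_equal()` are hypothetical. The ID values come from the example in the commit message, with the old state being path (a) and the current state being path (b). Frame 0 is compared before frame 1.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MODEL_ID_MAP_SIZE 16  /* hypothetical capacity */

struct id_pair {
    uint32_t old;  /* id in the checkpointed (old) state; 0 = empty slot */
    uint32_t cur;  /* id in the state being checked */
};

/* First sighting of old_id binds it to cur_id; later sightings must match. */
static bool check_ids(uint32_t old_id, uint32_t cur_id, struct id_pair *idmap)
{
    for (int i = 0; i < MODEL_ID_MAP_SIZE; i++) {
        if (idmap[i].old == 0) {
            idmap[i].old = old_id;
            idmap[i].cur = cur_id;
            return true;
        }
        if (idmap[i].old == old_id)
            return idmap[i].cur == cur_id;
    }
    return false;  /* out of slots */
}

/* Replays the ID comparisons at checkpoint (6) from the example. */
static bool example_states_equal(bool shared_idmap)
{
    struct id_pair frame0_map[MODEL_ID_MAP_SIZE];
    struct id_pair frame1_map[MODEL_ID_MAP_SIZE];
    memset(frame0_map, 0, sizeof(frame0_map));
    memset(frame1_map, 0, sizeof(frame1_map));
    struct id_pair *map1 = shared_idmap ? frame0_map : frame1_map;

    /* frame[0]: both paths agree, fp[-24].id == 1 and fp[-32].id == 2 */
    if (!check_ids(1, 1, frame0_map))
        return false;
    if (!check_ids(2, 2, frame0_map))
        return false;

    /* frame[1].r9: old (path a) id 2, current (path b) id 1 */
    return check_ids(2, 1, map1);
}
```

With per-frame maps, `example_states_equal(false)` wrongly reports the states as equal, because frame[1] starts from an empty map. With the shared map, the binding 2->2 recorded for frame[0] makes the 2->1 pair fail.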
      NFSD: add delegation reaper to react to low memory condition · 44df6f43
      Dai Ngo committed
      The delegation reaper is called by the nfsd memory shrinker's
      'count' callback. It scans the client list and sends the
      courtesy CB_RECALL_ANY to the clients that hold delegations.

      To avoid flooding the clients with CB_RECALL_ANY requests, the
      delegation reaper sends at most one CB_RECALL_ANY request to each
      client every 5 seconds.
      Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
      [ cel: moved definition of RCA4_TYPE_MASK_RDATA_DLG ]
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
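The per-client rate limit can be sketched as a timestamp check inside the scan loop. This is a hypothetical model, not the nfsd implementation. `struct model_client`, its fields and `model_deleg_reaper()` are illustrative names, and "sending" a CB_RECALL_ANY is reduced to bumping a counter. Only the 5-second interval comes from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

#define MODEL_RECALL_ANY_INTERVAL 5  /* seconds, per the commit message */

struct model_client {
    bool holds_delegations;
    time_t last_recall_any;   /* 0 = never sent */
    unsigned int recalls_sent;
};

/* One reaper pass: at most one CB_RECALL_ANY per client per interval.
 * Returns how many recalls this pass "sent". */
static size_t model_deleg_reaper(struct model_client *clients, size_t n,
                                 time_t now)
{
    size_t sent = 0;

    for (size_t i = 0; i < n; i++) {
        struct model_client *clp = &clients[i];

        if (!clp->holds_delegations)
            continue;
        if (clp->last_recall_any != 0 &&
            now - clp->last_recall_any < MODEL_RECALL_ANY_INTERVAL)
            continue;  /* recalled too recently; skip this client */

        clp->last_recall_any = now;
        clp->recalls_sent++;  /* stands in for queuing the callback */
        sent++;
    }
    return sent;
}
```

Because the limit is tracked per client, a shrinker that fires repeatedly under memory pressure cannot push more than one request to any single client within the interval.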
  4. 10 Dec, 2022: 6 commits
      sunrpc: svc: Remove an unused static function svc_ungetu32() · 3ed157d0
      Li zeming committed
      The svc_ungetu32() function is not used, so remove it.
      Signed-off-by: Li zeming <zeming@nfschina.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      skbuff: Introduce slab_build_skb() · ce098da1
      Kees Cook committed
      syzkaller reported:
      
        BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
        Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
      
      The report concerns bpf_prog_test_run_skb(), which passes a
      kmalloc()ed buffer to build_skb().
      
      When build_skb() is passed a frag_size of 0, it means the buffer came
      from kmalloc. In these cases, ksize() is used to find its actual size,
      but since the allocation may not have been made to that size, actually
      perform the krealloc() call so that all the associated buffer size
      checking will be correctly notified (and use the "new" pointer so that
      compiler hinting works correctly). Split this logic out into a new
      interface, slab_build_skb(), but leave the original 0 checking for now
      to catch any stragglers.
      
      Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
      Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
      Fixes: 38931d89 ("mm: Make ksize() a reporting-only function")
      Cc: Pavel Begunkov <asml.silence@gmail.com>
      Cc: pepsipu <soopthegoop@gmail.com>
      Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: kasan-dev <kasan-dev@googlegroups.com>
      Cc: Andrii Nakryiko <andrii@kernel.org>
      Cc: ast@kernel.org
      Cc: Daniel Borkmann <daniel@iogearbox.net>
      Cc: Hao Luo <haoluo@google.com>
      Cc: Jesper Dangaard Brouer <hawk@kernel.org>
      Cc: John Fastabend <john.fastabend@gmail.com>
      Cc: jolsa@kernel.org
      Cc: KP Singh <kpsingh@kernel.org>
      Cc: martin.lau@linux.dev
      Cc: Stanislav Fomichev <sdf@google.com>
      Cc: song@kernel.org
      Cc: Yonghong Song <yhs@fb.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Link: https://lore.kernel.org/r/20221208060256.give.994-kees@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
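The gap between "what was requested" and "what ksize() reports" can be pictured with a toy allocator model. This sketch makes two assumptions. First, `model_ksize()` rounds up to the next power of two; real kmalloc caches also have 96- and 192-byte classes. Second, `tracked` stands in for the object size that KASAN and compiler hinting believe in. All `model_*` names are hypothetical. The 32-byte write only echoes the size in the KASAN report and is not taken from skbuff.c.

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified size classes: next power of two, minimum 8 bytes. */
static size_t model_ksize(size_t requested)
{
    size_t sz = 8;
    while (sz < requested)
        sz <<= 1;
    return sz;
}

struct model_obj {
    size_t tracked;  /* bytes the checker treats as valid */
    size_t usable;   /* bytes actually backing the slab object */
};

static struct model_obj model_kmalloc(size_t n)
{
    struct model_obj o = { .tracked = n, .usable = model_ksize(n) };
    return o;
}

/* Grow within the same object: this only updates the checker's view. */
static bool model_krealloc(struct model_obj *o, size_t n)
{
    if (n > o->usable)
        return false;  /* would need a new object; not modeled here */
    o->tracked = n;
    return true;
}

/* Would a write of len bytes at off be considered in bounds? */
static bool model_write_ok(const struct model_obj *o, size_t off, size_t len)
{
    return off + len <= o->tracked;
}
```

For a 200-byte request, `model_ksize()` reports 256. A 32-byte write ending at byte 256 is then flagged until `model_krealloc(&o, 256)` updates the tracked size. This mirrors why the commit performs a real krealloc() instead of trusting ksize() alone.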
      memcg: fix possible use-after-free in memcg_write_event_control() · 4a7ba45b
      Tejun Heo committed
      memcg_write_event_control() accesses the dentry->d_name of the specified
      control fd to route the write call.  As a cgroup interface file can't be
      renamed, it's safe to access d_name as long as the specified file is a
      regular cgroup file.  Also, as these cgroup interface files can't be
      removed before the directory, it's safe to access the parent too.
      
      Prior to 347c4a87 ("memcg: remove cgroup_event->cft"), there was a
      call to __file_cft() which verified that the specified file is a regular
      cgroupfs file before further accesses.  The cftype pointer returned from
      __file_cft() was no longer necessary and the commit inadvertently dropped
      the file type check with it allowing any file to slip through.  With the
      invariants broken, the d_name and parent accesses can now race against
      renames and removals of arbitrary files and cause use-after-frees.
      
      Fix the bug by resurrecting the file type check in __file_cft().  Now that
      cgroupfs is implemented through kernfs, checking the file operations needs
      to go through a layer of indirection.  Instead, let's check the superblock
      and dentry type.
      
      Link: https://lkml.kernel.org/r/Y5FRm/cfcKPGzWwl@slm.duckdns.org
      Fixes: 347c4a87 ("memcg: remove cgroup_event->cft")
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Jann Horn <jannh@google.com>
      Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>	[3.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
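A user-space analogue of the restored check is to confirm which filesystem an fd lives on before trusting anything about its name. The sketch below uses fstatfs(2) and compares `f_type` against the cgroup v1 and v2 superblock magic numbers from `include/uapi/linux/magic.h`. The kernel fix itself inspects the superblock and dentry type from inside the kernel, so treat this only as an illustration of the idea. `fd_is_on_cgroupfs()` and the `MODEL_*` macro names are hypothetical.

```c
#include <stdbool.h>
#include <sys/vfs.h>

/* Magic numbers as defined in include/uapi/linux/magic.h. */
#define MODEL_CGROUP_SUPER_MAGIC   0x27e0eb
#define MODEL_CGROUP2_SUPER_MAGIC  0x63677270

/* True only if fd refers to a file on a cgroup v1 or v2 filesystem;
 * false on any error (e.g. a bad fd). */
static bool fd_is_on_cgroupfs(int fd)
{
    struct statfs st;

    if (fstatfs(fd, &st) != 0)
        return false;
    return st.f_type == MODEL_CGROUP_SUPER_MAGIC ||
           st.f_type == MODEL_CGROUP2_SUPER_MAGIC;
}
```

The check has to run before any name- or parent-based routing. Anything derived from a file that fails it cannot rely on cgroupfs invariants such as "cannot be renamed".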
      mm/swap: fix SWP_PFN_BITS with CONFIG_PHYS_ADDR_T_64BIT on 32bit · 630dc25e
      David Hildenbrand committed
      We use "unsigned long" to store a PFN in the kernel and phys_addr_t to
      store a physical address.
      
      On a 64bit system, both are 64bit wide.  However, on a 32bit system, the
      latter might be 64bit wide.  This is, for example, the case on x86 with
      PAE: phys_addr_t and PTEs are 64bit wide, while "unsigned long" only spans
      32bit.
      
      The current definition of SWP_PFN_BITS without MAX_PHYSMEM_BITS misses
      that case, and assumes that the maximum PFN is limited by a 32bit
      phys_addr_t.  This implies that SWP_PFN_BITS will currently only be able
      to cover 4 GiB - 1 on any 32bit system with 4k page size, which is wrong.
      
      Let's rely on the number of bits in phys_addr_t instead, but make sure to
      not exceed the maximum swap offset, to not make the BUILD_BUG_ON() in
      is_pfn_swap_entry() unhappy.  Note that swp_entry_t is effectively an
      unsigned long and the maximum swap offset shares that value with the swap
      type.
      
      For example, on an 8 GiB x86 PAE system with a kernel config based on
      Debian 11.5 (-> CONFIG_FLATMEM=y, CONFIG_X86_PAE=y), we will currently
      fail removing migration entries (remove_migration_ptes()), because
      mm/page_vma_mapped.c:check_pte() will fail to identify a PFN match as
      swp_offset_pfn() wrongly masks off PFN bits.  For example,
      split_huge_page_to_list()->...->remap_page() will leave migration entries
      in place and continue to unlock the page.
      
      Later, when we stumble over these migration entries (e.g., via
      /proc/self/pagemap), pfn_swap_entry_to_page() will BUG_ON() because these
      migration entries shouldn't exist anymore and the page was unlocked.
      
      [   33.067591] kernel BUG at include/linux/swapops.h:497!
      [   33.067597] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
      [   33.067602] CPU: 3 PID: 742 Comm: cow Tainted: G            E      6.1.0-rc8+ #16
      [   33.067605] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-1.fc36 04/01/2014
      [   33.067606] EIP: pagemap_pmd_range+0x644/0x650
      [   33.067612] Code: 00 00 00 00 66 90 89 ce b9 00 f0 ff ff e9 ff fb ff ff 89 d8 31 db e8 48 c6 52 00 e9 23 fb ff ff e8 61 83 56 00 e9 b6 fe ff ff <0f> 0b bf 00 f0 ff ff e9 38 fa ff ff 3e 8d 74 26 00 55 89 e5 57 31
      [   33.067615] EAX: ee394000 EBX: 00000002 ECX: ee394000 EDX: 00000000
      [   33.067617] ESI: c1b0ded4 EDI: 00024a00 EBP: c1b0ddb4 ESP: c1b0dd68
      [   33.067619] DS: 007b ES: 007b FS: 00d8 GS: 0033 SS: 0068 EFLAGS: 00010246
      [   33.067624] CR0: 80050033 CR2: b7a00000 CR3: 01bbbd20 CR4: 00350ef0
      [   33.067625] Call Trace:
      [   33.067628]  ? madvise_free_pte_range+0x720/0x720
      [   33.067632]  ? smaps_pte_range+0x4b0/0x4b0
      [   33.067634]  walk_pgd_range+0x325/0x720
      [   33.067637]  ? mt_find+0x1d6/0x3a0
      [   33.067641]  ? mt_find+0x1d6/0x3a0
      [   33.067643]  __walk_page_range+0x164/0x170
      [   33.067646]  walk_page_range+0xf9/0x170
      [   33.067648]  ? __kmem_cache_alloc_node+0x2a8/0x340
      [   33.067653]  pagemap_read+0x124/0x280
      [   33.067658]  ? default_llseek+0x101/0x160
      [   33.067662]  ? smaps_account+0x1d0/0x1d0
      [   33.067664]  vfs_read+0x90/0x290
      [   33.067667]  ? do_madvise.part.0+0x24b/0x390
      [   33.067669]  ? debug_smp_processor_id+0x12/0x20
      [   33.067673]  ksys_pread64+0x58/0x90
      [   33.067675]  __ia32_sys_ia32_pread64+0x1b/0x20
      [   33.067680]  __do_fast_syscall_32+0x4c/0xc0
      [   33.067683]  do_fast_syscall_32+0x29/0x60
      [   33.067686]  do_SYSENTER_32+0x15/0x20
      [   33.067689]  entry_SYSENTER_32+0x98/0xf1
      
      Decrease the indentation level of SWP_PFN_BITS and SWP_PFN_MASK to keep it
      readable and consistent.
      
      [david@redhat.com: rely on sizeof(phys_addr_t) and min_t() instead]
        Link: https://lkml.kernel.org/r/20221206105737.69478-1-david@redhat.com
      [david@redhat.com: use "int" for comparison, as we're only comparing numbers < 64]
        Link: https://lkml.kernel.org/r/1f157500-2676-7cef-a84e-9224ed64e540@redhat.com
      Link: https://lkml.kernel.org/r/20221205150857.167583-1-david@redhat.com
      Fixes: 0d206b5d ("mm/swap: add swp_offset_pfn() to fetch PFN from swap entry")
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Acked-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Yang Shi <shy828301@gmail.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
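The bit-width arithmetic can be checked with a few constants. This sketch assumes 32-bit x86 PAE with 4 KiB pages (`PAGE_SHIFT` = 12) and a 64-bit `phys_addr_t`. It takes the maximum swap offset width as `BITS_PER_LONG - 1 - MAX_SWAPFILES_SHIFT`, with `MAX_SWAPFILES_SHIFT` = 5. The `MODEL_*` and helper names are hypothetical. The "old" formula encodes the 32-bit `phys_addr_t` assumption that the commit message describes; it is not copied from the header.

```c
#include <stdint.h>

/* Assumed 32-bit x86 PAE parameters, for illustration only. */
#define MODEL_BITS_PER_LONG        32
#define MODEL_PAGE_SHIFT           12
#define MODEL_PHYS_ADDR_BITS       64  /* sizeof(phys_addr_t) * 8 with PAE */
#define MODEL_MAX_SWAPFILES_SHIFT  5
/* Offset bits left in an unsigned long after the swap type field. */
#define MODEL_SWP_TYPE_SHIFT \
    (MODEL_BITS_PER_LONG - 1 - MODEL_MAX_SWAPFILES_SHIFT)

static int min_int(int a, int b) { return a < b ? a : b; }

/* Before: PFN width implied by a 32-bit physical address. */
static int old_swp_pfn_bits(void)
{
    return MODEL_BITS_PER_LONG - MODEL_PAGE_SHIFT;
}

/* After: width of phys_addr_t, capped at the maximum swap offset. */
static int new_swp_pfn_bits(void)
{
    return min_int(MODEL_PHYS_ADDR_BITS - MODEL_PAGE_SHIFT,
                   MODEL_SWP_TYPE_SHIFT);
}

static uint64_t swp_pfn_mask(int bits)
{
    return (UINT64_C(1) << bits) - 1;
}
```

With these values the old width is 20 bits, so PFN 0x100000 (the first page at 4 GiB on the 8 GiB machine) masks to 0. The new 26-bit width keeps it intact, which is why check_pte() can match the PFN again.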
      regmap-irq: Add handle_mask_sync() callback · 69af4bca
      William Breathitt Gray committed
      Provide a public callback handle_mask_sync() that drivers can use when
      they have more complex IRQ masking logic. The default implementation is
      regmap_irq_handle_mask_sync(), used if the chip doesn't provide its own
      callback.
      
      Cc: Mark Brown <broonie@kernel.org>
      Signed-off-by: William Breathitt Gray <william.gray@linaro.org>
      Link: https://lore.kernel.org/r/e083474b3d467a86e6cb53da8072de4515bd6276.1669100542.git.william.gray@linaro.org
      Signed-off-by: Mark Brown <broonie@kernel.org>
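A driver override with a built-in fallback can be sketched as a nullable function pointer. The structure, callback signature and register array below are hypothetical simplifications; the real regmap-irq callback takes different arguments. Only the dispatch pattern reflects the commit: use the chip's handle_mask_sync() if one is set, otherwise use the default.

```c
#include <stddef.h>

struct model_irq_chip;
typedef int (*mask_sync_fn)(struct model_irq_chip *chip, int index,
                            unsigned int mask);

struct model_irq_chip {
    mask_sync_fn handle_mask_sync;  /* optional driver override */
    unsigned int mask_reg[4];       /* stands in for hardware mask registers */
};

/* Default path: write the mask value straight into the register. */
static int model_default_mask_sync(struct model_irq_chip *chip, int index,
                                   unsigned int mask)
{
    chip->mask_reg[index] = mask;
    return 0;
}

/* Dispatch: prefer the chip's callback, fall back to the default. */
static int model_sync_mask(struct model_irq_chip *chip, int index,
                           unsigned int mask)
{
    mask_sync_fn fn = chip->handle_mask_sync ? chip->handle_mask_sync
                                             : model_default_mask_sync;
    return fn(chip, index, mask);
}

/* Example override for hypothetical hardware with inverted mask bits. */
static int model_inverted_mask_sync(struct model_irq_chip *chip, int index,
                                    unsigned int mask)
{
    chip->mask_reg[index] = ~mask;
    return 0;
}
```

Drivers with ordinary masking leave the pointer NULL and get the default. Only chips with unusual logic need to supply their own function.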
      lsm: Fix description of fs_context_parse_param · 577cc143
      Roberto Sassu committed
      The fs_context_parse_param hook already has a description, which appears
      to be the correct one according to the code.
      
      Fixes: 8eb687bc ("lsm: Add/fix return values in lsm_hooks.h and fix formatting")
      Signed-off-by: Roberto Sassu <roberto.sassu@huawei.com>
      Signed-off-by: Paul Moore <paul@paul-moore.com>
  5. 09 Dec, 2022: 9 commits