1. 09 May 2022, 7 commits
  2. 02 Apr 2022, 1 commit
  3. 23 Mar 2022, 1 commit
  4. 17 Mar 2022, 1 commit
  5. 15 Mar 2022, 2 commits
  6. 08 Mar 2022, 1 commit
  7. 02 Feb 2022, 1 commit
  8. 17 Dec 2021, 1 commit
  9. 19 Oct 2021, 2 commits
  10. 25 Sep 2021, 1 commit
  11. 17 Aug 2021, 1 commit
  12. 13 Jul 2021, 1 commit
  13. 30 Jun 2021, 2 commits
  14. 06 May 2021, 1 commit
  15. 22 Mar 2021, 1 commit
  16. 25 Feb 2021, 2 commits
  17. 03 Dec 2020, 1 commit
    • mm: memcontrol: Use helpers to read page's memcg data · bcfe06bf
      Committed by Roman Gushchin
      Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
      
      Currently a non-slab kernel page which has been charged to a memory cgroup
      can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
      flag is defined as a page type (like buddy, offline, etc), so it takes a
      bit from a page->mapped counter.  Pages with a type set can't be mapped to
      userspace.
      
      But in general the kmemcg flag has nothing to do with mapping to
      userspace.  It only means that the page has been accounted by the page
      allocator, so it has to be properly uncharged on release.
      
      Some bpf maps map vmalloc-based memory to userspace, and their
      memory can't be accounted because of this implementation detail.
      
      This patchset removes this limitation by moving the PageKmemcg flag into
      one of the free bits of the page->mem_cgroup pointer.  It also formalizes
      accesses to page->mem_cgroup and page->obj_cgroups using new helpers,
      adds several checks, and removes a couple of obsolete functions.  As a
      result, the code becomes more robust, with fewer open-coded bit tricks.
      
      This patch (of 4):
      
      Currently there are many open-coded reads of the page->mem_cgroup pointer,
      as well as a couple of read helpers, which are barely used.
      
      This creates an obstacle to reusing some bits of the pointer for
      storing additional information.  In fact, we already do this for
      slab pages, where the last bit indicates that a pointer has an attached
      vector of objcg pointers instead of a regular memcg pointer.
      
      This commit uses two existing helpers and introduces a new one,
      converting all read sides to calls of these helpers:
        struct mem_cgroup *page_memcg(struct page *page);
        struct mem_cgroup *page_memcg_rcu(struct page *page);
        struct mem_cgroup *page_memcg_check(struct page *page);
      
      page_memcg_check() is intended to be used in cases where the page can be a
      slab page with a memcg pointer pointing at an objcg vector.  It checks the
      lowest bit and, if it is set, returns NULL.  page_memcg() contains a
      VM_BUG_ON_PAGE() check asserting that the page is not a slab page.
      
      To make sure nobody uses a direct access, struct page's
      mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
      Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
      Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
      bcfe06bf
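      A minimal C sketch of how the helpers described above might look
      (illustrative only: the MEMCG_DATA_OBJCGS flag name is an assumption;
      only the unsigned long memcg_data field and the low-bit check are taken
      from the description; page_memcg_rcu() would additionally assert that
      the RCU read lock is held):

        /* Low bit of memcg_data marks an attached objcg vector (assumed name). */
        #define MEMCG_DATA_OBJCGS	0x1UL

        static inline struct mem_cgroup *page_memcg(struct page *page)
        {
        	/* Slab pages carry an objcg vector, not a plain memcg pointer. */
        	VM_BUG_ON_PAGE(PageSlab(page), page);
        	return (struct mem_cgroup *)page->memcg_data;
        }

        static inline struct mem_cgroup *page_memcg_check(struct page *page)
        {
        	unsigned long memcg_data = READ_ONCE(page->memcg_data);

        	/* If the low bit is set, this is an objcg vector: report no memcg. */
        	if (memcg_data & MEMCG_DATA_OBJCGS)
        		return NULL;
        	return (struct mem_cgroup *)memcg_data;
        }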
  18. 02 Dec 2020, 1 commit
  19. 19 Oct 2020, 1 commit
    • mm, memcg: rework remote charging API to support nesting · b87d8cef
      Committed by Roman Gushchin
      Currently the remote memcg charging API consists of two functions:
      memalloc_use_memcg() and memalloc_unuse_memcg(), which set and clear
      the active memcg value that overrides the memcg of the current task.
      
        memalloc_use_memcg(target_memcg);
        <...>
        memalloc_unuse_memcg();
      
      This works perfectly for allocations performed from a normal context;
      however, an attempt to call it from an interrupt context or to nest two
      remote charging blocks leads to incorrect accounting.  On exit from
      the inner block the active memcg is cleared instead of being
      restored.
      
        memalloc_use_memcg(target_memcg);
      
        memalloc_use_memcg(target_memcg_2);
          <...>
          memalloc_unuse_memcg();
      
          Error: allocations here are charged to the memcg of the current
          process instead of target_memcg.
      
        memalloc_unuse_memcg();
      
      This patch extends the remote charging API by switching to a single
      function: struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg),
      which sets the new value and returns the old one.  So a remote charging
      block will look like:
      
        old_memcg = set_active_memcg(target_memcg);
        <...>
        set_active_memcg(old_memcg);
      
      This patch is heavily based on the patch by Johannes Weiner, which can be
      found here: https://lkml.org/lkml/2020/5/28/806 .
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Dan Schatzberg <dschatzberg@fb.com>
      Link: https://lkml.kernel.org/r/20200821212056.3769116-1-guro@fb.com
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      b87d8cef
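      A minimal sketch of the switch-and-restore semantics described above
      (illustrative only; the real kernel also has to handle charging from
      interrupt context, which this sketch omits):

        static inline struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
        {
        	struct mem_cgroup *old = current->active_memcg;

        	/* Install the new active memcg and hand back the previous one. */
        	current->active_memcg = memcg;
        	return old;
        }

        /* Nested remote charging now restores correctly: */
        old_memcg = set_active_memcg(target_memcg);
        inner_old = set_active_memcg(target_memcg_2);
        /* ... allocations here are charged to target_memcg_2 ... */
        set_active_memcg(inner_old);	/* back to target_memcg */
        set_active_memcg(old_memcg);	/* back to the task's own memcg */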
  20. 08 Sep 2020, 1 commit
    • fs: Don't invalidate page buffers in block_write_full_page() · 6dbf7bb5
      Committed by Jan Kara
      If block_write_full_page() is called for a page beyond the current
      inode size, it truncates the page's buffers and returns 0.  This logic
      was added in 2.5.62 in commit 81eb69062588 ("fix ext3 BUG due to race
      with truncate") in the history.git tree to fix a problem with ext3 in
      data=ordered mode.  That particular problem no longer exists because
      ext3 is long gone and ext4 handles ordered data differently.  Also,
      buffers are normally invalidated by the truncate code, so there is no
      need to handle this specially in ->writepage().
      
      This invalidation of page buffers in block_write_full_page() causes
      issues for filesystems (e.g. ext4 or ocfs2) when the block device is
      shrunk under the filesystem's hands and metadata buffers get discarded
      while still being tracked by the journalling layer.  Although this is
      obviously "not supported", it can cause kernel crashes like:
      
      [ 7986.689400] BUG: unable to handle kernel NULL pointer dereference at 0000000000000008
      [ 7986.697197] PGD 0 P4D 0
      [ 7986.699724] Oops: 0002 [#1] SMP PTI
      [ 7986.703200] CPU: 4 PID: 203778 Comm: jbd2/dm-3-8 Kdump: loaded Tainted: G O     --------- -  - 4.18.0-147.5.0.5.h126.eulerosv2r9.x86_64 #1
      [ 7986.716438] Hardware name: Huawei RH2288H V3/BC11HGSA0, BIOS 1.57 08/11/2015
      [ 7986.723462] RIP: 0010:jbd2_journal_grab_journal_head+0x1b/0x40 [jbd2]
      ...
      [ 7986.810150] Call Trace:
      [ 7986.812595]  __jbd2_journal_insert_checkpoint+0x23/0x70 [jbd2]
      [ 7986.818408]  jbd2_journal_commit_transaction+0x155f/0x1b60 [jbd2]
      [ 7986.836467]  kjournald2+0xbd/0x270 [jbd2]
      
      which is not great.  The crash happens because bh->b_private is suddenly
      NULL although the BH_JBD flag is still set (this is because
      block_invalidatepage() cleared the BH_Mapped flag and a subsequent bh
      lookup found the buffer without BH_Mapped set and called
      init_page_buffers(), which overwrote bh->b_private).  So just remove
      the invalidation in block_write_full_page().
      
      Note that the buffer cache invalidation when the block device changes
      size is already careful to avoid similar problems by using
      invalidate_mapping_pages(), which skips busy buffers, so it was only
      this odd block_write_full_page() behavior that could tear down bdev
      buffers under the filesystem's hands.
      Reported-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Jan Kara <jack@suse.cz>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      CC: stable@vger.kernel.org
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
      6dbf7bb5
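      A rough sketch of the behaviour being removed (illustrative, not the
      exact upstream hunk): in block_write_full_page(), a page lying entirely
      beyond i_size had its buffers invalidated before the function returned,
      which is what could discard jbd2-tracked bdev buffers:

        /* Old behaviour, now removed: page fully outside i_size. */
        if (page->index >= end_index + 1) {
        	/* Throw away the page's buffers and pretend the write succeeded. */
        	do_invalidatepage(page, 0, PAGE_SIZE);
        	unlock_page(page);
        	return 0;
        }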
  21. 24 Aug 2020, 1 commit
  22. 08 Aug 2020, 1 commit
    • fs: prevent BUG_ON in submit_bh_wbc() · 377254b2
      Committed by Xianting Tian
      If a device is hot-removed (for example, when a physical device is
      unplugged from a PCIe slot or an nbd device's network is shut down),
      this can result in a BUG_ON() crash in submit_bh_wbc().  This is
      because when the block device dies, the buffer heads have their
      Buffer_Mapped flag cleared, leading to the crash in submit_bh_wbc().
      
      We had attempted to work around this problem in commit a17712c8
      ("ext4: check superblock mapped prior to committing").  Unfortunately,
      it's still possible to hit the BUG_ON(!buffer_mapped(bh)) if the
      device dies between when the work-around check in ext4_commit_super()
      is performed and when submit_bh_wbc() is finally called:
      
      Code path:
      ext4_commit_super
          judge if 'buffer_mapped(sbh)' is false, return <== commit a17712c8
                lock_buffer(sbh)
                ...
                unlock_buffer(sbh)
                     __sync_dirty_buffer(sbh,...
                          lock_buffer(sbh)
                              judge if 'buffer_mapped(sbh))' is false, return <== added by this patch
                                  submit_bh(...,sbh)
                                      submit_bh_wbc(...,sbh,...)
      
      [100722.966497] kernel BUG at fs/buffer.c:3095! <== BUG_ON(!buffer_mapped(bh)) in submit_bh_wbc()
      [100722.966503] invalid opcode: 0000 [#1] SMP
      [100722.966566] task: ffff8817e15a9e40 task.stack: ffffc90024744000
      [100722.966574] RIP: 0010:submit_bh_wbc+0x180/0x190
      [100722.966575] RSP: 0018:ffffc90024747a90 EFLAGS: 00010246
      [100722.966576] RAX: 0000000000620005 RBX: ffff8818a80603a8 RCX: 0000000000000000
      [100722.966576] RDX: ffff8818a80603a8 RSI: 0000000000020800 RDI: 0000000000000001
      [100722.966577] RBP: ffffc90024747ac0 R08: 0000000000000000 R09: ffff88207f94170d
      [100722.966578] R10: 00000000000437c8 R11: 0000000000000001 R12: 0000000000020800
      [100722.966578] R13: 0000000000000001 R14: 000000000bf9a438 R15: ffff88195f333000
      [100722.966580] FS:  00007fa2eee27700(0000) GS:ffff88203d840000(0000) knlGS:0000000000000000
      [100722.966580] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [100722.966581] CR2: 0000000000f0b008 CR3: 000000201a622003 CR4: 00000000007606e0
      [100722.966582] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      [100722.966583] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
      [100722.966583] PKRU: 55555554
      [100722.966583] Call Trace:
      [100722.966588]  __sync_dirty_buffer+0x6e/0xd0
      [100722.966614]  ext4_commit_super+0x1d8/0x290 [ext4]
      [100722.966626]  __ext4_std_error+0x78/0x100 [ext4]
      [100722.966635]  ? __ext4_journal_get_write_access+0xca/0x120 [ext4]
      [100722.966646]  ext4_reserve_inode_write+0x58/0xb0 [ext4]
      [100722.966655]  ? ext4_dirty_inode+0x48/0x70 [ext4]
      [100722.966663]  ext4_mark_inode_dirty+0x53/0x1e0 [ext4]
      [100722.966671]  ? __ext4_journal_start_sb+0x6d/0xf0 [ext4]
      [100722.966679]  ext4_dirty_inode+0x48/0x70 [ext4]
      [100722.966682]  __mark_inode_dirty+0x17f/0x350
      [100722.966686]  generic_update_time+0x87/0xd0
      [100722.966687]  touch_atime+0xa9/0xd0
      [100722.966690]  generic_file_read_iter+0xa09/0xcd0
      [100722.966694]  ? page_cache_tree_insert+0xb0/0xb0
      [100722.966704]  ext4_file_read_iter+0x4a/0x100 [ext4]
      [100722.966707]  ? __inode_security_revalidate+0x4f/0x60
      [100722.966709]  __vfs_read+0xec/0x160
      [100722.966711]  vfs_read+0x8c/0x130
      [100722.966712]  SyS_pread64+0x87/0xb0
      [100722.966716]  do_syscall_64+0x67/0x1b0
      [100722.966719]  entry_SYSCALL64_slow_path+0x25/0x25
      
      To address this, add a check of 'buffer_mapped(bh)' to
      __sync_dirty_buffer().  This also has the benefit of fixing the issue
      for other file systems.
      
      With this addition, we can drop the workaround in ext4_commit_super().
      
      [ Commit description rewritten by tytso. ]
      Signed-off-by: Xianting Tian <xianting_tian@126.com>
      Link: https://lore.kernel.org/r/1596211825-8750-1-git-send-email-xianting_tian@126.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      377254b2
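      Roughly how the guarded function looks with the check in place
      (a sketch based on the v5.8-era fs/buffer.c; details may differ from
      the exact upstream code):

        int __sync_dirty_buffer(struct buffer_head *bh, int op_flags)
        {
        	int ret = 0;

        	lock_buffer(bh);
        	/* New guard: the underlying device may have been hot-removed. */
        	if (!buffer_mapped(bh)) {
        		unlock_buffer(bh);
        		return -EIO;
        	}
        	if (test_clear_buffer_dirty(bh)) {
        		get_bh(bh);
        		bh->b_end_io = end_buffer_write_sync;
        		ret = submit_bh(REQ_OP_WRITE, op_flags, bh);
        		wait_on_buffer(bh);
        		if (!ret && !buffer_uptodate(bh))
        			ret = -EIO;
        	} else {
        		unlock_buffer(bh);
        	}
        	return ret;
        }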
  23. 09 Jul 2020, 1 commit
  24. 01 Jul 2020, 1 commit
  25. 03 Jun 2020, 2 commits
  26. 18 Apr 2020, 1 commit
  27. 16 Apr 2020, 1 commit
    • ext4: use non-movable memory for superblock readahead · d87f6392
      Committed by Roman Gushchin
      Since commit a8ac900b ("ext4: use non-movable memory for the
      superblock"), buffers for the ext4 superblock have been allocated
      using the sb_bread_unmovable() helper, which allocates buffer heads
      out of non-movable memory blocks.  This was necessary to avoid
      blocking page migrations and causing CMA allocation failures.
      
      However, commit 85c8f176 ("ext4: preload block group descriptors")
      broke this by introducing pre-reading of the ext4 superblock.
      The problem is that __breadahead() uses __getblk() underneath,
      which allocates buffer heads out of movable memory.
      
      This resulted in page migration failures I've seen on a machine
      with an ext4 partition and a preallocated CMA area.
      
      Fix this by introducing the sb_breadahead_unmovable() and
      __breadahead_gfp() helpers, which use non-movable memory for buffer
      head allocations, and use them for ext4 superblock readahead.
      Reviewed-by: Andreas Dilger <adilger@dilger.ca>
      Fixes: 85c8f176 ("ext4: preload block group descriptors")
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Link: https://lore.kernel.org/r/20200229001411.128010-1-guro@fb.com
      Signed-off-by: Theodore Ts'o <tytso@mit.edu>
      d87f6392
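      A sketch of the two helpers described above (illustrative; it assumes
      the usual __getblk_gfp()/ll_rw_block() internals of that era):

        void __breadahead_gfp(struct block_device *bdev, sector_t block,
        		      unsigned size, gfp_t gfp)
        {
        	struct buffer_head *bh = __getblk_gfp(bdev, block, size, gfp);

        	if (likely(bh)) {
        		ll_rw_block(REQ_OP_READ, REQ_RAHEAD, 1, &bh);
        		brelse(bh);
        	}
        }

        static inline void sb_breadahead_unmovable(struct super_block *sb,
        					   sector_t block)
        {
        	/* gfp == 0: no __GFP_MOVABLE, so the buffer head page stays put. */
        	__breadahead_gfp(sb->s_bdev, block, sb->s_blocksize, 0);
        }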
  28. 28 Mar 2020, 1 commit
  29. 25 Mar 2020, 1 commit