1. 10 8月, 2021 5 次提交
    • D
      xfs: flush inode inactivation work when compiling usage statistics · 01e8f379
      Darrick J. Wong 提交于
      Users have come to expect that the space accounting information in
      statfs and getquota reports are fairly accurate.  Now that we inactivate
      inodes from a background queue, these numbers can be thrown off by
      whatever resources are singly-owned by the inodes in the queue.  Flush
      the pending inactivations when userspace asks for a space usage report.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      01e8f379
    • D
      xfs: inactivate inodes any time we try to free speculative preallocations · 2eb66502
      Darrick J. Wong 提交于
      Other parts of XFS have learned to call xfs_blockgc_free_{space,quota}
      to try to free speculative preallocations when space is tight.  This
      means that file writes, transaction reservation failures, quota limit
      enforcement, and the EOFBLOCKS ioctl all call this function to free
      space when things are tight.
      
      Since inode inactivation is now a background task, this means that the
      filesystem can be hanging on to unlinked but not yet freed space.  Add
      this to the list of things that xfs_blockgc_free_* makes writer threads
      scan for when they cannot reserve space.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      2eb66502
    • D
      xfs: queue inactivation immediately when free realtime extents are tight · 65f03d86
      Darrick J. Wong 提交于
      Now that we have made the inactivation of unlinked inodes a background
      task to increase the throughput of file deletions, we need to be a
      little more careful about how long of a delay we can tolerate.
      
      Similar to the patch doing this for free space on the data device, if
      the file being inactivated is a realtime file and the realtime volume is
      running low on free extents, we want to run the worker ASAP so that the
      realtime allocator can make better decisions.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      65f03d86
    • D
      xfs: queue inactivation immediately when quota is nearing enforcement · 108523b8
      Darrick J. Wong 提交于
      Now that we have made the inactivation of unlinked inodes a background
      task to increase the throughput of file deletions, we need to be a
      little more careful about how long of a delay we can tolerate.
      
      Specifically, if the dquots attached to the inode being inactivated are
      nearing any kind of enforcement boundary, we want to queue that
      inactivation work immediately so that users don't get EDQUOT/ENOSPC
      errors even after they deleted a bunch of files to stay within quota.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      108523b8
    • D
      xfs: queue inactivation immediately when free space is tight · 7d6f07d2
      Darrick J. Wong 提交于
      Now that we have made the inactivation of unlinked inodes a background
      task to increase the throughput of file deletions, we need to be a
      little more careful about how long of a delay we can tolerate.
      
      On a mostly empty filesystem, the risk of the allocator making poor
      decisions due to fragmentation of the free space on account a lengthy
      delay in background updates is minimal because there's plenty of space.
      However, if free space is tight, we want to deallocate unlinked inodes
      as quickly as possible to avoid fallocate ENOSPC and to give the
      allocator the best shot at optimal allocations for new writes.
      
      Therefore, queue the percpu worker immediately if the filesystem is more
      than 95% full.  This follows the same principle that XFS becomes less
      aggressive about speculative allocations and lazy cleanup (and more
      precise about accounting) when nearing full.
      Signed-off-by: NDarrick J. Wong <djwong@kernel.org>
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      7d6f07d2
  2. 07 8月, 2021 9 次提交
  3. 02 8月, 2021 4 次提交
    • L
      Linux 5.14-rc4 · c500bee1
      Linus Torvalds 提交于
      c500bee1
    • L
      Merge tag 'perf-tools-fixes-for-v5.14-2021-08-01' of... · d4affd6b
      Linus Torvalds 提交于
      Merge tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
      
      Pull perf tools fixes from Arnaldo Carvalho de Melo:
      
       - Revert "perf map: Fix dso->nsinfo refcounting", this makes 'perf top'
         abort, uncovering a design flaw on how namespace information is kept.
         The fix for that is more than we can do right now, leave it for the
         next merge window.
      
       - Split --dump-raw-trace by AUX records for ARM's CoreSight, fixing up
         the decoding of some records.
      
       - Fix PMU alias matching.
      
      Thanks to James Clark and John Garry for these fixes.
      
      * tag 'perf-tools-fixes-for-v5.14-2021-08-01' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
        Revert "perf map: Fix dso->nsinfo refcounting"
        perf pmu: Fix alias matching
        perf cs-etm: Split --dump-raw-trace by AUX records
      d4affd6b
    • L
      Merge tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · c82357a7
      Linus Torvalds 提交于
      Pull powerpc fixes from Michael Ellerman:
      
       - Don't use r30 in VDSO code, to avoid breaking existing Go lang
         programs.
      
       - Change an export symbol to allow non-GPL modules to use spinlocks
         again.
      
      Thanks to Paul Menzel, and Srikar Dronamraju.
      
      * tag 'powerpc-5.14-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
        powerpc/vdso: Don't use r30 to avoid breaking Go lang
        powerpc/pseries: Fix regression while building external modules
      c82357a7
    • L
      Merge tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux · aa660326
      Linus Torvalds 提交于
      Pull xfs fixes from Darrick Wong:
       "This contains a bunch of bug fixes in XFS.
      
        Dave and I have been busy the last couple of weeks to find and fix as
        many log recovery bugs as we can find; here are the results so far. Go
        fstests -g recoveryloop! ;)
      
         - Fix a number of coordination bugs relating to cache flushes for
           metadata writeback, cache flushes for multi-buffer log writes, and
           FUA writes for single-buffer log writes
      
         - Fix a bug with incorrect replay of attr3 blocks
      
         - Fix unnecessary stalls when flushing logs to disk
      
         - Fix spoofing problems when recovering realtime bitmap blocks"
      
      * tag 'xfs-5.14-fixes-2' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
        xfs: prevent spoofing of rtbitmap blocks when recovering buffers
        xfs: limit iclog tail updates
        xfs: need to see iclog flags in tracing
        xfs: Enforce attr3 buffer recovery order
        xfs: logging the on disk inode LSN can make it go backwards
        xfs: avoid unnecessary waits in xfs_log_force_lsn()
        xfs: log forces imply data device cache flushes
        xfs: factor out forced iclog flushes
        xfs: fix ordering violation between cache flushes and tail updates
        xfs: fold __xlog_state_release_iclog into xlog_state_release_iclog
        xfs: external logs need to flush data device
        xfs: flush data dev on external log write
      aa660326
  4. 01 8月, 2021 1 次提交
  5. 31 7月, 2021 20 次提交
    • L
      Merge tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net · c7d10223
      Linus Torvalds 提交于
      Pull networking fixes from Jakub Kicinski:
       "Networking fixes for 5.14-rc4, including fixes from bpf, can, WiFi
        (mac80211) and netfilter trees.
      
        Current release - regressions:
      
         - mac80211: fix starting aggregation sessions on mesh interfaces
      
        Current release - new code bugs:
      
         - sctp: send pmtu probe only if packet loss in Search Complete state
      
         - bnxt_en: add missing periodic PHC overflow check
      
         - devlink: fix phys_port_name of virtual port and merge error
      
         - hns3: change the method of obtaining default ptp cycle
      
         - can: mcba_usb_start(): add missing urb->transfer_dma initialization
      
        Previous releases - regressions:
      
         - set true network header for ECN decapsulation
      
         - mlx5e: RX, avoid possible data corruption w/ relaxed ordering and
           LRO
      
         - phy: re-add check for PHY_BRCM_DIS_TXCRXC_NOENRGY on the BCM54811
           PHY
      
         - sctp: fix return value check in __sctp_rcv_asconf_lookup
      
        Previous releases - always broken:
      
         - bpf:
             - more spectre corner case fixes, introduce a BPF nospec
               instruction for mitigating Spectre v4
             - fix OOB read when printing XDP link fdinfo
             - sockmap: fix cleanup related races
      
         - mac80211: fix enabling 4-address mode on a sta vif after assoc
      
         - can:
             - raw: raw_setsockopt(): fix raw_rcv panic for sock UAF
             - j1939: j1939_session_deactivate(): clarify lifetime of session
               object, avoid UAF
             - fix number of identical memory leaks in USB drivers
      
         - tipc:
             - do not blindly write skb_shinfo frags when doing decryption
             - fix sleeping in tipc accept routine"
      
      * tag 'net-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (91 commits)
        gve: Update MAINTAINERS list
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
        bpf: Fix leakage due to insufficient speculative store bypass mitigation
        bpf: Introduce BPF nospec instruction for mitigating Spectre v4
        sis900: Fix missing pci_disable_device() in probe and remove
        net: let flow have same hash in two directions
        nfc: nfcsim: fix use after free during module unload
        tulip: windbond-840: Fix missing pci_disable_device() in probe and remove
        sctp: fix return value check in __sctp_rcv_asconf_lookup
        nfc: s3fwrn5: fix undefined parameter values in dev_err()
        net/mlx5: Fix mlx5_vport_tbl_attr chain from u16 to u32
        net/mlx5e: Fix nullptr in mlx5e_hairpin_get_mdev()
        net/mlx5: Unload device upon firmware fatal error
        net/mlx5e: Fix page allocation failure for ptp-RQ over SF
        net/mlx5e: Fix page allocation failure for trap-RQ over SF
        ...
      c7d10223
    • L
      Merge tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · e1dab4c0
      Linus Torvalds 提交于
      Pull ACPI fixes from Rafael Wysocki:
       "These revert a recent IRQ resources handling modification that turned
        out to be problematic, fix suspend-to-idle handling on AMD platforms
        to take upcoming systems into account properly and fix the retrieval
        of the DPTF attributes of the PCH FIVR.
      
        Specifics:
      
         - Revert recent change of the ACPI IRQ resources handling that
           attempted to improve the ACPI IRQ override selection logic, but
           introduced serious regressions on some systems (Hui Wang).
      
         - Fix up quirks for AMD platforms in the suspend-to-idle support code
           so as to take upcoming systems using uPEP HID AMDI007 into account
           as appropriate (Mario Limonciello).
      
         - Fix the code retrieving DPTF attributes of the PCH FIVR so that it
           agrees on the return data type with the ACPI control method
           evaluated for this purpose (Srinivas Pandruvada)"
      
      * tag 'acpi-5.14-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        ACPI: DPTF: Fix reading of attributes
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
        ACPI: PM: Add support for upcoming AMD uPEP HID AMDI007
      e1dab4c0
    • L
      pipe: make pipe writes always wake up readers · 3a34b13a
      Linus Torvalds 提交于
      Since commit 1b6b26ae ("pipe: fix and clarify pipe write wakeup
      logic") we have sanitized the pipe write logic, and would only try to
      wake up readers if they needed it.
      
      In particular, if the pipe already had data in it before the write,
      there was no point in trying to wake up a reader, since any existing
      readers must have been aware of the pre-existing data already.  Doing
      extraneous wakeups will only cause potential thundering herd problems.
      
      However, it turns out that some Android libraries have misused the EPOLL
      interface, and expected "edge triggered" be to "any new write will
      trigger it".  Even if there was no edge in sight.
      
      Quoting Sandeep Patil:
       "The commit 1b6b26ae ('pipe: fix and clarify pipe write wakeup
        logic') changed pipe write logic to wakeup readers only if the pipe
        was empty at the time of write. However, there are libraries that
        relied upon the older behavior for notification scheme similar to
        what's described in [1]
      
        One such library 'realm-core'[2] is used by numerous Android
        applications. The library uses a similar notification mechanism as GNU
        Make but it never drains the pipe until it is full. When Android moved
        to v5.10 kernel, all applications using this library stopped working.
      
        The library has since been fixed[3] but it will be a while before all
        applications incorporate the updated library"
      
      Our regression rule for the kernel is that if applications break from
      new behavior, it's a regression, even if it was because the application
      did something patently wrong.  Also note the original report [4] by
      Michal Kerrisk about a test for this epoll behavior - but at that point
      we didn't know of any actual broken use case.
      
      So add the extraneous wakeup, to approximate the old behavior.
      
      [ I say "approximate", because the exact old behavior was to do a wakeup
        not for each write(), but for each pipe buffer chunk that was filled
        in. The behavior introduced by this change is not that - this is just
        "every write will cause a wakeup, whether necessary or not", which
        seems to be sufficient for the broken library use. ]
      
      It's worth noting that this adds the extraneous wakeup only for the
      write side, while the read side still considers the "edge" to be purely
      about reading enough from the pipe to allow further writes.
      
      See commit f467a6a6 ("pipe: fix and clarify pipe read wakeup logic")
      for the pipe read case, which remains that "only wake up if the pipe was
      full, and we read something from it".
      
      Link: https://lore.kernel.org/lkml/CAHk-=wjeG0q1vgzu4iJhW5juPkTsjTYmiqiMUYAebWW+0bam6w@mail.gmail.com/ [1]
      Link: https://github.com/realm/realm-core [2]
      Link: https://github.com/realm/realm-core/issues/4666 [3]
      Link: https://lore.kernel.org/lkml/CAKgNAkjMBGeAwF=2MKK758BhxvW58wYTgYKB2V-gY1PwXxrH+Q@mail.gmail.com/ [4]
      Link: https://lore.kernel.org/lkml/20210729222635.2937453-1-sspatil@android.com/Reported-by: NSandeep Patil <sspatil@android.com>
      Cc: Michael Kerrisk <mtk.manpages@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3a34b13a
    • A
      Revert "perf map: Fix dso->nsinfo refcounting" · 9bac1bd6
      Arnaldo Carvalho de Melo 提交于
      This makes 'perf top' abort in some cases, and the right fix will
      involve surgery that is too much to do at this stage, so revert for now
      and fix it in the next merge window.
      
      This reverts commit 2d6b74ba.
      
      Cc: Riccardo Mancini <rickyman7@gmail.com>
      Cc: Ian Rogers <irogers@google.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Krister Johansen <kjlx@templeofstupid.com>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Namhyung Kim <namhyung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Signed-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>
      9bac1bd6
    • R
      Merge branches 'acpi-resources' and 'acpi-dptf' · e83f54ea
      Rafael J. Wysocki 提交于
      * acpi-resources:
        Revert "ACPI: resources: Add checks for ACPI IRQ override"
      
      * acpi-dptf:
        ACPI: DPTF: Fix reading of attributes
      e83f54ea
    • L
      Merge tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block · 4669e13c
      Linus Torvalds 提交于
      Pull block fixes from Jens Axboe:
      
       - gendisk freeing fix (Christoph)
      
       - blk-iocost wake ordering fix (Tejun)
      
       - tag allocation error handling fix (John)
      
       - loop locking fix. While this isn't the prettiest fix in the world,
         nobody has any good alternatives for 5.14. Something to likely
         revisit for 5.15. (Tetsuo)
      
      * tag 'block-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        block: delay freeing the gendisk
        blk-iocost: fix operation ordering in iocg_wake_fn()
        blk-mq-sched: Fix blk_mq_sched_alloc_tags() error handling
        loop: reintroduce global lock for safe loop_validate_file() traversal
      4669e13c
    • L
      Merge tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block · 27eb687b
      Linus Torvalds 提交于
      Pull io_uring fixes from Jens Axboe:
      
       - A fix for block backed reissue (me)
      
       - Reissue context hardening (me)
      
       - Async link locking fix (Pavel)
      
      * tag 'io_uring-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        io_uring: fix poll requests leaking second poll entries
        io_uring: don't block level reissue off completion path
        io_uring: always reissue from task_work context
        io_uring: fix race in unified task_work running
        io_uring: fix io_prep_async_link locking
      27eb687b
    • L
      Merge tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block · f6c5971b
      Linus Torvalds 提交于
      Pull libata fixlets from Jens Axboe:
      
       - A fix for PIO highmem (Christoph)
      
       - Kill HAVE_IDE as it's now unused (Lukas)
      
      * tag 'libata-5.14-2021-07-30' of git://git.kernel.dk/linux-block:
        arch: Kconfig: clean up obsolete use of HAVE_IDE
        libata: fix ata_pio_sector for CONFIG_HIGHMEM
      f6c5971b
    • L
      Merge tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux · 051df241
      Linus Torvalds 提交于
      Pull btrfs fixes from David Sterba:
      
       - fix -Warray-bounds warning, to help external patchset to make it
         default treewide
      
       - fix writeable device accounting (syzbot report)
      
       - fix fsync and log replay after a rename and inode eviction
      
       - fix potentially lost error code when submitting multiple bios for
         compressed range
      
      * tag 'for-5.14-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux:
        btrfs: calculate number of eb pages properly in csum_tree_block
        btrfs: fix rw device counting in __btrfs_free_extra_devids
        btrfs: fix lost inode on log replay after mix of fsync, rename and inode eviction
        btrfs: mark compressed range uptodate only if all bio succeed
      051df241
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid · 8723bc8f
      Linus Torvalds 提交于
      Pull HID fixes from Jiri Kosina:
      
       - resume timing fix for intel-ish driver (Ye Xiang)
      
       - fix for using incorrect MMIO register in amd_sfh driver (Dylan
         MacKenzie)
      
       - Cintiq 24HDT / 27QHDT regression fix and touch processing fix for
         Wacom driver (Jason Gerecke)
      
       - device removal bugfix for ft260 driver (Michael Zaidman)
      
       - other small assorted fixes
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
        HID: ft260: fix device removal due to USB disconnect
        HID: wacom: Skip processing of touches with negative slot values
        HID: wacom: Re-enable touch by default for Cintiq 24HDT / 27QHDT
        HID: Kconfig: Fix spelling mistake "Uninterruptable" -> "Uninterruptible"
        HID: apple: Add support for Keychron K1 wireless keyboard
        HID: fix typo in Kconfig
        HID: ft260: fix format type warning in ft260_word_show()
        HID: amd_sfh: Use correct MMIO register for DMA address
        HID: asus: Remove check for same LED brightness on set
        HID: intel-ish-hid: use async resume function
      8723bc8f
    • L
      Merge branch 'akpm' (patches from Andrew) · ad6ec09d
      Linus Torvalds 提交于
      Merge misc fixes from Andrew Morton:
       "7 patches.
      
        Subsystems affected by this patch series: lib, ocfs2, and mm (slub,
        migration, and memcg)"
      
      * emailed patches from Andrew Morton <akpm@linux-foundation.org>:
        mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook()
        slub: fix unreclaimable slab stat for bulk free
        mm/migrate: fix NR_ISOLATED corruption on 64-bit
        mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code
        ocfs2: issue zeroout to EOF blocks
        ocfs2: fix zero out valid data
        lib/test_string.c: move string selftest in the Runtime Testing menu
      ad6ec09d
    • J
      Merge tag 'linux-can-fixes-for-5.14-20210730' of... · 8d670412
      Jakub Kicinski 提交于
      Merge tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can
      
      Marc Kleine-Budde says:
      
      ====================
      pull-request: can 2021-07-30
      
      The first patch is by me and adds Yasushi SHOJI as a reviewer for the
      Microchip CAN BUS Analyzer Tool driver.
      
      Dan Carpenter's patch fixes a signedness bug in the hi311x driver.
      
      Pavel Skripkin provides 4 patches, the first targets the mcba_usb
      driver by adding the missing urb->transfer_dma initialization, which
      was broken in a previous commit. The last 3 patches fix a memory leak
      in the usb_8dev, ems_usb and esd_usb2 driver.
      
      * tag 'linux-can-fixes-for-5.14-20210730' of git://git.kernel.org/pub/scm/linux/kernel/git/mkl/linux-can:
        can: esd_usb2: fix memory leak
        can: ems_usb: fix memory leak
        can: usb_8dev: fix memory leak
        can: mcba_usb_start(): add missing urb->transfer_dma initialization
        can: hi311x: fix a signedness bug in hi3110_cmd()
        MAINTAINERS: add Yasushi SHOJI as reviewer for the Microchip CAN BUS Analyzer Tool driver
      ====================
      
      Link: https://lore.kernel.org/r/20210730070526.1699867-1-mkl@pengutronix.deSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      8d670412
    • W
      mm/memcg: fix NULL pointer dereference in memcg_slab_free_hook() · 121dffe2
      Wang Hai 提交于
      When I use kfree_rcu() to free a large memory allocated by kmalloc_node(),
      the following dump occurs.
      
        BUG: kernel NULL pointer dereference, address: 0000000000000020
        [...]
        Oops: 0000 [#1] SMP
        [...]
        Workqueue: events kfree_rcu_work
        RIP: 0010:__obj_to_index include/linux/slub_def.h:182 [inline]
        RIP: 0010:obj_to_index include/linux/slub_def.h:191 [inline]
        RIP: 0010:memcg_slab_free_hook+0x120/0x260 mm/slab.h:363
        [...]
        Call Trace:
          kmem_cache_free_bulk+0x58/0x630 mm/slub.c:3293
          kfree_bulk include/linux/slab.h:413 [inline]
          kfree_rcu_work+0x1ab/0x200 kernel/rcu/tree.c:3300
          process_one_work+0x207/0x530 kernel/workqueue.c:2276
          worker_thread+0x320/0x610 kernel/workqueue.c:2422
          kthread+0x13d/0x160 kernel/kthread.c:313
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
      
      When kmalloc_node() a large memory, page is allocated, not slab, so when
      freeing memory via kfree_rcu(), this large memory should not be used by
      memcg_slab_free_hook(), because memcg_slab_free_hook() is is used for
      slab.
      
      Using page_objcgs_check() instead of page_objcgs() in
      memcg_slab_free_hook() to fix this bug.
      
      Link: https://lkml.kernel.org/r/20210728145655.274476-1-wanghai38@huawei.com
      Fixes: 270c6a71 ("mm: memcontrol/slab: Use helpers to access slab page's memcg_data")
      Signed-off-by: NWang Hai <wanghai38@huawei.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NKefeng Wang <wangkefeng.wang@huawei.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Alexei Starovoitov <ast@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      121dffe2
    • S
      slub: fix unreclaimable slab stat for bulk free · f227f0fa
      Shakeel Butt 提交于
      SLUB uses page allocator for higher order allocations and update
      unreclaimable slab stat for such allocations.  At the moment, the bulk
      free for SLUB does not share code with normal free code path for these
      type of allocations and have missed the stat update.  So, fix the stat
      update by common code.  The user visible impact of the bug is the
      potential of inconsistent unreclaimable slab stat visible through
      meminfo and vmstat.
      
      Link: https://lkml.kernel.org/r/20210728155354.3440560-1-shakeelb@google.com
      Fixes: 6a486c0a ("mm, sl[ou]b: improve memory accounting")
      Signed-off-by: NShakeel Butt <shakeelb@google.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Acked-by: NRoman Gushchin <guro@fb.com>
      Reviewed-by: NMuchun Song <songmuchun@bytedance.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f227f0fa
    • A
      mm/migrate: fix NR_ISOLATED corruption on 64-bit · b5916c02
      Aneesh Kumar K.V 提交于
      Similar to commit 2da9f630 ("mm/vmscan: fix NR_ISOLATED_FILE
      corruption on 64-bit") avoid using unsigned int for nr_pages.  With
      unsigned int type the large unsigned int converts to a large positive
      signed long.
      
      Symptoms include CMA allocations hanging forever due to
      alloc_contig_range->...->isolate_migratepages_block waiting forever in
      "while (unlikely(too_many_isolated(pgdat)))".
      
      Link: https://lkml.kernel.org/r/20210728042531.359409-1-aneesh.kumar@linux.ibm.com
      Fixes: c5fc5c3a ("mm: migrate: account THP NUMA migration counters correctly")
      Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Reported-by: NMichael Ellerman <mpe@ellerman.id.au>
      Reported-by: NAlexey Kardashevskiy <aik@ozlabs.ru>
      Reviewed-by: NYang Shi <shy828301@gmail.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: David Hildenbrand <david@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b5916c02
    • J
      mm: memcontrol: fix blocking rstat function called from atomic cgroup1 thresholding code · 30def935
      Johannes Weiner 提交于
      Dan Carpenter reports:
      
          The patch 2d146aa3: "mm: memcontrol: switch to rstat" from Apr
          29, 2021, leads to the following static checker warning:
      
      	    kernel/cgroup/rstat.c:200 cgroup_rstat_flush()
      	    warn: sleeping in atomic context
      
          mm/memcontrol.c
            3572  static unsigned long mem_cgroup_usage(struct mem_cgroup *memcg, bool swap)
            3573  {
            3574          unsigned long val;
            3575
            3576          if (mem_cgroup_is_root(memcg)) {
            3577                  cgroup_rstat_flush(memcg->css.cgroup);
      			    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
          This is from static analysis and potentially a false positive.  The
          problem is that mem_cgroup_usage() is called from __mem_cgroup_threshold()
          which holds an rcu_read_lock().  And the cgroup_rstat_flush() function
          can sleep.
      
            3578                  val = memcg_page_state(memcg, NR_FILE_PAGES) +
            3579                          memcg_page_state(memcg, NR_ANON_MAPPED);
            3580                  if (swap)
            3581                          val += memcg_page_state(memcg, MEMCG_SWAP);
            3582          } else {
            3583                  if (!swap)
            3584                          val = page_counter_read(&memcg->memory);
            3585                  else
            3586                          val = page_counter_read(&memcg->memsw);
            3587          }
            3588          return val;
            3589  }
      
      __mem_cgroup_threshold() indeed holds the rcu lock.  In addition, the
      thresholding code is invoked during stat changes, and those contexts
      have irqs disabled as well.  If the lock breaking occurs inside the
      flush function, it will result in a sleep from an atomic context.
      
      Use the irqsafe flushing variant in mem_cgroup_usage() to fix this.
      
      Link: https://lkml.kernel.org/r/20210726150019.251820-1-hannes@cmpxchg.org
      Fixes: 2d146aa3 ("mm: memcontrol: switch to rstat")
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Acked-by: NChris Down <chris@chrisdown.name>
      Reviewed-by: NRik van Riel <riel@surriel.com>
      Acked-by: NMichal Hocko <mhocko@suse.com>
      Reviewed-by: NShakeel Butt <shakeelb@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      30def935
    • J
      ocfs2: issue zeroout to EOF blocks · 9449ad33
      Junxiao Bi 提交于
      For punch holes in EOF blocks, fallocate used buffer write to zero the
      EOF blocks in last cluster.  But since ->writepage will ignore EOF
      pages, those zeros will not be flushed.
      
      This "looks" ok as commit 6bba4471 ("ocfs2: fix data corruption by
      fallocate") will zero the EOF blocks when extend the file size, but it
      isn't.  The problem happened on those EOF pages, before writeback, those
      pages had DIRTY flag set and all buffer_head in them also had DIRTY flag
      set, when writeback run by write_cache_pages(), DIRTY flag on the page
      was cleared, but DIRTY flag on the buffer_head not.
      
      When next write happened to those EOF pages, since buffer_head already
      had DIRTY flag set, it would not mark page DIRTY again.  That made
      writeback ignore them forever.  That will cause data corruption.  Even
      directio write can't work because it will fail when trying to drop pages
      caches before direct io, as it found the buffer_head for those pages
      still had DIRTY flag set, then it will fall back to buffer io mode.
      
      To make a summary of the issue, as writeback ingores EOF pages, once any
      EOF page is generated, any write to it will only go to the page cache,
      it will never be flushed to disk even file size extends and that page is
      not EOF page any more.  The fix is to avoid zero EOF blocks with buffer
      write.
      
      The following code snippet from qemu-img could trigger the corruption.
      
        656   open("6b3711ae-3306-4bdd-823c-cf1c0060a095.conv.2", O_RDWR|O_DIRECT|O_CLOEXEC) = 11
        ...
        660   fallocate(11, FALLOC_FL_KEEP_SIZE|FALLOC_FL_PUNCH_HOLE, 2275868672, 327680 <unfinished ...>
        660   fallocate(11, 0, 2275868672, 327680) = 0
        658   pwrite64(11, "
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-2-junxiao.bi@oracle.comSigned-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      9449ad33
    • J
      ocfs2: fix zero out valid data · f267aeb6
      Junxiao Bi 提交于
      If append-dio feature is enabled, direct-io write and fallocate could
      run in parallel to extend file size, fallocate used "orig_isize" to
      record i_size before taking "ip_alloc_sem", when
      ocfs2_zeroout_partial_cluster() zeroout EOF blocks, i_size maybe already
      extended by ocfs2_dio_end_io_write(), that will cause valid data zeroed
      out.
      
      Link: https://lkml.kernel.org/r/20210722054923.24389-1-junxiao.bi@oracle.com
      Fixes: 6bba4471 ("ocfs2: fix data corruption by fallocate")
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Reviewed-by: NJoseph Qi <joseph.qi@linux.alibaba.com>
      Cc: Changwei Ge <gechangwei@live.cn>
      Cc: Gang He <ghe@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Jun Piao <piaojun@huawei.com>
      Cc: Mark Fasheh <mark@fasheh.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f267aeb6
    • M
      lib/test_string.c: move string selftest in the Runtime Testing menu · b2ff70a0
      Matteo Croce 提交于
      STRING_SELFTEST is presented in the "Library routines" menu.  Move it in
      Kernel hacking > Kernel Testing and Coverage > Runtime Testing together
      with other similar tests found in lib/
      
      	--- Runtime Testing
      	<*>   Test functions located in the hexdump module at runtime
      	<*>   Test string functions (NEW)
      	<*>   Test functions located in the string_helpers module at runtime
      	<*>   Test strscpy*() family of functions at runtime
      	<*>   Test kstrto*() family of functions at runtime
      	<*>   Test printf() family of functions at runtime
      	<*>   Test scanf() family of functions at runtime
      
      Link: https://lkml.kernel.org/r/20210719185158.190371-1-mcroce@linux.microsoft.comSigned-off-by: NMatteo Croce <mcroce@microsoft.com>
      Cc: Peter Rosin <peda@axentia.se>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b2ff70a0
    • C
      gve: Update MAINTAINERS list · 028a7177
      Catherine Sullivan 提交于
      The team maintaining the gve driver has undergone some changes,
      this updates the MAINTAINERS file accordingly.
      Signed-off-by: NCatherine Sullivan <csully@google.com>
      Signed-off-by: NJon Olson <jonolson@google.com>
      Signed-off-by: NDavid Awogbemila <awogbemila@google.com>
      Signed-off-by: NJeroen de Borst <jeroendb@google.com>
      Link: https://lore.kernel.org/r/20210729155258.442650-1-csully@google.comSigned-off-by: NJakub Kicinski <kuba@kernel.org>
      028a7177
  6. 30 7月, 2021 1 次提交