提交 · 330a5bf4551784f2f0baef27f2c78550e40fc8a0 · openeuler / Kernel

14 3月, 2022 15 次提交

btrfs: use dev_t to match device in device_matched · 330a5bf4

由 Anand Jain 提交于 1月 12, 2022

Commit "btrfs: add device major-minor info in the struct btrfs_device"
saved the device major-minor number in the struct btrfs_device upon
discovering it.

So no need to lookup_bdev() again just match, which means
device_matched() can go away.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

330a5bf4

btrfs: add device major-minor info in the struct btrfs_device · 4889bc05

由 Anand Jain 提交于 1月 12, 2022

Internally it is common to use the major-minor number to identify a
device and, at a few locations in btrfs, we use the major-minor number
to match the device.

So when we identify a new btrfs device through device add or device
replace or device-scan/ready save the device's major-minor (dev_t) in the
struct btrfs_device so that we don't have to call lookup_bdev() again.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

4889bc05

btrfs: match stale devices by dev_t · 16cab91a

由 Anand Jain 提交于 1月 12, 2022

After the commit "btrfs: harden identification of the stale device", we
don't have to match the device path anymore. Instead, we match the dev_t.
So pass in the dev_t instead of the device path, in the call chain
btrfs_forget_devices()->btrfs_free_stale_devices().
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

16cab91a

btrfs: harden identification of a stale device · 770c79fb

由 Anand Jain 提交于 1月 12, 2022

Identifying and removing the stale device from the fs_uuids list is done
by btrfs_free_stale_devices().  btrfs_free_stale_devices() in turn
depends on device_path_matched() to check if the device appears in more
than one btrfs_device structure.

The matching of the device happens by its path, the device path. However,
when device mapper is in use, the dm device paths are nothing but a link
to the actual block device, which leads to the device_path_matched()
failing to match.

Fix this by matching the dev_t as provided by lookup_bdev() instead of
plain string compare of the device paths.
Reported-by: NJosef Bacik <josef@toxicpanda.com>
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

770c79fb

btrfs: simplify fs_devices member access in btrfs_init_dev_replace_tgtdev · bef16b52

由 Anand Jain 提交于 1月 17, 2022

In btrfs_init_dev_replace_tgtdev() we dereference fs_info to get
fs_devices many times, instead save a point to the fs_devices.
Signed-off-by: NAnand Jain <anand.jain@oracle.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

bef16b52

btrfs: reuse existing inode from btrfs_ioctl · 9ad12305

由 Sahil Kang 提交于 1月 15, 2022

btrfs_ioctl extracts inode from file so we can pass that into the
callbacks.
Signed-off-by: NSahil Kang <sahil.kang@asilaycomputing.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

9ad12305

btrfs: move missing device handling in a dedicate function · ff37c89f

由 Nikolay Borisov 提交于 1月 11, 2022

This simplifies the code flow in read_one_chunk and makes error handling
when handling missing devices a bit simpler by reducing it to a single
check if something went wrong. No functional changes.
Reviewed-by: NSu Yue <l@damenly.su>
Signed-off-by: NNikolay Borisov <nborisov@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

ff37c89f

btrfs: stop trying to log subdirectories created in past transactions · de6bc7f5

由 Filipe Manana 提交于 12月 15, 2021

When logging a directory we are trying to log subdirectories that were
changed in the current transaction and created in a past transaction.
This type of behaviour was introduced by commit 2f2ff0ee ("Btrfs:
fix metadata inconsistencies after directory fsync"), to fix some metadata
inconsistencies that in the meanwhile no longer need this behaviour due to
numerous other changes that happened throughout the years.

This behaviour, besides not needed anymore, it's also undesirable because:

1) It's not reliable because it's only triggered for the directories
   of dentries (dir items) that happen to be present on a leaf that
   was changed in the current transaction. If a dentry that points to
   a directory resides on a leaf that was not changed in the current
   transaction, then it's not logged, as at log_dir_items() and
   log_new_dir_dentries() we use btrfs_search_forward();

2) It's not required by posix or any standard, it's undefined territory.
   The only way to guarantee a subdirectory is logged, it to explicitly
   fsync it;

Making the behaviour guaranteed would require scanning all directory
items, check which point to a directory, and then fsync each subdirectory
which was modified in the current transaction. This could be very
expensive for large directories with many subdirectories and/or large
subdirectories.

So remove that obsolete logic.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

de6bc7f5

btrfs: stop copying old dir items when logging a directory · 732d591a

由 Filipe Manana 提交于 12月 15, 2021

When logging a directory, we go over every leaf of the subvolume tree that
was changed in the current transaction and copy all its dir index keys to
the log tree.

That includes copying dir index keys created in past transactions. This is
done mostly for simplicity, as after logging the keys we log an item that
specifies the start and end ranges of the keys we logged. That item is
then used during log replay to figure out which keys need to be deleted -
every key in that range that we find in the subvolume tree and is not in
the log tree, needs to be deleted.

Now that we log only dir index keys, and not dir item keys anymore, when
we remove dentries from a directory (due to unlink and rename operations),
we can get entire leaves that we changed only for deleting old dir index
keys, or that have few dir index keys that are new - this is due to the
fact that the offset for new index keys comes from a monotonically
increasing counter.

We can avoid logging dir index keys from past transactions, and in order
to track the deletions, only log range items (BTRFS_DIR_LOG_INDEX_KEY key
type) when we find gaps between consecutive index keys. This massively
reduces the amount of logged metadata when we have deleted directory
entries, even if it's a small percentage of the total number of entries.
The reduction comes from both less items that are logged and instead of
logging many dir index items (struct btrfs_dir_item), which have a size
of 30 bytes plus a file name, we typically log just a few range items
(struct btrfs_dir_log_item), which take only 8 bytes each.

Even if no entries were deleted from a directory and only new entries
were added, we typically still get a reduction on the amount of logged
metadata, because it's very likely the first leaf that got the new
dir index entries also has several old dir index entries.

So change the logging logic to not log dir index keys created in past
transactions and log a range item for every gap it finds between each
pair of consecutive index keys, to ensure deletions are tracked and
replayed on log replay.

This patch is part of a patchset comprised of the following patches:

 1/4 btrfs: don't log unnecessary boundary keys when logging directory
 2/4 btrfs: put initial index value of a directory in a constant
 3/4 btrfs: stop copying old dir items when logging a directory
 4/4 btrfs: stop trying to log subdirectories created in past transactions

The following test was run on a branch without this patchset and on a
branch with the first three patches applied:

  $ cat test.sh
  #!/bin/bash

  DEV=/dev/nvme0n1
  MNT=/mnt/nvme0n1

  NUM_FILES=1000000
  NUM_FILE_DELETES=10000

  MKFS_OPTIONS="-O no-holes -R free-space-tree"
  MOUNT_OPTIONS="-o ssd"

  mkfs.btrfs -f $MKFS_OPTIONS $DEV
  mount $MOUNT_OPTIONS $DEV $MNT

  mkdir $MNT/testdir
  for ((i = 1; i <= $NUM_FILES; i++)); do
      echo -n > $MNT/testdir/file_$i
  done

  sync

  del_inc=$(( $NUM_FILES / $NUM_FILE_DELETES ))
  for ((i = 1; i <= $NUM_FILES; i += $del_inc)); do
      rm -f $MNT/testdir/file_$i
  done

  start=$(date +%s%N)
  xfs_io -c "fsync" $MNT/testdir
  end=$(date +%s%N)

  dur=$(( (end - start) / 1000000 ))
  echo "dir fsync took $dur ms after deleting $NUM_FILE_DELETES files"
  echo

  umount $MNT

The test was run on a non-debug kernel (Debian's default kernel config),
and the results were the following for various values of NUM_FILES and
NUM_FILE_DELETES:

** before, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **

dir fsync took 585 ms after deleting 10000 files

** after, NUM_FILES = 1 000 000, NUM_FILE_DELETES = 10 000 **

dir fsync took 34 ms after deleting 10000 files   (-94.2%)

** before, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **

dir fsync took 50 ms after deleting 1000 files

** after, NUM_FILES = 100 000, NUM_FILE_DELETES = 1 000 **

dir fsync took 7 ms after deleting 1000 files    (-86.0%)

** before, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **

dir fsync took 9 ms after deleting 100 files

** after, NUM_FILES = 10 000, NUM_FILE_DELETES = 100 **

dir fsync took 5 ms after deleting 100 files     (-44.4%)
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

732d591a

btrfs: put initial index value of a directory in a constant · 528ee697

由 Filipe Manana 提交于 12月 15, 2021

At btrfs_set_inode_index_count() we refer twice to the number 2 as the
initial index value for a directory (when it's empty), with a proper
comment explaining the reason for that value. In the next patch I'll
have to use that magic value in the directory logging code, so put
the value in a #define at btrfs_inode.h, to avoid hardcoding the
magic value again at tree-log.c.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

528ee697

btrfs: don't log unnecessary boundary keys when logging directory · a450a4af

由 Filipe Manana 提交于 12月 15, 2021

Before we start to log dir index keys from a leaf, we check if there is a
previous index key, which normally is at the end of a leaf that was not
changed in the current transaction. Then we log that key and set the start
of logged range (item of type BTRFS_DIR_LOG_INDEX_KEY) to the offset of
that key. This is to ensure that if there were deleted index keys between
that key and the first key we are going to log, those deletions are
replayed in case we need to replay to the log after a power failure.
However we really don't need to log that previous key, we can just set the
start of the logged range to that key's offset plus 1. This achieves the
same and avoids logging one dir index key.

The same logic is performed when we finish logging the index keys of a
leaf and we find that the next leaf has index keys and was not changed in
the current transaction. We are logging the first key of that next leaf
and use its offset as the end of range we log. This is just to ensure that
if there were deleted index keys between the last index key we logged and
the first key of that next leaf, those index keys are deleted if we end
up replaying the log. However that is not necessary, we can avoid logging
that first index key of the next leaf and instead set the end of the
logged range to match the offset of that index key minus 1.

So avoid logging those index keys at the boundaries and adjust the start
and end offsets of the logged ranges as described above.

This patch is part of a patchset comprised of the following patches:

1/4 btrfs: don't log unnecessary boundary keys when logging directory
2/4 btrfs: put initial index value of a directory in a constant
3/4 btrfs: stop copying old dir items when logging a directory
4/4 btrfs: stop trying to log subdirectories created in past transactions

Performance test results are listed in the changelog of patch 3/4.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

a450a4af

btrfs: reuse existing pointers from btrfs_ioctl · dc408ccd

由 Sahil Kang 提交于 1月 05, 2022

btrfs_ioctl already contains pointers to the inode and btrfs_root
structs, so we can pass them into the subfunctions instead of the
toplevel struct file.
Signed-off-by: NSahil Kang <sahil.kang@asilaycomputing.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

dc408ccd

btrfs: remove write and wait of struct walk_control · c816d705

由 Filipe Manana 提交于 1月 04, 2022

The ->write and ->wait fields of struct walk_control, used for log trees,
are not used since 2008, more specifically since commit d0c803c4
("Btrfs: Record dirty pages tree-log pages in an extent_io tree") and
since commit d0c803c4 ("Btrfs: Record dirty pages tree-log pages in
an extent_io tree"). So just remove them, along with the function
btrfs_write_tree_block(), which is also not used anymore after removing
the ->write member.
Signed-off-by: NFilipe Manana <fdmanana@suse.com>
Reviewed-by: NDavid Sterba <dsterba@suse.com>
Signed-off-by: NDavid Sterba <dsterba@suse.com>

c816d705

L

Linux 5.17-rc8 · 09688c01
由 Linus Torvalds 提交于 3月 13, 2022

09688c01

Merge tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · f0e18b03

由 Linus Torvalds 提交于 3月 13, 2022

Pull x86 fixes from Borislav Petkov:

 - Free shmem backing storage for SGX enclave pages when those are
   swapped back into EPC memory

 - Prevent do_int3() from being kprobed, to avoid recursion

 - Remap setup_data and setup_indirect structures properly when
   accessing their members

 - Correct the alternatives patching order for modules too

* tag 'x86_urgent_for_v5.17_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sgx: Free backing memory after faulting the enclave page
  x86/traps: Mark do_int3() NOKPROBE_SYMBOL
  x86/boot: Add setup_indirect support in early_memremap_is_setup_data()
  x86/boot: Fix memremap of setup_indirect structures
  x86/module: Fix the paravirt vs alternative order

f0e18b03

13 3月, 2022 2 次提交

Merge tag 'perf-tools-fixes-for-v5.17-2022-03-12' of... · aad611a8

由 Linus Torvalds 提交于 3月 12, 2022

Merge tag 'perf-tools-fixes-for-v5.17-2022-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux

Pull perf tools fixes from Arnaldo Carvalho de Melo:

 - Fix event parser error for hybrid systems

 - Fix NULL check against wrong variable in 'perf bench' and in the
   parsing code

 - Update arm64 KVM headers from the kernel sources

 - Sync cpufeatures header with the kernel sources

* tag 'perf-tools-fixes-for-v5.17-2022-03-12' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux:
  perf parse: Fix event parser error for hybrid systems
  perf bench: Fix NULL check against wrong variable
  perf parse-events: Fix NULL check against wrong variable
  tools headers cpufeatures: Sync with the kernel sources
  tools kvm headers arm64: Update KVM headers from the kernel sources

aad611a8

Merge tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drm · 1518a4f6

由 Linus Torvalds 提交于 3月 12, 2022

Pull drm kconfig fix from Dave Airlie:
 "Thorsten pointed out this had fallen down the cracks and was in -next
  only, I've picked it out, fixed up it's Fixes: line.

   - fix regression in Kconfig"

* tag 'drm-fixes-2022-03-12' of git://anongit.freedesktop.org/drm/drm:
  drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP

1518a4f6

12 3月, 2022 23 次提交

perf parse: Fix event parser error for hybrid systems · 91c9923a

由 Zhengjun Xing 提交于 3月 07, 2022

This bug happened on hybrid systems when both cpu_core and cpu_atom
have the same event name such as "UOPS_RETIRED.MS" while their event
terms are different, then during perf stat, the event for cpu_atom
will parse fail and then no output for cpu_atom.

UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/

It is because event terms in the "head" of parse_events_multi_pmu_add
will be changed to event terms for cpu_core after parsing UOPS_RETIRED.MS
for cpu_core, then when parsing the same event for cpu_atom, it still
uses the event terms for cpu_core, but event terms for cpu_atom are
different with cpu_core, the event parses for cpu_atom will fail. This
patch fixes it, the event terms should be parsed from the original
event.

This patch can work for the hybrid systems that have the same event
in more than 2 PMUs. It also can work in non-hybrid systems.

Before:

  # perf stat -v  -e  UOPS_RETIRED.MS  -a sleep 1

  Using CPUID GenuineIntel-6-97-1
  UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
  Control descriptor is not initialized
  UOPS_RETIRED.MS: 2737845 16068518485 16068518485

 Performance counter stats for 'system wide':

         2,737,845      cpu_core/UOPS_RETIRED.MS/

       1.002553850 seconds time elapsed

After:

  # perf stat -v  -e  UOPS_RETIRED.MS  -a sleep 1

  Using CPUID GenuineIntel-6-97-1
  UOPS_RETIRED.MS -> cpu_core/period=0x1e8483,umask=0x4,event=0xc2,frontend=0x8/
  UOPS_RETIRED.MS -> cpu_atom/period=0x1e8483,umask=0x1,event=0xc2/
  Control descriptor is not initialized
  UOPS_RETIRED.MS: 1977555 16076950711 16076950711
  UOPS_RETIRED.MS: 568684 8038694234 8038694234

 Performance counter stats for 'system wide':

         1,977,555      cpu_core/UOPS_RETIRED.MS/
           568,684      cpu_atom/UOPS_RETIRED.MS/

       1.004758259 seconds time elapsed

Fixes: fb081153 ("perf parse-events: Allow config on kernel PMU events")
Reviewed-by: NKan Liang <kan.liang@linux.intel.com>
Signed-off-by: NZhengjun Xing <zhengjun.xing@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220307151627.30049-1-zhengjun.xing@linux.intel.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

91c9923a

perf bench: Fix NULL check against wrong variable · 073a15c3

由因上努力果上随缘提交于 3月 11, 2022

We did a NULL check after "epollfdp = calloc(...)", but we checked
"epollfd" instead of "epollfdp".
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Acked-by: NDavidlohr Bueso <dave@stgolabs.net>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Davidlohr Bueso <dbueso@suse.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/tencent_B5D64530EB9C7DBB8D2C88A0C790F1489D0A@qq.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

073a15c3

perf parse-events: Fix NULL check against wrong variable · a7a72631

由因上努力果上随缘提交于 3月 11, 2022

We did a null check after "tmp->symbol = strdup(...)", but we checked
"list->symbol" other than "tmp->symbol".
Reviewed-by: NJohn Garry <john.garry@huawei.com>
Signed-off-by: Weiguo Li <liwg06@foxmail.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/tencent_DF39269807EC9425E24787E6DB632441A405@qq.comSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

a7a72631

tools headers cpufeatures: Sync with the kernel sources · ec9d50ac

由 Arnaldo Carvalho de Melo 提交于 7月 01, 2021

To pick the changes from:

d45476d9 ("x86/speculation: Rename RETPOLINE_AMD to RETPOLINE_LFENCE")

Its just a comment fixup.

This only causes these perf files to be rebuilt:

CC /tmp/build/perf/bench/mem-memcpy-x86-64-asm.o
CC /tmp/build/perf/bench/mem-memset-x86-64-asm.o

And addresses this perf build warning:

Warning: Kernel ABI header at 'tools/arch/x86/include/asm/cpufeatures.h' differs from latest version at 'arch/x86/include/asm/cpufeatures.h'
diff -u tools/arch/x86/include/asm/cpufeatures.h arch/x86/include/asm/cpufeatures.h

Cc: Borislav Petkov <bp@suse.de>
Cc: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/lkml/YiyiHatGaJQM7l/Y@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

ec9d50ac

tools kvm headers arm64: Update KVM headers from the kernel sources · 3ec94eea

由 Arnaldo Carvalho de Melo 提交于 12月 21, 2020

To pick the changes from:

a5905d6a ("KVM: arm64: Allow SMCCC_ARCH_WORKAROUND_3 to be discovered and migrated")

That don't causes any changes in tooling (when built on x86), only
addresses this perf build warning:

Warning: Kernel ABI header at 'tools/arch/arm64/include/uapi/asm/kvm.h' differs from latest version at 'arch/arm64/include/uapi/asm/kvm.h'
diff -u tools/arch/arm64/include/uapi/asm/kvm.h arch/arm64/include/uapi/asm/kvm.h

Cc: James Morse <james.morse@arm.com>
Link: https://lore.kernel.org/lkml/YiyhAK6sVPc83FaI@kernel.orgSigned-off-by: NArnaldo Carvalho de Melo <acme@redhat.com>

3ec94eea

drm/panel: Select DRM_DP_HELPER for DRM_PANEL_EDP · 3755d35e

由 Thomas Zimmermann 提交于 2月 03, 2022

As reported in [1], DRM_PANEL_EDP depends on DRM_DP_HELPER. Select
the option to fix the build failure. The error message is shown
below.

  arm-linux-gnueabihf-ld: drivers/gpu/drm/panel/panel-edp.o: in function
    `panel_edp_probe': panel-edp.c:(.text+0xb74): undefined reference to
    `drm_panel_dp_aux_backlight'
  make[1]: *** [/builds/linux/Makefile:1222: vmlinux] Error 1

The issue has been reported before, when DisplayPort helpers were
hidden behind the option CONFIG_DRM_KMS_HELPER. [2]

v2:
	* fix and expand commit description (Arnd)
Signed-off-by: NThomas Zimmermann <tzimmermann@suse.de>
Fixes: 9d6366e7 ("drm: fb_helper: improve CONFIG_FB dependency")
Reported-by: NNaresh Kamboju <naresh.kamboju@linaro.org>
Reported-by: NLinux Kernel Functional Testing <lkft@linaro.org>
Reviewed-by: NLyude Paul <lyude@redhat.com>
Acked-by: NSam Ravnborg <sam@ravnborg.org>
Link: https://lore.kernel.org/dri-devel/CA+G9fYvN0NyaVkRQmA1O6rX7H8PPaZrUAD7=RDy33QY9rUU-9g@mail.gmail.com/ # [1]
Link: https://lore.kernel.org/all/20211117062704.14671-1-rdunlap@infradead.org/ # [2]
Cc: Thomas Zimmermann <tzimmermann@suse.de>
Cc: Lyude Paul <lyude@redhat.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Maxime Ripard <mripard@kernel.org>
Cc: dri-devel@lists.freedesktop.org
Link: https://patchwork.freedesktop.org/patch/msgid/20220203093922.20754-1-tzimmermann@suse.deSigned-off-by: NDave Airlie <airlied@redhat.com>

3755d35e

ARM: Spectre-BHB: provide empty stub for non-config · 68453767

由 Randy Dunlap 提交于 3月 11, 2022

When CONFIG_GENERIC_CPU_VULNERABILITIES is not set, references
to spectre_v2_update_state() cause a build error, so provide an
empty stub for that function when the Kconfig option is not set.

Fixes this build error:

  arm-linux-gnueabi-ld: arch/arm/mm/proc-v7-bugs.o: in function `cpu_v7_bugs_init':
  proc-v7-bugs.c:(.text+0x52): undefined reference to `spectre_v2_update_state'
  arm-linux-gnueabi-ld: proc-v7-bugs.c:(.text+0x82): undefined reference to `spectre_v2_update_state'

Fixes: b9baf5c8 ("ARM: Spectre-BHB workaround")
Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
Reported-by: Nkernel test robot <lkp@intel.com>
Cc: Russell King <rmk+kernel@armlinux.org.uk>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: patches@armlinux.org.uk
Acked-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

68453767

Merge tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux · 77fe1ba9

由 Linus Torvalds 提交于 3月 11, 2022

Pull RISC-V fixes from Palmer Dabbelt:

 - prevent users from enabling the alternatives framework (and thus
   errata handling) on XIP kernels, where runtime code patching does not
   function correctly.

 - properly detect offset overflow for AUIPC-based relocations in
   modules. This may manifest as modules calling arbitrary invalid
   addresses, depending on the address allocated when a module is
   loaded.

* tag 'riscv-for-linus-5.17-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux:
  riscv: Fix auipc+jalr relocation range checks
  riscv: alternative only works on !XIP_KERNEL

77fe1ba9

Merge tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux · 878409ec

由 Linus Torvalds 提交于 3月 11, 2022

Pull powerpc fix from Michael Ellerman:
 "Fix STACKTRACE=n build, in particular for skiroot_defconfig"

* tag 'powerpc-5.17-6' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux:
  powerpc: Fix STACKTRACE=n build

878409ec

ARM: fix Thumb2 regression with Spectre BHB · 6c7cb60b

由 Russell King (Oracle) 提交于 3月 11, 2022

When building for Thumb2, the vectors make use of a local label. Sadly,
the Spectre BHB code also uses a local label with the same number which
results in the Thumb2 reference pointing at the wrong place. Fix this
by changing the number used for the Spectre BHB local label.

Fixes: b9baf5c8 ("ARM: Spectre-BHB workaround")
Tested-by: NNathan Chancellor <nathan@kernel.org>
Signed-off-by: NRussell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

6c7cb60b

Merge tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc · 3977a3fb

由 Linus Torvalds 提交于 3月 11, 2022

Pull MMC fixes from Ulf Hansson:
 "MMC core:
   - Restore (mostly) the busy polling for MMC_SEND_OP_COND

  MMC host:
   - meson-gx: Fix DMA usage of meson_mmc_post_req()"

* tag 'mmc-v5.17-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/mmc:
  mmc: core: Restore (almost) the busy polling for MMC_SEND_OP_COND
  mmc: meson: Fix usage of meson_mmc_post_req()

3977a3fb

x86/sgx: Free backing memory after faulting the enclave page · 08999b24

由 Jarkko Sakkinen 提交于 3月 04, 2022

There is a limited amount of SGX memory (EPC) on each system.  When that
memory is used up, SGX has its own swapping mechanism which is similar
in concept but totally separate from the core mm/* code.  Instead of
swapping to disk, SGX swaps from EPC to normal RAM.  That normal RAM
comes from a shared memory pseudo-file and can itself be swapped by the
core mm code.  There is a hierarchy like this:

	EPC <-> shmem <-> disk

After data is swapped back in from shmem to EPC, the shmem backing
storage needs to be freed.  Currently, the backing shmem is not freed.
This effectively wastes the shmem while the enclave is running.  The
memory is recovered when the enclave is destroyed and the backing
storage freed.

Sort this out by freeing memory with shmem_truncate_range(), as soon as
a page is faulted back to the EPC.  In addition, free the memory for
PCMD pages as soon as all PCMD's in a page have been marked as unused
by zeroing its contents.

Cc: stable@vger.kernel.org
Fixes: 1728ab54 ("x86/sgx: Add a page reclaimer")
Reported-by: NDave Hansen <dave.hansen@linux.intel.com>
Signed-off-by: NJarkko Sakkinen <jarkko@kernel.org>
Signed-off-by: NDave Hansen <dave.hansen@linux.intel.com>
Link: https://lkml.kernel.org/r/20220303223859.273187-1-jarkko@kernel.org

08999b24

Merge branch 'davidh' (fixes from David Howells) · 93ce9358

由 Linus Torvalds 提交于 3月 11, 2022

Merge misc fixes from David Howells:
 "A set of patches for watch_queue filter issues noted by Jann. I've
  added in a cleanup patch from Christophe Jaillet to convert to using
  formal bitmap specifiers for the note allocation bitmap.

  Also two filesystem fixes (afs and cachefiles)"

* emailed patches from David Howells <dhowells@redhat.com>:
  cachefiles: Fix volume coherency attribute
  afs: Fix potential thrashing in afs writeback
  watch_queue: Make comment about setting ->defunct more accurate
  watch_queue: Fix lack of barrier/sync/lock between post and read
  watch_queue: Free the alloc bitmap when the watch_queue is torn down
  watch_queue: Fix the alloc bitmap size to reflect notes allocated
  watch_queue: Use the bitmap API when applicable
  watch_queue: Fix to always request a pow-of-2 pipe ring size
  watch_queue: Fix to release page in ->release()
  watch_queue, pipe: Free watchqueue state after clearing pipe ring
  watch_queue: Fix filter limit check

93ce9358

cachefiles: Fix volume coherency attribute · 413a4a6b

由 David Howells 提交于 3月 11, 2022

A network filesystem may set coherency data on a volume cookie, and if
given, cachefiles will store this in an xattr on the directory in the
cache corresponding to the volume.

The function that sets the xattr just stores the contents of the volume
coherency buffer directly into the xattr, with nothing added; the
checking function, on the other hand, has a cut'n'paste error whereby it
tries to interpret the xattr contents as would be the xattr on an
ordinary file (using the cachefiles_xattr struct).  This results in a
failure to match the coherency data because the buffer ends up being
shifted by 18 bytes.

Fix this by defining a structure specifically for the volume xattr and
making both the setting and checking functions use it.

Since the volume coherency doesn't work if used, take the opportunity to
insert a reserved field for future use, set it to 0 and check that it is
0.  Log mismatch through the appropriate tracepoint.

Note that this only affects cifs; 9p, afs, ceph and nfs don't use the
volume coherency data at the moment.

Fixes: 32e15003 ("fscache, cachefiles: Store the volume coherency data")
Reported-by: NRohith Surabattula <rohiths.msft@gmail.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Reviewed-by: NJeff Layton <jlayton@kernel.org>
cc: Steve French <smfrench@gmail.com>
cc: linux-cifs@vger.kernel.org
cc: linux-cachefs@redhat.com
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

413a4a6b

afs: Fix potential thrashing in afs writeback · 173ce1ca

由 David Howells 提交于 3月 11, 2022

In afs_writepages_region(), if the dirty page we find is undergoing
writeback or write to cache, but the sync_mode is WB_SYNC_NONE, we go
round the loop trying the same page again and again with no pausing or
waiting unless and until another thread manages to clear the writeback
and fscache flags.

Fix this with three measures:

 (1) Advance start to after the page we found.

 (2) Break out of the loop and return if rescheduling is requested.

 (3) Arbitrarily give up after a maximum of 5 skips.

Fixes: 31143d5d ("AFS: implement basic file write support")
Reported-by: NMarc Dionne <marc.dionne@auristor.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Tested-by: NMarc Dionne <marc.dionne@auristor.com>
Acked-by: NMarc Dionne <marc.dionne@auristor.com>
Link: https://lore.kernel.org/r/164692725757.2097000.2060513769492301854.stgit@warthog.procyon.org.uk/ # v1
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

173ce1ca

x86/traps: Mark do_int3() NOKPROBE_SYMBOL · a365a65f

由 Li Huafei 提交于 3月 10, 2022

Since kprobe_int3_handler() is called in do_int3(), probing do_int3()
can cause a breakpoint recursion and crash the kernel. Therefore,
do_int3() should be marked as NOKPROBE_SYMBOL.

Fixes: 21e28290 ("x86/traps: Split int3 handler up")
Signed-off-by: NLi Huafei <lihuafei1@huawei.com>
Signed-off-by: NBorislav Petkov <bp@suse.de>
Acked-by: NMasami Hiramatsu <mhiramat@kernel.org>
Cc: <stable@vger.kernel.org>
Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com

a365a65f

watch_queue: Make comment about setting ->defunct more accurate · 4edc0760

由 David Howells 提交于 3月 11, 2022

watch_queue_clear() has a comment stating that setting ->defunct to true
preventing new additions as well as preventing notifications.  Whilst
the latter is true, the first bit is superfluous since at the time this
function is called, the pipe cannot be accessed to add new event
sources.

Remove the "new additions" bit from the comment.

Fixes: c73be61c ("pipe: Add general notification queue support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

4edc0760

watch_queue: Fix lack of barrier/sync/lock between post and read · 2ed147f0

由 David Howells 提交于 3月 11, 2022

There's nothing to synchronise post_one_notification() versus
pipe_read().  Whilst posting is done under pipe->rd_wait.lock, the
reader only takes pipe->mutex which cannot bar notification posting as
that may need to be made from contexts that cannot sleep.

Fix this by setting pipe->head with a barrier in post_one_notification()
and reading pipe->head with a barrier in pipe_read().

If that's not sufficient, the rd_wait.lock will need to be taken,
possibly in a ->confirm() op so that it only applies to notifications.
The lock would, however, have to be dropped before copy_page_to_iter()
is invoked.

Fixes: c73be61c ("pipe: Add general notification queue support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

2ed147f0

watch_queue: Free the alloc bitmap when the watch_queue is torn down · 7ea1a012

由 David Howells 提交于 3月 11, 2022

Free the watch_queue note allocation bitmap when the watch_queue is
destroyed.

Fixes: c73be61c ("pipe: Add general notification queue support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

7ea1a012

watch_queue: Fix the alloc bitmap size to reflect notes allocated · 3b4c0371

由 David Howells 提交于 3月 11, 2022

Currently, watch_queue_set_size() sets the number of notes available in
wqueue->nr_notes according to the number of notes allocated, but sets
the size of the bitmap to the unrounded number of notes originally asked
for.

Fix this by setting the bitmap size to the number of notes we're
actually going to make available (ie. the number allocated).

Fixes: c73be61c ("pipe: Add general notification queue support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

3b4c0371

watch_queue: Use the bitmap API when applicable · a66bd757

由 Christophe JAILLET 提交于 3月 11, 2022

Use bitmap_alloc() to simplify code, improve the semantic and reduce
some open-coded arithmetic in allocator arguments.

Also change a memset(0xff) into an equivalent bitmap_fill() to keep
consistency.
Signed-off-by: NChristophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

a66bd757

watch_queue: Fix to always request a pow-of-2 pipe ring size · 96a4d891

由 David Howells 提交于 3月 11, 2022

The pipe ring size must always be a power of 2 as the head and tail
pointers are masked off by AND'ing with the size of the ring - 1.
watch_queue_set_size(), however, lets you specify any number of notes
between 1 and 511. This number is passed through to pipe_resize_ring()
without checking/forcing its alignment.

Fix this by rounding the number of slots required up to the nearest
power of two. The request is meant to guarantee that at least that many
notifications can be generated before the queue is full, so rounding
down isn't an option, but, alternatively, it may be better to give an
error if we aren't allowed to allocate that much ring space.

Fixes: c73be61c ("pipe: Add general notification queue support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

96a4d891

watch_queue: Fix to release page in ->release() · c1853fba

由 David Howells 提交于 3月 11, 2022

When a pipe ring descriptor points to a notification message, the
refcount on the backing page is incremented by the generic get function,
but the release function, which marks the bitmap, doesn't drop the page
ref.

Fix this by calling generic_pipe_buf_release() at the end of
watch_queue_pipe_buf_release().

Fixes: c73be61c ("pipe: Add general notification queue support")
Reported-by: NJann Horn <jannh@google.com>
Signed-off-by: NDavid Howells <dhowells@redhat.com>
Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>

c1853fba

openeuler / Kernel 大约 1 年 前同步成功

openeuler / Kernel
大约 1 年前同步成功