- 30 October 2020 (2 commits)
-
-
Committed by Mauro Carvalho Chehab

Some identifiers have different names between their prototypes and the kernel-doc markup. Others need to be fixed, as kernel-doc markups should use this format: "identifier - description". In the specific case of __sta_info_flush(), add documentation for sta_info_flush(), as that is the one used outside sta_info.c.

Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Johannes Berg <johannes@sipsolutions.net>
Link: https://lore.kernel.org/r/978d35eef2dc76e21c81931804e4eaefbd6d635e.1603469755.git.mchehab+huawei@kernel.org
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
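For illustration of the required markup form, a kernel-doc block looks like this (the function and its parameter are hypothetical, not ones touched by this patch):

```c
struct widget;

/**
 * frob_widget - reset a widget to its power-on defaults
 * @w: the widget to reset
 *
 * The first line must follow the "identifier - description" form, and the
 * identifier must match the name in the prototype below, or kernel-doc
 * emits a warning.
 */
int frob_widget(struct widget *w);
```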
-
Committed by Gustavo A. R. Silva

In preparation to enable -Wimplicit-fallthrough for Clang, explicitly add break statements instead of letting the code fall through to the next case. This patch adds four break statements that, together, fix almost 40,000 warnings when building Linux 5.10-rc1 with Clang 12.0.0 and this[1] change reverted. Notice that in order to enable -Wimplicit-fallthrough for Clang, such change[1] is meant to be reverted at some point. So, this patch helps to move in that direction. Something important to mention is that there is currently a discrepancy between GCC and Clang when dealing with switch fall-through to empty case statements or to cases that only contain a break/continue/return statement[2][3][4]. Now that the -Wimplicit-fallthrough option has been globally enabled[5], any compiler should really warn on missing either a fallthrough annotation or any of the other case-terminating statements (break/continue/return/goto) when falling through to the next case statement. Making exceptions to this introduces variation in case handling which may continue to lead to bugs, misunderstandings, and a general lack of robustness. The point of enabling options like -Wimplicit-fallthrough is to prevent human error and aid developers in spotting bugs before their code is even built/submitted/committed, therefore eliminating classes of bugs. So, in order to really accomplish this, we should, and can, move in the direction of addressing any error-prone scenarios and get rid of the unintentional fallthrough bug-class in the kernel, entirely, even if there is some minor redundancy. Better to have explicit case-ending statements than continue to have exceptions where one must guess as to the right result. The compiler will eliminate any actual redundancy.

[1] commit e2079e93 ("kbuild: Do not enable -Wimplicit-fallthrough for clang for now")
[2] https://github.com/ClangBuiltLinux/linux/issues/636
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91432
[4] https://godbolt.org/z/xgkvIh
[5] commit a035d552 ("Makefile: Globally enable fall-through warning")

Co-developed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
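A minimal sketch of the pattern the warning enforces (generic example with made-up names, not one of the four sites this patch touches); every case must end in a terminating statement or an explicit annotation:

```c
#include <linux/compiler.h>	/* pulls in the fallthrough pseudo-keyword */

enum conn_state { CONN_INIT, CONN_RETRY, CONN_RUN };

static int conn_step(enum conn_state s)
{
	int retries = 0;

	switch (s) {
	case CONN_INIT:
		return 0;		/* a return also terminates the case */
	case CONN_RETRY:
		retries++;
		fallthrough;		/* intentional: must be annotated */
	case CONN_RUN:
		return retries;
	default:
		break;			/* explicit break, no silent fall-off */
	}
	return -1;
}
```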
-
- 29 October 2020 (4 commits)
-
-
Committed by David Howells

Fix afs_invalidatepage() to adjust the dirty region recorded in page->private when truncating a page. If the dirty region is entirely removed, then the private data is cleared and the page dirty state is cleared. Without this, if the page is truncated and then expanded again by truncate, zeros from the expanded, but no-longer dirty region may get written back to the server if the page gets laundered due to a conflicting 3rd-party write. It mustn't, however, shorten the dirty region of the page if that page is still mmapped and has been marked dirty by afs_page_mkwrite(), so a flag is stored in page->private to record this.

Fixes: 4343d008 ("afs: Get rid of the afs_writeback record")
Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells

The afs filesystem uses page->private to store the dirty range within a page such that in the event of a conflicting 3rd-party write to the server, we write back just the bits that got changed locally. However, there are a couple of problems with this:

(1) I need a bit to note if the page might be mapped so that partial invalidation doesn't shrink the range.

(2) There aren't necessarily sufficient bits to store the entire range of data altered (say it's a 32-bit system with 64KiB pages or transparent huge pages are in use).

So wrap the accesses in inline functions so that future commits can change how this works. Also move them out of the tracing header into the in-directory header. There's not really any need for them to be in the tracing header.

Signed-off-by: David Howells <dhowells@redhat.com>
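A minimal sketch of what such wrappers could look like; the helper names and the bit layout here are assumptions for illustration, and deliberately ignore the mapped flag and the wide-page encoding that the real in-directory header has to handle:

```c
/* Pack the dirty byte range into the unsigned long kept in page->private.
 * Callers go through these helpers instead of open-coding the encoding,
 * so a later commit can change the layout in one place.
 */
static inline unsigned long afs_page_dirty(unsigned int from, unsigned int to)
{
	return ((unsigned long)to << (BITS_PER_LONG / 2)) | from;
}

static inline unsigned int afs_page_dirty_from(unsigned long priv)
{
	return priv & ((1UL << (BITS_PER_LONG / 2)) - 1);
}

static inline unsigned int afs_page_dirty_to(unsigned long priv)
{
	return priv >> (BITS_PER_LONG / 2);
}
```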
-
Committed by Mauro Carvalho Chehab

The kernel-doc markup that documents _fc_replay_callback is missing an asterisk, causing this warning when building the docs:

    ../include/linux/jbd2.h:1271: warning: Function parameter or member 'j_fc_replay_callback' not described in 'journal_s'

Fixes: 609f928af48f ("jbd2: fast commit recovery path")
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Link: https://lore.kernel.org/r/6055927ada2015b55b413cdd2670533bdc9a8da2.1603791716.git.mchehab+huawei@kernel.org
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Harshad Shirwadkar

This patch reserves a field in the jbd2 superblock for the number of fast commit blocks. When this value is non-zero, Ext4 uses this field to set the number of fast commit blocks.

Fixes: 6866d7b3 ("ext4/jbd2: add fast commit initialization")
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201027044915.2553163-2-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
- 28 October 2020 (2 commits)
-
-
Committed by Jason Gunthorpe

There are two flows for handling RDMA_CM_EVENT_ROUTE_RESOLVED: either the handler triggers a completion and another thread does rdma_connect(), or the handler directly calls rdma_connect(). In all cases rdma_connect() needs to hold the handler_mutex, but when handlers are invoked this is already held by the core code. This causes ULPs using the 2nd method to deadlock. Provide a rdma_connect_locked() and have all ULPs call it from their handlers.

Link: https://lore.kernel.org/r/0-v2-53c22d5c1405+33-rdma_connect_locking_jgg@nvidia.com
Reported-and-tested-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
Fixes: 2a7cec53 ("RDMA/cma: Fix locking for the RDMA_CM_CONNECT state")
Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
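A sketch of how a ULP's CM event handler would use the new call; the handler name and connection parameters are placeholders, not taken from any specific ULP:

```c
#include <rdma/rdma_cm.h>

static int my_cm_handler(struct rdma_cm_id *id, struct rdma_cm_event *event)
{
	struct rdma_conn_param param = {};	/* ULP-specific settings */

	switch (event->event) {
	case RDMA_CM_EVENT_ROUTE_RESOLVED:
		/*
		 * The CM core already holds the handler_mutex here, so a
		 * plain rdma_connect() would deadlock; the _locked variant
		 * must be used from handler context.
		 */
		return rdma_connect_locked(id, &param);
	default:
		return 0;
	}
}
```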
-
Committed by Kees Cook

Under some circumstances, the compiler generates .ctors.* sections. This is seen doing a cross compile of x86_64 from a powerpc64el host:

    x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/trace_clock.o' being placed in section `.ctors.65435'
    x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ftrace.o' being placed in section `.ctors.65435'
    x86_64-linux-gnu-ld: warning: orphan section `.ctors.65435' from `kernel/trace/ring_buffer.o' being placed in section `.ctors.65435'

Include these orphans along with the regular .ctors section.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Tested-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 83109d5d ("x86/build: Warn on orphan section placement")
Signed-off-by: Kees Cook <keescook@chromium.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Link: https://lore.kernel.org/r/20201005025720.2599682-1-keescook@chromium.org
-
- 27 October 2020 (1 commit)
-
-
Committed by Parav Pandit

When a mlx5 core devlink instance is reloaded in a different net namespace, its associated IB device is deleted and recreated. An example sequence is:

    $ ip netns add foo
    $ devlink dev reload pci/0000:00:08.0 netns foo
    $ ip netns del foo

The mlx5 IB device needs to attach and detach the netdevice to it through the netdev notifier chain during the load and unload sequence. Below is a call graph of the unload flow:

    cleanup_net()
      down_read(&pernet_ops_rwsem); <- first sem acquired
      ops_pre_exit_list()
        pre_exit()
          devlink_pernet_pre_exit()
            devlink_reload()
              mlx5_devlink_reload_down()
                mlx5_unload_one()
                [...]
                mlx5_ib_remove()
                  mlx5_ib_unbind_slave_port()
                    mlx5_remove_netdev_notifier()
                      unregister_netdevice_notifier()
                        down_write(&pernet_ops_rwsem); <- recursive lock

Hence, when the net namespace is deleted, the mlx5 reload results in a deadlock. When the deadlock occurs, the devlink mutex is also held. This not only deadlocks the mlx5 device under reload, but also every process which attempts to access unrelated devlink devices. Hence, fix this by having the mlx5 IB driver register a per-net netdev notifier instead of a global one, which operates on the net namespace without holding the pernet_ops_rwsem.

Fixes: 4383cfcc ("net/mlx5: Add devlink reload")
Link: https://lore.kernel.org/r/20201026134359.23150-1-parav@nvidia.com
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-
- 26 October 2020 (2 commits)
-
-
Committed by Joe Perches

Use a more generic form for __section that requires quotes, to avoid complications with clang and gcc differences. Remove the quote operator # from the compiler_attributes.h __section macro. Convert all unquoted __section(foo) uses to quoted __section("foo"). Also convert __attribute__((section("foo"))) uses to __section("foo"), even if the __attribute__ has multiple list entry forms. Conversion done using the script at:

https://lore.kernel.org/lkml/75393e5ddc272dc7403de74d645e6c6e0f4e70eb.camel@perches.com/2-convert_section.pl

Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@gooogle.com>
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
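At a call site the conversion looks like this (the variable is hypothetical; only the quoted form is the point):

```c
/* Before the conversion the macro stringified a bare token:
 *
 *	static int probe_data __section(.init.data);
 *
 * After it, the section name is an ordinary string literal, which both
 * GCC and Clang accept without surprises:
 */
static int probe_data __section(".init.data");
```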
-
Committed by Eric Biggers

Commit 453431a5 ("mm, treewide: rename kzfree() to kfree_sensitive()") renamed kzfree() to kfree_sensitive(), but it left a compatibility definition of kzfree() to avoid being too disruptive. Since then a few more instances of kzfree() have slipped in. Just get rid of them and remove the compatibility definition once and for all.

Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
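A sketch of what the remaining call sites turn into; the context structure is made up, only the rename matters:

```c
#include <linux/slab.h>

struct my_cipher_ctx {			/* hypothetical example structure */
	u8 *key;
};

static void my_cipher_exit(struct my_cipher_ctx *ctx)
{
	/* Previously: kzfree(ctx->key); the alias no longer exists. */
	kfree_sensitive(ctx->key);	/* zeroes the buffer, then frees it */
}
```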
-
- 25 October 2020 (2 commits)
-
-
Committed by Willy Tarreau

With the removal of the interrupt perturbations in the previous random32 change (random32: make prandom_u32() output unpredictable), the PRNG has become 100% deterministic again. While SipHash is expected to be way more robust against brute force than the previous Tausworthe LFSR, there's still the risk that whoever has even one temporary access to the PRNG's internal state is able to predict all subsequent draws till the next reseed (roughly every minute). This may happen through a side channel attack or any data leak. This patch restores the spirit of commit f227e3ec ("random32: update the net random state on interrupt and activity") in that it will perturb the internal PRNG's state using externally collected noise, except that it will not pick that noise from the random pool's bits nor upon interrupt, but will rather combine a few elements along the Tx path that are collectively hard to predict, such as dev, skb and txq pointers, packet length and jiffies values. These ones are combined using a single round of SipHash into a single long variable that is mixed with the net_rand_state upon each invocation. The operation was inlined because it produces very small and efficient code, typically 3 xor, 2 add and 2 rol. The performance was measured to be the same (even very slightly better) than before the switch to SipHash; on a 6-core 12-thread Core i7-8700k equipped with a 40G NIC (i40e), the connection rate dropped from 556k/s to 555k/s while the SYN cookie rate grew from 5.38 Mpps to 5.45 Mpps.

Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
Cc: George Spelvin <lkml@sdf.org>
Cc: Amit Klein <aksecurity@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Tested-by: Sedat Dilek <sedat.dilek@gmail.com>
Signed-off-by: Willy Tarreau <w@1wt.eu>
-
Committed by George Spelvin

Non-cryptographic PRNGs may have great statistical properties, but are usually trivially predictable to someone who knows the algorithm, given a small sample of their output. An LFSR like prandom_u32() is particularly simple, even if the sample is widely scattered bits. It turns out the network stack uses prandom_u32() for some things like random port numbers which it would prefer are *not* trivially predictable. Predictability led to a practical DNS spoofing attack. Oops. This patch replaces the LFSR with a homebrew cryptographic PRNG based on the SipHash round function, which is in turn seeded with 128 bits of strong random key. (The authors of SipHash have *not* been consulted about this abuse of their algorithm.) Speed is prioritized over security; attacks are rare, while performance is always wanted. Replacing all callers of prandom_u32() is the quick fix. Whether to reinstate a weaker PRNG for uses which can tolerate it is an open question. Commit f227e3ec ("random32: update the net random state on interrupt and activity") was an earlier attempt at a solution. This patch replaces it.

Reported-by: Amit Klein <aksecurity@gmail.com>
Cc: Willy Tarreau <w@1wt.eu>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "Jason A. Donenfeld" <Jason@zx2c4.com>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: tytso@mit.edu
Cc: Florian Westphal <fw@strlen.de>
Cc: Marc Plumb <lkml.mplumb@gmail.com>
Fixes: f227e3ec ("random32: update the net random state on interrupt and activity")
Signed-off-by: George Spelvin <lkml@sdf.org>
Link: https://lore.kernel.org/netdev/20200808152628.GA27941@SDF.ORG/
[ willy: partial reversal of f227e3ec; moved SIPROUND definitions to prandom.h for later use; merged George's prandom_seed() proposal; inlined siprand_u32(); replaced the net_rand_state[] array with 4 members to fix a build issue; cosmetic cleanups to make checkpatch happy; fixed RANDOM32_SELFTEST build ]
Signed-off-by: Willy Tarreau <w@1wt.eu>
-
- 23 October 2020 (4 commits)
-
-
Committed by Juergen Gross

With the switch to the lateeoi model for interdomain event channels some functions are no longer in use. Remove them.

Suggested-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Link: https://lore.kernel.org/r/20201022094907.28560-2-jgross@suse.com
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
-
Committed by Ben Gardon

Dirty logging is a key feature of the KVM MMU and must be supported by the TDP MMU. Add support for both the write protection and PML dirty logging modes. Tested by running kvm-unit-tests and KVM selftests on an Intel Haswell machine. This series introduced no new failures. This series can be viewed in Gerrit at:

https://linux-review.googlesource.com/c/virt/kvm/kvm/+/2538

Signed-off-by: Ben Gardon <bgardon@google.com>
Message-Id: <20201014182700.2888246-16-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Helge Deller

When starting a HP machine with the HIL driver but without a HIL keyboard or HIL mouse attached, it may happen that data written to the HIL loop gets stuck (e.g. because the transaction queue is full). Usually one will then have to reboot the machine because all you see is an endless output of:

    Transaction add failed: transaction already queued?

In the higher layers hp_sdc_enqueue_transaction() is called to queue up a HIL packet. This function returns an error code, and this patch adds the necessary checks for this return code and disables the HIL driver if further packets can't be sent. Tested on a HP 730 and a HP 715/64 machine.

Signed-off-by: Helge Deller <deller@gmx.de>
Cc: <stable@vger.kernel.org>
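A rough sketch of the pattern described; the wrapper, the flag, and the parameter type are assumptions for illustration, and only hp_sdc_enqueue_transaction() itself is named by the patch:

```c
static bool hil_dead;	/* set once the HIL loop is declared stuck */

static int hil_send(hp_sdc_transaction *t)
{
	int ret;

	if (hil_dead)
		return -ENODEV;

	ret = hp_sdc_enqueue_transaction(t);
	if (ret) {
		pr_err("HIL: enqueue failed (%d), disabling HIL driver\n", ret);
		hil_dead = true;	/* stop flooding the log or hanging boot */
	}
	return ret;
}
```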
-
Committed by Jens Axboe

With the set_fs change, we can no longer rely on copy_{to,from}_user() accepting a kernel pointer, and it was bad form to do so anyway. Clean this up and change the internal helper that io_uring uses to deal with kernel pointers instead. This puts the offset copy in/out in __do_splice() instead, which just calls the same helper.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- 22 October 2020 (10 commits)
-
-
Committed by Harshad Shirwadkar

This patch adds fast commit recovery path support for the Ext4 file system. We add several helper functions that are similar in spirit to the e2fsprogs journal recovery path handlers. Examples of such functions include a simple block allocator, an idempotent block bitmap update function, etc. Using these routines and the fast commit log in the fast commit area, the recovery path (ext4_fc_replay()) performs fast commit log recovery.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-8-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Harshad Shirwadkar

This patch adds fast commit recovery support in JBD2.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-7-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Harshad Shirwadkar

This patch adds the main fast commit commit path handlers. The overall patch can be divided into two inter-related parts:

(A) Metadata updates tracking. This part consists of helper functions to track changes that need to be committed during a commit operation. These updates are maintained by Ext4 in different in-memory queues. Following are the APIs, with short descriptions, that are implemented in this patch:
- ext4_fc_track_link/unlink/creat() - Track unlink, link and creat operations
- ext4_fc_track_range() - Track changed logical block offsets of inodes
- ext4_fc_track_inode() - Track inodes
- ext4_fc_mark_ineligible() - Mark the file system fast commit ineligible
- ext4_fc_start_update() / ext4_fc_stop_update() / ext4_fc_start_ineligible() / ext4_fc_stop_ineligible() - These functions are useful for co-ordinating inode updates with commits.

(B) Main commit path. This part consists of functions to convert updates tracked in in-memory data structures into on-disk commits. Function ext4_fc_commit() is the main entry point to the commit path.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-6-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Harshad Shirwadkar

This patch adds the necessary APIs needed in the JBD2 layer for fast commits.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-5-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Harshad Shirwadkar

This patch adds fast commit area trackers in the journal_t structure. These are initialized via the jbd2_fc_init() routine that this patch adds. This patch also adds ext4/fast_commit.c and ext4/fast_commit.h files for fast commit code that will be added in subsequent patches in this series.

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-4-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Harshad Shirwadkar

We are running out of mount option bits. Add handling for using s_mount_opt2. Add the ext4 and jbd2 fast commit feature flags and also add the ability to turn off the fast commit feature in Ext4.

Signed-off-by: Harshad Shirwadkar <harshadshirwadkar@gmail.com>
Link: https://lore.kernel.org/r/20201015203802.3597742-3-harshadshirwadkar@gmail.com
Signed-off-by: Theodore Ts'o <tytso@mit.edu>
-
Committed by Di Zhu

The "ip addr show" command fails when we have a physical network card with a large number of VFs. The return value of if_nlmsg_size() in rtnl_calcit() will exceed the range of the u16 data type when a network card has a large number of VFs, because rtnl_vfinfo_size() significantly increases the needed dump size as num_vfs grows. Eventually we get a wrong value of min_ifinfo_dump_size because of the overflow; this value decides the memory size needed by the netlink dump, so netlink_dump() returns -EMSGSIZE because not enough memory was allocated. Fix it by promoting the min_dump_alloc data type to u32 to avoid overflowing the netlink message size; this also aligns it with the data type of struct netlink_callback{}.min_dump_alloc, which is assigned from the return value of rtnl_calcit().

Signed-off-by: Di Zhu <zhudi21@huawei.com>
Link: https://lore.kernel.org/r/20201021020053.1401-1-zhudi21@huawei.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
-
Committed by Toke Høiland-Jørgensen

Based on the discussion in [0], update the bpf_redirect_neigh() helper to accept an optional parameter specifying the nexthop information. This makes it possible to combine bpf_fib_lookup() and bpf_redirect_neigh() without incurring a duplicate FIB lookup - since the FIB lookup helper will return the nexthop information even if no neighbour is present, this can simply be passed on to bpf_redirect_neigh() if bpf_fib_lookup() returns BPF_FIB_LKUP_RET_NO_NEIGH. Thus fix & extend it before the helper API is frozen.

[0] https://lore.kernel.org/bpf/393e17fc-d187-3a8d-2f0d-a627c7c63fca@iogearbox.net/

Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Link: https://lore.kernel.org/bpf/160322915615.32199.1187570224032024535.stgit@toke.dk
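A sketch of the combined pattern in a tc BPF program; it assumes fib_params has already been filled in from the packet headers, and it omits the MAC-rewrite success path:

```c
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#ifndef AF_INET
#define AF_INET 2
#endif

static __always_inline int fwd_no_neigh(struct __sk_buff *skb,
					struct bpf_fib_lookup *fib_params)
{
	struct bpf_redir_neigh nh = {};
	int rc;

	rc = bpf_fib_lookup(skb, fib_params, sizeof(*fib_params), 0);
	if (rc == BPF_FIB_LKUP_RET_NO_NEIGH) {
		/* The FIB lookup already returned the nexthop address in
		 * ipv4_dst/ipv6_dst; pass it along so bpf_redirect_neigh()
		 * does not have to repeat the lookup.
		 */
		nh.nh_family = fib_params->family;
		if (nh.nh_family == AF_INET)
			nh.ipv4_nh = fib_params->ipv4_dst;
		else
			__builtin_memcpy(nh.ipv6_nh, fib_params->ipv6_dst,
					 sizeof(nh.ipv6_nh));
		return bpf_redirect_neigh(fib_params->ifindex, &nh,
					  sizeof(nh), 0);
	}

	/* Success path (rewrite MACs, bpf_redirect()) and errors omitted. */
	return TC_ACT_OK;
}
```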
-
Committed by Peter Xu

Cache the address space ID just like the slot ID. It will be used in order to fill in the dirty ring entries.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Reviewed-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20201014182700.2888246-7-bgardon@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
Committed by Oliver Upton

KVM unconditionally provides PV features to the guest, regardless of the configured CPUID. An unwitting guest that doesn't check KVM_CPUID_FEATURES before use could access paravirt features that userspace did not intend to provide. Fix this by checking the guest's CPUID before performing any paravirtual operations. Introduce a capability, KVM_CAP_ENFORCE_PV_FEATURE_CPUID, to gate the aforementioned enforcement. Migrating a VM from a host w/o this patch to a host with this patch could silently change the ABI exposed to the guest, warranting that we default to the old behavior and opt in to the new one.

Reviewed-by: Jim Mattson <jmattson@google.com>
Reviewed-by: Peter Shier <pshier@google.com>
Signed-off-by: Oliver Upton <oupton@google.com>
Change-Id: I202a0926f65035b872bfe8ad15307c026de59a98
Message-Id: <20200818152429.1923996-4-oupton@google.com>
Reviewed-by: Wanpeng Li <wanpengli@tencent.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
-
- 21 October 2020 (3 commits)
-
-
Committed by Pierre Morel

An architecture may restrict host access to guest memory, e.g. IBM s390 Secure Execution or AMD SEV. Provide a new Kconfig entry the architecture can select, CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS, when it provides the arch_has_restricted_virtio_memory_access callback to advertise to VIRTIO common code when the architecture restricts memory access from the host. The common code can then fail the probe for any device where VIRTIO_F_ACCESS_PLATFORM is required, but not set.

Signed-off-by: Pierre Morel <pmorel@linux.ibm.com>
Reviewed-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Halil Pasic <pasic@linux.ibm.com>
Link: https://lore.kernel.org/r/1599728030-17085-2-git-send-email-pmorel@linux.ibm.com
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Christian Borntraeger <borntraeger@de.ibm.com>
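The resulting check in the common probe path has roughly this shape; this is a sketch reconstructed from the description, and the exact placement and warning text are assumptions:

```c
#ifdef CONFIG_ARCH_HAS_RESTRICTED_VIRTIO_MEMORY_ACCESS
	if (arch_has_restricted_virtio_memory_access() &&
	    !virtio_has_feature(dev, VIRTIO_F_ACCESS_PLATFORM)) {
		dev_warn(&dev->dev,
			 "device must provide VIRTIO_F_ACCESS_PLATFORM\n");
		return -ENODEV;	/* fail the probe for this device */
	}
#endif
```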
-
Committed by Dai Ngo

NFS_FS=y as a dependency of CONFIG_NFSD_V4_2_INTER_SSC still has build errors, and some configs with NFSD=m get an NFS4ERR_STALE error when doing an inter-server copy. Add an ops table in nfs_common for knfsd to access NFS client modules.

Fixes: 3ac3711a ("NFSD: Fix NFS server build errors")
Signed-off-by: Dai Ngo <dai.ngo@oracle.com>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
-
Committed by Bean Huo

This patch fixes a typo in the comment of the helper pm_runtime_set_active().

Signed-off-by: Bean Huo <beanhuo@micron.com>
[ rjw: Subject edit ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 20 October 2020 (5 commits)
-
-
Committed by Yufen Yu

After removing the blk-sq code, there are no users of nr_batch_requests and last_waited in the kernel.

Signed-off-by: Yufen Yu <yuyufen@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Committed by Saeed Mirzamohammadi

This patch fixes the issue behind the following KASAN report:

    BUG: KASAN: slab-out-of-bounds in nft_flow_rule_create+0x622/0x6a2 net/netfilter/nf_tables_offload.c:40
    Read of size 8 at addr ffff888103910b58 by task syz-executor227/16244

The error happens when expr->ops is accessed early on, before performing the boundary check and after nft_expr_next() moves the expr out of bounds. This patch checks the boundary condition before expr->ops, which fixes the slab-out-of-bounds read. Add nft_expr_more() and use it to fix this problem.

Signed-off-by: Saeed Mirzamohammadi <saeed.mirzamohammadi@oracle.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
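The helper is small; it plausibly looks something like this (a sketch of the bounded-iteration idea, not copied from the tree), with the expression walk testing nft_expr_more() before ever touching expr->ops:

```c
static inline bool nft_expr_more(const struct nft_rule *rule,
				 const struct nft_expr *expr)
{
	/* Check the bound first, so expr->ops is never read past the end. */
	return expr != nft_expr_last(rule) && expr->ops;
}

/* Typical iteration pattern:
 *
 *	for (expr = nft_expr_first(rule);
 *	     nft_expr_more(rule, expr);
 *	     expr = nft_expr_next(expr)) {
 *		...
 *	}
 */
```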
-
Committed by Christoph Hellwig

Due to a mismerge, a bunch of prototypes that should have moved to dma-map-ops.h are still in dma-mapping.h; fix that up.

Signed-off-by: Christoph Hellwig <hch@lst.de>
-
Committed by Juergen Gross

In order to avoid tight event channel related IRQ loops, add a new framework of "late EOI" handling: the IRQ the event channel is bound to will be masked until the event has been handled and the related driver is capable of handling another event. The driver is responsible for unmasking the event channel via the new function xen_irq_lateeoi(). This is similar to binding an event channel to a threaded IRQ, but without having to structure the driver accordingly. In order to support a future special handling in case a rogue guest is sending lots of unsolicited events, add a flag to xen_irq_lateeoi() which can be set by the caller to indicate the event was a spurious one. This is part of XSA-332.

Cc: stable@vger.kernel.org
Reported-by: Julien Grall <julien@xen.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Reviewed-by: Jan Beulich <jbeulich@suse.com>
Reviewed-by: Stefano Stabellini <sstabellini@kernel.org>
Reviewed-by: Wei Liu <wl@xen.org>
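A sketch of how a backend's interrupt handler would use the new call; the backend structure and its event-processing function are placeholders:

```c
struct my_backend;				/* hypothetical driver state */
bool my_backend_process(struct my_backend *be);	/* hypothetical handler */

static irqreturn_t my_backend_interrupt(int irq, void *dev_id)
{
	struct my_backend *be = dev_id;
	bool did_work = my_backend_process(be);

	/*
	 * The event channel stays masked until this call. Flag spurious
	 * wakeups so the core can throttle a guest flooding us with
	 * unsolicited events.
	 */
	xen_irq_lateeoi(irq, did_work ? 0 : XEN_EOI_FLAG_SPURIOUS);
	return IRQ_HANDLED;
}
```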
-
Committed by Al Grant

perf_event.h has macros that define the field offsets in the data_src bitmask in perf records. The SNOOPX and REMOTE offsets were both 37. These are distinct fields, and the bitfield layout in perf_mem_data_src confirms that SNOOPX should be at offset 38.

Fixes: 52839e65 ("perf tools: Add support for printing new mem_info encodings")
Signed-off-by: Al Grant <al.grant@foss.arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Link: https://lkml.kernel.org/r/4ac9f5cc-4388-b34a-9999-418a4099415d@foss.arm.com
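Reconstructed from the description (consult include/uapi/linux/perf_event.h for the exact surrounding context), the relevant defines end up as:

```c
/* REMOTE keeps its offset ... */
#define PERF_MEM_REMOTE_SHIFT	37

/* ... and SNOOPX moves to the next bit position, matching the
 * mem_remote:1 / mem_snoopx:2 layout of union perf_mem_data_src.
 */
#define PERF_MEM_SNOOPX_SHIFT	38	/* previously (incorrectly) 37 */
```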
-
- 19 October 2020 (5 commits)
-
-
Committed by Christoph Hellwig

All users are gone now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-12-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Christoph Hellwig

Add a proper helper to remap PFNs into kernel virtual space so that drivers don't have to abuse alloc_vm_area and open coded PTE manipulation for it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-4-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
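A sketch of how a driver might use the new helper; the wrapper and the write-combining page protection are assumptions, not taken from any particular caller:

```c
/* Map "count" pages, given as an array of PFNs, into one contiguous
 * kernel virtual range; tear it down again with vunmap() when finished.
 */
static void *map_pfns(unsigned long *pfns, unsigned int count)
{
	void *va = vmap_pfn(pfns, count, pgprot_writecombine(PAGE_KERNEL));

	if (!va)
		return NULL;
	/* ... use the mapping, then vunmap(va) when done ... */
	return va;
}
```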
-
Committed by Christoph Hellwig

Add a flag so that vmap takes ownership of the passed in page array. When vfree is called on such an allocation it will put one reference on each page, and free the page array itself.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Jani Nikula <jani.nikula@linux.intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: Juergen Gross <jgross@suse.com>
Cc: Matthew Auld <matthew.auld@intel.com>
Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rodrigo Vivi <rodrigo.vivi@intel.com>
Cc: Stefano Stabellini <sstabellini@kernel.org>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Uladzislau Rezki (Sony) <urezki@gmail.com>
Link: https://lkml.kernel.org/r/20201002122204.1534411-3-hch@lst.de
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
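A minimal sketch of the intended usage under assumed naming; the key point is that after a successful vmap() with the new flag, the caller no longer frees pages[] or the pages themselves:

```c
static void *map_and_hand_over(struct page **pages, unsigned int count)
{
	/* pages[] must be kmalloc'ed: a later vfree() on the mapping frees
	 * the array itself and drops one reference per page.
	 */
	void *va = vmap(pages, count, VM_MAP | VM_MAP_PUT_PAGES, PAGE_KERNEL);

	if (!va)
		return NULL;	/* on failure the caller still owns pages[] */
	return va;		/* on success, ownership moved to the mapping */
}
```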
-
Committed by Minchan Kim

There is a usecase where System Management Software (SMS) wants to give a memory hint like MADV_[COLD|PAGEOUT] to other processes; in the case of Android, it is the ActivityManagerService. The information required to make the reclaim decision is not known to the app. Instead, it is known to the centralized userspace daemon (ActivityManagerService), and that daemon must be able to initiate reclaim on its own without any app involvement. To solve the issue, this patch introduces a new syscall, process_madvise(2). It uses the pidfd of an external process to give the hint. It also supports vector address ranges, because an Android app has thousands of vmas due to zygote, so it is a total waste of CPU and power to call the syscall one by one for each vma. (Testing a 2000-vma syscall vs a 1-vector syscall showed a 15% performance improvement. I think it would be bigger in real practice because the testing ran in a very cache friendly environment.) Another potential use case for the vector range is to amortize the cost of TLB shootdowns for multiple ranges when using MADV_DONTNEED; this could benefit users like TCP receive zerocopy and malloc implementations. In future, we could find more usecases for other advices, so let's make it happen as an API since we introduce a new syscall at this moment. With that, existing madvise(2) users could replace it with process_madvise(2) with their own pid if they want batch address range support.

Since it could affect another process's address range, only a privileged process (PTRACE_MODE_ATTACH_FSCREDS) or something else (e.g., being the same UID) that gives it the right to ptrace the process could use it successfully. The flag argument is reserved for future use if we need to extend the API. I think supporting all hints madvise has/will supported/support in process_madvise is rather risky, because we are not sure all hints make sense from an external process, and the implementation for a hint may rely on the caller being in the current context, so it could be error-prone. Thus, I just limited the hints to MADV_[COLD|PAGEOUT] in this patch. If someone wants to add other hints, we could hear the usecase and review it for each hint. That is safer for maintenance than introducing a buggy syscall that is hard to fix later.

So finally, the API is as follows:

    ssize_t process_madvise(int pidfd, const struct iovec *iovec,
                            unsigned long vlen, int advice, unsigned int flags);

DESCRIPTION

The process_madvise() system call is used to give advice or directions to the kernel about the address ranges of an external process as well as of the local process. It provides the advice for the address ranges of the process described by iovec and vlen. The goal of such advice is to improve system or application performance. The pidfd selects the process referred to by the PID file descriptor specified in pidfd (see pidfd_open(2) for further information). The pointer iovec points to an array of iovec structures, defined in <sys/uio.h> as:

    struct iovec {
        void  *iov_base;    /* starting address */
        size_t iov_len;     /* number of bytes to be advised */
    };

The iovec describes address ranges beginning at address (iov_base) and with size length of bytes (iov_len). The vlen represents the number of elements in iovec. The advice is indicated in the advice argument, which is one of the following at this moment if the target process specified by pidfd is external:

    MADV_COLD
    MADV_PAGEOUT

Permission to provide a hint to an external process is governed by a ptrace access mode PTRACE_MODE_ATTACH_FSCREDS check; see ptrace(2). process_madvise supports every advice madvise(2) has if the target process is in the same thread group as the calling process, so users could use process_madvise(2) to extend existing madvise(2) to support vector address ranges.

RETURN VALUE

On success, process_madvise() returns the number of bytes advised. This return value may be less than the total number of requested bytes, if an error occurred. The caller should check the return value to determine whether a partial advice occurred.

FAQ:

Q.1 - Why does any external entity have better knowledge?

Quote from Sandeep: "For Android, every application (including the special SystemServer) are forked from Zygote. The reason of course is to share as many libraries and classes between the two as possible to benefit from the preloading during boot. After applications start, (almost) all of the APIs end up calling into this SystemServer process over IPC (binder) and back to the application. In a fully running system, the SystemServer monitors every single process periodically to calculate their PSS / RSS and also decides which process is "important" to the user for interactivity. So, because of how these processes start _and_ the fact that the SystemServer is looping to monitor each process, it does tend to *know* which address range of the application is not used / useful. Besides, we can never rely on applications to clean things up themselves. We've had the "hey app1, the system is low on memory, please trim your memory usage down" notifications for a long time[1]. They rely on applications honoring the broadcasts and very few do. So, if we want to avoid the inevitable killing of the application and restarting it, some way to be able to tell the OS about unimportant memory in these applications will be useful. - ssp"

Q.2 - How to handle the race (i.e., object validation) between giving a hint from an external process and acting on it in the target process?

process_madvise operates on the target process's address space as it exists at the instant that process_madvise is called. If the target process can run between the time the process_madvise process inspects the target process address space and the time that process_madvise is actually called, process_madvise may operate on memory regions that the calling process does not expect. It's the responsibility of the process calling process_madvise to close this race condition. For example, the calling process can suspend the target process with ptrace, SIGSTOP, or the freezer cgroup so that it doesn't have an opportunity to change its own address space before process_madvise is called. Another option is to operate on memory regions that the caller knows a priori will be unchanged in the target process. Yet another option is to accept the race for certain process_madvise calls after reasoning that mistargeting will do no harm. The suggested API itself does not provide synchronization. This also applies to other APIs like move_pages and process_vm_write. The race isn't really a problem, though. Why is it so wrong to require that callers do their own synchronization in some manner? Nobody objects to write(2) merely because it's possible for two processes to open the same file and clobber each other's writes --- instead, we tell people to use flock or something. Think about mmap: it never guarantees newly allocated address space is still valid when the user tries to access it, because other threads could unmap the memory right before. That's where we need synchronization via another API or by design on the userspace side. It shouldn't be part of the API itself. If someone needs more fine-grained synchronization than process level, there were two ideas suggested - cookie[2] and anon-fd[3]. Both are applicable via the last reserved argument of the API, but I don't think it's necessary right now since we already have ways to prevent the race, so I don't want to add additional complexity with a more fine-grained optimization model. To make the API extendable, it reserved an unsigned long as the last argument so we could support it in future if someone really needs it.

Q.3 - Why doesn't ptrace work?

Injecting an madvise in the target process using ptrace would not work for us because such an injected madvise would have to be executed by the target process, which means that process would have to be runnable, and that creates the risk of the abovementioned race and hinting a wrong VMA. Furthermore, we want to act on the hint in the caller's context, not the callee's, because the callee is usually limited in cpuset/cgroups or even in a frozen state, so it can't act by itself quickly enough, which causes more thrashing/kill. It also doesn't work if the target process is ptraced (e.g., strace, debugger, minidump), because a process can have at most one ptracer.

[1] https://developer.android.com/topic/performance/memory
[2] process_getinfo for getting the cookie which is updated whenever vma of process address layout are changed - Daniel Colascione - https://lore.kernel.org/lkml/20190520035254.57579-1-minchan@kernel.org/T/#m7694416fd179b2066a2c62b5b139b14e3894e224
[3] anonymous fd which is used for the object (i.e., address range) validation - Michal Hocko - https://lore.kernel.org/lkml/20200120112722.GY18451@dhcp22.suse.cz/

[minchan@kernel.org: fix process_madvise build break for arm64]
Link: http://lkml.kernel.org/r/20200303145756.GA219683@google.com
[minchan@kernel.org: fix build error for mips of process_madvise]
Link: http://lkml.kernel.org/r/20200508052517.GA197378@google.com
[akpm@linux-foundation.org: fix patch ordering issue]
[akpm@linux-foundation.org: fix arm64 whoops]
[minchan@kernel.org: make process_madvise() vlen arg have type size_t, per Florian]
[akpm@linux-foundation.org: fix i386 build]
[sfr@canb.auug.org.au: fix syscall numbering]
Link: https://lkml.kernel.org/r/20200905142639.49fc3f1a@canb.auug.org.au
[sfr@canb.auug.org.au: madvise.c needs compat.h]
Link: https://lkml.kernel.org/r/20200908204547.285646b4@canb.auug.org.au
[minchan@kernel.org: fix mips build]
Link: https://lkml.kernel.org/r/20200909173655.GC2435453@google.com
[yuehaibing@huawei.com: remove duplicate header which is included twice]
Link: https://lkml.kernel.org/r/20200915121550.30584-1-yuehaibing@huawei.com
[minchan@kernel.org: do not use helper functions for process_madvise]
Link: https://lkml.kernel.org/r/20200921175539.GB387368@google.com
[akpm@linux-foundation.org: pidfd_get_pid() gained an argument]
[sfr@canb.auug.org.au: fix up for "iov_iter: transparently handle compat iovecs in import_iovec"]
Link: https://lkml.kernel.org/r/20200928212542.468e1fef@canb.auug.org.au
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Christian Brauner <christian@brauner.io>
Cc: Daniel Colascione <dancol@google.com>
Cc: Jann Horn <jannh@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-3-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200508183320.GA125527@google.com
Link: http://lkml.kernel.org/r/20200622192900.22757-4-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-4-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
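A minimal userspace sketch of calling the new syscall via syscall(2); the fallback syscall numbers, the target pid, and the address range are placeholders for illustration and need to match your system:

```c
#define _GNU_SOURCE
#include <stdio.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#ifndef __NR_pidfd_open
#define __NR_pidfd_open		434	/* assumption: check your headers */
#endif
#ifndef __NR_process_madvise
#define __NR_process_madvise	440	/* assumption: check your headers */
#endif
#ifndef MADV_COLD
#define MADV_COLD		20
#endif

int main(void)
{
	pid_t target = 1234;			/* placeholder pid */
	int pidfd = syscall(__NR_pidfd_open, target, 0);
	struct iovec ranges[] = {
		{ .iov_base = (void *)0x7f0000000000UL,	/* placeholder range */
		  .iov_len  = 1 << 20 },
	};
	ssize_t ret;

	if (pidfd < 0) {
		perror("pidfd_open");
		return 1;
	}
	ret = syscall(__NR_process_madvise, pidfd, ranges, 1, MADV_COLD, 0);
	if (ret < 0)
		perror("process_madvise");
	else
		printf("advised %zd bytes\n", ret);
	return 0;
}
```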
-
Committed by Minchan Kim

The process_madvise syscall needs the pidfd_get_pid() function to translate a pidfd to a pid, so this patch moves the function to kernel/pid.c.

Suggested-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Suren Baghdasaryan <surenb@google.com>
Reviewed-by: Alexander Duyck <alexander.h.duyck@linux.intel.com>
Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Christian Brauner <christian.brauner@ubuntu.com>
Acked-by: David Rientjes <rientjes@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Jann Horn <jannh@google.com>
Cc: Brian Geffon <bgeffon@google.com>
Cc: Daniel Colascione <dancol@google.com>
Cc: Joel Fernandes <joel@joelfernandes.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Cc: John Dias <joaodias@google.com>
Cc: Kirill Tkhai <ktkhai@virtuozzo.com>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Oleksandr Natalenko <oleksandr@redhat.com>
Cc: Sandeep Patil <sspatil@google.com>
Cc: SeongJae Park <sj38.park@gmail.com>
Cc: SeongJae Park <sjpark@amazon.de>
Cc: Shakeel Butt <shakeelb@google.com>
Cc: Sonny Rao <sonnyrao@google.com>
Cc: Tim Murray <timmurray@google.com>
Cc: Christian Brauner <christian.brauner@ubuntu.com>
Cc: Florian Weimer <fw@deneb.enyo.de>
Cc: <linux-man@vger.kernel.org>
Link: http://lkml.kernel.org/r/20200302193630.68771-5-minchan@kernel.org
Link: http://lkml.kernel.org/r/20200622192900.22757-3-minchan@kernel.org
Link: https://lkml.kernel.org/r/20200901000633.1920247-3-minchan@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-