- 27 1月, 2016 4 次提交
-
-
由 Herbert Xu 提交于
This patch replaces uses of the long obsolete hash interface with either shash (for non-SG users) or ahash. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch replaces uses of the long obsolete hash interface with ahash. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au> Reviewed-by: NMike Christie <michaelc@cs.wisc.edu>
-
由 Herbert Xu 提交于
This patch replaces uses of blkcipher with skcipher and the long obsolete hash interface with either shash (for non-SG users) and ahash. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
This patch adds the helper crypto_skcipher_driver_name which returns the driver name of the alg object for a given tfm. This is needed by ecryptfs. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 25 1月, 2016 5 次提交
-
-
由 Herbert Xu 提交于
This patch adds the helper crypto_has_ahash which should replace crypto_has_hash. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
As the size of an skcipher_request is variable, it's awkward to zero it explicitly. This patch adds a helper to do that which should be used when it is created on the stack. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
As the size of an ahash_request or shash_desc is variable, it's awkward to zero them explicitly. This patch adds helpers to do that which should be used when they are created on the stack. Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Stephan Mueller 提交于
The newly released FIPS 140-2 IG 9.8 specifies that for SP800-90A compliant DRBGs, the FIPS 140-2 continuous random number generator test is not required any more. This patch removes the test and all associated data structures. Signed-off-by: NStephan Mueller <smueller@chronox.de> Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
由 Herbert Xu 提交于
While converting ecryptfs over to skcipher I found that it needs to pick a default key size if one isn't given. Rather than having it poke into the guts of the algorithm to get max_keysize, let's provide a helper that is meant to give a sane default (just in case we ever get an algorithm that has no maximum key size). Signed-off-by: NHerbert Xu <herbert@gondor.apana.org.au>
-
- 24 1月, 2016 5 次提交
-
-
由 Simon Arlott 提交于
The "dual_image" and "inactive_flag" fields should be merged into a single "image_sequence" field. Signed-off-by: NSimon Arlott <simon@fire.lp0.eu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Jonas Gorski <jogo@openwrt.org> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Cc: MIPS Mailing List <linux-mips@linux-mips.org> Cc: MTD Maling List <linux-mtd@lists.infradead.org> Patchwork: https://patchwork.linux-mips.org/patch/11834/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Simon Arlott 提交于
The extended flash address needs to be subtracted from bcm_tag flash image offsets. Move this value to the bcm_tag header file. Renamed define name to consistently use bcm963xx for flash layout which should be considered a property of the board and not the SoC (i.e. bcm63xx could theoretically be used on a board without CFE or any flash). Signed-off-by: NSimon Arlott <simon@fire.lp0.eu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Jonas Gorski <jogo@openwrt.org> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Cc: MIPS Mailing List <linux-mips@linux-mips.org> Cc: MTD Maling List <linux-mtd@lists.infradead.org> Patchwork: https://patchwork.linux-mips.org/patch/11833/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Simon Arlott 提交于
Move Broadcom BCM963xx image tag data structure to include/linux/ so that drivers outside of mach-bcm63xx can use it. Signed-off-by: NSimon Arlott <simon@fire.lp0.eu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Jonas Gorski <jogo@openwrt.org> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Cc: MIPS Mailing List <linux-mips@linux-mips.org> Cc: MTD Maling List <linux-mtd@lists.infradead.org> Patchwork: https://patchwork.linux-mips.org/patch/11832/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Simon Arlott 提交于
Broadcom BCM963xx boards have multiple nvram variants across different SoCs with additional checksum fields added whenever the size of the nvram was extended. Add this structure as a header file so that multiple drivers can use it. Signed-off-by: NSimon Arlott <simon@fire.lp0.eu> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Brian Norris <computersforpeace@gmail.com> Cc: Kevin Cernekee <cernekee@gmail.com> Cc: Florian Fainelli <f.fainelli@gmail.com> Cc: Jonas Gorski <jogo@openwrt.org> Cc: Linux Kernel Mailing List <linux-kernel@vger.kernel.org> Cc: MIPS Mailing List <linux-mips@linux-mips.org> Cc: MTD Maling List <linux-mtd@lists.infradead.org> Patchwork: https://patchwork.linux-mips.org/patch/11830/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
由 Joshua Henderson 提交于
This adds support for the Microchip PIC32 MIPS microcontroller with the specific variant PIC32MZDA. PIC32MZDA is based on the MIPS m14KEc core and boots using device tree. This includes an early pin setup and early clock setup needed prior to device tree being initialized. In additon, an interface is provided to synchronize access to registers shared across several peripherals. Signed-off-by: NJoshua Henderson <joshua.henderson@microchip.com> Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Patchwork: https://patchwork.linux-mips.org/patch/12097/Signed-off-by: NRalf Baechle <ralf@linux-mips.org>
-
- 23 1月, 2016 6 次提交
-
-
由 Ross Zwisler 提交于
To properly handle fsync/msync in an efficient way DAX needs to track dirty pages so it is able to flush them durably to media on demand. The tracking of dirty pages is done via the radix tree in struct address_space. This radix tree is already used by the page writeback infrastructure for tracking dirty pages associated with an open file, and it already has support for exceptional (non struct page*) entries. We build upon these features to add exceptional entries to the radix tree for DAX dirty PMD or PTE pages at fault time. [dan.j.williams@intel.com: fix dax_pmd_dbg build warning] Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ross Zwisler 提交于
Add find_get_entries_tag() to the family of functions that include find_get_entries(), find_get_pages() and find_get_pages_tag(). This is needed for DAX dirty page handling because we need a list of both page offsets and radix tree entries ('indices' and 'entries' in this function) that are marked with the PAGECACHE_TAG_TOWRITE tag. Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Reviewed-by: NJan Kara <jack@suse.cz> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ross Zwisler 提交于
Add support for tracking dirty DAX entries in the struct address_space radix tree. This tree is already used for dirty page writeback, and it already supports the use of exceptional (non struct page*) entries. In order to properly track dirty DAX pages we will insert new exceptional entries into the radix tree that represent dirty DAX PTE or PMD pages. These exceptional entries will also contain the writeback addresses for the PTE or PMD faults that we can use at fsync/msync time. There are currently two types of exceptional entries (shmem and shadow) that can be placed into the radix tree, and this adds a third. We rely on the fact that only one type of exceptional entry can be found in a given radix tree based on its usage. This happens for free with DAX vs shmem but we explicitly prevent shadow entries from being added to radix trees for DAX mappings. The only shadow entries that would be generated for DAX radix trees would be to track zero page mappings that were created for holes. These pages would receive minimal benefit from having shadow entries, and the choice to have only one type of exceptional entry in a given radix tree makes the logic simpler both in clear_exceptional_entry() and in the rest of DAX. Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Hugh Dickins <hughd@google.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ross Zwisler 提交于
__arch_wb_cache_pmem() was already an internal implementation detail of the x86 PMEM API, but this functionality needs to be exported as part of the general PMEM API to handle the fsync/msync case for DAX mmaps. One thing worth noting is that we really do want this to be part of the PMEM API as opposed to a stand-alone function like clflush_cache_range() because of ordering restrictions. By having wb_cache_pmem() as part of the PMEM API we can leave it unordered, call it multiple times to write back large amounts of memory, and then order the multiple calls with a single wmb_pmem(). Signed-off-by: NRoss Zwisler <ross.zwisler@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: "J. Bruce Fields" <bfields@fieldses.org> Cc: "Theodore Ts'o" <tytso@mit.edu> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Andreas Dilger <adilger.kernel@dilger.ca> Cc: Dave Chinner <david@fromorbit.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jan Kara <jack@suse.com> Cc: Jeff Layton <jlayton@poochiereds.net> Cc: Matthew Wilcox <willy@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Matthew Wilcox <matthew.r.wilcox@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Al Viro 提交于
Cc: stable@vger.kernel.org # v4.2+ Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, inode_foo(inode) being mutex_foo(&inode->i_mutex). Please, use those for access to ->i_mutex; over the coming cycle ->i_mutex will become rwsem, with ->lookup() done with it held only shared. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 22 1月, 2016 11 次提交
-
-
由 yalin wang 提交于
This crash is caused by NULL pointer deference, in page_to_pfn() marco, when page == NULL : Unable to handle kernel NULL pointer dereference at virtual address 00000000 Internal error: Oops: 94000006 [#1] SMP Modules linked in: CPU: 1 PID: 26 Comm: khugepaged Tainted: G W 4.3.0-rc6-next-20151022ajb-00001-g32f3386-dirty #3 PC is at khugepaged+0x378/0x1af8 LR is at khugepaged+0x418/0x1af8 Process khugepaged (pid: 26, stack limit = 0xffffffc079638020) Call trace: khugepaged+0x378/0x1af8 kthread+0xdc/0xf4 ret_from_fork+0xc/0x40 Code: 35001700 f0002c60 aa0703e3 f9009fa0 (f94000e0) ---[ end trace 637503d8e28ae69e ]--- Kernel panic - not syncing: Fatal exception CPU2: stopping CPU: 2 PID: 0 Comm: swapper/2 Tainted: G D W 4.3.0-rc6-next-20151022ajb-00001-g32f3386-dirty #3 Hardware name: linux,dummy-virt (DT) [akpm@linux-foundation.org: fix fat-fingered merge resolution] Signed-off-by: Nyalin wang <yalin.wang2010@gmail.com> Acked-by: NVlastimil Babka <vbabka@suse.cz> Acked-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Acked-by: NDavid Rientjes <rientjes@google.com> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
After THP refcounting rework we have only two possible return values from pmd_trans_huge_lock(): success and failure. Return-by-pointer for ptl doesn't make much sense in this case. Let's convert pmd_trans_huge_lock() to return ptl on success and NULL on failure. Signed-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Suggested-by: NLinus Torvalds <torvalds@linux-foundation.org> Cc: Minchan Kim <minchan@kernel.org> Acked-by: NMichal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Ilya Dryomov 提交于
There are a number of problems with revoking a "was sending" message: (1) We never make any attempt to revoke data - only kvecs contibute to con->out_skip. However, once the header (envelope) is written to the socket, our peer learns data_len and sets itself to expect at least data_len bytes to follow front or front+middle. If ceph_msg_revoke() is called while the messenger is sending message's data portion, anything we send after that call is counted by the OSD towards the now revoked message's data portion. The effects vary, the most common one is the eventual hang - higher layers get stuck waiting for the reply to the message that was sent out after ceph_msg_revoke() returned and treated by the OSD as a bunch of data bytes. This is what Matt ran into. (2) Flat out zeroing con->out_kvec_bytes worth of bytes to handle kvecs is wrong. If ceph_msg_revoke() is called before the tag is sent out or while the messenger is sending the header, we will get a connection reset, either due to a bad tag (0 is not a valid tag) or a bad header CRC, which kind of defeats the purpose of revoke. Currently the kernel client refuses to work with header CRCs disabled, but that will likely change in the future, making this even worse. (3) con->out_skip is not reset on connection reset, leading to one or more spurious connection resets if we happen to get a real one between con->out_skip is set in ceph_msg_revoke() and before it's cleared in write_partial_skip(). Fixing (1) and (3) is trivial. The idea behind fixing (2) is to never zero the tag or the header, i.e. send out tag+header regardless of when ceph_msg_revoke() is called. That way the header is always correct, no unnecessary resets are induced and revoke stands ready for disabled CRCs. Since ceph_msg_revoke() rips out con->out_msg, introduce a new "message out temp" and copy the header into it before sending. Cc: stable@vger.kernel.org # 4.0+ Reported-by: NMatt Conner <matt.conner@keepertech.com> Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Tested-by: NMatt Conner <matt.conner@keepertech.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
由 Yaowei Bai 提交于
This patch makes ceph_frag_contains_value return bool to improve readability due to this particular function only using either one or zero as its return value. No functional change. Signed-off-by: NYaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
由 Yaowei Bai 提交于
These functions were introduced in commit 3d14c5d2 ("ceph: factor out libceph from Ceph file system"). Howover, there's no user of these functions since then, so remove them for simplicity. Signed-off-by: NYaowei Bai <baiyaowei@cmss.chinamobile.com> Signed-off-by: NYan, Zheng <zyan@redhat.com>
-
由 majd@mellanox.com 提交于
When modifying a QP, the desired operation was determined in the mlx5_core using a transition table that takes the current state, the final state, and returns the desired operation. Since this logic will be used for Raw Packet QP, move the operation table to the mlx5_ib. Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 majd@mellanox.com 提交于
When the user changes the Address Vector(AV) in the modify QP, he provides an SL. This SL should be translated to Ethernet Priority by taking the 3 LSB bits, and modify the QP's TIS according to this Ethernet priority. Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 majd@mellanox.com 提交于
Since Raw Packet QP is composed of RQ and SQ, the IB QP's state is derived from the sub-objects. Therefore we need to query each one of the sub-objects, and decide on the IB QP's state. Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 majd@mellanox.com 提交于
RQ/SQ will be used to implement IB verbs QPs, so the IB QP affiliated events are affiliated also with SQs and RQs. Since SQ, RQ and QP resource numbers do not share the same name space, a queue type field was added to the event data to specify the SW object that the event is affiliated with. Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 majd@mellanox.com 提交于
To be used by mlx5_ib in the following patches for implementing RAW PACKET QP. Add mlx5_core_ prefix to alloc and delloc transport_domain since they are exposed now. Signed-off-by: NMajd Dibbiny <majd@mellanox.com> Reviewed-by: NMatan Barak <matanb@mellanox.com> Signed-off-by: NDoug Ledford <dledford@redhat.com>
-
由 Takashi Iwai 提交于
Instead of the previous ugly hack, introduce a new op, disconnect, to snd_timer_instance object for handling the wake up of pending tasks more cleanly. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=109431Signed-off-by: NTakashi Iwai <tiwai@suse.de>
-
- 21 1月, 2016 9 次提交
-
-
由 Johannes Weiner 提交于
Provide statistics on how much of a cgroup's memory footprint is made up of socket buffers from network connections owned by the group. Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Cc: Vladimir Davydov <vdavydov@virtuozzo.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
Swap cache pages are freed aggressively if swap is nearly full (>50% currently), because otherwise we are likely to stop scanning anonymous when we near the swap limit even if there is plenty of freeable swap cache pages. We should follow the same trend in case of memory cgroup, which has its own swap limit. Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
We don't scan anonymous memory if we ran out of swap, neither should we do it in case memcg swap limit is hit, because swap out is impossible anyway. Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
The following patches will add more functions to the memcg section of include/linux/swap.h. Some of them will need values defined below the current location of the section. So let's move the section to the end of the file. No functional changes intended. Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
mem_cgroup_lruvec_online() takes lruvec, but it only needs memcg. Since get_scan_count(), which is the only user of this function, now possesses pointer to memcg, let's pass memcg directly to mem_cgroup_online() instead of picking it out of lruvec and rename the function accordingly. Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Davydov 提交于
This patchset introduces swap accounting to cgroup2. This patch (of 7): In the legacy hierarchy we charge memsw, which is dubious, because: - memsw.limit must be >= memory.limit, so it is impossible to limit swap usage less than memory usage. Taking into account the fact that the primary limiting mechanism in the unified hierarchy is memory.high while memory.limit is either left unset or set to a very large value, moving memsw.limit knob to the unified hierarchy would effectively make it impossible to limit swap usage according to the user preference. - memsw.usage != memory.usage + swap.usage, because a page occupying both swap entry and a swap cache page is charged only once to memsw counter. As a result, it is possible to effectively eat up to memory.limit of memory pages *and* memsw.limit of swap entries, which looks unexpected. That said, we should provide a different swap limiting mechanism for cgroup2. This patch adds mem_cgroup->swap counter, which charges the actual number of swap entries used by a cgroup. It is only charged in the unified hierarchy, while the legacy hierarchy memsw logic is left intact. The swap usage can be monitored using new memory.swap.current file and limited using memory.swap.max. Note, to charge swap resource properly in the unified hierarchy, we have to make swap_entry_free uncharge swap only when ->usage reaches zero, not just ->count, i.e. when all references to a swap entry, including the one taken by swap cache, are gone. This is necessary, because otherwise swap-in could result in uncharging swap even if the page is still in swap cache and hence still occupies a swap entry. At the same time, this shouldn't break memsw counter logic, where a page is never charged twice for using both memory and swap, because in case of legacy hierarchy we uncharge swap on commit (see mem_cgroup_commit_charge). Signed-off-by: NVladimir Davydov <vdavydov@virtuozzo.com> Acked-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
The creation and teardown of struct mem_cgroup is fairly messy and that has attracted mistakes and subtle bugs before. The main cause for this is that there is no clear model about what needs to happen when, and that attracts more chaos. So create one: 1. mem_cgroup_alloc() should allocate struct mem_cgroup and its auxiliary members and initialize work items, locks etc. so that the object it returns is fully initialized and in a neutral state. 2. mem_cgroup_css_alloc() will use mem_cgroup_alloc() to obtain a new memcg object and configure it and the system according to the role of the new memory-controlled cgroup in the hierarchy. 3. mem_cgroup_css_online() is no longer needed to synchronize with iterators, but it verifies css->id which isn't available earlier. 4. mem_cgroup_css_offline() implements stuff that needs to happen upon the user-visible destruction of a cgroup, which includes stopping all user interfacing as well as releasing certain structures when continued memory consumption would be unexpected at that point. 5. mem_cgroup_css_free() prepares the system and the memcg object for the object's disappearance, neutralizes its state, and then gives it back to mem_cgroup_free(). 6. mem_cgroup_free() releases struct mem_cgroup and auxiliary memory. [arnd@arndb.de: fix SLOB build regression] Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NVladimir Davydov <vdavydov@virtuozzo.com> Cc: Michal Hocko <mhocko@suse.cz> Signed-off-by: NArnd Bergmann <arnd@arndb.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
There are no more external users of struct cg_proto, flatten the structure into struct mem_cgroup. Since using those struct members doesn't stand out as much anymore, add cgroup2 static branches to make it clearer which code is legacy. Suggested-by: NVladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Acked-by: NVladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Johannes Weiner 提交于
What CONFIG_INET and CONFIG_LEGACY_KMEM guard inside the memory controller code is insignificant, having these conditionals is not worth the complication and fragility that comes with them. [akpm@linux-foundation.org: rework mem_cgroup_css_free() statement ordering] Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@suse.cz> Acked-by: NVladimir Davydov <vdavydov@virtuozzo.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-