- 20 10月, 2017 5 次提交
-
-
由 John Fastabend 提交于
SK_SKB BPF programs are run from the socket/tcp context but early in the stack before much of the TCP metadata is needed in tcp_skb_cb. So we can use some unused fields to place BPF metadata needed for SK_SKB programs when implementing the redirect function. This allows us to drop the preempt disable logic. It does however require an API change so sk_redirect_map() has been updated to additionally provide ctx_ptr to skb. Note, we do however continue to disable/enable preemption around actual BPF program running to account for map updates. Signed-off-by: NJohn Fastabend <john.fastabend@gmail.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Acked-by: NAlexei Starovoitov <ast@kernel.org> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
由 Paul E. McKenney 提交于
Because many of RCU's files have not been included into docbook, a number of errors have accumulated. This commit fixes them. Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mathieu Desnoyers 提交于
This introduces a "register private expedited" membarrier command which allows eventual removal of important memory barrier constraints on the scheduler fast-paths. It changes how the "private expedited" membarrier command (new to 4.14) is used from user-space. This new command allows processes to register their intent to use the private expedited command. This affects how the expedited private command introduced in 4.14-rc is meant to be used, and should be merged before 4.14 final. Processes are now required to register before using MEMBARRIER_CMD_PRIVATE_EXPEDITED, otherwise that command returns EPERM. This fixes a problem that arose when designing requested extensions to sys_membarrier() to allow JITs to efficiently flush old code from instruction caches. Several potential algorithms are much less painful if the user register intent to use this functionality early on, for example, before the process spawns the second thread. Registering at this time removes the need to interrupt each and every thread in that process at the first expedited sys_membarrier() system call. Signed-off-by: NMathieu Desnoyers <mathieu.desnoyers@efficios.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dmitry Torokhov 提交于
Let's allow matching input devices on their property bits, both in-kernel and when generating module aliases. Tested-by: NRoderick Colenbrander <roderick.colenbrander@sony.com> Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
由 Dmitry Torokhov 提交于
Factor out and export input_match_device_id() so that modules may use it. It will be needed by joydev to blacklist accelerometers in composite devices. Tested-by: NRoderick Colenbrander <roderick.colenbrander@sony.com> Signed-off-by: NDmitry Torokhov <dmitry.torokhov@gmail.com>
-
- 18 10月, 2017 1 次提交
-
-
由 David Howells 提交于
Consolidate KEY_FLAG_INSTANTIATED, KEY_FLAG_NEGATIVE and the rejection error into one field such that: (1) The instantiation state can be modified/read atomically. (2) The error can be accessed atomically with the state. (3) The error isn't stored unioned with the payload pointers. This deals with the problem that the state is spread over three different objects (two bits and a separate variable) and reading or updating them atomically isn't practical, given that not only can uninstantiated keys change into instantiated or rejected keys, but rejected keys can also turn into instantiated keys - and someone accessing the key might not be using any locking. The main side effect of this problem is that what was held in the payload may change, depending on the state. For instance, you might observe the key to be in the rejected state. You then read the cached error, but if the key semaphore wasn't locked, the key might've become instantiated between the two reads - and you might now have something in hand that isn't actually an error code. The state is now KEY_IS_UNINSTANTIATED, KEY_IS_POSITIVE or a negative error code if the key is negatively instantiated. The key_is_instantiated() function is replaced with key_is_positive() to avoid confusion as negative keys are also 'instantiated'. Additionally, barriering is included: (1) Order payload-set before state-set during instantiation. (2) Order state-read before payload-read when using the key. Further separate barriering is necessary if RCU is being used to access the payload content after reading the payload pointers. Fixes: 146aa8b1 ("KEYS: Merge the type-specific data with the payload data") Cc: stable@vger.kernel.org # v4.4+ Reported-by: NEric Biggers <ebiggers@google.com> Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-by: NEric Biggers <ebiggers@google.com>
-
- 17 10月, 2017 1 次提交
-
-
由 Cong Wang 提交于
register_netdevice() could fail early when we have an invalid dev name, in which case ->ndo_uninit() is not called. For tun device, this is a problem because a timer etc. are already initialized and it expects ->ndo_uninit() to clean them up. We could move these initializations into a ->ndo_init() so that register_netdevice() knows better, however this is still complicated due to the logic in tun_detach(). Therefore, I choose to just call dev_get_valid_name() before register_netdevice(), which is quicker and much easier to audit. And for this specific case, it is already enough. Fixes: 96442e42 ("tuntap: choose the txq based on rxq") Reported-by: NDmitry Alexeev <avekceeb@gmail.com> Cc: Jason Wang <jasowang@redhat.com> Cc: "Michael S. Tsirkin" <mst@redhat.com> Signed-off-by: NCong Wang <xiyou.wangcong@gmail.com> Signed-off-by: NDavid S. Miller <davem@davemloft.net>
-
- 14 10月, 2017 4 次提交
-
-
由 Konstantin Khlebnikov 提交于
Kmemleak considers any pointers on task stacks as references. This patch clears newly allocated and reused vmap stacks. Link: http://lkml.kernel.org/r/150728990124.744199.8403409836394318684.stgit@buzzSigned-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Acked-by: NCatalin Marinas <catalin.marinas@arm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Matthew Wilcox 提交于
When using FAT on a block device which supports rw_page, we can hit BUG_ON(!PageLocked(page)) in try_to_free_buffers(). This is because we call clean_buffers() after unlocking the page we've written. Introduce a new clean_page_buffers() which cleans all buffers associated with a page and call it from within bdev_write_page(). [akpm@linux-foundation.org: s/PAGE_SIZE/~0U/ per Linus and Matthew] Link: http://lkml.kernel.org/r/20171006211541.GA7409@bombadil.infradead.orgSigned-off-by: NMatthew Wilcox <mawilcox@microsoft.com> Reported-by: NToshi Kani <toshi.kani@hpe.com> Reported-by: NOGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Tested-by: NToshi Kani <toshi.kani@hpe.com> Acked-by: NJohannes Thumshirn <jthumshirn@suse.de> Cc: Ross Zwisler <ross.zwisler@linux.intel.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Randy Dunlap 提交于
Add kernel-doc notation for some macros. Correct kernel-doc comments & typos for a few macros. Link: http://lkml.kernel.org/r/76fa1403-1511-be4c-e9c4-456b43edfad3@infradead.orgSigned-off-by: NRandy Dunlap <rdunlap@infradead.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Arnd Bergmann 提交于
The pci-rcar driver is enabled for compile tests, and this has shown that the driver cannot build without CONFIG_OF, following the inclusion of commit f8f2fe73 ("PCI: rcar: Use new OF interrupt mapping when possible"): drivers/pci/host/pcie-rcar.c: In function 'pci_dma_range_parser_init': drivers/pci/host/pcie-rcar.c:1039:2: error: implicit declaration of function 'of_n_addr_cells' [-Werror=implicit-function-declaration] parser->pna = of_n_addr_cells(node); ^ As pointed out by Ben Dooks and Geert Uytterhoeven, this is actually supposed to build fine, which we can achieve if we make the declaration of of_irq_parse_and_map_pci conditional on CONFIG_OF and provide an empty inline function otherwise, as we do for a lot of other of interfaces. This lets us build the rcar_pci driver again without CONFIG_OF for build testing. All platforms using this driver select OF, so this doesn't change anything for the users. [akpm@linux-foundation.org: be consistent with surrounding code] Link: http://lkml.kernel.org/r/20170911200805.3363318-1-arnd@arndb.de Fixes: c25da477 ("PCI: rcar: Add Renesas R-Car PCIe driver") Signed-off-by: NArnd Bergmann <arnd@arndb.de> Reviewed-by: NFrank Rowand <frank.rowand@sony.com> Acked-by: NGeert Uytterhoeven <geert+renesas@glider.be> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Magnus Damm <damm@opensource.se> Cc: Ben Dooks <ben.dooks@codethink.co.uk> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 13 10月, 2017 3 次提交
-
-
由 Doug Berger 提交于
Any usage of the irq_gc_mask_disable_reg_and_ack() function has been replaced with the desired functionality. The incorrect and ambiguously named function is removed here to prevent accidental misuse. Signed-off-by: NDoug Berger <opendmb@gmail.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Doug Berger 提交于
The irq_gc_mask_disable_reg_and_ack() function name implies that it provides the combined functions of irq_gc_mask_disable_reg() and irq_gc_ack(). However, the implementation does not actually do that since it writes the mask instead of the disable register. It also does not maintain the mask cache which makes it inappropriate to use with other masking functions. In addition, commit 659fb32d ("genirq: replace irq_gc_ack() with {set,clr}_bit variants (fwd)") effectively renamed irq_gc_ack() to irq_gc_ack_set_bit() so this function probably should have also been renamed at that time. The generic chip code currently provides three functions for use with the irq_mask member of the irq_chip structure and two functions for use with the irq_ack member of the irq_chip structure. These functions could be combined into six functions for use with the irq_mask_ack member of the irq_chip structure. However, since only one of the combinations is currently used, only the function irq_gc_mask_disable_and_ack_set() is added by this commit. The '_reg' and '_bit' portions of the base function name were left out of the new combined function name in an attempt to keep the function name length manageable with the 80 character source code line length while still allowing the distinct aspects of each combination to be captured by the name. If other combinations are desired in the future please add them to the irq generic chip library at that time. Signed-off-by: NDoug Berger <opendmb@gmail.com> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
由 Shanker Donthineni 提交于
The current ITS driver works fine as long as normal memory and GICR regions are located within the lower 48bit (>=0 && <2^48) physical address space. Some of the registers GICR_PEND/PROP, GICR_VPEND/VPROP and GITS_CBASER are handled properly but not all when configuring the hardware with 52bit physical address. This patch does the following changes to support 52bit PA. -Handle 52bit PA in GITS_BASERn. -Fix ITT_addr width to 52bits, bits[51:8]. -Fix RDbase width to 52bits, bits[51:16]. -Fix VPT_addr width to 52bits, bits[51:16]. Definition of the GITS_BASERn register when ITS PageSize is 64KB: -Bits[47:16] of the register provide bits[47:16] of the table PA. -Bits[15:12] of the register provide bits[51:48] of the table PA. -Bits[15:00] of the base physical address are 0. Signed-off-by: NShanker Donthineni <shankerd@codeaurora.org> Signed-off-by: NMarc Zyngier <marc.zyngier@arm.com>
-
- 12 10月, 2017 1 次提交
-
-
由 Jan Luebbe 提交于
At least the Armada XP SoC supports 4GB on a single DRAM window. Because the size register values contain the actual size - 1, the MSB is set in that case. For example, the SDRAM window's control register's value is 0xffffffe1 for 4GB (bits 31 to 24 contain the size). The MBUS driver reads back each window's size from registers and calculates the actual size as (control_reg | ~DDR_SIZE_MASK) + 1, which overflows for 32 bit values, resulting in other miscalculations further on (a bad RAM window for the CESA crypto engine calculated by mvebu_mbus_setup_cpu_target_nooverlap() in my case). This patch changes the type in 'struct mbus_dram_window' from u32 to u64, which allows us to keep using the same register calculation code in most MBUS-using drivers (which calculate ->size - 1 again). Fixes: fddddb52 ("bus: introduce an Marvell EBU MBus driver") CC: stable@vger.kernel.org Signed-off-by: NJan Luebbe <jlu@pengutronix.de> Signed-off-by: NGregory CLEMENT <gregory.clement@free-electrons.com>
-
- 10 10月, 2017 1 次提交
-
-
由 Peter Zijlstra 提交于
Eric reported a sysbench regression against commit: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()") Similarly, Rik was looking at the NAS-lu.C benchmark, which regressed against his v3.10 enterprise kernel. PRE (current tip/master): ivb-ep sysbench: 2: [30 secs] transactions: 64110 (2136.94 per sec.) 5: [30 secs] transactions: 143644 (4787.99 per sec.) 10: [30 secs] transactions: 274298 (9142.93 per sec.) 20: [30 secs] transactions: 418683 (13955.45 per sec.) 40: [30 secs] transactions: 320731 (10690.15 per sec.) 80: [30 secs] transactions: 355096 (11834.28 per sec.) hsw-ex NAS: OMP_PROC_BIND/lu.C.x_threads_144_run_1.log: Time in seconds = 18.01 OMP_PROC_BIND/lu.C.x_threads_144_run_2.log: Time in seconds = 17.89 OMP_PROC_BIND/lu.C.x_threads_144_run_3.log: Time in seconds = 17.93 lu.C.x_threads_144_run_1.log: Time in seconds = 434.68 lu.C.x_threads_144_run_2.log: Time in seconds = 405.36 lu.C.x_threads_144_run_3.log: Time in seconds = 433.83 POST (+patch): ivb-ep sysbench: 2: [30 secs] transactions: 64494 (2149.75 per sec.) 5: [30 secs] transactions: 145114 (4836.99 per sec.) 10: [30 secs] transactions: 278311 (9276.69 per sec.) 20: [30 secs] transactions: 437169 (14571.60 per sec.) 40: [30 secs] transactions: 669837 (22326.73 per sec.) 80: [30 secs] transactions: 631739 (21055.88 per sec.) hsw-ex NAS: lu.C.x_threads_144_run_1.log: Time in seconds = 23.36 lu.C.x_threads_144_run_2.log: Time in seconds = 22.96 lu.C.x_threads_144_run_3.log: Time in seconds = 22.52 This patch takes out all the shiny wake_affine() stuff and goes back to utter basics. Between the two CPUs involved with the wakeup (the CPU doing the wakeup and the CPU we ran on previously) pick the CPU we can run on _now_. This restores much of the regressions against the older kernels, but leaves some ground in the overloaded case. The default-enabled WA_WEIGHT (which will be introduced in the next patch) is an attempt to address the overloaded situation. Reported-by: NEric Farman <farman@linux.vnet.ibm.com> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Matthew Rosato <mjrosato@linux.vnet.ibm.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rik van Riel <riel@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: jinpuwang@gmail.com Cc: vcaputo@pengaru.com Fixes: 3fed382b ("sched/numa: Implement NUMA node level wake_affine()") Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 09 10月, 2017 1 次提交
-
-
由 Shmulik Ladkani 提交于
Commit 2c16d603 ("netfilter: xt_bpf: support ebpf") introduced support for attaching an eBPF object by an fd, with the 'bpf_mt_check_v1' ABI expecting the '.fd' to be specified upon each IPT_SO_SET_REPLACE call. However this breaks subsequent iptables calls: # iptables -A INPUT -m bpf --object-pinned /sys/fs/bpf/xxx -j ACCEPT # iptables -A INPUT -s 5.6.7.8 -j ACCEPT iptables: Invalid argument. Run `dmesg' for more information. That's because iptables works by loading existing rules using IPT_SO_GET_ENTRIES to userspace, then issuing IPT_SO_SET_REPLACE with the replacement set. However, the loaded 'xt_bpf_info_v1' has an arbitrary '.fd' number (from the initial "iptables -m bpf" invocation) - so when 2nd invocation occurs, userspace passes a bogus fd number, which leads to 'bpf_mt_check_v1' to fail. One suggested solution [1] was to hack iptables userspace, to perform a "entries fixup" immediatley after IPT_SO_GET_ENTRIES, by opening a new, process-local fd per every 'xt_bpf_info_v1' entry seen. However, in [2] both Pablo Neira Ayuso and Willem de Bruijn suggested to depricate the xt_bpf_info_v1 ABI dealing with pinned ebpf objects. This fix changes the XT_BPF_MODE_FD_PINNED behavior to ignore the given '.fd' and instead perform an in-kernel lookup for the bpf object given the provided '.path'. It also defines an alias for the XT_BPF_MODE_FD_PINNED mode, named XT_BPF_MODE_PATH_PINNED, to better reflect the fact that the user is expected to provide the path of the pinned object. Existing XT_BPF_MODE_FD_ELF behavior (non-pinned fd mode) is preserved. References: [1] https://marc.info/?l=netfilter-devel&m=150564724607440&w=2 [2] https://marc.info/?l=netfilter-devel&m=150575727129880&w=2Reported-by: NRafael Buchbinder <rafi@rbk.ms> Signed-off-by: NShmulik Ladkani <shmulik.ladkani@gmail.com> Acked-by: NWillem de Bruijn <willemb@google.com> Acked-by: NDaniel Borkmann <daniel@iogearbox.net> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
- 04 10月, 2017 11 次提交
-
-
由 K. Y. Srinivasan 提交于
This patch addresses the following bugs in the current rescind handling code: 1. Fixes a race condition where we may be invoking hv_process_channel_removal() on an already freed channel. 2. Prevents indefinite wait when rescinding sub-channels by correctly setting the probe_complete state. I would like to thank Dexuan for patiently reviewing earlier versions of this patch and identifying many of the issues fixed here. Greg, please apply this to 4.14-final. Fixes: '54a66265 ("Drivers: hv: vmbus: Fix rescind handling")' Signed-off-by: NK. Y. Srinivasan <kys@microsoft.com> Reviewed-by: NDexuan Cui <decui@microsoft.com> Cc: stable@vger.kernel.org # (4.13 and above) Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
-
由 Thomas Gleixner 提交于
The rework of the core hotplug code triggers the WARN_ON in start_wd_cpu() on powerpc because it is called multiple times for the boot CPU. The first call is via: start_wd_on_cpu+0x80/0x2f0 watchdog_nmi_reconfigure+0x124/0x170 softlockup_reconfigure_threads+0x110/0x130 lockup_detector_init+0xbc/0xe0 kernel_init_freeable+0x18c/0x37c kernel_init+0x2c/0x160 ret_from_kernel_thread+0x5c/0xbc And then again via the CPU hotplug registration: start_wd_on_cpu+0x80/0x2f0 cpuhp_invoke_callback+0x194/0x620 cpuhp_thread_fun+0x7c/0x1b0 smpboot_thread_fn+0x290/0x2a0 kthread+0x168/0x1b0 ret_from_kernel_thread+0x5c/0xbc This can be avoided by setting up the cpu hotplug state with nocalls and move the initialization to the watchdog_nmi_probe() function. That initializes the hotplug callbacks without invoking the callback and the following core initialization function then configures the watchdog for the online CPUs (in this case CPU0) via softlockup_reconfigure_threads(). Reported-and-tested-by: NMichael Ellerman <mpe@ellerman.id.au> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org
-
由 Thomas Gleixner 提交于
The recent cleanup of the watchdog code split watchdog_nmi_reconfigure() into two stages. One to stop the NMI and one to restart it after reconfiguration. That was done by adding a boolean 'run' argument to the code, which is functionally correct but not necessarily a piece of art. Replace it by two explicit functions: watchdog_nmi_stop() and watchdog_nmi_start(). Fixes: 6592ad2f ("watchdog/core, powerpc: Make watchdog_nmi_reconfigure() two stage") Requested-by: NLinus 'Nursing his pet-peeve' Torvalds <torvalds@linuxfoundation.org> Signed-off-by: NThomas 'Mopping up garbage' Gleixner <tglx@linutronix.de> Acked-by: NMichael Ellerman <mpe@ellerman.id.au> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Don Zickus <dzickus@redhat.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Link: http://lkml.kernel.org/r/alpine.DEB.2.20.1710021957480.2114@nanos
-
由 Linus Walleij 提交于
In may, Steven sent a patch deleting the bounce buffer handling and the CONFIG_MMC_BLOCK_BOUNCE option. I chose the less invasive path of making it a runtime config option, and we merged that successfully for kernel v4.12. The code is however just standing in the way and taking up space for seemingly no gain on any systems in wide use today. Pierre says the code was there to improve speed on TI SDHCI controllers on certain HP laptops and possibly some Ricoh controllers as well. Early SDHCI controllers lacked the scatter-gather feature, which made software bounce buffers a significant speed boost. We are clearly talking about the list of SDHCI PCI-based MMC/SD card readers found in the pci_ids[] list in drivers/mmc/host/sdhci-pci-core.c. The TI SDHCI derivative is not supported by the upstream kernel. This leaves the Ricoh. What we can however notice is that the x86 defconfigs in the kernel did not enable CONFIG_MMC_BLOCK_BOUNCE option, which means that any such laptop would have to have a custom configured kernel to actually take advantage of this bounce buffer speed-up. It simply seems like there was a speed optimization for the Ricoh controllers that noone was using. (I have not checked the distro defconfigs but I am pretty sure the situation is the same there.) Bounce buffers increased performance on the OMAP HSMMC at one point, and was part of the original submission in commit a45c6cb8 ("[ARM] 5369/1: omap mmc: Add new omap hsmmc controller for 2430 and 34xx, v3") This optimization was removed in commit 0ccd76d4 ("omap_hsmmc: Implement scatter-gather emulation") which found that scatter-gather emulation provided even better performance. The same was introduced for SDHCI in commit 2134a922 ("sdhci: scatter-gather (ADMA) support") I am pretty positively convinced that software scatter-gather emulation will do for any host controller what the bounce buffers were doing. Essentially, the bounce buffer was a reimplementation of software scatter-gather-emulation in the MMC subsystem, and it should be done away with. Cc: Pierre Ossman <pierre@ossman.eu> Cc: Juha Yrjola <juha.yrjola@solidboot.com> Cc: Steven J. Hill <Steven.Hill@cavium.com> Cc: Shawn Lin <shawn.lin@rock-chips.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Suggested-by: NSteven J. Hill <Steven.Hill@cavium.com> Suggested-by: NShawn Lin <shawn.lin@rock-chips.com> Signed-off-by: NLinus Walleij <linus.walleij@linaro.org> Signed-off-by: NUlf Hansson <ulf.hansson@linaro.org>
-
由 Mike Rapoport 提交于
Before commit 9c5d760b ("mm: split gfp_mask and mapping flags into separate fields") the private_* fields of struct adrress_space were grouped together and using "ditto" in comments describing the last fields was correct. With introduction of gpf_mask between private_lock and private_list "ditto" references the wrong description. Fix it by using the elaborate description. Link: http://lkml.kernel.org/r/1507009987-8746-1-git-send-email-rppt@linux.vnet.ibm.comSigned-off-by: NMike Rapoport <rppt@linux.vnet.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 YASUAKI ISHIMATSU 提交于
pfn_to_section_nr() and section_nr_to_pfn() are defined as macro. pfn_to_section_nr() has no issue even if it is defined as macro. But section_nr_to_pfn() has overflow issue if sec is defined as int. section_nr_to_pfn() just shifts sec by PFN_SECTION_SHIFT. If sec is defined as unsigned long, section_nr_to_pfn() returns pfn as 64 bit value. But if sec is defined as int, section_nr_to_pfn() returns pfn as 32 bit value. __remove_section() calculates start_pfn using section_nr_to_pfn() and scn_nr defined as int. So if hot-removed memory address is over 16TB, overflow issue occurs and section_nr_to_pfn() does not calculate correct pfn. To make callers use proper arg, the patch changes the macros to inline functions. Fixes: 815121d2 ("memory_hotplug: clear zone when removing the memory") Link: http://lkml.kernel.org/r/e643a387-e573-6bbf-d418-c60c8ee3d15e@gmail.comSigned-off-by: NYasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Xishi Qiu <qiuxishi@huawei.com> Cc: Reza Arbab <arbab@linux.vnet.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Masahiro Yamada 提交于
I do not see anything that restricts this macro to 32 bit width. Link: http://lkml.kernel.org/r/1505921975-23379-1-git-send-email-yamada.masahiro@socionext.comSigned-off-by: NMasahiro Yamada <yamada.masahiro@socionext.com> Acked-by: NJakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Oleg Nesterov 提交于
Patch series "exec: binfmt_misc: fix use-after-free, kill iname[BINPRM_BUF_SIZE]". It looks like this code was always wrong, then commit 948b701a ("binfmt_misc: add persistent opened binary handler for containers") added more problems. This patch (of 6): load_script() can simply use i_name instead, it points into bprm->buf[] and nobody can change this memory until we call prepare_binprm(). The only complication is that we need to also change the signature of bprm_change_interp() but this change looks good too. While at it, do whitespace/style cleanups. NOTE: the real motivation for this change is that people want to increase BINPRM_BUF_SIZE, we need to change load_misc_binary() too but this looks more complicated because afaics it is very buggy. Link: http://lkml.kernel.org/r/20170918163446.GA26793@redhat.comSigned-off-by: NOleg Nesterov <oleg@redhat.com> Acked-by: NKees Cook <keescook@chromium.org> Cc: Travis Gummels <tgummels@redhat.com> Cc: Ben Woodard <woodard@redhat.com> Cc: Jim Foraker <foraker1@llnl.gov> Cc: <tdhooge@llnl.gov> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Sherry Yang 提交于
Drop the global lru lock in isolate callback before calling zap_page_range which calls cond_resched, and re-acquire the global lru lock before returning. Also change return code to LRU_REMOVED_RETRY. Use mmput_async when fail to acquire mmap sem in an atomic context. Fix "BUG: sleeping function called from invalid context" errors when CONFIG_DEBUG_ATOMIC_SLEEP is enabled. Also restore mmput_async, which was initially introduced in commit ec8d7c14 ("mm, oom_reaper: do not mmput synchronously from the oom reaper context"), and was removed in commit 21292580 ("mm: oom: let oom_reap_task and exit_mmap run concurrently"). Link: http://lkml.kernel.org/r/20170914182231.90908-1-sherryy@android.com Fixes: f2517eb7 ("android: binder: Add global lru shrinker to binder") Signed-off-by: NSherry Yang <sherryy@android.com> Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Reported-by: NKyle Yan <kyan@codeaurora.org> Acked-by: NArve Hjønnevåg <arve@android.com> Acked-by: NMichal Hocko <mhocko@suse.com> Cc: Martijn Coenen <maco@google.com> Cc: Todd Kjos <tkjos@google.com> Cc: Riley Andrews <riandrews@android.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Hillf Danton <hdanton@sina.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Hoeun Ryu <hoeun.ryu@gmail.com> Cc: Christopher Lameter <cl@linux.com> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Michal Hocko 提交于
Andrea has noticed that the oom_reaper doesn't invalidate the range via mmu notifiers (mmu_notifier_invalidate_range_start/end) and that can corrupt the memory of the kvm guest for example. tlb_flush_mmu_tlbonly already invokes mmu notifiers but that is not sufficient as per Andrea: "mmu_notifier_invalidate_range cannot be used in replacement of mmu_notifier_invalidate_range_start/end. For KVM mmu_notifier_invalidate_range is a noop and rightfully so. A MMU notifier implementation has to implement either ->invalidate_range method or the invalidate_range_start/end methods, not both. And if you implement invalidate_range_start/end like KVM is forced to do, calling mmu_notifier_invalidate_range in common code is a noop for KVM. For those MMU notifiers that can get away only implementing ->invalidate_range, the ->invalidate_range is implicitly called by mmu_notifier_invalidate_range_end(). And only those secondary MMUs that share the same pagetable with the primary MMU (like AMD iommuv2) can get away only implementing ->invalidate_range" As the callback is allowed to sleep and the implementation is out of hand of the MM it is safer to simply bail out if there is an mmu notifier registered. In order to not fail too early make the mm_has_notifiers check under the oom_lock and have a little nap before failing to give the current oom victim some more time to exit. [akpm@linux-foundation.org: coding-style fixes] Link: http://lkml.kernel.org/r/20170913113427.2291-1-mhocko@kernel.org Fixes: aac45363 ("mm, oom: introduce oom reaper") Signed-off-by: NMichal Hocko <mhocko@suse.com> Reported-by: NAndrea Arcangeli <aarcange@redhat.com> Reviewed-by: NAndrea Arcangeli <aarcange@redhat.com> Cc: <stable@vger.kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Kirill A. Shutemov 提交于
There's a typo in recent change of VM_MPX definition. We want it to be VM_HIGH_ARCH_4, not VM_HIGH_ARCH_BIT_4. This bug does cause visible regressions. In arch_vma_name the vmflags are tested against VM_MPX. With the incorrect value of VM_MPX, a number of vmas (such as the stack) test positive and end up being marked as "[mpx]" in /proc/N/maps instead of their correct names. This confuses tools like rr which expect to be able to find familiar vmas. Fixes: df3735c5 ("x86,mpx: make mpx depend on x86-64 to free up VMA flag") Link: http://lkml.kernel.org/r/20170918140253.36856-1-kirill.shutemov@linux.intel.comSigned-off-by: NKirill A. Shutemov <kirill.shutemov@linux.intel.com> Reviewed-by: NRik van Riel <riel@redhat.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Kyle Huey <me@kylehuey.com> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 9月, 2017 7 次提交
-
-
由 Artem Savkov 提交于
It is possible for ebt_in_hook to be triggered before ebt_table is assigned resulting in a NULL-pointer dereference. Make sure hooks are registered as the last step. Fixes: aee12a0a ("ebtables: remove nf_hook_register usage") Signed-off-by: NArtem Savkov <asavkov@redhat.com> Signed-off-by: NPablo Neira Ayuso <pablo@netfilter.org>
-
由 Peter Zijlstra 提交于
Currently TASK_PARKED is masqueraded as TASK_INTERRUPTIBLE, give it its own print state because it will not in fact get woken by regular wakeups and is a long-term state. This requires moving TASK_PARKED into the TASK_REPORT mask, and since that latter needs to be a contiguous bitmask, we need to shuffle the bits around a bit. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Markus reported that kthreads that idle using TASK_IDLE instead of TASK_INTERRUPTIBLE are reported in as TASK_UNINTERRUPTIBLE and things like htop mark those red. This is undesirable, so add an explicit state for TASK_IDLE. Reported-by: NMarkus Trippelsdorf <markus@trippelsdorf.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Remove yet another task-state char instance. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Convert trace_sched_switch to use the common task-state helpers and fix the "X" and "Z" order, possibly they ended up in the wrong order because TASK_REPORT has them in the wrong order too. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Bit patterns are easier in hex. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Currently get_task_state() and task_state_to_char() report different states, create a number of common helpers and unify the reported state space. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Signed-off-by: NIngo Molnar <mingo@kernel.org>
-
- 28 9月, 2017 3 次提交
-
-
由 Kees Cook 提交于
Modern kernel callback systems pass the structure associated with a given callback to the callback function. The timer callback remains one of the legacy cases where an arbitrary unsigned long argument continues to be passed as the callback argument. This has several problems: - This bloats the timer_list structure with a normally redundant .data field. - No type checking is being performed, forcing callbacks to do explicit type casts of the unsigned long argument into the object that was passed, rather than using container_of(), as done in most of the other callback infrastructure. - Neighboring buffer overflows can overwrite both the .function and the .data field, providing attackers with a way to elevate from a buffer overflow into a simplistic ROP-like mechanism that allows calling arbitrary functions with a controlled first argument. - For future Control Flow Integrity work, this creates a unique function prototype for timer callbacks, instead of allowing them to continue to be clustered with other void functions that take a single unsigned long argument. This adds a new timer initialization API, which will ultimately replace the existing setup_timer(), setup_{deferrable,pinned,etc}_timer() family, named timer_setup() (to mirror hrtimer_setup(), making instances of its use much easier to grep for). In order to support the migration of existing timers into the new callback arguments, timer_setup() casts its arguments to the existing legacy types, and explicitly passes the timer pointer as the legacy data argument. Once all setup_*timer() callers have been replaced with timer_setup(), the casts can be removed, and the data argument can be dropped with the timer expiration code changed to just pass the timer to the callback directly. Since the regular pattern of using container_of() during local variable declaration repeats the need for the variable type declaration to be included, this adds a helper modeled after other from_*() helpers that wrap container_of(), named from_timer(). This helper uses typeof(*variable), removing the type redundancy and minimizing the need for line wraps in forthcoming conversions from "unsigned data long" to "struct timer_list *" in the timer callbacks: -void callback(unsigned long data) +void callback(struct timer_list *t) { - struct some_data_structure *local = (struct some_data_structure *)data; + struct some_data_structure *local = from_timer(local, t, timer); Finally, in order to support the handful of timer users that perform open-coded assignments of the .function (and .data) fields, provide cast macros (TIMER_FUNC_TYPE and TIMER_DATA_TYPE) that can be used temporarily. Once conversion has been completed, these can be globally trivially removed. Signed-off-by: NKees Cook <keescook@chromium.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20170928133817.GA113410@beast
-
由 Raed Salem 提交于
Added check for the maximal number of flow counters attached to rule (FTE). Fixes: bd5251db ('net/mlx5_core: Introduce flow steering destination of type counter') Signed-off-by: NRaed Salem <raeds@mellanox.com> Reviewed-by: NMaor Gottlieb <maorg@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
由 Inbar Karmy 提交于
Currently, FPGA capability is located in (mdev)->caps.hca_cur, change the location to be (mdev)->caps.fpga, since hca_cur is reserved for HCA device capabilities. Fixes: e29341fb ("net/mlx5: FPGA, Add basic support for Innova") Signed-off-by: NInbar Karmy <inbark@mellanox.com> Signed-off-by: NSaeed Mahameed <saeedm@mellanox.com>
-
- 27 9月, 2017 1 次提交
-
-
由 Jean-Philippe Brucker 提交于
The definition of map_sg was split during a recent addition to iommu_ops. Put it back together. Fixes: add02cfd ("iommu: Introduce Interface for IOMMU TLB Flushing") Signed-off-by: NJean-Philippe Brucker <jean-philippe.brucker@arm.com> Signed-off-by: NJoerg Roedel <jroedel@suse.de>
-
- 26 9月, 2017 1 次提交
-
-
由 Peter Zijlstra 提交于
Add a sysfs file to one-time fail a specific state. This can be used to test the state rollback code paths. Something like this (hotplug-up.sh): #!/bin/bash echo 0 > /debug/sched_debug echo 1 > /debug/tracing/events/cpuhp/enable ALL_STATES=`cat /sys/devices/system/cpu/hotplug/states | cut -d':' -f1` STATES=${1:-$ALL_STATES} for state in $STATES do echo 0 > /sys/devices/system/cpu/cpu1/online echo 0 > /debug/tracing/trace echo Fail state: $state echo $state > /sys/devices/system/cpu/cpu1/hotplug/fail cat /sys/devices/system/cpu/cpu1/hotplug/fail echo 1 > /sys/devices/system/cpu/cpu1/online cat /debug/tracing/trace > hotfail-${state}.trace sleep 1 done Can be used to test for all possible rollback (barring multi-instance) scenarios on CPU-up, CPU-down is a trivial modification of the above. Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: bigeasy@linutronix.de Cc: efault@gmx.de Cc: rostedt@goodmis.org Cc: max.byungchul.park@gmail.com Link: https://lkml.kernel.org/r/20170920170546.972581715@infradead.org
-