- 31 7月, 2015 1 次提交
-
-
由 Yijing Wang 提交于
Rajat Jain reported a deadlock when PCIe hot-add and AER recovery happen at the same time: thread 1: pciehp_enable_slot pciehp_configure_device pci_bus_add_devices pci_bus_add_device device_attach device_lock(dev) # acquire device lock ... pciehp_probe init_slot pci_hp_register pci_create_slot down_write(pci_bus_sem) # deadlock here thread 2: aer_isr_one_error aer_process_err_device do_recovery broadcast_error_message(..., report_error_detected) pci_walk_bus(..., cb=report_error_detected, ...) down_read(&pci_bus_sem) # acquire pci_bus_sem report_error_detected(dev) # cb() device_lock(dev) # deadlock here Previously, the bus->devices and bus->slots list were protected by pci_bus_sem. In pci_create_slot(), we held it for writing so we could add to the bus->slots list. Add a new local pci_slot_mutex to protect bus->slots. Hold pci_bus_sem for reading while searching the bus->devices list. [bhelgaas: changelog] Link: http://lkml.kernel.org/r/CAA93t1qpPqbih+UB0McA_d_+2rVaNkXsinAUxYzK9+JXSS+L-g@mail.gmail.comReported-by: NRajat Jain <rajatja@google.com> Tested-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NYijing Wang <wangyijing@huawei.com> Signed-off-by: NBjorn Helgaas <bhelgaas@google.com>
-
- 10 7月, 2015 1 次提交
-
-
由 Ilya Dryomov 提交于
Grab a reference on a network namespace of the 'rbd map' (in case of rbd) or 'mount' (in case of ceph) process and use that to open sockets instead of always using init_net and bailing if network namespace is anything but init_net. Be careful to not share struct ceph_client instances between different namespaces and don't add any code in the !CONFIG_NET_NS case. This is based on a patch from Hong Zhiguo <zhiguohong@tencent.com>. Signed-off-by: NIlya Dryomov <idryomov@gmail.com> Reviewed-by: NSage Weil <sage@redhat.com>
-
- 09 7月, 2015 1 次提交
-
-
由 Thomas Gleixner 提交于
All users gone. Remove it before we get another one. Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 08 7月, 2015 3 次提交
-
-
由 Thomas Gleixner 提交于
When a cpu goes up some architectures (e.g. x86) have to walk the irq space to set up the vector space for the cpu. While this needs extra protection at the architecture level we can avoid a few race conditions by preventing the concurrent allocation/free of irq descriptors and the associated data. When a cpu goes down it moves the interrupts which are targeted to this cpu away by reassigning the affinities. While this happens interrupts can be allocated and freed, which opens a can of race conditions in the code which reassignes the affinities because interrupt descriptors might be freed underneath. Example: CPU1 CPU2 cpu_up/down irq_desc = irq_to_desc(irq); remove_from_radix_tree(desc); raw_spin_lock(&desc->lock); free(desc); We could protect the irq descriptors with RCU, but that would require a full tree change of all accesses to interrupt descriptors. But fortunately these kind of race conditions are rather limited to a few things like cpu hotplug. The normal setup/teardown is very well serialized. So the simpler and obvious solution is: Prevent allocation and freeing of interrupt descriptors accross cpu hotplug. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Acked-by: NPeter Zijlstra <peterz@infradead.org> Cc: xiao jin <jin.xiao@intel.com> Cc: Joerg Roedel <jroedel@suse.de> Cc: Borislav Petkov <bp@suse.de> Cc: Yanmin Zhang <yanmin_zhang@linux.intel.com> Link: http://lkml.kernel.org/r/20150705171102.063519515@linutronix.de
-
由 Thomas Gleixner 提交于
Making tick_broadcast_oneshot_control() independent from CONFIG_GENERIC_CLOCKEVENTS_BROADCAST broke the build for CONFIG_GENERIC_CLOCKEVENTS=n because the function is not defined there. Provide a proper stub inline. Fixes: f32dd117 'tick/broadcast: Make idle check independent from mode and config' Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de>
-
由 Thomas Gleixner 提交于
Currently the broadcast busy check, which prevents the idle code from going into deep idle, works only in one shot mode. If NOHZ and HIGHRES are off (config or command line) there is no sanity check at all, so under certain conditions cpus are allowed to go into deep idle, where the local timer stops, and are not woken up again because there is no broadcast timer installed or a hrtimer based broadcast device is not evaluated. Move tick_broadcast_oneshot_control() into the common code and provide proper subfunctions for the various config combinations. The common check in tick_broadcast_oneshot_control() is for the C3STOP misfeature flag of the local clock event device. If its not set, idle can proceed. If set, further checks are necessary. Provide checks for the trivial cases: - If broadcast is disabled in the config, then return busy - If oneshot mode (NOHZ/HIGHES) is disabled in the config, return busy if the broadcast device is hrtimer based. - If oneshot mode is enabled in the config call the original tick_broadcast_oneshot_control() function. That function needs extra checks which will be implemented in seperate patches. [ Split out from a larger combo patch ] Reported-and-tested-by: NSudeep Holla <sudeep.holla@arm.com> Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Suzuki Poulose <Suzuki.Poulose@arm.com> Cc: Lorenzo Pieralisi <Lorenzo.Pieralisi@arm.com> Cc: Catalin Marinas <Catalin.Marinas@arm.com> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Preeti U Murthy <preeti@linux.vnet.ibm.com> Cc: Ingo Molnar <mingo@kernel.org> Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1507070929360.3916@nanos
-
- 07 7月, 2015 2 次提交
-
-
由 Suthikulpanit, Suravee 提交于
Device drivers typically use ACPI _HIDs/_CIDs listed in struct device_driver acpi_match_table to match devices. However, for generic drivers, we do not want to list _HID for all supported devices. Also, certain classes of devices do not have _CID (e.g. SATA, USB). Instead, we can leverage ACPI _CLS, which specifies PCI-defined class code (i.e. base-class, subclass and programming interface). This patch adds support for matching ACPI devices using the _CLS method. To support loadable module, current design uses _HID or _CID to match device's modalias. With the new way of matching with _CLS this would requires modification to the current ACPI modalias key to include _CLS. This patch appends PCI-defined class-code to the existing ACPI modalias as following. acpi:<HID>:<CID1>:<CID2>:..:<CIDn>:<bbsspp>: E.g: # cat /sys/devices/platform/AMDI0600:00/modalias acpi:AMDI0600:010601: where bb is th base-class code, ss is te sub-class code, and pp is the programming interface code Since there would not be _HID/_CID in the ACPI matching table of the driver, this patch adds a field to acpi_device_id to specify the matching _CLS. static const struct acpi_device_id ahci_acpi_match[] = { { ACPI_DEVICE_CLASS(PCI_CLASS_STORAGE_SATA_AHCI, 0xffffff) }, {}, }; In this case, the corresponded entry in modules.alias file would be: alias acpi*:010601:* ahci_platform Acked-by: NMika Westerberg <mika.westerberg@linux.intel.com> Reviewed-by: NHanjun Guo <hanjun.guo@linaro.org> Signed-off-by: NSuravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
由 Rafael J. Wysocki 提交于
This effectively reverts the following three commits: 7bc10388 ACPI / resources: free memory on error in add_region_before() 0f1b414d ACPI / PNP: Avoid conflicting resource reservations b9a5e5e1 ACPI / init: Fix the ordering of acpi_reserve_resources() (commit b9a5e5e1 introduced regressions some of which, but not all, were addressed by commit 0f1b414d and commit 7bc10388 was a fixup on top of the latter) and causes ACPI fixed hardware resources to be reserved at the fs_initcall_sync stage of system initialization. The story is as follows. First, a boot regression was reported due to an apparent resource reservation ordering change after a commit that shouldn't lead to such changes. Investigation led to the conclusion that the problem happened because acpi_reserve_resources() was executed at the device_initcall() stage of system initialization which wasn't strictly ordered with respect to driver initialization (and with respect to the initialization of the pcieport driver in particular), so a random change causing the device initcalls to be run in a different order might break things. The response to that was to attempt to run acpi_reserve_resources() as soon as we knew that ACPI would be in use (commit b9a5e5e1). However, that turned out to be too early, because it caused resource reservations made by the PNP system driver to fail on at least one system and that failure was addressed by commit 0f1b414d. That fix still turned out to be insufficient, though, because calling acpi_reserve_resources() before the fs_initcall stage of system initialization caused a boot regression to happen on the eCAFE EC-800-H20G/S netbook. That meant that we only could call acpi_reserve_resources() at the fs_initcall initialization stage or later, but then we might just as well call it after the PNP initalization in which case commit 0f1b414d wouldn't be necessary any more. For this reason, the changes made by commit 0f1b414d are reverted (along with a memory leak fixup on top of that commit), the changes made by commit b9a5e5e1 that went too far are reverted too and acpi_reserve_resources() is changed into fs_initcall_sync, which will cause it to be executed after the PNP subsystem initialization (which is an fs_initcall) and before device initcalls (including the pcieport driver initialization) which should avoid the initial issue. Link: https://bugzilla.kernel.org/show_bug.cgi?id=100581 Link: http://marc.info/?t=143092384600002&r=1&w=2 Link: https://bugzilla.kernel.org/show_bug.cgi?id=99831 Link: http://marc.info/?t=143389402600001&r=1&w=2 Fixes: b9a5e5e1 "ACPI / init: Fix the ordering of acpi_reserve_resources()" Reported-by: NRoland Dreier <roland@purestorage.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
-
- 05 7月, 2015 2 次提交
-
-
由 Allen Hubbe 提交于
Change ntb_hw_intel to use the new NTB hardware abstraction layer. Split ntb_transport into its own driver. Change it to use the new NTB hardware abstraction layer. Signed-off-by: NAllen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Allen Hubbe 提交于
Abstract the NTB device behind a programming interface, so that it can support different hardware and client drivers. Signed-off-by: NAllen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
- 04 7月, 2015 3 次提交
-
-
由 Srikar Dronamraju 提交于
Currently print_cfs_rq() is declared in include/linux/sched.h. However it's not used outside kernel/sched. Hence move the declaration to kernel/sched/sched.h Also some functions are only available for CONFIG_SCHED_DEBUG=y. Hence move the declarations to within the #ifdef. Signed-off-by: NSrikar Dronamraju <srikar@linux.vnet.ibm.com> Acked-by: NRik van Riel <riel@redhat.com> Cc: Iulia Manda <iulia.manda21@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1435252903-1081-2-git-send-email-srikar@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Naveen N. Rao 提交于
Both CONFIG_SCHEDSTATS=y and CONFIG_TASK_DELAY_ACCT=y track task sched_info, which results in ugly #if clauses. Simplify the code by introducing a synthethic CONFIG_SCHED_INFO switch, selected by both. Signed-off-by: NNaveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Balbir Singh <bsingharora@gmail.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: a.p.zijlstra@chello.nl Cc: ricklind@us.ibm.com Link: http://lkml.kernel.org/r/8d19eef800811a94b0f91bcbeb27430a884d7433.1435255405.git.naveen.n.rao@linux.vnet.ibm.comSigned-off-by: NIngo Molnar <mingo@kernel.org>
-
由 Peter Zijlstra 提交于
Commit 1cde2930 ("sched/preempt: Add static_key() to preempt_notifiers") had two problems. First, the preempt-notifier API needs to sleep with the addition of the static_key, we do however need to hold off preemption while modifying the preempt notifier list, otherwise a preemption could observe an inconsistent list state. KVM correctly registers and unregisters preempt notifiers with preemption disabled, so the sleep caused dmesg splats. Second, KVM registers and unregisters preemption notifiers very often (in vcpu_load/vcpu_put). With a single uniprocessor guest the static key would move between 0 and 1 continuously, hitting the slow path on every userspace exit. To fix this, wrap the static_key inc/dec in a new API, and call it from KVM. Fixes: 1cde2930 ("sched/preempt: Add static_key() to preempt_notifiers") Reported-by: NPontus Fuchs <pontus.fuchs@gmail.com> Reported-by: NTakashi Iwai <tiwai@suse.de> Tested-by: NTakashi Iwai <tiwai@suse.de> Signed-off-by: NPeter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: NPaolo Bonzini <pbonzini@redhat.com>
-
- 03 7月, 2015 1 次提交
-
-
由 Joel Porquet 提交于
At the moment the IRQCHIP_DECLARE macro is only declared locally in drivers/irqchip/irqchip.h. It prevents from using it directly in arch/* directories whenever irqchip drivers only exist there, which happens in a few cases (e.g. arc, arm, microblaze and mips). This patch makes the macro to be globally defined, i.e. in include/linux/irqchip.h, and thus usable for arch-specific declarations of irqchip drivers. In this way, it is very similar to what clocksource does (ie CLOCKSOURCE_OF_DECLARE is defined in include/linux/clocksource.h). For now, this patch only moves the declaration of the macro IRQCHIP_DECLARE to the global header 'include/linux/irqchip.h' and make 'drivers/irqchip/irqchip.h' include 'include/linux/irqchip.h'. Later, other patches will get rid of 'drivers/irqchip/irqchip.h' and modify all the impacted irqchip drivers. Signed-off-by: NJoel Porquet <joel@porquet.org> Cc: Jason Cooper <jason@lakedaemon.net> Link: http://lkml.kernel.org/r/1435865565-14114-1-git-send-email-joel@porquet.orgSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-
- 02 7月, 2015 3 次提交
-
-
由 Tejun Heo 提交于
52ebea74 ("writeback: make backing_dev_info host cgroup-specific bdi_writebacks") made bdi (backing_dev_info) host per-cgroup wb's (bdi_writeback's). As the congested state needs to be per-wb and referenced from blkcg side and multiple wbs, the patch made all non-root cong's (bdi_writeback_congested's) reference counted and indexed on bdi. When a bdi is destroyed, cgwb_bdi_destroy() tries to drain all non-root cong's; however, this can hang indefinitely because wb's can also be referenced from blkcg_gq's which are destroyed after bdi destruction is complete. To fix the bug, bdi destruction will be updated to not wait for cong's to drain, which naturally means that cong's may outlive the associated bdi. This is fine for non-root cong's but is problematic for the root cong's which are embedded in their bdi's as they may end up getting dereferenced after the containing bdi's are freed. This patch makes root cong's behave the same as non-root cong's. They are no longer embedded in their bdi's but allocated separately during bdi initialization, indexed and reference counted the same way. * As cong handling is the same for all wb's, wb->congested initialization is moved into wb_init(). * When !CONFIG_CGROUP_WRITEBACK, there was no indexing or refcnting. bdi->wb_congested is now a pointer pointing to the root cong allocated during bdi init and minimal refcnting operations are implemented. * The above makes root wb init paths diverge depending on CONFIG_CGROUP_WRITEBACK. root wb init is moved to cgwb_bdi_init(). This patch in itself shouldn't cause any consequential behavior differences but prepares for the actual fix. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: NJon Christopherson <jon@jons.org> Link: https://bugzilla.kernel.org/show_bug.cgi?id=100681Tested-by: NJon Christopherson <jon@jons.org> Added <linux/slab.h> include to backing-dev.h for kfree() definition. Signed-off-by: NJens Axboe <axboe@fb.com>
-
由 Allen Hubbe 提交于
This patch only moves files to their new locations, before applying the next two patches adding the NTB Abstraction layer. Splitting this patch from the next is intended make distinct which code is changed only due to moving the files, versus which are substantial code changes in adding the NTB Abstraction layer. Signed-off-by: NAllen Hubbe <Allen.Hubbe@emc.com> Signed-off-by: NJon Mason <jdmason@kudzu.us>
-
由 Nikolay Borisov 提交于
sb_getblk() is used during ext4 (and possibly other FSes) writeback paths. Sometimes such path require allocating memory and guaranteeing that such allocation won't block. Currently, however, there is no way to provide user flags for sb_getblk which could lead to deadlocks. This patch implements a sb_getblk_gfp with the only difference it can accept user-provided GFP flags. Signed-off-by: NNikolay Borisov <kernel@kyup.com> Signed-off-by: NTheodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org
-
- 01 7月, 2015 17 次提交
-
-
由 Eric W. Biederman 提交于
Add two functions sysfs_create_mount_point and sysfs_remove_mount_point that hang a permanently empty directory off of a kobject or remove a permanently emptpy directory hanging from a kobject. Export these new functions so modular filesystems can use them. Cc: stable@vger.kernel.org Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Add a new function kernfs_create_empty_dir that can be used to create directory that can not be modified. Update the code to use make_empty_dir_inode when reporting a permanently empty directory to the vfs. Update the code to not allow adding to permanently empty directories. Cc: stable@vger.kernel.org Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
Add a magic sysctl table sysctl_mount_point that when used to create a directory forces that directory to be permanently empty. Update the code to use make_empty_dir_inode when accessing permanently empty directories. Update the code to not allow adding to permanently empty directories. Update /proc/sys/fs/binfmt_misc to be a permanently empty directory. Cc: stable@vger.kernel.org Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric W. Biederman 提交于
To ensure it is safe to mount proc and sysfs I need to check if filesystems that are mounted on top of them are mounted on truly empty directories. Given that some directories can gain entries over time, knowing that a directory is empty right now is insufficient. Therefore add supporting infrastructure for permantently empty directories that proc and sysfs can use when they create mount points for filesystems and fs_fully_visible can use to test for permanently empty directories to ensure that nothing will be gained by mounting a fresh copy of proc or sysfs. Cc: stable@vger.kernel.org Signed-off-by: N"Eric W. Biederman" <ebiederm@xmission.com>
-
由 Eric Dumazet 提交于
Mateusz Guzik reported : Currently obtaining a new file descriptor results in locking fdtable twice - once in order to reserve a slot and second time to fill it. Holding the spinlock in __fd_install() is needed in case a resize is done, or to prevent a resize. Mateusz provided an RFC patch and a micro benchmark : http://people.redhat.com/~mguzik/pipebench.c A resize is an unlikely operation in a process lifetime, as table size is at least doubled at every resize. We can use RCU instead of the spinlock. __fd_install() must wait if a resize is in progress. The resize must block new __fd_install() callers from starting, and wait that ongoing install are finished (synchronize_sched()) resize should be attempted by a single thread to not waste resources. rcu_sched variant is used, as __fd_install() and expand_fdtable() run from process context. It gives us a ~30% speedup using pipebench on a dual Intel(R) Xeon(R) CPU E5-2696 v2 @ 2.50GHz Signed-off-by: NEric Dumazet <edumazet@google.com> Reported-by: NMateusz Guzik <mguzik@redhat.com> Acked-by: NMateusz Guzik <mguzik@redhat.com> Tested-by: NMateusz Guzik <mguzik@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Vladimir Zapolskiy 提交于
To be consistent with other kernel interface namings, rename of_get_named_gen_pool() to of_gen_pool_get(). In the original function name "_named" suffix references to a device tree property, which contains a phandle to a device and the corresponding device driver is assumed to register a gen_pool object. Due to a weak relation and to avoid any confusion (e.g. in future possible scenario if gen_pool objects are named) the suffix is removed. [sfr@canb.auug.org.au: crypto/marvell/cesa - fix up for of_get_named_gen_pool() rename] Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jaroslav Kysela <perex@perex.cz> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Cc: Herbert Xu <herbert@gondor.apana.org.au> Cc: Boris BREZILLON <boris.brezillon@free-electrons.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Vladimir Zapolskiy 提交于
To be consistent with other genalloc interface namings, rename dev_get_gen_pool() to gen_pool_get(). The original omitted "dev_" prefix is removed, since it points to argument type of the function, and so it does not bring any useful information. [akpm@linux-foundation.org: update arch/arm/mach-socfpga/pm.c] Signed-off-by: NVladimir Zapolskiy <vladimir_zapolskiy@mentor.com> Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com> Cc: Philipp Zabel <p.zabel@pengutronix.de> Cc: Shawn Guo <shawn.guo@linaro.org> Cc: Sascha Hauer <kernel@pengutronix.de> Cc: Alexandre Belloni <alexandre.belloni@free-electrons.com> Cc: Russell King <linux@arm.linux.org.uk> Cc: Mauro Carvalho Chehab <mchehab@osg.samsung.com> Cc: Vinod Koul <vinod.koul@intel.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Jaroslav Kysela <perex@perex.cz> Cc: Mark Brown <broonie@kernel.org> Cc: Nicolas Ferre <nicolas.ferre@atmel.com> Cc: Alan Tull <atull@opensource.altera.com> Cc: Dinh Nguyen <dinguyen@opensource.altera.com> Cc: Kevin Hilman <khilman@linaro.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Gordon 提交于
do_device_access() takes a separate parameter to indicate the direction of data transfer, which it used to use to select the appropriate function out of sg_pcopy_{to,from}_buffer(). However these two functions now have So this patch makes it bypass these wrappers and call the underlying function sg_copy_buffer() directly; this has the same calling style as do_device_access() i.e. a separate direction-of-transfer parameter and no pointers-to-const, so skipping the wrappers not only eliminates the warning, it also make the code simpler :) [akpm@linux-foundation.org: fix very broken build] Signed-off-by: NDave Gordon <david.s.gordon@intel.com> Acked-by: NArnd Bergmann <arnd@arndb.de> Cc: James Bottomley <James.Bottomley@HansenPartnership.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dave Gordon 提交于
The 'buf' parameter of sg(p)copy_from_buffer() can and should be const-qualified, although because of the shared implementation of _to_buffer() and _from_buffer(), we have to cast this away internally. This means that callers who have a 'const' buffer containing the data to be copied to the sg-list no longer have to cast away the const-ness themselves. It also enables improved coverage by code analysis tools. Signed-off-by: NDave Gordon <david.s.gordon@intel.com> Cc: Akinobu Mita <akinobu.mita@gmail.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 HATAYAMA Daisuke 提交于
Commit f06e5153 ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers") introduced "crash_kexec_post_notifiers" kernel boot option, which toggles wheather panic() calls crash_kexec() before panic_notifiers and dump kmsg or after. The problem is that the commit overlooks panic_on_oops kernel boot option. If it is enabled, crash_kexec() is called directly without going through panic() in oops path. To fix this issue, this patch adds a check to "crash_kexec_post_notifiers" in the condition of kexec_should_crash(). Also, put a comment in kexec_should_crash() to explain not obvious things on this patch. Signed-off-by: NHATAYAMA Daisuke <d.hatayama@jp.fujitsu.com> Acked-by: NBaoquan He <bhe@redhat.com> Tested-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Reviewed-by: NMasami Hiramatsu <masami.hiramatsu.pt@hitachi.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Hidehiro Kawai <hidehiro.kawai.ez@hitachi.com> Cc: Baoquan He <bhe@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
Waiman Long reported that 24TB machines hit OOM during basic setup when struct page initialisation was deferred. One approach is to initialise memory on demand but it interferes with page allocator paths. This patch creates dedicated threads to initialise memory before basic setup. It then blocks on a rw_semaphore until completion as a wait_queue and counter is overkill. This may be slower to boot but it's simplier overall and also gets rid of a section mangling which existed so kswapd could do the initialisation. [akpm@linux-foundation.org: include rwsem.h, use DECLARE_RWSEM, fix comment, remove unneeded cast] Signed-off-by: NMel Gorman <mgorman@suse.de> Cc: Waiman Long <waiman.long@hp.com Cc: Nathan Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Scott Norton <scott.norton@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
This patch initalises all low memory struct pages and 2G of the highest zone on each node during memory initialisation if CONFIG_DEFERRED_STRUCT_PAGE_INIT is set. That config option cannot be set but will be available in a later patch. Parallel initialisation of struct page depends on some features from memory hotplug and it is necessary to alter alter section annotations. Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
early_pfn_in_nid() and meminit_pfn_in_nid() are small functions that are unnecessarily visible outside memory initialisation. As well as unnecessary visibility, it's unnecessary function call overhead when initialising pages. This patch moves the helpers inline. [akpm@linux-foundation.org: fix build] [mhocko@suse.cz: fix build] Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NMichal Hocko <mhocko@suse.cz> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mel Gorman 提交于
__early_pfn_to_nid() use static variables to cache recent lookups as memblock lookups are very expensive but it assumes that memory initialisation is single-threaded. Parallel initialisation of struct pages will break that assumption so this patch makes __early_pfn_to_nid() SMP-safe by requiring the caller to cache recent search information. early_pfn_to_nid() keeps the same interface but is only safe to use early in boot due to the use of a global static variable. meminit_pfn_in_nid() is an SMP-safe version that callers must maintain their own state for. Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Nate Zimmer <nzimmer@sgi.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Nathan Zimmer 提交于
Currently each page struct is set as reserved upon initialization. This patch leaves the reserved bit clear and only sets the reserved bit when it is known the memory was allocated by the bootmem allocator. This makes it easier to distinguish between uninitialised struct pages and reserved struct pages in later patches. Signed-off-by: NRobin Holt <holt@sgi.com> Signed-off-by: NNathan Zimmer <nzimmer@sgi.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robin Holt 提交于
Struct page initialisation had been identified as one of the reasons why large machines take a long time to boot. Patches were posted a long time ago to defer initialisation until they were first used. This was rejected on the grounds it should not be necessary to hurt the fast paths. This series reuses much of the work from that time but defers the initialisation of memory to kswapd so that one thread per node initialises memory local to that node. After applying the series and setting the appropriate Kconfig variable I see this in the boot log on a 64G machine [ 7.383764] kswapd 0 initialised deferred memory in 188ms [ 7.404253] kswapd 1 initialised deferred memory in 208ms [ 7.411044] kswapd 3 initialised deferred memory in 216ms [ 7.411551] kswapd 2 initialised deferred memory in 216ms On a 1TB machine, I see [ 8.406511] kswapd 3 initialised deferred memory in 1116ms [ 8.428518] kswapd 1 initialised deferred memory in 1140ms [ 8.435977] kswapd 0 initialised deferred memory in 1148ms [ 8.437416] kswapd 2 initialised deferred memory in 1148ms Once booted the machine appears to work as normal. Boot times were measured from the time shutdown was called until ssh was available again. In the 64G case, the boot time savings are negligible. On the 1TB machine, the savings were 16 seconds. Nate Zimmer said: : On an older 8 TB box with lots and lots of cpus the boot time, as : measure from grub to login prompt, the boot time improved from 1484 : seconds to exactly 1000 seconds. Waiman Long said: : I ran a bootup timing test on a 12-TB 16-socket IvyBridge-EX system. From : grub menu to ssh login, the bootup time was 453s before the patch and 265s : after the patch - a saving of 188s (42%). Daniel Blueman said: : On a 7TB, 1728-core NumaConnect system with 108 NUMA nodes, we're seeing : stock 4.0 boot in 7136s. This drops to 2159s, or a 70% reduction with : this patchset. Non-temporal PMD init (https://lkml.org/lkml/2015/4/23/350) : drops this to 1045s. This patch (of 13): As part of initializing struct page's in 2MiB chunks, we noticed that at the end of free_all_bootmem(), there was nothing which had forced the reserved/allocated 4KiB pages to be initialized. This helper function will be used for that expansion. Signed-off-by: NRobin Holt <holt@sgi.com> Signed-off-by: NNate Zimmer <nzimmer@sgi.com> Signed-off-by: NMel Gorman <mgorman@suse.de> Tested-by: NNate Zimmer <nzimmer@sgi.com> Tested-by: NWaiman Long <waiman.long@hp.com> Tested-by: NDaniel J Blueman <daniel@numascale.com> Acked-by: NPekka Enberg <penberg@kernel.org> Cc: Robin Holt <robinmholt@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Waiman Long <waiman.long@hp.com> Cc: Scott Norton <scott.norton@hp.com> Cc: "Luck, Tony" <tony.luck@intel.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Dan Williams 提交于
Move the definition of __pmem outside of CONFIG_SPARSE_RCU_POINTER to fix: drivers/nvdimm/pmem.c:198:17: sparse: too many arguments for function __builtin_expect drivers/nvdimm/pmem.c:36:33: sparse: expected ; at end of declaration drivers/nvdimm/pmem.c:48:21: sparse: void declaration ...due to __pmem failing to be defined in some configurations when CONFIG_SPARSE_RCU_POINTER=y. Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Reported-by: NDan Carpenter <dan.carpenter@oracle.com> Signed-off-by: NDan Williams <dan.j.williams@intel.com>
-
- 30 6月, 2015 1 次提交
-
-
由 Christoph Lameter 提交于
This patch restores the slab creation sequence that was broken by commit 4066c33d and also reverts the portions that introduced the KMALLOC_LOOP_XXX macros. Those can never really work since the slab creation is much more complex than just going from a minimum to a maximum number. The latest upstream kernel boots cleanly on my machine with a 64 bit x86 configuration under KVM using either SLAB or SLUB. Fixes: 4066c33d ("support the slub_debug boot option") Reported-by: NTheodore Ts'o <tytso@mit.edu> Signed-off-by: NChristoph Lameter <cl@linux.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 29 6月, 2015 1 次提交
-
-
由 Jean-Baptiste Theou 提交于
Currently, watchdog subsystem require the misc subsystem to register a watchdog. This may not be the case in case of an early registration of a watchdog, which can be required when the watchdog cannot be disabled. This patch introduces a deferral mechanism to remove this requirement. Signed-off-by: NJean-Baptiste Theou <jtheou@adeneo-embedded.us> Reviewed-by: NGuenter Roeck <linux@roeck-us.net> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
- 28 6月, 2015 1 次提交
-
-
由 Rusty Russell 提交于
As Dan Streetman points out, the entire point of locking for is to stop sysfs accesses, so they're elided entirely in the !SYSFS case. Reported-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NRusty Russell <rusty@rustcorp.com.au>
-
- 27 6月, 2015 3 次提交
-
-
由 Trond Myklebust 提交于
Make it so, by checking the return value for NFS4ERR_MOTSUPP and caching the information as a server capability. Signed-off-by: NTrond Myklebust <trond.myklebust@primarydata.com>
-
由 Thomas Gleixner 提交于
The main use case for the exisiting __irq_set_*_locked() inlines is to replace the handler [,chip and name] of an interrupt from a region which has the irq descriptor lock held, e.g. from the irq_set_type() callback. The first argument is the irq number, so the functions need so perform a pointless lookup of the interrupt descriptor for those cases which have the irq_data pointer handy. Provide new functions which take an irq_data pointer instead of the interrupt number, so the lookup of the interrupt descriptor can be avoided. Signed-off-by: NThomas Gleixner <tglx@linutronix.de> Cc: Jiang Liu <jiang.liu@linux.intel.com> Conflicts: include/linux/irqdesc.h
-
由 Jiang Liu 提交于
Introduce helper irq_desc_get_irq() to retrieve the irq number from the irq descriptor. Signed-off-by: NJiang Liu <jiang.liu@linux.intel.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Marc Zyngier <marc.zyngier@arm.com> Link: http://lkml.kernel.org/r/1433391238-19471-17-git-send-email-jiang.liu@linux.intel.comSigned-off-by: NThomas Gleixner <tglx@linutronix.de>
-