- 13 3月, 2013 9 次提交
-
-
由 Tejun Heo 提交于
There are cases where workqueue users want to expose control knobs to userland. e.g. Unbound workqueues with custom attributes are scheduled to be used for writeback workers and depending on configuration it can be useful to allow admins to tinker with the priority or allowed CPUs. This patch implements workqueue_sysfs_register(), which makes the workqueue visible under /sys/bus/workqueue/devices/WQ_NAME. There currently are two attributes common to both per-cpu and unbound pools and extra attributes for unbound pools including nice level and cpumask. If alloc_workqueue*() is called with WQ_SYSFS, workqueue_sysfs_register() is called automatically as part of workqueue creation. This is the preferred method unless the workqueue user wants to apply workqueue_attrs before making the workqueue visible to userland. v2: Disallow exposing ordered workqueues as ordered workqueues can't be tuned in any way. Signed-off-by: NTejun Heo <tj@kernel.org>
-
由 Tejun Heo 提交于
Kay tells me the most appropriate place to expose workqueues to userland would be /sys/devices/virtual/workqueues/WQ_NAME which is symlinked to /sys/bus/workqueue/devices/WQ_NAME and that we're lacking a way to do that outside of driver core as virtual_device_parent() isn't exported and there's no inteface to conveniently create a virtual subsystem. This patch implements subsys_virtual_register() by factoring out subsys_register() from subsys_system_register() and using it with virtual_device_parent() as the origin directory. It's identical to subsys_system_register() other than the origin directory but we aren't gonna restrict the device names which should be used under it. This will be used to expose workqueue attributes to userland. Signed-off-by: NTejun Heo <tj@kernel.org> Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Kay Sievers <kay.sievers@vrfy.org>
-
由 Tejun Heo 提交于
We have cpulist_parse() but not cpumask_parse(). Implement it using bitmap_parse(). bitmap_parse() is weird in that it takes @len for a string in kernel-memory which also is inconsistent with bitmap_parselist(). Make cpumask_parse() calculate the length and don't expose the inconsistency to cpumask users. Maybe we can fix up bitmap_parse() later. This will be used to expose workqueue cpumask knobs to userland via sysfs. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au>
-
由 Tejun Heo 提交于
Adjusting max_active of or applying new workqueue_attrs to an ordered workqueue breaks its ordering guarantee. The former is obvious. The latter is because applying attrs creates a new pwq (pool_workqueue) and there is no ordering constraint between the old and new pwqs. Make apply_workqueue_attrs() and workqueue_set_max_active() trigger WARN_ON() if those operations are requested on an ordered workqueue and fail / ignore respectively. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
-
由 Tejun Heo 提交于
We're gonna add another internal WQ flag. Let's make the distinction clear. Prefix WQ_DRAINING with __ and move it to bit 16. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
-
由 Tejun Heo 提交于
Implement apply_workqueue_attrs() which applies workqueue_attrs to the specified unbound workqueue by creating a new pwq (pool_workqueue) linked to worker_pool with the specified attributes. A new pwq is linked at the head of wq->pwqs instead of tail and __queue_work() verifies that the first unbound pwq has positive refcnt before choosing it for the actual queueing. This is to cover the case where creation of a new pwq races with queueing. As base ref on a pwq won't be dropped without making another pwq the first one, __queue_work() is guaranteed to make progress and not add work item to a dead pwq. init_and_link_pwq() is updated to return the last first pwq the new pwq replaced, which is put by apply_workqueue_attrs(). Note that apply_workqueue_attrs() is almost identical to unbound pwq part of alloc_and_link_pwqs(). The only difference is that there is no previous first pwq. apply_workqueue_attrs() is implemented to handle such cases and replaces unbound pwq handling in alloc_and_link_pwqs(). Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
-
由 Tejun Heo 提交于
WQ_RESCUER is superflous. WQ_MEM_RECLAIM indicates that the user wants a rescuer and testing wq->rescuer for NULL can answer whether a given workqueue has a rescuer or not. Drop WQ_RESCUER and test wq->rescuer directly. This will help simplifying __alloc_workqueue_key() failure path by allowing it to use destroy_workqueue() on a partially constructed workqueue, which in turn will help implementing dynamic management of pool_workqueues. While at it, clear wq->rescuer after freeing it in destroy_workqueue(). This is a precaution as scheduled changes will make destruction more complex. This patch doesn't introduce any functional changes. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
-
由 Tejun Heo 提交于
Introduce struct workqueue_attrs which carries worker attributes - currently the nice level and allowed cpumask along with helper routines alloc_workqueue_attrs() and free_workqueue_attrs(). Each worker_pool now carries ->attrs describing the attributes of its workers. All functions dealing with cpumask and nice level of workers are updated to follow worker_pool->attrs instead of determining them from other characteristics of the worker_pool, and init_workqueues() is updated to set worker_pool->attrs appropriately for all standard pools. Note that create_worker() is updated to always perform set_user_nice() and use set_cpus_allowed_ptr() combined with manual assertion of PF_THREAD_BOUND instead of kthread_bind(). This simplifies handling random attributes without affecting the outcome. This patch doesn't introduce any behavior changes. v2: Missing cpumask_var_t definition caused build failure on some archs. linux/cpumask.h included. Signed-off-by: NTejun Heo <tj@kernel.org> Reported-by: Nkbuild test robot <fengguang.wu@intel.com> Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
-
由 Tejun Heo 提交于
Workqueue is mixing unsigned int and int for @cpu variables. There's no point in using unsigned int for cpus - many of cpu related APIs take int anyway. Consistently use int for @cpu variables so that we can use negative values to mark special ones. This patch doesn't introduce any visible behavior changes. Signed-off-by: NTejun Heo <tj@kernel.org> Reviewed-by: NLai Jiangshan <laijs@cn.fujitsu.com>
-
- 05 3月, 2013 1 次提交
-
-
由 Lai Jiangshan 提交于
When a work item is off-queue, its work->data contains WORK_STRUCT_* and WORK_OFFQ_* flags. As WORK_OFFQ_* flags are used only while a work item is off-queue, it can occupy bits of work->data which aren't used while off-queue. WORK_OFFQ_* currently only use bits used by on-queue CWQ pointer. As color bits aren't used while off-queue, there's no reason to not use them. Lower WORK_OFFQ_FLAG_BASE from WORK_STRUCT_FLAG_BITS to WORK_STRUCT_COLOR_SHIFT thus giving 4 more bits to off-queue flag space which is also used to record worker_pool ID while off-queue. This doesn't introduce any visible behavior difference. tj: Rewrote the description. Signed-off-by: NLai Jiangshan <laijs@cn.fujitsu.com> Signed-off-by: NTejun Heo <tj@kernel.org>
-
- 03 3月, 2013 3 次提交
-
-
由 James Hogan 提交于
Commit cc2383ec ("mm: introduce arch-specific vma flag VM_ARCH_1") merged in v3.7-rc1. The above commit combined several arch-specific vma flags into one, and in the process it changed the VM_GROWSUP definition to depend on specific architectures rather than CONFIG_STACK_GROWSUP. Therefore add an ifdef for CONFIG_METAG to also set VM_GROWSUP. Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Konstantin Khlebnikov <khlebnikov@openvz.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Michel Lespinasse <walken@google.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: linux-mm@kvack.org
-
由 James Hogan 提交于
Meta core internal interrupts (from HWSTATMETA and friends) are vectored onto the TR1 core trigger for the current thread. This is demultiplexed in irq-metag.c to individual Linux IRQs for each internal interrupt. External SoC interrupts (from HWSTATEXT and friends) are vectored onto the TR2 core trigger for the current thread. This is demultiplexed in irq-metag-ext.c to individual Linux IRQs for each external SoC interrupt. The external irqchip has devicetree bindings for configuring the number of irq banks and the type of masking available. Signed-off-by: NJames Hogan <james.hogan@imgtec.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Grant Likely <grant.likely@secretlab.ca> Cc: Rob Herring <rob.herring@calxeda.com> Cc: Rob Landley <rob@landley.net> Cc: Dom Cobley <popcornmix@gmail.com> Cc: Simon Arlott <simon@fire.lp0.eu> Cc: Viresh Kumar <viresh.kumar@linaro.org> Cc: Maxime Ripard <maxime.ripard@free-electrons.com> Cc: devicetree-discuss@lists.ozlabs.org Cc: linux-doc@vger.kernel.org
-
由 Yinghai Lu 提交于
Tim found: WARNING: at arch/x86/kernel/smpboot.c:324 topology_sane.isra.2+0x6f/0x80() Hardware name: S2600CP sched: CPU #1's llc-sibling CPU #0 is not on the same node! [node: 1 != 0]. Ignoring dependency. smpboot: Booting Node 1, Processors #1 Modules linked in: Pid: 0, comm: swapper/1 Not tainted 3.9.0-0-generic #1 Call Trace: set_cpu_sibling_map+0x279/0x449 start_secondary+0x11d/0x1e5 Don Morris reproduced on a HP z620 workstation, and bisected it to commit e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is ready") It turns out movable_map has some problems, and it breaks several things 1. numa_init is called several times, NOT just for srat. so those nodes_clear(numa_nodes_parsed) memset(&numa_meminfo, 0, sizeof(numa_meminfo)) can not be just removed. Need to consider sequence is: numaq, srat, amd, dummy. and make fall back path working. 2. simply split acpi_numa_init to early_parse_srat. a. that early_parse_srat is NOT called for ia64, so you break ia64. b. for (i = 0; i < MAX_LOCAL_APIC; i++) set_apicid_to_node(i, NUMA_NO_NODE) still left in numa_init. So it will just clear result from early_parse_srat. it should be moved before that.... c. it breaks ACPI_TABLE_OVERIDE...as the acpi table scan is moved early before override from INITRD is settled. 3. that patch TITLE is total misleading, there is NO x86 in the title, but it changes critical x86 code. It caused x86 guys did not pay attention to find the problem early. Those patches really should be routed via tip/x86/mm. 4. after that commit, following range can not use movable ram: a. real_mode code.... well..funny, legacy Node0 [0,1M) could be hot-removed? b. initrd... it will be freed after booting, so it could be on movable... c. crashkernel for kdump...: looks like we can not put kdump kernel above 4G anymore. d. init_mem_mapping: can not put page table high anymore. e. initmem_init: vmemmap can not be high local node anymore. That is not good. If node is hotplugable, the mem related range like page table and vmemmap could be on the that node without problem and should be on that node. We have workaround patch that could fix some problems, but some can not be fixed. So just remove that offending commit and related ones including: f7210e6c ("mm/memblock.c: use CONFIG_HAVE_MEMBLOCK_NODE_MAP to protect movablecore_map in memblock_overlaps_region().") 01a178a9 ("acpi, memory-hotplug: support getting hotplug info from SRAT") 27168d38 ("acpi, memory-hotplug: extend movablemem_map ranges to the end of node") e8d19552 ("acpi, memory-hotplug: parse SRAT before memblock is ready") fb06bc8e ("page_alloc: bootmem limit with movablecore_map") 42f47e27 ("page_alloc: make movablemem_map have higher priority") 6981ec31 ("page_alloc: introduce zone_movable_limit[] to keep movable limit for nodes") 34b71f1e ("page_alloc: add movable_memmap kernel parameter") 4d59a751 ("x86: get pg_data_t's memory from other node") Later we should have patches that will make sure kernel put page table and vmemmap on local node ram instead of push them down to node0. Also need to find way to put other kernel used ram to local node ram. Reported-by: NTim Gardner <tim.gardner@canonical.com> Reported-by: NDon Morris <don.morris@hp.com> Bisected-by: NDon Morris <don.morris@hp.com> Tested-by: NDon Morris <don.morris@hp.com> Signed-off-by: NYinghai Lu <yinghai@kernel.org> Cc: Tony Luck <tony.luck@intel.com> Cc: Thomas Renninger <trenn@suse.de> Cc: Tejun Heo <tj@kernel.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 02 3月, 2013 7 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Note that this thing does *not* contribute to inode refcount; it's pinned down by dentry. Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Alasdair G Kergon 提交于
Add a num_write_bios function to struct target. If an instance of a target sets this, it will be queried before the target's mapping function is called on a write bio, and the response controls the number of copies of the write bio that the target will receive. This provides a convenient way for a target to send the same data to more than one device. The new cache target uses this in writethrough mode, to send the data both to the cache and the backing device. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
This patch allows the administrator to reduce the rate at which kcopyd issues I/O. Each module that uses kcopyd acquires a throttle parameter that can be set in /sys/module/*/parameters. We maintain a history of kcopyd usage by each module in the variables io_period and total_period in struct dm_kcopyd_throttle. The actual kcopyd activity is calculated as a percentage of time equal to "(100 * io_period / total_period)". This is compared with the user-defined throttle percentage threshold and if it is exceeded, we sleep. Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Alasdair G Kergon 提交于
Use 'bio' in the name of variables and functions that deal with bios rather than 'request' to avoid confusion with the normal block layer use of 'request'. No functional changes. Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Mikulas Patocka 提交于
Avoid returning a truncated table or status string instead of setting the DM_BUFFER_FULL_FLAG when the last target of a table fills the buffer. When processing a table or status request, the function retrieve_status calls ti->type->status. If ti->type->status returns non-zero, retrieve_status assumes that the buffer overflowed and sets DM_BUFFER_FULL_FLAG. However, targets don't return non-zero values from their status method on overflow. Most targets returns always zero. If a buffer overflow happens in a target that is not the last in the table, it gets noticed during the next iteration of the loop in retrieve_status; but if a buffer overflow happens in the last target, it goes unnoticed and erroneously truncated data is returned. In the current code, the targets behave in the following way: * dm-crypt returns -ENOMEM if there is not enough space to store the key, but it returns 0 on all other overflows. * dm-thin returns errors from the status method if a disk error happened. This is incorrect because retrieve_status doesn't check the error code, it assumes that all non-zero values mean buffer overflow. * all the other targets always return 0. This patch changes the ti->type->status function to return void (because most targets don't use the return code). Overflow is detected in retrieve_status: if the status method fills up the remaining space completely, it is assumed that buffer overflow happened. Cc: stable@vger.kernel.org Signed-off-by: NMikulas Patocka <mpatocka@redhat.com> Signed-off-by: NAlasdair G Kergon <agk@redhat.com>
-
由 Randy Dunlap 提交于
Fix kernel-doc warnings in hsi files: Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'e_handler' description in 'hsi_client' Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'pclaimed' description in 'hsi_client' Warning(include/linux/hsi/hsi.h:136): Excess struct/union/enum/typedef member 'nb' description in 'hsi_client' Warning(drivers/hsi/hsi.c:434): No description found for parameter 'handler' Warning(drivers/hsi/hsi.c:434): Excess function parameter 'cb' description in 'hsi_register_port_event' Don't document "private:" fields with kernel-doc notation. If you want to leave them fully documented, that's OK, but then don't mark them as "private:". Signed-off-by: NRandy Dunlap <rdunlap@infradead.org> Cc: Carlos Chinea <carlos.chinea@nokia.com> Cc: Linus Walleij <linus.walleij@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: linux-kernel@vger.kernel.org Cc: linux-omap@vger.kernel.org Acked-by: NNishanth Menon <nm@ti.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 01 3月, 2013 5 次提交
-
-
由 Fabio Porcedda 提交于
Add support for watchdog drivers to initialize/set the timeout field of the watchdog_device structure. The timeout field is initialised either with the module timeout parameter value (if valid) or with the timeout-sec dt property (if valid). If both are invalid the initial value is unchanged. Signed-off-by: NFabio Porcedda <fabio.porcedda@gmail.com> Acked-by: NNicolas Ferre <nicolas.ferre@atmel.com> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
由 Hauke Mehrtens 提交于
Instead of accessing the function to set the watchdog timer directly, register a platform driver the platform could register to use this watchdog driver. Signed-off-by: NHauke Mehrtens <hauke@hauke-m.de> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
由 Wolfram Sang 提交于
This RTC also includes a watchdog timer. Provide an accessor function for setting the watchdog timeout value which will be picked up by a watchdog driver. Also register the platform_device for the watchdog here to get the boot-time dependencies right. Signed-off-by: NWolfram Sang <w.sang@pengutronix.de> Acked-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NWim Van Sebroeck <wim@iguana.be>
-
由 Weston Andros Adamson 提交于
The client will currently try LAYOUTGETs forever if a server is returning NFS4ERR_LAYOUTTRYLATER or NFS4ERR_RECALLCONFLICT - even if the client no longer needs the layout (ie process killed, unmounted). This patch uses the DS timeout value (module parameter 'dataserver_timeo' via rpc layer) to set an upper limit of how long the client tries LATOUTGETs in this situation. Once the timeout is reached, IO is redirected to the MDS. This also changes how the client checks if a layout is on the clp list to avoid a double list_add. Signed-off-by: NWeston Andros Adamson <dros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
由 Weston Andros Adamson 提交于
Returns the configured timeout for the xprt of the rpc client. Signed-off-by: NWeston Andros Adamson <dros@netapp.com> Signed-off-by: NTrond Myklebust <Trond.Myklebust@netapp.com>
-
- 28 2月, 2013 15 次提交
-
-
由 Arnd Bergmann 提交于
The original device tree binding for this driver, from Viresh Kumar unfortunately conflicted with the generic DMA binding, and did not allow to completely seperate slave device configuration from the controller. This is an attempt to replace it with an implementation of the generic binding, but it is currently completely untested, because I do not have any hardware with this particular controller. The patch applies on top of the slave-dma tree, which contains both the base support for the generic DMA binding, as well as the earlier attempt from Viresh. Both of these are currently not merged upstream however. This version incorporates feedback from Viresh Kumar, Andy Shevchenko and Russell King. Signed-off-by: NArnd Bergmann <arnd@arndb.de> Acked-by: NViresh Kumar <viresh.kumar@linaro.org> Acked-by: NAndy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Vinod Koul <vinod.koul@linux.intel.com> Cc: devicetree-discuss@lists.ozlabs.org Cc: linux-arm-kernel@lists.infradead.org Signed-off-by: NVinod Koul <vinod.koul@intel.com>
-
由 Sasha Levin 提交于
I'm not sure why, but the hlist for each entry iterators were conceived list_for_each_entry(pos, head, member) The hlist ones were greedy and wanted an extra parameter: hlist_for_each_entry(tpos, pos, head, member) Why did they need an extra pos parameter? I'm not quite sure. Not only they don't really need it, it also prevents the iterator from looking exactly like the list iterator, which is unfortunate. Besides the semantic patch, there was some manual work required: - Fix up the actual hlist iterators in linux/list.h - Fix up the declaration of other iterators based on the hlist ones. - A very small amount of places were using the 'node' parameter, this was modified to use 'obj->member' instead. - Coccinelle didn't handle the hlist_for_each_entry_safe iterator properly, so those had to be fixed up manually. The semantic patch which is mostly the work of Peter Senna Tschudin is here: @@ iterator name hlist_for_each_entry, hlist_for_each_entry_continue, hlist_for_each_entry_from, hlist_for_each_entry_rcu, hlist_for_each_entry_rcu_bh, hlist_for_each_entry_continue_rcu_bh, for_each_busy_worker, ax25_uid_for_each, ax25_for_each, inet_bind_bucket_for_each, sctp_for_each_hentry, sk_for_each, sk_for_each_rcu, sk_for_each_from, sk_for_each_safe, sk_for_each_bound, hlist_for_each_entry_safe, hlist_for_each_entry_continue_rcu, nr_neigh_for_each, nr_neigh_for_each_safe, nr_node_for_each, nr_node_for_each_safe, for_each_gfn_indirect_valid_sp, for_each_gfn_sp, for_each_host; type T; expression a,c,d,e; identifier b; statement S; @@ -T b; <+... when != b ( hlist_for_each_entry(a, - b, c, d) S | hlist_for_each_entry_continue(a, - b, c) S | hlist_for_each_entry_from(a, - b, c) S | hlist_for_each_entry_rcu(a, - b, c, d) S | hlist_for_each_entry_rcu_bh(a, - b, c, d) S | hlist_for_each_entry_continue_rcu_bh(a, - b, c) S | for_each_busy_worker(a, c, - b, d) S | ax25_uid_for_each(a, - b, c) S | ax25_for_each(a, - b, c) S | inet_bind_bucket_for_each(a, - b, c) S | sctp_for_each_hentry(a, - b, c) S | sk_for_each(a, - b, c) S | sk_for_each_rcu(a, - b, c) S | sk_for_each_from -(a, b) +(a) S + sk_for_each_from(a) S | sk_for_each_safe(a, - b, c, d) S | sk_for_each_bound(a, - b, c) S | hlist_for_each_entry_safe(a, - b, c, d, e) S | hlist_for_each_entry_continue_rcu(a, - b, c) S | nr_neigh_for_each(a, - b, c) S | nr_neigh_for_each_safe(a, - b, c, d) S | nr_node_for_each(a, - b, c) S | nr_node_for_each_safe(a, - b, c, d) S | - for_each_gfn_sp(a, c, d, b) S + for_each_gfn_sp(a, c, d) S | - for_each_gfn_indirect_valid_sp(a, c, d, b) S + for_each_gfn_indirect_valid_sp(a, c, d) S | for_each_host(a, - b, c) S | for_each_host_safe(a, - b, c, d) S | for_each_mesh_entry(a, - b, c, d) S ) ...+> [akpm@linux-foundation.org: drop bogus change from net/ipv4/raw.c] [akpm@linux-foundation.org: drop bogus hunk from net/ipv6/raw.c] [akpm@linux-foundation.org: checkpatch fixes] [akpm@linux-foundation.org: fix warnings] [akpm@linux-foudnation.org: redo intrusive kvm changes] Tested-by: NPeter Senna Tschudin <peter.senna@gmail.com> Acked-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: NSasha Levin <sasha.levin@oracle.com> Cc: Wu Fengguang <fengguang.wu@intel.com> Cc: Marcelo Tosatti <mtosatti@redhat.com> Cc: Gleb Natapov <gleb@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Martin Sustrik 提交于
Comment in eventfd.h referred to 'include/asm-generic/fcntl.h' while the correct path is 'include/uapi/asm-generic/fcntl.h'. Signed-off-by: NMartin Sustrik <sustrik@250bpm.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Robert P. J. Day 提交于
Given the obvious distinction between kernel and userspace supported by uapi/, it seems unnecessary to comment on that. Signed-off-by: NRobert P. J. Day <rpjday@crashcourse.ca> Signed-off-by: NCorey Minyard <cminyard@mvista.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
While idr lookup isn't a particularly heavy operation, it still is too substantial to use in hot paths without worrying about the performance implications. With recent changes, each idr_layer covers 256 slots which should be enough to cover most use cases with single idr_layer making lookup hint very attractive. This patch adds idr->hint which points to the idr_layer which allocated an ID most recently and the fast path lookup becomes if (look up target's prefix matches that of the hinted layer) return hint->ary[ID's offset in the leaf layer]; which can be inlined. idr->hint is set to the leaf node on idr_fill_slot() and cleared from free_layer(). [andriy.shevchenko@linux.intel.com: always do slow path when hint is uninitialized] Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Sasha Levin <sasha.levin@oracle.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
Add a field which carries the prefix of ID the idr_layer covers. This will be used to implement lookup hint. This patch doesn't make use of the new field and doesn't introduce any behavior difference. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
With recent preloading changes, idr no longer keeps full layer cache per each idr instance (used to be ~6.5k per idr on 64bit) and the previous patch removed restriction on the bitmap size. Both now allow us to have larger layers. Increase IDR_BITS to 8 regardless of BITS_PER_LONG. Each layer is slightly larger than 2k on 64bit and 1k on 32bit and carries 256 entries. The size isn't too large, especially compared to what we used to waste on per-idr caches, and 256 entries should be able to serve most use cases with single layer. The max tree depth is 4 which is much better than the previous 6 on 64bit and 7 on 32bit. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
Currently, idr->bitmap is declared as an unsigned long which restricts the number of bits an idr_layer can contain. All bitops can handle arbitrary positive integer bit number and there's no reason for this restriction. Declare idr_layer->bitmap using DECLARE_BITMAP() instead of a single unsigned long. * idr_layer->bitmap is now an array. '&' dropped from params to bitops. * Replaced "== IDR_FULL" tests with bitmap_full() and removed IDR_FULL. * Replaced find_next_bit() on ~bitmap with find_next_zero_bit(). * Replaced "bitmap = 0" with bitmap_clear(). This patch doesn't (or at least shouldn't) introduce any behavior changes. [akpm@linux-foundation.org: checkpatch fixes] Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
MAX_IDR_MASK is another weirdness in the idr interface. As idr covers whole positive integer range, it's defined as 0x7fffffff or INT_MAX. Its usage in idr_find(), idr_replace() and idr_remove() is bizarre. They basically mask off the sign bit and operate on the rest, so if the caller, by accident, passes in a negative number, the sign bit will be masked off and the remaining part will be used as if that was the input, which is worse than crashing. The constant is visible in idr.h and there are several users in the kernel. * drivers/i2c/i2c-core.c:i2c_add_numbered_adapter() Basically used to test if adap->nr is a negative number which isn't -1 and returns -EINVAL if so. idr_alloc() already has negative @start checking (w/ WARN_ON_ONCE), so this can go away. * drivers/infiniband/core/cm.c:cm_alloc_id() drivers/infiniband/hw/mlx4/cm.c:id_map_alloc() Used to wrap cyclic @start. Can be replaced with max(next, 0). Note that this type of cyclic allocation using idr is buggy. These are prone to spurious -ENOSPC failure after the first wraparound. * fs/super.c:get_anon_bdev() The ID allocated from ida is masked off before being tested whether it's inside valid range. ida allocated ID can never be a negative number and the masking is unnecessary. Update idr_*() functions to fail with -EINVAL when negative @id is specified and update other MAX_IDR_MASK users as described above. This leaves MAX_IDR_MASK without any user, remove it and relocate other MAX_IDR_* constants to lib/idr.c. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Jean Delvare <khali@linux-fr.org> Cc: Roland Dreier <roland@kernel.org> Cc: Sean Hefty <sean.hefty@intel.com> Cc: Hal Rosenstock <hal.rosenstock@gmail.com> Cc: "Marciniszyn, Mike" <mike.marciniszyn@intel.com> Cc: Jack Morgenstein <jackm@dev.mellanox.co.il> Cc: Or Gerlitz <ogerlitz@mellanox.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Acked-by: NWolfram Sang <wolfram@the-dreams.de> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
The current idr interface is very cumbersome. * For all allocations, two function calls - idr_pre_get() and idr_get_new*() - should be made. * idr_pre_get() doesn't guarantee that the following idr_get_new*() will not fail from memory shortage. If idr_get_new*() returns -EAGAIN, the caller is expected to retry pre_get and allocation. * idr_get_new*() can't enforce upper limit. Upper limit can only be enforced by allocating and then freeing if above limit. * idr_layer buffer is unnecessarily per-idr. Each idr ends up keeping around MAX_IDR_FREE idr_layers. The memory consumed per idr is under two pages but it makes it difficult to make idr_layer larger. This patch implements the following new set of allocation functions. * idr_preload[_end]() - Similar to radix preload but doesn't fail. The first idr_alloc() inside preload section can be treated as if it were called with @gfp_mask used for idr_preload(). * idr_alloc() - Allocate an ID w/ lower and upper limits. Takes @gfp_flags and can be used w/o preloading. When used inside preloaded section, the allocation mask of preloading can be assumed. If idr_alloc() can be called from a context which allows sufficiently relaxed @gfp_mask, it can be used by itself. If, for example, idr_alloc() is called inside spinlock protected region, preloading can be used like the following. idr_preload(GFP_KERNEL); spin_lock(lock); id = idr_alloc(idr, ptr, start, end, GFP_NOWAIT); spin_unlock(lock); idr_preload_end(); if (id < 0) error; which is much simpler and less error-prone than idr_pre_get and idr_get_new*() loop. The new interface uses per-pcu idr_layer buffer and thus the number of idr's in the system doesn't affect the amount of memory used for preloading. idr_layer_alloc() is introduced to handle idr_layer allocations for both old and new ID allocation paths. This is a bit hairy now but the new interface is expected to replace the old and the internal implementation eventually will become simpler. Signed-off-by: NTejun Heo <tj@kernel.org> Cc: Rusty Russell <rusty@rustcorp.com.au> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
idr uses -1, IDR_NEED_TO_GROW and IDR_NOMORE_SPACE to communicate exception conditions internally. The return value is later translated to errno values using _idr_rc_to_errno(). This is confusing. Drop the custom ones and consistently use -EAGAIN for "tree needs to grow", -ENOMEM for "need more memory" and -ENOSPC for "ran out of ID space". Due to the weird memory preloading mechanism, [ra]_get_new*() return -EAGAIN on memory shortage, so we need to substitute -ENOMEM w/ -EAGAIN on those interface functions. They'll eventually be cleaned up and the translations will go away. This patch doesn't introduce any functional changes. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
* Move idr_for_each_entry() definition next to other idr related definitions. * Make id[r|a]_get_new() inline wrappers of id[r|a]_get_new_above(). This changes the implementation of idr_get_new() but the new implementation is trivial. This patch doesn't introduce any functional change. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
* Tab align fields like a normal person. * Drop the unnecessary 0 inits from IDR_INIT(). This patch is purely cosmetic. Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Tejun Heo 提交于
There was only one legitimate use of idr_remove_all() and a lot more of incorrect uses (or lack of it). Now that idr_destroy() implies idr_remove_all() and all the in-kernel users updated not to use it, there's no reason to keep it around. Mark it deprecated so that we can later unexport it. idr_remove_all() is made an inline function calling __idr_remove_all() to avoid triggering deprecated warning on EXPORT_SYMBOL(). Signed-off-by: NTejun Heo <tj@kernel.org> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
由 Mandeep Singh Baines 提交于
We shouldn't try_to_freeze if locks are held. Holding a lock can cause a deadlock if the lock is later acquired in the suspend or hibernate path (e.g. by dpm). Holding a lock can also cause a deadlock in the case of cgroup_freezer if a lock is held inside a frozen cgroup that is later acquired by a process outside that group. [akpm@linux-foundation.org: export debug_check_no_locks_held] Signed-off-by: NMandeep Singh Baines <msb@chromium.org> Cc: Ben Chan <benchan@chromium.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Tejun Heo <tj@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: Ingo Molnar <mingo@redhat.com> Signed-off-by: NAndrew Morton <akpm@linux-foundation.org> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-