- 03 January 2013, 13 commits
-
-
Committed by Bjorn Helgaas
This member is never initialized and never referenced, so remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
-
Committed by Rafael J. Wysocki
Drop the .bind() and .unbind() callbacks, which have no more users, from struct acpi_device_ops and remove all of the code referring to them from drivers/acpi/scan.c. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
-
Committed by Rafael J. Wysocki
Currently, the ACPI wakeup capability of PCI devices is set up in two different places: partially in acpi_pci_bind(), where runtime wakeup is initialized, and partially in platform_pci_wakeup_init(), where system wakeup is initialized. The cleanup is only done in acpi_pci_unbind() and it only covers runtime wakeup. Use the new .setup() and .cleanup() callbacks in struct acpi_bus_type to consolidate that code and do the setup and the cleanup each in one place. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
-
Committed by Rafael J. Wysocki
Add two new callbacks, .setup() and .cleanup(), to struct acpi_bus_type and modify acpi_platform_notify() to call .setup() after executing acpi_bind_one() successfully and acpi_platform_notify_remove() to call .cleanup() before running acpi_unbind_one(). This will allow the users of struct acpi_bus_type, PCI in particular, to specify operations to be executed right after the given device has been associated with a companion struct acpi_device and right before it is going to be detached from that companion, respectively. The main motivation is to be able to get rid of acpi_pci_bind() and acpi_pci_unbind(), which are horrible, horrible stuff. [In short, there are three problems with them: the way they populate the .bind() and .unbind() callbacks of ACPI devices is rather less than straightforward, they require special hotplug-specific paths to be present in the ACPI namespace scanning code, and by the time acpi_pci_unbind() is called the PCI device object in question may not exist any more.] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
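A minimal sketch of the shape this gives struct acpi_bus_type (field names and ordering here are approximate, not copied from the kernel header):

    struct acpi_bus_type {
            struct list_head list;
            struct bus_type *bus;
            int (*find_device)(struct device *dev, acpi_handle *handle);
            /* New hooks: .setup() runs right after acpi_bind_one() succeeds,
             * .cleanup() runs right before acpi_unbind_one(). */
            void (*setup)(struct device *dev);
            void (*cleanup)(struct device *dev);
    };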
-
Committed by Rafael J. Wysocki
The callers of acpi_bus_add() usually assume that if it has succeeded, then a struct acpi_device object has been attached to the handle passed as the first argument. Unfortunately, however, this assumption is wrong, because acpi_bus_scan(), and acpi_bus_add() too as a result, may return a pointer to a different struct acpi_device object on success (it may be an object corresponding to one of the descendant ACPI nodes in the namespace scope below that handle). For this reason, the callers of acpi_bus_add() who care about whether or not a struct acpi_device object has been created for its first argument need to check that using acpi_bus_get_device() anyway, so the second argument of acpi_bus_add() is not really useful for them. The same observation applies to acpi_bus_scan() executed directly from acpi_scan_init(). Therefore modify the relevant callers of acpi_bus_add() to check the existence of the struct acpi_device in question with the help of acpi_bus_get_device() and drop the no longer necessary second argument of acpi_bus_add(). Accordingly, modify acpi_scan_init() to use acpi_bus_get_device() to get acpi_root and drop the no longer needed second argument of acpi_bus_scan(). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
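A hedged sketch of the adjusted caller pattern described above (the wrapper function is hypothetical; acpi_bus_get_device() returns non-zero when no struct acpi_device exists for the handle):

    static int example_add(acpi_handle handle)        /* hypothetical caller */
    {
            struct acpi_device *adev;
            int err;

            err = acpi_bus_add(handle);               /* second argument is gone */
            if (err)
                    return err;
            if (acpi_bus_get_device(handle, &adev))   /* did a device node really appear? */
                    return -ENODEV;
            /* ... use adev ... */
            return 0;
    }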
-
Committed by Rafael J. Wysocki
After the removal of the second argument of acpi_bus_scan() there is no difference between the ACPI_BUS_ADD_MATCH and ACPI_BUS_ADD_START add types, so the add_type field in struct acpi_device may be replaced with a single flag. Do that, calling the flag match_driver. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
-
Committed by Rafael J. Wysocki
Notice that acpi_bus_add() uses only 2 of its 4 arguments and redefine its header to match the body. Update all of its callers as necessary and observe that this leads to quite a number of removed lines of code (Linus will like that). Add a kerneldoc comment documenting acpi_bus_add() and wonder how its callers make wrong assumptions about the second argument (make a note to self to take care of that later). Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
-
Committed by Rafael J. Wysocki
The ACPI PCI root bridge driver was the only ACPI driver implementing the .start() callback, which isn't used by any ACPI drivers any more now. For this reason, acpi_start_single_object() has no purpose any more, so remove it and all references to it. Also remove acpi_bus_start_device(), whose only purpose was to call acpi_start_single_object(). Moreover, since after the removal of acpi_bus_start_device() the only remaining purpose of acpi_bus_start() is to call acpi_update_all_gpes(), move that into acpi_bus_add(), drop acpi_bus_start() too, remove its header from acpi_bus.h and update all of its former users accordingly. This change was previously proposed in a different form by Yinghai Lu. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
-
Committed by Rafael J. Wysocki
Notice that one member of struct acpi_bus_ops, acpi_op_add, is not used anywhere any more and the relationship between its remaining members, acpi_op_match and acpi_op_start, is such that it doesn't make sense to set the latter without setting the former at the same time. Therefore, replace struct acpi_bus_ops with a new enum type, enum acpi_bus_add_type, with three values, ACPI_BUS_ADD_BASIC, ACPI_BUS_ADD_MATCH and ACPI_BUS_ADD_START, corresponding to both acpi_op_match and acpi_op_start unset, acpi_op_match set and acpi_op_start unset, and both acpi_op_match and acpi_op_start set, respectively. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
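Roughly, the replacement enum looks like this (a sketch following the description above, not the verbatim header):

    enum acpi_bus_add_type {
            ACPI_BUS_ADD_BASIC = 0,   /* neither acpi_op_match nor acpi_op_start */
            ACPI_BUS_ADD_MATCH,       /* acpi_op_match only */
            ACPI_BUS_ADD_START        /* both acpi_op_match and acpi_op_start */
    };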
-
Committed by Rafael J. Wysocki
Split the ACPI namespace scanning for devices into two passes, such that struct acpi_device objects are registered in the first pass without probing ACPI drivers and the drivers are probed against them directly in the second pass. There are two main reasons for doing that.

First, the ACPI PCI root bridge driver's .add() routine, acpi_pci_root_add(), causes struct pci_dev objects to be created for all PCI devices under the given root bridge. Usually, there are corresponding ACPI device nodes in the ACPI namespace for some of those devices and therefore there should be "companion" struct acpi_device objects to attach those struct pci_dev objects to. These struct acpi_device objects should exist when the corresponding struct pci_dev objects are created, but that is only guaranteed during boot and not during hotplug. This leads to a number of functional differences between the boot and the hotplug cases which are not strictly necessary and make the code more complicated. For example, this forces the ACPI PCI root bridge driver to defer the registration of the just created struct pci_dev objects and to use a special .start() callback routine, acpi_pci_root_start(), to make sure that all of the "companion" struct acpi_device objects will be present at PCI device registration time during hotplug. If those differences can be eliminated, we will be able to consolidate the boot and hotplug code paths for the enumeration and registration of PCI devices and to reduce the complexity of that code quite a bit.

The second reason is that, in general, it should be possible to resolve conflicts of resources assigned by the BIOS to different devices represented by ACPI namespace nodes before any drivers bind to them and before they are attached to "companion" objects representing physical devices (such as struct pci_dev). However, for this purpose we first need to enumerate all ACPI device nodes in the given namespace scope. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Yinghai Lu <yinghai@kernel.org> Acked-by: Toshi Kani <toshi.kani@hp.com>
-
Committed by David Howells
Empty files can get deleted by the patch program, so remove empty Kbuild files and their links from the parent Kbuilds. Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Mel Gorman
Sasha was fuzzing with trinity and reported the following problem:

BUG: sleeping function called from invalid context at kernel/mutex.c:269 in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main 2 locks held by trinity-main/6361: #0: (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0 #1: (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0 Pid: 6361, comm: trinity-main Tainted: G W 3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74 Call Trace: __might_sleep+0x1c3/0x1e0 mutex_lock_nested+0x29/0x50 mpol_shared_policy_lookup+0x2e/0x90 shmem_get_policy+0x2e/0x30 get_vma_policy+0x5a/0xa0 mpol_misplaced+0x41/0x1d0 handle_pte_fault+0x465/0x6a0

This was triggered by a different version of automatic NUMA balancing, but in theory the current version is vulnerable to the same problem: do_numa_page -> numa_migrate_prep -> mpol_misplaced -> get_vma_policy -> shmem_get_policy. It's very unlikely this will happen as shared pages are not marked pte_numa -- see the page_mapcount() check in change_pte_range() -- but it is possible. To address this, this patch restores sp->lock as originally implemented by Kosaki Motohiro. In the path where get_vma_policy() is called, it should not be calling sp_alloc(), so it is not necessary to treat the PTL specially. Signed-off-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Signed-off-by: Mel Gorman <mgorman@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Committed by Hugh Dickins
Remove the unused argument (formerly no_context) from mpol_parse_str() and from mpol_to_str(). Signed-off-by: Hugh Dickins <hughd@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- 27 December 2012, 4 commits
-
-
Committed by Christoffer Dall
Unfortunately with !CONFIG_PAGEFLAGS_EXTENDED, (!PageHead) is false, and (PageHead) is true, for tail pages. If this is indeed the intended behavior, which I doubt because it breaks cache cleaning on some ARM systems, then the nomenclature is highly problematic. This patch makes sure PageHead is only true for head pages and PageTail is only true for tail pages, and neither is true for non-compound pages. [ This buglet seems ancient - it seems to have been introduced back in April 2008 in commit 6a1e7f77: "pageflags: convert to the use of new macros". And the reason nobody noticed is that the PageHead() tests are almost all about just sanity-checking, and only used on pages that are actual page heads. The fact that the old code returned true for tail pages too was thus not really noticeable. - Linus ] Signed-off-by: Christoffer Dall <cdall@cs.columbia.edu> Acked-by: Andrea Arcangeli <aarcange@redhat.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Will Deacon <Will.Deacon@arm.com> Cc: Steve Capper <Steve.Capper@arm.com> Cc: Christoph Lameter <cl@linux.com> Cc: stable@kernel.org # 2.6.26+ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
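The guaranteed semantics after the fix can be expressed as assertions; this is a hedged illustration (check_flags, head, tail and plain are hypothetical, and the real patch works by adjusting the page-flags macros rather than adding checks like these):

    static void check_flags(struct page *head, struct page *tail, struct page *plain)
    {
            BUG_ON(!PageHead(head) || PageTail(head));    /* head page: head only  */
            BUG_ON(PageHead(tail)  || !PageTail(tail));   /* tail page: tail only  */
            BUG_ON(PageHead(plain) || PageTail(plain));   /* non-compound: neither */
    }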
-
Committed by Li Zefan
sock->sk_cgrp_prioidx won't be used at all if CONFIG_NETPRIO_CGROUP=n. Signed-off-by: Li Zefan <lizefan@huawei.com> Acked-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Committed by Andy Lutomirski
Otherwise it fails like this on cards like the Transcend 16GB SDHC card: mmc0: new SDHC card at address b368 mmcblk0: mmc0:b368 SDC 15.0 GiB mmcblk0: error -110 sending status command, retrying mmcblk0: error -84 transferring data, sector 0, nr 8, cmd response 0x900, card status 0xb0 Tested on my Lenovo x200 laptop. [bhelgaas: changelog] Signed-off-by: Andy Lutomirski <luto@amacapital.net> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Chris Ball <cjb@laptop.org> CC: Manoj Iyer <manoj.iyer@canonical.com> CC: stable@vger.kernel.org
-
Committed by Bjorn Helgaas
Add standard #defines for the Supported Link Speeds field in the PCIe Link Capabilities register. Note that prior to PCIe spec r3.0, these encodings were defined: 0001b means 2.5 GT/s link speed supported, and 0010b means 5.0 GT/s and 2.5 GT/s link speeds supported. Starting with spec r3.0, these encodings refer to bits 0 and 1 in the Supported Link Speeds Vector in the Link Capabilities 2 register, and bits 0 and 1 there mean 2.5 GT/s and 5.0 GT/s, respectively. Therefore, code that followed r2.0 and interpreted 0x1 as 2.5 GT/s and 0x2 as 5.0 GT/s will continue to work, and we can identify a device using the new encodings because it will have a non-zero Link Capabilities 2 register. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
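The new constants presumably take roughly this form in pci_regs.h (a hedged sketch; exact names and values should be checked against the header):

    #define PCI_EXP_LNKCAP_SLS        0x0000000f  /* Supported Link Speeds */
    #define PCI_EXP_LNKCAP_SLS_2_5GB  0x00000001  /* 2.5 GT/s (SLS Vector bit 0 in r3.0) */
    #define PCI_EXP_LNKCAP_SLS_5_0GB  0x00000002  /* 5.0 GT/s (SLS Vector bit 1 in r3.0) */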
-
- 26 December 2012, 3 commits
-
-
Committed by Eric W. Biederman
Oleg pointed out that in a pid namespace the sequence:
- pid 1 becomes a zombie
- setns(thepidns), fork, ...
- reaping pid 1
- the injected processes exiting
can lead to processes attempting to access their child reaper and instead following a stale pointer. That waitpid for init can return before all of the processes in the pid namespace have exited is also unfortunate. Avoid these problems by disabling the allocation of new pids in a pid namespace when init dies, instead of when the last process in a pid namespace is reaped. Pointed-out-by: Oleg Nesterov <oleg@redhat.com> Reviewed-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
-
Committed by Jan Kara
We cannot wait for transaction commit in journal_unmap_buffer() because we hold the page lock, which ranks below transaction start. We solve the issue by bailing out of journal_unmap_buffer() and jbd2_journal_invalidatepage() with -EBUSY. The caller is then responsible for waiting for transaction commit to finish and trying the invalidation again. Since the issue can happen only for a page straddling i_size, it is simple enough to manually call jbd2_journal_invalidatepage() for such a page from ext4_setattr(), check the return value and wait if necessary. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
Committed by Jan Kara
In data=journal mode we don't need delalloc or DIO handling in invalidatepage, and similarly in other modes we don't need the journal handling. So split the invalidatepage implementations. Signed-off-by: Jan Kara <jack@suse.cz> Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
-
- 22 December 2012, 6 commits
-
-
Committed by Eric Dumazet
Using a seqlock for devnet_rename_seq is not a good idea, as device_rename() can sleep. As we hold RTNL, we don't need protection for writers, and only need a seqcount so that readers can catch a change done by a writer. Bug added in commit c91f6df2 (sockopt: Change getsockopt() of SO_BINDTODEVICE to return an interface name). Reported-by: Dave Jones <davej@redhat.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Brian Haley <brian.haley@hp.com> Signed-off-by: David S. Miller <davem@davemloft.net>
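A hedged sketch of the resulting pattern (function names are made up for illustration): writers are already serialized by RTNL and may sleep, so a bare seqcount is enough for readers to detect a racing rename and retry.

    static seqcount_t devnet_rename_seq;

    static void rename_dev(struct net_device *dev, const char *newname)
    {
            ASSERT_RTNL();                            /* writer side, may sleep */
            write_seqcount_begin(&devnet_rename_seq);
            strlcpy(dev->name, newname, IFNAMSIZ);
            write_seqcount_end(&devnet_rename_seq);
    }

    static void copy_dev_name(char *name, const struct net_device *dev)
    {
            unsigned int seq;

            do {                                      /* reader retries if a rename raced */
                    seq = read_seqcount_begin(&devnet_rename_seq);
                    strlcpy(name, dev->name, IFNAMSIZ);
            } while (read_seqcount_retry(&devnet_rename_seq, seq));
    }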
-
Committed by Mikulas Patocka
This patch removes map_info from bio-based device-mapper targets. map_info is still used for request-based targets. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mikulas Patocka
This patch moves target_request_nr from map_info to dm_target_io and makes it accessible with dm_bio_get_target_request_nr. This patch is a preparation for the next patch, which removes map_info. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mikulas Patocka
Introduce a field per_bio_data_size in struct dm_target. Targets can set this field in the constructor. If a target sets this field to a non-zero value, "per_bio_data_size" bytes of auxiliary data are allocated for each bio submitted to the target. These data can be used for any purpose by the target and help us improve performance by removing some per-target mempools. Per-bio data is accessed with dm_per_bio_data. The argument data_size must be the same as the value of per_bio_data_size in dm_target. If the target has a pointer to per_bio_data, it can get a pointer to the bio with the dm_bio_from_per_bio_data() function (data_size must be the same as the value passed to dm_per_bio_data). Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
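A hedged sketch of how a target might use this; the my_* names and the timestamp field are invented for illustration, and the map-function signature assumes the bio-based form used after map_info was removed.

    struct my_per_bio {                        /* hypothetical per-bio payload */
            unsigned long start_jiffies;
    };

    static int my_ctr(struct dm_target *ti, unsigned int argc, char **argv)
    {
            ti->per_bio_data_size = sizeof(struct my_per_bio);
            return 0;
    }

    static int my_map(struct dm_target *ti, struct bio *bio)
    {
            struct my_per_bio *pb = dm_per_bio_data(bio, sizeof(struct my_per_bio));

            pb->start_jiffies = jiffies;       /* stamp the bio without a mempool */
            return DM_MAPIO_REMAPPED;
    }

dm_bio_from_per_bio_data() goes the other way, recovering the bio from a per-bio-data pointer with the same data_size.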
-
Committed by Mike Snitzer
Allow targets to opt in to WRITE SAME support by setting 'num_write_same_requests' in the dm_target structure. A dm device will only advertise WRITE SAME support if all its targets and all its underlying devices support it. Signed-off-by: Mike Snitzer <snitzer@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
Committed by Mikulas Patocka
When allocating memory for the userspace ioctl data, set some appropriate GFP flags directly instead of using PF_MEMALLOC. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Signed-off-by: Alasdair G Kergon <agk@redhat.com>
-
- 21 December 2012, 14 commits
-
-
Committed by Guenter Roeck
Commit 263a523d ("linux/kernel.h: Fix warning seen with W=1 due to change in DIV_ROUND_CLOSEST") fixes a warning seen with W=1 due to a change in DIV_ROUND_CLOSEST. Unfortunately, the C compiler converts divide operations with unsigned divisors to unsigned, even if the dividend is signed and negative (for example, -10 / 5U = 858993457). The C standard says "If one operand has unsigned int type, the other operand is converted to unsigned int", so the compiler is not to blame. As a result, DIV_ROUND_CLOSEST(0, 2U) and similar operations now return bad values, since the automatic conversion of expressions such as "0 - 2U/2" to unsigned was not taken into account.

Fix by checking the divisor variable type when deciding which operation to perform. This fixes DIV_ROUND_CLOSEST(0, 2U), but still returns bad values for negative dividends divided by unsigned divisors. Mark the latter case as unsupported. One observed effect of this problem is that the s2c_hwmon driver reports a value of 4198403 instead of 0 if the ADC reads 0. Other impact is unpredictable. The problem is seen if the divisor is an unsigned variable or constant and the dividend is less than (divisor/2). Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reported-by: Juergen Beisert <jbe@pengutronix.de> Tested-by: Juergen Beisert <jbe@pengutronix.de> Cc: Jean Delvare <khali@linux-fr.org> Cc: <stable@vger.kernel.org> [3.7.x] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
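A small userspace illustration of the conversion rule being described (compiles with any C compiler on a platform with 32-bit int):

    #include <stdio.h>

    int main(void)
    {
            int x = -10;
            unsigned int d = 5U;

            /* The usual arithmetic conversions make the whole expression unsigned,
             * so the negative dividend wraps before the division happens. */
            printf("%u\n", x / d);        /* prints 858993457, not -2 */
            printf("%d\n", x / (int)d);   /* prints -2 once the divisor is signed */
            return 0;
    }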
-
Committed by Kees Cook
If a series of scripts are executed, each triggering module loading via unprintable bytes in the script header, kernel stack contents can leak into the command line.

Normally, execution of binfmt_script and binfmt_misc happens recursively. However, when modules are enabled and unprintable bytes exist in bprm->buf, execution will restart after attempting to load matching binfmt modules. Unfortunately, the logic in binfmt_script and binfmt_misc does not expect to get restarted. They leave bprm->interp pointing to their local stack. This means on restart bprm->interp is left pointing into unused stack memory, which can then be copied into the userspace argv areas.

After additional study, it seems that both recursion and restart remain the desirable way to handle exec with scripts, misc, and modules. As such, we need to protect the changes to interp. This changes the logic to require allocation for any changes to bprm->interp. To avoid adding a new kmalloc to every exec, the default value is left as-is. Only when passing through binfmt_script or binfmt_misc does an allocation take place. For a proof of concept, see DoTest.sh from: http://www.halfdog.net/Security/2012/LinuxKernelBinfmtScriptStackDataDisclosure/ Signed-off-by: Kees Cook <keescook@chromium.org> Cc: halfdog <me@halfdog.net> Cc: P J P <ppandit@redhat.com> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
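One plausible shape for the allocation-based update is sketched below; the helper name and the assumption that bprm->interp initially aliases bprm->filename are illustrative, not the verified kernel code.

    /* Replace bprm->interp only via a heap copy, never with a pointer into a
     * binfmt handler's stack frame. */
    static int change_interp(const char *interp, struct linux_binprm *bprm)
    {
            char *copy = kstrdup(interp, GFP_KERNEL);

            if (!copy)
                    return -ENOMEM;
            if (bprm->interp != bprm->filename)   /* free a previous allocation */
                    kfree(bprm->interp);
            bprm->interp = copy;
            return 0;
    }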
-
Committed by Jeff Layton
Where we can pass in LOOKUP_DIRECTORY or LOOKUP_REVAL. Any other flags passed in here are currently ignored. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Jeff Layton
This function is expected to be called from path-based syscalls to help them decide whether to try the lookup and call again in the event that they got an -ESTALE return back on an earlier try. Currently, we only retry the call once on an ESTALE error, but in the event that we decide that's not enough in the future, we should be able to change the logic in this helper without too much effort. Signed-off-by: Jeff Layton <jlayton@redhat.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
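A hedged sketch of how a path-based syscall might use such a helper; example_syscall and do_path_based_thing are hypothetical, only the retry_estale()/LOOKUP_REVAL pattern follows the description above.

    static long example_syscall(const char __user *pathname)
    {
            unsigned int lookup_flags = LOOKUP_FOLLOW;
            long error;

    retry:
            error = do_path_based_thing(pathname, lookup_flags);
            if (retry_estale(error, lookup_flags)) {
                    lookup_flags |= LOOKUP_REVAL;   /* revalidate dentries on the single retry */
                    goto retry;
            }
            return error;
    }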
-
Committed by Alessio Igor Bogani
Commit 8e22cc88 removes the (un)lock_super function definitions but forgets to remove their prototypes. Signed-off-by: Alessio Igor Bogani <abogani@kernel.org> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Marco Stornelli
Removed vmtruncate. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by Marco Stornelli
Removed vmtruncate. Signed-off-by: Marco Stornelli <marco.stornelli@gmail.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by David Howells
Mark as cancelled an operation that is in progress rather than pending at the time it is cancelled, and call fscache_complete_op() to cancel an operation so that blocked ops can be started. Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Convert the fscache_object event IDs from #defines into an enum. Also add an extra label to the enum to carry the event count and redefine the event mask in terms of that. Signed-off-by: David Howells <dhowells@redhat.com>
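The pattern described is roughly the following (the event names shown are illustrative, not the exact fscache list):

    enum {
            FSCACHE_OBJECT_EV_UPDATE,        /* ...example events... */
            FSCACHE_OBJECT_EV_RELEASE,
            FSCACHE_OBJECT_EV_RETIRE,
            FSCACHE_OBJECT_EV_WITHDRAW,
            NR_FSCACHE_OBJECT_EVENTS         /* extra label carrying the event count */
    };

    #define FSCACHE_OBJECT_EVENTS_MASK ((1UL << NR_FSCACHE_OBJECT_EVENTS) - 1)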
-
Committed by David Howells
Make a more complete truncate operation available to CacheFiles (including security checks and suchlike) so that it can use this to clear invalidated cache files. Signed-off-by: David Howells <dhowells@redhat.com> Acked-by: Al Viro <viro@zeniv.linux.org.uk>
-
Committed by David Howells
Provide a proper invalidation method rather than relying on the netfs retiring the cookie it has and getting a new one. The problem with that is that it isn't easy for the netfs to make sure that it has completed/cancelled all its outstanding storage and retrieval operations on the cookie it is retiring.

Instead, have the cache provide an invalidation method that will cancel or wait for all currently outstanding operations before invalidating the cache, and will cause new operations to queue up behind that. Whilst invalidation is in progress, some requests will be rejected until the cache can stack a barrier on the operation queue to cause new operations to be deferred behind it. Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Fix the state management of internal fscache operations and the accounting of what operations are in what states. This is done by:

(1) Give struct fscache_operation an enum variable that directly represents the state it's currently in, rather than spreading this knowledge over a bunch of flags, who's processing the operation at the moment and whether it is queued or not. This makes it easier to write assertions to check the state at various points and to prevent invalid state transitions.

(2) Add an 'operation complete' state and supply a function to indicate the completion of an operation (fscache_op_complete()) and make things call it. The final call to fscache_put_operation() can then check that an op is in the appropriate state (complete or cancelled).

(3) Adjust the use of object->n_ops, ->n_in_progress and ->n_exclusive to better govern the state of an object:

(a) The ->n_ops is now the number of extant operations on the object and is now decremented by fscache_put_operation() only.

(b) The ->n_in_progress is simply the number of objects that have been taken off of the object's pending queue for the purposes of being run. This is decremented by fscache_op_complete() only.

(c) The ->n_exclusive is the number of exclusive ops that have been submitted and queued or are in progress. It is decremented by fscache_op_complete() and by fscache_cancel_op().

fscache_put_operation() and fscache_operation_gc() now no longer try to clean up ->n_exclusive and ->n_in_progress. That was leading to double decrements against fscache_cancel_op(). fscache_cancel_op() now no longer decrements ->n_ops. That was leading to double decrements against fscache_put_operation(). fscache_submit_exclusive_op() now decides whether it has to queue an op based on ->n_in_progress being > 0 rather than ->n_ops > 0, as the latter will persist in being true even after all preceding operations have been cancelled or completed. Furthermore, if an object is active and there are runnable ops against it, there must be at least one op running.

(4) Add a remaining-pages counter (n_pages) to struct fscache_retrieval and provide a function to record completion of the pages as they complete. When n_pages reaches 0, the operation is deemed to be complete and fscache_op_complete() is called. Add calls to fscache_retrieval_complete() anywhere we've finished with a page we've been given to read or allocate for. This includes places where we just return pages to the netfs for reading from the server and where accessing the cache fails and we discard the proposed netfs page.

The bugs in the unfixed state management manifest themselves as oopses like the following, where the operation completion gets out of sync with return of the cookie by the netfs. This is possible because the cache unlocks and returns all the netfs pages before recording its completion - which means that there's nothing to stop the netfs discarding them and returning the cookie.

FS-Cache: Cookie 'NFS.fh' still has outstanding reads ------------[ cut here ]------------ kernel BUG at fs/fscache/cookie.c:519!
invalid opcode: 0000 [#1] SMP CPU 1 Modules linked in: cachefiles nfs fscache auth_rpcgss nfs_acl lockd sunrpc Pid: 400, comm: kswapd0 Not tainted 3.1.0-rc7-fsdevel+ #1090 /DG965RY RIP: 0010:[<ffffffffa007050a>] [<ffffffffa007050a>] __fscache_relinquish_cookie+0x170/0x343 [fscache] RSP: 0018:ffff8800368cfb00 EFLAGS: 00010282 RAX: 000000000000003c RBX: ffff880023cc8790 RCX: 0000000000000000 RDX: 0000000000002f2e RSI: 0000000000000001 RDI: ffffffff813ab86c RBP: ffff8800368cfb50 R08: 0000000000000002 R09: 0000000000000000 R10: ffff88003a1b7890 R11: ffff88001df6e488 R12: ffff880023d8ed98 R13: ffff880023cc8798 R14: 0000000000000004 R15: ffff88003b8bf370 FS: 0000000000000000(0000) GS:ffff88003bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b CR2: 00000000008ba008 CR3: 0000000023d93000 CR4: 00000000000006e0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400 Process kswapd0 (pid: 400, threadinfo ffff8800368ce000, task ffff88003b8bf040) Stack: ffff88003b8bf040 ffff88001df6e528 ffff88001df6e528 ffffffffa00b46b0 ffff88003b8bf040 ffff88001df6e488 ffff88001df6e620 ffffffffa00b46b0 ffff88001ebd04c8 0000000000000004 ffff8800368cfb70 ffffffffa00b2c91 Call Trace: [<ffffffffa00b2c91>] nfs_fscache_release_inode_cookie+0x3b/0x47 [nfs] [<ffffffffa008f25f>] nfs_clear_inode+0x3c/0x41 [nfs] [<ffffffffa0090df1>] nfs4_evict_inode+0x2f/0x33 [nfs] [<ffffffff810d8d47>] evict+0xa1/0x15c [<ffffffff810d8e2e>] dispose_list+0x2c/0x38 [<ffffffff810d9ebd>] prune_icache_sb+0x28c/0x29b [<ffffffff810c56b7>] prune_super+0xd5/0x140 [<ffffffff8109b615>] shrink_slab+0x102/0x1ab [<ffffffff8109d690>] balance_pgdat+0x2f2/0x595 [<ffffffff8103e009>] ? process_timeout+0xb/0xb [<ffffffff8109dba3>] kswapd+0x270/0x289 [<ffffffff8104c5ea>] ? __init_waitqueue_head+0x46/0x46 [<ffffffff8109d933>] ? balance_pgdat+0x595/0x595 [<ffffffff8104bf7a>] kthread+0x7f/0x87 [<ffffffff813ad6b4>] kernel_thread_helper+0x4/0x10 [<ffffffff81026b98>] ? finish_task_switch+0x45/0xc0 [<ffffffff813abcdd>] ? retint_restore_args+0xe/0xe [<ffffffff8104befb>] ? __init_kthread_worker+0x53/0x53 [<ffffffff813ad6b0>] ? gs_change+0xb/0xb

Signed-off-by: David Howells <dhowells@redhat.com>
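A hedged sketch of the explicit state variable described in points (1) and (2) above (the exact state names in fscache may differ):

    enum fscache_operation_state {
            FSCACHE_OP_ST_BLANK,          /* just allocated, not yet initialised */
            FSCACHE_OP_ST_INITIALISED,    /* initialised but not yet submitted */
            FSCACHE_OP_ST_PENDING,        /* queued behind other operations */
            FSCACHE_OP_ST_IN_PROGRESS,    /* taken off the queue and being run */
            FSCACHE_OP_ST_COMPLETE,       /* fscache_op_complete() has been called */
            FSCACHE_OP_ST_CANCELLED,      /* fscache_cancel_op() has been called */
            FSCACHE_OP_ST_DEAD            /* released by fscache_put_operation() */
    };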
-
Committed by David Howells
Make fscache_relinquish_cookie() log a warning and wait if there are any outstanding reads left on the cookie it was given. Signed-off-by: David Howells <dhowells@redhat.com>
-
Committed by David Howells
Under some circumstances CacheFiles defers the marking of pages with PG_fscache so that it can take advantage of pagevecs to reduce the number of calls to fscache_mark_pages_cached() and the netfs's hook to keep track of this. There are, however, two problems with this:

(1) It can lead to the PG_fscache mark being applied _after_ the page is set PG_uptodate and unlocked (by the call to fscache_end_io()).

(2) CacheFiles's ref on the page is dropped immediately following fscache_end_io() - and so may not still be held when the mark is applied. This can lead to the page being passed back to the allocator before the mark is applied.

Fix this by, where appropriate, marking the page before calling fscache_end_io() and releasing the page. This means that we can't take advantage of pagevecs and have to make a separate call for each page to the marking routines.

The symptoms of this are Bad Page state errors cropping up under memory pressure, for example:

BUG: Bad page state in process tar pfn:002da page:ffffea0000009fb0 count:0 mapcount:0 mapping: (null) index:0x1447 page flags: 0x1000(private_2) Pid: 4574, comm: tar Tainted: G W 3.1.0-rc4-fsdevel+ #1064 Call Trace: [<ffffffff8109583c>] ? dump_page+0xb9/0xbe [<ffffffff81095916>] bad_page+0xd5/0xea [<ffffffff81095d82>] get_page_from_freelist+0x35b/0x46a [<ffffffff810961f3>] __alloc_pages_nodemask+0x362/0x662 [<ffffffff810989da>] __do_page_cache_readahead+0x13a/0x267 [<ffffffff81098942>] ? __do_page_cache_readahead+0xa2/0x267 [<ffffffff81098d7b>] ra_submit+0x1c/0x20 [<ffffffff8109900a>] ondemand_readahead+0x28b/0x29a [<ffffffff81098ee2>] ? ondemand_readahead+0x163/0x29a [<ffffffff810990ce>] page_cache_sync_readahead+0x38/0x3a [<ffffffff81091d8a>] generic_file_aio_read+0x2ab/0x67e [<ffffffffa008cfbe>] nfs_file_read+0xa4/0xc9 [nfs] [<ffffffff810c22c4>] do_sync_read+0xba/0xfa [<ffffffff81177a47>] ? security_file_permission+0x7b/0x84 [<ffffffff810c25dd>] ? rw_verify_area+0xab/0xc8 [<ffffffff810c29a4>] vfs_read+0xaa/0x13a [<ffffffff810c2a79>] sys_read+0x45/0x6c [<ffffffff813ac37b>] system_call_fastpath+0x16/0x1b

As can be seen, PG_private_2 (== PG_fscache) is set in the page flags. Instrumenting fscache_mark_pages_cached() to verify whether page->mapping was set appropriately showed that sometimes it wasn't. This led to the discovery that sometimes the page has apparently been reclaimed by the time the marker got to see it. Reported-by: M. Stevens <m@tippett.com> Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Jeff Layton <jlayton@redhat.com>
-