1. 19 June 2013, 5 commits
    • FS-Cache: The retrieval remaining-pages counter needs to be atomic_t · 1bb4b7f9
      Committed by David Howells
      struct fscache_retrieval contains a count of the number of pages that still
      need some processing (n_pages).  This is decremented as the pages are
      processed.
      
      However, this needs to be atomic: fscache_retrieval_complete() may
      occasionally be called from cachefiles_read_backing_file() and
      cachefiles_read_copier() simultaneously.
      
      This happens when an fscache_read_or_alloc_pages() request containing a lot of
      pages (say a couple of hundred) is being processed.  The read on each backing
      page is dispatched individually because we need to insert a monitor into the
      waitqueue to catch when the read completes.  However, under low-memory
      conditions, we might be forced to wait in the allocator - and this gives the
      I/O on the backing page a chance to complete first.
      
      When the I/O completes, fscache_enqueue_retrieval() chucks the retrieval onto
      the workqueue without waiting for the operation to finish the initial I/O
      dispatch (we want to release any pages we can as soon as we can), thus both can
      end up running simultaneously and potentially attempting to partially complete
      the retrieval simultaneously (ENOMEM may occur, backing pages may already be in
      the page cache).
      
      This was demonstrated by parallelling the non-atomic counter with an atomic
      counter and printing both of them when the assertion fails.  At this point, the
      atomic counter has reached zero, but the non-atomic counter has not.
      
      To fix this, make the counter an atomic_t.
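
      As a rough userspace C11 sketch of the difference the fix makes (the
      struct and function names here are illustrative, not the kernel's):

      	#include <stdatomic.h>
      	#include <stdio.h>

      	struct retrieval {
      		atomic_int n_pages;	/* pages still needing processing */
      	};

      	/* Called from both the dispatch path and the I/O completion path.
      	 * atomic_fetch_sub() is an indivisible read-modify-write; with a
      	 * plain int, two concurrent callers could read the same old value
      	 * and one decrement would be lost, tripping the assertion later. */
      	static void retrieval_complete(struct retrieval *r, int n)
      	{
      		if (atomic_fetch_sub(&r->n_pages, n) - n == 0)
      			printf("all pages processed\n");	/* completion stand-in */
      	}

      	int main(void)
      	{
      		struct retrieval r = { .n_pages = 2 };
      		retrieval_complete(&r, 1);
      		retrieval_complete(&r, 1);	/* last decrement fires completion */
      		return 0;
      	}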
      
      Left unfixed, this race results in the following bug appearing:
      
      	FS-Cache: Assertion failed
      	3 == 5 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/fscache/operation.c:421!
      
      or
      
      	FS-Cache: Assertion failed
      	3 == 5 is false
      	------------[ cut here ]------------
      	kernel BUG at fs/fscache/operation.c:414!
      
      With a backtrace like the following:
      
      RIP: 0010:[<ffffffffa0211b1d>] fscache_put_operation+0x1ad/0x240 [fscache]
      Call Trace:
       [<ffffffffa0213185>] fscache_retrieval_work+0x55/0x270 [fscache]
       [<ffffffffa0213130>] ? fscache_retrieval_work+0x0/0x270 [fscache]
       [<ffffffff81090b10>] worker_thread+0x170/0x2a0
       [<ffffffff81096d10>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff810909a0>] ? worker_thread+0x0/0x2a0
       [<ffffffff81096966>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968d0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      Signed-off-by: David Howells <dhowells@redhat.com>
      Reviewed-and-tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Simplify cookie retention for fscache_objects, fixing oops · 1362729b
      Committed by David Howells
      Simplify the way fscache cache objects retain their cookie.  The way I
      implemented the cookie storage handling made synchronisation a pain (i.e.
      the object state machine can't rely on the cookie actually still being
      there).

      Instead of the object being detached from the cookie and the cookie being
      freed in __fscache_relinquish_cookie(), we defer both operations:
      
       (*) The detachment of the object from the list in the cookie now takes place
           in fscache_drop_object() and is thus governed by the object state machine
           (fscache_detach_from_cookie() has been removed).
      
       (*) The release of the cookie is now in fscache_object_destroy() - which is
           called by the cache backend just before it frees the object.
      
      This means that the fscache_cookie struct is now available to the cache all the
      way through from ->alloc_object() to ->drop_object() and ->put_object() -
      meaning that it's no longer necessary to take object->lock to guarantee access.
      
      However, __fscache_relinquish_cookie() doesn't wait for the object to go all
      the way through to destruction before letting the netfs proceed.  That would
      massively slow down the netfs.  Since __fscache_relinquish_cookie() leaves
      the cookie around, it must therefore break all attachments to the netfs -
      which includes ->def, ->netfs_data and any outstanding page reads/writes.
      
      To handle this, struct fscache_cookie now has an n_active counter:
      
       (1) This starts off initialised to 1.
      
       (2) Any time the cache needs to get at the netfs data, it calls
           fscache_use_cookie() to increment it - if it is not zero.  If it was zero,
           then access is not permitted.
      
       (3) When the cache has finished with the data, it calls fscache_unuse_cookie()
           to decrement it.  This does a wake-up on it if it reaches 0.
      
       (4) __fscache_relinquish_cookie() decrements n_active and then waits for it to
           reach 0.  The initialisation to 1 in step (1) ensures that we only get
           wake ups when we're trying to get rid of the cookie.
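
      A sketch of what steps (2) and (3) might look like as helpers (modelled
      on the description above; treat the exact names and layout as
      assumptions rather than the committed code):

      	static inline bool fscache_use_cookie(struct fscache_object *object)
      	{
      		struct fscache_cookie *cookie = object->cookie;

      		/* refuse access if the cookie is already being relinquished */
      		return atomic_inc_not_zero(&cookie->n_active) != 0;
      	}

      	static inline void fscache_unuse_cookie(struct fscache_object *object)
      	{
      		struct fscache_cookie *cookie = object->cookie;

      		/* wake the relinquisher on the 1 -> 0 transition */
      		if (atomic_dec_and_test(&cookie->n_active))
      			wake_up_atomic_t(&cookie->n_active);
      	}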
      
      This leaves __fscache_relinquish_cookie() a lot simpler.
      
      
      ***
      This fixes a problem in the current code whereby if fscache_invalidate() is
      followed sufficiently quickly by fscache_relinquish_cookie() then it is
      possible for __fscache_relinquish_cookie() to have detached the cookie from the
      object and cleared the pointer before a thread is dispatched to process the
      invalidation state in the object state machine.
      
      Since the pending write clearance was deferred to the invalidation state to
      make it asynchronous, we need to either wait in relinquishment for the stores
      tree to be cleared in the invalidation state or we need to handle the clearance
      in relinquishment.
      
      Further, if the relinquishment code does clear the tree, then the
      invalidation state needs to make the clearance contingent on still having
      the cookie to hand (since that's where the tree is rooted) and we have to
      prevent the cookie from disappearing for the duration.
      
      This can lead to an oops like the following:
      
      BUG: unable to handle kernel NULL pointer dereference at 000000000000000c
      ...
      RIP: 0010:[<ffffffff8151023e>] _spin_lock+0xe/0x30
      ...
      CR2: 000000000000000c ...
      ...
      Process kslowd002 (...)
      ....
      Call Trace:
       [<ffffffffa01c3278>] fscache_invalidate_writes+0x38/0xd0 [fscache]
       [<ffffffff810096f0>] ? __switch_to+0xd0/0x320
       [<ffffffff8105e759>] ? find_busiest_queue+0x69/0x150
       [<ffffffff8110ddd4>] ? slow_work_enqueue+0x104/0x180
       [<ffffffffa01c1303>] fscache_object_slow_work_execute+0x5e3/0x9d0 [fscache]
       [<ffffffff81096b67>] ? bit_waitqueue+0x17/0xd0
       [<ffffffff8110e233>] slow_work_execute+0x233/0x310
       [<ffffffff8110e515>] slow_work_thread+0x205/0x360
       [<ffffffff81096ca0>] ? autoremove_wake_function+0x0/0x40
       [<ffffffff8110e310>] ? slow_work_thread+0x0/0x360
       [<ffffffff81096936>] kthread+0x96/0xa0
       [<ffffffff8100c0ca>] child_rip+0xa/0x20
       [<ffffffff810968a0>] ? kthread+0x0/0xa0
       [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
      
      The parameter to fscache_invalidate_writes() was object->cookie which is NULL.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Fix object state machine to have separate work and wait states · caaef690
      Committed by David Howells
      Fix the object state machine to have separate work and wait states, as
      that makes it easier to reason about.
      
      There are now three kinds of state:
      
       (1) Work state.  This is an execution state.  No event processing is performed
           by a work state.  The function attached to a work state returns a pointer
           indicating the next state to which the OSM should transition.  Returning
           NO_TRANSIT repeats the current state, but goes back to the scheduler
           first.
      
       (2) Wait state.  This is an event processing state.  No execution is
           performed by a wait state.  Wait states are just tables of "if event X
           occurs, clear it and transition to state Y".  The dispatcher returns to
           the scheduler if none of the events in which the wait state has an
           interest are currently pending.
      
       (3) Out-of-band state.  This is a special work state.  Transitions to
           normal states can be overridden when an unexpected event occurs
           (e.g. an I/O error).  Instead, the dispatcher disables and clears the
           OOB event and transitions to the specified work state.  This then
           acts as an ordinary work state, though object->state points to the
           overridden destination.  Returning NO_TRANSIT resumes the overridden
           transition.
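
      A purely illustrative shape for such state definitions (the field names
      are assumptions, not the exact kernel structures):

      	struct fscache_object;

      	struct fscache_state {
      		char name[24];		/* states carry their own names */
      		bool work;		/* work state vs. wait state */
      		const struct fscache_state *(*execute)(struct fscache_object *object,
      						       int event);
      	};

      	#define NO_TRANSIT ((const struct fscache_state *)NULL)	/* repeat state */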
      
      In addition, the states have names in their definitions, so there's no need for
      tables of state names.  Further, the EV_REQUEUE event is no longer necessary as
      that is automatic for work states.
      
      Since the states are now separate structs rather than values in an enum, it's
      not possible to use comparisons other than (non-)equality between them, so use
      some object->flags to indicate what phase an object is in.
      
      The EV_RELEASE, EV_RETIRE and EV_WITHDRAW events have been squished into one
      (EV_KILL).  An object flag now carries the information about retirement.
      
      Similarly, the RELEASING, RECYCLING and WITHDRAWING states have been
      merged into a KILL_OBJECT state, and additional states have been added to
      handle waiting for dependent objects (JUMPSTART_DEPS and KILL_DEPENDENTS).
      
      A state has also been added for synchronising with parent object
      initialisation (WAIT_FOR_PARENT) and another for initiating lookup
      (PARENT_READY).
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Wrap checks on object state · 493f7bc1
      Committed by David Howells
      Wrap checks on object state (mostly outside of fs/fscache/object.c) with
      inline functions so that the mechanism can be replaced.
      
      Some of the state checks within object.c are left as-is as they will be
      replaced.
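
      For example, a wrapper might look something like this (a hypothetical
      predicate; the real helpers and the test they wrap may differ):

      	static inline bool fscache_object_is_dying(struct fscache_object *object)
      	{
      		/* callers stop comparing object->state directly, so the
      		 * state representation can later change behind this predicate */
      		return object->state >= FSCACHE_OBJECT_DYING;
      	}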
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
    • FS-Cache: Uninline fscache_object_init() · 610be24e
      Committed by David Howells
      Uninline fscache_object_init() so as not to expose some of the FS-Cache
      internals to the cache backend.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
  2. 15 May 2013, 1 commit
    • Add wait_on_atomic_t() and wake_up_atomic_t() · cb65537e
      Committed by David Howells
      Add wait_on_atomic_t() and wake_up_atomic_t() to indicate became-zero events on
      atomic_t types.  This uses the bit-wake waitqueue table.  The key is set to a
      value outside of the number of bits in a long so that wait_on_bit() won't be
      woken up accidentally.
      
      What I'm using this for is: in a following patch I add a counter to struct
      fscache_cookie to count the number of outstanding operations that need access
      to netfs data.  The way this works is:
      
       (1) When a cookie is allocated, the counter is initialised to 1.
      
       (2) When an operation wants to access the netfs data, it calls
           atomic_inc_not_zero() to increment the counter before doing so.  If
           the counter was 0, then it isn't incremented, the operation isn't
           permitted to access the netfs data (which might by this point no
           longer exist) and the operation aborts in some appropriate manner.
      
       (3) When an operation finishes with the netfs data, it decrements the counter
           and if it reaches 0, calls wake_up_atomic_t() on it - the assumption being
           that it was the last blocker.
      
       (4) When a cookie is released, the counter is decremented and the releaser
           uses wait_on_atomic_t() to wait for the counter to become 0 - which should
           indicate no one is using the netfs data any longer.  The netfs data can
           then be destroyed.
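
      A sketch of the releaser side of step (4), using the API added here (the
      action callback is an assumed helper that just sleeps):

      	static int fscache_wait_atomic_t(atomic_t *p)
      	{
      		schedule();
      		return 0;
      	}

      	/* drop the initial reference; if users remain, wait for the
      	 * counter to hit 0 before destroying the netfs data */
      	if (!atomic_dec_and_test(&cookie->n_active))
      		wait_on_atomic_t(&cookie->n_active, fscache_wait_atomic_t,
      				 TASK_UNINTERRUPTIBLE);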
      
      There are some alternatives that I have thought of and that have been suggested
      by Tejun Heo:
      
       (A) Using wait_on_bit() to wait on a bit in the counter.  This doesn't work
           because if that bit happens to be 0 then the wait won't happen - even if
           the counter is non-zero.
      
       (B) Using wait_on_bit() to wait on a flag elsewhere which is cleared when the
           counter reaches 0.  Such a flag would be redundant and would add
           complexity.
      
       (C) Adding a waitqueue to fscache_cookie - this would expand that struct by
           several words for an event that happens just once in each cookie's
           lifetime.  Further, cookies are generally per-file so there are likely to
           be a lot of them.
      
       (D) Similar to (C), but add a pointer to a waitqueue in the cookie
           instead of a waitqueue.  This would add a single word per cookie and
           so would be less of an expansion - but still an expansion.
      
       (E) Adding a static waitqueue to the fscache module.  Generally this would be
           fine, but under certain circumstances many cookies will all get added at
           the same time (eg. NFS umount, cache withdrawal) thereby presenting
           scaling issues.  Note that the wait may be significant as disk I/O may be
           in progress.
      
      So, I think reusing the wait_on_bit() waitqueue set is reasonable.  I
      don't make much use of the waitqueue I need on a per-cookie basis, but
      sometimes I have a huge flood of cookies to deal with.
      
      I also don't want to add a whole new set of global waitqueue tables
      specifically for the dec-to-0 event if I can reuse the bit tables.
      Signed-off-by: David Howells <dhowells@redhat.com>
      Tested-by: Milosz Tanski <milosz@adfin.com>
      Acked-by: Jeff Layton <jlayton@redhat.com>
  3. 13 May 2013, 1 commit
  4. 12 May 2013, 2 commits
    • ipv6: do not clear pinet6 field · f77d6021
      Committed by Eric Dumazet
      We have seen multiple NULL dereferences in __inet6_lookup_established().
      
      After analysis, I found that inet6_sk() could be NULL while the
      check for sk_family == AF_INET6 was true.
      
      The bug was added in linux-2.6.29, when RCU lookups were introduced in
      the UDP and TCP stacks.
      
      Once an IPv6 socket using SLAB_DESTROY_BY_RCU has been inserted into a
      hash table, we can no longer clear its pinet6 field.
      
      This patch extends logic used in commit fcbdf09d
      ("net: fix nulls list corruptions in sk_prot_alloc")
      
      TCP/UDP/UDPLite IPv6 protocols provide their own .clear_sk() method
      to make sure we do not clear pinet6 field.
      
      At the socket clone phase we do not really care, as cloning the parent's
      (non-NULL) pinet6 does not introduce a fatal race.
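
      Going by the description, the shape of such a .clear_sk() method is
      roughly this (a sketch, not necessarily the exact committed code):

      	static void tcp_v6_clear_sk(struct sock *sk, int size)
      	{
      		struct inet_sock *inet = inet_sk(sk);

      		/* clear everything up to pinet6, skip over it, then clear
      		 * the rest - so RCU lookups never observe a NULL pinet6 */
      		sk_prot_clear_nulls(sk, offsetof(struct inet_sock, pinet6));

      		size -= offsetof(struct inet_sock, pinet6) + sizeof(inet->pinet6);
      		memset(&inet->pinet6 + 1, 0, size);
      	}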
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net/mlx4: Strengthen VLAN tags/priorities enforcement in VST mode · 7677fc96
      Committed by Rony Efraim
      Make sure that the following steps are taken:
      
      - drop packets sent by the VF with vlan tag
      - block packets with vlan tag which are steered to the VF
      - drop/block tagged packets when the policy is priority-tagged
      - make sure VLAN stripping for received packets is set
      - make sure force UP bit for the VF QP is set
      
      Use enum values for all the above instead of numerical bit offsets.
      Signed-off-by: Rony Efraim <ronye@mellanox.com>
      Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  5. 10 May 2013, 10 commits
  6. 09 May 2013, 2 commits
  7. 08 May 2013, 18 commits
    • NVMe: Simplify Firmware Activate code slightly · ab3ea5bf
      Committed by Matthew Wilcox
      Add definitions for the three Firmware Activate actions, and change the
      SCSI translation code to construct the command into a temporary variable
      instead of translating the endianness back-and-forth.
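
      The three actions map onto the Firmware Activate "AA" field from the
      NVMe spec; the definitions presumably look something like this (the
      names are an assumption):

      	enum {
      		NVME_FWACT_REPL		= (0 << 3),	/* replace image only */
      		NVME_FWACT_REPL_ACTV	= (1 << 3),	/* replace + activate at next reset */
      		NVME_FWACT_ACTV		= (2 << 3),	/* activate existing image at next reset */
      	};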
      Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com>
      Reviewed-by: Vishal Verma <vishal.l.verma@linux.intel.com>
    • ALSA: Add comment for control TLV API · d24f5a9a
      Committed by David Henningsson
      Userspace is not meant to have to handle all strange dB ranges,
      so add a specification comment.
      Signed-off-by: David Henningsson <david.henningsson@canonical.com>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • aio: don't include aio.h in sched.h · a27bb332
      Committed by Kent Overstreet
      Faster kernel compiles by way of fewer unnecessary includes.
      
      [akpm@linux-foundation.org: fix fallout]
      [akpm@linux-foundation.org: fix build]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: kill ki_retry · 41ef4eb8
      Committed by Kent Overstreet
      Thanks to Zach Brown's work to rip out the retry infrastructure, we don't
      need this anymore - ki_retry was only called right after the kiocb was
      initialized.
      
      This also refactors and trims some duplicated code, as well as cleaning up
      the refcounting/error handling a bit.
      
      [akpm@linux-foundation.org: use fmode_t in aio_run_iocb()]
      [akpm@linux-foundation.org: fix file_start_write/file_end_write tests]
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: kill ki_key · 8a660890
      Committed by Kent Overstreet
      ki_key wasn't actually used for anything previously - it was always 0.
      Drop it to trim struct kiocb a bit.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • audit: Make testing for a valid loginuid explicit. · 780a7654
      Committed by Eric W. Biederman
      Audit rule additions containing "-F auid!=4294967295" were failing
      with EINVAL because of a regression caused by commit e1760bd5.
      
      Apparently some userland audit rule sets want to know whether the
      loginuid has been set, and are using a test for auid != 4294967295 to
      determine that.
      
      In practice that is a horrible way to ask if a value has been set,
      because it relies on subtle implementation details and will break
      every time the uid implementation in the kernel changes.
      
      So add a clean way to test if the audit loginuid has been set, and
      silently convert the old idiom to the cleaner and more comprehensible
      new idiom.
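
      The clean test plausibly reduces to a kuid validity check; a sketch of
      such a helper (the name follows the patch subject, the body is assumed):

      	static inline bool audit_loginuid_set(struct task_struct *tsk)
      	{
      		/* validity is the real "has been set" test, rather than
      		 * comparing the raw value against 4294967295, i.e. (uid_t)-1 */
      		return uid_valid(audit_get_loginuid(tsk));
      	}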
      
      Cc: <stable@vger.kernel.org> # 3.7
      Reported-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Tested-by: Richard Guy Briggs <rgb@redhat.com>
      Signed-off-by: Eric Paris <eparis@redhat.com>
    • aio: kill batch allocation · a1c8eae7
      Committed by Kent Overstreet
      Previously, allocating a kiocb required touching quite a few global
      (well, per kioctx) cachelines...  so batching up allocation to amortize
      those was worthwhile.  But we've gotten rid of some of those, and in
      another couple of patches kiocb allocation won't require writing to any
      shared cachelines, so that means we can just rip this code out.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: use cancellation list lazily · 0460fef2
      Committed by Kent Overstreet
      Cancelling kiocbs requires adding them to a per-kioctx linked list, which
      is one of the few things we need to take the kioctx lock for in the fast
      path.  But most kiocbs can't be cancelled - so if we just do this lazily,
      we can avoid quite a bit of locking overhead.

      While we're at it, instead of using a flag bit, switch to using ki_cancel
      itself to indicate that a kiocb has been cancelled/completed.  This lets
      us get rid of ki_flags entirely.
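
      A sketch of using ki_cancel itself as the state word (the sentinel name
      and helper shape are assumptions modelled on the description):

      	typedef int (kiocb_cancel_fn)(struct kiocb *, struct io_event *);

      	#define KIOCB_CANCELLED	((kiocb_cancel_fn *)(~0UL))

      	static int kiocb_cancel(struct kiocb *kiocb, struct io_event *res)
      	{
      		kiocb_cancel_fn *old, *cancel = kiocb->ki_cancel;

      		/* atomically claim the right to cancel: swap in the sentinel
      		 * and only call through if a real callback was installed */
      		do {
      			if (!cancel || cancel == KIOCB_CANCELLED)
      				return -EINVAL;
      			old = cancel;
      			cancel = cmpxchg(&kiocb->ki_cancel, old, KIOCB_CANCELLED);
      		} while (cancel != old);

      		return old(kiocb, res);
      	}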
      
      [akpm@linux-foundation.org: remove buggy BUG()]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • wait: add wait_event_hrtimeout() · 774a08b3
      Committed by Kent Overstreet
      Analogous to wait_event_timeout() and friends, this adds
      wait_event_hrtimeout() and wait_event_interruptible_hrtimeout().

      Note that unlike the versions that use regular timers, these don't return
      the amount of time remaining - instead, they return 0 or -ETIME if they
      timed out, because I was uncomfortable with the semantics of doing it the
      other way (and not confident I could get it right, anyway).
      
      If the timer expires, there's no real guarantee that expire_time -
      current_time would be <= 0 - due to timer slack certainly, and I'm not
      sure I want to know the implications of the different clock bases in
      hrtimers.
      
      If the timer does expire and the code calculates that the time remaining
      is nonnegative, that could be even worse if the calling code then reuses
      that timeout.  Probably safer to just return 0 then, but I could imagine
      weird bugs or at least unintended behaviour arising from that too.
      
      I came to the conclusion that if other users end up actually needing the
      amount of time remaining, the sanest thing to do would be to create a
      version that uses absolute timeouts instead of relative.
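
      A hedged usage sketch (the waitqueue and flag are assumed caller
      context):

      	DECLARE_WAIT_QUEUE_HEAD(wq);
      	bool done;

      	/* 0 once `done` is true, -ETIME if 100us elapse first, or
      	 * -ERESTARTSYS if a signal arrives */
      	long ret = wait_event_interruptible_hrtimeout(wq, done,
      			ktime_set(0, 100 * NSEC_PER_USEC));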
      
      [akpm@linux-foundation.org: fix description of `timeout' arg]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Cc: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: make aio_put_req() lockless · 11599eba
      Committed by Kent Overstreet
      Freeing a kiocb needed to touch the kioctx for three things:
      
       * Pulling it off the reqs_active list
       * Decrementing reqs_active
       * Issuing a wakeup, if the kioctx was in the process of being freed.
      
      This patch moves these to aio_complete(), for a couple of reasons:
      
       * aio_complete() already has to issue the wakeup, so if we drop the
         kioctx refcount before aio_complete does its wakeup we don't have to
         do it twice.
       * aio_complete currently has to take the kioctx lock, so it makes sense
         for it to pull the kiocb off the reqs_active list too.
       * A later patch is going to change reqs_active to include unreaped
         completions - this will mean allocating a kiocb doesn't have to look
         at the ringbuffer. So taking the decrement of reqs_active out of
         kiocb_free() is useful prep work for that patch.
      
      This doesn't really affect cancellation, since existing (usb) code that
      implements a cancel function still calls aio_complete() - we just have
      to make sure that aio_complete does the necessary teardown for cancelled
      kiocbs.
      
      It does affect code paths where we free kiocbs that were never
      submitted; they need to decrement reqs_active and pull the kiocb off the
      reqs_active list.  This occurs in two places: kiocb_batch_free(), which
      is going away in a later patch, and the error path in io_submit_one.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: move private stuff out of aio.h · 4e179bca
      Committed by Kent Overstreet
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Cc: Theodore Ts'o <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: kill return value of aio_complete() · 2d68449e
      Committed by Kent Overstreet
      Nothing used the return value, and it probably wasn't possible to use it
      safely for the locked versions (aio_complete(), aio_put_req()).  Just
      kill it.
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Acked-by: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: remove retry-based AIO · 41003a7b
      Committed by Zach Brown
      This removes the retry-based AIO infrastructure now that nothing in tree
      is using it.
      
      We want to remove retry-based AIO because it is fundamentally unsafe.
      It retries IO submission from a kernel thread that has only assumed the
      mm of the submitting task.  All other task_struct references in the IO
      submission path will see the kernel thread, not the submitting task.
      This design flaw means that nothing of any meaningful complexity can use
      retry-based AIO.
      
      This removes all the code and data associated with the retry machinery.
      The most significant benefit of this is the removal of the locking
      around the unused run list in the submission path.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Signed-off-by: Zach Brown <zab@redhat.com>
      Cc: Zach Brown <zab@redhat.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • aio: remove dead code from aio.h · 4b49bb8a
      Committed by Zach Brown
      Signed-off-by: Zach Brown <zab@redhat.com>
      Signed-off-by: Kent Overstreet <koverstreet@google.com>
      Cc: Felipe Balbi <balbi@ti.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: Asai Thambi S P <asamymuthupa@micron.com>
      Cc: Selvan Mani <smani@micron.com>
      Cc: Sam Bradshaw <sbradshaw@micron.com>
      Acked-by: Jeff Moyer <jmoyer@redhat.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Benjamin LaHaise <bcrl@kvack.org>
      Reviewed-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • remove unused random32() and srandom32() · 22ea9c07
      Committed by Akinobu Mita
      After finishing the naming transition, remove the now-unused
      backward-compatibility wrapper macros.
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • hugetlbfs: fix mmap failure in unaligned size request · af73e4d9
      Committed by Naoya Horiguchi
      The current kernel returns -EINVAL unless a given mmap length is "almost"
      hugepage aligned.  This is because in sys_mmap_pgoff() the given length
      is passed to vm_mmap_pgoff() as-is, without being aligned to a hugepage
      boundary.
      
      This is a regression introduced in commit 40716e29 ("hugetlbfs: fix
      alignment of huge page requests"), where the alignment code was pushed
      into hugetlb_file_setup() but the variable len on the caller side was
      not changed.
      
      To fix this, this patch partially reverts that commit and adds alignment
      code on the caller side.  It also introduces hstate_sizelog() in order to
      get the proper hstate for the specified hugepage size.
      
      Addresses https://bugzilla.kernel.org/show_bug.cgi?id=56881
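
      The caller-side alignment presumably looks something like this in
      sys_mmap_pgoff() (a sketch based on the description; the mask name is an
      assumption):

      	if (flags & MAP_HUGETLB) {
      		struct hstate *hs;

      		/* pick the hstate from the size encoded in the flags,
      		 * then round the request up to that hugepage size */
      		hs = hstate_sizelog((flags >> MAP_HUGE_SHIFT) & MAP_HUGE_MASK);
      		if (!hs)
      			return -EINVAL;
      		len = ALIGN(len, huge_page_size(hs));
      	}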
      
      [akpm@linux-foundation.org: fix warning when CONFIG_HUGETLB_PAGE=n]
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: <iceman_dvd@yahoo.com>
      Cc: Steven Truelove <steven.truelove@utoronto.ca>
      Cc: Jianguo Wu <wujianguo@huawei.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • include/linux/mm.h: complete the mm_walk definition · 0f157a5b
      Committed by Andrew Morton
      That nameless-function-arguments thing drives me batty.  Fix.
      
      Cc: Dave Hansen <dave.hansen@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kref: minor cleanup · 2d864e41
      Committed by Anatol Pomozov
       - make the warning SMP-safe
       - the result of the atomic _unless_zero functions should be checked by
         the caller to avoid use-after-free errors (see the sketch below)
       - trivial whitespace fix.
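
      For instance, with the _unless_zero lookup pattern the return value is
      what keeps the lookup safe (obj here is illustrative):

      	/* under the table lock or RCU: take a reference only if the
      	 * count is still nonzero, i.e. teardown hasn't begun */
      	if (!kref_get_unless_zero(&obj->kref))
      		return NULL;	/* already on its way to being freed */
      	return obj;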
      
      Link: https://lkml.org/lkml/2013/4/12/391
      
      Tested: compile x86, boot machine and run xfstests
      Signed-off-by: Anatol Pomozov <anatol.pomozov@gmail.com>
      [ Removed line-break, changed to use WARN_ON_ONCE()  - Linus ]
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  8. 07 May 2013, 1 commit