    FS-Cache: Synchronise object death state change vs operation submission · f09b443d
    Committed by David Howells
    When an object is being marked as no longer live, do this under the object
    spinlock to prevent a race with operation submission targeted on that object.
    
    The problem occurs due to the following pair of intertwined sequences when the
    cache tries to create an object that would take it over the hard available
    space limit:
    
     NETFS INTERFACE
     ===============
     (A) The netfs calls fscache_acquire_cookie().  Object creation is deferred to
         the object state machine and the netfs is allowed to continue.
    
    	OBJECT STATE MACHINE KTHREAD
    	============================
    	(1) The object is looked up on disk by fscache_look_up_object()
    	    calling cachefiles_walk_to_object().  The latter finds that the
    	    object is not yet represented on disk and calls
    	    fscache_object_lookup_negative().
    
    	(2) fscache_object_lookup_negative() sets FSCACHE_COOKIE_NO_DATA_YET
    	    and clears FSCACHE_COOKIE_LOOKING_UP, thus allowing the netfs to
    	    start queuing read operations.
    
     (B) The netfs calls fscache_read_or_alloc_pages().  This calls
         fscache_wait_for_deferred_lookup() which sees FSCACHE_COOKIE_LOOKING_UP
         become clear, allowing the read to begin.
    
     (C) A read operation is set up and passed to fscache_submit_op() to deal
         with.
    
    	(3) cachefiles_walk_to_object() calls cachefiles_has_space(), which
    	    fails (or one of the file operations to create stuff fails).
    	    cachefiles returns an error to fscache.
    
    	(4) fscache_look_up_object() transits to the LOOKUP_FAILURE state.
    
    	(5) fscache_lookup_failure() sets FSCACHE_OBJECT_LOOKED_UP and
    	    FSCACHE_COOKIE_UNAVAILABLE and clears FSCACHE_COOKIE_LOOKING_UP
    	    then transits to the KILL_OBJECT state.
    
    	(6) fscache_kill_object() clears FSCACHE_OBJECT_IS_LIVE in an attempt
    	    to reject any further requests from the netfs.
    
    	(7) object->n_ops is examined and found to be 0.
    	    fscache_kill_object() transits to the DROP_OBJECT state.
    
     (D) fscache_submit_op() locks the object spinlock, sees if it can dispatch
         the op immediately by calling fscache_object_is_active() - which fails
         since FSCACHE_OBJECT_IS_AVAILABLE has not yet been set.
    
     (E) fscache_submit_op() then tests FSCACHE_OBJECT_LOOKED_UP - which is set.
         It then queues the object and increments object->n_ops.
    
    	(8) fscache_drop_object() releases the object and eventually
    	    fscache_put_object() calls cachefiles_put_object() which suffers
    	    an assertion failure here:
    
    		ASSERTCMP(object->fscache.n_ops, ==, 0);
    
    Locking the object spinlock in step (6) around the clearance of
    FSCACHE_OBJECT_IS_LIVE ensures that the decision trees in
    fscache_submit_op() and fscache_submit_exclusive_op() don't see the IS_LIVE
    flag being cleared mid-decision: either the op is queued before step (7) - in
    which case fscache_kill_object() will see n_ops>0 and will deal with the op -
    or the op will be rejected.
    
    This, combined with rejecting op submission if the target object is dying,
    fixes the problem.
    
    The problem shows up as the following oops:
    
    CacheFiles: Assertion failed
    CacheFiles: 1 == 0 is false
    ------------[ cut here ]------------
    kernel BUG at ../fs/cachefiles/interface.c:339!
    ...
    RIP: 0010:[<ffffffffa014fd9c>]  [<ffffffffa014fd9c>] cachefiles_put_object+0x2a4/0x301 [cachefiles]
    ...
    Call Trace:
     [<ffffffffa008674b>] fscache_put_object+0x18/0x21 [fscache]
     [<ffffffffa00883e6>] fscache_object_work_func+0x3ba/0x3c9 [fscache]
     [<ffffffff81054dad>] process_one_work+0x226/0x441
     [<ffffffff81055d91>] worker_thread+0x273/0x36b
     [<ffffffff81055b1e>] ? rescuer_thread+0x2e1/0x2e1
     [<ffffffff81059b9d>] kthread+0x10e/0x116
     [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb
     [<ffffffff815579ac>] ret_from_fork+0x7c/0xb0
     [<ffffffff81059a8f>] ? kthread_create_on_node+0x1bb/0x1bb

    Signed-off-by: David Howells <dhowells@redhat.com>
    Reviewed-by: Steve Dickson <steved@redhat.com>
    Acked-by: Jeff Layton <jeff.layton@primarydata.com>