1. 18 Apr, 2008 19 commits
    • ocfs2: Clean up stackglue initialization · 63e0c48a
      Joel Becker committed
      The stack glue initialization function needs a better name so that it can be
      used cleanly when stackglue becomes a module.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Abstract out a debugging function for underlying dlms. · cf0acdcd
      Joel Becker committed
      dlmglue.c was still referencing a raw o2dlm lksb in one instance.  Let's
      create a generic ocfs2_dlm_dump_lksb() function.  This allows underlying
      DLMs to print whatever they want about their lock.

      We then move the o2dlm dump into stackglue.c where it belongs.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: handle async EAGAIN from NOQUEUE request · 1693a5c0
      David Teigland committed
      When using fsdlm, -EAGAIN is returned in the async callback for NOQUEUE
      requests. Fix up dlmglue to expect this.
      Signed-off-by: David Teigland <teigland@redhat.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Remove CANCELGRANT from the view of dlmglue. · de551246
      Joel Becker committed
      o2dlm has the non-standard behavior of providing a cancel callback
      (unlock_ast) even when the cancel has failed (the locking operation
      succeeded without canceling).  This is called CANCELGRANT after the
      status code sent to the callback.  fs/dlm does not provide this
      callback, so dlmglue must be changed to live without it.
      o2dlm_unlock_ast_wrapper() in stackglue now ignores CANCELGRANT calls.
      
      Because dlmglue no longer sees CANCELGRANT, ocfs2_unlock_ast() no longer
      needs to check for it.  ocfs2_locking_ast() must catch that a cancel was
      tried and clear the cancel state.
      
      Making these changes opens up a locking race.  dlmglue uses the
      OCFS2_LOCK_BUSY flag to ensure only one thread is calling the dlm at any
      one time.  But dlmglue must unlock the lockres before calling into the
      dlm.  In the small window of time between unlocking the lockres and
      calling the dlm, the downconvert thread can try to cancel the lock.  The
      downconvert thread is checking the OCFS2_LOCK_BUSY flag - it doesn't
      know that ocfs2_dlm_lock() has not yet been called.
      
      Because ocfs2_dlm_lock() has not yet been called, the cancel operation
      will just be a no-op.  There's nothing to cancel.  With CANCELGRANT,
      dlmglue uses the CANCELGRANT callback to clear up the cancel state.
      When it comes around again, it will retry the cancel.  Eventually, the
      first thread will have called into ocfs2_dlm_lock(), and either the
      lock or the cancel will succeed.  The downconvert thread can then do its
      downconvert.
      
      Without CANCELGRANT, there is nothing to clean up the cancellation
      state.  The downconvert thread does not know to retry its operations.
      More importantly, the original lock may be blocking on the other node
      that is trying to cancel us.  With neither able to make progress, the
      ast is never called and the cancellation state is never cleaned up that
      way.  dlmglue is deadlocked.
      
      The OCFS2_LOCK_PENDING flag is introduced to remedy this window.  It is
      set at the same time OCFS2_LOCK_BUSY is.  Thus, the downconvert thread
      can check whether the lock is cancelable.  If not, it just loops around
      to try again.  Once ocfs2_dlm_lock() is called, the thread then clears
      OCFS2_LOCK_PENDING and wakes the downconvert thread.  Now, if the
      downconvert thread finds the lock BUSY, it can safely try to cancel it.
      Whether the cancel works or not, the state will be properly set and the
      lock processing can continue.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
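
The OCFS2_LOCK_PENDING handshake described in this commit can be modeled as a small flag check. The following is a minimal single-threaded sketch, not the kernel code: the flag values, `struct lockres_sketch`, and all three function names are assumptions made for illustration, and the real implementation also involves a spinlock and a wakeup of the downconvert thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values for this sketch; not the kernel's definitions. */
#define OCFS2_LOCK_BUSY    0x1u
#define OCFS2_LOCK_PENDING 0x2u

/* Hypothetical stand-in for the lock resource; only the flags matter here. */
struct lockres_sketch {
    unsigned int flags;
};

/* Locking thread: the request is in flight but not yet handed to the DLM. */
static void begin_lock_request(struct lockres_sketch *res)
{
    res->flags |= OCFS2_LOCK_BUSY | OCFS2_LOCK_PENDING;
}

/* Locking thread: the DLM call has been issued, so a cancel is now meaningful. */
static void dlm_call_issued(struct lockres_sketch *res)
{
    assert(res->flags & OCFS2_LOCK_PENDING);
    res->flags &= ~OCFS2_LOCK_PENDING;
    /* The real code would also wake the downconvert thread at this point. */
}

/* Downconvert thread: cancel only a busy lock whose request reached the DLM. */
static bool may_cancel(const struct lockres_sketch *res)
{
    return (res->flags & OCFS2_LOCK_BUSY) &&
           !(res->flags & OCFS2_LOCK_PENDING);
}
```

In this model, `may_cancel()` returns false between `begin_lock_request()` and `dlm_call_issued()`, which is the window in which a cancel would otherwise be a no-op with nothing to clean up its state.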
    • ocfs2: Fill node number during cluster stack init · 0abd6d18
      Mark Fasheh committed
      It doesn't make sense to query for a node number before connecting to the
      cluster stack. This should be safe to do because node_num is only just
      printed, and we're actually only moving the setting of node num a small
      amount further in the mount process.

      [ Disconnect when node query fails -- Joel ]
      Reviewed-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Move o2hb functionality into the stack glue. · 6953b4c0
      Joel Becker committed
      The last bit of classic stack used directly in ocfs2 code is o2hb.
      Specifically, the check for heartbeat during mount and the call to
      ocfs2_hb_ctl during unmount.
      
      We create an extra API, ocfs2_cluster_hangup(), to encapsulate the call
      to ocfs2_hb_ctl.  Other stacks will just leave hangup() empty.
      
      The check for heartbeat is moved into ocfs2_cluster_connect().  It will
      be matched by a similar check for other stacks.
      
      With this change, only stackglue.c includes cluster/ headers.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Abstract out node number queries. · 19fdb624
      Joel Becker committed
      ocfs2 asks the cluster stack for the local node's node number for two
      reasons: to fill the slot map and to print it. While the slot map isn't
      necessary for userspace cluster stacks, the printing is very nice for
      debugging. Thus we add ocfs2_cluster_this_node() as a generic API to get
      this value. It is anticipated that the slot map will not be used under a
      userspace cluster stack, so validity checks of the node num only need to
      exist in the slot map code. Otherwise, it just gets used and printed as an
      opaque value.
      
      [ Fixed up some "int" versus "unsigned int" issues and made osb->node_num
        truly opaque. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Introduce the new ocfs2_cluster_connect/disconnect() API. · 4670c46d
      Joel Becker committed
      This step introduces a cluster stack agnostic API for initializing and
      exiting.  fs/ocfs2/dlmglue.c no longer uses o2cb/o2dlm knowledge to
      connect to the stack.  It is all handled in stackglue.c.
      
      heartbeat.c no longer needs to know how it gets called.
      ocfs2_do_node_down() is now a clean recovery trigger.
      
      The big gotcha is the ordering of initializations and de-initializations done
      underneath ocfs2_cluster_connect().  ocfs2_dlm_init() used to do all
      o2dlm initialization in one block.  Thus, the o2dlm functionality of
      ocfs2_cluster_connect() is very straightforward.  ocfs2_dlm_shutdown(),
      however, did a few things between de-registration of the eviction
      callback and actually shutting down the domain.  Now de-registration and
      shutdown of the domain are wrapped within the single
      ocfs2_cluster_disconnect() call.  I've checked the code paths to make
      sure we can safely tear down things in ocfs2_dlm_shutdown() before
      calling ocfs2_cluster_disconnect().  The filesystem has already set
      itself to ignore the callback.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Create the lock status block union. · 8f2c9c1b
      Joel Becker committed
      Wrap the lock status block (lksb) in a union.  Later we will add a union
      element for the fs/dlm lksb.  Create accessors for the status and lvb
      fields.

      Other than a debugging function, dlmglue.c does not directly reference
      the o2dlm locking path anymore.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
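
The idea of wrapping a lock status block in a union and reaching it only through accessors can be sketched as follows. This is an illustration under assumptions rather than the kernel's definitions: the struct layout, the `SKETCH_LVB_LEN` size, and every name ending in `_sketch` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define SKETCH_LVB_LEN 64  /* assumed lock value block size */

/* Hypothetical stand-in for one DLM's lock status block. */
struct o2dlm_lksb_sketch {
    int  status;
    char lvb[SKETCH_LVB_LEN];
};

/* The union gives generic code one type to carry around. */
union dlm_lksb_sketch {
    struct o2dlm_lksb_sketch lksb_o2dlm;
    /* A member for another DLM's status block would be added here. */
};

/* Accessors: generic code never names the underlying member directly. */
static int lksb_status_sketch(const union dlm_lksb_sketch *lksb)
{
    return lksb->lksb_o2dlm.status;
}

static void *lksb_lvb_sketch(union dlm_lksb_sketch *lksb)
{
    return lksb->lksb_o2dlm.lvb;
}

/* Copies caller data into the lock value block, refusing oversized input. */
static void lksb_set_lvb_sketch(union dlm_lksb_sketch *lksb,
                                const void *data, size_t len)
{
    assert(len <= SKETCH_LVB_LEN);
    memcpy(lksb_lvb_sketch(lksb), data, len);
}
```

With this shape, supporting a second DLM means adding a union member and a branch inside the accessors, while callers stay unchanged.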
    • ocfs2: Use -errno instead of dlm_status for ocfs2_dlm_lock/unlock() API. · 7431cd7e
      Joel Becker committed
      Change the ocfs2_dlm_lock/unlock() functions to return -errno values.
      This is the first step towards eliminating dlm_status in
      fs/ocfs2/dlmglue.c.  The change also passes -errno values to
      ->unlock_ast().

      [ Fix a return code in dlmglue.c and change the error translation table into
        an array of ints. --Mark ]
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
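
A translation table held as an array of ints, as the bracketed note mentions, might look like the sketch below. The status names and the specific mappings are invented for illustration and are not o2dlm's real `dlm_status` values; only the errno constants are standard.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical DLM-specific status codes; not the real o2dlm enum. */
enum sketch_dlm_status {
    SKETCH_DLM_NORMAL = 0,
    SKETCH_DLM_NOTQUEUED,
    SKETCH_DLM_NOMEM,
    SKETCH_DLM_BADARGS,
    SKETCH_DLM_STATUS_COUNT
};

/* Translation table: indexed by status, holds the matching -errno value. */
static const int sketch_status_table[SKETCH_DLM_STATUS_COUNT] = {
    [SKETCH_DLM_NORMAL]    = 0,
    [SKETCH_DLM_NOTQUEUED] = -EAGAIN,
    [SKETCH_DLM_NOMEM]     = -ENOMEM,
    [SKETCH_DLM_BADARGS]   = -EINVAL,
};

/* Converts a status to -errno; unknown values fall back to -EINVAL. */
static int sketch_status_to_errno(int status)
{
    if (status < 0 || status >= SKETCH_DLM_STATUS_COUNT)
        return -EINVAL;
    return sketch_status_table[status];
}
```

Callers above the glue layer then only ever compare against `-errno` values, so a different DLM can be substituted by supplying its own table.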
    • ocfs2: Use global DLM_ constants in generic code. · bd3e7610
      Joel Becker committed
      The ocfs2 generic code should use the values in <linux/dlmconstants.h>.
      stackglue.c will convert them to o2dlm values.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Separate out dlm lock functions. · 24ef1815
      Joel Becker committed
      This is the first in a series of patches to isolate ocfs2 from the
      underlying cluster stack. Here we wrap the dlm locking functions with
      ocfs2-specific calls. Because ocfs2 always uses the same dlm lock status
      callbacks, we can eliminate the callbacks from the filesystem-visible
      functions.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: New slot map format · 386a2ef8
      Joel Becker committed
      The old slot map had a few limitations:

      - It was limited to one block, so the maximum slot count was 255.
      - Each slot was a signed 16-bit value, limiting node numbers to INT16_MAX.
      - An empty slot was marked by the magic 0xFFFF (-1).

      The new slot map format provides 32-bit node numbers (UINT32_MAX), a
      separate space to mark a slot in use, and extra room to grow.  The slot
      map is now bounded by i_size, not a block.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
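
The contrast between the two slot encodings can be sketched as below. The struct names, field names, and the 8-byte layout are assumptions for illustration, not the on-disk definitions, and byte-order conversion is ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OLD_SLOT_EMPTY 0xFFFFu  /* magic value from the commit text */

/* Old style: the node number and the "empty" marker share one 16-bit value. */
struct old_slot_sketch {
    uint16_t node_num;
};

/* New style (assumed layout): validity is stored apart from the node number. */
struct new_slot_sketch {
    uint8_t  valid;
    uint8_t  reserved[3];  /* room to grow */
    uint32_t node_num;
};

static_assert(sizeof(struct new_slot_sketch) == 8,
              "one slot record is assumed to occupy 8 bytes");

static bool old_slot_in_use(const struct old_slot_sketch *s)
{
    return s->node_num != SKETCH_OLD_SLOT_EMPTY;
}

static bool new_slot_in_use(const struct new_slot_sketch *s)
{
    return s->valid != 0;
}

/* With the map bounded by file size, capacity follows from i_size alone. */
static uint64_t new_slots_in_file(uint64_t i_size)
{
    return i_size / sizeof(struct new_slot_sketch);
}
```

Because validity no longer lives in the node number, every 32-bit value including `UINT32_MAX` stays usable as a node number in this model.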
    • ocfs2: Define the contents of the slot_map file. · fb86b1f0
      Joel Becker committed
      The slot map file is merely an array of __le16.  Wrap it in a structure for
      cleaner reference.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: De-magic the in-memory slot map. · fc881fa0
      Joel Becker committed
      The in-memory slot map uses the same magic as the on-disk one.  There is
      a special value to mark a slot as invalid.  It relies on the size of
      certain types and so on.

      Write a new in-memory map that keeps validity as a separate field.  Outside
      of the I/O functions, OCFS2_INVALID_SLOT now means what it is supposed to.
      It also is no longer tied to the type size.

      This also means that only the I/O functions refer to 16-bit quantities.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: slot_map I/O based on max_slots. · 1c8d9a6a
      Joel Becker committed
      The slot map code assumed a slot_map file has one block allocated.
      This changes the code to do I/O on as many blocks as will cover max_slots.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Change the recovery map to an array of node numbers. · 553abd04
      Joel Becker committed
      The old recovery map was a bitmap of node numbers.  This was sufficient
      for the maximum node number of 254.  Going forward, we want node numbers
      to be UINT32.  Thus, we need a new recovery map.
      
      Note that we can't keep track of slots here.  We must write down the
      node number to recover *before* we get the locks needed to convert a
      node number into a slot number.
      
      The recovery map is now an array of unsigned ints, max_slots in size.
      It moves to journal.c with the rest of recovery.
      
      Because it needs to be initialized, we move all of recovery initialization
      into a new function, ocfs2_recovery_init().  This actually cleans up
      ocfs2_initialize_super() a little as well.  Following on, recovery cleanup
      becomes part of ocfs2_recovery_exit().
      
      A number of node map functions are rendered obsolete and are removed.
      
      Finally, waiting on recovery is wrapped in a function rather than naked
      checks on the recovery_event.  This is a cleanup from Mark.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
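
A recovery map kept as a fixed-size array of unsigned node numbers, rather than a bitmap, could be sketched like this. The type, the `SKETCH_MAX_SLOTS` bound, and the function names are hypothetical; the kernel version sizes the array by max_slots and holds a lock around each access.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_MAX_SLOTS 8  /* assumed bound; stands in for max_slots */

/* Array-based map: any unsigned node number fits, unlike a bitmap. */
struct recovery_map_sketch {
    unsigned int used;
    unsigned int entries[SKETCH_MAX_SLOTS];
};

static bool recovery_map_contains(const struct recovery_map_sketch *rm,
                                  unsigned int node_num)
{
    for (unsigned int i = 0; i < rm->used; i++)
        if (rm->entries[i] == node_num)
            return true;
    return false;
}

/* Records a node needing recovery; returns false if it was already there. */
static bool recovery_map_set(struct recovery_map_sketch *rm,
                             unsigned int node_num)
{
    if (recovery_map_contains(rm, node_num))
        return false;
    assert(rm->used < SKETCH_MAX_SLOTS);
    rm->entries[rm->used++] = node_num;
    return true;
}

/* Removes a node once its recovery is done, closing the gap it leaves. */
static void recovery_map_clear(struct recovery_map_sketch *rm,
                               unsigned int node_num)
{
    for (unsigned int i = 0; i < rm->used; i++) {
        if (rm->entries[i] != node_num)
            continue;
        memmove(&rm->entries[i], &rm->entries[i + 1],
                (rm->used - i - 1) * sizeof(rm->entries[0]));
        rm->used--;
        return;
    }
}
```

The array costs a linear scan per lookup, which is cheap at this size, and it removes the bitmap's ceiling on node numbers.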
    • ocfs2: Make ocfs2_slot_info private. · d85b20e4
      Joel Becker committed
      Just use osb_lock around the ocfs2_slot_info data.  This allows us to
      take the ocfs2_slot_info structure private in slot_info.c.  All access
      is now via accessors.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • ocfs2: Move slot map access into slot_map.c · 8e8a4603
      Mark Fasheh committed
      journal.c and dlmglue.c would refresh the slot map by hand.  Instead, have
      the update and clear functions do the work inside slot_map.c.  The eventual
      result is to make ocfs2_slot_info defined privately in slot_map.c.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
  2. 16 Apr, 2008 2 commits
  3. 15 Apr, 2008 2 commits
    • JFFS2 Fix of panics caused by wrong condition for hole frag creation in write_begin · abe2f414
      Alexey Korolev committed
      This fixes a regression introduced in commit 205c109a when switching to
      write_begin/write_end operations in JFFS2.
      
      The page offset is miscalculated, leading to corruption of the fragment
      lists and subsequently to memory corruption and panics.
      
      [ Side note: the bug is a fairly direct result of the naming.  Nick was
        likely misled by the use of "offs", since we tend to use the notion of
        "offset" not as an absolute position, but as an offset _within_ a page
        or allocation.
      
        Alternatively, a "pgoff_t" is a page index, but not a byte offset -
        our VM naming can be a bit confusing.
      
        So in this case, a VM person would likely have called this a "pos",
        not an "offs", or perhaps talked about byte offsets rather than page
        offsets (since it's counted in bytes, not pages).    - Linus ]
      Signed-off-by: Alexey Korolev <akorolev@infradead.org>
      Signed-off-by: Vasiliy Leonenko <vasiliy.leonenko@mail.ru>
      Signed-off-by: David Woodhouse <dwmw2@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
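
The naming distinction in the side note (absolute byte position, page index, and offset within a page) can be made concrete with a few helpers. This is a sketch assuming 4 KiB pages; the macro and function names are invented here and are not the kernel's.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12  /* assumes 4 KiB pages */
#define SKETCH_PAGE_SIZE  (UINT64_C(1) << SKETCH_PAGE_SHIFT)

static_assert((SKETCH_PAGE_SIZE & (SKETCH_PAGE_SIZE - 1)) == 0,
              "page size must be a power of two for the mask below");

/* Page index: which page an absolute byte position falls in. */
static uint64_t sketch_page_index(uint64_t pos)
{
    return pos >> SKETCH_PAGE_SHIFT;
}

/* Offset within the page: always smaller than the page size. */
static uint64_t sketch_offset_in_page(uint64_t pos)
{
    return pos & (SKETCH_PAGE_SIZE - 1);
}

/* Absolute byte position at which the containing page starts. */
static uint64_t sketch_page_start(uint64_t pos)
{
    return sketch_page_index(pos) << SKETCH_PAGE_SHIFT;
}
```

For a position of 12388 bytes these give a page index of 3, an in-page offset of 100, and a page start of 12288. Using one of these quantities where another was meant is the kind of mix-up the side note describes.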
    • locks: fix possible infinite loop in fcntl(F_SETLKW) over nfs · 19e729a9
      J. Bruce Fields committed
      Miklos Szeredi found the bug:
      
      	"Basically what happens is that on the server nlm_fopen() calls
      	nfsd_open() which returns -EACCES, to which nlm_fopen() returns
      	NLM_LCK_DENIED.
      
      	"On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
      	which in will cause fcntl_setlk() to retry forever."
      
      So, for example, opening a file on an nfs filesystem, changing
      permissions to forbid further access, then trying to lock the file,
      could result in an infinite loop.
      
      And Trond Myklebust identified the culprit, from Marc Eshel and I:
      
      	7723ec97 "locks: factor out
      	generic/filesystem switch from setlock code"
      
      That commit claimed to just be reshuffling code, but actually introduced
      a behavioral change by calling the lock method repeatedly as long as it
      returned -EAGAIN.
      
      We assumed this would be safe, since we assumed a lock of type SETLKW
      would only return with either success or an error other than -EAGAIN.
      However, nfs can in fact return -EAGAIN in this situation, and
      independently of whether that behavior is correct or not, we don't
      actually need this change, and it seems far safer not to depend on such
      assumptions about the filesystem's ->lock method.
      
      Therefore, revert the problematic part of the original commit.  This
      leaves vfs_lock_file() and its other callers unchanged, while returning
      fcntl_setlk and fcntl_setlk64 to their former behavior.
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
      Tested-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: Marc Eshel <eshel@almaden.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
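
The behavioral difference this commit describes, calling a lock method once versus calling it again for as long as it returns -EAGAIN, can be illustrated with a mock. Everything below is hypothetical scaffolding rather than the VFS code, and the `max_calls` cap exists only so that the sketch terminates.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-in for a filesystem's ->lock method. */
typedef int (*lock_method_sketch)(void);

/* Mock method that reports -EAGAIN on every call. */
static int always_eagain(void)
{
    return -EAGAIN;
}

/* Calls the lock method exactly once and passes its result back. */
static int setlk_single_call(lock_method_sketch lock, int *calls)
{
    (*calls)++;
    return lock();
}

/* Retries while the method reports -EAGAIN; unbounded without the cap. */
static int setlk_retry_on_eagain(lock_method_sketch lock, int max_calls,
                                 int *calls)
{
    int error = -EAGAIN;

    assert(max_calls > 0);
    while (*calls < max_calls) {
        (*calls)++;
        error = lock();
        if (error != -EAGAIN)
            break;
    }
    return error;
}
```

With `always_eagain`, the single-call variant returns after one call, while the retrying variant spins until the artificial cap is reached. Without that cap it would never return, which mirrors the hang described above.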
  4. 12 Apr, 2008 1 commit
  5. 11 Apr, 2008 5 commits
  6. 10 Apr, 2008 4 commits
  7. 09 Apr, 2008 2 commits
  8. 05 Apr, 2008 1 commit
    • Be more careful about marking buffers dirty · 1be62dc1
      Linus Torvalds committed
      Mikulas Patocka noted that the optimization where we check if a buffer
      was already dirty (and we avoid re-dirtying it) was not really SMP-safe.
      
      Since the read of the old status was not synchronized with anything, an
      aggressive CPU re-ordering of memory accesses might have moved that read
      up to before the data was even written to the buffer, and another CPU
      that cleaned it again, causing the newly dirty state to never actually
      hit the disk.
      
      Admittedly this would probably never trigger in practice, but it's still
      wrong.
      
      Mikulas sent a patch that fixed the problem, but I dislike the subtlety
      of the whole optimization, so this is an alternate fix that is more
      explicit about the particular SMP ordering for the optimization, and
      separates out the speculative reads of the buffer state into its own
      conditional (and makes the memory barrier only happen if we are likely
      to actually hit the optimized case in the first place).
      
      I considered removing the optimization entirely, but Andrew argued for
      its continued existence. I'm a push-over.
      
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
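
The pattern described here, a speculative read of the dirty state in its own conditional, with the memory barrier paid only when the fast path looks likely, can be approximated in user space with C11 atomics. This is a sketch of the idea under assumptions, not the kernel function: `struct buffer_sketch`, `mark_dirty_sketch()`, and the `page_dirtied` counter are invented, and `atomic_thread_fence()` stands in for the kernel's barrier primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer with a dirty bit. */
struct buffer_sketch {
    atomic_bool dirty;
    int page_dirtied;  /* counts how often the slow path ran */
};

static void mark_dirty_sketch(struct buffer_sketch *bh)
{
    /* Speculative read: cheap, but unordered against earlier data writes. */
    if (atomic_load_explicit(&bh->dirty, memory_order_relaxed)) {
        /* Only now pay for the barrier, then re-check under ordering. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(&bh->dirty, memory_order_relaxed))
            return;  /* already dirty: nothing more to do */
    }

    /* Atomic test-and-set: only the caller that flips the bit goes on. */
    if (!atomic_exchange(&bh->dirty, true))
        bh->page_dirtied++;
}
```

Marking an already dirty buffer returns early without touching the counter, and a buffer that was cleaned in between goes through the test-and-set path again.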
  9. 04 Apr, 2008 2 commits
  10. 03 Apr, 2008 1 commit
  11. 02 Apr, 2008 1 commit