1. 30 3月, 2010 1 次提交
    • T
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo 提交于
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the followings.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there.  ie. if only gfp is used,
        gfp.h, if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and try to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         wildly available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build test were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable easily on most builds of
      the specific arch.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Guess-its-ok-by: NChristoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
      5a0e3ad6
  2. 24 3月, 2010 4 次提交
    • S
      ocfs2: Fix a race in o2dlm lockres mastery · 14741472
      Srinivas Eeda 提交于
      In o2dlm, the master of a lock resource keeps a map of all interested
      nodes.  This prevents the master from purging the resource before an
      interested node can create a lock.
      
      A race between the mastery thread and the mastery handler allowed an
      interested node to discover who the master is without informing the
      master directly.  This is easily fixed by holding the dlm spinlock a
      little longer in the mastery handler.
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      14741472
    • T
      Ocfs2: Handle deletion of reflinked oprhan inodes correctly. · b54c2ca4
      Tristan Ye 提交于
      The rule is that all inodes in the orphan dir have ORPHANED_FL,
      otherwise we treated it as an ERROR.  This rule works well except
      for some rare cases of reflink operation:
      
      http://oss.oracle.com/bugzilla/show_bug.cgi?id=1215
      
      The problem is caused by how reflink and our orphan_scan thread
      interact.
      
       * The orphan scan pulls the orphans into a queue first, then runs the
         queue at a later time.  We only hold the orphan_dir's lock
         during scanning.
      
       * Reflink create a oprhaned target in orphan_dir as its first step.
         It removes the target and clears the flag as the final step.
         These two steps take the orphan_dir's lock, but it is not held for
         the duration.
      
      Based on the above semantics, a reflink inode can be moved out of the
      orphan dir and have its ORPHANED_FL cleared before the queue of orphans
      is run.  This leads to a ERROR in ocfs2_query_wipde_inode().
      
      This patch teaches ocfs2_query_wipe_inode() to detect previously
      orphaned reflink targets.  If a reflink fails or a crash occurs during
      the relfink operation, the inode will retain ORPHANED_FL and will be
      properly wiped.
      Signed-off-by: NTristan Ye <tristan.ye@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      b54c2ca4
    • T
      Ocfs2: Journaling i_flags and i_orphaned_slot when adding inode to orphan dir. · 3939fda4
      Tristan Ye 提交于
      Currently, some callers were missing to journal the dirty inode after
      adding it to orphan dir.
      
      Now we're going to journal such modifications within the ocfs2_orphan_add()
      itself, It's safe to do so, though some existing caller may duplicate this,
      and it makes the logic look more straightforward anyway.
      Signed-off-by: NTristan Ye <tristan.ye@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      3939fda4
    • M
      ocfs2: Clear undo bits when local alloc is freed · b4414eea
      Mark Fasheh 提交于
      When the local alloc file changes windows, unused bits are freed back to the
      global bitmap. By defnition, those bits can not be in use by any file. Also,
      the local alloc will never have been able to allocate those bits if they
      were part of a previous truncate. Therefore it makes sense that we should
      clear unused local alloc bits in the undo buffer so that they can be used
      immediatly.
      
      [ Modified to call it ocfs2_release_clusters() -- Joel ]
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      b4414eea
  3. 20 3月, 2010 2 次提交
    • T
      ocfs2: Init meta_ac properly in ocfs2_create_empty_xattr_block. · b2317968
      Tao Ma 提交于
      You can't store a pointer that you haven't filled in yet and expect it
      to work.
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      b2317968
    • T
      ocfs2: Fix the update of name_offset when removing xattrs · dfe4d3d6
      Tao Ma 提交于
      When replacing a xattr's value, in some case we wipe its name/value
      first and then re-add it. The wipe is done by
      ocfs2_xa_block_wipe_namevalue() when the xattr is in the inode or
      block. We currently adjust name_offset for all the entries which have
      (offset < name_offset). This does not adjust the entrie we're replacing.
      Since we are replacing the entry, we don't adjust the total entry count.
      When we calculate a new namevalue location, we trust the entries
      now-wrong offset in ocfs2_xa_get_free_start().  The solution is to
      also adjust the name_offset for the replaced entry, allowing
      ocfs2_xa_get_free_start() to calculate the new namevalue location
      correctly.
      
      The following script can trigger a kernel panic easily.
      
      echo 'y'|mkfs.ocfs2 --fs-features=local,xattr -b 4K $DEVICE
      mount -t ocfs2 $DEVICE $MNT_DIR
      FILE=$MNT_DIR/$RANDOM
      for((i=0;i<76;i++))
      do
      string_76="a$string_76"
      done
      string_78="aa$string_76"
      string_82="aaaa$string_78"
      
      touch $FILE
      setfattr -n 'user.test1234567890' -v $string_76 $FILE
      setfattr -n 'user.test1234567890' -v $string_78 $FILE
      setfattr -n 'user.test1234567890' -v $string_82 $FILE
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      dfe4d3d6
  4. 19 3月, 2010 1 次提交
    • M
      ocfs2: Always try for maximum bits with new local alloc windows · b22b63eb
      Mark Fasheh 提交于
      What we were doing before was to ask for the current window size as the
      maximum allocation. This had the effect of limiting the amount of allocation
      we could get for the local alloc during times when the window size was
      shrunk due to fragmentation. In some cases, that could actually *increase*
      fragmentation by artificially limiting the number of bits we can accept. So
      while we still want to ask for a minimum number of bits equal to window
      size, there is no reason why we should limit the number of bits the local
      alloc should accept. Hence always allow the maximum number of local alloc
      bits.
      Signed-off-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      b22b63eb
  5. 18 3月, 2010 4 次提交
  6. 13 3月, 2010 1 次提交
  7. 08 3月, 2010 1 次提交
  8. 07 3月, 2010 1 次提交
  9. 05 3月, 2010 7 次提交
    • C
      dquot: cleanup dquot initialize routine · 871a2931
      Christoph Hellwig 提交于
      Get rid of the initialize dquot operation - it is now always called from
      the filesystem and if a filesystem really needs it's own (which none
      currently does) it can just call into it's own routine directly.
      
      Rename the now static low-level dquot_initialize helper to __dquot_initialize
      and vfs_dq_init to dquot_initialize to have a consistent namespace.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      871a2931
    • C
      dquot: move dquot initialization responsibility into the filesystem · 907f4554
      Christoph Hellwig 提交于
      Currently various places in the VFS call vfs_dq_init directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the initialization.   For most metadata operations
      this is a straight forward move into the methods, but for truncate and
      open it's a bit more complicated.
      
      For truncate we currently only call vfs_dq_init for the sys_truncate case
      because open already takes care of it for ftruncate and open(O_TRUNC) - the
      new code causes an additional vfs_dq_init for those which is harmless.
      
      For open the initialization is moved from do_filp_open into the open method,
      which means it happens slightly earlier now, and only for regular files.
      The latter is fine because we don't need to initialize it for operations
      on special files, and we already do it as part of the namespace operations
      for directories.
      
      Add a dquot_file_open helper that filesystems that support generic quotas
      can use to fill in ->open.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      907f4554
    • C
      dquot: cleanup dquot drop routine · 9f754758
      Christoph Hellwig 提交于
      Get rid of the drop dquot operation - it is now always called from
      the filesystem and if a filesystem really needs it's own (which none
      currently does) it can just call into it's own routine directly.
      
      Rename the now static low-level dquot_drop helper to __dquot_drop
      and vfs_dq_drop to dquot_drop to have a consistent namespace.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      9f754758
    • C
      dquot: move dquot drop responsibility into the filesystem · 257ba15c
      Christoph Hellwig 提交于
      Currently clear_inode calls vfs_dq_drop directly.  This means
      we tie the quota code into the VFS.  Get rid of that and make the
      filesystem responsible for the drop inside the ->clear_inode
      superblock operation.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      257ba15c
    • C
      dquot: cleanup dquot transfer routine · b43fa828
      Christoph Hellwig 提交于
      Get rid of the transfer dquot operation - it is now always called from
      the filesystem and if a filesystem really needs it's own (which none
      currently does) it can just call into it's own routine directly.
      
      Rename the now static low-level dquot_transfer helper to __dquot_transfer
      and vfs_dq_transfer to dquot_transfer to have a consistent namespace,
      and make the new dquot_transfer return a normal negative errno value
      which all callers expect.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      b43fa828
    • C
      dquot: cleanup inode allocation / freeing routines · 63936dda
      Christoph Hellwig 提交于
      Get rid of the alloc_inode and free_inode dquot operations - they are
      always called from the filesystem and if a filesystem really needs
      their own (which none currently does) it can just call into it's
      own routine directly.
      
      Also get rid of the vfs_dq_alloc/vfs_dq_free wrappers and always
      call the lowlevel dquot_alloc_inode / dqout_free_inode routines
      directly, which now lose the number argument which is always 1.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      63936dda
    • C
      dquot: cleanup space allocation / freeing routines · 5dd4056d
      Christoph Hellwig 提交于
      Get rid of the alloc_space, free_space, reserve_space, claim_space and
      release_rsv dquot operations - they are always called from the filesystem
      and if a filesystem really needs their own (which none currently does)
      it can just call into it's own routine directly.
      
      Move shared logic into the common __dquot_alloc_space,
      dquot_claim_space_nodirty and __dquot_free_space low-level methods,
      and rationalize the wrappers around it to move as much as possible
      code into the common block for CONFIG_QUOTA vs not.  Also rename
      all these helpers to be named dquot_* instead of vfs_dq_*.
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NJan Kara <jack@suse.cz>
      5dd4056d
  10. 03 3月, 2010 1 次提交
  11. 28 2月, 2010 3 次提交
  12. 27 2月, 2010 14 次提交
    • S
      dlm: allow dlm do recovery during shutdown · bc9838c4
      Srinivas Eeda 提交于
      If a node down event happens while dlm shutdown in progress, dlm recovery
      should be done before dlm is shutdown.  We can't migrate unrecovered locks,
      obviously.  But dlm_reco_thread only does recovery if the dlm_state is
      in DLM_CTXT_JOINED.
      
      dlm_reco_thread should do recovery if dlm_state is in DLM_CTXT_JOINED or
      DLM_CTXT_IN_SHUTDOWN.
      Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      bc9838c4
    • T
      ocfs2: Only bug out in direct io write for reflinked extent. · cbaee472
      Tao Ma 提交于
      In ocfs2_direct_IO_get_blocks, we only need to bug out
      in case of we are going to write a recounted extent rec.
      
      What a silly bug introduced by me!
      Signed-off-by: NTao Ma <tao.ma@oracle.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Cc: stable@kernel.org
      cbaee472
    • C
      ocfs2: fix warning in ocfs2_file_aio_write() · 66b116c9
      Coly Li 提交于
      This patch fixes a compiling warning in ocfs2_file_aio_write().
      Signed-off-by: NColy Li <coly.li@suse.de>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      66b116c9
    • J
      ocfs2_dlmfs: Enable the use of user cluster stacks. · cbe0e331
      Joel Becker 提交于
      Unlike ocfs2, dlmfs has no permanent storage.  It can't store off a
      cluster stack it is supposed to be using.  So it can't specify the stack
      name in ocfs2_cluster_connect().
      
      Instead, we create ocfs2_cluster_connect_agnostic(), which simply uses
      the stack that is currently enabled.  This is find for dlmfs, which will
      rely on the stack initialization.
      
      We add the "stackglue" capability to dlmfs's capability list.  This lets
      userspace know dlmfs can be used with all cluster stacks.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      cbe0e331
    • J
      ocfs2_dlmfs: Use the stackglue. · 0016eedc
      Joel Becker 提交于
      Rather than directly using o2dlm, dlmfs can now use the stackglue.  This
      allows it to use userspace cluster stacks and fs/dlm.  This commit
      forces o2cb for now.  A latter commit will bump the protocol version and
      allow non-o2cb stacks.
      
      This is one big sed, really.  LKM_xxMODE becomes DLM_LOCK_xx.  LKM_flag
      becomes DLM_LKF_flag.
      
      We also learn to check that the LVB is valid before reading it.  Any DLM
      can lose the contents of the LVB during a complicated recovery.  userdlm
      should be checking this.  Now it does.  dlmfs will return 0 from read(2)
      if the LVB was invalid.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      0016eedc
    • J
      ocfs2_dlmfs: Don't honor truncate. The size of a dlmfs file is LVB_LEN · e8fce482
      Joel Becker 提交于
      We want folks using dlmfs to be able to use the LVB in places other than
      just write(2)/read(2).  By ignoring truncate requests, we allow 'echo
      "contents" > /dlm/space/lockname' to work.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      e8fce482
    • J
      ocfs2: Pass the locking protocol into ocfs2_cluster_connect(). · 553b5eb9
      Joel Becker 提交于
      Inside the stackglue, the locking protocol structure is hanging off of
      the ocfs2_cluster_connection.  This takes it one further; the locking
      protocol is passed into ocfs2_cluster_connect().  Now different cluster
      connections can have different locking protocols with distinct asts.
      Note that all locking protocols have to keep their maximum protocol
      version in lock-step.
      
      With the protocol structure set in ocfs2_cluster_connect(), there is no
      need for the stackglue to have a static pointer to a specific protocol
      structure.  We can change initialization to only pass in the maximum
      protocol version.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      553b5eb9
    • J
      ocfs2: Remove the ast pointers from ocfs2_stack_plugins · e603cfb0
      Joel Becker 提交于
      With the full ocfs2_locking_protocol hanging off of the
      ocfs2_cluster_connection, ast wrappers can get the ast/bast pointers
      there.  They don't need to get them from their plugin structure.
      
      The user plugin still needs the maximum locking protocol version,
      though.  This changes the plugin structure so that it only holds the max
      version, not the entire ocfs2_locking_protocol pointer.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      e603cfb0
    • J
      ocfs2: Hang the locking proto on the cluster conn and use it in asts. · 110946c8
      Joel Becker 提交于
      With the ocfs2_cluster_connection hanging off of the ocfs2_dlm_lksb, we
      have access to it in the ast and bast wrapper functions.  Attach the
      ocfs2_locking_protocol to the conn.
      
      Now, instead of refering to a static variable for ast/bast pointers, the
      wrappers can look at the connection.  This means different connections
      can have different ast/bast pointers, and it reduces the need for the
      static pointer.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      110946c8
    • J
      ocfs2: Attach the connection to the lksb · c0e41338
      Joel Becker 提交于
      We're going to want it in the ast functions, so we convert union
      ocfs2_dlm_lksb to struct ocfs2_dlm_lksb and let it carry the connection.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      c0e41338
    • J
      ocfs2: Pass lksbs back from stackglue ast/bast functions. · a796d286
      Joel Becker 提交于
      The stackglue ast and bast functions tried to maintain the fiction that
      their arguments were void pointers.  In reality, stack_user.c had to
      know that the argument was an ocfs2_lock_res in order to get the status
      off of the lksb.  That's ugly.
      
      This changes stackglue to always pass the lksb as the argument to ast
      and bast functions.  The caller can always use container_of() to get the
      ocfs2_lock_res or user_dlm_lock_res.  The net effect to the caller is
      zero.  They still get back the lockres in their ast.  stackglue gets
      cleaner, and now can use the lksb itself.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      a796d286
    • J
      ocfs2_dlmfs: Move to its own directory · 34a9dd7e
      Joel Becker 提交于
      We're going to remove the tie between ocfs2_dlmfs and o2dlm.
      ocfs2_dlmfs doesn't belong in the fs/ocfs2/dlm directory anymore.  Here
      we move it to fs/ocfs2/dlmfs.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      34a9dd7e
    • J
      ocfs2_dlmfs: Use poll() to signify BASTs. · 65b6f340
      Joel Becker 提交于
      o2dlm's userspace filesystem is an easy way to use the DLM from
      userspace.  It is intentionally simple. For example, it does not allow
      for asynchronous behavior or lock conversion.  This is intentional to
      keep the interface simple.
      
      Because there is no asynchronous notification, there is no way for a
      process holding a lock to know another node needs the lock.  This is the
      number one complaint of ocfs2_dlmfs users.  Turns out, we can solve this
      very easily.  We add poll() support to ocfs2_dlmfs.  When a BAST is
      received, the lock's file descriptor will receive POLLIN.
      
      This is trivial to implement.  Userdlm already has an appropriate
      waitqueue, and the lock knows when it is blocked.
      
      We add the "bast" capability to tell userspace this is available.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Acked-by: NMark Fasheh <mfasheh@suse.com>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      65b6f340
    • J
      ocfs2_dlmfs: Add capabilities parameter. · 14a437c2
      Joel Becker 提交于
      Over time, dlmfs has added some features that were not part of the
      initial ABI.  Unfortunately, some of these features are not detectable
      via standard usage.  For example, Linux's default poll always returns
      POLLIN, so there is no way for a caller of poll(2) to know when dlmfs
      added poll support.  Instead, we provide this list of new capabilities.
      
      Capabilities is a read-only attribute.  We do it as a module parameter
      so we can discover it whether dlmfs is built in, loaded, or even not
      loaded (via modinfo).
      
      The ABI features are local to this machine's dlmfs mount.  This is
      distinct from the locking protocol, which is concerned with inter-node
      interaction.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      14a437c2