Commits · 721f69c404c51a5d1dc93fddb48ee936e8e23770 · openanolis / cloud-kernel

05 Sep, 2009 3 commits

ocfs2: Pass struct ocfs2_caching_info to the journal functions. · 0cf2f763

Authored by Joel Becker on Feb 12, 2009

The next step in divorcing metadata I/O management from struct inode is
to pass struct ocfs2_caching_info to the journal functions.  Thus the
journal locks a metadata cache with the cache io_lock function.  It also
can compare ci_last_trans and ci_created_trans directly.

This is a large patch because of all the places we change
ocfs2_journal_access..(handle, inode, ...) to
ocfs2_journal_access..(handle, INODE_CACHE(inode), ...).
Signed-off-by: Joel Becker <joel.becker@oracle.com>

0cf2f763
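
The calling-convention change in this commit can be illustrated with a small userspace sketch. It is not the kernel code: the struct layouts are reduced stand-ins, and `ip_metadata_cache`, `journal_access()` and `cache_needs_flush()` are names assumed for illustration. Only `INODE_CACHE`, `ci_last_trans` and `ci_created_trans` come from the commit messages.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced stand-in for struct ocfs2_caching_info (assumed layout). */
struct caching_info {
    unsigned long ci_created_trans; /* transaction that created the object */
    unsigned long ci_last_trans;    /* last transaction that touched it */
};

/* Reduced stand-in for the per-inode structure (assumed layout). */
struct inode_info {
    unsigned long ip_blkno;
    struct caching_info ip_metadata_cache; /* hypothetical field name */
};

/* Mirrors the INODE_CACHE(inode) idiom: hand out the embedded cache. */
#define INODE_CACHE(ii) (&(ii)->ip_metadata_cache)

/* Hypothetical journal entry point: it sees only the cache, never the inode. */
int journal_access(unsigned long trans_id, struct caching_info *ci)
{
    if (ci == NULL)
        return -1;
    ci->ci_last_trans = trans_id;
    return 0;
}

/*
 * Simplified check: a flush is pending if the last transaction that
 * touched the cache has not been committed yet. Ignores wraparound.
 */
int cache_needs_flush(const struct caching_info *ci,
                      unsigned long committed_trans)
{
    return ci->ci_last_trans > committed_trans;
}
```

Because the journal functions take only the cache pointer, any object that embeds such a structure could be journaled the same way, which is the decoupling the commit message describes.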

ocfs2: move ip_created_trans to struct ocfs2_caching_info · 292dd27e

Authored by Joel Becker on Feb 12, 2009

Similar to ip_last_trans, ip_created_trans tracks the creation of a
journal managed inode.  This specifically tracks what transaction created
the inode.  This is so the code can know if the inode has ever been
written to disk.

This behavior is desirable for any journal managed object.  We move it
to struct ocfs2_caching_info as ci_created_trans so that any object
using ocfs2_caching_info can rely on this behavior.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

292dd27e

ocfs2: move ip_last_trans to struct ocfs2_caching_info · 66fb345d

Authored by Joel Becker on Feb 12, 2009

We have the read side of metadata caching isolated to struct
ocfs2_caching_info, now we need the write side.  This means the journal
functions.  The journal only does a couple of things with struct inode.

This change moves the ip_last_trans field onto struct
ocfs2_caching_info as ci_last_trans.  This field tells the journal
whether a pending journal flush is required.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

66fb345d

11 Aug, 2009 1 commit

ocfs2: Fix possible deadlock when extending quota file · b409d7a0

Authored by Jan Kara on Aug 06, 2009

In OCFS2, allocator locks rank above transaction start. Thus we
cannot extend the quota file from inside a transaction, lest we
deadlock.

We solve the problem by starting the transaction not in
ocfs2_acquire_dquot() but only in ocfs2_local_read_dquot() and
ocfs2_global_read_dquot(), and by allocating blocks to the quota files before
starting the transaction.  If we crash, the quota files will just have a few
extra blocks, which is no problem since we use them the next time we extend
the quota file.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

b409d7a0

24 Jul, 2009 1 commit

ocfs2: Define credit counts for quota operations · 0584974a

Authored by Jan Kara on Jul 22, 2009

The numbers of credits needed for some quota operations were written
as raw numbers. Create appropriate defines instead.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

0584974a

09 Jul, 2009 1 commit

ocfs2: Fixup orphan scan cleanup after failed mount · 8b712cd5

Authored by Jeff Mahoney on Jul 07, 2009

If the mount fails for any reason, ocfs2_dismount_volume calls
ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init
be called to set up the mutex and work queues, but that doesn't
happen if the mount has failed, and we oops accessing an uninitialized
work queue.

This patch splits the init and startup of the orphan scan, eliminating
the oops.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

8b712cd5

23 Jun, 2009 1 commit

ocfs2: Disable orphan scanning for local and hard-ro mounts · df152c24

Authored by Sunil Mushran on Jun 22, 2009

Local and Hard-RO mounts do not need orphan scanning.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

df152c24

04 Jun, 2009 1 commit

ocfs2: timer to queue scan of all orphan slots · 83273932

Authored by Srinivas Eeda on Jun 03, 2009

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

83273932

01 May, 2009 1 commit

ocfs2: Fix a missing credit when deleting from indexed directories. · dfa13f39

Authored by Joel Becker on Apr 29, 2009

The ocfs2 directory index updates two blocks when we remove an entry -
the dx root and the dx leaf.  OCFS2_DELETE_INODE_CREDITS was only
accounting for the dx leaf.  This shows up when ocfs2_delete_inode()
runs out of credits in jbd2_journal_dirty_metadata() at
"J_ASSERT_JH(jh, handle->h_buffer_credits > 0);".

The test that caught this was running dirop_file_racer from the
ocfs2-test suite with a 250-character filename PREFIX.  Run on a 512B
blocksize, it forces the orphan dir index to grow large enough to
trigger.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

dfa13f39

04 Apr, 2009 5 commits

ocfs2: recover orphans in offline slots during recovery and mount · 9140db04

Authored by Srinivas Eeda on Mar 06, 2009

During recovery, a node recovers orphans in its own slot and in the slots of
the dead node(s). But if the dead nodes were holding orphans in offline slots,
they will be left unrecovered.

If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers its own slot, which
leaves orphans in offline slots.

This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

9140db04

ocfs2: Introduce dir free space list · e7c17e43

Authored by Mark Fasheh on Jan 29, 2009

The only operation which doesn't get faster with directory indexing is
insert, which still has to walk the entire unindexed directory portion to
find a free block. This patch provides an improvement in directory insert
performance by maintaining a singly linked list of directory leaf blocks
which have space for additional dirents.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

e7c17e43
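
The free space list idea can be sketched in a few lines of userspace C. This is only an illustration under stated assumptions: `struct dir_leaf`, `find_leaf_with_space()` and `unlink_full_leaf()` are hypothetical names, and the sketch links leaves with in-memory pointers where an on-disk format would presumably chain them by block number.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory view of a directory leaf block. */
struct dir_leaf {
    unsigned int free_bytes;    /* space left for additional dirents */
    struct dir_leaf *next_free; /* next leaf that still has free space */
};

/* Walk only the leaves known to have space, not the whole directory. */
struct dir_leaf *find_leaf_with_space(struct dir_leaf *head,
                                      unsigned int needed)
{
    for (struct dir_leaf *l = head; l != NULL; l = l->next_free) {
        if (l->free_bytes >= needed)
            return l;
    }
    return NULL;
}

/* Drop a leaf from the list once it can no longer take new entries. */
void unlink_full_leaf(struct dir_leaf **head, struct dir_leaf *full)
{
    struct dir_leaf **pp = head;

    while (*pp != NULL && *pp != full)
        pp = &(*pp)->next_free;
    if (*pp != NULL) {
        *pp = full->next_free;
        full->next_free = NULL;
    }
}
```

An insert then costs a walk over the leaves that still have room, instead of a scan of every block in the unindexed directory.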

ocfs2: Store dir index records inline · 4ed8a6bb

Authored by Mark Fasheh on Nov 24, 2008

Allow us to store a small number of directory index records in the
ocfs2_dx_root_block. This saves us a disk read on small to medium sized
directories (less than about 250 entries). The inline root is automatically
turned into a root block with extents if the directory size increases beyond
its capacity.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

4ed8a6bb

ocfs2: Add a name indexed b-tree to directory inodes · 9b7895ef

Authored by Mark Fasheh on Nov 12, 2008

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value,
and pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

9b7895ef
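
The record layout described in this commit (a hash value plus a pointer to the block holding the dirent) suggests a lookup along the following lines. This is a minimal sketch, not the ocfs2 implementation: `struct dx_entry` and `dx_leaf_lookup()` are hypothetical names, the hash is taken as an input rather than computed, and the leaf is scanned linearly for simplicity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-length index record: name hash -> dirent block. */
struct dx_entry {
    uint32_t name_hash;
    uint64_t dirent_blk;
};

/*
 * Collect the blocks whose records match the hash. Several names can
 * share a hash, so the caller still has to compare the actual name in
 * each candidate block. Returns the number of candidates written.
 */
size_t dx_leaf_lookup(const struct dx_entry *entries, size_t count,
                      uint32_t hash, uint64_t *out, size_t max_out)
{
    size_t found = 0;

    for (size_t i = 0; i < count && found < max_out; i++) {
        if (entries[i].name_hash == hash)
            out[found++] = entries[i].dirent_blk;
    }
    return found;
}
```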

ocfs2: Move struct recovery_map to a header file · 96a6c64b

Authored by Sunil Mushran on Dec 16, 2008

Move the definition of struct recovery_map from journal.c to journal.h. This
is preparation for the next patch.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

96a6c64b

11 Feb, 2009 1 commit

jbd2: Avoid possible NULL dereference in jbd2_journal_begin_ordered_truncate() · 7f5aa215

Authored by Jan Kara on Feb 10, 2009

If we race with commit code setting i_transaction to NULL, we could
possibly dereference it.  Proper locking requires the journal pointer
(to access journal->j_list_lock), which we don't have.  So we have to
change the prototype of the function so that filesystem passes us the
journal pointer.  Also add a more detailed comment about why the
function jbd2_journal_begin_ordered_truncate() does what it does and
how it should be used.

Thanks to Dan Carpenter <error27@gmail.com> for pointing to the
suspicious code.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: "Theodore Ts'o" <tytso@mit.edu>
Acked-by: Joel Becker <joel.becker@oracle.com>
CC: linux-ext4@vger.kernel.org
CC: ocfs2-devel@oss.oracle.com
CC: mfasheh@suse.de
CC: Dan Carpenter <error27@gmail.com>

7f5aa215

06 Jan, 2009 5 commits

ocfs2: Use metadata-specific ocfs2_journal_access_*() functions. · 13723d00

Authored by Joel Becker on Oct 17, 2008

The per-metadata-type ocfs2_journal_access_*() functions hook up jbd2
commit triggers and allow us to compute metadata ecc right before the
buffers are written out.  This commit provides ecc for inodes, extent
blocks, group descriptors, and quota blocks.  It is not safe to use
extended attributes and metaecc at the same time yet.

The ocfs2_extent_tree and ocfs2_path abstractions in alloc.c both hide
the type of block at their root.  Before, it didn't matter, but now the
root block must use the appropriate ocfs2_journal_access_*() function.
To keep this abstract, the structures now have a pointer to the matching
journal_access function and a wrapper call to call it.

A few places use naked ocfs2_write_block() calls instead of adding the
blocks to the journal.  We make sure to calculate their checksum and ecc
before the write.

Since we pass around the journal_access functions, let's typedef them
in ocfs2.h.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

13723d00
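
The pattern described here, a structure that carries a pointer to the matching journal_access function plus a wrapper that calls it, can be sketched as follows. All names and types below are assumptions made for illustration (`struct block`, `journal_access_func`, `et_access`, `et_root_journal_access()`); the real functions take a journal handle and a buffer, which this userspace sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>

enum { BLK_INODE = 1, BLK_EXTENT = 2 };

/* Stand-in for a metadata buffer (assumed layout). */
struct block {
    int type;
    int journaled;
};

/* One typedef shared by every per-metadata-type access function. */
typedef int (*journal_access_func)(struct block *blk);

int journal_access_inode(struct block *blk)
{
    if (blk->type != BLK_INODE)
        return -1; /* wrong access function for this block type */
    blk->journaled = 1;
    return 0;
}

int journal_access_extent_block(struct block *blk)
{
    if (blk->type != BLK_EXTENT)
        return -1;
    blk->journaled = 1;
    return 0;
}

/* The tree hides what kind of block sits at its root. */
struct extent_tree {
    struct block *et_root;
    journal_access_func et_access; /* hypothetical field name */
};

/* Wrapper: callers stay unaware of the root block's type. */
int et_root_journal_access(struct extent_tree *et)
{
    return et->et_access(et->et_root);
}
```

Generic tree code calls only the wrapper, so the choice of access function is made once, when the tree is set up for a particular kind of root block.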

ocfs2: Add journal_access functions with jbd2 triggers. · 50655ae9

Authored by Joel Becker on Sep 11, 2008

We create wrappers for ocfs2_journal_access() that are specific to the
type of metadata block.  This allows us to associate jbd2 commit
triggers with the block.  The triggers will compute metadata ecc in a
future commit.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

50655ae9

ocfs2: Implement quota recovery · 2205363d

Authored by Jan Kara on Oct 20, 2008

Implement functions for recovery after a crash. The functions just
read the local quota file and sync its info to the global quota file.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

2205363d

ocfs2: Add quota calls for allocation and freeing of inodes and space · a90714c1

Authored by Jan Kara on Oct 09, 2008

Add quota calls for allocation and freeing of inodes and space, and update
the estimates of the number of credits needed for a transaction. Move inode
allocation out of ocfs2_mknod_locked() because vfs_dq_init() must be called
outside of a transaction.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

a90714c1

ocfs2: Remove JBD compatibility layer · 53ef99ca

Authored by Mark Fasheh on Nov 18, 2008

JBD2 is fully backwards compatible with JBD and it's been tested enough with
Ocfs2 that we can clean this code up now.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

53ef99ca

14 Oct, 2008 3 commits

ocfs2: Switch over to JBD2. · 2b4e30fb

Authored by Joel Becker on Sep 03, 2008

ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change.  Most functions are just renamed.  The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem.  It can even interact with JBD-based ocfs2 as long
as the journal is formatted for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being.  This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
  ocfs2_truncate_for_delete(). --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

2b4e30fb

ocfs2: Add extended attribute support · cf1d6c76

Authored by Tiger Yang on Aug 18, 2008

This patch implements storing extended attributes either in the inode or in a
single external block. We only store EAs in-inode when blocksize > 512 or that
inode block has free space for it. When an EA's value is larger than 80
bytes, we will store the value via a b-tree outside the inode or block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

cf1d6c76

ocfs2: Use ocfs2_extent_list instead of ocfs2_dinode. · 811f933d

Authored by Tao Ma on Aug 18, 2008

ocfs2_extend_meta_needed(), ocfs2_calc_extend_credits() and
ocfs2_reserve_new_metadata() are all useful for extent tree operations. But
they are all limited to an inode btree because they use a struct
ocfs2_dinode parameter. Change their parameter to struct ocfs2_extent_list
(the part of an ocfs2_dinode they actually use) so that the xattr btree code
can use these functions.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

811f933d

01 Aug, 2008 1 commit

[PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264

Authored by Sunil Mushran on Jul 14, 2008

As the fs recovery is asynchronous, there is a small chance that another
node can mount (and thus recover) the slot before the recovery thread
gets to it.

If this happens, the recovery thread will block indefinitely on the
journal/slot lock as that lock will be held for the duration of the mount
(by design) by the node assigned to that slot.

The solution implemented is to keep track of the journal replays using
a recovery generation in the journal inode, which will be incremented by the
thread replaying that journal. The recovery thread, before attempting the
blocking lock on the journal/slot lock, will compare the generation on disk
with what it has cached and skip recovery if it does not match.

This bug appears to have been inadvertently introduced during the mount/umount
vote removal by mainline commit 34d024f8. In the
mount voting scheme, the messaging would indirectly indicate that the slot
was being recovered.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

539d8264
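
The generation check described in this commit can be sketched as below. The names are hypothetical (`struct journal_inode`, `replay_journal()`, `should_recover()`), and the sketch models only the comparison, not the cluster locking around it.

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for the on-disk journal inode (assumed layout). */
struct journal_inode {
    uint32_t recovery_generation;
};

/* Whoever replays the journal bumps the on-disk generation. */
void replay_journal(struct journal_inode *disk)
{
    disk->recovery_generation++;
}

/*
 * Called before the recovery thread attempts the blocking lock.
 * Returns 1 if recovery is still needed, 0 if another node has
 * already replayed the journal since the generation was cached.
 */
int should_recover(uint32_t cached_gen, const struct journal_inode *disk)
{
    return cached_gen == disk->recovery_generation;
}
```

If another node mounts the slot and replays its journal first, the generations no longer match and the recovery thread skips the slot instead of blocking on a lock held for the duration of that mount.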

18 Apr, 2008 1 commit

ocfs2: Change the recovery map to an array of node numbers. · 553abd04

Authored by Joel Becker on Feb 01, 2008

The old recovery map was a bitmap of node numbers.  This was sufficient
for the maximum node number of 254.  Going forward, we want node numbers
to be UINT32.  Thus, we need a new recovery map.

Note that we can't keep track of slots here.  We must write down the
node number to recover *before* we get the locks needed to convert a
node number into a slot number.

The recovery map is now an array of unsigned ints, max_slots in size.
It moves to journal.c with the rest of recovery.

Because it needs to be initialized, we move all of recovery initialization
into a new function, ocfs2_recovery_init().  This actually cleans up
ocfs2_initialize_super() a little as well.  Following on, recovery cleanup
becomes part of ocfs2_recovery_exit().

A number of node map functions are rendered obsolete and are removed.

Finally, waiting on recovery is wrapped in a function rather than naked
checks on the recovery_event.  This is a cleanup from Mark.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

553abd04
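
A recovery map kept as an array of node numbers, as this commit describes, might look like the following sketch. The field names `rm_used` and `rm_entries` and the three helper functions are assumptions for illustration, and `MAX_SLOTS` is a fixed hypothetical constant where the commit sizes the array by max_slots at runtime.

```c
#include <assert.h>

#define MAX_SLOTS 4 /* hypothetical; the real size is max_slots */

struct recovery_map {
    unsigned int rm_used;               /* number of valid entries */
    unsigned int rm_entries[MAX_SLOTS]; /* node numbers awaiting recovery */
};

int recovery_map_contains(const struct recovery_map *rm,
                          unsigned int node_num)
{
    for (unsigned int i = 0; i < rm->rm_used; i++) {
        if (rm->rm_entries[i] == node_num)
            return 1;
    }
    return 0;
}

/* Returns 0 if added, 1 if already present, -1 if the map is full. */
int recovery_map_set(struct recovery_map *rm, unsigned int node_num)
{
    if (recovery_map_contains(rm, node_num))
        return 1;
    if (rm->rm_used == MAX_SLOTS)
        return -1;
    rm->rm_entries[rm->rm_used++] = node_num;
    return 0;
}

/* Remove a node number, shifting later entries down to close the gap. */
void recovery_map_clear(struct recovery_map *rm, unsigned int node_num)
{
    for (unsigned int i = 0; i < rm->rm_used; i++) {
        if (rm->rm_entries[i] != node_num)
            continue;
        for (unsigned int j = i + 1; j < rm->rm_used; j++)
            rm->rm_entries[j - 1] = rm->rm_entries[j];
        rm->rm_used--;
        return;
    }
}
```

Unlike a bitmap, the array places no ceiling on the node number itself; only the number of simultaneous entries is bounded.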

26 Jan, 2008 2 commits

[PATCH 2/2] ocfs2: Implement group add for online resize · 7909f2bf

Authored by Tao Ma on Dec 18, 2007

This patch adds the ability for a userspace program to request that a
properly formatted cluster group be added to the main allocation bitmap for
an Ocfs2 file system. The request is made via an ioctl, OCFS2_IOC_GROUP_ADD.
On a high level, this is similar to ext3, but we use a different ioctl as
the structure which has to be passed through is different.

During an online resize, tunefs.ocfs2 will format any new cluster groups
which must be added to complete the resize, and call OCFS2_IOC_GROUP_ADD on
each one. The kernel verifies that the core cluster group information is valid
and then does the work of linking it into the global allocation bitmap.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

7909f2bf

[PATCH 1/2] ocfs2: Add group extend for online resize · d659072f

Authored by Tao Ma on Dec 18, 2007

This patch adds the ability for a userspace program to request an extension
of the last cluster group on an Ocfs2 file system. The request is made via an
ioctl, OCFS2_IOC_GROUP_EXTEND. This is derived from EXT3_IOC_GROUP_EXTEND, but
is obviously Ocfs2 specific.

tunefs.ocfs2 would call this for an online-resize operation if the last
cluster group isn't full.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

d659072f

13 Oct, 2007 1 commit

ocfs2: Write support for inline data · 1afc32b9

Authored by Mark Fasheh on Sep 07, 2007

This fixes up write, truncate, mmap, and RESVSP/UNRESVSP to understand inline
inode data.

For the most part, the changes to the core write code can be relied on to do
the heavy lifting. Any code calling ocfs2_write_begin (including shared
writeable mmap) can count on it doing the right thing with respect to
growing inline data to an extent tree.

Size reducing truncates, including UNRESVSP, can simply zero that portion of
the inode block being removed. Size increasing truncates, including RESVSP,
have to be a little bit smarter and grow the inode to an extent tree if
necessary.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>
Reviewed-by: Joel Becker <joel.becker@oracle.com>

1afc32b9

11 Jul, 2007 1 commit

ocfs2: support for removing file regions · 063c4561

Authored by Mark Fasheh on Jul 03, 2007

Provide an internal interface for the removal of arbitrary file regions.

ocfs2_remove_inode_range() takes a byte range within a file and will remove
existing extents within that range. Partial clusters will be zeroed so that
any read from within the region will return zeros.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

063c4561

27 Apr, 2007 1 commit

ocfs2: make room for unwritten extents flag · e48edee2

Authored by Mark Fasheh on Mar 07, 2007

Due to the size of our group bitmaps, we'll never have a leaf node extent
record with more than 16 bits worth of clusters. Split e_clusters up so that
leaf nodes can get a flags field where we can mark unwritten extents.
Interior nodes whose length references all the child nodes beneath it can't
split their e_clusters field, so we use a union to preserve sizing there.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

e48edee2
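
The field split described in this commit can be pictured with the sketch below. It is an assumed layout for illustration, not a copy of the on-disk definition: the field names other than `e_clusters` (which the message mentions as the field being split), the reserved byte, and the `EXT_UNWRITTEN` value are hypothetical, and host-endian integer types are used where an on-disk format would use fixed-endian ones. It relies on C11 anonymous unions and structs.

```c
#include <assert.h>
#include <stdint.h>

#define EXT_UNWRITTEN 0x01 /* hypothetical flag value */

struct extent_rec {
    uint32_t e_cpos; /* logical cluster offset */
    union {
        /* Interior nodes: the length covers all children, needs 32 bits. */
        uint32_t e_int_clusters;
        /* Leaf nodes: 16 bits of clusters leave room for flags. */
        struct {
            uint16_t e_leaf_clusters;
            uint8_t  e_reserved1;
            uint8_t  e_flags;
        };
    };
    uint64_t e_blkno; /* physical block */
};

/* The union keeps the record the same size as before the split. */
_Static_assert(sizeof(struct extent_rec) == 16,
               "extent record size must not change");

int extent_is_unwritten(const struct extent_rec *rec)
{
    return (rec->e_flags & EXT_UNWRITTEN) != 0;
}
```

The static assertion captures the point of using a union: leaf records gain a flags byte without changing the size of the record.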

02 Feb, 2007 1 commit

ocfs2: ocfs2_link() journal credits update · e051fda4

Authored by Mark Fasheh on Feb 01, 2007

Commit 592282cf fixed some missing directory
c/mtime updates in part by introducing a dinode update in ocfs2_add_entry().
Unfortunately, ocfs2_link() (which didn't update the directory inode before)
is now missing a single journal credit. Fix this by doubling the number of
inode updates expected during hard link creation.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

e051fda4

08 Dec, 2006 1 commit

ocfs2: local mounts · c271c5c2

Authored by Sunil Mushran on Dec 05, 2006

This allows users to format an ocfs2 file system with a special flag,
OCFS2_FEATURE_INCOMPAT_LOCAL_MOUNT. When the file system sees this flag, it
will not use any cluster services, nor will it require a cluster
configuration, thus acting like a 'local' file system.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

c271c5c2

02 Dec, 2006 8 commits

ocfs2: Remove struct ocfs2_journal_handle in favor of handle_t · 1fabe148

Authored by Mark Fasheh on Oct 09, 2006

This is mostly a search and replace as ocfs2_journal_handle is now no more
than a container for a handle_t pointer.

ocfs2_commit_trans() becomes very straightforward, and we remove some out
of date comments / code.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

1fabe148

ocfs2: remove handle argument to ocfs2_start_trans() · 65eff9cc

Authored by Mark Fasheh on Oct 09, 2006

All callers either pass in NULL directly, or a local variable that is
already set to NULL.

The internals of ocfs2_start_trans() get a nice cleanup as a result.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

65eff9cc

ocfs2: remove ocfs2_journal_handle journal field · dae85832

Authored by Mark Fasheh on Oct 09, 2006

It is no longer used.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

dae85832

ocfs2: pass ocfs2_super * into ocfs2_commit_trans() · 02dc1af4

Authored by Mark Fasheh on Oct 09, 2006

This sets us up to remove handle->journal.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

02dc1af4

ocfs2: make ocfs2_alloc_handle() static · a301a27d

Authored by Mark Fasheh on Oct 06, 2006

This is no longer used outside of journal.c.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

a301a27d

ocfs2: remove unused ocfs2_handle_add_lock() · daf29e9c

Authored by Mark Fasheh on Oct 06, 2006

This gets us rid of a slab we no longer need, as well as removing the
majority of what's left on ocfs2_journal_handle.

ocfs2_commit_unstarted_handle() has no more real work to do, so remove that
function too.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

daf29e9c

ocfs2: remove unused ocfs2_handle_add_inode() · 02928a71

Authored by Mark Fasheh on Oct 06, 2006

We can also delete the unused infrastructure which was once in place to
support this functionality. ocfs2_inode_private loses ip_handle and
ip_handle_list. ocfs2_journal_handle loses handle_list.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

02928a71

ocfs2: remove ocfs2_journal_handle flags field · c161f89b

Authored by Mark Fasheh on Oct 05, 2006

Callers can set h_sync directly on the handle_t. Whether a transaction has
been started or not can be determined via the existence of the handle_t on
the struct ocfs2_journal_handle.
Signed-off-by: Mark Fasheh <mark.fasheh@oracle.com>

c161f89b
