Commits · f7b1aa69be138ad9d7d3f31fa56f4c9407f56b6a · xiphi1978 / linux

22 Jul, 2009 1 commit

ocfs2: Fix deadlock on umount · f7b1aa69

Authored by Jan Kara on Jul 20, 2009

In commit ea455f8a, we moved the dentry lock
put process into ocfs2_wq. This causes problems during umount because ocfs2_wq
can drop references to inodes while they are being invalidated by
invalidate_inodes() causing all sorts of nasty things (invalidate_inodes()
ending in an infinite loop, "Busy inodes after umount" messages etc.).

We fix the problem by stopping ocfs2_wq from doing any further releasing of
inode references on the superblock being unmounted, waiting until it finishes
the current round of releasing, and finally cleaning up all the references in
dentry_lock_list from ocfs2_put_super().

The issue was tracked down by Tao Ma <tao.ma@oracle.com>.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

09 Jul, 2009 1 commit

ocfs2: Fixup orphan scan cleanup after failed mount · 8b712cd5

Authored by Jeff Mahoney on Jul 07, 2009

If the mount fails for any reason, ocfs2_dismount_volume calls
ocfs2_orphan_scan_stop. It requires that ocfs2_orphan_scan_init
be called to set up the mutex and work queues, but that doesn't
happen if the mount has failed, and we oops accessing an uninitialized
work queue.

This patch splits the init and startup of the orphan scan, eliminating
the oops.
Signed-off-by: Jeff Mahoney <jeffm@suse.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

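The init/start split can be illustrated with a small user-space model. This is only a sketch under assumptions: `struct orphan_scan` and the three helpers are hypothetical stand-ins for the kernel's mutex and work queue handling, not the code from this commit.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the orphan scan state; the real code holds a mutex
 * and work queue items instead of two flags. */
struct orphan_scan {
    bool initialized; /* set early in mount, before anything can fail */
    bool running;     /* set only after the mount has succeeded */
};

/* Always called early in mount, so later teardown can rely on it. */
static void orphan_scan_init(struct orphan_scan *os)
{
    os->initialized = true;
    os->running = false;
}

/* Called only once the mount has succeeded. */
static void orphan_scan_start(struct orphan_scan *os)
{
    assert(os->initialized);
    os->running = true;
}

/* Safe on the failed-mount path: it only touches initialized state. */
static void orphan_scan_stop(struct orphan_scan *os)
{
    assert(os->initialized);
    os->running = false;
}
```

With this ordering, a dismount after a failed mount calls `orphan_scan_stop()` on a structure that was initialized but never started.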

23 Jun, 2009 3 commits

ocfs2: Disable orphan scanning for local and hard-ro mounts · df152c24

Authored by Sunil Mushran on Jun 22, 2009

Local and Hard-RO mounts do not need orphan scanning.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Stop orphan scan as early as possible during umount · 692684e1

Authored by Sunil Mushran on Jun 19, 2009

Currently if the orphan scan fires a tick before the user issues the umount,
the umount will wait for the queued orphan scan tasks to complete.

This patch makes the umount stop the orphan scan as early as possible so as
to reduce the probability of the queued tasks slowing down the umount.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Fix ocfs2_osb_dump() · c3d38840

Authored by Sunil Mushran on Jun 19, 2009

Skip printing information that is not valid for local mounts.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

19 Jun, 2009 1 commit

block: rename CONFIG_LBD to CONFIG_LBDAF · 90c699a9

Authored by Bartlomiej Zolnierkiewicz on Jun 19, 2009

Follow-up to "block: enable by default support for large devices
and files on 32-bit archs".

Rename CONFIG_LBD to CONFIG_LBDAF to:
- allow update of existing [def]configs for "default y" change
- reflect that it is used also for large files support nowadays
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

12 Jun, 2009 3 commits

Push BKL down into ->remount_fs() · 337eb00a

Authored by Alessio Igor Bogani on May 12, 2009

[xfs, btrfs, capifs, shmem don't need BKL, exempt]
Signed-off-by: Alessio Igor Bogani <abogani@texware.it>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

push BKL down into ->put_super · 6cfd0148

Authored by Christoph Hellwig on May 05, 2009

Move BKL into ->put_super from the only caller.  A couple of
filesystems had trivial enough ->put_super (only kfree and NULLing of
s_fs_info + stuff in there) to not get any locking: coda, cramfs, efs,
hugetlbfs, omfs, qnx4, shmem, all others got the full treatment.  Most
of them probably don't need it, but I'd rather sort that out individually.
Preferably after all the other BKL pushdowns in that area.

[AV: original used to move lock_super() down as well; these changes are
removed since we don't do lock_super() at all in generic_shutdown_super()
now]
[AV: fuse, btrfs and xfs are known to need no damn BKL, exempt]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

ocfs2: remove ->write_super and stop maintaining ->s_dirt · 94cb993f

Authored by Christoph Hellwig on Apr 27, 2009

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

04 Jun, 2009 4 commits

ocfs2: Remove redundant gotos in ocfs2_mount_volume() · 06c59bb8

Authored by Tao Ma on May 19, 2009

Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: Add statistics for the checksum and ecc operations. · 73be192b

Authored by Joel Becker on Jan 06, 2009

It would be nice to know how often we get checksum failures.  Even
better, how many of them we can fix with the single bit ecc.  So, we add
a statistics structure.  The structure can be installed into debugfs
wherever the user wants.

For ocfs2, we'll put it in the superblock-specific debugfs directory and
pass it down from our higher-level functions.  The stats are only
registered with debugfs when the filesystem supports metadata ecc.
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2 patch to track delayed orphan scan timer statistics · 15633a22

Authored by Srinivas Eeda on Jun 03, 2009

Patch to track delayed orphan scan timer statistics.

Modifies ocfs2_osb_dump to print the following:
  Orphan Scan=> Local: 10  Global: 21  Last Scan: 67 seconds ago
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

ocfs2: timer to queue scan of all orphan slots · 83273932

Authored by Srinivas Eeda on Jun 03, 2009

When a dentry is unlinked, the unlinking node takes an EX on the dentry lock
before moving the dentry to the orphan directory. Other nodes that have
this dentry in cache have a PR on the same dentry lock. When the EX is
requested, the other nodes flag the corresponding inode as MAYBE_ORPHANED
during downconvert. The inode is finally deleted when the last node to iput
the inode sees that i_nlink==0 and the MAYBE_ORPHANED flag is set.

A problem arises if a node is forced to free dentry locks because of memory
pressure. If this happens, the node will no longer get downconvert
notifications for the dentries that have been unlinked on another node.
If it also happens that node is actively using the corresponding inode and
happens to be the one performing the last iput on that inode, it will fail
to delete the inode as it will not have the MAYBE_ORPHANED flag set.

This patch fixes this shortcoming by introducing a periodic scan of the
orphan directories to delete such inodes. Care has been taken to distribute
the workload across the cluster so that no one node has to perform the task
all the time.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

23 May, 2009 1 commit

block: Do away with the notion of hardsect_size · e1defc4f

Authored by Martin K. Petersen on May 22, 2009

Until now we have had a 1:1 mapping between storage device physical
block size and the logical block size used when addressing the device.
With SATA 4KB drives coming out that will no longer be the case.  The
sector size will be 4KB but the logical block size will remain
512 bytes.  Hence we need to distinguish between the physical block size
and the logical ditto.

This patch renames hardsect_size to logical_block_size.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>

04 Apr, 2009 3 commits

ocfs2: recover orphans in offline slots during recovery and mount · 9140db04

Authored by Srinivas Eeda on Mar 06, 2009

During recovery, a node recovers orphans in its slot and the dead node(s). But
if the dead nodes were holding orphans in offline slots, they will be left
unrecovered.

If the dead node is the last one to die and is holding orphans in other slots
and is the first one to mount, then it only recovers its own slot, which
leaves orphans in offline slots.

This patch queues complete_recovery to clean orphans for all offline slots
during mount and node recovery.
Signed-off-by: Srinivas Eeda <srinivas.eeda@oracle.com>
Acked-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Add a name indexed b-tree to directory inodes · 9b7895ef

Authored by Mark Fasheh on Nov 12, 2008

This patch makes use of Ocfs2's flexible btree code to add an additional
tree to directory inodes. The new tree stores an array of small,
fixed-length records in each leaf block. Each record stores a hash value
and a pointer to a block in the traditional (unindexed) directory tree where a
dirent with the given name hash resides. Lookup exclusively uses this tree
to find dirents, thus providing us with constant time name lookups.

Some of the hashing code was copied from ext3. Unfortunately, it has lots of
unfixed checkpatch errors. I left that as-is so that tracking changes would
be easier.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Joel Becker <joel.becker@oracle.com>

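As a rough illustration of the leaf layout described above, the following C sketch scans one leaf for records matching a name hash. It assumes the hash was computed elsewhere and uses a plain linear scan for brevity; `struct dx_entry` and `dx_leaf_find()` are hypothetical names, not the on-disk structures added by this commit. Because different names can share a hash, the caller is expected to check the dirent in the referenced block and continue the scan on a mismatch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fixed-length leaf record: a name hash plus the block in the
 * unindexed directory tree that holds the matching dirent. */
struct dx_entry {
    uint32_t name_hash;
    uint64_t dirent_blkno;
};

/*
 * Return the index of the first record at or after `start` whose hash
 * matches, or `count` if there is none.
 */
static size_t dx_leaf_find(const struct dx_entry *leaf, size_t count,
                           uint32_t name_hash, size_t start)
{
    assert(leaf != NULL || count == 0);
    for (size_t i = start; i < count; i++) {
        if (leaf[i].name_hash == name_hash)
            return i;
    }
    return count;
}
```

Passing the previous result plus one as `start` resumes the scan after a hash collision.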

ocfs2: Expose the file system state via debugfs · 50397507

Authored by Sunil Mushran on Dec 17, 2008

This patch creates a per-mount debugfs file, fs_state, which exposes
information such as the cluster stack in use, the states of the downconvert,
recovery and commit threads, the number of journal txns, some allocation
stats, the list of all slots, etc.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

27 Feb, 2009 2 commits

ocfs2: add IO error check in ocfs2_get_sector() · 28d57d43

Authored by wengang wang on Feb 13, 2009

Check for IO error in ocfs2_get_sector().
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: lock the metaecc process for xattr bucket · c8b9cf9a

Authored by Tao Ma on Feb 24, 2009

For other metadata in ocfs2, metaecc is checked in ocfs2_read_blocks
with io_mutex held. For an xattr bucket, however, it is calculated over
the whole bucket. So we have to add a spin_lock to prevent multiple
processes from calculating metaecc at the same time.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Tested-by: Tristan Ye <tristan.ye@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

03 Feb, 2009 1 commit

ocfs2: Push out dropping of dentry lock to ocfs2_wq · ea455f8a

Authored by Jan Kara on Jan 12, 2009

Dropping of last reference to dentry lock is a complicated operation involving
dropping of reference to inode. This can get complicated and quota code in
particular needs to obtain some quota locks which leads to potential deadlock.
Thus we defer dropping of inode reference to ocfs2_wq.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

06 Jan, 2009 6 commits

ocfs2: Validate superblock with checksum and ecc. · d030cc97

Authored by Joel Becker on Dec 11, 2008

The superblock is read via a raw call.  Validate it after we find it
from its signature.
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Enable quota accounting on mount, disable on umount · 19ece546

Authored by Jan Kara on Aug 21, 2008

Enable quota usage tracking on mount and disable it on umount. Also
add support for quota on and quota off quotactls and usrquota and
grpquota mount options. Add quota features among supported ones.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Periodic quota syncing · 171bf93c

Authored by Mark Fasheh on Oct 20, 2008

This patch creates a work queue for periodic syncing of locally cached quota
information to the global quota files. We constantly queue a delayed work
item, to get the periodic behavior.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>
Acked-by: Jan Kara <jack@suse.cz>

ocfs2: Implementation of local and global quota file handling · 9e33d69f

Authored by Jan Kara on Aug 25, 2008

For each quota type each node has a local quota file. In this file it stores
changes users have made to disk usage via this node. Once in a while this
information is synced to the global file (and thus with other nodes) so that
limits enforcement at least approximately works.

Global quota files contain all the information about usage and limits. It's
mostly handled by the generic VFS code (which implements a trie of structures
inside a quota file). We only have to provide functions to convert structures
from on-disk format to in-memory one. We also have to provide wrappers for
various quota functions starting transactions and acquiring necessary cluster
locks before the actual IO is really started.
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Assign feature bits and system inodes to quota feature and quota files · 1a224ad1

Authored by Jan Kara on Aug 20, 2008

Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: add mount option and Kconfig option for acl · a68979b8

Authored by Tiger Yang on Nov 14, 2008

This patch adds the Kconfig option "CONFIG_OCFS2_FS_POSIX_ACL"
and the mount option "acl" to enable ACLs in Ocfs2.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

14 Oct, 2008 10 commits

ocfs2: Don't check for NULL before brelse() · a81cb88b

Authored by Mark Fasheh on Oct 07, 2008

This is pointless as brelse() already does the check.
Signed-off-by: Mark Fasheh

ocfs2: Add xattr mount option in ocfs2_show_options() · b0f73cfc

Authored by Sunil Mushran on Sep 05, 2008

Patch adds check for [no]user_xattr in ocfs2_show_options() that completes
the list of all mount options.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Switch over to JBD2. · 2b4e30fb

Authored by Joel Becker on Sep 03, 2008

ocfs2 wants JBD2 for many reasons, not the least of which is that JBD is
limiting our maximum filesystem size.

It's a pretty trivial change.  Most functions are just renamed.  The
only functional change is moving to Jan's inode-based ordered data mode.
It's better, too.

Because JBD2 reads and writes JBD journals, this is compatible with any
existing filesystem.  It can even interact with JBD-based ocfs2 as long
as the journal is formatted for JBD.

We provide a compatibility option so that paranoid people can still use
JBD for the time being.  This will go away shortly.

[ Moved call of ocfs2_begin_ordered_truncate() from ocfs2_delete_inode() to
  ocfs2_truncate_for_delete(). --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Add the 'inode64' mount option. · 12462f1d

Authored by Joel Becker on Sep 03, 2008

Now that ocfs2 limits inode numbers to 32bits, add a mount option to
disable the limit.  This parallels XFS.  64bit systems can handle the
larger inode numbers.

[ Added description of inode64 mount option in ocfs2.txt. --Mark ]
Signed-off-by: Joel Becker <joel.becker@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Add incompatible flag for extended attribute · 8154da3d

Authored by Tiger Yang on Aug 18, 2008

This patch adds the s_incompat flag for extended attribute support. This
helps us ensure that older versions of Ocfs2 or ocfs2-tools will not be able
to mount a volume with xattr support.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: Add extended attribute support · cf1d6c76

Authored by Tiger Yang on Aug 18, 2008

This patch implements storing extended attributes either in the inode or in a
single external block. We only store EAs in the inode when blocksize > 512 or
the inode block has free space for them. When an EA's value is larger than 80
bytes, we will store the value in a b-tree outside the inode or block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: reserve inline space for extended attribute · fdd77704

Authored by Tiger Yang on Aug 18, 2008

Add the structures and helper functions we want for handling inline extended
attributes. We also update the inline-data handlers so that they properly
function in the event that we have both inline data and inline attributes
sharing an inode block.
Signed-off-by: Tiger Yang <tiger.yang@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

ocfs2: throttle back local alloc when low on disk space · 9c7af40b

Authored by Mark Fasheh on Jul 28, 2008

Ocfs2's local allocator disables itself for the duration of a mount point
when it has trouble allocating a large enough area from the primary bitmap.
That can cause performance problems, especially for disks which were only
temporarily full or fragmented. This patch allows for the allocator to
shrink its window first, before being disabled. Later, it can also be
re-enabled so that any performance drop is minimized.

To do this, we allow the value of osb->local_alloc_bits to be shrunk when
needed. The default value is recorded in a mostly read-only variable so that
we can re-initialize when required.

Locking had to be updated so that we could protect changes to
local_alloc_bits. Mostly this involves protecting various local alloc values
with the osb spinlock. A new state is also added, OCFS2_LA_THROTTLED, which
is used when the local allocator has shrunk, but is not disabled. If the
available space dips below 1 megabyte, the local alloc file is disabled. In
either case, local alloc is re-enabled 30 seconds after the event, or when
an appropriate amount of bits is seen in the primary bitmap.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

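The shrink-before-disable behaviour can be modelled in a few lines of C. This is a sketch under stated assumptions, not the kernel implementation: only the idea of a throttled state comes from the commit message, while `struct local_alloc`, the enum names, the halving step, and the `min_bits` floor (standing in for the 1 megabyte limit) are hypothetical.

```c
#include <assert.h>

/* Hypothetical states; the commit itself only names OCFS2_LA_THROTTLED. */
enum la_state { LA_ENABLED, LA_THROTTLED, LA_DISABLED };

struct local_alloc {
    unsigned int default_bits; /* recorded default window size */
    unsigned int bits;         /* current window size, may shrink */
    enum la_state state;
};

/*
 * Called when a window of la->bits could not be reserved from the primary
 * bitmap. Halving is an assumption made for illustration: shrink while the
 * result stays at or above min_bits, otherwise disable the allocator.
 */
static void la_window_failed(struct local_alloc *la, unsigned int min_bits)
{
    assert(min_bits > 0);
    if (la->state != LA_DISABLED && la->bits / 2 >= min_bits) {
        la->bits /= 2;
        la->state = LA_THROTTLED;
    } else {
        la->state = LA_DISABLED;
    }
}

/* Called after the timeout, or once enough free bits are seen again. */
static void la_reenable(struct local_alloc *la)
{
    la->bits = la->default_bits;
    la->state = LA_ENABLED;
}
```

Keeping the default in a separate field is what lets `la_reenable()` restore the full window after either throttling or disabling.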

ocfs2: Track local alloc bits internally · ebcee4b5

Authored by Mark Fasheh on Jul 28, 2008

Do this instead of tracking absolute local alloc size. This avoids
needless recalculation of bits from bytes in localalloc.c. Additionally,
the value is now in a more natural unit for internal file system bitmap
work.
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

vfs: Use const for kernel parser table · a447c093

Authored by Steven Whitehouse on Oct 13, 2008

This is a much better version of a previous patch to make the parser
tables constant. Rather than changing the typedef, we put the "const" in
all the various places where it's required, allowing the __initconst
exception for nfsroot which was the cause of the previous trouble.

This was posted for review some time ago and I believe it's been in -mm
since then.
Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
Cc: Alexander Viro <aviro@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01 Aug, 2008 1 commit

[PATCH 2/2] ocfs2: Fix race between mount and recovery · 539d8264

Authored by Sunil Mushran on Jul 14, 2008

As the fs recovery is asynchronous, there is a small chance that another
node can mount (and thus recover) the slot before the recovery thread
gets to it.

If this happens, the recovery thread will block indefinitely on the
journal/slot lock as that lock will be held for the duration of the mount
(by design) by the node assigned to that slot.

The solution implemented is to keep track of the journal replays using
a recovery generation in the journal inode, which will be incremented by the
thread replaying that journal. The recovery thread, before attempting the
blocking lock on the journal/slot lock, will compare the generation on disk
with what it has cached and skip recovery if it does not match.

This bug appears to have been inadvertently introduced during the mount/umount
vote removal by mainline commit 34d024f8. In the
mount voting scheme, the messaging would indirectly indicate that the slot
was being recovered.
Signed-off-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

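The generation check described in this commit can be sketched in a few lines of C. This is a simplified user-space model, not the kernel code: `struct slot_journal`, `replay_journal()` and `should_recover()` are hypothetical names, and the on-disk generation is modelled as a plain struct field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the recovery generation kept in a journal inode. */
struct slot_journal {
    uint32_t recovery_generation; /* bumped by whoever replays the journal */
};

/* Replaying a journal increments its generation, as the commit describes. */
static void replay_journal(struct slot_journal *j)
{
    assert(j != NULL);
    j->recovery_generation++;
}

/*
 * Meant to be called by the recovery thread before it attempts the blocking
 * journal/slot lock. If the generation changed since it was cached, another
 * node has already replayed the journal, so recovery is skipped.
 */
static bool should_recover(const struct slot_journal *on_disk,
                           uint32_t cached_generation)
{
    return on_disk->recovery_generation == cached_generation;
}
```

The point of checking before taking the lock is that a mismatch lets the recovery thread return without ever blocking on a lock the mounting node holds for the duration of the mount.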

27 Jul, 2008 1 commit

SL*B: drop kmem cache argument from constructor · 51cc5068

Authored by Alexey Dobriyan on Jul 25, 2008

Kmem cache passed to constructor is only needed for constructors that are
themselves multiplexers.  Nobody uses this "feature", nor does anybody use
passed kmem cache in non-trivial way, so pass only pointer to object.

Non-trivial places are:
	arch/powerpc/mm/init_64.c
	arch/powerpc/mm/hugetlbpage.c

This is flag day, yes.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Jon Tollefson <kniht@linux.vnet.ibm.com>
Cc: Nick Piggin <nickpiggin@yahoo.com.au>
Cc: Matt Mackall <mpm@selenic.com>
[akpm@linux-foundation.org: fix arch/powerpc/mm/hugetlbpage.c]
[akpm@linux-foundation.org: fix mm/slab.c]
[akpm@linux-foundation.org: fix ubifs]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

15 Jul, 2008 1 commit

ocfs2: Handle error during journal load · 01af4820

Authored by Wengang Wang on Jun 10, 2008

This patch ensures the mount fails if the fs is unable to load the journal.
Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com>
Acked-by: Sunil Mushran <sunil.mushran@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

18 Apr, 2008 1 commit

ocfs2: Add inode stealing for ocfs2_reserve_new_inode · 4d0ddb2c

Authored by Tao Ma on Mar 05, 2008

Inode allocation is modified to look in other nodes' allocators during
extreme out-of-space situations. We retry our own slot when space is freed
back to the global bitmap, or whenever we've allocated more than 1024 inodes
from another slot.
Signed-off-by: Tao Ma <tao.ma@oracle.com>
Signed-off-by: Mark Fasheh <mfasheh@suse.com>

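The retry policy in this commit message can be sketched as a small state machine. Only the 1024 threshold and the two retry triggers come from the commit; `struct steal_state` and the helper names are hypothetical, and the actual allocator lookups are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Threshold taken from the commit message: retry our own slot after more
 * than this many inodes have been allocated from another slot. */
#define STEAL_RETRY_LIMIT 1024u

/* Hypothetical per-mount bookkeeping for inode stealing. */
struct steal_state {
    bool stealing;       /* true while allocating from other slots */
    unsigned int stolen; /* inodes taken since stealing began */
};

/* Our own allocator ran out of space: start looking in other slots. */
static void own_slot_exhausted(struct steal_state *s)
{
    s->stealing = true;
    s->stolen = 0;
}

/* Space went back to the global bitmap: retry our own slot. */
static void space_freed(struct steal_state *s)
{
    s->stealing = false;
    s->stolen = 0;
}

/* Record one stolen inode; past the limit, go back to our own slot. */
static void inode_stolen(struct steal_state *s)
{
    assert(s->stealing);
    if (++s->stolen > STEAL_RETRY_LIMIT)
        space_freed(s);
}

static bool use_own_slot(const struct steal_state *s)
{
    return !s->stealing;
}
```

In this model the 1025th stolen inode is the one that flips allocation back to the node's own slot, matching "more than 1024".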