- 08 9月, 2010 13 次提交
-
-
由 Mark Fasheh 提交于
ocfs2_create_inode_in_orphan() is used by reflink to create the newly reflinked inode simultaneously in the orphan dir. This allows us to easily handle partially-reflinked files during recovery cleanup. We have a problem though - the orphan dir stringifies inode # to determine a unique name under which the orphan entry dirent can be created. Since ocfs2_create_inode_in_orphan() needs the space allocated in the orphan dir before it can allocate the inode, we currently call into the orphan code: /* * We give the orphan dir the root blkno to fake an orphan name, * and allocate enough space for our insertion. */ status = ocfs2_prepare_orphan_dir(osb, &orphan_dir, osb->root_blkno, orphan_name, &orphan_insert); Using osb->root_blkno might work fine on unindexed directories, but the orphan dir can have an index. When it has that index, the above code fails to allocate the proper index entry. Later, when we try to remove the file from the orphan dir (using the actual inode #), the reflink operation will fail. To fix this, I created a function ocfs2_alloc_orphaned_file() which uses the newly split out orphan and inode alloc code to figure out what the inode block number will be (once allocated) and then prepare the orphan dir from that data. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Mark Fasheh 提交于
We do this because ocfs2_create_inode_in_orphan() wants to order locking of the orphan dir with respect to locking of the inode allocator *before* making any changes to the directory. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Mark Fasheh 提交于
This allows code which needs to know the eventual block number of an inode but can't allocate it yet due to transaction or lock ordering. For example, ocfs2_create_inode_in_orphan() currently gives a junk blkno for preparation of the orphan dir because it can't yet know where the actual inode is placed - that code is actually in ocfs2_mknod_locked. This is a problem when the orphan dirs are indexed as the junk inode number will create an index entry which goes unused (and fails the later removal from the orphan dir). Now with these interfaces, ocfs2_create_inode_in_orphan() can run the block group search (and get back the inode block number) *before* any actual allocation occurs. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Mark Fasheh 提交于
ocfs2_search_chain() makes the same updates as ocfs2_alloc_dinode_update_counts to the alloc inode. Instead of open coding the bitmap update, use our helper function. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Mark Fasheh 提交于
Do this by splitting the bulk of the function away from the inode allocation code at the very tom of ocfs2_mknod_locked(). Existing callers don't need to change and won't see any difference. The new function created, __ocfs2_mknod_locked() will be used shortly. Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Tristan Ye 提交于
The patch is to fix the regression bug brought from commit 6b933c8e...( 'ocfs2: Avoid direct write if we fall back to buffered I/O'): http://oss.oracle.com/bugzilla/show_bug.cgi?id=1285 The commit 6b933c8e changed __generic_file_aio_write to generic_file_buffered_write, which didn't call filemap_{write,wait}_range to flush the pagecaches when we were falling O_DIRECT writes back to buffered ones. it did hurt the O_DIRECT semantics somehow in extented odirect writes. This patch tries to guarantee O_DIRECT writes of 'fall back to buffered' to be correctly flushed. Signed-off-by: NTristan Ye <tristan.ye@oracle.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Jan Kara 提交于
We cannot call grab_cache_page() when holding filesystem locks or with a transaction started as grab_cache_page() calls page allocation with GFP_KERNEL flag and thus page reclaim can recurse back into the filesystem causing deadlocks or various assertion failures. We have to use find_or_create_page() instead and pass it GFP_NOFS as we do with other allocations. Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Mark Fasheh 提交于
We were setting ac->ac_last_group in ocfs2_claim_suballoc_bits from res->sr_bg_blkno. Unfortunately, res->sr_bg_blkno is going to be zero under normal (non-fragmented) circumstances. The discontig block group patches effectively turned off that feature. Fix this by correctly calculating what the next group hint should be. Acked-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NMark Fasheh <mfasheh@suse.com> Tested-by: NGoldwyn Rodrigues <rgoldwyn@suse.de> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Tao Ma 提交于
We have added discontig block group now, and now an inode can be allocated in an discontig block group. So get it in ocfs2_get_suballoc_slot_bit. The old ocfs2_test_suballoc_bit gets group block no from the allocation inode which is wrong. Fix it by passing the right group. Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Jan Kara 提交于
When 'barrier' mount option is specified, we have to issue a cache flush during fdatasync(2). We have to do this even if inode doesn't have I_DIRTY_DATASYNC set because we still have to get written *data* to disk so that they are not lost in case of crash. Acked-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJan Kara <jack@suse.cz> Singed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Tao Ma 提交于
__ocfs2_page_mkwrite now is broken in handling file end. 1. the last page should be the page contains i_size - 1. 2. the len in the last page is also calculated wrong. So change them accordingly. Acked-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Sunil Mushran 提交于
For local mounts, ocfs2_read_locked_inode() calls ocfs2_read_blocks_sync() to read the inode off the disk. The latter first checks to see if that block is cached in the journal, and, if so, returns that block. That is ok. But ocfs2_read_locked_inode() goes wrong when it tries to validate the checksum of such blocks. Blocks that are cached in the journal may not have had their checksum computed as yet. We should not validate the checksums of such blocks. Fixes ossbz#1282 http://oss.oracle.com/bugzilla/show_bug.cgi?id=1282Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Singed-off-by: NTao Ma <tao.ma@oracle.com>
-
由 Sunil Mushran 提交于
Like tools, the checksum validate function now prints the values in hex. Signed-off-by: NSunil Mushran <sunil.mushran@oracle.com> Singed-off-by: NTao Ma <tao.ma@oracle.com>
-
- 10 8月, 2010 6 次提交
-
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
... and let iput_final() do the actual eviction or retention Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Make sure we check the truncate constraints early on in ->setattr by adding those checks to inode_change_ok. Also clean up and document inode_change_ok to make this obvious. As a fallout we don't have to call inode_newsize_ok from simple_setsize and simplify it down to a truncate_setsize which doesn't return an error. This simplifies a lot of setattr implementations and means we use truncate_setsize almost everywhere. Get rid of fat_setsize now that it's trivial and mark ext2_setsize static to make the calling convention obvious. Keep the inode_newsize_ok in vmtruncate for now as all callers need an audit for its removal anyway. Note: setattr code in ecryptfs doesn't call inode_change_ok at all and needs a deeper audit, but that is left for later. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Replace inode_setattr with opencoded variants of it in all callers. This moves the remaining call to vmtruncate into the filesystem methods where it can be replaced with the proper truncate sequence. In a few cases it was obvious that we would never end up calling vmtruncate so it was left out in the opencoded variant: spufs: explicitly checks for ATTR_SIZE earlier btrfs,hugetlbfs,logfs,dlmfs: explicitly clears ATTR_SIZE earlier ufs: contains an opencoded simple_seattr + truncate that sets the filesize just above In addition to that ncpfs called inode_setattr with handcrafted iattrs, which allowed to trim down the opencoded variant. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Christoph Hellwig 提交于
Move the call to vmtruncate to get rid of accessive blocks to the callers in prepearation of the new truncate calling sequence. This was only done for DIO_LOCKING filesystems, so the __blockdev_direct_IO_newtrunc variant was not needed anyway. Get rid of blockdev_direct_IO_no_locking and its _newtrunc variant while at it as just opencoding the two additional paramters is shorted than the name suffix. Signed-off-by: NChristoph Hellwig <hch@lst.de> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
- 08 8月, 2010 8 次提交
-
-
由 Tristan Ye 提交于
Currently, o2net_accept_one() is allowed to accept a connection from listening node itself, such a fake connection will not be successfully established due to no handshake detected afterwards, and later end up with triggering connecting worker in a loop. We're going to fix this by treating such connection request as 'invalid', since we've got no chance of requesting connection from a node to itself in a OCFS2 cluster. The fix doesn't hurt user's scan for o2net-listener, it always gets a successful connection from userpace. Signed-off-by: NTristan Ye <tristan.ye@oracle.com> Acked-by: NSunil Mushran <sunil.mushran@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Wengang Wang 提交于
When we need to take both dlm_domain_lock and dlm->spinlock, we should take them in order of: dlm_domain_lock then dlm->spinlock. There is pathes disobey this order. That is calling dlm_lockres_put() with dlm->spinlock held in dlm_run_purge_list. dlm_lockres_put() calls dlm_put() at the ref and dlm_put() locks on dlm_domain_lock. Fix: Don't grab/put the dlm when the initialising/releasing lockres. That grab is not required because we don't call dlm_unregister_domain() based on refcount. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Cc: stable@kernel.org Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Wengang Wang 提交于
In the following situation, there remains an incorrect bit in refmap on the recovery master. Finally the recovery master will fail at purging the lockres due to the incorrect bit in refmap. 1) node A has no interest on lockres A any longer, so it is purging it. 2) the owner of lockres A is node B, so node A is sending de-ref message to node B. 3) at this time, node B crashed. node C becomes the recovery master. it recovers lockres A(because the master is the dead node B). 4) node A migrated lockres A to node C with a refbit there. 5) node A failed to send de-ref message to node B because it crashed. The failure is ignored. no other action is done for lockres A any more. For mormal, re-send the deref message to it to recovery master can fix it. Well, ignoring the failure of deref to the original master and not recovering the lockres to recovery master has the same effect. And the later is simpler. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Acked-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Cc: stable@kernel.org Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Jiaju Zhang 提交于
Hi, Thanks a lot for all the review and comments so far;) I'd like to send the improved (V4) version of this patch. This patch fixes a deadlock in OCFS2 ACL. We found this bug in OCFS2 and Samba integration using scenario, the symptom is several smbd processes will be hung under heavy workload. Finally we found out it is the nested PR lock calling that leads to this deadlock: node1 node2 gr PR | V PR(EX)---> BAST:OCFS2_LOCK_BLOCKED | V rq PR | V wait=1 After requesting the 2nd PR lock, the process "smbd" went into D state. It can only be woken up when the 1st PR lock's RO holder equals zero. There should be an ocfs2_inode_unlock in the calling path later on, which can decrement the RO holder. But since it has been in uninterruptible sleep, the unlock function has no chance to be called. The related stack trace is: smbd D ffff8800013d0600 0 9522 5608 0x00000000 ffff88002ca7fb18 0000000000000282 ffff88002f964500 ffff88002ca7fa98 ffff8800013d0600 ffff88002ca7fae0 ffff88002f964340 ffff88002f964340 ffff88002ca7ffd8 ffff88002ca7ffd8 ffff88002f964340 ffff88002f964340 Call Trace: [<ffffffff80350425>] schedule_timeout+0x175/0x210 [<ffffffff8034f580>] wait_for_common+0xf0/0x210 [<ffffffffa03e12b9>] __ocfs2_cluster_lock+0x3b9/0xa90 [ocfs2] [<ffffffffa03e7665>] ocfs2_inode_lock_full_nested+0x255/0xdb0 [ocfs2] [<ffffffffa0446019>] ocfs2_get_acl+0x69/0x120 [ocfs2] [<ffffffffa0446368>] ocfs2_check_acl+0x28/0x80 [ocfs2] [<ffffffff800e3507>] acl_permission_check+0x57/0xb0 [<ffffffff800e357d>] generic_permission+0x1d/0xc0 [<ffffffffa03eecea>] ocfs2_permission+0x10a/0x1d0 [ocfs2] [<ffffffff800e3f65>] inode_permission+0x45/0x100 [<ffffffff800d86b3>] sys_chdir+0x53/0x90 [<ffffffff80007458>] system_call_fastpath+0x16/0x1b [<00007f34a4ef6927>] 0x7f34a4ef6927 For details, please see: https://bugzilla.novell.com/show_bug.cgi?id=614332 and http://oss.oracle.com/bugzilla/show_bug.cgi?id=1278Signed-off-by: NJiaju Zhang <jjzhang@suse.de> Acked-by: NMark Fasheh <mfasheh@suse.com> Cc: stable@kernel.org Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
The refcount record calculation in ocfs2_calc_refcount_meta_credits is too optimistic that we can always allocate contiguous clusters and handle an already existed refcount rec as a whole. Actually because of file system fragmentation, we may have the chance to split a refcount record into 3 parts during the transaction. So consider the worst case in record calculation. Cc: stable@kernel.org Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Srinivas Eeda 提交于
This patch fixes two problems in dlm_run_purgelist 1. If a lockres is found to be in use, dlm_run_purgelist keeps trying to purge the same lockres instead of trying the next lockres. 2. When a lockres is found unused, dlm_run_purgelist releases lockres spinlock before setting DLM_LOCK_RES_DROPPING_REF and calls dlm_purge_lockres. spinlock is reacquired but in this window lockres can get reused. This leads to BUG. This patch modifies dlm_run_purgelist to skip lockres if it's in use and purge next lockres. It also sets DLM_LOCK_RES_DROPPING_REF before releasing the lockres spinlock protecting it from getting reused. Signed-off-by: NSrinivas Eeda <srinivas.eeda@oracle.com> Acked-by: NSunil Mushran <sunil.mushran@oracle.com> Cc: stable@kernel.org Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Wengang Wang 提交于
When we have to take both dlm->master_lock and lockres->spinlock, take them in order lockres->spinlock and then dlm->master_lock. The patch fixes a violation of the rule. We can simply move taking dlm->master_lock to where we have dropped res->spinlock since when we access res->state and free mle memory we don't need master_lock's protection. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Cc: stable@kernel.org Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tiger Yang 提交于
Setting the acl while creating a new inode depends on the error codes of posix_acl_create_masq. This patch fix a issue of overwriting the error codes of it. Reported-by: NPawel Zawora <pzawora@gmail.com> Cc: <stable@kernel.org> [ .33, .34 ] Signed-off-by: NTiger Yang <tiger.yang@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 04 8月, 2010 1 次提交
-
-
由 Theodore Ts'o 提交于
Lockstat reports have shown that j_state_lock is a major source of lock contention, especially on systems with more than 4 CPU cores. So change it to be a read/write spinlock. Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
- 27 7月, 2010 2 次提交
-
-
由 Christoph Hellwig 提交于
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Christoph Hellwig 提交于
Filesystems with unwritten extent support must not complete an AIO request until the transaction to convert the extent has been commited. That means the aio_complete calls needs to be moved into the ->end_io callback so that the filesystem can control when to call it exactly. This makes a bit of a mess out of dio_complete and the ->end_io callback prototype even more complicated. Signed-off-by: NChristoph Hellwig <hch@lst.de> Reviewed-by: NJan Kara <jack@suse.cz> Signed-off-by: NAlex Elder <aelder@sgi.com>
-
- 20 7月, 2010 1 次提交
-
-
由 Joe Perches 提交于
Signed-off-by: NJoe Perches <joe@perches.com> Acked-by: NJoel Becker <joel.becker@oracle.com> Signed-off-by: NJiri Kosina <jkosina@suse.cz>
-
- 17 7月, 2010 1 次提交
-
-
由 Joel Becker 提交于
ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc doesn't know that. Set ret=0 just to make gcc happy. Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 16 7月, 2010 3 次提交
-
-
由 Jan Kara 提交于
OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Wengang Wang 提交于
For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set before sending DLM_MIG_LOCKRES_MSG message to the target. We are using dlm_migration_can_proceed() for that purpose. However, if the node is down, dlm_migration_can_proceed() will also return "go ahead". In this rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove the BUG_ON() that trips over this condition. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
During CoW, the pages after i_size don't contain valid data, so there's no need to read and duplicate them. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 13 7月, 2010 5 次提交
-
-
由 Dan Carpenter 提交于
This function is only called from one place and it's like this: dlm_register_domain(conn->cc_name, dlm_key, &fs_version); The "conn->cc_name" is 64 characters long. If strlen(conn->cc_name) were equal to O2NM_MAX_NAME_LEN (64) that would be a bug because strlen() doesn't count the NULL character. In fact, if you look how O2NM_MAX_NAME_LEN is used, it mostly describes 64 character buffers. The only exception is nd_name from struct o2nm_node. Anyway I looked into it and in this case the domain string comes from osb->uuid_str in ocfs2_setup_osb_uuid(). That's 32 characters and NULL which easily fits into O2NM_MAX_NAME_LEN. This patch doesn't change how the code works, but I think it makes the code a little cleaner. Signed-off-by: NDan Carpenter <error27@gmail.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
The new reservation code in local alloc has add the limitation that the caller should handle the case that the local alloc doesn't give use enough contiguous clusters. It make the old xattr reflink code broken. So this patch udpate the xattr reflink code so that it can handle the case that local alloc give us one cluster at a time. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
The old ocfs2_xattr_extent_allocation is too optimistic about the clusters we can get. So actually if the file system is too fragmented, ocfs2_add_clusters_in_btree will return us with EGAIN and we need to allocate clusters once again. So this patch change it to a while loop so that we can allocate clusters until we reach clusters_to_add. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com> Cc: stable@kernel.org
-
由 Tao Ma 提交于
In ocfs2_block_group_alloc, we set c_blkno by bg->bg_blkno. But actually bg->bg_blkno is already changed to little endian in ocfs2_block_group_fill. So remove the extra cpu_to_le64. Reported-by: NMarcos Matsunaga <Marcos.Matsunaga@oracle.com> Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Wengang Wang 提交于
dlm->recovery_map is defined as unsigned long recovery_map[BITS_TO_LONGS(O2NM_MAX_NODES)]; We should treat O2NM_MAX_NODES as the bit map size in bits. This patches fixes a bit operation that takes O2NM_MAX_NODES + 1 as bitmap size. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-