- 31 1月, 2008 16 次提交
-
-
由 Denis Cheng 提交于
also change name_prefix from char pointer to char array. Signed-off-by: NDenis Cheng <crquan@gmail.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
A couple small clean-ups. Remove unnecessary wrapper-functions in rcom.c, and remove unnecessary casting and an unnecessary ASSERT in util.c. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 Patrick Caulfeld 提交于
The 32/64 compatibility code in the DLM does not check the validity of the lock name length passed into it, so it can easily overwrite memory if the value is rubbish (as early versions of libdlm can cause with unlock calls, it doesn't zero the field). This patch restricts the length of the name to the amount of data actually passed into the call. Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
To prevent the master of an rsb from changing rapidly, an unused rsb is kept on the "toss list" for a period of time to be reused. The toss list was being cleared completely for each recovery, which is unnecessary. Much of the benefit of the toss list can be maintained if nodes keep rsb's in their toss list that they are the master of. These rsb's need to be included when the resource directory is rebuilt during recovery. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
The invalid lockspace messages are normal and can appear relatively often. They should be suppressed without debugging enabled. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
The dlm_put_lkb() can free the lkb and its associated ua structure, so we can't depend on using the ua struct after the put. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
In a rare case we may need to repeat a local resource directory lookup due to a race with removing the rsb and removing the resdir record. We'll never need to do more than a single additional lookup, though, so the infinite loop around the lookup can be removed. In addition to being unnecessary, the infinite loop is dangerous since some other unknown condition may appear causing the loop to never break. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
Non-forced unlocks should be rejected if the lock is waiting on the rsb_lookup list for another lock to establish the master node. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
There was some hit and miss validation of messages that has now been cleaned up and unified. Before processing a message, the new validate_message() function checks that the lkb is the appropriate type, process-copy or master-copy, and that the message is from the correct nodeid for the the given lkb. Other checks and assertions on the lkb type and nodeid have been removed. The assertions were particularly bad since they would panic the machine instead of just ignoring the bad message. Although other recent patches have made processing old message unlikely, it still may be possible for an old message to be processed and caught by these checks. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
Messages from nodes that are no longer members of the lockspace should be ignored. When nodes are removed from the lockspace, recovery can sometimes complete quickly enough that messages arrive from a removed node after recovery has completed. When processed, these messages would often cause an error message, and could in some cases change some state, causing problems. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
When a failed request (EBADR or ENOTBLK) is unlocked/canceled instead of retried, there may be other lkb's waiting on the rsb_lookup list for it to complete. A call to confirm_master() is needed to move on to the next waiting lkb since the current one won't be retried. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
When recovery looks at locks waiting for replies, it fails to consider locks that have already received a reply for their first remote operation, but not received a reply for secondary, overlapping unlock/cancel. The appropriate stub reply needs to be called for these waiters. Appears when we start doing recovery in the presence of a many overlapping unlock/cancel ops. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
The lkb_ast_type field indicates whether the lkb is on the astqueue list. When clearing locks for a process, lkb's were being removed from the astqueue list without clearing the field. If release_lockspace then happened immediately afterward, it could try to remove the lkb from the list a second time. Appears when process calls libdlm dlm_release_lockspace() which first closes the ls dev triggering clear_proc_locks, and then removes the ls (a write to control dev) causing release_lockspace(). Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
Some errno values differ across platforms. So if we return things like -EINPROGRESS from one node it can get misinterpreted or rejected on another one. This patch fixes up the errno values passed on the wire so that they match the x86 ones (so as not to break the protocol), and re-instates the platform-specific ones at the other end. Many thanks to Fabio for testing this patch. Initial patch from Patrick. Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NFabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 Fabio M. Di Nitto 提交于
DLM_RCOM_LOCK_REPLY messages need byte swapping. Signed-off-by: NFabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 Fabio M. Di Nitto 提交于
gcc does not guarantee that an auto buffer is 64bit aligned. This change allows sparc64 to work. Signed-off-by: NFabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
- 30 1月, 2008 5 次提交
-
-
由 Patrick Caulfeld 提交于
This patch addresses a problem introduced with the last round of lowcomms patches where the 'othercon' connections do not get freed when the DLM shuts down. This results in the error message "slab error in kmem_cache_destroy(): cache `dlm_conn': Can't free all objects" and the DLM cannot be restarted without a system reboot. See bz#428119 Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NFabio M. Di Nitto <fabbione@ubuntu.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
The dlm functions in memory.c should use the dlm_ prefix. Also, use kzalloc/kfree directly for dlm_direntry's, removing the wrapper functions. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 David Teigland 提交于
Change log_error() to log_debug() for conditions that can occur in large number in normal operation. Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 Adrian Bunk 提交于
This patch adds a proper prototype for some functions in fs/dlm/dlm_internal.h Signed-off-by: NAdrian Bunk <bunk@kernel.org> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
由 Lon Hohberger 提交于
A common problem occurs when multiple IP addresses within the same subnet are assigned to the same NIC. If we make a connection attempt to another address on the same subnet as one of those addresses, the connection attempt will not necessarily be routed from the address we want. In the case of the DLM, the other nodes will quickly drop the connection attempt, causing problems. This patch makes the DLM bind to the local address it acquired from the cluster manager when using TCP prior to making a connection, obviating the need for administrators to "fix" their systems or use clever routing tricks. Signed-off-by: NLon Hohberger <lhh@redhat.com> Signed-off-by: NPatrick Caulfield <pcaulfie@redhat.com> Signed-off-by: NDavid Teigland <teigland@redhat.com>
-
- 29 1月, 2008 19 次提交
-
-
由 Mingming Cao 提交于
Get rid of sparse related warnings from places that use integer as NULL pointer. (Ported from upstream ext3/jbd changes.) Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Mingming Cao 提交于
While "every 5 seconds" doesn't sound as a problem, there can be many of these (and these timers do add up over all the kernel). The "5 second" wakeup isn't really timing sensitive; in addition even with rounding it'll still happen every 5 seconds (with the exception of the very first time, which is likely to be rounded up to somewhere closer to 6 seconds) (Ported from similar JBD patch made by Arjan van de Ven to fs/jbd/transaction.c) Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@osdl.org> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Mingming Cao 提交于
This patch marks slab allocations by jbd2 as short-lived in support of Mel Gorman's "Group short-lived and reclaimable kernel allocations" patch. (Ported from similar changes made to fs/jbd/journal.c and fs/jbd/revoke.c in Mel's patch.) Cc: Mel Gorman <mel@csn.ul.ie> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Mingming Cao 提交于
Ported from similar patch for the jbd layer. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aneesh Kumar K.V 提交于
ext4 uses the high bit of the extent length to encode whether the extent is intialized or not. The helper function ext4_ext_get_actual_len should be used to get the actual length of the extent. This addresses the kernel bug documented here: http://bugzilla.kernel.org/show_bug.cgi?id=9732 kernel BUG at fs/ext4/extents.c:1056! .... Call Trace: [<ffffffff88366073>] :ext4dev:ext4_ext_get_blocks+0x5ba/0x8c1 [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff812748f6>] _spin_unlock+0x17/0x20 [<ffffffff883400a6>] :jbd2:start_this_handle+0x4e0/0x4fe [<ffffffff88366564>] :ext4dev:ext4_fallocate+0x175/0x39a [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff81056480>] __lock_acquire+0x4e7/0xc4d [<ffffffff81053c91>] lock_release_holdtime+0x27/0x49 [<ffffffff810a8de7>] sys_fallocate+0xe4/0x10d [<ffffffff8100c043>] tracesys+0xd5/0xda Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Dmitry Monakhov 提交于
Fix bug reported by Dmitry Monakhov caused by lost error code Testcase: blksize = 0x1000; fd = open(argv[1], O_RDWR|O_CREAT, 0700); unsigned long long sz = 0x10000000UL; /* allocating big blocks chunk */ syscall(__NR_fallocate, fd, 0, 0UL, sz) /* grab all other available filesystem space */ tfd = open("tmp", O_RDWR|O_CREAT|O_DIRECT, 0700); while( write(tfd, buf, 4096) > 0); /* loop untill ENOSPC */ fsync(fd); /* just in case */ while (pos < sz) { /* each seek+ write operation result in splits uninitialized extent in three extents. Splitting may result in new extent allocation which probably will fail because of ENOSPC*/ lseek(fd, blksize*2 -1, SEEK_CUR); if ((ret = write(fd, 'a', 1)) != 1) exit(1); pos += blksize * 2; } Signed-off-by: NDmitry Monakhov <dmonakhov@openvz.org> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Aneesh Kumar K.V 提交于
sb_set_blocksize validates whether the specfied block size can be used by the file system. Make sure we fail mounting the file system if the blocksize specfied cannot be used. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com>
-
由 Miklos Szeredi 提交于
Add stripe= option to /proc/mounts for ext4 filesystems. Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz> Signed-off-by: NMingming Cao <cmm@us.ibm.com>
-
由 Aneesh Kumar K.V 提交于
Enable the multiblock allocator by default. Fix ext4_show_options() so if it is not enabled, the nomballoc option included in /proc/mounts. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Acked-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Alex Tomas 提交于
Signed-off-by: NAlex Tomas <alex@clusterfs.com> Signed-off-by: NAndreas Dilger <adilger@clusterfs.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Alex Tomas 提交于
Add the functions ext4_ext_search_left() and ext4_ext_search_right(), which are used by mballoc during ext4_ext_get_blocks to decided whether to merge extent information. Signed-off-by: NAlex Tomas <alex@clusterfs.com> Signed-off-by: NAndreas Dilger <adilger@clusterfs.com> Signed-off-by: NJohann Lombardi <johann@clusterfs.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: N"Theodore Ts'o" <tytso@mit.edu>
-
由 Eric Sandeen 提交于
Builds with EXT4FS_DEBUG defined (to enable ext4_debug()) fail without these changes. Clean up some format warnings too. Signed-off-by: NEric Sandeen <sandeen@redhat.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com>
-
由 Aneesh Kumar K.V 提交于
We need to look at the default value and make sure the mount options are not set via default value before showing them via ext4_show_options Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
由 Aneesh Kumar K.V 提交于
The below patch add ioctl for migrating ext3 indirect block mapped inode to ext4 extent mapped inode. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-
由 Jean Noel Cordenner 提交于
This patch adds 64-bit inode version support to ext4. The lower 32 bits are stored in the osd1.linux1.l_i_version field while the high 32 bits are stored in the i_version_hi field newly created in the ext4_inode. This field is incremented in case the ext4_inode is large enough. A i_version mount option has been added to enable the feature. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: NAndreas Dilger <adilger@clusterfs.com> Signed-off-by: NKalpak Shah <kalpak@clusterfs.com> Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: NJean Noel Cordenner <jean-noel.cordenner@bull.net>
-
由 Jean Noel Cordenner 提交于
The i_version field of the inode is changed to be a 64-bit counter that is set on every inode creation and that is incremented every time the inode data is modified (similarly to the "ctime" time-stamp). The aim is to fulfill a NFSv4 requirement for rfc3530. This first part concerns the vfs, it converts the 32-bit i_version in the generic inode to a 64-bit, a flag is added in the super block in order to check if the feature is enabled and the i_version is incremented in the vfs. Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: NJean Noel Cordenner <jean-noel.cordenner@bull.net> Signed-off-by: NKalpak Shah <kalpak@clusterfs.com>
-
由 Girish Shilamkar 提交于
The journal checksum feature adds two new flags i.e JBD2_FEATURE_INCOMPAT_ASYNC_COMMIT and JBD2_FEATURE_COMPAT_CHECKSUM. JBD2_FEATURE_CHECKSUM flag indicates that the commit block contains the checksum for the blocks described by the descriptor blocks. Due to checksums, writing of the commit record no longer needs to be synchronous. Now commit record can be sent to disk without waiting for descriptor blocks to be written to disk. This behavior is controlled using JBD2_FEATURE_ASYNC_COMMIT flag. Older kernels/e2fsck should not be able to recover the journal with _ASYNC_COMMIT hence it is made incompat. The commit header has been extended to hold the checksum along with the type of the checksum. For recovery in pass scan checksums are verified to ensure the sanity and completeness(in case of _ASYNC_COMMIT) of every transaction. Signed-off-by: NAndreas Dilger <adilger@clusterfs.com> Signed-off-by: NGirish Shilamkar <girish@clusterfs.com> Signed-off-by: NDave Kleikamp <shaggy@linux.vnet.ibm.com> Signed-off-by: NMingming Cao <cmm@us.ibm.com>
-
由 Johann Lombardi 提交于
The patch below updates the jbd stats patch to 2.6.20/jbd2. The initial patch was posted by Alex Tomas in December 2005 (http://marc.info/?l=linux-ext4&m=113538565128617&w=2). It provides statistics via procfs such as transaction lifetime and size. Sometimes, investigating performance problems, i find useful to have stats from jbd about transaction's lifetime, size, etc. here is a patch for review and inclusion probably. for example, stats after creation of 3M files in htree directory: [root@bob ~]# cat /proc/fs/jbd/sda/history R/C tid wait run lock flush log hndls block inlog ctime write drop close R 261 8260 2720 0 0 750 9892 8170 8187 C 259 750 0 4885 1 R 262 20 2200 10 0 770 9836 8170 8187 R 263 30 2200 10 0 3070 9812 8170 8187 R 264 0 5000 10 0 1340 0 0 0 C 261 8240 3212 4957 0 R 265 8260 1470 0 0 4640 9854 8170 8187 R 266 0 5000 10 0 1460 0 0 0 C 262 8210 2989 4868 0 R 267 8230 1490 10 0 4440 9875 8171 8188 R 268 0 5000 10 0 1260 0 0 0 C 263 7710 2937 4908 0 R 269 7730 1470 10 0 3330 9841 8170 8187 R 270 0 5000 10 0 830 0 0 0 C 265 8140 3234 4898 0 C 267 720 0 4849 1 R 271 8630 2740 20 0 740 9819 8170 8187 C 269 800 0 4214 1 R 272 40 2170 10 0 830 9716 8170 8187 R 273 40 2280 0 0 3530 9799 8170 8187 R 274 0 5000 10 0 990 0 0 0 where, R - line for transaction's life from T_RUNNING to T_FINISHED C - line for transaction's checkpointing tid - transaction's id wait - for how long we were waiting for new transaction to start (the longest period journal_start() took in this transaction) run - real transaction's lifetime (from T_RUNNING to T_LOCKED lock - how long we were waiting for all handles to close (time the transaction was in T_LOCKED) flush - how long it took to flush all data (data=ordered) log - how long it took to write the transaction to the log hndls - how many handles got to the transaction block - how many blocks got to the transaction inlog - how many blocks are written to the log (block + descriptors) ctime - how long it took to checkpoint the transaction write - how many blocks have been written during checkpointing drop - how many blocks have been dropped during checkpointing close - how many running transactions have been closed to checkpoint this one all times are in msec. [root@bob ~]# cat /proc/fs/jbd/sda/info 280 transaction, each upto 8192 blocks average: 1633ms waiting for transaction 3616ms running transaction 5ms transaction was being locked 1ms flushing data (in ordered mode) 1799ms logging transaction 11781 handles per transaction 5629 blocks per transaction 5641 logged blocks per transaction Signed-off-by: NJohann Lombardi <johann.lombardi@bull.net> Signed-off-by: NMariusz Kozlowski <m.kozlowski@tuxland.pl> Signed-off-by: NMingming Cao <cmm@us.ibm.com> Signed-off-by: NEric Sandeen <sandeen@redhat.com>
-
由 Aneesh Kumar K.V 提交于
When we are overwriting a file and not actually allocating new file system blocks we need to take only the read lock on i_data_sem. Signed-off-by: NAneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
-