- 23 7月, 2010 21 次提交
-
-
由 Ryusuke Konishi 提交于
This replaces seq_printf() with seq_puts() in nilfs_show_options for mount options which have no argument. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
Nilfs has "discard" mount option which issues discard/TRIM commands to underlying block device, but it lacks a complementary option and has no way to disable the feature through remount. This adds "nodiscard" option to resolve this imbalance. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
Nilfs enables write barriers by default and has "nobarrier" mount option to disable this feature. But it lacks the complementary option and has no way to re-enable the feature on remount. This adds "barrier" option to resolve this imbalance. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
Super blocks of nilfs are periodically overwritten in order to record the recent log position. This shortens recovery time after unclean unmount, but the current implementation performs the update even for a few blocks of change. If the filesystem gets small changes slowly and continually, super blocks may be updated excessively. This moderates the issue by skipping update of log cursor if it does not cross a segment boundary. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
Although nilfs redundantly uses two super blocks and each may point to different position on log, the current version of nilfs does not try fallback to the spare super block when it doesn't find any valid log at the position that the primary super block points to. This has been a cause of mount failures due to write order reversals on barrier less block devices. This inserts fallback code in error path of nilfs_search_super_root routine to resolve the mount failure problem. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
nilfs_search_super_root can return -ENOMEM, but this error code is not described in its kernel-doc comment. This fixes the discrepancy. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This separates a setup routine of log cursor from init_nilfs(). The routine, nilfs_store_log_cursor, reads the last position of the log containing a super root, and initializes relevant state on the nilfs object. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Jiro SEKIBA 提交于
This will sync super blocks in turns instead of syncing duplicate super blocks at the time. This will help searching valid super root when super block is written into disk before log is written, which is happen when barrier-less block devices are unmounted uncleanly. In the situation, old super block likely points to valid log. This patch introduces ns_sbwcount member to the nilfs object and adds nilfs_sb_will_flip() function; ns_sbwcount counts how many times super blocks write back to the disk. And, nilfs_sb_will_flip() decides whether flipping required or not based on the count of ns_sbwcount to sync super blocks asymmetrically. The following functions are also changed: - nilfs_prepare_super(): flips super blocks according to the argument. The argument is calculated by nilfs_sb_will_flip() function. - nilfs_cleanup_super(): sets "clean" flag to both super blocks if they point to the same checkpoint. To update both of super block information, caller of nilfs_commit_super must set the information on both super blocks. Signed-off-by: NJiro SEKIBA <jir@unicus.jp> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Jiro SEKIBA 提交于
This function checks validity of super block pointers. If first super block is invalid, it will swap the super blocks. The function should be called before any super block information updates. Caller must obtain nilfs->ns_sem. Signed-off-by: NJiro SEKIBA <jir@unicus.jp> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This moves out section that updates information of the recent log position stored in super blocks from nilfs_commit_super to a new routine named nilfs_set_log_cursor. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This function marks error state and write it on super blocks. This is a preparation for making super block writeback alternately. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This function write out filesystem state to super blocks in order to share the same cleanup work. This is a preparation for making super block writeback alternately. Cc: Jiro SEKIBA <jir@unicus.jp> Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
Mount time field in super block is wrongly updated when nilfs remounts the partition from read-write to read-only. This fixes the issue. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This counter is unused. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This removes macros to test segment summary flags and redefines a few relevant macros with inline functions. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This will get rid of nilfs_segsum_info use from recovery functions for simplicity. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
load_segment_summary function has two distinct roles: getting summary header of a log, and verifying consistencies of the log. This divide it into two corresponding functions, nilfs_read_log_header and nilfs_validate_log to clarify the meaning. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
The function name of nilfs_recover_logical_segments makes no sense. This changes the name into nilfs_salvage_orphan_logs to clarify the role of the function. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
Most functions in recovery code take an argument of a super block instance or a nilfs_sb_info struct for convenience sake. This replaces them aggressively with a nilfs object by applying __bread and __breadahead against routines using sb_bread and sb_breadahead. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 Ryusuke Konishi 提交于
This stores blocksize in nilfs objects for the successive refactoring of recovery logic. Signed-off-by: NRyusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
-
由 David Howells 提交于
Fix the security problem in the CIFS filesystem DNS lookup code in which a malicious redirect could be installed by a random user by simply adding a result record into one of their keyrings with add_key() and then invoking a CIFS CFS lookup [CVE-2010-2524]. This is done by creating an internal keyring specifically for the caching of DNS lookups. To enforce the use of this keyring, the module init routine creates a set of override credentials with the keyring installed as the thread keyring and instructs request_key() to only install lookup result keys in that keyring. The override is then applied around the call to request_key(). This has some additional benefits when a kernel service uses this module to request a key: (1) The result keys are owned by root, not the user that caused the lookup. (2) The result keys don't pop up in the user's keyrings. (3) The result keys don't come out of the quota of the user that caused the lookup. The keyring can be viewed as root by doing cat /proc/keys: 2a0ca6c3 I----- 1 perm 1f030000 0 0 keyring .dns_resolver: 1/4 It can then be listed with 'keyctl list' by root. # keyctl list 0x2a0ca6c3 1 key in keyring: 726766307: --alswrv 0 0 dns_resolver: foo.bar.com Signed-off-by: NDavid Howells <dhowells@redhat.com> Reviewed-and-Tested-by: NJeff Layton <jlayton@redhat.com> Acked-by: NSteve French <smfrench@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 22 7月, 2010 1 次提交
-
-
由 Linus Torvalds 提交于
Pointed out by Lucas who found the new one in a comment in setup_percpu.c. And then I fixed the others that I grepped for. Reported-by: NLucas <canolucas@gmail.com> Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
-
- 20 7月, 2010 5 次提交
-
-
由 Dave Chinner 提交于
https://bugzilla.kernel.org/show_bug.cgi?id=16348 When the filesystem grows to a large number of allocation groups, the summing of recalimable inodes gets expensive. In many cases, most AGs won't have any reclaimable inodes and so we are wasting CPU time aggregating over these AGs. This is particularly important for the inode shrinker that gets called frequently under memory pressure. To avoid the overhead, track AGs with reclaimable inodes in the per-ag radix tree so that we can find all the AGs with reclaimable inodes via a simple gang tag lookup. This involves setting the tag when the first reclaimable inode is tracked in the AG, and removing the tag when the last reclaimable inode is removed from the tree. Then the summation process becomes a loop walking the radix tree summing AGs with the reclaim tag set. This significantly reduces the overhead of scanning - a 6400 AG filesystea now only uses about 25% of a cpu in kswapd while slab reclaim progresses instead of being permanently stuck at 100% CPU and making little progress. Clean filesystems filesystems will see no overhead and the overhead only increases linearly with the number of dirty AGs. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dave Chinner 提交于
Now the shrinker passes us a context, wire up a shrinker context per filesystem. This allows us to remove the global mount list and the locking problems that introduced. It also means that a shrinker call does not need to traverse clean filesystems before finding a filesystem with reclaimable inodes. This significantly reduces scanning overhead when lots of filesystems are present. Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
由 Dan Rosenberg 提交于
1. The BTRFS_IOC_CLONE and BTRFS_IOC_CLONE_RANGE ioctls should check whether the donor file is append-only before writing to it. 2. The BTRFS_IOC_CLONE_RANGE ioctl appears to have an integer overflow that allows a user to specify an out-of-bounds range to copy from the source file (if off + len wraps around). I haven't been able to successfully exploit this, but I'd imagine that a clever attacker could use this to read things he shouldn't. Even if it's not exploitable, it couldn't hurt to be safe. Signed-off-by: NDan Rosenberg <dan.j.rosenberg@gmail.com> cc: stable@kernel.org Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Sage Weil 提交于
The CLONE and CLONE_RANGE ioctls round up the range of extents being cloned to the block size when the range to clone extends to the end of file (this is always the case with CLONE). It was then using that offset when extending the destination file's i_size. Fix this by not setting i_size beyond the originally requested ending offset. This bug was introduced by a22285a6 (2.6.35-rc1). Signed-off-by: NSage Weil <sage@newdream.net> Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
由 Chris Mason 提交于
split_leaf was not properly balancing leaves when it was forced to split a leaf twice. This commit adds an extra push left and right before forcing the double split in hopes of getting the slot where we want to insert at either the start or end of the leaf. If the extra pushes do work, then we are able to avoid splitting twice and we keep the tree properly balanced. Signed-off-by: NChris Mason <chris.mason@oracle.com>
-
- 19 7月, 2010 2 次提交
-
-
由 Peter Oberparleiter 提交于
Partition boundary calculation fails for DASD FBA disks under the following conditions: - disk is formatted with CMS FORMAT with a blocksize of more than 512 bytes - all of the disk is reserved to a single CMS file using CMS RESERVE - the disk is accessed using the DIAG mode of the DASD driver Under these circumstances, the partition detection code tries to read the CMS label block containing partition-relevant information from logical block offset 1, while it is in fact located at physical block offset 1. Fix this problem by using the correct CMS label block location depending on the device type as determined by the DASD SENSE ID information. Signed-off-by: NPeter Oberparleiter <peter.oberparleiter@de.ibm.com> Signed-off-by: NMartin Schwidefsky <schwidefsky@de.ibm.com>
-
由 Dave Chinner 提交于
The current shrinker implementation requires the registered callback to have global state to work from. This makes it difficult to shrink caches that are not global (e.g. per-filesystem caches). Pass the shrinker structure to the callback so that users can embed the shrinker structure in the context the shrinker needs to operate on and get back to it in the callback via container_of(). Signed-off-by: NDave Chinner <dchinner@redhat.com> Reviewed-by: NChristoph Hellwig <hch@lst.de>
-
- 17 7月, 2010 3 次提交
-
-
由 Joel Becker 提交于
ocfs2_write_zero_page() has a loop that won't ever be skipped, but gcc doesn't know that. Set ret=0 just to make gcc happy. Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Sage Weil 提交于
Strip the cap and dentry releases from replayed messages. They can cause the shared state to get out of sync because they were generated (with the request message) earlier, and no longer reflect the current client state. Signed-off-by: NSage Weil <sage@newdream.net>
-
由 Sage Weil 提交于
Replayed rename operations (after an mds failure/recovery) were broken because the request paths were regenerated from the dentry names, which get mangled when d_move() is called. Instead, resend the previous request message when replaying completed operations. Just make sure the REPLAY flag is set and the target ino is filled in. This fixes problems with workloads doing renames when the MDS restarts, where the rename operation appears to succeed, but on mds restart then fails (leading to client confusion, app breakage, etc.). Signed-off-by: NSage Weil <sage@newdream.net>
-
- 16 7月, 2010 3 次提交
-
-
由 Jan Kara 提交于
OCFS2 uses t_commit trigger to compute and store checksum of the just committed blocks. When a buffer has b_frozen_data, checksum is computed for it instead of b_data but this can result in an old checksum being written to the filesystem in the following scenario: 1) transaction1 is opened 2) handle1 is opened 3) journal_access(handle1, bh) - This sets jh->b_transaction to transaction1 4) modify(bh) 5) journal_dirty(handle1, bh) 6) handle1 is closed 7) start committing transaction1, opening transaction2 8) handle2 is opened 9) journal_access(handle2, bh) - This copies off b_frozen_data to make it safe for transaction1 to commit. jh->b_next_transaction is set to transaction2. 10) jbd2_journal_write_metadata() checksums b_frozen_data 11) the journal correctly writes b_frozen_data to the disk journal 12) handle2 is closed - There was no dirty call for the bh on handle2, so it is never queued for any more journal operation 13) Checkpointing finally happens, and it just spools the bh via normal buffer writeback. This will write b_data, which was never triggered on and thus contains a wrong (old) checksum. This patch fixes the problem by calling the trigger at the moment data is frozen for journal commit - i.e., either when b_frozen_data is created by do_get_write_access or just before we write a buffer to the log if b_frozen_data does not exist. We also rename the trigger to t_frozen as that better describes when it is called. Signed-off-by: NJan Kara <jack@suse.cz> Signed-off-by: NMark Fasheh <mfasheh@suse.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Wengang Wang 提交于
For migration, we are waiting for DLM_LOCK_RES_MIGRATING flag to be set before sending DLM_MIG_LOCKRES_MSG message to the target. We are using dlm_migration_can_proceed() for that purpose. However, if the node is down, dlm_migration_can_proceed() will also return "go ahead". In this rare case, the DLM_LOCK_RES_MIGRATING flag might not be set yet. Remove the BUG_ON() that trips over this condition. Signed-off-by: NWengang Wang <wen.gang.wang@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
由 Tao Ma 提交于
During CoW, the pages after i_size don't contain valid data, so there's no need to read and duplicate them. Signed-off-by: NTao Ma <tao.ma@oracle.com> Signed-off-by: NJoel Becker <joel.becker@oracle.com>
-
- 15 7月, 2010 5 次提交
-
-
由 Bob Peterson 提交于
This patch fixes a kernel Oops in the GFS2 rename code. The problem was in the way the gfs2 directory code was trying to re-use sentinel directory entries. In the failing case, gfs2's rename function was renaming a file to another name that had the same non-trivial length. The file being renamed happened to be the first directory entry on the leaf block. First, the rename code (gfs2_rename in ops_inode.c) found the original directory entry and decided it could do its job by simply replacing the directory entry with another. Therefore it determined correctly that no block allocations were needed. Next, the rename code deleted the old directory entry prior to replacing it with the new name. Therefore, the soon-to-be replaced directory entry was temporarily made into a directory entry "sentinel" or a place holder at the start of a leaf block. Lastly, it went to re-add the replacement directory entry in that leaf block. However, when gfs2_dirent_find_space was looking for space in the leaf block, it used the wrong value for the sentinel. That threw off its calculations so later it decides it can't really re-use the sentinel and therefore must allocate a new leaf block. But because it previously decided to re-use the directory entry, it didn't waste the time to grab a new block allocation for the inode. Therefore, the inode's i_alloc pointer was still NULL and it crashes trying to reference it. In the case of sentinel directory entries, the entire dirent is reused, not just the "free space" portion of it, and therefore the function gfs2_dirent_find_space should use the value 0 rather than GFS2_DIRENT_SIZE(0) for the actual dirent size. Fixing this calculation enables the reproducer programs to work properly. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Abhijith Das 提交于
HighMem pages on i686 do not get mapped to the buffer_heads and this was causing a NULL pointer dereference when we were trying to memset page buffers to zero. We now use zero_user() that kmaps the page and directly manipulates page data. This patch also fixes a boundary condition that was incorrect. Signed-off-by: NAbhi Das <adas@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch fixes a problem in an error path when looking up dinodes. There are two sister-functions, gfs2_inode_lookup and gfs2_process_unlinked_inode. Both functions acquire and hold the i_iopen glock for the dinode being looked up. The last thing they try to do is hold the i_gl glock for the dinode. If that glock fails for some reason, the error path was incorrectly calling gfs2_glock_put for the i_iopen glock twice. This resulted in the glock being prematurely freed. The "minimum hold time" usually kept the glock in memory, but the lock interface to dlm (aka lock_dlm) freed its memory for the glock. In some circumstances, it would cause dlm's dlm_astd daemon to try to call the bast function for the freed lock_dlm memory, which resulted in a NULL pointer dereference. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-
由 Bob Peterson 提交于
This patch fixes bugzilla bug #590878: GFS2: recovery stuck on transaction lock. We set the frozen flag on the glock when we receive a completion that cannot be delivered due to blocked locks. At that point we check to see whether the first waiting holder has the noexp flag set. If the noexp lock is queued later, then we need to unfreeze the glock at that point in time, namely, in the glock work function. This patch was originally written by Steve Whitehouse, but since he's on holiday, I'm submitting it. It's been well tested with a complex recovery test called revolver. Signed-off-by: NSteve Whitehouse <swhiteho@redhat.com> Signed-off-by: NBob Peterson <rpeterso@redhat.com>
-
由 Bob Peterson 提交于
This patch replaces a statement that got dropped out by accident. Without the patch, truncates on stuffed (very small) files cause those files to have an unpredictable size. Signed-off-by: NBob Peterson <rpeterso@redhat.com> Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
-