- 21 3月, 2015 1 次提交
-
-
由 Goldwyn Rodrigues 提交于
A --cluster-confirm without an --add (by another node) can crash the kernel. Fix it by guarding it using a state. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com> Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 04 3月, 2015 2 次提交
-
-
由 Stephen Rothwell 提交于
neilb: modified to not corrupt ->resync_max_sectors. sector_div usage fixed by Guoqing Jiang <gqjiang@suse.com> Signed-off-by: NStephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: NNeilBrown <neilb@suse.de>
-
由 NeilBrown 提交于
DIV_ROUTND_UP doesn't work on "long long", - and it should be sector_t anyway. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 25 2月, 2015 1 次提交
-
-
由 NeilBrown 提交于
Recent change to bitmap_create mishandles errors. In particular a failure doesn't alway cause 'err' to be set. Signed-off-by: NNeilBrown <neilb@suse.de>
-
- 23 2月, 2015 36 次提交
-
-
由 Goldwyn Rodrigues 提交于
Algorithm: 1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues ioctl(ADD_NEW_DISC with disc.state set to MD_DISK_CLUSTER_ADD) 2. Node 1 sends NEWDISK with uuid and slot number 3. Other nodes issue kobject_uevent_env with uuid and slot number (Steps 4,5 could be a udev rule) 4. In userspace, the node searches for the disk, perhaps using blkid -t SUB_UUID="" 5. Other nodes issue either of the following depending on whether the disk was found: ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and disc.number set to slot number) ioctl(CLUSTERED_DISK_NACK) 6. Other nodes drop lock on no-new-devs (CR) if device is found 7. Node 1 attempts EX lock on no-new-devs 8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk as SpareLocal 9. If not (get no-new-dev lock), it fails the operation and sends METADATA_UPDATED 10. Other nodes understand if the device is added or not by reading the superblock again after receiving the METADATA_UPDATED message. Signed-off-by: NLidong Zhong <lzhong@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
set choose_first true for cluster read in read balance when the area is resyncing. Signed-off-by: NLidong Zhong <lzhong@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
If there is a resync going on, all nodes must suspend writes to the range. This is recorded in the suspend_info/suspend_list. If there is an I/O within the ranges of any of the suspend_info, should_suspend will return 1. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
When a RESYNC_START message arrives, the node removes the entry with the current slot number and adds the range to the suspend_list. Simlarly, when a RESYNC_FINISHED message is received, node clears entry with respect to the bitmap number. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
When a resync is initiated, RESYNCING message is sent to all active nodes with the range (lo,hi). When the resync is over, a RESYNCING message is sent with (0,0). A high sector value of zero indicates that the resync is over. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
Re-reads the devices by invalidating the cache. Since we don't write to faulty devices, this is detected using events recorded in the devices. If it is old as compared to the mddev mark it is faulty. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
- request to send a message - make changes to superblock - send messages telling everyone that the superblock has changed - other nodes all read the superblock - other nodes all ack the messages - updating node release the "I'm sending a message" resource. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
The sending part is split in two functions to make sure atomicity of the operations, such as the MD superblock update. Signed-off-by: NLidong Zhong <lzhong@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
1. receive status sender receiver receiver ACK:CR ACK:CR ACK:CR 2. sender get EX of TOKEN sender get EX of MESSAGE sender receiver receiver TOKEN:EX ACK:CR ACK:CR MESSAGE:EX ACK:CR 3. sender write LVB. sender down-convert MESSAGE from EX to CR sender try to get EX of ACK [ wait until all receiver has *processed* the MESSAGE ] [ triggered by bast of ACK ] receiver get CR of MESSAGE receiver read LVB receiver processes the message [ wait finish ] receiver release ACK sender receiver receiver TOKEN:EX MESSAGE:CR MESSAGE:CR MESSAGE:CR ACK:EX 4. sender down-convert ACK from EX to CR sender release MESSAGE sender release TOKEN receiver upconvert to EX of MESSAGE receiver get CR of ACK receiver release MESSAGE sender receiver receiver ACK:CR ACK:CR ACK:CR Signed-off-by: NLidong Zhong <lzhong@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
If bitmap_copy_slot returns hi>0, we need to perform resync. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
The DLM informs us in case of node failure with the DLM slot number. cluster_info->recovery_map sets the bit corresponding to the slot number and wakes up the recovery thread. The recovery thread: 1. Derives the slot number from the recovery_map 2. Locks the bitmap corresponding to the slot 3. Copies the set bits to the node-local bitmap Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
bitmap_copy_from_slot reads the bitmap from the slot mentioned. It then copies the set bits to the node local bitmap. This is helper function for the resync operation on node failure. bitmap_set_memory_bits() currently assumes it is only run at startup and that they bitmap is currently empty. So if it finds that a region is already marked as dirty, it won't mark it dirty again. Change bitmap_set_memory_bits() to always set the NEEDED_MASK bit if 'needed' is set. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
This is done to have multiple bitmaps open at the same time. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
When a node joins, it does not know of other nodes performing resync. So, each node keeps the resync information in it's LVB. When a new node joins, it reads the LVB of each "online" bitmap. [TODO] The new node attempts to get the PW lock on other bitmap, if it is successful, it reads the bitmap and performs the resync (if required) on it's behalf. If the node does not get the PW, it requests CR and reads the LVB for the resync information. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
On-disk format: 0 4k 8k 12k ------------------------------------------------------------------- | idle | md super | bm super [0] + bits | | bm bits[0, contd] | bm super[1] + bits | bm bits[1, contd] | | bm super[2] + bits | bm bits [2, contd] | bm super[3] + bits | | bm bits [3, contd] | | | Bitmap super has a field nodes, which defines the maximum number of nodes the device can use. While reading the bitmap super, if the cluster finds out that the number of nodes is > 0: 1. Requests the md-cluster module. 2. Calls md_cluster_ops->join(), which sets up clustering such as joining DLM lockspace. Since the first time, the first bitmap is read. After the call to the cluster_setup, the bitmap offset is adjusted and the superblock is re-read. This also ensures the bitmap is read the bitmap lock (when bitmap lock is introduced in later patches) Questions: 1. cluster name is repeated in all bitmap supers. Is that okay? Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
DLM offers callbacks when a node fails and the lock remastery is performed: 1. recover_prep: called when DLM discovers a node is down 2. recover_slot: called when DLM identifies the node and recovery can start 3. recover_done: called when all nodes have completed recover_slot recover_slot() and recover_done() are also called when the node joins initially in order to inform the node with its slot number. These slot numbers start from one, so we deduct one to make it start with zero which the cluster-md code uses. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
md_cluster_info stores the cluster information in the MD device. The join() is called when mddev detects it is a clustered device. The main responsibilities are: 1. Setup a DLM lockspace 2. Setup all initial locks such as super block locks and bitmap lock (will come later) The leave() clears up the lockspace and all the locks held. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
This allows dynamic registering of cluster hooks. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
A dlm_lock_resource is a structure which contains all information required for locking using DLM. The init function allocates the lock and acquires the lock in NL mode. The unlock function converts the lock resource to NL mode. This is done to preserve LVB and for faster processing of locks. The lock resource is DLM unlocked only in the lockres_free function, which is the end of life of the lock resource. Signed-off-by: NLidong Zhong <lzhong@suse.com> Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
Tagged as EXPERIMENTAL for now. Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Goldwyn Rodrigues 提交于
Signed-off-by: NGoldwyn Rodrigues <rgoldwyn@suse.com>
-
由 Linus Torvalds 提交于
.. after extensive statistical analysis of my G+ polling, I've come to the inescapable conclusion that internet polls are bad. Big surprise. But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38% margin, in a poll that people weren't even supposed to participate in. Who can argue with solid numbers like that? 5,796 votes from people who can't even follow the most basic directions? In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%, but with a total of 29,110 votes right now. Now, arguably, that vote spread is only about 3,200 votes, which is less than the almost six thousand votes that the "please ignore" poll got, so it could be considered noise. But hey, I asked, so I'll honor the votes.
-
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4由 Linus Torvalds 提交于
Pull ext4 fixes from Ted Ts'o: "Ext4 bug fixes. We also reserved code points for encryption and read-only images (for which the implementation is mostly just the reserved code point for a read-only feature :-)" * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: fix indirect punch hole corruption ext4: ignore journal checksum on remount; don't fail ext4: remove duplicate remount check for JOURNAL_CHECKSUM change ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize ext4: support read-only images ext4: change to use setup_timer() instead of init_timer() ext4: reserve codepoints used by the ext4 encryption feature jbd2: complain about descriptor block checksum errors
-
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs由 Linus Torvalds 提交于
Pull more vfs updates from Al Viro: "Assorted stuff from this cycle. The big ones here are multilayer overlayfs from Miklos and beginning of sorting ->d_inode accesses out from David" * 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits) autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation procfs: fix race between symlink removals and traversals debugfs: leave freeing a symlink body until inode eviction Documentation/filesystems/Locking: ->get_sb() is long gone trylock_super(): replacement for grab_super_passive() fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry) SELinux: Use d_is_positive() rather than testing dentry->d_inode Smack: Use d_is_positive() rather than testing dentry->d_inode TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR() Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb VFS: Split DCACHE_FILE_TYPE into regular and special types VFS: Add a fallthrough flag for marking virtual dentries VFS: Add a whiteout dentry type VFS: Introduce inode-getting helpers for layered/unioned fs environments Infiniband: Fix potential NULL d_inode dereference posix_acl: fix reference leaks in posix_acl_create autofs4: Wrong format for printing dentry ...
-
git://ftp.arm.linux.org.uk/~rmk/linux-arm由 Linus Torvalds 提交于
Pull ARM fix from Russell King: "Just one fix this time around. __iommu_alloc_buffer() can cause a BUG() if dma_alloc_coherent() is called with either __GFP_DMA32 or __GFP_HIGHMEM set. The patch from Alexandre addresses this" * 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm: ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()
-
由 Al Viro 提交于
X-Coverup: just ask spender Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
use_pde()/unuse_pde() in ->follow_link()/->put_link() resp. Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
As it is, we have debugfs_remove() racing with symlink traversals. Supply ->evict_inode() and do freeing there - inode will remain pinned until we are done with the symlink body. And rip the idiocy with checking if dentry is positive right after we'd verified debugfs_positive(), which is a stronger check... Cc: stable@vger.kernel.org Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Al Viro 提交于
Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 Konstantin Khlebnikov 提交于
I've noticed significant locking contention in memory reclaimer around sb_lock inside grab_super_passive(). Grab_super_passive() is called from two places: in icache/dcache shrinkers (function super_cache_scan) and from writeback (function __writeback_inodes_wb). Both are required for progress in memory allocator. Grab_super_passive() acquires sb_lock to increment sb->s_count and check sb->s_instances. It seems sb->s_umount locked for read is enough here: super-block deactivation always runs under sb->s_umount locked for write. Protecting super-block itself isn't a problem: in super_cache_scan() sb is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers are still active. Inside writeback super-block comes from inode from bdi writeback list under wb->list_lock. This patch removes locking sb_lock and checks s_instances under s_umount: generic_shutdown_super() unlinks it under sb->s_umount locked for write. New variant is called trylock_super() and since it only locks semaphore, callers must call up_read(&sb->s_umount) instead of drop_super(sb) when they're done. Signed-off-by: NKonstantin Khlebnikov <khlebnikov@yandex-team.ru> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 David Howells 提交于
Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup() rather than d_is_dir() when checking a dir watch and give an error on fake directories. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 David Howells 提交于
Fix up the following scripted S_ISDIR/S_ISREG/S_ISLNK conversions (or lack thereof) in cachefiles: (1) Cachefiles mostly wants to use d_can_lookup() rather than d_is_dir() as it doesn't want to deal with automounts in its cache. (2) Coccinelle didn't find S_IS* expressions in ASSERT() statements in cachefiles. Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-
由 David Howells 提交于
Convert the following where appropriate: (1) S_ISLNK(dentry->d_inode) to d_is_symlink(dentry). (2) S_ISREG(dentry->d_inode) to d_is_reg(dentry). (3) S_ISDIR(dentry->d_inode) to d_is_dir(dentry). This is actually more complicated than it appears as some calls should be converted to d_can_lookup() instead. The difference is whether the directory in question is a real dir with a ->lookup op or whether it's a fake dir with a ->d_automount op. In some circumstances, we can subsume checks for dentry->d_inode not being NULL into this, provided we the code isn't in a filesystem that expects d_inode to be NULL if the dirent really *is* negative (ie. if we're going to use d_inode() rather than d_backing_inode() to get the inode pointer). Note that the dentry type field may be set to something other than DCACHE_MISS_TYPE when d_inode is NULL in the case of unionmount, where the VFS manages the fall-through from a negative dentry to a lower layer. In such a case, the dentry type of the negative union dentry is set to the same as the type of the lower dentry. However, if you know d_inode is not NULL at the call site, then you can use the d_is_xxx() functions even in a filesystem. There is one further complication: a 0,0 chardev dentry may be labelled DCACHE_WHITEOUT_TYPE rather than DCACHE_SPECIAL_TYPE. Strictly, this was intended for special directory entry types that don't have attached inodes. The following perl+coccinelle script was used: use strict; my @callers; open($fd, 'git grep -l \'S_IS[A-Z].*->d_inode\' |') || die "Can't grep for S_ISDIR and co. callers"; @callers = <$fd>; close($fd); unless (@callers) { print "No matches\n"; exit(0); } my @cocci = ( '@@', 'expression E;', '@@', '', '- S_ISLNK(E->d_inode->i_mode)', '+ d_is_symlink(E)', '', '@@', 'expression E;', '@@', '', '- S_ISDIR(E->d_inode->i_mode)', '+ d_is_dir(E)', '', '@@', 'expression E;', '@@', '', '- S_ISREG(E->d_inode->i_mode)', '+ d_is_reg(E)' ); my $coccifile = "tmp.sp.cocci"; open($fd, ">$coccifile") || die $coccifile; print($fd "$_\n") || die $coccifile foreach (@cocci); close($fd); foreach my $file (@callers) { chomp $file; print "Processing ", $file, "\n"; system("spatch", "--sp-file", $coccifile, $file, "--in-place", "--no-show-diff") == 0 || die "spatch failed"; } [AV: overlayfs parts skipped] Signed-off-by: NDavid Howells <dhowells@redhat.com> Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
-