Commits · 09dd1af2e011a13adce65b74425dfe31e1985e64 · openanolis / cloud-kernel

21 Mar, 2015 3 commits

md/cluster: Communication Framework: fix semicolon.cocci warnings · 09dd1af2

Authored by kbuild test robot on Feb 28, 2015

drivers/md/md-cluster.c:328:2-3: Unneeded semicolon

 Removes unneeded semicolon.

Generated by: scripts/coccinelle/misc/semicolon.cocci
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: recover_bitmaps() can be static · 6dc69c9c

Authored by kbuild test robot on Feb 28, 2015

drivers/md/md-cluster.c:190:6: sparse: symbol 'recover_bitmaps' was not declared. Should it be static?
Signed-off-by: Fengguang Wu <fengguang.wu@intel.com>
Signed-off-by: NeilBrown <neilb@suse.de>

md: Fix stray --cluster-confirm crash · fa8259da

Authored by Goldwyn Rodrigues on Mar 02, 2015

A --cluster-confirm without an --add (by another node) can
crash the kernel.

Fix it by guarding it using a state.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
Signed-off-by: NeilBrown <neilb@suse.de>

04 Mar, 2015 2 commits

md/bitmap: use sector_div for sector_t divisions · 3b0e6aac

Authored by Stephen Rothwell on Mar 03, 2015

neilb: modified to not corrupt ->resync_max_sectors.

sector_div usage fixed by Guoqing Jiang <gqjiang@suse.com>
Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: NeilBrown <neilb@suse.de>

md/bitmap: fix incorrect DIV_ROUND_UP usage. · 935f3d4f

Authored by NeilBrown on Mar 02, 2015

DIV_ROUND_UP doesn't work on "long long", and it should be
sector_t anyway.
Signed-off-by: NeilBrown <neilb@suse.de>

25 Feb, 2015 1 commit

md: fix error paths from bitmap_create. · ba599aca

Authored by NeilBrown on Feb 25, 2015

A recent change to bitmap_create mishandles errors.
In particular, a failure doesn't always cause 'err' to be set.
Signed-off-by: NeilBrown <neilb@suse.de>

23 Feb, 2015 34 commits

Add new disk to clustered array · 1aee41f6

Authored by Goldwyn Rodrigues on Oct 29, 2014

Algorithm:
1. Node 1 issues mdadm --manage /dev/mdX --add /dev/sdYY which issues
   ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CLUSTER_ADD)
2. Node 1 sends NEWDISK with uuid and slot number
3. Other nodes issue kobject_uevent_env with uuid and slot number
(Steps 4,5 could be a udev rule)
4. In userspace, the node searches for the disk, perhaps
   using blkid -t SUB_UUID=""
5. Other nodes issue either of the following depending on whether the disk
   was found:
   ioctl(ADD_NEW_DISK with disc.state set to MD_DISK_CANDIDATE and
	 disc.number set to slot number)
   ioctl(CLUSTERED_DISK_NACK)
6. Other nodes drop lock on no-new-devs (CR) if device is found
7. Node 1 attempts EX lock on no-new-devs
8. If node 1 gets the lock, it sends METADATA_UPDATED after unmarking the disk
   as SpareLocal
9. If not (it does not get the no-new-devs lock), it fails the operation and
   sends METADATA_UPDATED
10. Other nodes learn whether the device was added by reading the superblock
    again after receiving the METADATA_UPDATED message.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
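
The outcome of steps 5 to 9 can be modelled in a few lines. This is a user-space Python sketch, not kernel code: the function name, the node names and the return values are hypothetical, and the DLM lock on no-new-devs is reduced to a set of CR holders.

```python
def add_disk_outcome(found_by_node):
    """Model steps 5-9: decide whether node 1's --add succeeds.

    found_by_node maps a (hypothetical) node name to True if that node
    located the new disk in userspace, or False if it sent a NACK.
    """
    # Every other node starts out holding no-new-devs in CR mode and
    # drops it only if it found the device (step 6).
    cr_holders = {node for node, found in found_by_node.items() if not found}
    # Node 1's EX request (step 7) is granted only when nobody holds CR.
    got_ex = not cr_holders
    # Either way, node 1 sends METADATA_UPDATED afterwards (steps 8-9).
    return ("added" if got_ex else "failed", "METADATA_UPDATED")
```

A single node that cannot see the disk keeps its CR lock, so the add fails cluster-wide in this model.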

Read from the first device when an area is resyncing · 7d49ffcf

Authored by Goldwyn Rodrigues on Aug 12, 2014

Set choose_first to true for cluster reads in read balance when the area
is resyncing.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Suspend writes in RAID1 if within range · 589a1c49

Authored by Goldwyn Rodrigues on Jun 07, 2014

If there is a resync going on, all nodes must suspend writes to the
range. This is recorded in the suspend_info/suspend_list.

If there is an I/O within the ranges of any of the suspend_info,
should_suspend will return 1.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
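
The range check described above can be sketched as a plain overlap test. This is a Python model, not the kernel function: the SuspendInfo fields and the half-open treatment of [lo, hi) are assumptions made for illustration.

```python
from typing import List, NamedTuple


class SuspendInfo(NamedTuple):
    slot: int  # slot of the node performing the resync (assumed field)
    lo: int    # first sector of the range being resynced
    hi: int    # end of the range, treated as exclusive in this sketch


def should_suspend(suspend_list: List[SuspendInfo], start: int, end: int) -> int:
    """Return 1 if the I/O [start, end) overlaps any suspended range, else 0."""
    for info in suspend_list:
        # Two half-open ranges overlap when each starts before the other ends.
        if start < info.hi and end > info.lo:
            return 1
    return 0
```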

Resync start/Finish actions · e59721cc

Authored by Goldwyn Rodrigues on Jun 07, 2014

When a RESYNC_START message arrives, the node removes the entry
with the current slot number and adds the range to the
suspend_list.

Similarly, when a RESYNC_FINISHED message is received, the node clears
the entry for the corresponding bitmap number.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
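
A minimal Python model of the two handlers follows. The function names and the dict layout are hypothetical, and the bitmap number is assumed to be the sender's slot number.

```python
def handle_resync_start(suspend_list, slot, lo, hi):
    """Replace any existing entry for `slot` with the new range (lo, hi)."""
    # Remove the stale entry for this slot, if there is one.
    remaining = [e for e in suspend_list if e["slot"] != slot]
    # Record the range that is now being resynced by that node.
    remaining.append({"slot": slot, "lo": lo, "hi": hi})
    return remaining


def handle_resync_finished(suspend_list, slot):
    """Clear the entry belonging to `slot`; other slots are untouched."""
    return [e for e in suspend_list if e["slot"] != slot]
```

Each slot has at most one entry at a time, so a second RESYNC_START from the same node moves its window rather than growing the list.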

Send RESYNCING while performing resync start/stop · 965400eb

Authored by Goldwyn Rodrigues on Jun 07, 2014

When a resync is initiated, RESYNCING message is sent to all active
nodes with the range (lo,hi). When the resync is over, a RESYNCING
message is sent with (0,0). A high sector value of zero indicates
that the resync is over.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Reload superblock if METADATA_UPDATED is received · 1d7e3e96

Authored by Goldwyn Rodrigues on Jun 07, 2014

Re-reads the devices by invalidating the cache.
Since we don't write to faulty devices, this is detected using
events recorded in the devices. If a device's event count is old
compared to the mddev's, it is marked as faulty.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

metadata_update sends message to other nodes · 293467aa

Authored by Goldwyn Rodrigues on Jun 07, 2014

   - request to send a message
   - make changes to superblock
   - send messages telling everyone that the superblock has changed
   - other nodes all read the superblock
   - other nodes all ack the messages
   - the updating node releases the "I'm sending a message" resource.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Communication Framework: Sending functions · 601b515c

Authored by Goldwyn Rodrigues on Jun 07, 2014

The sending part is split into two functions to ensure the
atomicity of operations such as the MD superblock update.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Communication Framework: Receiving · 4664680c

Authored by Goldwyn Rodrigues on Jun 07, 2014

1. receive status

   sender                         receiver                   receiver
   ACK:CR                          ACK:CR                     ACK:CR

2. sender get EX of TOKEN
   sender get EX of MESSAGE
   sender                          receiver                   receiver
   TOKEN:EX                         ACK:CR                     ACK:CR
   MESSAGE:EX
   ACK:CR

3. sender write LVB.
   sender down-convert MESSAGE from EX to CR
   sender try to get EX of ACK
   [ wait until all receivers have *processed* the MESSAGE ]

                                     [ triggered by bast of ACK ]
                                     receiver get CR of MESSAGE
                                     receiver read LVB
                                     receiver processes the message
				     [ wait finish ]
                                     receiver release ACK

   sender                         receiver                   receiver
   TOKEN:EX                       MESSAGE:CR                 MESSAGE:CR
   MESSAGE:CR
   ACK:EX

4. sender down-convert ACK from EX to CR
   sender release MESSAGE
   sender release TOKEN
				  receiver upconvert to EX of MESSAGE
                                  receiver get CR of ACK
				  receiver release MESSAGE

   sender                        receiver                   receiver
   ACK:CR                         ACK:CR                     ACK:CR
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
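
The four lock-state tables above can be checked mechanically. The Python sketch below transcribes them as data and verifies that no two nodes ever hold conflicting modes on the same resource. It covers only the CR and EX modes that appear in the diagram, and the names STEPS, compatible and step_is_consistent are hypothetical.

```python
def compatible(a, b):
    """With only CR and EX in play, two holders conflict iff either is EX."""
    return a != "EX" and b != "EX"


# For each numbered step: [sender, receiver, receiver], each a mapping of
# resource name -> mode held, copied from the tables in the commit message.
STEPS = {
    1: [{"ACK": "CR"}, {"ACK": "CR"}, {"ACK": "CR"}],
    2: [{"TOKEN": "EX", "MESSAGE": "EX", "ACK": "CR"},
        {"ACK": "CR"}, {"ACK": "CR"}],
    3: [{"TOKEN": "EX", "MESSAGE": "CR", "ACK": "EX"},
        {"MESSAGE": "CR"}, {"MESSAGE": "CR"}],
    4: [{"ACK": "CR"}, {"ACK": "CR"}, {"ACK": "CR"}],
}


def step_is_consistent(nodes):
    """True if every resource shared by two nodes is held in compatible modes."""
    for i, a in enumerate(nodes):
        for b in nodes[i + 1:]:
            for resource in a.keys() & b.keys():
                if not compatible(a[resource], b[resource]):
                    return False
    return True
```

Step 3 is the interesting one: the sender can hold ACK in EX only because every receiver has released its CR on ACK, which is how the sender learns that the message was processed.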

Perform resync for cluster node failure · 4b26a08a

Authored by Goldwyn Rodrigues on Jun 07, 2014

If bitmap_copy_slot returns hi>0, we need to perform resync.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Initiate recovery on node failure · e94987db

Authored by Goldwyn Rodrigues on Jun 07, 2014

The DLM informs us in case of node failure with the DLM slot number.
cluster_info->recovery_map sets the bit corresponding to the slot number
and wakes up the recovery thread.

The recovery thread:
1. Derives the slot number from the recovery_map
2. Locks the bitmap corresponding to the slot
3. Copies the set bits to the node-local bitmap
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
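
The recovery thread's loop can be sketched as follows. This is a single-threaded Python model under stated assumptions: recovery_map is an integer bitmask, each bitmap is a set of dirty chunk numbers, locking is omitted, and the function names are hypothetical.

```python
def mark_failed(recovery_map, slot):
    """Set the bit for a failed node's slot (what the DLM callback triggers)."""
    return recovery_map | (1 << slot)


def recover_all(recovery_map, slot_bitmaps, local_bitmap):
    """Drain recovery_map, merging each failed slot's set bits locally."""
    merged = set(local_bitmap)
    while recovery_map:
        # 1. Derive the slot number: index of the lowest set bit.
        slot = (recovery_map & -recovery_map).bit_length() - 1
        # 2. Locking the slot's bitmap is skipped in this model.
        # 3. Copy the set bits into the node-local bitmap.
        merged |= slot_bitmaps[slot]
        # Clear the bit so the loop terminates.
        recovery_map &= ~(1 << slot)
    return merged
```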

Copy set bits from another slot · 11dd35da

Authored by Goldwyn Rodrigues on Jun 07, 2014

bitmap_copy_from_slot reads the bitmap from the slot mentioned.
It then copies the set bits to the node local bitmap.

This is a helper function for the resync operation on node failure.

bitmap_set_memory_bits() currently assumes it is only run at startup and that
the bitmap is currently empty. So if it finds that a region is already
marked as dirty, it won't mark it dirty again. Change bitmap_set_memory_bits()
to always set the NEEDED_MASK bit if 'needed' is set.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

bitmap_create returns bitmap pointer · f9209a32

Authored by Goldwyn Rodrigues on Jun 06, 2014

This is done to have multiple bitmaps open at the same time.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Gather on-going resync information of other nodes · 96ae923a

Authored by Goldwyn Rodrigues on Jun 06, 2014

When a node joins, it does not know of other nodes performing resync.
So, each node keeps the resync information in its LVB. When a new
node joins, it reads the LVB of each "online" bitmap.

[TODO] The new node attempts to get the PW lock on other bitmap, if
it is successful, it reads the bitmap and performs the resync (if
required) on its behalf.

If the node does not get the PW, it requests CR and reads the LVB
for the resync information.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Lock bitmap while joining the cluster · 54519c5f

Authored by Goldwyn Rodrigues on Jun 06, 2014

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Use separate bitmaps for each nodes in the cluster · b97e9257

Authored by Goldwyn Rodrigues on Jun 06, 2014

On-disk format:

0                    4k                     8k                    12k
-------------------------------------------------------------------
| idle                | md super            | bm super [0] + bits |
| bm bits[0, contd]   | bm super[1] + bits  | bm bits[1, contd]   |
| bm super[2] + bits  | bm bits [2, contd]  | bm super[3] + bits  |
| bm bits [3, contd]  |                     |                     |

Bitmap super has a field nodes, which defines the maximum number
of nodes the device can use. While reading the bitmap super, if
the cluster finds out that the number of nodes is > 0:
1. Requests the md-cluster module.
2. Calls md_cluster_ops->join(), which sets up clustering such as
   joining DLM lockspace.

The first time through, the first bitmap is read. After the call
to cluster_setup, the bitmap offset is adjusted and the
superblock is re-read. This also ensures the bitmap is read under
the bitmap lock (once the bitmap lock is introduced in later patches).

Questions:
1. cluster name is repeated in all bitmap supers. Is that okay?
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
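
For illustration, the offsets in the on-disk diagram can be reproduced with simple arithmetic. This Python sketch assumes, purely from the diagram, that bm super[0] starts at 8k and that each node's super plus bits occupy two 4k cells; the real per-node size depends on the bitmap size, and the names below are hypothetical.

```python
KIB = 1024
FIRST_BITMAP_OFFSET = 8 * KIB   # "bm super [0]" starts at 8k in the diagram
PER_NODE_BITMAP_SIZE = 8 * KIB  # assumption: super + bits span two 4k cells


def bitmap_super_offset(slot, nodes):
    """Byte offset of the bitmap superblock for `slot` in this layout."""
    # `nodes` mirrors the bitmap super's "nodes" field: the maximum
    # number of nodes the device can use.
    if not 0 <= slot < nodes:
        raise ValueError("slot out of range")
    return FIRST_BITMAP_OFFSET + slot * PER_NODE_BITMAP_SIZE
```

With four nodes this yields 8k, 16k, 24k and 32k, matching the positions of bm super[0] through bm super[3] in the diagram.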

Add node recovery callbacks · cf921cc1

Authored by Goldwyn Rodrigues on Mar 30, 2014

DLM offers callbacks when a node fails and the lock remastery
is performed:

1. recover_prep: called when DLM discovers a node is down
2. recover_slot: called when DLM identifies the node and recovery
		can start
3. recover_done: called when all nodes have completed recover_slot

recover_slot() and recover_done() are also called when the node joins
initially in order to inform the node with its slot number. These slot
numbers start from one, so we deduct one to make it start with zero
which the cluster-md code uses.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Return MD_SB_CLUSTERED if mddev is clustered · ca8895d9

Authored by Goldwyn Rodrigues on Nov 26, 2014

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Introduce md_cluster_info · c4ce867f

Authored by Goldwyn Rodrigues on Mar 29, 2014

md_cluster_info stores the cluster information in the MD device.

The join() is called when mddev detects it is a clustered device.
The main responsibilities are:
	1. Set up a DLM lockspace
	2. Set up all initial locks such as super block locks and bitmap lock (will come later)

The leave() clears up the lockspace and all the locks held.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Introduce md_cluster_operations to handle cluster functions · edb39c9d

Authored by Goldwyn Rodrigues on Mar 29, 2014

This allows dynamic registering of cluster hooks.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

DLM lock and unlock functions · 47741b7c

Authored by Goldwyn Rodrigues on Mar 07, 2014

A dlm_lock_resource is a structure which contains all information
required for locking using DLM. The init function allocates the
lock and acquires the lock in NL mode. The unlock function
converts the lock resource to NL mode. This is done to preserve
LVB and for faster processing of locks. The lock resource is
DLM unlocked only in the lockres_free function, which is the end
of life of the lock resource.
Signed-off-by: Lidong Zhong <lzhong@suse.com>
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
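
The lifecycle described above can be modelled as a small state machine. This Python class is a sketch only: LockResource and its methods are hypothetical names, and no real DLM calls are made.

```python
class LockResource:
    """Toy model of a DLM lock resource that idles in NL mode."""

    def __init__(self, name):
        self.name = name
        self.lvb = b""      # lock value block, kept across unlocks
        self.mode = "NL"    # init acquires the lock in NL mode
        self.freed = False

    def lock(self, mode):
        """Up-convert from NL to the requested mode (e.g. "CR" or "EX")."""
        if self.freed:
            raise RuntimeError("lock resource already freed")
        self.mode = mode

    def unlock(self):
        """Down-convert to NL instead of a real unlock, preserving the LVB."""
        if self.freed:
            raise RuntimeError("lock resource already freed")
        self.mode = "NL"

    def free(self):
        """End of life: the only point where the resource is truly unlocked."""
        self.mode = None
        self.freed = True
```

Keeping the resource in NL between uses means later requests are conversions rather than fresh acquisitions, which is the "faster processing" the message refers to.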

Create a separate module for clustering support · 8e854e9c

Authored by Goldwyn Rodrigues on Mar 07, 2014

Tagged as EXPERIMENTAL for now.
Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Add number of nodes to bitmap structure for clustering · 183bdf51

Authored by Goldwyn Rodrigues on Mar 07, 2014

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

md-cluster: Design Documentation · b8d83448

Authored by Goldwyn Rodrigues on Jun 10, 2014

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>

Linux 4.0-rc1 · c517d838

Authored by Linus Torvalds on Feb 22, 2015

.. after extensive statistical analysis of my G+ polling, I've come to
the inescapable conclusion that internet polls are bad.

Big surprise.

But "Hurr durr I'ma sheep" trounced "I like online polls" by a 62-to-38%
margin, in a poll that people weren't even supposed to participate in.
Who can argue with solid numbers like that? 5,796 votes from people who
can't even follow the most basic directions?

In contrast, "v4.0" beat out "v3.20" by a slimmer margin of 56-to-44%,
but with a total of 29,110 votes right now.

Now, arguably, that vote spread is only about 3,200 votes, which is less
than the almost six thousand votes that the "please ignore" poll got, so
it could be considered noise.

But hey, I asked, so I'll honor the votes.

Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · feaf2229

Authored by Linus Torvalds on Feb 22, 2015

Pull ext4 fixes from Ted Ts'o:
 "Ext4 bug fixes.

  We also reserved code points for encryption and read-only images (for
  which the implementation is mostly just the reserved code point for a
  read-only feature :-)"

* tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
  ext4: fix indirect punch hole corruption
  ext4: ignore journal checksum on remount; don't fail
  ext4: remove duplicate remount check for JOURNAL_CHECKSUM change
  ext4: fix mmap data corruption in nodelalloc mode when blocksize < pagesize
  ext4: support read-only images
  ext4: change to use setup_timer() instead of init_timer()
  ext4: reserve codepoints used by the ext4 encryption feature
  jbd2: complain about descriptor block checksum errors

Merge branch 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · be5e6616

Authored by Linus Torvalds on Feb 22, 2015

Pull more vfs updates from Al Viro:
 "Assorted stuff from this cycle.  The big ones here are multilayer
  overlayfs from Miklos and beginning of sorting ->d_inode accesses out
  from David"

* 'for-linus-2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (51 commits)
  autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation
  procfs: fix race between symlink removals and traversals
  debugfs: leave freeing a symlink body until inode eviction
  Documentation/filesystems/Locking: ->get_sb() is long gone
  trylock_super(): replacement for grab_super_passive()
  fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
  Cachefiles: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions
  VFS: (Scripted) Convert S_ISLNK/DIR/REG(dentry->d_inode) to d_is_*(dentry)
  SELinux: Use d_is_positive() rather than testing dentry->d_inode
  Smack: Use d_is_positive() rather than testing dentry->d_inode
  TOMOYO: Use d_is_dir() rather than d_inode and S_ISDIR()
  Apparmor: Use d_is_positive/negative() rather than testing dentry->d_inode
  Apparmor: mediated_filesystem() should use dentry->d_sb not inode->i_sb
  VFS: Split DCACHE_FILE_TYPE into regular and special types
  VFS: Add a fallthrough flag for marking virtual dentries
  VFS: Add a whiteout dentry type
  VFS: Introduce inode-getting helpers for layered/unioned fs environments
  Infiniband: Fix potential NULL d_inode dereference
  posix_acl: fix reference leaks in posix_acl_create
  autofs4: Wrong format for printing dentry
  ...

Merge branch 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm · 90c453ca

Authored by Linus Torvalds on Feb 22, 2015

Pull ARM fix from Russell King:
 "Just one fix this time around.  __iommu_alloc_buffer() can cause a
  BUG() if dma_alloc_coherent() is called with either __GFP_DMA32 or
  __GFP_HIGHMEM set.  The patch from Alexandre addresses this"

* 'fixes' of git://ftp.arm.linux.org.uk/~rmk/linux-arm:
  ARM: 8305/1: DMA: Fix kzalloc flags in __iommu_alloc_buffer()

autofs4 copy_dev_ioctl(): keep the value of ->size we'd used for allocation · 0a280962

Authored by Al Viro on Feb 21, 2015

X-Coverup: just ask spender
Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

procfs: fix race between symlink removals and traversals · 7e0e953b

Authored by Al Viro on Feb 21, 2015

use_pde()/unuse_pde() in ->follow_link()/->put_link() resp.

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

debugfs: leave freeing a symlink body until inode eviction · 0db59e59

Authored by Al Viro on Feb 21, 2015

As it is, we have debugfs_remove() racing with symlink traversals.
Supply ->evict_inode() and do freeing there - inode will remain
pinned until we are done with the symlink body.

And rip the idiocy with checking if dentry is positive right after
we'd verified debugfs_positive(), which is a stronger check...

Cc: stable@vger.kernel.org
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

Documentation/filesystems/Locking: ->get_sb() is long gone · dca11178

Authored by Al Viro on Feb 21, 2015

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

trylock_super(): replacement for grab_super_passive() · eb6ef3df

Authored by Konstantin Khlebnikov on Feb 19, 2015

I've noticed significant locking contention in memory reclaimer around
sb_lock inside grab_super_passive(). Grab_super_passive() is called from
two places: in icache/dcache shrinkers (function super_cache_scan) and
from writeback (function __writeback_inodes_wb). Both are required for
progress in memory allocator.

Grab_super_passive() acquires sb_lock to increment sb->s_count and check
sb->s_instances. It seems sb->s_umount locked for read is enough here:
super-block deactivation always runs under sb->s_umount locked for write.
Protecting super-block itself isn't a problem: in super_cache_scan() sb
is protected by shrinker_rwsem: it cannot be freed if its slab shrinkers
are still active. Inside writeback super-block comes from inode from bdi
writeback list under wb->list_lock.

This patch removes locking sb_lock and checks s_instances under s_umount:
generic_shutdown_super() unlinks it under sb->s_umount locked for write.
New variant is called trylock_super() and since it only locks semaphore,
callers must call up_read(&sb->s_umount) instead of drop_super(sb) when
they're done.
Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

fanotify: Fix up scripted S_ISDIR/S_ISREG/S_ISLNK conversions · 54f2a2f4

Authored by David Howells on Jan 29, 2015

Fanotify probably doesn't want to watch autodirs so make it use d_can_lookup()
rather than d_is_dir() when checking a dir watch and give an error on fake
directories.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
