Commits · f7545144c2e3d280139260df934043e0a6ccce6f · openeuler / raspberrypi-kernel

09 Mar, 2011 7 commits

nilfs2: use sb instance instead of nilfs_sb_info struct · f7545144

Committed by Ryusuke Konishi on Mar 09, 2011

This replaces sbi uses with direct reference to sb instance.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

f7545144

nilfs2: get rid of sc_sbi back pointer · d96bbfa2

Committed by Ryusuke Konishi on Mar 09, 2011

Removes sci->sc_sbi which is a back pointer to nilfs_sb_info struct
from log writer object (nilfs_sc_info).
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

d96bbfa2

nilfs2: move log writer onto nilfs object · 3fd3fe5a

Committed by Ryusuke Konishi on Mar 09, 2011

Log writer is held by the nilfs_sb_info structure. This moves it into
nilfs object and replaces all uses of NILFS_SC() accessor.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

3fd3fe5a

nilfs2: move next generation counter into nilfs object · 9b1fc4e4

Committed by Ryusuke Konishi on Mar 09, 2011

Moves s_next_generation counter and a spinlock protecting it to nilfs
object from nilfs_sb_info structure.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

9b1fc4e4

nilfs2: move s_inode_lock and s_dirty_files into nilfs object · 693dd321

Committed by Ryusuke Konishi on Mar 09, 2011

Moves s_inode_lock spinlock and s_dirty_files list to nilfs object
from nilfs_sb_info structure.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

693dd321

nilfs2: move parameters on nilfs_sb_info into nilfs object · 574e6c31

Committed by Ryusuke Konishi on Mar 09, 2011

This moves four parameter variables on nilfs_sb_info (s_resuid,
s_resgid, s_interval and s_watermark) to the nilfs object.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

574e6c31

nilfs2: move mount options to nilfs object · 3b2ce58b

Committed by Ryusuke Konishi on Mar 09, 2011

This moves mount_opt local variable to nilfs object from nilfs_sb_info
struct.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

3b2ce58b

08 Mar, 2011 10 commits

nilfs2: record used amount of each checkpoint in checkpoint list · be667377

Committed by Ryusuke Konishi on Mar 05, 2011

This records the number of used blocks per checkpoint in each
checkpoint entry of cpfile.  Even though userland tools can get the
block count via nilfs_get_cpinfo ioctl, it was not updated by the
nilfs2 kernel code.  This fixes the issue and makes it available for
userland tools to calculate used amount per checkpoint.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Cc: Jiro SEKIBA <jir@unicus.jp>

be667377

nilfs2: optimize rec_len functions · ae191838

Committed by Ryusuke Konishi on Feb 04, 2011

This is a similar change to those in ext2/ext3 codebase (commit
40a063f6 and a4ae3094, respectively).

The addition of 64k block capability in the rec_len_from_disk and
rec_len_to_disk functions added a bit of math overhead which slows
down file create workloads needlessly when the architecture cannot
even support 64k blocks.  This will cut the corner.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

ae191838
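
The idea behind the rec_len change in ae191838 can be sketched in userspace C roughly as follows. This is not the nilfs2 code: the names `SKETCH_PAGE_SIZE`, `SKETCH_MAX_REC_LEN`, `sketch_rec_len_from_disk` and `sketch_rec_len_to_disk` are hypothetical, and byte-order conversion is left out. It only illustrates how the 64k special case can be compiled out when the page size cannot reach 64k.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel's page size and maximum
 * record length constants; override SKETCH_PAGE_SIZE at build time. */
#ifndef SKETCH_PAGE_SIZE
#define SKETCH_PAGE_SIZE 4096u
#endif
#define SKETCH_MAX_REC_LEN ((1u << 16) - 1)

/* Decode a 16-bit on-disk record length (assumed to be in CPU byte
 * order already).  The 64k branch exists only on large-page builds. */
static unsigned int sketch_rec_len_from_disk(uint16_t dlen)
{
	unsigned int len = dlen;

#if SKETCH_PAGE_SIZE >= 65536
	/* 65536 does not fit in 16 bits, so it is stored as the maximum. */
	if (len == SKETCH_MAX_REC_LEN)
		return 1u << 16;
#endif
	return len;
}

/* Encode a record length for disk, mirroring the decode above. */
static uint16_t sketch_rec_len_to_disk(unsigned int len)
{
#if SKETCH_PAGE_SIZE >= 65536
	if (len == (1u << 16))
		return (uint16_t)SKETCH_MAX_REC_LEN;
#endif
	assert(len < (1u << 16));
	return (uint16_t)len;
}
```

With the default 4096-byte setting, both helpers reduce to plain conversions, which is the overhead saving the commit message describes.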

nilfs2: append blocksize info to warnings during loading super blocks · 4138ec23

Committed by Ryusuke Konishi on Jan 24, 2011

At present, the same warning message can be output twice when nilfs
detected a problem on super blocks:

 NILFS warning: broken superblock. using spare superblock.
 NILFS warning: broken superblock. using spare superblock.
 ...

This is because these super blocks are reloaded with the block size
written in a super block if it differs from the first block size, but
this repetition looks somewhat confusing.  So, we hint at what is
going on by appending block size information to those messages.
Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

4138ec23

nilfs2: add compat ioctl · 828b1c50

Committed by Ryusuke Konishi on Feb 03, 2011

The current FS_IOC_GETFLAGS/SETFLAGS/GETVERSION will fail if the
application is 32-bit and the kernel is 64-bit.

This issue is avoidable by adding a compat_ioctl method.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

828b1c50

nilfs2: implement FS_IOC_GETFLAGS/SETFLAGS/GETVERSION · cde98f0f

Committed by Ryusuke Konishi on Jan 20, 2011

Add support for the standard attributes set via chattr and read via
lsattr.  These attributes are already in the flags value in the nilfs2
inode, but currently we don't have any ioctl commands that expose them
to the userland.

Collaterally, this adds the FS_IOC_GETVERSION ioctl for getting
i_generation, which allows users to list the file's generation number
with "lsattr -v".
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

cde98f0f

nilfs2: tighten restrictions on inode flags · b253a3e4

Committed by Ryusuke Konishi on Jan 20, 2011

Nilfs has few restrictions on which flags may be set on which inodes
like ext2/3/4 filesystems used to be.  Specifically DIRSYNC may only
be set on directories and IMMUTABLE and APPEND may not be set on
links.  Tighten that to disallow TOPDIR being set on non-directories
and only NODUMP and NOATIME to be set on non-regular file,
non-directories.

This introduces a flags masking function like those of extN and uses
it during inode creation.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

b253a3e4
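
A flags masking function of the kind b253a3e4 describes might look like the sketch below. It is an illustration only: the `SK_*` flag names, their bit values, the `sk_file_type` enum and `sk_mask_flags` are all hypothetical stand-ins, not the identifiers used in nilfs2 or extN.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative flag bits for this sketch (hypothetical names). */
#define SK_IMMUTABLE_FL 0x00000010u
#define SK_APPEND_FL    0x00000020u
#define SK_NODUMP_FL    0x00000040u
#define SK_NOATIME_FL   0x00000080u
#define SK_DIRSYNC_FL   0x00010000u
#define SK_TOPDIR_FL    0x00020000u

/* Simplified file type; real code would inspect the inode mode. */
enum sk_file_type { SK_TYPE_DIR, SK_TYPE_REG, SK_TYPE_OTHER };

/* Return only those flags that are allowed on the given file type. */
static uint32_t sk_mask_flags(enum sk_file_type type, uint32_t flags)
{
	switch (type) {
	case SK_TYPE_DIR:
		/* Directories may carry every flag. */
		return flags;
	case SK_TYPE_REG:
		/* Regular files: drop the directory-only flags. */
		return flags & ~(SK_DIRSYNC_FL | SK_TOPDIR_FL);
	case SK_TYPE_OTHER:
		/* Links, devices, etc.: only NODUMP and NOATIME survive. */
		return flags & (SK_NODUMP_FL | SK_NOATIME_FL);
	}
	assert(!"unknown file type");
	return 0;
}
```

Applying such a mask at inode creation means a new symlink or device node does not silently pick up flags, such as IMMUTABLE, that only make sense on other file types.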

nilfs2: mark S_NOATIME on inodes only if NOATIME attribute is set · 32f4aeb3

Committed by Ryusuke Konishi on Jan 20, 2011

At present, nilfs marks S_NOATIME flag on all inodes. This restricts
nilfs_set_inode_flags function so that it marks S_NOATIME only if a
given inode has an FS_NOATIME_FL flag.

Although nilfs does not support atime yet, touch_atime() still safely
returns on IS_NOATIME check since MS_NOATIME is always set on sb.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

32f4aeb3

nilfs2: use common file attribute macros · f0c9f242

Committed by Ryusuke Konishi on Jan 20, 2011

Replaces uses of own inode flags (i.e. NILFS_SECRM_FL, NILFS_UNRM_FL,
NILFS_COMPR_FL, and so forth) with common inode flags, and removes the
own flag declarations.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

f0c9f242

nilfs2: add free entries count only if clear bit operation succeeded · 9954e7af

Committed by Ryusuke Konishi on Feb 23, 2011

Three functions of the current persistent object allocator,
nilfs_palloc_commit_free_entry, nilfs_palloc_abort_alloc_entry, and
nilfs_palloc_freev functions unconditionally add a counter after doing
clear bit operation on a bitmap block.

If the clear bit operation overlapped, the counter will not add up.
This fixes the issue by making the counter operations conditional.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

9954e7af

nilfs2: decrement inodes count only if raw inode was successfully deleted · 25b18d39

Committed by Ryusuke Konishi on Feb 11, 2011

This fixes the issue that inodes count will not add up after removal
of raw inodes fails.  Hence, this prevents possible underflow of the
inodes count.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

25b18d39

03 Mar, 2011 1 commit

nilfs2: i_nlink races in rename() · 30eb43d3

Committed by Al Viro on Mar 02, 2011

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

30eb43d3

02 Mar, 2011 1 commit

nilfs2: fix regression that i-flag is not set on changeless checkpoints · 72746ac6

Committed by Ryusuke Konishi on Feb 28, 2011

According to the report from Jiro SEKIBA titled "regression in
2.6.37?"  (Message-Id: <8739n8vs1f.wl%jir@sekiba.com>), on 2.6.37 and
later kernels, lscp command no longer displays "i" flag on checkpoints
that snapshot operations or garbage collection created.

This is a regression of nilfs2 checkpointing function, and it's
critical since it broke behavior of a part of nilfs2 applications.
For instance, snapshot manager of TimeBrowse gets to create
meaningless snapshots continuously; snapshot creation triggers another
checkpoint, but applications cannot distinguish whether the new
checkpoint contains meaningful changes or not without the i-flag.

This patch fixes the regression and brings that application behavior
back to normal.
Reported-by: Jiro SEKIBA <jir@unicus.jp>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Jiro SEKIBA <jir@unicus.jp>
Cc: stable <stable@kernel.org>  [2.6.37]

72746ac6

24 Feb, 2011 1 commit

mm: prevent concurrent unmap_mapping_range() on the same inode · 2aa15890

Committed by Miklos Szeredi on Feb 23, 2011

Michael Leun reported that running parallel opens on a fuse filesystem
can trigger a "kernel BUG at mm/truncate.c:475"

Gurudas Pai reported the same bug on NFS.

The reason is, unmap_mapping_range() is not prepared for more than
one concurrent invocation per inode.  For example:

  thread1: going through a big range, stops in the middle of a vma and
     stores the restart address in vm_truncate_count.

  thread2: comes in with a small (e.g. single page) unmap request on
     the same vma, somewhere before restart_address, finds that the
     vma was already unmapped up to the restart address and happily
     returns without doing anything.

Another scenario would be two big unmap requests, both having to
restart the unmapping and each one setting vm_truncate_count to its
own value.  This could go on forever without any of them being able to
finish.

Truncate and hole punching already serialize with i_mutex.  Other
callers of unmap_mapping_range() do not, and it's difficult to get
i_mutex protection for all callers.  In particular ->d_revalidate(),
which calls invalidate_inode_pages2_range() in fuse, may be called
with or without i_mutex.

This patch adds a new mutex to 'struct address_space' to prevent
running multiple concurrent unmap_mapping_range() on the same mapping.

[ We'll hopefully get rid of all this with the upcoming mm
  preemptibility series by Peter Zijlstra, the "mm: Remove i_mmap_mutex
  lockbreak" patch in particular.  But that is for 2.6.39 ]
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
Reported-by: Michael Leun <lkml20101129@newton.leun.net>
Reported-by: Gurudas Pai <gurudas.pai@oracle.com>
Tested-by: Gurudas Pai <gurudas.pai@oracle.com>
Acked-by: Hugh Dickins <hughd@google.com>
Cc: stable@kernel.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

2aa15890

22 Jan, 2011 1 commit

nilfs2: fix crash after one superblock became unavailable · 0ca7a5b9

Committed by Ryusuke Konishi on Jan 21, 2011

Fixes the following kernel oops in nilfs_setup_super() which could
arise if one of two super-blocks is unavailable.

> BUG: unable to handle kernel NULL pointer dereference at   (null)
> Pid: 3529, comm: mount.nilfs2 Not tainted 2.6.37 #1 /
> EIP: 0060:[<c03196bc>] EFLAGS: 00010202 CPU: 3
> EIP is at memcpy+0xc/0x1b
> Call Trace:
>  [<f953720e>] ? nilfs_setup_super+0x6c/0xa5 [nilfs2]
>  [<f95369e9>] ? nilfs_get_root_dentry+0x81/0xcb [nilfs2]
>  [<f9537a08>] ? nilfs_mount+0x4f9/0x62c [nilfs2]
>  [<c02745cf>] ? kstrdup+0x36/0x3f
>  [<f953750f>] ? nilfs_mount+0x0/0x62c [nilfs2]
>  [<c0293940>] ? vfs_kern_mount+0x4d/0x12c
>  [<c02a5100>] ? get_fs_type+0x76/0x8f
>  [<c0293a68>] ? do_kern_mount+0x33/0xbf
>  [<c02a784a>] ? do_mount+0x2ed/0x714
>  [<c02a6171>] ? copy_mount_options+0x28/0xfc
>  [<c02a7ce3>] ? sys_mount+0x72/0xaf
>  [<c0473085>] ? syscall_call+0x7/0xb
Reported-by: Wakko Warner <wakko@animx.eu.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Wakko Warner <wakko@animx.eu.org>
Cc: stable <stable@kernel.org> [2.6.37, 2.6.36]
LKML-Reference: <20110121024918.GA29598@animx.eu.org>

0ca7a5b9

11 Jan, 2011 1 commit

headers: kobject.h redux · 57cc7215

Committed by Alexey Dobriyan on Jan 10, 2011

Remove kobject.h from files which don't need it, notably,
sched.h and fs.h.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

57cc7215

10 Jan, 2011 10 commits

nilfs2: unfold nilfs_dat_inode function · 365e215c

Committed by Ryusuke Konishi on Dec 27, 2010

nilfs_dat_inode function was a wrapper to switch between normal dat
inode and gcdat, a clone of the dat inode for garbage collection.

This function got obsolete when the gcdat inode was removed, and now
we can access the dat inode directly from a nilfs object. So, we will
unfold the wrapper and remove it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

365e215c

nilfs2: do not pass sbi to functions which can get it from inode · bcbc8c64

Committed by Ryusuke Konishi on Dec 27, 2010

This removes argument for passing nilfs_sb_info structure from
nilfs_set_file_dirty and nilfs_load_inode_block functions.  We can get
a pointer to the structure from inodes.

[Stephen Rothwell <sfr@canb.auug.org.au>: fix conflict with commit
 b74c79e9]
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

bcbc8c64

nilfs2: get rid of nilfs_mount_options structure · 06df0f99

Committed by Ryusuke Konishi on Dec 27, 2010

Only mount_opt member is used in the nilfs_mount_options structure,
and we can simplify it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

06df0f99

nilfs2: simplify nilfs_mdt_freeze_buffer · a7a8447e

Committed by Ryusuke Konishi on Dec 27, 2010

nilfs_page_get_nth_block() function used in nilfs_mdt_freeze_buffer()
always returns a valid buffer head, so its validity check can be
removed.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

a7a8447e

nilfs2: get rid of loaded flag from nilfs object · 888da23c

Committed by Ryusuke Konishi on Dec 27, 2010

NILFS_LOADED flag of the nilfs object is not used now, so this will
remove it.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

888da23c

nilfs2: fix a checkpatch error in page.c · ae53a0a2

Committed by Ryusuke Konishi on Dec 26, 2010

Will correct the following checkpatch error:

 ERROR: trailing whitespace
 #494: FILE: page.c:494:
 + $
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

ae53a0a2

nilfs2: fiemap support · 622daaff

Committed by Ryusuke Konishi on Dec 26, 2010

This adds fiemap to nilfs.  Two new functions, nilfs_fiemap and
nilfs_find_uncommitted_extent are added.

nilfs_fiemap() implements the fiemap inode operation, and
nilfs_find_uncommitted_extent() helps to get a range of data blocks
whose physical location has not been determined.

nilfs_fiemap() collects extent information by looping through
nilfs_bmap_lookup_contig and nilfs_find_uncommitted_extent routines.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

622daaff

nilfs2: mark buffer heads as delayed until the data is written to disk · 27e6c7a3

Committed by Ryusuke Konishi on Dec 26, 2010

Nilfs does not allocate new blocks on disk until they are actually
written to. To implement fiemap, we need to deal with such blocks.

To allow successive fiemap patch to distinguish mapped but unallocated
regions, this marks buffer heads of those new blocks as delayed and
clears the flag after the blocks are written to disk.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

27e6c7a3

nilfs2: call nilfs_error inside bmap routines · e828949e

Committed by Ryusuke Konishi on Nov 19, 2010

Some functions using nilfs bmap routines can wrongly return invalid
argument error (i.e. -EINVAL) that bmap returns as an internal code
for btree corruption.

This fixes the issue by catching and converting the internal EINVAL to
EIO and calling nilfs_error function inside bmap routines.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

e828949e
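
The error conversion described in e828949e can be pictured with the small sketch below. It assumes kernel-style negative errno return values; `sketch_convert_error` and the `corruption_seen` counter are hypothetical, and the counter merely stands in for the place where the real code would call nilfs_error().

```c
#include <assert.h>
#include <errno.h>

/* Counts how often internal corruption was reported (illustrative
 * only; the real code would report through nilfs_error() here). */
static unsigned int corruption_seen;

/* Translate the bmap-internal -EINVAL, which signals btree
 * corruption, into -EIO so that callers do not mistake it for an
 * "invalid argument" error.  Other codes pass through unchanged. */
static int sketch_convert_error(int err)
{
	assert(err <= 0);	/* kernel-style: 0 or a negative errno */

	if (err == -EINVAL) {
		corruption_seen++;
		return -EIO;
	}
	return err;
}
```

Doing the conversion at the boundary of the bmap routines keeps the internal meaning of -EINVAL private, so every caller sees the same externally meaningful code.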

fs/nilfs2/super.c: Use printf extension %pV · b004a5eb

Committed by Joe Perches on Nov 09, 2010

Using %pV reduces the number of printk calls and
eliminates any possible message interleaving from
other printk calls.
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

b004a5eb

07 Jan, 2011 3 commits

fs: provide rcu-walk aware permission i_ops · b74c79e9

Committed by Nick Piggin on Jan 07, 2011

Signed-off-by: Nick Piggin <npiggin@kernel.dk>

b74c79e9

fs: icache RCU free inodes · fa0d7e3d

Committed by Nick Piggin on Jan 07, 2011

RCU free the struct inode. This will allow:

- Subsequent store-free path walking patch. The inode must be consulted for
  permissions when walking, so an RCU inode reference is a must.
- sb_inode_list_lock to be moved inside i_lock because sb list walkers who want
  to take i_lock no longer need to take sb_inode_list_lock to walk the list in
  the first place. This will simplify and optimize locking.
- Could remove some nested trylock loops in dcache code
- Could potentially simplify things a bit in VM land. Do not need to take the
  page lock to follow page->mapping.

The downside of this is the performance cost of using RCU. In a simple
creat/unlink microbenchmark, performance drops by about 10% due to inability to
reuse cache-hot slab objects. As iterations increase and RCU freeing starts
kicking over, this increases to about 20%.

In cases where inode lifetimes are longer (ie. many inodes may be allocated
during the average life span of a single inode), a lot of this cache reuse is
not applicable, so the regression caused by this patch is smaller.

The cache-hot regression could largely be avoided by using SLAB_DESTROY_BY_RCU,
however this adds some complexity to list walking and store-free path walking,
so I prefer to implement this at a later date, if it is shown to be a win in
real situations. I haven't found a regression in any non-micro benchmark so I
doubt it will be a problem.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

fa0d7e3d

fs: dcache scale dentry refcount · b7ab39f6

Committed by Nick Piggin on Jan 07, 2011

Make d_count non-atomic and protect it with d_lock. This allows us to ensure a
0 refcount dentry remains 0 without dcache_lock. It is also fairly natural when
we start protecting many other dentry members with d_lock.
Signed-off-by: Nick Piggin <npiggin@kernel.dk>

b7ab39f6

16 Dec, 2010 1 commit

nilfs2: fix regression of garbage collection ioctl · 947b10ae

Committed by Ryusuke Konishi on Dec 16, 2010

On 2.6.37-rc1, garbage collection ioctl of nilfs was broken due to the
commit 263d90ce ("nilfs2: remove own inode hash used for GC"),
and leading to filesystem corruption.

The patch doesn't queue gc-inodes for log writer if they are reused
through the vfs inode cache.  Here, gc-inode is the inode which
buffers blocks to be relocated on GC.  That patch queues gc-inodes in
nilfs_init_gcinode() function, but this function is not called when
they don't have I_NEW flag.  Thus, some live blocks are wrongly
overwritten without being moved to new logs.

This resolves the problem by moving the gc-inode queueing to an outer
function to ensure it's done right.
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

947b10ae

24 Nov, 2010 1 commit

nilfs2: fix typo in comment of nilfs_dat_move function · f6c26ec5

Committed by Ryusuke Konishi on Nov 24, 2010

Fixes a typo: "uncommited" -> "uncommitted".
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

f6c26ec5

23 Nov, 2010 1 commit

nilfs2: nilfs_iget_for_gc() returns ERR_PTR · 103cfcf5

Committed by Dan Carpenter on Nov 23, 2010

nilfs_iget_for_gc() returns an ERR_PTR() on failure and doesn't return
NULL.
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>

103cfcf5
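
Why a NULL check is the wrong test for a function that returns ERR_PTR() on failure can be shown with a userspace imitation of the error-pointer convention. The helpers below (`sketch_err_ptr`, `sketch_ptr_err`, `sketch_is_err`, `sketch_iget`) are hypothetical re-creations written for this illustration, not the kernel's definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Errno values live in the top SKETCH_MAX_ERRNO addresses. */
#define SKETCH_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer. */
static void *sketch_err_ptr(long error)
{
	assert(error < 0 && error >= -SKETCH_MAX_ERRNO);
	return (void *)(intptr_t)error;
}

/* Recover the errno value from an error pointer. */
static long sketch_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

/* True if ptr falls in the range reserved for encoded errors. */
static int sketch_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-SKETCH_MAX_ERRNO;
}

/* Hypothetical lookup that fails with an error pointer, never NULL. */
static void *sketch_iget(int fail)
{
	static int dummy_inode;

	return fail ? sketch_err_ptr(-ENOMEM) : (void *)&dummy_inode;
}
```

A caller that only tests the result against NULL would treat the failed lookup as a valid object, so the check has to be an IS_ERR-style range test instead.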

13 Nov, 2010 2 commits

block: clean up blkdev_get() wrappers and their users · d4d77629

Committed by Tejun Heo on Nov 13, 2010

After recent blkdev_get() modifications, open_by_devnum() and
open_bdev_exclusive() are simple wrappers around blkdev_get().
Replace them with blkdev_get_by_dev() and blkdev_get_by_path().

blkdev_get_by_dev() is identical to open_by_devnum().
blkdev_get_by_path() is slightly different in that it doesn't
automatically add %FMODE_EXCL to @mode.

All users are converted.  Most conversions are mechanical and don't
introduce any behavior difference.  There are several exceptions.

* btrfs now sets FMODE_EXCL in btrfs_device->mode, so there's no
  reason to OR it explicitly on blkdev_put().

* gfs2, nilfs2 and the generic mount_bdev() now set FMODE_EXCL in
  sb->s_mode.

* With the above changes, sb->s_mode now always should contain
  FMODE_EXCL.  WARN_ON_ONCE() added to kill_block_super() to detect
  errors.

The new blkdev_get_*() functions are with proper docbook comments.
While at it, add function description to blkdev_get() too.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Neil Brown <neilb@suse.de>
Cc: Mike Snitzer <snitzer@redhat.com>
Cc: Joern Engel <joern@lazybastard.org>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Jan Kara <jack@suse.cz>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: KONISHI Ryusuke <konishi.ryusuke@lab.ntt.co.jp>
Cc: reiserfs-devel@vger.kernel.org
Cc: xfs-masters@oss.sgi.com
Cc: Alexander Viro <viro@zeniv.linux.org.uk>

d4d77629

block: make blkdev_get/put() handle exclusive access · e525fd89

Committed by Tejun Heo on Nov 13, 2010

Over time, block layer has accumulated a set of APIs dealing with bdev
open, close, claim and release.

* blkdev_get/put() are the primary open and close functions.

* bd_claim/release() deal with exclusive open.

* open/close_bdev_exclusive() are combination of open and claim and
  the other way around, respectively.

* bd_link/unlink_disk_holder() to create and remove holder/slave
  symlinks.

* open_by_devnum() wraps bdget() + blkdev_get().

The interface is a bit confusing and the decoupling of open and claim
makes it impossible to properly guarantee exclusive access as
in-kernel open + claim sequence can disturb the existing exclusive
open even before the block layer knows the current open is for another
exclusive access.  Reorganize the interface such that,

* blkdev_get() is extended to include exclusive access management.
  @holder argument is added and, if @FMODE_EXCL is specified, it will
  gain exclusive access atomically w.r.t. other exclusive accesses.

* blkdev_put() is similarly extended.  It now takes @mode argument and
  if @FMODE_EXCL is set, it releases an exclusive access.  Also, when
  the last exclusive claim is released, the holder/slave symlinks are
  removed automatically.

* bd_claim/release() and close_bdev_exclusive() are no longer
  necessary and either made static or removed.

* bd_link_disk_holder() remains the same but bd_unlink_disk_holder()
  is no longer necessary and removed.

* open_bdev_exclusive() becomes a simple wrapper around lookup_bdev()
  and blkdev_get().  It also has an unexpected extra bdev_read_only()
  test which probably should be moved into blkdev_get().

* open_by_devnum() is modified to take @holder argument and pass it to
  blkdev_get().

Most of bdev open/close operations are unified into blkdev_get/put()
and most exclusive accesses are tested atomically at the open time (as
it should).  This cleans up code and removes some, both valid and
invalid, but unnecessary all the same, corner cases.

open_bdev_exclusive() and open_by_devnum() can use further cleanup -
rename to blkdev_get_by_path() and blkdev_get_by_devt() and drop
special features.  Well, let's leave them for another day.

Most conversions are straight-forward.  drbd conversion is a bit more
involved as there was some reordering, but the logic should stay the
same.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Neil Brown <neilb@suse.de>
Acked-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Acked-by: Mike Snitzer <snitzer@redhat.com>
Acked-by: Philipp Reisner <philipp.reisner@linbit.com>
Cc: Peter Osterlund <petero2@telia.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andreas Dilger <adilger.kernel@dilger.ca>
Cc: "Theodore Ts'o" <tytso@mit.edu>
Cc: Mark Fasheh <mfasheh@suse.com>
Cc: Joel Becker <joel.becker@oracle.com>
Cc: Alex Elder <aelder@sgi.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: dm-devel@redhat.com
Cc: drbd-dev@lists.linbit.com
Cc: Leo Chen <leochen@broadcom.com>
Cc: Scott Branden <sbranden@broadcom.com>
Cc: Chris Mason <chris.mason@oracle.com>
Cc: Steven Whitehouse <swhiteho@redhat.com>
Cc: Dave Kleikamp <shaggy@linux.vnet.ibm.com>
Cc: Joern Engel <joern@logfs.org>
Cc: reiserfs-devel@vger.kernel.org
Cc: Alexander Viro <viro@zeniv.linux.org.uk>

e525fd89