Commits · 857eceebd2803c9a3459f784acf45e5266921e4d · openeuler / Kernel

27 Jun, 2009 4 commits

Add new __init_task_data macro to be used in arch init_task.c files. · 857eceeb

Authored by Tim Abbott on Jun 23, 2009

This patch is preparation for replacing most ".data.init_task" in the
kernel with macros, so that the section name can later be changed
without having to touch a lot of the kernel.

The long-term goal here is to be able to change the kernel's magic
section names to those that are compatible with -ffunction-sections
-fdata-sections.  This requires renaming all magic sections with names
of the form ".data.foo".
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

857eceeb
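
A minimal userspace sketch of the idea behind this change, assuming GCC or Clang on an ELF target: the section name is spelled in exactly one macro, so renaming the section later touches one line. The macro name `__init_task_data` comes from the commit title; `DEMO_INIT_TASK_SECTION`, `struct demo_task`, `demo_init_task` and `demo_init_task_pid()` are hypothetical, and the definition in the kernel tree may differ.

```c
/* The section name is kept in one place so it can be renamed later. */
#define DEMO_INIT_TASK_SECTION ".data.init_task"

/* Place a variable in the init_task data section (GCC/Clang, ELF). */
#define __init_task_data __attribute__((__section__(DEMO_INIT_TASK_SECTION)))

struct demo_task {
	int pid;
	char comm[16];
};

/* Users name the macro and never spell out the section string. */
static struct demo_task demo_init_task __init_task_data = {
	.pid = 0,
	.comm = "swapper",
};

/* Accessor so the placed object is actually referenced. */
int demo_init_task_pid(void)
{
	return demo_init_task.pid;
}
```

Only the `#define` of the section string would need to change if the section were renamed.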

asm-generic/vmlinux.lds.h: shuffle INIT_TASK* macro names in vmlinux.lds.h · 39a449d9

Authored by Tim Abbott on Jun 23, 2009

We recently added an INIT_TASK(align) in include/asm-generic/vmlinux.lds.h,
but there is already a macro INIT_TASK in include/linux/init_task.h, which
is quite confusing.  We should switch the macro in the linker script to
INIT_TASK_DATA. (Sorry that I missed this in reviewing the patch).  Since
the macros are new, there is only one user of the INIT_TASK in
vmlinux.lds.h, arch/mn10300/kernel/vmlinux.lds.S.

However, we are currently using INIT_TASK_DATA for laying down an entire
.data.init_task section.  So rename that to INIT_TASK_DATA_SECTION.

I would be worried about changing the meaning of INIT_TASK_DATA, but the
old INIT_TASK_DATA implementation had no users, and in fact if anyone had
tried to use it, it would have failed to compile because it didn't pass
the alignment to the old INIT_TASK.
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Cc: David Howells <dhowells@redhat.com>
Cc: Jesper Nilsson <Jesper.Nilsson@axis.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

39a449d9

Add new macros for page-aligned data and bss sections. · d2af12ae

Authored by Tim Abbott on Jun 23, 2009

This patch is preparation for replacing most uses of
".bss.page_aligned" and ".data.page_aligned" in the kernel with
macros, so that the section name can later be changed without having
to touch a lot of the kernel.

The long-term goal here is to be able to change the kernel's magic
section names to those that are compatible with -ffunction-sections
-fdata-sections.  This requires renaming all magic sections with names
of the form ".data.foo".
Signed-off-by: Tim Abbott <tabbott@ksplice.com>
Acked-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

d2af12ae

asm-generic/vmlinux.lds.h: Fix up RW_DATA_SECTION definition. · 73f1d939

Authored by Paul Mundt on Jun 24, 2009

RW_DATA_SECTION is defined to take 4 different alignment parameters,
while NOSAVE_DATA currently uses a fixed PAGE_SIZE alignment as noted
in the comments.

There are presently no in-tree users of this; I just stumbled across it
while implementing the simplified script on a new architecture port,
where it resulted in a syntax error.
Signed-off-by: Paul Mundt <lethal@linux-sh.org>
Signed-off-by: Sam Ravnborg <sam@ravnborg.org>

73f1d939

26 Jun, 2009 1 commit

clarify get_user_pages() prototype · 9d73777e

Authored by Peter Zijlstra on Jun 25, 2009

Currently the 4th parameter of get_user_pages() is called len, but it is
in pages, not bytes. Rename the thing to nr_pages to avoid future
confusion.
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

9d73777e
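
To make the pages-versus-bytes distinction in 9d73777e concrete, here is a small sketch of how a caller might turn a byte range into a page count before passing it as `nr_pages`. The helper `demo_nr_pages()` and the 4096-byte `DEMO_PAGE_SIZE` are assumptions for illustration, not kernel code.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAGE_SIZE 4096u	/* assumed page size for this sketch */

/* Number of pages touched by the byte range [addr, addr + len). */
size_t demo_nr_pages(uintptr_t addr, size_t len)
{
	uintptr_t first, last;

	if (len == 0)
		return 0;
	first = addr / DEMO_PAGE_SIZE;			/* page of the first byte */
	last = (addr + len - 1) / DEMO_PAGE_SIZE;	/* page of the last byte */
	return (size_t)(last - first + 1);
}
```

A two-byte range that straddles a page boundary counts as two pages, which is why a parameter named `len` is easy to misread.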

25 Jun, 2009 2 commits

Get "no acls for this inode" right, fix shmem breakage · 72c04902

Authored by Al Viro on Jun 24, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

72c04902

inline functions left without protection of ifdef (acl) · 641cf4a6

Authored by Markus Trippelsdorf on Jun 24, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

641cf4a6

24 Jun, 2009 16 commits

helpers for acl caching + switch to those · 073aaa1b

Authored by Al Viro on Jun 09, 2009

helpers: get_cached_acl(inode, type), set_cached_acl(inode, type, acl),
forget_cached_acl(inode, type).

ubifs/xattr.c needed includes reordered, the rest is a plain switchover.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

073aaa1b
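
The three helpers named in 073aaa1b can be pictured with the userspace sketch below. It is an illustration under assumptions, not the kernel implementation: the `demo_` types, the two-slot array, and the "not cached" sentinel are hypothetical, and locking and reference counting are left out entirely.

```c
#include <assert.h>
#include <stddef.h>

struct demo_acl { int entries; };

/* Sentinel meaning "nothing cached yet"; NULL means "cached: no ACL". */
#define DEMO_ACL_NOT_CACHED ((struct demo_acl *)-1)

enum demo_acl_type { DEMO_ACL_ACCESS, DEMO_ACL_DEFAULT };

struct demo_inode {
	struct demo_acl *acl[2];	/* indexed by enum demo_acl_type */
};

void demo_inode_init(struct demo_inode *inode)
{
	assert(inode != NULL);
	inode->acl[DEMO_ACL_ACCESS] = DEMO_ACL_NOT_CACHED;
	inode->acl[DEMO_ACL_DEFAULT] = DEMO_ACL_NOT_CACHED;
}

/* Return the cached value, which may be the sentinel or NULL. */
struct demo_acl *demo_get_cached_acl(struct demo_inode *inode,
				     enum demo_acl_type type)
{
	assert(inode != NULL);
	return inode->acl[type];
}

/* Remember an ACL (or NULL for "this inode has none"). */
void demo_set_cached_acl(struct demo_inode *inode, enum demo_acl_type type,
			 struct demo_acl *acl)
{
	assert(inode != NULL);
	inode->acl[type] = acl;
}

/* Drop the cached value so the next lookup goes to the filesystem. */
void demo_forget_cached_acl(struct demo_inode *inode, enum demo_acl_type type)
{
	assert(inode != NULL);
	inode->acl[type] = DEMO_ACL_NOT_CACHED;
}
```

Keeping "not cached" distinct from NULL lets a cached "no ACL" answer be served without asking the filesystem again.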

switch shmem to inode->i_acl · 06b16e9f

Authored by Al Viro on Jun 08, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

06b16e9f

switch reiserfs to inode->i_acl · 281eede0

Authored by Al Viro on Jun 08, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

281eede0

switch reiserfs to usual conventions for caching ACLs · 7a77b15d

Authored by Al Viro on Jun 08, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

7a77b15d

switch ext3 to inode->i_acl · 6582a0e6

Authored by Al Viro on Jun 08, 2009

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

6582a0e6

add caching of ACLs in struct inode · f19d4a8f

Authored by Al Viro on Jun 08, 2009

No helpers, no conversions yet.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

f19d4a8f

fs: Add new pre-allocation ioctls to vfs for compatibility with legacy xfs ioctls · 3e63cbb1

Authored by Ankit Jain on Jun 19, 2009

This patch adds ioctls to vfs for compatibility with legacy XFS
pre-allocation ioctls (XFS_IOC_*RESVP*). The implementation
effectively invokes sys_fallocate for the new ioctls.
Also handles the compat_ioctl case.
Note: These legacy ioctls are also implemented by OCFS2.

[AV: folded fixes from hch]
Signed-off-by: Ankit Jain <me@ankitjain.org>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>

3e63cbb1

ide: relax DMA info validity checking · 346c17a6

Authored by Bartlomiej Zolnierkiewicz on Jun 22, 2009

There are some broken devices that report multiple DMA xfer modes
enabled at once (ATA spec doesn't allow it) but otherwise work fine
with DMA so just delete ide_id_dma_bug().

[ As discovered by detective work by Frans and Bart, due to how
  handling of the ID block was handled before commit c4199930
  ("ide-iops: only clear DMA words on setting DMA mode") this
  check was always seeing zeros in the fields or other similar
  garbage.  Therefore this check wasn't actually checking anything.
  Now that the tests actually check the real bits, all we see are
  devices that trigger the check yet work perfectly fine, therefore
  killing this useless check is the best thing to do. -DaveM ]
Reported-by: Frans Pop <elendil@planet.nl>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

346c17a6
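
For readers unfamiliar with the kind of test that 346c17a6 removes, the sketch below shows what "multiple DMA xfer modes enabled at once" means in bit terms. It is not the deleted ide_id_dma_bug() code: `demo_multiple_modes_selected()` is a hypothetical helper that takes an already extracted mask of selected-mode bits.

```c
#include <stdbool.h>
#include <stdint.h>

/* True when more than one bit is set in the "selected mode" mask. */
bool demo_multiple_modes_selected(uint16_t selected)
{
	/*
	 * x & (x - 1) clears the lowest set bit; a nonzero result means
	 * at least one more bit was set.
	 */
	return selected != 0 && (selected & (selected - 1)) != 0;
}
```

Per the commit message, devices that trip such a test can still work fine with DMA, which is why the check was dropped.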

drm: Fix shifts which were miscalculated when converting from bitfields. · e14cbee4

Authored by Michel Dänzer on Jun 23, 2009

Looks like I managed to mess up most shifts when converting from bitfields. :(

The patch below works on my Thinkpad T500 (as well as on my PowerBook,
where the previous change worked as well, maybe out of luck...). I'd
appreciate more testing and eyes looking over it though.
Signed-off-by: Michel Dänzer <daenzer@vmware.com>
Tested-by: Michael Pyne <mpyne@kde.org>
Signed-off-by: Dave Airlie <airlied@linux.ie>

e14cbee4

Audit: clean up all op= output to include string quoting · 9d960985

Authored by Eric Paris on Jun 11, 2009

In a number of places in the audit system we send an op= followed by a string
that includes spaces.  Somehow this works but it's just wrong.  This patch
moves all of those that I could find to be quoted.

Example:

Change From: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op=remove rule
key="number2" list=4 res=0

Change To: type=CONFIG_CHANGE msg=audit(1244666690.117:31): auid=0 ses=1
subj=unconfined_u:unconfined_r:auditctl_t:s0-s0:c0.c1023 op="remove rule"
key="number2" list=4 res=0
Signed-off-by: Eric Paris <eparis@redhat.com>

9d960985
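
The before/after records in 9d960985 differ only in the quoting of the op= value. A minimal sketch of the quoted form, assuming a hypothetical helper `demo_format_op()` built on `snprintf()` rather than the kernel's audit logging functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Write op="<op>" into buf; returns what snprintf() returns. */
int demo_format_op(char *buf, size_t size, const char *op)
{
	assert(buf != NULL && op != NULL);
	/* The quotes keep a value with spaces together as one field. */
	return snprintf(buf, size, "op=\"%s\"", op);
}
```

With the value quoted, a parser that splits fields on spaces no longer sees `rule` as a separate token.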

ACPI: Add the reference count to avoid unloading ACPI video bus twice · 86e437f0

Authored by Zhao Yakui on Jun 16, 2009

Sometimes both the ACPI video and i915 drivers are compiled as modules,
and there is a strict dependency between the two drivers.
The ACPI video bus will be unloaded in the course of unloading the i915 driver.
If we then unload the ACPI video driver as well, a kernel oops is triggered.

Add the reference count to avoid unloading the ACPI video bus twice.
The reference count should be checked before unregistering the acpi video bus.
If the reference count is already zero, it won't unregister it again.
And after the acpi video bus is already unregistered, the reference count
will be set to zero.

http://bugzilla.kernel.org/show_bug.cgi?id=13396
Signed-off-by: Zhao Yakui <yakui.zhao@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Len Brown <len.brown@intel.com>

86e437f0
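
The guard described in 86e437f0 can be sketched as follows. All names here are hypothetical, and the counter that stands in for the real teardown exists only so the behaviour can be observed; the actual driver code is not reproduced.

```c
static int demo_register_count;		/* 0 means the bus is not registered */
static int demo_teardown_calls;		/* counts real teardown work (illustration) */

int demo_video_register(void)
{
	if (demo_register_count)
		return 0;		/* already registered, nothing to do */
	demo_register_count = 1;
	return 0;
}

void demo_video_unregister(void)
{
	if (!demo_register_count)
		return;			/* already unregistered: skip a second teardown */
	demo_teardown_calls++;		/* stands in for unregistering the bus */
	demo_register_count = 0;	/* reset once the bus is gone */
}

int demo_teardown_count(void)
{
	return demo_teardown_calls;
}
```

Calling `demo_video_unregister()` from both unload paths then tears the bus down only once.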

net: Move rx skb_orphan call to where needed · d55d87fd

Authored by Herbert Xu on Jun 22, 2009

In order to get the tun driver to account packets, we need to be
able to receive packets with destructors set.  To be on the safe
side, I added an skb_orphan call for all protocols by default since
some of them (IP in particular) cannot handle receiving packets
with destructors properly.

Now it seems that at least one protocol (CAN) expects to be able
to pass skb->sk through the rx path without getting clobbered.

So this patch attempts to fix this properly by moving the skb_orphan
call to where it's actually needed.  In particular, I've added it
to skb_set_owner_[rw] which is what most users of skb->destructor
call.

This is actually an improvement for tun too since it means that
we only give back the amount charged to the socket when the skb
is passed to another socket that will also be charged accordingly.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Tested-by: Oliver Hartkopp <olver@hartkopp.net>
Signed-off-by: David S. Miller <davem@davemloft.net>

d55d87fd

Intel-IOMMU, intr-remap: source-id checking · f007e99c

Authored by Weidong Han on May 23, 2009

To support domain-isolation usages, the platform hardware must be
capable of uniquely identifying the requestor (source-id) for each
interrupt message. Without source-id checking for interrupt remapping,
a rogue guest/VM with assigned devices can launch interrupt attacks
to bring down another guest/VM or the VMM itself.

This patch adds source-id checking for interrupt remapping, and then
really isolates interrupts for guests/VMs with assigned devices.

Because the PCI subsystem is not yet initialized when IOAPIC entries are
set up, use read_pci_config_byte to access PCI config space directly.
Signed-off-by: Weidong Han <weidong.han@intel.com>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>

f007e99c

rmap: fixup page_referenced() for nommu systems · 01ff53f4

Authored by Mike Frysinger on Jun 23, 2009

After the recent changes that went into mm/vmscan.c to overhaul stuff, we
ended up with these warnings on no-mmu systems:

  mm/vmscan.c: In function `shrink_page_list':
  mm/vmscan.c:580: warning: unused variable `vm_flags'
  mm/vmscan.c: In function `shrink_active_list':
  mm/vmscan.c:1294: warning: `vm_flags' may be used uninitialized in this function
  mm/vmscan.c:1242: note: `vm_flags' was declared here

This is because the no-mmu function defines page_referenced() to work on
the first argument only (the page).  It does not clear the vm_flags given
to it because for no-mmu systems, they never actually get utilized.  Since
that is no longer strictly true, we need to set vm_flags to 0 like
everyone else so gcc can do proper dead code elimination without annoying
us with unused warnings.
Signed-off-by: Mike Frysinger <vapier@gentoo.org>
Cc: David Howells <dhowells@redhat.com>
Acked-by: David McCullough <davidm@snapgear.com>
Cc: Greg Ungerer <gerg@uclinux.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

01ff53f4
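
The fix in 01ff53f4 amounts to having the no-MMU stub write the output argument even though it has nothing meaningful to report. A userspace sketch of that shape, with a hypothetical `struct demo_page` and a reduced argument list (the kernel function takes more parameters):

```c
#include <assert.h>
#include <stddef.h>

struct demo_page { int referenced; };

/* No-MMU style stub: only the page matters, but vm_flags is still set. */
static inline int demo_page_referenced(struct demo_page *page,
				       unsigned long *vm_flags)
{
	int was_referenced;

	assert(page != NULL && vm_flags != NULL);
	was_referenced = page->referenced;
	*vm_flags = 0;		/* callers can now read vm_flags safely */
	page->referenced = 0;	/* test-and-clear semantics */
	return was_referenced;
}
```

Because the stub always stores 0, the compiler can see that the caller's variable is initialized and can drop code that depends on nonzero flags.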

mm: pass mm to grab_swap_token · a5c9b696

Authored by Hugh Dickins on Jun 23, 2009

If a kthread happens to use get_user_pages() on an mm (as KSM does),
there's a chance that it will end up trying to read in a swap page, then
oops in grab_swap_token() because the kthread has no mm: GUP passes down
the right mm, so grab_swap_token() ought to be using it.

We have not identified a stronger case than KSM's daemon (not yet in
mainline), but the issue must have come up before, since RHEL has included
a fix for this for years (though a different fix, they just back out of
grab_swap_token if current->mm is unset: which is what we first proposed,
but using the right mm here seems more correct).
Reported-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

a5c9b696

hugetlb: fault flags instead of write_access · 788c7df4

Authored by Hugh Dickins on Jun 23, 2009

handle_mm_fault() is now passing fault flags rather than write_access
down to hugetlb_fault(), so better recognize that in hugetlb_fault(),
and in hugetlb_no_page().
Signed-off-by: Hugh Dickins <hugh.dickins@tiscali.co.uk>
Acked-by: Wu Fengguang <fengguang.wu@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

788c7df4

23 Jun, 2009 15 commits

asm-generic: add dummy pgprot_noncached() · 0634a632

Authored by Paul Mundt on Jun 23, 2009

Most architectures now provide a pgprot_noncached(); the
remaining ones can simply use a dummy default implementation,
except for cris and xtensa, which should override the
default appropriately.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Cc: Jesper Nilsson <jesper.nilsson@axis.com>
Cc: Chris Zankel <chris@zankel.net>
Cc: Magnus Damm <magnus.damm@gmail.com>

0634a632

ipv6: Use correct data types for ICMPv6 type and code · d5fdd6ba

Authored by Brian Haley on Jun 23, 2009

Change all the code that deals directly with ICMPv6 type and code
values to use u8 instead of a signed int as that's the actual data
type.
Signed-off-by: Brian Haley <brian.haley@hp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

d5fdd6ba

V4L/DVB (11901): v4l2: Create helper function for bounding and aligning images · b0d3159b

Authored by Trent Piepho on May 30, 2009

Most hardware has limits on minimum and maximum image dimensions and also
requirements about alignment.  For example, image width must be even or a
multiple of four.  Some hardware has requirements that the total image size
(width * height) be a multiple of some power of two.

v4l_bound_align_image() will enforce min and max width and height, power of
two alignment on width and height, and power of two alignment on total
image size.

It uses an efficient algorithm that will try to find the "closest" image
size that meets the requirements.
Signed-off-by: Trent Piepho <xyzzy@speakeasy.org>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

b0d3159b
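
The per-dimension part of what b0d3159b describes (bounding plus power-of-two alignment) can be sketched as below. This is an illustration, not the v4l_bound_align_image() source: `demo_clamp_align()` is a hypothetical helper, `align` is assumed to be the log2 of the required multiple, and the alignment of the total image size is not covered.

```c
/*
 * Round x to the nearest multiple of 2^align, then keep it inside
 * [min, max] while staying aligned.
 */
unsigned int demo_clamp_align(unsigned int x, unsigned int min,
			      unsigned int max, unsigned int align)
{
	unsigned int mask = ~((1u << align) - 1u);	/* keeps only aligned bits */

	if (align)
		x = (x + (1u << (align - 1))) & mask;	/* round to nearest multiple */

	if (x < min)
		x = (min + ~mask) & mask;	/* smallest aligned value >= min */
	else if (x > max)
		x = max & mask;			/* largest aligned value <= max */

	return x;
}
```

For instance, a requested width of 101 with `align` 2 becomes 100, and a width above the maximum is pulled down to the largest multiple of four that still fits.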

V4L/DVB (12125): v4l2: add new s_config subdev ops and v4l2_i2c_new_subdev_cfg/board calls · f0222c7d

Authored by Hans Verkuil on Jun 09, 2009

Add a new s_config core ops call: this is called with the irq and platform
data to be used to initialize the subdev.

Added new v4l2_i2c_new_subdev_cfg and v4l2_i2c_new_subdev_board calls
that allow you to pass these new arguments.

The existing v4l2_i2c_new_subdev functions were modified to also call
s_config.

In the future the existing v4l2_i2c_new_subdev functions will be replaced
by a single v4l2_i2c_new_subdev function similar to v4l2_i2c_new_subdev_cfg
but without the irq and platform_data arguments.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

f0222c7d

V4L/DVB (12108): v4l2-i2c-drv.h: add comment describing when not to use this header. · 719cd4ab

Authored by Hans Verkuil on Jun 14, 2009

Make it very clear that this header should not be used for i2c drivers that
do not need to be compiled for pre-2.6.26 kernels.

As soon as the minimum supported kernel in the v4l-dvb repository becomes
2.6.26 or up, then this header should be removed entirely.
Signed-off-by: Hans Verkuil <hverkuil@xs4all.nl>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

719cd4ab

V4L/DVB (12102): em28xx: add Remote control support for EVGA inDtube · a4c47303

Authored by Devin Heitmueller on Jun 20, 2009

Add an IR profile for the EVGA inDtube remote control (which is an NEC type
remote).
Signed-off-by: Devin Heitmueller <dheitmueller@kernellabs.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

a4c47303

V4L/DVB (12079): gspca_ov519: add support for the ov511 bridge · 1876bb92

Authored by Hans de Goede on Jun 14, 2009

gspca_ov519: add support for the ov511 bridge
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

1876bb92

V4L/DVB (12072): gspca-ov519: add extra controls · 02ab18b0

Authored by Hans de Goede on Jun 14, 2009

This patch adds autobrightness (so that it can
be turned off to make the already present brightness
control work) and light frequency filtering controls.

The lightfreq control needed 2 different entries
in the ctrls array, as the number of options differs
depending on the sensor. One of the 2 entries is always
disabled, of course.
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>

02ab18b0

VFS: Add VFS helper functions for setting up private namespaces · cf8d2c11

Authored by Trond Myklebust on Jun 22, 2009

The purpose of this patch is to improve the remote mount path lookup
support for distributed filesystems such as the NFSv4 client.

When given a mount command of the form "mount server:/foo/bar /mnt", the
NFSv4 client is required to look up the filehandle for "server:/", and
then look up each component of the remote mount path "foo/bar" in order
to find the directory that is actually going to be mounted on /mnt.
Following that remote mount path may involve following symlinks,
crossing server-side mount points and even following referrals to
filesystem volumes on other servers.

Since the standard VFS path lookup code already supports walking paths
that contain all these features (using in-kernel automounts for
following referrals) we would like to be able to reuse that rather than
duplicate the full path traversal functionality in the NFSv4 client code.

This patch therefore defines a VFS helper function create_mnt_ns(), that
sets up a temporary filesystem namespace and attaches a root filesystem to
it. It exports the create_mnt_ns() and put_mnt_ns() functions for use by
filesystem modules.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

cf8d2c11

VFS: Uninline the function put_mnt_ns() · 616511d0

Authored by Trond Myklebust on Jun 22, 2009

In order to allow modules to use it without having to export vfsmount_lock.
Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

616511d0

mm/init: cpu_hotplug_init() must be initialized before SLAB · 31950eb6

Authored by Linus Torvalds on Jun 22, 2009

SLAB uses get/put_online_cpus() which use a mutex which is itself only
initialized when cpu_hotplug_init() is called.  Currently we hang during
boot in SLAB due to doing that too late.

Reported by James Bottomley and Sachin Sant (and possibly others).
Debugged by Benjamin Herrenschmidt.

This just removes the dynamic initialization of the data structures, and
replaces it with a static one, avoiding this dependency entirely, and
removing one unnecessary special initcall.
Tested-by: Sachin Sant <sachinp@in.ibm.com>
Tested-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Tested-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

31950eb6
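
The static-versus-dynamic initialization point in 31950eb6 has a direct userspace analogue. The sketch below uses a POSIX mutex with `PTHREAD_MUTEX_INITIALIZER` in place of the kernel's mutex; the `demo_` names are hypothetical and the refcount is only there to give the lock something to protect.

```c
#include <pthread.h>

/* Statically initialized: usable before any init function has run. */
static pthread_mutex_t demo_hotplug_lock = PTHREAD_MUTEX_INITIALIZER;
static int demo_refcount;

void demo_get_online_cpus(void)
{
	pthread_mutex_lock(&demo_hotplug_lock);
	demo_refcount++;
	pthread_mutex_unlock(&demo_hotplug_lock);
}

void demo_put_online_cpus(void)
{
	pthread_mutex_lock(&demo_hotplug_lock);
	demo_refcount--;
	pthread_mutex_unlock(&demo_hotplug_lock);
}

int demo_online_refcount(void)
{
	return demo_refcount;
}
```

Since the lock is valid from load time, no ordering constraint exists between its first user and a separate init call.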

vfs: Set special lockdep map for dirs only if not set by fs · 9a7aa12f

Authored by Jan Kara on Jun 04, 2009

Some filesystems need to set lockdep map for i_mutex differently for
different directories. For example OCFS2 has system directories (for
orphan inode tracking and for gathering all system files like journal
or quota files into a single place) which have different
locking rules than standard directories. For a filesystem, setting the
lockdep map is naturally done when the inode is read, but we have to
modify unlock_new_inode() not to overwrite the lockdep map the filesystem
has set.

Acked-by: peterz@infradead.org
CC: mingo@redhat.com
Signed-off-by: Jan Kara <jack@suse.cz>
Signed-off-by: Joel Becker <joel.becker@oracle.com>

9a7aa12f

LDSCRIPT: Name INIT_RAM_FS consistently · eadfe219

Authored by David Howells on Jun 22, 2009

In asm-generic/vmlinux.lds.h, name INIT_RAM_FS consistently, no matter the
setting of CONFIG_BLK_DEV_INITRD.  This corrects:

	commit ef53dae8
	Author: Sam Ravnborg <sam@ravnborg.org>
	Date:   Sun Jun 7 20:46:37 2009 +0200
	Subject: Improve vmlinux.lds.h support for arch specific linker scripts
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

eadfe219

msm_serial: serial driver for MSM7K onboard serial peripheral. · 04896a77

Authored by Robert Love on Jun 22, 2009

Signed-off-by: Brian Swetland <swetland@google.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

04896a77

serial: add OMAP wakeup-enable register · 099d5270

Authored by Kevin Hilman on Jun 22, 2009

Add the wakeup enable register to the list of OMAP-specific UART
registers.  This is to support forthcoming OMAP PM enhancements which
use the wakeup feature of the OMAP's 8250-based UART.
Signed-off-by: Kevin Hilman <khilman@deeprootsystems.com>
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

099d5270

22 Jun, 2009 2 commits

dm: prepare for request based option · cec47e3d

Authored by Kiyoshi Ueda on Jun 22, 2009

This patch adds core functions for request-based dm.

When struct mapped device (md) is initialized, md->queue has
an I/O scheduler and the following functions are used for
request-based dm as the queue functions:
    make_request_fn: dm_make_request()
    prep_fn:         dm_prep_fn()
    request_fn:      dm_request_fn()
    softirq_done_fn: dm_softirq_done()
    lld_busy_fn:     dm_lld_busy()
Actual initializations are done in another patch (PATCH 2).

Below is a brief summary of how request-based dm behaves, including:
  - making request from bio
  - cloning, mapping and dispatching request
  - completing request and bio
  - suspending md
  - resuming md

  bio to request
  ==============
  md->queue->make_request_fn() (dm_make_request()) calls __make_request()
  for a bio submitted to the md.
  Then, the bio is kept in the queue as a new request or merged into
  another request in the queue if possible.

  Cloning and Mapping
  ===================
  Cloning and mapping are done in md->queue->request_fn() (dm_request_fn()),
  when requests are dispatched after they are sorted by the I/O scheduler.

  dm_request_fn() checks busy state of underlying devices using
  target's busy() function and stops dispatching requests to keep them
  on the dm device's queue if busy.
  It helps better I/O merging, since no merge is done for a request
  once it is dispatched to underlying devices.

  Actual cloning and mapping are done in dm_prep_fn() and map_request()
  called from dm_request_fn().
  dm_prep_fn() clones not only request but also bios of the request
  so that dm can hold bio completion in error cases and prevent
  the bio submitter from noticing the error.
  (See the "Completion" section below for details.)

  After the cloning, the clone is mapped by target's map_rq() function
  and inserted to underlying device's queue using
  blk_insert_cloned_request().

  Completion
  ==========
  Request completion can be hooked by rq->end_io(), but then, all bios
  in the request will have been completed even in error cases, and the bio
  submitter will have noticed the error.
  To prevent the bio completion in error cases, request-based dm clones
  both bio and request and hooks both bio->bi_end_io() and rq->end_io():
      bio->bi_end_io(): end_clone_bio()
      rq->end_io():     end_clone_request()

  Summary of the request completion flow is below:
  blk_end_request() for a clone request
    => blk_update_request()
       => bio->bi_end_io() == end_clone_bio() for each clone bio
          => Free the clone bio
          => Success: Complete the original bio (blk_update_request())
             Error:   Don't complete the original bio
    => blk_finish_request()
       => rq->end_io() == end_clone_request()
          => blk_complete_request()
             => dm_softirq_done()
                => Free the clone request
                => Success: Complete the original request (blk_end_request())
                   Error:   Requeue the original request

  end_clone_bio() completes the original request on the size of
  the original bio in successful cases.
  Even if all bios in the original request are completed by that
  completion, the original request must not be completed yet to keep
  the ordering of request completion for the stacking.
  So end_clone_bio() uses blk_update_request() instead of
  blk_end_request().
  In error cases, end_clone_bio() doesn't complete the original bio.
  It just frees the cloned bio and gives over the error handling to
  end_clone_request().

  end_clone_request(), which is called with queue lock held, completes
  the clone request and the original request in a softirq context
  (dm_softirq_done()), which has no queue lock, to avoid a deadlock
  issue on submission of another request during the completion:
      - The submitted request may be mapped to the same device
      - Request submission requires queue lock, but the queue lock
        has been held by itself and it doesn't know that

  The clone request has no clone bio when dm_softirq_done() is called.
  So target drivers can't resubmit it again even in error cases.
  Instead, they can ask dm core for requeueing and remapping
  the original request in those cases.

  suspend
  =======
  Request-based dm uses stopping md->queue as suspend of the md.
  For noflush suspend, just stops md->queue.

  For flush suspend, inserts a marker request to the tail of md->queue.
  And dispatches all requests in md->queue until the marker comes to
  the front of md->queue.  Then, stops dispatching request and waits
  for the all dispatched requests to complete.
  After that, completes the marker request, stops md->queue and
  wake up the waiter on the suspend queue, md->wait.

  resume
  ======
  Starts md->queue.
Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

cec47e3d
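
The flush-suspend step described in cec47e3d (dispatch until the marker reaches the front of the queue) can be pictured with a toy queue. This is only a sketch under assumptions: the queue is a plain array of request ids, `DEMO_MARKER` is a hypothetical id for the marker request, and waiting for dispatched requests to complete is not modelled.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_MARKER (-1)	/* hypothetical id of the marker request */

/*
 * Dispatch requests from the head of the queue until the marker
 * reaches the front. Returns how many requests were dispatched.
 */
size_t demo_flush_until_marker(const int *queue, size_t len)
{
	size_t dispatched = 0;

	assert(queue != NULL);
	while (dispatched < len && queue[dispatched] != DEMO_MARKER)
		dispatched++;	/* stands in for dispatching one request */
	return dispatched;
}
```

Everything queued ahead of the marker is sent down; anything that might arrive behind it stays on the queue until resume.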

dm raid1: add userspace log · f5db4af4

Authored by Jonthan Brassow on Jun 22, 2009

This patch contains a device-mapper mirror log module that forwards
requests to userspace for processing.

The structures used for communication between kernel and userspace are
located in include/linux/dm-log-userspace.h.  Due to the frequency,
diversity, and 2-way communication nature of the exchanges between
kernel and userspace, 'connector' was chosen as the interface for
communication.

The first log implementations written in userspace - "clustered-disk"
and "clustered-core" - support clustered shared storage.   A userspace
daemon (in the LVM2 source code repository) uses openAIS/corosync to
process requests in an ordered fashion with the rest of the nodes in the
cluster so as to prevent log state corruption.  Other implementations
with no association to LVM or openAIS/corosync, are certainly possible.

(Imagine if two machines are writing to the same region of a mirror.
They would both mark the region dirty, but you need a cluster-aware
entity that can handle properly marking the region clean when they are
done.  Otherwise, you might clear the region when the first machine is
done, not the second.)
Signed-off-by: Jonathan Brassow <jbrassow@redhat.com>
Cc: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Signed-off-by: Alasdair G Kergon <agk@redhat.com>

f5db4af4
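
The two-machine example in f5db4af4 comes down to not clearing a region until every writer is done. One way to picture that is a per-region writer count, sketched below. This is an assumption made for illustration, not how the userspace log daemon is implemented, and all `demo_` names are hypothetical.

```c
#include <assert.h>

#define DEMO_NR_REGIONS 8

/* Number of machines currently writing to each region. */
static unsigned int demo_writers[DEMO_NR_REGIONS];

void demo_mark_region(unsigned int region)
{
	assert(region < DEMO_NR_REGIONS);
	demo_writers[region]++;
}

void demo_clear_region(unsigned int region)
{
	assert(region < DEMO_NR_REGIONS);
	if (demo_writers[region])
		demo_writers[region]--;	/* one writer finished */
}

int demo_region_is_dirty(unsigned int region)
{
	assert(region < DEMO_NR_REGIONS);
	return demo_writers[region] != 0;	/* clean only when nobody writes */
}
```

After two marks and one clear the region is still dirty, which is the behaviour the parenthetical in the commit message asks for.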

openeuler / Kernel synced successfully over 1 year ago