Commit · d97081b0c7bdb55371994cc6690217bf393eb63e · openeuler / Kernel

22 Mar, 2012 · 30 commits

rbd: move ctl_mutex lock inside rbd_get_client() · d97081b0

Authored by Alex Elder on Jan 29, 2012

Since rbd_get_client() is only called in one place, move the
acquisition of the mutex around that call inside that function.

Furthermore, within rbd_get_client(), it appears the mutex only
needs to be held while calling rbd_client_create().  (Moving
the lock inside that function will wait for the next patch.)
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: release client list lock sooner · e6994d3d

Authored by Alex Elder on Jan 29, 2012

In rbd_get_client(), if a client is reused, a number of things
get done while still holding the list lock unnecessarily.

This just moves a few things that need no lock protection outside
the lock.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: restore previous rbd id sequence behavior · d184f6bf

Authored by Alex Elder on Jan 29, 2012

It used to be that selecting a new unique identifier for an added
rbd device required searching all existing ones to find the highest
id in use.  A recent change made that unnecessary, but made it
so that id's used were monotonically non-decreasing.  It's a bit
more pleasant to have smaller rbd id's though, and this change
makes ids get allocated as they were before--each new id is one more
than the maximum currently in use.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: tie rbd_dev_list changes to rbd_id operations · 499afd5b

Authored by Alex Elder on Feb 02, 2012

The only time entries are added to or removed from the global
rbd_dev_list is exactly when a "put" or "get" operation is being
performed on a rbd_dev's id.  So just move the list management code
into get/put routines.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: protect the rbd_dev_list with a spinlock · e124a82f

Authored by Alex Elder on Jan 29, 2012

The rbd_dev_list is just a simple list of all the current
rbd_devices.  Using the ctl_mutex as a concurrency guard is
overkill.  Instead, use a spinlock for that specific purpose.

This also reduces the window that the ctl_mutex needs to be held in
rbd_add().
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: rework calculation of new rbd id's · 1ddbe94e

Authored by Alex Elder on Jan 29, 2012

In order to select a new unique identifier for an added rbd device,
the list of all existing ones is searched and a value one greater
than the highest id is used.

The list search can be avoided by using an atomic variable that
keeps track of the current highest id.  Using a get/put model for
id's we can limit the boundless growth of id numbers a bit by
arranging to reuse the current highest id once it gets released.
Add these calls to "put" the id when an rbd is getting removed.

Note that this changes the pattern of device id's used--new values
will never be below the highest one seen so far (even if there
exists an unused lower one).  I assert this is OK because the key
property of an rbd id is its uniqueness, not its magnitude.

Regardless, a follow-on patch will restore the old way of doing
things; I think this commit just makes the incremental change
to atomics a little easier to understand.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
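
The get/put model described in this commit message can be sketched in user-space C11 as follows. This is a minimal illustration under stated assumptions, not the driver's code: the names `id_max`, `id_get()` and `id_put()` are hypothetical, and C11 `<stdatomic.h>` stands in for the kernel's atomic primitives.

```c
#include <assert.h>
#include <stdatomic.h>

/* Highest id handed out so far; 0 means none. Hypothetical name. */
static atomic_long id_max;

/* "Get" a new id: one more than the highest id seen so far. */
static long id_get(void)
{
	return atomic_fetch_add(&id_max, 1) + 1;
}

/*
 * "Put" an id.  The counter is rolled back only when the id being
 * released is the current maximum, so that value can be reused.
 * Releasing a lower id leaves the counter untouched.
 */
static void id_put(long id)
{
	long expected = id;

	assert(id > 0);
	/* Succeeds only if id_max == id; otherwise id_max is unchanged. */
	atomic_compare_exchange_strong(&id_max, &expected, id - 1);
}
```

With ids 1 and 2 outstanding, releasing 1 leaves the counter at 2, so the next id is 3. Releasing 3 rolls the counter back, and 3 is handed out again.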

rbd: encapsulate new rbd id selection · b7f23c36

Authored by Alex Elder on Jan 29, 2012

Move the loop that finds a new unique rbd id to use into
its own helper function.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: use a single value of snap_name to mean no snap · cc9d734c

Authored by Josh Durgin on Nov 21, 2011

There's already a constant for this anyway.

Since rbd_header_set_snap() is only used to set the rbd device
snap_name field, just do that within that function rather than
having it take the snap_name as an argument.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

v2: Changed interface rbd_header_set_snap() so it explicitly updates
    the snap_name in the rbd_device.  Also added a BUILD_BUG_ON()
    to verify the size of the snap_name field is sufficient for
    SNAP_HEAD_NAME.

rbd: do not duplicate ceph_client pointer in rbd_device · 1dbb4399

Authored by Alex Elder on Jan 24, 2012

The rbd_device structure maintains a duplicate copy of the
ceph_client pointer maintained in its rbd_client structure.  There
appears to be no good reason for this, and its presence presents a
risk of them getting out of synch or otherwise misused.  So kill it
off, and use the rbd_client copy only.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: make ceph_parse_options() return a pointer · ee57741c

Authored by Alex Elder on Jan 24, 2012

ceph_parse_options() takes the address of a pointer as an argument
and uses it to return the address of an allocated structure if
successful.  With this interface it is not evident at call sites that
the pointer is always initialized.  Change the interface to return
the address instead (or a pointer-coded error code) to make the
validity of the returned pointer obvious.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

rbd: a few small cleanups · 21079786

Authored by Alex Elder on Jan 24, 2012

Some minor cleanups in "drivers/block/rbd.c":
    - Use the more meaningful "RBD_MAX_OBJ_NAME_LEN" in place of "96"
      in the definition of RBD_MAX_MD_NAME_LEN.
    - Use DEFINE_SPINLOCK() to define and initialize node_lock.
    - Drop a needless (char *) cast in parse_rbd_opts_token().
    - Make a few minor formatting changes.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: make ceph_setxattr() and ceph_removexattr() more alike · 18fa8b3f

Authored by Alex Elder on Jan 23, 2012

This patch just rearranges a few bits of code to make more
portions of ceph_setxattr() and ceph_removexattr() identical.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: avoid repeatedly computing the size of constant vxattr names · 3ce6cd12

Authored by Alex Elder on Jan 23, 2012

All names defined in the directory and file virtual extended
attribute tables are constant, and the size of each is known at
compile time.  So there's no need to compute their length every
time any file's attribute is listed.

Record the length of each string and use it when needed to determine
the space needed to represent them.  In addition, compute the
aggregate size of strings in each table just once at initialization
time.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>
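
The idea of recording each name's size at compile time and summing the sizes once can be sketched as below. This is an illustration only: the `struct vxattr` layout, the `VXATTR()` macro, the two table entries and the choice to count the trailing NUL in each size are assumptions made for the example, not taken from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified table entry: a name plus its precomputed size. */
struct vxattr {
	const char *name;
	size_t name_size;	/* bytes in the name, including the trailing NUL */
};

/* sizeof applied to a string literal is evaluated at compile time. */
#define VXATTR(_name)	{ .name = (_name), .name_size = sizeof(_name) }

static_assert(sizeof("ceph.dir.files") == 15, "sizeof counts the trailing NUL");

/* Illustrative entries; a null name ends the table. */
static const struct vxattr dir_vxattrs[] = {
	VXATTR("ceph.dir.entries"),
	VXATTR("ceph.dir.files"),
	{ .name = NULL, .name_size = 0 }
};

/* Aggregate size of all names in the table, computed once. */
static size_t dir_vxattrs_name_size;

/* Run once at initialization instead of calling strlen() on every listing. */
static void vxattrs_init(void)
{
	const struct vxattr *v;

	dir_vxattrs_name_size = 0;
	for (v = dir_vxattrs; v->name; v++)
		dir_vxattrs_name_size += v->name_size;
}
```

After `vxattrs_init()` runs, a listing routine can size its buffer from `dir_vxattrs_name_size` directly rather than walking the table and measuring each string.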

ceph: encode type in vxattr callback routines · aa4066ed

Authored by Alex Elder on Jan 23, 2012

The names of the callback functions used for virtual extended
attributes are based only on the last component of the attribute
name.  Because of the way these are defined, this precludes allowing
a single (lowest) attribute name for different callbacks, dependent
on the type of file being operated on.  (For example, it might be
nice to support both "ceph.dir.layout" and "ceph.file.layout".)

Just change the callback names to avoid this problem.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: drop "_cb" from name of struct ceph_vxattr_cb · 881a5fa2

Authored by Alex Elder on Jan 23, 2012

A struct ceph_vxattr_cb does not represent a callback at all, but
rather a virtual extended attribute itself.  Drop the "_cb" suffix
from its name to reflect that.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use macros to normalize vxattr table definitions · eb788084

Authored by Alex Elder on Jan 23, 2012

Entries in the ceph virtual extended attribute tables all follow a
distinct pattern in their definition.  Enforce this pattern through
the use of a macro.

Also, a null name field signals the end of the table, so make that
be the first field in the ceph_vxattr_cb structure.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use a symbolic name for "ceph." extended attribute namespace · 22891907

Authored by Alex Elder on Jan 23, 2012

Use symbolic constants to define the top-level prefix for "ceph."
extended attribute names.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: pass inode rather than table to ceph_match_vxattr() · 06476a69

Authored by Alex Elder on Jan 23, 2012

All callers of ceph_match_vxattr() determine what to pass as the
first argument by calling ceph_inode_vxattrs(inode).  Just do that
inside ceph_match_vxattr() itself, changing it to take an inode
rather than the vxattr pointer as its first argument.

Also ensure the function works correctly for an empty table (i.e.,
containing only a terminating null entry).
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: don't null-terminate xattr values · b829c195

Authored by Alex Elder on Jan 23, 2012

For some reason, ceph_setxattr() allocates an extra byte in which a
'\0' is stored past the end of an extended attribute value.  This is
not needed, and is potentially misleading, so get rid of it.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: eliminate some abusive casts · 99f0f3b2

Authored by Alex Elder on Jan 23, 2012

This fixes some spots where a type cast to (void *) was used as
a universal type hiding mechanism.  Instead, properly cast the
type to the intended target type.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: eliminate some needless casts · bd406145

Authored by Alex Elder on Jan 23, 2012

This eliminates type casts in some places where they are not
required.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: kill addr_str_lock spinlock; use atomic instead · f64a9317

Authored by Alex Elder on Jan 23, 2012

A spinlock is used to protect a value used for selecting an array
index for a string used for formatting a socket address for human
consumption.  The index is reset to 0 if it ever reaches the maximum
index value.

Instead, use an ever-increasing atomic variable as a sequence
number, and compute the array index by masking off all but the
sequence number's lowest bits.  Make the number of entries in the
array a power of two to allow the use of such a mask (to avoid jumps
in the index value when the sequence number wraps).

The length of these strings is somewhat arbitrarily set at 60 bytes.
The worst-case length of a string produced is 54 bytes, for an IPv6
address that can't be shortened, e.g.:
    [1234:5678:9abc:def0:1111:2222:123.234.210.100]:32767
Change it so we arbitrarily use 64 bytes instead; if nothing else
it will make the array of these line up better in hex dumps.

Rename a few things to reinforce the distinction between the number
of strings in the array and the length of individual strings.
Signed-off-by: Alex Elder <elder@newdream.net>
Signed-off-by: Sage Weil <sage@newdream.net>
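
The lock-free index selection described in this commit message can be sketched in user-space C11 as follows. The macro names, the entry count of 32 and the helper `next_addr_str()` are assumptions chosen for illustration, and `<stdatomic.h>` stands in for the kernel's atomics. Only the 64-byte string length comes from the message itself.

```c
#include <assert.h>
#include <stdatomic.h>

#define ADDR_STR_COUNT_LOG	5	/* log2 of the entry count (assumed) */
#define ADDR_STR_COUNT		(1 << ADDR_STR_COUNT_LOG)
#define ADDR_STR_COUNT_MASK	(ADDR_STR_COUNT - 1)
#define MAX_ADDR_STR_LEN	64	/* length of each individual string */

/* The mask trick below is only valid for a power-of-two entry count. */
static_assert((ADDR_STR_COUNT & ADDR_STR_COUNT_MASK) == 0,
	      "entry count must be a power of two");

static char addr_str[ADDR_STR_COUNT][MAX_ADDR_STR_LEN];
static atomic_uint addr_str_seq;	/* ever-increasing sequence number */

/* Pick the next buffer without taking a lock. */
static char *next_addr_str(void)
{
	unsigned int i;

	/* Keep only the low bits of the sequence number as the array index. */
	i = atomic_fetch_add(&addr_str_seq, 1) & ADDR_STR_COUNT_MASK;
	return addr_str[i];
}
```

Because the unsigned counter wraps modulo a power of two and the array size divides that modulus, the index sequence continues 30, 31, 0, 1, ... across the wrap with no jump.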

ceph: make use of "else" where appropriate · a5bc3129

Authored by Alex Elder on Jan 23, 2012

Rearrange ceph_tcp_connect() a bit, making use of "else" rather than
re-testing a value with consecutive "if" statements.  Don't record a
connection's socket pointer unless the connect operation is
successful.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: use a shared zero page rather than one per messenger · 57666519

Authored by Alex Elder on Jan 23, 2012

Each messenger allocates a page to be used when writing zeroes
out in the event of error or other abnormal condition.  Instead,
use the kernel ZERO_PAGE() for that purpose.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: fix overflow check in build_snap_context() · 80834312

Authored by Xi Wang on Feb 16, 2012

The overflow check for a + n * b should be (n > (ULONG_MAX - a) / b),
rather than (n > ULONG_MAX / b - a).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
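
To see why the two expressions in this commit message differ, here is a small stand-alone sketch. The function names `size_overflows()` and `old_check()` are hypothetical. The old form subtracts `a` after the division, so the subtraction itself wraps around whenever `a > ULONG_MAX / b`, and the check then never fires.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Corrected form: true if a + n * b does not fit in an unsigned long. */
static bool size_overflows(unsigned long a, unsigned long n, unsigned long b)
{
	assert(b != 0);
	/* ULONG_MAX - a cannot wrap, since a <= ULONG_MAX always holds. */
	return n > (ULONG_MAX - a) / b;
}

/* Old form, for comparison: ULONG_MAX / b - a wraps if a > ULONG_MAX / b. */
static bool old_check(unsigned long a, unsigned long n, unsigned long b)
{
	assert(b != 0);
	return n > ULONG_MAX / b - a;
}
```

For example, with `a = 100`, `n = 20` and `b = ULONG_MAX / 10`, the product `n * b` alone overflows. The old form computes `10 - 100`, which wraps to a huge value, and reports no overflow, while the corrected form detects it.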

libceph: fix overflow check in crush_decode() · 64486697

Authored by Xi Wang on Feb 16, 2012

The existing overflow check (n > ULONG_MAX / b) didn't work, because
n = ULONG_MAX / b would both bypass the check and still overflow the
allocation size a + n * b.

The correct check should be (n > (ULONG_MAX - a) / b).
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Signed-off-by: Sage Weil <sage@newdream.net>
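
The boundary case named in this commit message can be checked with a short sketch. The function names `naive_check()` and `size_overflows()` are hypothetical. The old check only bounds the product `n * b` and ignores the constant term `a`.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Old check: guards n * b only, not the full size a + n * b. */
static bool naive_check(unsigned long n, unsigned long b)
{
	assert(b != 0);
	return n > ULONG_MAX / b;
}

/* Corrected check: true if a + n * b does not fit in an unsigned long. */
static bool size_overflows(unsigned long a, unsigned long n, unsigned long b)
{
	assert(b != 0);
	return n > (ULONG_MAX - a) / b;
}
```

Take `b = 16` and `n = ULONG_MAX / 16`. The product `n * b` equals `ULONG_MAX - 15`, so the old check passes, yet adding `a = 16` exceeds `ULONG_MAX`. The corrected check rejects that case while still accepting `a = 15`, where the sum is exactly `ULONG_MAX`.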

ceph: avoid panic with mismatched symlink sizes in fill_inode() · 810339ec

Authored by Xi Wang on Feb 03, 2012

Return -EINVAL rather than panic if iinfo->symlink_len and inode->i_size
do not match.

Also use kstrndup rather than kmalloc/memcpy.
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>

ceph: use 2 instead of 1 as fallback for 32-bit inode number · a661fc56

Authored by Amon Ott on Jan 23, 2012

The root directory of the Ceph mount has inode number 1, so falling back
to 1 always creates a collision. 2 is unused on my test systems and seems
less likely to collide.
Signed-off-by: Amon Ott <ao@m-privacy.de>
Signed-off-by: Sage Weil <sage@newdream.net>

ceph: don't reset s_cap_ttl to zero · 1ce208a6

Authored by Alex Elder on Jan 12, 2012

Avoid the need to check for a special zero s_cap_ttl value by just
using (jiffies - 1) as the value assigned to indicate "sometime in
the past."
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

net/ceph: Only clear SOCK_NOSPACE when there is sufficient space in the socket buffer · 182fac26

Authored by Jim Schutt on Feb 29, 2012

The Ceph messenger would sometimes queue multiple work items to write
data to a socket when the socket buffer was full.

Fix this problem by making ceph_write_space() use SOCK_NOSPACE in the
same way that net/core/stream.c:sk_stream_write_space() does, i.e.,
clearing it only when sufficient space is available in the socket buffer.
Signed-off-by: Jim Schutt <jaschut@sandia.gov>
Reviewed-by: Alex Elder <elder@dreamhost.com>

19 Mar, 2012 · 2 commits

Linux 3.3 · c16fa4f2

Authored by Linus Torvalds on Mar 18, 2012

Don't limit non-nested epoll paths · 93dc6107

Authored by Jason Baron on Mar 16, 2012

Commit 28d82dc1 ("epoll: limit paths") that I did to limit the
number of possible wakeup paths in epoll is causing a few applications
to no longer work (dovecot for one).

The original patch is really about limiting the amount of epoll nesting
(since epoll fds can be attached to other fds). Thus, we probably can
allow an unlimited number of paths of depth 1. My current patch limits
it at 1000. And enforce the limits on paths that have a greater depth.

This is captured in: https://bugzilla.redhat.com/show_bug.cgi?id=681578

Signed-off-by: Jason Baron <jbaron@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

18 Mar, 2012 · 2 commits

Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net · c579bc7e

Authored by Linus Torvalds on Mar 17, 2012

Pull networking changes from David Miller:
 "1) icmp6_dst_alloc() returns NULL instead of ERR_PTR() leading to
     crashes, particularly during shutdown.  Reported by Dave Jones and
     fixed by Eric Dumazet.

  2) hyperv and wimax/i2400m return NETDEV_TX_BUSY when they have
     already freed the SKB, which causes crashes as to the caller this
     means requeue the packet.  Fixes from Eric Dumazet.

  3) usbnet driver doesn't allocate the right amount of headroom on
     fresh RX SKBs, fix from Eric Dumazet.

  4) Fix regression in ip6_mc_find_dev_rcu(), as an RCU lookup it
     absolutely should not take a reference to 'dev', this leads to
     leaks.  Fix from RongQing Li.

  5) Fix netfilter ctnetlink race between delete and timeout expiration.
     From Pablo Neira Ayuso.

  6) Revert SFQ change which causes regressions, specifically queueing
     to tail can lead to unavoidable flow starvation.  From Eric
     Dumazet.

  7) Fix a memory leak and a crash on corrupt firmware files in bnx2x,
     from Michal Schmidt."

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net:
  netfilter: ctnetlink: fix race between delete and timeout expiration
  ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu.
  wimax/i2400m: fix erroneous NETDEV_TX_BUSY use
  net/hyperv: fix erroneous NETDEV_TX_BUSY use
  net/usbnet: reserve headroom on rx skbs
  bnx2x: fix memory leak in bnx2x_init_firmware()
  bnx2x: fix a crash on corrupt firmware file
  sch_sfq: revert dont put new flow at the end of flows
  ipv6: fix icmp6_dst_alloc()

Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip · 96ee0499

Authored by Linus Torvalds on Mar 17, 2012

Pull perf fixes from Ingo Molnar.

* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  perf tools, x86: Build perf on older user-space as well
  perf tools: Use scnprintf where applicable
  perf tools: Incorrect use of snprintf results in SEGV

17 Mar, 2012 · 6 commits

netfilter: ctnetlink: fix race between delete and timeout expiration · a16a1647

Authored by Pablo Neira Ayuso on Mar 16, 2012

Kerin Millar reported hardlockups while running `conntrackd -c'
in a busy firewall. That system (with several processors) was
acting as backup in a primary-backup setup.

After several tries, I found a race condition between the deletion
operation of ctnetlink and timeout expiration. This patch fixes
this problem.
Tested-by: Kerin Millar <kerframil@gmail.com>
Reported-by: Kerin Millar <kerframil@gmail.com>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
Signed-off-by: David S. Miller <davem@davemloft.net>

ipv6: Don't dev_hold(dev) in ip6_mc_find_dev_rcu. · c5779237

Authored by RongQing.Li on Mar 15, 2012

ip6_mc_find_dev_rcu() is called with rcu_read_lock() held, so there is
no need to call dev_hold().
Calling dev_hold() with no corresponding dev_put() leads to a leak.

[ bug introduced in 96b52e61 (ipv6: mcast: RCU conversions) ]
Signed-off-by: RongQing.Li <roy.qing.li@gmail.com>
Acked-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>

Merge branch 'akpm' (more patches from Andrew) · cb1ecf25

Authored by Linus Torvalds on Mar 16, 2012

Merge some more email patches from Andrew Morton:
 "A couple of nilfs fixes"

* emailed from Andrew Morton <akpm@linux-foundation.org>:
  nilfs2: fix NULL pointer dereference in nilfs_load_super_block()
  nilfs2: clamp ns_r_segments_percentage to [1, 99]

nilfs2: fix NULL pointer dereference in nilfs_load_super_block() · d7178c79

Authored by Ryusuke Konishi on Mar 16, 2012

According to the report from Slicky Devil, nilfs caused kernel oops at
nilfs_load_super_block function during mount after he shrank the
partition without resizing the filesystem:

 BUG: unable to handle kernel NULL pointer dereference at 00000048
 IP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2]
 *pde = 00000000
 Oops: 0000 [#1] PREEMPT SMP
 ...
 Call Trace:
  [<d0d7a87b>] init_nilfs+0x4b/0x2e0 [nilfs2]
  [<d0d6f707>] nilfs_mount+0x447/0x5b0 [nilfs2]
  [<c0226636>] mount_fs+0x36/0x180
  [<c023d961>] vfs_kern_mount+0x51/0xa0
  [<c023ddae>] do_kern_mount+0x3e/0xe0
  [<c023f189>] do_mount+0x169/0x700
  [<c023fa9b>] sys_mount+0x6b/0xa0
  [<c04abd1f>] sysenter_do_call+0x12/0x28
 Code: 53 18 8b 43 20 89 4b 18 8b 4b 24 89 53 1c 89 43 24 89 4b 20 8b 43
 20 c7 43 2c 00 00 00 00 23 75 e8 8b 50 68 89 53 28 8b 54 b3 20 <8b> 72
 48 8b 7a 4c 8b 55 08 89 b3 84 00 00 00 89 bb 88 00 00 00
 EIP: [<d0d7a08e>] nilfs_load_super_block+0x17e/0x280 [nilfs2] SS:ESP 0068:ca9bbdcc
 CR2: 0000000000000048

This turned out to be due to a defect in an error path which runs if the
calculated location of the secondary super block was invalid.

This patch fixes it and eliminates the reported oops.
Reported-by: Slicky Devil <slicky.dvl@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Tested-by: Slicky Devil <slicky.dvl@gmail.com>
Cc: <stable@vger.kernel.org>	[2.6.30+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

nilfs2: clamp ns_r_segments_percentage to [1, 99] · 3d777a64

Authored by Haogang Chen on Mar 16, 2012

ns_r_segments_percentage is read from the disk.  Bogus or malicious
value could cause integer overflow and malfunction due to meaningless
disk usage calculation.  This patch reports error when mounting such
bogus volumes.
Signed-off-by: Haogang Chen <haogangchen@gmail.com>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security · 33e9ee8d

Authored by Linus Torvalds on Mar 16, 2012

Pull maintainer update from James Morris:
 "Please pull this patch which adds Serge as maintainer of the
  capabilities code, as discussed on lwn and the lsm list.

  New capabilities must be signed off by the maintainer, and new uses of
  any capabilities should at least be cc'd to the maintainer."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security:
  MAINTAINERS: Add Serge as maintainer of capabilities
