Commits · fd51653f78cf40a0516e521b6de22f329c5bad8d · openanolis / cloud-kernel

15 May, 2012 · 13 commits

ceph: messenger: change read_partial() to take "end" arg · fd51653f

Committed by Alex Elder on May 10, 2012

Make the second argument to read_partial() be the ending input byte
position rather than the beginning offset it now represents.  This
amounts to moving the addition "to + size" into the caller.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>

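The read_partial() contract that results from this commit and from e6cee71f can be sketched in userspace C. This is a minimal sketch under stated assumptions: a mock byte stream stands in for the connection socket, and the names mock_stream, mock_recv, struct hello and read_hello are hypothetical. The input position is passed explicitly as in_base_pos rather than living in the connection structure. The sketch shows the two changes together: the function receives the ending byte position `end`, and the caller does the "to + size" step itself by advancing a running `end` before the next field.

```c
#include <string.h>

/* Hypothetical stand-in for the socket: hands out at most `chunk` bytes
 * per call (like a short TCP read) and only `len` bytes in total so far. */
struct mock_stream {
	const char *data;
	size_t len;
	size_t pos;
	size_t chunk;
};

static int mock_recv(struct mock_stream *s, void *buf, size_t want)
{
	size_t n = s->len - s->pos;

	if (n > want)
		n = want;
	if (n > s->chunk)
		n = s->chunk;
	memcpy(buf, s->data + s->pos, n);
	s->pos += n;
	return (int)n;   /* 0 means "nothing available yet" */
}

/*
 * Fill `object` (of `size` bytes) until *in_base_pos reaches `end`, the
 * input position at which the object is complete.  Returns 1 when done,
 * or the non-positive receive result if the stream stalls; calling again
 * later resumes where it left off.
 */
static int read_partial(struct mock_stream *s, int *in_base_pos,
			int end, int size, void *object)
{
	while (*in_base_pos < end) {
		int left = end - *in_base_pos;  /* bytes still missing */
		int have = size - left;         /* bytes already stored */
		int ret = mock_recv(s, (char *)object + have, (size_t)left);

		if (ret <= 0)
			return ret;
		*in_base_pos += ret;
	}
	return 1;
}

/* Hypothetical two-field record read back to back from the stream. */
struct hello {
	char banner[4];
	char addr[4];
};

/* The caller keeps a running `end` and advances it only when it is used. */
static int read_hello(struct mock_stream *s, int *in_base_pos,
		      struct hello *h)
{
	int size, end, ret;

	size = sizeof(h->banner);
	end = size;
	ret = read_partial(s, in_base_pos, end, size, h->banner);
	if (ret <= 0)
		return ret;

	size = sizeof(h->addr);
	end += size;
	return read_partial(s, in_base_pos, end, size, h->addr);
}
```

Because the only persistent state is the input position, read_hello() can be called again after a stall. Fields that are already complete are skipped, and the partially read field continues at offset `have`.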

ceph: messenger: update "to" in read_partial() caller · e6cee71f

Committed by Alex Elder on May 10, 2012

read_partial() always increases whatever "to" value is supplied by
adding the requested size to it, and that's the only thing it does
with that pointed-to value.

Do that pointer advance in the caller (and then only when the
updated value will be subsequently used), and change the "to"
parameter to be an in-only and non-pointer value.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>


ceph: messenger: use read_partial() in read_partial_message() · 57dac9d1

Committed by Alex Elder on May 10, 2012

There are two blocks of code in read_partial_message()--those that
read the header and footer of the message--that can be replaced by a
call to read_partial().  Do that.
Signed-off-by: Alex Elder <elder@inktank.com>
Reviewed-by: Sage Weil <sage@inktank.com>


rbd: correct sysfs snap attribute documentation · b7f6519e

Committed by Josh Durgin on Dec 01, 2011

Each attribute is prefixed with "snap_".
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>


rbd: rename __rbd_update_snaps to __rbd_refresh_header · 263c6ca0

Committed by Josh Durgin on Dec 05, 2011

This function rereads the entire header and handles any changes in
it, not just changes in snapshots.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>


rbd: fix snapshot size type · 3591538f

Committed by Josh Durgin on Dec 05, 2011

Snapshot sizes should be the same type as regular image sizes. This
only affects their displayed size in sysfs, not the reported size of
an actual block device.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>


rbd: remove conditional snapid parameters · b06e6a6b

Committed by Josh Durgin on Nov 21, 2011

The snapid parameters passed to rbd_do_op() and rbd_req_sync_op()
are now always either a valid snapid or an explicit CEPH_NOSNAP.

[elder@dreamhost.com: Rephrased the description]
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>


rbd: store snapshot id instead of index · 77dfe99f

Committed by Josh Durgin on Nov 21, 2011

When a device was open at a snapshot, and snapshots were deleted or
added, data from the wrong snapshot could be read. Instead of
assuming the snap context is constant, store the actual snap id when
the device is initialized, and rely on the OSDs to signal an error
if we try reading from a snapshot that was deleted.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>


rbd: protect read of snapshot sequence number · 403f24d3

Committed by Josh Durgin on Dec 05, 2011

The snapshot sequence number is updated whenever a snapshot is added
or deleted, and the snapc pointer is changed with every refresh of
the header.
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Yehuda Sadeh <yehuda@hq.newdream.net>


rbd: fix integer overflow in rbd_header_from_disk() · 50f7c4c9

Committed by Xi Wang on Apr 20, 2012

ondisk->snap_count is read from disk via rbd_req_sync_read() and thus
needs validation.  Otherwise, a bogus `snap_count' could overflow the
kmalloc() size, leading to memory corruption.

Also use `u32' consistently for `snap_count'.

[elder@dreamhost.com: changed to use UINT_MAX rather than ULONG_MAX]
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>

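The validation this commit describes can be sketched as follows. This is a minimal sketch, assuming a simplified snapshot-context layout: struct snap_context and alloc_snapc() are hypothetical and not the rbd code. Because snap_count comes from disk, the bound is checked before computing header + snap_count * 8, so the total can never exceed UINT_MAX.

```c
#include <errno.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical layout: a fixed header followed by snap_count 64-bit ids. */
struct snap_context {
	uint64_t seq;
	uint32_t num_snaps;
	uint64_t snaps[];
};

static int alloc_snapc(uint32_t snap_count, struct snap_context **out)
{
	size_t size;

	*out = NULL;
	/* snap_count is untrusted: reject any value for which the total
	 * allocation size below would exceed UINT_MAX. */
	if (snap_count > (UINT_MAX - sizeof(struct snap_context)) /
			 sizeof(uint64_t))
		return -EINVAL;

	size = sizeof(struct snap_context) + snap_count * sizeof(uint64_t);
	*out = calloc(1, size);
	if (!*out)
		return -ENOMEM;
	(*out)->num_snaps = snap_count;
	return 0;
}
```

Dividing the headroom by the element size, instead of multiplying the count, keeps the check itself from overflowing.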

rbd: use gfp_flags parameter in rbd_header_from_disk() · f8ad495a

Committed by Dan Carpenter on Apr 20, 2012

We should use the gfp_flags that the caller specified instead of
GFP_KERNEL here.

There is only one caller and it uses GFP_KERNEL, so this change is
just a cleanup and doesn't change how the code works.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>


ceph: fix bounds check in ceph_decode_need and ceph_encode_need · 76aa542f

Committed by Xi Wang on Apr 20, 2012

Given a large n, the bounds check (*p + n > end) can be bypassed due to
pointer wraparound.  A safer check is (n > end - *p).

[elder@dreamhost.com: inverted test and renamed ceph_has_room()]
Signed-off-by: Xi Wang <xi.wang@gmail.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>

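The safer check (n > end - *p) can be illustrated with a small helper modelled on the ceph_has_room() the commit introduces. The name has_room and its exact signature here are assumptions for illustration.

```c
#include <stddef.h>

/*
 * Wraparound-safe bounds check.  The old form, `*p + n > end`, first
 * builds the pointer *p + n, which for a huge n can wrap past the top of
 * the address space and then compare as "small".  Comparing n with the
 * number of bytes remaining never forms an out-of-range pointer.
 */
static int has_room(void **p, void *end, size_t n)
{
	char *cur = *p;
	char *stop = end;

	return stop >= cur && n <= (size_t)(stop - cur);
}
```

A decode helper would call this before consuming n bytes, and fail with an error instead of reading past `end`.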

ceph: osd_client: fix endianness bug in osd_req_encode_op() · 065a68f9

Committed by Alex Elder on Apr 20, 2012

From Al Viro <viro@zeniv.linux.org.uk>

Al Viro noticed that we were using a non-cpu-encoded value in
a switch statement in osd_req_encode_op().  The result would
clearly not work correctly on a big-endian machine.
Signed-off-by: Alex Elder <elder@dreamhost.com>

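This sketch illustrates the class of bug fixed here, assuming the op code arrives on the wire as two little-endian bytes. The op values and helper names are illustrative only and are not the real CEPH_OSD_OP_* definitions. The switch must operate on the value after conversion to CPU byte order (le16_to_cpu() in the kernel). Switching on the raw encoded value happens to work on little-endian CPUs and silently misses every case on big-endian ones.

```c
#include <stdint.h>

/* Illustrative op codes, not the real protocol values. */
enum { OP_READ = 0x1201, OP_WRITE = 0x2201 };

/* Portable little-endian decode, standing in for le16_to_cpu(). */
static uint16_t le16_decode(const unsigned char b[2])
{
	return (uint16_t)(b[0] | (b[1] << 8));
}

static const char *op_name(const unsigned char wire_op[2])
{
	/* Switch on the CPU-order value, never on the raw wire bytes. */
	switch (le16_decode(wire_op)) {
	case OP_READ:
		return "read";
	case OP_WRITE:
		return "write";
	default:
		return "unknown";
	}
}
```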

08 May, 2012 · 11 commits

crush: warn on do_rule failure · 8b393269

Committed by Sage Weil on May 07, 2012

If we get an error code from crush_do_rule(), print an error to the
console.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: fix memory leak when destroying tree buckets · 6eb43f4b

Committed by Sage Weil on May 07, 2012

Reflects ceph.git commit 46d63d98434b3bc9dad2fc9ab23cbaedc3bcb0e4.
Reported-by: Alexander Lyakas <alex.bolshoy@gmail.com>
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: fix tree node weight lookup · f671d4cd

Committed by Sage Weil on May 07, 2012

Fix the node weight lookup for tree buckets by using a correct accessor.

Reflects ceph.git commit d287ade5bcbdca82a3aef145b92924cf1e856733.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: remove parent maps · fc7c3ae5

Committed by Sage Weil on May 07, 2012

These were used for the ill-fated forcefeed feature.  Remove them.

Reflects ceph.git commit ebdf80edfecfbd5a842b71fbe5732857994380c1.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: remove forcefeed functionality · 41ebcc09

Committed by Sage Weil on May 07, 2012

Remove forcefeed functionality from CRUSH.  This is an ugly misfeature that
is mostly useless and unused.  Remove it.

Reflects ceph.git commit ed974b5000f2851207d860a651809af4a1867942.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>

Conflicts:

	net/ceph/crush/mapper.c


crush: use a temporary variable to simplify crush_do_rule · 0668216e

Committed by Sage Weil on May 07, 2012

Use a temporary variable here to avoid repeated array lookups and clean up
the code a bit.

This reflects ceph.git commit 6b5be27634ad307b471a5bf0db85c4f5c834885f.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: be more tolerant of nonsensical crush maps · a1f4895b

Committed by Sage Weil on May 07, 2012

If we get a map that doesn't make sense, error out or ignore the badness
instead of BUGging out.  This reflects the ceph.git commits
9895f0bff7dc68e9b49b572613d242315fb11b6c and
8ded26472058d5205803f244c2f33cb6cb10de79.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: adjust local retry threshold · c90f95ed

Committed by Sage Weil on May 07, 2012

This small adjustment reflects a change that was made in ceph.git commit
af6a9f30696c900a2a8bd7ae24e8ed15fb4964bb, about 6 months ago.  An N-1
search is not exhaustive.  Fixed ceph.git bug #1594.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


crush: clean up types, const-ness · 8b12d47b

Committed by Sage Weil on May 07, 2012

Move various types from int -> __u32 (or similar), and add const as
appropriate.

This reflects changes that have been present in the userland implementation
for some time.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


ceph: refactor SETLAYOUT and SETDIRLAYOUT ioctl checks into common helper · e49bf4c5

Committed by Sage Weil on May 07, 2012

Both of these methods perform similar checks; move that code to a helper
so that we can ensure the checks are consistent.
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


ceph: drop support for preferred_osd pgs · 3469ac1a

Committed by Sage Weil on May 07, 2012

This was an ill-conceived feature that has been removed from Ceph.  Do
this gracefully:

 - reject attempts to specify a preferred_osd via the ioctl
 - stop exposing this information via virtual xattrs
 - always fill in -1 for requests, in case we talk to an older server
 - don't calculate preferred_osd placements/pgids
Reviewed-by: Alex Elder <elder@inktank.com>
Signed-off-by: Sage Weil <sage@inktank.com>


06 Apr, 2012 · 1 commit

rbd: don't hold spinlock during messenger flush · cd9d9f5d

Committed by Alex Elder on Apr 04, 2012

A recent change made updates to the rbd_client_list protected by
a spinlock.  Unfortunately, in rbd_put_client() the lock is taken
before possibly dropping the last reference to an rbd_client, and
dropping the last reference eventually calls flush_workqueue(), which
can sleep while the spinlock is still held.

The problem was flagged by a debug spinlock warning:
    BUG: spinlock wrong CPU on CPU#3, rbd/27814

The solution is to move the spinlock acquisition and release inside
rbd_client_release(), which is the spot where it's really needed for
protecting the removal of the rbd_client from the client list.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>


22 Mar, 2012 · 15 commits

rbd: move snap_rwsem to the device, rename to header_rwsem · c666601a

Committed by Josh Durgin on Nov 21, 2011

A new temporary header is allocated each time the header changes, but
only the changed properties are copied over. We don't need a new
semaphore for each header update.

This addresses http://tracker.newdream.net/issues/2174
Signed-off-by: Josh Durgin <josh.durgin@dreamhost.com>
Reviewed-by: Alex Elder <elder@dreamhost.com>


ceph: fix three bugs, two in ceph_vxattrcb_file_layout() · 3489b42a

Committed by Alex Elder on Mar 08, 2012

In ceph_vxattrcb_file_layout(), there is a check to determine
whether a preferred PG should be formatted into the output buffer.
That check assumes that a preferred PG number of 0 indicates "no
preference," but that is wrong.  No preference is indicated by a
negative (specifically, -1) PG number.

In addition, if that condition yields true, the preferred value
is formatted into a sized buffer, but the size consumed by the
earlier snprintf() call is not accounted for, opening up the
possibility of a buffer overrun.

Finally, in ceph_vxattrcb_dir_rctime() the nanoseconds part of the
displayed time did not include leading 0's, which led to erroneous
(sub-second portion of) time values being shown.

This fixes these three issues:
    http://tracker.newdream.net/issues/2155
    http://tracker.newdream.net/issues/2156
    http://tracker.newdream.net/issues/2157
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

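The three fixes can be sketched with simplified formatters. This is a minimal sketch under stated assumptions: format_layout(), format_rctime() and the output strings are hypothetical and are not the exact ceph vxattr formats. "No preference" is tested as a negative value, so 0 counts as a real preference. The second snprintf() starts where the first one stopped and is limited to the space that is left. Nanoseconds are zero-padded to nine digits.

```c
#include <stdio.h>

/* Hypothetical layout formatter. */
static int format_layout(char *val, size_t size, unsigned int stripe_unit,
			 int preferred)
{
	int ret = snprintf(val, size, "stripe_unit=%u", stripe_unit);

	/* preferred < 0 (e.g. -1) means "no preference"; 0 is a valid value.
	 * Append at val + ret and pass only the remaining space, size - ret. */
	if (preferred >= 0 && ret >= 0 && (size_t)ret < size)
		ret += snprintf(val + ret, size - ret, " preferred=%d",
				preferred);
	return ret;
}

/* 12 s + 5 ns must print as "12.000000005", not "12.5". */
static int format_rctime(char *val, size_t size, long sec, long nsec)
{
	return snprintf(val, size, "%ld.%09ld", sec, nsec);
}
```

If the buffer is too small, the second call truncates safely and still NUL-terminates within `size`, which the buggy version could not guarantee.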

libceph: isolate kmap() call in write_partial_msg_pages() · 8d63e318

Committed by Alex Elder on Mar 07, 2012

In write_partial_msg_pages(), every case now does an identical call
to kmap(page).  Instead, just call it once inside the CRC-computing
block where it's needed.  Move the definition of kaddr inside that
block, and make it a (char *) to ensure portable pointer arithmetic.

We still don't kunmap() it until after the sendpage() call, in case
that also ends up needing to use the mapping.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>


libceph: rename "page_shift" variable to something sensible · 9bd19663

Committed by Alex Elder on Mar 07, 2012

In write_partial_msg_pages() there is a local variable used to
track the starting offset within a bio segment to use.  Its name,
"page_shift", defies the Linux convention of using that name for
log-base-2(page size).

Since it's only used in the bio case rename it "bio_offset".  Use it
along with the page_pos field to compute the memory offset when
computing CRC's in that function.  This makes the bio case match the
others more closely.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>


libceph: get rid of zero_page_address · 0cdf9e60

Committed by Alex Elder on Mar 07, 2012

There's not a lot of benefit to zero_page_address, which basically
holds a mapping of the zero page through the life of the messenger
module.  Even with our own mapping, the sendpage interface where
it's used may need to kmap() it again.  It's almost certain to
be in low memory anyway.

So stop treating the zero page specially in write_partial_msg_pages()
and just get rid of zero_page_address entirely.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>


libceph: only call kernel_sendpage() via helper · e36b13cc

Committed by Alex Elder on Mar 07, 2012

Make ceph_tcp_sendpage() be the only place kernel_sendpage() is
used, by using this helper in write_partial_msg_pages().
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>


libceph: use kernel_sendpage() for sending zeroes · 31739139

Committed by Alex Elder on Mar 07, 2012

If a message queued for send gets revoked, zeroes are sent over the
wire instead of any unsent data.  This is done by constructing a
message and passing it to kernel_sendmsg() via ceph_tcp_sendmsg().

Since we are already working with a page in this case we can use
the sendpage interface instead.  Create a new ceph_tcp_sendpage()
helper that sets up flags to match the way ceph_tcp_sendmsg()
does now.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>


libceph: fix inverted crc option logic · 37675b0f

Committed by Alex Elder on Mar 07, 2012

CRC's are computed for all messages between ceph entities. The CRC
computation for the data portion of a message can optionally be
disabled using the "nocrc" (common) ceph option. The default is
for CRC computation for the data portion to be enabled.

Unfortunately, the code that implements this feature interprets the
feature flag wrong, meaning that by default the CRC's have *not*
been computed (or checked) for the data portion of messages unless
the "nocrc" option was supplied.

Fix this, in write_partial_msg_pages() and read_partial_message().
Also change the flag variable in write_partial_msg_pages() to be
"no_datacrc" to match the usage elsewhere in the file.

This fixes http://tracker.newdream.net/issues/2064
Signed-off-by: Alex Elder <elder@dreamhost.com>
Reviewed-by: Sage Weil <sage@newdream.net>

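The inversion can be reduced to a few lines. OPT_NOCRC is a hypothetical bit standing in for the "nocrc" option flag, and both functions are illustrative rather than the messenger code. The fixed version follows the commit's naming: it reads the flag into "no_datacrc", which says what the flag means, and inverts exactly once.

```c
#include <stdbool.h>

/* Hypothetical bit standing in for the "nocrc" option flag. */
#define OPT_NOCRC 0x1u

/* Before the fix: wrong sense, so data CRCs were computed (and checked)
 * only when "nocrc" had been supplied. */
static bool do_datacrc_buggy(unsigned int opts)
{
	return (opts & OPT_NOCRC) != 0;
}

/* After the fix: data CRCs are on by default; "nocrc" turns them off. */
static bool do_datacrc(unsigned int opts)
{
	bool no_datacrc = (opts & OPT_NOCRC) != 0;

	return !no_datacrc;
}
```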

libceph: some simple changes · 84495f49

Committed by Alex Elder on Feb 15, 2012

Nothing too big here.
    - define the size of the buffer used for consuming ignored
      incoming data using a symbolic constant
    - simplify the condition determining whether to unmap the page
      in write_partial_msg_pages(): do it for crc but not if the
      page is the zero page
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>


libceph: small refactor in write_partial_kvec() · f42299e6

Committed by Alex Elder on Feb 15, 2012

Make a small change in the code that counts down kvecs consumed by
a ceph_tcp_sendmsg() call.  Same functionality, just blocked out
a little differently.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>


libceph: do crc calculations outside loop · fe3ad593

Committed by Alex Elder on Feb 15, 2012

Move blocks of code out of loops in read_partial_message_section()
and read_partial_message().  They were only getting called at
the end of the last iteration of the loop anyway.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>


libceph: separate CRC calculation from byte swapping · a9a0c51a

Committed by Alex Elder on Feb 15, 2012

Calculate CRC in a separate step from rearranging the byte order
of the result, to improve clarity and readability.

Use offsetof() to determine the number of bytes to include in the
CRC calculation.

In read_partial_message(), switch which value gets byte-swapped,
since the just-computed CRC is already likely to be in a register.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

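The two ideas in this commit, using offsetof() to bound the CRC input and keeping byte swapping as a separate step, can be sketched as follows. Assumptions: struct msg_header is a hypothetical header and not the real ceph wire layout. The CRC shown is the standard bitwise CRC-32C (reflected polynomial 0x82F63B78 with initial and final inversion), which is not necessarily bit-for-bit what libceph computes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical wire header: the crc field covers every byte before it. */
struct msg_header {
	uint64_t seq;
	uint32_t front_len;
	uint32_t data_len;
	uint32_t crc;
};

/* Standard bitwise CRC-32C (Castagnoli), for illustration only. */
static uint32_t crc32c(uint32_t crc, const void *buf, size_t len)
{
	const unsigned char *p = buf;

	crc = ~crc;
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; k++)
			crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
	}
	return ~crc;
}

/* Step 1: compute.  offsetof() yields exactly the bytes preceding crc. */
static uint32_t header_crc(const struct msg_header *hdr)
{
	return crc32c(0, hdr, offsetof(struct msg_header, crc));
}

/* Step 2, kept separate: store the value in little-endian wire order. */
static void put_le32(unsigned char out[4], uint32_t v)
{
	out[0] = (unsigned char)v;
	out[1] = (unsigned char)(v >> 8);
	out[2] = (unsigned char)(v >> 16);
	out[3] = (unsigned char)(v >> 24);
}
```

Because the length comes from offsetof(), adding a field before crc automatically extends the covered range. A hand-written byte count would silently go stale.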

libceph: use "do" in CRC-related Boolean variables · bca064d2

Committed by Alex Elder on Feb 15, 2012

Change the name (and type) of a few CRC-related Boolean local
variables so they contain the word "do", to distinguish their purpose
from variables used for holding an actual CRC value.

Note that in the process of doing this I identified a fairly serious
logic error in write_partial_msg_pages(): the value of "do_crc"
assigned appears to be the opposite of what it should be. No
attempt to fix this is made here; this change preserves the
erroneous behavior. The problem I found is documented here:
http://tracker.newdream.net/issues/2064
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>


ceph: ensure Boolean options support both senses · cffaba15

Committed by Alex Elder on Feb 15, 2012

Many ceph-related Boolean options offer the ability to both enable
and disable a feature.  For all those that don't offer this, add
a new option so that they do.

Note that ceph_show_options()--which reports mount options currently
in effect--only reports the option if it is different from the
default value.
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

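One way to give every Boolean option both senses is a single table entry per option plus "no"-prefix parsing. This sketch rests on several assumptions. The table, parse_bool_opt() and should_show() are hypothetical. Only "crc"/"nocrc" comes from this log, and the "share" entry is illustrative. The show rule mirrors the text above: report an option only when it differs from its default.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical table: each Boolean option accepts "name" and "noname". */
struct bool_opt {
	const char *name;
	unsigned int bit;
	bool def;          /* value when the option is not given */
};

static const struct bool_opt bool_opts[] = {
	{ "crc",   0x1u, true },
	{ "share", 0x2u, true },   /* illustrative entry */
};

#define NR_BOOL_OPTS (sizeof(bool_opts) / sizeof(bool_opts[0]))

/* Parse one token in either sense; returns 0 on success, -1 if unknown. */
static int parse_bool_opt(const char *tok, unsigned int *flags)
{
	bool on = true;

	if (strncmp(tok, "no", 2) == 0) {
		on = false;
		tok += 2;
	}
	for (size_t i = 0; i < NR_BOOL_OPTS; i++) {
		if (strcmp(tok, bool_opts[i].name) != 0)
			continue;
		if (on)
			*flags |= bool_opts[i].bit;
		else
			*flags &= ~bool_opts[i].bit;
		return 0;
	}
	return -1;
}

/* Report a flag only when its current value differs from the default. */
static bool should_show(unsigned int flags, const struct bool_opt *o)
{
	return ((flags & o->bit) != 0) != o->def;
}
```

Stripping a "no" prefix keeps the table to one entry per option. A real parser would also have to cope with option names that themselves begin with "no".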

libceph: a few small changes · d3002b97

Committed by Alex Elder on Feb 14, 2012

This gathers a number of very minor changes:
    - use %hu when formatting a socket address's address family
    - null out the ceph_msgr_wq pointer after the queue has been
      destroyed
    - drop a needless cast in ceph_write_space()
    - add a WARN() call in ceph_state_change() in the event an
      unrecognized socket state is encountered
    - rearrange the logic in ceph_con_get() and ceph_con_put() so
      that:
        - the reference counts are only atomically read once
	- the values displayed via dout() calls are known to
	  be meaningful at the time they are formatted
Signed-off-by: Alex Elder <elder@dreamhost.com>
Signed-off-by: Sage Weil <sage@newdream.net>

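The "read the count only once" change to ceph_con_get() and ceph_con_put() can be sketched in userspace with C11 atomics. Assumptions: struct conn, conn_get() and conn_put() are hypothetical stand-ins, and fprintf() stands in for dout(). The logged values come from the single atomic read-modify-write. They therefore describe this call's own transition, unlike a second atomic read, which could observe another thread's update in between.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical connection object with an atomic reference count. */
struct conn {
	atomic_int nref;
};

static struct conn *conn_get(struct conn *c)
{
	/* One atomic RMW; `old` is exactly the count this call replaced. */
	int old = atomic_fetch_add(&c->nref, 1);

	fprintf(stderr, "conn_get %p nref %d -> %d\n", (void *)c, old, old + 1);
	return c;
}

/* Returns true when this put dropped the last reference. */
static bool conn_put(struct conn *c)
{
	int old = atomic_fetch_sub(&c->nref, 1);

	fprintf(stderr, "conn_put %p nref %d -> %d\n", (void *)c, old, old - 1);
	return old == 1;
}
```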
