- 14 June 2016, 1 commit
-
-
Submitted by Lars Ellenberg

The intention was to only suspend IO if some normal bitmap operation is supposed to be locked out, not always. If the bulk operation is flagged as BM_LOCKED_CHANGE_ALLOWED, we do not need to suspend IO.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
- 08 June 2016, 3 commits
-
-
Submitted by Mike Christie

To avoid confusion between REQ_OP_FLUSH, which is handled by request_fn drivers, and upper layers requesting that the block layer perform a flush sequence (possibly together with a WRITE), this patch renames REQ_FLUSH to REQ_PREFLUSH.

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
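A minimal sketch of the distinction (not taken from the patch; issue_cache_flush() is a hypothetical driver helper): an upper layer asks for the flush sequence with REQ_PREFLUSH, while a request_fn driver only ever sees the flush request the block layer generates, identified by REQ_OP_FLUSH.

    /* upper layer: request a preflush (and FUA) around a write */
    bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_PREFLUSH | REQ_FUA);
    submit_bio(bio);

    /* request_fn driver: the generated flush request carries REQ_OP_FLUSH */
    if (req_op(rq) == REQ_OP_FLUSH)
        issue_cache_flush(rq);  /* hypothetical driver helper */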
-
Submitted by Mike Christie

Separate the op from the rq_flag_bits and have drbd set/get the bio op using bio_set_op_attrs()/bio_op().

Signed-off-by: Mike Christie <mchristi@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
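For illustration only (a generic sketch, not drbd code; account_write() is a placeholder): the two helpers keep the operation and the flag bits apart.

    /* set the operation plus any flags in one step ... */
    bio_set_op_attrs(bio, REQ_OP_WRITE, REQ_SYNC);

    /* ... and read the operation back without the flags */
    if (bio_op(bio) == REQ_OP_WRITE)
        account_write(bio);  /* hypothetical accounting helper */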
-
Submitted by Mike Christie

This has callers of submit_bio()/submit_bio_wait() set bio->bi_rw themselves instead of passing it in, which matches generic_make_request() and how we set the other bio fields.

Fixed up fs/ext4/crypto.c

Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
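Sketch of the calling-convention change (illustrative, not lifted from the patch):

    /* before: the rw value was an argument */
    submit_bio(WRITE, bio);

    /* after: the caller sets bi_rw, just as it already sets the other bio fields */
    bio->bi_rw = WRITE;
    submit_bio(bio);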
-
- 11 May 2016, 1 commit
-
-
Submitted by Nicolas Dichtel

Attribute 0 is never used in drbd, so let's use it as a pad attribute in netlink messages. This minimizes the patch. Note that this patch is only compile-tested.

Signed-off-by: Nicolas Dichtel <nicolas.dichtel@6wind.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 13 April 2016, 1 commit
-
-
Submitted by Jens Axboe

Signed-off-by: Jens Axboe <axboe@fb.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
-
- 05 April 2016, 2 commits
-
-
Submitted by Kirill A. Shutemov

Mostly direct substitution, with occasional adjustment or removal of outdated comments.

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Submitted by Kirill A. Shutemov

The PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} macros were introduced a *long* time ago with the promise that one day it would be possible to implement the page cache with bigger chunks than PAGE_SIZE. This promise never materialized, and is unlikely to. We have many places where PAGE_CACHE_SIZE is assumed to be equal to PAGE_SIZE, and it's a constant source of confusion whether PAGE_CACHE_* or PAGE_* constants should be used in a particular case, especially on the border between fs and mm.

Globally switching to PAGE_CACHE_SIZE != PAGE_SIZE would cause too much breakage to be doable. Let's stop pretending that pages in the page cache are special. They are not.

The changes are pretty straightforward:

- <foo> << (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- <foo> >> (PAGE_CACHE_SHIFT - PAGE_SHIFT) -> <foo>;
- PAGE_CACHE_{SIZE,SHIFT,MASK,ALIGN} -> PAGE_{SIZE,SHIFT,MASK,ALIGN};
- page_cache_get() -> get_page();
- page_cache_release() -> put_page();

This patch contains automated changes generated with coccinelle using the script below. For some reason, coccinelle doesn't patch header files, so I've called spatch for them manually. The only adjustment after coccinelle is a revert of the changes to the PAGE_CACHE_ALIGN definition: we are going to drop it later. There are a few places in the code that coccinelle didn't reach; I'll fix them manually in a separate patch. Comments and documentation will also be addressed in a separate patch.

virtual patch

@@
expression E;
@@
- E << (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
expression E;
@@
- E >> (PAGE_CACHE_SHIFT - PAGE_SHIFT)
+ E

@@
@@
- PAGE_CACHE_SHIFT
+ PAGE_SHIFT

@@
@@
- PAGE_CACHE_SIZE
+ PAGE_SIZE

@@
@@
- PAGE_CACHE_MASK
+ PAGE_MASK

@@
expression E;
@@
- PAGE_CACHE_ALIGN(E)
+ PAGE_ALIGN(E)

@@
expression E;
@@
- page_cache_get(E)
+ get_page(E)

@@
expression E;
@@
- page_cache_release(E)
+ put_page(E)

Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
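A hand-written before/after pair (illustrative, not part of the patch) showing what the semantic patch does to typical filesystem code:

    /* before */
    pgoff_t index = pos >> PAGE_CACHE_SHIFT;
    unsigned offset = pos & ~PAGE_CACHE_MASK;
    page_cache_get(page);
    /* ... */
    page_cache_release(page);

    /* after */
    pgoff_t index = pos >> PAGE_SHIFT;
    unsigned offset = pos & ~PAGE_MASK;
    get_page(page);
    /* ... */
    put_page(page);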
-
- 27 January 2016, 1 commit
-
-
Submitted by Herbert Xu

This patch replaces uses of the long-obsolete hash interface with either shash (for non-SG users) or ahash.

Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
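For orientation, a minimal sketch of the synchronous shash API (declared in <crypto/hash.h>) for a linear buffer; this is a generic example, not drbd's actual conversion, the algorithm name is a placeholder, and some kernel versions of that era also required clearing desc->flags:

    static int example_digest(const u8 *data, unsigned int len, u8 *out)
    {
        struct crypto_shash *tfm = crypto_alloc_shash("sha1", 0, 0);
        int err;

        if (IS_ERR(tfm))
            return PTR_ERR(tfm);
        {
            /* descriptor sized for this tfm, allocated on the stack */
            SHASH_DESC_ON_STACK(desc, tfm);

            desc->tfm = tfm;
            err = crypto_shash_digest(desc, data, len, out);
        }
        crypto_free_shash(tfm);
        return err;
    }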
-
- 23 January 2016, 2 commits
-
-
Submitted by Tetsuo Handa

There are many locations that do

    if (memory_was_allocated_by_vmalloc)
        vfree(ptr);
    else
        kfree(ptr);

but kvfree() can handle both kmalloc()ed memory and vmalloc()ed memory using is_vmalloc_addr(). Unless callers have special reasons, we can replace this branch with kvfree(). Please check and reply if you found problems.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Acked-by: Michal Hocko <mhocko@suse.com>
Acked-by: Jan Kara <jack@suse.com>
Acked-by: Russell King <rmk+kernel@arm.linux.org.uk>
Reviewed-by: Andreas Dilger <andreas.dilger@intel.com>
Acked-by: "Rafael J. Wysocki" <rjw@rjwysocki.net>
Acked-by: David Rientjes <rientjes@google.com>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Oleg Drokin <oleg.drokin@intel.com>
Cc: Boris Petkov <bp@suse.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
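After the conversion the call site shrinks to a single, allocation-agnostic call (illustrative):

    kvfree(ptr);  /* kvfree() uses is_vmalloc_addr() internally to pick vfree() or kfree() */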
-
Submitted by Al Viro

These are parallel to mutex_{lock,unlock,trylock,is_locked,lock_nested}, with inode_foo(inode) being mutex_foo(&inode->i_mutex). Please use them for access to ->i_mutex; over the coming cycle ->i_mutex will become a rwsem, with ->lookup() done with it held only shared.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
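In other words (a generic sketch of the wrapper usage, not drbd-specific code):

    /* before */
    mutex_lock(&inode->i_mutex);
    /* ... */
    mutex_unlock(&inode->i_mutex);

    /* after: same semantics today, ready for the rwsem conversion later */
    inode_lock(inode);
    /* ... */
    inode_unlock(inode);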
-
- 26 November 2015, 29 commits
-
-
Submitted by Lars Ellenberg

In case the lower-level device size changed, but some other internal details of the resize did not work out, drbd_determine_dev_size() would try to restore the previous settings, trusting drbd_md_set_sector_offsets() to "do the right thing", but overlooked that this may internally set the meta data base offset based on the device size. This could end up with an incomplete on-disk meta data layout change, and ultimately lead to data corruption (if the failure was not noticed or ignored by the operator, and other things go wrong as well). Just remember all meta data related offsets/sizes, and on error restore them all.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

During handshake communication, we also reconsider our device size, using drbd_determine_dev_size(). Just in case we need to change the offsets or layout of our on-disk metadata, we lock out application and other meta data IO, and wait for the activity log to be "idle" (no more referenced extents).

If this handshake happens just after a connection loss, with a fencing policy of "resource-and-stonith", we have frozen IO. If, additionally, the activity log was "starving" (too many incoming random writes at that point in time), it won't ever become idle because of the frozen IO, and this would be a lockup of the receiver thread, and consequently of DRBD.

Previous logic (re-)initialized with a special "empty" transaction block, which required the activity log to fully drain first. Instead, write out some standard activity log transactions. Using lc_try_lock_for_transaction() instead of lc_try_lock() does not care about pending activity log references, avoiding the potential deadlock.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

This makes it possible to "force out" an activity log transaction even if there are no pending updates. It will be used to relocate the on-disk activity log, if the on-disk offsets have to be changed, without the need to empty the activity log first. While at it, move the definition so we can drop the forward declaration of a static helper.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Philipp Reisner

Avoid prematurely resuming application IO: don't set/clear a single bit, but inc/dec an atomic counter.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
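The underlying idea, sketched with illustrative names (suspend_cnt and resume_application_io() are placeholders, not necessarily drbd's identifiers): a single bit cannot express nesting, so the first of two overlapping suspenders to finish would clear the bit and resume IO too early, whereas a counter resumes only when the last one drops its reference.

    atomic_inc(&device->suspend_cnt);               /* enter a suspend section */
    /* ... */
    if (atomic_dec_and_test(&device->suspend_cnt))  /* last one out ... */
        resume_application_io(device);              /* ... resumes IO */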
-
Submitted by Lars Ellenberg

Don't remember a DRBD request as ack_pending if it is not. In protocol A, we usually clear RQ_NET_PENDING at the same time we set RQ_NET_SENT, so when deciding whether to remember it as ack_pending, mod_rq_state() needs to look at the current request state, not at the previous state before the current modification was applied. This should prevent advance_conn_req_ack_pending() from walking the full transfer log just to find NULL in protocol A, which would cause serious performance degradation with many "in-flight" requests, e.g. when working via DRBD-proxy, or with a huge bandwidth-delay product.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Oleg Drokin

new_disk_conf could be leaked if the follow-on checks fail, so make sure to free it on error if it was not assigned yet. Found with smatch.

Signed-off-by: Oleg Drokin <green@linuxhacker.ru>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

Disconnect should wait for pending bitmap IO. But if that bitmap IO is not happening because it is waiting for pending application IO, and there is no progress because the fencing policy suspended application IO due to the disconnect, then we deadlock. The bitmap writeout in this case does not care about concurrent application IO, so there is no point waiting for it.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

lsblk should be able to conveniently pick up stacking device driver relations involving DRBD. Even though the upstream kernel since 2011 says "DON'T USE THIS UNLESS YOU'RE ALREADY USING IT.", a new user has been added since (bcache), which sets the precedent for us to use it as well.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

We cannot possibly support SECDISCARD, even if all backend devices would support it: if our peer is currently unreachable, some instance of the data may obviously still be recoverable.

We did not set discard_granularity at all. We don't really care (yet); we only pass discards on, so for now set our granularity to one sector; blkdev_stack_limits() takes care of the rest.

If we decide we cannot support discards, not only clear the (not user visible) QUEUE_FLAG_DISCARD, but set both (user visible) discard_granularity and max_discard_sectors to zero, to avoid confusion with e.g. lsblk -D.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
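Roughly, the queue-limit handling described above looks like this; an illustrative sketch against generic block-layer helpers of that era, where can_do_discards and the rq_queue field stand in for drbd's actual decision logic and wiring:

    struct request_queue *q = device->rq_queue;  /* illustrative */

    if (can_do_discards) {
        q->limits.discard_granularity = 512;      /* one sector */
        queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, q);
    } else {
        q->limits.discard_granularity = 0;        /* user visible: no discard support */
        blk_queue_max_discard_sectors(q, 0);
        queue_flag_clear_unlocked(QUEUE_FLAG_DISCARD, q);
    }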
-
Submitted by Lars Ellenberg

When accessing our meta data area on disk, we double check the plausibility of the requested sector offsets, and are very noisy about it if they look suspicious. During the initial read of our "superblock", for "external" meta data, this triggered because the range estimate returned by drbd_md_last_sector() was still wrong.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

Suggested by Akinobu Mita <akinobu.mita@gmail.com>

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

Apparently we now implicitly get definitions for BITS_PER_PAGE and BITS_PER_PAGE_MASK from pid_namespace.h. Instead of renaming our defines, I chose to define them only if not yet defined, but to double check the value if already defined.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
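The guard pattern described here, sketched from the description (the exact expression drbd uses may differ):

    #ifndef BITS_PER_PAGE
    #define BITS_PER_PAGE       (1UL << (PAGE_SHIFT + 3))
    #define BITS_PER_PAGE_MASK  (BITS_PER_PAGE - 1)
    #elif BITS_PER_PAGE != (1UL << (PAGE_SHIFT + 3))
    #error "ambiguous BITS_PER_PAGE"
    #endif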
-
Submitted by Lars Ellenberg

Since kernel 3.3, we can use snprintf-style arguments to create a workqueue.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
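That is, the pre-formatted name buffer can go away; a generic sketch, where the name format, flags, and resource_name are placeholders rather than drbd's actual values:

    /* before: caller formats the name itself */
    snprintf(name, sizeof(name), "drbd_%s", resource_name);
    wq = create_singlethread_workqueue(name);

    /* since 3.3: the workqueue API takes printf-style arguments */
    wq = alloc_ordered_workqueue("drbd_%s", WQ_MEM_RECLAIM, resource_name);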
-
Submitted by Lars Ellenberg

The effective data generation ID may be interesting for debugging purposes in scenarios involving diskless states.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

In a multiple-error scenario, we may end up with a "frozen" Primary that has no access to any data (no local disk, no replication link). If we then resume-io, we try to generate a new data generation id, which will fail if there is no longer a local disk. Double check for available local data, which prevents the NULL pointer deref.

If we are diskless, turn the resume-io in this situation into the first stage of a "force down", by bumping the "effective" data gen id, which will prevent a later attach or connect to the former data set without first being demoted (deconfigured).

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Philipp Reisner

The intention is to reduce CPU utilization. Recent measurements unveiled that the current performance bottleneck is CPU utilization on the receiving node: the asender thread became CPU limited. One of the main points is to eliminate the idr_for_each_entry() loop from the ack-sending code path. The one exception is sending back ping_acks; these stay in the ack-receiver thread, otherwise the logic becomes too complicated for no added value.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Philipp Reisner

This prepares the next patch, where sending on the meta (or control) socket is moved to a dedicated workqueue.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

A D_FAILED disk transitions as quickly as possible to D_DISKLESS. But in the "unresponsive local disk" case, there remains a time window where an administrative detach command could find the disk already failed, while some internal meta data IO against the unresponsive local disk is still pending. In that case, drbd_md_get_buffer() will return NULL. Don't unconditionally call drbd_md_put_buffer(), or it will cause a refcount imbalance and prevent any further re-attach on this volume (until it is deleted and re-created).

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

The recent (not yet released) backport of the extended state broadcasts to support the "events2" subcommand of drbdsetup had some glitches. remember_old_state() would first count all connections with a net_conf != NULL, then allocate a suitable array, then populate that array with all connections found to have net_conf != NULL. This races with the state change to C_STANDALONE, and the NULL assignment there. remember_new_state() then iterates over said connection array, assuming that it would be fully populated.

But rcu_lock() just makes sure the thing some pointer points to, if any, won't go away. It does not make the pointer itself immutable. In fact there is no need to "filter" connections based on whether or not they have a currently valid configuration. Just record them always; if they don't have a config, that's fine, there will be no change then.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

Don't blame the peer for being unresponsive if we did not even ask the question yet.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

The only way to make DRBD intentionally call panic is to set a disk timeout, have that trigger, "abort" some request and complete it to the upper layers, and then have the backend IO subsystem later complete these requests successfully regardless. As the attached IO pages have been recycled for other purposes meanwhile, this will cause unexpected random memory changes. To prevent corruption, we'd rather panic in that case. Make it obvious from stack traces that this was the case by introducing drbd_panic_after_delayed_completion_of_aborted_request().

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

Even though we really want to get the state information about our bad disk to the peer as soon as possible, it is useful to first call the local-io-error handler. People may choose to hard-reset the box from there. If that looks and behaves exactly like a "regular node crash", without bumping the data generation UUIDs on the peer in between, it makes it easier to deal with. If you intend to return from the local-io-error handler, then better return as quickly as possible to avoid triggering other timeouts.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

If for some reason the primary lost its disk *and* the replication link before it is able to communicate the disk loss (probably due to blocked IO), and is later able to re-establish the connection, the peer needs to bump its UUIDs just like it does when the peer only loses the disk and is able to communicate this in time. Otherwise, a later re-attach of the disk on the primary may start a resync in the "wrong" direction.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

When detaching, we make sure no application IO is in-flight by internally suspending IO, then trigger the state change, wait for the result, and finally internally resume IO again. Once we have triggered the state change to "Failed", we expect it to change from Failed to Diskless. (To avoid races, we actually wait for it to leave "Failed".) On an unresponsive local IO backend, this may not happen, ever. Don't have a "hung" detach block IO "forever"; resume IO before waiting for the state change to Diskless. We may well be able to continue IO to and from a healthy peer.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Lars Ellenberg

(You should not use disk-timeout anyway; see the man page for why...)

We add incoming requests to the tail of some ring list. On local completion, requests are removed from that list. The timer looks only at the head of that ring list, so it is supposed to only see the oldest request. All of this is protected by a spinlock.

The request object is created with timestamps zeroed out, and the timestamp was only filled in just before the actual submit. But to actually submit the request, we need to give up the spinlock. If you are unlucky, there is no older still-pending request, the timer looks at a new request with its timestamp still zero (before it even was submitted), and 0 + timeout is most likely older than "now".

Better assign the timestamp right when we put the request object on said ring list.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Philipp Reisner

GFP_NOWAIT has a value of 0, i.e. the functionality is not changed.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Markus Elfring

The lc_destroy() function tests whether its argument is NULL and then returns immediately. Thus the test around the call is not needed. This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Roland Kammerer <roland.kammerer@linbit.com>
Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
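The change boils down to (illustrative):

    /* before */
    if (lc)
        lc_destroy(lc);

    /* after: lc_destroy() already tolerates a NULL argument */
    lc_destroy(lc);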
-
Submitted by Andreas Gruenbacher

The status command originates from the drbd9 code base. While for now we keep the status information in /proc/drbd available, this commit allows the user base to gracefully migrate their monitoring infrastructure to the new status reporting interface. In drbd9, no status information is exposed through /proc/drbd.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-
Submitted by Andreas Gruenbacher

The events2 command originates from drbd-9 development. It provides more information but requires an incompatible change in output format; therefore the previous events command continues to exist, while the new, improved events2 command becomes available now. This prepares the user base for a later switch to the complete drbd9 code base.

Signed-off-by: Philipp Reisner <philipp.reisner@linbit.com>
Signed-off-by: Lars Ellenberg <lars.ellenberg@linbit.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
-