1. 10 6月, 2016 1 次提交
    • A
      much milder d_walk() race · ba65dc5e
      Al Viro 提交于
      d_walk() relies upon the tree not getting rearranged under it without
      rename_lock being touched.  And we do grab rename_lock around the
      places that change the tree topology.  Unfortunately, branch reordering
      is just as bad from d_walk() POV and we have two places that do it
      without touching rename_lock - one in handling of cursors (for ramfs-style
      directories) and another in autofs.  autofs one is a separate story; this
      commit deals with the cursors.
      	* mark cursor dentries explicitly at allocation time
      	* make __dentry_kill() leave ->d_child.next pointing to the next
      non-cursor sibling, making sure that it won't be moved around unnoticed
      before the parent is relocked on ascend-to-parent path in d_walk().
      	* make d_walk() skip cursors explicitly; strictly speaking it's
      not necessary (all callbacks we pass to d_walk() are no-ops on cursors),
      but it makes analysis easier.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      ba65dc5e
  2. 08 6月, 2016 1 次提交
  3. 30 5月, 2016 2 次提交
    • A
      trim fsnotify hooks a bit · affda484
      Al Viro 提交于
      fsnotify_d_move()/__fsnotify_d_instantiate()/__fsnotify_update_dcache_flags()
      are identical to each other, regardless of the config.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      affda484
    • A
      undo "fs: allow d_instantiate to be called with negative parent dentry" · 2853908a
      Al Viro 提交于
      Background: spufs used to mangle the order in which it had been building
      dentry trees.  It was broken in a lot of ways, but most of them required
      the right timing to trigger until an fsnotify change had added one more
      - the one that was always triggered.
      
      Unfortunately, insteading of fixing their long-standing bug the spufs
      folks had chosen to paper over the fsnotify trigger.  Eventually said
      bug had been spotted and killed off, but the pointless check in
      fsnotify has remained, complete with the implication that one *could*
      do that kind of crap.
      
      Again, a parent of any dentry should always be positive.  Any code
      can rely upon that and anything violating that assert is a bug,
      *not* something to be accomodated.
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      2853908a
  4. 29 5月, 2016 6 次提交
    • G
      <linux/hash.h>: Add support for architecture-specific functions · 468a9428
      George Spelvin 提交于
      This is just the infrastructure; there are no users yet.
      
      This is modelled on CONFIG_ARCH_RANDOM; a CONFIG_ symbol declares
      the existence of <asm/hash.h>.
      
      That file may define its own versions of various functions, and define
      HAVE_* symbols (no CONFIG_ prefix!) to suppress the generic ones.
      
      Included is a self-test (in lib/test_hash.c) that verifies the basics.
      It is NOT in general required that the arch-specific functions compute
      the same thing as the generic, but if a HAVE_* symbol is defined with
      the value 1, then equality is tested.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greg Ungerer <gerg@linux-m68k.org>
      Cc: Andreas Schwab <schwab@linux-m68k.org>
      Cc: Philippe De Muyter <phdm@macq.eu>
      Cc: linux-m68k@lists.linux-m68k.org
      Cc: Alistair Francis <alistai@xilinx.com>
      Cc: Michal Simek <michal.simek@xilinx.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: uclinux-h8-devel@lists.sourceforge.jp
      468a9428
    • G
      Eliminate bad hash multipliers from hash_32() and hash_64() · ef703f49
      George Spelvin 提交于
      The "simplified" prime multipliers made very bad hash functions, so get rid
      of them.  This completes the work of 689de1d6.
      
      To avoid the inefficiency which was the motivation for the "simplified"
      multipliers, hash_64() on 32-bit systems is changed to use a different
      algorithm.  It makes two calls to hash_32() instead.
      
      drivers/media/usb/dvb-usb-v2/af9015.c uses the old GOLDEN_RATIO_PRIME_32
      for some horrible reason, so it inherits a copy of the old definition.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Mauro Carvalho Chehab <m.chehab@samsung.com>
      ef703f49
    • G
      Change hash_64() return value to 32 bits · 92d56774
      George Spelvin 提交于
      That's all that's ever asked for, and it makes the return
      type of hash_long() consistent.
      
      It also allows (upcoming patch) an optimized implementation
      of hash_64 on 32-bit machines.
      
      I tried adding a BUILD_BUG_ON to ensure the number of bits requested
      was never more than 32 (most callers use a compile-time constant), but
      adding <linux/bug.h> to <linux/hash.h> breaks the tools/perf compiler
      unless tools/perf/MANIFEST is updated, and understanding that code base
      well enough to update it is too much trouble.  I did the rest of an
      allyesconfig build with such a check, and nothing tripped.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      92d56774
    • G
      <linux/sunrpc/svcauth.h>: Define hash_str() in terms of hashlen_string() · 917ea166
      George Spelvin 提交于
      Finally, the first use of previous two patches: eliminate the
      separate ad-hoc string hash functions in the sunrpc code.
      
      Now hash_str() is a wrapper around hash_string(), and hash_mem() is
      likewise a wrapper around full_name_hash().
      
      Note that sunrpc code *does* call hash_mem() with a zero length, which
      is why the previous patch needed to handle that in full_name_hash().
      (Thanks, Bruce, for finding that!)
      
      This also eliminates the only caller of hash_long which asks for
      more than 32 bits of output.
      
      The comment about the quality of hashlen_string() and full_name_hash()
      is jumping the gun by a few patches; they aren't very impressive now,
      but will be improved greatly later in the series.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      Tested-by: NJ. Bruce Fields <bfields@redhat.com>
      Acked-by: NJ. Bruce Fields <bfields@redhat.com>
      Cc: Jeff Layton <jlayton@poochiereds.net>
      Cc: linux-nfs@vger.kernel.org
      917ea166
    • G
      fs/namei.c: Add hashlen_string() function · fcfd2fbf
      George Spelvin 提交于
      We'd like to make more use of the highly-optimized dcache hash functions
      throughout the kernel, rather than have every subsystem create its own,
      and a function that hashes basic null-terminated strings is required
      for that.
      
      (The name is to emphasize that it returns both hash and length.)
      
      It's actually useful in the dcache itself, specifically d_alloc_name().
      Other uses in the next patch.
      
      full_name_hash() is also tweaked to make it more generally useful:
      1) Take a "char *" rather than "unsigned char *" argument, to
         be consistent with hash_name().
      2) Handle zero-length inputs.  If we want more callers, we don't want
         to make them worry about corner cases.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      fcfd2fbf
    • G
      Pull out string hash to <linux/stringhash.h> · f4bcbe79
      George Spelvin 提交于
      ... so they can be used without the rest of <linux/dcache.h>
      
      The hashlen_* macros will make sense next patch.
      Signed-off-by: NGeorge Spelvin <linux@sciencehorizons.net>
      f4bcbe79
  5. 28 5月, 2016 5 次提交
  6. 27 5月, 2016 4 次提交
  7. 26 5月, 2016 21 次提交
    • A
      add down_write_killable_nested() · 887bddfa
      Al Viro 提交于
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      887bddfa
    • Z
      ceph: make logical calculation functions return bool · 3b33f692
      Zhang Zhuoyu 提交于
      This patch makes serverl logical caculation functions return bool to
      improve readability due to these particular functions only using 0/1
      as their return value.
      
      No functional change.
      Signed-off-by: NZhang Zhuoyu <zhangzhuoyu@cmss.chinamobile.com>
      3b33f692
    • Y
      ceph: using hash value to compose dentry offset · f3c4ebe6
      Yan, Zheng 提交于
      If MDS sorts dentries in dirfrag in hash order, we use hash value to
      compose dentry offset. dentry offset is:
      
        (0xff << 52) | ((24 bits hash) << 28) |
        (the nth entry hash hash collision)
      
      This offset is stable across directory fragmentation. This alos means
      there is no need to reset readdir offset if directory get fragmented
      in the middle of readdir.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      f3c4ebe6
    • Y
      ceph: define 'end/complete' in readdir reply as bit flags · 956d39d6
      Yan, Zheng 提交于
      Set a flag in readdir request, which indicates that client interprets
      'end/complete' as bit flags. So that mds can reply additional flags in
      readdir reply.
      Signed-off-by: NYan, Zheng <zyan@redhat.com>
      956d39d6
    • I
      737cc81e
    • I
      libceph: replace ceph_monc_request_next_osdmap() · 7cca78c9
      Ilya Dryomov 提交于
      ... with a wrapper around maybe_request_map() - no need for two
      osdmap-specific functions.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      7cca78c9
    • I
      libceph: pool deletion detection · 4609245e
      Ilya Dryomov 提交于
      This adds the "map check" infrastructure for sending osdmap version
      checks on CALC_TARGET_POOL_DNE and completing in-flight requests with
      -ENOENT if the target pool doesn't exist or has just been deleted.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      4609245e
    • I
      libceph: async MON client generic requests · d0b19705
      Ilya Dryomov 提交于
      For map check, we are going to need to send CEPH_MSG_MON_GET_VERSION
      messages asynchronously and get a callback on completion.  Refactor MON
      client to allow firing off generic requests asynchronously and add an
      async variant of ceph_monc_get_version().  ceph_monc_do_statfs() is
      switched over and remains sync.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      d0b19705
    • I
      libceph: support for checking on status of watch · b07d3c4b
      Ilya Dryomov 提交于
      Implement ceph_osdc_watch_check() to be able to check on status of
      watch.  Note that the time it takes for a watch/notify event to get
      delivered through the notify_wq is taken into account.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      b07d3c4b
    • I
      libceph: support for sending notifies · 19079203
      Ilya Dryomov 提交于
      Implement ceph_osdc_notify() for sending notifies.
      
      Due to the fact that the current messenger can't do read-in into
      pagelists (it can only do write-out from them), I had to go with a page
      vector for a NOTIFY_COMPLETE payload, for now.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      19079203
    • I
      libceph, rbd: ceph_osd_linger_request, watch/notify v2 · 922dab61
      Ilya Dryomov 提交于
      This adds support and switches rbd to a new, more reliable version of
      watch/notify protocol.  As with the OSD client update, this is mostly
      about getting the right structures linked into the right places so that
      reconnects are properly sent when needed.  watch/notify v2 also
      requires sending regular pings to the OSDs - send_linger_ping().
      
      A major change from the old watch/notify implementation is the
      introduction of ceph_osd_linger_request - linger requests no longer
      piggy back on ceph_osd_request.  ceph_osd_event has been merged into
      ceph_osd_linger_request.
      
      All the details are now hidden within libceph, the interface consists
      of a simple pair of watch/unwatch functions and ceph_osdc_notify_ack().
      ceph_osdc_watch() does return ceph_osd_linger_request, but only to keep
      the lifetime management simple.
      
      ceph_osdc_notify_ack() accepts an optional data payload, which is
      relayed back to the notifier.
      
      Portions of this patch are loosely based on work by Douglas Fuller
      <dfuller@redhat.com> and Mike Christie <michaelc@cs.wisc.edu>.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      922dab61
    • I
      libceph: a major OSD client update · 5aea3dcd
      Ilya Dryomov 提交于
      This is a major sync up, up to ~Jewel.  The highlights are:
      
      - per-session request trees (vs a global per-client tree)
      - per-session locking (vs a global per-client rwlock)
      - homeless OSD session
      - no ad-hoc global per-client lists
      - support for pool quotas
      - foundation for watch/notify v2 support
      - foundation for map check (pool deletion detection) support
      
      The switchover is incomplete: lingering requests can be setup and
      teared down but aren't ever reestablished.  This functionality is
      restored with the introduction of the new lingering infrastructure
      (ceph_osd_linger_request, linger_work, etc) in a later commit.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      5aea3dcd
    • I
      libceph: protect osdc->osd_lru list with a spinlock · 9dd2845c
      Ilya Dryomov 提交于
      OSD client is getting moved from the big per-client lock to a set of
      per-session locks.  The big rwlock would only be held for read most of
      the time, so a global osdc->osd_lru needs additional protection.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      9dd2845c
    • I
      libceph: handle_one_map() · 42c1b124
      Ilya Dryomov 提交于
      Separate osdmap handling from decoding and iterating over a bag of maps
      in a fresh MOSDMap message.  This sets up the scene for the updated OSD
      client.
      
      Of particular importance here is the addition of pi->was_full, which
      can be used to answer "did this pool go full -> not-full in this map?".
      This is the key bit for supporting pool quotas.
      
      We won't be able to downgrade map_sem for much longer, so drop
      downgrade_write().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      42c1b124
    • I
      libceph: allocate dummy osdmap in ceph_osdc_init() · e5253a7b
      Ilya Dryomov 提交于
      This leads to a simpler osdmap handling code, particularly when dealing
      with pi->was_full, which is introduced in a later commit.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      e5253a7b
    • I
      libceph: redo callbacks and factor out MOSDOpReply decoding · fe5da05e
      Ilya Dryomov 提交于
      If you specify ACK | ONDISK and set ->r_unsafe_callback, both
      ->r_callback and ->r_unsafe_callback(true) are called on ack.  This is
      very confusing.  Redo this so that only one of them is called:
      
          ->r_unsafe_callback(true), on ack
          ->r_unsafe_callback(false), on commit
      
      or
      
          ->r_callback, on ack|commit
      
      Decode everything in decode_MOSDOpReply() to reduce clutter.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      fe5da05e
    • I
      libceph: drop msg argument from ceph_osdc_callback_t · 85e084fe
      Ilya Dryomov 提交于
      finish_read(), its only user, uses it to get to hdr.data_len, which is
      what ->r_result is set to on success.  This gains us the ability to
      safely call callbacks from contexts other than reply, e.g. map check.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      85e084fe
    • I
      libceph: switch to calc_target(), part 2 · bb873b53
      Ilya Dryomov 提交于
      The crux of this is getting rid of ceph_osdc_build_request(), so that
      MOSDOp can be encoded not before but after calc_target() calculates the
      actual target.  Encoding now happens within ceph_osdc_start_request().
      
      Also nuked is the accompanying bunch of pointers into the encoded
      buffer that was used to update fields on each send - instead, the
      entire front is re-encoded.  If we want to support target->name_len !=
      base->name_len in the future, there is no other way, because oid is
      surrounded by other fields in the encoded buffer.
      
      Encoding OSD ops and adding data items to the request message were
      mixed together in osd_req_encode_op().  While we want to re-encode OSD
      ops, we don't want to add duplicate data items to the message when
      resending, so all call to ceph_osdc_msg_data_add() are factored out
      into a new setup_request_data().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      bb873b53
    • I
      libceph: switch to calc_target(), part 1 · a66dd383
      Ilya Dryomov 提交于
      Replace __calc_request_pg() and most of __map_request() with
      calc_target() and start using req->r_t.
      
      ceph_osdc_build_request() however still encodes base_oid, because it's
      called before calc_target() is and target_oid is empty at that point in
      time; a printf in osdc_show() also shows base_oid.  This is fixed in
      "libceph: switch to calc_target(), part 2".
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      a66dd383
    • I
      libceph: introduce ceph_osd_request_target, calc_target() · 63244fa1
      Ilya Dryomov 提交于
      Introduce ceph_osd_request_target, containing all mapping-related
      fields of ceph_osd_request and calc_target() for calculating mappings
      and populating it.
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      63244fa1
    • I
      libceph: pi->min_size, pi->last_force_request_resend · 04812acf
      Ilya Dryomov 提交于
      Add and decode pi->min_size and pi->last_force_request_resend.  These
      are going to be used by calc_target().
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      04812acf