1. 01 Feb, 2018 1 commit
    • iversion: make inode_cmp_iversion{+raw} return bool instead of s64 · c0cef30e
      Jeff Layton committed
      As Linus points out:
      
          The inode_cmp_iversion{+raw}() functions are pure and utter crap.
      
          Why?
      
          You say that they return 0/negative/positive, but they do so in a
          completely broken manner. They return that ternary value as the
          sequence number difference in a 's64', which means that if you
          actually care about that ternary value, and do the *sane* thing that
          the kernel-doc of the function implies is the right thing, you would
          do
      
              int cmp = inode_cmp_iversion(inode, old);
              if (cmp < 0 ...
      
          and as a result you get code that looks sane, but that doesn't
          actually *WORK* right.
      
      Since none of the callers actually care about the ternary value here,
      convert the inode_cmp_iversion{+raw} functions to just return a boolean
      value (false for matching, true for non-matching).
      
      This matches the existing use of these functions just fine, and makes it
      simple to convert them to return a ternary value in the future if we
      grow callers that need it.
      
      With this change we can also reimplement inode_cmp_iversion in a simpler
      way using inode_peek_iversion.
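
      A minimal userspace sketch of the problem described above. The helper
      names old_cmp_iversion and new_cmp_iversion are hypothetical stand-ins,
      not the kernel functions. It shows why a caller that stores the s64
      difference in an int can see a result that hides a real difference,
      while a boolean result cannot be misread that way.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

typedef int64_t s64;
typedef uint64_t u64;

/* Hypothetical stand-in for the old helper: returns the raw difference. */
static s64 old_cmp_iversion(u64 cur, u64 old)
{
	return (s64)(cur - old);
}

/* Hypothetical stand-in for the new helper: false = match, true = differ. */
static bool new_cmp_iversion(u64 cur, u64 old)
{
	return cur != old;
}

/* Prints the same comparison seen through both interfaces. */
static void demo(void)
{
	u64 old = 5;
	u64 cur = old + (1ULL << 32);	/* differs only above bit 31 */

	/*
	 * Converting an out-of-range s64 to int is implementation-defined;
	 * common ABIs keep only the low 32 bits, which are all zero here.
	 */
	int cmp = (int)old_cmp_iversion(cur, old);

	printf("s64 diff: %lld, stored in int: %d, bool result: %d\n",
	       (long long)old_cmp_iversion(cur, old), cmp,
	       (int)new_cmp_iversion(cur, old));
}
```

      On typical platforms the int ends up as 0, so "if (cmp < 0 ..." style
      checks silently treat two different versions as equal.
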
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 29 Jan, 2018 3 commits
    • fs: handle inode->i_version more efficiently · f02a9ad1
      Jeff Layton committed
      Since i_version is mostly treated as an opaque value, we can exploit that
      fact to avoid incrementing it when no one is watching. With that change,
      we can avoid incrementing the counter on writes, unless someone has
      queried for it since it was last incremented. If the a/c/mtime don't
      change, and the i_version hasn't changed, then there's no need to dirty
      the inode metadata on a write.
      
      Convert the i_version counter to an atomic64_t, and use the lowest order
      bit to hold a flag that will tell whether anyone has queried the value
      since it was last incremented.
      
      When we go to maybe increment it, we fetch the value and check the flag
      bit.  If it's clear then we don't need to do anything if the update
      isn't being forced.
      
      If we do need to update, then we increment the counter by 2, and clear
      the flag bit, and then use a CAS op to swap it into place. If that
      works, we return true. If it doesn't then do it again with the value
      that we fetch from the CAS operation.
      
      On the query side, if the flag is already set, then we just shift the
      value down by 1 bit and return it. Otherwise, we set the flag in our
      on-stack value and again use cmpxchg to swap it into place if it hasn't
      changed. If it has, then we use the value from the cmpxchg as the new
      "old" value and try again.
      
      This method allows us to avoid incrementing the counter on writes (and
      dirtying the metadata) under typical workloads. We only need to increment
      if it has been queried since it was last changed.
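
      The scheme above can be sketched in userspace with C11 atomics. This is
      an illustration under stated assumptions, not the kernel code: struct
      fake_inode, maybe_inc_iversion, query_iversion and the two constants
      are hypothetical names, and atomic_compare_exchange_weak stands in for
      the kernel's cmpxchg.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define IVERSION_QUERIED	1ULL	/* lowest bit: queried since last bump */
#define IVERSION_INCREMENT	2ULL	/* counter lives in the upper 63 bits */

/* Hypothetical stand-in for the i_version field of struct inode. */
struct fake_inode {
	_Atomic uint64_t i_version;
};

/* Bump the counter only if forced or if someone has queried it. */
static bool maybe_inc_iversion(struct fake_inode *inode, bool force)
{
	uint64_t cur = atomic_load(&inode->i_version);
	uint64_t new;

	do {
		/* Flag clear and not forced: nobody is watching, skip. */
		if (!force && !(cur & IVERSION_QUERIED))
			return false;
		/* Clear the flag and add 2, i.e. counter + 1. */
		new = (cur & ~IVERSION_QUERIED) + IVERSION_INCREMENT;
		/* On failure, cur is reloaded with the current value. */
	} while (!atomic_compare_exchange_weak(&inode->i_version, &cur, new));

	return true;
}

/* Return the counter and make sure the queried flag is set. */
static uint64_t query_iversion(struct fake_inode *inode)
{
	uint64_t cur = atomic_load(&inode->i_version);

	while (!(cur & IVERSION_QUERIED)) {
		/* Try to set the flag; retry with the fresh value on failure. */
		if (atomic_compare_exchange_weak(&inode->i_version, &cur,
						 cur | IVERSION_QUERIED))
			break;
	}
	/* Shift out the flag bit to get the counter itself. */
	return cur >> 1;
}
```

      With this layout, a write that follows another write with no query in
      between returns false from maybe_inc_iversion and leaves the stored
      value untouched.
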
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
      Acked-by: Dave Chinner <dchinner@redhat.com>
      Tested-by: Krzysztof Kozlowski <krzk@kernel.org>
    • fs: don't take the i_lock in inode_inc_iversion · 7594c461
      Jeff Layton committed
      The rationale for taking the i_lock when incrementing this value is
      lost in antiquity. The readers of the field don't take it (at least
      not universally), so my assumption is that it was only done here to
      serialize incrementors.
      
      If that is indeed the case, then we can drop the i_lock from this
      codepath and treat it as an atomic64_t for the purposes of
      incrementing it. This allows us to use inode_inc_iversion without
      any danger of lock inversion.
      
      Note that the read side is not fetched atomically with this change.
      The assumption here is that that is not a critical issue since the
      i_version is not fully synchronized with anything else anyway.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
    • fs: new API for handling inode->i_version · ae5e165d
      Jeff Layton committed
      Add a documentation blob that explains what the i_version field is, how
      it is expected to work, and how it is currently implemented by various
      filesystems.
      
      We already have inode_inc_iversion. Add several other functions for
      manipulating and accessing the i_version counter. For now, the
      implementation is trivial and basically works the way that all of the
      open-coded i_version accesses work today.
      
      Future patches will convert existing users of i_version to use the new
      API, and then convert the backend implementation to do things more
      efficiently.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: Jan Kara <jack@suse.cz>
  3. 26 Jan, 2018 8 commits
  4. 25 Jan, 2018 1 commit
  5. 24 Jan, 2018 3 commits
  6. 23 Jan, 2018 5 commits
    • signal/ptrace: Add force_sig_ptrace_errno_trap and use it where needed · f71dd7dc
      Eric W. Biederman committed
      There are so many places that build struct siginfo by hand that at
      least one of them is bound to get it wrong.  A handful of cases in the
      kernel arguably did just that when using the errno field of siginfo to
      pass non-errno values to userspace.  The usage is limited to a single
      si_code, so at least it does not mess up anything else.
      
      Encapsulate this questionable pattern in a helper function so
      that the userspace ABI is preserved.
      
      Update all of the places that use this pattern to use the new helper
      function.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • signal: Helpers for faults with specialized siginfo layouts · 38246735
      Eric W. Biederman committed
      The helpers added are:
      send_sig_mceerr
      force_sig_mceerr
      force_sig_bnderr
      force_sig_pkuerr
      
      Filling out siginfo properly can get tricky.  Especially for these
      specialized cases where the temptation is to share code with other
      cases which use a different subset of siginfo fields.  Unfortunately
      that code sharing frequently results in bugs with the wrong siginfo
      fields filled in, and makes it harder to verify that the siginfo
      structure was properly initialized.
      
      Provide these helpers instead that get all of the details right, and
      guarantee that siginfo is properly initialized.
      
      send_sig_mceerr and force_sig_mceerr are a little special as two si
      codes BUS_MCEERR_AO and BUS_MCEERR_AR both use the same extended
      siginfo layout.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • signal: Add send_sig_fault and force_sig_fault · f8ec6601
      Eric W. Biederman committed
      The vast majority of signals sent from architecture specific code are
      simple faults.  Encapsulate this reality with two helper functions so
      that the nit-picky implementation of preparing a siginfo does not need
      to be repeated many times on each architecture.
      
      As only some architectures support the trapno field, make the trapno
      argument only present on those architectures.
      
      Similarly, as ia64 has three fields (imm, flags, and isr) that
      are specific to it, have those arguments always present on ia64
      and nowhere else.
      
      This ensures the architecture specific code always remembers which
      fields it needs to pass into the siginfo structure.
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
    • f2fs: allow to recover node blocks given updated checkpoint · f2367923
      Jaegeuk Kim committed
      If fsck.f2fs changes the crc, we have no way to recover some inode blocks
      by roll-forward recovery. Let's relax the condition to recover them.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
    • f2fs: add an ioctl to disable GC for specific file · 1ad71a27
      Jaegeuk Kim committed
      This patch adds a flag to disable GC on a given file, which is useful when
      the user wants to keep its block map. It also performs in-place updates
      for a dontmove file.
      Reviewed-by: Chao Yu <yuchao0@huawei.com>
      Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
  7. 22 Jan, 2018 1 commit
  8. 20 Jan, 2018 5 commits
  9. 19 Jan, 2018 1 commit
  10. 18 Jan, 2018 4 commits
  11. 17 Jan, 2018 6 commits
  12. 16 Jan, 2018 2 commits
    • blkcg: simplify statistic accumulation code · ddc21231
      Arnd Bergmann committed
      Some older compilers (gcc-4.4 through 4.6 in particular) struggle
      with the way that blkg_rwstat_read() returns a structure, leading
      to excessive stack usage and rather inefficient code:
      
      block/blk-cgroup.c: In function 'blkg_destroy':
      block/blk-cgroup.c:354:1: error: the frame size of 1296 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      block/cfq-iosched.c: In function 'cfqg_stats_add_aux':
      block/cfq-iosched.c:753:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      block/bfq-cgroup.c: In function 'bfqg_stats_add_aux':
      block/bfq-cgroup.c:299:1: error: the frame size of 1928 bytes is larger than 1024 bytes [-Werror=frame-larger-than=]
      
      I also notice that there is no point in using atomic accesses
      for the local variables, so storing the temporaries in simple 'u64'
      variables not only avoids the stack usage on older compilers but
      also improves the object code on modern versions.
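
      A rough userspace sketch of that idea, using C11 atomics. The names
      struct rw_stat, rw_stat_add_aux and the STAT_* indices are hypothetical
      and do not match the kernel's types: each shared counter is read once
      into a plain local array, and the locals need no atomic accesses.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

enum { STAT_READ, STAT_WRITE, STAT_NR };	/* hypothetical counter indices */

/* Hypothetical stand-in for a per-group read/write statistic. */
struct rw_stat {
	_Atomic uint64_t cnt[STAT_NR];
};

/* Fold the counters of 'from' into 'to' via plain u64 temporaries. */
static void rw_stat_add_aux(struct rw_stat *to, struct rw_stat *from)
{
	uint64_t tmp[STAT_NR];	/* locals: no atomics, small stack frame */
	int i;

	/* Read each shared counter once instead of returning a struct. */
	for (i = 0; i < STAT_NR; i++)
		tmp[i] = atomic_load(&from->cnt[i]);

	/* Only the shared destination still needs atomic updates. */
	for (i = 0; i < STAT_NR; i++)
		atomic_fetch_add(&to->cnt[i], tmp[i]);
}
```
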
      
      Fixes: e6269c44 ("blkcg: add blkg_[rw]stat->aux_cnt and replace cfq_group->dead_stats with it")
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
    • nubus: Add support for the driver model · 7f86c765
      Finn Thain committed
      This patch brings basic support for the Linux Driver Model to the
      NuBus subsystem.
      
      For flexibility, the matching of boards with drivers is left up to the
      drivers. This is also the approach taken by NetBSD. A board may have
      many functions, and drivers may have to consider many functional
      resources and board resources in order to match a device.
      
      This implementation does not bind drivers to resources (nor does it bind
      many drivers to the same board). Apple's NuBus declaration ROM design
      is flexible enough to allow that, but I don't see a need to support it
      as we don't use the "slot zero" resources (in the main logic board ROM).
      
      Eliminate the global nubus_boards linked list by rewriting the procfs
      board iterator around bus_for_each_dev(). Hence the nubus device refcount
      can be used to determine the lifespan of board objects.
      
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Tested-by: Stan Johnson <userm57@yahoo.com>
      Signed-off-by: Finn Thain <fthain@telegraphics.com.au>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>