1. 30 Oct, 2013 5 commits
  2. 29 Oct, 2013 4 commits
  3. 27 Oct, 2013 1 commit
  4. 03 Oct, 2013 1 commit
  5. 11 Sep, 2013 1 commit
    • fs: convert fs shrinkers to new scan/count API · 1ab6c499
      Authored by Dave Chinner
      Convert the filesystem shrinkers to use the new API, and standardise some
      of the behaviours of the shrinkers at the same time.  For example,
      nr_to_scan means the number of objects to scan, not the number of objects
      to free.
      
      I refactored the CIFS idmap shrinker a little - it really needs to be
      broken up into a shrinker per tree and keep an item count with the tree
      root so that we don't need to walk the tree every time the shrinker needs
      to count the number of objects in the tree (i.e.  all the time under
      memory pressure).
      
      [glommer@openvz.org: fixes for ext4, ubifs, nfs, cifs and glock. Fixes are needed mainly due to new code merged in the tree]
      [assorted fixes folded in]
      Signed-off-by: Dave Chinner <dchinner@redhat.com>
      Signed-off-by: Glauber Costa <glommer@openvz.org>
      Acked-by: Mel Gorman <mgorman@suse.de>
      Acked-by: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Acked-by: Jan Kara <jack@suse.cz>
      Acked-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Adrian Hunter <adrian.hunter@intel.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Artem Bityutskiy <artem.bityutskiy@linux.intel.com>
      Cc: Arve Hjønnevåg <arve@android.com>
      Cc: Carlos Maiolino <cmaiolino@redhat.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Chuck Lever <chuck.lever@oracle.com>
      Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Gleb Natapov <gleb@redhat.com>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: J. Bruce Fields <bfields@redhat.com>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Kent Overstreet <koverstreet@google.com>
      Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Cc: Marcelo Tosatti <mtosatti@redhat.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Thomas Hellstrom <thellstrom@vmware.com>
      Cc: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
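The scan/count split described in commit 1ab6c499 can be illustrated with a small userspace model. This is only a sketch, not kernel code: `shrink_control_model`, `cache_model`, `cache_count` and `cache_scan` are hypothetical names invented here, and only the `nr_to_scan` field name comes from the commit message. It shows the distinction the message draws: the scan callback examines at most `nr_to_scan` objects and reports how many it actually freed, which may be fewer.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_SLOTS 8

/* Hypothetical userspace stand-in for the kernel's shrink_control. */
struct shrink_control_model {
	unsigned long nr_to_scan;	/* objects to examine, not objects to free */
};

/* Hypothetical cache: each slot is empty, busy (not freeable) or idle. */
enum slot_state { SLOT_EMPTY, SLOT_BUSY, SLOT_IDLE };

struct cache_model {
	enum slot_state slots[MODEL_SLOTS];
	size_t cursor;			/* where the next scan resumes */
};

/* "count" side: report how many objects are cached; frees nothing. */
static unsigned long cache_count(const struct cache_model *c)
{
	unsigned long n = 0;

	assert(c);
	for (size_t i = 0; i < MODEL_SLOTS; i++)
		if (c->slots[i] != SLOT_EMPTY)
			n++;
	return n;
}

/* "scan" side: examine up to nr_to_scan objects, return the number freed. */
static unsigned long cache_scan(struct cache_model *c,
				const struct shrink_control_model *sc)
{
	unsigned long scanned = 0, freed = 0;
	size_t visited = 0;

	assert(c && sc);
	while (scanned < sc->nr_to_scan && visited < MODEL_SLOTS) {
		enum slot_state *slot = &c->slots[c->cursor];

		c->cursor = (c->cursor + 1) % MODEL_SLOTS;
		visited++;
		if (*slot == SLOT_EMPTY)
			continue;	/* nothing here, does not count as scanned */
		scanned++;
		if (*slot == SLOT_IDLE) {
			*slot = SLOT_EMPTY;
			freed++;
		}
	}
	return freed;
}
```

With slots holding busy, idle, busy, idle, idle and `nr_to_scan = 3`, the first scan examines three objects but frees only one, which is the behaviour the message standardises on.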
  6. 04 Sep, 2013 1 commit
  7. 31 Aug, 2013 3 commits
  8. 08 Aug, 2013 2 commits
  9. 27 Jul, 2013 1 commit
  10. 24 Jul, 2013 2 commits
    • nfsd: nfs4_file_get_access: need to be more careful with O_RDWR · df66e753
      Authored by Harshula Jayasuriya
      If fi_fds = {non-NULL, NULL, non-NULL} and oflag = O_WRONLY
      the WARN_ON_ONCE(!(fp->fi_fds[oflag] || fp->fi_fds[O_RDWR]))
      doesn't trigger when it should.
      Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd: nfsd_open: when dentry_open returns an error do not propagate as struct file · e4daf1ff
      Authored by Harshula Jayasuriya
      The following call chain:
      ------------------------------------------------------------
      nfs4_get_vfs_file
      - nfsd_open
        - dentry_open
          - do_dentry_open
            - __get_file_write_access
              - get_write_access
                - return atomic_inc_unless_negative(&inode->i_writecount) ? 0 : -ETXTBSY;
      ------------------------------------------------------------
      
      can result in the following state:
      ------------------------------------------------------------
      struct nfs4_file {
      ...
        fi_fds = {0xffff880c1fa65c80, 0xffffffffffffffe6, 0x0},
        fi_access = {{
            counter = 0x1
          }, {
            counter = 0x0
          }},
      ...
      ------------------------------------------------------------
      
      1) First time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
      NULL, hence nfsd_open() is called where we get status set to an error
      and fp->fi_fds[O_WRONLY] to -ETXTBSY. Thus we do not reach
      nfs4_file_get_access() and fi_access[O_WRONLY] is not incremented.
      
      2) Second time around, in nfs4_get_vfs_file() fp->fi_fds[O_WRONLY] is
      NOT NULL (-ETXTBSY), so nfsd_open() is NOT called, but
      nfs4_file_get_access() IS called and fi_access[O_WRONLY] is incremented.
      Thus we leave a landmine in the form of the nfs4_file data structure in
      an incorrect state.
      
      3) Eventually, when __nfs4_file_put_access() is called it finds
      fi_access[O_WRONLY] being non-zero, it decrements it and calls
      nfs4_file_put_fd() which tries to fput -ETXTBSY.
      ------------------------------------------------------------
      ...
           [exception RIP: fput+0x9]
           RIP: ffffffff81177fa9  RSP: ffff88062e365c90  RFLAGS: 00010282
           RAX: ffff880c2b3d99cc  RBX: ffff880c2b3d9978  RCX: 0000000000000002
           RDX: dead000000100101  RSI: 0000000000000001  RDI: ffffffffffffffe6
           RBP: ffff88062e365c90   R8: ffff88041fe797d8   R9: ffff88062e365d58
           R10: 0000000000000008  R11: 0000000000000000  R12: 0000000000000001
           R13: 0000000000000007  R14: 0000000000000000  R15: 0000000000000000
           ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        #9 [ffff88062e365c98] __nfs4_file_put_access at ffffffffa0562334 [nfsd]
       #10 [ffff88062e365cc8] nfs4_file_put_access at ffffffffa05623ab [nfsd]
       #11 [ffff88062e365ce8] free_generic_stateid at ffffffffa056634d [nfsd]
       #12 [ffff88062e365d18] release_open_stateid at ffffffffa0566e4b [nfsd]
       #13 [ffff88062e365d38] nfsd4_close at ffffffffa0567401 [nfsd]
       #14 [ffff88062e365d88] nfsd4_proc_compound at ffffffffa0557f28 [nfsd]
       #15 [ffff88062e365dd8] nfsd_dispatch at ffffffffa054543e [nfsd]
       #16 [ffff88062e365e18] svc_process_common at ffffffffa04ba5a4 [sunrpc]
       #17 [ffff88062e365e98] svc_process at ffffffffa04babe0 [sunrpc]
       #18 [ffff88062e365eb8] nfsd at ffffffffa0545b62 [nfsd]
       #19 [ffff88062e365ee8] kthread at ffffffff81090886
       #20 [ffff88062e365f48] kernel_thread at ffffffff8100c14a
      ------------------------------------------------------------
      
      Cc: stable@vger.kernel.org
      Signed-off-by: Harshula Jayasuriya <harshula@redhat.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
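The failure mode described in commit e4daf1ff can be reproduced in a small userspace model. The helpers below (`err_ptr`, `is_err`, `ptr_err`, `open_into_slot_buggy`, `open_into_slot_fixed`) are hypothetical stand-ins written for this illustration and loosely modelled on the kernel's error-pointer convention; they are not the nfsd code. The sketch shows why a later "is the slot NULL?" check is fooled when an encoded error is left in the slot, and one possible way to avoid that, assuming the slot is meant to hold either NULL or a valid pointer. The value 0xffffffffffffffe6 in the dump above is -26 read as a 64-bit pointer, which matches ETXTBSY on Linux.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAX_ERRNO 4095

/* Encode a negative errno value as a pointer (illustrative helper). */
static void *err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer falls in the range used for encoded errors. */
static int is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)-MODEL_MAX_ERRNO;
}

/* Recover the negative errno value from an encoded pointer. */
static long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Buggy shape: the open result is stored in the slot before it is checked. */
static int open_into_slot_buggy(void **slot, void *open_result)
{
	assert(slot);
	*slot = open_result;
	if (is_err(*slot))
		return (int)ptr_err(*slot);	/* slot still holds the error */
	return 0;
}

/* Safer shape: on failure the slot is left as NULL. */
static int open_into_slot_fixed(void **slot, void *open_result)
{
	assert(slot);
	if (is_err(open_result)) {
		*slot = NULL;
		return (int)ptr_err(open_result);
	}
	*slot = open_result;
	return 0;
}
```

After the buggy variant fails, the slot is non-NULL, so a caller that tests only for NULL skips the open on the next attempt and goes on to take a reference, which is the sequence in steps 1 to 3 of the message.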
  11. 13 Jul, 2013 1 commit
    • nfsd4: fix minorversion support interface · 35f7a14f
      Authored by J. Bruce Fields
      You can turn on or off support for minorversions using e.g.
      
      	echo "-4.2" >/proc/fs/nfsd/versions
      
      However, the current implementation is a little wonky.  For example, the
      above will turn off 4.2 support, but it will also turn *on* 4.1 support.
      
      This didn't matter as long as we only had 2 minorversions, which was
      true till very recently.
      
      And do a little cleanup here.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
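The side effect described in commit 35f7a14f (disabling 4.2 also enabling 4.1) is what one would expect if support were tracked as a single highest minorversion. The message does not spell out the mechanism, so the sketch below is an assumption made for illustration: `max_model` and `flag_model` are hypothetical userspace types, not the nfsd data structures. It contrasts a single-maximum representation with one flag per minorversion.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_MINOR_COUNT 3	/* models minorversions 4.0, 4.1 and 4.2 */

/* Variant A (assumed old shape): only the highest supported minor is kept. */
struct max_model {
	unsigned int max_minor;
};

/* Clearing minor N can only be expressed as "highest is now N - 1". */
static void max_model_clear(struct max_model *m, unsigned int minor)
{
	assert(m);
	if (minor > 0)
		m->max_minor = minor - 1;
}

static bool max_model_enabled(const struct max_model *m, unsigned int minor)
{
	return minor <= m->max_minor;
}

/* Variant B: one independent flag per minorversion. */
struct flag_model {
	bool enabled[MODEL_MINOR_COUNT];
};

/* Clearing minor N touches only that minor's flag. */
static void flag_model_clear(struct flag_model *f, unsigned int minor)
{
	assert(f);
	if (minor < MODEL_MINOR_COUNT)
		f->enabled[minor] = false;
}

static bool flag_model_enabled(const struct flag_model *f, unsigned int minor)
{
	return minor < MODEL_MINOR_COUNT && f->enabled[minor];
}
```

Starting with only 4.0 enabled, clearing 4.2 in variant A sets the maximum to 1 and so turns 4.1 on; in variant B the same request leaves 4.1 untouched.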
  12. 09 Jul, 2013 2 commits
    • nfsd4: support minorversion 1 by default · d1091481
      Authored by J. Bruce Fields
      We now have minimal minorversion 1 support; turn it on by default.
      
      This can still be turned off with "echo -4.1 >/proc/fs/nfsd/versions".
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: allow destroy_session over destroyed session · f0f51f5c
      Authored by J. Bruce Fields
      RFC 5661 allows a client to destroy a session using a compound
      associated with the destroyed session, as long as the DESTROY_SESSION op
      is the last op of the compound.
      
      We attempt to allow this, but testing against a Solaris client (which
      does destroy sessions in this way) showed that we were failing the
      DESTROY_SESSION with NFS4ERR_DELAY, because we assumed the reference
      count on the session (held by us) represented another rpc in progress
      over this session.
      
      Fix this by noting that in this case the expected reference count is 1,
      not 0.
      
      Also, note that as long as the session holds a reference to the compound
      we're destroying, we can't free it here--instead, delay the free till
      the final put in nfs4svc_encode_compoundres.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
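The reference-count reasoning in commit f0f51f5c can be written down as a tiny predicate. This is a userspace sketch under stated assumptions: `session_model` and `session_in_use` are hypothetical names, and the model only captures the rule from the message that the expected count is 1 rather than 0 when the DESTROY_SESSION arrives in a compound associated with the session being destroyed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a session with a plain reference count. */
struct session_model {
	int refcount;
};

/*
 * Returns true if some other RPC appears to hold the session.
 * When the destroying compound itself runs over this session, that
 * compound accounts for one reference, so one reference is expected.
 */
static bool session_in_use(const struct session_model *ses,
			   bool destroy_sent_over_this_session)
{
	int expected = destroy_sent_over_this_session ? 1 : 0;

	assert(ses);
	return ses->refcount > expected;
}
```

With a count of 1, the session reads as idle when the request came over that same session and as busy otherwise, which is the case that previously produced NFS4ERR_DELAY.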
  13. 02 Jul, 2013 11 commits
    • nfsd4: return delegation immediately if lease fails · d08d32e6
      Authored by J. Bruce Fields
      This case shouldn't happen--the administrator shouldn't really allow
      other applications access to the export until clients have had the
      chance to reclaim their state--but if it does then we should set the
      "return this lease immediately" bit on the reply.  That still leaves
      some small races, but it's the best the protocol allows us to do in the
      case a lease is ripped out from under us....
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: do not throw away 4.1 lock state on last unlock · 0a262ffb
      Authored by J. Bruce Fields
      This reverts commit eb2099f3 "nfsd4: release lockowners on last unlock
      in 4.1 case".  Trond identified language in RFC 5661 section 8.2.4
      which forbids this behavior:
      
      	Stateids associated with byte-range locks are an exception.
      	They remain valid even if a LOCKU frees all remaining locks, so
      	long as the open file with which they are associated remains
      	open, unless the client frees the stateids via the FREE_STATEID
      	operation.
      
      And bakeathon 2013 testing found a 4.1 FreeBSD client was getting an
      incorrect BAD_STATEID return from a FREE_STATEID in the above situation
      and then failing.
      
      The spec language honestly was probably a mistake but at this point with
      implementations already following it we're probably stuck with that.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: delegation-based open reclaims should bypass permissions · 89f6c336
      Authored by J. Bruce Fields
      We saw a v4.0 client's create fail as follows:
      
      	- open create succeeds and gets a read delegation
      	- client attempts to set mode on new file, gets DELAY while
      	  server recalls delegation.
      	- client attempts a CLAIM_DELEGATE_CUR open using the
      	  delegation, gets error because of new file mode.
      
      This probably can't happen on a recent kernel since we're no longer
      giving out delegations on create opens.  Nevertheless, it's a
      bug--reclaim opens should bypass permission checks.
      Reported-by: Steve Dickson <steved@redhat.com>
      Reported-by: Trond Myklebust <Trond.Myklebust@netapp.com>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: minor read_buf cleanup · 590b7431
      Authored by J. Bruce Fields
      The code to step to the next page seems reasonably self-contained.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: fix decoding of compounds across page boundaries · 24750082
      Authored by J. Bruce Fields
      A FreeBSD NFSv4.0 client was getting rare IO errors expanding a tarball.
      A network trace showed the server returning BAD_XDR on the final getattr
      of a getattr+write+getattr compound.  The final getattr started on a
      page boundary.
      
      I believe the Linux client ignores errors on the post-write getattr, and
      that that's why we haven't seen this before.
      
      Cc: stable@vger.kernel.org
      Reported-by: Rick Macklem <rmacklem@uoguelph.ca>
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: clean up nfs4_open_delegation · 99c41515
      Authored by J. Bruce Fields
      The nfs4_open_delegation logic is unnecessarily baroque.
      
      Also stop pretending we support write delegations in several places.
      
      Some day we will support write delegations, but when that happens adding
      back in these flag parameters will be the easy part.  For now they're
      just confusing.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • NFSD: Don't give out read delegations on creates · 9a0590ae
      Authored by Steve Dickson
      When an exclusive create is done with the mode bits
      set (aka open(testfile, O_CREAT | O_EXCL, 0777)) this
      causes an OPEN op followed by a SETATTR op. When a
      read delegation is given in the OPEN, it causes
      the SETATTR to delay with EAGAIN until the
      delegation is recalled.
      
      This patch causes exclusive creates to give out
      a write delegation (which turns into no delegation),
      which allows the SETATTR to succeed seamlessly.
      Signed-off-by: Steve Dickson <steved@redhat.com>
      [bfields: do this for any CREATE, not just exclusive; comment]
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: allow client to send no cb_sec flavors · 57569a70
      Authored by J. Bruce Fields
      In testing I notice that some of the pynfs tests forget to send any
      cb_sec flavors, and that we haven't necessarily errored out in that case
      before.
      
      I'll fix pynfs, but am also inclined to default to trying AUTH_NONE in
      that case in case this is something clients actually do.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: fail attempts to request gss on the backchannel · b78724b7
      Authored by J. Bruce Fields
      We don't support gss on the backchannel.  We should state that fact up
      front rather than just letting things continue and later making the
      client try to figure out why the backchannel isn't working.
      
      Trond suggested instead returning NFS4ERR_NOENT.  I think it would be
      tricky for the client to distinguish between the case "I don't support
      gss on the backchannel" and "I can't find that in my cache, please
      create another context and try that instead", and I'd prefer something
      that currently doesn't have any other meaning for this operation, hence
      the (somewhat arbitrary) NFS4ERR_ENCR_ALG_UNSUPP.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • nfsd4: implement minimal SP4_MACH_CRED · 57266a6e
      Authored by J. Bruce Fields
      Do a minimal SP4_MACH_CRED implementation suggested by Trond, ignoring
      the client-provided spo_must_* arrays and just enforcing credential
      checks for the minimum required operations.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
    • svcrpc: store gss mech in svc_cred · 0dc1531a
      Authored by J. Bruce Fields
      Store a pointer to the gss mechanism used in the rq_cred and cl_cred.
      This will make it easier to enforce SP4_MACH_CRED, which needs to
      compare the mechanism used on the exchange_id with that used on
      protected operations.
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
  14. 29 Jun, 2013 4 commits
    • locks: protect most of the file_lock handling with i_lock · 1c8c601a
      Authored by Jeff Layton
      Having a global lock that protects all of this code is a clear
      scalability problem. Instead of doing that, move most of the code to be
      protected by the i_lock instead. The exceptions are the global lists
      that the ->fl_link sits on, and the ->fl_block list.
      
      ->fl_link is what connects these structures to the
      global lists, so we must ensure that we hold those locks when iterating
      over or updating these lists.
      
      Furthermore, sound deadlock detection requires that we hold the
      blocked_list state steady while checking for loops. We also must ensure
      that the search and update to the list are atomic.
      
      For the checking and insertion side of the blocked_list, push the
      acquisition of the global lock into __posix_lock_file and ensure that
      checking and update of the blocked_list is done without dropping the
      lock in between.
      
      On the removal side, when waking up blocked lock waiters, take the
      global lock before walking the blocked list and dequeue the waiters from
      the global list prior to removal from the fl_block list.
      
      With this, deadlock detection should be race free while we minimize
      excessive file_lock_lock thrashing.
      
      Finally, in order to avoid a lock inversion problem when handling
      /proc/locks output we must ensure that manipulations of the fl_block
      list are also protected by the file_lock_lock.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] constify ->actor · ac6614b7
      Authored by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • [readdir] introduce ->iterate(), ctx->pos, dir_emit() · bb6f619b
      Authored by Al Viro
      New method - ->iterate(file, ctx).  That's the replacement for ->readdir();
      it takes callback from ctx->actor, uses ctx->pos instead of file->f_pos and
      calls dir_emit(ctx, ...) instead of filldir(data, ...).  It does *not*
      update file->f_pos (or look at it, for that matter); iterate_dir() does the
      update.
      
      Note that dir_emit() takes the offset from ctx->pos (and eventually
      filldir_t will lose that argument).
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
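The division of labour described in commit bb6f619b can be modelled in userspace. Everything below is a hypothetical sketch (`dir_context_model`, `file_model`, `dir_emit_model`, `iterate_model`, `iterate_dir_model` and the two sample actors are invented for illustration, and the callback signature is simplified); only the roles of `ctx->actor`, `ctx->pos`, `dir_emit()` and `iterate_dir()` are taken from the message. The point it shows is that the per-filesystem iterate routine reads and advances only `ctx->pos`, while the wrapper copies the position in from and back out to `f_pos`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct dir_context_model;

/* Simplified callback type: return 0 to continue, non-zero to stop. */
typedef int (*actor_model_fn)(struct dir_context_model *ctx,
			      const char *name, long pos);

struct dir_context_model {
	actor_model_fn actor;
	long pos;
};

/* Hypothetical open directory: a position plus a fixed list of names. */
struct file_model {
	long f_pos;
	const char *const *names;
	size_t count;
};

/* Hands one entry to the callback; the offset comes from ctx->pos. */
static bool dir_emit_model(struct dir_context_model *ctx, const char *name)
{
	return ctx->actor(ctx, name, ctx->pos) == 0;
}

/* Per-filesystem iterate: never looks at or updates file->f_pos. */
static int iterate_model(struct file_model *file, struct dir_context_model *ctx)
{
	while (ctx->pos >= 0 && (size_t)ctx->pos < file->count) {
		if (!dir_emit_model(ctx, file->names[ctx->pos]))
			break;
		ctx->pos++;
	}
	return 0;
}

/* Wrapper: seeds ctx->pos from f_pos and writes the result back. */
static int iterate_dir_model(struct file_model *file, struct dir_context_model *ctx)
{
	int ret;

	assert(file && ctx && ctx->actor);
	ctx->pos = file->f_pos;
	ret = iterate_model(file, ctx);
	file->f_pos = ctx->pos;
	return ret;
}

/* Sample actor: accepts every entry. */
static int accept_all_model(struct dir_context_model *ctx, const char *name, long pos)
{
	(void)ctx; (void)name; (void)pos;
	return 0;
}

/* Sample actor: accepts entries at offsets 0 and 1, then asks to stop. */
static int first_two_only_model(struct dir_context_model *ctx, const char *name, long pos)
{
	(void)ctx; (void)name;
	return pos < 2 ? 0 : -1;
}
```

When the actor stops early, the position written back to `f_pos` is that of the first entry not consumed, so a later call resumes from there.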
    • [readdir] introduce iterate_dir() and dir_context · 5c0ba4e0
      Authored by Al Viro
      iterate_dir(): new helper, replacing vfs_readdir().
      
      struct dir_context: contains the readdir callback (and will get more stuff
      in it), embedded into whatever data that callback wants to deal with;
      eventually, we'll be passing it to ->readdir() replacement instead of
      (data,filldir) pair.
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
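Commit 5c0ba4e0 describes the context structure as "embedded into whatever data that callback wants to deal with". One way such embedding lets a callback reach its private data is the offset-based recovery sketched below. This is an illustration under assumptions, not the code from the commit: `dir_context_model`, `container_of_model`, `count_callback`, `count_actor` and `iterate_names_model` are hypothetical, and the callback signature here (taking the context pointer) is a simplification chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context: carries only the callback. */
struct dir_context_model {
	int (*actor)(struct dir_context_model *ctx, const char *name);
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of_model(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Caller-defined data with the context embedded in it. */
struct count_callback {
	struct dir_context_model ctx;
	unsigned int seen;	/* entries accepted so far */
	unsigned int limit;	/* stop once this many were accepted */
};

/* Callback: steps from the embedded context back to its own data. */
static int count_actor(struct dir_context_model *ctx, const char *name)
{
	struct count_callback *cb =
		container_of_model(ctx, struct count_callback, ctx);

	(void)name;
	if (cb->seen >= cb->limit)
		return -1;	/* ask the iterator to stop */
	cb->seen++;
	return 0;
}

/* Generic iterator: knows only about the context, not the caller's data. */
static void iterate_names_model(struct dir_context_model *ctx,
				const char *const *names, size_t n)
{
	assert(ctx && ctx->actor);
	for (size_t i = 0; i < n; i++)
		if (ctx->actor(ctx, names[i]) != 0)
			break;
}
```

The iterator is handed only `&cb.ctx`, yet the callback can still update `cb.seen`, so no separate opaque data pointer has to travel alongside the callback.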
  15. 21 May, 2013 1 commit