1. 07 Jan, 2009 14 commits
    • C
      NSM: Generate NSMPROC_MON's "priv" argument when nsm_handle is created · 7e44d3be
      Committed by Chuck Lever
      Introduce a new data type, used by both the in-kernel NLM and NSM
      implementations, that is used to manage the opaque "priv" argument
      for the NSMPROC_MON and NLMPROC_SM_NOTIFY calls.
      
      Construct the "priv" cookie when the nsm_handle is created.
      
      The nsm_init_private() function may look a little strange, but it is
      roughly equivalent to how the XDR encoder formed the "priv" argument.
      It's going to go away soon.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NSM: Move nsm_find() to fs/lockd/mon.c · 67c6d107
      Committed by Chuck Lever
      The nsm_find() function sets up fresh nsm_handle entries.  This is
      where we will store the "priv" cookie used to lookup nsm_handles during
      reboot recovery.  The cookie will be constructed when nsm_find()
      creates a new nsm_handle.
      
      As much as possible, I would like to keep everything that handles a
      "priv" cookie in fs/lockd/mon.c so that all the smarts are in one
      source file.  That organization should make it pretty simple to see how
      all this works.
      
      To me, it makes more sense than the current arrangement to keep
      nsm_find() with nsm_monitor() and nsm_unmonitor().
      
      So, start reorganizing by moving nsm_find() into fs/lockd/mon.c.  The
      nsm_release() function comes along too, since it shares the nsm_lock
      global variable.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NSM: Move NSM program and procedure numbers to fs/lockd/mon.c · 36e8e668
      Committed by Chuck Lever
      Clean up: Move the RPC program and procedure numbers for NSM into the
      one source file that needs them: fs/lockd/mon.c.
      
      And, as with NLM, NFS, and rpcbind calls, use NSMPROC_FOO instead of
      SM_FOO for NSM procedure numbers.
      
      Finally, make a couple of comments more precise: what is referred to
      here as SM_NOTIFY is really the NLM (lockd) NLMPROC_SM_NOTIFY downcall,
      not NSMPROC_NOTIFY.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NSM: Move NSM-related XDR data structures to lockd's xdr.h · 9c1bfd03
      Committed by Chuck Lever
      Clean up: NSM's XDR data structures are used only in fs/lockd/mon.c,
      so move them there.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NLM: Move the public declaration of nsm_unmonitor() to lockd.h · 356c3eb4
      Committed by Chuck Lever
      Clean up.
      
      Make the nlm_host argument "const," and move the public declaration to
      lockd.h.  Add a documenting comment.
      
      Bruce observed that nsm_unmonitor()'s only caller doesn't care about
      its return code, so make nsm_unmonitor() return void.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NSM: Release nsmhandle in nlm_destroy_host · c8c23c42
      Committed by Chuck Lever
      The nsm_handle's reference count is bumped in nlm_lookup_host().  It
      should be decremented in nlm_destroy_host() to make it easier to see
      the balance of these two operations.
      
      Move the nsm_release() call to fs/lockd/host.c.
      
      The h_nsmhandle pointer is set in nlm_lookup_host(), and never cleared.
      The nlm_destroy_host() function is never called for the same nlm_host
      twice, so h_nsmhandle won't ever be NULL when nsm_unmonitor() is
      called.
      
      All references to the nlm_host are gone before it is freed.  We can
      skip making h_nsmhandle NULL just before the nlm_host is deallocated.
      
      It's also likely we can remove the h_nsmhandle NULL check in
      nlmsvc_is_client(), but we can do that later when
      rearchitecting the nlm_host cache.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NLM: Move the public declaration of nsm_monitor() to lockd.h · 1e49323c
      Committed by Chuck Lever
      Clean up.
      
      Make the nlm_host argument "const," and move the public declaration to
      lockd.h with the other NSM public function declarations (e.g.,
      nsm_release) and global variable declarations.
      
      Add a documenting comment.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NSM: Support IPv6 version of mon_name · 29ed1407
      Committed by Chuck Lever
      The "mon_name" argument of the NSMPROC_MON and NSMPROC_UNMON upcalls
      is a string that contains the hostname or IP address of the remote peer
      to be notified when this host has rebooted.  The sm-notify command uses
      this identifier to contact the peer when we reboot, so it must be
      either a well-qualified DNS hostname or a presentation format IP
      address string.
      
      When the "nsm_use_hostnames" sysctl is set to zero, the kernel's NSM
      provides a presentation format IP address in the "mon_name" argument.
      Otherwise, the "caller_name" argument from NLM requests is used,
      which is usually just the DNS hostname of the peer.
      
      To support IPv6 addresses for the mon_name argument, we use the
      nsm_handle's address eye-catcher, which already contains an appropriate
      presentation format address string.  Using the eye-catcher string
      obviates the need to use a large buffer on the stack to form the
      presentation address string for the upcall.
      
      This patch also addresses a subtle bug.
      
      An NSMPROC_MON request and the subsequent NSMPROC_UNMON request for the
      same peer are required to use the same value for the "mon_name"
      argument.  Otherwise, rpc.statd's NSMPROC_UNMON processing cannot
      locate the database entry for that peer and remove it.
      
      If the setting of nsm_use_hostnames is changed between the time the
      kernel sends an NSMPROC_MON request and the time it sends the
      NSMPROC_UNMON request for the same peer, the "mon_name" argument for
      these two requests may not be the same.  This is because the value of
      "mon_name" is currently chosen at the moment the call is made based on
      the setting of nsm_use_hostnames.
      
      To ensure both requests pass identical contents in the "mon_name"
      argument, we now select which string to use for the argument in the
      nsm_monitor() function.  A pointer to this string is saved in the
      nsm_handle so it can be used for a subsequent NSMPROC_UNMON upcall.
      
      NB: There are other potential problems, such as how nlm_host_rebooted()
      might behave if nsm_use_hostnames were changed while hosts are still
      being monitored.  This patch does not attempt to address those
      problems.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
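
The mon_name fix in commit 29ed1407 comes down to choosing the string once, at monitor time, and reusing the saved pointer for the later unmonitor call. The following is a minimal userspace sketch of that idea, not the kernel code: the struct, the `_sketch` names, and the global standing in for the nsm_use_hostnames sysctl are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the nsm_use_hostnames sysctl. */
static int use_hostnames_sketch;

/* Hypothetical, reduced view of an nsm_handle. */
struct nsm_handle_sketch {
    const char *sm_name;      /* caller_name taken from NLM requests */
    const char *sm_addrbuf;   /* presentation-format address string */
    const char *sm_mon_name;  /* chosen once, reused for NSMPROC_UNMON */
};

/* Pick the mon_name at monitor time and remember the choice. */
static const char *monitor_name(struct nsm_handle_sketch *nsm)
{
    assert(nsm != NULL);
    nsm->sm_mon_name = use_hostnames_sketch ? nsm->sm_name : nsm->sm_addrbuf;
    return nsm->sm_mon_name;
}

/* The unmonitor path never consults the sysctl again. */
static const char *unmonitor_name(const struct nsm_handle_sketch *nsm)
{
    assert(nsm != NULL && nsm->sm_mon_name != NULL);
    return nsm->sm_mon_name;
}
```

Because `unmonitor_name()` only reads the saved pointer, flipping the setting between the two calls cannot make the two upcalls disagree.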
    • C
      NSM: Use modern style for sm_name field in nsm_handle · f47534f7
      Committed by Chuck Lever
      Clean up: I'm about to add another "char *" field to the nsm_handle
      structure.  The sm_name field uses an older style of declaring a
      "char *" field.  If I match that style for the new field, checkpatch.pl
      will complain.
      
      So, fix the sm_name field to use the new style.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NLM: Support IPv6 scope IDs in nlm_display_address() · bc995801
      Committed by Chuck Lever
      Scope ID support is needed since the kernel's NSM implementation is
      about to use these displayed addresses as a mon_name in some cases.
      
      When nsm_use_hostnames is zero, without scope ID support NSM will fail
      to handle peers that contact us via a link-local address.  Link-local
      addresses do not work without an interface ID, which is stored in the
      sockaddr's sin6_scope_id field.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
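
To illustrate why commit bc995801 needs the scope ID, here is a userspace sketch that appends a numeric scope ID to a link-local IPv6 presentation address. It uses the standard POSIX `inet_ntop()` call rather than any kernel helper, and the function name `display_address_sketch` and the "address%scope" layout are assumptions for illustration only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format sin6 as "address", or as "address%scope" when the address is
 * link-local and carries a non-zero scope ID. Returns 0 on success.
 */
static int display_address_sketch(const struct sockaddr_in6 *sin6,
                                  char *buf, size_t len)
{
    char addr[INET6_ADDRSTRLEN];

    assert(sin6 != NULL && buf != NULL);
    if (inet_ntop(AF_INET6, &sin6->sin6_addr, addr, sizeof(addr)) == NULL)
        return -1;

    if (IN6_IS_ADDR_LINKLOCAL(&sin6->sin6_addr) && sin6->sin6_scope_id != 0)
        snprintf(buf, len, "%s%%%u", addr, (unsigned int)sin6->sin6_scope_id);
    else
        snprintf(buf, len, "%s", addr);
    return 0;
}
```

Without the suffix, a peer reached at a link-local address could not be contacted again from the string alone, since the string would not say which interface to use.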
    • C
      NLM: Remove address eye-catcher buffers from nlm_host · 1df40b60
      Committed by Chuck Lever
      The h_name field in struct nlm_host is just a copy of
      h_nsmhandle->sm_name.  Likewise, the contents of the h_addrbuf field
      should be identical to the sm_addrbuf field.
      
      The h_srcaddrbuf field is used only in one place for debugging.  We can
      live without this until we get %pI formatting for printk().
      
      Currently these buffers are 48 bytes, but we need to support scope IDs
      in IPv6 presentation addresses, which means making the buffers even
      larger.  Instead, let's find ways to eliminate them to save space.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • C
      NLM: Use modern style for pointer fields in nlm_host · 7538ce1e
      Committed by Chuck Lever
      Clean up: I'm about to add another "char *" field to the nlm_host
      structure.  The h_name field, for example, uses an older style of
      declaring a "char *" field.  If I match that style for the new field,
      checkpatch.pl will complain.
      
      So, fix pointer fields to use the new style.
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
    • J
      sunrpc: add sv_maxconn field to svc_serv (try #3) · c9233eb7
      Committed by Jeff Layton
      svc_check_conn_limits() attempts to prevent denial of service attacks
      by having the service close old connections once it reaches a
      threshold. This threshold is based on the number of threads in the
      service:
      
      	(serv->sv_nrthreads + 3) * 20
      
      Once we reach this, we drop the oldest connections and a printk pops
      to warn the admin that they should increase the number of threads.
      
      Increasing the number of threads isn't an option however for services
      like lockd. We don't want to eliminate this check entirely for such
      services but we need some way to increase this limit.
      
      This patch adds a sv_maxconn field to the svc_serv struct. When it's
      set to 0, we use the current method to calculate the max number of
      connections. RPC services can then set this on an as-needed basis.
      Signed-off-by: Jeff Layton <jlayton@redhat.com>
      Acked-by: Neil Brown <neilb@suse.de>
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
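
The connection limit described in commit c9233eb7 can be sketched as a small function: use sv_maxconn when it is non-zero, otherwise fall back to the thread-based formula quoted in the message. The struct below is a hypothetical two-field stand-in for svc_serv, and `conn_limit_sketch` is not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the two svc_serv fields the message mentions. */
struct svc_serv_sketch {
    unsigned int sv_nrthreads;  /* number of service threads */
    unsigned int sv_maxconn;    /* 0 means "derive the limit from threads" */
};

/* Return the maximum number of connections before old ones are dropped. */
static unsigned int conn_limit_sketch(const struct svc_serv_sketch *serv)
{
    assert(serv != NULL);
    if (serv->sv_maxconn != 0)
        return serv->sv_maxconn;
    return (serv->sv_nrthreads + 3) * 20;
}
```

With a single thread and sv_maxconn left at 0, the formula yields (1 + 3) * 20 = 80 connections, which shows why a single-threaded service such as lockd needs a way to raise the limit without adding threads.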
    • J
      nfsd: document new filehandle fsid types · 548eaca4
      Committed by J. Bruce Fields
      Descriptions taken from mountd code (in nfs-utils/utils/mountd/cache.c).
      Signed-off-by: J. Bruce Fields <bfields@citi.umich.edu>
  2. 06 Jan, 2009 24 commits
    • A
      dm: support barriers on simple devices · ab4c1424
      Committed by Andi Kleen
      Implement barrier support for single device DM devices
      
      This patch implements barrier support in DM for the common case of dm linear
      just remapping a single underlying device. In this case we can safely
      pass the barrier through because there can be no reordering between
      devices.
      
       NB. Any DM device might cease to support barriers if it gets
           reconfigured so code must continue to allow for a possible
           -EOPNOTSUPP on every barrier bio submitted.  - agk
      Signed-off-by: Andi Kleen <ak@suse.de>
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • K
      dm request: extend target interface · 7d76345d
      Committed by Kiyoshi Ueda
      This patch adds the following target interfaces for request-based dm.
      
        map_rq    : for mapping a request
      
        rq_end_io : for finishing a request
      
        busy      : for avoiding performance regression from bio-based dm.
                    Target can tell dm core not to map requests now, and
                    that may help requests in the block layer queue to be
                    bigger by I/O merging.
                    In bio-based dm, this behavior is done by device
                    drivers managing the block layer queue.
                    But in request-based dm, dm core has to do that
                    since dm core manages the block layer queue.
      Signed-off-by: Kiyoshi Ueda <k-ueda@ct.jp.nec.com>
      Signed-off-by: Jun'ichi Nomura <j-nomura@ce.jp.nec.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • M
      dm: consolidate target deregistration error handling · 10d3bd09
      Committed by Mikulas Patocka
      Change dm_unregister_target to return void and use BUG() for error
      reporting.
      
      dm_unregister_target can only fail because of a programming bug in the
      target driver. It can't fail because of user behavior or disk errors.

      This patch changes dm_unregister_target to return void and use BUG() if
      someone tries to unregister a non-registered target or a target
      that is in use.
      
      This patch removes code duplication (testing of error codes in all dm
      targets) and reports bugs in just one place, in dm_unregister_target. In
      some target drivers, these return codes were ignored, which could lead
      to a situation where bugs could be missed.
      Signed-off-by: Mikulas Patocka <mpatocka@redhat.com>
      Signed-off-by: Alasdair G Kergon <agk@redhat.com>
    • N
      mm lockless pagecache barrier fix · e8c82c2e
      Committed by Nick Piggin
      An XFS workload showed up a bug in the lockless pagecache patch. Basically it
      would go into an "infinite" loop, although it would sometimes be able to break
      out of the loop! The reason is a missing compiler barrier in the "increment
      reference count unless it was zero" case of the lockless pagecache protocol in
      the gang lookup functions.
      
      This would cause the compiler to use a cached value of struct page pointer to
      retry the operation with, rather than reload it. So the page might have been
      removed from pagecache and freed (refcount==0) but the lookup would not correctly
      notice the page is no longer in pagecache, and keep attempting to increment the
      refcount and failing, until the page gets reallocated for something else. This
      isn't a data corruption because the condition will be detected if the page has
      been reallocated. However it can result in a lockup.
      
      Linus points out that ACCESS_ONCE is also required in that pointer load, even
      if its absence is not causing a bug on our particular build. The most general
      way to solve this is just to put an rcu_dereference in radix_tree_deref_slot.
      
      Assembly of find_get_pages,
      before:
      .L220:
              movq    (%rbx), %rax    #* ivtmp.1162, tmp82
              movq    (%rax), %rdi    #, prephitmp.1149
      .L218:
              testb   $1, %dil        #, prephitmp.1149
              jne     .L217   #,
              testq   %rdi, %rdi      # prephitmp.1149
              je      .L203   #,
              cmpq    $-1, %rdi       #, prephitmp.1149
              je      .L217   #,
              movl    8(%rdi), %esi   # <variable>._count.counter, c
              testl   %esi, %esi      # c
              je      .L218   #,
      
      after:
      .L212:
              movq    (%rbx), %rax    #* ivtmp.1109, tmp81
              movq    (%rax), %rdi    #, ret
              testb   $1, %dil        #, ret
              jne     .L211   #,
              testq   %rdi, %rdi      # ret
              je      .L197   #,
              cmpq    $-1, %rdi       #, ret
              je      .L211   #,
              movl    8(%rdi), %esi   # <variable>._count.counter, c
              testl   %esi, %esi      # c
              je      .L212   #,
      
      (notice the obvious infinite loop in the first example, if page->count remains 0)
      Signed-off-by: Nick Piggin <npiggin@suse.de>
      Reviewed-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
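
The retry loop that commit e8c82c2e repairs can be illustrated in userspace. The sketch below forces the slot to be reloaded on every retry through a volatile access, in the spirit of ACCESS_ONCE. It is single-threaded and non-atomic, so it only shows the shape of the loop, not the real lockless protocol. The `_sketch` names, the `READ_FRESH` macro, and the `max_tries` bound are assumptions added for illustration, and `__typeof__` is a GCC/Clang extension.

```c
#include <assert.h>
#include <stddef.h>

/* Force a fresh load on every evaluation (GCC/Clang __typeof__ extension). */
#define READ_FRESH(x) (*(volatile __typeof__(x) *)&(x))

struct page_sketch {
    int count;  /* reference count; 0 means the page has been freed */
};

/* "Increment the reference count unless it was zero"; returns 1 on success. */
static int get_unless_zero_sketch(struct page_sketch *page)
{
    if (page->count == 0)
        return 0;
    page->count++;
    return 1;
}

/*
 * Look up the page stored in *slot. The slot is re-read on every retry,
 * so a page that was removed and freed is not retried forever through a
 * stale pointer. max_tries bounds the loop for this sketch only.
 */
static struct page_sketch *lookup_sketch(struct page_sketch **slot,
                                         int max_tries)
{
    assert(slot != NULL);
    while (max_tries-- > 0) {
        struct page_sketch *page = READ_FRESH(*slot);

        if (page == NULL)
            return NULL;
        if (get_unless_zero_sketch(page))
            return page;
        /* Refcount was zero: loop and reload the slot. */
    }
    return NULL;
}
```

In the "before" assembly above, the equivalent of the `READ_FRESH(*slot)` load sits outside the retry target (.L218), which is why the loop could spin on a page whose count stayed at zero.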
    • M
      inotify: fix type errors in interfaces · 4ae8978c
      Committed by Michael Kerrisk
      The problems lie in the types used for some inotify interfaces, both at the kernel level and at the glibc level. This mail addresses the kernel problem. I will follow up with some suggestions for glibc changes.
      
      For the sys_inotify_rm_watch() interface, the type of the 'wd' argument is
      currently 'u32'; it should be '__s32'.  That is Robert's suggestion, and
      is consistent with the other declarations of watch descriptors in the
      kernel source, in particular, the inotify_event structure in
      include/linux/inotify.h:
      
      struct inotify_event {
              __s32           wd;             /* watch descriptor */
              __u32           mask;           /* watch mask */
              __u32           cookie;         /* cookie to synchronize two events */
              __u32           len;            /* length (including nulls) of name */
              char            name[0];        /* stub for possible name */
      };
      
      The patch makes the changes needed for inotify_rm_watch().
      Signed-off-by: Michael Kerrisk <mtk.manpages@googlemail.com>
      Cc: Robert Love <rlove@google.com>
      Cc: Vegard Nossum <vegard.nossum@gmail.com>
      Cc: Ulrich Drepper <drepper@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • C
      add a vfs_fsync helper · 4c728ef5
      Committed by Christoph Hellwig
      Fsync currently has a fdatawrite/fdatawait pair around the method call,
      and a mutex_lock/unlock of the inode mutex.  All callers of fsync have
      to duplicate this, but we have a few and most of them don't quite get
      it right.  This patch adds a new vfs_fsync that takes care of this.
      It's a little more complicated than usual, as ->fsync might get a NULL file
      pointer and just a dentry from nfsd, but otherwise gets a file, and we
      want to take the mapping and file operations from it when it is there.
      
      Notes on the fsync callers:
      
       - ecryptfs wasn't calling filemap_fdatawrite / filemap_fdatawait on the
         lower file
       - coda wasn't calling filemap_fdatawrite / filemap_fdatawait on the host
         file, and was returning 0 when ->fsync was missing
       - shm was neither calling filemap_fdatawrite / filemap_fdatawait nor
         taking i_mutex.  Given that shared memory doesn't have disk
         backing, not doing anything in fsync seems fine, so I left it out of
         the vfs_fsync conversion for now.  In that case we might just
         not pass it through to the lower file at all and instead call the no-op
         simple_sync_file directly.
      
      [and now actually export vfs_fsync]
      Signed-off-by: Christoph Hellwig <hch@lst.de>
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • J
      jbd2: Add buffer triggers · e06c8227
      Committed by Joel Becker
      Filesystems often need to perform compute-intensive operations on some
      metadata.  If this operation is repeated many times, it can be very
      expensive.  It would be much nicer if the operation could be performed
      once before a buffer goes to disk.
      
      This adds triggers to jbd2 buffer heads.  Just before writing a metadata
      buffer to the journal, jbd2 will optionally call a commit trigger associated
      with the buffer.  If the journal is aborted, an abort trigger will be
      called on any dirty buffers as they are dropped from pending
      transactions.
      
      ocfs2 will use this feature.
      
      Initially I tried to come up with a more generic trigger that could be
      used for non-buffer-related events like transaction completion.  It
      doesn't tie nicely, because the information a buffer trigger needs
      (specific to a journal_head) isn't the same as what a transaction
      trigger needs (specific to a transaction_t or perhaps journal_t).  So I
      implemented a buffer set, with the understanding that
      journal/transaction wide triggers should be implemented separately.
      
      There is only one trigger set allowed per buffer.  I can't think of any
      reason to attach more than one set.  Contrast this with a journal or
      transaction in which multiple places may want to watch the entire
      transaction separately.
      
      The trigger sets are considered static allocation from the jbd2
      perspective.  ocfs2 will just have one trigger set per block type,
      setting the same set on every bh of the same type.
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: <linux-ext4@vger.kernel.org>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
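
The buffer-trigger idea in commit e06c8227 can be sketched as an optional pair of callbacks attached to a buffer, with the commit callback invoked just before the buffer would be written to the journal. This is a userspace illustration only. The struct layouts, the `_sketch` names, and the byte-sum "checksum" are hypothetical and do not reflect the jbd2 data structures.

```c
#include <assert.h>
#include <stddef.h>

struct buffer_sketch;

/* One trigger set; the message allows at most one set per buffer. */
struct trigger_set_sketch {
    void (*commit)(struct buffer_sketch *bh);  /* just before journal write */
    void (*abort)(struct buffer_sketch *bh);   /* when a dirty buffer is dropped */
};

struct buffer_sketch {
    unsigned int checksum;
    unsigned char data[8];
    const struct trigger_set_sketch *triggers;  /* optional, may be NULL */
};

/* Expensive per-buffer work done once, at commit time (here: a byte sum). */
static void sum_on_commit_sketch(struct buffer_sketch *bh)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < sizeof(bh->data); i++)
        sum += bh->data[i];
    bh->checksum = sum;
}

/* A statically allocated set, shared by every buffer of the same type. */
static const struct trigger_set_sketch sum_triggers_sketch = {
    sum_on_commit_sketch, NULL
};

/* Run the commit trigger, if any, before the buffer goes to the journal. */
static void write_to_journal_sketch(struct buffer_sketch *bh)
{
    assert(bh != NULL);
    if (bh->triggers != NULL && bh->triggers->commit != NULL)
        bh->triggers->commit(bh);
    /* A real journal would copy bh->data to the log here. */
}
```

However many times the buffer is modified within a transaction, the checksum is computed only once, when the buffer is about to be written.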
    • J
      quota: Export dquot_alloc() and dquot_destroy() functions · 7d9056ba
      Committed by Jan Kara
      These are default functions for creating and destroying quota structures
      and they should be used from filesystems.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Unexport dqblk_v1.h and dqblk_v2.h · 5cd9d5bb
      Committed by Jan Kara
      Unexport header files dqblk_v[12].h since except for quota format ID they
      don't contain information userspace should be interested in. Move ID
      definitions to quota.h.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • M
      jbd2: Add BH_JBDPrivateStart · e97fcd95
      Committed by Mark Fasheh
      Add this so that file systems using JBD2 can safely allocate unused b_state
      bits.
      
      In this case, we add it so that Ocfs2 can define a single bit for tracking
      the validation state of a buffer.
      Acked-by: "Theodore Ts'o" <tytso@mit.edu>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Implement function for scanning active dquots · 12c77527
      Committed by Jan Kara
      OCFS2 needs to scan all active dquots once in a while and sync quota
      information among cluster nodes. Provide a helper function for it so
      that it does not have to reimplement internally a list which VFS
      already has. Moreover this function is probably going to be useful
      for other clustered filesystems if they decide to use VFS quotas.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Add helpers to allow ocfs2 specific quota initialization, freeing and recovery · 3d9ea253
      Committed by Jan Kara
      OCFS2 needs to peek whether quota structure is already in memory so
      that it can avoid expensive cluster locking in that case. Similarly
      when freeing dquots, it checks whether it is the last quota structure
      user or not. Finally, it needs to get reference to dquot structure for
      specified id and quota type when recovering quota file after crash.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Update version number · 571b46e4
      Committed by Jan Kara
      Increase reported version number of quota support since quota core has changed
      significantly. Also remove __DQUOT_NUM_VERSION__ since nobody uses it.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Keep which entries were set by SETQUOTA quotactl · 4d59bce4
      Committed by Jan Kara
      Quota in a clustered environment needs to synchronize quota information
      among cluster nodes. This means we have to occasionally update some
      information in dquot from disk / network. On the other hand we have to
      be careful not to overwrite changes the administrator made via SETQUOTA.
      So indicate in dquot->dq_flags which entries have been set by SETQUOTA;
      the quota format can clear these flags once it has properly propagated
      the changes.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Allow negative usage of space and inodes · db49d2df
      Committed by Jan Kara
      For clustered filesystems, it can happen that space / inode usage goes
      negative temporarily (because one node is allocating while another node
      is freeing and they are not completely in sync). So let the quota code
      allow this and change qsize_t to a signed type so that we don't
      underflow the variables.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
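
The underflow that commit db49d2df avoids is easy to show with plain integer arithmetic. In this sketch, `int64_t` and `uint64_t` stand in for a signed and an unsigned qsize_t. The function names and the overflow guards are assumptions for illustration, not quota code.

```c
#include <assert.h>
#include <stdint.h>

/* Signed accounting: usage may dip below zero until the nodes resync. */
static int64_t charge_signed_sketch(int64_t usage, int64_t delta)
{
    /* Sketch-only guards: stay well away from signed overflow. */
    assert(usage > INT64_MIN / 2 && usage < INT64_MAX / 2);
    assert(delta > INT64_MIN / 2 && delta < INT64_MAX / 2);
    return usage + delta;
}

/* Unsigned accounting: a free that arrives first wraps modulo 2^64. */
static uint64_t charge_unsigned_sketch(uint64_t usage, int64_t delta)
{
    return usage + (uint64_t)delta;
}
```

If a 4096-byte free is seen before the matching allocation, the signed counter briefly reads -4096 and returns to 0 afterwards, while the unsigned counter jumps to a value near 2^64 that any limit check would treat as massively over quota.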
    • J
      quota: Convert union in mem_dqinfo to a pointer · e3d4d56b
      Committed by Jan Kara
      Coming quota support for OCFS2 is going to need quite a bit
      of additional per-sb quota information. Moreover having fs.h
      include all the types needed for this structure would be a
      pain in the a**. So remove the union from mem_dqinfo and add
      a private pointer for filesystem's use.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Split off quota tree handling into a separate file · 1ccd14b9
      Committed by Jan Kara
      There is going to be a new version of quota format having 64-bit
      quota limits and a new quota format for OCFS2. They are both
      going to use the same tree structure as VFSv0 quota format. So
      split out tree handling into a separate file and make size of
      leaf blocks, amount of space usable in each block (needed for
      checksumming) and structures contained in them configurable
      so that the code can be shared.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Move quotaio_v[12].h from include/linux/ to fs/ · cf770c13
      Committed by Jan Kara
      Since these include files are used only by implementation of quota formats,
      there's no need to have them in include/linux/.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Introduce DQUOT_QUOTA_SYS_FILE flag · ca785ec6
      Committed by Jan Kara
      If filesystem can handle quota files as system files hidden from users, we can
      skip a lot of cache invalidation, syncing, inode flags setting etc. when
      turning quotas on, off and quota_sync. Allow filesystem to indicate that it is
      hiding quota files from users by DQUOT_QUOTA_SYS_FILE flag.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Remove compatibility function sb_any_quota_enabled() · dcb30695
      Committed by Jan Kara
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Allow to separately enable quota accounting and enforcing limits · f55abc0f
      Committed by Jan Kara
      Split DQUOT_USR_ENABLED (and DQUOT_GRP_ENABLED) into DQUOT_USR_USAGE_ENABLED
      and DQUOT_USR_LIMITS_ENABLED. This way we are able to separately enable /
      disable whether we should:
      1) ignore quotas completely
      2) just keep up-to-date information about usage
      3) actually enforce quota limits
      
      This is going to be useful when quota is treated as filesystem metadata - we
      then want to keep quota information up to date all the time and just enable /
      disable limits enforcement.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
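
The three modes listed in commit f55abc0f map naturally onto two independent flag bits. The sketch below uses hypothetical `SK_` flag names and bit values rather than the kernel's DQUOT_* definitions. It also assumes that enforcing limits without tracking usage is not a valid combination, which the message implies but does not state.

```c
#include <assert.h>

/* Hypothetical flag bits for one quota type. */
enum {
    SK_USAGE_ENABLED  = 1 << 0,  /* keep usage information up to date */
    SK_LIMITS_ENABLED = 1 << 1,  /* additionally enforce limits */
};

enum quota_mode_sketch {
    QUOTA_IGNORED,          /* 1) ignore quotas completely */
    QUOTA_ACCOUNTING_ONLY,  /* 2) just track usage */
    QUOTA_ENFORCING         /* 3) track usage and enforce limits */
};

/* Classify a flag word into one of the three modes from the message. */
static enum quota_mode_sketch classify_quota_sketch(unsigned int flags)
{
    /* Assumption: limits are meaningless without usage tracking. */
    assert(!(flags & SK_LIMITS_ENABLED) || (flags & SK_USAGE_ENABLED));

    if (!(flags & SK_USAGE_ENABLED))
        return QUOTA_IGNORED;
    if (!(flags & SK_LIMITS_ENABLED))
        return QUOTA_ACCOUNTING_ONLY;
    return QUOTA_ENFORCING;
}
```

Toggling only the limits bit then switches between modes 2 and 3 without ever losing the usage information, which is the behavior wanted when quota is treated as filesystem metadata.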
    • J
      quota: Make _SUSPENDED just a flag · e4bc7b4b
      Committed by Jan Kara
      Up to now, DQUOT_USR_SUSPENDED behaved like a state - i.e., either quota
      was enabled or suspended or none. Now allowed states are 0, ENABLED,
      ENABLED | SUSPENDED. This will be useful later when we implement separate
      enabling of quota usage tracking and limits enforcement because we need to
      keep track of a state which has been suspended.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Increase size of variables for limits and inode usage · 12095460
      Committed by Jan Kara
      So far quota was fine with quota block limits and inode limits/numbers in
      a 32-bit type. Now, with the rapid increase in storage sizes, requests are
      coming in to handle quota limits above 4TB / more than 2^32 inodes.
      So bump up sizes of types in mem_dqblk structure to 64-bits to be able to
      handle this. Also update inode allocation / checking functions to use qsize_t
      and make global structure keep quota limits in bytes so that things are
      consistent.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
    • J
      quota: Add callbacks for allocating and destroying dquot structures · 74f783af
      Committed by Jan Kara
      Some filesystems would like to keep private information together with each
      dquot. Add callbacks alloc_dquot and destroy_dquot allowing filesystem to
      allocate larger dquots from their private slab, in a fashion similar to how we
      currently allocate inodes.
      Signed-off-by: Jan Kara <jack@suse.cz>
      Signed-off-by: Mark Fasheh <mfasheh@suse.com>
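
The alloc_dquot / destroy_dquot callbacks from commit 74f783af follow the same embedding pattern used for inodes: the filesystem allocates a larger structure that contains the generic one and hands back a pointer to the embedded part. The sketch below shows that pattern in userspace with `calloc()` instead of a slab. Every type and function name in it is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical generic quota structure. */
struct dquot_sketch {
    unsigned int id;
};

/* Hypothetical per-filesystem callbacks. */
struct dquot_ops_sketch {
    struct dquot_sketch *(*alloc_dquot)(void);
    void (*destroy_dquot)(struct dquot_sketch *dq);
};

/* Filesystem-private dquot that embeds the generic one. */
struct myfs_dquot_sketch {
    struct dquot_sketch base;
    long cluster_lock_state;  /* example of private per-dquot data */
};

/* Recover the private structure from the generic pointer. */
static struct myfs_dquot_sketch *to_myfs_sketch(struct dquot_sketch *dq)
{
    assert(dq != NULL);
    return (struct myfs_dquot_sketch *)
        ((char *)dq - offsetof(struct myfs_dquot_sketch, base));
}

static struct dquot_sketch *myfs_alloc_dquot_sketch(void)
{
    struct myfs_dquot_sketch *mdq = calloc(1, sizeof(*mdq));

    return mdq != NULL ? &mdq->base : NULL;
}

static void myfs_destroy_dquot_sketch(struct dquot_sketch *dq)
{
    free(to_myfs_sketch(dq));
}

static const struct dquot_ops_sketch myfs_ops_sketch = {
    myfs_alloc_dquot_sketch, myfs_destroy_dquot_sketch
};
```

Generic code only ever sees `struct dquot_sketch *`, while the filesystem can reach its private fields through `to_myfs_sketch()`.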
  3. 05 Jan, 2009 2 commits
    • S
      GFS2: Support for FIEMAP ioctl · e9079cce
      Committed by Steven Whitehouse
      This patch implements the FIEMAP ioctl for GFS2. We can use the generic
      code (aside from a lock order issue, solved as per Ted Tso's suggestion)
      for which I've introduced a new variant of the generic function. We also
      have one exception to deal with, namely stuffed files, so we do that
      "by hand", setting all the required flags.
      
      This has been tested with a modified version (I could only find an old one) of
      Eric's test program, and appears to work correctly.
      
      This patch does not currently support FIEMAP of xattrs, but the plan is to add
      that feature at some future point.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Theodore Tso <tytso@mit.edu>
      Cc: Eric Sandeen <sandeen@redhat.com>
    • H
      gro: Add page frag support · 5d38a079
      Committed by Herbert Xu
      This patch allows GRO to merge page frags (skb_shinfo(skb)->frags)
      in one skb, rather than using the less efficient frag_list.
      
      It also adds a new interface, napi_gro_frags to allow drivers
      to inject page frags directly into the stack without allocating
      an skb.  This is intended to be the GRO equivalent for LRO's
      lro_receive_frags interface.
      
      The existing GSO interface can already handle page frags with
      or without an appended frag_list so nothing needs to be changed
      there.
      
      The merging itself is rather simple.  We store any new frag entries
      after the last existing entry, without checking whether the first
      new entry can be merged with the last existing entry.  Making this
      check would actually be easy but since no existing driver can
      produce contiguous frags anyway it would just be mental masturbation.
      
      If the total number of entries would exceed the capacity of a
      single skb, we simply resort to using frag_list as we do now.
      Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
      Signed-off-by: David S. Miller <davem@davemloft.net>
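
The merge rule described in commit 5d38a079 (append the new frag entries after the last existing one, and fall back to frag_list when they do not fit) can be sketched as follows. The structures, the capacity of 4, and the return convention are all hypothetical and much simpler than the real skb layout.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_FRAGS 4  /* hypothetical per-skb frag capacity */

struct frag_sketch {
    int page_id;
    unsigned int offset;
    unsigned int size;
};

struct skb_sketch {
    unsigned int nr_frags;
    struct frag_sketch frags[SK_MAX_FRAGS];
};

/*
 * Append src's frags after dst's last frag, without trying to coalesce
 * adjacent entries. Returns 0 on success, or -1 when the total would
 * exceed the capacity and the caller should use frag_list instead.
 */
static int merge_frags_sketch(struct skb_sketch *dst,
                              const struct skb_sketch *src)
{
    unsigned int i;

    assert(dst != NULL && src != NULL);
    if (dst->nr_frags + src->nr_frags > SK_MAX_FRAGS)
        return -1;

    for (i = 0; i < src->nr_frags; i++)
        dst->frags[dst->nr_frags + i] = src->frags[i];
    dst->nr_frags += src->nr_frags;
    return 0;
}
```

On the fallback path the destination is left untouched, so the caller can chain the new skb through frag_list as before.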