  1. 07 Jan, 2011: 2 commits
    • fs: change d_compare for rcu-walk · 621e155a
      Committed by Nick Piggin
      Change d_compare so it may be called from lock-free RCU lookups. This
      does put significant restrictions on what may be done from the callback,
      however there don't seem to have been any problems with in-tree fses.
      If some strange use case pops up that _really_ cannot cope with the
      rcu-walk rules, we can just add new rcu-unaware callbacks, which would
      cause name lookup to drop out of rcu-walk mode.
      
      For in-tree filesystems, this is just a mechanical change.
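To make the rcu-walk restriction concrete, here is a userspace sketch of a case-insensitive name comparison in the style of a ->d_compare callback. The name_model type and ci_d_compare() are hypothetical stand-ins: the real hook also receives dentry and inode arguments and a struct qstr, all omitted here. What the sketch shows is the property a lock-free lookup relies on: the function only reads its arguments, takes no locks, cannot sleep, and modifies nothing. Like the kernel hook, it returns 0 when the names match.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical stand-in for a length-counted name (not the kernel's struct qstr). */
struct name_model {
	unsigned int len;
	const char *name;
};

/*
 * Case-insensitive comparison in the style of a ->d_compare callback.
 * It only reads its arguments: no locks, no sleeping, no writes, which is
 * what makes it safe to call from a lock-free (rcu-walk) lookup.
 * Returns 0 when the names match, nonzero otherwise.
 */
static int ci_d_compare(unsigned int len, const char *str,
			const struct name_model *name)
{
	unsigned int i;

	if (len != name->len)
		return 1;
	for (i = 0; i < len; i++) {
		if (tolower((unsigned char)str[i]) !=
		    tolower((unsigned char)name->name[i]))
			return 1;
	}
	return 0;
}
```

For a name_model holding "Hello", comparing against "hELLo" matches, while "Hell" (different length) and "World" do not.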
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
    • fs: change d_delete semantics · fe15ce44
      Committed by Nick Piggin
      Change d_delete from a dentry deletion notification to a dentry caching
      advice, more like ->drop_inode. Require it to be constant and idempotent,
      and not take d_lock. This is how all existing filesystems use the callback
      anyway.
      
      This makes fine grained dentry locking of dput and dentry lru scanning
      much simpler.
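Because ->d_delete is now only caching advice, the code that drops the last reference can treat it as a plain predicate. The userspace sketch below models that decision; the struct names, the fs_wants_no_caching field and dput_model() are hypothetical. It only illustrates the contract described above: the callback reads state that never changes, takes no lock, and gives the same answer however often it is called, with nonzero meaning "do not keep this dentry cached".

```c
#include <assert.h>
#include <stdbool.h>

struct dentry_model;

/* Hypothetical operations table holding only the d_delete-style hook. */
struct dentry_ops_model {
	int (*d_delete)(const struct dentry_model *d);
};

struct dentry_model {
	const struct dentry_ops_model *op;
	bool fs_wants_no_caching;	/* set at creation, never changed */
	bool on_lru;			/* kept in the unused-dentry cache */
	bool freed;
};

/* Constant, idempotent advice: reads an immutable field, takes no lock. */
static int example_d_delete(const struct dentry_model *d)
{
	return d->fs_wants_no_caching;
}

/* Model of the final reference drop: ask for advice, then cache or free. */
static void dput_model(struct dentry_model *d)
{
	if (d->op && d->op->d_delete && d->op->d_delete(d)) {
		d->freed = true;
		return;
	}
	d->on_lru = true;
}

static const struct dentry_ops_model example_ops = { example_d_delete };
```

Since the advice cannot change between calls, the caller never needs to hold d_lock to keep the answer stable.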
      Signed-off-by: Nick Piggin <npiggin@kernel.dk>
  2. 05 Jan, 2011: 1 commit
  3. 31 Dec, 2010: 1 commit
  4. 03 Dec, 2010: 1 commit
  5. 02 Dec, 2010: 1 commit
    • Call the filesystem back whenever a page is removed from the page cache · 6072d13c
      Committed by Linus Torvalds
      NFS needs to be able to release objects that are stored in the page
      cache once the page itself is no longer visible from the page cache.
      
      This patch adds a callback to the address space operations that allows
      filesystems to perform page cleanups once the page has been removed
      from the page cache.
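A userspace model of the ordering this describes: the cache first unlinks the page so no lookup can find it, and only then calls the filesystem's hook to release private objects tied to that page. The structures and functions are hypothetical; the hook is named freepage after the address_space_operations member this patch adds, but its signature and the cache bookkeeping are simplified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 4

struct page_model {
	int index;
	void *fs_private;	/* object the filesystem must release */
};

/* Hypothetical aops table with only the removal hook. */
struct aops_model {
	void (*freepage)(struct page_model *page);
};

struct cache_model {
	struct page_model *slots[CACHE_SLOTS];
	const struct aops_model *a_ops;
};

static int released;	/* counts hook invocations for the example */

static void example_freepage(struct page_model *page)
{
	/* The page is no longer visible in the cache; drop private data. */
	page->fs_private = NULL;
	released++;
}

/* Unlink the page first, then notify the filesystem. */
static bool remove_from_cache(struct cache_model *c, struct page_model *page)
{
	if (page->index < 0 || page->index >= CACHE_SLOTS ||
	    c->slots[page->index] != page)
		return false;
	c->slots[page->index] = NULL;
	if (c->a_ops && c->a_ops->freepage)
		c->a_ops->freepage(page);
	return true;
}
```

Calling the hook only after the unlink is what guarantees the filesystem never frees data that a concurrent cache lookup could still reach.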
      
      Original patch by: Linus Torvalds <torvalds@linux-foundation.org>
      [trondmy: cover the cases of invalidate_inode_pages2() and
                truncate_inode_pages()]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  6. 19 Nov, 2010: 1 commit
  7. 11 Nov, 2010: 1 commit
  8. 31 Oct, 2010: 1 commit
  9. 28 Oct, 2010: 4 commits
  10. 27 Oct, 2010: 2 commits
  11. 26 Oct, 2010: 2 commits
  12. 25 Oct, 2010: 1 commit
  13. 23 Oct, 2010: 2 commits
    • Revert "tty: Add a new file /proc/tty/consoles" · 6c2754c2
      Committed by Linus Torvalds
      This reverts commit f4a3e0bc.  Jiri
      Slaby points out that the tty structure we're using may already be
      gone, and Al Viro doesn't hold back in complaining about the random
      loading of 'filp->private_data' which doesn't have to be a pointer at
      all, nor does checking the magic field for TTY_MAGIC prove anything.
      
      Belated review by Al:
      
       "a) global variable depending on stdin of the last opener? Affecting
           output of read(2)? Really?
      
        b) iterator is broken; list should be locked in ->start(), unlocked in
           ->stop() and *NOT* unlocked/relocked in ->next()
      
        c) ->show() ought to do nothing in case of ->device == NULL, instead
           of skipping those in ->next()/->start()
      
        d) regardless of the merits of the bright idea about asterisk at that
           line in output *and* regardless of (a), the implementation is not
           only atrociously ugly, it's actually very likely to be a roothole.
           Verifying that Cthulhu knows what number happens to be address of a
           tty_struct by blindly dereferencing memory at that address...
           Ouch.
      
        Please revert that crap."
      
      And Christoph pipes in and NAKs the approach of walking fd tables etc
      too.  So it's pretty unanimous.
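Points (b) and (c) of Al's review describe the usual shape of a seq_file iterator. The userspace sketch below models it; every name is hypothetical, the four functions only mimic the roles of ->start(), ->next(), ->stop() and ->show(), and a counter stands in for a real lock so the pattern can be checked. The lock is taken once in start and released once in stop, next never drops it, and entries with no device are skipped in show rather than in start/next.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

struct console_model {
	const char *name;
	const void *device;		/* may be NULL */
	struct console_model *next;
};

static struct console_model *console_list;

/* A counter stands in for a real mutex so the pattern can be checked. */
static int lock_held;
static int lock_acquisitions;

static void list_lock(void)
{
	assert(!lock_held);	/* taking it twice would deadlock */
	lock_held = 1;
	lock_acquisitions++;
}

static void list_unlock(void)
{
	assert(lock_held);
	lock_held = 0;
}

/* ->start(): lock once, then return the entry at position *pos. */
static struct console_model *iter_start(long *pos)
{
	struct console_model *c;
	long i;

	list_lock();
	c = console_list;
	for (i = 0; c && i < *pos; i++)
		c = c->next;
	return c;
}

/* ->next(): advance under the lock taken in start; no unlock/relock. */
static struct console_model *iter_next(struct console_model *c, long *pos)
{
	(*pos)++;
	return c->next;
}

/* ->stop(): the only place the lock is dropped. */
static void iter_stop(void)
{
	list_unlock();
}

/* ->show(): entries without a device produce no output. */
static int iter_show(char *buf, size_t size, const struct console_model *c)
{
	if (c->device == NULL)
		return 0;
	return snprintf(buf, size, "%s\n", c->name);
}

/* One pass, driven the way a seq_file reader calls the four callbacks. */
static size_t dump_consoles(char *buf, size_t size)
{
	size_t used = 0;
	long pos = 0;
	struct console_model *c;

	for (c = iter_start(&pos); c; c = iter_next(c, &pos)) {
		int n = iter_show(buf + used, size - used, c);

		if (n > 0 && (size_t)n < size - used)
			used += (size_t)n;
	}
	iter_stop();
	return used;
}
```

A single pass over a list holding one device-less entry and one real console acquires the lock exactly once and emits only the real console's line.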
      Noticed-by: Jiri Slaby <jslaby@suse.cz>
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: Werner Fink <werner@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • tty: Add a new file /proc/tty/consoles · f4a3e0bc
      Committed by Dr. Werner Fink
      Add a new file /proc/tty/consoles to be able to determine the registered
      system console lines.  If the reading process holds /dev/console open at
      the regular standard input stream the active device will be marked by an
      asterisk.  Show possible operations and also decode the used flags of
      the listed console lines.
      Signed-off-by: Werner Fink <werner@suse.de>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  14. 12 Oct, 2010: 1 commit
    • ocfs2: Add a mount option "coherency=*" to handle cluster coherency for O_DIRECT writes. · 7bdb0d18
      Committed by Tristan Ye
      Currently, the default behavior of O_DIRECT writes allows
      concurrent writing among nodes to the same file, with no cluster
      coherency guaranteed (no EX lock held).  This can leave stale data in
      the cache for buffered reads on other nodes.
      
      The new mount option offers a choice between two behaviors for
      O_DIRECT writes:
      
          * coherency=full, the default, disallows concurrent
                            O_DIRECT writes by taking EX locks.
      
          * coherency=buffered allows concurrent O_DIRECT writes
                               without an EX lock among nodes, which
                               gains performance at the risk of
                               leaving stale data on other nodes.
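As a rough sketch of the policy (not ocfs2's actual option parser or locking code), the option can be modeled as an enum whose default is full, driving a single decision: whether an O_DIRECT write takes the cluster-wide EX lock. The enum, parse_coherency() and direct_write_needs_ex_lock() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

enum coherency_mode { COHERENCY_FULL, COHERENCY_BUFFERED };

/* Parse the value of "coherency=..."; NULL (option absent) means full.
 * Returns false for an unrecognized value. */
static bool parse_coherency(const char *val, enum coherency_mode *mode)
{
	if (val == NULL || strcmp(val, "full") == 0) {
		*mode = COHERENCY_FULL;
		return true;
	}
	if (strcmp(val, "buffered") == 0) {
		*mode = COHERENCY_BUFFERED;
		return true;
	}
	return false;
}

/* full: serialize O_DIRECT writes across nodes with an EX lock.
 * buffered: allow concurrent O_DIRECT writes without it. */
static bool direct_write_needs_ex_lock(enum coherency_mode mode)
{
	return mode == COHERENCY_FULL;
}
```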
      Signed-off-by: Tristan Ye <tristan.ye@oracle.com>
      Signed-off-by: Joel Becker <joel.becker@oracle.com>
  15. 08 Oct, 2010: 1 commit
    • NFS: new idmapper · 955a857e
      Committed by Bryan Schumaker
      This patch creates a new idmapper system that uses the request-key function to
      place a call into userspace to map user and group ids to names.  The old
      idmapper was single threaded, which prevented more than one request from running
      at a single time.  This means that a user would have to wait for an upcall to
      finish before accessing a cached result.
      
      The upcall result is stored on a keyring of type id_resolver.  See the file
      Documentation/filesystems/nfs/idmapper.txt for instructions.
      Signed-off-by: Bryan Schumaker <bjschuma@netapp.com>
      [Trond: fix up the return value of nfs_idmap_lookup_name and clean up code]
      Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
  16. 06 Oct, 2010: 1 commit
  17. 17 Sep, 2010: 1 commit
  18. 14 Aug, 2010: 1 commit
  19. 10 Aug, 2010: 4 commits
    • oom: deprecate oom_adj tunable · 51b1bd2a
      Committed by David Rientjes
      /proc/pid/oom_adj is now deprecated so that it may eventually be
      removed.  The target date for removal is August 2012.
      
      A warning will be printed to the kernel log if a task attempts to use this
      interface.  Further warnings will be suppressed until the kernel is rebooted
      to prevent spamming the kernel log.
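The "warn once, then stay quiet until reboot" behavior is the common print-once pattern: a static flag set on first use. A minimal userspace sketch follows; the function name and the message text are hypothetical and are not the kernel's actual log line.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

static int warnings_printed;	/* for the example: how many times we logged */

/* Log a deprecation notice only on the first call; later calls are silent
 * until the process (in the kernel: the system) restarts.
 * Not thread-safe; a sketch of the pattern only. */
static void warn_deprecated_oom_adj(const char *comm)
{
	static bool warned;

	if (warned)
		return;
	warned = true;
	warnings_printed++;
	fprintf(stderr, "%s: /proc/pid/oom_adj is deprecated, "
		"use /proc/pid/oom_score_adj instead\n", comm);
}
```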
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • oom: badness heuristic rewrite · a63d83f4
      Committed by David Rientjes
      This is a complete rewrite of the oom killer's badness() heuristic which is
      used to determine which task to kill in oom conditions.  The goal is to
      make it as simple and predictable as possible so the results are better
      understood and we end up killing the task which will lead to the most
      memory freeing while still respecting the fine-tuning from userspace.
      
      Instead of basing the heuristic on mm->total_vm for each task, the task's
      rss and swap space are used.  This is a better indication of the
      amount of memory that will be freeable if the oom killed task is chosen
      and subsequently exits.  This helps specifically in cases where KDE or
      GNOME is chosen for oom kill on desktop systems instead of a memory
      hogging task.
      
      The baseline for the heuristic is a proportion of memory that each task is
      currently using in memory plus swap compared to the amount of "allowable"
      memory.  "Allowable," in this sense, means the system-wide resources for
      unconstrained oom conditions, the set of mempolicy nodes, the mems
      attached to current's cpuset, or a memory controller's limit.  The
      proportion is given on a scale of 0 (never kill) to 1000 (always kill),
      roughly meaning that a task with a badness() score of 500
      consumes approximately 50% of allowable memory resident in RAM or in swap
      space.
      
      The proportion is always relative to the amount of "allowable" memory and
      not the total amount of RAM systemwide so that mempolicies and cpusets may
      operate in isolation; they shall not need to know the true size of the
      machine on which they are running if they are bound to a specific set of
      nodes or mems, respectively.
      
      Root tasks are given 3% extra memory just like __vm_enough_memory()
      provides in LSMs.  In the event of two tasks consuming similar amounts of
      memory, it is generally better to save root's task.
      
      Because of the change in the badness() heuristic's baseline, it is also
      necessary to introduce a new user interface to tune it.  It's not possible
      to redefine the meaning of /proc/pid/oom_adj with a new scale since the
      ABI cannot be changed for backward compatibility.  Instead, a new tunable,
      /proc/pid/oom_score_adj, is added that ranges from -1000 to +1000.  It may
      be used to polarize the heuristic such that certain tasks are never
      considered for oom kill while others may always be considered.  The value
      is added directly into the badness() score so a value of -500, for
      example, means to discount 50% of its memory consumption in comparison to
      other tasks either on the system, bound to the mempolicy, in the cpuset,
      or sharing the same memory controller.
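The scoring described above can be written out as a small function. This is a simplified userspace sketch, not the kernel's oom_badness(): it assumes memory is counted in pages, uses rss + swap as the task's usage, models the root bonus as 30 points (3% of the 0..1000 scale) and clamps the result to 0..1000. Details such as which tasks are eligible and how a non-positive score is treated in the real code are omitted.

```c
#include <assert.h>
#include <stdbool.h>

#define OOM_SCORE_SCALE 1000	/* 0 = never kill ... 1000 = always kill */

/*
 * Simplified badness score: the task's share of "allowable" memory on a
 * 0..1000 scale, minus 3% for root, plus the per-task oom_score_adj.
 * allowable_pages is whatever bounds the allocation: system-wide resources,
 * the mempolicy nodes, the cpuset's mems, or a memory controller limit.
 */
static long badness_sketch(unsigned long rss_pages, unsigned long swap_pages,
			   unsigned long allowable_pages, bool is_root,
			   int oom_score_adj)
{
	long points;

	if (allowable_pages == 0)
		return 0;
	points = (long)((rss_pages + swap_pages) * OOM_SCORE_SCALE /
			allowable_pages);
	if (is_root)
		points -= 30;		/* 3% of the 1000-point scale */
	points += oom_score_adj;	/* -1000..+1000, added directly */
	if (points < 0)
		points = 0;
	if (points > OOM_SCORE_SCALE)
		points = OOM_SCORE_SCALE;
	return points;
}
```

For example, a task using 500 of 1000 allowable pages scores 500, and an oom_score_adj of -500 brings it to 0. A task using 250 pages inside a 500-page cpuset also scores 500, because the proportion is taken against the cpuset's memory rather than total RAM.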
      
      /proc/pid/oom_adj is changed so that its meaning is rescaled into the
      units used by /proc/pid/oom_score_adj, and vice versa.  Changing one of
      these per-task tunables will rescale the value of the other to an
      equivalent meaning.  Although /proc/pid/oom_adj was originally defined as
      a bitshift on the badness score, it now shares the same linear growth as
      /proc/pid/oom_score_adj but with different granularity.  This is required
      so the ABI is not broken with userspace applications and allows oom_adj to
      be deprecated for future removal.
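The rescaling between the two tunables is linear. A sketch, assuming the old oom_adj range of -17 (OOM_DISABLE) to +15 and the new oom_score_adj range of -1000 to +1000; the function names are hypothetical, and the endpoint handling (the old maximum maps to +1000, the new minimum maps back to -17) is an assumption about how the extremes are kept equivalent.

```c
#include <assert.h>

#define OOM_DISABLE        (-17)	/* old oom_adj minimum: never kill */
#define OOM_ADJUST_MAX     15		/* old oom_adj maximum */
#define OOM_SCORE_ADJ_MIN  (-1000)
#define OOM_SCORE_ADJ_MAX  1000

/* oom_adj -> oom_score_adj: 17 old units span 1000 new ones. */
static int oom_adj_to_score_adj(int oom_adj)
{
	if (oom_adj == OOM_ADJUST_MAX)
		return OOM_SCORE_ADJ_MAX;	/* keep the top of the range */
	return oom_adj * OOM_SCORE_ADJ_MAX / -OOM_DISABLE;
}

/* oom_score_adj -> oom_adj, the inverse direction. */
static int score_adj_to_oom_adj(int score_adj)
{
	if (score_adj == OOM_SCORE_ADJ_MIN)
		return OOM_DISABLE;		/* keep "never kill" equivalent */
	return score_adj * OOM_ADJUST_MAX / OOM_SCORE_ADJ_MAX;
}
```

Because both directions truncate, a round trip is not always exact: an oom_adj of 5 becomes an oom_score_adj of 294, which maps back to 4. This is the "different granularity" mentioned above.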
      Signed-off-by: David Rientjes <rientjes@google.com>
      Cc: Nick Piggin <npiggin@suse.de>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • update VFS documentation for method changes. · 336fb3b9
      Committed by Al Viro
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
    • 1e231735
  20. 06 Aug, 2010: 3 commits
  21. 04 Aug, 2010: 1 commit
    • Documentation: update broken web addresses. · 0ea6e611
      Committed by Justin P. Mattock
      Below you will find an updated version of the original series, bunching all patches into one big patch
      that updates broken web addresses located in Documentation/*.
      Some of the addresses date as far back as 1995, so searching became a bit difficult;
      the best way to deal with these was to use web.archive.org to locate the outdated addresses.
      There are also some addresses pointing to .spec files. Some of these were located, but some (after searching
      on the company's site) were still nowhere to be found. In those cases I changed the address
      to the company's site, so users can contact the company and it can locate the files for them.
      Signed-off-by: Justin P. Mattock <justinmattock@gmail.com>
      Signed-off-by: Thomas Weber <weber@corscience.de>
      Signed-off-by: Mike Frysinger <vapier.adi@gmail.com>
      Cc: Paulo Marques <pmarques@grupopie.com>
      Cc: Randy Dunlap <rdunlap@xenotime.net>
      Cc: Michael Neuling <mikey@neuling.org>
      Signed-off-by: Jiri Kosina <jkosina@suse.cz>
  22. 31 Jul, 2010: 1 commit
  23. 27 Jul, 2010: 1 commit
  24. 23 Jul, 2010: 3 commits
    • nilfs2: add nodiscard mount option · 802d3177
      Committed by Ryusuke Konishi
      Nilfs has "discard" mount option which issues discard/TRIM commands to
      underlying block device, but it lacks a complementary option and has
      no way to disable the feature through remount.
      
      This adds "nodiscard" option to resolve this imbalance.
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • nilfs2: add barrier mount option · 773bc4f3
      Committed by Ryusuke Konishi
      Nilfs enables write barriers by default and has "nobarrier" mount
      option to disable this feature.  But it lacks the complementary option
      and has no way to re-enable the feature on remount.
      
      This adds "barrier" option to resolve this imbalance.
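The two nilfs2 changes follow the same pattern: each feature gets one option that sets a mount flag and a complementary option that clears it, so a remount can move the flag in either direction. A sketch with hypothetical flag and function names (these are not nilfs2's actual identifiers):

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define SKETCH_OPT_BARRIER  0x1u	/* hypothetical: issue write barriers */
#define SKETCH_OPT_DISCARD  0x2u	/* hypothetical: issue discard/TRIM */

/* Apply one mount/remount option to the current flags.
 * Returns false for options this sketch does not know. */
static bool apply_option(unsigned int *flags, const char *opt)
{
	if (strcmp(opt, "barrier") == 0)
		*flags |= SKETCH_OPT_BARRIER;
	else if (strcmp(opt, "nobarrier") == 0)
		*flags &= ~SKETCH_OPT_BARRIER;
	else if (strcmp(opt, "discard") == 0)
		*flags |= SKETCH_OPT_DISCARD;
	else if (strcmp(opt, "nodiscard") == 0)
		*flags &= ~SKETCH_OPT_DISCARD;
	else
		return false;
	return true;
}
```

Without the "set" half of a pair, a flag cleared at mount time could never be turned back on by a remount, which is the imbalance both commits describe.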
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@lab.ntt.co.jp>
    • fscache: convert object to use workqueue instead of slow-work · 8b8edefa
      Committed by Tejun Heo
      Make fscache object state transition callbacks use workqueue instead
      of slow-work.  New dedicated unbound CPU workqueue fscache_object_wq
      is created.  get/put callbacks are renamed and modified to take
      @object and called directly from the enqueue wrapper and the work
      function.  While at it, make all open-coded instances of get/put
      use fscache_get/put_object().
      
      * Unbound workqueue is used.
      
      * work_busy() output is printed instead of slow-work flags in object
        debugging outputs.  They mean basically the same thing bit-for-bit.
      
      * sysctl fscache.object_max_active added to control concurrency.  The
        default value is nr_cpus clamped between 4 and
        WQ_UNBOUND_MAX_ACTIVE.
      
      * slow_work_sleep_till_thread_needed() is replaced with fscache
        private implementation fscache_object_sleep_till_congested() which
        waits on fscache_object_wq congestion.
      
      * debugfs support is dropped for now.  Tracing API based debug
        facility is planned to be added.
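The default concurrency limit for fscache.object_max_active described above is a simple clamp. A sketch, where the upper bound is passed in as a parameter because WQ_UNBOUND_MAX_ACTIVE's value is defined by the workqueue code and not given here; the function name is hypothetical.

```c
#include <assert.h>

/* Default max_active: the number of CPUs, but at least 4 and at most the
 * unbound workqueue limit (supplied by the caller in this sketch). */
static int default_object_max_active(int nr_cpus, int unbound_max_active)
{
	int v = nr_cpus;

	if (v < 4)
		v = 4;
	if (v > unbound_max_active)
		v = unbound_max_active;
	return v;
}
```

With an upper bound of 512, a 1-CPU machine gets 4, a 16-CPU machine gets 16, and a 4096-CPU machine is capped at 512.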
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Acked-by: David Howells <dhowells@redhat.com>
  25. 03 Jun, 2010: 2 commits