1. 17 10月, 2007 1 次提交
  2. 13 10月, 2007 1 次提交
    • A
      NTFS: Fix a mount time deadlock. · bfab36e8
      Anton Altaparmakov 提交于
      Big thanks go to Mathias Kolehmainen for reporting the bug, providing
      debug output and testing the patches I sent him to get it working.
      
      The fix was to stop calling ntfs_attr_set() at mount time as that causes
      balance_dirty_pages_ratelimited() to be called which on systems with
      little memory actually tries to go and balance the dirty pages which tries
      to take the s_umount semaphore but because we are still in fill_super()
      across which the VFS holds s_umount for writing this results in a
      deadlock.
      
      We now do the dirty work by hand by submitting individual buffers.  This
      has the annoying "feature" that mounting can take a few seconds if the
      journal is large as we have clear it all.  One day someone should improve
      on this by deferring the journal clearing to a helper kernel thread so it
      can be done in the background but I don't have time for this at the moment
      and the current solution works fine so I am leaving it like this for now.
      Signed-off-by: NAnton Altaparmakov <aia21@cantab.net>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bfab36e8
  3. 10 10月, 2007 3 次提交
  4. 12 9月, 2007 2 次提交
  5. 23 8月, 2007 1 次提交
  6. 01 8月, 2007 1 次提交
  7. 20 7月, 2007 6 次提交
    • Y
      some kmalloc/memset ->kzalloc (tree wide) · dd00cc48
      Yoann Padioleau 提交于
      Transform some calls to kmalloc/memset to a single kzalloc (or kcalloc).
      
      Here is a short excerpt of the semantic patch performing
      this transformation:
      
      @@
      type T2;
      expression x;
      identifier f,fld;
      expression E;
      expression E1,E2;
      expression e1,e2,e3,y;
      statement S;
      @@
      
       x =
      - kmalloc
      + kzalloc
        (E1,E2)
        ...  when != \(x->fld=E;\|y=f(...,x,...);\|f(...,x,...);\|x=E;\|while(...) S\|for(e1;e2;e3) S\)
      - memset((T2)x,0,E1);
      
      @@
      expression E1,E2,E3;
      @@
      
      - kzalloc(E1 * E2,E3)
      + kcalloc(E1,E2,E3)
      
      [akpm@linux-foundation.org: get kcalloc args the right way around]
      Signed-off-by: NYoann Padioleau <padator@wanadoo.fr>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Acked-by: NRussell King <rmk@arm.linux.org.uk>
      Cc: Bryan Wu <bryan.wu@analog.com>
      Acked-by: NJiri Slaby <jirislaby@gmail.com>
      Cc: Dave Airlie <airlied@linux.ie>
      Acked-by: NRoland Dreier <rolandd@cisco.com>
      Cc: Jiri Kosina <jkosina@suse.cz>
      Acked-by: NDmitry Torokhov <dtor@mail.ru>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Acked-by: NMauro Carvalho Chehab <mchehab@infradead.org>
      Acked-by: NPierre Ossman <drzeus-list@drzeus.cx>
      Cc: Jeff Garzik <jeff@garzik.org>
      Cc: "David S. Miller" <davem@davemloft.net>
      Acked-by: NGreg KH <greg@kroah.com>
      Cc: James Bottomley <James.Bottomley@steeleye.com>
      Cc: "Antonino A. Daplas" <adaplas@pol.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dd00cc48
    • K
      coredump masking: documentation for /proc/pid/coredump_filter · bb90110d
      Kawai, Hidehiro 提交于
      This patch adds the documentation for /proc/<pid>/coredump_filter.
      Signed-off-by: NHidehiro Kawai <hidehiro.kawai.ez@hitachi.com>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: "Randy.Dunlap" <rdunlap@xenotime.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bb90110d
    • P
      audit: rework execve audit · bdf4c48a
      Peter Zijlstra 提交于
      The purpose of audit_bprm() is to log the argv array to a userspace daemon at
      the end of the execve system call.  Since user-space hasn't had time to run,
      this array is still in pristine state on the process' stack; so no need to
      copy it, we can just grab it from there.
      
      In order to minimize the damage to audit_log_*() copy each string into a
      temporary kernel buffer first.
      
      Currently the audit code requires that the full argument vector fits in a
      single packet.  So currently it does clip the argv size to a (sysctl) limit,
      but only when execve auditing is enabled.
      
      If the audit protocol gets extended to allow for multiple packets this check
      can be removed.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NOllie Wild <aaw@google.com>
      Cc: <linux-audit@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      bdf4c48a
    • N
      mm: fault feedback #1 · d0217ac0
      Nick Piggin 提交于
      Change ->fault prototype.  We now return an int, which contains
      VM_FAULT_xxx code in the low byte, and FAULT_RET_xxx code in the next byte.
       FAULT_RET_ code tells the VM whether a page was found, whether it has been
      locked, and potentially other things.  This is not quite the way he wanted
      it yet, but that's changed in the next patch (which requires changes to
      arch code).
      
      This means we no longer set VM_CAN_INVALIDATE in the vma in order to say
      that a page is locked which requires filemap_nopage to go away (because we
      can no longer remain backward compatible without that flag), but we were
      going to do that anyway.
      
      struct fault_data is renamed to struct vm_fault as Linus asked. address
      is now a void __user * that we should firmly encourage drivers not to use
      without really good reason.
      
      The page is now returned via a page pointer in the vm_fault struct.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      d0217ac0
    • M
      Document ->page_mkwrite() locking · ed2f2f9b
      Mark Fasheh 提交于
      There seems to be very little documentation about this callback in general.
      The locking in particular is a bit tricky, so it's worth having this in
      writing.
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ed2f2f9b
    • N
      mm: merge populate and nopage into fault (fixes nonlinear) · 54cb8821
      Nick Piggin 提交于
      Nonlinear mappings are (AFAIKS) simply a virtual memory concept that encodes
      the virtual address -> file offset differently from linear mappings.
      
      ->populate is a layering violation because the filesystem/pagecache code
      should need to know anything about the virtual memory mapping.  The hitch here
      is that the ->nopage handler didn't pass down enough information (ie.  pgoff).
       But it is more logical to pass pgoff rather than have the ->nopage function
      calculate it itself anyway (because that's a similar layering violation).
      
      Having the populate handler install the pte itself is likewise a nasty thing
      to be doing.
      
      This patch introduces a new fault handler that replaces ->nopage and
      ->populate and (later) ->nopfn.  Most of the old mechanism is still in place
      so there is a lot of duplication and nice cleanups that can be removed if
      everyone switches over.
      
      The rationale for doing this in the first place is that nonlinear mappings are
      subject to the pagefault vs invalidate/truncate race too, and it seemed stupid
      to duplicate the synchronisation logic rather than just consolidate the two.
      
      After this patch, MAP_NONBLOCK no longer sets up ptes for pages present in
      pagecache.  Seems like a fringe functionality anyway.
      
      NOPAGE_REFAULT is removed.  This should be implemented with ->fault, and no
      users have hit mainline yet.
      
      [akpm@linux-foundation.org: cleanup]
      [randy.dunlap@oracle.com: doc. fixes for readahead]
      [akpm@linux-foundation.org: build fix]
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      54cb8821
  8. 18 7月, 2007 2 次提交
  9. 17 7月, 2007 3 次提交
  10. 11 7月, 2007 3 次提交
    • J
      configfs: config item dependancies. · 631d1feb
      Joel Becker 提交于
      Sometimes other drivers depend on particular configfs items.  For
      example, ocfs2 mounts depend on a heartbeat region item.  If that
      region item is removed with rmdir(2), the ocfs2 mount must BUG or go
      readonly.  Not happy.
      
      This provides two additional API calls: configfs_depend_item() and
      configfs_undepend_item().  A client driver can call
      configfs_depend_item() on an existing item to tell configfs that it is
      depended on.  configfs will then return -EBUSY from rmdir(2) for that
      item.  When the item is no longer depended on, the client driver calls
      configfs_undepend_item() on it.
      
      These API cannot be called underneath any configfs callbacks, as
      they will conflict.  They can block and allocate.  A client driver
      probably shouldn't calling them of its own gumption.  Rather it should
      be providing an API that external subsystems call.
      
      How does this work?  Imagine the ocfs2 mount process.  When it mounts,
      it asks for a heart region item.  This is done via a call into the
      heartbeat code.  Inside the heartbeat code, the region item is looked
      up.  Here, the heartbeat code calls configfs_depend_item().  If it
      succeeds, then heartbeat knows the region is safe to give to ocfs2.
      If it fails, it was being torn down anyway, and heartbeat can gracefully
      pass up an error.
      
      [ Fixed some bad whitespace in configfs.txt. --Mark ]
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      631d1feb
    • J
      configfs: accessing item hierarchy during rmdir(2) · 299894cc
      Joel Becker 提交于
      Add a notification callback, ops->disconnect_notify(). It has the same
      prototype as ->drop_item(), but it will be called just before the item
      linkage is broken. This way, configfs users who want to do work while
      the object is still in the heirarchy have a chance.
      
      Client drivers will still need to config_item_put() in their
      ->drop_item(), if they implement it.  They need do nothing in
      ->disconnect_notify().  They don't have to provide it if they don't
      care.  But someone who wants to be notified before ci_parent is set to
      NULL can now be notified.
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      299894cc
    • J
      configfs: Convert subsystem semaphore to mutex · e6bd07ae
      Joel Becker 提交于
      Convert the su_sem member of struct configfs_subsystem to a struct
      mutex, as that's what it is. Also convert all the users and update
      Documentation/configfs.txt and Documentation/configfs_example.c
      accordingly.
      
      [ Conflict in fs/dlm/config.c with commit
        3168b078 manually resolved. --Mark ]
      Inspired-by: NSatyam Sharma <ssatyam@cse.iitk.ac.in>
      Signed-off-by: NJoel Becker <joel.becker@oracle.com>
      Signed-off-by: NMark Fasheh <mark.fasheh@oracle.com>
      e6bd07ae
  11. 09 6月, 2007 1 次提交
  12. 25 5月, 2007 1 次提交
  13. 09 5月, 2007 6 次提交
  14. 08 5月, 2007 1 次提交
    • D
      smaps: add clear_refs file to clear reference · b813e931
      David Rientjes 提交于
      Adds /proc/pid/clear_refs.  When any non-zero number is written to this file,
      pte_mkold() and ClearPageReferenced() is called for each pte and its
      corresponding page, respectively, in that task's VMAs.  This file is only
      writable by the user who owns the task.
      
      It is now possible to measure _approximately_ how much memory a task is using
      by clearing the reference bits with
      
      	echo 1 > /proc/pid/clear_refs
      
      and checking the reference count for each VMA from the /proc/pid/smaps output
      at a measured time interval.  For example, to observe the approximate change
      in memory footprint for a task, write a script that clears the references
      (echo 1 > /proc/pid/clear_refs), sleeps, and then greps for Pgs_Referenced and
      extracts the size in kB.  Add the sizes for each VMA together for the total
      referenced footprint.  Moments later, repeat the process and observe the
      difference.
      
      For example, using an efficient Mozilla:
      
      	accumulated time		referenced memory
      	----------------		-----------------
      		 0 s				 408 kB
      		 1 s				 408 kB
      		 2 s				 556 kB
      		 3 s				1028 kB
      		 4 s				 872 kB
      		 5 s				1956 kB
      		 6 s				 416 kB
      		 7 s				1560 kB
      		 8 s				2336 kB
      		 9 s				1044 kB
      		10 s				 416 kB
      
      This is a valuable tool to get an approximate measurement of the memory
      footprint for a task.
      
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Paul Mundt <lethal@linux-sh.org>
      Cc: Christoph Lameter <clameter@sgi.com>
      Signed-off-by: NDavid Rientjes <rientjes@google.com>
      [akpm@linux-foundation.org: build fixes]
      [mpm@selenic.com: rename for_each_pmd]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b813e931
  15. 27 4月, 2007 1 次提交
  16. 26 4月, 2007 1 次提交
  17. 10 3月, 2007 1 次提交
  18. 05 3月, 2007 1 次提交
  19. 21 2月, 2007 1 次提交
    • N
      [PATCH] fs: fix libfs data leak · 955eff5a
      Nick Piggin 提交于
      simple_prepare_write leaks uninitialised kernel data.  This happens because
      the it leaves an uninitialised "hole" over the part of the page that the
      write is expected to go to.  This is fine, but it then marks the page
      uptodate, which means a concurrent read can come in and copy the
      uninitialised memory into userspace before it written to.
      
      Fix it by simply marking it uptodate in simple_commit_write instead, after
      the hole has been filled in.  This could theoretically break an fs that
      uses simple_prepare_write and not simple_commit_write, and that relies on
      the incorrect simple_prepare_write behaviour.  Luckily, none of those
      exists in the tree.
      Signed-off-by: NNick Piggin <npiggin@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      955eff5a
  20. 19 2月, 2007 1 次提交
    • E
      9p: implement optional loose read cache · e03abc0c
      Eric Van Hensbergen 提交于
      While cacheing is generally frowned upon in the 9p world, it has its
      place -- particularly in situations where the remote file system is
      exclusive and/or read-only.  The vacfs views of venti content addressable
      store are a real-world instance of such a situation.  To facilitate higher
      performance for these workloads (and eventually use the fscache patches),
      we have enabled a "loose" cache mode which does not attempt to maintain
      any form of consistency on the page-cache or dcache.  This results in over
      two orders of magnitude performance improvement for cacheable block reads
      in the Bonnie benchmark.  The more aggressive use of the dcache also seems
      to improve metadata operational performance.
      Signed-off-by: NEric Van Hensbergen <ericvh@gmail.com>
      e03abc0c
  21. 18 2月, 2007 1 次提交
  22. 13 2月, 2007 1 次提交
    • E
      [PATCH] ufs2 write: mount as rw · cbcae39f
      Evgeniy Dushistov 提交于
      These series of patches add UFS2 write-support.  UFS2 - is default file system
      for recent versions of FreeBSD.
      
      The main differences from UFS1 from write support point of view
      are:
      1)Not all inodes are allocated during formatation of disk.
      2)All meta-data(pointer to data blocks) are 64bit(in UFS1 they
      are 32bit).
      
      So patch series consist of
      1)make possible mount UFS2 in read-write mode
      2)code to write ufs2 inodes and code to initialize inodes chunks.
      3)work with 64bit meta-data
      
      I made simple testing like create/deleting/writing/reading/truncating, also I
      ran fsx-linux and untar and build kernel on UFS1 and UFS2, after that FreeBSD
      fsck do not find any errors in fs.
      
      This patch makes possible to mount ufs2 "rw", and updates UFS2 documentation:
      remove note about bug(it fixed by reallocate blocks on the fly patch) and add
      me in the list of people who want receive bug reports.
      Signed-off-by: NEvgeniy Dushistov <dushistov@mail.ru>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      cbcae39f