1. 13 5月, 2008 1 次提交
    • M
      fuse: add flag to turn on big writes · 78bb6cb9
      Miklos Szeredi 提交于
      Prior to 2.6.26 fuse only supported single page write requests.  In theory all
      fuse filesystem should be able support bigger than 4k writes, as there's
      nothing in the API to prevent it.  Unfortunately there's a known case in
      NTFS-3G where big writes cause filesystem corruption.  There could also be
      other filesystems, where the lack of testing with big write requests would
      result in bugs.
      
      To prevent such problems on a kernel upgrade, disable big writes by default,
      but let filesystems set a flag to turn it on.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Szabolcs Szakacsits <szaka@ntfs-3g.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      78bb6cb9
  2. 30 4月, 2008 4 次提交
    • M
      fuse: fix node ID type · b48badf0
      Miklos Szeredi 提交于
      Node ID is 64bit but it is passed as unsigned long to some functions.  This
      breakage wasn't noticed, because libfuse uses unsigned long too.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b48badf0
    • M
      fuse: fix max i/o size calculation · e5d9a0df
      Miklos Szeredi 提交于
      Fix a bug that Werner Baumann reported: fuse can send a bigger write request
      than the maximum specified.  This only affected direct_io operation.
      
      In addition set a sane minimum for the max_read and max_write tunables, so I/O
      always makes some progress.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e5d9a0df
    • M
      fuse: support writable mmap · 3be5a52b
      Miklos Szeredi 提交于
      Quoting Linus (3 years ago, FUSE inclusion discussions):
      
        "User-space filesystems are hard to get right. I'd claim that they
         are almost impossible, unless you limit them somehow (shared
         writable mappings are the nastiest part - if you don't have those,
         you can reasonably limit your problems by limiting the number of
         dirty pages you accept through normal "write()" calls)."
      
      Instead of attempting the impossible, I've just waited for the dirty page
      accounting infrastructure to materialize (thanks to Peter Zijlstra and
      others).  This nicely solved the biggest problem: limiting the number of pages
      used for write caching.
      
      Some small details remained, however, which this largish patch attempts to
      address.  It provides a page writeback implementation for fuse, which is
      completely safe against VM related deadlocks.  Performance may not be very
      good for certain usage patterns, but generally it should be acceptable.
      
      It has been tested extensively with fsx-linux and bash-shared-mapping.
      
      Fuse page writeback design
      --------------------------
      
      fuse_writepage() allocates a new temporary page with GFP_NOFS|__GFP_HIGHMEM.
      It copies the contents of the original page, and queues a WRITE request to the
      userspace filesystem using this temp page.
      
      The writeback is finished instantly from the MM's point of view: the page is
      removed from the radix trees, and the PageDirty and PageWriteback flags are
      cleared.
      
      For the duration of the actual write, the NR_WRITEBACK_TEMP counter is
      incremented.  The per-bdi writeback count is not decremented until the actual
      write completes.
      
      On dirtying the page, fuse waits for a previous write to finish before
      proceeding.  This makes sure, there can only be one temporary page used at a
      time for one cached page.
      
      This approach is wasteful in both memory and CPU bandwidth, so why is this
      complication needed?
      
      The basic problem is that there can be no guarantee about the time in which
      the userspace filesystem will complete a write.  It may be buggy or even
      malicious, and fail to complete WRITE requests.  We don't want unrelated parts
      of the system to grind to a halt in such cases.
      
      Also a filesystem may need additional resources (particularly memory) to
      complete a WRITE request.  There's a great danger of a deadlock if that
      allocation may wait for the writepage to finish.
      
      Currently there are several cases where the kernel can block on page
      writeback:
      
        - allocation order is larger than PAGE_ALLOC_COSTLY_ORDER
        - page migration
        - throttle_vm_writeout (through NR_WRITEBACK)
        - sync(2)
      
      Of course in some cases (fsync, msync) we explicitly want to allow blocking.
      So for these cases new code has to be added to fuse, since the VM is not
      tracking writeback pages for us any more.
      
      As an extra safetly measure, the maximum dirty ratio allocated to a single
      fuse filesystem is set to 1% by default.  This way one (or several) buggy or
      malicious fuse filesystems cannot slow down the rest of the system by hogging
      dirty memory.
      
      With appropriate privileges, this limit can be raised through
      '/sys/class/bdi/<bdi>/max_ratio'.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3be5a52b
    • M
      mm: bdi: expose the BDI object in sysfs for FUSE · b6f2fcbc
      Miklos Szeredi 提交于
      Register FUSE's backing_dev_info under sysfs with the name "fuse-MAJOR:MINOR"
      
      Make the fuse control filesystem use s_dev instead of a fuse specific ID.
      This makes it easier to match directories under /sys/fs/fuse/connections/ with
      directories under /sys/class/bdi, and with actual mounts.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      b6f2fcbc
  3. 25 4月, 2008 1 次提交
  4. 09 2月, 2008 1 次提交
  5. 08 2月, 2008 1 次提交
  6. 07 2月, 2008 1 次提交
  7. 25 1月, 2008 4 次提交
  8. 30 11月, 2007 2 次提交
  9. 19 10月, 2007 6 次提交
  10. 17 10月, 2007 7 次提交
  11. 20 7月, 2007 1 次提交
    • P
      mm: Remove slab destructors from kmem_cache_create(). · 20c2df83
      Paul Mundt 提交于
      Slab destructors were no longer supported after Christoph's
      c59def9f change. They've been
      BUGs for both slab and slub, and slob never supported them
      either.
      
      This rips out support for the dtor pointer from kmem_cache_create()
      completely and fixes up every single callsite in the kernel (there were
      about 224, not including the slab allocator definitions themselves,
      or the documentation references).
      Signed-off-by: NPaul Mundt <lethal@linux-sh.org>
      20c2df83
  12. 17 6月, 2007 1 次提交
  13. 24 5月, 2007 1 次提交
  14. 22 5月, 2007 1 次提交
    • A
      Detach sched.h from mm.h · e8edc6e0
      Alexey Dobriyan 提交于
      First thing mm.h does is including sched.h solely for can_do_mlock() inline
      function which has "current" dereference inside. By dealing with can_do_mlock()
      mm.h can be detached from sched.h which is good. See below, why.
      
      This patch
      a) removes unconditional inclusion of sched.h from mm.h
      b) makes can_do_mlock() normal function in mm/mlock.c
      c) exports can_do_mlock() to not break compilation
      d) adds sched.h inclusions back to files that were getting it indirectly.
      e) adds less bloated headers to some files (asm/signal.h, jiffies.h) that were
         getting them indirectly
      
      Net result is:
      a) mm.h users would get less code to open, read, preprocess, parse, ... if
         they don't need sched.h
      b) sched.h stops being dependency for significant number of files:
         on x86_64 allmodconfig touching sched.h results in recompile of 4083 files,
         after patch it's only 3744 (-8.3%).
      
      Cross-compile tested on
      
      	all arm defconfigs, all mips defconfigs, all powerpc defconfigs,
      	alpha alpha-up
      	arm
      	i386 i386-up i386-defconfig i386-allnoconfig
      	ia64 ia64-up
      	m68k
      	mips
      	parisc parisc-up
      	powerpc powerpc-up
      	s390 s390-up
      	sparc sparc-up
      	sparc64 sparc64-up
      	um-x86_64
      	x86_64 x86_64-up x86_64-defconfig x86_64-allnoconfig
      
      as well as my two usual configs.
      Signed-off-by: NAlexey Dobriyan <adobriyan@gmail.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      e8edc6e0
  15. 17 5月, 2007 1 次提交
    • C
      Remove SLAB_CTOR_CONSTRUCTOR · a35afb83
      Christoph Lameter 提交于
      SLAB_CTOR_CONSTRUCTOR is always specified. No point in checking it.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Cc: David Howells <dhowells@redhat.com>
      Cc: Jens Axboe <jens.axboe@oracle.com>
      Cc: Steven French <sfrench@us.ibm.com>
      Cc: Michael Halcrow <mhalcrow@us.ibm.com>
      Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Steven Whitehouse <swhiteho@redhat.com>
      Cc: Roman Zippel <zippel@linux-m68k.org>
      Cc: David Woodhouse <dwmw2@infradead.org>
      Cc: Dave Kleikamp <shaggy@austin.ibm.com>
      Cc: Trond Myklebust <trond.myklebust@fys.uio.no>
      Cc: "J. Bruce Fields" <bfields@fieldses.org>
      Cc: Anton Altaparmakov <aia21@cantab.net>
      Cc: Mark Fasheh <mark.fasheh@oracle.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Jan Kara <jack@ucw.cz>
      Cc: David Chinner <dgc@sgi.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a35afb83
  16. 09 5月, 2007 1 次提交
    • M
      add filesystem subtype support · 79c0b2df
      Miklos Szeredi 提交于
      There's a slight problem with filesystem type representation in fuse
      based filesystems.
      
      From the kernel's view, there are just two filesystem types: fuse and
      fuseblk.  From the user's view there are lots of different filesystem
      types.  The user is not even much concerned if the filesystem is fuse based
      or not.  So there's a conflict of interest in how this should be
      represented in fstab, mtab and /proc/mounts.
      
      The current scheme is to encode the real filesystem type in the mount
      source.  So an sshfs mount looks like this:
      
        sshfs#user@server:/   /mnt/server    fuse   rw,nosuid,nodev,...
      
      This url-ish syntax works OK for sshfs and similar filesystems.  However
      for block device based filesystems (ntfs-3g, zfs) it doesn't work, since
      the kernel expects the mount source to be a real device name.
      
      A possibly better scheme would be to encode the real type in the type
      field as "type.subtype".  So fuse mounts would look like this:
      
        /dev/hda1       /mnt/windows   fuseblk.ntfs-3g   rw,...
        user@server:/   /mnt/server    fuse.sshfs        rw,nosuid,nodev,...
      
      This patch adds the necessary code to the kernel so that this can be
      correctly displayed in /proc/mounts.
      Signed-off-by: NMiklos Szeredi <mszeredi@suse.cz>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      79c0b2df
  17. 08 5月, 2007 1 次提交
    • C
      slab allocators: Remove SLAB_DEBUG_INITIAL flag · 50953fe9
      Christoph Lameter 提交于
      I have never seen a use of SLAB_DEBUG_INITIAL.  It is only supported by
      SLAB.
      
      I think its purpose was to have a callback after an object has been freed
      to verify that the state is the constructor state again?  The callback is
      performed before each freeing of an object.
      
      I would think that it is much easier to check the object state manually
      before the free.  That also places the check near the code object
      manipulation of the object.
      
      Also the SLAB_DEBUG_INITIAL callback is only performed if the kernel was
      compiled with SLAB debugging on.  If there would be code in a constructor
      handling SLAB_DEBUG_INITIAL then it would have to be conditional on
      SLAB_DEBUG otherwise it would just be dead code.  But there is no such code
      in the kernel.  I think SLUB_DEBUG_INITIAL is too problematic to make real
      use of, difficult to understand and there are easier ways to accomplish the
      same effect (i.e.  add debug code before kfree).
      
      There is a related flag SLAB_CTOR_VERIFY that is frequently checked to be
      clear in fs inode caches.  Remove the pointless checks (they would even be
      pointless without removeal of SLAB_DEBUG_INITIAL) from the fs constructors.
      
      This is the last slab flag that SLUB did not support.  Remove the check for
      unimplemented flags from SLUB.
      Signed-off-by: NChristoph Lameter <clameter@sgi.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      50953fe9
  18. 03 5月, 2007 1 次提交
  19. 09 4月, 2007 1 次提交
  20. 13 2月, 2007 1 次提交
  21. 12 2月, 2007 1 次提交
  22. 08 12月, 2006 1 次提交