1. 12 Jul, 2010: 2 commits
    • fuse: add store request · a1d75f25
      Miklos Szeredi committed
      A userspace filesystem can request that data be stored in the inode's
      mapping.  This request is synchronous and has no reply.  If the write
      to the fuse device returns an error then the store request was not
      fully completed (but may have updated some pages).
      
      If the stored data overflows the current file size, then the size is
      extended, similarly to a write(2) on the filesystem.
      
      Pages which have been completely stored are marked uptodate.
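      A minimal userspace sketch of the two rules above (size extension and
      "completely stored" pages), assuming 4 KiB pages. DEMO_PAGE_SIZE and the
      demo_* helpers are hypothetical; this is not the kernel's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: 4 KiB pages; hypothetical constant for illustration. */
#define DEMO_PAGE_SIZE 4096ULL

/* File size after storing len bytes at offset: it grows like a
 * write(2) past EOF would, and never shrinks. */
static uint64_t demo_size_after_store(uint64_t old_size, uint64_t offset,
                                      uint64_t len)
{
    uint64_t end = offset + len;
    return end > old_size ? end : old_size;
}

/* True if page `index` lies entirely inside [offset, offset + len),
 * i.e. it was completely stored and could be marked uptodate. */
static bool demo_page_fully_stored(uint64_t index, uint64_t offset,
                                   uint64_t len)
{
    uint64_t start = index * DEMO_PAGE_SIZE;
    uint64_t end = start + DEMO_PAGE_SIZE;
    return offset <= start && offset + len >= end;
}
```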
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: don't use atomic kmap · 7909b1c6
      Miklos Szeredi committed
      Don't use atomic kmap for mapping userspace buffers in device
      read/write/splice.
      
      This is necessary because the next patch (adding store notify)
      requires that the caller of fuse_copy_page() be able to sleep between
      invocations.  The simplest way to ensure this is to change the atomic
      kmaps to non-atomic ones.
      
      Thankfully architectures where kmap() is not a no-op are going out of
      fashion, so we can ignore the (probably negligible) performance impact
      of this change.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  2. 28 May, 2010: 1 commit
  3. 26 May, 2010: 1 commit
    • driver core: add devname module aliases to allow module on-demand auto-loading · 578454ff
      Kay Sievers committed
      This adds:
        alias: devname:<name>
      to some common kernel modules, which will allow the on-demand loading
      of the kernel module when the device node is accessed.
      
      Ideally all these modules would be compiled-in, but distros seem so much
      in love with their modularization that we need to cover the common
      cases with this new facility. It will allow us to remove a bunch of pretty
      useless init scripts and modprobes from init scripts.
      
      The static device node aliases will be carried in the module itself. The
      program depmod will extract this information to a file in the module directory:
        $ cat /lib/modules/2.6.34-00650-g537b60d1-dirty/modules.devname
        # Device nodes to trigger on-demand module loading.
        microcode cpu/microcode c10:184
        fuse fuse c10:229
        ppp_generic ppp c108:0
        tun net/tun c10:200
        dm_mod mapper/control c10:235
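      For illustration, a hypothetical C parser for lines in the format shown
      above ("<module> <devname> <type><major>:<minor>", '#' starts a comment).
      The struct and function names are invented; depmod and udev have their
      own code for this.

```c
#include <stdio.h>
#include <string.h>

/* One entry of modules.devname (hypothetical representation). */
struct demo_devname_entry {
    char module[64];
    char devname[128];
    char type;            /* 'c' = character device, 'b' = block device */
    unsigned int major;
    unsigned int minor;
};

/* Returns 1 for a parsed entry, 0 for a comment or blank line,
 * -1 for a malformed line. */
static int demo_parse_devname_line(const char *line,
                                   struct demo_devname_entry *e)
{
    line += strspn(line, " \t");          /* skip leading blanks */
    if (*line == '#' || *line == '\n' || *line == '\0')
        return 0;
    /* e.g. "fuse fuse c10:229" -> "fuse", "fuse", 'c', 10, 229 */
    if (sscanf(line, "%63s %127s %c%u:%u", e->module, e->devname,
               &e->type, &e->major, &e->minor) != 5)
        return -1;
    return (e->type == 'c' || e->type == 'b') ? 1 : -1;
}
```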
      
      Udev will pick up the depmod created file on startup and create all the
      static device nodes which the kernel modules specify, so that these modules
      get automatically loaded when the device node is accessed:
        $ /sbin/udevd --debug
        ...
        static_dev_create_from_modules: mknod '/dev/cpu/microcode' c10:184
        static_dev_create_from_modules: mknod '/dev/fuse' c10:229
        static_dev_create_from_modules: mknod '/dev/ppp' c108:0
        static_dev_create_from_modules: mknod '/dev/net/tun' c10:200
        static_dev_create_from_modules: mknod '/dev/mapper/control' c10:235
        udev_rules_apply_static_dev_perms: chmod '/dev/net/tun' 0666
        udev_rules_apply_static_dev_perms: chmod '/dev/fuse' 0666
      
      A few device nodes are switched to statically allocated numbers, to allow
      the static nodes to work. This might also be useful for systems which still run
      a plain static /dev, which is completely unsafe to use with any dynamic minor
      numbers.
      
      Note:
      The devname aliases must be limited to the *common* and *single*instance*
      device nodes, like the misc devices, and never be used for conceptually limited
      systems like the loop devices, which should rather get fixed properly and get a
      control node for losetup to talk to, instead of creating a random number of
      device nodes in advance, regardless of whether they are ever used.
      
      This facility is meant to hide the mess distros are creating with overly
      modularized kernels, and just to hide that these modules are not
      compiled-in, and not to paper over broken concepts. Thanks! :)
      
      Cc: Greg Kroah-Hartman <gregkh@suse.de>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Chris Mason <chris.mason@oracle.com>
      Cc: Alasdair G Kergon <agk@redhat.com>
      Cc: Tigran Aivazian <tigran@aivazian.fsnet.co.uk>
      Cc: Ian Kent <raven@themaw.net>
      Signed-off-by: Kay Sievers <kay.sievers@vrfy.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
  4. 25 May, 2010: 6 commits
    • fuse: support splice() reading from fuse device · c3021629
      Miklos Szeredi committed
      Allow userspace filesystem implementations to use splice() to read from
      the fuse device.
      
      The userspace filesystem can now transfer data coming from a WRITE
      request to an arbitrary file descriptor (regular file, block device or
      socket) without having to go through a userspace buffer.
      
      The semantics of using splice() to read messages are:
      
       1)  with a single splice() call move the whole message from the fuse
           device to a temporary pipe
       2)  read the header from the pipe and determine the message type
       3a) if message is a WRITE then splice data from pipe to destination
       3b) else read rest of message to userspace buffer
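      Steps 2 and 3 can be sketched in userspace C. This is a hypothetical
      demo: struct demo_hdr and DEMO_WRITE stand in for the real FUSE header
      and opcode, and step 1 (the single splice() from the device) is assumed
      to have already put the whole message into the pipe read end `pfd`.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical header; the real one lives in the FUSE protocol header. */
struct demo_hdr {
    uint32_t len;     /* total message length, header included */
    uint32_t opcode;
};

enum { DEMO_WRITE = 1 };   /* hypothetical opcode value */

/* Returns the number of payload bytes handled, or -1 on error. */
static ssize_t demo_dispatch(int pfd, int dest_fd, char *buf, size_t bufsize)
{
    struct demo_hdr h;
    size_t rest;

    /* 2) read the header from the pipe and determine the message type */
    if (read(pfd, &h, sizeof(h)) != (ssize_t)sizeof(h) || h.len < sizeof(h))
        return -1;
    rest = h.len - sizeof(h);

    /* 3a) WRITE: move the payload from the pipe to the destination fd
     *     without passing it through a userspace buffer */
    if (h.opcode == DEMO_WRITE)
        return splice(pfd, NULL, dest_fd, NULL, rest, 0);

    /* 3b) anything else: read the rest of the message into buf */
    if (rest > bufsize)
        return -1;
    return read(pfd, buf, rest);
}
```

      The destination in 3a can be a regular file, block device, socket or
      another pipe, since splice() only requires one end to be a pipe.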
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: allow splice to move pages · ce534fb0
      Miklos Szeredi committed
      When splicing buffers to the fuse device with SPLICE_F_MOVE, try to
      move pages from the pipe buffer into the page cache.  This allows
      populating the fuse filesystem's cache without ever touching the page
      contents, i.e. zero copy read capability.
      
      The following steps are performed when trying to move a page into the
      page cache:
      
       - buf->ops->confirm() to make sure the new page is uptodate
       - buf->ops->steal() to try to remove the new page from its previous place
       - remove_from_page_cache() on the old page
       - add_to_page_cache_locked() on the new page
      
      If any of the above steps fail (non-fatally) then the code falls back
      to copying the page.  In particular ->steal() will fail if there are
      external references (other than the page cache and the pipe buffer) to
      the page.
      
      Also since the remove_from_page_cache() + add_to_page_cache_locked()
      are non-atomic it is possible that the page cache is repopulated in
      between the two and add_to_page_cache_locked() will fail.  This could
      be fixed by creating a new atomic replace_page_cache_page() function.
      
      fuse_readpages_end() needed to be reworked so it works even if
      page->mapping is NULL for some or all pages, which can happen if the
      add_to_page_cache_locked() failed.
      
      A number of sanity checks were added to make sure the stolen pages
      don't have weird flags set, etc...  These could be moved into generic
      splice/steal code.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: support splice() writing to fuse device · dd3bb14f
      Miklos Szeredi committed
      Allow userspace filesystem implementations to use splice() to write to
      the fuse device.  The semantics of using splice() are:
      
       1) buffer the message header and data in a temporary pipe
       2) with a *single* splice() call move the message from the temporary pipe
          to the fuse device
      
      The READ reply message has the most interesting use for this, since
      now the data from an arbitrary file descriptor (which could be a
      regular file, a block device or a socket) can be transferred into the
      fuse device without having to go through a userspace buffer.  It will
      also allow zero copy moving of pages.
      
      One caveat is that the protocol on the fuse device requires the length
      of the whole message to be written into the header.  But the length of
      the data transferred into the temporary pipe may not be known in
      advance.  The current library implementation works around this by
      using vmsplice to write the header and modifying the header after
      splicing the data into the pipe (error handling omitted):
      
      	struct fuse_out_header out;
      	struct iovec iov;
      
      	/* queue a reference to the (not yet final) header in the pipe */
      	iov.iov_base = &out;
      	iov.iov_len = sizeof(struct fuse_out_header);
      	vmsplice(pip[1], &iov, 1, 0);
      	len = splice(input_fd, input_offset, pip[1], NULL, len, 0);
      	/* retrospectively modify the header: */
      	out.len = len + sizeof(struct fuse_out_header);
      	splice(pip[0], NULL, fuse_chan_fd(req->ch), NULL, out.len, flags);
      
      This works since vmsplice only saves a pointer to the data, it does
      not copy the data itself.
      
      Since pipes are currently limited to 16 pages and messages need to be
      spliced atomically, the length of the data is limited to 15 pages (or
      60kB for 4k pages).
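      The arithmetic behind the limit, as a small sketch. It assumes, which
      the text does not state explicitly, that one pipe buffer slot is taken
      by the vmsplice()d header and each remaining slot carries one page of
      data. The helper name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: largest payload that fits in a pipe of
 * `pipe_slots` buffers when one slot is used by the header. */
static size_t demo_max_splice_payload(size_t pipe_slots, size_t page_size)
{
    return pipe_slots > 0 ? (pipe_slots - 1) * page_size : 0;
}
```

      With 16 slots and 4096-byte pages this gives 15 * 4096 = 61440 bytes,
      i.e. the 60kB quoted above.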
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: get page reference for readpages · b5dd3285
      Miklos Szeredi committed
      Acquire a page ref on pages in ->readpages() and release them when the
      read has finished.  Not acquiring a reference didn't seem to cause any
      trouble since the page is locked and will not be kicked out of the
      page cache during the read.
      
      However, the following patches will want to remove the page from the
      cache, so a separate ref is needed.  Making the reference in req->pages
      explicit also makes the code easier to understand.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: use get_user_pages_fast() · 1bf94ca7
      Miklos Szeredi committed
      Replace uses of get_user_pages() with get_user_pages_fast().  It looks
      nicer and should be faster in most cases.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: remove unneeded variable · 4aa0edd2
      Dan Carpenter committed
      "map" isn't needed any more after commit 0bd87182 ("fuse: fix kunmap in
      fuse_ioctl_copy_user").
      Signed-off-by: Dan Carpenter <error27@gmail.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  5. 30 Mar, 2010: 1 commit
    • include cleanup: Update gfp.h and slab.h includes to prepare for breaking... · 5a0e3ad6
      Tejun Heo committed
      include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h
      
      percpu.h is included by sched.h and module.h and thus ends up being
      included when building most .c files.  percpu.h includes slab.h which
      in turn includes gfp.h making everything defined by the two files
      universally available and complicating inclusion dependencies.
      
      percpu.h -> slab.h dependency is about to be removed.  Prepare for
      this change by updating users of gfp and slab facilities to include those
      headers directly instead of assuming availability.  As this conversion
      needs to touch a large number of source files, the following script is
      used as the basis of conversion.
      
        http://userweb.kernel.org/~tj/misc/slabh-sweep.py
      
      The script does the following.
      
      * Scan files for gfp and slab usages and update includes such that
        only the necessary includes are there, i.e. if only gfp is used,
        gfp.h; if slab is used, slab.h.
      
      * When the script inserts a new include, it looks at the include
        blocks and tries to put the new include such that its order conforms
        to its surrounding.  It's put in the include block which contains
        core kernel includes, in the same order that the rest are ordered -
        alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
        doesn't seem to be any matching order.
      
      * If the script can't find a place to put a new include (mostly
        because the file doesn't have fitting include block), it prints out
        an error message indicating which .h file needs to be added to the
        file.
      
      The conversion was done in the following steps.
      
      1. The initial automatic conversion of all .c files updated slightly
         over 4000 files, deleting around 700 includes and adding ~480 gfp.h
         and ~3000 slab.h inclusions.  The script emitted errors for ~400
         files.
      
      2. Each error was manually checked.  Some didn't need the inclusion,
         some needed manual addition while adding it to implementation .h or
         embedding .c file was more appropriate for others.  This step added
         inclusions to around 150 files.
      
      3. The script was run again and the output was compared to the edits
         from #2 to make sure no file was left behind.
      
      4. Several build tests were done and a couple of problems were fixed.
         e.g. lib/decompress_*.c used malloc/free() wrappers around slab
         APIs requiring slab.h to be added manually.
      
      5. The script was run on all .h files but without automatically
         editing them as sprinkling gfp.h and slab.h inclusions around .h
         files could easily lead to inclusion dependency hell.  Most gfp.h
         inclusion directives were ignored as stuff from gfp.h was usually
         widely available and often used in preprocessor macros.  Each
         slab.h inclusion directive was examined and added manually as
         necessary.
      
      6. percpu.h was updated not to include slab.h.
      
      7. Build tests were done on the following configurations and failures
         were fixed.  CONFIG_GCOV_KERNEL was turned off for all tests (as my
         distributed build env didn't work with gcov compiles) and a few
         more options had to be turned off depending on archs to make things
         build (like ipr on powerpc/64 which failed due to missing writeq).
      
         * x86 and x86_64 UP and SMP allmodconfig and a custom test config.
         * powerpc and powerpc64 SMP allmodconfig
         * sparc and sparc64 SMP allmodconfig
         * ia64 SMP allmodconfig
         * s390 SMP allmodconfig
         * alpha SMP allmodconfig
         * um on x86_64 SMP allmodconfig
      
      8. percpu.h modifications were reverted so that it could be applied as
         a separate patch and serve as bisection point.
      
      Given the fact that I had only a couple of failures from tests on step
      6, I'm fairly confident about the coverage of this conversion patch.
      If there is a breakage, it's likely to be something in one of the arch
      headers which should be easily discoverable on most builds of
      the specific arch.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
  6. 09 Feb, 2010: 1 commit
  7. 05 Feb, 2010: 2 commits
  8. 03 Feb, 2010: 1 commit
    • mm: flush dcache before writing into page to avoid alias · 931e80e4
      anfei zhou committed
      The cache alias problem happens if changes made through a user shared
      mapping are not flushed before copying: the user and kernel mappings may
      then map to two different cache lines, and it is impossible to guarantee
      coherence after iov_iter_copy_from_user_atomic().  So the right steps should be:
      
      	flush_dcache_page(page);
      	kmap_atomic(page);
      	write to page;
      	kunmap_atomic(page);
      	flush_dcache_page(page);
      
      More precisely, we might create two new APIs flush_dcache_user_page and
      flush_dcache_kern_page to replace the two flush_dcache_page accordingly.
      
      Here is a snippet tested on omap2430 with VIPT cache, and I think it is
      not ARM-specific:
      
      	/* assumes "abc" already exists and is non-empty */
      	int val = 0x11111111;
      	int fd = open("abc", O_RDWR);
      	int *addr = mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
      	int tmp;
      
      	*(addr+0) = 0x44444444;
      	tmp = *(addr+0);
      	*(addr+1) = 0x77777777;
      	write(fd, &val, sizeof(int));
      	close(fd);
      
      The results are not always 0x11111111 0x77777777 at the beginning as expected.  Sometimes we see 0x44444444 0x77777777.
      Signed-off-by: Anfei <anfei.zhou@gmail.com>
      Cc: Russell King <rmk@arm.linux.org.uk>
      Cc: Miklos Szeredi <miklos@szeredi.hu>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Cc: <linux-arch@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. 27 Nov, 2009: 1 commit
    • fuse: reject O_DIRECT flag also in fuse_create · 1b732396
      Csaba Henk committed
      The comment in fuse_open about O_DIRECT:
      
        "VFS checks this, but only _after_ ->open()"
      
      also holds for fuse_create; however, the same kind of check was missing there.
      
      As an impact of this bug, open(newfile, O_RDWR|O_CREAT|O_DIRECT) fails, but a
      stub newfile will remain if the fuse server handled the implied FUSE_CREATE
      request appropriately.
      
      Other impact: in the above situation ima_file_free() will complain about an
      open/free imbalance if CONFIG_IMA is set.
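      A minimal sketch of the idea behind the fix: reject O_DIRECT before any
      request reaches the server, so a failing open(2) cannot leave a stub
      file behind. The function name is hypothetical and -EINVAL is assumed
      as the error, matching what the later VFS check would report; this is
      not the actual fuse_create code.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>

/* Hypothetical early check in a create/open path: runs before the
 * CREATE request is sent, so nothing is created on failure. */
static int demo_check_create_flags(int flags)
{
    if (flags & O_DIRECT)
        return -EINVAL;
    return 0;
}
```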
      Signed-off-by: Csaba Henk <csaba@gluster.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      Cc: Harshavardhana <harsha@gluster.com>
      Cc: stable@kernel.org
  10. 04 Nov, 2009: 3 commits
  11. 28 Sep, 2009: 1 commit
  12. 24 Sep, 2009: 1 commit
  13. 16 Sep, 2009: 4 commits
  14. 11 Sep, 2009: 1 commit
  15. 12 Jul, 2009: 1 commit
  16. 11 Jul, 2009: 2 commits
  17. 07 Jul, 2009: 1 commit
  18. 01 Jul, 2009: 4 commits
    • fuse: invalidation reverse calls · 3b463ae0
      John Muir committed
      Add notification messages that allow the filesystem to invalidate VFS
      caches.
      
      Two notifications are added:
      
       1) inode invalidation
      
         - invalidate cached attributes
         - invalidate a range of pages in the page cache (this is optional)
      
       2) dentry invalidation
      
         - try to invalidate a subtree in the dentry cache
      
      Care must be taken while accessing the 'struct super_block' for the
      mount, as it can go away while an invalidation is in progress.  To
      prevent this, introduce an rw-semaphore that is taken for read during
      the invalidation and taken for write in the ->kill_sb callback.
      
      Cc: Csaba Henk <csaba@gluster.com>
      Cc: Anand Avati <avati@zresearch.com>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: allow umask processing in userspace · e0a43ddc
      Miklos Szeredi committed
      This patch lets filesystems handle masking the file mode on creation.
      This is needed if the filesystem is using ACLs.
      
       - The CREATE, MKDIR and MKNOD requests are extended with a "umask"
         parameter.
      
       - A new FUSE_DONT_MASK flag is added to the INIT request/reply.  With
         this the filesystem may request that the create mode is not masked.
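      A hypothetical model of the two behaviours: without FUSE_DONT_MASK the
      create mode is masked before it is sent; with it, the mode goes out
      unmasked and the filesystem, which also receives the "umask"
      parameter, can apply its own (e.g. ACL-aware) masking. The function
      name is invented for illustration.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Mode that would be sent in CREATE/MKDIR/MKNOD under this model. */
static mode_t demo_create_mode(mode_t mode, mode_t umask_bits, bool dont_mask)
{
    return dont_mask ? mode : (mode & ~umask_bits);
}
```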
      
      CC: Jean-Pierre André <jean-pierre.andre@wanadoo.fr>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
    • fuse: fix bad return value in fuse_file_poll() · 201fa69a
      Miklos Szeredi committed
      Fix fuse_file_poll() which returned a -errno value instead of a poll
      mask.
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
    • fuse: fix return value of fuse_dev_write() · b4c458b3
      Csaba Henk committed
      On 64-bit systems -- where sizeof(ssize_t) > sizeof(int) -- the following test
      exposes a bug due to a careless return of an int or unsigned value:
      
      implement a FUSE filesystem which sends an unsolicited notification to
      the kernel with an invalid opcode. The respective write to /dev/fuse
      will return (1 << 32) - EINVAL with errno == 0 instead of -1 with
      errno == EINVAL.
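      A minimal illustration of the failure mode on an LP64 system: an error
      code that passes through an unsigned int is zero-extended when widened
      to ssize_t, so the caller sees a huge positive byte count. The function
      names are hypothetical; the kernel's actual code path is not shown.

```c
#include <errno.h>
#include <sys/types.h>

/* Bug pattern: the unsigned value is zero-extended on widening. */
static ssize_t demo_return_buggy(void)
{
    unsigned int ret = (unsigned int)-EINVAL;
    return ret;            /* (1 << 32) - EINVAL on LP64 */
}

/* Fix pattern: keep the value signed so it is sign-extended. */
static ssize_t demo_return_fixed(void)
{
    int ret = -EINVAL;
    return ret;            /* -EINVAL */
}
```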
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
      CC: stable@kernel.org
  19. 17 Jun, 2009: 1 commit
  20. 09 Jun, 2009: 1 commit
    • CUSE: implement CUSE - Character device in Userspace · 151060ac
      Tejun Heo committed
      CUSE enables implementing character devices in userspace.  With recent
      additions of ioctl and poll support, FUSE already has most of what's
      necessary to implement character devices.  All CUSE has to do is
      bond all those components (FUSE, chardev and the driver model)
      together nicely.
      
      When a client opens /dev/cuse, the kernel starts the conversation with
      CUSE_INIT.  The client tells CUSE which device it wants to create.  As
      the previous patch made fuse_file usable without an associated
      fuse_inode, CUSE doesn't create a super block or inodes.  It attaches
      fuse_file to cdev file->private_data during open and sets ff->fi to
      NULL.  The rest of the operation is almost identical to the FUSE direct
      IO case.
      
      Each CUSE device has a corresponding directory /sys/class/cuse/DEVNAME
      (which is a symlink to /sys/devices/virtual/class/DEVNAME if
      SYSFS_DEPRECATED is turned off) which hosts "waiting" and "abort"
      among other things.  Those two files have the same meaning as the FUSE
      control files.
      
      The only notable missing feature compared to an in-kernel implementation
      is mmap support.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
  21. 09 May, 2009: 1 commit
  22. 28 Apr, 2009: 3 commits