1. 28 11月, 2013 3 次提交
    • T
      sysfs, kernfs: add skeletons for kernfs · b8441ed2
      Tejun Heo 提交于
      Core sysfs implementation will be separated into kernfs so that it can
      be used by other non-kobject users.
      
      This patch creates fs/kernfs/ directory and makes boilerplate changes.
      kernfs interface will be directly based on sysfs_dirent and its
      forward declaration is moved to include/linux/kernfs.h which is
      included from include/linux/sysfs.h.  sysfs core implementation will
      be gradually separated out and moved to kernfs.
      
      This patch doesn't introduce any functional changes.
      
      v2: mount.c added.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: Christoph Hellwig <hch@infradead.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b8441ed2
    • T
      sysfs: make __sysfs_add_one() fail if the parent isn't a directory · ae2108ad
      Tejun Heo 提交于
      Currently the kobject based interface guarantees that a parent
      sysfs_dirent is always a directory; however, the planned kernfs
      interface will be directly based on sysfs_dirents and the caller may
      specify non-directory node as the parent.  Add an explicit check in
      __sysfs_add_one() so that such attempts fail with -EINVAL.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae2108ad
    • T
      sysfs: drop kobj_ns_type handling, take #2 · c84a3b27
      Tejun Heo 提交于
      The way namespace tags are implemented in sysfs is more complicated
      than necessary.  As each tag is a pointer value and required to be
      non-NULL under a namespace enabled parent, there's no need to record
      separately what type each tag is.  If multiple namespace types are
      needed, which currently aren't, we can simply compare the tag to a set
      of allowed tags in the superblock assuming that the tags, being
      pointers, won't have the same value across multiple types.
      
      This patch rips out kobj_ns_type handling from sysfs.  sysfs now has
      an enable switch to turn on namespace under a node.  If enabled, all
      children are required to have non-NULL namespace tags and filtered
      against the super_block's tag.
      
      kobject namespace determination is now performed in
      lib/kobject.c::create_dir() making sysfs_read_ns_type() unnecessary.
      The sanity checks are also moved.  create_dir() is restructured to
      ease such addition.  This removes most kobject namespace knowledge
      from sysfs proper which will enable proper separation and layering of
      sysfs.
      
      This is the second try.  The first one was cb26a311 ("sysfs: drop
      kobj_ns_type handling") which tried to automatically enable namespace
      if there are children with non-NULL namespace tags; however, it was
      broken for symlinks as they should inherit the target's tag iff
      namespace is enabled in the parent.  This led to namespace filtering
      enabled incorrectly for wireless net class devices through phy80211
      symlinks and thus network configuration failure.  a1212d27
      ("Revert "sysfs: drop kobj_ns_type handling"") reverted the commit.
      
      This shouldn't introduce any behavior changes, for real.
      
      v2: Dummy implementation of sysfs_enable_ns() for !CONFIG_SYSFS was
          missing and caused build failure.  Reported by kbuild test robot.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NLinus Torvalds <torvalds@linux-foundation.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Kay Sievers <kay@vrfy.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: kbuild test robot <fengguang.wu@intel.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c84a3b27
  2. 22 11月, 2013 2 次提交
    • J
      configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi 提交于
      A race window in configfs, it starts from one dentry is UNHASHED and end
      before configfs_d_iput is called.  In this window, if a lookup happen,
      since the original dentry was UNHASHED, so a new dentry will be
      allocated, and then in configfs_attach_attr(), sd->s_dentry will be
      updated to the new dentry.  Then in configfs_d_iput(),
      BUG_ON(sd->s_dentry != dentry) will be triggered and system panic.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->dentry still point
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix it, change configfs_d_iput to not update sd->s_dentry if
      sd->s_count > 2, that means there are another dentry is using the sd
      beside the one that is going to be put.  Use configfs_dirent_lock in
      configfs_attach_attr to sync with configfs_d_iput.
      
      With the following steps, you can reproduce the bug.
      
      1. enable ocfs2, this will mount configfs at /sys/kernel/config and
         fill configure in it.
      
      2. run the following script.
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      Signed-off-by: NJunxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      76ae281f
    • S
      GFS2: Fix ref count bug relating to atomic_open · ea0341e0
      Steven Whitehouse 提交于
      In the case that atomic_open calls finish_no_open() with
      the dentry that was supplied to gfs2_atomic_open() an
      extra reference count is required. This patch fixes that
      issue preventing a bug trap triggering at umount time.
      Signed-off-by: NSteven Whitehouse <swhiteho@redhat.com>
      ea0341e0
  3. 21 11月, 2013 17 次提交
  4. 20 11月, 2013 14 次提交
    • P
      Squashfs: Check stream is not NULL in decompressor_multi.c · ed4f381e
      Phillip Lougher 提交于
      Fix static checker complaint that stream is not checked in
      squashfs_decompressor_destroy().
      Reported-by: NDan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      ed4f381e
    • P
      Squashfs: Directly decompress into the page cache for file data · 0d455c12
      Phillip Lougher 提交于
      This introduces an implementation of squashfs_readpage_block()
      that directly decompresses into the page cache.
      
      This uses the previously added page handler abstraction to push
      down the necessary kmap_atomic/kunmap_atomic operations on the
      page cache buffers into the decompressors.  This enables
      direct copying into the page cache without using the slow
      kmap/kunmap calls.
      
      The code detects when multiple threads are racing in
      squashfs_readpage() to decompress the same block, and avoids
      this regression by falling back to using an intermediate
      buffer.
      
      This patch enhances the performance of Squashfs significantly
      when multiple processes are accessing the filesystem simultaneously
      because it not only reduces memcopying, but it more importantly
      eliminates the lock contention on the intermediate buffer.
      
      Using single-thread decompression.
      
              dd if=file1 of=/dev/null bs=4096 &
              dd if=file2 of=/dev/null bs=4096 &
              dd if=file3 of=/dev/null bs=4096 &
              dd if=file4 of=/dev/null bs=4096
      
      Before:
      
      629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
      
      After:
      
      629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      0d455c12
    • P
      Squashfs: Restructure squashfs_readpage() · 5f55dbc0
      Phillip Lougher 提交于
      Restructure squashfs_readpage() splitting it into separate
      functions for datablocks, fragments and sparse blocks.
      
      Move the memcpying (from squashfs cache entry) implementation of
      squashfs_readpage_block into file_cache.c
      
      This allows different implementations to be supported.
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      5f55dbc0
    • P
      Squashfs: Generalise paging handling in the decompressors · 846b730e
      Phillip Lougher 提交于
      Further generalise the decompressors by adding a page handler
      abstraction.  This adds helpers to allow the decompressors
      to access and process the output buffers in an implementation
      independant manner.
      
      This allows different types of output buffer to be passed
      to the decompressors, with the implementation specific
      aspects handled at decompression time, but without the
      knowledge being held in the decompressor wrapper code.
      
      This will allow the decompressors to handle Squashfs
      cache buffers, and page cache pages.
      
      This patch adds the abstraction and an implementation for
      the caches.
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      846b730e
    • P
      Squashfs: add multi-threaded decompression using percpu variable · d208383d
      Phillip Lougher 提交于
      Add a multi-threaded decompression implementation which uses
      percpu variables.
      
      Using percpu variables has advantages and disadvantages over
      implementations which do not use percpu variables.
      
      Advantages:
        * the nature of percpu variables ensures decompression is
          load-balanced across the multiple cores.
        * simplicity.
      
      Disadvantages: it limits decompression to one thread per core.
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      d208383d
    • M
      squashfs: Enhance parallel I/O · cd59c2ec
      Minchan Kim 提交于
      Now squashfs have used for only one stream buffer for decompression
      so it hurts parallel read performance so this patch supports
      multiple decompressor to enhance performance parallel I/O.
      
      Four 1G file dd read on KVM machine which has 2 CPU and 4G memory.
      
      dd if=test/test1.dat of=/dev/null &
      dd if=test/test2.dat of=/dev/null &
      dd if=test/test3.dat of=/dev/null &
      dd if=test/test4.dat of=/dev/null &
      
      old : 1m39s -> new : 9s
      
      * From v1
        * Change comp_strm with decomp_strm - Phillip
        * Change/add comments - Phillip
      Signed-off-by: NMinchan Kim <minchan@kernel.org>
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      cd59c2ec
    • P
      Squashfs: Refactor decompressor interface and code · 9508c6b9
      Phillip Lougher 提交于
      The decompressor interface and code was written from
      the point of view of single-threaded operation.  In doing
      so it mixed a lot of single-threaded implementation specific
      aspects into the decompressor code and elsewhere which makes it
      difficult to seamlessly support multiple different decompressor
      implementations.
      
      This patch does the following:
      
      1.  It removes compressor_options parsing from the decompressor
          init() function.  This allows the decompressor init() function
          to be dynamically called to instantiate multiple decompressors,
          without the compressor options needing to be read and parsed each
          time.
      
      2.  It moves threading and all sleeping operations out of the
          decompressors.  In doing so, it makes the decompressors
          non-blocking wrappers which only deal with interfacing with
          the decompressor implementation.
      
      3. It splits decompressor.[ch] into decompressor generic functions
         in decompressor.[ch], and moves the single threaded
         decompressor implementation into decompressor_single.c.
      
      The result of this patch is Squashfs should now be able to
      support multiple decompressors by adding new decompressor_xxx.c
      files with specialised implementations of the functions in
      decompressor_single.c
      Signed-off-by: NPhillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: NMinchan Kim <minchan@kernel.org>
      9508c6b9
    • J
      nfsd4: fix xdr decoding of large non-write compounds · 365da4ad
      J. Bruce Fields 提交于
      This fixes a regression from 24750082
      "nfsd4: fix decoding of compounds across page boundaries".  The previous
      code was correct: argp->pagelist is initialized in
      nfs4svc_deocde_compoundargs to rqstp->rq_arg.pages, and is therefore a
      pointer to the page *after* the page we are currently decoding.
      
      The reason that patch nevertheless fixed a problem with decoding
      compounds containing write was a bug in the write decoding introduced by
      5a80a54d "nfsd4: reorganize write
      decoding", after which write decoding no longer adhered to the rule that
      argp->pagelist point to the next page.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: NJ. Bruce Fields <bfields@redhat.com>
      365da4ad
    • S
      aio: nullify aio->ring_pages after freeing it · ddb8c45b
      Sasha Levin 提交于
      After freeing ring_pages we leave it as is causing a dangling pointer. This
      has already caused an issue so to help catching any issues in the future
      NULL it out.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      ddb8c45b
    • S
      aio: prevent double free in ioctx_alloc · d5580232
      Sasha Levin 提交于
      ioctx_alloc() calls aio_setup_ring() to allocate a ring. If aio_setup_ring()
      fails to do so it would call aio_free_ring() before returning, but
      ioctx_alloc() would call aio_free_ring() again causing a double free of
      the ring.
      
      This is easily reproducible from userspace.
      Signed-off-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NBenjamin LaHaise <bcrl@kvack.org>
      d5580232
    • J
      genetlink: make multicast groups const, prevent abuse · 2a94fe48
      Johannes Berg 提交于
      Register generic netlink multicast groups as an array with
      the family and give them contiguous group IDs. Then instead
      of passing the global group ID to the various functions that
      send messages, pass the ID relative to the family - for most
      families that's just 0 because the only have one group.
      
      This avoids the list_head and ID in each group, adding a new
      field for the mcast group ID offset to the family.
      
      At the same time, this allows us to prevent abusing groups
      again like the quota and dropmon code did, since we can now
      check that a family only uses a group it owns.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2a94fe48
    • J
      genetlink: pass family to functions using groups · 68eb5503
      Johannes Berg 提交于
      This doesn't really change anything, but prepares for the
      next patch that will change the APIs to pass the group ID
      within the family, rather than the global group ID.
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      68eb5503
    • J
      quota/genetlink: use proper genetlink multicast APIs · 2ecf7536
      Johannes Berg 提交于
      The quota code is abusing the genetlink API and is using
      its family ID as the multicast group ID, which is invalid
      and may belong to somebody else (and likely will.)
      
      Make the quota code use the correct API, but since this
      is already used as-is by userspace, reserve a family ID
      for this code and also reserve that group ID to not break
      userspace assumptions.
      Acked-by: NJan Kara <jack@suse.cz>
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      2ecf7536
    • J
      genetlink: only pass array to genl_register_family_with_ops() · c53ed742
      Johannes Berg 提交于
      As suggested by David Miller, make genl_register_family_with_ops()
      a macro and pass only the array, evaluating ARRAY_SIZE() in the
      macro, this is a little safer.
      
      The openvswitch has some indirection, assing ops/n_ops directly in
      that code. This might ultimately just assign the pointers in the
      family initializations, saving the struct genl_family_and_ops and
      code (once mcast groups are handled differently.)
      Signed-off-by: NJohannes Berg <johannes.berg@intel.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      c53ed742
  5. 19 11月, 2013 4 次提交