  1. 29 Nov 2013 (1 commit)
  2. 28 Nov 2013 (2 commits)
  3. 25 Nov 2013 (1 commit)
    • [CIFS] Do not use btrfs refcopy ioctl for SMB2 copy offload · f19e84df
      Steve French committed
      Change cifs.ko to use CIFS_IOCTL_COPYCHUNK instead
      of BTRFS_IOC_CLONE to avoid confusion about whether
      copy-on-write is required or optional for this operation.
      
      SMB2/SMB3 copy offload had used the BTRFS_IOC_CLONE ioctl since
      both mechanisms speed up copying by offloading it rather than
      passing many read and write requests back and forth, and both
      have identical syntax (passing file handles).  For SMB2/SMB3
      CopyChunk, however, the server is not required to use
      copy-on-write to make a copy of the file (although some do), and
      Christoph has commented that since CopyChunk does not require
      copy-on-write we should not reuse BTRFS_IOC_CLONE.
      
      This patch renames the ioctl to a cifs-specific one,
      CIFS_IOCTL_COPYCHUNK.  The ioctl is particularly important
      for SMB2/SMB3, since copying a large file over the network is
      otherwise very slow; with copy offload it is often more than 100
      times faster and puts less load on both server and client.
      
      Note that if a copy syscall is ever introduced, depending on
      its requirements/format it could end up using one of the other
      three copy-offload methods that CIFS/SMB2/SMB3 support, but this
      method is particularly useful for file copy and is broadly
      supported (not just by the Samba server).
      Signed-off-by: Steve French <smfrench@gmail.com>
      Reviewed-by: Jeff Layton <jlayton@redhat.com>
      Reviewed-by: David Disseldorp <ddiss@samba.org>
      f19e84df
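
      A minimal userspace sketch of invoking the server-side copy
      described above.  The request value used for CIFS_IOCTL_COPYCHUNK
      below is a placeholder (the real definition lives in the cifs
      ioctl header this patch touches), and the calling convention is
      assumed to mirror BTRFS_IOC_CLONE as the message describes: the
      source file descriptor is the argument of an ioctl on the
      destination file descriptor.  Both paths are assumed to be on the
      same SMB2/SMB3 mount.

        /*
         * Hedged sketch: offloaded copy between two files on the same SMB3
         * mount.  CIFS_IOCTL_COPYCHUNK below is a placeholder definition;
         * the real value comes from the kernel's cifs ioctl header.
         */
        #include <fcntl.h>
        #include <stdio.h>
        #include <sys/ioctl.h>
        #include <linux/ioctl.h>
        #include <unistd.h>

        #ifndef CIFS_IOCTL_COPYCHUNK
        #define CIFS_IOCTL_COPYCHUNK _IOW(0xCF, 3, int)   /* placeholder value */
        #endif

        int main(int argc, char **argv)
        {
            int src, dst;

            if (argc != 3) {
                fprintf(stderr, "usage: %s <src> <dst>\n", argv[0]);
                return 1;
            }
            src = open(argv[1], O_RDONLY);
            dst = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
            if (src < 0 || dst < 0) {
                perror("open");
                return 1;
            }
            /* Ask the server to copy src into dst; data never crosses the wire. */
            if (ioctl(dst, CIFS_IOCTL_COPYCHUNK, src) < 0) {
                perror("CIFS_IOCTL_COPYCHUNK");
                return 1;
            }
            close(src);
            close(dst);
            return 0;
        }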
  4. 24 Nov 2013 (8 commits)
    • ceph: allocate non-zero page to fscache in readpage() · ff638b7d
      Li Wang committed
      ceph_osdc_readpages() returns the number of bytes read.
      Currently the code only allocates a full-zero page into fscache;
      this patch fixes that.
      Signed-off-by: Li Wang <liwang@ubuntukylin.com>
      Reviewed-by: Milosz Tanski <milosz@adfin.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      ff638b7d
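
      Not the patch itself, but a hedged userspace model of the
      readpage-to-cache pattern the fix is concerned with: a short read
      has its unread tail zero-filled before the now fully defined page
      is handed on to a cache hook.  All names below are illustrative.

        /* Hedged model: read up to one page, zero-fill the unread tail of a
         * short read, and only then pass the page to the cache layer. */
        #include <stdio.h>
        #include <string.h>
        #include <unistd.h>

        #define PAGE_SZ 4096

        static void cache_store(const char *page)   /* stand-in for the fscache hook */
        {
            printf("caching page, first byte = %d\n", page[0]);
        }

        static int read_page(int fd, char *page)
        {
            ssize_t n = read(fd, page, PAGE_SZ);

            if (n < 0)
                return -1;                           /* error: cache nothing */
            if (n < PAGE_SZ)
                memset(page + n, 0, PAGE_SZ - n);    /* zero-fill the remainder */
            cache_store(page);                       /* contents are now well defined */
            return 0;
        }

        int main(void)
        {
            char page[PAGE_SZ];

            return read_page(0, page) ? 1 : 0;       /* read one page from stdin */
        }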
    • ceph: wake up 'safe' waiters when unregistering request · fc55d2c9
      Yan, Zheng committed
      We also need to wake up 'safe' waiters if an error occurs or the
      request is aborted.  Otherwise sync(2)/fsync(2) may hang forever.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      fc55d2c9
    • ceph: cleanup aborted requests when re-sending requests. · eb1b8af3
      Yan, Zheng committed
      Aborted requests usually get cleared when the reply is received.
      If the MDS crashes, no reply will be received, so we need to
      clean up aborted requests when re-sending requests.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Greg Farnum <greg@inktank.com>
      Signed-off-by: Sage Weil <sage@inktank.com>
      eb1b8af3
    • ceph: handle race between cap reconnect and cap release · 99a9c273
      Yan, Zheng committed
      When a cap gets released while the cap reconnect message is being
      composed, we should skip queueing the release message if the cap
      hasn't been added to the cap reconnect message.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      99a9c273
    • ceph: set caps count after composing cap reconnect message · 44c99757
      Yan, Zheng committed
      It's possible that some caps get released while composing the cap
      reconnect message.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      44c99757
    • ceph: queue cap release in __ceph_remove_cap() · a096b09a
      Yan, Zheng committed
      Call __queue_cap_release() in __ceph_remove_cap(); this avoids
      acquiring s_cap_lock twice.
      Signed-off-by: Yan, Zheng <zheng.z.yan@intel.com>
      Reviewed-by: Sage Weil <sage@inktank.com>
      a096b09a
    • sysfs: use a separate locking class for open files depending on mmap · 027a485d
      Tejun Heo committed
      The following two commits implemented mmap support in the regular file
      path and merged bin file support into the regular path.
      
       73d97146 ("sysfs: copy bin mmap support from fs/sysfs/bin.c to fs/sysfs/file.c")
       3124eb16 ("sysfs: merge regular and bin file handling")
      
      After the merge, the following commands trigger a spurious lockdep
      warning.  "test-mmap-read" simply mmaps the file and dumps the
      content.
      
        $ cat /sys/block/sda/trace/act_mask
        $ test-mmap-read /sys/devices/pci0000\:00/0000\:00\:03.0/resource0 4096
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.12.0-work+ #378 Not tainted
        -------------------------------------------------------
        test-mmap-read/567 is trying to acquire lock:
         (&of->mutex){+.+.+.}, at: [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
        -> #3 (&mm->mmap_sem){++++++}:
        ...
        -> #2 (sr_mutex){+.+.+.}:
        ...
        -> #1 (&bdev->bd_mutex){+.+.+.}:
        ...
        -> #0 (&of->mutex){+.+.+.}:
        ...
      
        other info that might help us debug this:
      
        Chain exists of:
         &of->mutex --> sr_mutex --> &mm->mmap_sem
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(sr_mutex);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex);
      
         *** DEADLOCK ***
      
        1 lock held by test-mmap-read/567:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8114b399>] vm_mmap_pgoff+0x49/0xa0
      
        stack backtrace:
        CPU: 3 PID: 567 Comm: test-mmap-read Not tainted 3.12.0-work+ #378
        Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
         ffffffff81ed41a0 ffff880009441bc8 ffffffff81611ad2 ffffffff81eccb80
         ffff880009441c08 ffffffff8160f215 ffff880009441c60 ffff880009c75208
         0000000000000000 ffff880009c751e0 ffff880009c75208 ffff880009c74ac0
        Call Trace:
         [<ffffffff81611ad2>] dump_stack+0x4e/0x7a
         [<ffffffff8160f215>] print_circular_bug+0x2b0/0x2bf
         [<ffffffff8109ca0a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff8109d6ba>] lock_acquire+0x9a/0x1d0
         [<ffffffff81615547>] mutex_lock_nested+0x67/0x3f0
         [<ffffffff8120a8df>] sysfs_bin_mmap+0x4f/0x120
         [<ffffffff8115d363>] mmap_region+0x3b3/0x5b0
         [<ffffffff8115d8ae>] do_mmap_pgoff+0x34e/0x3d0
         [<ffffffff8114b3ba>] vm_mmap_pgoff+0x6a/0xa0
         [<ffffffff8115be3e>] SyS_mmap_pgoff+0xbe/0x250
         [<ffffffff81008282>] SyS_mmap+0x22/0x30
         [<ffffffff8161a4d2>] system_call_fastpath+0x16/0x1b
      
      This happens because one file nests sr_mutex, which nests
      mm->mmap_sem under it, under of->mutex, while the mmap
      implementation naturally nests of->mutex under mm->mmap_sem.  The
      warning is a false positive, as of->mutex is per open file and
      the two paths belong to two different files.  The warning didn't
      trigger before regular and bin file support was merged because
      only bin files supported mmap and the other side of the locking
      happened only on regular files, which used equivalent but
      separate locking.

      It'd be best to give each file a separate locking class, but we
      can't easily do that.  Let's differentiate on ->mmap() for now.
      Later we'll add an explicit file operations struct and can add a
      per-ops lockdep key there.
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Reported-by: Dave Jones <davej@redhat.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      027a485d
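
      A hedged userspace model of the mechanism this approach relies
      on: under lockdep, mutex_init() is a macro that declares a static
      lock class key per call site, so initialising of->mutex from two
      different call sites (one for files with ->mmap(), one for the
      rest) gives the two groups distinct lock classes.  The macro and
      types below are illustrative stand-ins, not the kernel API.

        /* Model of a per-call-site static "class key", the trick that lets two
         * init call sites hand out two different lock classes. */
        #include <stdio.h>

        struct lock_class_key { int unused; };

        struct my_mutex { const struct lock_class_key *class; };

        #define my_mutex_init(m)                                \
            do {                                                \
                static struct lock_class_key __key;            \
                (m)->class = &__key;                            \
            } while (0)

        static void init_mmap_file(struct my_mutex *m)  { my_mutex_init(m); }
        static void init_plain_file(struct my_mutex *m) { my_mutex_init(m); }

        int main(void)
        {
            struct my_mutex a, b, c;

            init_mmap_file(&a);     /* open file whose attribute implements mmap */
            init_plain_file(&b);    /* ordinary open files share the other class */
            init_plain_file(&c);
            printf("a and b share a class: %d\n", a.class == b.class);  /* 0 */
            printf("b and c share a class: %d\n", b.class == c.class);  /* 1 */
            return 0;
        }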
    • sysfs: handle duplicate removal attempts in sysfs_remove_group() · 54d71145
      Mika Westerberg committed
      Commit bcdde7e2 (sysfs: make __sysfs_remove_dir() recursive)
      changed the behavior so that directory removals are done
      recursively.  This means that the sysfs group might already have
      been removed if its parent directory has been removed.

      The current code outputs warnings similar to the following log
      snippet when it detects that there is no group for the given
      kobject:
      
       WARNING: CPU: 0 PID: 4 at fs/sysfs/group.c:214 sysfs_remove_group+0xc6/0xd0()
       sysfs group ffffffff81c6f1e0 not found for kobject 'host7'
       Modules linked in:
       CPU: 0 PID: 4 Comm: kworker/0:0 Not tainted 3.12.0+ #13
       Hardware name:                  /D33217CK, BIOS GKPPT10H.86A.0042.2013.0422.1439 04/22/2013
       Workqueue: kacpi_hotplug acpi_hotplug_work_fn
        0000000000000009 ffff8801002459b0 ffffffff817daab1 ffff8801002459f8
        ffff8801002459e8 ffffffff810436b8 0000000000000000 ffffffff81c6f1e0
        ffff88006d440358 ffff88006d440188 ffff88006e8b4c28 ffff880100245a48
       Call Trace:
        [<ffffffff817daab1>] dump_stack+0x45/0x56
        [<ffffffff810436b8>] warn_slowpath_common+0x78/0xa0
        [<ffffffff81043727>] warn_slowpath_fmt+0x47/0x50
        [<ffffffff811ad319>] ? sysfs_get_dirent_ns+0x49/0x70
        [<ffffffff811ae526>] sysfs_remove_group+0xc6/0xd0
        [<ffffffff81432f7e>] dpm_sysfs_remove+0x3e/0x50
        [<ffffffff8142a0d0>] device_del+0x40/0x1b0
        [<ffffffff8142a24d>] device_unregister+0xd/0x20
        [<ffffffff8144131a>] scsi_remove_host+0xba/0x110
        [<ffffffff8145f526>] ata_host_detach+0xc6/0x100
        [<ffffffff8145f578>] ata_pci_remove_one+0x18/0x20
        [<ffffffff812e8f48>] pci_device_remove+0x28/0x60
        [<ffffffff8142d854>] __device_release_driver+0x64/0xd0
        [<ffffffff8142d8de>] device_release_driver+0x1e/0x30
        [<ffffffff8142d257>] bus_remove_device+0xf7/0x140
        [<ffffffff8142a1b1>] device_del+0x121/0x1b0
        [<ffffffff812e43d4>] pci_stop_bus_device+0x94/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e437b>] pci_stop_bus_device+0x3b/0xa0
        [<ffffffff812e44dd>] pci_stop_and_remove_bus_device+0xd/0x20
        [<ffffffff812fc743>] trim_stale_devices+0x73/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fc78b>] trim_stale_devices+0xbb/0xe0
        [<ffffffff812fcb6e>] acpiphp_check_bridge+0x7e/0xd0
        [<ffffffff812fd90d>] hotplug_event+0xcd/0x160
        [<ffffffff812fd9c5>] hotplug_event_work+0x25/0x60
        [<ffffffff81316749>] acpi_hotplug_work_fn+0x17/0x22
        [<ffffffff8105cf3a>] process_one_work+0x17a/0x430
        [<ffffffff8105db29>] worker_thread+0x119/0x390
        [<ffffffff8105da10>] ? manage_workers.isra.25+0x2a0/0x2a0
        [<ffffffff81063a5d>] kthread+0xcd/0xf0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
        [<ffffffff817eb33c>] ret_from_fork+0x7c/0xb0
        [<ffffffff81063990>] ? kthread_create_on_node+0x180/0x180
      
      On this particular machine I see ~16 of these messages during
      Thunderbolt hot-unplug.

      Fix this in a similar way to what was done for sysfs_remove_one():
      check whether the parent directory has already been removed and
      bail out early.
      Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com>
      Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      54d71145
  5. 22 Nov 2013 (2 commits)
    • configfs: fix race between dentry put and lookup · 76ae281f
      Junxiao Bi committed
      There is a race window in configfs: it starts when a dentry is
      UNHASHED and ends before configfs_d_iput() is called.  If a lookup
      happens in this window, a new dentry is allocated because the
      original dentry is UNHASHED, and configfs_attach_attr() then
      updates sd->s_dentry to the new dentry.  Later, in
      configfs_d_iput(), BUG_ON(sd->s_dentry != dentry) is triggered
      and the system panics.
      
      sys_open:                     sys_close:
       ...                           fput
                                      dput
                                       dentry_kill
                                        __d_drop <--- dentry unhashed here,
                                                 but sd->dentry still point
                                                 to this dentry.
      
       lookup_real
        configfs_lookup
         configfs_attach_attr---> update sd->s_dentry
                                  to new allocated dentry here.
      
                                         d_kill
                                           configfs_d_iput <--- BUG_ON(sd->s_dentry != dentry)
                                                           triggered here.
      
      To fix this, change configfs_d_iput() to not update sd->s_dentry
      if sd->s_count > 2, which means another dentry is using the sd
      besides the one that is going to be put.  Use configfs_dirent_lock
      in configfs_attach_attr() to synchronize with configfs_d_iput().
      
      With the following steps, you can reproduce the bug.
      
      1. Enable ocfs2; this will mount configfs at /sys/kernel/config
         and populate it with configuration entries.
      
      2. run the following script.
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      	while [ 1 ]; do cat /sys/kernel/config/cluster/$your_cluster_name/idle_timeout_ms > /dev/null; done &
      Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
      Cc: Joel Becker <jlbec@evilplan.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
      76ae281f
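
      A hedged userspace model of the fix described above: the teardown
      path clears the shared back-pointer only when it is the last
      user, and both paths take the same lock.  The counter here only
      tracks attached dentries (the in-kernel check is against
      s_count > 2 because s_count also carries other references); all
      names are illustrative.

        /* Model: a lookup can repoint sd->dentry at a new dentry while the old
         * one is being torn down; teardown must not clobber the new pointer. */
        #include <pthread.h>
        #include <stdio.h>

        struct dentry { int id; };

        struct sd {
            pthread_mutex_t lock;    /* stands in for configfs_dirent_lock */
            int users;               /* dentries currently referencing this sd */
            struct dentry *dentry;   /* stands in for sd->s_dentry */
        };

        /* lookup path (configfs_attach_attr): a new dentry takes over the pointer */
        static void attach(struct sd *sd, struct dentry *new_dentry)
        {
            pthread_mutex_lock(&sd->lock);
            sd->users++;
            sd->dentry = new_dentry;
            pthread_mutex_unlock(&sd->lock);
        }

        /* teardown path (configfs_d_iput): only the last user clears the pointer */
        static void detach(struct sd *sd)
        {
            pthread_mutex_lock(&sd->lock);
            if (sd->users == 1)
                sd->dentry = NULL;
            sd->users--;
            pthread_mutex_unlock(&sd->lock);
        }

        int main(void)
        {
            struct dentry d1 = { 1 }, d2 = { 2 };
            struct sd sd = { PTHREAD_MUTEX_INITIALIZER, 1, &d1 };

            attach(&sd, &d2);   /* racing lookup repoints sd.dentry at d2 */
            detach(&sd);        /* d1 goes away without clobbering the pointer */
            printf("sd.dentry -> dentry %d\n", sd.dentry ? sd.dentry->id : 0);
            return 0;
        }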
    • GFS2: Fix ref count bug relating to atomic_open · ea0341e0
      Steven Whitehouse committed
      In the case that atomic_open calls finish_no_open() with the
      dentry that was supplied to gfs2_atomic_open(), an extra
      reference count is required.  This patch fixes that issue,
      preventing a bug trap from triggering at umount time.
      Signed-off-by: Steven Whitehouse <swhiteho@redhat.com>
      ea0341e0
  6. 21 Nov 2013 (17 commits)
  7. 20 Nov 2013 (9 commits)
    • Check SMB3 dialects against downgrade attacks · ff1c038a
      Steve French committed
      When we are running signed SMB3 or SMB3.02 connections, we need
      to validate the protocol negotiation information to ensure that
      the negotiate protocol response was not tampered with.
      
      Add the missing FSCTL which is sent at mount time (immediately after
      the SMB3 Tree Connect) to validate that the capabilities match
      what we think the server sent.
      
      "Secure dialect negotiation is introduced in SMB3 to protect against
      man-in-the-middle attempt to downgrade dialect negotiation.
      The idea is to prevent an eavesdropper from downgrading the initially
      negotiated dialect and capabilities between the client and the server."
      
      For more explanation see section 2.2.31.4 of MS-SMB2 or
      http://blogs.msdn.com/b/openspecification/archive/2012/06/28/smb3-secure-dialect-negotiation.aspx
      Reviewed-by: Pavel Shilovsky <piastry@etersoft.ru>
      Signed-off-by: Steve French <smfrench@gmail.com>
      ff1c038a
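
      A hedged model of the check being added (the field names and
      struct layout below are illustrative, not the MS-SMB2 wire
      format): the client remembers the parameters from the original
      negotiate response, asks the server to echo them back over the
      now-signed connection via the validation FSCTL, and drops the
      connection if anything differs.

        /* Model of secure dialect negotiation validation. */
        #include <stdbool.h>
        #include <stdio.h>
        #include <string.h>

        struct neg_info {
            unsigned int   capabilities;
            unsigned char  guid[16];
            unsigned short security_mode;
            unsigned short dialect;
        };

        /* saved from the (unsigned) negotiate response at connect time */
        static struct neg_info negotiated = { .dialect = 0x0300 };

        /* compare against what the server returns to the signed FSCTL */
        static bool validate_negotiate(const struct neg_info *rsp)
        {
            return rsp->capabilities == negotiated.capabilities &&
                   memcmp(rsp->guid, negotiated.guid, 16) == 0 &&
                   rsp->security_mode == negotiated.security_mode &&
                   rsp->dialect == negotiated.dialect;
        }

        int main(void)
        {
            struct neg_info rsp = negotiated;

            rsp.dialect = 0x0202;   /* a man in the middle downgraded the dialect */
            if (!validate_negotiate(&rsp)) {
                fprintf(stderr, "negotiate validation failed, dropping connection\n");
                return 1;
            }
            return 0;
        }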
    • Squashfs: Check stream is not NULL in decompressor_multi.c · ed4f381e
      Phillip Lougher committed
      Fix a static checker complaint that stream is not checked in
      squashfs_decompressor_destroy().
      Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      ed4f381e
    • Squashfs: Directly decompress into the page cache for file data · 0d455c12
      Phillip Lougher committed
      This introduces an implementation of squashfs_readpage_block()
      that directly decompresses into the page cache.
      
      This uses the previously added page handler abstraction to push
      down the necessary kmap_atomic/kunmap_atomic operations on the
      page cache buffers into the decompressors.  This enables
      direct copying into the page cache without using the slow
      kmap/kunmap calls.
      
      The code detects when multiple threads are racing in
      squashfs_readpage() to decompress the same block, and avoids
      this regression by falling back to using an intermediate
      buffer.
      
      This patch enhances the performance of Squashfs significantly
      when multiple processes are accessing the filesystem
      simultaneously, because it not only reduces memory copying but,
      more importantly, eliminates the lock contention on the
      intermediate buffer.

      Benchmark using single-threaded decompression:
      
              dd if=file1 of=/dev/null bs=4096 &
              dd if=file2 of=/dev/null bs=4096 &
              dd if=file3 of=/dev/null bs=4096 &
              dd if=file4 of=/dev/null bs=4096
      
      Before:
      
      629145600 bytes (629 MB) copied, 45.8046 s, 13.7 MB/s
      
      After:
      
      629145600 bytes (629 MB) copied, 9.29414 s, 67.7 MB/s
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      0d455c12
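
      A hedged userspace model of the strategy described above:
      decompress straight into the destination pages when this caller
      can claim them all, otherwise fall back to a shared intermediate
      buffer and copy.  Page locking is modelled with a trylock; all
      names are illustrative.

        /* Model: direct decompression into claimed pages, bounce-buffer fallback. */
        #include <pthread.h>
        #include <stdio.h>
        #include <string.h>

        #define PAGES_PER_BLOCK 4
        #define PAGE_SZ 4096

        struct page {
            pthread_mutex_t lock;   /* stands in for the page lock */
            char data[PAGE_SZ];
        };

        static void decompress_into(char *dst[], int npages)
        {
            for (int i = 0; i < npages; i++)
                memset(dst[i], 'x', PAGE_SZ);   /* stand-in for real decompression */
        }

        static void read_block(struct page *pages, char *bounce[PAGES_PER_BLOCK])
        {
            char *dst[PAGES_PER_BLOCK];
            int claimed = 0;

            while (claimed < PAGES_PER_BLOCK &&
                   pthread_mutex_trylock(&pages[claimed].lock) == 0) {
                dst[claimed] = pages[claimed].data;
                claimed++;
            }

            if (claimed == PAGES_PER_BLOCK) {
                decompress_into(dst, PAGES_PER_BLOCK);      /* direct, no extra copy */
            } else {
                decompress_into(bounce, PAGES_PER_BLOCK);   /* racing: use the buffer */
                for (int i = 0; i < claimed; i++)
                    memcpy(pages[i].data, bounce[i], PAGE_SZ);
            }

            for (int i = 0; i < claimed; i++)
                pthread_mutex_unlock(&pages[i].lock);
        }

        int main(void)
        {
            static struct page pages[PAGES_PER_BLOCK] = {
                { PTHREAD_MUTEX_INITIALIZER }, { PTHREAD_MUTEX_INITIALIZER },
                { PTHREAD_MUTEX_INITIALIZER }, { PTHREAD_MUTEX_INITIALIZER },
            };
            static char buf[PAGES_PER_BLOCK][PAGE_SZ];
            char *bounce[PAGES_PER_BLOCK] = { buf[0], buf[1], buf[2], buf[3] };

            read_block(pages, bounce);
            printf("page 0 starts with '%c'\n", pages[0].data[0]);
            return 0;
        }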
    • Squashfs: Restructure squashfs_readpage() · 5f55dbc0
      Phillip Lougher committed
      Restructure squashfs_readpage(), splitting it into separate
      functions for datablocks, fragments and sparse blocks.

      Move the memcpy (from squashfs cache entry) implementation of
      squashfs_readpage_block() into file_cache.c.
      
      This allows different implementations to be supported.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      5f55dbc0
    • Squashfs: Generalise paging handling in the decompressors · 846b730e
      Phillip Lougher committed
      Further generalise the decompressors by adding a page handler
      abstraction.  This adds helpers to allow the decompressors to
      access and process the output buffers in an
      implementation-independent manner.
      
      This allows different types of output buffer to be passed
      to the decompressors, with the implementation specific
      aspects handled at decompression time, but without the
      knowledge being held in the decompressor wrapper code.
      
      This will allow the decompressors to handle Squashfs
      cache buffers, and page cache pages.
      
      This patch adds the abstraction and an implementation for
      the caches.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      846b730e
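
      A hedged sketch of the kind of page-handler abstraction described
      above: the decompressor walks its output through a small ops
      structure and never needs to know whether it is writing into
      cache buffers or page cache pages.  The names are modelled on the
      description, not on the in-kernel API.

        /* Model of a page-handler ("actor") abstraction for decompressor output. */
        #include <stdio.h>
        #include <stdlib.h>
        #include <string.h>

        #define PAGE_SZ 4096

        struct page_actor {
            void *(*first_page)(struct page_actor *);  /* map the first output page */
            void *(*next_page)(struct page_actor *);   /* map the next output page */
            void (*finish_page)(struct page_actor *);  /* unmap the current page */
            void **pages;
            int npages, cur;
        };

        static void *actor_first(struct page_actor *a) { a->cur = 0; return a->pages[0]; }
        static void *actor_next(struct page_actor *a)
        {
            a->cur++;
            return a->cur < a->npages ? a->pages[a->cur] : NULL;
        }
        static void actor_finish(struct page_actor *a) { (void)a; /* kunmap would go here */ }

        /* The decompressor wrapper only ever sees the actor, never the buffer type. */
        static void decompress(struct page_actor *a, const char *src, size_t len)
        {
            void *dst = a->first_page(a);

            while (dst && len) {
                size_t n = len < PAGE_SZ ? len : PAGE_SZ;

                memcpy(dst, src, n);     /* stand-in for real decompression output */
                src += n;
                len -= n;
                a->finish_page(a);
                dst = a->next_page(a);
            }
        }

        int main(void)
        {
            void *bufs[2] = { malloc(PAGE_SZ), malloc(PAGE_SZ) };
            struct page_actor actor = { actor_first, actor_next, actor_finish, bufs, 2, 0 };
            char src[2 * PAGE_SZ];

            memset(src, 'q', sizeof(src));
            decompress(&actor, src, sizeof(src));
            printf("last byte of second page: %c\n", ((char *)bufs[1])[PAGE_SZ - 1]);
            free(bufs[0]);
            free(bufs[1]);
            return 0;
        }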
    • Squashfs: add multi-threaded decompression using percpu variable · d208383d
      Phillip Lougher committed
      Add a multi-threaded decompression implementation which uses
      percpu variables.
      
      Using percpu variables has advantages and disadvantages over
      implementations which do not use percpu variables.
      
      Advantages:
        * the nature of percpu variables ensures decompression is
          load-balanced across the multiple cores.
        * simplicity.
      
      Disadvantages: it limits decompression to one thread per core.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      d208383d
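
      A hedged userspace analogue of the percpu scheme: each worker
      thread (standing in for a CPU) owns a thread-local decompression
      stream, so no locking is needed, at the cost of one decompression
      per "CPU" at a time.  Names are illustrative.

        /* Model: one thread-local decompression stream per worker. */
        #include <pthread.h>
        #include <stdio.h>

        struct stream { unsigned long scratch[64]; int used; };

        static _Thread_local struct stream local_stream;   /* one stream per "CPU" */

        static void *decompress_worker(void *arg)
        {
            (void)arg;
            local_stream.used++;   /* exclusive use, no lock required */
            /* ... run the real decompressor with local_stream.scratch ... */
            return NULL;
        }

        int main(void)
        {
            pthread_t t[4];

            for (int i = 0; i < 4; i++)
                pthread_create(&t[i], NULL, decompress_worker, NULL);
            for (int i = 0; i < 4; i++)
                pthread_join(t[i], NULL);
            puts("each thread decompressed with its own stream");
            return 0;
        }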
    • squashfs: Enhance parallel I/O · cd59c2ec
      Minchan Kim committed
      Squashfs currently uses only one stream buffer for decompression,
      which hurts parallel read performance.  This patch supports
      multiple decompressors to enhance parallel I/O performance.

      Four 1G-file dd reads on a KVM machine with 2 CPUs and 4G of
      memory.
      
      dd if=test/test1.dat of=/dev/null &
      dd if=test/test2.dat of=/dev/null &
      dd if=test/test3.dat of=/dev/null &
      dd if=test/test4.dat of=/dev/null &
      
      old : 1m39s -> new : 9s
      
      * From v1
        * Change comp_strm with decomp_strm - Phillip
        * Change/add comments - Phillip
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      cd59c2ec
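
      A hedged model of the multi-stream idea: readers borrow an idle
      decompression stream from a mutex-protected pool, growing the
      pool on demand up to a cap, so several reads can decompress
      concurrently.  The names and the cap are illustrative.

        /* Model: mutex-protected pool of decompression streams. */
        #include <pthread.h>
        #include <stdio.h>
        #include <stdlib.h>

        #define MAX_STREAMS 4

        struct stream { struct stream *next; };

        static struct stream *idle;                 /* free list of idle streams */
        static int allocated;
        static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
        static pthread_cond_t pool_wait = PTHREAD_COND_INITIALIZER;

        static struct stream *get_stream(void)
        {
            struct stream *s;

            pthread_mutex_lock(&pool_lock);
            while (!idle && allocated >= MAX_STREAMS)
                pthread_cond_wait(&pool_wait, &pool_lock);   /* everything busy */
            if (idle) {
                s = idle;
                idle = s->next;
            } else {
                s = calloc(1, sizeof(*s));   /* grow the pool on demand */
                allocated++;
            }
            pthread_mutex_unlock(&pool_lock);
            return s;
        }

        static void put_stream(struct stream *s)
        {
            pthread_mutex_lock(&pool_lock);
            s->next = idle;
            idle = s;
            pthread_cond_signal(&pool_wait);
            pthread_mutex_unlock(&pool_lock);
        }

        int main(void)
        {
            struct stream *a = get_stream(), *b = get_stream();

            /* two concurrent reads could now decompress in parallel */
            put_stream(a);
            put_stream(b);
            printf("streams allocated: %d\n", allocated);
            return 0;
        }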
    • Squashfs: Refactor decompressor interface and code · 9508c6b9
      Phillip Lougher committed
      The decompressor interface and code were written from the point
      of view of single-threaded operation.  In doing so they mixed a
      lot of single-threaded, implementation-specific aspects into the
      decompressor code and elsewhere, which makes it difficult to
      seamlessly support multiple different decompressor
      implementations.
      
      This patch does the following:
      
      1.  It removes compressor_options parsing from the decompressor
          init() function.  This allows the decompressor init() function
          to be dynamically called to instantiate multiple decompressors,
          without the compressor options needing to be read and parsed each
          time.
      
      2.  It moves threading and all sleeping operations out of the
          decompressors.  In doing so, it makes the decompressors
          non-blocking wrappers which only deal with interfacing with
          the decompressor implementation.
      
      3. It splits decompressor.[ch] into decompressor generic functions
         in decompressor.[ch], and moves the single threaded
         decompressor implementation into decompressor_single.c.
      
      The result of this patch is that Squashfs should now be able to
      support multiple decompressors by adding new decompressor_xxx.c
      files with specialised implementations of the functions in
      decompressor_single.c.
      Signed-off-by: Phillip Lougher <phillip@squashfs.org.uk>
      Reviewed-by: Minchan Kim <minchan@kernel.org>
      9508c6b9
    • nfsd4: fix xdr decoding of large non-write compounds · 365da4ad
      J. Bruce Fields committed
      This fixes a regression from 24750082
      "nfsd4: fix decoding of compounds across page boundaries".  The
      previous code was correct: argp->pagelist is initialized in
      nfs4svc_decode_compoundargs to rqstp->rq_arg.pages, and is
      therefore a pointer to the page *after* the page we are currently
      decoding.
      
      The reason that patch nevertheless fixed a problem with decoding
      compounds containing write was a bug in the write decoding
      introduced by 5a80a54d "nfsd4: reorganize write decoding", after
      which write decoding no longer adhered to the rule that
      argp->pagelist points to the next page.
      
      Cc: stable@vger.kernel.org
      Signed-off-by: J. Bruce Fields <bfields@redhat.com>
      365da4ad
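
      A hedged model of the invariant described above: while bytes are
      being consumed from page N, pagelist already points at page N+1,
      so crossing a page boundary just dereferences and advances it.
      The names are illustrative, not the nfsd XDR code.

        /* Model: decoding a byte stream that spans page boundaries while keeping
         * "pagelist" pointing at the page after the one being decoded. */
        #include <stdio.h>

        #define PAGE_SZ 8   /* tiny pages so the boundary crossing is visible */

        struct decoder {
            char *p, *end;     /* current position and end of the current page */
            char **pagelist;   /* points at the page AFTER the one being decoded */
        };

        static void read_bytes(struct decoder *d, char *out, int n)
        {
            while (n--) {
                if (d->p == d->end) {           /* ran off the current page */
                    d->p = *d->pagelist++;      /* next page, keep the invariant */
                    d->end = d->p + PAGE_SZ;
                }
                *out++ = *d->p++;
            }
        }

        int main(void)
        {
            char page0[PAGE_SZ] = "abcdefgh", page1[PAGE_SZ] = "ijklmnop";
            char *pages[] = { page1 };          /* the page after the first one */
            struct decoder d = { page0, page0 + PAGE_SZ, pages };
            char out[13] = { 0 };

            read_bytes(&d, out, 12);            /* crosses from page0 into page1 */
            printf("%s\n", out);                /* abcdefghijkl */
            return 0;
        }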