1. 15 2月, 2014 1 次提交
    • J
      cifs: ensure that uncached writes handle unmapped areas correctly · 5d81de8e
      Jeff Layton 提交于
      It's possible for userland to pass down an iovec via writev() that has a
      bogus user pointer in it. If that happens and we're doing an uncached
      write, then we can end up getting less bytes than we expect from the
      call to iov_iter_copy_from_user. This is CVE-2014-0069
      
      cifs_iovec_write isn't set up to handle that situation however. It'll
      blindly keep chugging through the page array and not filling those pages
      with anything useful. Worse yet, we'll later end up with a negative
      number in wdata->tailsz, which will confuse the sending routines and
      cause an oops at the very least.
      
      Fix this by having the copy phase of cifs_iovec_write stop copying data
      in this situation and send the last write as a short one. At the same
      time, we want to avoid sending a zero-length write to the server, so
      break out of the loop and set rc to -EFAULT if that happens. This also
      allows us to handle the case where no address in the iovec is valid.
      
      [Note: Marking this for stable on v3.4+ kernels, but kernels as old as
             v2.6.38 may have a similar problem and may need similar fix]
      
      Cc: <stable@vger.kernel.org> # v3.4+
      Reviewed-by: NPavel Shilovsky <piastry@etersoft.ru>
      Reported-by: NAl Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: NJeff Layton <jlayton@redhat.com>
      Signed-off-by: NSteve French <smfrench@gmail.com>
      5d81de8e
  2. 11 2月, 2014 1 次提交
    • S
      [CIFS] Fix cifsacl mounts over smb2 to not call cifs · 42eacf9e
      Steve French 提交于
      When mounting with smb2/smb3 (e.g. vers=2.1) and cifsacl mount option,
      it was trying to get the mode by querying the acl over the cifs
      rather than smb2 protocol.  This patch makes that protocol
      independent and makes cifsacl smb2 mounts return a more intuitive
      operation not supported error (until we add a worker function
      for smb2_get_acl).
      
      Note that a previous patch fixed getxattr/setxattr for the CIFSACL xattr
      which would unconditionally call cifs_get_acl and cifs_set_acl (even when
      mounted smb2). I made those protocol independent last week (new protocol
      version operations "get_acl" and "set_acl" but did not add an
      smb2_get_acl and smb2_set_acl yet so those now simply return EOPNOTSUPP
      which at least is better than sending cifs requests on smb2 mount)
      
      The previous patches did not fix the one remaining case though ie
      mounting with "cifsacl" when getting mode from acl would unconditionally
      end up calling "cifs_get_acl_from_fid" even for smb2 - so made that protocol
      independent but to make that protocol independent had to make sure that the callers
      were passing the protocol independent handle structure (cifs_fid) instead
      of cifs specific _u16 network file handle (ie cifs_fid instead of cifs_fid->fid)
      
      Now mount with smb2 and cifsacl mount options will return EOPNOTSUP (instead
      of timing out) and a future patch will add smb2 operations (e.g. get_smb2_acl)
      to enable this.
      Signed-off-by: NSteve French <smfrench@gmail.com>
      42eacf9e
  3. 08 2月, 2014 4 次提交
  4. 07 2月, 2014 11 次提交
    • L
      Merge branch 'akpm' (patches from Andrew Morton) · 9343224b
      Linus Torvalds 提交于
      Merge a bunch of fixes from Andrew Morton:
       "Commit 579f8290 ("swap: add a simple detector for inappropriate
        swapin readahead") is a feature.  No probs if you decide to defer it
        until the next merge window.
      
        It has been sitting in my tree for over a year because of my dislike
        of all the magic numbers, but recent discussion with Hugh has made me
        give up"
      
      * emailed patches fron Andrew Morton <akpm@linux-foundation.org>:
        mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq
        arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved.
        arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug()
        mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq()
        mm/swap: fix race on swap_info reuse between swapoff and swapon
        swap: add a simple detector for inappropriate swapin readahead
        ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters
        Documentation/kernel-parameters.txt: fix memmap= language
      9343224b
    • K
      mm: __set_page_dirty uses spin_lock_irqsave instead of spin_lock_irq · 227d53b3
      KOSAKI Motohiro 提交于
      To use spin_{un}lock_irq is dangerous if caller disabled interrupt.
      During aio buffer migration, we have a possibility to see the following
      call stack.
      
      aio_migratepage  [disable interrupt]
        migrate_page_copy
          clear_page_dirty_for_io
            set_page_dirty
              __set_page_dirty_buffers
                __set_page_dirty
                  spin_lock_irq
      
      This mean, current aio migration is a deadlockable.  spin_lock_irqsave
      is a safer alternative and we should use it.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Rientjes rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      227d53b3
    • T
      arch/x86/mm/numa.c: fix array index overflow when synchronizing nid to memblock.reserved. · 7bc35fdd
      Tang Chen 提交于
      The following path will cause array out of bound.
      
      memblock_add_region() will always set nid in memblock.reserved to
      MAX_NUMNODES.  In numa_register_memblks(), after we set all nid to
      correct valus in memblock.reserved, we called setup_node_data(), and
      used memblock_alloc_nid() to allocate memory, with nid set to
      MAX_NUMNODES.
      
      The nodemask_t type can be seen as a bit array.  And the index is 0 ~
      MAX_NUMNODES-1.
      
      After that, when we call node_set() in numa_clear_kernel_node_hotplug(),
      the nodemask_t got an index of value MAX_NUMNODES, which is out of [0 ~
      MAX_NUMNODES-1].
      
      See below:
      
      numa_init()
       |---> numa_register_memblks()
       |      |---> memblock_set_node(memory)		set correct nid in memblock.memory
       |      |---> memblock_set_node(reserved)	set correct nid in memblock.reserved
       |      |......
       |      |---> setup_node_data()
       |             |---> memblock_alloc_nid()	here, nid is set to MAX_NUMNODES (1024)
       |......
       |---> numa_clear_kernel_node_hotplug()
              |---> node_set()			here, we have an index 1024, and overflowed
      
      This patch moves nid setting to numa_clear_kernel_node_hotplug() to fix
      this problem.
      Reported-by: NDave Jones <davej@redhat.com>
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Cc: David Rientjes <rientjes@google.com>
      Tested-by: NDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7bc35fdd
    • T
      arch/x86/mm/numa.c: initialize numa_kernel_nodes in numa_clear_kernel_node_hotplug() · 017c217a
      Tang Chen 提交于
      On-stack variable numa_kernel_nodes in numa_clear_kernel_node_hotplug()
      was not initialized.  So we need to initialize it.
      
      [akpm@linux-foundation.org: use NODE_MASK_NONE, per David]
      Signed-off-by: NTang Chen <tangchen@cn.fujitsu.com>
      Tested-by: NGu Zheng <guz.fnst@cn.fujitsu.com>
      Reported-by: NDave Jones <davej@redhat.com>
      Reported-by: NDavid Rientjes <rientjes@google.com>
      Tested-by: NDave Jones <davej@redhat.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      017c217a
    • K
      mm: __set_page_dirty_nobuffers() uses spin_lock_irqsave() instead of spin_lock_irq() · a85d9df1
      KOSAKI Motohiro 提交于
      During aio stress test, we observed the following lockdep warning.  This
      mean AIO+numa_balancing is currently deadlockable.
      
      The problem is, aio_migratepage disable interrupt, but
      __set_page_dirty_nobuffers unintentionally enable it again.
      
      Generally, all helper function should use spin_lock_irqsave() instead of
      spin_lock_irq() because they don't know caller at all.
      
         other info that might help us debug this:
          Possible unsafe locking scenario:
      
                CPU0
                ----
           lock(&(&ctx->completion_lock)->rlock);
           <Interrupt>
             lock(&(&ctx->completion_lock)->rlock);
      
          *** DEADLOCK ***
      
            dump_stack+0x19/0x1b
            print_usage_bug+0x1f7/0x208
            mark_lock+0x21d/0x2a0
            mark_held_locks+0xb9/0x140
            trace_hardirqs_on_caller+0x105/0x1d0
            trace_hardirqs_on+0xd/0x10
            _raw_spin_unlock_irq+0x2c/0x50
            __set_page_dirty_nobuffers+0x8c/0xf0
            migrate_page_copy+0x434/0x540
            aio_migratepage+0xb1/0x140
            move_to_new_page+0x7d/0x230
            migrate_pages+0x5e5/0x700
            migrate_misplaced_page+0xbc/0xf0
            do_numa_page+0x102/0x190
            handle_pte_fault+0x241/0x970
            handle_mm_fault+0x265/0x370
            __do_page_fault+0x172/0x5a0
            do_page_fault+0x1a/0x70
            page_fault+0x28/0x30
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Larry Woodman <lwoodman@redhat.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Johannes Weiner <jweiner@redhat.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a85d9df1
    • W
      mm/swap: fix race on swap_info reuse between swapoff and swapon · f893ab41
      Weijie Yang 提交于
      swapoff clear swap_info's SWP_USED flag prematurely and free its
      resources after that.  A concurrent swapon will reuse this swap_info
      while its previous resources are not cleared completely.
      
      These late freed resources are:
       - p->percpu_cluster
       - swap_cgroup_ctrl[type]
       - block_device setting
       - inode->i_flags &= ~S_SWAPFILE
      
      This patch clears the SWP_USED flag after all its resources are freed,
      so that swapon can reuse this swap_info by alloc_swap_info() safely.
      
      [akpm@linux-foundation.org: tidy up code comment]
      Signed-off-by: NWeijie Yang <weijie.yang@samsung.com>
      Acked-by: NHugh Dickins <hughd@google.com>
      Cc: Krzysztof Kozlowski <k.kozlowski@samsung.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f893ab41
    • S
      swap: add a simple detector for inappropriate swapin readahead · 579f8290
      Shaohua Li 提交于
      This is a patch to improve swap readahead algorithm.  It's from Hugh and
      I slightly changed it.
      
      Hugh's original changelog:
      
      swapin readahead does a blind readahead, whether or not the swapin is
      sequential.  This may be ok on harddisk, because large reads have
      relatively small costs, and if the readahead pages are unneeded they can
      be reclaimed easily - though, what if their allocation forced reclaim of
      useful pages? But on SSD devices large reads are more expensive than
      small ones: if the readahead pages are unneeded, reading them in caused
      significant overhead.
      
      This patch adds very simplistic random read detection.  Stealing the
      PageReadahead technique from Konstantin Khlebnikov's patch, avoiding the
      vma/anon_vma sophistications of Shaohua Li's patch, swapin_nr_pages()
      simply looks at readahead's current success rate, and narrows or widens
      its readahead window accordingly.  There is little science to its
      heuristic: it's about as stupid as can be whilst remaining effective.
      
      The table below shows elapsed times (in centiseconds) when running a
      single repetitive swapping load across a 1000MB mapping in 900MB ram
      with 1GB swap (the harddisk tests had taken painfully too long when I
      used mem=500M, but SSD shows similar results for that).
      
      Vanilla is the 3.6-rc7 kernel on which I started; Shaohua denotes his
      Sep 3 patch in mmotm and linux-next; HughOld denotes my Oct 1 patch
      which Shaohua showed to be defective; HughNew this Nov 14 patch, with
      page_cluster as usual at default of 3 (8-page reads); HughPC4 this same
      patch with page_cluster 4 (16-page reads); HughPC0 with page_cluster 0
      (1-page reads: no readahead).
      
      HDD for swapping to harddisk, SSD for swapping to VertexII SSD.  Seq for
      sequential access to the mapping, cycling five times around; Rand for
      the same number of random touches.  Anon for a MAP_PRIVATE anon mapping;
      Shmem for a MAP_SHARED anon mapping, equivalent to tmpfs.
      
      One weakness of Shaohua's vma/anon_vma approach was that it did not
      optimize Shmem: seen below.  Konstantin's approach was perhaps mistuned,
      50% slower on Seq: did not compete and is not shown below.
      
      HDD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     73921   76210   75611   76904   78191  121542
      Seq Shmem    73601   73176   73855   72947   74543  118322
      Rand Anon   895392  831243  871569  845197  846496  841680
      Rand Shmem 1058375 1053486  827935  764955  764376  756489
      
      SSD        Vanilla Shaohua HughOld HughNew HughPC4 HughPC0
      Seq Anon     24634   24198   24673   25107   21614   70018
      Seq Shmem    24959   24932   25052   25703   22030   69678
      Rand Anon    43014   26146   28075   25989   26935   25901
      Rand Shmem   45349   45215   28249   24268   24138   24332
      
      These tests are, of course, two extremes of a very simple case: under
      heavier mixed loads I've not yet observed any consistent improvement or
      degradation, and wider testing would be welcome.
      
      Shaohua Li:
      
      Test shows Vanilla is slightly better in sequential workload than Hugh's
      patch.  I observed with Hugh's patch sometimes the readahead size is
      shrinked too fast (from 8 to 1 immediately) in sequential workload if
      there is no hit.  And in such case, continuing doing readahead is good
      actually.
      
      I don't prepare a sophisticated algorithm for the sequential workload
      because so far we can't guarantee sequential accessed pages are swap out
      sequentially.  So I slightly change Hugh's heuristic - don't shrink
      readahead size too fast.
      
      Here is my test result (unit second, 3 runs average):
      	Vanilla		Hugh		New
      Seq	356		370		360
      Random	4525		2447		2444
      
      Attached graph is the swapin/swapout throughput I collected with 'vmstat
      2'.  The first part is running a random workload (till around 1200 of
      the x-axis) and the second part is running a sequential workload.
      swapin and swapout throughput are almost identical in steady state in
      both workloads.  These are expected behavior.  while in Vanilla, swapin
      is much bigger than swapout especially in random workload (because wrong
      readahead).
      
      Original patches by: Shaohua Li and Konstantin Khlebnikov.
      
      [fengguang.wu@intel.com: swapin_nr_pages() can be static]
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NShaohua Li <shli@fusionio.com>
      Signed-off-by: NFengguang Wu <fengguang.wu@intel.com>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Wu Fengguang <fengguang.wu@intel.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Konstantin Khlebnikov <khlebnikov@openvz.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      579f8290
    • Z
      ocfs2: free allocated clusters if error occurs after ocfs2_claim_clusters · fb951eb5
      Zongxun Wang 提交于
      Even if using the same jbd2 handle, we cannot rollback a transaction.
      So once some error occurs after successfully allocating clusters, the
      allocated clusters will never be used and it means they are lost.  For
      example, call ocfs2_claim_clusters successfully when expanding a file,
      but failed in ocfs2_insert_extent.  So we need free the allocated
      clusters if they are not used indeed.
      Signed-off-by: NZongxun Wang <wangzongxun@huawei.com>
      Signed-off-by: NJoseph Qi <joseph.qi@huawei.com>
      Acked-by: NJoel Becker <jlbec@evilplan.org>
      Cc: Mark Fasheh <mfasheh@suse.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      fb951eb5
    • R
      Documentation/kernel-parameters.txt: fix memmap= language · 277cba1d
      Randy Dunlap 提交于
      Clean up descriptions of memmap= boot options.
      
      Add periods (full stops), drop commas, change "used" to "reserved" or
      "marked".
      Signed-off-by: NRandy Dunlap <rdunlap@infradead.org>
      Cc: Andiry Xu <andiry.xu@gmail.com>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      277cba1d
    • L
      Merge tag 'sound-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound · f2de3a15
      Linus Torvalds 提交于
      Pull sound fixes from Takashi Iwai:
       "A few HD-audio fixes and one USB-audio kconfig dependency fix.  All
        small and device-specific changes marked with Cc to stable"
      
      * tag 'sound-3.14-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound:
        ALSA: hda - Improve loopback path lookups for AD1983
        ALSA: hda - Fix missing VREF setup for Mac Pro 1,1
        ALSA: hda - Add missing mixer widget for AD1983
        ALSA: hda/realtek - Avoid invalid COEFs for ALC271X
        ALSA: hda - Fix silent output on Toshiba Satellite L40
        ALSA: usb-audio: Add missing kconfig dependecy
      f2de3a15
    • L
      Merge branch 'drm-fixes' of git://people.freedesktop.org/~airlied/linux · 65f0505b
      Linus Torvalds 提交于
      Pull drm fixes from Dave Airlie:
       "A few regression fixes already, one for my own stupidity, and mgag200
        typo fix, vmwgfx fixes and ttm regression fixes, and a radeon register
        checker update for older cards to handle geom shaders"
      
      * 'drm-fixes' of git://people.freedesktop.org/~airlied/linux:
        drm/radeon: allow geom rings to be setup on r600/r700 (v2)
        drm/mgag200,ast,cirrus: fix regression with drm_can_sleep conversion
        drm/ttm: Don't clear page metadata of imported sg pages
        drm/ttm: Fix TTM object open regression
        vmwgfx: Fix unitialized stack read in vmw_setup_otable_base
        drm/vmwgfx: Reemit context bindings when necessary v2
        drm/vmwgfx: Detect old user-space drivers and set up legacy emulation v2
        drm/vmwgfx: Emulate legacy shaders on guest-backed devices v2
        drm/vmwgfx: Fix legacy surface reference size copyback
        drm/vmwgfx: Fix SET_SHADER_CONST emulation on guest-backed devices
        drm/vmwgfx: Fix regression caused by "drm/ttm: make ttm reservation calls behave like reservation calls"
        drm/vmwgfx: Don't commit staged bindings if execbuf fails
        drm/mgag200: fix typo causing bw limits to be ignored on some chips
      65f0505b
  5. 06 2月, 2014 12 次提交
  6. 05 2月, 2014 11 次提交