1. 31 Oct, 2013 3 commits
    • memcg: use __this_cpu_sub() to dec stats to avoid incorrect subtrahend casting · 5e8cfc3c
      Committed by Greg Thelen
      As of commit 3ea67d06 ("memcg: add per cgroup writeback pages
      accounting") memcg counter errors are possible when moving charged
      memory to a different memcg.  Charge movement occurs when processing
      writes to memory.force_empty, moving tasks to a memcg with
      memory.move_charge_at_immigrate=1, or memcg deletion.
      
      An example showing error after memory.force_empty:
      
        $ cd /sys/fs/cgroup/memory
        $ mkdir x
        $ rm /data/tmp/file
        $ (echo $BASHPID >> x/tasks && exec mmap_writer /data/tmp/file 1M) &
        [1] 13600
        $ grep ^mapped x/memory.stat
        mapped_file 1048576
        $ echo 13600 > tasks
        $ echo 1 > x/memory.force_empty
        $ grep ^mapped x/memory.stat
        mapped_file 4503599627370496
      
      mapped_file should end with 0.
        4503599627370496 == 0x10,0000,0000,0000 == 0x100,0000,0000 pages
        1048576          == 0x10,0000           == 0x100 pages
      
      This issue only affects the source memcg on 64 bit machines; the
      destination memcg counters are correct.  So the rmdir case is not too
      important because such counters soon disappear along with the entire
      memcg.  But the memory.force_empty and memory.move_charge_at_immigrate=1
      cases are larger problems as the bogus counters are visible for the
      (possibly long) remaining life of the source memcg.
      
      The problem is due to memcg use of __this_cpu_add(.., -nr_pages), which
      is subtly wrong because it negates the unsigned int nr_pages (either
      1 or 512 for THP) before adding it to a signed long percpu counter.  When
      nr_pages=1, -nr_pages=0xffffffff.  On 64 bit machines stat->count[idx]
      is signed 64 bit.  So memcg's attempt to simply decrement a count (e.g.
      from 1 to 0) boils down to:
      
        long count = 1
        unsigned int nr_pages = 1
        count += -nr_pages  /* -nr_pages == 0xffff,ffff */
        count is now 0x1,0000,0000 instead of 0
      
      The fix is to subtract the unsigned page count rather than adding its
      negation.  This only works once "percpu: fix this_cpu_sub() subtrahend
      casting for unsigneds" is applied to fix this_cpu_sub().
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
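
The arithmetic described in this commit message can be reproduced in plain userspace C. The sketch below is only an illustration, not kernel code: the function names are hypothetical, and fixed-width types stand in for the kernel's `long` counter and `unsigned int` page count so the result does not depend on the platform's `long` width.

```c
#include <stdint.h>

/* Buggy pattern: negate the unsigned page count, then add it.
 * The negation happens in 32-bit unsigned arithmetic (1 becomes
 * 0xffffffff) and is zero-extended, not sign-extended, when it is
 * converted to the 64-bit counter type. */
int64_t dec_by_adding_negation(int64_t count, uint32_t nr_pages)
{
    return count + -nr_pages;
}

/* Fixed pattern: subtract the unsigned page count directly.
 * nr_pages is converted to the 64-bit type first, so the
 * subtraction produces the expected result. */
int64_t dec_by_subtracting(int64_t count, uint32_t nr_pages)
{
    return count - nr_pages;
}
```

With `count == 1` and `nr_pages == 1`, the first function yields 0x1,0000,0000 and the second yields 0, matching the numbers quoted above.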
    • percpu: fix this_cpu_sub() subtrahend casting for unsigneds · bd09d9a3
      Committed by Greg Thelen
      this_cpu_sub() is implemented as negation and addition.
      
      This patch casts the adjustment to the counter type before negation to
      sign extend the adjustment.  This helps in cases where the counter type
      is wider than an unsigned adjustment.  An alternative to this patch is
      to declare such operations unsupported, but it seemed useful to avoid
      surprises.
      
      This patch specifically helps the following example:
        unsigned int delta = 1
        preempt_disable()
        this_cpu_write(long_counter, 0)
        this_cpu_sub(long_counter, delta)
        preempt_enable()
      
      Before this change long_counter on a 64 bit machine ends with value
      0xffffffff, rather than 0xffffffffffffffff.  This is because
      this_cpu_sub(pcp, delta) boils down to this_cpu_add(pcp, -delta),
      which is basically:
        long_counter = 0 + 0xffffffff
      
      Also apply the same cast to:
        __this_cpu_sub()
        __this_cpu_sub_return()
        this_cpu_sub_return()
      
      All percpu_test.ko tests pass, especially the following cases, which
      previously failed:
      
        l -= ui_one;
        __this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
      
        l -= ui_one;
        this_cpu_sub(long_counter, ui_one);
        CHECK(l, long_counter, -1);
        CHECK(l, long_counter, 0xffffffffffffffff);
      
        ul -= ui_one;
        __this_cpu_sub(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, -1);
        CHECK(ul, ulong_counter, 0xffffffffffffffff);
      
        ul = this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 2);
      
        ul = __this_cpu_sub_return(ulong_counter, ui_one);
        CHECK(ul, ulong_counter, 1);
      Signed-off-by: Greg Thelen <gthelen@google.com>
      Acked-by: Tejun Heo <tj@kernel.org>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
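
The "cast before negation" idea from this commit can be sketched with ordinary variables instead of per-cpu counters. The macro and function names below are hypothetical stand-ins, not the kernel's macros, and `__typeof__` is a GCC/Clang extension assumed to be available.

```c
#include <stdint.h>

/* Old behaviour: negate the adjustment in its own (narrower) type. */
#define COUNTER_SUB_OLD(ctr, val) ((ctr) += -(val))

/* Fixed behaviour: cast the adjustment to the counter type first,
 * then negate, so the negated value spans the full counter width. */
#define COUNTER_SUB_NEW(ctr, val) ((ctr) += -(__typeof__(ctr))(val))

/* Signed 64-bit counter, unsigned 32-bit adjustment, old macro. */
int64_t sub_old(int64_t counter, uint32_t delta)
{
    COUNTER_SUB_OLD(counter, delta);
    return counter;
}

/* Signed 64-bit counter, unsigned 32-bit adjustment, fixed macro. */
int64_t sub_new(int64_t counter, uint32_t delta)
{
    COUNTER_SUB_NEW(counter, delta);
    return counter;
}

/* Unsigned 64-bit counter, unsigned 32-bit adjustment, fixed macro. */
uint64_t usub_new(uint64_t counter, uint32_t delta)
{
    COUNTER_SUB_NEW(counter, delta);
    return counter;
}
```

Starting from 0 and subtracting 1, `sub_old` leaves 0xffffffff in the counter, while `sub_new` leaves -1 and `usub_new` leaves 0xffffffffffffffff, which are the values the quoted test cases check for.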
    • mm/pagewalk.c: fix walk_page_range() access of wrong PTEs · 3017f079
      Committed by Chen LinX
      When walk_page_range() walks a memory map's page tables, it skips
      VM_PFNMAP areas and assigns vma->vm_end to the variable 'next', which
      may be larger than 'end'.  In the next loop iteration, 'addr' will be
      larger than 'next'.  Then, when /proc/XXXX/pagemap is read, 'addr'
      keeps growing forever in pagemap_pte_range() and
      pte_to_pagemap_entry() accesses the wrong pte.
      
        BUG: Bad page map in process procrank  pte:8437526f pmd:785de067
        addr:9108d000 vm_flags:00200073 anon_vma:f0d99020 mapping:  (null) index:9108d
        CPU: 1 PID: 4974 Comm: procrank Tainted: G    B   W  O 3.10.1+ #1
        Call Trace:
          dump_stack+0x16/0x18
          print_bad_pte+0x114/0x1b0
          vm_normal_page+0x56/0x60
          pagemap_pte_range+0x17a/0x1d0
          walk_page_range+0x19e/0x2c0
          pagemap_read+0x16e/0x200
          vfs_read+0x84/0x150
          SyS_read+0x4a/0x80
          syscall_call+0x7/0xb
      Signed-off-by: Liu ShuoX <shuox.liu@intel.com>
      Signed-off-by: Chen LinX <linx.z.chen@intel.com>
      Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
      Reviewed-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Cc: <stable@vger.kernel.org>	[3.10.x+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
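
The overshoot described in this commit message can be modelled with a small userspace loop. This is a toy model under stated assumptions, not the code in mm/pagewalk.c: the function name, the fixed step size, and the `clamp` remedy are all hypothetical, and the clamp is just one plausible way to keep 'next' within 'end', not necessarily what the patch does.

```c
#include <stdbool.h>

#define STEP 0x1000UL

/* Walk [addr, end) in STEP-sized chunks. The region
 * [skip_start, skip_end) is skipped in one jump, the way a
 * VM_PFNMAP area is skipped by jumping to vma->vm_end.
 * Returns the final addr, or 0 if the walk ran past a safety
 * limit because addr never became equal to end. */
unsigned long toy_walk(unsigned long addr, unsigned long end,
                       unsigned long skip_start, unsigned long skip_end,
                       bool clamp)
{
    unsigned long limit = end + 16 * STEP; /* demo-only safety net */
    unsigned long next;

    do {
        if (addr >= skip_start && addr < skip_end) {
            next = skip_end;          /* may jump beyond end */
            if (clamp && next > end)
                next = end;           /* assumed remedy */
        } else {
            next = addr + STEP;
        }
        if (next > limit)
            return 0;                 /* runaway walk detected */
        addr = next;
    } while (addr != end);

    return addr;
}
```

When the skipped region ends beyond `end` and `clamp` is false, `addr` jumps over `end` and the `addr != end` test never stops the loop, which mirrors the "growing forever" behaviour in the report.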
  2. 30 Oct, 2013 1 commit
    • Fix a few incorrectly checked [io_]remap_pfn_range() calls · 7314e613
      Committed by Linus Torvalds
      Nico Golde reports a few straggling uses of [io_]remap_pfn_range() that
      really should use the vm_iomap_memory() helper.  This trivially converts
      two of them to the helper, and comments about why the third one really
      needs to continue to use remap_pfn_range(), and adds the missing size
      check.
      Reported-by: Nico Golde <nico@ngolde.de>
      Cc: stable@kernel.org
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. 29 Oct, 2013 9 commits
  4. 28 Oct, 2013 9 commits
  5. 27 Oct, 2013 1 commit
    • parisc: Do not crash 64bit SMP kernels on machines with >= 4GB RAM · 54e181e0
      Committed by Helge Deller
      Since the beginning of the parisc-linux port, 64bit SMP kernels were sometimes
      not able to bring up CPUs other than the monarch CPU and instead crashed the
      kernel.  The reason was unclear, esp. since it involved various machines (e.g.
      J5600, J6750 and SuperDome). Testing showed that those crashes didn't happen
      when less than 4GB were installed, or if a 32bit Linux kernel was booted.
      
      In the end, the fix for those SMP problems is trivial:
      During the early phase of the initialization of the CPUs, including the monarch
      CPU, the PDC_PSW firmware function to enable WIDE (=64bit) mode is called.
      It's documented that this firmware function may clobber various registers, and
      one of those possibly clobbered registers is %cr30, which holds the task
      thread info pointer.
      
      Now, if %cr30 had always been clobbered, then this bug would have been
      detected much earlier. But lots of testing finally showed that, at least for
      %cr30, on some machines only the upper 32bits of the 64bit register suddenly
      turned zero after the firmware call.
      
      So, after finding the root cause, the explanation for the various crashes
      became clear:
      - On 32bit SMP Linux kernels all upper 32bit were zero, so we didn't face this
        problem.
      - Monarch CPUs in 64bit mode always booted successfully, because the initial task
        thread info pointer was below 4GB.
      - Secondary CPUs booted successfully on machines with less than 4GB RAM because
        the upper 32bit were zero anyway.
      - Secondary CPUs failed to boot if we had more than 4GB RAM and the task thread
        info pointer was located above the 4GB boundary.
      
      Finally, the patch to fix this problem is trivial: save the %cr30 register
      before the firmware call and restore it afterwards.
      Signed-off-by: Helge Deller <deller@gmx.de>
      Signed-off-by: John David Anglin <dave.anglin@bell.net>
      Cc: <stable@vger.kernel.org> # 2.6.12+
      Signed-off-by: Helge Deller <deller@gmx.de>
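
The reasoning about which CPUs survived the firmware call can be illustrated with a userspace model. Everything here is a hypothetical stand-in: a global variable plays the role of %cr30, and `fake_firmware_call()` simulates only the one observed effect (the upper 32 bits turning zero), not the real PDC_PSW call.

```c
#include <stdint.h>

/* Stand-in for the %cr30 register holding the thread info pointer. */
static uint64_t cr30;

/* Simulated firmware call: zeroes the upper 32 bits of cr30. */
void fake_firmware_call(void)
{
    cr30 &= 0xffffffffULL;
}

/* Without protection, the pointer is truncated by the call. */
uint64_t call_unprotected(uint64_t thread_info)
{
    cr30 = thread_info;
    fake_firmware_call();
    return cr30;
}

/* With save/restore around the call, the pointer survives intact. */
uint64_t call_with_save_restore(uint64_t thread_info)
{
    uint64_t saved;

    cr30 = thread_info;
    saved = cr30;
    fake_firmware_call();
    cr30 = saved;
    return cr30;
}
```

A pointer below 4GB comes back unchanged from either function, which is why the bug stayed hidden on small-memory machines; a pointer above 4GB is corrupted only in the unprotected variant.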
  6. 26 Oct, 2013 6 commits
    • Merge tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · 20582e34
      Committed by Linus Torvalds
      Pull ACPI and power management fixes from
       "These fix two bugs in the intel_pstate driver, a hibernate bug leading
        to nasty resume failures sometimes and acpi-cpufreq initialization bug
        that causes problems to happen during module unload when intel_pstate
        is in use.
      
        Specifics:
      
         - Fix for rounding errors in intel_pstate causing CPU utilization to
           be underestimated from Brennan Shacklett.
      
         - intel_pstate fix to always use the correct max pstate value when
           computing the min pstate from Dirk Brandewie.
      
         - Hibernation fix for deadlocking resume in cases when the probing of
           the device containing the image is deferred from Russ Dill.
      
         - acpi-cpufreq fix to prevent the module from staying in memory when
           the driver cannot be registered and then attempting to unregister
           things that have never been registered on exit"
      
      * tag 'pm+acpi-3.12-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        acpi-cpufreq: Fail initialization if driver cannot be registered
        PM / hibernate: Move software_resume to late_initcall_sync
        intel_pstate: Correct calculation of min pstate value
        intel_pstate: Improve accuracy by not truncating until final result
    • Merge tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd · d255c59a
      Committed by Linus Torvalds
      Pull final mtd fixes from Brian Norris:
       "A few more last-minute regression fixes, prepared jointly by me and
        David Woodhouse:
      
         - Revert pxa3xx to its old name to avoid breaking existing
           'mtdparts=' boot strings.
      
         - Return GPMI NAND to its legacy ECC layout for backwards
           compatibility.  We will revisit this in 3.13.
      
        A note from David on the latter fix: 'This leaves a harmless cosmetic
        warning about an unused function.  At this point in the cycle I really
        don't care.'"
      
      * tag 'for-linus-20131025' of git://git.infradead.org/linux-mtd:
        mtd: gpmi: fix ECC regression
        mtd: nand: pxa3xx: Fix registered MTD name
    • vhost/scsi: Fix incorrect usage of get_user_pages_fast write parameter · 60a01f55
      Committed by Nicholas Bellinger
      This patch addresses a long-standing bug where the get_user_pages_fast()
      write parameter used for setting the underlying page table entry permission
      bits was incorrectly set to write=1 for data_direction=DMA_TO_DEVICE, and
      passed into get_user_pages_fast() via vhost_scsi_map_iov_to_sgl().
      
      However, this parameter is intended to signal WRITEs to pinned userspace
      PTEs for the virtio-scsi DMA_FROM_DEVICE -> READ payload case, and *not*
      for the virtio-scsi DMA_TO_DEVICE -> WRITE payload case.
      
      This bug would manifest itself as random process segmentation faults on
      KVM host after repeated vhost starts + stops and/or with lots of vhost
      endpoints + LUNs.
      
      Cc: Stefan Hajnoczi <stefanha@redhat.com>
      Cc: Michael S. Tsirkin <mst@redhat.com>
      Cc: Asias He <asias@redhat.com>
      Cc: <stable@vger.kernel.org> # 3.6+
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
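
The direction-to-write-flag mapping described in this commit can be written down as a tiny helper. This is a sketch with hypothetical names: the local enum only mirrors the two directions mentioned in the message, and the helper is not a function from the vhost/scsi driver.

```c
/* Local stand-ins for the two DMA directions named in the commit. */
enum toy_dma_dir {
    TOY_DMA_TO_DEVICE,   /* guest WRITE payload: host only reads the pages */
    TOY_DMA_FROM_DEVICE  /* guest READ payload: host writes into the pages */
};

/* Returns the value to pass as the "write" argument when pinning
 * user pages: writable access is needed only when data flows from
 * the device into the pinned pages. */
int toy_gup_write_flag(enum toy_dma_dir dir)
{
    return dir == TOY_DMA_FROM_DEVICE;
}
```

The bug described above corresponds to returning 1 for the to-device case, the opposite of what this helper does.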
    • target/pscsi: fix return value check · 58932e96
      Committed by Wei Yongjun
      In case of error, the function scsi_host_lookup() returns a NULL
      pointer, not ERR_PTR(). The IS_ERR() test in the return value check
      should be replaced with a NULL test.
      Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
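
Why an IS_ERR()-style check misses a NULL return can be shown in userspace. The code below is a sketch: `is_err()` is a stand-in written to behave like the kernel's IS_ERR() (true only for pointers in the top 4095 addresses), and `toy_lookup_host()` is a hypothetical lookup that, like scsi_host_lookup() as described above, returns NULL on failure.

```c
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Userspace stand-in for IS_ERR(): true only for error-encoded
 * pointers, i.e. values in the last MAX_ERRNO bytes of the
 * address space. NULL is not in that range. */
int is_err(const void *ptr)
{
    return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

struct toy_host {
    int id;
};

static struct toy_host toy_hosts[] = { { 0 }, { 1 } };

/* Hypothetical lookup: returns NULL when no host matches. */
struct toy_host *toy_lookup_host(int id)
{
    size_t i;

    for (i = 0; i < sizeof(toy_hosts) / sizeof(toy_hosts[0]); i++)
        if (toy_hosts[i].id == id)
            return &toy_hosts[i];
    return NULL;
}

/* Wrong check: treats a NULL result as success. */
int lookup_ok_wrong(const struct toy_host *h)
{
    return !is_err(h);
}

/* Right check for a function that signals failure with NULL. */
int lookup_ok_right(const struct toy_host *h)
{
    return h != NULL;
}
```

For a missing host, `lookup_ok_wrong()` reports success and the caller would go on to dereference NULL, while `lookup_ok_right()` reports the failure.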
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs · f55ac56d
      Committed by Linus Torvalds
      Pull vfs fixes (try two) from Al Viro:
       "nfsd performance regression fix + seq_file lseek(2) fix"
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs:
        seq_file: always update file->f_pos in seq_lseek()
        nfsd regression since delayed fput()
    • mtd: gpmi: fix ECC regression · 031e2777
      Committed by David Woodhouse
      The "legacy" ECC layout used until 3.12-rc1 uses all the OOB area by
      computing the ECC strength and ECC step size ourselves.
      
      Commit 2febcdf8 ("mtd: gpmi: set the BCHs geometry with the ecc info")
      makes the driver use the ECC info (ECC strength and ECC step size)
      provided by the MTD code, and creates and uses a different NAND ECC
      layout for the BCH. This causes a regression:
      
         We cannot mount a ubifs which was created with the old NAND ECC layout.
      
      This patch fixes this issue by reverting to the legacy ECC layout.
      
      We will probably introduce a new device-tree property to indicate that
      the new ECC layout can be used. For now though, for the imminent 3.12
      release, we just unconditionally revert to the 3.11 behaviour.
      
      This leaves a harmless cosmetic warning about an unused function. At
      this point in the cycle I really don't care.
      Signed-off-by: David Woodhouse <David.Woodhouse@intel.com>
      Signed-off-by: Brian Norris <computersforpeace@gmail.com>
      Acked-by: Huang Shijie <b32955@freescale.com>
      Acked-by: Marek Vasut <marex@denx.de>
      Tested-by: Marek Vasut <marex@denx.de>
  7. 25 Oct, 2013 11 commits