1. 26 Jan, 2013 9 commits
    • x86, kvm: Fix kvm's use of __pa() on percpu areas · 5dfd486c
      Dave Hansen committed
      In short, it is illegal to call __pa() on an address holding
      a percpu variable.  This replaces those __pa() calls with
      slow_virt_to_phys().  All of the cases in this patch are
      in boot time (or CPU hotplug time at worst) code, so the
      slow pagetable walking in slow_virt_to_phys() is not expected
      to have a performance impact.
      
      The times when this actually matters are pretty obscure
      (certain 32-bit NUMA systems), but it _does_ happen.  It is
      important to keep KVM guests working on these systems because
      the real hardware is getting harder and harder to find.
      
      This bug manifested first by me seeing a plain hang at boot
      after this message:
      
      	CPU 0 irqstacks, hard=f3018000 soft=f301a000
      
      or, sometimes, it would actually make it out to the console:
      
      [    0.000000] BUG: unable to handle kernel paging request at ffffffff
      
      I eventually traced it down to the KVM async pagefault code.
      This can be worked around by disabling that code either at
      compile-time, or on the kernel command-line.
      
      The kvm async pagefault code was injecting page faults into
      the guest which the guest misinterpreted because its
      "reason" was not being properly sent from the host.
      
      The guest passes a physical address of a per-cpu async page
      fault structure via an MSR to the host.  Since __pa() is
      broken on percpu data, the physical address it sent was
      basically bogus and the host went scribbling on random data.
      The guest never saw the real reason for the page fault (it
      was injected by the host), assumed that the kernel had taken
      a _real_ page fault, and panic()'d.  The behavior varied,
      though, depending on what got corrupted by the bad write.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mm: Create slow_virt_to_phys() · d7656534
      Dave Hansen committed
      This is necessary because __pa() does not work on some kinds of
      memory, like vmalloc() or the alloc_remap() areas on 32-bit
      NUMA systems.  We have some functions to do conversions _like_
      this in the vmalloc() code (like vmalloc_to_page()), but they
      do not work on sizes other than 4k pages.  We would potentially
      need to be able to handle all the page sizes that we use for
      the kernel linear mapping (4k, 2M, 1G).
      
      In practice, on 32-bit NUMA systems, the percpu areas get stuck
      in the alloc_remap() area.  Any __pa() call on them will break
      and basically return garbage.
      
      This patch introduces a new function slow_virt_to_phys(), which
      walks the kernel page tables on x86 and should do precisely
      the same logical thing as __pa(), but actually work on a wider
      range of memory.  It should work on the normal linear mapping,
      vmalloc(), kmap(), etc...
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212433.4D1FCA62@kernel.stglabs.ibm.com
      Acked-by: Rik van Riel <riel@redhat.com>
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
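      The idea in the message above can be sketched in userspace C. This is
      only an illustration, not the kernel code: the flat mapping table, the
      toy_* names, and the constants (4k base pages, 512-entry tables, a
      0xc0000000 linear-map offset) are all assumptions made for the example.
      It shows why plain linear arithmetic gives a wrong answer for a remapped
      address while a lookup of the actual mapping does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed constants for the sketch: 4k base pages, 512-entry tables. */
#define TOY_PAGE_SHIFT  12
#define TOY_PTE_SHIFT   9
#define TOY_PAGE_OFFSET 0xc0000000ULL  /* assumed start of the linear map */

enum toy_level { TOY_LEVEL_NONE, TOY_LEVEL_4K, TOY_LEVEL_2M, TOY_LEVEL_1G };

/* One leaf mapping: virtual base, page frame number, mapping level. */
struct toy_mapping {
    uint64_t virt_base;
    uint64_t pfn;
    enum toy_level level;
};

/* Size in bytes of a page mapped at the given level (4k, 2M or 1G). */
uint64_t toy_level_size(enum toy_level level)
{
    return 1ULL << ((TOY_PAGE_SHIFT - TOY_PTE_SHIFT) + level * TOY_PTE_SHIFT);
}

/* Linear arithmetic in the style of __pa(): valid only inside the linear map. */
uint64_t toy_linear_pa(uint64_t virt)
{
    return virt - TOY_PAGE_OFFSET;
}

/* Hypothetical stand-in for a page-table walk: scan a flat table. */
const struct toy_mapping *toy_lookup(const struct toy_mapping *tbl, size_t n,
                                     uint64_t virt)
{
    for (size_t i = 0; i < n; i++) {
        if (tbl[i].level == TOY_LEVEL_NONE)
            continue;
        if (virt >= tbl[i].virt_base &&
            virt - tbl[i].virt_base < toy_level_size(tbl[i].level))
            return &tbl[i];
    }
    return NULL;
}

/* Returns 0 and stores the physical address, or -1 if virt is unmapped. */
int toy_slow_virt_to_phys(const struct toy_mapping *tbl, size_t n,
                          uint64_t virt, uint64_t *phys)
{
    assert(phys != NULL);
    const struct toy_mapping *m = toy_lookup(tbl, n, virt);
    if (m == NULL)
        return -1;
    /* Keep the offset within the (possibly large) page, take the frame from the mapping. */
    uint64_t offset = virt & (toy_level_size(m->level) - 1);
    *phys = (m->pfn << TOY_PAGE_SHIFT) | offset;
    return 0;
}
```

      The offset mask depends on the level at which the mapping was found,
      which is why a lookup that only understands 4k pages is not enough.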
    • x86, mm: Use new pagetable helpers in try_preserve_large_page() · f3c4fbb6
      Dave Hansen committed
      try_preserve_large_page() can be slightly simplified by using
      the new page_level_*() helpers.  This also moves the 'level'
      over to the new pg_level enum type.
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212432.14F3D993@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • x86, mm: Pagetable level size/shift/mask helpers · 4cbeb51b
      Dave Hansen committed
      I plan to use lookup_address() to walk the kernel pagetables
      in a later patch.  It returns a "pte" and the level in the
      pagetables where the "pte" was found.  The level is just an
      enum and needs to be converted to a useful value in order to
      do address calculations with it.  These helpers will be used
      in at least two places.
      
      This also gives the anonymous enum a real name so that no one
      gets confused about what they should be passing in to these
      helpers.
      
      "PTE_SHIFT" was chosen for naming consistency with the other
      pagetable levels (PGD/PUD/PMD_SHIFT).
      
      Cc: H. Peter Anvin <hpa@zytor.com>
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212431.405D3A8C@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
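      A minimal sketch of what such level-to-value helpers could look like,
      written as standalone C. The helper names follow the size/shift/mask
      wording of the commit title; the enum member names and the constants
      (12-bit page shift, 9 bits per table level) are assumptions for the
      example rather than something stated in the message.

```c
#include <assert.h>

#define SKETCH_PAGE_SHIFT 12  /* assumed: 4k base pages */
#define SKETCH_PTE_SHIFT  9   /* assumed: 512 entries per pagetable level */

/* A named enum, so callers cannot confuse the level with a plain int. */
enum pg_level {
    PG_LEVEL_NONE,
    PG_LEVEL_4K,
    PG_LEVEL_2M,
    PG_LEVEL_1G,
    PG_LEVEL_NUM
};

/* Number of address bits covered by one page at this level. */
int page_level_shift(enum pg_level level)
{
    assert(level > PG_LEVEL_NONE && level < PG_LEVEL_NUM);
    return (SKETCH_PAGE_SHIFT - SKETCH_PTE_SHIFT) + level * SKETCH_PTE_SHIFT;
}

/* Size in bytes of one page at this level. */
unsigned long long page_level_size(enum pg_level level)
{
    return 1ULL << page_level_shift(level);
}

/* Mask that clears the in-page offset bits for this level. */
unsigned long long page_level_mask(enum pg_level level)
{
    return ~(page_level_size(level) - 1);
}
```

      With these assumed constants the three levels map to shifts of 12, 21
      and 30 bits, matching the 4k, 2M and 1G page sizes.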
    • x86, mm: Make DEBUG_VIRTUAL work earlier in boot · a25b9316
      Dave Hansen committed
      The KVM code has some repeated bugs in it around use of __pa() on
      per-cpu data.  Those data are not in an area on which using
      __pa() is valid.  However, they are also called early enough in
      boot that __vmalloc_start_set is not set, and thus the
      CONFIG_DEBUG_VIRTUAL debugging does not catch them.
      
      This adds a check to also verify __pa() calls against max_low_pfn,
      which we can use earlier in boot than is_vmalloc_addr().  However,
      if we are super-early in boot, max_low_pfn=0 and this will trip
      on every call, so also make sure that max_low_pfn is set before
      we try to use it.
      
      With this patch applied, CONFIG_DEBUG_VIRTUAL will actually
      catch the bug I was chasing (and fix later in this series).
      
      I'd love to find a generic way so that any __pa() call on percpu
      areas could do a BUG_ON(), but there don't appear to be any nice
      and easy ways to check if an address is a percpu one.  Anybody
      have ideas on a way to do this?
      Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com>
      Link: http://lkml.kernel.org/r/20130122212430.F46F8159@kernel.stglabs.ibm.com
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
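      The guarded check described above might look roughly like this in
      standalone C. The sketch_* names, the 0xc0000000 linear-map offset and
      the use of assert() in place of a BUG_ON()-style check are assumptions
      for illustration; only the logic (skip the test while max_low_pfn is
      still 0, otherwise compare the resulting page frame against it) comes
      from the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT  12             /* assumed: 4k pages */
#define SKETCH_PAGE_OFFSET 0xc0000000ULL  /* assumed linear-map offset */

/* True when phys_addr passes the low-memory sanity check. */
bool sketch_pa_in_lowmem(uint64_t phys_addr, uint64_t max_low_pfn)
{
    /* Very early in boot max_low_pfn is still 0; checking then would trip on every call. */
    if (max_low_pfn == 0)
        return true;
    return (phys_addr >> SKETCH_PAGE_SHIFT) <= max_low_pfn;
}

/* Debug variant of a linear virt-to-phys conversion. */
uint64_t sketch_debug_pa(uint64_t virt, uint64_t max_low_pfn)
{
    uint64_t phys = virt - SKETCH_PAGE_OFFSET;
    /* assert() stands in for a BUG_ON()-style check in this sketch. */
    assert(sketch_pa_in_lowmem(phys, max_low_pfn));
    return phys;
}
```

      An address outside the linear map produces a page frame above
      max_low_pfn, so the check fires once max_low_pfn has been set.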
    • Merge tag 'v3.8-rc5' into x86/mm · 7b5c4a65
      H. Peter Anvin committed
      The __pa() fixup series that follows touches KVM code that is not
      present in the existing branch based on v3.7-rc5, so merge in the
      current upstream from Linus.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • Merge branch 'x86/mm' of ssh://ra.kernel.org/pub/scm/linux/kernel/git/tip/tip into x86/mm · 3596f5bb
      H. Peter Anvin committed
      Add missing patch from the __pa_symbol conversion series by Alexander
      Duyck.
      Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
    • Linux 3.8-rc5 · 949db153
      Linus Torvalds committed
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs · d7df025e
      Linus Torvalds committed
      Pull btrfs fixes from Chris Mason:
       "It turns out that we had two crc bugs when running fsx-linux in a
        loop.  Many thanks to Josef, Miao Xie, and Dave Sterba for nailing it
        all down.  Miao also has a new OOM fix in this v2 pull as well.
      
        Ilya fixed a regression Liu Bo found in the balance ioctls for pausing
        and resuming a running balance across drives.
      
        Josef's orphan truncate patch fixes an obscure corruption we'd see
        during xfstests.
      
        Arne's patches address problems with subvolume quotas.  If the user
        destroys quota groups incorrectly the FS will refuse to mount.
      
        The rest are smaller fixes and plugs for memory leaks."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/linux-btrfs: (30 commits)
        Btrfs: fix repeated delalloc work allocation
        Btrfs: fix wrong max device number for single profile
        Btrfs: fix missed transaction->aborted check
        Btrfs: Add ACCESS_ONCE() to transaction->abort accesses
        Btrfs: put csums on the right ordered extent
        Btrfs: use right range to find checksum for compressed extents
        Btrfs: fix panic when recovering tree log
        Btrfs: do not allow logged extents to be merged or removed
        Btrfs: fix a regression in balance usage filter
        Btrfs: prevent qgroup destroy when there are still relations
        Btrfs: ignore orphan qgroup relations
        Btrfs: reorder locks and sanity checks in btrfs_ioctl_defrag
        Btrfs: fix unlock order in btrfs_ioctl_rm_dev
        Btrfs: fix unlock order in btrfs_ioctl_resize
        Btrfs: fix "mutually exclusive op is running" error code
        Btrfs: bring back balance pause/resume logic
        btrfs: update timestamps on truncate()
        btrfs: fix btrfs_cont_expand() freeing IS_ERR em
        Btrfs: fix a bug when llseek for delalloc bytes behind prealloc extents
        Btrfs: fix off-by-one in lseek
        ...
  2. 25 Jan, 2013 17 commits
    • Merge branch 'for-next' of git://git.samba.org/sfrench/cifs-2.6 · 66e2d3e8
      Linus Torvalds committed
      Pull cifs fixes from Steve French:
       "Two small cifs fixes"
      
      * 'for-next' of git://git.samba.org/sfrench/cifs-2.6:
        fs/cifs/cifs_dfs_ref.c: fix potential memory leakage
        cifs: fix srcip_matches() for ipv6
    • Merge git://git.kernel.org/pub/scm/virt/kvm/kvm · d93816a6
      Linus Torvalds committed
      Pull kvm fixlet from Marcelo Tosatti.
      
      * git://git.kernel.org/pub/scm/virt/kvm/kvm:
        KVM: PPC: Emulate dcbf
    • Merge branch 'fixes' of git://git.linaro.org/people/rmk/linux-arm · 01acd3ef
      Linus Torvalds committed
      Pull ARM fixes from Russell King:
       "A number of fixes:
      
        Patrik found a problem with preempt counting in the VFP assembly
        functions which can cause the preempt count to be upset.
      
        Nicolas fixed a problem with the parsing of the DT when it straddles a
        1MB boundary.
      
        Subhash Jadavani reported a problem with sparsemem and our highmem
        support for cache maintenance for DMA areas, and TI found a bug in
        their strongly ordered memory mapping type.
      
        Also, three fixes by way of Will Deacon's tree from Dave Martin for
        instruction compatibility and Marc Zyngier to fix hypervisor boot mode
        issues."
      
      * 'fixes' of git://git.linaro.org/people/rmk/linux-arm:
        ARM: 7629/1: mm: Fix missing XN flag for for MT_MEMORY_SO
        ARM: DMA: Fix struct page iterator in dma_cache_maint() to work with sparsemem
        ARM: 7628/1: head.S: map one extra section for the ATAG/DTB area
        ARM: 7627/1: Predicate preempt logic on PREEMP_COUNT not PREEMPT alone
        ARM: virt: simplify __hyp_stub_install epilog
        ARM: virt: boot secondary CPUs through the right entry point
        ARM: virt: Avoid bx instruction for compatibility with <=ARMv4
    • Merge tag 'fixes-for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 1496ec13
      Linus Torvalds committed
      Pull ARM SoC fixes from Olof Johansson:
       "Here's a long-pending fixes pull request for arm-soc (I didn't send
        one in the -rc4 cycle).
      
        The larger deltas are from:
      
         - A fixup of error paths in the mvsdio driver
      
         - Header file move for a driver that hadn't been properly converted
           to multiplatform on i.MX, which was causing build failures when
           included
      
         - Device tree updates for at91 dealing mostly with their new pinctrl
           setup merged in 3.8 and mistakes in those initial configs
      
        The rest are the normal mix of small fixes all over the place; sunxi,
        omap, imx, mvebu, etc, etc."
      
      * tag 'fixes-for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc: (40 commits)
        mfd: vexpress-sysreg: Don't skip initialization on probe
        ARM: vexpress: Enable A7 cores in V2P-CA15_A7's Device Tree
        ARM: vexpress: extend the MPIDR range used for pen release check
        ARM: at91/dts: correct comment in at91sam9x5.dtsi for mii
        ARM: at91/at91_dt_defconfig: add at91sam9n12 SoC to DT defconfig
        ARM: at91/at91_dt_defconfig: remove memory specification to cmdline
        ARM: at91/dts: add macb mii pinctrl config for kizbox
        ARM: at91: rm9200: remake the BGA as default version
        ARM: at91: fix gpios on i2c-gpio for RM9200 DT
        ARM: at91/at91sam9x5 DTS: add SCK USART pins
        ARM: at91/at91sam9x5 DTS: correct wrong PIO BANK values on u(s)arts
        ARM: at91/at91-pinctrl documentation: fix typo and add some details
        ARM: kirkwood: fix missing #interrupt-cells property
        mmc: mvsdio: use devm_ API to simplify/correct error paths.
        clk: mvebu/clk-cpu.c: fix memory leakage
        ARM: OMAP2+: omap4-panda: add UART2 muxing for WiLink shared transport
        ARM: OMAP2+: DT node Timer iteration fix
        ARM: OMAP2+: Fix section warning for omap_init_ocp2scp()
        ARM: OMAP2+: fix build break for omapdrm
        ARM: OMAP2: Fix missing omap2xxx_clkt_vps_late_init function calls
        ...
    • Merge tag 'pm+acpi-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm · ba2ab41f
      Linus Torvalds committed
      Pull ACPI and power management fixes from Rafael Wysocki:
      
       - Two cpuidle initialization fixes from Konrad Rzeszutek Wilk.
      
       - cpufreq regression fixes for AMD processors from Borislav Petkov,
         Stefan Bader, and Matthew Garrett.
      
       - ACPI cpufreq fix from Thomas Schlichter.
      
       - cpufreq and devfreq fixes related to incorrect usage of operating
         performance points (OPP) framework and RCU from Nishanth Menon.
      
       - APEI workaround for incorrect BIOS information from Lans Zhang.
      
      * tag 'pm+acpi-for-3.8-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
        cpufreq: Add module aliases for acpi-cpufreq
        ACPI: Check MSR valid bit before using P-state frequencies
        PM / devfreq: exynos4_bus: honor RCU lock usage
        PM / devfreq: add locking documentation for recommended_opp
        cpufreq: cpufreq-cpu0: use RCU locks around usage of OPP
        cpufreq: OMAP: use RCU locks around usage of OPP
        ACPI, APEI: Fixup incorrect 64-bit access width firmware bug
        ACPI / processor: Get power info before updating the C-states
        powernow-k8: Add a kconfig dependency on acpi-cpufreq
        ACPI / cpuidle: Fix NULL pointer issues when cpuidle is disabled
        intel_idle: Don't register CPU notifier if we are not running.
    • Merge tag 'regmap-fix-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap · bff92411
      Linus Torvalds committed
      Pull regmap fixes from Mark Brown:
       "One more oversight in the debugfs code was reported and fixed, plus a
        documentation fix."
      
      * tag 'regmap-fix-3.8-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regmap:
        regmap: fix small typo in regmap_bulk_write comment
        regmap: debugfs: Fix seeking from the cache
    • Merge branch 'fixes' of git://git.infradead.org/users/vkoul/slave-dma · 3f58e094
      Linus Torvalds committed
      Pull slave-dmaengine fixes from Vinod Koul:
       "A few fixes on slave dmaengine.  There are trivial fixes in imx-dma,
        tegra-dma & ioat driver"
      
      * 'fixes' of git://git.infradead.org/users/vkoul/slave-dma:
        dma: tegra: implement flags parameters for cyclic transfer
        dmaengine: imx-dma: Disable use of hw_chain to fix sg_dma transfers.
        ioat: Fix DMA memory sync direction correct flag
    • Merge branch 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux · acc5da0f
      Linus Torvalds committed
      Pull i2c fixes from Wolfram Sang:
       "Here are a few, typical driver fixes for the I2C subsystem"
      
      * 'i2c-embedded/for-current' of git://git.pengutronix.de/git/wsa/linux:
        i2c-designware: add missing MODULE_LICENSE
        i2c: omap: fix draining irq handling
        i2c: omap: errata i462: fix incorrect ack for arbitration lost interrupt
        i2c: muxes: fix wrong use of sizeof(ptr)
        i2c: sirf: register i2c_client from dt child-nodes in probe entry
        i2c: mxs: Fix type of error code
        i2c: mxs: Fix misuse init_completion
    • Btrfs: fix repeated delalloc work allocation · 1eafa6c7
      Miao Xie committed
      btrfs_start_delalloc_inodes() locks the delalloc_inodes list, fetches the
      first inode, unlocks the list, triggers btrfs_alloc_delalloc_work/
      btrfs_queue_worker for this inode, and then locks the list and checks the
      head of the list again. But because the inode it has just dealt with is not
      removed from the list, it fetches the same inode again. As a result, this
      function allocates a huge number of btrfs_delalloc_work structures, and OOM
      happens.
      
      Fix this problem by splicing the delalloc list.
      Reported-by: Alex Lyakas <alex.btrfs@zadarastorage.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
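      The difference between re-reading the list head and splicing the list
      first can be sketched with a toy singly linked list. Everything here is
      hypothetical (the node and list types, the function names, the iteration
      cap that keeps the buggy loop finite, and the choice to put processed
      nodes back on the shared list); locking is omitted, and the real code
      uses the kernel's own list primitives.

```c
#include <assert.h>
#include <stddef.h>

struct node { int id; struct node *next; };
struct list { struct node *head; };

/* Count the nodes currently on the list. */
int list_length(const struct list *l)
{
    int n = 0;
    for (const struct node *p = l->head; p != NULL; p = p->next)
        n++;
    return n;
}

/* Buggy pattern: the head is never removed, so the same node is picked each time. */
int start_work_buggy(struct list *shared, int cap)
{
    assert(shared != NULL);
    int work_items = 0;
    while (shared->head != NULL && work_items < cap)
        work_items++;  /* stands in for allocating and queuing a work item */
    return work_items;
}

/* Fixed pattern: detach the whole chain first, then visit each node once. */
int start_work_spliced(struct list *shared)
{
    assert(shared != NULL);
    int work_items = 0;
    struct node *n = shared->head;  /* splice: take the chain private */
    shared->head = NULL;
    while (n != NULL) {
        struct node *next = n->next;
        work_items++;               /* one work item per node */
        n->next = shared->head;     /* sketch's choice: put the node back */
        shared->head = n;
        n = next;
    }
    return work_items;
}
```

      With three nodes on the list, the buggy loop runs until the cap is
      reached, while the spliced version creates exactly three work items.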
    • Btrfs: fix wrong max device number for single profile · c9f01bfe
      Miao Xie committed
      The max device number of single profile is 1, not 0 (0 means 'as many as
      possible'). Fix it.
      
      Cc: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Reviewed-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix missed transaction->aborted check · 2cba30f1
      Miao Xie committed
      First, although the current transaction->aborted check can stop the commit early
      and avoid unnecessary operations, it runs too early: some transaction handles have
      not ended yet, and those handles may set transaction->aborted after the check.
      
      Second, when we commit the transaction, we wake up some worker threads to
      flush the space cache and inode cache. Those threads also allocate transaction
      handles and may set transaction->aborted if a serious error happens.
      
      So we need more checks of ->aborted when committing the transaction. Fix it.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: Add ACCESS_ONCE() to transaction->abort accesses · 8d25a086
      Miao Xie committed
      We may access and update transaction->aborted on different CPUs without a
      lock, so we need the ACCESS_ONCE() wrapper to prevent the compiler from creating
      unsolicited accesses and to make sure we get the right value.
      Signed-off-by: Miao Xie <miaox@cn.fujitsu.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
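      A small sketch of the pattern, assuming a GNU C compiler. The macro is
      written in the commonly seen volatile-cast form; the struct and the
      sketch_* functions are hypothetical and only illustrate reading the
      flag once into a local and writing it through the same wrapper.

```c
#include <assert.h>
#include <stddef.h>

/* GNU C: access x exactly once through a volatile-qualified lvalue. */
#define ACCESS_ONCE(x) (*(volatile __typeof__(x) *)&(x))

/* Hypothetical stand-in for the transaction structure. */
struct sketch_transaction {
    int aborted;
};

/* Record an error; may be called from another context without a lock. */
void sketch_abort(struct sketch_transaction *t, int err)
{
    assert(t != NULL);
    ACCESS_ONCE(t->aborted) = err;
}

/* Read the flag once and act only on the local copy. */
int sketch_commit(struct sketch_transaction *t)
{
    assert(t != NULL);
    int err = ACCESS_ONCE(t->aborted);
    if (err)
        return err;  /* the value tested is the value returned */
    return 0;
}
```

      Reading into a local keeps the compiler from reloading the field
      between the test and the return. The wrapper by itself does not order
      this access against other memory accesses.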
    • Btrfs: put csums on the right ordered extent · e58dd74b
      Josef Bacik committed
      I noticed a WARN_ON going off when adding csums because we were going over
      the amount of csum bytes that should have been allowed for an ordered
      extent.  This is a leftover from when we used to hold the csums privately
      for direct io, but now we use the normal ordered sum stuff so we need to
      make sure and check if we've moved on to another extent so that the csums
      are added to the right extent.  Without this we could end up with csums for
      bytenrs that don't have extents to cover them yet.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: use right range to find checksum for compressed extents · 192000dd
      Liu Bo committed
      For compressed extents, the checksum range is determined by the disk length,
      and the disk length differs from the RAM length, so we need to use the disk
      length instead to get the right checksum.
      Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: fix panic when recovering tree log · b0175117
      Josef Bacik committed
      A user reported a BUG_ON(ret) that occurred during tree log replay.  Ret was
      -EAGAIN, so what I think happened is that we removed an extent that covered
      a bitmap entry and an extent entry.  We remove the part from the bitmap and
      return -EAGAIN and then search for the next piece we want to remove, which
      happens to be an entire extent entry, so we just free the sucker and return.
      The problem is ret is still set to -EAGAIN so we trip the BUG_ON().  The
      user used btrfs-zero-log so I'm not 100% sure this is what happened so I've
      added a WARN_ON() to catch the other possibility.  Thanks,
      Reported-by: Jan Steffens <jan.steffens@gmail.com>
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Btrfs: do not allow logged extents to be merged or removed · 201a9038
      Josef Bacik committed
      We drop the extent map tree lock while we're logging extents, so somebody
      could come in and merge another extent into this one and screw up our
      logging, or they could even remove us from the list which would keep us from
      logging the extent or freeing our ref on it, so we need to make sure to not
      clear LOGGING until after the extent is logged, and then we can merge it to
      adjacent extents.  Thanks,
      Signed-off-by: Josef Bacik <jbacik@fusionio.com>
    • Merge branch 'vexpress/fixes' of git://git.linaro.org/people/pawelmoll/linux into fixes · 3836414f
      Olof Johansson committed
      From Pawel Moll:
      - makes the V2P-CA15_A7 (a.k.a. TC2) work with 3.8 kernels
      - improves vexpress-sysreg.c behaviour on arm64 platforms
      
      * 'vexpress/fixes' of git://git.linaro.org/people/pawelmoll/linux:
        mfd: vexpress-sysreg: Don't skip initialization on probe
        ARM: vexpress: Enable A7 cores in V2P-CA15_A7's Device Tree
        ARM: vexpress: extend the MPIDR range used for pen release check
  3. 24 Jan, 2013 14 commits