1. 19 May, 2011 1 commit
    • drivercore: revert addition of of_match to struct device · b1608d69
      Grant Likely committed
      Commit b826291c, "drivercore/dt: add a match table pointer to struct
      device" added an of_match pointer to struct device to cache the
      of_match_table entry discovered at driver match time.  This was unsafe
      because matching and probing a driver are not a single atomic operation.
      If two or more drivers are matched against a device at the same time,
      the cached matching entry pointer could get overwritten.
      
      This patch reverts the of_match cache pointer and reworks all users to
      call of_match_device() directly instead.
      Signed-off-by: Grant Likely <grant.likely@secretlab.ca>
  2. 18 May, 2011 5 commits
  3. 17 May, 2011 2 commits
  4. 16 May, 2011 1 commit
    • x86, apic: Fix spurious error interrupts triggering on all non-boot APs · e503f9e4
      Youquan Song committed
      This patch fixes a bug reported by a customer, who found
      that many spurious error interrupts were reported on all
      non-boot CPUs (APs) during the system boot stage.

      According to Chapter 10 of the Intel Software Developer's Manual,
      Volume 3A, the Local APIC may signal an illegal vector error when
      an LVT entry is set to an illegal vector value (0~15) under
      FIXED delivery mode (bits 8-11 are 0), regardless of whether
      the mask bit is set or an interrupt actually happens. These
      errors are seen as error interrupts.
      
      The initial value of thermal LVT entries on all APs always reads
      0x10000 because APs are woken up by BSP issuing INIT-SIPI-SIPI
      sequence to them and LVT registers are reset to 0s except for
      the mask bits which are set to 1s when APs receive INIT IPI.
      
      When the BIOS takes over the thermal throttling interrupt,
      the LVT thermal delivery mode should be SMI, and the kernel
      is required to keep the AP's LVT thermal monitoring register
      programmed as such as well.

      This issue happens when the BIOS does not take over the thermal
      throttling interrupt: the AP's LVT thermal monitor register is then
      restored to 0x10000, which means vector 0 and fixed delivery mode, so
      all APs signal illegal vector error interrupts.

      This patch checks that the interrupt delivery mode is not fixed mode
      before restoring the AP's LVT thermal monitor register.
      Signed-off-by: Youquan Song <youquan.song@intel.com>
      Acked-by: Suresh Siddha <suresh.b.siddha@intel.com>
      Acked-by: Yong Wang <yong.y.wang@intel.com>
      Cc: hpa@linux.intel.com
      Cc: joe@perches.com
      Cc: jbaron@redhat.com
      Cc: trenn@suse.de
      Cc: kent.liu@intel.com
      Cc: chaohong.guo@intel.com
      Cc: <stable@kernel.org> # As far back as possible
      Link: http://lkml.kernel.org/r/1303402963-17738-1-git-send-email-youquan.song@intel.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
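The check described in the commit above can be illustrated with a small stand-alone sketch. It is not the kernel's code: the macro and function names are hypothetical, and the field layout (vector in bits 0-7, a 3-bit delivery mode field starting at bit 8, mask in bit 16) is assumed from the LVT register format in the Intel SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; layout assumed from the Intel SDM LVT format. */
#define LVT_VECTOR_MASK 0x000000ffu /* bits 0-7: interrupt vector */
#define LVT_DM_MASK     0x00000700u /* bits 8-10: delivery mode   */
#define LVT_DM_FIXED    0x00000000u /* fixed delivery             */
#define LVT_DM_SMI      0x00000200u /* SMI delivery               */
#define LVT_MASKED      0x00010000u /* bit 16: entry is masked    */

/* True when the entry uses fixed delivery with an illegal vector (0-15).
 * The mask bit is deliberately ignored, as the commit message describes. */
static bool lvt_is_illegal_fixed(uint32_t lvt)
{
    return (lvt & LVT_DM_MASK) == LVT_DM_FIXED &&
           (lvt & LVT_VECTOR_MASK) < 16;
}

/* Value an AP's thermal LVT should hold: take the value saved on the BSP
 * only when that value does not use fixed delivery; otherwise keep the
 * AP's current value untouched. */
static uint32_t restore_thermal_lvt(uint32_t current, uint32_t bsp_saved)
{
    if ((bsp_saved & LVT_DM_MASK) != LVT_DM_FIXED)
        return bsp_saved;
    return current;
}

/* Small self-check of the cases named in the commit message. */
static void lvt_demo(void)
{
    /* The post-INIT reset value: masked, vector 0, fixed delivery. */
    assert(lvt_is_illegal_fixed(LVT_MASKED));
    /* A BIOS-programmed SMI entry is not affected. */
    assert(!lvt_is_illegal_fixed(LVT_MASKED | LVT_DM_SMI));
}
```

In this sketch, `restore_thermal_lvt(0x10000, 0x200)` yields the SMI value saved on the BSP, while a saved fixed-mode value leaves the AP's register as it was.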
  5. 14 May, 2011 2 commits
  6. 13 May, 2011 6 commits
    • x86, mce, AMD: Fix leaving freed data in a list · d9a5ac9e
      Julia Lawall committed
      b may be added to a list, but is not removed before being freed
      in the case of an error.  This is done in the corresponding
      deallocation function, so the code here has been changed to
      follow that.
      
      The semantic match that finds this problem is as follows:
      (http://coccinelle.lip6.fr/)
      
      // <smpl>
      @@
      expression E,E1,E2;
      identifier l;
      @@
      
      *list_add(&E->l,E1);
      ... when != E1
          when != list_del(&E->l)
          when != list_del_init(&E->l)
          when != E = E2
      *kfree(E);
      // </smpl>
      Signed-off-by: Julia Lawall <julia@diku.dk>
      Cc: Borislav Petkov <borislav.petkov@amd.com>
      Cc: Robert Richter <robert.richter@amd.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Andreas Herrmann <andreas.herrmann3@amd.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/1305294731-12127-1-git-send-email-julia@diku.dk
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
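The bug pattern matched by the semantic patch above (an object added to a list and then freed on an error path without being unlinked) can be sketched outside the kernel as follows. The list helpers are a simplified stand-in for the kernel's `struct list_head` API, and `struct bank` and `bank_create()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for the kernel's circular doubly linked list. */
struct list_node { struct list_node *prev, *next; };

static void list_init(struct list_node *h) { h->prev = h->next = h; }

static void list_add_node(struct list_node *n, struct list_node *h)
{
    n->next = h->next;
    n->prev = h;
    h->next->prev = n;
    h->next = n;
}

static void list_del_node(struct list_node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = n;
}

static int list_is_empty(const struct list_node *h) { return h->next == h; }

/* Hypothetical object that lives on a list. */
struct bank { struct list_node link; int id; };

/* Returns 0 on success (the bank stays on the list) or -1 on failure.
 * fail_late simulates an error that occurs after the list insertion. */
static int bank_create(struct list_node *head, int id, int fail_late)
{
    struct bank *b = malloc(sizeof(*b));

    assert(head != NULL);
    if (!b)
        return -1;
    b->id = id;
    list_add_node(&b->link, head);

    if (fail_late) {
        /* Unlink before freeing; otherwise the list keeps a pointer
         * to freed memory. */
        list_del_node(&b->link);
        free(b);
        return -1;
    }
    return 0;
}
```

After a failing call such as `bank_create(&head, 1, 1)`, the list is empty again, so later traversals never touch the freed object.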
    • OMAP3: set the core dpll clk rate in its set_rate function · 5fd2a84a
      Avinash H.M committed
      The debug l3_ick/rate is not displaying the actual rate of the clock in
      hardware. This is because the core dpll set_rate function doesn't update
      clk.rate. After the fix, l3_ick/rate displays proper values.
      Signed-off-by: Shweta Gulati <shweta.gulati@ti.com>
      Signed-off-by: Avinash.H.M <avinashhm@ti.com>
      Cc: Rajendra Nayak <rnayak@ti.com>
      Cc: Paul Wamsley <paul@pwsan.com>
      Acked-by: Paul Walmsley <paul@pwsan.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
    • x86: Fix UV BAU for non-consecutive nasids · 77ed23f8
      Cliff Wickman committed
      This is a fix for the SGI Altix-UV Broadcast Assist Unit code,
      which is used for TLB flushing.
      
      Certain hardware configurations (that customers are ordering)
      cause nasids (numa address space id's) to be non-consecutive.
      Specifically, once you have more than 4 blades in an IRU
      (Individual Rack Unit - or 1/2 rack) but fewer than the maximum
      of 16, the nasid numbering becomes non-consecutive.  This
      currently results in a 'catastrophic error' (CATERR) detected by
      the firmware during OS boot.  The BAU is generating an 'INTD'
      request that is targeting a non-existent nasid value. Such
      configurations may also occur when a blade is configured off
      because of hardware errors. (There is one UV hub per blade.)
      
      This patch is required to support such configurations.
      
      The problem with the tlb_uv.c code is that it is using the
      consecutive hub numbers as indices to the BAU distribution bit
      map. These are simply the ordinal position of the hub or blade
      within its partition.  It should be using physical node numbers
      (pnodes), which correspond to the physical nasid values. Use of
      the hub number only works as long as the nasids in the partition
      are consecutive and increase with a stride of 1.
      
      This patch changes the index to be the pnode number, thus
      allowing nasids to be non-consecutive.
      It also provides a table in local memory for each cpu to
      translate target cpu number to target pnode and nasid.
      And it improves naming to properly reflect 'node' and 'uvhub'
      versus 'nasid'.
      Signed-off-by: Cliff Wickman <cpw@sgi.com>
      Cc: <stable@kernel.org>
      Link: http://lkml.kernel.org/r/E1QJmxX-0002Mz-Fk@eag09.americas.sgi.com
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86/mm: Fix section mismatch derived from native_pagetable_reserve() · 53f8023f
      Sedat Dilek committed
      With CONFIG_DEBUG_SECTION_MISMATCH=y I see these warnings in next-20110415:
      
        LD      vmlinux.o
        MODPOST vmlinux.o
      WARNING: vmlinux.o(.text+0x1ba48): Section mismatch in reference from the function native_pagetable_reserve() to the function .init.text:memblock_x86_reserve_range()
      The function native_pagetable_reserve() references
      the function __init memblock_x86_reserve_range().
      This is often because native_pagetable_reserve lacks a __init
      annotation or the annotation of memblock_x86_reserve_range is wrong.
      
      This patch fixes the issue.
      Thanks to pipacs from PaX project for help on IRC.
      Acked-by: "H. Peter Anvin" <hpa@zytor.com>
      Signed-off-by: Sedat Dilek <sedat.dilek@gmail.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
    • x86,xen: introduce x86_init.mapping.pagetable_reserve · 279b706b
      Stefano Stabellini committed
      Introduce a new x86_init hook called pagetable_reserve that at the end
      of init_memory_mapping is used to reserve a range of memory addresses for
      the kernel pagetable pages we used and free the other ones.
      
      On native it just calls memblock_x86_reserve_range while on xen it also
      takes care of setting the spare memory previously allocated
      for kernel pagetable pages from RO to RW, so that it can be used for
      other purposes.
      
      A detailed explanation of the reason why this hook is needed follows.
      
      As a consequence of the commit:
      
      commit 4b239f45
      Author: Yinghai Lu <yinghai@kernel.org>
      Date:   Fri Dec 17 16:58:28 2010 -0800
      
          x86-64, mm: Put early page table high
      
      at some point init_memory_mapping is going to reach the pagetable pages
      area and map those pages too (mapping them as normal memory that falls
      in the range of addresses passed to init_memory_mapping as argument).
      Some of those pages are already pagetable pages (they are in the range
      pgt_buf_start-pgt_buf_end) therefore they are going to be mapped RO and
      everything is fine.
      Some of these pages are not pagetable pages yet (they fall in the range
      pgt_buf_end-pgt_buf_top; for example the page at pgt_buf_end) so they
      are going to be mapped RW.  When these pages become pagetable pages and
      are hooked into the pagetable, Xen will find that the guest already has
      an RW mapping of them somewhere and fail the operation.
      The reason Xen requires pagetables to be RO is that the hypervisor needs
      to verify that the pagetables are valid before using them. The validation
      operations are called "pinning" (more details in arch/x86/xen/mmu.c).
      
      In order to fix the issue we mark all the pages in the entire range
      pgt_buf_start-pgt_buf_top as RO, however when the pagetable allocation
      is completed only the range pgt_buf_start-pgt_buf_end is reserved by
      init_memory_mapping. Hence the kernel is going to crash as soon as one
      of the pages in the range pgt_buf_end-pgt_buf_top is reused (b/c those
      ranges are RO).
      
      For this reason we need a hook to reserve the kernel pagetable pages we
      used and free the other ones so that they can be reused for other
      purposes.
      On native it just means calling memblock_x86_reserve_range, on Xen it
      also means marking RW the pagetable pages that we allocated before but
      that haven't been used before.
      
      Another way to fix this without using the hook is to add an 'if
      (xen_pv_domain)' in the 'init_memory_mapping' code and call the Xen
      counterpart, but that is just nasty.
      Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
      Acked-by: Yinghai Lu <yinghai@kernel.org>
      Acked-by: H. Peter Anvin <hpa@zytor.com>
      Cc: Ingo Molnar <mingo@elte.hu>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
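The hook introduced by the commit above can be pictured with a toy model. This is only a sketch under stated assumptions: pages are array slots rather than physical addresses, the whole array stands for the range pgt_buf_start-pgt_buf_top, and every type and function name here is hypothetical, not the kernel's.

```c
#include <assert.h>
#include <stddef.h>

enum { NPAGES = 8 };            /* toy range pgt_buf_start..pgt_buf_top */
enum perm { PERM_RO, PERM_RW };

/* Toy bookkeeping for the page table buffer. */
struct pgt_state {
    enum perm perm[NPAGES];     /* current mapping permission per page */
    int reserved[NPAGES];       /* 1 if the page is reserved           */
};

/* Hook table: one implementation is chosen per platform. */
struct mapping_ops {
    void (*pagetable_reserve)(struct pgt_state *s, size_t start, size_t end);
};

/* Start with every page in the buffer mapped read-only and unreserved. */
static void pgt_state_init_all_ro(struct pgt_state *s)
{
    assert(s != NULL);
    for (size_t i = 0; i < NPAGES; i++) {
        s->perm[i] = PERM_RO;
        s->reserved[i] = 0;
    }
}

/* Reserve pages in [start, end), the part actually used for page tables. */
static void reserve_range(struct pgt_state *s, size_t start, size_t end)
{
    for (size_t i = start; i < end && i < NPAGES; i++)
        s->reserved[i] = 1;
}

/* Native: reserving the used range is all that is needed. */
static void native_pagetable_reserve_sketch(struct pgt_state *s,
                                            size_t start, size_t end)
{
    reserve_range(s, start, end);
}

/* Xen: additionally flip the unused tail [end, top) back to read-write
 * so that those pages can be reused for other purposes. */
static void xen_pagetable_reserve_sketch(struct pgt_state *s,
                                         size_t start, size_t end)
{
    reserve_range(s, start, end);
    for (size_t i = end; i < NPAGES; i++)
        s->perm[i] = PERM_RW;
}
```

The caller only ever invokes `ops.pagetable_reserve(...)`, which is the point the commit message makes against adding an `if (xen_pv_domain)` branch to the generic code.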
    • Revert "xen/mmu: Add workaround "x86-64, mm: Put early page table high"" · 92bdaef7
      Konrad Rzeszutek Wilk committed
      This reverts commit a3864783.
      
      It does not work with certain AMD machines.
      
      last_pfn = 0x100000 max_arch_pfn = 0x400000000
      initial memory mapped : 0 - 02c3a000
      Base memory trampoline at [ffff88000009b000] 9b000 size 20480
      init_memory_mapping: 0000000000000000-0000000100000000
       0000000000 - 0100000000 page 4k
      kernel direct mapping tables up to 100000000 @ ff7fb000-100000000
      init_memory_mapping: 0000000100000000-00000001e0800000
       0100000000 - 01e0800000 page 4k
      kernel direct mapping tables up to 1e0800000 @ 1df0f3000-1e0000000
      xen: setting RW the range fffdc000 - 100000000
      RAMDISK: 0203b000 - 02c3a000
      No NUMA configuration found
      Faking a node at 0000000000000000-00000001e0800000
      NUMA: Using 63 for the hash shift.
      Initmem setup node 0 0000000000000000-00000001e0800000
        NODE_DATA [00000001dfffb000 - 00000001dfffffff]
      BUG: unable to handle kernel NULL pointer dereference at           (null)
      IP: [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      PGD 0
      Oops: 0003 [#1] SMP
      last sysfs file:
      CPU 0
      Modules linked in:
      
      Pid: 0, comm: swapper Not tainted 2.6.39-0-virtual #6~smb1
      RIP: e030:[<ffffffff81cf6a75>]  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
      RSP: e02b:ffffffff81c01e38  EFLAGS: 00010046
      RAX: 0000000000000000 RBX: 00000001e0800000 RCX: 0000000000001040
      RDX: 0000000000004100 RSI: 0000000000000000 RDI: ffff8801dfffb000
      RBP: ffffffff81c01e58 R08: 0000000000000020 R09: 0000000000000000
      R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000000
      R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000bfe400
      FS:  0000000000000000(0000) GS:ffffffff81cca000(0000) knlGS:0000000000000000
      CS:  e033 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000000 CR3: 0000000001c03000 CR4: 0000000000000660
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process swapper (pid: 0, threadinfo ffffffff81c00000, task ffffffff81c0b020)
      Stack:
       0000000000000040 0000000000000001 0000000000000000 ffffffffffffffff
       ffffffff81c01e88 ffffffff81cf6c25 0000000000000000 0000000000000000
       ffffffff81cf687f 0000000000000000 ffffffff81c01ea8 ffffffff81cf6e45
      Call Trace:
       [<ffffffff81cf6c25>] numa_register_memblks.constprop.3+0x150/0x181
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6e45>] numa_init.part.2+0x1c/0x7c
       [<ffffffff81cf687f>] ? numa_add_memblk+0x7c/0x7c
       [<ffffffff81cf6f67>] numa_init+0x6c/0x70
       [<ffffffff81cf7057>] initmem_init+0x39/0x3b
       [<ffffffff81ce5865>] setup_arch+0x64e/0x769
       [<ffffffff815e43c1>] ? printk+0x51/0x53
       [<ffffffff81cdf92b>] start_kernel+0xd4/0x3f3
       [<ffffffff81cdf388>] x86_64_start_reservations+0x132/0x136
       [<ffffffff81ce2ed4>] xen_start_kernel+0x588/0x58f
      Code: 41 00 00 48 8b 3c c5 a0 24 cc 81 31 c0 40 f6 c7 01 74 05 aa 66 ba ff 40 40 f6 c7 02 74 05 66 ab 83 ea 02 89 d1 c1 e9 02 f6 c2 02 <f3> ab 74 02 66 ab 80 e2 01 74 01 aa 49 63 c4 48 c1 eb 0c 44 89
      RIP  [<ffffffff81cf6a75>] setup_node_bootmem+0x18a/0x1ea
       RSP <ffffffff81c01e38>
      CR2: 0000000000000000
      ---[ end trace a7919e7f17c0a725 ]---
      Kernel panic - not syncing: Attempted to kill the idle task!
      Pid: 0, comm: swapper Tainted: G      D     2.6.39-0-virtual #6~smb1
      Reported-by: Stefan Bader <stefan.bader@canonical.com>
      Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
  7. 12 May, 2011 5 commits
    • ARM: 6870/1: The mandatory barrier rmb() must be a dsb() in for device accesses · a904f5f9
      Catalin Marinas committed
      Since mandatory barriers may be used (explicitly or implicitly via readl
      etc.) to ensure the ordering between Device and Normal memory accesses,
      a DMB is not enough. This patch converts it to a DSB.
      
      Cc: Colin Cross <ccross@android.com>
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • ARM: 6892/1: handle ptrace requests to change PC during interrupted system calls · 2af68df0
      Arnd Bergmann committed
      GDB's interrupt.exp test cases currently fail on ARM.  The problem is how do_signal
      handled restarting interrupted system calls:
      
      The entry.S assembler code determines that we come from a system call; and that
      information is passed as "syscall" parameter to do_signal.  That routine then
      calls get_signal_to_deliver [*] and if a signal is to be delivered, calls into
      handle_signal.  If a system call is to be restarted either after the signal
      handler returns, or if no handler is to be called in the first place, the PC
      is updated after the get_signal_to_deliver call, either in handle_signal (if
      we have a handler) or at the end of do_signal (otherwise).
      
      Now the problem is that during [*], the call to get_signal_to_deliver, a ptrace
      intercept may happen.  During this intercept, the debugger may change registers,
      including the PC.  This is done by GDB if it wants to execute an "inferior call",
      i.e. the execution of some code in the debugged program triggered by GDB.
      
      To this purpose, GDB will save all registers, allocate a stack frame, set up
      PC and arguments as appropriate for the call, and point the link register to
      a dummy breakpoint instruction.  Once the process is restarted, it will execute
      the call and then trap back to the debugger, at which point GDB will restore
      all registers and continue original execution.
      
      This generally works fine.  However, now consider what happens when GDB attempts
      to do exactly that while the process was interrupted during execution of a to-be-
      restarted system call:  do_signal is called with the syscall flag set; it calls
      get_signal_to_deliver, at which point the debugger takes over and changes the PC
      to point to a completely different place.  Now get_signal_to_deliver returns
      without a signal to deliver; but now do_signal decides it should be restarting
      a system call, and decrements the PC by 2 or 4 -- so it now points to 2 or 4
      bytes before the function GDB wants to call -- which leads to a subsequent crash.
      
      To fix this problem, two things need to be supported:
      - do_signal must be able to recognize that get_signal_to_deliver changed the PC
        to a different location, and skip the restart-syscall sequence
      - once the debugger has restored all registers at the end of the inferior call
        sequence, do_signal must recognize that *now* it needs to restart the pending
        system call, even though it was now entered from a breakpoint instead of an
        actual svc instruction
      
      This set of issues is solved on other platforms, usually by one of two
      mechanisms:
      
      - The status information "do_signal is handling a system call that may need
        restarting" is itself carried in some register that can be accessed via
        ptrace.  This is e.g. on Intel the "orig_eax" register; on Sparc the kernel
        defines a magic extra bit in the flags register for this purpose.
        This allows GDB to manage that state: reset it when doing an inferior call,
        and restore it after the call is finished.
      
      - On s390, do_signal transparently handles this problem without requiring
        GDB interaction, by performing system call restarting in the following
        way: first, adjust the PC as necessary for restarting the call.  Then,
        call get_signal_to_deliver; and finally just continue execution at the
        PC.  This way, if GDB does not change the PC, everything is as before.
        If GDB *does* change the PC, execution will simply continue there --
        and once GDB restores the PC it saved at that point, it will automatically
        point to the *restarted* system call.  (There is the minor twist how to
        handle system calls that do *not* need restarting -- do_signal will undo
        the PC change in this case, after get_signal_to_deliver has returned, and
        only if ptrace did not change the PC during that call.)
      
      Because there does not appear to be any obvious register to carry the
      syscall-restart information on ARM, we'd either have to introduce a new
      artificial ptrace register just for that purpose, or else handle the issue
      transparently like on s390.  The patch below implements the second option;
      using this patch makes the interrupt.exp test cases pass on ARM, with no
      regression in the GDB test suite otherwise.
      
      Cc: patches@linaro.org
      Signed-off-by: Ulrich Weigand <ulrich.weigand@linaro.org>
      Signed-off-by: Arnd Bergmann <arnd.bergmann@linaro.org>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
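The s390-style ordering that the commit above adopts (rewind the PC first, then let the debugger look, then undo the rewind only if nobody touched the PC) can be modelled in a few lines. This is a simplified sketch, not the ARM implementation: `struct regs`, the hook functions standing in for get_signal_to_deliver, and the single `restart_with_handler` flag are all hypothetical, and the model assumes the interrupted system call is one that may be restarted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct regs { uint32_t pc; };

/* Stand-in for get_signal_to_deliver(): may modify the registers the way
 * a ptrace-attached debugger would; returns true if a handler must run. */
typedef bool (*signal_hook)(struct regs *);

/* No debugger activity and no handler to run. */
static bool hook_plain(struct regs *r) { (void)r; return false; }

/* Debugger starts an inferior call at a hypothetical address 0x8000. */
static bool hook_debugger_call(struct regs *r) { r->pc = 0x8000u; return false; }

/* A signal handler has to be invoked; registers are left alone. */
static bool hook_handler(struct regs *r) { (void)r; return true; }

static void do_signal_sketch(struct regs *r, uint32_t insn_size,
                             bool restart_with_handler, signal_hook hook)
{
    uint32_t continue_pc = r->pc;
    uint32_t restart_pc = continue_pc - insn_size;
    bool have_handler;

    /* 1. Prepare the restart before anyone can inspect the registers. */
    r->pc = restart_pc;

    /* 2. The debugger may change the PC during this call. */
    have_handler = hook(r);

    /* 3. Revert the restart only when it is not wanted and the PC is
     *    still the value set in step 1. */
    if (have_handler && !restart_with_handler && r->pc == restart_pc)
        r->pc = continue_pc;
}

/* Self-check: a debugger-chosen PC must survive unmodified. */
static void do_signal_demo(void)
{
    struct regs r = { 0x1004u };

    do_signal_sketch(&r, 4, false, hook_debugger_call);
    assert(r.pc == 0x8000u);
}
```

With the old ordering, the rewind would have been applied after the hook returned, leaving the PC 2 or 4 bytes before the function the debugger wanted to call.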
    • ARM: 6890/1: memmap: only free allocated memmap entries when using SPARSEMEM · 9af386c8
      Will Deacon committed
      The SPARSEMEM code allocates memmap entries only for sections which are
      present (i.e. those which contain some valid memory). The membank checks
      in free_unused_memmap do not take this into account and can incorrectly
      attempt to free memory which is not allocated, resulting in a BUG() in
      the bootmem code.
      
      However, if memory is configured as follows:
      
          |<----section---->|<----hole---->|<----section---->|
          +--------+--------+--------------+--------+--------+
          | bank 0 | unused |              | bank 1 | unused |
          +--------+--------+--------------+--------+--------+
      
      where a bank only occupies part of a section, the memmap allocated for
      the remainder of the section *can* be freed.
      
      This patch modifies the checks in free_unused_memmap so that only valid
      memmap entries are considered for removal.
      Acked-by: Catalin Marinas <catalin.marinas@arm.com>
      Signed-off-by: Will Deacon <will.deacon@arm.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • sparc32: Fixed unaligned memory copying in function __csum_partial_copy_sparc_generic · b1054282
      Tkhai Kirill committed
      When we are in the label cc_dword_align, registers %o0 and %o1 have the same last 2 bits,
      but it's not guaranteed one of them is zero. So we can get unaligned memory access
      in label ccte. Example of parameters which lead to this:
      %o0=0x7ff183e9, %o1=0x8e709e7d, %g1=3
      
      With these parameters I saw memory corruption: 5 additional bytes were overwritten.
      This patch corrects the error.
      
      One comment to the patch. We don't care about the third bit in %o1, because cc_end_cruft
      stores word or less.
      Signed-off-by: Tkhai Kirill <tkhai@yandex.ru>
      Signed-off-by: David S. Miller <davem@davemloft.net>
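The alignment condition in the commit above is easy to check by hand for the quoted register values. The helpers below are a hypothetical illustration in C, not a translation of the SPARC assembly: they only show that two addresses can agree in their low two bits while neither is word aligned.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when both addresses have identical low two bits. */
static bool same_low_two_bits(uint32_t a, uint32_t b)
{
    return ((a ^ b) & 3u) == 0;
}

/* True when the address is aligned to a 4-byte word. */
static bool is_word_aligned(uint32_t a)
{
    return (a & 3u) == 0;
}

/* Bytes to copy one at a time before the address reaches a word boundary. */
static uint32_t bytes_to_word_boundary(uint32_t a)
{
    return (4u - (a & 3u)) & 3u;
}

/* Self-check using the register values from the commit message. */
static void align_demo(void)
{
    const uint32_t o0 = 0x7ff183e9u;
    const uint32_t o1 = 0x8e709e7du;

    assert(same_low_two_bits(o0, o1));
    assert(!is_word_aligned(o0) && !is_word_aligned(o1));
}
```

For both example values the low two bits are 01, so a word-sized access at either address would be unaligned until 3 leading bytes have been handled separately.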
    • omap: iommu: Return IRQ_HANDLED in fault handler when no fault occured · c56b2ddd
      Laurent Pinchart committed
      Commit d594f1f3 (omap: IOMMU: add
      support to callback during fault handling) broke interrupt line sharing
      between the OMAP3 ISP and its IOMMU. Because of this, every interrupt
      generated by the OMAP3 ISP is handled by the IOMMU driver instead of
      being passed to the OMAP3 ISP driver.
      Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: Hiroshi DOYU <Hiroshi.DOYU@nokia.com>
      Signed-off-by: Tony Lindgren <tony@atomide.com>
  8. 11 May, 2011 18 commits