1. 13 10月, 2010 1 次提交
    • Y
      memblock, bootmem: Round pfn properly for memory and reserved regions · c7fc2de0
      Yinghai Lu 提交于
      We need to round memory regions correctly -- specifically, we need to
      round reserved region in the more expansive direction (lower limit
      down, upper limit up) whereas usable memory regions need to be rounded
      in the more restrictive direction (lower limit up, upper limit down).
      
      This introduces two set of inlines:
      
      	memblock_region_memory_base_pfn()
      	memblock_region_memory_end_pfn()
      	memblock_region_reserved_base_pfn()
      	memblock_region_reserved_end_pfn()
      
      Although they are antisymmetric (and therefore are technically
      duplicates) the use of the different inlines explicitly documents the
      programmer's intention.
      
      The lack of proper rounding caused a bug on ARM, which was then found
      to also affect other architectures.
      Reported-by: NRussell King <rmk@arm.linux.org.uk>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB4CDFD.4020105@kernel.org>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      c7fc2de0
  2. 12 10月, 2010 4 次提交
    • H
      Merge branch 'x86/urgent' into core/memblock · 8e4029ee
      H. Peter Anvin 提交于
      Reason for merge:
      
      Forward-port urgent change to arch/x86/mm/srat_64.c to the memblock tree.
      
      Resolved Conflicts:
      	arch/x86/mm/srat_64.c
      Originally-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      8e4029ee
    • Y
      memblock: Annotate memblock functions with __init_memblock · cd79481d
      Yinghai Lu 提交于
      Stephen found
      
      WARNING: mm/built-in.o(.text+0x25ab8): Section mismatch in reference from the function memblock_find_base() to the function .init.text:memblock_find_region()
      The function memblock_find_base() references
      the function __init memblock_find_region().
      This is often because memblock_find_base lacks a __init
      annotation or the annotation of memblock_find_region is wrong.
      
      So let memblock_find_region() to use __init_memblock instead of __init
      directly.
      
      Also fix one function that did not have __init* to be __init_memblock.
      Reported-by: NStephen Rothwell <sfr@canb.auug.org.au>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB366B1.40405@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      cd79481d
    • J
      memblock: Allow memblock_init to be called early · 236260b9
      Jeremy Fitzhardinge 提交于
      The Xen setup code needs to call memblock_x86_reserve_range() very early,
      so allow it to initialize the memblock subsystem before doing so.  The
      second memblock_init() is ignored.
      Signed-off-by: NJeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Cc: Yinghai Lu <yinghai@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      LKML-Reference: <4CACFDAD.3090900@goop.org>
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      236260b9
    • Y
      x86, numa: For each node, register the memory blocks actually used · 73cf624d
      Yinghai Lu 提交于
      Russ reported SGI UV is broken recently. He said:
      
      | The SRAT table shows that memory range is spread over two nodes.
      |
      | SRAT: Node 0 PXM 0 100000000-800000000
      | SRAT: Node 1 PXM 1 800000000-1000000000
      | SRAT: Node 0 PXM 0 1000000000-1080000000
      |
      |Previously, the kernel early_node_map[] would show three entries
      |with the proper node.
      |
      |[    0.000000]     0: 0x00100000 -> 0x00800000
      |[    0.000000]     1: 0x00800000 -> 0x01000000
      |[    0.000000]     0: 0x01000000 -> 0x01080000
      |
      |The problem is recent community kernel early_node_map[] shows
      |only two entries with the node 0 entry overlapping the node 1
      |entry.
      |
      |    0: 0x00100000 -> 0x01080000
      |    1: 0x00800000 -> 0x01000000
      
      After looking at the changelog, Found out that it has been broken for a while by
      following commit
      
      |commit 8716273c
      |Author: David Rientjes <rientjes@google.com>
      |Date:   Fri Sep 25 15:20:04 2009 -0700
      |
      |    x86: Export srat physical topology
      
      Before that commit, register_active_regions() is called for every SRAT memory
      entry right away.
      
      Use nodememblk_range[] instead of nodes[] in order to make sure we
      capture the actual memory blocks registered with each node.  nodes[]
      contains an extended range which spans all memory regions associated
      with a node, but that does not mean that all the memory in between are
      included.
      Reported-by: NRuss Anderson <rja@sgi.com>
      Tested-by: NRuss Anderson <rja@sgi.com>
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CB27BDF.5000800@kernel.org>
      Acked-by: NDavid Rientjes <rientjes@google.com>
      Cc: <stable@kernel.org> 2.6.33 .34 .35 .36
      Signed-off-by: NH. Peter Anvin <hpa@linux.intel.com>
      73cf624d
  3. 11 10月, 2010 1 次提交
    • B
      x86, AMD, MCE thresholding: Fix the MCi_MISCj iteration order · 6dcbfe4f
      Borislav Petkov 提交于
      This fixes possible cases of not collecting valid error info in
      the MCE error thresholding groups on F10h hardware.
      
      The current code contains a subtle problem of checking only the
      Valid bit of MSR0000_0413 (which is MC4_MISC0 - DRAM
      thresholding group) in its first iteration and breaking out if
      the bit is cleared.
      
      But (!), this MSR contains an offset value, BlkPtr[31:24], which
      points to the remaining MSRs in this thresholding group which
      might contain valid information too. But if we bail out only
      after we checked the valid bit in the first MSR and not the
      block pointer too, we miss that other information.
      
      The thing is, MC4_MISC0[BlkPtr] is not predicated on
      MCi_STATUS[MiscV] or MC4_MISC0[Valid] and should be checked
      prior to iterating over the MCI_MISCj thresholding group,
      irrespective of the MC4_MISC0[Valid] setting.
      Signed-off-by: NBorislav Petkov <borislav.petkov@amd.com>
      Cc: <stable@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      6dcbfe4f
  4. 08 10月, 2010 16 次提交
  5. 07 10月, 2010 15 次提交
    • A
      HWPOISON: Stop shrinking at right page count · 47f43e7e
      Andi Kleen 提交于
      When we call the slab shrinker to free a page we need to stop at
      page count one because the caller always holds a single reference, not zero.
      
      This avoids useless looping over slab shrinkers and freeing too much
      memory.
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      47f43e7e
    • A
      HWPOISON: Report correct address granuality for AO huge page errors · 0d9ee6a2
      Andi Kleen 提交于
      The SIGBUS user space signalling is supposed to report the
      address granuality of a corruption. Pass this information correctly
      for huge pages by querying the hpage order.
      Reviewed-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      0d9ee6a2
    • A
      HWPOISON: Copy si_addr_lsb to user · a337fdac
      Andi Kleen 提交于
      The original hwpoison code added a new siginfo field si_addr_lsb to
      pass the granuality of the fault address to user space. Unfortunately
      this field was never copied to user space. Fix this here.
      
      I added explicit checks for the MCEERR codes to avoid having
      to patch all potential callers to initialize the field.
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      a337fdac
    • N
      page-types.c: fix name of unpoison interface · 67159813
      Naoya Horiguchi 提交于
      The page-types utility still uses an out of date name for the
      unpoison interface: debugfs:hwpoison/renew-pfn
      This patch renames and fixes it.
      Signed-off-by: NNaoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Acked-by: NWu Fengguang <fengguang.wu@intel.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      67159813
    • J
      elevator: fix oops on early call to elevator_change() · 430c62fb
      Jens Axboe 提交于
      2.6.36 introduces an API for drivers to switch the IO scheduler
      instead of manually calling the elevator exit and init functions.
      This API was added since q->elevator must be cleared in between
      those two calls. And since we already have this functionality
      directly from use by the sysfs interface to switch schedulers
      online, it was prudent to reuse it internally too.
      
      But this API needs the queue to be in a fully initialized state
      before it is called, or it will attempt to unregister elevator
      kobjects before they have been added. This results in an oops
      like this:
      
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000051
      IP: [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
      PGD 47ddfc067 PUD 47c6a1067 PMD 0
      Oops: 0000 [#1] PREEMPT SMP
      last sysfs file: /sys/devices/pci0000:00/0000:00:02.0/0000:04:00.1/irq
      CPU 2
      Modules linked in: t(+) loop hid_apple usbhid ahci ehci_hcd uhci_hcd libahci usbcore nls_base igb
      
      Pid: 7319, comm: modprobe Not tainted 2.6.36-rc6+ #132 QSSC-S4R/QSSC-S4R
      RIP: 0010:[<ffffffff8116f15e>]  [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
      RSP: 0018:ffff88027da25d08  EFLAGS: 00010246
      RAX: ffff88047c68c528 RBX: 00000000fffffffe RCX: 0000000000000000
      RDX: 000000000000002f RSI: 000000000000002f RDI: ffff88047e196c88
      RBP: ffff88027da25d38 R08: 0000000000000000 R09: d84156c5635688c0
      R10: d84156c5635688c0 R11: 0000000000000000 R12: ffff88047e196c88
      R13: 0000000000000000 R14: 0000000000000000 R15: ffff88047c68c528
      FS:  00007fcb0b26f6e0(0000) GS:ffff880287400000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
      CR2: 0000000000000051 CR3: 000000047e76e000 CR4: 00000000000006e0
      DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
      DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
      Process modprobe (pid: 7319, threadinfo ffff88027da24000, task ffff88027d377090)
      Stack:
       ffff88027da25d58 ffff88047c68c528 00000000fffffffe ffff88047e196c88
      <0> ffff88047c68c528 ffff88047e05bd90 ffff88027da25d78 ffffffff8123fb77
      <0> ffff88047e05bd90 0000000000000000 ffff88047e196c88 ffff88047c68c528
      Call Trace:
       [<ffffffff8123fb77>] kobject_add_internal+0xe7/0x1f0
       [<ffffffff8123fd98>] kobject_add_varg+0x38/0x60
       [<ffffffff8123feb9>] kobject_add+0x69/0x90
       [<ffffffff8116efe0>] ? sysfs_remove_dir+0x20/0xa0
       [<ffffffff8103d48d>] ? sub_preempt_count+0x9d/0xe0
       [<ffffffff8143de20>] ? _raw_spin_unlock+0x30/0x50
       [<ffffffff8116efe0>] ? sysfs_remove_dir+0x20/0xa0
       [<ffffffff8116eff4>] ? sysfs_remove_dir+0x34/0xa0
       [<ffffffff81224204>] elv_register_queue+0x34/0xa0
       [<ffffffff81224aad>] elevator_change+0xfd/0x250
       [<ffffffffa007e000>] ? t_init+0x0/0x361 [t]
       [<ffffffffa007e000>] ? t_init+0x0/0x361 [t]
       [<ffffffffa007e0a8>] t_init+0xa8/0x361 [t]
       [<ffffffff810001de>] do_one_initcall+0x3e/0x170
       [<ffffffff8108c3fd>] sys_init_module+0xbd/0x220
       [<ffffffff81002f2b>] system_call_fastpath+0x16/0x1b
      Code: e5 41 56 41 55 41 54 49 89 fc 53 48 83 ec 10 48 85 ff 74 52 48 8b 47 18 49 c7 c5 00 46 61 81 48 85 c0 74 04 4c 8b 68 30 45 31 f6 <41> 80 7d 51 00 74 0e 49 8b 44 24 28 4c 89 e7 ff 50 20 49 89 c6
      RIP  [<ffffffff8116f15e>] sysfs_create_dir+0x2e/0xc0
       RSP <ffff88027da25d08>
      CR2: 0000000000000051
      ---[ end trace a6541d3bf07945df ]---
      
      Fix this by adding a registered bit to the elevator queue, which is
      set when the sysfs kobjects have been registered.
      Signed-off-by: NJens Axboe <jaxboe@fusionio.com>
      430c62fb
    • D
      drm: don't drop handle reference on unload · dab8dcfa
      Dave Airlie 提交于
      since the handle references are all tied to a file_priv, and when it disappears
      all the handle refs go with it.
      
      The fbcon ones we'd only notice on unload, but the nouveau notifier one
      would would happen on reboot.
      
      nouveau: Reported-by: Marc Dionne <marc.c.dionne@gmail.com>
      nouveau: Tested-by: Marc Dionne <marc.c.dionne@gmail.com>
      i915 unload: Reported-by: Keith Packard <keithp@keithp.com>
      Acked-by: NBen Skeggs <bskeggs@redhat.com>
      Signed-off-by: NDave Airlie <airlied@redhat.com>
      dab8dcfa
    • J
      xfs: properly account for reclaimed inodes · 081003ff
      Johannes Weiner 提交于
      When marking an inode reclaimable, a per-AG counter is increased, the
      inode is tagged reclaimable in its per-AG tree, and, when this is the
      first reclaimable inode in the AG, the AG entry in the per-mount tree
      is also tagged.
      
      When an inode is finally reclaimed, however, it is only deleted from
      the per-AG tree.  Neither the counter is decreased, nor is the parent
      tree's AG entry untagged properly.
      
      Since the tags in the per-mount tree are not cleared, the inode
      shrinker iterates over all AGs that have had reclaimable inodes at one
      point in time.
      
      The counters on the other hand signal an increasing amount of slab
      objects to reclaim.  Since "70e60ce7 xfs: convert inode shrinker to
      per-filesystem context" this is not a real issue anymore because the
      shrinker bails out after one iteration.
      
      But the problem was observable on a machine running v2.6.34, where the
      reclaimable work increased and each process going into direct reclaim
      eventually got stuck on the xfs inode shrinking path, trying to scan
      several million objects.
      
      Fix this by properly unwinding the reclaimable-state tracking of an
      inode when it is reclaimed.
      Signed-off-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: stable@kernel.org
      Reviewed-by: NDave Chinner <dchinner@redhat.com>
      Signed-off-by: NAlex Elder <aelder@sgi.com>
      081003ff
    • V
      md: check return code of read_sb_page · 5c04f551
      Vasiliy Kulikov 提交于
      Function read_sb_page may return ERR_PTR(...). Check for it.
      Signed-off-by: NVasiliy Kulikov <segooon@gmail.com>
      Cc: Neil Brown <neilb@suse.de>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      5c04f551
    • N
      md/raid1: minor bio initialisation improvements. · db8d9d35
      NeilBrown 提交于
      
      When performing a resync we pre-allocate some bios and repeatedly use
      them.  This requires us to re-initialise them each time.
      One field (bi_comp_cpu) and some flags weren't being initiaised
      reliably.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      db8d9d35
    • N
      md/raid1: avoid overflow in raid1 resync when bitmap is in use. · 7571ae88
      NeilBrown 提交于
      bitmap_start_sync returns - via a pass-by-reference variable - the
      number of sectors before we need to check with the bitmap again.
      Since commit ef425673 this number can be substantially larger,
      2^27 is a common value.
      
      Unfortunately it is an 'int' and so when raid1.c:sync_request shifts
      it 9 places to the left it becomes 0.  This results in a zero-length
      read which the scsi layer justifiably complains about.
      
      This patch just removes the shift so the common case becomes safe with
      a trivially-correct patch.
      
      In the next merge window we will convert this 'int' to a 'sector_t'
      Reported-by: N"George Spelvin" <linux@horizon.com>
      Signed-off-by: NNeilBrown <neilb@suse.de>
      7571ae88
    • L
      Linux 2.6.36-rc7 · cb655d0f
      Linus Torvalds 提交于
      cb655d0f
    • L
      Merge branch 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus · 81c20b96
      Linus Torvalds 提交于
      * 'upstream' of git://git.linux-mips.org/pub/scm/upstream-linus:
        MIPS: Octeon: Place cnmips_cu2_setup in __init memory.
        MIPS: Don't place cu2 notifiers in __cpuinitdata
        MIPS: Calculate VMLINUZ_LOAD_ADDRESS based on the length of vmlinux.bin
        MIPS: Alchemy: Resolve prom section mismatches
        MIPS: Fix syscall 64 bit number comments.
        MIPS: Hookup fanotify_init, fanotify_mark, and prlimit64 syscalls.
        MIPS: TX49xx: Rename ARCH_KMALLOC_MINALIGN to ARCH_DMA_MINALIGN
        MIPS: N32: Fix getdents64 syscall for n32
        MIPS: Remove pr_<level> uses of KERN_<level>
        MIPS: PNX8550: Sort out machine halt, restart and powerdown functions.
        MIPS: GIC: Remove dependencies from Malta files.
        MIPS: Kconfig: Fix and clarify kconfig help text for VSMP and SMTC.
        MIPS: DMA: Fix computation of DMA flags from device's coherent_dma_mask.
        MIPS: Audit: Fix hang in entry.S.
        MIPS: Document why RELOC_HIDE is there.
        MIPS: Octeon: Determine if helper needs to be built
        MIPS: Use generic atomic64 for 32-bit kernels
        MIPS: RM7000: Symbol should be static
        MIPS: kspd: Adjust confusing if indentation
        MIPS: Fix a typo.
      81c20b96
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · 089eed29
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
        writeback: always use sb->s_bdi for writeback purposes
      089eed29
    • L
      Merge branch 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm · 34984f54
      Linus Torvalds 提交于
      * 'v2.6.36-rc6-urgent-fixes' of git://xenbits.xen.org/people/sstabellini/linux-pvhvm:
        xen: do not initialize PV timers on HVM if !xen_have_vector_callback
        xen: do not set xenstored_ready before xenbus_probe on hvm
      34984f54
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse · 8fe9793a
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mszeredi/fuse:
        fuse: Initialize total_len in fuse_retrieve()
      8fe9793a
  6. 06 10月, 2010 3 次提交
    • Y
      x86, memblock: Remove __memblock_x86_find_in_range_size() · 16c36f74
      Yinghai Lu 提交于
      Fold it into memblock_x86_find_in_range(), and change bad_addr_size()
      to check_reserve_memblock().
      
      So whole memblock_x86_find_in_range_size() code is more readable.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CAA4DEC.4000401@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      16c36f74
    • Y
      memblock: Fix wraparound in find_region() · f1af98c7
      Yinghai Lu 提交于
      When trying to find huge range for crashkernel, get
      
      [    0.000000] ------------[ cut here ]------------
      [    0.000000] WARNING: at arch/x86/mm/memblock.c:248 memblock_x86_reserve_range+0x40/0x7a()
      [    0.000000] Hardware name: Sun Fire x4800
      [    0.000000] memblock_x86_reserve_range: wrong range [0xffffffff37000000, 0x137000000)
      [    0.000000] Modules linked in:
      [    0.000000] Pid: 0, comm: swapper Not tainted 2.6.36-rc5-tip-yh-01876-g1cac214-dirty #59
      [    0.000000] Call Trace:
      [    0.000000]  [<ffffffff82816f7e>] ? memblock_x86_reserve_range+0x40/0x7a
      [    0.000000]  [<ffffffff81078c2d>] warn_slowpath_common+0x85/0x9e
      [    0.000000]  [<ffffffff81078d38>] warn_slowpath_fmt+0x6e/0x70
      [    0.000000]  [<ffffffff8281e77c>] ? memblock_find_region+0x40/0x78
      [    0.000000]  [<ffffffff8281eb1f>] ? memblock_find_base+0x9a/0xb9
      [    0.000000]  [<ffffffff82816f7e>] memblock_x86_reserve_range+0x40/0x7a
      [    0.000000]  [<ffffffff8280452c>] setup_arch+0x99d/0xb2a
      [    0.000000]  [<ffffffff810a3e02>] ? trace_hardirqs_off+0xd/0xf
      [    0.000000]  [<ffffffff81cec7d8>] ? _raw_spin_unlock_irqrestore+0x3d/0x4c
      [    0.000000]  [<ffffffff827ffcec>] start_kernel+0xde/0x3f1
      [    0.000000]  [<ffffffff827ff2d4>] x86_64_start_reservations+0xa0/0xa4
      [    0.000000]  [<ffffffff827ff3de>] x86_64_start_kernel+0x106/0x10d
      [    0.000000] ---[ end trace a7919e7f17c0a725 ]---
      [    0.000000] Reserving 8192MB of memory at 17592186041200MB for crashkernel (System RAM: 526336MB)
      
      This is caused by a wraparound in the test due to size > end;
      explicitly check for this condition and fail.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CAA4DD3.1080401@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      f1af98c7
    • Y
      x86-32, memblock: Make add_highpages honor early reserved ranges · 1d931264
      Yinghai Lu 提交于
      Originally the only early reserved range that is overlapped with high
      pages is "KVA RAM", but we already do remove that from the active ranges.
      
      However, It turns out Xen could have that kind of overlapping to support memory
      ballooning.x
      
      So we need to make add_highpage_with_active_regions() to subtract
      memblock reserved just like low ram; this is the proper design anyway.
      
      In this patch, refactering get_freel_all_memory_range() to make it can
      be used by add_highpage_with_active_regions().  Also we don't need to
      remove "KVA RAM" from active ranges.
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4CABB183.1040607@kernel.org>
      Signed-off-by: NH. Peter Anvin <hpa@zytor.com>
      1d931264