1. 21 Feb, 2009 4 commits
    • S
      ftrace: break out modify loop immediately on detection of error · 4377245a
      Steven Rostedt committed
      Impact: added precaution on failure detection
      
      Break out of the modifying loop as soon as a failure is detected.
      This is just an added precaution found by code review and was not
      found by any bug chasing.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
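The early-exit pattern this commit describes can be sketched in plain C. This is a minimal userspace illustration, not the ftrace code: `struct site`, `modify_site()` and `modify_all()` are hypothetical names, and a failure is simulated with a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical patch site; 'fail' simulates a site that cannot be updated. */
struct site {
	int fail;
	int modified;
};

/* Hypothetical per-site update: 0 on success, -1 on failure. */
static int modify_site(struct site *s)
{
	if (s->fail)
		return -1;
	s->modified = 1;
	return 0;
}

/*
 * Walk the sites and stop at the first failure instead of carrying on.
 * Returns the number of sites that were modified.
 */
static size_t modify_all(struct site *sites, size_t n)
{
	size_t i;

	assert(sites != NULL || n == 0);
	for (i = 0; i < n; i++) {
		if (modify_site(&sites[i]) != 0)
			break;	/* leave the loop immediately on error */
	}
	return i;
}
```

With four sites where the third one fails, `modify_all()` touches only the first two and leaves the rest alone.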
    • S
      ftrace: immediately stop code modification if failure is detected · 90c7ac49
      Steven Rostedt committed
      Impact: fix to prevent NMI lockup
      
      If the page fault handler produces a WARN_ON while text is being
      modified, and the system is set up to have a high frequency of NMIs,
      we can lock up the system on a failure to modify code.
      
      The modifying of code with NMIs allows all NMIs to modify the code
      if it is about to run. This prevents a modifier on one CPU from
      modifying code running in NMI context on another CPU. The modifying
      is done through stop_machine, so only NMIs must be considered.
      
      But if the write causes the page fault handler to produce a warning,
      the print can slow it down enough that as soon as it is done
      it will take another NMI before going back to the process context.
      The new NMI will perform the write again causing another print and
      this will hang the box.
      
      This patch turns off the writing as soon as a failure is detected
      and does not wait for it to be turned off by the process context.
      This will keep NMIs from getting stuck in this back and forth
      of print outs.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
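The idea of turning the write off on the NMI side, rather than waiting for process context, can be sketched as below. This is a single-threaded userspace mock under stated assumptions: `mod_pending`, `failing_write()` and `nmi_enter_sketch()` are hypothetical names, and real code would need proper atomics and memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

static bool mod_pending = true;	/* hypothetical flag set by the modifier */
static int write_attempts;	/* counts simulated write attempts */

/* Stand-in for a text write that always faults in this sketch. */
static int failing_write(void)
{
	write_attempts++;
	return -1;
}

/* What each simulated NMI does on entry. */
static void nmi_enter_sketch(void)
{
	if (!mod_pending)
		return;
	if (failing_write() != 0)
		mod_pending = false;	/* turn writing off at once */
}
```

However many NMIs follow, the failing write is attempted only once, so the repeated warning print described above cannot recur.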
    • S
      ftrace, x86: make kernel text writable only for conversions · 16239630
      Steven Rostedt committed
      Impact: keep kernel text read only
      
      Because dynamic ftrace converts the calls to mcount into and out of
      nops at run time, we needed to always keep the kernel text writable.
      
      But this defeats the point of CONFIG_DEBUG_RODATA. This patch converts
      the kernel code to writable before ftrace modifies the text, and converts
      it back to read only afterward.
      
      The kernel text is converted to read/write, stop_machine is called to
      modify the code, then the kernel text is converted back to read only.
      
      The original version used SYSTEM_STATE to determine when it was OK
      or not to change the code to rw or ro. Andrew Morton pointed out that
      using SYSTEM_STATE is a bad idea since there is no guarantee to what
      its state will actually be.
      
      Instead, I moved the check into the set_kernel_text_* functions
      themselves, and use a local variable to determine when it is
      OK to change the kernel text RW permissions.
      
      [ Update: Ingo Molnar suggested moving the prototypes to cacheflush.h ]
      Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
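A userspace analogy of "writable only for the conversion" can be built with `mmap()` and `mprotect()`. This is only a sketch: it protects an anonymous data page rather than kernel text, it does not use the kernel's set_kernel_text_* functions, and `alloc_ro_page()` and `patch_byte()` are hypothetical helpers. The byte values 0x90 and 0xe8 are arbitrary fill and patch values chosen for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map one anonymous page, fill it, then make it read-only. NULL on failure. */
static unsigned char *alloc_ro_page(size_t *len)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;

	if (page <= 0)
		return NULL;
	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return NULL;
	memset(p, 0x90, (size_t)page);
	if (mprotect(p, (size_t)page, PROT_READ) != 0) {
		munmap(p, (size_t)page);
		return NULL;
	}
	*len = (size_t)page;
	return p;
}

/* Open a write window, patch one byte, then close the window again. */
static int patch_byte(unsigned char *page, size_t len, size_t off,
		      unsigned char val)
{
	if (off >= len)
		return -1;
	if (mprotect(page, len, PROT_READ | PROT_WRITE) != 0)
		return -1;
	page[off] = val;
	return mprotect(page, len, PROT_READ);
}
```

The page stays read-only except for the short window inside `patch_byte()`, which mirrors the read/write, modify, read-only sequence described in the commit message.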
    • S
      ftrace: allow archs to perform pre and post process for code modification · 000ab691
      Steven Rostedt committed
      This patch creates the weak functions: ftrace_arch_code_modify_prepare
      and ftrace_arch_code_modify_post_process that are called before and
      after the stop machine is called to modify the kernel text.
      
      If the arch needs to do pre or post processing, it only needs to define
      these functions.
      
      [ Update: Ingo Molnar suggested using the name ftrace_arch_code_modify_*
                over using ftrace_arch_modify_* ]
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
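The weak-function hook pattern can be mocked in userspace with the GCC/Clang `weak` attribute. The two hook names come from the commit message; everything else here (the `steps` log, `record()`, `modify_text()`, `run_code_modification()`, and the bodies of the defaults) is an assumption made for illustration.

```c
#include <assert.h>
#include <string.h>

static char steps[64];	/* records the order of calls for inspection */

static void record(const char *s)
{
	strncat(steps, s, sizeof(steps) - strlen(steps) - 1);
}

/* Weak defaults: an "arch" may supply strong definitions in another file. */
__attribute__((weak)) int ftrace_arch_code_modify_prepare(void)
{
	record("prepare;");
	return 0;
}

__attribute__((weak)) int ftrace_arch_code_modify_post_process(void)
{
	record("post;");
	return 0;
}

/* Stand-in for the stop_machine step that rewrites the text. */
static void modify_text(void)
{
	record("modify;");
}

/* Call the pre hook, do the modification, then call the post hook. */
static int run_code_modification(void)
{
	int ret = ftrace_arch_code_modify_prepare();

	if (ret)
		return ret;
	modify_text();
	return ftrace_arch_code_modify_post_process();
}
```

An architecture that needs extra work would define non-weak functions with the same names, and the linker would pick those over the defaults without any change to the caller.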
  2. 20 Feb, 2009 8 commits
  3. 19 Feb, 2009 28 commits
    • M
      [ARM] 5404/1: Fix condition in arm_elf_read_implies_exec() to set READ_IMPLIES_EXEC · 9da616fb
      Makito SHIOKAWA committed
      READ_IMPLIES_EXEC must be set when:
      o binary _is_ an executable stack (i.e. not EXSTACK_DISABLE_X)
      o processor architecture is _under_ ARMv6 (XN bit is supported from ARMv6)
      Signed-off-by: Makito SHIOKAWA <lkhmkt@gmail.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • H
      [S390] fix "mem=" handling in case of standby memory · 23d75d9c
      Heiko Carstens committed
      Standby memory detected with the sclp interface always gets registered
      with add_memory calls without considering the limitation that the
      "mem=" kernel parameter implies.
      So fix this and only register standby memory that is below the specified
      limit.
      This fixes zfcpdump since it uses "mem=32M". In case there is approximately
      2GB of standby memory present, all of the usable memory would be used for the
      struct pages needed for standby memory.
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • C
      [S390] Fix timeval regression on s390 · d5cd0343
      Christian Borntraeger committed
      commit aa5e97ce
      [PATCH] improve precision of process accounting.
      
      Introduced a timing regression:
      -bash-3.2# time ls
      real    0m0.006s
      user    0m1.754s
      sys     0m1.094s
      
      The problem was introduced by an error in cputime_to_timeval.
      Cputime is now in units of 1/4096 microsecond, so we have to divide
      the remainder by 4096 to get the microseconds.
      Signed-off-by: Christian Borntraeger <borntraeger@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
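The arithmetic behind the fix can be written out in portable C. This is a sketch only, not the s390 implementation: the 1/4096 microsecond unit comes from the commit message, while `struct tv_sketch` and `cputime_to_tv_sketch()` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

#define CPUTIME_PER_USEC 4096ULL	/* unit taken from the commit text */
#define CPUTIME_PER_SEC  (CPUTIME_PER_USEC * 1000000ULL)

/* Hypothetical stand-in for struct timeval. */
struct tv_sketch {
	int64_t sec;
	int64_t usec;
};

/* Split a cputime value into whole seconds and leftover microseconds. */
static struct tv_sketch cputime_to_tv_sketch(uint64_t cputime)
{
	struct tv_sketch tv;

	tv.sec = (int64_t)(cputime / CPUTIME_PER_SEC);
	/* The remainder is still in 1/4096 us units, so divide by 4096. */
	tv.usec = (int64_t)((cputime % CPUTIME_PER_SEC) / CPUTIME_PER_USEC);
	return tv;
}
```

Skipping the final division leaves the microsecond field 4096 times too large, which is consistent with the inflated user and sys times in the `time ls` output quoted in the commit message.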
    • P
      [S390] sclp: handle empty event buffers · e2e5a0f2
      Peter Oberparleiter committed
      Handle a malformed hardware response which some versions of the
      Support Element (SE) may present during SE restart and which otherwise
      would result in an endless loop in function sclp_dispatch_evbufs.
      Signed-off-by: Peter Oberparleiter <peter.oberparleiter@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
    • R
      [ARM] omap: fix clock reparenting in omap2_clk_set_parent() · 41f3103f
      Russell King committed
      When changing the parent of a clock, it is necessary to keep the
      clock use counts balanced, otherwise the parent state will
      get corrupted.  Since we already disable and re-enable the clock,
      we might as well use the recursive versions instead.
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • T
      Merge branch 'fix/usb-audio' into for-linus · e432472d
      Takashi Iwai committed
    • T
      Merge branch 'fix/misc' into for-linus · e6845d91
      Takashi Iwai committed
    • T
      Merge branch 'fix/hda' into for-linus · 379752fd
      Takashi Iwai committed
    • R
      [ARM] 5403/1: pxa25x_ep_fifo_flush() *ep->reg_udccs always set to 0 · 22eb36f4
      Roel Kluin committed
      *ep->reg_udccs is always set to 0.
      Signed-off-by: Roel Kluin <roel.kluin@gmail.com>
      Acked-by: Eric Miao <eric.miao@marvell.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
    • N
      [ARM] 5402/1: fix a case of wrap-around in sanity_check_meminfo() · 3fd9825c
      Nicolas Pitre committed
      In the non highmem case, if two memory banks of 1GB each are provided,
      the second bank would evade suppression since its virtual base would
      be 0.  Fix this by disallowing any memory bank whose virtual base
      address is found to be lower than PAGE_OFFSET.
      Reported-by: Lennert Buytenhek <buytenh@marvell.com>
      Signed-off-by: Nicolas Pitre <nico@marvell.com>
      Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
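The wrap-around is easy to reproduce with 32-bit unsigned arithmetic. The sketch below assumes a 3G/1G split (PAGE_OFFSET of 0xC0000000) and RAM starting at physical address 0. Both values are assumptions chosen to match the "virtual base would be 0" case, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PAGE_OFFSET_SKETCH 0xC0000000u	/* assumed 3G/1G split */
#define PHYS_OFFSET_SKETCH 0x00000000u	/* assumed: RAM starts at physical 0 */

/* Linear-map virtual base of a bank; the sum wraps modulo 2^32. */
static uint32_t virt_base(uint32_t phys_start)
{
	return phys_start - PHYS_OFFSET_SKETCH + PAGE_OFFSET_SKETCH;
}

/* The added check: reject banks whose virtual base is below PAGE_OFFSET. */
static int bank_virt_base_ok(uint32_t phys_start)
{
	return virt_base(phys_start) >= PAGE_OFFSET_SKETCH;
}
```

With two 1GB banks, the second bank starts at physical 0x40000000, its virtual base wraps to 0, and the check rejects it, while the first bank passes.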
    • I
      Merge branch 'tip/tracing/urgent' of... · ed4a2f37
      Ingo Molnar committed
      Merge branch 'tip/tracing/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-2.6-trace into tracing/urgent
    • C
      sound: virtuoso: revert "do not overwrite EEPROM on Xonar D2/D2X" · 6ce6c473
      Clemens Ladisch committed
      This reverts commit 7e86c0e6 ("do not
      overwrite EEPROM on Xonar D2/D2X") because it did not actually help with
      the problem.
      
      More user reports show that the overwriting of the EEPROM is not
      triggered by using this driver but by installing Linux, and that the
      installation of any other operating system (even one without any CMI8788
      driver) has the same effect.  In other words, the presence of this
      driver does not have any effect on the occurrence of the error.  (So
      far, the available evidence seems to point to a BIOS bug.)
      
      Furthermore, it turns out that the EEPROM chip is protected against
      stray write commands by the command format and by requiring a separate
      write-enable command, so the error scenario in the previous commit (that
      SPI writes can be misinterpreted as an EEPROM write command) is not even
      theoretically possible.
      
      The mixer control that was removed as a consequence of the previous
      commit can only be partially emulated in userspace, which also means it
      cannot be seen by the in-kernel OSS API emulation, so it is better to
      revert that change.
      Signed-off-by: Clemens Ladisch <clemens@ladisch.de>
      Cc: <stable@kernel.org>
      Signed-off-by: Takashi Iwai <tiwai@suse.de>
    • S
      tracing: limit the number of loops the ring buffer self test can make · 4b3e3d22
      Steven Rostedt committed
      Impact: prevent deadlock if ring buffer gets corrupted
      
      This patch adds a paranoid check to make sure the ring buffer consumer
      does not go into an infinite loop. Since the ring buffer has been set
      to read only, the consumer should not loop for more than the ring buffer
      size. A check is added to make sure the consumer does not loop more than
      the ring buffer size.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
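A bounded consumer loop of the kind described can be sketched as follows. This is not the ring buffer self test itself: the callback type and the names `drain_bounded()`, `consume_counted()` and `consume_forever()` are hypothetical, and the bound is simply a capacity passed in by the caller.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical consumer: returns 1 if an entry was consumed, 0 if empty. */
typedef int (*consume_fn)(void *ctx);

/*
 * Drain the source, but give up once more entries have been seen than the
 * buffer could hold. Returns the number consumed, or -1 on overrun.
 */
static long drain_bounded(consume_fn consume, void *ctx, size_t capacity)
{
	size_t loops = 0;

	assert(consume != NULL);
	while (consume(ctx)) {
		if (++loops > capacity)
			return -1;	/* paranoid check tripped */
	}
	return (long)loops;
}

/* A well-behaved source holding *ctx entries. */
static int consume_counted(void *ctx)
{
	int *left = ctx;

	if (*left == 0)
		return 0;
	(*left)--;
	return 1;
}

/* A corrupted source that never reports empty. */
static int consume_forever(void *ctx)
{
	(void)ctx;
	return 1;
}
```

A healthy source drains normally, while a source that never empties makes the loop bail out instead of spinning forever.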
    • S
      tracing: have function trace select kallsyms · 4d7a077c
      Steven Rostedt committed
      Impact: fix output of function tracer to be useful
      
      The function tracer is pretty useless if KALLSYMS is not configured.
      Unless you are good at reading hex values, the function tracer should
      select the KALLSYMS configuration.
      
      Also, the dynamic function tracer will fail its self test if KALLSYMS
      is not selected.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • S
      tracing: disable tracing while testing ring buffer · 0c5119c1
      Steven Rostedt committed
      Impact: fix to prevent hard lockup on self tests
      
      If one of the tracers is broken and is constantly filling the ring
      buffer while the test of the ring buffer is running, it will hang
      the box. The reason is that the test is a consumer that will not
      stop till the ring buffer is empty. But if the tracer is broken and
      is constantly producing input to the buffer, this test will never
      end. The result is a lockup of the box.
      
      This happened when KALLSYMS was not defined and the dynamic ftrace
      test constantly filled the ring buffer, because the filter failed
      and all functions were being traced. Something was being called
      that constantly filled the buffer.
      Signed-off-by: Steven Rostedt <srostedt@redhat.com>
    • L
      Merge branch 'for-linus' of git://git.kernel.dk/linux-2.6-block · ba95fd47
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.dk/linux-2.6-block:
        block: fix deadlock in blk_abort_queue() for drivers that readd to timeout list
        block: fix booting from partitioned md array
        block: revert part of 18ce3751
        cciss: PCI power management reset for kexec
        paride/pg.c: xs(): &&/|| confusion
        fs/bio: bio_alloc_bioset: pass right object ptr to mempool_free
        block: fix bad definition of BIO_RW_SYNC
        bsg: Fix sense buffer bug in SG_IO
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc · 59af0a0b
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/drzeus/mmc:
        omap_hsmmc: Change while(); loops with finite version
        omap_hsmmc: recover from transfer failures
        omap_hsmmc: only MMC1 allows HCTL.SDVS != 1.8V
        omap_hsmmc: card detect irq bugfix
        sdhci: fix led naming
        mmc_test: fix basic read test
        s3cmci: Fix hangup in do_pio_write()
        Revert "sdhci: force high speed capability on some controllers"
        MMC: fix bug - SDHC card capacity not correct
    • I
      inotify: fix GFP_KERNEL related deadlock · f04b30de
      Ingo Molnar committed
      Enhanced lockdep coverage of __GFP_NOFS turned up this new lockdep
      assert:
      
      [ 1093.677775]
      [ 1093.677781] =================================
      [ 1093.680031] [ INFO: inconsistent lock state ]
      [ 1093.680031] 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] ---------------------------------
      [ 1093.680031] inconsistent {RECLAIM_FS-ON-W} -> {IN-RECLAIM_FS-W} usage.
      [ 1093.680031] kswapd0/308 [HC0[0]:SC0[0]:HE1:SE1] takes:
      [ 1093.680031]  (&inode->inotify_mutex){+.+.?.}, at: [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031] {RECLAIM_FS-ON-W} state was registered at:
      [ 1093.680031]   [<c01696b9>] mark_held_locks+0x43/0x5b
      [ 1093.680031]   [<c016baa4>] lockdep_trace_alloc+0x6c/0x6e
      [ 1093.680031]   [<c01cf8b0>] kmem_cache_alloc+0x20/0x150
      [ 1093.680031]   [<c040d0ec>] idr_pre_get+0x27/0x6c
      [ 1093.680031]   [<c02056e3>] inotify_handle_get_wd+0x25/0xad
      [ 1093.680031]   [<c0205f43>] inotify_add_watch+0x7a/0x129
      [ 1093.680031]   [<c020679e>] sys_inotify_add_watch+0x20f/0x250
      [ 1093.680031]   [<c010389e>] sysenter_do_call+0x12/0x35
      [ 1093.680031]   [<ffffffff>] 0xffffffff
      [ 1093.680031] irq event stamp: 60417
      [ 1093.680031] hardirqs last  enabled at (60417): [<c018d5f5>] call_rcu+0x53/0x59
      [ 1093.680031] hardirqs last disabled at (60416): [<c018d5b9>] call_rcu+0x17/0x59
      [ 1093.680031] softirqs last  enabled at (59656): [<c0146229>] __do_softirq+0x157/0x16b
      [ 1093.680031] softirqs last disabled at (59651): [<c0106293>] do_softirq+0x74/0x15d
      [ 1093.680031]
      [ 1093.680031] other info that might help us debug this:
      [ 1093.680031] 2 locks held by kswapd0/308:
      [ 1093.680031]  #0:  (shrinker_rwsem){++++..}, at: [<c01b0502>] shrink_slab+0x36/0x189
      [ 1093.680031]  #1:  (&type->s_umount_key#4){+++++.}, at: [<c01e6d77>] shrink_dcache_memory+0x110/0x1fb
      [ 1093.680031]
      [ 1093.680031] stack backtrace:
      [ 1093.680031] Pid: 308, comm: kswapd0 Not tainted 2.6.29-rc5-tip-01504-gb49eca1-dirty #1
      [ 1093.680031] Call Trace:
      [ 1093.680031]  [<c016947a>] valid_state+0x12a/0x13d
      [ 1093.680031]  [<c016954e>] mark_lock+0xc1/0x1e9
      [ 1093.680031]  [<c016a5b4>] ? check_usage_forwards+0x0/0x3f
      [ 1093.680031]  [<c016ab74>] __lock_acquire+0x2c6/0xac8
      [ 1093.680031]  [<c01688d9>] ? register_lock_class+0x17/0x228
      [ 1093.680031]  [<c016b3d3>] lock_acquire+0x5d/0x7a
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08824c4>] __mutex_lock_common+0x3a/0x4cb
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c08829ed>] mutex_lock_nested+0x2e/0x36
      [ 1093.680031]  [<c0205942>] ? inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c0205942>] inotify_inode_is_dead+0x20/0x80
      [ 1093.680031]  [<c01e6672>] dentry_iput+0x90/0xc2
      [ 1093.680031]  [<c01e67a3>] d_kill+0x21/0x45
      [ 1093.680031]  [<c01e6a46>] __shrink_dcache_sb+0x27f/0x355
      [ 1093.680031]  [<c01e6dc5>] shrink_dcache_memory+0x15e/0x1fb
      [ 1093.680031]  [<c01b05ed>] shrink_slab+0x121/0x189
      [ 1093.680031]  [<c01b0d12>] kswapd+0x39f/0x561
      [ 1093.680031]  [<c01ae499>] ? isolate_pages_global+0x0/0x233
      [ 1093.680031]  [<c0157eae>] ? autoremove_wake_function+0x0/0x43
      [ 1093.680031]  [<c01b0973>] ? kswapd+0x0/0x561
      [ 1093.680031]  [<c0157daf>] kthread+0x41/0x82
      [ 1093.680031]  [<c0157d6e>] ? kthread+0x0/0x82
      [ 1093.680031]  [<c01043ab>] kernel_thread_helper+0x7/0x10
      
      inotify_handle_get_wd() does idr_pre_get(), which does a
      kmem_cache_alloc() with GFP_KERNEL (that is, with __GFP_FS set), and is
      hence deadlockable under extreme MM pressure.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
      Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: MinChan Kim <minchan.kim@gmail.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      spi-gpio: sanitize MISO bitvalue · be50344e
      Michael Buesch committed
      gpio_get_value() returns 0 or nonzero, but getmiso() expects 0 or 1.
      Sanitize the value to a 0/1 boolean.
      Signed-off-by: Michael Buesch <mb@bu3sch.de>
      Acked-by: David Brownell <dbrownell@users.sourceforge.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
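Why the 0/1 normalization matters can be shown with a small mock. `raw_pin_value()` stands in for gpio_get_value() and returns an arbitrary nonzero value (0x80, an assumption) for a high pin. `getmiso_sketch()` and `shift_in()` are hypothetical helpers, not the driver's code.

```c
#include <assert.h>

/* Stand-in for gpio_get_value(): a high pin reads as some nonzero value. */
static int raw_pin_value(int pin_high)
{
	return pin_high ? 0x80 : 0;
}

/* Sanitized read: "!!" collapses any nonzero value to exactly 1. */
static int getmiso_sketch(int pin_high)
{
	return !!raw_pin_value(pin_high);
}

/* Shift one sampled bit into a receive word, most significant bit first. */
static unsigned int shift_in(unsigned int word, int bit)
{
	return (word << 1) | (unsigned int)bit;
}
```

Shifting the raw value into a word sets a stray high bit, whereas the sanitized value only ever sets bit 0.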
    • B
      Bernhard has moved · 97bef7dd
      Bernhard Walle committed
      Since I don't work for SUSE any more and the bwalle@suse.de address is
      invalid, correct it in the copyright headers and documentation.
      Signed-off-by: Bernhard Walle <bernhard.walle@gmx.de>
      Cc: Greg KH <greg@kroah.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      x86: dell-laptop: depends on POWER_SUPPLY · 310d8c93
      Randy Dunlap committed
      Build breaks when DELL_LAPTOP=y and POWER_SUPPLY=m.  DELL_LAPTOP needs to
      depend on POWER_SUPPLY.
      
      dell-laptop.c:(.text+0x1ef3c4): undefined reference to `power_supply_is_system_supplied'
      dell-laptop.c:(.text+0x1ef45e): undefined reference to `power_supply_is_system_supplied'
      Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
      Cc: Matthew Garrett <mjg59@srcf.ucam.org>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Len Brown <lenb@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • B
      vt: Declare PIO_CMAP/GIO_CMAP as compatible ioctls. · 2db69a93
      Bill Nottingham committed
      Otherwise, these don't work when called from 32-bit userspace on 64-bit
      kernels.
      
      Cc: Jiri Kosina <jkosina@suse.cz>
      Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      fbdev/drm: fix Kconfig submenu mess in "Graphics support" · a1a5c3b9
      Krzysztof Helt committed
      Submenus of the graphics support "Support for frame buffer devices" and
      "Direct Rendering Manager (XFree86 4.1.0 and higher DRI support)" are
      broken in half after latest changes for Intel 915 mode setting support.
      
      The DRM subsection is broken because one option is put outside the choice
      section it depends on.
      
      The frame buffers part is then broken due to a circular dependency.  Fix
      this by making Intel frame buffers depend on CONFIG_INTEL_AGP.
      
      Kconfigs are broken by d2f59357
      ("drm/i915: select framebuffer support automatically").
      
      This is probably not the only way to fix this.
      Signed-off-by: Krzysztof Helt <krzysztof.h1@wp.pl>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      floppy: request and release only the ports we actually use · 5a74db06
      Philippe De Muyter committed
      The floppy driver requests an I/O port it doesn't need, and sometimes this
      causes a conflict with a motherboard device reported by PNPBIOS.
      
      This patch makes the floppy driver request and release only the ports it
      actually uses.  It also factors out the request/release stuff and the
      io-ports list so they're all in one place now.
      
      The current floppy driver uses only these ports:
      
          0x3f2 (FD_DOR)
          0x3f4 (FD_STATUS)
          0x3f5 (FD_DATA)
          0x3f7 (FD_DCR/FD_DIR)
      
      but it requests 0x3f2-0x3f5 and 0x3f7, which includes the unused port
      0x3f3.
      
      Some BIOSes report 0x3f3 as a motherboard resource.  The PNP system driver
      reserves that, which causes a conflict when the floppy driver requests
      0x3f2-0x3f5 later.
      
      Philippe reported that this conflict broke the floppy driver between
      2.6.11 and 2.6.22.  His PNPBIOS reports these devices:
      
          $ cat 00:07/id 00:07/resources	# motherboard device
          PNP0c02
          state = active
          io 0x80-0x80
          io 0x10-0x1f
          io 0x22-0x3f
          io 0x44-0x5f
          io 0x90-0x9f
          io 0xa2-0xbf
          io 0x3f0-0x3f1
          io 0x3f3-0x3f3
      
          $ cat 00:03/id 00:03/resources	# floppy device
          PNP0700
          state = active
          io 0x3f4-0x3f5
          io 0x3f2-0x3f2
      
      Reference:
          http://lkml.org/lkml/2009/1/31/162
      Signed-off-by: Bjorn Helgaas <bjorn.helgaas@hp.com>
      Signed-off-by: Philippe De Muyter <phdm@macqel.be>
      Reported-by: Philippe De Muyter <phdm@macqel.be>
      Tested-by: Philippe De Muyter <phdm@macqel.be>
      Cc: Adam M Belay <abelay@mit.edu>
      Cc: Robert Hancock <hancockrwd@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
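Keeping the port list in one table, as the commit describes, might look like the sketch below. The port numbers and register names are taken from the commit message. The base address 0x3f0, the struct layout and the helper `port_is_requested()` are assumptions for illustration, not the driver's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

#define FDC_BASE_SKETCH 0x3f0	/* assumed controller base address */

/* One contiguous run of ports, relative to the controller base. */
struct io_region_sketch {
	int offset;
	int size;
};

/* Only the ports the driver actually uses; 0x3f3 is deliberately absent. */
static const struct io_region_sketch regions[] = {
	{ 2, 1 },	/* 0x3f2 FD_DOR */
	{ 4, 2 },	/* 0x3f4 FD_STATUS, 0x3f5 FD_DATA */
	{ 7, 1 },	/* 0x3f7 FD_DCR/FD_DIR */
};

/* Returns 1 if the port falls inside one of the listed regions. */
static int port_is_requested(int port)
{
	size_t i;

	for (i = 0; i < sizeof(regions) / sizeof(regions[0]); i++) {
		int start = FDC_BASE_SKETCH + regions[i].offset;

		if (port >= start && port < start + regions[i].size)
			return 1;
	}
	return 0;
}
```

Because request and release would both walk the same table, port 0x3f3 is never claimed and cannot collide with a motherboard resource that the PNP system driver has reserved.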
    • A
      jsm: additional device support · ffa7525c
      Adam Lackorzynski committed
      I have a Digi Neo 8 PCI card (114f:00b1) Serial controller: Digi
      International Digi Neo 8 (rev 05)
      
      that works with the jsm driver after using the following patch.
      Signed-off-by: Adam Lackorzynski <adam@os.inf.tu-dresden.de>
      Cc: Scott H Kilau <Scott_Kilau@digi.com>
      Cc: Wendy Xiong <wendyx@us.ibm.com>
      Acked-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      mm: fix memmap init for handling memory hole · cc2559bc
      KAMEZAWA Hiroyuki committed
      Now, early_pfn_in_nid(PFN, NID) may return false if PFN is a hole,
      in which case memmap initialization was not done. This caused trouble
      for sparc boot.

      To fix this, the PFN should be initialized and marked as PG_reserved.
      This patch changes early_pfn_in_nid() to return true if PFN is a hole.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemlloft.net>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
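The changed return rule can be sketched with a tiny lookup table. The PFN ranges and every name ending in `_sketch` are made up for illustration. Only the rule itself, that a hole counts as belonging to whichever node is asked about, is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical early node map: [start, end) PFN ranges with their node id. */
struct pfn_range_sketch {
	unsigned long start;
	unsigned long end;
	int nid;
};

static const struct pfn_range_sketch node_map[] = {
	{ 0x000, 0x100, 0 },
	{ 0x200, 0x300, 1 },	/* 0x100..0x1ff is a hole */
};

/* Returns the node id of a PFN, or -1 if the PFN lies in a hole. */
static int pfn_to_nid_sketch(unsigned long pfn)
{
	size_t i;

	for (i = 0; i < sizeof(node_map) / sizeof(node_map[0]); i++) {
		if (pfn >= node_map[i].start && pfn < node_map[i].end)
			return node_map[i].nid;
	}
	return -1;
}

/* Only a PFN known to belong to a different node is rejected. */
static int early_pfn_in_nid_sketch(unsigned long pfn, int node)
{
	int nid = pfn_to_nid_sketch(pfn);

	if (nid >= 0 && nid != node)
		return 0;
	return 1;	/* matching node, or a hole */
}
```

A PFN inside the hole is accepted for either node, so its struct page would still be initialized and could be marked reserved.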
    • K
      mm: clean up for early_pfn_to_nid() · f2dbcfa7
      KAMEZAWA Hiroyuki committed
      What's happening is that the assertion in mm/page_alloc.c:move_freepages()
      is triggering:
      
      	BUG_ON(page_zone(start_page) != page_zone(end_page));
      
      Once I knew this is what was happening, I added some annotations:
      
      	if (unlikely(page_zone(start_page) != page_zone(end_page))) {
      		printk(KERN_ERR "move_freepages: Bogus zones: "
      		       "start_page[%p] end_page[%p] zone[%p]\n",
      		       start_page, end_page, zone);
      		printk(KERN_ERR "move_freepages: "
      		       "start_zone[%p] end_zone[%p]\n",
      		       page_zone(start_page), page_zone(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_pfn[0x%lx] end_pfn[0x%lx]\n",
      		       page_to_pfn(start_page), page_to_pfn(end_page));
      		printk(KERN_ERR "move_freepages: "
      		       "start_nid[%d] end_nid[%d]\n",
      		       page_to_nid(start_page), page_to_nid(end_page));
       ...
      
      And here's what I got:
      
      	move_freepages: Bogus zones: start_page[2207d0000] end_page[2207dffc0] zone[fffff8103effcb00]
      	move_freepages: start_zone[fffff8103effcb00] end_zone[fffff8003fffeb00]
      	move_freepages: start_pfn[0x81f600] end_pfn[0x81f7ff]
      	move_freepages: start_nid[1] end_nid[0]
      
      My memory layout on this box is:
      
      [    0.000000] Zone PFN ranges:
      [    0.000000]   Normal   0x00000000 -> 0x0081ff5d
      [    0.000000] Movable zone start PFN for each node
      [    0.000000] early_node_map[8] active PFN ranges
      [    0.000000]     0: 0x00000000 -> 0x00020000
      [    0.000000]     1: 0x00800000 -> 0x0081f7ff
      [    0.000000]     1: 0x0081f800 -> 0x0081fe50
      [    0.000000]     1: 0x0081fed1 -> 0x0081fed8
      [    0.000000]     1: 0x0081feda -> 0x0081fedb
      [    0.000000]     1: 0x0081fedd -> 0x0081fee5
      [    0.000000]     1: 0x0081fee7 -> 0x0081ff51
      [    0.000000]     1: 0x0081ff59 -> 0x0081ff5d
      
      So it's a block move in that 0x81f600-->0x81f7ff region which triggers
      the problem.
      
      This patch:
      
      Declarations of early_pfn_to_nid() are scattered over per-arch include
      files, and it is complicated to know when each declaration is used.
      I think this makes the memmap init fix harder than it needs to be.

      This patch moves all declarations to include/linux/mm.h.
      
      After this,
        if !CONFIG_NODES_POPULATES_NODE_MAP && !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use static definition in include/linux/mm.h
        else if !CONFIG_HAVE_ARCH_EARLY_PFN_TO_NID
           -> Use generic definition in mm/page_alloc.c
        else
           -> per-arch back end function will be called.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Tested-by: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Reported-by: David Miller <davem@davemlloft.net>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: <stable@kernel.org>		[2.6.25.x, 2.6.26.x, 2.6.27.x, 2.6.28.x]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • P
      fs/super.c: add lockdep annotation to s_umount · ada723dc
      Peter Zijlstra committed
      Li Zefan said:
      
      Thread 1:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            cat /mnt/cpus > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      Thread 2:
        for ((; ;))
        {
            mount -t cpuset xxx /mnt > /dev/null 2>&1
            umount /mnt > /dev/null 2>&1
        }
      
      (Note: It is irrelevant which cgroup subsys is used.)
      
      After a while a lockdep warning showed up:
      
      =============================================
      [ INFO: possible recursive locking detected ]
      2.6.28 #479
      ---------------------------------------------
      mount/13554 is trying to acquire lock:
       (&type->s_umount_key#19){--..}, at: [<c049d888>] sget+0x5e/0x321
      
      but task is already holding lock:
       (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      other info that might help us debug this:
      1 lock held by mount/13554:
       #0:  (&type->s_umount_key#19){--..}, at: [<c049da0c>] sget+0x1e2/0x321
      
      stack backtrace:
      Pid: 13554, comm: mount Not tainted 2.6.28-mc #479
      Call Trace:
       [<c044ad2e>] validate_chain+0x4c6/0xbbd
       [<c044ba9b>] __lock_acquire+0x676/0x700
       [<c044bb82>] lock_acquire+0x5d/0x7a
       [<c049d888>] ? sget+0x5e/0x321
       [<c061b9b8>] down_write+0x34/0x50
       [<c049d888>] ? sget+0x5e/0x321
       [<c049d888>] sget+0x5e/0x321
       [<c045a2e7>] ? cgroup_set_super+0x0/0x3e
       [<c045959f>] ? cgroup_test_super+0x0/0x2f
       [<c045bcea>] cgroup_get_sb+0x98/0x2e7
       [<c045cfb6>] cpuset_get_sb+0x4a/0x5f
       [<c049dfa4>] vfs_kern_mount+0x40/0x7b
       [<c049e02d>] do_kern_mount+0x37/0xbf
       [<c04af4a0>] do_mount+0x5c3/0x61a
       [<c04addd2>] ? copy_mount_options+0x2c/0x111
       [<c04af560>] sys_mount+0x69/0xa0
       [<c0403251>] sysenter_do_call+0x12/0x31
      
      The cause is that, after alloc_super() and the subsequent retry, an old
      entry is found in the fs_supers list, so grab_super(old) is called, but
      both functions take the s_umount lock:
      
      struct super_block *sget(...)
      {
      	...
      retry:
      	spin_lock(&sb_lock);
      	if (test) {
      		list_for_each_entry(old, &type->fs_supers, s_instances) {
      			if (!test(old, data))
      				continue;
      			if (!grab_super(old))  <--- 2nd: down_write(&old->s_umount);
      				goto retry;
      			if (s)
      				destroy_super(s);
      			return old;
      		}
      	}
      	if (!s) {
      		spin_unlock(&sb_lock);
      		s = alloc_super(type);   <--- 1st: down_write(&s->s_umount)
      		if (!s)
      			return ERR_PTR(-ENOMEM);
      		goto retry;
      	}
      	...
      }
      
      It seems like a false positive, and it seems that VFS rather than cgroup
      needs to be fixed.
      
      Peter said:
      
      We can simply put the new s_umount instance in a different subclass;
      lockdep doesn't particularly care about subclass order.
      
      If there's any issue with the callers of sget() assuming the s_umount lock
      being of subclass 0, then there is another annotation we can use to fix
      that, but let's not bother with that if this is sufficient.
      
      Addresses http://bugzilla.kernel.org/show_bug.cgi?id=12673
      Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Tested-by: Li Zefan <lizf@cn.fujitsu.com>
      Reported-by: Li Zefan <lizf@cn.fujitsu.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Paul Menage <menage@google.com>
      Cc: Arjan van de Ven <arjan@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>