1. 05 Apr, 2008 18 commits
    • memory controller: make memory resource control aware of boot options · 4077960e
      Balbir Singh committed
      A boot option for the memory controller was discussed on lkml.  It is a good
      idea to add it, since it saves memory for people who want to turn off the
      memory controller.
      
      By default the option is on for the following two reasons:
      
      1. It provides compatibility with the current scheme where the memory
         controller turns on if the config option is enabled
      2. It allows for wider testing of the memory controller, once the config
         option is enabled
      
      We still allow the create, destroy callbacks to succeed, since they are not
      aware of boot options.  We do not populate the directory with memory resource
      controller specific files.
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • cgroups: add cgroup support for enabling controllers at boot time · 8bab8dde
      Paul Menage committed
      The effects of cgroup_disable=foo are:
      
      - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
      - foo isn't visible as an individually mountable subsystem
      
      As a result there will only ever be one call to foo->create(), at init time;
      all processes will stay in this group, and the group will never be mounted on
      a visible hierarchy.  Any additional effects (e.g.  not allocating metadata)
      are up to the foo subsystem.
      
      This doesn't handle early_init subsystems (their "disabled" bit isn't set),
      but it could easily be extended to do so if any of the early_init systems
      wanted it - I think it would just involve some nastier parameter processing
      since it would occur before the command-line argument parser had been run.
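
As a rough illustration of the boot option described above, the following userspace sketch marks named controllers as disabled. It is not the kernel's implementation: `struct subsys`, `apply_cgroup_disable()` and the comma-separated list syntax are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a controller descriptor; not the kernel's struct. */
struct subsys {
    const char *name;
    bool disabled;
};

/*
 * Mark every subsystem named in a comma-separated list as disabled.
 * Unknown names and empty tokens are ignored.
 * Returns the number of subsystems that were newly disabled.
 */
int apply_cgroup_disable(const char *arg, struct subsys *tbl, size_t n)
{
    int hits = 0;

    assert(arg != NULL && tbl != NULL);
    while (*arg != '\0') {
        size_t len = strcspn(arg, ",");   /* length of the current token */

        for (size_t i = 0; len > 0 && i < n; i++) {
            if (strlen(tbl[i].name) == len &&
                strncmp(tbl[i].name, arg, len) == 0 && !tbl[i].disabled) {
                tbl[i].disabled = true;
                hits++;
            }
        }
        arg += len;
        if (*arg == ',')
            arg++;                        /* skip the separator */
    }
    return hits;
}
```

A subsystem that sees its `disabled` flag set could then skip its own metadata allocation, which is the part the message leaves "up to the foo subsystem".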
      
      Hugh said:
      
        Ballpark figures, I'm trying to get this question out rather than
        processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
        to the affected paths, booting with cgroup_disable=memory cuts that back to
        1% overhead (due to slightly bigger struct page).
      
        I'm no expert on distros, they may have no interest whatever in
        CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
        without it, or apply the cgroup_disable=memory patches.
      
      Unix bench's execl test result on x86_64 was
      
      == just after boot without mounting any cgroup fs.==
      mem_cgroup=off : Execl Throughput       43.0     3150.1      732.6
      mem_cgroup=on  : Execl Throughput       43.0     2932.6      682.0
      ==
      
      [lizf@cn.fujitsu.com: fix boot option parsing]
      Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: Li Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86 · 3a143125
      Linus Torvalds committed
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/x86/linux-2.6-x86:
        x86: revert assign IRQs to hpet timer
        x86: tsc prevent time going backwards
        xen: Clear PG_pinned in release_{pt,pd}()
        xen: Do not pin/unpin PMD pages
        xen: refactor xen_{alloc,release}_{pt,pd}()
        x86, agpgart: scary messages are fortunately obsolete
        xen: fix grant table bug
        x86: fix breakage of vSMP irq operations
        x86: print message if nmi_watchdog=2 cannot be enabled
        x86: fix nmi_watchdog=2 on Pentium-D CPUs
    • m68k: update defconfigs for 2.6.25 · a1aa758d
      Geert Uytterhoeven committed
      Long overdue update of the m68k defconfigs
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • m68k: use KBUILD_DEFCONFIG · ef85ecbf
      Adrian Bunk committed
      The default defconfig should be one from arch/m68k/configs/
      
      arch/m68k/defconfig was not exactly identical to amiga_defconfig but
      also considering how long they have been without any update that doesn't
      seem to have been on purpose.
      Signed-off-by: Adrian Bunk <adrian.bunk@movial.fi>
      Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • Merge branch 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev · 7a5ac8de
      Linus Torvalds committed
      * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev:
        pata_ali: disable ATAPI DMA
        libata: ATA_12/16 doesn't fall into ATAPI_MISC
        libata: uninline atapi_cmd_type()
        libata: fix IDENTIFY order in ata_bus_probe()
    • Be more careful about marking buffers dirty · 1be62dc1
      Linus Torvalds committed
      Mikulas Patocka noted that the optimization where we check if a buffer
      was already dirty (and we avoid re-dirtying it) was not really SMP-safe.
      
      Since the read of the old status was not synchronized with anything, an
      aggressive CPU re-ordering of memory accesses might have moved that read
      up to before the data was even written to the buffer, and another CPU
      that cleaned it again, causing the newly dirty state to never actually
      hit the disk.
      
      Admittedly this would probably never trigger in practice, but it's still
      wrong.
      
      Mikulas sent a patch that fixed the problem, but I dislike the subtlety
      of the whole optimization, so this is an alternate fix that is more
      explicit about the particular SMP ordering for the optimization, and
      separates out the speculative reads of the buffer state into its own
      conditional (and makes the memory barrier only happen if we are likely
      to actually hit the optimized case in the first place).
      
      I considered removing the optimization entirely, but Andrew argued for
      its continued existence. I'm a push-over.
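
The shape of the fix described above can be sketched in userspace with C11 atomics. This is an analogy under stated assumptions, not the kernel code: `struct buf`, `BUF_DIRTY` and `buf_mark_dirty()` are hypothetical names, and a sequentially consistent fence stands in for the kernel's SMP memory barrier.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define BUF_DIRTY 0x1u   /* hypothetical flag bit for this sketch */

struct buf {
    atomic_uint state;
};

/*
 * Returns true if this call changed the buffer from clean to dirty,
 * which is when a caller would have to queue it for writeback.
 */
bool buf_mark_dirty(struct buf *b)
{
    assert(b != NULL);

    /* Speculative, unordered read: only decides whether the fast path is likely. */
    if (atomic_load_explicit(&b->state, memory_order_relaxed) & BUF_DIRTY) {
        /* Order the caller's earlier data writes before the re-check below. */
        atomic_thread_fence(memory_order_seq_cst);
        if (atomic_load_explicit(&b->state, memory_order_relaxed) & BUF_DIRTY)
            return false;   /* still dirty after the barrier: nothing to do */
    }

    /* Slow path: atomically set the bit and report whether it was clear before. */
    unsigned old = atomic_fetch_or(&b->state, BUF_DIRTY);
    return !(old & BUF_DIRTY);
}
```

The barrier is only paid when the first read already saw the dirty bit, which matches the message's point that the cost should fall on the optimized case alone.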
      
      Cc: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • parport_pc: make sure to release IO ports after probing for IT87XX · 4ed91901
      Linus Torvalds committed
      Commit f63fd7e2 ("parport_pc: detection
      for SuperIO IT87XX POST") only released the IO port region on success,
      not when the probe for the IT87XX chip failed.
      
      That caused not only a reserved region to leak, but also caused an oops
      when the driver module was unloaded and somebody tried to cat
      /proc/ioports - because the string that was assigned to the IO port
      region was a static string in the module virtual address area.
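
The leak described above is the classic "release only on the success path" mistake. A minimal sketch of the corrected shape follows, with a counter standing in for the kernel's I/O port resource tree; `region_request()`, `region_release()` and `probe_chip()` are hypothetical helpers, not the driver's functions.

```c
#include <assert.h>
#include <stdbool.h>

int regions_held;   /* stand-in for the set of reserved I/O port regions */

bool region_request(void)
{
    regions_held++;
    return true;
}

void region_release(void)
{
    assert(regions_held > 0);
    regions_held--;
}

/* Probe helper: the region is released on every exit path, not only on success. */
bool probe_chip(bool chip_present)
{
    bool found = false;

    if (!region_request())
        return false;
    if (!chip_present)
        goto out;           /* the failure path still reaches the release below */
    found = true;
out:
    region_release();
    return found;
}
```

Funnelling both outcomes through one `out:` label keeps the reservation from outliving the probe.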
      Reported-by: Lubos Lunak <l.lunak@suse.cz>
      Cc: Jan Kara <jack@suse.cz>
      Cc: Petr Cvek <petr.cvek@tul.cz>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • x86: revert assign IRQs to hpet timer · 5761d64b
      Thomas Gleixner committed
      The commits:
      
      commit 37a47db8
      Author: Balaji Rao <balajirrao@gmail.com>
      Date:   Wed Jan 30 13:30:03 2008 +0100
      
          x86: assign IRQs to HPET timers, fix
      
      and
      
      commit e3f37a54
      Author: Balaji Rao <balajirrao@gmail.com>
      Date:   Wed Jan 30 13:30:03 2008 +0100
      
          x86: assign IRQs to HPET timers
      
      have been identified to cause a regression on some platforms due to
      the assignment of legacy IRQs which makes the legacy devices
      connected to those IRQs dysfunctional.

      Revert them.

      This fixes http://bugzilla.kernel.org/show_bug.cgi?id=10382
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: tsc prevent time going backwards · 47001d60
      Thomas Gleixner committed
      We already catch most of the TSC problems by sanity checks, but there
      is a subtle bug which has been in the code for ever. This can cause
      time jumps in the range of hours.
      
      This was reported in:
           http://lkml.org/lkml/2007/8/23/96
      and
           http://lkml.org/lkml/2008/3/31/23
      
      I was able to reproduce the problem with a gettimeofday loop test on a
      dual core and a quad core machine which both have synchronized
      TSCs. The TSCs seem not to be perfectly in sync though, but the
      kernel is not able to detect the slight delta in the sync check. Still
      there exists an extremely small window where this delta can be observed
      with a real big time jump. So far I was only able to reproduce this
      with the vsyscall gettimeofday implementation, but in theory this
      might be observable with the syscall based version as well.

      CPU0 updates the clock source variables under xtime/vsyscall lock and
      CPU1, where the TSC is slightly behind CPU0, is reading the time right
      after the seqlock was unlocked.
      
      The clocksource reference data was updated with the TSC from CPU0 and
      the value which is read from TSC on CPU1 is less than the reference
      data. This results in a huge delta value due to the unsigned
      subtraction of the TSC value and the reference value. This algorithm
      can not be changed due to the support of wrapping clock sources like
      pm timer.
      
      The huge delta is converted to nanoseconds and added to xtime, which
      is then observable by the caller. The next gettimeofday call on CPU1
      will show the correct time again as now the TSC has advanced above the
      reference value.
      
      To prevent this TSC specific wreckage we need to compare the TSC value
      against the reference value and return the latter when it is larger
      than the actual TSC value.
      
      I pondered to mark the TSC unstable when the readout is smaller than
      the reference value, but this would render an otherwise good and fast
      clocksource unusable without a real good reason.
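
The arithmetic behind the time jump, and the clamp that avoids it, can be shown with a small sketch. The function names are hypothetical and the 24-bit mask in the usage note is only an example of a wrapping counter; none of this is the kernel's clocksource code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Delta as a clocksource core might compute it: an unsigned subtraction
 * masked to the counter width. This is what lets wrapping counters work,
 * and also what turns "now slightly behind last" into a huge value.
 */
uint64_t cycles_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    assert(mask != 0);
    return (now - last) & mask;
}

/* Clamped read: never hand back a counter value behind the reference. */
uint64_t read_clamped(uint64_t raw, uint64_t last)
{
    return raw >= last ? raw : last;
}
```

With a full 64-bit mask, a raw value 5 cycles behind the reference yields a delta of `UINT64_MAX - 4`, whereas passing the raw value through `read_clamped()` first yields 0. With a 24-bit mask, `cycles_delta(5, 0xFFFFFE, 0xFFFFFF)` is 7, which is the legitimate wrap-around case the subtraction has to keep supporting.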
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: Clear PG_pinned in release_{pt,pd}() · c946c7de
      Mark McLoughlin committed
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: xen-devel@lists.xensource.com
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: Do not pin/unpin PMD pages · a684d69d
      Mark McLoughlin committed
      i.e. with this simple test case:
      
          int fd = open("/dev/zero", O_RDONLY);
          munmap(mmap((void *)0x40000000, 0x1000, PROT_READ, MAP_PRIVATE, fd, 0), 0x1000);
          close(fd);
      
      we currently get:
      
         kernel BUG at arch/x86/xen/enlighten.c:678!
         ...
         EIP is at xen_release_pt+0x79/0xa9
         ...
         Call Trace:
          [<c041da25>] ? __pmd_free_tlb+0x1a/0x75
          [<c047a192>] ? free_pgd_range+0x1d2/0x2b5
          [<c047a2f3>] ? free_pgtables+0x7e/0x93
          [<c047b272>] ? unmap_region+0xb9/0xf5
          [<c047c1bd>] ? do_munmap+0x193/0x1f5
          [<c047c24f>] ? sys_munmap+0x30/0x3f
          [<c0408cce>] ? syscall_call+0x7/0xb
          =======================
      
      and xen complains:
      
        (XEN) mm.c:2241:d4 Mfn 1cc37 not pinned
      
      Further details at:
      
        https://bugzilla.redhat.com/436453
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: xen-devel@lists.xensource.com
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: refactor xen_{alloc,release}_{pt,pd}() · f6433706
      Mark McLoughlin committed
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Cc: xen-devel@lists.xensource.com
      Cc: Mark McLoughlin <markmc@redhat.com>
      Cc: Jeremy Fitzhardinge <jeremy@goop.org>
      Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86, agpgart: scary messages are fortunately obsolete · 8f59610d
      Pavel Machek committed
      Fix obsolete printks in aperture-64. We used not to handle missing
      agpgart, but we handle it okay now.
      Signed-off-by: Pavel Machek <pavel@suse.cz>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • xen: fix grant table bug · bbc60c18
      Michael Abd-El-Malek committed
      fix memory corruption and crash due to mis-sized grant table.
      
      A PV OS has two grant table data structures: the grant table itself
      and a free list.  The free list is composed of an array of pages,
      which grow dynamically as the guest OS requires more grants.  While
      the grant table contains 8-byte entries, the free list contains 4-byte
      entries.  So we have half as many pages in the free list as in the
      grant table.
      
      There was a bug in the free list allocation code. The free list was
      indexed as if it was the same size as the grant table.  But it's only
      half as large.  So memory got corrupted, and I was seeing crashes in
      the slab allocator later on.
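
The sizing mismatch can be made concrete with a short sketch. The 4096-byte page size and all names here are assumptions for illustration; only the 8-byte and 4-byte entry sizes come from the message.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_BYTES        4096u                    /* assumed page size */
#define GRANTS_PER_TABLE_PAGE  (PAGE_SIZE_BYTES / 8u)   /* 8-byte table entries */
#define REFS_PER_FREELIST_PAGE (PAGE_SIZE_BYTES / 4u)   /* 4-byte free-list entries */

/* Free-list pages needed to cover every grant in table_pages table pages. */
size_t freelist_pages_needed(size_t table_pages)
{
    size_t grants;

    assert(table_pages > 0);
    grants = table_pages * GRANTS_PER_TABLE_PAGE;
    return (grants + REFS_PER_FREELIST_PAGE - 1) / REFS_PER_FREELIST_PAGE;
}

/* Correct: locate a grant reference using the free list's own entry density. */
size_t freelist_page_of(size_t ref)
{
    return ref / REFS_PER_FREELIST_PAGE;
}

/* Buggy shape: indexes the free list as if it were as large as the grant table. */
size_t buggy_page_of(size_t ref)
{
    return ref / GRANTS_PER_TABLE_PAGE;
}
```

For a 4-page grant table the free list has 2 pages; the last reference (2047) belongs on free-list page 1, while the buggy index points at page 3, past the end of the array.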
      
      Taken from:
      
        http://xenbits.xensource.com/linux-2.6.18-xen.hg?rev/4018c0da3360
      Signed-off-by: Michael Abd-El-Malek <mabdelmalek@cmu.edu>
      Signed-off-by: Mark McLoughlin <markmc@redhat.com>
      Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix breakage of vSMP irq operations · bae1d250
      Ravikiran G Thirumalai committed
      25-rc* stopped working with CONFIG_X86_VSMP on vSMP machines.
      
      Looks like the vsmp irq ops got accidentally removed during merge of x86_64
      pvops in 2.6.25. -- commit 6abcd98f removed
      vsmp irq ops.
      
      Tested with both CONFIG_X86_VSMP and without CONFIG_X86_VSMP, on vSMP and non
      vSMP x86_64 machines.
      
      Please apply.
      Signed-off-by: Ravikiran Thirumalai <kiran@scalex86.org>
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: print message if nmi_watchdog=2 cannot be enabled · 9c9b81f7
      Ingo Molnar committed
      right now if there's no CPU support for nmi_watchdog=2 we'll just
      refuse it silently.
      
      print a useful warning.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
    • x86: fix nmi_watchdog=2 on Pentium-D CPUs · 4f14bdef
      Ingo Molnar committed
      implement nmi_watchdog=2 on this class of CPUs:
      
        cpu family      : 15
        model           : 6
        model name      : Intel(R) Pentium(R) D CPU 3.00GHz
      
      the watchdog's ->setup() method is safe anyway, so if the CPU
      cannot support it we'll bail out safely.
      Signed-off-by: Ingo Molnar <mingo@elte.hu>
  2. 04 Apr, 2008 12 commits
  3. 03 Apr, 2008 10 commits
    • [POWERPC] Fix MPC5200 (not B!) device tree so FEC ethernet works · 8d813941
      René Bürgel committed
      This gets the FEC ethernet driver working again on the lite5200
      platform.
      
      The FEC driver is also compatible with the MPC5200, not only with the
      MPC5200B, so this adds a suitable entry to the driver's match list.
      Furthermore this adds the settings for the PHY in the dts file for the
      Lite5200.  Note, that this is not exactly the same as in the
      Lite5200B, because the PHY is located at f0003000:01 for the 5200, and
      at :00 for the 5200B.  This was tested on a Lite5200 and a Lite5200B,
      both booted a kernel via tftp and mounted the root via nfs
      successfully.
      Signed-off-by: René Bürgel <r.buergel@unicontrol.de>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] mpc5200: Amalgamated DTS fixes and updates · 115e1adc
      Bartlomiej Sieka committed
      DTS updates that fix booting problems on mpc5200-based boards:
      - change to ethernet reg property
      - addition of mdio and phy nodes
      - removal of pci node (Motion-Pro board)
      
      Other DTS updates:
      - update i2c device tree nodes
      - add lpb bus node and flash device (without partitions defined)
      - add rtc i2c nodes
      Signed-off-by: Marian Balakowicz <m8@semihalf.com>
      Acked-by: Grant Likely <grant.likely@secretlab.ca>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Fix rtas_flash procfs interface · 74848398
      Maxim Shchetynin committed
      Handling of the proc_dir_entry->count was changed in 2.6.24-rc5.
      After this change, the default value for pde->count is 1 and not 0 as
      before.  Therefore, if we want to check whether our procfs file is
      already opened (already in use), we have to check if pde->count is
      greater than 2 rather than 1.
      Signed-off-by: Maxim Shchetynin <maxim@de.ibm.com>
      Signed-off-by: Jens Osterkamp <jens@de.ibm.com>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Fix deadlock with mmu_hash_lock in hash_page_sync · b991f05f
      Benjamin Herrenschmidt committed
      hash_page_sync() takes and releases the low level mmu hash
      lock in order to sync with other processors disposing of page
      tables.  Because that lock can be needed to service hash misses
      triggered by interrupt handlers, taking it must be done with
      interrupts off.  However, hash_page_sync() appears to be called
      with interrupts enabled, thus causing occasional deadlocks.
      
      We fix it by making sure hash_page_sync() masks interrupts while
      holding the lock.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Fix iSeries hard irq enabling regression · ff3da2e0
      Benjamin Herrenschmidt committed
      A subtle bug sneaked into iSeries recently.  On this platform, we must
      not normally clear MSR:EE (the hardware external interrupt enable)
      except for short periods of time.  Taking an interrupt while
      soft-disabled doesn't cause us to clear it for example.
      
      The iSeries kernel expects to mostly run with MSR:EE enabled at all
      times except in a few exception entry/exit code paths.  Thus
      local_irq_enable() doesn't check if it needs to hard-enable as it
      expects this to be unnecessary on iSeries.
      
      However, hard_irq_disable() _does_ cause MSR:EE to be cleared,
      including on iSeries.  A call to it was recently added to the
      context switch code, thus causing interrupts to become disabled
      for long periods of time, causing the iSeries watchdog to kick
      in under some circumstances and other nasty things.
      
      This patch fixes it by making local_irq_enable() properly re-enable
      MSR:EE on iSeries.  It basically removes a return statement here
      to make iSeries use the same code path as everybody else.  That does
      mean that we might occasionally get spurious decrementer interrupts
      but I don't think that matters.
      
      Another option would have been to make hard_irq_disable() a nop
      on iSeries but I didn't like it much, in case we have good reasons
      to hard-disable.
      
      Part of the patch is fixes to make sure the hard_enabled PACA field
      is properly set on iSeries as it used not to be before, since it
      was mostly unused.
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Signed-off-by: Paul Mackerras <paulus@samba.org>
    • [POWERPC] Fix CPM2 SCC1 clock initialization. · 025306f3
      Laurent Pinchart committed
      A missing break statement in a switch caused cpm2_clk_setup() to initialize
      SCC2 instead of SCC1.
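
A missing `break` makes one `case` fall through into the next, so the later assignment wins. The sketch below shows the general shape only; the enum, the function names and the returned values are hypothetical and do not reproduce `cpm2_clk_setup()`.

```c
#include <assert.h>

enum clk_target { TARGET_SCC1, TARGET_SCC2 };   /* hypothetical names */

/* Buggy shape: TARGET_SCC1 falls through and ends up configured as SCC2. */
int route_buggy(enum clk_target t)
{
    int configured = -1;

    switch (t) {
    case TARGET_SCC1:
        configured = 1;
        /* no break here: execution continues into the next case */
    case TARGET_SCC2:
        configured = 2;
        break;
    default:
        assert(0 && "unknown target");
    }
    return configured;
}

/* Fixed shape: each case ends with its own break. */
int route_fixed(enum clk_target t)
{
    int configured = -1;

    switch (t) {
    case TARGET_SCC1:
        configured = 1;
        break;
    case TARGET_SCC2:
        configured = 2;
        break;
    default:
        assert(0 && "unknown target");
    }
    return configured;
}
```

Asking for `TARGET_SCC1` returns 2 from `route_buggy()` and 1 from `route_fixed()`.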
      Signed-off-by: Laurent Pinchart <laurentp@cse-semaphore.com>
      Signed-off-by: Kumar Gala <galak@kernel.crashing.org>
    • Merge git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6 · 9597362d
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb-2.6:
        USB: ohci: fix 2 timers to fire at jiffies + 1s
        USB: Allow initialization of broken keyspan serial adapters.
        USB: fix bug in sg initialization in usbtest
        USB: serial: fix regression in Visor/Palm OS module for kernels >= 2.6.24
        USB: cp2101: Add identifiers for the Telegesys ETRX2USB
        USB: serial: ti_usb_3410_5052: Correct TUSB3410 endpoint requirements.
        USB: another ehci_iaa_watchdog fix
    • alpha: get_current(): don't add zero to current_thread_info()->task · 06f11f37
      Andrew Morton committed
      A nasty compile error:
      
      In file included from security/keys/internal.h:16,
                       from security/keys/sysctl.c:14:
      include/linux/key-ui.h: In function 'key_permission':
      include/linux/key-ui.h:51: error: invalid use of undefined type 'struct task_struct'
      
      apparently the compiler has decided that it needs to know sizeof(task_struct)
      so that it can add zero to a task_struct* (which is rather dumb of it).
      
      Getting task_struct in scope in these deeply-nested headers is scary-looking,
      so let's just remove the "+ 0".
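
For context, C only allows pointer addition on pointers to complete object types, so even `+ 0` needs the full structure definition in scope. A minimal sketch with hypothetical stand-in types (not the alpha headers):

```c
#include <assert.h>
#include <stddef.h>

struct task;                      /* declared but not defined here: incomplete */

struct thread_info {
    struct task *task;
};

/* Fine: copying a pointer to an incomplete type needs no size information. */
struct task *current_task(const struct thread_info *ti)
{
    assert(ti != NULL);
    return ti->task;
}

/*
 * Rejected while struct task is incomplete, because pointer arithmetic
 * has to scale by sizeof(struct task):
 *
 *     return ti->task + 0;
 */
```

Dropping the addition leaves a plain pointer copy, which compiles without pulling the full definition into deeply nested headers.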
      
      Cc: David Howells <dhowells@redhat.com>
      Cc: Richard Henderson <rth@twiddle.net>
      Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • markers: use synchronize_sched() · 6496968e
      Mathieu Desnoyers committed
      Markers do not mix well with CONFIG_PREEMPT_RCU because it uses
      preempt_disable/enable() and not rcu_read_lock/unlock for minimal
      intrusiveness.  We would need call_sched and sched_barrier primitives.
      
      Currently, the modification (connection and disconnection) of probes
      from markers requires changes to the data structure done in RCU-style :
      a new data structure is created, the pointer is changed atomically, a
      quiescent state is reached and then the old data structure is freed.
      
      The quiescent state is reached once all the currently running
      preempt_disable regions are done running.  We use the call_rcu mechanism
      to execute kfree() after such quiescent state has been reached.
      However, the new CONFIG_PREEMPT_RCU version of call_rcu and rcu_barrier
      does not guarantee that all preempt_disable code regions have finished,
      hence the race.
      
      The "proper" way to do this is to use rcu_read_lock/unlock, but we don't
      want to use it to minimize intrusiveness on the traced system.  (we do
      not want the marker code to call into much of the OS code, because it
      would quickly restrict what can and cannot be instrumented, such as the
      scheduler).
      
      The temporary fix, until we get call_rcu_sched and rcu_barrier_sched in
      mainline, is to use synchronize_sched before each call_rcu call, so we
      wait for the quiescent state in the system call code path.  It will slow
      down batch marker enable/disable, but will make sure the race is gone.
      Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
      Acked-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • vmcoreinfo: add the symbol "phys_base" · 629c8b4c
      Ken'ichi Ohmichi committed
      Fix the problem that makedumpfile sometimes fails on x86_64 machine.
      
      This patch adds the symbol "phys_base" to a vmcoreinfo data.  The
      vmcoreinfo data has the minimum debugging information only for dump
      filtering.  makedumpfile (dump filtering command) gets it to distinguish
      unnecessary pages, and makedumpfile creates a small dumpfile.
      
      On an x86_64 kernel compiled with CONFIG_PHYSICAL_START=0x0 and
      CONFIG_RELOCATABLE=y, makedumpfile fails like the following:
      
       # makedumpfile -d31 /proc/vmcore dumpfile
       The kernel version is not supported.
       The created dumpfile may be incomplete.
       _exclude_free_page: Can't get next online node.
      
       makedumpfile Failed.
       #
      
      The cause is the lack of the symbol "phys_base" in a vmcoreinfo data.
      If the symbol "phys_base" does not exist, makedumpfile considers an
      x86_64 kernel as non relocatable.  As the result, makedumpfile
      misunderstands the physical address where the kernel is loaded, and it
      cannot translate a kernel virtual address to physical address correctly.
      
      To fix this problem, this patch adds the symbol "phys_base" to a
      vmcoreinfo data.
      Signed-off-by: Ken'ichi Ohmichi <oomichi@mxs.nes.nec.co.jp>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: <stable@kernel.org>
      Acked-by: Vivek Goyal <vgoyal@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>