1. 30 5月, 2011 3 次提交
  2. 29 5月, 2011 34 次提交
    • L
      x86 idle: deprecate mwait_idle() and "idle=mwait" cmdline param · 5d4c47e0
      Len Brown 提交于
      mwait_idle() is a C1-only idle loop intended to be more efficient
      than HLT on SMP hardware that supports it.
      
      But mwait_idle() has been replaced by the more general
      mwait_idle_with_hints(), which handles both C1 and deeper C-states.
      ACPI uses only mwait_idle_with_hints(), and never uses mwait_idle().
      
      Deprecate mwait_idle() and the "idle=mwait" cmdline param
      to simplify the x86 idle code.
      
      After this change, kernels configured with
      (!CONFIG_ACPI=n && !CONFIG_INTEL_IDLE=n) when run on hardware
      that support MWAIT will simply use HLT.  If MWAIT is desired
      on those systems, cpuidle and the cpuidle drivers above
      can be used.
      
      cc: x86@kernel.org
      cc: stable@kernel.org # .39.x
      Signed-off-by: NLen Brown <len.brown@intel.com>
      5d4c47e0
    • L
      x86 idle: deprecate "no-hlt" cmdline param · cdaab4a0
      Len Brown 提交于
      We'd rather that modern machines not check if HLT works on
      every entry into idle, for the benefit of machines that had
      marginal electricals 15-years ago.  If those machines are still running
      the upstream kernel, they can use "idle=poll".  The only difference
      will be that they'll now invoke HLT in machine_hlt().
      
      cc: x86@kernel.org # .39.x
      Signed-off-by: NLen Brown <len.brown@intel.com>
      cdaab4a0
    • L
      x86 idle APM: deprecate CONFIG_APM_CPU_IDLE · 99c63221
      Len Brown 提交于
      We don't want to export the pm_idle function pointer to modules.
      Currently CONFIG_APM_CPU_IDLE w/ CONFIG_APM_MODULE forces us to.
      
      CONFIG_APM_CPU_IDLE is of dubious value, it runs only on 32-bit
      uniprocessor laptops that are over 10 years old.  It calls into
      the BIOS during idle, and is known to cause a number of machines
      to fail.
      
      Removing CONFIG_APM_CPU_IDLE and will allow us to stop exporting
      pm_idle.  Any systems that were calling into the APM BIOS
      at run-time will simply use HLT instead.
      
      cc: x86@kernel.org
      cc: Jiri Kosina <jkosina@suse.cz>
      cc: stable@kernel.org # .39.x
      Signed-off-by: NLen Brown <len.brown@intel.com>
      99c63221
    • L
      x86 idle floppy: deprecate disable_hlt() · 3b70b2e5
      Len Brown 提交于
      Plan to remove floppy_disable_hlt in 2012, an ancient
      workaround with comments that it should be removed.
      
      This allows us to remove clutter and a run-time branch
      from the idle code.
      
      WARN_ONCE() on invocation until it is removed.
      
      cc: x86@kernel.org
      cc: stable@kernel.org # .39.x
      Signed-off-by: NLen Brown <len.brown@intel.com>
      3b70b2e5
    • L
      x86 idle: EXPORT_SYMBOL(default_idle, pm_idle) only when APM demands it · 06ae40ce
      Len Brown 提交于
      In the long run, we don't want default_idle() or (pm_idle)() to
      be exported outside of process.c.  Start by not exporting them
      to modules, unless the APM build demands it.
      
      cc: x86@kernel.org
      cc: Jiri Kosina <jkosina@suse.cz>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      06ae40ce
    • L
      x86 idle: clarify AMD erratum 400 workaround · 02c68a02
      Len Brown 提交于
      The workaround for AMD erratum 400 uses the term "c1e" falsely suggesting:
      1. Intel C1E is somehow involved
      2. All AMD processors with C1E are involved
      
      Use the string "amd_c1e" instead of simply "c1e" to clarify that
      this workaround is specific to AMD's version of C1E.
      Use the string "e400" to clarify that the workaround is specific
      to AMD processors with Erratum 400.
      
      This patch is text-substitution only, with no functional change.
      
      cc: x86@kernel.org
      Acked-by: NBorislav Petkov <borislav.petkov@amd.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      02c68a02
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin · 139f37f5
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/vapier/blackfin:
        Blackfin: debug-mmrs: include RSI_PID[4567] MMRs
        Blackfin: bf51x: fix up RSI_PID# MMR defines
        Blackfin: bf52x/bf54x: fix up usb MMR defines
        Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppi
        Blackfin: gptimers: add structure for hardware register layout
        Blackfin: wire up new sendmmsg syscall
        Blackfin: mach/bfin_serial_5xx.h: punt now-unused header
        Blackfin: bfin_serial.h: turn default port wrappers into stubs
      139f37f5
    • R
      scsi: fix scsi_proc new kernel-doc warning · 5be7ef00
      Randy Dunlap 提交于
      Fix kernel-doc warnings in scsi_proc.c:
      
        Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'dev'
        Warning(drivers/scsi/scsi_proc.c:390): No description found for parameter 'data'
        Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 's' description in 'always_match'
        Warning(drivers/scsi/scsi_proc.c:390): Excess function parameter 'p' description in 'always_match'
      Signed-off-by: NRandy Dunlap <randy.dunlap@oracle.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5be7ef00
    • T
      idle governor: Avoid lock acquisition to read pm_qos before entering idle · 333c5ae9
      Tim Chen 提交于
      Thanks to the reviews and comments by Rafael, James, Mark and Andi.
      Here's version 2 of the patch incorporating your comments and also some
      update to my previous patch comments.
      
      I noticed that before entering idle state, the menu idle governor will
      look up the current pm_qos target value according to the list of qos
      requests received.  This look up currently needs the acquisition of a
      lock to access the list of qos requests to find the qos target value,
      slowing down the entrance into idle state due to contention by multiple
      cpus to access this list.  The contention is severe when there are a lot
      of cpus waking and going into idle.  For example, for a simple workload
      that has 32 pair of processes ping ponging messages to each other, where
      64 cpu cores are active in test system, I see the following profile with
      37.82% of cpu cycles spent in contention of pm_qos_lock:
      
      -     37.82%          swapper  [kernel.kallsyms]          [k]
      _raw_spin_lock_irqsave
         - _raw_spin_lock_irqsave
            - 95.65% pm_qos_request
                 menu_select
                 cpuidle_idle_call
               - cpu_idle
                    99.98% start_secondary
      
      A better approach will be to cache the updated pm_qos target value so
      reading it does not require lock acquisition as in the patch below.
      With this patch the contention for pm_qos_lock is removed and I saw a
      2.2X increase in throughput for my message passing workload.
      
      cc: stable@kernel.org
      Signed-off-by: NTim Chen <tim.c.chen@linux.intel.com>
      Acked-by: NAndi Kleen <ak@linux.intel.com>
      Acked-by: NJames Bottomley <James.Bottomley@suse.de>
      Acked-by: Nmark gross <markgross@thegnar.org>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      333c5ae9
    • T
      cpuidle: menu: fixed wrapping timers at 4.294 seconds · 7467571f
      Tero Kristo 提交于
      Cpuidle menu governor is using u32 as a temporary datatype for storing
      nanosecond values which wrap around at 4.294 seconds. This causes errors
      in predicted sleep times resulting in higher than should be C state
      selection and increased power consumption. This also breaks cpuidle
      state residency statistics.
      
      cc: stable@kernel.org # .32.x through .39.x
      Signed-off-by: NTero Kristo <tero.kristo@nokia.com>
      Signed-off-by: NLen Brown <len.brown@intel.com>
      7467571f
    • H
      mm: fix page_lock_anon_vma leaving mutex locked · eee0f252
      Hugh Dickins 提交于
      On one machine I've been getting hangs, a page fault's anon_vma_prepare()
      waiting in anon_vma_lock(), other processes waiting for that page's lock.
      
      This is a replay of last year's f1819427 "mm: fix hang on
      anon_vma->root->lock".
      
      The new page_lock_anon_vma() places too much faith in its refcount: when
      it has acquired the mutex_trylock(), it's possible that a racing task in
      anon_vma_alloc() has just reallocated the struct anon_vma, set refcount
      to 1, and is about to reset its anon_vma->root.
      
      Fix this by saving anon_vma->root, and relying on the usual page_mapped()
      check instead of a refcount check: if page is still mapped, the anon_vma
      is still ours; if page is not still mapped, we're no longer interested.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      eee0f252
    • H
      mm: fix kernel BUG at mm/rmap.c:1017! · 5dbe0af4
      Hugh Dickins 提交于
      I've hit the "address >= vma->vm_end" check in do_page_add_anon_rmap()
      just once.  The stack showed khugepaged allocation trying to compact
      pages: the call to page_add_anon_rmap() coming from remove_migration_pte().
      
      That path holds anon_vma lock, but does not hold mmap_sem: it can
      therefore race with a split_vma(), and in commit 5f70b962 "mmap:
      avoid unnecessary anon_vma lock" we just took away the anon_vma lock
      protection when adjusting vma->vm_end.
      
      I don't think that particular BUG_ON ever caught anything interesting,
      so better replace it by a comment, than reinstate the anon_vma locking.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      5dbe0af4
    • H
      tmpfs: fix race between truncate and writepage · 826267cf
      Hugh Dickins 提交于
      While running fsx on tmpfs with a memhog then swapoff, swapoff was hanging
      (interruptibly), repeatedly failing to locate the owner of a 0xff entry in
      the swap_map.
      
      Although shmem_writepage() does abandon when it sees incoming page index
      is beyond eof, there was still a window in which shmem_truncate_range()
      could come in between writepage's dropping lock and updating swap_map,
      find the half-completed swap_map entry, and in trying to free it,
      leave it in a state that swap_shmem_alloc() could not correct.
      
      Arguably a bug in __swap_duplicate()'s and swap_entry_free()'s handling
      of the different cases, but easiest to fix by moving swap_shmem_alloc()
      under cover of the lock.
      
      More interesting than the bug: it's been there since 2.6.33, why could
      I not see it with earlier kernels?  The mmotm of two weeks ago seems to
      have some magic for generating races, this is just one of three I found.
      
      With yesterday's git I first saw this in mainline, bisected in search of
      that magic, but the easy reproducibility evaporated.  Oh well, fix the bug.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: stable@kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      826267cf
    • M
      Blackfin: debug-mmrs: include RSI_PID[4567] MMRs · c320afe9
      Mike Frysinger 提交于
      The documentation is a little iffy as to whether these are actual MMRs,
      but reading them on the hardware works, and the previous version of this
      logic (the SDH) had PID[4567].  So add it for RSI too.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      c320afe9
    • M
      Blackfin: bf51x: fix up RSI_PID# MMR defines · fcb24391
      Mike Frysinger 提交于
      Looks like the copying of MMR defines from the SDH block missed updating
      the addresses of the RSI_PID# registers.  So tweak them to reflect the
      actual hardware.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      fcb24391
    • M
      Blackfin: bf52x/bf54x: fix up usb MMR defines · 61aa818f
      Mike Frysinger 提交于
      The bf52x/bf54x have the incorrect addresses for USB_EP_NI7_RXINTERVAL
      and USB_EP_NI7_TXCOUNT, so adjust those.
      
      Further, the bf54x header puts the USB defines in the wrong place, so
      shuffle them back to the right grouping.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      61aa818f
    • M
      Blackfin: debug-mmrs: fix typos with gptimers/mdma/ppi · d09fb602
      Mike Frysinger 提交于
      This code was mostly developed against a BF54x, so some BF537-specific
      issues were missed.
      
      The PPI block starts at PPI_CONTROL, not PPI_STATUS (which is the reverse
      of the EPPI block).
      
      The MDMA block starts at MDMA_NEXT_DESC_PTR, not MDMA_CONFIG.  Seems the
      sim does not catch misreads here so that'll need to get fixed.
      
      The gptimer block is mostly 32bit regs, not 16bit.  Use the gptimer struct
      to figure that out rather than hardcoding it locally.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      d09fb602
    • M
    • M
      Blackfin: wire up new sendmmsg syscall · 427472c9
      Mike Frysinger 提交于
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      427472c9
    • M
      Blackfin: mach/bfin_serial_5xx.h: punt now-unused header · 63917efc
      Mike Frysinger 提交于
      Now that the serial code has been unified in bfin_serial.h, and the
      Blackfin UART driver pushed its resources to the boards files, we
      don't need these headers anymore.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      63917efc
    • M
      Blackfin: bfin_serial.h: turn default port wrappers into stubs · 091c7598
      Mike Frysinger 提交于
      Any consumer that needs to access the MMRs has to provide these helpers,
      so make the default into useless stubs.
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      091c7598
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 · 36947a76
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6: (36 commits)
        Cache xattr security drop check for write v2
        fs: block_page_mkwrite should wait for writeback to finish
        mm: Wait for writeback when grabbing pages to begin a write
        configfs: remove unnecessary dentry_unhash on rmdir, dir rename
        fat: remove unnecessary dentry_unhash on rmdir, dir rename
        hpfs: remove unnecessary dentry_unhash on rmdir, dir rename
        minix: remove unnecessary dentry_unhash on rmdir, dir rename
        fuse: remove unnecessary dentry_unhash on rmdir, dir rename
        coda: remove unnecessary dentry_unhash on rmdir, dir rename
        afs: remove unnecessary dentry_unhash on rmdir, dir rename
        affs: remove unnecessary dentry_unhash on rmdir, dir rename
        9p: remove unnecessary dentry_unhash on rmdir, dir rename
        ncpfs: fix rename over directory with dangling references
        ncpfs: document dentry_unhash usage
        ecryptfs: remove unnecessary dentry_unhash on rmdir, dir rename
        hostfs: remove unnecessary dentry_unhash on rmdir, dir rename
        hfsplus: remove unnecessary dentry_unhash on rmdir, dir rename
        hfs: remove unnecessary dentry_unhash on rmdir, dir rename
        omfs: remove unnecessary dentry_unhash on rmdir, dir rneame
        udf: remove unnecessary dentry_unhash from rmdir, dir rename
        ...
      36947a76
    • L
      Merge branch 'x86-urgent-for-linus' of... · a947e23a
      Linus Torvalds 提交于
      Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        x86, asm: Clean up desc.h a bit
        x86, amd: Do not enable ARAT feature on AMD processors below family 0x12
        x86: Move do_page_fault()'s error path under unlikely()
        x86, efi: Retain boot service code until after switching to virtual mode
        x86: Remove unnecessary check in detect_ht()
        x86: Reorder mm_context_t to remove x86_64 alignment padding and thus shrink mm_struct
        x86, UV: Clean up uv_tlb.c
        x86, UV: Add support for SGI UV2 hub chip
        x86, cpufeature: Update CPU feature RDRND to RDRAND
      a947e23a
    • L
      Merge branch 'sched-urgent-for-linus' of... · 08a8b796
      Linus Torvalds 提交于
      Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed
        sched: Fix ->min_vruntime calculation in dequeue_entity()
        sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW
        sched: More sched_domain iterations fixes
      08a8b796
    • L
      Merge branch 'core-urgent-for-linus' of... · 1ba4b8cb
      Linus Torvalds 提交于
      Merge branch 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'core-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state
        rcu: Remove waitqueue usage for cpu, node, and boost kthreads
        rcu: Avoid acquiring rcu_node locks in timer functions
        atomic: Add atomic_or()
        Documentation: Add statistics about nested locks
        rcu: Decrease memory-barrier usage based on semi-formal proof
        rcu: Make rcu_enter_nohz() pay attention to nesting
        rcu: Don't do reschedule unless in irq
        rcu: Remove old memory barriers from rcu_process_callbacks()
        rcu: Add memory barriers
        rcu: Fix unpaired rcu_irq_enter() from locking selftests
      1ba4b8cb
    • L
      Merge branch 'perf-urgent-for-linus' of... · c4a227d8
      Linus Torvalds 提交于
      Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (25 commits)
        perf: Fix SIGIO handling
        perf top: Don't stop if no kernel symtab is found
        perf top: Handle kptr_restrict
        perf top: Remove unused macro
        perf events: initialize fd array to -1 instead of 0
        perf tools: Make sure kptr_restrict warnings fit 80 col terms
        perf tools: Fix build on older systems
        perf symbols: Handle /proc/sys/kernel/kptr_restrict
        perf: Remove duplicate headers
        ftrace: Add internal recursive checks
        tracing: Update btrfs's tracepoints to use u64 interface
        tracing: Add __print_symbolic_u64 to avoid warnings on 32bit machine
        ftrace: Set ops->flag to enabled even on static function tracing
        tracing: Have event with function tracer check error return
        ftrace: Have ftrace_startup() return failure code
        jump_label: Check entries limit in __jump_label_update
        ftrace/recordmcount: Avoid STT_FUNC symbols as base on ARM
        scripts/tags.sh: Add magic for trace-events for etags too
        scripts/tags.sh: Fix ctags for DEFINE_EVENT()
        x86/ftrace: Fix compiler warning in ftrace.c
        ...
      c4a227d8
    • L
      Merge branch 'for-usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci · 87367a0b
      Linus Torvalds 提交于
      * 'for-usb-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sarah/xhci:
        Intel xhci: Limit number of active endpoints to 64.
        Intel xhci: Ignore spurious successful event.
        Intel xhci: Support EHCI/xHCI port switching.
        Intel xhci: Add PCI id for Panther Point xHCI host.
        xhci: STFU: Be quieter during URB submission and completion.
        xhci: STFU: Don't print event ring dequeue pointer.
        xhci: STFU: Remove function tracing.
        xhci: Don't submit commands when the host is dead.
        xhci: Clear stopped_td when Stop Endpoint command completes.
      87367a0b
    • L
      Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx · 4cb865de
      Linus Torvalds 提交于
      * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/djbw/async_tx: (33 commits)
        x86: poll waiting for I/OAT DMA channel status
        maintainers: add dma engine tree details
        dmaengine: add TODO items for future work on dma drivers
        dmaengine: Add API documentation for slave dma usage
        dmaengine/dw_dmac: Update maintainer-ship
        dmaengine: move link order
        dmaengine/dw_dmac: implement pause and resume in dwc_control
        dmaengine/dw_dmac: Replace spin_lock* with irqsave variants and enable submission from callback
        dmaengine/dw_dmac: Divide one sg to many desc, if sg len is greater than DWC_MAX_COUNT
        dmaengine/dw_dmac: set residue as total len in dwc_tx_status if status is !DMA_SUCCESS
        dmaengine/dw_dmac: don't call callback routine in case dmaengine_terminate_all() is called
        dmaengine: at_hdmac: pause: no need to wait for FIFO empty
        pch_dma: modify pci device table definition
        pch_dma: Support new device ML7223 IOH
        pch_dma: Support I2S for ML7213 IOH
        pch_dma: Fix DMA setting issue
        pch_dma: modify for checkpatch
        pch_dma: fix dma direction issue for ML7213 IOH video-in
        dmaengine: at_hdmac: use descriptor chaining help function
        dmaengine: at_hdmac: implement pause and resume in atc_control
        ...
      
      Fix up trivial conflict in drivers/dma/dw_dmac.c
      4cb865de
    • L
      Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6 · 55f08e1b
      Linus Torvalds 提交于
      * 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/sameo/mfd-2.6:
        mfd: Fix build breakage caused by tps65910 gpio directory move
        mfd: Use mfd cell platform_data for db8500-prcmu cells platform bits
      55f08e1b
    • L
      Merge branch 'spi/next' of git://git.secretlab.ca/git/linux-2.6 · d02bf062
      Linus Torvalds 提交于
      * 'spi/next' of git://git.secretlab.ca/git/linux-2.6:
        spi/spi_bfin_sport: new driver for a SPI bus via the Blackfin SPORT peripheral
        spi/tle620x: add missing device_remove_file()
      d02bf062
    • L
      Merge branch 'gpio/next' of git://git.secretlab.ca/git/linux-2.6 · 04830fcc
      Linus Torvalds 提交于
      * 'gpio/next' of git://git.secretlab.ca/git/linux-2.6:
        gpio/pch_gpio: Support new device ML7223
        gpio: make gpio_{request,free}_array gpio array parameter const
        GPIO: OMAP: move to drivers/gpio
        GPIO: OMAP: move register offset defines into <plat/gpio.h>
        gpio: Convert gpio_is_valid to return bool
        gpio: Move the s5pc100 GPIO to drivers/gpio
        gpio: Move the s5pv210 GPIO to drivers/gpio
        gpio: Move the exynos4 GPIO to drivers/gpio
        gpio: Move to Samsung common GPIO library to drivers/gpio
        gpio/nomadik: add function to read GPIO pull down status
        gpio/nomadik: show all pins in debug
        gpio: move Nomadik GPIO driver to drivers/gpio
        gpio: move U300 GPIO driver to drivers/gpio
        langwell_gpio: add runtime pm support
        gpio/pca953x: Add support for pca9574 and pca9575 devices
        gpio/cs5535: Show explicit dependency between gpio_cs5535 and mfd_cs5535
      04830fcc
    • L
      Merge branch 'setns' · 571503e1
      Linus Torvalds 提交于
      * setns:
        ns: Wire up the setns system call
      
      Done as a merge to make it easier to fix up conflicts in arm due to
      addition of sendmmsg system call
      571503e1
    • E
      ns: Wire up the setns system call · 7b21fddd
      Eric W. Biederman 提交于
      32bit and 64bit on x86 are tested and working.  The rest I have looked
      at closely and I can't find any problems.
      
      setns is an easy system call to wire up.  It just takes two ints so I
      don't expect any weird architecture porting problems.
      
      While doing this I have noticed that we have some architectures that are
      very slow to get new system calls.  cris seems to be the slowest where
      the last system calls wired up were preadv and pwritev.  avr32 is weird
      in that recvmmsg was wired up but never declared in unistd.h.  frv is
      behind with perf_event_open being the last syscall wired up.  On h8300
      the last system call wired up was epoll_wait.  On m32r the last system
      call wired up was fallocate.  mn10300 has recvmmsg as the last system
      call wired up.  The rest seem to at least have syncfs wired up which was
      new in the 2.6.39.
      
      v2: Most of the architecture support added by Daniel Lezcano <dlezcano@fr.ibm.com>
      v3: ported to v2.6.36-rc4 by: Eric W. Biederman <ebiederm@xmission.com>
      v4: Moved wiring up of the system call to another patch
      v5: ported to v2.6.39-rc6
      v6: rebased onto parisc-next and net-next to avoid syscall  conflicts.
      v7: ported to Linus's latest post 2.6.39 tree.
      
      >  arch/blackfin/include/asm/unistd.h     |    3 ++-
      >  arch/blackfin/mach-common/entry.S      |    1 +
      Acked-by: NMike Frysinger <vapier@gentoo.org>
      
      Oh - ia64 wiring looks good.
      Acked-by: NTony Luck <tony.luck@intel.com>
      Signed-off-by: NEric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      7b21fddd
    • A
      Cache xattr security drop check for write v2 · 69b45732
      Andi Kleen 提交于
      Some recent benchmarking on btrfs showed that a major scaling bottleneck
      on large systems on btrfs is currently the xattr lookup on every write.
      
      Why xattr lookup on every write I hear you ask?
      
      write wants to drop suid and security related xattrs that could set o
      capabilities for executables.  To do that it currently looks up
      security.capability on EVERY write (even for non executables) to decide
      whether to drop it or not.
      
      In btrfs this causes an additional tree walk, hitting some per file system
      locks and quite bad scalability. In a simple read workload on a 8S
      system I saw over 90% CPU time in spinlocks related to that.
      
      Chris Mason tells me this is also a problem in ext4, where it hits
      the global mbcache lock.
      
      This patch adds a simple per inode to avoid this problem.  We only
      do the lookup once per file and then if there is no xattr cache
      the decision. All xattr changes clear the flag.
      
      I also used the same flag to avoid the suid check, although
      that one is pretty cheap.
      
      A file system can also set this flag when it creates the inode,
      if it has a cheap way to do so.  This is done for some common file systems
      in followon patches.
      
      With this patch a major part of the lock contention disappears
      for btrfs. Some testing on smaller systems didn't show significant
      performance changes, but at least it helps the larger systems
      and is generally more efficient.
      
      v2: Rename is_sgid. add file system helper.
      Cc: chris.mason@oracle.com
      Cc: josef@redhat.com
      Cc: viro@zeniv.linux.org.uk
      Cc: agruen@linbit.com
      Cc: Serge E. Hallyn <serue@us.ibm.com>
      Signed-off-by: NAndi Kleen <ak@linux.intel.com>
      Signed-off-by: NAl Viro <viro@zeniv.linux.org.uk>
      69b45732
  3. 28 5月, 2011 3 次提交
    • P
      rcu: Start RCU kthreads in TASK_INTERRUPTIBLE state · cc3ce517
      Paul E. McKenney 提交于
      Upon creation, kthreads are in TASK_UNINTERRUPTIBLE state, which can
      result in softlockup warnings.  Because some of RCU's kthreads can
      legitimately be idle indefinitely, start them in TASK_INTERRUPTIBLE
      state in order to avoid those warnings.
      Suggested-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: NYinghai Lu <yinghai@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      cc3ce517
    • P
      rcu: Remove waitqueue usage for cpu, node, and boost kthreads · 08bca60a
      Peter Zijlstra 提交于
      It is not necessary to use waitqueues for the RCU kthreads because
      we always know exactly which thread is to be awakened.  In addition,
      wake_up() only issues an actual wakeup when there is a thread waiting on
      the queue, which was why there was an extra explicit wake_up_process()
      to get the RCU kthreads started.
      
      Eliminating the waitqueues (and wake_up()) in favor of wake_up_process()
      eliminates the need for the initial wake_up_process() and also shrinks
      the data structure size a bit.  The wakeup logic is placed in a new
      rcu_wait() macro.
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      08bca60a
    • P
      rcu: Avoid acquiring rcu_node locks in timer functions · 8826f3b0
      Paul E. McKenney 提交于
      This commit switches manipulations of the rcu_node ->wakemask field
      to atomic operations, which allows rcu_cpu_kthread_timer() to avoid
      acquiring the rcu_node lock.  This should avoid the following lockdep
      splat reported by Valdis Kletnieks:
      
      [   12.872150] usb 1-4: new high speed USB device number 3 using ehci_hcd
      [   12.986667] usb 1-4: New USB device found, idVendor=413c, idProduct=2513
      [   12.986679] usb 1-4: New USB device strings: Mfr=0, Product=0, SerialNumber=0
      [   12.987691] hub 1-4:1.0: USB hub found
      [   12.987877] hub 1-4:1.0: 3 ports detected
      [   12.996372] input: PS/2 Generic Mouse as /devices/platform/i8042/serio1/input/input10
      [   13.071471] udevadm used greatest stack depth: 3984 bytes left
      [   13.172129]
      [   13.172130] =======================================================
      [   13.172425] [ INFO: possible circular locking dependency detected ]
      [   13.172650] 2.6.39-rc6-mmotm0506 #1
      [   13.172773] -------------------------------------------------------
      [   13.172997] blkid/267 is trying to acquire lock:
      [   13.173009]  (&p->pi_lock){-.-.-.}, at: [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
      [   13.173009]
      [   13.173009] but task is already holding lock:
      [   13.173009]  (rcu_node_level_0){..-...}, at: [<ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58
      [   13.173009]
      [   13.173009] which lock already depends on the new lock.
      [   13.173009]
      [   13.173009]
      [   13.173009] the existing dependency chain (in reverse order) is:
      [   13.173009]
      [   13.173009] -> #2 (rcu_node_level_0){..-...}:
      [   13.173009]        [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]        [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]        [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]        [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]        [<ffffffff815697f1>] _raw_spin_lock+0x36/0x45
      [   13.173009]        [<ffffffff81090794>] rcu_read_unlock_special+0x8c/0x1d5
      [   13.173009]        [<ffffffff8109092c>] __rcu_read_unlock+0x4f/0xd7
      [   13.173009]        [<ffffffff81027bd3>] rcu_read_unlock+0x21/0x23
      [   13.173009]        [<ffffffff8102cc34>] cpuacct_charge+0x6c/0x75
      [   13.173009]        [<ffffffff81030cc6>] update_curr+0x101/0x12e
      [   13.173009]        [<ffffffff810311d0>] check_preempt_wakeup+0xf7/0x23b
      [   13.173009]        [<ffffffff8102acb3>] check_preempt_curr+0x2b/0x68
      [   13.173009]        [<ffffffff81031d40>] ttwu_do_wakeup+0x76/0x128
      [   13.173009]        [<ffffffff81031e49>] ttwu_do_activate.constprop.63+0x57/0x5c
      [   13.173009]        [<ffffffff81031e96>] scheduler_ipi+0x48/0x5d
      [   13.173009]        [<ffffffff810177d5>] smp_reschedule_interrupt+0x16/0x18
      [   13.173009]        [<ffffffff815710f3>] reschedule_interrupt+0x13/0x20
      [   13.173009]        [<ffffffff810b66d1>] rcu_read_unlock+0x21/0x23
      [   13.173009]        [<ffffffff810b739c>] find_get_page+0xa9/0xb9
      [   13.173009]        [<ffffffff810b8b48>] filemap_fault+0x6a/0x34d
      [   13.173009]        [<ffffffff810d1a25>] __do_fault+0x54/0x3e6
      [   13.173009]        [<ffffffff810d447a>] handle_pte_fault+0x12c/0x1ed
      [   13.173009]        [<ffffffff810d48f7>] handle_mm_fault+0x1cd/0x1e0
      [   13.173009]        [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de
      [   13.173009]        [<ffffffff8156a75f>] page_fault+0x1f/0x30
      [   13.173009]
      [   13.173009] -> #1 (&rq->lock){-.-.-.}:
      [   13.173009]        [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]        [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]        [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]        [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]        [<ffffffff815697f1>] _raw_spin_lock+0x36/0x45
      [   13.173009]        [<ffffffff81027e19>] __task_rq_lock+0x8b/0xd3
      [   13.173009]        [<ffffffff81032f7f>] wake_up_new_task+0x41/0x108
      [   13.173009]        [<ffffffff810376c3>] do_fork+0x265/0x33f
      [   13.173009]        [<ffffffff81007d02>] kernel_thread+0x6b/0x6d
      [   13.173009]        [<ffffffff8153a9dd>] rest_init+0x21/0xd2
      [   13.173009]        [<ffffffff81b1db4f>] start_kernel+0x3bb/0x3c6
      [   13.173009]        [<ffffffff81b1d29f>] x86_64_start_reservations+0xaf/0xb3
      [   13.173009]        [<ffffffff81b1d393>] x86_64_start_kernel+0xf0/0xf7
      [   13.173009]
      [   13.173009] -> #0 (&p->pi_lock){-.-.-.}:
      [   13.173009]        [<ffffffff81067788>] check_prev_add+0x68/0x20e
      [   13.173009]        [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]        [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]        [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]        [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]        [<ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57
      [   13.173009]        [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
      [   13.173009]        [<ffffffff81032f3c>] wake_up_process+0x10/0x12
      [   13.173009]        [<ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58
      [   13.173009]        [<ffffffff81045286>] call_timer_fn+0xac/0x1e9
      [   13.173009]        [<ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2
      [   13.173009]        [<ffffffff8103e487>] __do_softirq+0x109/0x26a
      [   13.173009]        [<ffffffff8157144c>] call_softirq+0x1c/0x30
      [   13.173009]        [<ffffffff81003207>] do_softirq+0x44/0xf1
      [   13.173009]        [<ffffffff8103e8b9>] irq_exit+0x58/0xc8
      [   13.173009]        [<ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87
      [   13.173009]        [<ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20
      [   13.173009]        [<ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310
      [   13.173009]        [<ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243
      [   13.173009]        [<ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a
      [   13.173009]        [<ffffffff810d27fe>] __pte_alloc+0x22/0x14b
      [   13.173009]        [<ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0
      [   13.173009]        [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de
      [   13.173009]        [<ffffffff8156a75f>] page_fault+0x1f/0x30
      [   13.173009]
      [   13.173009] other info that might help us debug this:
      [   13.173009]
      [   13.173009] Chain exists of:
      [   13.173009]   &p->pi_lock --> &rq->lock --> rcu_node_level_0
      [   13.173009]
      [   13.173009]  Possible unsafe locking scenario:
      [   13.173009]
      [   13.173009]        CPU0                    CPU1
      [   13.173009]        ----                    ----
      [   13.173009]   lock(rcu_node_level_0);
      [   13.173009]                                lock(&rq->lock);
      [   13.173009]                                lock(rcu_node_level_0);
      [   13.173009]   lock(&p->pi_lock);
      [   13.173009]
      [   13.173009]  *** DEADLOCK ***
      [   13.173009]
      [   13.173009] 3 locks held by blkid/267:
      [   13.173009]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8156cdb4>] do_page_fault+0x1f3/0x5de
      [   13.173009]  #1:  (&yield_timer){+.-...}, at: [<ffffffff810451da>] call_timer_fn+0x0/0x1e9
      [   13.173009]  #2:  (rcu_node_level_0){..-...}, at: [<ffffffff810901cc>] rcu_cpu_kthread_timer+0x27/0x58
      [   13.173009]
      [   13.173009] stack backtrace:
      [   13.173009] Pid: 267, comm: blkid Not tainted 2.6.39-rc6-mmotm0506 #1
      [   13.173009] Call Trace:
      [   13.173009]  <IRQ>  [<ffffffff8154a529>] print_circular_bug+0xc8/0xd9
      [   13.173009]  [<ffffffff81067788>] check_prev_add+0x68/0x20e
      [   13.173009]  [<ffffffff8100c861>] ? save_stack_trace+0x28/0x46
      [   13.173009]  [<ffffffff810679b9>] check_prevs_add+0x8b/0x104
      [   13.173009]  [<ffffffff81067da1>] validate_chain+0x36f/0x3ab
      [   13.173009]  [<ffffffff8106846b>] __lock_acquire+0x369/0x3e2
      [   13.173009]  [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff81068a0f>] lock_acquire+0xfc/0x14c
      [   13.173009]  [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff815698ea>] _raw_spin_lock_irqsave+0x44/0x57
      [   13.173009]  [<ffffffff81032d8f>] ? try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff81032d8f>] try_to_wake_up+0x29/0x1aa
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff81032f3c>] wake_up_process+0x10/0x12
      [   13.173009]  [<ffffffff810901e9>] rcu_cpu_kthread_timer+0x44/0x58
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff81045286>] call_timer_fn+0xac/0x1e9
      [   13.173009]  [<ffffffff810451da>] ? del_timer+0x75/0x75
      [   13.173009]  [<ffffffff810901a5>] ? rcu_check_quiescent_state+0x82/0x82
      [   13.173009]  [<ffffffff8104556d>] run_timer_softirq+0x1aa/0x1f2
      [   13.173009]  [<ffffffff8103e487>] __do_softirq+0x109/0x26a
      [   13.173009]  [<ffffffff8106365f>] ? tick_dev_program_event+0x37/0xf6
      [   13.173009]  [<ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f
      [   13.173009]  [<ffffffff8157144c>] call_softirq+0x1c/0x30
      [   13.173009]  [<ffffffff81003207>] do_softirq+0x44/0xf1
      [   13.173009]  [<ffffffff8103e8b9>] irq_exit+0x58/0xc8
      [   13.173009]  [<ffffffff81017f5a>] smp_apic_timer_interrupt+0x79/0x87
      [   13.173009]  [<ffffffff81570fd3>] apic_timer_interrupt+0x13/0x20
      [   13.173009]  <EOI>  [<ffffffff810bd384>] ? get_page_from_freelist+0x114/0x310
      [   13.173009]  [<ffffffff810bd51a>] ? get_page_from_freelist+0x2aa/0x310
      [   13.173009]  [<ffffffff812220e7>] ? clear_page_c+0x7/0x10
      [   13.173009]  [<ffffffff810bd1ef>] ? prep_new_page+0x14c/0x1cd
      [   13.173009]  [<ffffffff810bd51a>] get_page_from_freelist+0x2aa/0x310
      [   13.173009]  [<ffffffff810bdf03>] __alloc_pages_nodemask+0x178/0x243
      [   13.173009]  [<ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99
      [   13.173009]  [<ffffffff8101fe2f>] pte_alloc_one+0x1e/0x3a
      [   13.173009]  [<ffffffff810d46b9>] ? __pmd_alloc+0x87/0x99
      [   13.173009]  [<ffffffff810d27fe>] __pte_alloc+0x22/0x14b
      [   13.173009]  [<ffffffff810d48a8>] handle_mm_fault+0x17e/0x1e0
      [   13.173009]  [<ffffffff8156cfee>] do_page_fault+0x42d/0x5de
      [   13.173009]  [<ffffffff810d915f>] ? sys_brk+0x32/0x10c
      [   13.173009]  [<ffffffff810a0e4a>] ? time_hardirqs_off+0x1b/0x2f
      [   13.173009]  [<ffffffff81065c4f>] ? trace_hardirqs_off_caller+0x3f/0x9c
      [   13.173009]  [<ffffffff812235dd>] ? trace_hardirqs_off_thunk+0x3a/0x3c
      [   13.173009]  [<ffffffff8156a75f>] page_fault+0x1f/0x30
      [   14.010075] usb 5-1: new full speed USB device number 2 using uhci_hcd
      Reported-by: NValdis Kletnieks <Valdis.Kletnieks@vt.edu>
      Signed-off-by: NPaul E. McKenney <paul.mckenney@linaro.org>
      Signed-off-by: NPaul E. McKenney <paulmck@linux.vnet.ibm.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      8826f3b0