1. 04 2月, 2011 5 次提交
    • R
      ARM: mmci: add dmaengine-based DMA support · c8ebae37
      Russell King 提交于
      Based on a patch from Linus Walleij.
      
      Add dmaengine based support for DMA to the MMCI driver, using the
      Primecell DMA engine interface.  The changes over Linus' driver are:
      
      - rename txsize_threshold to dmasize_threshold, as this reflects the
        purpose more.
      - use 'mmci_dma_' as the function prefix rather than 'dma_mmci_'.
      - clean up requesting of dma channels.
      - don't release a single channel twice when it's shared between tx and rx.
      - get rid of 'dma_enable' bool - instead check whether the channel is NULL.
      - detect incomplete DMA at the end of a transfer.  Some DMA controllers
        (eg, PL08x) are unable to be configured for scatter DMA and also listen
        to all four DMA request signals [BREQ,SREQ,LBREQ,LSREQ] from the MMCI.
        They can do one or other but not both.  As MMCI uses LBREQ/LSREQ for the
        final burst/words, PL08x does not transfer the last few words.
      - map and unmap DMA buffers using the DMA engine struct device, not the
        MMCI struct device - the DMA engine is doing the DMA transfer, not us.
      - avoid double-unmapping of the DMA buffers on MMCI data errors.
      - don't check for negative values from the dmaengine tx submission
        function - Dan says this must never fail.
      - use new dmaengine helper functions rather than using the ugly function
        pointers directly.
      - allow DMA code to be fully optimized away using dma_inprogress() which
        is defined to constant 0 if DMA engine support is disabled.
      - request maximum segment size from the DMA engine struct device and
        set this appropriately.
      - removed checking of buffer alignment - the DMA engine should deal with
        its own restrictions on buffer alignment, not the individual DMA engine
        users.
      - removed setting DMAREQCTL - this confuses some DMA controllers as it
        causes LBREQ to be asserted for the last seven transfers, rather than
        six SREQ and one LSREQ.
      - removed burst setting - the DMA controller should not burst past the
        transfer size required to complete the DMA operation.
      Tested-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      c8ebae37
    • R
      ARM: mmci: no need for separate host->data_xfered · 51d4375d
      Russell King 提交于
      We don't need to store the number of bytes transferred in our host
      structure - we can store this directly in data->bytes_xfered.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      51d4375d
    • R
      ARM: mmci: avoid unnecessary switch to data available PIO interrupts · c4d877c1
      Russell King 提交于
      We don't need to switch to data available interrupts if there's at
      least half a FIFO depth worth of data remaining, as we'll still get
      the FIFO half full interrupt.  Keep this interrupt masked off until
      we have less than half the FIFO depth worth of data remaining.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      c4d877c1
    • R
      ARM: mmci: no need to call flush_dcache_page() with sg_miter API · 7d7aa23c
      Russell King 提交于
      The sg_miter API provides the required cache maintainence, so we don't
      need to do that ourselves.  Remove the unnecessary additional cache
      maintainence.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      7d7aa23c
    • R
      ARM: mmci: avoid reporting too many completed bytes on fifo overrun · c8afc9d5
      Russell King 提交于
      The data counter counts the number of bytes transferred on the MMC bus.
      When a FIFO overrun occurs, we will not have transferred a FIFOs-worth
      of data to memory, and so the data counter will be a FIFOs-worth ahead.
      If this occurs on a block boundary, we will report one too many sectors
      as successful.  Fix this.
      Acked-by: NLinus Walleij <linus.walleij@linaro.org>
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      c8afc9d5
  2. 31 1月, 2011 2 次提交
  3. 28 1月, 2011 8 次提交
  4. 27 1月, 2011 5 次提交
  5. 26 1月, 2011 20 次提交
    • H
      avr32: add missing include causing undefined pgtable_page_* references · 6cb8e872
      Hans-Christian Egtvedt 提交于
      This patch adds the linux/mm.h header file to the AVR32 arch pgalloc.c
      implementation to fix the undefined reference to pgtable_page_ctor() and
      pgtable_page_dtor().
      Signed-off-by: NHans-Christian Egtvedt <hans-christian.egtvedt@atmel.com>
      6cb8e872
    • P
      sched: Use rq->clock_task instead of rq->clock for correctly maintaining load averages · 05ca62c6
      Paul Turner 提交于
      The delta in clock_task is a more fair attribution of how much time a tg has
      been contributing load to the current cpu.
      
      While not really important it also means we're more in sync (by magnitude)
      with respect to periodic updates (since __update_curr deltas are clock_task
      based).
      Signed-off-by: NPaul Turner <pjt@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110122044852.007092349@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      05ca62c6
    • P
      sched: Fix/remove redundant cfs_rq checks · b815f196
      Paul Turner 提交于
      Since updates are against an entity's queuing cfs_rq it's not possible to
      enter update_cfs_{shares,load} with a NULL cfs_rq.  (Indeed, update_cfs_load
      would crash prior to the check if we did anyway since we load is examined
      during the initializers).
      
      Also, in the update_cfs_load case there's no point
      in maintaining averages for rq->cfs_rq since we don't perform shares
      distribution at that level -- NULL check is replaced accordingly.
      
      Thanks to Dan Carpenter for pointing out the deference before NULL check.
      Signed-off-by: NPaul Turner <pjt@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110122044851.825284940@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      b815f196
    • P
      sched: Fix sign under-flows in wake_affine · e37b6a7b
      Paul Turner 提交于
      While care is taken around the zero-point in effective_load to not exceed
      the instantaneous rq->weight, it's still possible (e.g. using wake_idx != 0)
      for (load + effective_load) to underflow.
      
      In this case the comparing the unsigned values can result in incorrect balanced
      decisions.
      Signed-off-by: NPaul Turner <pjt@google.com>
      Signed-off-by: NPeter Zijlstra <a.p.zijlstra@chello.nl>
      LKML-Reference: <20110122044851.734245014@google.com>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      e37b6a7b
    • E
      percpu, x86: Fix percpu_xchg_op() · 889a7a6a
      Eric Dumazet 提交于
      These recent percpu commits:
      
        2485b646: x86,percpu: Move out of place 64 bit ops into X86_64 section
        8270137a: cpuops: Use cmpxchg for xchg to avoid lock semantics
      
      Caused this 'perf top' crash:
      
       Kernel panic - not syncing: Fatal exception in interrupt
       Pid: 0, comm: swapper Tainted: G     D
       2.6.38-rc2-00181-gef71723 #413 Call Trace: <IRQ> [<ffffffff810465b5>]
          ? panic
          ? kmsg_dump
          ? kmsg_dump
          ? oops_end
          ? no_context
          ? __bad_area_nosemaphore
          ? perf_output_begin
          ? bad_area_nosemaphore
          ? do_page_fault
          ? __task_pid_nr_ns
          ? perf_event_tid
          ? __perf_event_header__init_id
          ? validate_chain
          ? perf_output_sample
          ? trace_hardirqs_off
          ? page_fault
          ? irq_work_run
          ? update_process_times
          ? tick_sched_timer
          ? tick_sched_timer
          ? __run_hrtimer
          ? hrtimer_interrupt
          ? account_system_vtime
          ? smp_apic_timer_interrupt
          ? apic_timer_interrupt
       ...
      
      Looking at assembly code, I found:
      
      list = this_cpu_xchg(irq_work_list, NULL);
      
      gives this wrong code : (gcc-4.1.2 cross compiler)
      
      ffffffff810bc45e:
      	mov    %gs:0xead0,%rax
      	cmpxchg %rax,%gs:0xead0
      	jne    ffffffff810bc45e <irq_work_run+0x3e>
      	test   %rax,%rax
      	je     ffffffff810bc4aa <irq_work_run+0x8a>
      
      Tell gcc we dirty eax/rax register in percpu_xchg_op()
      
      Compiler must use another register to store pxo_new__
      
      We also dont need to reload percpu value after a jump,
      since a 'failed' cmpxchg already updated eax/rax
      
      Wrong generated code was :
      	xor     %rax,%rax   /* load 0 into %rax */
      1:	mov     %gs:0xead0,%rax
      	cmpxchg %rax,%gs:0xead0
      	jne     1b
      	test    %rax,%rax
      
      After patch :
      
      	xor     %rdx,%rdx   /* load 0 into %rdx */
      	mov     %gs:0xead0,%rax
      1:	cmpxchg %rdx,%gs:0xead0
      	jne     1b:
      	test    %rax,%rax
      Signed-off-by: NEric Dumazet <eric.dumazet@gmail.com>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Frederic Weisbecker <fweisbec@gmail.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
      Cc: Christoph Lameter <cl@linux-foundation.org>
      Cc: Tejun Heo <tj@kernel.org>
      LKML-Reference: <1295973114.3588.312.camel@edumazet-laptop>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      889a7a6a
    • Y
      x86: Remove left over system_64.h · 9a57c3e4
      Yinghai Lu 提交于
      Left-over from the x86 merge ...
      Signed-off-by: NYinghai Lu <yinghai@kernel.org>
      LKML-Reference: <4D3E23D1.7010405@kernel.org>
      Signed-off-by: NIngo Molnar <mingo@elte.hu>
      9a57c3e4
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input · 6fb1b304
      Linus Torvalds 提交于
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input:
        Input: wacom - pass touch resolution to clients through input_absinfo
        Input: wacom - add 2 Bamboo Pen and touch models
        Input: sysrq - ensure sysrq_enabled and __sysrq_enabled are consistent
        Input: sparse-keymap - fix KEY_VSW handling in sparse_keymap_setup
        Input: tegra-kbc - add tegra keyboard driver
        Input: gpio_keys - switch to using request_any_context_irq
        Input: serio - allow registered drivers to get status flag
        Input: ct82710c - return proper error code for ct82c710_open
        Input: bu21013_ts - added regulator support
        Input: bu21013_ts - remove duplicate resolution parameters
        Input: tnetv107x-ts - don't treat NULL clk as an error
        Input: tnetv107x-keypad - don't treat NULL clk as an error
      
      Fix up trivial conflicts in drivers/input/keyboard/Makefile due to
      additions of tc3589x/Tegra drivers
      6fb1b304
    • S
      mmc: bfin_sdh: fix alloc size for private data · a34650f0
      Sonic Zhang 提交于
      The bfin_sdh driver allocates the wrong size for the private data
      in the mmc_host.  The first parameter of mmc_alloc_host should be
      the size of the local driver struct rather than the common mmc_host.
      Signed-off-by: NSonic Zhang <sonic.zhang@analog.com>
      Signed-off-by: NMike Frysinger <vapier@gentoo.org>
      Cc: <stable@kernel.org>
      Signed-off-by: NChris Ball <cjb@laptop.org>
      a34650f0
    • J
      mmc: sdhci-s3c: add platform_8bit_width() hook · 548f07d2
      Jaehoon Chung 提交于
      We have 8-bit width support but is not a v3 controller.
      So we need platform_8bit_width() to support 8-bit buswidth.
      Also we need MMC_CAP_8_BIT_DATA, so we add it in platdata.
      
      This gets 8-bit support working again on s3c, after we previously
      disabled 8-bit by default on non-v3 controllers.
      Signed-off-by: NJaehoon Chung <jh80.chung@samsung.com>
      Signed-off-by: NKyungmin Park <kyungmin.park@samsung.com>
      Signed-off-by: NChris Ball <cjb@laptop.org>
      548f07d2
    • J
      mmc: jz4740: don't treat NULL clk as an error · 3119cbda
      Jamie Iles 提交于
      clk_get() returns a struct clk cookie to the driver and some platforms
      may return NULL if they only support a single clock.  clk_get() has only
      failed if it returns a ERR_PTR() encoded pointer.
      Signed-off-by: NJamie Iles <jamie@jamieiles.com>
      Signed-off-by: NChris Ball <cjb@laptop.org>
      3119cbda
    • R
      mmc: mmci: don't read command response when invalid · 9047b435
      Russell King - ARM Linux 提交于
      Don't read the command response from the registers when either the
      command timed out (because there was no response from the card) or
      the checksum on the response was invalid.
      Signed-off-by: NRussell King <rmk+kernel@arm.linux.org.uk>
      Signed-off-by: NChris Ball <cjb@laptop.org>
      9047b435
    • J
      mmc: ushc: Remove duplicate include of usb.h · 021cb59a
      Jesper Juhl 提交于
      Including usb.h once is enough in drivers/mmc/host/ushc.c
      This removes the duplicate.
      Signed-off-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NChris Ball <cjb@laptop.org>
      021cb59a
    • P
      Input: wacom - pass touch resolution to clients through input_absinfo · 409550f2
      Ping Cheng 提交于
      Also remove fake ABS_RX/ABS_RY "axes" that were used to report physical
      dimensions now that we have better way.
      Signed-off-by: NPing Cheng <pingc@wacom.com>
      Reviewed-by: NHenrik Rydberg <rydberg@euromail.se>
      Signed-off-by: NDmitry Torokhov <dtor@mail.ru>
      409550f2
    • T
      console: rename acquire/release_console_sem() to console_lock/unlock() · ac751efa
      Torben Hohn 提交于
      The -rt patches change the console_semaphore to console_mutex.  As a
      result, a quite large chunk of the patches changes all
      acquire/release_console_sem() to acquire/release_console_mutex()
      
      This commit makes things use more neutral function names which dont make
      implications about the underlying lock.
      
      The only real change is the return value of console_trylock which is
      inverted from try_acquire_console_sem()
      
      This patch also paves the way to switching console_sem from a semaphore to
      a mutex.
      
      [akpm@linux-foundation.org: coding-style fixes]
      [akpm@linux-foundation.org: make console_trylock return 1 on success, per Geert]
      Signed-off-by: NTorben Hohn <torbenh@gmx.de>
      Cc: Thomas Gleixner <tglx@tglx.de>
      Cc: Greg KH <gregkh@suse.de>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac751efa
    • P
      squashfs: fix use of uninitialised variable in zlib & xz decompressors · 3689456b
      Phillip Lougher 提交于
      Fix potential use of uninitialised variable caused by recent
      decompressor code optimisations.
      
      In zlib_uncompress (zlib_wrapper.c) we have
      
      	int zlib_err, zlib_init = 0;
      	...
      	do {
      		...
      			if (avail == 0) {
      				offset = 0;
      				put_bh(bh[k++]);
      				continue;
      			}
      		...
      		zlib_err = zlib_inflate(stream, Z_SYNC_FLUSH);
      		...
      	} while (zlib_err == Z_OK);
      
      If continue is executed (avail == 0) then the while condition will be
      evaluated testing zlib_err, which is uninitialised first time around the
      loop.
      
      Fix this by getting rid of the 'if (avail == 0)' condition test, this
      edge condition should not be being handled in the decompressor code, and
      instead handle it generically in the caller code.
      
      Similarly for xz_wrapper.c.
      
      Incidentally, on most architectures (bar Mips and Parisc), no
      uninitialised variable warning is generated by gcc, this is because the
      while condition test on continue is optimised out and not performed
      (when executing continue zlib_err has not been changed since entering
      the loop, and logically if the while condition was true previously, then
      it's still true).
      Signed-off-by: NPhillip Lougher <phillip@lougher.demon.co.uk>
      Reported-by: NJesper Juhl <jj@chaosbits.net>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3689456b
    • T
      radix_tree: radix_tree_gang_lookup_tag_slot() may never return · ac15ee69
      Toshiyuki Okajima 提交于
      Executed command: fsstress -d /mnt -n 600 -p 850
      
        crash> bt
        PID: 7947   TASK: ffff880160546a70  CPU: 0   COMMAND: "fsstress"
         #0 [ffff8800dfc07d00] machine_kexec at ffffffff81030db9
         #1 [ffff8800dfc07d70] crash_kexec at ffffffff810a7952
         #2 [ffff8800dfc07e40] oops_end at ffffffff814aa7c8
         #3 [ffff8800dfc07e70] die_nmi at ffffffff814aa969
         #4 [ffff8800dfc07ea0] do_nmi_callback at ffffffff8102b07b
         #5 [ffff8800dfc07f10] do_nmi at ffffffff814aa514
         #6 [ffff8800dfc07f50] nmi at ffffffff814a9d60
            [exception RIP: __lookup_tag+100]
            RIP: ffffffff812274b4  RSP: ffff88016056b998  RFLAGS: 00000287
            RAX: 0000000000000000  RBX: 0000000000000002  RCX: 0000000000000006
            RDX: 000000000000001d  RSI: ffff88016056bb18  RDI: ffff8800c85366e0
            RBP: ffff88016056b9c8   R8: ffff88016056b9e8   R9: 0000000000000000
            R10: 000000000000000e  R11: ffff8800c8536908  R12: 0000000000000010
            R13: 0000000000000040  R14: ffffffffffffffc0  R15: ffff8800c85366e0
            ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
        <NMI exception stack>
         #7 [ffff88016056b998] __lookup_tag at ffffffff812274b4
         #8 [ffff88016056b9d0] radix_tree_gang_lookup_tag_slot at ffffffff81227605
         #9 [ffff88016056ba20] find_get_pages_tag at ffffffff810fc110
        #10 [ffff88016056ba80] pagevec_lookup_tag at ffffffff81105e85
        #11 [ffff88016056baa0] write_cache_pages at ffffffff81104c47
        #12 [ffff88016056bbd0] generic_writepages at ffffffff81105014
        #13 [ffff88016056bbe0] do_writepages at ffffffff81105055
        #14 [ffff88016056bbf0] __filemap_fdatawrite_range at ffffffff810fb2cb
        #15 [ffff88016056bc40] filemap_write_and_wait_range at ffffffff810fb32a
        #16 [ffff88016056bc70] generic_file_direct_write at ffffffff810fb3dc
        #17 [ffff88016056bce0] __generic_file_aio_write at ffffffff810fcee5
        #18 [ffff88016056bda0] generic_file_aio_write at ffffffff810fd085
        #19 [ffff88016056bdf0] do_sync_write at ffffffff8114f9ea
        #20 [ffff88016056bf00] vfs_write at ffffffff8114fcf8
        #21 [ffff88016056bf30] sys_write at ffffffff81150691
        #22 [ffff88016056bf80] system_call_fastpath at ffffffff8100c0b2
      
      I think this root cause is the following:
      
       radix_tree_range_tag_if_tagged() always tags the root tag with settag
       if the root tag is set with iftag even if there are no iftag tags
       in the specified range (Of course, there are some iftag tags
       outside the specified range).
      
      ===============================================================================
      [[[Detailed description]]]
      
      (1) Why cannot radix_tree_gang_lookup_tag_slot() return forever?
      
      __lookup_tag():
       - Return with 0.
       - Return with the index which is not bigger than the old one as the
         input parameter.
      
      Therefore the following "while" repeats forever because the above
      conditions cause "ret" not to be updated and the cur_index cannot be
      changed into the bigger one.
      
      (So, radix_tree_gang_lookup_tag_slot() cannot return forever.)
      
      radix_tree_gang_lookup_tag_slot():
      1178         while (ret < max_items) {
      1179                 unsigned int slots_found;
      1180                 unsigned long next_index;       /* Index of next search */
      1181
      1182                 if (cur_index > max_index)
      1183                         break;
      1184                 slots_found = __lookup_tag(node, results + ret,
      1185                                 cur_index, max_items - ret, &next_index,
      tag);
      1186                 ret += slots_found;
      			// cannot update ret because slots_found == 0.
      			// so, this while loops forever.
      1187                 if (next_index == 0)
      1188                         break;
      1189                 cur_index = next_index;
      1190         }
      
      (2) Why does __lookup_tag() return with 0 and doesn't update the index?
      
      Assuming the following:
        - the one of the slot in radix_tree_node is NULL.
        - the one of the tag which corresponds to the slot sets with
          PAGECACHE_TAG_TOWRITE or other.
        - In a certain height(!=0), the corresponding index is 0.
      
      a) __lookup_tag() notices that the tag is set.
      
      1005 static unsigned int
      1006 __lookup_tag(struct radix_tree_node *slot, void ***results, unsigned long index,
      1007         unsigned int max_items, unsigned long *next_index, unsigned int tag)
      1008 {
      1009         unsigned int nr_found = 0;
      1010         unsigned int shift, height;
      1011
      1012         height = slot->height;
      1013         if (height == 0)
      1014                 goto out;
      1015         shift = (height-1) * RADIX_TREE_MAP_SHIFT;
      1016
      1017         while (height > 0) {
      1018                 unsigned long i = (index >> shift) & RADIX_TREE_MAP_MASK ;
      1019
      1020                 for (;;) {
      1021                         if (tag_get(slot, tag, i))
      1022                                 break;
       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      * the index is not updated yet.
      
      b) __lookup_tag() notices that the slot is NULL.
      
      1023                         index &= ~((1UL << shift) - 1);
      1024                         index += 1UL << shift;
      1025                         if (index == 0)
      1026                                 goto out;       /* 32-bit wraparound */
      1027                         i++;
      1028                         if (i == RADIX_TREE_MAP_SIZE)
      1029                                 goto out;
      1030                 }
      1031                 height--;
      1032                 if (height == 0) {      /* Bottom level: grab some items */
      ...
      1055                 }
      1056                 shift -= RADIX_TREE_MAP_SHIFT;
      1057                 slot = rcu_dereference_raw(slot->slots[i]);
      1058                 if (slot == NULL)
      1059                         break;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      
      c) __lookup_tag() doesn't update the index and return with 0.
      
      1060         }
      1061 out:
      1062         *next_index = index;
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      1063         return nr_found;
      1064 }
      
      (3) Why is the slot NULL even if the tag is set?
      
      Because radix_tree_range_tag_if_tagged() always sets the root tag with
      PAGECACHE_TAG_TOWRITE if the root tag is set with PAGECACHE_TAG_DIRTY,
      even if there is no tag which can be set with PAGECACHE_TAG_TOWRITE
      in the specified range (from *first_indexp to last_index). Of course,
      some PAGECACHE_TAG_DIRTY nodes must exist outside the specified range.
      (radix_tree_range_tag_if_tagged() is called only from tag_pages_for_writeback())
      
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
      *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       645         unsigned int height = root->height;
       646         struct radix_tree_path path[height];
       647         struct radix_tree_path *pathp = path;
       648         struct radix_tree_node *slot;
       649         unsigned int shift;
       650         unsigned long tagged = 0;
       651         unsigned long index = *first_indexp;
       652
       653         last_index = min(last_index, radix_tree_maxindex(height));
       654         if (index > last_index)
       655                 return 0;
       656         if (!nr_to_tag)
       657                 return 0;
       658         if (!root_tag_get(root, iftag)) {
       659                 *first_indexp = last_index + 1;
       660                 return 0;
       661         }
       662         if (height == 0) {
       663                 *first_indexp = last_index + 1;
       664                 root_tag_set(root, settag);
       665                 return 1;
       666         }
      ...
       733         root_tag_set(root, settag);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       734         *first_indexp = index;
       735
       736         return tagged;
       737 }
      
      As the result, there is no radix_tree_node which is set with
      PAGECACHE_TAG_TOWRITE but the root tag(radix_tree_root) is set with
      PAGECACHE_TAG_TOWRITE.
      
      [figure: inside radix_tree]
      (Please see the figure with typewriter font)
      ===========================================
                [roottag = DIRTY]
                       |             tag=0:NOTHING
               tag[0 0 0 1]              1:DIRTY
                  [x x x +]              2:WRITEBACK
                         |               3:DIRTY,WRITEBACK
                         p               4:TOWRITE
                   <--->                 5:DIRTY,TOWRITE ...
           specified range (index: 0 to 2)
      
      * There is no DIRTY tag within the specified range.
       (But there is a DIRTY tag outside that range.)
      
                  | | | | | | | | |
          after calling tag_pages_for_writeback()
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |                 p is "page".
               tag[0 0 0 1]              x is NULL.
                  [x x x +]              +- is a pointer to "page".
                         |
                         p
      
      * But TOWRITE tag is set on the root tag.
      ============================================
      
      After that, radix_tree_extend() via radix_tree_insert() is called
      when the page is added.
      This function sets the new radix_tree_node with PAGECACHE_TAG_TOWRITE
      to succeed the status of the root tag.
      
       246 static int radix_tree_extend(struct radix_tree_root *root, unsigned long
      index)
       247 {
       248         struct radix_tree_node *node;
       249         unsigned int height;
       250         int tag;
       251
       252         /* Figure out what the height should be.  */
       253         height = root->height + 1;
       254         while (index > radix_tree_maxindex(height))
       255                 height++;
       256
       257         if (root->rnode == NULL) {
       258                 root->height = height;
       259                 goto out;
       260         }
       261
       262         do {
       263                 unsigned int newheight;
       264                 if (!(node = radix_tree_node_alloc(root)))
       265                         return -ENOMEM;
       266
       267                 /* Increase the height.  */
       268                 node->slots[0] = radix_tree_indirect_to_ptr(root->rnode);
       269
       270                 /* Propagate the aggregated tag info into the new root */
       271                 for (tag = 0; tag < RADIX_TREE_MAX_TAGS; tag++) {
       272                         if (root_tag_get(root, tag))
       273                                 tag_set(node, tag, 0);
      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
       274                 }
      
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |     :
               tag[0 0 0 1] [0 0 0 0]
                  [x x x +] [+ x x x]
                         |   |
                         p   p (new page)
      
                  | | | | | | | | |
          after calling radix_tree_insert
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]    *  DIRTY and TOWRITE tags are
                   [+ + x x]       succeeded to the new node.
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
      ============================================
      
      After that, the index 3 page is released by remove_from_page_cache().
      Then we can make the situation that the tag is set with PAGECACHE_TAG_TOWRITE
      and that the slot which corresponds to the tag is NULL.
      ===========================================
                [roottag = DIRTY,TOWRITE]
                       |
               tag [5 0 0 0]
                   [+ + x x]
                    | |
        tag [0 0 0 1] [0 0 0 0]
            [x x x +] [+ x x x]
                   |   |
                   p   p
               (remove)
      
                  | | | | | | | | |
          after calling remove_page_cache
                  | | | | | | | | |
                  v v v v v v v v v
      
                [roottag = DIRTY,TOWRITE]
                       |
               tag [4 0 0 0]      * Only DIRTY tag is cleared
                   [x + x x]        because no TOWRITE tag is existed
                      |             in the bottom node.
                      [0 0 0 0]
                      [+ x x x]
                       |
                       p
      ============================================
      
      To solve this problem
      
      Change to that radix_tree_tag_if_tagged() doesn't tag the root tag
      if it doesn't set any tags within the specified range.
      
      Like this.
      ============================================
       640 unsigned long radix_tree_range_tag_if_tagged(struct radix_tree_root
      *root,
       641                 unsigned long *first_indexp, unsigned long last_index,
       642                 unsigned long nr_to_tag,
       643                 unsigned int iftag, unsigned int settag)
       644 {
       650         unsigned long tagged = 0;
      ...
       733 	     if (tagged)
      ^^^^^^^^^^^^^^^^^^^^^^^^
       734            root_tag_set(root, settag);
       735         *first_indexp = index;
       736
       737         return tagged;
       738 }
      
      ============================================
      Signed-off-by: NToshiyuki Okajima <toshi.okajima@jp.fujitsu.com>
      Acked-by: NJan Kara <jack@suse.cz>
      Cc: Dave Chinner <david@fromorbit.com>
      Cc: Nick Piggin <nickpiggin@yahoo.com.au>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      ac15ee69
    • V
      drivers/clocksource/tcb_clksrc.c: fix init sequence · 1817dc03
      Voss, Nikolaus 提交于
      setup_irq() was called before clockevents_register_device() which is
      needed by the irq handler.  Bug was reproducible by restarting the
      kernel using kexec (reliable crash).
      Signed-off-by: NNikolaus Voss <n.voss@weinmann.de>
      Cc: David Brownell <dbrownell@users.sourceforge.net>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      1817dc03
    • K
      memcg: fix race at move_parent around compound_order() · 52dbb905
      KAMEZAWA Hiroyuki 提交于
      A fix up mem_cgroup_move_parent() which use compound_order() in
      asynchronous manner.  This compound_order() may return unknown value
      because we don't take lock.  Use PageTransHuge() and HPAGE_SIZE instead
      of it.
      
      Also clean up for mem_cgroup_move_parent().
       - remove unnecessary initialization of local variable.
       - rename charge_size -> page_size
       - remove unnecessary (wrong) comment.
       - added a comment about THP.
      
      Note:
       Current design take compound_page_lock() in caller of move_account().
       This should be revisited when we implement direct move_task of hugepage
       without splitting.
      
      [akpm@linux-foundation.org: coding-style fixes]
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      52dbb905
    • K
      memcg: bugfix check mem_cgroup_disabled() at split fixup · 3d37c4a9
      KAMEZAWA Hiroyuki 提交于
      mem_cgroup_disabled() should be checked at splitting.  If disabled, no
      heavy work is necesary.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d37c4a9
    • K
      memcg: fix account leak at failure of memsw acconting · 01c88e2d
      KAMEZAWA Hiroyuki 提交于
      Commit 4b534334 ("memcg: clean up try_charge main loop") removes a
      cancel of charge at case: memory charge-> success.  mem+swap charge->
      failure.
      
      This leaks usage of memory.  Fix it.
      Signed-off-by: NKAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Reviewed-by: NJohannes Weiner <hannes@cmpxchg.org>
      Acked-by: NDaisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@in.ibm.com>
      Cc: <stable@kernel.org>	[2.6.36+]
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      01c88e2d