1. 13 May, 2010 11 commits
  2. 12 May, 2010 29 commits
    • M
      [S390] correct address of _stext with CONFIG_SHARED_KERNEL=y · 57d84906
      Martin Schwidefsky committed
      As of git commit 1844c9bc, head64.S/head31.S are no longer included in
      head.S but are built as a separate object. This breaks shared kernel
      support because the .org statement in head64.S/head31.S for
      CONFIG_SHARED_KERNEL=y now has a different effect: the end address of the
      head.text section in head.o is added to the .org value. To compensate,
      subtract 0x11000 to get the required value of 0x100000 again.
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
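The arithmetic behind this fix can be checked in a few lines. This is only a sketch: it assumes that the head.text section of head.o ends at 0x11000, which is what the subtraction in the commit message implies, and the names below are illustrative rather than taken from the kernel sources.

```python
# Assumption: head.text in head.o ends at 0x11000 (implied by the commit message).
HEAD_TEXT_END = 0x11000
REQUIRED_ADDRESS = 0x100000  # where the code must land with CONFIG_SHARED_KERNEL=y


def effective_address(org_value: int) -> int:
    """Address reached when the .org value is applied in a separately built
    object, so that the end of head.text is added on top of it."""
    return HEAD_TEXT_END + org_value


old_org = 0x100000                          # value used before the split
new_org = REQUIRED_ADDRESS - HEAD_TEXT_END  # compensated value

print(hex(effective_address(old_org)))  # 0x111000, too high
print(hex(effective_address(new_org)))  # 0x100000, as required
```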
    • G
      [S390] ptrace: fix return value of do_syscall_trace_enter() · 545c174d
      Gerald Schaefer committed
      strace may change the system call number, so regs->gprs[2] must not
      be read before tracehook_report_syscall_entry(). This fixes a bug
      where "strace -f" will hang after a vfork().
      
      Cc: <stable@kernel.org>
      Signed-off-by: Gerald Schaefer <gerald.schaefer@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
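The ordering problem can be modelled outside the kernel. The sketch below is a userspace illustration, not the s390 code: `Regs`, `report_syscall_entry` and the syscall numbers are hypothetical stand-ins for the register set, `tracehook_report_syscall_entry()` and whatever numbers a tracer might choose.

```python
class Regs:
    """Minimal stand-in for the register set; gprs[2] holds the syscall number."""

    def __init__(self, syscall_nr):
        self.gprs = [0] * 16
        self.gprs[2] = syscall_nr


def report_syscall_entry(regs, tracer_rewrite):
    """Stand-in for the tracer hook: the tracer may rewrite the syscall number."""
    if tracer_rewrite is not None:
        regs.gprs[2] = tracer_rewrite


def trace_enter_buggy(regs, tracer_rewrite=None):
    nr = regs.gprs[2]  # read too early: the tracer has not run yet
    report_syscall_entry(regs, tracer_rewrite)
    return nr


def trace_enter_fixed(regs, tracer_rewrite=None):
    report_syscall_entry(regs, tracer_rewrite)
    return regs.gprs[2]  # read only after the tracer may have changed it
```

With a tracer that rewrites the number, the buggy variant returns the stale value while the fixed variant returns the one the tracer set.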
    • S
      [S390] dasd: fix race between tasklet and dasd_sleep_on · 1c1e093c
      Stefan Weinhuber committed
      The various dasd_sleep_on functions use a global wait queue when
      waiting for a cqr. The wait condition checks the status and devlist
      fields of the cqr to determine if it is safe to continue. This
      evaluation may return true, although the tasklet has not finished
      processing of the cqr and the callback function has not been called
      yet. When the callback is finally called, the data in the cqr may
      already be invalid. The sleep_on wait condition needs a safe way to
      determine if the tasklet has finished processing. Use the
      callback_data field of the cqr to store a token, which is set by
      the callback function itself.
      
      Cc: <stable@kernel.org>
      Signed-off-by: Stefan Weinhuber <wein@de.ibm.com>
      Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com>
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
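The difference between the two wait conditions can be shown with a deterministic model. This is a sketch under assumptions: the status strings, the `CALLBACK_DONE` token and the step-by-step "tasklet" generator are all hypothetical and only mirror the description in the commit message.

```python
DONE = "DONE"             # hypothetical final status value
CALLBACK_DONE = object()  # hypothetical token stored by the callback


class Cqr:
    """Minimal stand-in for a request with the three fields discussed above."""

    def __init__(self):
        self.status = "QUEUED"
        self.on_devlist = True
        self.callback_data = None


def old_wait_condition(cqr):
    # Looks only at status and list membership.
    return cqr.status == DONE and not cqr.on_devlist


def new_wait_condition(cqr):
    # True only once the callback itself has stored the token.
    return cqr.callback_data is CALLBACK_DONE


def tasklet_steps(cqr):
    """Yield after each step so the intermediate state can be inspected."""
    cqr.status = DONE
    cqr.on_devlist = False
    yield "fields updated, callback not yet called"
    cqr.callback_data = CALLBACK_DONE  # done by the callback function
    yield "callback finished"
```

After the first step the old condition already reports completion although the callback has not run; the token-based condition only becomes true after the second step.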
    • P
      powerpc/perf_event: Fix oops due to perf_event_do_pending call · 0fe1ac48
      Paul Mackerras committed
      Anton Blanchard found that large POWER systems would occasionally
      crash in the exception exit path when profiling with perf_events.
      The symptom was that an interrupt would occur late in the exit path
      when the MSR[RI] (recoverable interrupt) bit was clear.  Interrupts
      should be hard-disabled at this point but they were enabled.  Because
      the interrupt was not recoverable the system panicked.
      
      The reason is that the exception exit path was calling
      perf_event_do_pending after hard-disabling interrupts, and
      perf_event_do_pending will re-enable interrupts.
      
      The simplest and cleanest fix for this is to use the same mechanism
      that 32-bit powerpc does, namely to cause a self-IPI by setting the
      decrementer to 1.  This means we can remove the tests in the exception
      exit path and raw_local_irq_restore.
      
      This also makes sure that the call to perf_event_do_pending from
      timer_interrupt() happens within irq_enter/irq_exit.  (Note that
      calling perf_event_do_pending from timer_interrupt does not mean that
      there is a possible 1/HZ latency; setting the decrementer to 1 ensures
      that the timer interrupt will happen immediately, i.e. within one
      timebase tick, which is a few nanoseconds or 10s of nanoseconds.)
      Signed-off-by: Paul Mackerras <paulus@samba.org>
      Cc: stable@kernel.org
      Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    • S
      ceph: preserve seq # on requeued messages after transient transport errors · e84346b7
      Sage Weil committed
      If the tcp connection drops and we reconnect to reestablish a stateful
      session (with the mds), we need to resend previously sent (and possibly
      received) messages with the _same_ seq # so that they can be dropped on
      the other end if needed.  Only assign a new seq once after the message is
      queued.
      Signed-off-by: Sage Weil <sage@newdream.net>
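A small model makes the "assign once" rule concrete. It is a sketch, not the ceph messenger: the `Connection` class, its queues and the point at which the number is assigned (first send) are assumptions chosen for illustration.

```python
class Connection:
    """Toy messenger that keeps a message's sequence number across requeues."""

    def __init__(self):
        self.out_seq = 0
        self.out_queue = []  # messages waiting to be sent
        self.out_sent = []   # messages sent but not yet acknowledged

    def queue(self, msg):
        msg.setdefault("seq", None)
        self.out_queue.append(msg)

    def send_next(self):
        msg = self.out_queue.pop(0)
        if msg["seq"] is None:  # assign a sequence number only once
            self.out_seq += 1
            msg["seq"] = self.out_seq
        self.out_sent.append(msg)
        return msg

    def requeue_after_reconnect(self):
        # Unacknowledged messages go back to the front, seq untouched.
        self.out_queue = self.out_sent + self.out_queue
        self.out_sent = []
```

Because a resent message carries its original number, the peer can recognise and drop a duplicate it had already received.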
    • S
      ceph: fix cap removal races · f818a736
      Sage Weil committed
      The iterate_session_caps helper traverses the session caps list and tries
      to grab an inode reference.  However, the __ceph_remove_cap was clearing
      the inode backpointer _before_ removing itself from the session list,
      causing a null pointer dereference.
      
      Clear cap->ci under protection of s_cap_lock to avoid the race, and to
      tightly couple the list and backpointer state.  Use a local flag to
      indicate whether we are releasing the cap, as cap->session may be modified
      by a racing thread in iterate_session_caps.
      Signed-off-by: Sage Weil <sage@newdream.net>
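The coupling of list membership and backpointer can be sketched with an ordinary lock. This is a userspace illustration with hypothetical classes; it covers only the first half of the fix (clearing the backpointer under the same lock that guards the list), not the local release flag.

```python
import threading


class Session:
    def __init__(self):
        self.s_cap_lock = threading.Lock()
        self.caps = []


class Cap:
    def __init__(self, session, inode):
        self.session = session
        self.ci = inode  # inode backpointer
        session.caps.append(self)


def remove_cap(cap):
    session = cap.session
    with session.s_cap_lock:
        session.caps.remove(cap)
        cap.ci = None  # cleared in the same critical section as the list removal


def iterate_session_caps(session, fn):
    with session.s_cap_lock:
        # Any cap still on the list is guaranteed to have a valid backpointer.
        inodes = [cap.ci for cap in session.caps]
    for inode in inodes:
        fn(inode)
```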
    • L
      Merge branch 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging · cea0d767
      Linus Torvalds committed
      * 'hwmon-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jdelvare/staging:
        hwmon: (applesmc) Correct sysfs fan error handling
        hwmon: (asc7621) Bug fixes
    • L
      Merge branch 'perf-fixes-for-linus' of... · b2464ab2
      Linus Torvalds committed
      Merge branch 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip
      
      * 'perf-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
        kprobes/x86: Fix removed int3 checking order
        perf: Fix static strings treated like dynamic ones
    • A
      drivers/gpu/drm/i915/i915_irq.c:i915_error_object_create(): use correct kmap-atomic slot · 788885ae
      Andrew Morton committed
      i915_error_object_create() is called from the timer interrupt and hence
      can corrupt the KM_USER0 slot.  Use KM_IRQ0 instead.
      Reported-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Tested-by: Jaswinder Singh Rajput <jaswinderlinux@gmail.com>
      Acked-by: Chris Wilson <chris@chris-wilson.co.uk>
      Cc: Dave Airlie <airlied@linux.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • O
      hp_accel: fix race in device removal · 06efbeb4
      Oliver Neukum committed
      The work queue has to be flushed after the device has been made
      inaccessible.  The patch closes a window during which a work queue might
      remain active after the device is removed and would then lead to ACPI
      calls with undefined behavior.
      Signed-off-by: Oliver Neukum <oneukum@suse.de>
      Acked-by: Eric Piel <eric.piel@tremplin-utc.net>
      Acked-by: Pavel Machek <pavel@ucw.cz>
      Cc: Pavel Herrmann <morpheus.ibis@gmail.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • A
      mqueue: fix kernel BUG caused by double free() on mq_open() · a3ed2a15
      André Goddard Rosa committed
      In case of aborting because we reach the maximum amount of memory which
      can be allocated to message queues per user (RLIMIT_MSGQUEUE), we would
      try to free the message area twice when bailing out: first by the error
      handling code itself, and then later when cleaning up the inode through
      delete_inode().
      Signed-off-by: André Goddard Rosa <andre.goddard@gmail.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
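The double free described here follows a common pattern: an error path releases a resource that a later cleanup routine also releases. The model below is a sketch with a hypothetical `Allocator` that detects double frees; leaving the free to the inode cleanup is shown as one plausible way to avoid the problem, not as a transcription of the actual patch.

```python
class Allocator:
    """Toy allocator that raises on a double free."""

    def __init__(self):
        self.live = set()
        self.next_id = 0

    def alloc(self):
        self.next_id += 1
        self.live.add(self.next_id)
        return self.next_id

    def free(self, handle):
        if handle is None:
            return  # freeing nothing is harmless
        if handle not in self.live:
            raise RuntimeError("double free")
        self.live.remove(handle)


class Inode:
    def __init__(self, allocator):
        self.allocator = allocator
        self.messages = allocator.alloc()  # the message area


def delete_inode(inode):
    inode.allocator.free(inode.messages)
    inode.messages = None


def open_over_limit_buggy(allocator):
    inode = Inode(allocator)
    allocator.free(inode.messages)  # error path frees the message area ...
    delete_inode(inode)             # ... and inode cleanup frees it again


def open_over_limit_fixed(allocator):
    inode = Inode(allocator)
    delete_inode(inode)             # only the inode cleanup frees it
```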
    • M
      fbdev: bfin-t350mcqb-fb: fix fbmem allocation with blanking lines · de145b44
      Michael Hennerich committed
      The current allocation does not include the memory required for blanking
      lines.  Include it, to avoid memory corruption when multiple devices are
      using DMA memory near each other.
      Signed-off-by: Michael Hennerich <michael.hennerich@analog.com>
      Signed-off-by: Mike Frysinger <vapier@gentoo.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      memcg: fix css_is_ancestor() RCU locking · 747388d7
      KAMEZAWA Hiroyuki committed
      Some callers (in memcontrol.c) call css_is_ancestor() without
      rcu_read_lock().  Because css_is_ancestor() has to access RCU-protected
      data, it should be under rcu_read_lock().
      
      This patch makes css_is_ancestor() itself access the RCU-protected
      area safely.  (At least, "root" can have refcnt==0 if it's not an ancestor of
      "child".  So, we need rcu_read_lock().)
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • K
      memcg: fix css_id() RCU locking for real · 7f0f1546
      KAMEZAWA Hiroyuki committed
      Commit ad4ba375 ("memcg: css_id() must be
      called under rcu_read_lock()") modified memcontrol.c to fix an RCU check
      message.  But Andrew Morton pointed out that the fix doesn't seem sane
      and was just hiding lockdep messages.
      
      This patch does the proper thing.  Checking again, all the places that
      commit fixed were accessing css_id() without rcu_read_lock intentionally:
      all callers of css_id() hold a reference count on the css.  So it's not
      necessary to be under rcu_read_lock().
      
      Considering again, we can use rcu_dereference_check for css_id().  We know
      css->id is valid if css->refcnt > 0.  (css->id never changes and is freed
      only after css->refcnt goes to 0.)
      
      This patch makes use of rcu_dereference_check() in css_id/depth and removes
      the unnecessary rcu_read_lock added by the commit.
      Signed-off-by: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
      Cc: Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • V
      bsdacct: use del_timer_sync() in acct_exit_ns() · 11cad320
      Vitaliy Gusev committed
      acct_exit_ns --> acct_file_reopen deletes the timer without checking
      whether it is executing on other CPUs.  So acct_timeout() can modify
      unmapped memory.
      Signed-off-by: Vitaliy Gusev <vgusev@openvz.org>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      rmap: remove anon_vma check in page_address_in_vma() · ab941e0f
      Naoya Horiguchi committed
      Currently page_address_in_vma() compares vma->anon_vma and
      page_anon_vma(page) for parameter check, but in 2.6.34 a vma can have
      multiple anon_vmas with anon_vma_chain, so current check does not work.
      (For anonymous page shared by multiple processes, some verified (page,vma)
      pairs return -EFAULT wrongly.)
      
      We can go to checking all anon_vmas in the "same_vma" chain, but it needs
      to meet lock requirement.  Instead, we can remove anon_vma check safely
      because page_address_in_vma() assumes that page and vma are already
      checked to belong to the identical process.
      Signed-off-by: Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>
      Reviewed-by: Rik van Riel <riel@redhat.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Mel Gorman <mel@csn.ul.ie>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • M
      hugetlbfs: kill applications that use MAP_NORESERVE with SIGBUS instead of OOM-killer · 4a6018f7
      Mel Gorman committed
      Ordinarily, an application using hugetlbfs will create mappings with
      reserves.  For shared mappings, these pages are reserved before mmap()
      returns success; for private mappings, the calling process is guaranteed
      the pages, and a child process that cannot get them is killed with SIGBUS.
      
      An application that uses MAP_NORESERVE gets no reservations and mmap()
      will always succeed at the risk the page will not be available at fault
      time.  This might be used for example on very large sparse mappings where
      the developer is confident the necessary huge pages exist to satisfy all
      faults even though the whole mapping cannot be backed by huge pages.
      Unfortunately, if an allocation does fail, VM_FAULT_OOM is returned to the
      fault handler which proceeds to trigger the OOM-killer.  This is
      unhelpful.
      
      Even without hugetlbfs mounted, a user using mmap() can trivially trigger
      the OOM-killer because VM_FAULT_OOM is returned (will provide example
      program if desired - it's a whopping 24 lines long).  It could be
      considered a DOS available to an unprivileged user.
      
      This patch alters hugetlbfs to kill a process that uses MAP_NORESERVE
      where huge pages were not available with SIGBUS instead of triggering the
      OOM killer.
      
      This change affects hugetlb_cow() as well.  I feel there is a failure case
      in there, but I didn't create one.  It would need a fairly specific target
      in terms of the faulting application and the hugepage pool size.  The
      hugetlb_no_page() path is much easier to hit but both might as well be
      closed.
      Signed-off-by: Mel Gorman <mel@csn.ul.ie>
      Cc: Lee Schermerhorn <lee.schermerhorn@hp.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
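The behavioural change for MAP_NORESERVE mappings comes down to which fault result is reported when no huge page is available. The sketch below models only that decision; the function names and the description strings are illustrative, and the two result names simply mirror the ones mentioned in the commit message.

```python
VM_FAULT_OOM = "VM_FAULT_OOM"
VM_FAULT_SIGBUS = "VM_FAULT_SIGBUS"


def fault_result_old(huge_page_available):
    """Before the patch: a failed allocation is reported as out-of-memory."""
    return None if huge_page_available else VM_FAULT_OOM


def fault_result_new(huge_page_available):
    """After the patch: a failed allocation is reported as a bus error."""
    return None if huge_page_available else VM_FAULT_SIGBUS


def consequence(fault_result):
    if fault_result == VM_FAULT_OOM:
        return "OOM killer runs and may kill an unrelated process"
    if fault_result == VM_FAULT_SIGBUS:
        return "only the faulting process receives SIGBUS"
    return "fault handled normally"
```

The point of the change is containment: the failure stays with the process that opted out of reservations.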
    • V
      kexec: fix OOPS in crash_kernel_shrink · 475f9aa6
      Vitaly Mayatskikh committed
      Running "echo 0 > /sys/kernel/kexec_crash_size" twice OOPSes the kernel.
      Also, the content of this file is invalid after the first shrink to zero:
      it shows 1 instead of 0.
      
      This scenario is unlikely to happen often (root privs, valid crashkernel=
      in cmdline, dump-capture kernel not loaded); I hit it only by chance.
      
      This patch fixes it.
      Signed-off-by: Vitaly Mayatskikh <v.mayatskih@gmail.com>
      Cc: Cong Wang <amwang@redhat.com>
      Cc: Neil Horman <nhorman@tuxdriver.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      mmc: atmel-mci: fix in debugfs: response value printing · d586ebbb
      Nicolas Ferre committed
      In debugfs, printing of command response reports resp[2] twice: fix it to
      resp[3].
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: <linux-mmc@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
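This class of bug is a copy-and-paste index slip in a format call. The sketch below reproduces it with a hypothetical format string; the real debugfs output format is not shown in the commit message and is not claimed here.

```python
def format_response_buggy(resp):
    # resp[2] appears twice, so resp[3] is never shown.
    return "%08x %08x %08x %08x" % (resp[0], resp[1], resp[2], resp[2])


def format_response_fixed(resp):
    return "%08x %08x %08x %08x" % (resp[0], resp[1], resp[2], resp[3])
```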
    • N
      mmc: atmel-mci: remove data error interrupt after xfer · abc2c9fd
      Nicolas Ferre committed
      Disable data error interrupts while we are actually recording that there
      are no such errors.  This will prevent, in some cases, the warning message
      printed at new request queuing (in atmci_start_request()).
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: <linux-mmc@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      mmc: atmel-mci: prevent kernel oops while removing card · 009a891b
      Nicolas Ferre committed
      Removing an SD card in certain circumstances can lead to a kernel
      oops if we do not make sure that the "data" field of the host structure is
      valid.  This patch adds a test in the atmci_dma_cleanup() function and also
      calls atmci_stop_dma() before throwing away the reference to data.
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: <linux-mmc@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • N
      mmc: atmel-mci: fix two parameters swapped · ebb1fea9
      Nicolas Ferre committed
      Two parameters were swapped in the calls to atmci_init_slot().
      Signed-off-by: Nicolas Ferre <nicolas.ferre@atmel.com>
      Reported-by: Anders Grahn <anders.grahn@hd-wireless.se>
      Cc: Haavard Skinnemoen <hskinnemoen@atmel.com>
      Cc: <linux-mmc@vger.kernel.org>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • R
      revert "procfs: provide stack information for threads" and its fixup commits · 34441427
      Robin Holt committed
      Originally, commit d899bf7b ("procfs: provide stack information for
      threads") attempted to introduce a new feature for showing where the
      threadstack was located and how many pages are being utilized by the
      stack.
      
      Commit c44972f1 ("procfs: disable per-task stack usage on NOMMU") was
      applied to fix the NO_MMU case.
      
      Commit 89240ba0 ("x86, fs: Fix x86 procfs stack information for threads on
      64-bit") was applied to fix a bug in ia32 executables being loaded.
      
      Commit 9ebd4eba ("procfs: fix /proc/<pid>/stat stack pointer for kernel
      threads") was applied to fix a bug which had kernel threads printing a
      userland stack address.
      
      Commit 1306d603 ('proc: partially revert "procfs: provide stack
      information for threads"') was then applied to revert the stack pages
      being used to solve a significant performance regression.
      
      This patch nearly undoes the effect of all these patches.
      
      The reason for reverting these is it provides an unusable value in
      field 28.  For x86_64, a fork will result in the task->stack_start
      value being updated to the current user top of stack and not the stack
      start address.  This unpredictability of the stack_start value makes
      it worthless.  That includes the intended use of showing how much stack
      space a thread has.
      
      Other architectures will get different values.  As an example, ia64
      gets 0.  The do_fork() and copy_process() functions appear to treat the
      stack_start and stack_size parameters as architecture specific.
      
      I only partially reverted c44972f1 ("procfs: disable per-task stack usage
      on NOMMU").  If I had completely reverted it, I would have had to change
      mm/Makefile to only build pagewalk.o when CONFIG_PROC_PAGE_MONITOR is
      configured.  Since I could not test the builds without significant effort,
      I decided to not change mm/Makefile.
      
      I only partially reverted 89240ba0 ("x86, fs: Fix x86 procfs stack
      information for threads on 64-bit").  I left the KSTK_ESP() change in
      place as that seemed worthwhile.
      Signed-off-by: Robin Holt <holt@sgi.com>
      Cc: Stefani Seibold <stefani@seibold.net>
      Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Ingo Molnar <mingo@elte.hu>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • D
      it8761e_gpio: fix bug in gpio numbering · 3c904afd
      Denis Turischev committed
      The SIO chip contains 16 possible gpio lines, not 14.  The schematic was
      not read carefully.
      Signed-off-by: Denis Turischev <denis@compulab.co.il>
      Cc: David Brownell <david-b@pacbell.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • F
      dma-mapping: fix dma_sync_single_range_* · f33d7e2d
      FUJITA Tomonori committed
      dma_sync_single_range_for_cpu() and dma_sync_single_range_for_device() use
      the wrong address for a partial synchronization.
      Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
      Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
      Cc: <stable@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
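A partial synchronization should cover exactly `size` bytes starting `offset` bytes into the mapping. The commit message does not say what the wrong address was, so the buggy variant below is an assumption (starting at the base address and stretching the length by the offset); both helpers are hypothetical and only compute the byte range that would be synced.

```python
def synced_range_buggy(addr, offset, size):
    # Assumed shape of the bug: start at the base and extend the length.
    return (addr, addr + offset + size)


def synced_range_fixed(addr, offset, size):
    # Start at addr + offset and cover exactly size bytes.
    return (addr + offset, addr + offset + size)
```

For a mapping at 0x1000 with offset 0x200 and size 0x100, the intended range is 0x1200 to 0x1300, whereas the assumed buggy variant would also touch the 0x200 bytes before it.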
    • S
      ceph: zero unused message header, footer fields · 45c6ceb5
      Sage Weil committed
      We shouldn't leak any prior memory contents to other parties.  And random
      data, particularly in the 'version' field, can cause problems down the
      line.
      Signed-off-by: Sage Weil <sage@newdream.net>
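The leak can be illustrated with a header that is only partly filled in. This is a sketch with a hypothetical three-field layout (sequence number, type, version); the real ceph header has more fields and is not reproduced here.

```python
import struct

HEADER_FORMAT = "<QHH"  # hypothetical layout: seq (8 bytes), type (2), version (2)


def new_message_buggy(stale, seq, msg_type):
    buf = bytearray(stale)  # reused memory that was never cleared
    struct.pack_into("<QH", buf, 0, seq, msg_type)  # version bytes left untouched
    return bytes(buf)


def new_message_fixed(seq, msg_type):
    buf = bytearray(struct.calcsize(HEADER_FORMAT))  # bytearray(n) is zero-filled
    struct.pack_into("<QH", buf, 0, seq, msg_type)
    return bytes(buf)
```

In the buggy variant the unused version field carries whatever the buffer held before; zeroing first gives the receiver a predictable value.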
    • L
      Merge branch 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6 · fc2a093e
      Linus Torvalds committed
      * 'drm-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/airlied/drm-2.6:
        drm/radeon: Fix 3 regressions - since buffer rework
    • L
      Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6 · 9fc282ba
      Linus Torvalds committed
      * git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-2.6:
        net: Fix FDDI and TR config checks in ipv4 arp and LLC.
        IPv4: unresolved multicast route cleanup
        mac80211: remove association work when processing deauth request
        ar9170: wait for asynchronous firmware loading
        ipv4: udp: fix short packet and bad checksum logging
        phy: Fix initialization in micrel driver.
        sctp: Fix a race between ICMP protocol unreachable and connect()
        veth: Dont kfree_skb() after dev_forward_skb()
        IPv6: fix IPV6_RECVERR handling of locally-generated errors
        net/gianfar: drop recycled skbs on MTU change
        iwlwifi: work around passive scan issue
    • D
      CacheFiles: Fix occasional EIO on call to vfs_unlink() · c61ea31d
      David Howells committed
      Fix an occasional EIO returned by a call to vfs_unlink():
      
      	[ 4868.465413] CacheFiles: I/O Error: Unlink failed
      	[ 4868.465444] FS-Cache: Cache cachefiles stopped due to I/O error
      	[ 4947.320011] CacheFiles: File cache on md3 unregistering
      	[ 4947.320041] FS-Cache: Withdrawing cache "mycache"
      	[ 5127.348683] FS-Cache: Cache "mycache" added (type cachefiles)
      	[ 5127.348716] CacheFiles: File cache on md3 registered
      	[ 7076.871081] CacheFiles: I/O Error: Unlink failed
      	[ 7076.871130] FS-Cache: Cache cachefiles stopped due to I/O error
      	[ 7116.780891] CacheFiles: File cache on md3 unregistering
      	[ 7116.780937] FS-Cache: Withdrawing cache "mycache"
      	[ 7296.813394] FS-Cache: Cache "mycache" added (type cachefiles)
      	[ 7296.813432] CacheFiles: File cache on md3 registered
      
      What happens is this:
      
       (1) A cached NFS file is seen to have become out of date, so NFS retires the
           object and immediately acquires a new object with the same key.
      
       (2) Retirement of the old object is done asynchronously - so the lookup/create
           to generate the new object may be done first.
      
           This can be a problem as the old object and the new object must exist at
           the same point in the backing filesystem (i.e. they must have the same
           pathname).
      
       (3) The lookup for the new object sees that a backing file already exists,
           checks to see whether it is valid and sees that it isn't.  It then deletes
           that file and creates a new one on disk.
      
       (4) The retirement phase for the old file is then performed.  It tries to
           delete the dentry it has, but ext4_unlink() returns -EIO because the inode
           attached to that dentry no longer matches the inode number associated with
           the filename in the parent directory.
      
      The trace below shows this quite well.
      
      	[md5sum] ==> __fscache_relinquish_cookie(ffff88002d12fb58{NFS.fh,ffff88002ce62100},1)
      	[md5sum] ==> __fscache_acquire_cookie({NFS.server},{NFS.fh},ffff88002ce62100)
      
      NFS has retired the old cookie and asked for a new one.
      
      	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_ACTIVE,24})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_DYING]
      	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_INIT,0})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_LOOKING_UP]
      	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_DYING,24})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_RECYCLING]
      
      The old object (OBJ52) is going through the terminal states to get rid of it,
      whilst the new object - (OBJ53) - is coming into being.
      
      	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_LOOKING_UP,0})
      	[kslowd] ==> cachefiles_walk_to_object({ffff88003029d8b8},OBJ53,@68,)
      	[kslowd] lookup '@68'
      	[kslowd] next -> ffff88002ce41bd0 positive
      	[kslowd] advance
      	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
      	[kslowd] next -> ffff8800369faac8 positive
      
      The new object has looked up the subdir in which the file would be (getting
      dentry ffff88002ce41bd0) and then looked up the file itself (getting dentry
      ffff8800369faac8).
      
      	[kslowd] validate 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
      	[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
      	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
      	[kslowd] unlink stale object
      	[kslowd] <== cachefiles_bury_object() = 0
      
      It then checks the file's xattrs to see if it's valid.  NFS says that the
      auxiliary data indicate the file is out of date (obvious to us - that's why NFS
      ditched the old version and got a new one).  CacheFiles then deletes the old
      file (dentry ffff8800369faac8).
      
      	[kslowd] redo lookup
      	[kslowd] lookup 'Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA'
      	[kslowd] next -> ffff88002cd94288 negative
      	[kslowd] create -> ffff88002cd94288{ffff88002cdaf238{ino=148247}}
      
      CacheFiles then redoes the lookup and gets a negative result in a new dentry
      (ffff88002cd94288) which it then creates a file for.
      
      	[kslowd] ==> cachefiles_mark_object_active(,OBJ53)
      	[kslowd] <== cachefiles_mark_object_active() = 0
      	[kslowd] === OBTAINED_OBJECT ===
      	[kslowd] <== cachefiles_walk_to_object() = 0 [148247]
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_AVAILABLE]
      
      The new object is then marked active and the state machine moves to the
      available state - at which point NFS can start filling the object.
      
      	[kslowd] ==> fscache_object_state_machine({OBJ52,OBJECT_RECYCLING,20})
      	[kslowd] ==> fscache_release_object()
      	[kslowd] ==> cachefiles_drop_object({OBJ52,2})
      	[kslowd] ==> cachefiles_delete_object(,OBJ52{ffff8800369faac8})
      
      The old object, meanwhile, goes on with being retired.  If allocation occurs
      first, cachefiles_delete_object() has to wait for dir->d_inode->i_mutex to
      become available before it can continue.
      
      	[kslowd] ==> cachefiles_bury_object(,'@68','Es0g00og0_Nd_XCYe3BOzvXrsBLMlN6aw16M1htaA')
      	[kslowd] remove ffff8800369faac8 from ffff88002ce41bd0
      	[kslowd] unlink stale object
      	EXT4-fs warning (device sda6): ext4_unlink: Inode number mismatch in unlink (148247!=148193)
      	CacheFiles: I/O Error: Unlink failed
      	FS-Cache: Cache cachefiles stopped due to I/O error
      
      CacheFiles then tries to delete the file for the old object, but the dentry it
      has (ffff8800369faac8) no longer points to a valid inode for that directory
      entry, and so ext4_unlink() returns -EIO when de->inode does not match i_ino.
      
      	[kslowd] <== cachefiles_bury_object() = -5
      	[kslowd] <== cachefiles_delete_object() = -5
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_DEAD]
      	[kslowd] ==> fscache_object_state_machine({OBJ53,OBJECT_AVAILABLE,0})
      	[kslowd] <== fscache_object_state_machine() [->OBJECT_ACTIVE]
      
      (Note that the above trace includes extra information beyond that produced by
      the upstream code).
      
      The fix is to note when an object that is being retired has had its object
      deleted preemptively by a replacement object that is being created, and to
      skip the second removal attempt in such a case.
      Reported-by: Greg M <gregm@servu.net.au>
      Reported-by: Mark Moseley <moseleymark@gmail.com>
      Reported-by: Romain DEGEZ <romain.degez@smartjog.com>
      Signed-off-by: David Howells <dhowells@redhat.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
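The race and its fix can be replayed in a small model. This is a sketch, not the CacheFiles code: `Directory`, `CacheObject` and the `buried` flag are hypothetical, the unlink check only imitates the inode-number comparison that makes ext4_unlink() return -EIO, and the inode numbers are taken from the warning in the trace.

```python
import errno


class Directory:
    def __init__(self):
        self.entries = {}  # filename -> inode number

    def unlink(self, name, inode):
        if self.entries.get(name) != inode:
            return -errno.EIO  # the name no longer refers to this inode
        del self.entries[name]
        return 0


class CacheObject:
    def __init__(self, name, inode):
        self.name = name
        self.inode = inode
        self.buried = False  # hypothetical "already deleted by a replacement" flag


def replace_stale_file(directory, old_obj, new_inode):
    """The new object's lookup deletes the stale file and creates a fresh one."""
    directory.unlink(old_obj.name, old_obj.inode)
    old_obj.buried = True  # note that the old object's file is already gone
    directory.entries[old_obj.name] = new_inode


def retire(directory, obj, honour_flag):
    if honour_flag and obj.buried:
        return 0  # skip the second removal attempt
    return directory.unlink(obj.name, obj.inode)
```

Without the flag, retiring the old object fails with -EIO (-5, as in the trace) because the name now points at the new inode; with the flag, retirement succeeds and the new file is left alone.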