1. 28 8月, 2017 1 次提交
  2. 17 3月, 2017 1 次提交
    • V
      kernfs: Check KERNFS_HAS_RELEASE before calling kernfs_release_file() · 966fa72a
      Vaibhav Jain 提交于
      Recently started seeing a kernel oops when a module tries removing a
      memory mapped sysfs bin_attribute. On closer investigation the root
      cause seems to be kernfs_release_file() trying to call
      kernfs_op.release() callback that's NULL for such sysfs
      bin_attributes. The oops occurs when kernfs_release_file() is called from
      kernfs_drain_open_files() to cleanup any open handles with active
      memory mappings.
      
      The patch fixes this by checking for flag KERNFS_HAS_RELEASE before
      calling kernfs_release_file() in function kernfs_drain_open_files().
      
      On ppc64-le arch with cxl module the oops back-trace is of the
      form below:
      [  861.381126] Unable to handle kernel paging request for instruction fetch
      [  861.381360] Faulting instruction address: 0x00000000
      [  861.381428] Oops: Kernel access of bad area, sig: 11 [#1]
      ....
      [  861.382481] NIP: 0000000000000000 LR: c000000000362c60 CTR:
      0000000000000000
      ....
      Call Trace:
      [c000000f1680b750] [c000000000362c34] kernfs_drain_open_files+0x104/0x1d0 (unreliable)
      [c000000f1680b790] [c00000000035fa00] __kernfs_remove+0x260/0x2c0
      [c000000f1680b820] [c000000000360da0] kernfs_remove_by_name_ns+0x60/0xe0
      [c000000f1680b8b0] [c0000000003638f4] sysfs_remove_bin_file+0x24/0x40
      [c000000f1680b8d0] [c00000000062a164] device_remove_bin_file+0x24/0x40
      [c000000f1680b8f0] [d000000009b7b22c] cxl_sysfs_afu_remove+0x144/0x170 [cxl]
      [c000000f1680b940] [d000000009b7c7e4] cxl_remove+0x6c/0x1a0 [cxl]
      [c000000f1680b990] [c00000000052f694] pci_device_remove+0x64/0x110
      [c000000f1680b9d0] [c0000000006321d4] device_release_driver_internal+0x1f4/0x2b0
      [c000000f1680ba20] [c000000000525cb0] pci_stop_bus_device+0xa0/0xd0
      [c000000f1680ba60] [c000000000525e80] pci_stop_and_remove_bus_device+0x20/0x40
      [c000000f1680ba90] [c00000000004a6c4] pci_hp_remove_devices+0x84/0xc0
      [c000000f1680bad0] [c00000000004a688] pci_hp_remove_devices+0x48/0xc0
      [c000000f1680bb10] [c0000000009dfda4] eeh_reset_device+0xb0/0x290
      [c000000f1680bbb0] [c000000000032b4c] eeh_handle_normal_event+0x47c/0x530
      [c000000f1680bc60] [c000000000032e64] eeh_handle_event+0x174/0x350
      [c000000f1680bd10] [c000000000033228] eeh_event_handler+0x1e8/0x1f0
      [c000000f1680bdc0] [c0000000000d384c] kthread+0x14c/0x190
      [c000000f1680be30] [c00000000000b5a0] ret_from_kernel_thread+0x5c/0xbc
      
      Fixes: f83f3c51 ("kernfs: fix locking around kernfs_ops->release() callback")
      Signed-off-by: NVaibhav Jain <vaibhav@linux.vnet.ibm.com>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      966fa72a
  3. 02 3月, 2017 1 次提交
  4. 25 2月, 2017 1 次提交
  5. 22 2月, 2017 1 次提交
    • T
      kernfs: fix locking around kernfs_ops->release() callback · f83f3c51
      Tejun Heo 提交于
      The release callback may be called from two places - file release
      operation and kernfs open file draining.  kernfs_open_file->mutex is
      used to synchronize the two callsites.  This unfortunately leads to
      possible circular locking because of->mutex is used to protect the
      usual kernfs operations which may use locking constructs which are
      held while removing and thus draining kernfs files.
      
      @of->mutex is for synchronizing concurrent kernfs access operations
      and all we need here is synchronization between the releaes and drain
      paths.  As the drain path has to grab kernfs_open_file_mutex anyway,
      let's use the mutex to synchronize the release operation instead.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-tested-by: NTony Lindgren <tony@atomide.com>
      Fixes: 0e67db2f ("kernfs: add kernfs_ops->open/release() callbacks")
      Acked-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f83f3c51
  6. 28 12月, 2016 2 次提交
  7. 27 10月, 2016 1 次提交
  8. 31 8月, 2016 1 次提交
    • T
      kernfs: don't depend on d_find_any_alias() when generating notifications · df6a58c5
      Tejun Heo 提交于
      kernfs_notify_workfn() sends out file modified events for the
      scheduled kernfs_nodes.  Because the modifications aren't from
      userland, it doesn't have the matching file struct at hand and can't
      use fsnotify_modify().  Instead, it looked up the inode and then used
      d_find_any_alias() to find the dentry and used fsnotify_parent() and
      fsnotify() directly to generate notifications.
      
      The assumption was that the relevant dentries would have been pinned
      if there are listeners, which isn't true as inotify doesn't pin
      dentries at all and watching the parent doesn't pin the child dentries
      even for dnotify.  This led to, for example, inotify watchers not
      getting notifications if the system is under memory pressure and the
      matching dentries got reclaimed.  It can also be triggered through
      /proc/sys/vm/drop_caches or a remount attempt which involves shrinking
      dcache.
      
      fsnotify_parent() only uses the dentry to access the parent inode,
      which kernfs can do easily.  Update kernfs_notify_workfn() so that it
      uses fsnotify() directly for both the parent and target inodes without
      going through d_find_any_alias().  While at it, supply the target file
      name to fsnotify() from kernfs_node->name.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NEvgeny Vereshchagin <evvers@ya.ru>
      Fixes: d911d987 ("kernfs: make kernfs_notify() trigger inotify events too")
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Cc: stable@vger.kernel.org # v3.16+
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df6a58c5
  9. 01 5月, 2016 1 次提交
    • C
      kernfs: Move faulting copy_user operations outside of the mutex · e4234a1f
      Chris Wilson 提交于
      A fault in a user provided buffer may lead anywhere, and lockdep warns
      that we have a potential deadlock between the mm->mmap_sem and the
      kernfs file mutex:
      
      [   82.811702] ======================================================
      [   82.811705] [ INFO: possible circular locking dependency detected ]
      [   82.811709] 4.5.0-rc4-gfxbench+ #1 Not tainted
      [   82.811711] -------------------------------------------------------
      [   82.811714] kms_setmode/5859 is trying to acquire lock:
      [   82.811717]  (&dev->struct_mutex){+.+.+.}, at: [<ffffffff8150d9c1>] drm_gem_mmap+0x1a1/0x270
      [   82.811731]
      but task is already holding lock:
      [   82.811734]  (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
      [   82.811745]
      which lock already depends on the new lock.
      
      [   82.811749]
      the existing dependency chain (in reverse order) is:
      [   82.811752]
      -> #3 (&mm->mmap_sem){++++++}:
      [   82.811761]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
      [   82.811766]        [<ffffffff8118bc65>] __might_fault+0x75/0xa0
      [   82.811771]        [<ffffffff8124da4a>] kernfs_fop_write+0x8a/0x180
      [   82.811787]        [<ffffffff811d1023>] __vfs_write+0x23/0xe0
      [   82.811792]        [<ffffffff811d1d74>] vfs_write+0xa4/0x190
      [   82.811797]        [<ffffffff811d2c14>] SyS_write+0x44/0xb0
      [   82.811801]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
      [   82.811807]
      -> #2 (s_active#6){++++.+}:
      [   82.811814]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
      [   82.811819]        [<ffffffff8124c070>] __kernfs_remove+0x210/0x2f0
      [   82.811823]        [<ffffffff8124d040>] kernfs_remove_by_name_ns+0x40/0xa0
      [   82.811828]        [<ffffffff8124e9e0>] sysfs_remove_file_ns+0x10/0x20
      [   82.811832]        [<ffffffff815318d4>] device_del+0x124/0x250
      [   82.811837]        [<ffffffff81531a19>] device_unregister+0x19/0x60
      [   82.811841]        [<ffffffff8153c051>] cpu_cache_sysfs_exit+0x51/0xb0
      [   82.811846]        [<ffffffff8153c628>] cacheinfo_cpu_callback+0x38/0x70
      [   82.811851]        [<ffffffff8109ae89>] notifier_call_chain+0x39/0xa0
      [   82.811856]        [<ffffffff8109aef9>] __raw_notifier_call_chain+0x9/0x10
      [   82.811860]        [<ffffffff810786de>] cpu_notify+0x1e/0x40
      [   82.811865]        [<ffffffff81078779>] cpu_notify_nofail+0x9/0x20
      [   82.811869]        [<ffffffff81078ac3>] _cpu_down+0x233/0x340
      [   82.811874]        [<ffffffff81079019>] disable_nonboot_cpus+0xc9/0x350
      [   82.811878]        [<ffffffff810d2e11>] suspend_devices_and_enter+0x5a1/0xb50
      [   82.811883]        [<ffffffff810d3903>] pm_suspend+0x543/0x8d0
      [   82.811888]        [<ffffffff810d1b77>] state_store+0x77/0xe0
      [   82.811892]        [<ffffffff813fa68f>] kobj_attr_store+0xf/0x20
      [   82.811897]        [<ffffffff8124e740>] sysfs_kf_write+0x40/0x50
      [   82.811902]        [<ffffffff8124dafc>] kernfs_fop_write+0x13c/0x180
      [   82.811906]        [<ffffffff811d1023>] __vfs_write+0x23/0xe0
      [   82.811910]        [<ffffffff811d1d74>] vfs_write+0xa4/0x190
      [   82.811914]        [<ffffffff811d2c14>] SyS_write+0x44/0xb0
      [   82.811918]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
      [   82.811923]
      -> #1 (cpu_hotplug.lock){+.+.+.}:
      [   82.811929]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
      [   82.811933]        [<ffffffff817b6f72>] mutex_lock_nested+0x62/0x3b0
      [   82.811940]        [<ffffffff810784c1>] get_online_cpus+0x61/0x80
      [   82.811944]        [<ffffffff811170eb>] stop_machine+0x1b/0xe0
      [   82.811949]        [<ffffffffa0178edd>] gen8_ggtt_insert_entries__BKL+0x2d/0x30 [i915]
      [   82.812009]        [<ffffffffa017d3a6>] ggtt_bind_vma+0x46/0x70 [i915]
      [   82.812045]        [<ffffffffa017eb70>] i915_vma_bind+0x140/0x290 [i915]
      [   82.812081]        [<ffffffffa01862b9>] i915_gem_object_do_pin+0x899/0xb00 [i915]
      [   82.812117]        [<ffffffffa0186555>] i915_gem_object_pin+0x35/0x40 [i915]
      [   82.812154]        [<ffffffffa019a23e>] intel_init_pipe_control+0xbe/0x210 [i915]
      [   82.812192]        [<ffffffffa0197312>] intel_logical_rings_init+0xe2/0xde0 [i915]
      [   82.812232]        [<ffffffffa0186fe3>] i915_gem_init+0xf3/0x130 [i915]
      [   82.812278]        [<ffffffffa02097ed>] i915_driver_load+0xf2d/0x1770 [i915]
      [   82.812318]        [<ffffffff81512474>] drm_dev_register+0xa4/0xb0
      [   82.812323]        [<ffffffff8151467e>] drm_get_pci_dev+0xce/0x1e0
      [   82.812328]        [<ffffffffa01472cf>] i915_pci_probe+0x2f/0x50 [i915]
      [   82.812360]        [<ffffffff8143f907>] pci_device_probe+0x87/0xf0
      [   82.812366]        [<ffffffff81535f89>] driver_probe_device+0x229/0x450
      [   82.812371]        [<ffffffff81536233>] __driver_attach+0x83/0x90
      [   82.812375]        [<ffffffff81533c61>] bus_for_each_dev+0x61/0xa0
      [   82.812380]        [<ffffffff81535879>] driver_attach+0x19/0x20
      [   82.812384]        [<ffffffff8153535f>] bus_add_driver+0x1ef/0x290
      [   82.812388]        [<ffffffff81536e9b>] driver_register+0x5b/0xe0
      [   82.812393]        [<ffffffff8143e83b>] __pci_register_driver+0x5b/0x60
      [   82.812398]        [<ffffffff81514866>] drm_pci_init+0xd6/0x100
      [   82.812402]        [<ffffffffa027c094>] 0xffffffffa027c094
      [   82.812406]        [<ffffffff810003de>] do_one_initcall+0xae/0x1d0
      [   82.812412]        [<ffffffff811595a0>] do_init_module+0x5b/0x1cb
      [   82.812417]        [<ffffffff81106160>] load_module+0x1c20/0x2480
      [   82.812422]        [<ffffffff81106bae>] SyS_finit_module+0x7e/0xa0
      [   82.812428]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
      [   82.812433]
      -> #0 (&dev->struct_mutex){+.+.+.}:
      [   82.812439]        [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
      [   82.812443]        [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
      [   82.812456]        [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
      [   82.812460]        [<ffffffff81196a14>] mmap_region+0x334/0x580
      [   82.812466]        [<ffffffff81196fc4>] do_mmap+0x364/0x410
      [   82.812470]        [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
      [   82.812474]        [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
      [   82.812479]        [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
      [   82.812484]        [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
      [   82.812489]
      other info that might help us debug this:
      
      [   82.812493] Chain exists of:
        &dev->struct_mutex --> s_active#6 --> &mm->mmap_sem
      
      [   82.812502]  Possible unsafe locking scenario:
      
      [   82.812506]        CPU0                    CPU1
      [   82.812508]        ----                    ----
      [   82.812510]   lock(&mm->mmap_sem);
      [   82.812514]                                lock(s_active#6);
      [   82.812519]                                lock(&mm->mmap_sem);
      [   82.812522]   lock(&dev->struct_mutex);
      [   82.812526]
       *** DEADLOCK ***
      
      [   82.812531] 1 lock held by kms_setmode/5859:
      [   82.812533]  #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff8117b364>] vm_mmap_pgoff+0x44/0xa0
      [   82.812541]
      stack backtrace:
      [   82.812547] CPU: 0 PID: 5859 Comm: kms_setmode Not tainted 4.5.0-rc4-gfxbench+ #1
      [   82.812550] Hardware name:                  /NUC5CPYB, BIOS PYBSWCEL.86A.0040.2015.0814.1353 08/14/2015
      [   82.812553]  0000000000000000 ffff880079407bf0 ffffffff813f8505 ffffffff825fb270
      [   82.812560]  ffffffff825c4190 ffff880079407c30 ffffffff810c84ac ffff880079407c90
      [   82.812566]  ffff8800797ed328 ffff8800797ecb00 0000000000000001 ffff8800797ed350
      [   82.812573] Call Trace:
      [   82.812578]  [<ffffffff813f8505>] dump_stack+0x67/0x92
      [   82.812582]  [<ffffffff810c84ac>] print_circular_bug+0x1fc/0x310
      [   82.812586]  [<ffffffff810cbe59>] __lock_acquire+0x1fc9/0x20f0
      [   82.812590]  [<ffffffff810cc883>] lock_acquire+0xc3/0x1d0
      [   82.812594]  [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
      [   82.812599]  [<ffffffff8150d9e7>] drm_gem_mmap+0x1c7/0x270
      [   82.812603]  [<ffffffff8150d9c1>] ? drm_gem_mmap+0x1a1/0x270
      [   82.812608]  [<ffffffff81196a14>] mmap_region+0x334/0x580
      [   82.812612]  [<ffffffff81196fc4>] do_mmap+0x364/0x410
      [   82.812616]  [<ffffffff8117b38d>] vm_mmap_pgoff+0x6d/0xa0
      [   82.812629]  [<ffffffff811950f4>] SyS_mmap_pgoff+0x184/0x220
      [   82.812633]  [<ffffffff8100a0fd>] SyS_mmap+0x1d/0x20
      [   82.812637]  [<ffffffff817bb81b>] entry_SYSCALL_64_fastpath+0x16/0x73
      
      Highly unlikely though this scenario is, we can avoid the issue entirely
      by moving the copy operation from out under the kernfs_get_active()
      tracking by assigning the preallocated buffer its own mutex. The
      temporary buffer allocation doesn't require mutex locking as it is
      entirely local.
      
      The locked section was extended by the addition of the preallocated buf
      to speed up md user operations in
      
      commit 2b75869b
      Author: NeilBrown <neilb@suse.de>
      Date:   Mon Oct 13 16:41:28 2014 +1100
      
          sysfs/kernfs: allow attributes to request write buffer be pre-allocated.
      Reported-by: NVille Syrjälä <ville.syrjala@linux.intel.com>
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=94350Signed-off-by: NChris Wilson <chris@chris-wilson.co.uk>
      Reviewed-by: NJoonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
      Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
      Cc: NeilBrown <neilb@suse.de>
      Acked-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      e4234a1f
  10. 25 5月, 2015 1 次提交
  11. 17 3月, 2015 1 次提交
  12. 14 2月, 2015 1 次提交
    • T
      kernfs: remove KERNFS_STATIC_NAME · dfeb0750
      Tejun Heo 提交于
      When a new kernfs node is created, KERNFS_STATIC_NAME is used to avoid
      making a separate copy of its name.  It's currently only used for sysfs
      attributes whose filenames are required to stay accessible and unchanged.
      There are rare exceptions where these names are allocated and formatted
      dynamically but for the vast majority of cases they're consts in the
      rodata section.
      
      Now that kernfs is converted to use kstrdup_const() and kfree_const(),
      there's little point in keeping KERNFS_STATIC_NAME around.  Remove it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Andrzej Hajda <a.hajda@samsung.com>
      Signed-off-by: NAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      dfeb0750
  13. 17 12月, 2014 1 次提交
  14. 08 11月, 2014 2 次提交
    • N
      sysfs/kernfs: make read requests on pre-alloc files use the buffer. · 4ef67a8c
      NeilBrown 提交于
      To match the previous patch which used the pre-alloc buffer for
      writes, this patch causes reads to use the same buffer.
      This is not strictly necessary as the current seq_read() will allocate
      on first read, so user-space can trigger the required pre-alloc.  But
      consistency is valuable.
      
      The read function is somewhat simpler than seq_read() and, for example,
      does not support reading from an offset into the file: reads must be
      at the start of the file.
      
      As seq_read() does not use the prealloc buffer, ->seq_show is
      incompatible with ->prealloc and caused an EINVAL return from open().
      sysfs code which calls into kernfs always chooses the correct function.
      
      As the buffer is shared with writes and other reads, the mutex is
      extended to cover the copy_to_user.
      Signed-off-by: NNeilBrown <neilb@suse.de>
      Reviewed-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4ef67a8c
    • N
      sysfs/kernfs: allow attributes to request write buffer be pre-allocated. · 2b75869b
      NeilBrown 提交于
      md/raid allows metadata management to be performed in user-space.
      A various times, particularly on device failure, the metadata needs
      to be updated before further writes can be permitted.
      This means that the user-space program which updates metadata much
      not block on writeout, and so must not allocate memory.
      
      mlockall(MCL_CURRENT|MCL_FUTURE) and pre-allocation can avoid all
      memory allocation issues for user-memory, but that does not help
      kernel memory.
      Several kernel objects can be pre-allocated.  e.g. files opened before
      any writes to the array are permitted.
      However some kernel allocation happens in places that cannot be
      pre-allocated.
      In particular, writes to sysfs files (to tell md that it can now
      allow writes to the array) allocate a buffer using GFP_KERNEL.
      
      This patch allows attributes to be marked as "PREALLOC".  In that case
      the maximal buffer is allocated when the file is opened, and then used
      on each write instead of allocating a new buffer.
      
      As the same buffer is now shared for all writes on the same file
      description, the mutex is extended to cover full use of the buffer
      including the copy_from_user().
      
      The new __ATTR_PREALLOC() 'or's a new flag in to the 'mode', which is
      inspected by sysfs_add_file_mode_ns() to determine if the file should be
      marked as requiring prealloc.
      
      Despite the comment, we *do* use ->seq_show together with ->prealloc
      in this patch.  The next patch fixes that.
      Signed-off-by: NNeilBrown  <neilb@suse.de>
      Reviewed-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2b75869b
  15. 10 7月, 2014 1 次提交
  16. 03 7月, 2014 1 次提交
    • T
      kernfs: kernfs_notify() must be useable from non-sleepable contexts · ecca47ce
      Tejun Heo 提交于
      d911d987 ("kernfs: make kernfs_notify() trigger inotify events
      too") added fsnotify triggering to kernfs_notify() which requires a
      sleepable context.  There are already existing users of
      kernfs_notify() which invoke it from an atomic context and in general
      it's silly to require a sleepable context for triggering a
      notification.
      
      The following is an invalid context bug triggerd by md invoking
      sysfs_notify() from IO completion path.
      
       BUG: sleeping function called from invalid context at kernel/locking/mutex.c:586
       in_atomic(): 1, irqs_disabled(): 1, pid: 0, name: swapper/1
       2 locks held by swapper/1/0:
        #0:  (&(&vblk->vq_lock)->rlock){-.-...}, at: [<ffffffffa0039042>] virtblk_done+0x42/0xe0 [virtio_blk]
        #1:  (&(&bitmap->counts.lock)->rlock){-.....}, at: [<ffffffff81633718>] bitmap_endwrite+0x68/0x240
       irq event stamp: 33518
       hardirqs last  enabled at (33515): [<ffffffff8102544f>] default_idle+0x1f/0x230
       hardirqs last disabled at (33516): [<ffffffff818122ed>] common_interrupt+0x6d/0x72
       softirqs last  enabled at (33518): [<ffffffff810a1272>] _local_bh_enable+0x22/0x50
       softirqs last disabled at (33517): [<ffffffff810a29e0>] irq_enter+0x60/0x80
       CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.16.0-0.rc2.git2.1.fc21.x86_64 #1
       Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
        0000000000000000 f90db13964f4ee05 ffff88007d403b80 ffffffff81807b4c
        0000000000000000 ffff88007d403ba8 ffffffff810d4f14 0000000000000000
        0000000000441800 ffff880078fa1780 ffff88007d403c38 ffffffff8180caf2
       Call Trace:
        <IRQ>  [<ffffffff81807b4c>] dump_stack+0x4d/0x66
        [<ffffffff810d4f14>] __might_sleep+0x184/0x240
        [<ffffffff8180caf2>] mutex_lock_nested+0x42/0x440
        [<ffffffff812d76a0>] kernfs_notify+0x90/0x150
        [<ffffffff8163377c>] bitmap_endwrite+0xcc/0x240
        [<ffffffffa00de863>] close_write+0x93/0xb0 [raid1]
        [<ffffffffa00df029>] r1_bio_write_done+0x29/0x50 [raid1]
        [<ffffffffa00e0474>] raid1_end_write_request+0xe4/0x260 [raid1]
        [<ffffffff813acb8b>] bio_endio+0x6b/0xa0
        [<ffffffff813b46c4>] blk_update_request+0x94/0x420
        [<ffffffff813bf0ea>] blk_mq_end_io+0x1a/0x70
        [<ffffffffa00392c2>] virtblk_request_done+0x32/0x80 [virtio_blk]
        [<ffffffff813c0648>] __blk_mq_complete_request+0x88/0x120
        [<ffffffff813c070a>] blk_mq_complete_request+0x2a/0x30
        [<ffffffffa0039066>] virtblk_done+0x66/0xe0 [virtio_blk]
        [<ffffffffa002535a>] vring_interrupt+0x3a/0xa0 [virtio_ring]
        [<ffffffff81116177>] handle_irq_event_percpu+0x77/0x340
        [<ffffffff8111647d>] handle_irq_event+0x3d/0x60
        [<ffffffff81119436>] handle_edge_irq+0x66/0x130
        [<ffffffff8101c3e4>] handle_irq+0x84/0x150
        [<ffffffff818146ad>] do_IRQ+0x4d/0xe0
        [<ffffffff818122f2>] common_interrupt+0x72/0x72
        <EOI>  [<ffffffff8105f706>] ? native_safe_halt+0x6/0x10
        [<ffffffff81025454>] default_idle+0x24/0x230
        [<ffffffff81025f9f>] arch_cpu_idle+0xf/0x20
        [<ffffffff810f5adc>] cpu_startup_entry+0x37c/0x7b0
        [<ffffffff8104df1b>] start_secondary+0x25b/0x300
      
      This patch fixes it by punting the notification delivery through a
      work item.  This ends up adding an extra pointer to kernfs_elem_attr
      enlarging kernfs_node by a pointer, which is not ideal but not a very
      big deal either.  If this turns out to be an actual issue, we can move
      kernfs_elem_attr->size to kernfs_node->iattr later.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NJosh Boyer <jwboyer@fedoraproject.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Reviewed-by: NMichael S. Tsirkin <mst@redhat.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ecca47ce
  17. 13 5月, 2014 1 次提交
    • T
      kernfs, sysfs, cgroup: restrict extra perm check on open to sysfs · 555724a8
      Tejun Heo 提交于
      The kernfs open method - kernfs_fop_open() - inherited extra
      permission checks from sysfs.  While the vfs layer allows ignoring the
      read/write permissions checks if the issuer has CAP_DAC_OVERRIDE,
      sysfs explicitly denied open regardless of the cap if the file doesn't
      have any of the UGO perms of the requested access or doesn't implement
      the requested operation.  It can be debated whether this was a good
      idea or not but the behavior is too subtle and dangerous to change at
      this point.
      
      After cgroup got converted to kernfs, this extra perm check also got
      applied to cgroup breaking libcgroup which opens write-only files with
      O_RDWR as root.  This patch gates the extra open permission check with
      a new flag KERNFS_ROOT_EXTRA_OPEN_PERM_CHECK and enables it for sysfs.
      For sysfs, nothing changes.  For cgroup, root now can perform any
      operation regardless of the permissions as it was before kernfs
      conversion.  Note that kernfs still fails unimplemented operations
      with -EINVAL.
      
      While at it, add comments explaining KERNFS_ROOT flags.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NAndrey Wagin <avagin@gmail.com>
      Tested-by: NAndrey Wagin <avagin@gmail.com>
      Cc: Li Zefan <lizefan@huawei.com>
      References: http://lkml.kernel.org/g/CANaxB-xUm3rJ-Cbp72q-rQJO5mZe1qK6qXsQM=vh0U8upJ44+A@mail.gmail.com
      Fixes: 2bd59d48 ("cgroup: convert to kernfs")
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      555724a8
  18. 26 4月, 2014 2 次提交
    • T
      kernfs: add back missing error check in kernfs_fop_mmap() · b44b2140
      Tejun Heo 提交于
      While updating how mmap enabled kernfs files are handled by lockdep,
      9b2db6e1 ("sysfs: bail early from kernfs_file_mmap() to avoid
      spurious lockdep warning") inadvertently dropped error return check
      from kernfs_file_mmap().  The intention was just dropping "if
      (ops->mmap)" check as the control won't reach the point if the mmap
      callback isn't implemented, but I mistakenly removed the error return
      check together with it.
      
      This led to Xorg crash on i810 which was reported and bisected to the
      commit and then to the specific change by Tobias.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-and-bisected-by: NTobias Powalowski <tobias.powalowski@googlemail.com>
      Tested-by: NTobias Powalowski <tobias.powalowski@googlemail.com>
      References: http://lkml.kernel.org/g/533D01BD.1010200@googlemail.com
      Cc: stable <stable@vger.kernel.org> # 3.14
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b44b2140
    • T
      kernfs: make kernfs_notify() trigger inotify events too · d911d987
      Tejun Heo 提交于
      kernfs_notify() is used to indicate either new data is available or
      the content of a file has changed.  It currently only triggers poll
      which may not be the most convenient to monitor especially when there
      are a lot to monitor.  Let's hook it up to fsnotify too so that the
      events can be monitored via inotify too.
      
      fsnotify_modify() requires file * but kernfs_notify() doesn't have any
      specific file associated; however, we can walk all super_blocks
      associated with a kernfs_root and as kernfs always associate one ino
      with inode and one dentry with an inode, it's trivial to look up the
      dentry associated with a given kernfs_node.  As any active monitor
      would pin dentry, just looking up existing dentry is enough.  This
      patch looks up the dentry associated with the specified kernfs_node
      and generates events equivalent to fsnotify_modify().
      
      Note that as fsnotify doesn't provide fsnotify_modify() equivalent
      which can be called with dentry, kernfs_notify() directly calls
      fsnotify_parent() and fsnotify().  It might be better to add a wrapper
      in fsnotify.h instead.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: John McCutchan <john@johnmccutchan.com>
      Cc: Robert Love <rlove@rlove.org>
      Cc: Eric Paris <eparis@parisplace.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d911d987
  19. 09 3月, 2014 1 次提交
    • T
      kernfs: cache atomic_write_len in kernfs_open_file · b7ce40cf
      Tejun Heo 提交于
      While implementing atomic_write_len, 4d3773c4 ("kernfs: implement
      kernfs_ops->atomic_write_len") moved data copy from userland inside
      kernfs_get_active() and kernfs_open_file->mutex so that
      kernfs_ops->atomic_write_len can be accessed before copying buffer
      from userland; unfortunately, this could lead to locking order
      inversion involving mmap_sem if copy_from_user() takes a page fault.
      
        ======================================================
        [ INFO: possible circular locking dependency detected ]
        3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26 Tainted: G        W
        -------------------------------------------------------
        trinity-c236/10658 is trying to acquire lock:
         (&of->mutex#2){+.+.+.}, at: [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
      
        but task is already holding lock:
         (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
      
        which lock already depends on the new lock.
      
        the existing dependency chain (in reverse order) is:
      
       -> #1 (&mm->mmap_sem){++++++}:
      	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
      	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
      	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
      	 [<mm/memory.c:4188>] might_fault+0x7e/0xb0
      	 [<arch/x86/include/asm/uaccess.h:713 fs/kernfs/file.c:291>] kernfs_fop_write+0xd8/0x190
      	 [<fs/read_write.c:473>] vfs_write+0xe3/0x1d0
      	 [<fs/read_write.c:523 fs/read_write.c:515>] SyS_write+0x5d/0xa0
      	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
      
       -> #0 (&of->mutex#2){+.+.+.}:
      	 [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
      	 [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
      	 [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
      	 [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
      	 [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
      	 [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
      	 [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
      	 [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
      	 [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
      	 [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
      	 [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
      	 [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
      
        other info that might help us debug this:
      
         Possible unsafe locking scenario:
      
      	 CPU0                    CPU1
      	 ----                    ----
          lock(&mm->mmap_sem);
      				 lock(&of->mutex#2);
      				 lock(&mm->mmap_sem);
          lock(&of->mutex#2);
      
         *** DEADLOCK ***
      
        1 lock held by trinity-c236/10658:
         #0:  (&mm->mmap_sem){++++++}, at: [<mm/util.c:397>] vm_mmap_pgoff+0x6e/0xe0
      
        stack backtrace:
        CPU: 2 PID: 10658 Comm: trinity-c236 Tainted: G        W 3.14.0-rc4-next-20140228-sasha-00011-g4077c67-dirty #26
         0000000000000000 ffff88011911fa48 ffffffff8438e945 0000000000000000
         0000000000000000 ffff88011911fa98 ffffffff811a0109 ffff88011911fab8
         ffff88011911fab8 ffff88011911fa98 ffff880119128cc0 ffff880119128cf8
        Call Trace:
         [<lib/dump_stack.c:52>] dump_stack+0x52/0x7f
         [<kernel/locking/lockdep.c:1213>] print_circular_bug+0x129/0x160
         [<kernel/locking/lockdep.c:1840>] check_prev_add+0x13f/0x560
         [<include/linux/spinlock.h:343 mm/slub.c:1933>] ? deactivate_slab+0x511/0x550
         [<kernel/locking/lockdep.c:1945 kernel/locking/lockdep.c:2131>] validate_chain+0x6c5/0x7b0
         [<kernel/locking/lockdep.c:3182>] __lock_acquire+0x4cd/0x5a0
         [<mm/mmap.c:1552>] ? mmap_region+0x24a/0x5c0
         [<arch/x86/include/asm/current.h:14 kernel/locking/lockdep.c:3602>] lock_acquire+0x182/0x1d0
         [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
         [<kernel/locking/mutex.c:470 kernel/locking/mutex.c:571>] mutex_lock_nested+0x6a/0x510
         [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
         [<kernel/sched/core.c:2477>] ? get_parent_ip+0x11/0x50
         [<fs/kernfs/file.c:487>] ? kernfs_fop_mmap+0x54/0x120
         [<fs/kernfs/file.c:487>] kernfs_fop_mmap+0x54/0x120
         [<mm/mmap.c:1573>] mmap_region+0x310/0x5c0
         [<mm/mmap.c:1365>] do_mmap_pgoff+0x385/0x430
         [<mm/util.c:397>] ? vm_mmap_pgoff+0x6e/0xe0
         [<mm/util.c:399>] vm_mmap_pgoff+0x8f/0xe0
         [<kernel/rcu/update.c:97>] ? __rcu_read_unlock+0x44/0xb0
         [<fs/file.c:641>] ? dup_fd+0x3c0/0x3c0
         [<mm/mmap.c:1416 mm/mmap.c:1374>] SyS_mmap_pgoff+0x1b0/0x210
         [<arch/x86/kernel/sys_x86_64.c:72>] SyS_mmap+0x1d/0x20
         [<arch/x86/kernel/entry_64.S:749>] tracesys+0xdd/0xe2
      
      Fix it by caching atomic_write_len in kernfs_open_file during open so
      that it can be determined without accessing kernfs_ops in
      kernfs_fop_write().  This restores the structure of kernfs_fop_write()
      before 4d3773c4 with updated @len determination logic.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      References: http://lkml.kernel.org/g/53113485.2090407@oracle.comSigned-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      b7ce40cf
  20. 08 2月, 2014 2 次提交
    • T
      kernfs: implement kernfs_ops->atomic_write_len · 4d3773c4
      Tejun Heo 提交于
      A write to a kernfs_node is buffered through a kernel buffer.  Writes
      <= PAGE_SIZE are performed atomically, while larger ones are executed
      in PAGE_SIZE chunks.  While this is enough for sysfs, cgroup which is
      scheduled to be converted to use kernfs needs a bit more control over
      it.
      
      This patch adds kernfs_ops->atomic_write_len.  If not set (zero), the
      behavior stays the same.  If set, writes upto the size are executed
      atomically and larger writes are rejected with -E2BIG.
      
      A different implementation strategy would be allowing configuring
      chunking size while making the original write size available to the
      write method; however, such strategy, while being more complicated,
      doesn't really buy anything.  If the write implementation has to
      handle chunking, the specific chunk size shouldn't matter all that
      much.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      4d3773c4
    • T
      kernfs: remove kernfs_addrm_cxt · 988cd7af
      Tejun Heo 提交于
      kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
      added because there were operations which should be performed outside
      kernfs_mutex after adding and removing kernfs_nodes.  The necessary
      operations were recorded in kernfs_addrm_cxt and performed by
      kernfs_addrm_finish(); however, after the recent changes which
      relocated deactivation and unmapping so that they're performed
      directly during removal, the only operation kernfs_addrm_finish()
      performs is kernfs_put(), which can be moved inside the removal path
      too.
      
      This patch moves the kernfs_put() of the base ref to __kernfs_remove()
      and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().
      
      * kernfs_add_one() is updated to grab and release kernfs_mutex itself.
        sysfs_addrm_start/finish() invocations around it are removed from
        all users.
      
      * __kernfs_remove() puts an unlinked node directly instead of chaining
        it to kernfs_addrm_cxt.  Its callers are updated to grab and release
        kernfs_mutex instead of calling kernfs_addrm_start/finish() around
        it.
      
      v2: Rebased on top of "kernfs: associate a new kernfs_node with its
          parent on creation" which dropped @parent from kernfs_add_one().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      988cd7af
  21. 18 1月, 2014 1 次提交
    • T
      kernfs: associate a new kernfs_node with its parent on creation · db4aad20
      Tejun Heo 提交于
      Once created, a kernfs_node is always destroyed by kernfs_put().
      Since ba7443bc ("sysfs, kernfs: implement
      kernfs_create/destroy_root()"), kernfs_put() depends on kernfs_root()
      to locate the ino_ida.  kernfs_root() in turn depends on
      kernfs_node->parent being set for !dir nodes.  This means that
      kernfs_put() of a !dir node requires its ->parent to be initialized.
      
      This leads to oops when a newly created !dir node is destroyed without
      going through kernfs_add_one() or after failing kernfs_add_one()
      before ->parent is set.  kernfs_root() invoked from kernfs_put() will
      try to dereference NULL parent.
      
      Fix it by moving parent association to kernfs_new_node() from
      kernfs_add_one().  kernfs_new_node() now takes @parent instead of
      @root and determines the root from the parent and also sets the new
      node's parent properly.  @parent parameter is removed from
      kernfs_add_one().  As there's no parent when creating the root node,
      __kernfs_new_node() which takes @root as before and doesn't set the
      parent is used in that case.
      
      This ensures that a kernfs_node in any stage in its life has its
      parent associated and thus can be put.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      db4aad20
  22. 15 1月, 2014 1 次提交
    • T
      kernfs: fix get_active failure handling in kernfs_seq_*() · bb305947
      Tejun Heo 提交于
      When kernfs_seq_start() fails to obtain an active reference, it
      returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
      error pointer value; however, it still proceeds to invoke
      kernfs_put_active() on the node leading to unbalanced put.
      
      If kernfs_seq_stop() is called even after active ref failure, it
      should skip invocation of @ops->seq_stop() and put_active.
      Unfortunately, this is a bit complicated because active ref failure
      isn't the only thing which may fail with ERR_PTR(-ENODEV).
      @ops->seq_start/next() may also fail with the error value and
      kernfs_seq_stop() doesn't have a way to tell apart those failures.
      
      Work it around by factoring out the active part of kernfs_seq_stop()
      into kernfs_seq_stop_active() and invoking it directly if
      @ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
      kernfs_seq_stop() to skip kernfs_seq_stop_active() on
      ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
      is skipped iff get_active failed in kernfs_seq_start().
      
      tj: This was originally committed as d92d2e6b but got reverted by
          683bb276 along with other kernfs self removal patches.
          However, this one is an independent fix and shouldn't have been
          reverted together.  Reinstate the change.  Sorry about the mess.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      bb305947
  23. 14 1月, 2014 4 次提交
  24. 11 1月, 2014 4 次提交
    • T
      kernfs: remove kernfs_addrm_cxt · 99177a34
      Tejun Heo 提交于
      kernfs_addrm_cxt and the accompanying kernfs_addrm_start/finish() were
      added because there were operations which should be performed outside
      kernfs_mutex after adding and removing kernfs_nodes.  The necessary
      operations were recorded in kernfs_addrm_cxt and performed by
      kernfs_addrm_finish(); however, after the recent changes which
      relocated deactivation and unmapping so that they're performed
      directly during removal, the only operation kernfs_addrm_finish()
      performs is kernfs_put(), which can be moved inside the removal path
      too.
      
      This patch moves the kernfs_put() of the base ref to __kernfs_remove()
      and remove kernfs_addrm_cxt and kernfs_addrm_start/finish().
      
      * kernfs_add_one() is updated to grab and release the parent's active
        ref and kernfs_mutex itself.  kernfs_get/put_active() and
        kernfs_addrm_start/finish() invocations around it are removed from
        all users.
      
      * __kernfs_remove() puts an unlinked node directly instead of chaining
        it to kernfs_addrm_cxt.  Its callers are updated to grab and release
        kernfs_mutex instead of calling kernfs_addrm_start/finish() around
        it.
      
      v2: Updated to fit the v2 restructuring of removal path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      99177a34
    • T
      kernfs: invoke kernfs_unmap_bin_file() directly from __kernfs_remove() · f601f9a2
      Tejun Heo 提交于
      kernfs_unmap_bin_file() is supposed to unmap all memory mappings of
      the target file before kernfs_remove() finishes; however, it currently
      is being called from kernfs_addrm_finish() and has the same race
      problem as the original implementation of deactivation when there are
      multiple removers - only the remover which snatches the node to its
      addrm_cxt->removed list is guaranteed to wait for its completion
      before returning.
      
      It can be fixed by moving kernfs_unmap_bin_file() invocation from
      kernfs_addrm_finish() to __kernfs_remove().  The function may be
      called multiple times but that shouldn't do any harm.
      
      We end up dropping kernfs_mutex in the removal loop and the node may
      be removed inbetween by someone else.  kernfs_unlink_sibling() is
      updated to test whether the node has already been removed and return
      accordingly.  __kernfs_remove() in turn performs post-unlinking
      cleanup only if it actually unlinked the node.
      
      KERNFS_HAS_MMAP test is moved out of the unmap function into
      __kernfs_remove() so that we don't unlock kernfs_mutex unnecessarily.
      While at it, drop the now meaningless "bin" qualifier from the
      function name.
      
      v2: Rewritten to fit the v2 restructuring of removal path.  HAS_MMAP
          test relocated.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      f601f9a2
    • T
      kernfs: remove KERNFS_REMOVED · ae34372e
      Tejun Heo 提交于
      KERNFS_REMOVED is used to mark half-initialized and dying nodes so
      that they don't show up in lookups and deny adding new nodes under or
      renaming it; however, its role overlaps those of deactivation and
      removal from rbtree.
      
      It's necessary to deny addition of new children while removal is in
      progress; however, this role considerably intersects with deactivation
      - KERNFS_REMOVED prevents new children while deactivation prevents new
      file operations.  There's no reason to have them separate making
      things more complex than necessary.
      
      KERNFS_REMOVED is also used to decide whether a node is still visible
      to vfs layer, which is rather redundant as equivalent determination
      can be made by testing whether the node is on its parent's children
      rbtree or not.
      
      This patch removes KERNFS_REMOVED.
      
      * Instead of KERNFS_REMOVED, each node now starts its life
        deactivated.  This means that we now use both atomic_add() and
        atomic_sub() on KN_DEACTIVATED_BIAS, which is INT_MIN.  The compiler
        generates an overflow warnings when negating INT_MIN as the negation
        can't be represented as a positive number.  Nothing is actually
        broken but let's bump BIAS by one to avoid the warnings for archs
        which negates the subtrahend..
      
      * KERNFS_REMOVED tests in add and rename paths are replaced with
        kernfs_get/put_active() of the target nodes.  Due to the way the add
        path is structured now, active ref handling is done in the callers
        of kernfs_add_one().  This will be consolidated up later.
      
      * kernfs_remove_one() is updated to deactivate instead of setting
        KERNFS_REMOVED.  This removes deactivation from kernfs_deactivate(),
        which is now renamed to kernfs_drain().
      
      * kernfs_dop_revalidate() now tests RB_EMPTY_NODE(&kn->rb) instead of
        KERNFS_REMOVED and KERNFS_REMOVED test in kernfs_dir_pos() is
        dropped.  A node which is removed from the children rbtree is not
        included in the iteration in the first place.  This means that a
        node may be visible through vfs a bit longer - it's now also visible
        after deactivation until the actual removal.  This slightly enlarged
        window difference doesn't make any difference to the userland.
      
      * Sanity check on KERNFS_REMOVED in kernfs_put() is replaced with
        checks on the active ref.
      
      * Some comment style updates in the affected area.
      
      v2: Reordered before removal path restructuring.  kernfs_active()
          dropped and kernfs_get/put_active() used instead.  RB_EMPTY_NODE()
          used in the lookup paths.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      ae34372e
    • T
      kernfs: fix get_active failure handling in kernfs_seq_*() · d92d2e6b
      Tejun Heo 提交于
      When kernfs_seq_start() fails to obtain an active reference, it
      returns ERR_PTR(-ENODEV).  kernfs_seq_stop() is then invoked with the
      error pointer value; however, it still proceeds to invoke
      kernfs_put_active() on the node leading to unbalanced put.
      
      If kernfs_seq_stop() is called even after active ref failure, it
      should skip invocation of @ops->seq_stop() and put_active.
      Unfortunately, this is a bit complicated because active ref failure
      isn't the only thing which may fail with ERR_PTR(-ENODEV).
      @ops->seq_start/next() may also fail with the error value and
      kernfs_seq_stop() doesn't have a way to tell apart those failures.
      
      Work it around by factoring out the active part of kernfs_seq_stop()
      into kernfs_seq_stop_active() and invoking it directly if
      @ops->seq_start/next() fail with ERR_PTR(-ENODEV) and updating
      kernfs_seq_stop() to skip kernfs_seq_stop_active() on
      ERR_PTR(-ENODEV).  This is a bit nasty but ensures that the active put
      is skipped iff get_active failed in kernfs_seq_start().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Cc: Sasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      d92d2e6b
  25. 18 12月, 2013 1 次提交
    • T
      kernfs: mark static names with KERNFS_STATIC_NAME · 2063d608
      Tejun Heo 提交于
      Because sysfs used struct attribute which are supposed to stay
      constant, sysfs didn't copy names when creating regular files.  The
      specified string for name was supposed to stay constant.  Such
      distinction isn't inherent for kernfs.  kernfs_create_file[_ns]()
      should be able to take the same @name as kernfs_create_dir[_ns]()
      
      As there can be huge number of sysfs attributes, we still want to be
      able to use static names for sysfs attributes.  This patch renames
      kernfs_create_file_ns_key() to __kernfs_create_file() and adds
      @name_is_static parameter so that the caller can explicitly indicate
      that @name can be used without copying.  kernfs is updated to use
      KERNFS_STATIC_NAME to distinguish static and copied names.
      
      This patch doesn't introduce any behavior changes.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      2063d608
  26. 12 12月, 2013 5 次提交
    • T
      kernfs: s/sysfs/kernfs/ in internal functions and whatever is left · c637b8ac
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_*()/kernfs_*()/ in all internal functions
      * s/sysfs/kernfs/ in internal strings, comments and whatever is remaining
      * Uniformly rename various vfs operations so that they're consistently
        named and distinguishable.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c637b8ac
    • T
      kernfs: s/sysfs/kernfs/ in global variables · a797bfc3
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_mutex/kernfs_mutex/
      * s/sysfs_dentry_ops/kernfs_dops/
      * s/sysfs_dir_operations/kernfs_dir_fops/
      * s/sysfs_dir_inode_operations/kernfs_dir_iops/
      * s/kernfs_file_operations/kernfs_file_fops/ - renamed for consistency
      * s/sysfs_symlink_inode_operations/kernfs_symlink_iops/
      * s/sysfs_aops/kernfs_aops/
      * s/sysfs_backing_dev_info/kernfs_bdi/
      * s/sysfs_inode_operations/kernfs_iops/
      * s/sysfs_dir_cachep/kernfs_node_cache/
      * s/sysfs_ops/kernfs_sops/
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      a797bfc3
    • T
      kernfs: s/sysfs/kernfs/ in constants · df23fc39
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/SYSFS_DIR/KERNFS_DIR/
      * s/SYSFS_KOBJ_ATTR/KERNFS_FILE/
      * s/SYSFS_KOBJ_LINK/KERNFS_LINK/
      * s/SYSFS_{TYPE_FLAGS}/KERNFS_{TYPE_FLAGS}/
      * s/SYSFS_FLAG_{FLAG}/KERNFS_{FLAG}/
      * s/sysfs_type()/kernfs_type()/
      * s/SD_DEACTIVATED_BIAS/KN_DEACTIVATED_BIAS/
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      df23fc39
    • T
      kernfs: s/sysfs/kernfs/ in various data structures · c525aadd
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      This patch performs the following renames.
      
      * s/sysfs_open_dirent/kernfs_open_node/
      * s/sysfs_open_file/kernfs_open_file/
      * s/sysfs_inode_attrs/kernfs_iattrs/
      * s/sysfs_addrm_cxt/kernfs_addrm_cxt/
      * s/sysfs_super_info/kernfs_super_info/
      * s/sysfs_info()/kernfs_info()/
      * s/sysfs_open_dirent_lock/kernfs_open_node_lock/
      * s/sysfs_open_file_mutex/kernfs_open_file_mutex/
      * s/sysfs_of()/kernfs_of()/
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      c525aadd
    • T
      kernfs: drop s_ prefix from kernfs_node members · adc5e8b5
      Tejun Heo 提交于
      kernfs has just been separated out from sysfs and we're already in
      full conflict mode.  Nothing can make the situation any worse.  Let's
      take the chance to name things properly.
      
      s_ prefix for kernfs members is used inconsistently and a misnomer
      now.  It's not like kernfs_node is used widely across the kernel
      making the ability to grep for the members particularly useful.  Let's
      just drop the prefix.
      
      This patch is strictly rename only and doesn't introduce any
      functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Signed-off-by: NGreg Kroah-Hartman <gregkh@linuxfoundation.org>
      adc5e8b5