1. 16 1月, 2013 1 次提交
    • L
      cpuset: fix RCU lockdep splat · 27e89ae5
      Li Zefan 提交于
      5d21cc2d ("cpuset: replace
      cgroup_mutex locking with cpuset internal locking") incorrectly
      converted proc_cpuset_show() from cgroup_lock() to cpuset_mutex.
      proc_cpuset_show() is accessing cgroup hierarchy proper to determine
      cgroup path which can't be protected by cpuset_mutex.  This triggered
      the following RCU warning.
      
       ===============================
       [ INFO: suspicious RCU usage. ]
       3.8.0-rc3-next-20130114-sasha-00016-ga107525-dirty #262 Tainted: G        W
       -------------------------------
       include/linux/cgroup.h:534 suspicious rcu_dereference_check() usage!
      
       other info that might help us debug this:
      
       rcu_scheduler_active = 1, debug_locks = 1
       2 locks held by trinity/7514:
        #0:  (&p->lock){+.+.+.}, at: [<ffffffff812b06aa>] seq_read+0x3a/0x3e0
        #1:  (cpuset_mutex){+.+...}, at: [<ffffffff811abae4>] proc_cpuset_show+0x84/0x190
      
       stack backtrace:
       Pid: 7514, comm: trinity Tainted: G        W
      +3.8.0-rc3-next-20130114-sasha-00016-ga107525-dirty #262
       Call Trace:
        [<ffffffff81182cab>] lockdep_rcu_suspicious+0x10b/0x120
        [<ffffffff811abb71>] proc_cpuset_show+0x111/0x190
        [<ffffffff812b0827>] seq_read+0x1b7/0x3e0
        [<ffffffff812b0670>] ? seq_lseek+0x110/0x110
        [<ffffffff8128b4fb>] do_loop_readv_writev+0x4b/0x90
        [<ffffffff8128b776>] do_readv_writev+0xf6/0x1d0
        [<ffffffff8128b8ee>] vfs_readv+0x3e/0x60
        [<ffffffff8128b960>] sys_readv+0x50/0xd0
        [<ffffffff83d33d18>] tracesys+0xe1/0xe6
      
      The operation can be performed under RCU read lock.  Replace
      cpuset_mutex locking with RCU read locking.
      
      tj: Rewrote patch description.
      Reported-by: NSasha Levin <sasha.levin@oracle.com>
      Signed-off-by: NLi Zefan <lizefan@huawei.com>
      Signed-off-by: NTejun Heo <tj@kernel.org>
      27e89ae5
  2. 12 1月, 2013 1 次提交
  3. 08 1月, 2013 17 次提交
    • T
      cpuset: remove cpuset->parent · c431069f
      Tejun Heo 提交于
      cgroup already tracks the hierarchy.  Follow cgroup->parent to find
      the parent and drop cpuset->parent.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      c431069f
    • T
      cpuset: replace cpuset->stack_list with cpuset_for_each_descendant_pre() · fc560a26
      Tejun Heo 提交于
      Implement cpuset_for_each_descendant_pre() and replace the
      cpuset-specific tree walking using cpuset->stack_list with it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reviewed-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      fc560a26
    • T
      cpuset: replace cgroup_mutex locking with cpuset internal locking · 5d21cc2d
      Tejun Heo 提交于
      Supposedly for historical reasons, cpuset depends on cgroup core for
      locking.  It depends on cgroup_mutex in cgroup callbacks and grabs
      cgroup_mutex from other places where it wants to be synchronized.
      This is majorly messy and highly prone to introducing circular locking
      dependency especially because cgroup_mutex is supposed to be one of
      the outermost locks.
      
      As previous patches already plugged possible races which may happen by
      decoupling from cgroup_mutex, replacing cgroup_mutex with cpuset
      specific cpuset_mutex is mostly straight-forward.  Introduce
      cpuset_mutex, replace all occurrences of cgroup_mutex with it, and add
      cpuset_mutex locking to places which inherited cgroup_mutex from
      cgroup core.
      
      The only complication is from cpuset wanting to initiate task
      migration when a cpuset loses all cpus or memory nodes.  Task
      migration may go through full cgroup and all subsystem locking and
      should be initiated without holding any cpuset specific lock; however,
      a previous patch already made hotplug handled asynchronously and
      moving the task migration part outside other locks is easy.
      cpuset_propagate_hotplug_workfn() now invokes
      remove_tasks_in_empty_cpuset() without holding any lock.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      5d21cc2d
    • T
      cpuset: schedule hotplug propagation from cpuset_attach() if the cpuset is empty · 02bb5863
      Tejun Heo 提交于
      cpuset is scheduled to be decoupled from cgroup_lock which will make
      hotplug handling race with task migration.  cpus or mems will be
      allowed to go offline between ->can_attach() and ->attach().  If
      hotplug takes down all cpus or mems of a cpuset while attach is in
      progress, ->attach() may end up putting tasks into an empty cpuset.
      
      This patchset makes ->attach() schedule hotplug propagation if the
      cpuset is empty after attaching is complete.  This will move the tasks
      to the nearest ancestor which can execute and the end result would be
      as if hotplug handling happened after the tasks finished attaching.
      
      cpuset_write_resmask() now also flushes cpuset_propagate_hotplug_wq to
      wait for propagations scheduled directly by cpuset_attach().
      
      This currently doesn't make any functional difference as everything is
      protected by cgroup_mutex but enables decoupling the locking.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      02bb5863
    • T
      cpuset: pin down cpus and mems while a task is being attached · 452477fa
      Tejun Heo 提交于
      cpuset is scheduled to be decoupled from cgroup_lock which will make
      configuration updates race with task migration.  Any config update
      will be allowed to happen between ->can_attach() and ->attach().  If
      such config update removes either all cpus or mems, by the time
      ->attach() is called, the condition verified by ->can_attach(), that
      the cpuset is capable of hosting the tasks, is no longer true.
      
      This patch adds cpuset->attach_in_progress which is incremented from
      ->can_attach() and decremented when the attach operation finishes
      either successfully or not.  validate_change() treats cpusets w/
      non-zero ->attach_in_progress like cpusets w/ tasks and refuses to
      remove all cpus or mems from it.
      
      This currently doesn't make any functional difference as everything is
      protected by cgroup_mutex but enables decoupling the locking.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      452477fa
    • T
      cpuset: make CPU / memory hotplug propagation asynchronous · 8d033948
      Tejun Heo 提交于
      cpuset_hotplug_workfn() has been invoking cpuset_propagate_hotplug()
      directly to propagate hotplug updates to !root cpusets; however, this
      has the following problems.
      
      * cpuset locking is scheduled to be decoupled from cgroup_mutex,
        cgroup_mutex will be unexported, and cgroup_attach_task() will do
        cgroup locking internally, so propagation can't synchronously move
        tasks to a parent cgroup while walking the hierarchy.
      
      * We can't use cgroup generic tree iterator because propagation to
        each cpuset may sleep.  With propagation done asynchronously, we can
        lose the rather ugly cpuset specific iteration.
      
      Convert cpuset_propagate_hotplug() to
      cpuset_propagate_hotplug_workfn() and execute it from newly added
      cpuset->hotplug_work.  The work items are run on an ordered workqueue,
      so the propagation order is preserved.  cpuset_hotplug_workfn()
      schedules all propagations while holding cgroup_mutex and waits for
      completion without cgroup_mutex.  Each in-flight propagation holds a
      reference to the cpuset->css.
      
      This patch doesn't cause any functional difference.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      8d033948
    • T
      cpuset: drop async_rebuild_sched_domains() · 699140ba
      Tejun Heo 提交于
      In general, we want to make cgroup_mutex one of the outermost locks
      and be able to use get_online_cpus() and friends from cgroup methods.
      With cpuset hotplug made async, get_online_cpus() can now be nested
      inside cgroup_mutex.
      
      Currently, cpuset avoids nesting get_online_cpus() inside cgroup_mutex
      by bouncing sched_domain rebuilding to a work item.  As such nesting
      is allowed now, remove the workqueue bouncing code and always rebuild
      sched_domains synchronously.  This also nests sched_domains_mutex
      inside cgroup_mutex, which is intended and should be okay.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      699140ba
    • T
      cpuset: don't nest cgroup_mutex inside get_online_cpus() · 3a5a6d0c
      Tejun Heo 提交于
      CPU / memory hotplug path currently grabs cgroup_mutex from hotplug
      event notifications.  We want to separate cpuset locking from cgroup
      core and make cgroup_mutex outer to hotplug synchronization so that,
      among other things, mechanisms which depend on get_online_cpus() can
      be used from cgroup callbacks.  In general, we want to keep
      cgroup_mutex the outermost lock to minimize locking interactions among
      different controllers.
      
      Convert cpuset_handle_hotplug() to cpuset_hotplug_workfn() and
      schedule it from the hotplug notifications.  As the function can
      already handle multiple mixed events without any input, converting it
      to a work function is mostly trivial; however, one complication is
      that cpuset_update_active_cpus() needs to update sched domains
      synchronously to reflect an offlined cpu to avoid confusing the
      scheduler.  This is worked around by falling back to the the default
      single sched domain synchronously before scheduling the actual hotplug
      work.  This makes sched domain rebuilt twice per CPU hotplug event but
      the operation isn't that heavy and a lot of the second operation would
      be noop for systems w/ single sched domain, which is the common case.
      
      This decouples cpuset hotplug handling from the notification callbacks
      and there can be an arbitrary delay between the actual event and
      updates to cpusets.  Scheduler and mm can handle it fine but moving
      tasks out of an empty cpuset may race against writes to the cpuset
      restoring execution resources which can lead to confusing behavior.
      Flush hotplug work item from cpuset_write_resmask() to avoid such
      confusions.
      
      v2: Synchronous sched domain rebuilding using the fallback sched
          domain added.  This fixes various issues caused by confused
          scheduler putting tasks on a dead CPU, including the one reported
          by Li Zefan.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      3a5a6d0c
    • T
      cpuset: reorganize CPU / memory hotplug handling · deb7aa30
      Tejun Heo 提交于
      Reorganize hotplug path to prepare for async hotplug handling.
      
      * Both CPU and memory hotplug handlings are collected into a single
        function - cpuset_handle_hotplug().  It doesn't take any argument
        but compares the current setttings of top_cpuset against what's
        actually available to determine what happened.  This function
        directly updates top_cpuset.  If there are CPUs or memory nodes
        which are taken down, cpuset_propagate_hotplug() in invoked on all
        !root cpusets.
      
      * cpuset_propagate_hotplug() is responsible for updating the specified
        cpuset so that it doesn't include any resource which isn't available
        to top_cpuset.  If no CPU or memory is left after update, all tasks
        are moved to the nearest ancestor with both resources.
      
      * update_tasks_cpumask() and update_tasks_nodemask() are now always
        called after cpus or mems masks are updated even if the cpuset
        doesn't have any task.  This is for brevity and not expected to have
        any measureable effect.
      
      * cpu_active_mask and N_HIGH_MEMORY are read exactly once per
        cpuset_handle_hotplug() invocation, all cpusets share the same view
        of what resources are available, and cpuset_handle_hotplug() can
        handle multiple resources going up and down.  These properties will
        allow async operation.
      
      The reorganization, while drastic, is equivalent and shouldn't cause
      any behavior difference.  This will enable making hotplug handling
      async and remove get_online_cpus() -> cgroup_mutex nesting.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      deb7aa30
    • T
      cpuset: cleanup cpuset[_can]_attach() · 4e4c9a14
      Tejun Heo 提交于
      cpuset_can_attach() prepare global variables cpus_attach and
      cpuset_attach_nodemask_{to|from} which are used by cpuset_attach().
      There is no reason to prepare in cpuset_can_attach().  The same
      information can be accessed from cpuset_attach().
      
      Move the prepartion logic from cpuset_can_attach() to cpuset_attach()
      and make the global variables static ones inside cpuset_attach().
      
      With this change, there's no reason to keep
      cpuset_attach_nodemask_{from|to} global.  Move them inside
      cpuset_attach().  Unfortunately, we need to keep cpus_attach global as
      it can't be allocated from cpuset_attach().
      
      v2: cpus_attach not converted to cpumask_t as per Li Zefan and Rusty
          Russell.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      Cc: Rusty Russell <rusty@rustcorp.com.au>
      4e4c9a14
    • T
      cpuset: introduce cpuset_for_each_child() · ae8086ce
      Tejun Heo 提交于
      Instead of iterating cgroup->children directly, introduce and use
      cpuset_for_each_child() which wraps cgroup_for_each_child() and
      performs online check.  As it uses the generic iterator, it requires
      RCU read locking too.
      
      As cpuset is currently protected by cgroup_mutex, non-online cpusets
      aren't visible to all the iterations and this patch currently doesn't
      make any functional difference.  This will be used to de-couple cpuset
      locking from cgroup core.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      ae8086ce
    • T
      cpuset: introduce CS_ONLINE · efeb77b2
      Tejun Heo 提交于
      Add CS_ONLINE which is set from css_online() and cleared from
      css_offline().  This will enable using generic cgroup iterator while
      allowing decoupling cpuset from cgroup internal locking.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      efeb77b2
    • T
      cpuset: introduce ->css_on/offline() · c8f699bb
      Tejun Heo 提交于
      Add cpuset_css_on/offline() and rearrange css init/exit such that,
      
      * Allocation and clearing to the default values happen in css_alloc().
        Allocation now uses kzalloc().
      
      * Config inheritance and registration happen in css_online().
      
      * css_offline() undoes what css_online() did.
      
      * css_free() frees.
      
      This doesn't introduce any visible behavior changes.  This will help
      cleaning up locking.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      c8f699bb
    • T
      cpuset: remove fast exit path from remove_tasks_in_empty_cpuset() · 0772324a
      Tejun Heo 提交于
      The function isn't that hot, the overhead of missing the fast exit is
      low, the test itself depends heavily on cgroup internals, and it's
      gonna be a hindrance when trying to decouple cpuset locking from
      cgroup core.  Remove the fast exit path.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      0772324a
    • T
      cpuset: remove unused cpuset_unlock() · 01c889cf
      Tejun Heo 提交于
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      01c889cf
    • T
      cgroup: implement cgroup_rightmost_descendant() · 12a9d2fe
      Tejun Heo 提交于
      Implement cgroup_rightmost_descendant() which returns the right most
      descendant of the specified cgroup.  This can be used to skip the
      cgroup's subtree while iterating with
      cgroup_for_each_descendant_pre().
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Acked-by: NMichal Hocko <mhocko@suse.cz>
      Acked-by: NLi Zefan <lizefan@huawei.com>
      12a9d2fe
    • T
      cgroup: remove unused dummy cgroup_fork_callbacks() · d5b1fe68
      Tejun Heo 提交于
      5edee61e ("cgroup: cgroup_subsys->fork() should be called after the
      task is added to css_set") removed cgroup_fork_callbacks() but forgot
      to remove its dummy version for !CONFIG_CGROUPS.  Remove it.
      Signed-off-by: NTejun Heo <tj@kernel.org>
      Reported-by: NHerton Ronaldo Krzesinski <herton.krzesinski@canonical.com>
      d5b1fe68
  4. 03 1月, 2013 14 次提交
    • L
      Linux 3.8-rc2 · d1c3ed66
      Linus Torvalds 提交于
      d1c3ed66
    • L
      Merge branch 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds · d50403dc
      Linus Torvalds 提交于
      Pull LED fix from Bryan Wu.
      
      * 'fixes-for-3.8' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds:
        leds: leds-gpio: set devm_gpio_request_one() flags param correctly
      d50403dc
    • J
      leds: leds-gpio: set devm_gpio_request_one() flags param correctly · 2d7c22f6
      Javier Martinez Canillas 提交于
      commit a99d76f9 leds: leds-gpio: use gpio_request_one
      
      changed the leds-gpio driver to use gpio_request_one() instead
      of gpio_request() + gpio_direction_output()
      
      Unfortunately, it also made a semantic change that breaks the
      leds-gpio driver.
      
      The gpio_request_one() flags parameter was set to:
      
      GPIOF_DIR_OUT | (led_dat->active_low ^ state)
      
      Since GPIOF_DIR_OUT is 0, the final flags value will just be the
      XOR'ed value of led_dat->active_low and state.
      
      This value were used to distinguish between HIGH/LOW output initial
      level and call gpio_direction_output() accordingly.
      
      With this new semantic gpio_request_one() will take the flags value
      of 1 as a configuration of input direction (GPIOF_DIR_IN) and will
      call gpio_direction_input() instead of gpio_direction_output().
      
      int gpio_request_one(unsigned gpio, unsigned long flags, const char *label)
      {
      ..
      	if (flags & GPIOF_DIR_IN)
      		err = gpio_direction_input(gpio);
      	else
      		err = gpio_direction_output(gpio,
      				(flags & GPIOF_INIT_HIGH) ? 1 : 0);
      ..
      }
      
      The right semantic is to evaluate led_dat->active_low ^ state and
      set the output initial level explicitly.
      Signed-off-by: NJavier Martinez Canillas <javier.martinez@collabora.co.uk>
      Reported-by: NArnaud Patard <arnaud.patard@rtp-net.org>
      Tested-by: NEzequiel Garcia <ezequiel.garcia@free-electrons.com>
      Signed-off-by: NBryan Wu <cooloney@gmail.com>
      2d7c22f6
    • L
      Merge git://www.linux-watchdog.org/linux-watchdog · ef05e9b9
      Linus Torvalds 提交于
      Pull watchdog fixes from Wim Van Sebroeck:
       "This fixes some small errors in the new da9055 driver, eliminates a
        compiler warning and adds DT support for the twl4030_wdt driver (so
        that we can have multiple watchdogs with DT on the omap platforms)."
      
      * git://www.linux-watchdog.org/linux-watchdog:
        watchdog: twl4030_wdt: add DT support
        watchdog: omap_wdt: eliminate unused variable and a compiler warning
        watchdog: da9055: Don't update wdt_dev->timeout in da9055_wdt_set_timeout error path
        watchdog: da9055: Fix invalid free of devm_ allocated data
      ef05e9b9
    • L
      Merge tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci · 080a62e2
      Linus Torvalds 提交于
      Pull PCI updates from Bjorn Helgaas:
       "Some fixes for v3.8.  They include a fix for the new SR-IOV sysfs
        management support, an expanded quirk for Ricoh SD card readers, a
        Stratus DMI quirk fix, and a PME polling fix."
      
      * tag '3.8-pci-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci:
        PCI: Reduce Ricoh 0xe822 SD card reader base clock frequency to 50MHz
        PCI/PM: Do not suspend port if any subordinate device needs PME polling
        PCI: Add PCIe Link Capability link speed and width names
        PCI: Work around Stratus ftServer broken PCIe hierarchy (fix DMI check)
        PCI: Remove spurious error for sriov_numvfs store and simplify flow
      080a62e2
    • D
      UAPI: Strip _UAPI prefix on header install no matter the whitespace · 8a7eab2b
      David Howells 提交于
      Commit 56c176c9 ("UAPI: strip the _UAPI prefix from header guards
      during header installation") strips the _UAPI prefix from header guards,
      but only if there's a single space between the cpp directive and the
      label.
      
      Make it more flexible and able to handle tabs and multiple white space
      characters.
      Signed-off-by: NDavid Howells <dhowell@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      8a7eab2b
    • D
      UAPI: Remove empty Kbuild files · 3d33fcc1
      David Howells 提交于
      Empty files can get deleted by the patch program, so remove empty Kbuild
      files and their links from the parent Kbuilds.
      Signed-off-by: NDavid Howells <dhowells@redhat.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      3d33fcc1
    • L
      Merge tag 'ecryptfs-3.8-rc2-fixes' of... · 007f6c3a
      Linus Torvalds 提交于
      Merge tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs
      
      Pull ecryptfs fixes from Tyler Hicks:
       "Two self-explanatory fixes and a third patch which improves
        performance: when overwriting a full page in the eCryptfs page cache,
        skip reading in and decrypting the corresponding lower page."
      
      * tag 'ecryptfs-3.8-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tyhicks/ecryptfs:
        fs/ecryptfs/crypto.c: make ecryptfs_encode_for_filename() static
        eCryptfs: fix to use list_for_each_entry_safe() when delete items
        eCryptfs: Avoid unnecessary disk read and data decryption during writing
      007f6c3a
    • L
      Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client · 58890c06
      Linus Torvalds 提交于
      Pull Ceph fixes from Sage Weil:
       "Two of Alex's patches deal with a race when reseting server
        connections for open RBD images, one demotes some non-fatal BUGs to
        WARNs, and my patch fixes a protocol feature bit failure path."
      
      * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/sage/ceph-client:
        libceph: fix protocol feature mismatch failure path
        libceph: WARN, don't BUG on unexpected connection states
        libceph: always reset osds when kicking
        libceph: move linger requests sooner in kick_requests()
      58890c06
    • M
      mm: mempolicy: Convert shared_policy mutex to spinlock · 42288fe3
      Mel Gorman 提交于
      Sasha was fuzzing with trinity and reported the following problem:
      
        BUG: sleeping function called from invalid context at kernel/mutex.c:269
        in_atomic(): 1, irqs_disabled(): 0, pid: 6361, name: trinity-main
        2 locks held by trinity-main/6361:
         #0:  (&mm->mmap_sem){++++++}, at: [<ffffffff810aa314>] __do_page_fault+0x1e4/0x4f0
         #1:  (&(&mm->page_table_lock)->rlock){+.+...}, at: [<ffffffff8122f017>] handle_pte_fault+0x3f7/0x6a0
        Pid: 6361, comm: trinity-main Tainted: G        W
        3.7.0-rc2-next-20121024-sasha-00001-gd95ef01-dirty #74
        Call Trace:
          __might_sleep+0x1c3/0x1e0
          mutex_lock_nested+0x29/0x50
          mpol_shared_policy_lookup+0x2e/0x90
          shmem_get_policy+0x2e/0x30
          get_vma_policy+0x5a/0xa0
          mpol_misplaced+0x41/0x1d0
          handle_pte_fault+0x465/0x6a0
      
      This was triggered by a different version of automatic NUMA balancing
      but in theory the current version is vunerable to the same problem.
      
      do_numa_page
        -> numa_migrate_prep
          -> mpol_misplaced
            -> get_vma_policy
              -> shmem_get_policy
      
      It's very unlikely this will happen as shared pages are not marked
      pte_numa -- see the page_mapcount() check in change_pte_range() -- but
      it is possible.
      
      To address this, this patch restores sp->lock as originally implemented
      by Kosaki Motohiro.  In the path where get_vma_policy() is called, it
      should not be calling sp_alloc() so it is not necessary to treat the PTL
      specially.
      Signed-off-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Tested-by: NKOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>
      Signed-off-by: NMel Gorman <mgorman@suse.de>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      42288fe3
    • L
      Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 5439ca6b
      Linus Torvalds 提交于
      Pull ext4 bug fixes from Ted Ts'o:
       "Various bug fixes for ext4.  Perhaps the most serious bug fixed is one
        which could cause file system corruptions when performing file punch
        operations."
      
      * tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid hang when mounting non-journal filesystems with orphan list
        ext4: lock i_mutex when truncating orphan inodes
        ext4: do not try to write superblock on ro remount w/o journal
        ext4: include journal blocks in df overhead calcs
        ext4: remove unaligned AIO warning printk
        ext4: fix an incorrect comment about i_mutex
        ext4: fix deadlock in journal_unmap_buffer()
        ext4: split off ext4_journalled_invalidatepage()
        jbd2: fix assertion failure in jbd2_journal_flush()
        ext4: check dioread_nolock on remount
        ext4: fix extent tree corruption caused by hole punch
      5439ca6b
    • H
      mempolicy: remove arg from mpol_parse_str, mpol_to_str · a7a88b23
      Hugh Dickins 提交于
      Remove the unused argument (formerly no_context) from mpol_parse_str()
      and from mpol_to_str().
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      a7a88b23
    • H
      tmpfs mempolicy: fix /proc/mounts corrupting memory · f2a07f40
      Hugh Dickins 提交于
      Recently I suggested using "mount -o remount,mpol=local /tmp" in NUMA
      mempolicy testing.  Very nasty.  Reading /proc/mounts, /proc/pid/mounts
      or /proc/pid/mountinfo may then corrupt one bit of kernel memory, often
      in a page table (causing "Bad swap" or "Bad page map" warning or "Bad
      pagetable" oops), sometimes in a vm_area_struct or rbnode or somewhere
      worse.  "mpol=prefer" and "mpol=prefer:Node" are equally toxic.
      
      Recent NUMA enhancements are not to blame: this dates back to 2.6.35,
      when commit e17f74af "mempolicy: don't call mpol_set_nodemask() when
      no_context" skipped mpol_parse_str()'s call to mpol_set_nodemask(),
      which used to initialize v.preferred_node, or set MPOL_F_LOCAL in flags.
      With slab poisoning, you can then rely on mpol_to_str() to set the bit
      for node 0x6b6b, probably in the next page above the caller's stack.
      
      mpol_parse_str() is only called from shmem_parse_options(): no_context
      is always true, so call it unused for now, and remove !no_context code.
      Set v.nodes or v.preferred_node or MPOL_F_LOCAL as mpol_to_str() might
      expect.  Then mpol_to_str() can ignore its no_context argument also,
      the mpol being appropriately initialized whether contextualized or not.
      Rename its no_context unused too, and let subsequent patch remove them
      (that's not needed for stable backporting, which would involve rejects).
      
      I don't understand why MPOL_LOCAL is described as a pseudo-policy:
      it's a reasonable policy which suffers from a confusing implementation
      in terms of MPOL_PREFERRED with MPOL_F_LOCAL.  I believe this would be
      much more robust if MPOL_LOCAL were recognized in switch statements
      throughout, MPOL_F_LOCAL deleted, and MPOL_PREFERRED use the (possibly
      empty) nodes mask like everyone else, instead of its preferred_node
      variant (I presume an optimization from the days before MPOL_LOCAL).
      But that would take me too long to get right and fully tested.
      Signed-off-by: NHugh Dickins <hughd@google.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      f2a07f40
    • E
      epoll: prevent missed events on EPOLL_CTL_MOD · 128dd175
      Eric Wong 提交于
      EPOLL_CTL_MOD sets the interest mask before calling f_op->poll() to
      ensure events are not missed.  Since the modifications to the interest
      mask are not protected by the same lock as ep_poll_callback, we need to
      ensure the change is visible to other CPUs calling ep_poll_callback.
      
      We also need to ensure f_op->poll() has an up-to-date view of past
      events which occured before we modified the interest mask.  So this
      barrier also pairs with the barrier in wq_has_sleeper().
      
      This should guarantee either ep_poll_callback or f_op->poll() (or both)
      will notice the readiness of a recently-ready/modified item.
      
      This issue was encountered by Andreas Voellmy and Junchang(Jason) Wang in:
      http://thread.gmane.org/gmane.linux.kernel/1408782/Signed-off-by: NEric Wong <normalperson@yhbt.net>
      Cc: Hans Verkuil <hans.verkuil@cisco.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Davide Libenzi <davidel@xmailserver.org>
      Cc: Hans de Goede <hdegoede@redhat.com>
      Cc: Mauro Carvalho Chehab <mchehab@infradead.org>
      Cc: David Miller <davem@davemloft.net>
      Cc: Eric Dumazet <eric.dumazet@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andreas Voellmy <andreas.voellmy@yale.edu>
      Tested-by: N"Junchang(Jason) Wang" <junchang.wang@yale.edu>
      Cc: netdev@vger.kernel.org
      Cc: linux-fsdevel@vger.kernel.org
      Signed-off-by: NLinus Torvalds <torvalds@linux-foundation.org>
      128dd175
  5. 02 1月, 2013 4 次提交
  6. 31 12月, 2012 3 次提交
    • L
      Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux · 4a490b78
      Linus Torvalds 提交于
      Pull DRM update from Dave Airlie:
       "This is a bit larger due to me not bothering to do anything since
        before Xmas, and other people working too hard after I had clearly
        given up.
      
        It's got the 3 main x86 driver fixes pulls, and a bunch of tegra
        fixes, doesn't fix the Ironlake bug yet, but that does seem to be
        getting closer.
      
         - radeon: gpu reset fixes and userspace packet support
         - i915: watermark fixes, workarounds, i830/845 fix,
         - nouveau: nvd9/kepler microcode fixes, accel is now enabled and
           working, gk106 support
         - tegra: misc fixes."
      
      * 'drm-next' of git://people.freedesktop.org/~airlied/linux: (34 commits)
        Revert "drm: tegra: protect DC register access with mutex"
        drm: tegra: program only one window during modeset
        drm: tegra: clean out old gem prototypes
        drm: tegra: remove redundant tegra2_tmds_config entry
        drm: tegra: protect DC register access with mutex
        drm: tegra: don't leave clients host1x member uninitialized
        drm: tegra: fix front_porch <-> back_porch mixup
        drm/nve0/graph: fix fuc, and enable acceleration on all known chipsets
        drm/nvc0/graph: fix fuc, and enable acceleration on GF119
        drm/nouveau/bios: cache ramcfg strap on later chipsets
        drm/nouveau/mxm: silence output if no bios data
        drm/nouveau/bios: parse/display extra version component
        drm/nouveau/bios: implement opcode 0xa9
        drm/nouveau/bios: update gpio parsing apis to match current design
        drm/nouveau: initial support for GK106
        drm/radeon: add WAIT_UNTIL to evergreen VM safe reg list
        drm/i915: disable shrinker lock stealing for create_mmap_offset
        drm/i915: optionally disable shrinker lock stealing
        drm/i915: fix flags in dma buf exporting
        drm/radeon: add support for MEM_WRITE packet
        ...
      4a490b78
    • L
      Merge tag 'omap-late-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 8d91a42e
      Linus Torvalds 提交于
      Pull late ARM cleanups for omap from Olof Johansson:
       "From Tony Lindgren:
      
        Here are few more patches to finish the omap changes for multiplatform
        conversion that are not strictly fixes, but were too complex to do
        with the dependencies during the merge window.  Those are to move of
        serial-omap.h to platform_data, and the removal of remaining
        cpu_is_omap macro usage outside mach-omap2.
      
        Then there are several trivial fixes for typos and few minimal
        omap2plus_defconfig updates."
      
      * tag 'omap-late-cleanups' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        arch/arm/mach-omap2/dpll3xxx.c: drop if around WARN_ON
        OMAP2: Fix a typo - replace regist with register.
        ARM/omap: use module_platform_driver macro
        ARM: OMAP2+: PMU: Remove unused header
        ARM: OMAP4: remove duplicated include from omap_hwmod_44xx_data.c
        ARM: OMAP2+: omap2plus_defconfig: enable twl4030 SoC audio
        ARM: OMAP2+: omap2plus_defconfig: Add tps65217 support
        ARM: OMAP2+: enable devtmpfs and devtmpfs automount
        ARM: OMAP2+: omap_twl: Change TWL4030_MODULE_PM_RECEIVER to TWL_MODULE_PM_RECEIVER
        ARM: OMAP2+: Drop plat/cpu.h for omap2plus
        ARM: OMAP: Split fb.c to remove last remaining cpu_is_omap usage
        MAINTAINERS: Add an entry for omap related .dts files
      8d91a42e
    • L
      Merge tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc · 4fe2dfab
      Linus Torvalds 提交于
      Pull ARM SoC fixes from Olof Johansson:
       "It's been quiet over the holidays, but we have had a couple of trivial
        fixes coming in for the newly introduced sunxi platform; one to add it
        to the multiplatform defconfig for build coverage, and one fixup for
        device tree strings."
      
      * tag 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc:
        sunxi: Change the machine compatible string.
        ARM: multi_v7_defconfig: Add ARCH_SUNXI
      4fe2dfab