1. 15 6月, 2015 3 次提交
    • V
      cpufreq: governor: Serialize governor callbacks · 732b6d61
      Viresh Kumar 提交于
      There are several races reported in cpufreq core around governors (only
      ondemand and conservative) by different people.
      
      There are at least two race scenarios present in governor code:
       (a) Concurrent access/updates of governor internal structures.
      
       It is possible that fields such as 'dbs_data->usage_count', etc.  are
       accessed simultaneously for different policies using same governor
       structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And
       because of this we can dereference bad pointers.
      
       For example consider a system with two CPUs with separate 'struct
       cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave.
       CPU0 switching to powersave and CPU1 to ondemand:
      	CPU0				CPU1
      
      	store*				store*
      
      	cpufreq_governor_exit()		cpufreq_governor_init()
      					dbs_data = cdata->gdbs_data;
      
      	if (!--dbs_data->usage_count)
      		kfree(dbs_data);
      
      					dbs_data->usage_count++;
      					*Bad pointer dereference*
      
       There are other races possible between EXIT and START/STOP/LIMIT as
       well. Its really complicated.
      
       (b) Switching governor state in bad sequence:
      
       For example trying to switch a governor to START state, when the
       governor is in EXIT state. There are some checks present in
       __cpufreq_governor() but they aren't sufficient as they compare events
       against 'policy->governor_enabled', where as we need to take governor's
       state into account, which can be used by multiple policies.
      
      These two issues need to be solved separately and the responsibility
      should be properly divided between cpufreq and governor core.
      
      The first problem is more about the governor core, as it needs to
      protect its structures properly. And the second problem should be fixed
      in cpufreq core instead of governor, as its all about sequence of
      events.
      
      This patch is trying to solve only the first problem.
      
      There are two types of data we need to protect,
      - 'struct common_dbs_data': No matter what, there is going to be a
        single copy of this per governor.
      - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we
        will have per-policy copy of this data, otherwise a single copy.
      
      Because of such complexities, the mutex present in 'struct dbs_data' is
      insufficient to solve our problem. For example we need to protect
      fetching of 'dbs_data' from different structures at the beginning of
      cpufreq_governor_dbs(), to make sure it isn't currently being updated.
      
      This can be fixed if we can guarantee serialization of event parsing
      code for an individual governor. This is best solved with a mutex per
      governor, and the placeholder for that is 'struct common_dbs_data'.
      
      And so this patch moves the mutex from 'struct dbs_data' to 'struct
      common_dbs_data' and takes it at the beginning and drops it at the end
      of cpufreq_governor_dbs().
      
      Tested with and without following configuration options:
      
      CONFIG_LOCKDEP_SUPPORT=y
      CONFIG_DEBUG_RT_MUTEXES=y
      CONFIG_DEBUG_PI_LIST=y
      CONFIG_DEBUG_SPINLOCK=y
      CONFIG_DEBUG_MUTEXES=y
      CONFIG_DEBUG_LOCK_ALLOC=y
      CONFIG_PROVE_LOCKING=y
      CONFIG_LOCKDEP=y
      CONFIG_DEBUG_ATOMIC_SLEEP=y
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      732b6d61
    • V
      cpufreq: governor: split cpufreq_governor_dbs() · 714a2d9c
      Viresh Kumar 提交于
      cpufreq_governor_dbs() is hardly readable, it is just too big and
      complicated. Lets make it more readable by splitting out event specific
      routines.
      
      Order of statements is changed at few places, but that shouldn't bring
      any functional change.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      714a2d9c
    • V
      cpufreq: governor: register notifier from cs_init() · 8e0484d2
      Viresh Kumar 提交于
      Notifiers are required only for conservative governor and the common
      governor code is unnecessarily polluted with that. Handle that from
      cs_init/exit() instead of cpufreq_governor_dbs().
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Reviewed-by: NPreeti U Murthy <preeti@linux.vnet.ibm.com>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      8e0484d2
  2. 11 6月, 2015 5 次提交
    • V
      cpufreq: Remove cpufreq_update_policy() · 37829029
      Viresh Kumar 提交于
      cpufreq_update_policy() was kept as a separate routine earlier as it was
      handling migration of sysfs directories, which isn't the case anymore.
      It is only updating policy->cpu now and is called by a single caller.
      
      The WARN_ON() isn't really required anymore, as we are just updating the
      cpu now, not moving the sysfs directories.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      37829029
    • V
      cpufreq: Restart governor as soon as possible · 9591becb
      Viresh Kumar 提交于
      __cpufreq_remove_dev_finish() is doing two things today:
      - Restarts the governor if some CPUs from concerned policy are still
        online.
      - Frees the policy if all CPUs are offline.
      
      The first task of restarting the governor can be moved to
      __cpufreq_remove_dev_prepare() to restart the governor early. There is
      no race between _prepare() and _finish() as they would be handling
      completely different cases. _finish() will only be required if we are
      going to free the policy and that has nothing to do with restarting the
      governor.
      Original-by: NSaravana Kannan <skannan@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      9591becb
    • V
      cpufreq: Call cpufreq_policy_put_kobj() from cpufreq_policy_free() · 3654c5cc
      Viresh Kumar 提交于
      cpufreq_policy_put_kobj() is actually part of freeing the policy and can
      be called from cpufreq_policy_free() directly instead of a separate
      call.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      3654c5cc
    • V
      cpufreq: Initialize policy->kobj while allocating policy · 2fc3384d
      Viresh Kumar 提交于
      policy->kobj is required to be initialized once in the lifetime of a
      policy.  Currently we are initializing it from __cpufreq_add_dev() and
      that doesn't look to be the best place for doing so as we have to do
      this on special cases (like: !recover_policy).
      
      We can initialize it from a more obvious place cpufreq_policy_alloc()
      and that will make code look cleaner, specially the error handling part.
      
      The error handling part of __cpufreq_add_dev() was doing almost the same
      thing while recover_policy is true or false. Fix that as well by always
      calling cpufreq_policy_put_kobj() with an additional parameter to skip
      notification part of it.
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      2fc3384d
    • V
      cpufreq: Stop migrating sysfs files on hotplug · 87549141
      Viresh Kumar 提交于
      When we hot-unplug a cpu, we remove its sysfs cpufreq directory and if
      the outgoing cpu was the owner of policy->kobj earlier then we migrate
      the sysfs directory to under another online cpu.
      
      There are few disadvantages this brings:
      - Code Complexity
      - Slower hotplug/suspend/resume
      - sysfs file permissions are reset after all policy->cpus are offlined
      - CPUFreq stats history lost after all policy->cpus are offlined
      - Special management of sysfs stuff during suspend/resume
      
      To overcome these, this patch modifies the way sysfs directories are
      managed:
      - Select sysfs kobjects owner while initializing policy and don't change
        it during hotplugs. Track it with kobj_cpu created earlier.
      
      - Create symlinks for all related CPUs (can be offline) instead of
        affected CPUs on policy initialization and remove them only when the
        policy is freed.
      
      - Free policy structure only on the removal of cpufreq-driver and not
        during hotplug/suspend/resume, detected by checking 'struct
        subsys_interface *' (Valid only when called from
        subsys_interface_unregister() while unregistering driver).
      
      Apart from this, special care is taken to handle physical hoplug of CPUs
      as we wouldn't remove sysfs links or remove policies on logical
      hotplugs. Physical hotplug happens in the following sequence.
      
      Hot removal:
      - CPU is offlined first, ~ 'echo 0 >
        /sys/devices/system/cpu/cpuX/online'
      - Then its device is removed along with all sysfs files, cpufreq core
        notified with cpufreq_remove_dev() callback from subsys-interface..
      
      Hot addition:
      - First the device along with its sysfs files is added, cpufreq core
        notified with cpufreq_add_dev() callback from subsys-interface..
      - CPU is onlined, ~ 'echo 1 > /sys/devices/system/cpu/cpuX/online'
      
      We call the same routines with both hotplug and subsys callbacks, and we
      sense physical hotplug with cpu_offline() check in subsys callback. We
      can handle most of the stuff with regular hotplug callback paths and
      add/remove cpufreq sysfs links or free policy from subsys callbacks.
      Original-by: NSaravana Kannan <skannan@codeaurora.org>
      Signed-off-by: NViresh Kumar <viresh.kumar@linaro.org>
      Signed-off-by: NRafael J. Wysocki <rafael.j.wysocki@intel.com>
      87549141
  3. 10 6月, 2015 3 次提交
  4. 23 5月, 2015 2 次提交
  5. 15 5月, 2015 7 次提交
  6. 13 5月, 2015 1 次提交
  7. 08 5月, 2015 5 次提交
  8. 05 5月, 2015 5 次提交
  9. 02 5月, 2015 1 次提交
    • I
      rbd: end I/O the entire obj_request on error · 082a75da
      Ilya Dryomov 提交于
      When we end I/O struct request with error, we need to pass
      obj_request->length as @nr_bytes so that the entire obj_request worth
      of bytes is completed.  Otherwise block layer ends up confused and we
      trip on
      
          rbd_assert(more ^ (which == img_request->obj_request_count));
      
      in rbd_img_obj_callback() due to more being true no matter what.  We
      already do it in most cases but we are missing some, in particular
      those where we don't even get a chance to submit any obj_requests, due
      to an early -ENOMEM for example.
      
      A number of obj_request->xferred assignments seem to be redundant but
      I haven't touched any of obj_request->xferred stuff to keep this small
      and isolated.
      
      Cc: Alex Elder <elder@linaro.org>
      Cc: stable@vger.kernel.org # 3.10+
      Reported-by: NShawn Edwards <lesser.evil@gmail.com>
      Reviewed-by: NSage Weil <sage@redhat.com>
      Signed-off-by: NIlya Dryomov <idryomov@gmail.com>
      082a75da
  10. 01 5月, 2015 5 次提交
  11. 30 4月, 2015 3 次提交
    • M
      dm: fix free_rq_clone() NULL pointer when requeueing unmapped request · aa6df8dd
      Mike Snitzer 提交于
      Commit 02233342 ("dm: optimize dm_mq_queue_rq to _not_ use kthread if
      using pure blk-mq") mistakenly removed free_rq_clone()'s clone->q check
      before testing clone->q->mq_ops.  It was an oversight to discontinue
      that check for 1 of the 2 use-cases for free_rq_clone():
      1) free_rq_clone() called when an unmapped original request is requeued
      2) free_rq_clone() called in the request-based IO completion path
      
      The clone->q check made sense for case #1 but not for #2.  However, we
      cannot just reinstate the check as it'd mask a serious bug in the IO
      completion case #2 -- no in-flight request should have an uninitialized
      request_queue (basic block layer refcounting _should_ ensure this).
      
      The NULL pointer seen for case #1 is detailed here:
      https://www.redhat.com/archives/dm-devel/2015-April/msg00160.html
      
      Fix this free_rq_clone() NULL pointer by simply checking if the
      mapped_device's type is DM_TYPE_MQ_REQUEST_BASED (clone's queue is
      blk-mq) rather than checking clone->q->mq_ops.  This avoids the need to
      dereference clone->q, but a WARN_ON_ONCE is added to let us know if an
      uninitialized clone request is being completed.
      Reported-by: NBart Van Assche <bart.vanassche@sandisk.com>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      aa6df8dd
    • C
      dm: only initialize the request_queue once · 3e6180f0
      Christoph Hellwig 提交于
      Commit bfebd1cd ("dm: add full blk-mq support to request-based DM")
      didn't properly account for the need to short-circuit re-initializing
      DM's blk-mq request_queue if it was already initialized.
      
      Otherwise, reloading a blk-mq request-based DM table (either manually
      or via multipathd) resulted in errors, see:
       https://www.redhat.com/archives/dm-devel/2015-April/msg00132.html
      
      Fix is to only initialize the request_queue on the initial table load
      (when the mapped_device type is assigned).
      
      This is better than having dm_init_request_based_blk_mq_queue() return
      early if the queue was already initialized because it elevates the
      constraint to a more meaningful location in DM core.  As such the
      pre-existing early return in dm_init_request_based_queue() can now be
      removed.
      
      Fixes: bfebd1cd ("dm: add full blk-mq support to request-based DM")
      Signed-off-by: NChristoph Hellwig <hch@lst.de>
      Signed-off-by: NMike Snitzer <snitzer@redhat.com>
      3e6180f0
    • H
      cxgb4: Fix MC1 memory offset calculation · 7f0b8a56
      Hariprasad Shenai 提交于
      Commit 6559a7e8 ("cxgb4: Cleanup macros so they follow the same
      style and look consistent") introduced a regression where reading MC1
      memory in adapters where MC0 isn't present or MC0 size is not equal to MC1
      size caused the adapter to crash due to incorrect computation of memoffset.
      Fix is to read the size of MC0 instead of MC1 for offset calculation
      Signed-off-by: NSteve Wise <swise@opengridcomputing.com>
      Signed-off-by: NHariprasad Shenai <hariprasad@chelsio.com>
      Signed-off-by: NDavid S. Miller <davem@davemloft.net>
      7f0b8a56