1. 20 Mar, 2018 6 commits
  2. 15 Mar, 2018 1 commit
    • L
      firmware: enable to split firmware_class into separate target files · ad4365f1
      Luis R. Rodriguez committed
      The firmware loader code has grown quite a bit over the years.
      The practice of stuffing everything we need into one file makes
      the code hard to follow.
      
      In order to split the firmware loader code into different components
      we must pick a module name and a first object target file. We must
      keep the firmware_class name to remain compatible with scripts which
      have been relying on the sysfs loader path for years, so the old module
      name stays. We can however rename the C file without affecting the
      module name.
      
      The firmware_class used to represent the idea that the code was a simple
      sysfs firmware loader, provided by the struct class firmware_class.
      The sysfs firmware loader used to be the default; today it's only the
      fallback mechanism.
      
      This only renames the target code, then, to emphasize what the code
      does these days. With this change new features can also use new object
      files.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      ad4365f1
  3. 08 Dec, 2017 1 commit
    • G
      driver core: add SPDX identifiers to all driver core files · 989d42e8
      Greg Kroah-Hartman committed
      It's good to have SPDX identifiers in all files to make it easier to
      audit the kernel tree for correct licenses.
      
      Update the driver core files with the correct SPDX license
      identifier based on the license text in the file itself.  The SPDX
      identifier is a legally binding shorthand, which can be used instead of
      the full boiler plate text.
      
      This work is based on a script and data from Thomas Gleixner, Philippe
      Ombredanne, and Kate Stewart.
      
      Cc: Johannes Berg <johannes@sipsolutions.net>
      Cc: "Luis R. Rodriguez" <mcgrof@kernel.org>
      Cc: William Breathitt Gray <vilhelm.gray@gmail.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Kate Stewart <kstewart@linuxfoundation.org>
      Cc: Philippe Ombredanne <pombredanne@nexb.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      989d42e8
  4. 29 Nov, 2017 16 commits
  5. 11 Sep, 2017 1 commit
  6. 11 Aug, 2017 6 commits
    • L
      firmware: enable a debug print for batched requests · 30172bea
      Luis R. Rodriguez committed
      Otherwise there is no easy way to tell this actually happened.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      30172bea
    • L
      firmware: define pr_fmt · 73da4b4b
      Luis R. Rodriguez committed
      For some reason we have always forgotten this. Without this
      we don't get a nice prefix on our pr_debug() / pr_*() messages.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      73da4b4b
    • L
      firmware: send -EINTR on signal abort on fallback mechanism · 76098b36
      Luis R. Rodriguez committed
      Right now we send -EAGAIN to a sysfs write which got interrupted.
      Userspace can't tell what happened though; send -EINTR if we
      were killed due to a signal so userspace can tell things apart.
      
      This is only applicable to the fallback mechanism.
      Reported-by: Martin Fuzzey <mfuzzey@parkeon.com>
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      76098b36
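
The error selection this commit describes can be modeled in a few lines. This is a hedged userspace sketch in Python, not the kernel code: the function name `fallback_wait_errno` is hypothetical, and `ERESTARTSYS` is defined locally with the kernel-internal value 512 because Python's `errno` module does not export it.

```python
import errno

# Kernel-internal restart code. It is never returned to userspace, so
# Python's errno module has no constant for it; 512 is the kernel's value.
ERESTARTSYS = 512


def fallback_wait_errno(wait_ret: int) -> int:
    """Model: pick the errno reported after an aborted fallback wait.

    wait_ret is the negative value returned by the wait. A wait ended by a
    signal reports -EINTR; any other abort keeps reporting -EAGAIN.
    """
    if wait_ret == -ERESTARTSYS:
        return -errno.EINTR
    return -errno.EAGAIN
```

With this split, a caller that sees EINTR knows a signal ended the request, while EAGAIN still covers every other abort.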
    • L
      firmware: avoid invalid fallback aborts by using killable wait · 260d9f2f
      Luis R. Rodriguez committed
      Commit 0cb64249 ("firmware_loader: abort request if wait_for_completion
      is interrupted"), added in v4.0, added support to abort the fallback mechanism
      when a signal was detected and wait_for_completion_interruptible() returned
      -ERESTARTSYS -- for instance when a user hits CTRL-C. The abort was
      *too* effective.
      
      When a child process terminates (successfully or not) the signal SIGCHLD can
      be sent to the parent process which ran the child in the background and
      later triggered a sync request for firmware through a sysfs interface which
      relies on the fallback mechanism. This signal in turn can be received by the
      interruptible wait we constructed on firmware_class, which detects it as an
      abort *before* userspace could get a chance to write the firmware. Upon
      failure -EAGAIN is returned, so userspace is also kept in the dark about
      exactly what happened.
      
      We can reproduce the issue with the fw_fallback.sh selftest:
      
      Before this patch:
      $ sudo tools/testing/selftests/firmware/fw_fallback.sh
      ...
      tools/testing/selftests/firmware/fw_fallback.sh: error - sync firmware request cancelled due to SIGCHLD
      
      After this patch:
      $ sudo tools/testing/selftests/firmware/fw_fallback.sh
      ...
      tools/testing/selftests/firmware/fw_fallback.sh: SIGCHLD on sync ignored as expected
      
      Fix this by making the wait killable -- only killable by SIGKILL (kill -9).
      We lose the ability to allow userspace to cancel a write with CTRL-C
      (SIGINT), however it's been decided the compromise to require SIGKILL is
      worth the gains.
      
      Chances of this issue occurring are low due to the number of drivers upstream
      exclusively relying on the fallback mechanism for firmware (2 drivers),
      however this is observed in the field with custom drivers with sysfs
      triggers to load firmware. Likewise, only distributions relying on the
      fallback mechanism are impacted. An example reported issue was on Android,
      as follows:
      
      1) Android init (pid=1) fork()s (say pid=42) [this child process is totally
         unrelated to firmware loading, it could be sleep 2; for all we care ]
      2) Android init (pid=1) does a write() on a (driver custom) sysfs file which
         ends up calling request_firmware() kernel side
      3) The firmware loading fallback mechanism is used, the request is sent to
         userspace and pid 1 waits in the kernel on wait_*
      4) before firmware loading completes pid 42 dies (for any reason, even
         normal termination)
      5) Kernel delivers SIGCHLD to pid=1 to tell it a child has died, which
         causes -ERESTARTSYS to be returned from wait_*
      6) The kernel's wait aborts and returns -EAGAIN for the
         request_firmware() caller.
      
      Cc: stable <stable@vger.kernel.org> # 4.0
      Fixes: 0cb64249 ("firmware_loader: abort request if wait_for_completion is interrupted")
      Suggested-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Suggested-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
      Tested-by: Martin Fuzzey <mfuzzey@parkeon.com>
      Reported-by: Martin Fuzzey <mfuzzey@parkeon.com>
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      260d9f2f
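
The difference between the interruptible and the killable wait in the Android scenario above can be sketched as a small model. This is a simplified Python illustration under stated assumptions, not kernel code: `wait_aborts` and `child_exit` are hypothetical names, and the model ignores signal masks and dispositions, treating every listed signal as delivered to the waiting task.

```python
import signal


def wait_aborts(pending_signals, killable):
    """Model: return True if the fallback wait ends before firmware arrives.

    pending_signals: signals delivered to the waiting task, in order.
    killable: False models the old interruptible wait, True the killable one.
    """
    for sig in pending_signals:
        if not killable:
            return True   # interruptible: any delivered signal aborts
        if sig == signal.SIGKILL:
            return True   # killable: only SIGKILL aborts
    return False


# Step 5 of the scenario: a child exits while pid 1 is still waiting.
child_exit = [signal.SIGCHLD]
```

In this model `wait_aborts(child_exit, killable=False)` reports an abort, while the killable variant rides out SIGCHLD and only gives up on SIGKILL.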
    • L
      firmware: fix batched requests - send wake up on failure on direct lookups · 90d41e74
      Luis R. Rodriguez committed
      Fix batched requests from waiting forever on failure.
      
      The firmware API batched requests feature has been broken since the API call
      request_firmware_direct() was introduced in commit bba3a87e ("firmware:
      Introduce request_firmware_direct()"), added in v3.14, *iff* the firmware
      being requested was not present in *certain kernel builds* [0].
      
      When no firmware is found the worker which goes on to finish never informs
      waiters queued up of this, so any batched request will stall in what seems
      to be forever (MAX_SCHEDULE_TIMEOUT). Sadly, a reboot will also stall, as
      the reboot notifier was only designed to kill custom fallback workers. To
      the user the issue looks like a type of soft lockup; what *actually* happens
      under the hood is a wait call which never completes as we failed to
      issue a completion on error.
      
      For device drivers with optional firmware schemes (i.e., Intel iwlwifi, or
      Netronome -- even though it uses request_firmware() and not
      request_firmware_direct()), this could mean that when you boot a system with
      multiple cards the firmware will seem to never load on the system, or that
      the card is just not responsive even after driver initialization. Due to
      possible differences in scheduling this will not always trigger --
      one would need to ensure that multiple requests are in place at the
      right time for this to work, also release_firmware() must not be called
      prior to any other incoming request. The complexity may not be worth
      supporting batched requests in the future given the wait mechanism is
      otherwise only used for the fallback mechanism. We'll keep it for now and
      just fix it.
      
      It's reported that at least with the Intel WiFi cards on one system this
      issue was creeping up on 50% of the boots [0].
      
      Before this commit batched requests testing revealed:
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Most common Linux distribution setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                OK
      request_firmware_direct()              FAIL                OK
      request_firmware_nowait(uevent=true)   FAIL                OK
      request_firmware_nowait(uevent=false)  FAIL                OK
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=n
      
      Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                OK
      request_firmware_direct()              FAIL                OK
      request_firmware_nowait(uevent=true)   FAIL                OK
      request_firmware_nowait(uevent=false)  FAIL                OK
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Google Android setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     OK                  OK
      request_firmware_direct()              FAIL                OK
      request_firmware_nowait(uevent=true)   OK                  OK
      request_firmware_nowait(uevent=false)  OK                  OK
      ============================================================================
      
      After this commit batched testing results:
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Most common Linux distribution setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     OK                  OK
      request_firmware_direct()              OK                  OK
      request_firmware_nowait(uevent=true)   OK                  OK
      request_firmware_nowait(uevent=false)  OK                  OK
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=n
      
      Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     OK                  OK
      request_firmware_direct()              OK                  OK
      request_firmware_nowait(uevent=true)   OK                  OK
      request_firmware_nowait(uevent=false)  OK                  OK
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Google Android setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     OK                  OK
      request_firmware_direct()              OK                  OK
      request_firmware_nowait(uevent=true)   OK                  OK
      request_firmware_nowait(uevent=false)  OK                  OK
      ============================================================================
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=195477
      
      Cc: stable <stable@vger.kernel.org> # v3.14
      Fixes: bba3a87e ("firmware: Introduce request_firmware_direct()")
      Reported-by: Nicolas <nbroeking@me.com>
      Reported-by: John Ewalt <jewalt@lgsinnovations.com>
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      90d41e74
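
The failure mode fixed here, a lookup that ends without ever signalling its waiters, can be illustrated with a userspace model. The sketch below uses Python's `threading.Event` as a stand-in for a kernel completion; the class `BatchedLookup` and its `signal_on_failure` switch are hypothetical names introduced only for this illustration.

```python
import threading


class BatchedLookup:
    """Model of one firmware lookup shared by several batched requests."""

    def __init__(self, signal_on_failure):
        self.done = threading.Event()
        self.result = None
        self.signal_on_failure = signal_on_failure

    def finish(self, data):
        """Called by the first requester once the file lookup has ended.

        data is the firmware contents, or None if nothing was found.
        """
        self.result = data
        if data is not None or self.signal_on_failure:
            self.done.set()   # releases every waiter, like complete_all()

    def wait(self, timeout):
        """Called by secondary requesters; True if they were released."""
        return self.done.wait(timeout)
```

With `signal_on_failure=False` a failed lookup leaves `wait()` to run into its timeout, which mirrors the stall described above; with `True` the waiters return immediately and can inspect `result` themselves.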
    • L
      firmware: fix batched requests - wake all waiters · e44565f6
      Luis R. Rodriguez committed
      The firmware cache mechanism serves two purposes, the secondary purpose is
      not well documented nor understood. This fixes a regression with the
      secondary purpose of the firmware cache mechanism: batched requests on
      successful lookups. Without this fix *any* time a batched request is
      triggered, secondary requests for which the batched request mechanism
      was designed will seem to last forever and seem to never return.
      This issue is present for all kernel builds possible, and a hard reset
      is required.
      
      The firmware cache is used for:
      
      1) Addressing races with file lookups during the suspend/resume cycle
         by keeping firmware in memory during the suspend/resume cycle
      
      2) Batched requests for the same file rely only on work from the first file
         lookup, which keeps the firmware in memory until the last
         release_firmware() is called
      
      Batched requests *only* take effect if secondary requests come in prior to
      the first user calling release_firmware(). The devres name used for the
      internal firmware cache is used as a hint other pending requests are
      ongoing, the firmware buffer data is kept in memory until the last user of
      the buffer calls release_firmware(), therefore serializing requests and
      delaying the release until all requests are done.
      
      Batched requests wait for a wakeup or signal so we can rely on the first file
      fetch to write to the pending secondary requests. Commit 5b029624
      ("firmware: do not use fw_lock for fw_state protection") ported the firmware
      API to use swait, and in doing so failed to convert complete_all() to
      swake_up_all() -- it used swake_up(), losing the ability for *some* batched
      requests to take effect.
      
      We *could* fix this by just using swake_up_all() *but* swait is now known
      to be a very special use case, so it's best to just move away from it. So we
      just go back to using completions as before commit 5b029624 ("firmware:
      do not use fw_lock for fw_state protection") given this was using
      complete_all().
      
      Without this fix it has been reported plugging in two Intel 6260 Wifi cards
      on a system will end up enumerating the two devices only 50% of the time
      [0]. The ported swake_up() should have actually handled the case with two
      devices, however, *if more than two cards are used* the swake_up() would
      not have sufficed. This change is only part of the required fixes for
      batched requests. Another fix is provided in the next patch.
      
      This particular change should fix the cases where more than three requests
      with the same firmware name are used, otherwise batched requests will wait
      for MAX_SCHEDULE_TIMEOUT and just time out eventually.
      
      Below is a summary of tests triggering batched requests on different
      kernel builds.
      
      Before this patch:
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Most common Linux distribution setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                FAIL
      request_firmware_direct()              FAIL                FAIL
      request_firmware_nowait(uevent=true)   FAIL                FAIL
      request_firmware_nowait(uevent=false)  FAIL                FAIL
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=n
      
      Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                FAIL
      request_firmware_direct()              FAIL                FAIL
      request_firmware_nowait(uevent=true)   FAIL                FAIL
      request_firmware_nowait(uevent=false)  FAIL                FAIL
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Google Android setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                FAIL
      request_firmware_direct()              FAIL                FAIL
      request_firmware_nowait(uevent=true)   FAIL                FAIL
      request_firmware_nowait(uevent=false)  FAIL                FAIL
      ============================================================================
      
      After this patch:
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Most common Linux distribution setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                OK
      request_firmware_direct()              FAIL                OK
      request_firmware_nowait(uevent=true)   FAIL                OK
      request_firmware_nowait(uevent=false)  FAIL                OK
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=n
      CONFIG_FW_LOADER_USER_HELPER=n
      
      Only possible if CONFIG_DELL_RBU=n and CONFIG_LEDS_LP55XX_COMMON=n, rare.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     FAIL                OK
      request_firmware_direct()              FAIL                OK
      request_firmware_nowait(uevent=true)   FAIL                OK
      request_firmware_nowait(uevent=false)  FAIL                OK
      ============================================================================
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK=y
      CONFIG_FW_LOADER_USER_HELPER=y
      
      Google Android setup.
      
      API-type                               no-firmware-found   firmware-found
      ----------------------------------------------------------------------
      request_firmware()                     OK                  OK
      request_firmware_direct()              FAIL                OK
      request_firmware_nowait(uevent=true)   OK                  OK
      request_firmware_nowait(uevent=false)  OK                  OK
      ============================================================================
      
      [0] https://bugzilla.kernel.org/show_bug.cgi?id=195477
      
      CC: <stable@vger.kernel.org>    [4.10+]
      Cc: Ming Lei <ming.lei@redhat.com>
      Fixes: 5b029624 ("firmware: do not use fw_lock for fw_state protection")
      Reported-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      e44565f6
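
The gap between waking one waiter and waking all of them, which is the core of the regression described above, can be reproduced in userspace. This is a minimal sketch using Python's `threading.Condition`, where `notify()` stands in for swake_up() and `notify_all()` for swake_up_all() or complete_all(); the helper `count_woken` and its parameters are hypothetical.

```python
import threading


def count_woken(waiters, wake_all, timeout=0.5):
    """Start `waiters` sleeping threads, wake them once, count who woke up."""
    cond = threading.Condition()
    ready = threading.Semaphore(0)
    woken = []

    def waiter():
        with cond:
            ready.release()         # tell the main thread we hold the lock
            if cond.wait(timeout):  # True if notified, False on timeout
                woken.append(1)

    threads = [threading.Thread(target=waiter) for _ in range(waiters)]
    for t in threads:
        t.start()
    for _ in range(waiters):
        ready.acquire()
    # The lock is only free once every waiter has gone to sleep in wait().
    with cond:
        if wake_all:
            cond.notify_all()
        else:
            cond.notify()           # wakes a single waiter
    for t in threads:
        t.join()
    return len(woken)
```

With three waiters, the wake-one variant releases a single thread and the rest run into their timeout, while the wake-all variant releases all three. The timeout exists only so the model terminates; the kernel waiters described above had no such escape.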
  7. 03 Jun, 2017 6 commits
    • L
      firmware: move umh try locks into the umh code · 06a45a93
      Luis R. Rodriguez committed
      This moves the usermode helper locks into only code paths that use the
      usermode helper API from the kernel. The usermode helper locks were
      originally added to prevent stalling suspend, later the firmware cache
      was added to help with this, and further later direct filesystem lookup
      was added by Linus to completely bypass udev due to the amount of issues
      the umh approach had.
      
      The usermode helper locks were kept even when the direct filesystem lookup
      mechanism is used though. A lot has changed since the original usermode
      helper locks were added, but the recent commit which added the code for
      firmware_enabled() is intended to address any possible races that were
      cured only as collateral by using the locks, a side consequence of code
      evolution and of this not being addressed any sooner. With the
      firmware_enabled() code in place we can be a bit more confident about
      moving the usermode helper locks to UMH-only code.
      
      There is a bit of history here so let's recap a bit of it to ensure nothing
      is lost and things are clear. The direct filesystem approach to loading
      firmware is rather new, it was added via commit abb139e7 ("firmware:
      teach the kernel to load firmware files directly from the filesystem") by
      Linus, merged in the v3.7 release, to make it possible to bypass udev.
      
      usermodehelper_read_lock_wait() was added earlier via commit 9b78c1da
      ("firmware_class: Do not warn that system is not ready from async loads")
      merged on v3.4, after Rafael noted that the async firmware API call
      request_firmware_nowait() should not be penalized to fail if userspace is
      not available yet or frozen, it'd allow for a timeout grace period before
      giving up. The WARN_ON() was kept for the sync firmware API call though on
      request_firmware(). At this time there was no direct filesystem lookup for
      firmware though.
      
      The original usermode helper lock came from commit a144c6a6 ("PM:
      Print a warning if firmware is requested when tasks are frozen") merged on
      the v3.0 kernel by Rafael to print a warning back when firmware requests
      were used on resume(), thaw() or restore() callbacks and there was no
      direct fs lookups or the firmware cache.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      06a45a93
    • L
      firmware: move assign_firmware_buf() further up · 8509adca
      Luis R. Rodriguez committed
      This will make subsequent changes easier to read.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      8509adca
    • L
      firmware: add sanity check on shutdown/suspend · 81f95076
      Luis R. Rodriguez committed
      The firmware API should not be used after we go to suspend
      and after we reboot/halt. The suspend/resume case is a bit
      complex, so this documents that so things are clearer.
      
      We want to know about users of the API in incorrect places so
      that their callers are corrected, so this also adds a warn
      for those cases.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      81f95076
    • L
      firmware: always enable the reboot notifier · a669f04a
      Luis R. Rodriguez committed
      Now that we have proper wrappers for the fallback mechanism
      we can easily share the reboot notifier for the firmware_class
      at all times.
      
      This change will make subsequent modifications to the reboot
      notifier easier to review.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      a669f04a
    • L
      firmware: share fw fallback killing on reboot/suspend · c4b76893
      Luis R. Rodriguez committed
      We kill pending fallback requests on suspend and reboot,
      the only difference is that on suspend we only kill custom
      fallback requests. Provide a wrapper that lets us customize
      the request with a flag.
      
      This also lets us simplify the #ifdef'ery over the calls.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      c4b76893
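
The shared wrapper this commit describes, one routine whose flag selects between the suspend and the reboot behaviour, might look roughly like the Python model below. It is a sketch under assumptions, not the kernel routine: the names `PendingRequest`, `need_uevent` and `kill_pending_fallback_reqs` are illustrative, and the model assumes that "custom" fallback requests are the ones issued without a uevent.

```python
from dataclasses import dataclass


@dataclass
class PendingRequest:
    name: str
    need_uevent: bool      # False marks a custom fallback request (assumed)
    aborted: bool = False


def kill_pending_fallback_reqs(pending, only_kill_custom):
    """Abort pending fallback requests and return the aborted names.

    only_kill_custom=True models suspend: only custom requests are aborted.
    only_kill_custom=False models reboot: every pending request is aborted.
    """
    for req in pending:
        if not req.need_uevent or not only_kill_custom:
            req.aborted = True
    return [req.name for req in pending if req.aborted]
```

Folding both cases into one function with a flag is what lets the callers share a single code path instead of guarding two near-identical routines with separate #ifdef blocks.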
    • L
      firmware: move kill_requests_without_uevent() up above · 6383331d
      Luis R. Rodriguez committed
      This routine will be used in functions declared earlier next. This
      code shift has no functional changes, it will make subsequent
      changes easier to read.
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      6383331d
  8. 27 Jan, 2017 1 commit
    • L
      firmware: fix NULL pointer dereference in __fw_load_abort() · 191e885a
      Luis R. Rodriguez committed
      Since commit 5d47ec02 ("firmware: Correct handling of
      fw_state_wait() return value") fw_load_abort() could be called twice and
      lead us to a kernel crash. This happens only when the firmware fallback
      mechanism (regular or custom) is used. The fallback mechanism exposes a
      sysfs interface for userspace to upload a file and notify the kernel
      when the file is loaded and ready, or to cancel an upload by echoing -1
      into the loading file:
      
      echo -n "-1" > /sys/$DEVPATH/loading
      
      This will call fw_load_abort(). Some distributions actually have a udev
      rule in place to *always* immediately cancel all firmware fallback
      mechanism requests (Debian), they have:
      
        $ cat /lib/udev/rules.d/50-firmware.rules
        # stub for immediately telling the kernel that userspace firmware loading
        # failed; necessary to avoid long timeouts with CONFIG_FW_LOADER_USER_HELPER=y
        SUBSYSTEM=="firmware", ACTION=="add", ATTR{loading}="-1"
      
      Distributions with this udev rule would run into this crash only if the
      fallback mechanism is used. Since most distributions disable the fallback
      mechanism by default (CONFIG_FW_LOADER_USER_HELPER_FALLBACK), this would
      typically mean only the 2 drivers which *require* the fallback
      mechanism could incur a crash: drivers/firmware/dell_rbu.c and
      the drivers/leds/leds-lp55xx-common.c driver. Distributions enabling
      CONFIG_FW_LOADER_USER_HELPER_FALLBACK by default are obviously more
      exposed to this crash.
      
      The crash happens because after commit 5b029624 ("firmware: do not
      use fw_lock for fw_state protection") and subsequent fix commit
      5d47ec02 ("firmware: Correct handling of fw_state_wait() return
      value") a race can happen between this cancellation and the firmware
      fw_state_wait_timeout() being woken up after a state change triggered by
      fw_load_abort(), as that calls swake_up(). Upon error
      fw_state_wait_timeout() will also call fw_load_abort() again and trigger
      a NULL pointer dereference.
      
      At first glance we could just fix this with a !buf check on
      fw_load_abort() before accessing buf->fw_st, however there is a logical
      issue in having a state machine used for the fallback mechanism and
      preventing access to it once we abort, as it's inside the buf
      (buf->fw_st).
      
      The firmware_class.c code is setting the buf to NULL to annotate that an
      abort has occurred. Replace this mechanism by simply using the state
      check instead. All the other code in place already uses similar checks
      for aborting as well so no further changes are needed.
      
      An oops can be reproduced with the new fw_fallback.sh fallback mechanism
      cancellation test. Either cancelling the fallback mechanism or the
      custom fallback mechanism triggers a crash.
      
      mcgrof@piggy ~/linux-next/tools/testing/selftests/firmware
      (git::20170111-fw-fixes)$ sudo ./fw_fallback.sh
      
      ./fw_fallback.sh: timeout works
      ./fw_fallback.sh: firmware comparison works
      ./fw_fallback.sh: fallback mechanism works
      
      [ this then sits here when it is trying the cancellation test ]
      
      Kernel log:
      
      test_firmware: loading 'nope-test-firmware.bin'
      misc test_firmware: Direct firmware load for nope-test-firmware.bin failed with error -2
      misc test_firmware: Falling back to user helper
      BUG: unable to handle kernel NULL pointer dereference at 0000000000000038
      IP: _request_firmware+0xa27/0xad0
      PGD 0
      
      Oops: 0000 [#1] SMP
      Modules linked in: test_firmware(E) ... etc ...
      CPU: 1 PID: 1396 Comm: fw_fallback.sh Tainted: G        W E   4.10.0-rc3-next-20170111+ #30
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.10.1-0-g8891697-prebuilt.qemu-project.org 04/01/2014
      task: ffff9740b27f4340 task.stack: ffffbb15c0bc8000
      RIP: 0010:_request_firmware+0xa27/0xad0
      RSP: 0018:ffffbb15c0bcbd10 EFLAGS: 00010246
      RAX: 00000000fffffffe RBX: ffff9740afe5aa80 RCX: 0000000000000000
      RDX: ffff9740b27f4340 RSI: 0000000000000283 RDI: 0000000000000000
      RBP: ffffbb15c0bcbd90 R08: ffffbb15c0bcbcd8 R09: 0000000000000000
      R10: 0000000894a0d4b1 R11: 000000000000008c R12: ffffffffc0312480
      R13: 0000000000000005 R14: ffff9740b1c32400 R15: 00000000000003e8
      FS:  00007f8604422700(0000) GS:ffff9740bfc80000(0000) knlGS:0000000000000000
      CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      CR2: 0000000000000038 CR3: 000000012164c000 CR4: 00000000000006e0
      Call Trace:
       request_firmware+0x37/0x50
       trigger_request_store+0x79/0xd0 [test_firmware]
       dev_attr_store+0x18/0x30
       sysfs_kf_write+0x37/0x40
       kernfs_fop_write+0x110/0x1a0
       __vfs_write+0x37/0x160
       ? _cond_resched+0x1a/0x50
       vfs_write+0xb5/0x1a0
       SyS_write+0x55/0xc0
       ? trace_do_page_fault+0x37/0xd0
       entry_SYSCALL_64_fastpath+0x1e/0xad
      RIP: 0033:0x7f8603f49620
      RSP: 002b:00007fff6287b788 EFLAGS: 00000246 ORIG_RAX: 0000000000000001
      RAX: ffffffffffffffda RBX: 000055c307b110a0 RCX: 00007f8603f49620
      RDX: 0000000000000016 RSI: 000055c3084d8a90 RDI: 0000000000000001
      RBP: 0000000000000016 R08: 000000000000c0ff R09: 000055c3084d6336
      R10: 000055c307b108b0 R11: 0000000000000246 R12: 000055c307b13c80
      R13: 000055c3084d6320 R14: 0000000000000000 R15: 00007fff6287b950
      Code: 9f 64 84 e8 9c 61 fe ff b8 f4 ff ff ff e9 6b f9 ff
      ff 48 c7 c7 40 6b 8d 84 89 45 a8 e8 43 84 18 00 49 8b be 00 03 00 00 8b
      45 a8 <83> 7f 38 02 74 08 e8 6e ec ff ff 8b 45 a8 49 c7 86 00 03 00 00
      RIP: _request_firmware+0xa27/0xad0 RSP: ffffbb15c0bcbd10
      CR2: 0000000000000038
      ---[ end trace 6d94ac339c133e6f ]---
      
      Fixes: 5d47ec02 ("firmware: Correct handling of fw_state_wait() return value")
      Reported-and-Tested-by: Jakub Kicinski <jakub.kicinski@netronome.com>
      Reported-and-Tested-by: Patrick Bruenn <p.bruenn@beckhoff.com>
      Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
      CC: <stable@vger.kernel.org>    [3.10+]
      Signed-off-by: Luis R. Rodriguez <mcgrof@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      191e885a
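
The design change in this commit, recording an abort in the state machine instead of clearing the buffer pointer, can be shown with a small model. The Python below is only an analogy under stated assumptions: `FwState`, `FwBuf`, `FwPriv`, `abort_old` and `abort_new` are hypothetical names, and an `AttributeError` on `None` stands in for the kernel NULL pointer dereference.

```python
from enum import Enum


class FwState(Enum):
    LOADING = 1
    DONE = 2
    ABORTED = 3


class FwBuf:
    def __init__(self):
        self.state = FwState.LOADING


class FwPriv:
    def __init__(self):
        self.buf = FwBuf()


def abort_old(priv):
    """Old scheme: mark the abort by dropping the buffer reference."""
    priv.buf.state = FwState.ABORTED   # fails if buf is already None
    priv.buf = None


def abort_new(priv):
    """New scheme: keep the buffer and let the state record the abort."""
    if priv.buf.state in (FwState.DONE, FwState.ABORTED):
        return
    priv.buf.state = FwState.ABORTED
```

Calling `abort_old()` twice on the same object fails on the second call, which is the double-abort race in miniature; `abort_new()` can be called any number of times because the state check makes it idempotent.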
  9. 09 Dec, 2016 1 commit
  10. 02 Dec, 2016 1 commit