1. 26 Nov, 2019: 1 commit
    • vfs: properly and reliably lock f_pos in fdget_pos() · 0be0ee71
      Linus Torvalds committed
      fdget_pos() is used by file operations that will read and update f_pos:
      things like "read()", "write()" and "lseek()" (but not, for example,
      "pread()/pwrite" that get their file positions elsewhere).
      
      However, it had two separate escape clauses for this, because not
      everybody wants or needs serialization of the file position.
      
      The first and most obvious case is the "file descriptor doesn't have a
      position at all", i.e. a stream-like file.  Except we didn't actually use
      FMODE_STREAM, but instead used FMODE_ATOMIC_POS.  The reason for that
      was that FMODE_STREAM didn't exist back in the day, but also that we
      didn't want to mark all the special cases, so we only marked the ones
      that _required_ position atomicity according to POSIX - regular files
      and directories.
      
      The first case was intentionally lazy, but now that we _do_ have
      FMODE_STREAM we could and should just use it.  With the change to use
      FMODE_STREAM, there are no remaining uses for FMODE_ATOMIC_POS, and all
      the code to set it is deleted.
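The switch from an opt-in flag to an opt-out flag can be modeled in a few lines of userspace C. This is a sketch only. The `SKETCH_FMODE_*` names and bit values are made up for illustration and are not the kernel's `FMODE_*` definitions from include/linux/fs.h. Only the shape of the two tests follows the text above.

```c
#include <stdbool.h>

/* Illustrative bit values only; NOT the kernel's FMODE_* constants. */
#define SKETCH_FMODE_ATOMIC_POS (1u << 0)
#define SKETCH_FMODE_STREAM     (1u << 1)

/* Old rule (opt-in): only files explicitly marked as needing POSIX
 * position atomicity (regular files, directories) were serialized. */
static bool pos_lock_wanted_old(unsigned int f_mode)
{
	return (f_mode & SKETCH_FMODE_ATOMIC_POS) != 0;
}

/* New rule (opt-out): every file is serialized unless it was
 * explicitly opened as a stream, e.g. via stream_open(). */
static bool pos_lock_wanted_new(unsigned int f_mode)
{
	return (f_mode & SKETCH_FMODE_STREAM) == 0;
}
```

The practical difference shows up for an unmarked file (mode 0). The old rule skipped the lock for it, and the new rule takes it.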
      
      Any cases where we don't want the serialization because the driver (or
      subsystem) doesn't use the file position should just be updated to do
      "stream_open()".  We've done that for all the obvious and common
      situations; we may need a few more.  Quoting Kirill Smelkov in the
      original FMODE_STREAM thread (see link below for full email):
      
       "And I appreciate if people could help at least somehow with "getting
        rid of mixed case entirely" (i.e. always lock f_pos_lock on
        !FMODE_STREAM), because this transition starts to diverge from my
        particular use-case too far. To me it makes sense to do that
        transition as follows:
      
         - convert nonseekable_open -> stream_open via stream_open.cocci;
         - audit other nonseekable_open calls and convert left users that
           truly don't depend on position to stream_open;
         - extend stream_open.cocci to analyze alloc_file_pseudo as well (this
           will cover pipes and sockets), or maybe convert pipes and sockets
           to FMODE_STREAM manually;
         - extend stream_open.cocci to analyze file_operations that use
           no_llseek or noop_llseek, but do not use nonseekable_open or
           alloc_file_pseudo. This might find files that have stream semantic
           but are opened differently;
         - extend stream_open.cocci to analyze file_operations whose
           .read/.write do not use ppos at all (independently of how file was
           opened);
         - ...
         - after that remove FMODE_ATOMIC_POS and always take f_pos_lock if
           !FMODE_STREAM;
         - gather bug reports for deadlocked read/write and convert missed
           cases to FMODE_STREAM, probably extending stream_open.cocci along
           the road to catch similar cases
      
        i.e. always take f_pos_lock unless a file is explicitly marked as
        being stream, and try to find and cover all files that are streams"
      
      We have not done the "extend stream_open.cocci to analyze
      alloc_file_pseudo" as well, but the previous commit did manually handle
      the case of pipes and sockets.
      
      The other case where we can avoid locking f_pos is the "this file
      descriptor only has a single user and it is us, and thus there is no
      need to lock it".
      
      The second test was correct, although a bit subtle and worth just
      re-iterating here.  There are two kinds of other sources of references
      to the same file descriptor: file descriptors that have been explicitly
      shared across fork() or with dup(), and file tables having elevated
      reference counts due to threading (or explicit file sharing with
      clone()).
      
      The first case would have incremented the file count explicitly, and in
      the second case the previous __fdget() would have incremented it for us
      and set the FDPUT_FPUT flag.
      
      But in both cases the file count would be greater than one, so the
      "file_count(file) > 1" test catches both situations.  Also note that if
      file_count is 1, that also means that no other thread can have access to
      the file table, so there also cannot be races with concurrent calls to
      dup()/fork()/clone() that would increment the file count any other way.
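The combined decision, using both tests from the text above, might look like the following userspace model. Everything here is hypothetical:

- `struct sketch_file`, `sketch_fdget_pos_lock()` and `sketch_fdput_pos()` are stand-ins invented for illustration.
- A pthread mutex plays the role of `f_pos_lock`.

It is not the kernel's fdget_pos() implementation.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SKETCH_FMODE_STREAM (1u << 1) /* illustrative value only */

/* Hypothetical model of struct file: reference count, mode word,
 * position, and the lock that protects the position. */
struct sketch_file {
	atomic_long f_count;
	unsigned int f_mode;
	long long f_pos;
	pthread_mutex_t f_pos_lock;
};

/* Skip the lock for streams (no meaningful position) and when we hold
 * the only reference (nobody else can reach the file to race with us).
 * Returns true if the lock was taken. */
static bool sketch_fdget_pos_lock(struct sketch_file *f)
{
	if ((f->f_mode & SKETCH_FMODE_STREAM) == 0 &&
	    atomic_load(&f->f_count) > 1) {
		pthread_mutex_lock(&f->f_pos_lock);
		return true;
	}
	return false;
}

/* Release the lock only if sketch_fdget_pos_lock() actually took it. */
static void sketch_fdput_pos(struct sketch_file *f, bool locked)
{
	if (locked)
		pthread_mutex_unlock(&f->f_pos_lock);
}
```

Returning whether the lock was taken lets the matching put release it only when needed. This is roughly the role the FDPUT_POS_UNLOCK flag plays in the kernel's struct fd.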
      
      Link: https://lore.kernel.org/linux-fsdevel/20190413184404.GA13490@deco.navytux.spb.ru
      Cc: Kirill Smelkov <kirr@nexedi.com>
      Cc: Eric Dumazet <edumazet@google.com>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Alan Stern <stern@rowland.harvard.edu>
      Cc: Marco Elver <elver@google.com>
      Cc: Andrea Parri <parri.andrea@gmail.com>
      Cc: Paul McKenney <paulmck@kernel.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. 23 Nov, 2019: 1 commit
  3. 22 Nov, 2019: 2 commits
    • nvme: hwmon: provide temperature min and max values for each sensor · 52deba0f
      Akinobu Mita committed
      According to the NVMe specification, the over temperature threshold and
      under temperature threshold features shall be implemented for Composite
      Temperature if a non-zero WCTEMP field value is reported in the Identify
      Controller data structure.  The features are also implemented for all
      implemented temperature sensors (i.e., all Temperature Sensor fields that
      report a non-zero value).
      
      This provides the over temperature threshold and under temperature
      threshold for each sensor as temperature min and max values of hwmon
      sysfs attributes.
      
      WCTEMP is already provided as the temperature max value for Composite
      Temperature, but this change does not break compatibility, because the
      default value of the over temperature threshold for Composite
      Temperature is WCTEMP.

      The alarm attribute for Composite Temperature now indicates that one of
      the temperatures is outside of its threshold, because there is only a
      single bit in the Critical Warning field to indicate that a temperature
      is outside of a threshold.
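A single Critical Warning bit covers every sensor, so the alarm can only say that some temperature crossed some threshold. The check can be sketched as follows. It assumes the bit layout from the NVMe specification, where bit 1 of the Critical Warning byte in the SMART / Health Information log page is the temperature condition. `CRIT_WARN_TEMPERATURE` and `composite_temp_alarm()` are illustrative names, not the driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit 1 of the Critical Warning byte: a temperature is at or beyond an
 * over or under temperature threshold (per the NVMe specification). */
#define CRIT_WARN_TEMPERATURE (1u << 1)

/* The bit does not say which sensor tripped, so the only honest place
 * to report it is a single alarm on the Composite Temperature channel. */
static bool composite_temp_alarm(uint8_t critical_warning)
{
	return (critical_warning & CRIT_WARN_TEMPERATURE) != 0;
}
```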
      
      Example output from the "sensors" command:
      
      nvme-pci-0100
      Adapter: PCI adapter
      Composite:    +33.9°C  (low  = -273.1°C, high = +69.8°C)
                             (crit = +79.8°C)
      Sensor 1:     +34.9°C  (low  = -273.1°C, high = +65261.8°C)
      Sensor 2:     +31.9°C  (low  = -273.1°C, high = +65261.8°C)
      Sensor 5:     +47.9°C  (low  = -273.1°C, high = +65261.8°C)
      
      This also adds helper macros for converting between kelvin and
      millidegrees Celsius, replacing the repeated conversion code in hwmon.c.
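The conversion helpers mentioned in the last paragraph can be sketched as two small functions. The names `kelvin_to_millicelsius()` and `millicelsius_to_kelvin()` are assumptions for illustration, as are the 273.15 offset handling and the round-to-nearest and clamping behavior. The helper macros the patch actually adds to hwmon.c may differ in name and rounding.

```c
/* Hypothetical helpers; absolute zero in millidegrees Celsius. */
#define ABS_ZERO_MILLICELSIUS (-273150L)

/* NVMe reports temperatures in whole kelvin; hwmon sysfs attributes
 * use millidegrees Celsius. */
static long kelvin_to_millicelsius(long k)
{
	return k * 1000L + ABS_ZERO_MILLICELSIUS;
}

/* Inverse conversion, rounded to the nearest kelvin. Values at or
 * below absolute zero are clamped to 0 K. */
static long millicelsius_to_kelvin(long mc)
{
	long mk = mc - ABS_ZERO_MILLICELSIUS; /* millikelvin */

	if (mk <= 0)
		return 0;
	return (mk + 500) / 1000;
}
```

Under these assumptions:

- 343 K converts to 69850, which hwmon consumers display as about +69.8°C.
- 0 K converts to -273150 and 65535 K converts to 65261850.

These values are consistent with the low and Sensor high values in the example output above.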
      
      Cc: Keith Busch <kbusch@kernel.org>
      Cc: Jens Axboe <axboe@fb.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Sagi Grimberg <sagi@grimberg.me>
      Cc: Jean Delvare <jdelvare@suse.com>
      Reviewed-by: Guenter Roeck <linux@roeck-us.net>
      Tested-by: Guenter Roeck <linux@roeck-us.net>
      Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
      Signed-off-by: Keith Busch <kbusch@kernel.org>
    • block: add iostat counters for flush requests · b6866318
      Konstantin Khlebnikov committed
      Requests that trigger flushing of the volatile writeback cache to disk
      (barriers) have a significant effect on overall performance.

      The block layer has a sophisticated engine for combining several flush
      requests into one, but there are no statistics for the actual flushes
      executed by the disk. Requests that trigger flushes are usually
      barriers, i.e. zero-size writes.

      This patch adds two iostat counters to /sys/class/block/$dev/stat and
      /proc/diskstats: the count of completed flush requests and their total
      time.

      Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
      Signed-off-by: Jens Axboe <axboe@kernel.dk>
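Reading the new flush counters from user space only needs the two extra columns. The sketch below assumes the column layout documented in Documentation/admin-guide/iostats.rst. There, "flush requests completed" and "milliseconds spent flushing" are fields 16 and 17, i.e. the last two of 17 numeric columns following major, minor and device name. `struct flush_stats` and `parse_diskstats_flush()` are hypothetical names.

```c
#include <stdio.h>

struct flush_stats {
	unsigned long long flushes;  /* flush requests completed */
	unsigned long long flush_ms; /* milliseconds spent flushing */
};

/* Parse one /proc/diskstats line. Returns 0 on success, or -1 if the
 * line does not carry all 17 numeric fields (e.g. an older kernel). */
static int parse_diskstats_flush(const char *line, struct flush_stats *out)
{
	unsigned int major, minor;
	unsigned long long f[17];
	char dev[64];
	int n = sscanf(line,
		       "%u %u %63s %llu %llu %llu %llu %llu %llu %llu %llu %llu"
		       " %llu %llu %llu %llu %llu %llu %llu %llu",
		       &major, &minor, dev,
		       &f[0], &f[1], &f[2], &f[3], &f[4], &f[5], &f[6], &f[7],
		       &f[8], &f[9], &f[10], &f[11], &f[12], &f[13], &f[14],
		       &f[15], &f[16]);

	if (n != 20)
		return -1;
	out->flushes = f[15];  /* field 16 */
	out->flush_ms = f[16]; /* field 17 */
	return 0;
}
```

A line from a kernel without these counters has fewer columns, so the function returns -1 instead of misreading another field.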
  4. 20 Nov, 2019: 1 commit
  5. 19 Nov, 2019: 9 commits
  6. 18 Nov, 2019: 5 commits
  7. 16 Nov, 2019: 3 commits
    • mm/memory_hotplug: fix try_offline_node() · 2c91f8fc
      David Hildenbrand committed
      try_offline_node() is pretty much broken right now:
      
       - The node span is updated when onlining memory, not when adding it. We
         ignore memory that was never onlined. Bad.
      
       - We touch possible garbage memmaps. The pfn_to_nid(pfn) can easily
         trigger a kernel panic. Bad for memory that is offline but also bad
         for subsection hotadd with ZONE_DEVICE, whereby the memmap of the
         first PFN of a section might contain garbage.
      
       - Sections belonging to mixed nodes are not properly considered.
      
      As memory blocks might belong to multiple nodes, we would have to walk
      all pageblocks (or at least subsections) within present sections.
      However, we don't have a way to identify whether a memmap that is not
      online was initialized (relevant for ZONE_DEVICE).  This makes things
      more complicated.
      
      Luckily, we can piggyback on the node span and the nid stored in memory
      blocks.  Currently, the node span is grown when calling
      move_pfn_range_to_zone() - e.g., when onlining memory, and shrunk when
      removing memory, before calling try_offline_node().  Sysfs links are
      created via link_mem_sections(), e.g., during boot or when adding
      memory.
      
      If the node still spans memory or if any memory block belongs to the
      nid, we don't set the node offline.  As memory blocks that span multiple
      nodes cannot get offlined, the nid stored in memory blocks is reliable
      enough (for such online memory blocks, the node still spans the memory).
      
      Introduce for_each_memory_block() to efficiently walk all memory blocks.
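The rule "do not offline a node that still spans memory or that any memory block still claims" can be sketched as a callback-driven walk. The types and functions below are hypothetical simplifications. `sketch_for_each_memory_block()` is only loosely modeled on the callback style of for_each_memory_block(), and the other checks that try_offline_node() performs are omitted.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each memory block remembers the nid it was
 * linked to when its sysfs links were created. */
struct sketch_memory_block {
	int nid;
};

typedef int (*sketch_walk_fn)(struct sketch_memory_block *mem, void *arg);

/* Visit every block; stop early and return the callback's value as
 * soon as it is non-zero. */
static int sketch_for_each_memory_block(struct sketch_memory_block *blocks,
					size_t n, sketch_walk_fn fn, void *arg)
{
	for (size_t i = 0; i < n; i++) {
		int ret = fn(&blocks[i], arg);

		if (ret)
			return ret;
	}
	return 0;
}

static int block_belongs_to_nid(struct sketch_memory_block *mem, void *arg)
{
	return mem->nid == *(int *)arg;
}

/* A node may be set offline only if it spans no memory and no memory
 * block still claims its nid. */
static bool sketch_can_offline_node(int nid, unsigned long spanned_pages,
				    struct sketch_memory_block *blocks, size_t n)
{
	if (spanned_pages)
		return false;
	return sketch_for_each_memory_block(blocks, n, block_belongs_to_nid,
					    &nid) == 0;
}
```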
      
      Note: We will soon stop shrinking the ZONE_DEVICE zone and the node span
      when removing ZONE_DEVICE memory to fix similar issues (access of
      garbage memmaps) - until we have a reliable way to identify whether
      these memmaps were properly initialized.  This implies later, that once
      a node had ZONE_DEVICE memory, we won't be able to set a node offline -
      which should be acceptable.
      
      Since commit f1dd2cd1 ("mm, memory_hotplug: do not associate
      hotadded memory to zones until online") memory that is added is not
      associated with a zone/node (memmap not initialized).  The introducing
      commit 60a5a19e ("memory-hotplug: remove sysfs file of node")
      already missed that we could have multiple nodes for a section and that
      the zone/node span is updated when onlining pages, not when adding them.
      
      I tested this by hotplugging two DIMMs to a memory-less and cpu-less
      NUMA node.  The node is properly onlined when adding the DIMMs.  When
      removing the DIMMs, the node is properly offlined.
      
      Masayoshi Mizuma reported:
      
      : Without this patch, memory hotplug fails as panic:
      :
      :  BUG: kernel NULL pointer dereference, address: 0000000000000000
      :  ...
      :  Call Trace:
      :   remove_memory_block_devices+0x81/0xc0
      :   try_remove_memory+0xb4/0x130
      :   __remove_memory+0xa/0x20
      :   acpi_memory_device_remove+0x84/0x100
      :   acpi_bus_trim+0x57/0x90
      :   acpi_bus_trim+0x2e/0x90
      :   acpi_device_hotplug+0x2b2/0x4d0
      :   acpi_hotplug_work_fn+0x1a/0x30
      :   process_one_work+0x171/0x380
      :   worker_thread+0x49/0x3f0
      :   kthread+0xf8/0x130
      :   ret_from_fork+0x35/0x40
      
      [david@redhat.com: v3]
        Link: http://lkml.kernel.org/r/20191102120221.7553-1-david@redhat.com
      Link: http://lkml.kernel.org/r/20191028105458.28320-1-david@redhat.com
      Fixes: 60a5a19e ("memory-hotplug: remove sysfs file of node")
      Fixes: f1dd2cd1 ("mm, memory_hotplug: do not associate hotadded memory to zones until online") # visible after d0dc12e8
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com>
      Cc: Tang Chen <tangchen@cn.fujitsu.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: "Rafael J. Wysocki" <rafael@kernel.org>
      Cc: Keith Busch <keith.busch@intel.com>
      Cc: Jiri Olsa <jolsa@kernel.org>
      Cc: "Peter Zijlstra (Intel)" <peterz@infradead.org>
      Cc: Jani Nikula <jani.nikula@intel.com>
      Cc: Nayna Jain <nayna@linux.ibm.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Oscar Salvador <osalvador@suse.de>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Dan Williams <dan.j.williams@intel.com>
      Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • ptp: Introduce strict checking of external time stamp options. · 6138e687
      Richard Cochran committed
      User space may request time stamps on rising edges, falling edges, or
      both.  However, the particular mode may or may not be supported in the
      hardware or in the driver.  This patch adds a "strict" flag that tells
      drivers to ensure that the requested mode will be honored.
      Signed-off-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
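On the driver side, the strict flag turns a silent fallback into an explicit error. The sketch below makes three assumptions:

- It models a hypothetical device that can time-stamp only rising edges.
- The flag values are copied locally and are intended to mirror include/uapi/linux/ptp_clock.h, so the code builds without kernel headers. Check them against your headers.
- `sketch_driver_enable_extts()` and its `-EOPNOTSUPP` return are illustrative choices, not any real driver's code.

```c
#include <errno.h>

/* Local copies of the external time stamp flags (see lead-in). */
#define PTP_ENABLE_FEATURE (1u << 0)
#define PTP_RISING_EDGE    (1u << 1)
#define PTP_FALLING_EDGE   (1u << 2)
#define PTP_STRICT_FLAGS   (1u << 3)

/* Hypothetical hardware limitation: only rising edges can be stamped. */
#define SKETCH_SUPPORTED_EDGES PTP_RISING_EDGE

static int sketch_driver_enable_extts(unsigned int flags)
{
	unsigned int edges = flags & (PTP_RISING_EDGE | PTP_FALLING_EDGE);

	/* Disabling time stamping never depends on edge support. */
	if (!(flags & PTP_ENABLE_FEATURE))
		return 0;

	/* Strict mode: accept only a request the hardware can honor
	 * exactly; falling, both, or no edge is refused. */
	if ((flags & PTP_STRICT_FLAGS) && edges != SKETCH_SUPPORTED_EDGES)
		return -EOPNOTSUPP;

	/* Non-strict mode: the driver may fall back to rising edges. */
	return 0;
}
```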
    • ptp: Validate requests to enable time stamping of external signals. · cd734d54
      Richard Cochran committed
      Commit 41560658 ("PTP: introduce new versions of IOCTLs")
      introduced a new external time stamp ioctl that validates the flags.
      This patch extends the validation to ensure that at least one rising
      or falling edge flag is set when enabling external time stamps.
      Signed-off-by: Richard Cochran <richardcochran@gmail.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
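The added validation amounts to one more condition on the request flags. A minimal sketch follows, with locally defined flag values intended to mirror include/uapi/linux/ptp_clock.h. `sketch_validate_extts_enable()` and `SKETCH_EXTTS_EDGES` are hypothetical names, and the sketch ignores the other flag validation the new ioctl performs.

```c
#include <errno.h>

/* Local copies of the flag values (see lead-in). */
#define PTP_ENABLE_FEATURE (1u << 0)
#define PTP_RISING_EDGE    (1u << 1)
#define PTP_FALLING_EDGE   (1u << 2)
#define SKETCH_EXTTS_EDGES (PTP_RISING_EDGE | PTP_FALLING_EDGE)

/* Enabling external time stamps without asking for either edge leaves
 * nothing to time-stamp, so reject it; disabling is always accepted. */
static int sketch_validate_extts_enable(unsigned int flags)
{
	if ((flags & PTP_ENABLE_FEATURE) && (flags & SKETCH_EXTTS_EDGES) == 0)
		return -EINVAL;
	return 0;
}
```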
  8. 14 Nov, 2019: 7 commits
  9. 13 Nov, 2019: 7 commits
  10. 12 Nov, 2019: 3 commits
  11. 11 Nov, 2019: 1 commit